Last modified: 2014-11-11 16:16:21 UTC
Repositories that we (also) mirror at a custom name are currently being mirrored twice. For example:

* https://github.com/wikimedia/VisualEditor
  https://github.com/wikimedia/VisualEditor-VisualEditor
* https://github.com/wikimedia/oojs
  https://github.com/wikimedia/oojs-core
* https://github.com/wikimedia/puppet-kafka
  https://github.com/wikimedia/operations-puppet-kafka

This is confusing and also decentralises things like Watching and Fork tracking. But, more importantly, it also makes our Star rating and Pull requests harder to maintain.

I don't know whether we intentionally do both, or whether it was just easier to implement this way. I'd recommend we get rid of the full names where we want to publish the repo under a different name.

If it was a conscious choice to do this I'm curious why. The only reason I can think of is to make them easier to find on GitHub, which is hardly a big issue since they have excellent search for repo names right on https://github.com/wikimedia. Another reason might be being able to automatically generate a GitHub address for a Git repo, but I'm not sure that's a big issue.

Also note that GitHub supports renaming of repositories and automatically keeps redirects[1]. So we can remove the "unwanted" names from the Gerrit replication config and have our github.com/wikimedia admins delete the repos and create a redirect (by renaming the good one to the bad one and then renaming it back).

As an example I've created a redirect from https://github.com/wikimedia/mediawiki to https://github.com/wikimedia/mediawiki-core. Eventually this should be done the other way around, but this was easier to set up since Gerrit replication does not push to /mediawiki yet (unlike oojs and VisualEditor).

[1] https://github.com/blog/1508-repository-redirects-are-here
[2] https://help.github.com/articles/renaming-a-repository
> If it was a conscious choice to do this I'm curious why.

We want all gerrit repos replicated on github.

We could set up replication for each repository in our gerrit separately. That would allow full control of repository names at github. But it is tedious, time-consuming, and has other warts too.

So in order to escape such a messy, verbose replication setup, gerrit replicates each and every project to github under the name that gets used on gerrit [1]. For this replication, we do not prune leading paths, as that would map different names on gerrit to the same names on github [2].

However, some people felt strongly that they needed a better repository name on github, or had a following on github that they would not want to lose. So they had an additional, separate replication target set up. And hence, those repositories get replicated twice to github: once for the canonical name, and a second time for the custom name of their liking.

> Also note that GitHub supports renaming of repositories and
> automatically keeps redirects[...]

Thanks! That wasn't available when we added custom replication. We could use that to solve most issues.

> As an example I've created a redirect from
> https://github.com/wikimedia/mediawiki to
> https://github.com/wikimedia/mediawiki-core. Eventually this should be
> done the other way around [...]

Not too sure about this one. Doing it “the other way around” would make it even more confusing for people, because the “canonical” github name would differ from the gerrit name too much to be able to map it clearly.

[1] github does not like “/”s in repo names, so we're having them replaced by “-”s.

[2] For example we currently have three repositories in gerrit that end in “/data”. Or which of “analytics/kraken” and “analytics/vagrant/kraken” should become github's “kraken”? Same problems for “/Nostalgia”.
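The two naming schemes discussed in this comment can be compared with a minimal Python sketch. The helper names (`github_name`, `pruned_name`, `collisions`) are hypothetical and only illustrate the idea; this is not the actual Gerrit replication code. The project names are the ones mentioned in this thread.

```python
from collections import defaultdict


def github_name(gerrit_name):
    """Escape a Gerrit project name for GitHub by replacing "/" with "-"."""
    return gerrit_name.replace("/", "-")


def pruned_name(gerrit_name):
    """Alternative scheme: drop leading paths and keep only the last component."""
    return gerrit_name.rsplit("/", 1)[-1]


def collisions(names, mapper):
    """Return {target: [sources]} for every target produced by more than one source."""
    groups = defaultdict(list)
    for name in names:
        groups[mapper(name)].append(name)
    return {target: sources for target, sources in groups.items() if len(sources) > 1}


# Gerrit project names taken from this thread.
projects = [
    "mediawiki/core",
    "oojs/core",
    "VisualEditor/VisualEditor",
    "analytics/kraken",
    "analytics/vagrant/kraken",
]

print(collisions(projects, github_name))  # no clashes for these names
print(collisions(projects, pruned_name))  # "core" and "kraken" clash
```

For this sample, the dash-escaped names stay unique, while pruning leading paths maps two projects each onto "core" and "kraken".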
(In reply to christian from comment #1)
> > If it was a conscious choice to do this I'm curious why.
>
> We want all gerrit repos replicated on github.
>
> We could set up replication for each repository in our gerrit
> separately. That would allow full control of repository names at
> github. But it is tedious, time-consuming, and has other warts too.
>
> So in order to escape such a messy, verbose replication setup, gerrit
> replicates each and every project to github under the name that gets
> used on gerrit [1]. For this replication, we do not prune leading
> paths, as that would map different names on gerrit to the same names
> on github [2].

I wasn't suggesting that we do it manually. All repos should be replicated. I'm also well aware of naming conflicts if we were to prune leading paths. Nobody was suggesting that.

> However, some people felt strongly that they needed a better
> repository name on github, or had a following on github that they
> would not want to lose. So they had an additional, separate
> replication target set up. And hence, those repositories get replicated
> twice to github. Once for the canonical name. And a second time for
> the custom name of their liking.

Exactly. And there's no reason to have two replications. It should be easy to remove the additional one.

Projects that use github more publicly tend to have a better name there, to hide internal implementation details of Gerrit or Wikimedia (e.g. VisualEditor, MediaWiki and OOjs have stupidly named repos in gerrit: VisualEditor/VisualEditor, oojs/core and mediawiki/core. This is due to conventions and limitations in our use of Gerrit).

> > Also note that GitHub supports renaming of repositories and
> > automatically keeps redirects[...]
>
> Thanks! That wasn't available when we added custom replication.
> We could use that to solve most issues.
>
> > As an example I've created a redirect from
> > https://github.com/wikimedia/mediawiki to
> > https://github.com/wikimedia/mediawiki-core. Eventually this should be
> > done the other way around [...]
>
> Not too sure about this one.
> Doing it “the other way around” would make it even more confusing for
> people, because the “canonical” github name would differ from the
> gerrit name too much to be able to map it clearly.

I'm not sure you understood what I meant.

Right now we replicate all repos from <gerrit-id> to github:<gerrit-id-escaped> and have additional replications for those that needed better names.

I'm saying: omit the ones with custom names from the wildcard replication so that there's only one.

For existing repos already replicated twice, we'll have to manually remove the (now stale) repository from the admin panel at github. While at it, we can turn those into redirects to the custom name to make sure any urls stay working.
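The proposal above (exactly one replication target per repo, plus a list of stale GitHub repos to turn into redirects) can be sketched in Python as follows. This is only an illustration under assumptions: `CUSTOM_NAMES`, `replication_target` and `stale_repos` are hypothetical names, and the mapping table holds just two of the repos mentioned in this thread, not the real replication config.

```python
# Hypothetical table of Gerrit projects that should be published under a
# custom GitHub name (illustrative subset, not the real config).
CUSTOM_NAMES = {
    "VisualEditor/VisualEditor": "VisualEditor",
    "oojs/core": "oojs",
}


def escaped_name(gerrit_name):
    """Default wildcard mapping: <gerrit-id> with "/" replaced by "-"."""
    return gerrit_name.replace("/", "-")


def replication_target(gerrit_name, custom_names=CUSTOM_NAMES):
    """Return the single GitHub repo name a Gerrit project replicates to.

    Projects with a custom name are left out of the wildcard mapping,
    so no project is ever pushed to two GitHub repos.
    """
    if gerrit_name in custom_names:
        return custom_names[gerrit_name]
    return escaped_name(gerrit_name)


def stale_repos(custom_names=CUSTOM_NAMES):
    """Map each leftover escaped-name repo to the custom name it should redirect to."""
    return {escaped_name(gerrit): custom for gerrit, custom in custom_names.items()}


print(replication_target("oojs/core"))       # custom name
print(replication_target("mediawiki/core"))  # wildcard mapping
print(stale_repos())
```

The output of `stale_repos()` is the manual clean-up list for the GitHub admins: each key is a repo that would no longer receive pushes, and each value is the repo its URL should redirect to.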
The redirects GitHub has in place apply both to the HTTP protocol (web interface) and to the Git interface itself.

We can simply remove the duplicate "pretty name" replications and perform the renames on GitHub. That way there's only one repo Gerrit is pushing to, and both urls work.

I've gone ahead and done this with mediawiki-core as an example. From Gerrit's perspective it's a straight-up replication from gerrit:mediawiki/core to github:mediawiki-core. There is no fancy-name duplicate set up for it. On GitHub it has been renamed from mediawiki-core to mediawiki, with a full redirect in place. Gerrit is now effectively pushing to https://github.com/wikimedia/mediawiki while all urls and logs for mediawiki-core continue to work as expected.
Change 167162 had a related patch set uploaded by Krinkle: gerrit: Remove duplicate mirrors https://gerrit.wikimedia.org/r/167162
Change 167162 merged by Dzahn: gerrit: Remove duplicate mirrors https://gerrit.wikimedia.org/r/167162
Fwiw, we renamed the parsoid repository to https://github.com/wikimedia/parsoid with no ill effects. Even the Travis tests kept working.