Last modified: 2014-11-11 16:16:21 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T70054, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 68054 - Gerrit: Use GitHub redirects instead of duplicate mirrors
Status: PATCH_TO_REVIEW
Product: Wikimedia
Classification: Unclassified
Component: Git/Gerrit
Version: wmf-deployment
Hardware: All All
Importance: Normal enhancement
Assigned To: Krinkle
Reported: 2014-07-15 18:28 UTC by Krinkle
Modified: 2014-11-11 16:16 UTC (History)

Description Krinkle 2014-07-15 18:28:40 UTC
Repositories that we (also) mirror under a custom name are currently being mirrored twice.

For example:

* https://github.com/wikimedia/VisualEditor
  https://github.com/wikimedia/VisualEditor-VisualEditor


* https://github.com/wikimedia/oojs
  https://github.com/wikimedia/oojs-core

* https://github.com/wikimedia/puppet-kafka
  https://github.com/wikimedia/operations-puppet-kafka

This is confusing and also decentralises things like Watching and Fork tracking. But, more importantly, it also makes our Star rating and Pull requests harder to maintain.


I don't know whether we intentionally do both, or whether it was just easier to implement this way.

I'd recommend we get rid of the full names where we want to publish the repo under a different name.

If it was a conscious choice to do this I'm curious why. The only reason I can think of is to make them easier to find on GitHub, which is hardly a big issue since they have excellent search for repo names right on https://github.com/wikimedia.

Another reason might be being able to automatically generate a GitHub address for a Git repo, but I'm not sure that's a big issue.

Also note that GitHub supports renaming of repositories and automatically keeps redirects[1]. So we can remove the "unwanted" names from Gerrit replication config and have our github.com/wikimedia admins delete the repos and create a redirect (by renaming the good one to the bad one and then renaming it back).
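To make the rename trick concrete, here is a toy in-memory model of it. This is only a sketch of the behaviour described above, not GitHub's implementation or API: the `Org` class and its methods are hypothetical, and it assumes that a rename leaves a redirect from the old name and that a live repository takes precedence over a redirect with the same name.

```python
class Org:
    """Toy model of an organisation's repositories and rename redirects.

    Hypothetical helper for illustration only; it does not talk to GitHub.
    """

    def __init__(self, repos):
        self.repos = set(repos)
        self.redirects = {}  # old name -> current name

    def delete(self, name):
        self.repos.remove(name)

    def rename(self, old, new):
        if new in self.repos:
            raise ValueError(f"{new} already exists")
        self.repos.remove(old)
        self.repos.add(new)
        # Assumption: a live repository wins over a redirect with its name.
        self.redirects.pop(new, None)
        # Re-point redirects that targeted the old name.
        for source, target in self.redirects.items():
            if target == old:
                self.redirects[source] = new
        # The old name now redirects to the new one.
        self.redirects[old] = new

    def resolve(self, name):
        """Return the repository a name leads to, or None."""
        if name in self.repos:
            return name
        return self.redirects.get(name)


org = Org(["oojs", "oojs-core"])
org.delete("oojs-core")          # drop the duplicate mirror
org.rename("oojs", "oojs-core")  # rename the good one to the bad name
org.rename("oojs-core", "oojs")  # and rename it back
print(org.resolve("oojs-core"))
```

Under these assumptions only one repository remains, and the "unwanted" name resolves to it.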

As an example I've created a redirect from https://github.com/wikimedia/mediawiki to https://github.com/wikimedia/mediawiki-core. Eventually this should be done the other way around, but this direction was easier to set up since Gerrit replication does not push to /mediawiki yet (unlike oojs and VisualEditor).


[1] https://github.com/blog/1508-repository-redirects-are-here
[2] https://help.github.com/articles/renaming-a-repository
Comment 1 christian 2014-07-16 09:16:56 UTC
> If it was a conscious choice to do this I'm curious why.

We want all gerrit repos replicated on github.

We could set up replication for each repository in our gerrit
separately. That would allow full control of repository names at
github. But it is tedious, time-consuming, and has other warts too.

So in order to escape such a messy, verbose replication setup, gerrit
replicates every project to github under the name that gets
used on gerrit [1]. For this replication, we do not prune leading
paths, as that would map different names on gerrit to the same names
on github [2].

However, some people felt strongly that they needed a better
repository name on github, or had a following on github that they
would not want to lose. So they had an additional, separate
replication target set up. And hence, those repositories get replicated
twice to github: once for the canonical name, and a second time for
the custom name of their liking.

> Also note that GitHub supports renaming of repositories and
> automatically keeps redirects[...]

Thanks! That wasn't available when we added custom replication.
We could use that to solve most issues.

> As an example I've created a redirect from
> https://github.com/wikimedia/mediawiki to
> https://github.com/wikimedia/mediawiki-core. Eventually this should be
> done the other way around [...]

Not too sure about this one.
Doing it “the other way around” would make it even more confusing for
people, because the “canonical” github name would differ from the
gerrit name too much to be able to map it clearly.



[1] github does not like “/”s in repo names, so we're having them
    replaced by “-”s.

[2] For example we currently have three repositories in gerrit that
    end in “/data”.
    Or which of “analytics/kraken” and “analytics/vagrant/kraken”
    should become github's “kraken”?
    Same problems for “/Nostalgia”.
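The two naming schemes from footnotes [1] and [2] can be sketched as follows. This is a minimal illustration, not the actual replication code: the helper names `dashed_name`, `pruned_name` and `collisions` are hypothetical, and the repository list is just a sample of names mentioned in this bug.

```python
from collections import defaultdict


def dashed_name(gerrit_name):
    """Footnote [1]: replace every "/" with "-"."""
    return gerrit_name.replace("/", "-")


def pruned_name(gerrit_name):
    """The rejected alternative: keep only the last path component."""
    return gerrit_name.rsplit("/", 1)[-1]


def collisions(names, mapping):
    """Return the target names that more than one Gerrit name maps to."""
    groups = defaultdict(list)
    for name in names:
        groups[mapping(name)].append(name)
    return {target: sources for target, sources in groups.items()
            if len(sources) > 1}


repos = [
    "analytics/kraken",
    "analytics/vagrant/kraken",
    "mediawiki/core",
    "oojs/core",
]

print(collisions(repos, dashed_name))
print(collisions(repos, pruned_name))
```

For this sample, the dashed names are all distinct, while pruning leading paths maps two repositories each onto "kraken" and "core".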
Comment 2 Krinkle 2014-08-25 12:27:34 UTC
(In reply to christian from comment #1)
> > If it was a conscious choice to do this I'm curious why.
> 
> We want all gerrit repos replicated on github.
> 
> We could set up replication for each repository in our gerrit
> separately. That would allow full control of repository names at
> github. But it is tedious, time-consuming, and has other warts too.
> 

> So in order to escape such a messy, verbose replication setup, gerrit
> replicates every project to github under the name that gets
> used on gerrit [1]. For this replication, we do not prune leading
> paths, as that would map different names on gerrit to the same names
> on github [2].

I wasn't suggesting that we do it manually. All repos should be replicated. I'm also well aware of the naming conflicts we'd get if we were to prune leading paths. Nobody was suggesting that.

> However, some people felt strongly that they needed a better
> repository name on github, or had a following on github that they
> would not want to lose. So they had an additional, separate
> replication target set up. And hence, those repositories get replicated
> twice to github. Once for the canonical name. And a second time for
> the custom name of their liking.

Exactly. And there's no reason to have two replications. It should be easy to remove the additional one. Projects that use GitHub more publicly tend to have a better name there to hide internal implementation details of Gerrit or Wikimedia (e.g. VisualEditor, MediaWiki and OOjs have stupidly named repos in Gerrit: VisualEditor/VisualEditor, mediawiki/core and oojs/core. This is due to conventions and limitations in our use of Gerrit).

> > Also note that GitHub supports renaming of repositories and
> > automatically keeps redirects[...]
> 
> Thanks! That wasn't available when we added custom replication.
> We could use that to solve most issues.
> 
> > As an example I've created a redirect from
> > https://github.com/wikimedia/mediawiki to
> > https://github.com/wikimedia/mediawiki-core. Eventually this should be
> > done the other way around [...]
> 
> Not too sure about this one.
> Doing it “the other way around” would make it even more confusing for
> people, because the “canonical” github name would differ from the
> gerrit name too much to be able to map it clearly.
> 

I'm not sure you understood what I meant.

Right now we replicate all repos from <gerrit-id> to github:<gerrit-id-escaped> and have additional replications for those that needed better names.

I'm saying: Omit the ones with custom names from the wildcard replication so that there's only one.

For existing repos already replicated twice, we'll have to manually remove the (now stale) repository from the admin panel at github. While at it, we can turn those into redirects to the custom name to make sure any urls stay working.
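The proposal can be sketched as a small planning function. This is only an illustration of the idea, not Gerrit's replication configuration: the function names and the `custom` mapping are hypothetical, and the repository names are taken from the examples in this bug.

```python
def replication_targets(gerrit_repos, custom_names):
    """Return exactly one GitHub target name per Gerrit repository.

    Repos with a custom name are left out of the wildcard rule
    (which replaces "/" with "-") and use their custom name instead.
    """
    targets = {}
    for repo in gerrit_repos:
        targets[repo] = custom_names.get(repo, repo.replace("/", "-"))
    return targets


def stale_redirects(custom_names):
    """Map each escaped name that was mirrored twice to the custom name
    it should redirect to once the stale repository is removed."""
    return {repo.replace("/", "-"): custom
            for repo, custom in custom_names.items()}


custom = {
    "VisualEditor/VisualEditor": "VisualEditor",
    "oojs/core": "oojs",
}
repos = ["VisualEditor/VisualEditor", "oojs/core", "analytics/kraken"]

print(replication_targets(repos, custom))
print(stale_redirects(custom))
```

Each Gerrit repository ends up with a single target, and the second mapping lists the redirects to create by hand on GitHub.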
Comment 3 Krinkle 2014-10-17 02:13:46 UTC
The redirects GitHub has in place apply both to the HTTP protocol (web interface) and to the Git interface itself. We can simply remove the duplicate "pretty name" replications and perform the renames on GitHub. That way there's only one repo Gerrit is pushing to, and both URLs work.

I've gone ahead and done this with mediawiki-core as an example.

From Gerrit's perspective it's a straight up replication from gerrit:mediawiki/core to github:mediawiki-core. There is no fancy-name duplicate set up for it.

On GitHub it has been renamed from mediawiki-core to mediawiki, with a full redirect in place. Gerrit is now effectively pushing to https://github.com/wikimedia/mediawiki while all URLs and logs for mediawiki-core continue to work as expected.
Comment 4 Gerrit Notification Bot 2014-10-17 02:34:00 UTC
Change 167162 had a related patch set uploaded by Krinkle:
gerrit: Remove duplicate mirrors

https://gerrit.wikimedia.org/r/167162
Comment 5 Gerrit Notification Bot 2014-10-20 23:13:24 UTC
Change 167162 merged by Dzahn:
gerrit: Remove duplicate mirrors

https://gerrit.wikimedia.org/r/167162
Comment 6 Gabriel Wicke 2014-11-11 16:16:21 UTC
FWIW, we renamed the parsoid repository to https://github.com/wikimedia/parsoid with no ill effects. Even the Travis tests kept working.
