Last modified: 2014-10-16 11:52:55 UTC
bugzilla turns a timestamp like 20130718 into a "Git commit" link to a gerrit query. See how this messed up e.g. bug 50204 comment 11. This arises from the regex in https://git.wikimedia.org/blob/wikimedia%2Fbugzilla%2Fmodifications/HEAD/bugzilla-4.2%2Fextensions%2FWikimedia%2FExtension.pm my $replacerGitCommit = { match => qr{(?:^|(?<=[\s\[\{\(]))([a-f0-9]{8,40})}i, replace => \&_createGitCommitLink }; I think the match should require at least one a-f in its hexadecimal digits to avoid these false matches with large numbers. I don't know if it should require a digit as well, to avoid turning words like "defaced" into a commit. My perlre fu is unclear on how to make these assertions. Also, since `git log --oneline` produces 7-digit commit SHA-1s, the match should start at 7 hex digits. You could add a match on "qr{(?:commit\s)([a-f0-9]{7,40})}i" so people can force a link to "commit 1234567".
As there can also be commit SHAs that only consist of numbers (without a-e letters), we might either have "some stuff is autolinked but shouldn't" (now) or "some stuff is not autolinked but should"... Wondering if disabling autolinking for exactly 8 digits && starting with 200, 201, 202 could work.
Yes that's a better solution, also starting with 199. There are indeed all-numeric commit 7-digit SHAs: (10/16)^7 suggests they'll be 3.7% of our commits (and `git log --oneline | ack '^[0-9]{7}' | wc` agrees).
Seeing the two numbers in the bug summary of bug 53840 autolinked (Allowed memory size of 183500800 bytes exhausted (tried to allocate 2018670161 bytes)) I wonder to generally doubt how useful $replacerGitCommit actually is with its current configuration of lengths between 8 and 40 characters ({8,40}) as it links against Gerrit's query functionality (hence searching for a 5 digit changeset would also be a potential target here, but that would like collide with bug numbers so whoever set that up likely started with 8 chars because of that).