Last modified: 2014-10-16 11:52:55 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T53720, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 51720 - [Regex] auto-linking to gerrit commits is too aggressive for dates such as 20130101
[Regex] auto-linking to gerrit commits is too aggressive for dates such as 20...
Status: NEW
Product: Wikimedia
Classification: Unclassified
Bugzilla (Other open bugs)
wmf-deployment
All All
: Lowest minor (vote)
: ---
Assigned To: Nobody - You can work on this!
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2013-07-20 00:39 UTC by spage
Modified: 2014-10-16 11:52 UTC (History)
4 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description spage 2013-07-20 00:39:51 UTC
bugzilla turns a timestamp like 20130718 into a "Git commit" link to a gerrit query.  See how this messed up e.g. bug 50204 comment 11.

This arises from the regex in https://git.wikimedia.org/blob/wikimedia%2Fbugzilla%2Fmodifications/HEAD/bugzilla-4.2%2Fextensions%2FWikimedia%2FExtension.pm

   my $replacerGitCommit = {
       match => qr{(?:^|(?<=[\s\[\{\(]))([a-f0-9]{8,40})}i,
       replace => \&_createGitCommitLink
   };

I think the match should require at least one a-f in its hexadecimal digits to avoid these false matches with large numbers. I don't know if it should require a digit as well, to avoid turning words like "defaced" into a commit. My perlre fu is unclear on how to make these assertions. Also, since `git log --oneline` produces 7-digit commit SHA-1s, the match should start at 7 hex digits.

You could add a match on "qr{(?:commit\s)([a-f0-9]{7,40})}i" so people can force a link to "commit 1234567".
Comment 1 Andre Klapper 2013-07-22 10:55:57 UTC
As there can also be commit SHAs that only consist of numbers (without a-e letters), we might either have "some stuff is autolinked but shouldn't" (now) or "some stuff is not autolinked but should"... 

Wondering if disabling autolinking for exactly 8 digits && starting with 200, 201, 202 could work.
Comment 2 spage 2013-07-23 01:50:52 UTC
Yes that's a better solution, also starting with 199.  There are indeed all-numeric commit 7-digit SHAs: (10/16)^7 suggests they'll be 3.7% of our commits (and `git log --oneline | ack '^[0-9]{7}' | wc` agrees).
Comment 3 Andre Klapper 2013-09-06 16:25:06 UTC
Seeing the two numbers in the bug summary of bug 53840 autolinked (Allowed memory size of 183500800 bytes exhausted (tried to allocate 2018670161 bytes)) I wonder to generally doubt how useful $replacerGitCommit actually is with its current configuration of lengths between 8 and 40 characters ({8,40}) as it links against Gerrit's query functionality (hence searching for a 5 digit changeset would also be a potential target here, but that would like collide with bug numbers so whoever set that up likely started with 8 chars because of that).

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links