Last modified: 2014-09-17 13:47:17 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the bug reports and their history, links might be broken. See T59909, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 57909 - Bogus entries in externallinks table due to unescaping of &%=+
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: 1.23.0
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2013-12-03 03:06 UTC by Brad Jorsch
Modified: 2014-09-17 13:47 UTC
CC List: 4 users

Description Brad Jorsch 2013-12-03 03:06:31 UTC
Consider this URL:

 http://example.com/index.php?foo=bar%26baz%3Dquux%2Bquux

It has one parameter, foo, with the value "bar&baz=quux+quux". Place this in an article and the externallinks table will contain this URL instead:

 http://example.com/index.php?foo=bar&baz=quux+quux

This has *two* parameters, foo with the value "bar" and baz with the value "quux quux".
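The change in meaning can be reproduced outside MediaWiki with Python's standard `urllib.parse.parse_qs`, which splits a query string on literal `&` and decodes `+` as a space, the conventional reading of form-encoded query strings. This is only an illustration of how the two URLs are interpreted, not MediaWiki code:

```python
from urllib.parse import urlsplit, parse_qs

# URL as written in the article, with '&', '=' and '+' percent-encoded.
original = "http://example.com/index.php?foo=bar%26baz%3Dquux%2Bquux"
# URL as stored in externallinks after %26, %3D and %2B were decoded.
mangled = "http://example.com/index.php?foo=bar&baz=quux+quux"

def query_params(url):
    """Parse a URL's query string into {parameter name: [values]}."""
    return parse_qs(urlsplit(url).query)

print(query_params(original))  # {'foo': ['bar&baz=quux+quux']}
print(query_params(mangled))   # {'foo': ['bar'], 'baz': ['quux quux']}
```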

Then try this URL:

 http://example.com/index.php?foo=%25xx

The value of foo is "%xx". But put it into an article, and externallinks will contain this URL instead:

 http://example.com/index.php?foo=%xx

That's not even a valid URL, since "%xx" is not a valid percent-escape.
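Under RFC 3986 (section 2.1), every `%` in a URL must begin an escape made of exactly two hex digits. A minimal Python sketch of a check for that condition follows; the function name `has_invalid_escape` is hypothetical and not part of MediaWiki. A check like this could flag stored entries such as the second example, though it cannot catch the first one, which is still a well-formed URL:

```python
import re

# A '%' not followed by two hex digits is not a valid percent-escape
# (RFC 3986, section 2.1). Hypothetical helper, not MediaWiki code.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")

def has_invalid_escape(url):
    """Return True if the URL contains a '%' that does not start a valid escape."""
    return _BAD_ESCAPE.search(url) is not None

print(has_invalid_escape("http://example.com/index.php?foo=%25xx"))  # False
print(has_invalid_escape("http://example.com/index.php?foo=%xx"))    # True
```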


The problem lies in Parser::replaceUnusualEscapesCallback: it unescapes %25, %26, %2B, and %3D even though the characters they encode ('%', '&', '+', and '=') all have special meaning in a URL when unescaped. I see that a similar-sounding problem was reported in bug 4781, which was closed as "fixed" with no reference to the revision that fixed it. Bug 40267 also touched on this issue, but these real problems appear to have been overlooked there, since the reporter focused on the unescaping of various safe characters rather than on these unsafe ones.
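One safe policy is RFC 3986 percent-encoding normalization (section 6.2.2.2): decode an escape only when it encodes an unreserved character (letters, digits, `-`, `.`, `_`, `~`) and leave every other escape, including %25, %26, %2B, and %3D, untouched. The Python sketch below illustrates that policy under this assumption; it is not the code of the merged change, and the name `decode_unreserved` is hypothetical:

```python
import re

# Unreserved characters (RFC 3986, section 2.3) can be decoded without
# changing what a URL means; all other escapes are kept as-is.
_UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")

def decode_unreserved(url):
    """Hypothetical helper: decode %XX only when it encodes an unreserved character."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        # Keep reserved or special characters such as '%', '&', '+', '=' escaped.
        return char if char in _UNRESERVED else match.group(0)
    return _ESCAPE.sub(repl, url)
```

Applied to the two URLs from the description, this leaves both unchanged, while an escape such as %7E (`~`) would still be decoded.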
Comment 1 Brad Jorsch 2013-12-03 03:10:52 UTC
So the question I have is: Can we just change replaceUnusualEscapesCallback (leaving externallinks inconsistent until all these pages happen to be reparsed), or should we try to figure out which pages are affected and run a maintenance script of some sort over them, or is externallinks supposed to contain such broken entries?
Comment 2 MZMcBride 2013-12-03 03:33:50 UTC
(In reply to comment #1)
> So the question I have is: Can we just change replaceUnusualEscapesCallback
> (leaving externallinks inconsistent until all these pages happen to be
> reparsed), or should we try to figure out which pages are affected and run a
> maintenance script of some sort over them, or is externallinks supposed to
> contain such broken entries?

You could null edit all the pages. :-)
Comment 3 Gerrit Notification Bot 2014-08-08 11:09:09 UTC
Change 152889 had a related patch set uploaded by Anomie:
Improve Parser::replaceUnusualEscapes

https://gerrit.wikimedia.org/r/152889
Comment 4 Gerrit Notification Bot 2014-09-16 23:07:59 UTC
Change 152889 merged by jenkins-bot:
Improve/rename Parser::replaceUnusualEscapes

https://gerrit.wikimedia.org/r/152889
