Last modified: 2014-09-17 13:47:17 UTC
Consider this URL:

http://example.com/index.php?foo=bar%26baz%3Dquux%2Bquux

It has one parameter, foo, with the value "bar&baz=quux+quux". Place this in an article and the externallinks table will contain this URL instead:

http://example.com/index.php?foo=bar&baz=quux+quux

This has *two* parameters: foo with the value "bar" and baz with the value "quux quux".

Then try this URL:

http://example.com/index.php?foo=%25xx

The value of foo is "%xx". But put it into an article, and externallinks will contain this URL instead:

http://example.com/index.php?foo=%xx

That's not even valid.

The problem lies in Parser::replaceUnusualEscapesCallback: it unescapes %25, %26, %2B, and %3D even though the characters they encode (%, &, +, and =) all have special meaning in a URL when unescaped.

I see a similar-sounding problem was reported in bug 4781, which was closed as "fixed" with no reference to the revision in which it was fixed. Bug 40267 also touched upon this issue, but these real problems appear to have been overlooked, since the reporter there focused on the unescaping of various safe characters rather than only these unsafe ones.
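To make the change in meaning concrete, here is a minimal sketch in Python (not MediaWiki's PHP code). It parses the two forms of the first URL with the standard library, then shows one possible safer rule: decode a %XX escape only when it encodes an RFC 3986 "unreserved" character. The helper names `query_params` and `unescape_unreserved_only` are hypothetical and exist only for this illustration; the rule is an assumption about what a fix could look like, not a description of the actual patch.

```python
import re
from urllib.parse import urlsplit, parse_qs

# RFC 3986 "unreserved" characters: decoding these never changes URL meaning.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def query_params(url):
    """Return the decoded query parameters of a URL as a dict of lists."""
    return parse_qs(urlsplit(url).query)


def unescape_unreserved_only(url):
    """Hypothetical helper: decode %XX only for unreserved characters.

    Escapes such as %25, %26, %2B and %3D are left alone (normalized to
    upper-case hex), because decoding them would alter the URL's meaning.
    """
    def repl(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()

    return re.sub(r"%([0-9A-Fa-f]{2})", repl, url)


original = "http://example.com/index.php?foo=bar%26baz%3Dquux%2Bquux"
stored = "http://example.com/index.php?foo=bar&baz=quux+quux"

print(query_params(original))  # one parameter: foo
print(query_params(stored))    # two parameters: foo and baz
print(unescape_unreserved_only(original) == original)
```

With a rule like this, both URLs from the report would pass through unchanged, while a harmless escape such as %7E would still be decoded to "~".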
So the question I have is: Can we just change replaceUnusualEscapesCallback (leaving externallinks inconsistent until all these pages happen to be reparsed), or should we try to figure out which pages are affected and run a maintenance script of some sort over them, or is externallinks supposed to contain such broken entries?
(In reply to comment #1)
> So the question I have is: Can we just change replaceUnusualEscapesCallback
> (leaving externallinks inconsistent until all these pages happen to be
> reparsed), or should we try to figure out which pages are affected and run a
> maintenance script of some sort over them, or is externallinks supposed to
> contain such broken entries?

You could null edit all the pages. :-)
Change 152889 had a related patch set uploaded by Anomie: Improve Parser::replaceUnusualEscapes https://gerrit.wikimedia.org/r/152889
Change 152889 merged by jenkins-bot: Improve/rename Parser::replaceUnusualEscapes https://gerrit.wikimedia.org/r/152889