Last modified: 2013-11-15 11:06:08 UTC
Created attachment 13209
screenshot

wfEscapeWikiText does not escape enough characters, allowing undesirable formatting through in certain cases.

To reproduce, open the following URL. This is a search for "__TOC__ OR<CR>;a<CR>ISBN<TAB>978-3-16-148410-0<CR> a".

https://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=all&search=__TOC__%20OR%0D;a%0D:ISBN%09978-3-16-148410-0%0D%20a&fulltext=Search

Expected Result: The text "Results 1–6 of 6 for " (from message 'showingresultsheader') is followed by "__TOC__ OR ;a ISBN 978-3-16-148410-0 a", with no special formatting or linking beyond the bolding applied by the message text.

Actual Result: __TOC__ disappears. The first "a" appears on the next line. The ISBN is indented (as a definition in a definition list) and linked to Special:BookSources. The second "a" appears as monospaced text inside a pre element.
(In reply to comment #0)
> To reproduce, open the following URL. This is a search for
> "__TOC__ OR<CR>;a<CR>ISBN<TAB>978-3-16-148410-0<CR> a".

Actually, for "__TOC__ OR<CR>;a<CR>:ISBN<TAB>978-3-16-148410-0<CR> a"
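The exact search string can be checked by percent-decoding the search parameter of the URL from the report. A minimal Python sketch using only the standard library follows; the helper name search_term is made up for illustration and is not part of MediaWiki.

```python
from urllib.parse import urlsplit, unquote

# URL copied verbatim from the original report.
URL = (
    "https://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=all"
    "&search=__TOC__%20OR%0D;a%0D:ISBN%09978-3-16-148410-0%0D%20a"
    "&fulltext=Search"
)


def search_term(url):
    """Return the percent-decoded value of the 'search' query parameter."""
    # Split on '&' only, because the value itself contains ';'.
    for pair in urlsplit(url).query.split("&"):
        key, _, value = pair.partition("=")
        if key == "search":
            return unquote(value)
    return None


if __name__ == "__main__":
    # repr() makes the CR (\r) and TAB (\t) characters visible.
    print(repr(search_term(URL)))
```

The decoded value contains a ":" before "ISBN", which is why the ISBN ends up rendered as a definition in a definition list.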
To be clear, things that need to be handled here are:
1. Double underscore magic words
2. Magic links using a non-space whitespace character
3. Newlines using CR instead of LF
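The three cases above can be sketched roughly as follows. This is an illustrative Python sketch, not the code from the Gerrit changes. The function name escape_sketch, the list of magic link keywords, the set of line-start characters, and the choice of numeric character references are all assumptions made for the example.

```python
import re


def _char_ref(ch):
    """Return a decimal numeric character reference for one character."""
    return "&#%d;" % ord(ch)


def escape_sketch(text):
    """Escape the three cases listed above (hypothetical helper)."""
    # 1. Break up double underscores so "__TOC__" is not a magic word.
    text = text.replace("__", "_&#95;")

    # 2. Magic links: encode whatever whitespace character follows the
    #    keyword, so TAB or CR is covered as well as a plain space.
    text = re.sub(
        r"\b(ISBN|RFC|PMID)(\s)",
        lambda m: m.group(1) + _char_ref(m.group(2)),
        text,
    )

    # 3. Line-start markup: treat CR like LF, and encode a special
    #    character that directly follows either one (or starts the text).
    text = re.sub(
        r"(^|[\r\n])([#*:; ])",
        lambda m: m.group(1) + _char_ref(m.group(2)),
        text,
    )
    return text


if __name__ == "__main__":
    sample = "__TOC__ OR\r;a\r:ISBN\t978-3-16-148410-0\r a"
    print(repr(escape_sketch(sample)))
```

Encoding the whitespace after the keyword, instead of the keyword itself, keeps the visible text unchanged while preventing the parser from seeing a keyword followed by whitespace.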
Found some others as well:

https://en.wikipedia.org/wiki/Special:Search/PMID_1
https://en.wikipedia.org/wiki/Special:Search/urn:foo

Grepping the code reveals that Sanitizer::safeEncodeAttribute does handle the former, though not some of the other things wfEscapeWikiText is supposed to.
Change 82460 had a related patch set uploaded by Anomie: Improve wfEscapeWikiText https://gerrit.wikimedia.org/r/82460
Change 82462 had a related patch set uploaded by Anomie: Improve mw.text.nowiki https://gerrit.wikimedia.org/r/82462
Change 82462 merged by jenkins-bot: Improve mw.text.nowiki https://gerrit.wikimedia.org/r/82462
Change 82460 merged by jenkins-bot: Improve wfEscapeWikiText https://gerrit.wikimedia.org/r/82460
Changes merged. They should be deployed to WMF wikis with 1.22wmf16, see https://www.mediawiki.org/wiki/MediaWiki_1.22/Roadmap for the schedule.
What about two or more consecutive newlines? Should all newlines be escaped (not just those preceding #, *, etc.)?

For example:
> $m = new RawMessage( '$1' ); var_dump( $m->params( wfEscapeWikiText( "foo\n\n\nbar" ) )->parse() );

As of a86240a37aa729494bd4d7c7935afff4e5b62b22 I get:
string(21) "foo\n</p><p><br />\nbar"

I would expect this to be:
string(21) "foo&#10;&#10;&#10;bar"
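One possible way to keep runs of newlines from producing paragraph breaks is to leave the first newline of each run alone and encode every newline that directly follows another. The Python sketch below only illustrates that idea; the function name escape_blank_lines is hypothetical, and this is not claimed to be what the follow-up change does.

```python
import re


def escape_blank_lines(text):
    """Encode each newline that immediately follows another newline."""
    # The lookbehind inspects the original text, so every newline after
    # the first one in a run is replaced by a character reference.
    return re.sub(r"(?<=\n)\n", "&#10;", text)


if __name__ == "__main__":
    print(repr(escape_blank_lines("foo\n\n\nbar")))
```

A single newline is left untouched, so ordinary line wrapping inside a paragraph is unaffected.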
Change 85233 had a related patch set uploaded by Anomie: Improve wfEscapeWikiText, part 2 https://gerrit.wikimedia.org/r/85233
Change 85234 had a related patch set uploaded by Anomie: Improve mw.text.nowiki, part 2 https://gerrit.wikimedia.org/r/85234
Change 85233 merged by jenkins-bot: Improve wfEscapeWikiText, part 2 https://gerrit.wikimedia.org/r/85233
Change 85234 merged by jenkins-bot: Improve mw.text.nowiki, part 2 https://gerrit.wikimedia.org/r/85234
Changes merged. They should be deployed to WMF wikis with 1.22wmf19, see https://www.mediawiki.org/wiki/MediaWiki_1.22/Roadmap for the schedule.
Change 95420 had a related patch set uploaded by MarkAHershberger: Improve mw.text.nowiki https://gerrit.wikimedia.org/r/95420
Change 95421 had a related patch set uploaded by MarkAHershberger: Improve mw.text.nowiki, part 2 https://gerrit.wikimedia.org/r/95421
Change 95421 abandoned by MarkAHershberger: Improve mw.text.nowiki, part 2 https://gerrit.wikimedia.org/r/95421
Change 95420 abandoned by MarkAHershberger: Improve mw.text.nowiki https://gerrit.wikimedia.org/r/95420
No open patches to review here (backport patches got abandoned), hence resetting status to RESOLVED FIXED. Backport_to_Stable flag might be set to "-" by hexmode.