Last modified: 2014-05-15 02:37:24 UTC
Compare these two TOCs: http://en.wikipedia.beta.wmflabs.org/wiki/Igloo http://en.m.wikipedia.beta.wmflabs.org/wiki/Igloo (in beta mode) The desktop one converts & to & (which is correct): You & your mom & your dog & your shirt The mobile one converts & to & (which is wrong): You & your mom & your dog & your shirt
Prioritization and scheduling of this bug is tracked on Trello card https://trello.com/c/IIXEuTOZ
I think it's these lines in PageApi.js transformSections(): // Store text version so users of API do not have to worry about styling e.g. Table of Contents // FIXME: [API] should probably do this for us - we want to be able to control the styling of these headings - no inline styles! section.lineText = $( '<div>' ).html( section.line ).text(); ... $( '<h' + section.level + '>' ).attr( 'id', section.anchor ).html( section.line ) Since this seems to be already plaintext, and not HTML, performing the conversion has the nice side effect of executing any <script> contained in the text, without leaving it in the DOM to be seen.
Hmmm, even if I change that entire each loop to: $.each( sections, function( i, section ) { section.lineText = ''; section.children = []; result.push( section ); } ); It still executes the section title as HTML.
(In reply to Ryan Kaldari from comment #3) > Hmmm, even if I change that entire each loop to: > $.each( sections, function( i, section ) { > section.lineText = ''; > section.children = []; > result.push( section ); > } ); > It still executes the section title as HTML. Hmm, I can confirm BUT it only executes it ONCE instead of twice, so it seems to be an improvement. :D There must be another similar one hiding around somewhere...
If I add... section.line = ''; ...to that loop, it eliminates the last execution. Of course that breaks a few features though.
I have it fixed locally. Will upload diff in a sec...
Created attachment 15321 [details] fix for section unescaping bug patch attached
Created attachment 15322 [details] better patch file
mw.sections[].line is defined in ApiMobileView: https://github.com/wikimedia/mediawiki-extensions-MobileFrontend/blob/a893faf37b78e6/includes/api/ApiMobileView.php#L402-L408 $data['sections'] = $parserOutput->getSections(); Which comes from mediawiki-core Parser: https://github.com/wikimedia/mediawiki-core/blob/b31c093855be5e0/includes/parser/Parser.php#L4469-L4490 Server-side code uses it as html (the parser claims it has been tripped and escaped where needed, it is considered html, not text). For example, Linker::tocLine, embeds it raw: https://github.com/wikimedia/mediawiki-core/blob/b31c093855be5e0/includes/Linker.php#L1657-L1665 return ... '<span class="toctext">' . $tocline . '</span>' .. Back to the MobileFrontend code, it should be used as text or as html. I think in this case it is supposed to be used as html. If that's unsafe, then we're likely vulnerable in mediawiki-core's html response as well and it should be fixed there instead. Either way, .html( .. ).text() is a horrible hack that should not be used. If you use mw.html.escape, use it directly and only. If you find yourself doing: var fooHtml = mw.html.escape(somethingText); $something.html(fooHtml); Please don't. This is extra overhead in both escaping (string manipulation), and parsing (by the browser) for no good reason. Instead just pass the original value to $something.text(), which will instruct the browser to create a native string of text, bypassing the need to escape and parse as html altogether (remember, when in javascript, there is no such thing as HTML, you're interacting with the DOM directly. You wouldn't "append" a string of php syntax to a php array either, instead you'd set a property on the array or use array push, and that is safe. Same goes for text on an element, using .text() goes straight to where it should go without the need to escape or parse.
Created attachment 15360 [details] get rid of lineText completely, fix transformSections
Created attachment 15361 [details] again as a patch
https://gerrit.wikimedia.org/r/#/c/133382/