Last modified: 2013-11-27 20:06:44 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T50772, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 48772 - Protect Parsoid-generated token-names, about-ids, typeofs from conflicts with user-provided names.
Protect Parsoid-generated token-names, about-ids, typeofs from conflicts with...
Status: NEW
Product: Parsoid
Classification: Unclassified
General (Other open bugs)
unspecified
All All
: Normal normal
: ---
Assigned To: Gabriel Wicke
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2013-05-24 10:44 UTC by ssastry
Modified: 2013-11-27 20:06 UTC (History)
4 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description ssastry 2013-05-24 10:44:46 UTC
User can enter html tags or provide tag attributes or attribute values that Parsoid (and downstream clients) treat specially.

Ex: #mwt3 for about-ids, or mw:Object/* for typeofs or <template> tags or any of the special attributes/types found on http://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec

If Parsoid doesn't detect these and escape/handle them somehow, Parsoid will mislead clients like VisualEditor and/or incorrectly serialize them.
Comment 1 C. Scott Ananian 2013-06-17 23:24:38 UTC
There are three interconnected issues:
1)
  <span prefix="mw: http://evil.bad">
is valid wikitext, which would create malformed Parsoid DOM.  We should sanitize the wikitext (but that has to happen *before* we create the DOM, since otherwise we can't tell which prefix attributes are good and which are evil.)

2) VE needs to prevent users from authoring content which sets prefix attributes, etc.  Currently it does so, but it would be nice to make Parsoid more robust against malformed DOM, and/or to add layers of protection so that front ends aren't solely responsible for sanitizing user input.

3) Longer term we should probably think about use cases where the user wants to deliberately author RDFa markup on their content, and ensure that they are able to do so in a safe way.

This bug is primarily about #1 (the short term issue) and I'll tackle it tomorrow.
Comment 2 Gabriel Wicke 2013-06-17 23:32:57 UTC
1) Is not an issue, as prefix is  not allowed in the PHP sanitizer (and not in our sanitizer either).

2) Is something for later. The risk here is mainly crashes during serialization.

3) Should be supported. We only want to protect our own values where necessary. The typeof attribute for example is multi-valued, so we only need to strip mw:-prefixed user-supplied values. The about attribute on the other hand is single-valued, so we need to override user-supplied values unconditionally where necessary.
Comment 3 Andre Klapper 2013-07-04 10:34:16 UTC
[Parsoid component reorg by merging JS/General and General. See bug 50685 for more information. Filter bugmail on this comment. parsoidreorg20130704]
Comment 4 Gerrit Notification Bot 2013-07-19 06:06:39 UTC
Change 74586 had a related patch set uploaded by Arlolra:
Protect Parsoid-generated attributes.

https://gerrit.wikimedia.org/r/74586
Comment 5 C. Scott Ananian 2013-07-22 21:40:46 UTC
[Discussed this on IRC with subbu]

To support case 3 from comment 1 and 2, rather than *strip* the attributes, I suggest encapsulating them instead.  For example, if the user authors:

<div about="http://dbpedia.org/resource/Albert_Einstein" typeof="foaf:Person">
  <span property="foaf:name">Albert Einstein</span>
  <span property="foaf:givenName">Albert</span>
</div>

..instead of stripping the RDFa, we're just going to rename the attributes, to:

<div data-user-about="http://dbpedia.org/resource/Albert_Einstein" data-user-typeof="foaf:Person">
  <span data-user-property="foaf:name">Albert Einstein</span>
  <span data-user-property="foaf:givenName">Albert</span>
</div>

That ensures that the user content is safe for Parsoid and VE to manipulate.  In the future we can re-enable the RDFa in the output with a DOM post-processing ("render") pass.  For example, the user content:

<div about="http://dbpedia.org/resource/Albert_Einstein" typeof="foaf:Person">
{{Albert Einstein}}
</div>

might have all sorts of Parsoid RDFa markup inside the {{Albert Einstein}} expansion, which the about="http://dbpedia.org/resource/Albert_Einstein" attribute would screw up the context for.  But for rendering to the web, the Parsoid cruft could be elided and the user's RDFa markup unwrapped.
Comment 6 ssastry 2013-07-22 21:47:42 UTC
If we do go that route, the fixup for rendering would have to be in client-side JS since we may not use different cached content for editing vs rendering -- or so it seems to me at this time without any sense of how user RDFa markup will need to be supported in MW content.
Comment 7 C. Scott Ananian 2013-07-22 21:52:04 UTC
Or we could write a DOM postprocessor stage that was a bit smarter about which RDFa it unwrapped.  ie, it would unwrap the first example from comment 5, but not the final example there, since it would be able to see that there's parsoid RDFa inside the div which would be broken.  Or other options.
Comment 8 Gerrit Notification Bot 2013-07-27 17:42:42 UTC
Change 74586 merged by jenkins-bot:
Protect Parsoid-generated attributes.

https://gerrit.wikimedia.org/r/74586
Comment 9 Arlo Breault 2013-07-29 16:56:44 UTC
Patch landed as suggested in comment 5. However, Subbu mentioned that, "we still need to deal with complexity that templates bring."

Ex: <div {{echo|about}}="foo">bar</div>
Comment 10 ssastry 2013-09-03 22:49:44 UTC
One more thing that would be good to fix:

Currently, the tokenizer unconditionally escapes typeof and about attributes to data-x-typeof and data-x-about. But, if RDFa attrs from wikitext will also get escaped in this manner. Not sure if RDFa is actually used in wikitext, but that support does exist in MW. So, the escaping could be refined to leave alone any typeof and about attrs that wont confuse Parsoid and clients.

Related FIXME comments from the sanitizer (ext.core.Sanitizer.js) from https://gerrit.wikimedia.org/r/#/c/81569/ that I am pasting here:

----
// SSS FIXME: There is a test in mediawiki.environment.js that doles out
// and tests about ids. There are probably some tests in mediawiki.Util.js
// as well. We should move all these kind of tests somewhere else.
----
// Bypass RDFa/whitelisting checks for Parsoid-inserted attrs
// Safe to do since the tokenizer renames about/typeof attrs.
// unconditionally. FIXME: The escaping solution in tokenizer
// maybe aggressive. There is no need to escape typeof strings
// that or about ids that don't resemble Parsoid types/about ids.
----

Any fix made here should also verify that the sanitization code in sanitizeTagAttrs properly strips dangerous values from unescaped typeofs even when Parsoid updates them with parsoid typeofs.

Ex: <tag typeof="dangerous-text">..</tag> gets updated by Parsoid parser to <tag typeof="mw:ParsoidTypeof dangerous-text">..</tag> which should then be handled by the sanitizer to yield <tag typeof="mw:ParsoidTypeof">...</tag>

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links