Last modified: 2013-12-03 23:08:19 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T52536, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 50536 - Unclosed formatting elements leaking out of transclusions
Unclosed formatting elements leaking out of transclusions
Status: NEW
Product: Parsoid
Classification: Unclassified
General (Other open bugs)
unspecified
All All
: High normal
: ---
Assigned To: Gabriel Wicke
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2013-07-01 23:34 UTC by Roan Kattouw
Modified: 2013-12-03 23:08 UTC (History)
3 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Roan Kattouw 2013-07-01 23:34:34 UTC
http://parsoid.wmflabs.org/en/User:Adjwilley/sandbox10?oldid=562331693

Relevant wikitext:

*{{Cite book| last=Fádil-i-Mázindarání | first= Asadu'lláh | authorlink = Mírzá Asadu’llah Fádil Mázandarání | year=1967 | title=Asráu'l-Á<u>th<u>ár, Vol.I| pages= 453 | publisher=Bahá'í Publishing Trust, Tehran | url = http://www.h-net.org/~bahai/areprint/authors/mazandarani/asrar.htm | ref=harv}}

(note the <u> isn't opened then closed, but double-opened)

Output HTML (with data-*, about and a few others stripped):

<li><span typeof="mw:Transclusion"><a rel="mw:WikiLink" href="...">Fádil-i-Mázindarání, Asadu'lláh</a> (1967). <a rel="mw:ExtLink" href="..."><i>Asráu'l-Á<u>th<u>ár, Vol.I</u></u></i></a><u><u>. Bahá'í Publishing Trust, Tehran. p.<span typeof="mw:Entity">&nbsp;</span>453.</u></u></span><u><u><span class="Z3988"><span style="display:none;"><span typeof="mw:Entity">&nbsp;</span></span></span></u></u></li><u><u>

<li>(next item here, note how we're still inside of <u><u>)
Comment 1 Roan Kattouw 2013-07-01 23:35:16 UTC
Diff showing resulting VE breakage: https://en.wikipedia.org/w/index.php?title=User:Adjwilley/sandbox10&curid=38366631&diff=562462441&oldid=562331693 . I'm not sure if the categories being dropped is on the VE or Parsoid side yet.
Comment 2 Gabriel Wicke 2013-07-02 01:03:39 UTC
Since this is a plain transclusion this is pretty much the behavior I would expect. The formatting is processed in accordance with the HTML spec.

I suspect tidy has a heuristic that closes the <u> at some point in contrast with the HTML spec.

Something that looks broken on the Parsoid side is the missing encapsulation for formatting that spans list items:

echo '<ul><li>{{echo|<u>}}</li><li>foo</li></ul>' | node parse
<body><ul>
  <li><u about="#mwt1" typeof="mw:Transclusion" data-mw='{"target":{"wt":"echo","href":"./Template:Echo"},"params":{"1":{"wt":"<u>"}},"i":0}'></u></li>
  <li><u>foo</u></li>
</ul>
<u>
</u></body>

(reformatted the above a bit and removed data-parsoid for clarity)

We should in theory be able to detect that the leaked <u>s were completely auto-inserted by the treebuilder. Maybe we can use that information to extend the encapsulation for cases like this one.
Comment 3 ssastry 2013-07-05 19:52:14 UTC
No, the tpl encapsulation looks correct to me.  The tree builder correction in this case is quite broad and it would be hard to detect auto-correction as well.  On this last example, there are no autoInserted* flags except on the innermost <u> tag.  As long as we are using a html5 compliant tree builder, our options are a little limited on this kind of html without doing more complicated tracking/analyses.  I am not convinced those are worth the trouble and headaches.
Comment 4 Gabriel Wicke 2013-07-11 21:03:47 UTC
The template encapsulation does not catch all template-affected content, so is not correct really. Alternatively we should be able to undo the leak to match the PHP parser's semantics. That would remove the need for extending the encapsulation in this case.
Comment 5 Gerrit Notification Bot 2013-07-11 23:45:28 UTC
Change 73369 had a related patch set uploaded by Subramanya Sastry:
WIP: (Bug 50536) Improved handling of tree-builder fixup

https://gerrit.wikimedia.org/r/73369
Comment 7 Gabriel Wicke 2013-12-03 23:08:19 UTC
Re-titling to reflect our plan [1] to avoid the leaking and match PHP parser semantics instead.,

[1]: https://www.mediawiki.org/wiki/Parsoid/DOM_notes

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links