Last modified: 2013-07-04 10:33:45 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T44251, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 42251 - Implicit end tag rt code sometimes eats explicit end tags
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
Component: General
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Gabriel Wicke
Depends on:
Blocks:
Reported: 2012-11-19 02:37 UTC by Gabriel Wicke
Modified: 2013-07-04 10:33 UTC (History)

Description Gabriel Wicke 2012-11-19 02:37:53 UTC
Test case:

echo -e '<FONT COLOR="#000000">a</FONT><FONT COLOR="#FF3300">b</FONT>' | nodejs parse --wt2wt
<font COLOR="#000000">a<font COLOR="#FF3300">b
Comment 1 ssastry 2012-11-20 20:37:53 UTC
This seems to be a bug (?) in the HTML5 tree builder. It seems to delete upper-case tags and then implicitly close the unmatched tags, which is what introduces the bug reported here.

https://gist.github.com/51b7a850b2b5cb3f3579 demonstrates this.

We can fix this by sending the html5 tree builder lower-case tag names, but that will introduce dirty diffs wherever the source used upper-case tags. Then again, we already introduce such diffs right now since we are not tracking the source case, so it is maybe not a big deal if we rely on selective serialization to deal with this.
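
A minimal sketch of the workaround described in this comment, assuming tokens with `type`, `name`, and `dataAttribs` fields like those printed by `--trace html`. The helper `normalizeTagCase` and the `srcTagName` field are hypothetical names used for illustration; this is not the change that was actually merged.

```javascript
// Hypothetical helper: lower-case tag names before the tree builder sees them.
// `srcTagName` is an assumed field name, not an actual Parsoid data attribute.
function normalizeTagCase(token) {
    var isTag = token && (token.type === 'TagTk' || token.type === 'EndTagTk');
    if (!isTag) {
        return token; // strings, NlTk, EOFTk pass through unchanged
    }
    var lower = token.name.toLowerCase();
    if (lower === token.name) {
        return token; // already lower case, nothing to record
    }
    // Copy instead of mutating, and remember the original spelling so a
    // serializer could restore it later and avoid a dirty diff.
    var dataAttribs = Object.assign({}, token.dataAttribs, { srcTagName: token.name });
    return Object.assign({}, token, { name: lower, dataAttribs: dataAttribs });
}

var sampleTokens = [
    { type: 'TagTk', name: 'B', attribs: [], dataAttribs: { stx: 'html' } },
    'foo',
    { type: 'EndTagTk', name: 'B', attribs: [], dataAttribs: { stx: 'html' } }
];
var normalizedTokens = sampleTokens.map(normalizeTagCase);
```

Recording the original spelling is one possible way to address the dirty-diff concern; relying on selective serialization, as suggested in the comment, would make that extra field unnecessary.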
Comment 2 ssastry 2012-11-20 20:38:24 UTC
Pasting the contents of the gist inline here in case it goes away later.

[subbu@earth tests] echo "<b>foo</b><i>bar</i>" | node parse.js --trace html
---- <chunk> ----
T:html: {"type":"TagTk","name":"p","attribs":[],"dataAttribs":{}}
T:html: {"type":"TagTk","name":"b","attribs":[],"dataAttribs":{"tsr":[0,3],"stx":"html"}}
T:html: "foo"
T:html: {"type":"EndTagTk","name":"b","attribs":[],"dataAttribs":{"tsr":[6,10],"stx":"html"}}
T:html: {"type":"TagTk","name":"i","attribs":[],"dataAttribs":{"tsr":[10,13],"stx":"html"}}
T:html: "bar"
T:html: {"type":"EndTagTk","name":"i","attribs":[],"dataAttribs":{"tsr":[16,20],"stx":"html"}}
T:html: {"type":"EndTagTk","name":"p","attribs":[],"dataAttribs":{}}
T:html: {"type":"NlTk","dataAttribs":{}}
T:html: {"type":"EOFTk"}
---- </chunk> ----
<p data-parsoid="{&quot;dsr&quot;:[0,20]}"><b data-parsoid="{&quot;tsr&quot;:[0,3],&quot;stx&quot;:&quot;html&quot;,&quot;dsr&quot;:[0,10]}">foo</b><i data-parsoid="{&quot;tsr&quot;:[10,13],&quot;stx&quot;:&quot;html&quot;,&quot;dsr&quot;:[10,20]}">bar</i></p>

[subbu@earth tests] echo "<B>foo</B><I>bar</I>" | node parse.js --trace html
---- <chunk> ----
T:html: {"type":"TagTk","name":"p","attribs":[],"dataAttribs":{}}
T:html: {"type":"TagTk","name":"B","attribs":[],"dataAttribs":{"tsr":[0,3],"stx":"html"}}
T:html: "foo"
T:html: {"type":"EndTagTk","name":"B","attribs":[],"dataAttribs":{"tsr":[6,10],"stx":"html"}}
T:html: {"type":"TagTk","name":"I","attribs":[],"dataAttribs":{"tsr":[10,13],"stx":"html"}}
T:html: "bar"
T:html: {"type":"EndTagTk","name":"I","attribs":[],"dataAttribs":{"tsr":[16,20],"stx":"html"}}
T:html: {"type":"EndTagTk","name":"p","attribs":[],"dataAttribs":{}}
T:html: {"type":"NlTk","dataAttribs":{}}
T:html: {"type":"EOFTk"}
---- </chunk> ----
<p data-parsoid="{&quot;dsr&quot;:[0,20]}"><b data-parsoid="{&quot;tsr&quot;:[0,3],&quot;stx&quot;:&quot;html&quot;,&quot;autoInsertedEnd&quot;:true,&quot;dsr&quot;:[0,20]}">foo<i data-parsoid="{&quot;tsr&quot;:[10,13],&quot;stx&quot;:&quot;html&quot;,&quot;autoInsertedEnd&quot;:true,&quot;dsr&quot;:[10,null]}">bar</i></b></p>
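
The difference between the two runs can be reproduced with a toy model. The sketch below is not the html5 library's code; it only assumes that open elements are stored lower-cased while end tags are compared case-sensitively, which is one behavior consistent with the traces above. The function `buildToy` and its flag are hypothetical.

```javascript
// Toy model only: NOT the html5 tree builder's actual implementation.
function buildToy(tokens, caseSensitiveEndTags) {
    var out = '';
    var open = []; // stack of open element names, stored lower-cased
    tokens.forEach(function (t) {
        if (typeof t === 'string') {
            out += t;
        } else if (t.type === 'TagTk') {
            var name = t.name.toLowerCase();
            open.push(name);
            out += '<' + name + '>';
        } else if (t.type === 'EndTagTk') {
            var wanted = caseSensitiveEndTags ? t.name : t.name.toLowerCase();
            var idx = open.lastIndexOf(wanted);
            if (idx === -1) {
                return; // unmatched end tag is silently dropped
            }
            while (open.length > idx) {
                out += '</' + open.pop() + '>';
            }
        }
    });
    // At end of input, every element still open gets an implicit end tag.
    while (open.length > 0) {
        out += '</' + open.pop() + '>';
    }
    return out;
}

var upperCaseTokens = [
    { type: 'TagTk', name: 'B' }, 'foo', { type: 'EndTagTk', name: 'B' },
    { type: 'TagTk', name: 'I' }, 'bar', { type: 'EndTagTk', name: 'I' }
];
var buggyOutput = buildToy(upperCaseTokens, true);
var fixedOutput = buildToy(upperCaseTokens, false);
```

With case-sensitive matching, both explicit end tags are discarded and the elements are only closed at end of input, so `<i>` ends up nested inside `<b>`, mirroring the `autoInsertedEnd` output in the second run.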
Comment 3 ssastry 2012-11-20 21:05:51 UTC
Fixed in https://gerrit.wikimedia.org/r/34364
Comment 4 Andre Klapper 2013-07-04 10:33:45 UTC
[Parsoid component reorg by merging JS/General and General. See bug 50685 for more information. Filter bugmail on this comment. parsoidreorg20130704]
