Last modified: 2014-01-03 17:49:19 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T59469, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 57469 - Extension tag text (ex: <ref>) with opening/closing tags in different DOM nodes are not nowiki-escaped
Extension tag text (ex: <ref>) with opening/closing tags in different D...
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
serializer (Other open bugs)
unspecified
All All
: High normal
: ---
Assigned To: Gabriel Wicke
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2013-11-23 01:12 UTC by James Forrester
Modified: 2014-01-03 17:49 UTC (History)
2 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description James Forrester 2013-11-23 01:12:28 UTC
Input HTML

  <p>Foo<ref>Bar</ref>Baz</p>

Expected wikitext

  Foo<nowiki><ref>Bar</ref></nowiki>Baz

Actual wikitext

  Foo<ref>Bar</ref>Baz

This is rather problematic if a user is documenting MediaWiki operations. :-(
Comment 1 ssastry 2013-11-23 04:41:59 UTC
Hmm ... I guess somehow this one fell through the cracks or we have had regressions (which means our wt-escaping tests have holes in them). I was sure this was being handled already, and a cursory glance through the escaping code shows that there are checks for extensions, etc. Anyway, we will investigate next week.
Comment 2 ssastry 2013-11-23 21:54:20 UTC
So, actually, this is a slightly different use case than what is currently supported.

Try entering any of the extension tags in VE and Review changes. Parsoid escapes the ext-tag like text as long as VE sends them back to Parsoid as text, i.e. <p>Foo&ot;ref&gt;Bar&lt;/ref&gt;Baz</p>. I verified this behavior in my local install of VE. This is similar to entering <div> or any other HTML tag as text in VE. If VE does not escape those tag strings to text, Parsoid will not be able to escape them. 

Currently, Domino parses "<ref>foo</ref>" as a valid XML tree which is the same behavior as the browser (verified in the web console). So, while Parsoid could check the node name against the set of valid HTML5 tags and nowiki the others (since we are not serializing a generic XML domtree), why would VE not entity-escape "<" and ">" chars entered as text before sending them to Parsoid, i.e. is this a valid use case in the context of VE editing?
Comment 3 Gabriel Wicke 2013-11-25 19:15:44 UTC
Sounds a lot like a VE bug then. James, can you verify that you are sending those tags over the wire with correct escaping?
Comment 4 Roan Kattouw 2013-11-26 10:03:37 UTC
(In reply to comment #2)
> So, actually, this is a slightly different use case than what is currently
> supported.
> 
> Try entering any of the extension tags in VE and Review changes. Parsoid
> escapes the ext-tag like text as long as VE sends them back to Parsoid as
> text,
> i.e. <p>Foo&ot;ref&gt;Bar&lt;/ref&gt;Baz</p>. I verified this behavior in my
> local install of VE. This is similar to entering <div> or any other HTML tag
> as
> text in VE. If VE does not escape those tag strings to text, Parsoid will not
> be able to escape them. 
> 
That's how it should work, but on current master I get:

$ echo "<p>&lt;ref&gt;</p>" | node js/tests/parse.js --html2wt
<ref>
$ echo "<p>&lt;ref&gt; <code>&lt;ref&gt;</code></p>" | node js/tests/parse.js --html2wt
<ref> <code><ref></code>
Comment 5 ssastry 2013-11-26 13:08:57 UTC
Ah, I see .. The first example is correct behavior since isolated "<ref>" or "</ref>" are always parsed back as text and hence there is no need to nowiki-escape them.

However, the escaping code does need an update to deal with the second example. It currently only handles matching opening/closing extension tags in the same text node. 

[subbu@earth tests] echo "&lt;ref&gt;foo&lt;/ref&gt;" | node parse --html2wt
<nowiki><ref>foo</ref></nowiki>
[subbu@earth tests] echo "&lt;ref&gt;foo" | node parse --html2wt
<ref>foo
[subbu@earth tests] echo "&lt;ref&gt;foo<p>&lt;/ref&gt;</p>" | node parse --html2wt
<ref>foo

</ref>

Unlike escaping for link/heading tags, where we only need to worry about wikitext on a single-line, the opening and closing tags for extensions can be split anywhere in the DOM. So, a conservative approach would be to always escape individual <ref> or </ref> tags which, in some cases, might lead to excess nowiki escaping, but will always be safe.
Comment 6 Roan Kattouw 2013-11-26 13:23:18 UTC
(In reply to comment #5)
> So, a conservative approach would be to always
> escape individual <ref> or </ref> tags which, in some cases, might lead to
> excess nowiki escaping, but will always be safe.
That seems like a sane approach to me.
Comment 7 Gerrit Notification Bot 2013-12-24 22:00:50 UTC
Change 103598 had a related patch set uploaded by Subramanya Sastry:
(Bug 57469) Fix for nowiki escaping of ext-tag like text

https://gerrit.wikimedia.org/r/103598
Comment 8 Gerrit Notification Bot 2014-01-02 19:33:05 UTC
Change 103598 merged by jenkins-bot:
(Bug 57469) Fix for nowiki escaping of ext-tag like text

https://gerrit.wikimedia.org/r/103598
Comment 9 ssastry 2014-01-03 17:15:26 UTC
Stray </ref> tags on these pages leads to dirty diffs (but acceptable, IMO) because of the fix:

* http://parsoid.wmflabs.org/_rt/enwiki/Ken_Curtis
* http://parsoid.wmflabs.org/_rt/enwiki/Koshi_Barrage
* http://parsoid.wmflabs.org/_rt/arwiki/%D8%AD%D8%A7%D8%B6%D9%86%D8%A7%D8%AA_%D8%A7%D9%84%D8%A3%D8%B9%D9%85%D8%A7%D9%84

and a bunch of other pages.

Adding this info here for the record in case we need to dig up pages later.
Comment 10 ssastry 2014-01-03 17:21:57 UTC
Looks like stray </ref> tags are not that uncommon .. A few tens of dirty diffs in rt-testing are all probably because of this.

Another source of nowiki-ed dirty diffs seem to be because of buggy ref tags. That is captured in bug 59598

Together, these two categories of buggy ref-wikitext account for ~50 dirty diffs in rt-testing (out of about 160K pages tested). Still a small number overall, and selser will hide most of this in production, but could show up once in a while, but nothing that warrants more complex code in Parsoid to hide them.
Comment 11 Gabriel Wicke 2014-01-03 17:49:19 UTC
(In reply to comment #10)
> Still a small number
> overall, and selser will hide most of this in production, but could show up
> once in a while, but nothing that warrants more complex code in Parsoid to
> hide
> them.

Agreed. These diffs are also classified as syntactical in our rt test setup, so should not distract too much from serious semantic diffs.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links