Last modified: 2014-10-09 15:42:44 UTC
(See https://en.wikipedia.org/wiki/User:Cscott/Sandbox#Leaky_LI).

Parser test cases:

!! test
Leaky <li> (1)
!! options
parsoid=wt2html
!! wikitext
<ol>
<li>a<small>b</li>
<li>c</li>
</ol>
!! html/php+tidy
<ol>
<li>a<small>b</li></small></li>
<li><small>c</small>
<p><small></ol></small></p>
</li>
</ol>
!! html/parsoid
<ol>
<li>a<small>b</small></li>
<small>
<li>c</li>
</small></ol>
!! end

Not sure we want to emulate the </li> part of the above output, but it does seem odd that the <small> tag "escapes" and surrounds the <li>, rather than surrounding the *content* of the <li>.

Similarly:

!! test
Leaky <li> (2)
!! options
parsoid=wt2html
!! wikitext
== Leaky LI ==
<li>A
<li>B <small> C
<li>D
== Next Heading ==
x
!! html/php+tidy
<h2><span class="mw-headline" id="Leaky_LI">Leaky LI</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/index.php?title=Parser_test&action=edit&section=1" title="Edit section: Leaky LI">edit</a><span class="mw-editsection-bracket">]</span></span></h2>
<ul>
<li>A</li>
<li>B <small>C</small></li>
<li><small>D</small></li>
</ul>
<h2><small><span class="mw-headline" id="Next_Heading">Next Heading</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/index.php?title=Parser_test&action=edit&section=2" title="Edit section: Next Heading">edit</a><span class="mw-editsection-bracket">]</span></span></small></h2>
<p>x</p>
!! html/parsoid
<h2>Leaky LI</h2>
<li>A</li>
<li>B <small> C</small></li>
<li><small>D
<h2>Next Heading</h2>
<p>x</p>
</small></li>
!! end

Here the <small> isn't around the <li> (that's an improvement, although a mysterious one). But the trailing <li> has swallowed up the <h2>, which surely isn't right. May be related to bug 71185.
Similarly:

( echo '<h2>bla<h2>blub'; echo 'text' ) | tests/parse.js --normalize
<h2>bla</h2>
<h2>blub
<p>text</p>
</h2>

The PHP parser cleans this up with tidy, giving:

<h2><span class="mw-headline" id="bla.3Ch2.3Eblub.0Atext">bla<h2>blub text</span></h2>

...which isn't pretty, but at least it doesn't try to jam a <p> into the <h2>.
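
For triage, a rough way to flag this kind of output automatically is to walk the serialized HTML and report any block-level start tag that opens while a heading is still open. The following is only a sketch in Python using the standard library's html.parser. It is not part of Parsoid or its test harness; the names NestingChecker and find_block_in_heading and the BLOCK_TAGS set are made up for illustration, and the tag stack is a simplification that does not reproduce the HTML5 tree-building rules.

```python
from html.parser import HTMLParser

# Assumption: a small, hand-picked set of tags treated as "block-level" here.
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCK_TAGS = HEADINGS | {"p", "ul", "ol", "li", "div"}


class NestingChecker(HTMLParser):
    """Records (heading, block) pairs where a block tag opens inside a heading."""

    def __init__(self):
        super().__init__()
        self.stack = []        # currently open tags, outermost first
        self.violations = []   # (open heading tag, offending block tag)

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            # Report against the outermost heading that is still open.
            for open_tag in self.stack:
                if open_tag in HEADINGS:
                    self.violations.append((open_tag, tag))
                    break
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Ignore stray end tags; otherwise pop up to and including the match.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def find_block_in_heading(html):
    """Return the list of (heading, block) nesting violations found in html."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.violations


if __name__ == "__main__":
    parsoid_output = "<h2>bla</h2>\n<h2>blub\n<p>text</p>\n</h2>"
    print(find_block_in_heading(parsoid_output))
```

Run on the Parsoid output above, this reports the <p> that ends up inside the second <h2>. It only looks at tag nesting in the serialized string, so it says nothing about which fixup is preferable.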
Change 165749 had a related patch set uploaded by Cscott: WIP: Document differences in HTML fixup between tidy and Parsoid. https://gerrit.wikimedia.org/r/165749