Last modified: 2014-03-26 22:40:58 UTC
This is the unfixed part of bug 62569.
Good:
$ echo "'''''" | tests/parse.js --wt2wt
'''''
Bad:
$ echo "<p><b><i></i></b></p>" | tests/parse.js --html2wt
''''''''''
$ echo "<p><b><i></i></b></p>" | tests/parse.js --html2html --normalize
<body><p>'''''<b><i></i></b></p></body>
This is the "5 quotes, code coverage +1 line" test case in parserTests.txt. It seems like there's still a quote-handling bug in WTS if we don't have data-parsoid to guide us.
This seems like a bug in the front-end tokenizer, not the serializer (which serializes the html just fine).
[subbu@earth lib] echo "<p><b><i></i></b></p>" | node parse --html2wt | node parse --trace peg-tokens
trace/peg-tokens : TOKS: ["'''''",{"type":"SelfclosingTagTk","name":"mw-quote","attribs":[],"dataAttribs":{"tsr":[5,10]},"value":"'''''"}]
The first 5 quotes are tokenized as a plain string rather than as a mw-quote token.
Another case:
$ echo "''foo'''''" | tests/parse.js --normalize=parsoid
<body><p><i>foo</i><b></b></p></body>
$ echo "<p><i>foo</i><b></b></p>" | parse.js --html2html --normalize=parsoid
<body><p><i>foo'''</i><b></b></p></body>
This is the "Italics and bold: 2-quote opening sequence: (2,5+3)" test case.
I am not sure how much effort we should invest in preserving html2html for empty quote nodes as in these examples. That said, one way to fix "<b><i></i></b>" is to insert a <nowiki/> in the empty node to break the quote block, giving '''''<nowiki/>'''''. This will still not preserve html2html exactly, but it will preserve semantics.
In the case in comment 2:
$ echo "<p><i>foo</i><b></b></p>" | tests/parse.js --html2wt
''foo''''''''
$ echo "''foo''''''''" | tests/parse.js --normalize
<body><p><i>foo'''</i><b></b></p></body>
$ echo "''foo''''''''" | php maintenance/parse.php
<p><i>foo'''</i> </p>
But:
$ echo "''foo'''''<nowiki/>'''" | tests/parse.js --normalize
<body><p><i>foo</i><b><meta/></b></p></body>
$ echo "''foo'''''<nowiki/>'''" | php maintenance/parse.php
<p><i>foo</i> </p>
So it does seem like our WTS should insert the <nowiki/> node there to preserve the semantics of the HTML.
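To make the proposal concrete, here is a rough sketch of the idea, not the actual WTS code. It assumes a hypothetical, much-simplified node shape ({ tag, children } objects and plain strings) and a made-up serialize() helper; neither is part of Parsoid. The only point is that an empty <i> or <b> gets a <nowiki/> between its opening and closing quotes, so adjacent quote runs cannot merge.

```javascript
// Hypothetical, simplified tree: a node is either a plain string or
// { tag: 'p' | 'i' | 'b', children: [...] }. This is NOT Parsoid's DOM.
const QUOTES = { i: "''", b: "'''" };

// Illustrative serializer: emits wikitext quotes for <i>/<b> and puts a
// <nowiki/> inside quote nodes that would otherwise serialize as empty.
function serialize(node) {
  if (typeof node === 'string') {
    return node;
  }
  const inner = node.children.map(serialize).join('');
  const quote = QUOTES[node.tag];
  if (quote === undefined) {
    // Non-quote container such as <p>: pass the children through.
    return inner;
  }
  // Without the <nowiki/>, the opening and closing quotes of an empty
  // node run together and are re-parsed as something else.
  return quote + (inner === '' ? '<nowiki/>' : inner) + quote;
}

// <p><b><i></i></b></p>
const nestedEmpty = {
  tag: 'p',
  children: [{ tag: 'b', children: [{ tag: 'i', children: [] }] }],
};

// <p><i>foo</i><b></b></p>
const trailingEmpty = {
  tag: 'p',
  children: [
    { tag: 'i', children: ['foo'] },
    { tag: 'b', children: [] },
  ],
};

console.log(serialize(nestedEmpty));   // '''''<nowiki/>'''''
console.log(serialize(trailingEmpty)); // ''foo'''''<nowiki/>'''
```

Both outputs match the wikitext suggested in the comments above. A real fix would also need to handle nodes that are empty only after whitespace or other non-rendering content is taken into account, which this sketch ignores.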
Change 121141 had a related patch set uploaded by Cscott: Fix WTS of empty quote nodes. https://gerrit.wikimedia.org/r/121141
Change 121141 merged by jenkins-bot: Fix WTS of empty quote nodes. https://gerrit.wikimedia.org/r/121141