Last modified: 2014-03-26 22:40:58 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T65119, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 63119 - WTS: 5 quotes
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
Component: General
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: C. Scott Ananian
Reported: 2014-03-26 17:46 UTC by C. Scott Ananian
Modified: 2014-03-26 22:40 UTC (History)

Description C. Scott Ananian 2014-03-26 17:46:45 UTC
This is the unfixed part of bug 62569.

Good:
$ echo "'''''" | tests/parse.js --wt2wt
'''''

Bad:
$ echo "<p><b><i></i></b></p>" | tests/parse.js --html2wt
''''''''''
$ echo "<p><b><i></i></b></p>" | tests/parse.js --html2html --normalize
<body><p>'''''<b><i></i></b></p></body>

This is the "5 quotes, code coverage +1 line" test case in parserTests.txt.

It seems like there's still a quote-handling bug in WTS if we don't have data-parsoid to guide us.
Comment 1 ssastry 2014-03-26 17:59:26 UTC
This seems like a bug in the front-end tokenizer, not the serializer (which serializes the html just fine).

[subbu@earth lib] echo "<p><b><i></i></b></p>" | node parse --html2wt | node parse --trace peg-tokens
trace/peg-tokens              : TOKS:  ["'''''",{"type":"SelfclosingTagTk","name":"mw-quote","attribs":[],"dataAttribs":{"tsr":[5,10]},"value":"'''''"}]

The first 5 quotes are tokenized as a plain string rather than as a mw-quote token.
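
A quick way to spot this pattern in a token list is sketched below. It is only an illustration: `plainQuoteRuns` is a hypothetical helper, not part of Parsoid, and it assumes the token stream is an array of plain strings and token objects shaped like the trace output above.

```javascript
// Hypothetical helper, not part of Parsoid: returns the string tokens that
// consist entirely of two or more apostrophes, i.e. quote runs that were
// emitted as plain text instead of as mw-quote tokens.
function plainQuoteRuns(tokens) {
  return tokens.filter((t) => typeof t === 'string' && /^'{2,}$/.test(t));
}

// Token list copied from the peg-tokens trace above.
const traceTokens = [
  "'''''",
  {
    type: 'SelfclosingTagTk',
    name: 'mw-quote',
    attribs: [],
    dataAttribs: { tsr: [5, 10] },
    value: "'''''",
  },
];

console.log(plainQuoteRuns(traceTokens).length); // 1
```

Here only the first token is flagged; the second is already a proper mw-quote token object.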
Comment 2 C. Scott Ananian 2014-03-26 18:22:54 UTC
Another case:
$ echo "''foo'''''" | tests/parse.js --normalize=parsoid
<body><p><i>foo</i><b></b></p></body>
$ echo "<p><i>foo</i><b></b></p>" | parse.js --html2html --normalize=parsoid
<body><p><i>foo'''</i><b></b></p></body>

This is the "Italics and bold: 2-quote opening sequence: (2,5+3)" test case.
Comment 3 ssastry 2014-03-26 18:31:11 UTC
I am not sure how much effort we should invest in preserving html2html for empty quote nodes as in these examples.

That said, one way to fix "<b><i></i></b>" is to insert a <nowiki/> in the empty node to break the quote block: '''''<nowiki/>'''''. This will still not round-trip html2html exactly, but it will preserve the semantics.
Comment 4 C. Scott Ananian 2014-03-26 19:20:52 UTC
In the case in comment 2:
$ echo "<p><i>foo</i><b></b></p>" | tests/parse.js --html2wt
''foo''''''''
$ echo "''foo''''''''" | tests/parse.js  --normalize
<body><p><i>foo'''</i><b></b></p></body>
$ echo "''foo''''''''" | php maintenance/parse.php 
<p><i>foo'''</i>
</p>

But:
$ echo "''foo'''''<nowiki/>'''" | tests/parse.js  --normalize
<body><p><i>foo</i><b><meta/></b></p></body>
$ echo "''foo'''''<nowiki/>'''" | php maintenance/parse.php 
<p><i>foo</i>
</p>

So it does seem like our WTS should insert the <nowiki/> node there to preserve the semantics of the HTML.
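
The idea can be sketched with a toy serializer. This is a minimal model under stated assumptions, not the code in the actual patch: the DOM is represented as plain `{ tag, children }` objects with strings as text nodes, and `serialize`, `QUOTES`, and the two sample trees are names made up for this example.

```javascript
// Toy model, not Parsoid's actual serializer: nodes are plain objects
// { tag, children }, and text nodes are plain strings.
const QUOTES = { i: "''", b: "'''" };

function serialize(node) {
  if (typeof node === 'string') {
    return node;
  }
  const inner = node.children.map(serialize).join('');
  const q = QUOTES[node.tag];
  if (q === undefined) {
    // Non-quote elements such as <p> contribute only their content here.
    return inner;
  }
  // An empty quote node would otherwise emit an unbroken run of quotes,
  // so break the run with <nowiki/>.
  return q + (inner === '' ? '<nowiki/>' : inner) + q;
}

// <p><b><i></i></b></p>
const emptyBoldItalic = {
  tag: 'p',
  children: [{ tag: 'b', children: [{ tag: 'i', children: [] }] }],
};

// <p><i>foo</i><b></b></p>
const italicThenEmptyBold = {
  tag: 'p',
  children: [
    { tag: 'i', children: ['foo'] },
    { tag: 'b', children: [] },
  ],
};

console.log(serialize(emptyBoldItalic));     // '''''<nowiki/>'''''
console.log(serialize(italicThenEmptyBold)); // ''foo'''''<nowiki/>'''
```

The second output is the same wikitext that both parsers above turn back into <i>foo</i> followed by an empty bold, rather than the ''foo'''''''' form that leaks quotes into the italic text.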
Comment 5 Gerrit Notification Bot 2014-03-26 20:15:37 UTC
Change 121141 had a related patch set uploaded by Cscott:
Fix WTS of empty quote nodes.

https://gerrit.wikimedia.org/r/121141
Comment 6 Gerrit Notification Bot 2014-03-26 21:02:17 UTC
Change 121141 merged by jenkins-bot:
Fix WTS of empty quote nodes.

https://gerrit.wikimedia.org/r/121141
