Last modified: 2014-04-22 13:44:35 UTC
We have multiple html2wt regressions in the "Italics and bold: multiple quote sequences" tests, which were hidden because these tests were on the blacklist (the <nowiki> insertion previously made the html2wt tests fail). The failing tests, along with their wt2wt output (as "After") are: Italics and bold: multiple quote sequences: (3,4,2) Before: '''foo''''bar'' <b>foo'</b>bar<i></i> After: '''<nowiki>foo'</nowiki>'''bar'''' <b>foo'</b>bar'<b></b> Italics and bold: multiple quote sequences: (3,4,3) Before: '''foo''''bar''' <b>foo'</b>bar<b></b> After: '''<nowiki>foo'</nowiki>'''bar'''''' '<i>foo<b>bar'</b></i> In the process of fixing these bugs, we should probably also write "clean" versions of the tests which include the proper <nowiki> tags in the input and therefore round-trip correctly, to ensure that we are better protected against regressions in the future. These next tests are ok, but should have a clean version added. The "Before" version is what's in the test currently; after one round of --wt2wt it becomes the equivalent "Clean" version, which subsequently round-trips correctly: Italics and bold: multiple quote sequences: (2,4,2) Before: ''foo''''bar'' Clean: ''<nowiki>foo'</nowiki>'''bar''''' Italics and bold: multiple quote sequences: (2,4,3) Before: ''foo''''bar''' Clean: ''<nowiki>foo'</nowiki>'''bar''''' Italics and bold: multiple quote sequences: (2,4,4) Before: ''foo''''bar'''' Clean: ''<nowiki>foo'</nowiki>'''<nowiki>bar'</nowiki>'''''
Another legit bug: 5 quotes, code coverage +1 line Before: ''''' <b><i></i></b> After: '''''''''' '''''<b><i></i></b>
See also bug 64321, which is quote-related.
(In reply to C. Scott Ananian from comment #0) > We have multiple html2wt regressions in the "Italics and bold: multiple > quote sequences" tests, which were hidden because these tests were on the > blacklist (the <nowiki> insertion previously made the html2wt tests fail). > > The failing tests, along with their wt2wt output (as "After") are: > > Italics and bold: multiple quote sequences: (3,4,2) > Before: '''foo''''bar'' > <b>foo'</b>bar<i></i> > After: '''<nowiki>foo'</nowiki>'''bar'''' > <b>foo'</b>bar'<b></b> > > Italics and bold: multiple quote sequences: (3,4,3) > Before: '''foo''''bar''' > <b>foo'</b>bar<b></b> > After: '''<nowiki>foo'</nowiki>'''bar'''''' > '<i>foo<b>bar'</b></i> This is messy because of the context-sensitive nature of how quotes get parsed. Adding additional quotes for auto-closed italic tags changes the tokenization and hence parse. While we do have the auto-insertion flags in the HTML, we cannot use this unconditionally since edits could have changed the node. However, for empty nodes that have auto-inserted flags, we can check DSR widths to determine whether it is safe to use information from the flags. Will experiment.
Change 120802 had a related patch set uploaded by Subramanya Sastry: WTS: Skip auto-inserted tags for empty nodes with zero-range child dsr https://gerrit.wikimedia.org/r/120802
(In reply to C. Scott Ananian from comment #2) > See also bug 64321, which is quote-related. Looks like you provided the wrong bug id here.
Change 120802 merged by jenkins-bot: WTS: Skip auto-inserted tags for empty nodes with 0-wide subtree DSR https://gerrit.wikimedia.org/r/120802
The "five quotes" case is still broken. (See comment 1.) Should I open a new bug?
Opened bug 63119 for the unfixed case.
Reopening as the fix from https://gerrit.wikimedia.org/r/120802 is not considered safe. Test case simulating a wrapper removal like <table><td><b></td></table>foo -> <b>foo: $ cat /tmp/test.html <b data-parsoid='{"stx":"html","autoInsertedEnd":true,"dsr":[15,18,3,0]}'></b>foo $ cat /tmp/test.html | node parse --html2wt <b>foo
From partially bold to all bold, doesn't work as expected: is this a relevant example? https://it.wikipedia.org/w/index.php?title=Utente%3AElitre_%28WMF%29%2FSandbox_VE&diff=65498155&oldid=65498146