Last modified: 2014-04-22 13:44:35 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T64569, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 62569 - Italics and bold: multiple quote sequences
Italics and bold: multiple quote sequences
Status: REOPENED
Product: Parsoid
Classification: Unclassified
General (Other open bugs)
unspecified
All All
: Unprioritized normal
: ---
Assigned To: ssastry
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2014-03-12 17:18 UTC by C. Scott Ananian
Modified: 2014-04-22 13:44 UTC (History)
4 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description C. Scott Ananian 2014-03-12 17:18:51 UTC
We have multiple html2wt regressions in the "Italics and bold: multiple quote sequences" tests, which were hidden because these tests were on the blacklist (the <nowiki> insertion previously made the html2wt tests fail).

The failing tests, along with their wt2wt output (as "After") are:

Italics and bold: multiple quote sequences: (3,4,2)
Before: '''foo''''bar''
        <b>foo'</b>bar<i></i>
After:  '''<nowiki>foo'</nowiki>'''bar''''
        <b>foo'</b>bar'<b></b>

Italics and bold: multiple quote sequences: (3,4,3)
Before: '''foo''''bar'''
        <b>foo'</b>bar<b></b>
After:  '''<nowiki>foo'</nowiki>'''bar''''''
        '<i>foo<b>bar'</b></i>

In the process of fixing these bugs, we should probably also write "clean" versions of the tests which include the proper <nowiki> tags in the input and therefore round-trip correctly, to ensure that we are better protected against regressions in the future.

These next tests are ok, but should have a clean version added.  The "Before" version is what's in the test currently; after one round of --wt2wt it becomes the equivalent "Clean" version, which subsequently round-trips correctly:

Italics and bold: multiple quote sequences: (2,4,2)
Before: ''foo''''bar''
Clean:  ''<nowiki>foo'</nowiki>'''bar'''''

Italics and bold: multiple quote sequences: (2,4,3)
Before: ''foo''''bar'''
Clean:  ''<nowiki>foo'</nowiki>'''bar'''''

Italics and bold: multiple quote sequences: (2,4,4)
Before: ''foo''''bar''''
Clean:  ''<nowiki>foo'</nowiki>'''<nowiki>bar'</nowiki>'''''
Comment 1 C. Scott Ananian 2014-03-12 20:26:42 UTC
Another legit bug:

5 quotes, code coverage +1 line
Before: '''''
        <b><i></i></b>
After:  ''''''''''
        '''''<b><i></i></b>
Comment 2 C. Scott Ananian 2014-03-14 21:00:32 UTC
See also bug 64321, which is quote-related.
Comment 3 ssastry 2014-03-24 23:05:15 UTC
(In reply to C. Scott Ananian from comment #0)
> We have multiple html2wt regressions in the "Italics and bold: multiple
> quote sequences" tests, which were hidden because these tests were on the
> blacklist (the <nowiki> insertion previously made the html2wt tests fail).
> 
> The failing tests, along with their wt2wt output (as "After") are:
> 
> Italics and bold: multiple quote sequences: (3,4,2)
> Before: '''foo''''bar''
>         <b>foo'</b>bar<i></i>
> After:  '''<nowiki>foo'</nowiki>'''bar''''
>         <b>foo'</b>bar'<b></b>
> 
> Italics and bold: multiple quote sequences: (3,4,3)
> Before: '''foo''''bar'''
>         <b>foo'</b>bar<b></b>
> After:  '''<nowiki>foo'</nowiki>'''bar''''''
>         '<i>foo<b>bar'</b></i>

This is messy because of the context-sensitive nature of how quotes get parsed. Adding additional quotes for auto-closed italic tags changes the tokenization and hence parse. While we do have the auto-insertion flags in the HTML, we cannot use this unconditionally since edits could have changed the node.

However, for empty nodes that have auto-inserted flags, we can check DSR widths to determine whether it is safe to use information from the flags. Will experiment.
Comment 4 Gerrit Notification Bot 2014-03-25 14:52:40 UTC
Change 120802 had a related patch set uploaded by Subramanya Sastry:
WTS: Skip auto-inserted tags for empty nodes with zero-range child dsr

https://gerrit.wikimedia.org/r/120802
Comment 5 ssastry 2014-03-25 21:59:10 UTC
(In reply to C. Scott Ananian from comment #2)
> See also bug 64321, which is quote-related.

Looks like you provided the wrong bug id here.
Comment 6 Gerrit Notification Bot 2014-03-26 12:26:46 UTC
Change 120802 merged by jenkins-bot:
WTS: Skip auto-inserted tags for empty nodes with 0-wide subtree DSR

https://gerrit.wikimedia.org/r/120802
Comment 7 C. Scott Ananian 2014-03-26 17:42:09 UTC
The "five quotes" case is still broken. (See comment 1.) Should I open a new bug?
Comment 8 C. Scott Ananian 2014-03-26 17:47:34 UTC
Opened bug 63119 for the unfixed case.
Comment 9 Gabriel Wicke 2014-03-31 22:02:43 UTC
Reopening as the fix from https://gerrit.wikimedia.org/r/120802 is not considered safe.

Test case simulating a wrapper removal like <table><td><b></td></table>foo -> <b>foo:

$ cat /tmp/test.html 
<b data-parsoid='{"stx":"html","autoInsertedEnd":true,"dsr":[15,18,3,0]}'></b>foo
$ cat /tmp/test.html | node parse --html2wt
<b>foo
Comment 10 Elitre 2014-04-22 13:44:35 UTC
From partially bold to all bold, doesn't work as expected:
is this a relevant example? https://it.wikipedia.org/w/index.php?title=Utente%3AElitre_%28WMF%29%2FSandbox_VE&diff=65498155&oldid=65498146

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links