Last modified: 2013-07-03 05:33:17 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and links other than those displaying bug reports and their history might be broken. See T52420, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 50420 - Wikitext list after implicit <td> not properly recognized
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
Component: serializer
Version: unspecified
Hardware: All All
Importance: Normal normal
Assigned To: ssastry
Reported: 2013-06-29 13:24 UTC by Maggie Dennis
Modified: 2013-07-03 05:33 UTC (History)

Description Maggie Dennis 2013-06-29 13:24:32 UTC
I did not add this to closed bug 50012 because, while it may be related, it's not exactly the same.

This edit to the text of the article added extraneous pipes throughout (http://en.wikipedia.org/w/index.php?title=List_of_male_kickboxers&diff=562103247&oldid=561869162) and also dropped some formatting on the end (|}|}|}|}|}|})

See http://en.wikipedia.org/wiki/Wikipedia:VisualEditor/Feedback#Unwanted_format_changes_throughout_article
Comment 1 Oliver Keyes 2013-06-29 15:09:42 UTC
I would consider this a blocker on Monday's deployment. Things not having support (browsers or features) I can deal with, but we cannot have this sort of corruption being introduced.
Comment 2 ssastry 2013-06-30 04:03:41 UTC
The problem is that this page had about 13 unclosed tables (missing the |} wikitext). Parsoid handles a lot of corner cases, but at this time unclosed tables, especially when there are lots of them, are hard for it to recover from. We have some ideas for recovering from them in the future. Until then, it is easier to fix the wikitext of the page, as I did (https://en.wikipedia.org/w/index.php?title=List_of_male_kickboxers&diff=562162447&oldid=562113696).

However, there is definitely a bug here: a "|" is added after the first table row. I'll repurpose this bug for fixing that in Parsoid.
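
For anyone wanting to check a page for the unclosed tables described above before editing it, a rough sketch follows. It only counts {| and |} markers at the start of a line; the function name is made up for illustration, and it ignores nowiki blocks, comments and templates, so treat the result as a hint rather than what Parsoid actually does.

```python
import re

# Table open/close markers are only recognized at the start of a line
# (leading whitespace allowed). This is a simplification of real wikitext.
TABLE_OPEN = re.compile(r"^\s*\{\|")
TABLE_CLOSE = re.compile(r"^\s*\|\}")


def count_unclosed_tables(wikitext: str) -> int:
    """Return how many tables are still open at the end of the text.

    Hypothetical helper for illustration only; not part of Parsoid.
    """
    depth = 0
    for line in wikitext.splitlines():
        if TABLE_OPEN.match(line):
            depth += 1
        elif TABLE_CLOSE.match(line) and depth > 0:
            depth -= 1
    return depth


if __name__ == "__main__":
    sample = "{|\n|-\n| a\n{|\n|-\n| b\n|}\n"
    print(count_unclosed_tables(sample))
```

A non-zero result suggests that some |} is missing and the wikitext is worth fixing by hand first.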
Comment 3 ssastry 2013-06-30 04:08:51 UTC
https://en.wikipedia.org/w/index.php?title=List_of_male_kickboxers&diff=562167553&oldid=562162447 shows an instance where converting an implicit <td> into an explicit <td> on serialization is a semantic error.

Reduced test case:

{|
|-
*a
|}

This cannot be fixed by selser since on editing the list item, the <td> will go through regular serialization.

We should be able to safely use the autoInserted* flags from data-parsoid in this instance since VE does not have table-editing capability yet.
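
To illustrate the idea, here is a toy sketch of a serializer that skips the cell marker when the <td> was auto-inserted by the parser. The dict layout, the function names, and the exact flag name autoInsertedStart are assumptions made for this example (the comment above only says "autoInserted*"); this is not Parsoid's serializer code.

```python
def serialize_cell(cell: dict) -> str:
    """Serialize one table cell to wikitext.

    `cell` is a hypothetical structure:
        {"content": "*a", "data_parsoid": {"autoInsertedStart": True}}
    The flag name is an assumption modeled on the autoInserted* flags.
    """
    dp = cell.get("data_parsoid", {})
    if dp.get("autoInsertedStart"):
        # The <td> was implicit in the source, so emit no "|" marker.
        return cell["content"]
    # Explicit cell: prefix the content with the cell marker.
    return "|" + cell["content"]


def serialize_table(rows: list) -> str:
    """Serialize rows (lists of cells) into a wikitext table."""
    lines = ["{|"]
    for row in rows:
        lines.append("|-")
        for cell in row:
            lines.append(serialize_cell(cell))
    lines.append("|}")
    return "\n".join(lines)


if __name__ == "__main__":
    implicit = [[{"content": "*a", "data_parsoid": {"autoInsertedStart": True}}]]
    explicit = [[{"content": "*a", "data_parsoid": {}}]]
    print(serialize_table(implicit))
    print(serialize_table(explicit))
```

With the flag set, the reduced test case round-trips unchanged; without it, the toy serializer emits "|*a", which is the kind of extra pipe seen in the diff.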
Comment 4 Gerrit Notification Bot 2013-07-01 03:53:47 UTC
Change 71314 had a related patch set uploaded by Subramanya Sastry:
(Bug 50420) Tweak selser to reuse td/th/tr wrappers from source

https://gerrit.wikimedia.org/r/71314
Comment 5 ssastry 2013-07-01 04:02:01 UTC
I was wrong. This bug was fixed by tweaking selser as in the patch above, which works because VE does not have table-editing capability yet (it can edit only table contents, not table structure).

There is an unrelated bug in the parser where leading list wikitext in a td-cell is parsed as plain text rather than as a list item.  So, the pipe problem is fixed by the above commit, but there will still be a dirty-diff (the first list item will be wrapped in nowiki on serialization).  That will be addressed separately.
Comment 6 Gerrit Notification Bot 2013-07-01 17:54:01 UTC
Change 71314 merged by jenkins-bot:
(Bug 50420) Tweaked selser to reuse td/th/tr wrappers from source

https://gerrit.wikimedia.org/r/71314
Comment 7 Gabriel Wicke 2013-07-01 17:57:54 UTC
Ok, renamed the bug to reflect the repurposing.
Comment 8 Gerrit Notification Bot 2013-07-03 00:05:00 UTC
Change 71744 had a related patch set uploaded by Subramanya Sastry:
WIP: (Bug 50420) Recognize sol-content in implicit <td> after a <tr>

https://gerrit.wikimedia.org/r/71744
Comment 9 Gerrit Notification Bot 2013-07-03 05:32:04 UTC
Change 71744 merged by jenkins-bot:
(Bug 50420) Recognize sol-content in implicit <td> after a <tr>

https://gerrit.wikimedia.org/r/71744
