Last modified: 2014-02-12 18:34:32 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T55464, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 53464 - Parsoid changes </tr> to </tr in HTML tables with </tr> but no <tr>
Parsoid changes </tr> to </tr in HTML tables with </tr> but no <tr>
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
General (Other open bugs)
unspecified
All All
: Normal minor
: ---
Assigned To: ssastry
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2013-08-28 10:43 UTC by Chris McKenna
Modified: 2014-02-12 18:34 UTC (History)
3 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Chris McKenna 2013-08-28 10:43:52 UTC
When the wikitext contains an HTML table that includes </tr> tags but no corresponding <tr> tags, the table renders fine in viewing mode and in VE. 
e.g.
<table>
<tr>
<th>H1</th>
<th>H2</th>
<th>H3</th>
</tr>
<td>[[Fish]]</td>
<td>[[Cake]]</td>
<td>[[Scone]]</td>
</tr>
<td>[[Goldfish]]</td>
<td>[[Golden syrup]]</td>
<td>[[Cream]]</td>
</tr>
</table>
Renders as intended. 

However, when any edit to the table is made, Parsoid (presumably) strips the < from all the </tr> tags. 
Sandbox: https://en.wikipedia.org/w/index.php?title=User:Thryduulf/sandbox&diff=570513680&oldid=570513611
Reported dif: https://en.wikipedia.org/w/index.php?title=2005%E2%80%9306_Polish_Basketball_League&diff=570501618&oldid=569248952

This isn't something that Parsoid can expect to encounter frequently, but it shouldn't make a mess of it when it does.
Comment 1 ssastry 2013-08-28 15:50:22 UTC
Unrelated thing we could also fix.  the "|-" could instead be serialized as <tr>..</tr> based on HTML syntax rest of the table.

[subbu@earth tests] node parse --wt2wt --fetchConfig false < /tmp/tbl
<table>
<tr>
<th>H1</th>
<th>H2</th>
<th>H3</th>
</tr>
|-<td>[[Fish]]</td>
<td>[[Cake]]</td>
<td>[[Scone]]</td>

|-<td>[[Goldfish]]</td>
<td>[[Golden syrup]]</td>
<td>[[Cream]]</td>

</table>
Comment 2 Gabriel Wicke 2013-08-28 15:51:42 UTC
This sounds a lot like a slightly wrong DSR.
Comment 3 ssastry 2013-08-28 22:45:01 UTC
There are two different pieces of this:

1. With the missing HTML <tr> tag, Parsoid incorrectly assumes that the missing <tr> tag inserted by the tree builder comes from wikitext (which can be seen by the |- output in the non-selser serialization above). This confusion is the reason for the off-by-1 DSR error.  I can fix that in the parser (and have a fix for it) with an additional check to determine whether a <tr> comes from wikitext.

2. However, this also needs a fix in the serializer. Where the content of the <tr> is unchanged, selser accurated spits back the original wikitext. However, the serializer also needs a fix so that it attempts HTML-tag serialization instead of wikitext-tag serialization on the <tr>. That will fix the second issue.

As for 1. the problem is in migrateTrailingNLs. I fixed it, but I think we added this dom pass to deal with some issues in the WTS during refactoring to eliminate spurious diffs around trailing newlines and separators. I am tempted to simply rip out that pass and see what happens (which as a side effect, also fixes part 1. of this bug). For now, maybe I will push a fix for both, but I think we should consider getting rid of that pass altogether, if possible since it may not be necessary anymore.
Comment 4 ssastry 2013-08-28 23:05:47 UTC
You know, I can fix this, but it will be kind of hacky ... I am going to push my changes into a local branch and revisit this after I look at removing the migrateTrailingNLs handler. Deleting code is better than adding code. :-) This is not a critical bug anyway.
Comment 5 Gabriel Wicke 2013-08-28 23:30:49 UTC
I believe we added migrateTrailingNLs at some point in the past to accomodate missing trailing WS handling in VE. We tried to keep all trailing whitespace outside of paragraphs etc. Afaik VE can handle that now, so there should not be a need for the migration any more.
Comment 6 Gerrit Notification Bot 2013-08-28 23:41:21 UTC
Change 81600 had a related patch set uploaded by Subramanya Sastry:
(Bug 53464): Improved detection of missing opening HTML tags

https://gerrit.wikimedia.org/r/81600
Comment 7 ssastry 2013-08-28 23:42:25 UTC
Never mind. I found a simple and easy non-hacky fix which might apply in other situations as well and fixes all issues reported in this bug. I'll look at getting rid of migrateTrailingNLs separately tomorrow/next week.
Comment 8 Gerrit Notification Bot 2013-08-29 00:35:12 UTC
Change 81600 merged by jenkins-bot:
(Bug 53464): Improved detection of missing opening HTML tags

https://gerrit.wikimedia.org/r/81600
Comment 9 Andre Klapper 2014-02-12 15:55:44 UTC
Subbu / Gabriel: Patch was merged month ago - is there more work left here, or can you close this ticket as RESOLVED FIXED?

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links