Last modified: 2014-01-02 20:21:20 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from bug reports and their history, links might be broken. See T59360, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 57360 - Fix parsing of "|}" on non-empty lines (table end tag should always be on a new line)
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
Component: tokenizer
Version: unspecified
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: Gabriel Wicke
Reported: 2013-11-21 17:41 UTC by ssastry
Modified: 2014-01-02 20:21 UTC (History)

Description ssastry 2013-11-21 17:41:56 UTC
The PHP parser does not recognize "|}" as a table closing tag on a non-empty line (which is how we end up with pages on Wikipedias that have stray trailing |} wikitext on some lines). Parsoid, however, recognizes it as a valid closing tag, which causes us to bomb spectacularly on those pages (Parsoid tries to recover and fix things up, but that doesn't always work).

The right fix is to change the tokenizer to require "|}" to be on a new line (leading whitespace and other sol-transparent text should be fine).
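
The rule proposed above can be sketched roughly as follows. This is an illustration only, not Parsoid's tokenizer: the names `TABLE_END_AT_SOL`, `is_table_end` and `table_end_lines` are hypothetical, and the sketch assumes that only spaces, tabs and HTML comments count as sol-transparent text.

```python
import re
from typing import List

# Hypothetical pattern: "|}" counts as a table end only at start of line,
# optionally preceded by whitespace or HTML comments (an assumption about
# what "sol-transparent" covers; the real set may be larger).
TABLE_END_AT_SOL = re.compile(r"^(?:[ \t]|<!--.*?-->)*\|\}")


def is_table_end(line: str) -> bool:
    """Return True if the line begins with "|}" after sol-transparent text."""
    return TABLE_END_AT_SOL.match(line) is not None


def table_end_lines(wikitext: str) -> List[int]:
    """Return 0-based indexes of lines that would close a table."""
    return [i for i, line in enumerate(wikitext.split("\n")) if is_table_end(line)]
```

With this check, a trailing "|}" on a line such as `| a |}` is left as plain text, which is the PHP parser behaviour described above.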
Comment 1 ssastry 2013-11-26 16:12:54 UTC
This will also require fixing the Parsoid serializer to emit "|}" on new lines.
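
A minimal sketch of the serializer side, assuming the serializer builds its output as a plain string. `emit_table_end` is a hypothetical helper, not part of Parsoid's API.

```python
def emit_table_end(out: str) -> str:
    """Append "|}" to already-serialized wikitext, always on its own line."""
    # Start a new line first if the output does not already end with one.
    if out and not out.endswith("\n"):
        out += "\n"
    return out + "|}"
```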
Comment 2 Gerrit Notification Bot 2013-12-24 19:18:19 UTC
Change 103572 had a related patch set uploaded by Subramanya Sastry:
(Bug 57360) Fix parser/serializer to accept/emit "|}" in SOL posns

https://gerrit.wikimedia.org/r/103572
Comment 3 Gerrit Notification Bot 2014-01-02 17:52:57 UTC
Change 103572 merged by jenkins-bot:
(Bug 57360) Fix serializer to emit "|}" in SOL posn

https://gerrit.wikimedia.org/r/103572
Comment 4 ssastry 2014-01-02 19:38:34 UTC
Followup patch coming from gwicke.

<gwicke> Re the {| |} issue, I re-did my grep search with a better regexp and am now finding quite a few matches that look like {| <some attributes> |}
<gwicke> the PHP parser strips the end tag in those cases, so maybe we should just strip it too?
<gwicke> {| class="wikitable"|} is a construct I see repeatedly
<gwicke> also {| class="wikitable"|}" style="text-align:center"
<gwicke> would be interesting to see where that was all copy & pasted from ;)
<gwicke> {|border=1 align=left cellpadding=0 cellspacing=0 style="width: 48%" {{Election city polls FPTP begin|locale = town| title=[[Canadian federal election, 2006]]<br>Hudson's Hope polls in Prince George—Peace River<ref name=06fed/>}}|}
<gwicke> just dropping the end tag token should be good enough I think
<gwicke> and accepting it anywhere in the attribute sequence
<gwicke> can write a patch for that
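
The idea discussed above (drop a stray "|}" wherever it appears in the attributes of a table start tag) could look roughly like this on a single line of wikitext. This is a sketch under stated assumptions, not the merged patch: `strip_stray_table_end` is a hypothetical name, the real fix works on tokens rather than strings, and the `(?!\})` lookahead is only a crude guard against touching the `|}}` that ends an empty template argument.

```python
import re

# "|}" not immediately followed by another "}" (crude template guard).
STRAY_TABLE_END = re.compile(r"\|\}(?!\})")


def strip_stray_table_end(start_line: str) -> str:
    """Drop stray "|}" sequences from the attributes of a "{|" line."""
    stripped = start_line.lstrip()
    if not stripped.startswith("{|"):
        # Not a table start line: leave it untouched.
        return start_line
    indent_len = len(start_line) - len(stripped)
    head = start_line[:indent_len + 2]   # leading whitespace plus "{|"
    attrs = start_line[indent_len + 2:]  # everything after "{|"
    return head + STRAY_TABLE_END.sub("", attrs)
```

For example, `{| class="wikitable"|}` becomes `{| class="wikitable"`, so the table stays open until a "|}" on its own line closes it.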
Comment 5 Gerrit Notification Bot 2014-01-02 20:13:11 UTC
Change 105019 had a related patch set uploaded by GWicke:
Bug 57360: Eat stray table end tags in table start tag attributes

https://gerrit.wikimedia.org/r/105019
Comment 6 Gerrit Notification Bot 2014-01-02 20:20:03 UTC
Change 105019 merged by jenkins-bot:
Bug 57360: Eat stray table end tags in table start tag attributes

https://gerrit.wikimedia.org/r/105019
