Last modified: 2014-11-19 02:04:23 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T48811, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 46811 - Fix parsing of infobox templates used in table attribute position
Fix parsing of infobox templates used in table attribute position
Status: NEW
Product: Parsoid
Classification: Unclassified
General (Other open bugs)
unspecified
All All
: Normal normal
: ---
Assigned To: Arlo Breault
http://parsoid-prod.wmflabs.org/testw...
:
: 52889 53139 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2013-04-02 19:57 UTC by ssastry
Modified: 2014-11-19 02:04 UTC (History)
6 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description ssastry 2013-04-02 19:57:43 UTC
Wikitext of this form:
---
{| {{Infobox aircraft begin
 .. 
}} ... 
|}
---

is used on several pages (ex: [[en:B757]]).

This essentially generates table attrs. and also table content and is currently not parsed properly by Parsoid.  The tokenizer adds the template to the table's attribute array, and it gets stuck there for good through the entire parse which means the entire tpl content is "lost".

There are a couple different ways of handling this:

(1) Fix AttributeExpander so that it hoists tokens seen after a NlTk out of the attribute, and after the token.  Inelegant, but might do the trick.

(2) Fix this template (and any other template using this technique) so that it is self-contained rather than generate pieces of the table.  At the very least, move "{|" inside the "Infobox aircraft begin" template.  Since it is not feasible to edit all pages that transclude the template, we might have to provide support for the extra "{|" table tag found outside the template.

Just recording all this information here for now, along with a test case below:
--------
{|{{Infobox aircraft begin
  |name= Boeing 757
  |image= File:Icelandair Boeing 757-256 Wedelstaedt.jpg<!-- Flight images are preferred for aircraft. Do not change image without a talk page discussion first, thanks. -->
  |caption= [[Icelandair]] Boeing 757-200 on final approach
  |alt=A mostly white Boeing 757 with blue and yellow trim preparing for landing against a blue sky. Landing gear and flaps are extended in final approach configuration.
}}{{Infobox aircraft type
  |type= [[Narrow-body aircraft|Narrow-body]] [[jet airliner]]
  |national origin= United States
  |manufacturer= [[Boeing Commercial Airplanes]]
  |designer=
  |first flight= February 19, 1982
  |introduction= January 1, 1983 with [[Eastern Air Lines]]
  |retired=
  |status= In service
  |primary user= [[Delta Air Lines]] <!--Limit one (1) primary user. Top 4 users listed in 'primary user' and 'more users' fields based on the number in their fleets. -->
  |more users= [[United Airlines]] <br>[[American Airlines]] <br>[[UPS Airlines]] <!-- Limit is three (3) in 'more users' field, four (4) total users with primary user. Please separate with <br/>.-->
  |produced= 1981–2004
  |number built= 1,050<ref name="last757built">{{cite web|url=http://www.boeing.com/news/releases/2004/q4/nr_041028g.html |title=Boeing Marks Completion of its 757 Commercial Airplane Program |date=October 28, 2004 |publisher=Boeing |accessdate=January 26, 2011}}</ref>
  |unit cost= 757-200: US$65&nbsp;million (2002) <br>757-300: US$80&nbsp;million (2002)
  |variants with their own articles= [[Boeing C-32]]
}}
|}
-------
Comment 1 Gabriel Wicke 2013-04-02 20:30:17 UTC
Variant 2) (extra table start tag inside the template) is supported by the PHP parser, but currently not by Parsoid. There might be other broken template transclusions that are currently broken in Parsoid for this reason, so we should probably add support for it in any case.
Comment 2 Andre Klapper 2013-07-04 10:35:05 UTC
[Parsoid component reorg by merging JS/General and General. See bug 50685 for more information. Filter bugmail on this comment. parsoidreorg20130704]
Comment 3 ssastry 2014-06-04 22:24:30 UTC
Smaller test case than the one in the Description.

{|{{Infobox Aircraft Begin
 |name=Mitsubishi 1MT
 |image=Mitsubishi 1MT.jpg
 |caption=
}}{{Infobox Aircraft Type
 |type=[[Triplane]] [[torpedo bomber]]
 |national origin=Japan
 |manufacturer=[[Mitsubishi]]
 |designer=Herbert Smith
 |first flight={{avyear|1922}}
 |introduced=
 |retired=
 |status=
 |primary user=[[Imperial Japanese Navy Air Service]]
 |more users=
 |produced=
 |number built=20
 |variants with their own articles=
}}
Comment 4 Arlo Breault 2014-06-09 17:31:46 UTC
*** Bug 53139 has been marked as a duplicate of this bug. ***
Comment 5 Arlo Breault 2014-06-09 17:34:24 UTC
*** Bug 52889 has been marked as a duplicate of this bug. ***
Comment 6 Gabriel Wicke 2014-06-09 19:52:34 UTC
I guess one hacky solution to this would be to add another 'in table attribute position' flag to the transclusion processing pipeline. We could then pre-pend '{|' to the returned transclusion source in order to make it parse correctly. The tricky bit is that we'll likely have to also strip the extra table token to avoid foster-parenting of the entire table.

These templates seem to be quite common and won't all be fixed up soon, so it might be worth adding such a hack.
Comment 7 ssastry 2014-08-30 03:53:57 UTC
The soln in Comment 6 could work but it has to be able to continue to deal with (currently supported) scenarios like:

{| style='color:red;' {{echo|class='test'}}\n|foo\n|}
{| {{echo|class='test'}} style='color:red;'\n|foo\n|}

So, in general, it has to handle the following scenario.

{| <0 or more attrs> {{tpl here}} <0 or more attrs>
..
|}

In addition, this has to also deal with:

{| <0 or more attrs> {{tpl-that-generates-more-attrs-and-content}}
..
|}

So, this has the potential to get complex / fragile. I think the alternative -- Variant (1) in the description -- might potentially work out simpler and more robust since it can handle all of these scenarios identically.

So, this is what the AttributeExpander sees in @ line 150 for newK.

["class=\"infobox \" style=\"float: right; clear: right; width: 315px; border-spacing: 2px; text-align: left; font-size: 90%;\"",{"type":"NlTk","dataAttribs":{}},{"type":"TagTk","name":"th","attribs": ... }]

Hoisting everything from NlTk after the table token could work. But, you still need to clean up after this fixup since the template expansion would have been wrapped in meta-tags. So, you would have to hoist the tpl-start meta-tag to a position before the table-start ...

So, instead of returning: cb( { tokens: [token] } ) .. you might fix this to return [meta-start, token-with-fixed-attrs, NlTk-and-all-other-tokens-found-in-attr]

Lot of hand-waving going on ... but, in either scenario, the template wrapping will need fixing up in some fashion.
Comment 8 Kelson [Emmanuel Engelhart] 2014-10-04 13:17:14 UTC
I think, this is an other occurence of this bug:
http://parsoid-lb.eqiad.wikimedia.org/idwiki/Kota_Batu?oldid=8180803
Comment 9 Jackmcbarn 2014-10-28 20:57:25 UTC
A really reduced test case is at https://test.wikipedia.org/wiki/46811
Comment 10 Gerrit Notification Bot 2014-11-17 15:55:46 UTC
Change 173834 had a related patch set uploaded by Subramanya Sastry:
WIP: Bug 46811: Quick test hack

https://gerrit.wikimedia.org/r/173834

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links