Last modified: 2013-08-15 00:32:48 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T54488, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 52488 - Roundtripping issue leads to infobox deletion
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
Component: serializer
Version: unspecified
Hardware: All All
Importance: Highest critical
Target Milestone: ---
Assigned To: Gabriel Wicke
Depends on: 52638
Reported: 2013-08-02 23:52 UTC by Erik Moeller
Modified: 2013-08-15 00:32 UTC (History)

Description Erik Moeller 2013-08-02 23:52:49 UTC
Any edit to this rev
https://en.wikipedia.org/w/index.php?title=Raven-Symon%C3%A9&oldid=566906720&veaction=edit

leads to the infobox being removed.
Comment 1 ssastry 2013-08-03 04:20:33 UTC
This is because VE seems to have stripped data-mw from a category <link> element.

In this particular instance, because a category link is fostered out of the infobox table, the <link> happens to be the very first element in the infobox's transclusion HTML and carries the entire roundtripping information for the infobox.  However, if I understand correctly, these metadata tags are handled differently in VE, which might be contributing to the loss of data-mw on these tags (and subsequently Parsoid never sees the infobox when it comes to serialization).

So, this can happen on any page where the infobox generates a table and the specific HTML output emits a category link outside a table-tag.  Where category links are placed relative to table tags does not matter for the PHP parser, since category link wikitext effectively disappears from the HTML at the place where it appeared (and hence there is nothing left to foster out of the table).  But, in the case of Parsoid, because of roundtripping requirements, the links generate an actual HTML element which (depending on where it occurs in the HTML flow) can get fostered out and (depending on how it interacts with the surrounding context) can get roundtripping data-mw information.

So, if VE can preserve this information for now, that will fix this problem.

If there is a problem with handling link tags and preserving data-mw for whatever reason, we should figure out a strategy for dealing with these tags.  Note that it does not matter what HTML category links translate to -- as long as an HTML element is generated, it is subject to fostering out of tables.

I will not outline other possible solutions for now -- we can discuss on IRC if there is anything needed on the Parsoid end to support these scenarios.
Comment 2 ssastry 2013-08-03 04:25:47 UTC
As for "outside a table-tag": I really meant to say outside a table-content tag, i.e. inside a table but outside <td>, <th>, or <caption> tags.  So, a <link> that ends up being a direct child of a <table>, <tbody>, or <tr> tag will get moved out of the table.
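
To illustrate the fostering described above, here is a minimal Python sketch. It is a simplified model and not Parsoid's or any browser's actual tree builder: the `Node` class, the `foster_out` helper, and the two tag sets are hypothetical names introduced for this example, and the real HTML5 rules have more special cases (for instance, some elements such as <script> are allowed inside a table).

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Table-structure elements: stray children of these get fostered out.
STRUCTURE_TAGS = {"table", "tbody", "thead", "tfoot", "tr"}
# Table-content elements: anything nested inside these stays where it is.
CONTENT_TAGS = {"td", "th", "caption"}


@dataclass
class Node:
    """Hypothetical, minimal stand-in for a DOM element."""
    tag: str
    children: list[Node] = field(default_factory=list)


def foster_out(node: Node) -> list[Node]:
    """Strip stray children from table-structure elements under `node`.

    Modifies `node` in place and returns the removed (fostered) nodes
    in document order.
    """
    fostered, kept = [], []
    for child in node.children:
        if child.tag in STRUCTURE_TAGS:
            fostered.extend(foster_out(child))  # look for strays deeper down
            kept.append(child)
        elif child.tag in CONTENT_TAGS:
            kept.append(child)                  # cell content is left alone
        else:
            fostered.append(child)              # e.g. a category <link>
    node.children = kept
    return fostered


# One <link> directly under <table>, another safely inside a <td>.
cell = Node("td", [Node("link")])
table = Node("table", [Node("link"), Node("tbody", [Node("tr", [cell])])])
fostered = foster_out(table)
print([n.tag for n in fostered])
```

Only the <link> that is a direct child of <table> is moved out; the one inside the <td> stays in place.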
Comment 3 Roan Kattouw 2013-08-08 05:17:20 UTC
(In reply to comment #1)
> This is because VE seems to have stripped data-mw from a category <link>
> element.
> 
That sounds scary. I'll investigate.

> In this particular instance, because of fostering of a category link from the
> infobox table, the <link> happens to be the very first element in the
> transclusion HTML from the infobox and gets the entire roundtripping
> information of the infobox.  However, if I understand correctly, these
> metadata tags are handled differently in VE which might be contributing to the
> loss of data-mw on these tags (and subsequently Parsoid never sees the infobox
> when it comes to serialization).
> serialization).
> 
They're only handled differently if they occur in isolation. But mw:Transclusion takes precedence over other things, as does about-grouping. So if there's a <link> tag that's either about-grouped with other things or has mw:Transclusion set, it won't (shouldn't) be treated as metadata. I'll have to see what's going on here.
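
The precedence rule described above can be sketched roughly as follows. This is an illustrative Python sketch, not VE's actual code: `is_isolated_metadata`, its parameters, and the sample attribute values are all hypothetical, and the about-group size is assumed to be computed elsewhere.

```python
def is_isolated_metadata(attrs: dict[str, str], about_group_size: int) -> bool:
    """Return True if a <link> should be treated as plain metadata.

    `attrs` holds the element's attributes; `about_group_size` is the number
    of nodes sharing its `about` value, including the element itself.
    """
    typeof_values = attrs.get("typeof", "").split()
    if "mw:Transclusion" in typeof_values:
        return False  # transclusion handling takes precedence
    if "about" in attrs and about_group_size > 1:
        return False  # about-grouped with other nodes
    return True       # occurs in isolation


# Illustrative attribute values only.
lone_link = {"href": "./Category:Example"}
transcluded = {"typeof": "mw:Transclusion", "about": "#mwt1"}
grouped = {"about": "#mwt1"}

print(is_isolated_metadata(lone_link, 1))
print(is_isolated_metadata(transcluded, 1))
print(is_isolated_metadata(grouped, 3))
```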

> So, this can happen on any page where the infobox generates a table and the
> specific HTML output emits a category link outside a table-tag.  The specific
> place where category links are placed relative to table tags does not matter
> for the PHP parser since category link wikitext effectively disappears from
> the HTML from the place where they appeared (and hence there is nothing left
> to foster out of the table).  But, in the case of Parsoid, because of
> roundtripping requirements, the links generate an actual HTML element which
> (depending on where they occur in the HTML flow) can get fostered out and
> (depending on how they interact with the surrounding context) can get
> roundtripping data-mw information.
> 
I don't quite understand what kind of fostering behavior you're talking about here exactly, but I'll investigate the linked article and see.
Comment 4 Roan Kattouw 2013-08-08 05:27:03 UTC
I checked, and VE isn't dirtying the DOM. It's also not stripping the data-mw attribute from the first <link> tag on the page (I'm curious to see where you saw that behavior).

I think this is a selser bug. If I don't make any edits, the returned DOM is the same as what we received and the wikitext diff is empty. If I add a character to a paragraph, the DOM diff is only that paragraph, but the diff has that plus the removal of the infobox.
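
For context, the behavior expected from selective serialization (selser) can be sketched like this. It is a toy Python model under stated assumptions, not Parsoid's implementation, and it does not reproduce the bug itself: `Block`, `selective_serialize`, and `toy_serialize` are hypothetical names, and each block is assumed to remember the wikitext it was parsed from.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One top-level chunk of a page (hypothetical model)."""
    source: str             # wikitext the block was originally parsed from
    content: str            # current content as seen by the editor
    modified: bool = False  # set when the editor changed this block


def selective_serialize(blocks, serialize):
    """Reuse original wikitext for untouched blocks; serialize edited ones."""
    return "\n".join(
        serialize(b) if b.modified else b.source for b in blocks
    )


def toy_serialize(block):
    # Stand-in for a real HTML-to-wikitext serializer.
    return block.content


page = [
    Block("{{Infobox person|name=Example}}", "<table></table>"),
    Block("Some text.", "Some text."),
]
original = "\n".join(b.source for b in page)

# No edits: the output should equal the original wikitext.
unedited = selective_serialize(page, toy_serialize)

# Edit only the paragraph: the infobox source should survive untouched.
page[1].content = "Some text!"
page[1].modified = True
edited = selective_serialize(page, toy_serialize)
print(edited)
```

In this model, an edit to the paragraph can only change the paragraph's wikitext, which is why the infobox disappearing from the diff points at the serializer rather than at the edited DOM.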
Comment 5 Roan Kattouw 2013-08-08 09:22:52 UTC
Confirmed this is a selser bug, see bug 52638.
Comment 6 ssastry 2013-08-08 14:52:33 UTC
(In reply to comment #4)
> I checked, and VE isn't dirtying the DOM. It's also not stripping the data-mw
> attribute from the first <link> tag on the page (I'm curious to see where you
> saw that behavior).

That is strange.  I did a minor edit, dumped the HTML from Chrome and, after doing a dom-diff, noticed the diff, searched for Infobox to verify, and didn't find it.  I don't have the files anymore to confirm this, or to tell whether it was just a late-night debugging effect.  I am going to put it down to the latter for now.
 
> I think this is a selser bug. If I don't make any edits, the returned DOM is
> the same as what we received and the wikitext diff is empty. If I add a
> character to a paragraph, the DOM diff is only that paragraph, but the diff
> has that plus the removal of the infobox.

Thanks.  I will take a look at the other bug report you filed and investigate.
Comment 7 Gerrit Notification Bot 2013-08-08 16:06:32 UTC
Change 78242 had a related patch set uploaded by Subramanya Sastry:
(Bug 52638) Fix selser regression introduced by fix for bug 51217

https://gerrit.wikimedia.org/r/78242
Comment 8 Gerrit Notification Bot 2013-08-08 16:58:53 UTC
Change 78242 merged by jenkins-bot:
(Bug 52638) Fix selser regression introduced by fix for bug 51217

https://gerrit.wikimedia.org/r/78242
Comment 9 ssastry 2013-08-08 18:56:06 UTC
Will be fixed on the next Parsoid deploy (Monday-Wednesday next week?)
Comment 10 Gabriel Wicke 2013-08-15 00:32:48 UTC
This is deployed, but the cache is not yet purged. That should happen tomorrow after the next deploy. Until then the fix only applies to cache misses and pages re-rendered by template or image updates.

Resolving as fixed. Please verify tomorrow after the cache purge.
