Last modified: 2013-08-27 14:14:00 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T53217, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 51217 - Implement way to round-trip tables with fosterable content that migrates out of the table
Implement way to round-trip tables with fosterable content that migrates out ...
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
DOM (Other open bugs)
unspecified
All All
: Normal normal
: ---
Assigned To: ssastry
:
: 50754 51964 51993 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2013-07-12 04:35 UTC by ssastry
Modified: 2013-08-27 14:14 UTC (History)
8 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description ssastry 2013-07-12 04:35:19 UTC
Simple example:

{|
foo
|}

This doesn't RT correctly. 

Real world test case: [[All-Ireland_Senior_Camogie_Championship_1992]]
Related report: https://en.wikipedia.org/wiki/Wikipedia:VisualEditor/Feedback#A_simple_table_gets_doubled_after_an_unrelated_edit_of_plain_text

Interestingly enough, we have more robust handling for templated content.  The example below is handled properly and RTs correctly.

{|
{{echo|foo}}
|}
Comment 1 ssastry 2013-07-12 04:43:35 UTC
Jotting down an idea:

https://gerrit.wikimedia.org/r/#/c/73369/ is going to add tag-ids to dom elements.  Since they are added linearly, it will always be the case that tag-id(node) < tag-id(node.nextSibling) and tag-id(node) < tag-id(node.child) except when a child is fostered out. 

So, we could actually use these tag ids to robustly detect fostered content and add some encapsulation typeof and about ids on the content and the table so that VE as well as Parsoid's serializer can deal with it.
Comment 2 Gabriel Wicke 2013-07-12 05:22:33 UTC
(In reply to comment #1)
> Jotting down an idea:
> 
> https://gerrit.wikimedia.org/r/#/c/73369/ is going to add tag-ids to dom
> elements.  Since they are added linearly, it will always be the case that
> tag-id(node) < tag-id(node.nextSibling) and tag-id(node) < tag-id(node.child)
> except when a child is fostered out. 

http://www.mediawiki.org/w/index.php?title=Parsoid/Todo&oldid=554715#DOM_tree_builder is making a cameo appearance ;)

After detection we'll have to figure out a way to make this work with selser though. Simply enlarging the start tag width to include the fostered content won't work. Fostered content can also come from several places in the table. The best is probably to prevent selser from serializing any of the fostered content, but serialize all of it when a part of it is edited.

Generally we should not try to do more than avoiding dirty diffs. On edit to the fostered content (or even the table in general) we should take the liberty to fix the wikitext to more closely mirror the way that content actually renders.
Comment 3 ssastry 2013-07-17 19:22:50 UTC
Looks like this corruption is because content gets fostered out of the table on
load in VE.  One more instance of fosterable content that needs to be handled.

https://en.wikipedia.org/w/index.php?title=List_of_Jessie_episodes&diff=564684769&oldid=564554799
Comment 4 Carl Fürstenberg 2013-07-17 19:25:29 UTC
See http://youtu.be/QYgkdnQ6Yng for realtime feeling about the issue
Comment 5 Derk-Jan Hartman 2013-07-17 22:57:07 UTC
Why not just disable VE on pages like these for now. Edit page, parsoid render, "VisualEditor encountered a structure that it currently cannot handle. You will be redirected to the source editor."

Breaking pages == bad.
Comment 6 ssastry 2013-07-18 22:00:23 UTC
On the Parsoid end, detecting that the page cannot be handle is equivalent to fixing it.  Same analysis work has to be done.  On the VE end, perhaps they could use the 've-needscheck' verification to detect something like this -- that is something that VE developers can respond to best.  Adding Roan to this ticket who might be better able to answer that part.
Comment 7 Gabriel Wicke 2013-07-18 22:06:40 UTC
The problem on the VE end is that foster-parenting happens already when the browser parses the received HTML. Doing a simple string comparison with the received HTML won't normally work because of quoting differences.
Comment 8 Roan Kattouw 2013-07-18 23:42:49 UTC
(In reply to comment #7)
> The problem on the VE end is that foster-parenting happens already when the
> browser parses the received HTML. Doing a simple string comparison with the
> received HTML won't normally work because of quoting differences.
This is right. We have no reasonable way to tell whether the browser's HTML parser introduced changes, because inferring anything about the HTML string requires using the browser's HTML parser which... well you get the idea.

On the Parsoid end, this is easier to detect, because you have the original DOM. You could serialize that DOM to a string, then parse it into a DOM again, and compare that against the original DOM using .isEqualNode() (if that exists in your DOM library; alternatively, use your DOM differ).
Comment 9 ssastry 2013-07-19 20:40:16 UTC
So, I investigated the bug with this page (https://en.wikipedia.org/w/index.php?title=List_of_Jessie_episodes&diff=564684769&oldid=564554799) (comment #3 above).  

This is quite nasty and nothing to do with foster parenting as I mistakenly/hastily believed.  The page has since been edited to remove the problematic wikitext (https://en.wikipedia.org/w/index.php?title=List_of_Jessie_episodes&diff=564803163&oldid=564788048) so the problem will no longer manifest there.

However, here is a snippet from 

{{Episode list
 |EpisodeNumber   = 1
...
...
 |ProdCode        = 101<ref name="futon" />
 |Viewers         = ...
...
...
|}

Note the input to the 'ProdCode' param.  Here is a snippet for the template source of Episode list (https://en.wikipedia.org/w/index.php?title=Template:Episode_list&action=edit)

... <td id="pc{{{ProdCode}}}">{{{ProdCode}}}</td> ...

See what that will do with the parameter passed in from the page.  Here is the expanded source that Parsoid sees:

... <td id="pc101<ref name="futon" />">101<ref name="futon" /></td> ...

So, that td-tag is totally broken and Parsoid renders that as literal text (rather than a html td tag) which then promptly gets fostered out (by Parsoid itself) which then adds noise to the HTML output which is what you see when you opened that page in VE (see video above added by AzaToth). 

The editor Geraldo Perez has already fixed the page as I indicated earlier, so this is not a problem anymore.

But, I dont think we should even bother trying to handle broken wikitext like this.  We do go to great lengths to fixup/handler broken wikitext while still preserving it as far as possible to not introduce dirty diffs, but dirty diffs are sometimes inevitable in situations like this.  So for now, I am going to classify this example as unfixable in Parsoid, unless someone has smart ideas.
Comment 10 Gabriel Wicke 2013-07-19 22:06:53 UTC
From IRC:
[14:44] <gwicke> subbu, re https://bugzilla.wikimedia.org/show_bug.cgi?id=51217#c9: we should be able to detect when something was fostered out by our treebuilder, and should also be able to ensure that everything fosterable is dropped or fostered at the end of DOMPostProcessing
[14:45] <gwicke> yeah, I wonder what he is up to
[14:45] <subbu> in that example, what do we do with the information at all .. itis not helpful since the output still looks corrupt.
[14:45] <gwicke> subbu: IMO it is fine to drop such content on edit, but if we can avoid dirty diffs with selser then that would be great
[14:46] <gwicke> we could strip the fostered content from the DOM
[14:46] <subbu> is it the right thing to do in general?
[14:46] <gwicke> and rig selser so that it restores it unless the content is touched 
[14:47] <gwicke> or we can keep it in the DOM, but mark it so that selser does not serialize it
[14:47] <gwicke> ie, a zero-length dsr
[14:48] <subbu> yes, the rig-selser bit is missing right now.
[14:48] <gwicke> subbu: detecting foster-parented content and then excluding it from dsr calculations might already be pretty effective
[14:49] <gwicke> I guess the gaps in dsr would be spanned by the dsr algorithm
[14:49] <subbu> yes, that should work.
[14:49] <gwicke> (the holes where content disappeared)
[14:50] <subbu> i was wondering if we should mark blocks of fostered content with a special div/marker so that VE can treat it specially if there is a use-case for it (not sure there is ..)
[14:50] <gwicke> subbu: I'd vote for data-parsoid
[14:50] <subbu> data-parsoid is fine for internal use.
[14:50] <gwicke> if the user edits the fostered content, we serialize it
[14:50] <gwicke> i.e., do the dirty diff then
[14:51] <subbu> but, i mean: if VE can do something special if it knew about it.  not sure what that would be .. i doubt this is a high priority thing in any case.
[14:51] <gwicke> when something adjacent to the former location is edited, we dirty-diff it there
[14:51] <gwicke> subbu: IMO we should abstract such issues
[14:52] <gwicke> bot authors would not want to deal with it either
[14:52] <gwicke> and the DOM should reflect the rendered semantics 
[14:53] <subbu> dom should reflect rendered semantics for sure.
Comment 11 John Mark Vandenberg 2013-07-20 07:41:49 UTC
This looks very similar to bug 50754.
Comment 12 ssastry 2013-07-20 14:14:58 UTC
*** Bug 50754 has been marked as a duplicate of this bug. ***
Comment 13 Gerrit Notification Bot 2013-07-22 21:45:06 UTC
Change 75253 had a related patch set uploaded by Subramanya Sastry:
WIP: (Bug 51217) Detect fostered content and ignore in selser

https://gerrit.wikimedia.org/r/75253
Comment 14 ssastry 2013-07-24 14:58:47 UTC
*** Bug 51964 has been marked as a duplicate of this bug. ***
Comment 15 ssastry 2013-07-24 21:37:03 UTC
*** Bug 51993 has been marked as a duplicate of this bug. ***
Comment 16 Gerrit Notification Bot 2013-07-27 12:36:34 UTC
Change 75253 merged by jenkins-bot:
(Bug 51217) Detect fostered content and ignore in selser

https://gerrit.wikimedia.org/r/75253
Comment 17 ssastry 2013-07-28 04:31:41 UTC
Currently undergoing rt-testing and will be deployed Monday/Tuesday.
Comment 18 ssastry 2013-07-28 04:32:51 UTC
Another related patch: https://gerrit.wikimedia.org/r/#/c/76324/
Comment 19 ssastry 2013-07-31 21:31:15 UTC
Now deployed in production.  I tested dummy test cases and it seems to work.  Would be good to keep an eye out for this in production.
Comment 20 Gerrit Notification Bot 2013-08-08 16:06:40 UTC
Change 78242 had a related patch set uploaded by Subramanya Sastry:
(Bug 52638) Fix selser regression introduced by fix for bug 51217

https://gerrit.wikimedia.org/r/78242
Comment 21 Gerrit Notification Bot 2013-08-08 16:59:00 UTC
Change 78242 merged by jenkins-bot:
(Bug 52638) Fix selser regression introduced by fix for bug 51217

https://gerrit.wikimedia.org/r/78242
Comment 22 Gabriel Wicke 2013-08-15 00:33:57 UTC
Fix deployed, but cache is not yet purged. The purge will happen tomorrow.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links