Last modified: 2014-09-26 20:08:10 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T73185, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 71185 - DOMFragment unpacking doesn't generate well-formed DOMs always.
Status: NEW
Product: Parsoid
Classification: Unclassified
Component: DOM
Version: unspecified
Hardware: All All
Importance: Low normal
Target Milestone: ---
Assigned To: Parsoid Team
Depends on:
Blocks:
Reported: 2014-09-23 18:20 UTC by C. Scott Ananian
Modified: 2014-09-26 20:08 UTC (History)




Description C. Scott Ananian 2014-09-23 18:20:29 UTC
http://parsoid.wmflabs.org/dewiki/Englische_Sprache?oldid=134283860 contains <li> tags outside any containing <ul> in the "Länder der Welt, in denen Englisch gesprochen wird" figure.


This also causes a visual diff, presumably as a result of the missing container tag.

From IRC:
subbu: surprised that the html tree builder didn't fix it.
cscott-free: yeah, me too.  but maybe it's peculiar to <figure> parsing somehow.
Comment 1 ssastry 2014-09-23 18:29:21 UTC
This seems to be happening because the dom-fragment for the caption is <li>..</li><li>..</li>, and when the dom-fragment is unwrapped and inserted into the parent DOM, the <li>s aren't fixed up. This is a problem with our dom-fragment unpacker, which uses some heuristics to make sure the parent DOM is well-formed.

Reproducible with:

echo "[[Image:Foobar.jpg|right|this is a caption {{echo|<li>foo</li>}}]]" | node parse
Comment 2 ssastry 2014-09-26 20:08:10 UTC
I was mistaken. Looks like the parsing spec doesn't dictate that bare <li> nodes be fixed up to be enclosed in ul/ol nodes. So, while Tidy does fix up these uses in the PHP parser scenario, we can deprecate such uses for now and fix up source wikitext where possible.

If this is deemed to be a problem, we can probably handle it as part of a generic "content-model-fixup" pass that takes care of these and other issues. But for now, this is going to be a lower-priority issue to tackle, and we can continue to fix up individual instances of problematic wikitext wherever they show up.
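
For illustration only, here is a rough sketch of what one rule in such a "content-model-fixup" pass might look like: it wraps each run of bare <li> children in a new <ul>. This is not Parsoid code. The node shape ({ name, children } for elements, { text } for text nodes), the helper names el, fixupBareListItems and serialize, and the choice of <ul> as the wrapper are all assumptions made for the example.

```javascript
// Hypothetical, simplified node shape (NOT Parsoid's DOM API):
//   element: { name: string, children: Node[] }
//   text:    { text: string }
function el(name, children = []) {
  return { name, children };
}

// Parents under which an <li> is already valid content.
const LIST_CONTAINERS = new Set(["ul", "ol", "menu"]);

// Recursively wrap each run of consecutive bare <li> children in a new <ul>.
// Assumption: <ul> is an acceptable wrapper; a real pass might choose differently.
function fixupBareListItems(node) {
  if (!node.children) {
    return node; // text node, nothing to fix
  }
  const fixed = [];
  let run = null; // the <ul> currently collecting bare <li>s, if any
  for (const child of node.children) {
    const isBareLi = child.name === "li" && !LIST_CONTAINERS.has(node.name);
    if (isBareLi) {
      if (!run) {
        run = el("ul");
        fixed.push(run);
      }
      run.children.push(fixupBareListItems(child));
    } else {
      run = null; // any other node ends the current run
      fixed.push(fixupBareListItems(child));
    }
  }
  node.children = fixed;
  return node;
}

// Minimal serializer, just enough to inspect the result.
function serialize(node) {
  if (node.text !== undefined) {
    return node.text;
  }
  return `<${node.name}>${node.children.map(serialize).join("")}</${node.name}>`;
}

// A caption shaped like the one in the repro from comment 1.
const caption = el("figcaption", [
  { text: "this is a caption " },
  el("li", [{ text: "foo" }]),
]);
console.log(serialize(fixupBareListItems(caption)));
// prints: <figcaption>this is a caption <ul><li>foo</li></ul></figcaption>
```

Note that in this sketch any intervening node, including whitespace-only text, ends a run, so <li>s separated by whitespace would end up in separate <ul>s. A real pass would need to decide how to treat that case.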


