Last modified: 2013-12-13 19:55:40 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from bug reports and their history, links might be broken. See T54760, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 52760 - Close tags are stripped
Status: RESOLVED WONTFIX
Product: Parsoid
Classification: Unclassified
Component: General
Version: unspecified
Hardware: All All
Importance: Low normal
Target Milestone: ---
Assigned To: Gabriel Wicke
Depends on:
Blocks:
Reported: 2013-08-12 18:19 UTC by C. Scott Ananian
Modified: 2013-12-13 19:55 UTC

Description C. Scott Ananian 2013-08-12 18:19:23 UTC
$ echo '</b>' | php maintenance/parse.php 
<p>&lt;/b&gt;
</p>
$ echo '</b>' | tests/parse.js 
<body data-parsoid="{}"><p data-parsoid='{"dsr":[0,4,0,0]}'></p>
</body>

The lonely close tag is stripped by Parsoid, but it is sanitized (treated as literal non-markup text) by the PHP parser.
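
The two outputs above differ only in what happens to a close tag that has no matching open tag. As a rough illustration (a toy model in Python, not the code of either parser; the function name `handle_stray_close_tags` and its `mode` parameter are made up for this sketch), the difference can be reproduced like this:

```python
import html
import re

# Hypothetical, deliberately simplified tag matcher: bare <name> and </name> only.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\s*>")


def handle_stray_close_tags(text, mode):
    """Toy model of the two behaviours for unmatched close tags.

    mode="escape": emit the stray close tag as literal text (&lt;/b&gt;).
    mode="strip":  drop the stray close tag from the output.
    """
    if mode not in ("escape", "strip"):
        raise ValueError("mode must be 'escape' or 'strip'")

    open_tags = []  # stack of currently open tag names
    out = []
    pos = 0
    for m in TAG_RE.finditer(text):
        out.append(text[pos:m.start()])  # copy text between tags unchanged
        pos = m.end()
        is_close = m.group(1) == "/"
        name = m.group(2).lower()
        if not is_close:
            open_tags.append(name)
            out.append(m.group(0))
        elif name in open_tags:
            # Matched close tag: pop back to (and including) its open tag.
            while open_tags.pop() != name:
                pass
            out.append(m.group(0))
        elif mode == "escape":
            out.append(html.escape(m.group(0), quote=False))
        # mode == "strip": a stray close tag contributes nothing
    out.append(text[pos:])
    return "".join(out)
```

With this sketch, `handle_stray_close_tags("</b>", "escape")` yields the escaped text and `handle_stray_close_tags("</b>", "strip")` yields an empty string, mirroring the two results in the description.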
Comment 1 ssastry 2013-08-12 18:42:19 UTC
[subbu@earth tests] echo "</b>" | node parse --editMode false
<body data-parsoid="{}"><p data-parsoid='{"dsr":[0,4,0,0]}'><meta typeof="mw:Placeholder/StrippedTag" data-parsoid='{"src":"</b>","name":"B","dsr":[0,4,null,null]}'></p>
</body>

We just need to find a way of converting these non-editmode stripped tags into plain text in certain situations.
Comment 2 C. Scott Ananian 2013-08-12 18:54:55 UTC
(02:15:50 PM) subbu: that is because the tree builder removes it.
(02:16:06 PM) subbu: and we recognize the stripping with a dom analysis and add that meta-tag.
(02:16:23 PM) subbu: but we could instead add a text-version of the stripped tag in some cases like this.
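
A minimal sketch of that idea, assuming the placeholder looks exactly like the `mw:Placeholder/StrippedTag` meta tag in comment 1: replace each such meta tag with the escaped text of its recorded `src`. This is Python using the standard-library `html.parser`, not Parsoid code; the class and function names are hypothetical, and comments and doctype declarations in the input are dropped for brevity.

```python
import html
import json
from html.parser import HTMLParser

STRIPPED_TAG_TYPE = "mw:Placeholder/StrippedTag"


class StrippedTagToText(HTMLParser):
    """Re-emit markup, turning StrippedTag placeholders into escaped text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and attr.get("typeof") == STRIPPED_TAG_TYPE:
            # data-parsoid carries the original source of the stripped tag.
            info = json.loads(attr.get("data-parsoid") or "{}")
            self.out.append(html.escape(info.get("src", ""), quote=False))
        else:
            # Any other start tag is copied through verbatim.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Treat <meta .../> the same as <meta ...>.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives unescaped (convert_charrefs=True), so escape it again.
        self.out.append(html.escape(data, quote=False))


def stripped_tags_to_text(markup):
    parser = StrippedTagToText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

Applied to the paragraph from comment 1, this would turn the placeholder into `&lt;/b&gt;` inside the `<p>`. Deciding in which situations such a conversion is appropriate is the open question raised here and is not addressed by the sketch.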
Comment 3 Gerrit Notification Bot 2013-08-12 18:55:52 UTC
Change 78842 had a related patch set uploaded by Cscott:
Improve parser test for bug 52760 (close tags are being stripped).

https://gerrit.wikimedia.org/r/78842
Comment 4 Gabriel Wicke 2013-08-13 17:54:56 UTC
(In reply to comment #1)
> [subbu@earth tests] echo "</b>" | node parse --editMode false
> <body data-parsoid="{}"><p data-parsoid='{"dsr":[0,4,0,0]}'><meta typeof="mw:Placeholder/StrippedTag" data-parsoid='{"src":"</b>","name":"B","dsr":[0,4,null,null]}'></p>
> </body>
> 
> We just need to find a way of converting these non-editmode stripped tags into
> plain text in certain situations.

In my testing that is not what the PHP parser & tidy are doing, so this would be a change of content semantics.

Cleaning up stray close tags when nearby content is edited is a good thing in my opinion. Selective serialization ensures that end tags in unmodified parts of the page are preserved to avoid dirty diffs. Simply re-inserting stray end tags based on StrippedTag info is not safe in the presence of editing, and making it safe would add a lot of complexity for little gain.
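
To illustrate the selective serialization argument, here is a rough Python sketch of the general idea only, not Parsoid's serializer. It assumes each top-level node records the start and end offsets of its original source (as the first two `dsr` values appear to), and that edited nodes carry freshly serialized text; the `Node` class and `selective_serialize` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    start: int                      # start offset in the original source
    end: int                        # end offset in the original source
    new_text: Optional[str] = None  # set only if the node was edited


def selective_serialize(original: str, nodes: List[Node]) -> str:
    """Reuse original source for untouched nodes, new text for edited ones.

    Assumes nodes are sorted by offset and do not overlap.
    """
    out = []
    cursor = 0
    for node in nodes:
        out.append(original[cursor:node.start])  # separators between nodes
        if node.new_text is None:
            # Unmodified: copy the source verbatim, stray close tags included.
            out.append(original[node.start:node.end])
        else:
            # Modified: use the re-serialized text, where stray tags are gone.
            out.append(node.new_text)
        cursor = node.end
    out.append(original[cursor:])
    return "".join(out)
```

For example, with the source `"foo</b>\n\nbar"` split into paragraphs at offsets (0, 7) and (9, 12), editing only the second paragraph keeps `</b>` in the first one byte for byte, while editing the first paragraph replaces it with cleaned-up text. That is the sense in which stray end tags survive in unmodified parts of the page and disappear only where nearby content is edited.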

For these reasons I am closing this as WONTFIX. Please reopen this bug if there are cases where the PHP parser renders stray end tags as text, but we don't.
Comment 5 C. Scott Ananian 2013-08-13 20:13:27 UTC
Reopening.  See the bug description for an example, as well as https://gerrit.wikimedia.org/r/78842
Comment 6 ssastry 2013-08-13 20:17:03 UTC
Interesting ... so, tidy bites us again?  http://en.wikipedia.org/wiki/User:Ssastry/bug_52760 says that gwicke is right.
Comment 7 C. Scott Ananian 2013-08-13 20:31:07 UTC
Huh, weird.  The PHP parser is definitely emitting the escaped text.  How is tidy getting to it to remove it? Hmm.
Comment 8 C. Scott Ananian 2013-08-15 20:56:45 UTC
According to gwicke, "there is a different PHP cleanup pass in the parser that might do the &lt; escaping. that pass is enabled when tidy is not enabled."

Parsoid attempts to be consistent with the tidy-enabled behavior of the PHP parser.  See bug 52899 for a better way to document/enforce these behaviors in parserTests.
Comment 9 Gerrit Notification Bot 2013-12-13 19:55:40 UTC
Change 78842 merged by jenkins-bot:
Improve parser test for bug 52760 (close tags are being stripped).

https://gerrit.wikimedia.org/r/78842
