Last modified: 2013-12-13 20:16:10 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T56438, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 54438 - Port HTML5 default mode change (core 97caae596) to Parsoid
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
Component: General
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Gabriel Wicke
Keywords: easy
Depends on:
Blocks:
Reported: 2013-09-21 19:55 UTC by Kelson [Emmanuel Engelhart]
Modified: 2013-12-13 20:16 UTC (History)

Description Kelson [Emmanuel Engelhart] 2013-09-21 19:55:31 UTC
This is my interpretation of this bad rendering:
http://parsoid.wmflabs.org/frwikisource/Auteur:Abb%C3%A9%20Pierre

versus the good one:
http://fr.wikisource.org/wiki/Auteur:Abb%C3%A9_Pierre
Comment 1 Brion Vibber 2013-09-22 14:17:59 UTC
The parsoid.wmflabs.org link just times out; can you describe it?
Comment 2 Kelson [Emmanuel Engelhart] 2013-09-22 14:39:13 UTC
It seems that <time> tags are somehow not recognized as HTML tags and are instead parsed as text (encoded with HTML entities).

Here is the relevant part of the output for the example above (with Parsoid-specific attributes stripped, sorry about that):

Henri Grouès, dit l’abbé Pierre, était un prêtre catholique français, résistant puis député, fondateur du Mouvement Emmaüs</span> (&lt;time class,&lt;span&gt;,=,&lt;/span&gt;,"bday"='' datetime,&lt;span&gt;,=,&lt;/span&gt;,"1912"=''&gt;1912&lt;/time&gt;<link href=./Catégorie:Naissance_en_1912><link href=./Catégorie:Auteurs_du_XXe_siècle>– &lt;time class,&lt;span&gt;,=,&lt;/span&gt;,"dday"='' datetime,&lt;span&gt;,=,&lt;/span&gt;,"2007"=''&gt;2007&lt;/time&gt;<link href=./Catégorie:Décès_en_2007><link href=./Catégorie:Auteurs_du_XXIe_siècle>)
Comment 3 ssastry 2013-09-22 15:25:06 UTC
Can be verified on a simple test case:

[subbu@earth lib] echo "<time>foo</time>" | node parse --fetchConfig false
<body data-parsoid='{"dsr":[0,17,0,0]}'><p data-parsoid='{"dsr":[0,16,0,0]}'>&lt;time&gt;foo&lt;/time&gt;</p>
</body>

The sanitizer in Parsoid is the culprit: it accepts only a whitelist of HTML tags in wikitext, and <time> is not on that list. Either our port of the PHP sanitizer has a bug, or the port needs updating. To be investigated.
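
To illustrate the mechanism, here is a minimal sketch of a whitelist-based tag filter. This is not Parsoid's actual code: the names ALLOWED_TAGS, escapeHtml and sanitizeTags and the short tag list are hypothetical, and a real sanitizer also has to handle attributes, nesting and much more. It only shows why a tag missing from the list ends up as entity-encoded text.

```javascript
// Hypothetical allow-list for illustration; NOT Parsoid's real list.
const ALLOWED_TAGS = new Set(['b', 'i', 'span', 'p']);

// Encode the characters that would otherwise be read as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Pass allow-listed tags through unchanged; escape every other tag
// so that it is rendered as literal text instead of an element.
function sanitizeTags(input, allowed = ALLOWED_TAGS) {
  return input.replace(/<\/?([a-zA-Z][a-zA-Z0-9]*)[^<>]*>/g, (tag, name) =>
    allowed.has(name.toLowerCase()) ? tag : escapeHtml(tag));
}

// 'time' is not in the list, so both tags are escaped.
console.log(sanitizeTags('<time>foo</time>'));
// prints: &lt;time&gt;foo&lt;/time&gt;

// Adding the HTML5 elements to the list lets them through.
const withHtml5 = new Set([...ALLOWED_TAGS, 'time', 'data', 'mark']);
console.log(sanitizeTags('<time>foo</time>', withHtml5));
// prints: <time>foo</time>
```

The first call reproduces the shape of the output above; the second suggests that extending the list would be enough for this simple case.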
Comment 4 Gerrit Notification Bot 2013-12-04 00:26:46 UTC
Change 99011 had a related patch set uploaded by GWicke:
Bug 54438: First part of core change 97caae596: support time/data/mark elements

https://gerrit.wikimedia.org/r/99011
Comment 5 Gabriel Wicke 2013-12-04 00:28:13 UTC
Leaving this bug open until other parts of 97caae596 are ported too.
Comment 6 Gerrit Notification Bot 2013-12-04 01:18:24 UTC
Change 99011 merged by jenkins-bot:
Bug 54438: First part of core change 97caae596: support time/data/mark elements

https://gerrit.wikimedia.org/r/99011
Comment 7 Kelson [Emmanuel Engelhart] 2013-12-04 10:51:11 UTC
I am not sure this is related to this bug fix, but Parsoid seems to generate an additional, unnecessary " " character after the closing time tag.

Examples:
* http://parsoid.wmflabs.org/frwikisource/Auteur%3AVictor_Hugo
* http://parsoid.wmflabs.org/frwikisource/Auteur:Abb%C3%A9%20Pierre
Comment 8 Gabriel Wicke 2013-12-06 22:59:50 UTC
(In reply to comment #7)
> I am not sure this is related to this bug fix, but Parsoid seems to generate
> an additional, unnecessary " " character after the closing time tag.
> 
> Examples:
> * http://parsoid.wmflabs.org/frwikisource/Auteur%3AVictor_Hugo
> * http://parsoid.wmflabs.org/frwikisource/Auteur:Abb%C3%A9%20Pierre

This seems to work fine when testing with master:

echo '<time>1900</time>foo' | node parse
<body data-parsoid='{"dsr":[0,21,0,0]}'><p data-parsoid='{"dsr":[0,20,0,0]}'><time data-parsoid='{"stx":"html","dsr":[0,17,6,7]}'>1900</time>foo</p>
</body>

Can you try to find a minimal test case at http://parsoid.wmflabs.org/_wikitext/ ?

This patch was also deployed on Wednesday (see https://www.mediawiki.org/wiki/Parsoid/Deployments#Wednesday.2C_December_4.2C_13:00-14:00_PST_Y_Deployed_0ac82a28), so these tags are now supported in production.
Comment 9 Kelson [Emmanuel Engelhart] 2013-12-08 20:12:40 UTC
The smallest example I could find that shows a difference is:
* https://fr.wikisource.org/wiki/Utilisateur:Kelson/test
* http://parsoid-lb.eqiad.wikimedia.org/frwikisource/Utilisateur%3AKelson%2Ftest

Your test probably proves that this "problem" has nothing to do with the original bug. Should I open a new ticket?
Comment 10 Gabriel Wicke 2013-12-09 18:43:19 UTC
(In reply to comment #9)
> The smallest example I could find that shows a difference is:
> * https://fr.wikisource.org/wiki/Utilisateur:Kelson/test
> * http://parsoid-lb.eqiad.wikimedia.org/frwikisource/Utilisateur%3AKelson%2Ftest
> 
> Your test probably proves that this "problem" has nothing to do with the
> original bug. Should I open a new ticket?

Yes, that would be great. This looks more like a template whitespace folding issue.
Comment 11 Kelson [Emmanuel Engelhart] 2013-12-10 21:29:45 UTC
Here it is:
https://bugzilla.wikimedia.org/show_bug.cgi?id=58289
Certainly not the most exciting bug to investigate...

I am closing this ticket, as the bug seems to be fixed now. Thank you very much.
Comment 12 Gabriel Wicke 2013-12-10 23:21:37 UTC
Reopened this bug, as there are still HTML5-by-default changes from 97caae596 to port.
Comment 13 Gerrit Notification Bot 2013-12-13 19:53:59 UTC
Change 101277 had a related patch set uploaded by GWicke:
Merge "Bug 54438: First part of core change 97caae596: support time/data/mark elements"

https://gerrit.wikimedia.org/r/101277
Comment 14 Gerrit Notification Bot 2013-12-13 19:55:27 UTC
Change 101329 had a related patch set uploaded by GWicke:
Bug 54438: First part of core change 97caae596: support time/data/mark elements

https://gerrit.wikimedia.org/r/101329
Comment 15 Gerrit Notification Bot 2013-12-13 19:57:09 UTC
Change 101277 merged by GWicke:
Merge "Bug 54438: First part of core change 97caae596: support time/data/mark elements"

https://gerrit.wikimedia.org/r/101277
Comment 16 Gerrit Notification Bot 2013-12-13 19:57:11 UTC
Change 101329 merged by GWicke:
Bug 54438: First part of core change 97caae596: support time/data/mark elements

https://gerrit.wikimedia.org/r/101329
