Last modified: 2012-08-06 19:25:42 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T35052, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 33052 - Parsoid / Serializer: Escape wikitext tags written by hand in VisualEditor interface
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
Component: General
Version: unspecified
Hardware: All, OS: All
Importance: Normal, major
Target Milestone: ---
Assigned To: Gabriel Wicke
Duplicates: 33090
Depends on:
Blocks:
Reported: 2011-12-13 22:14 UTC by Bergi
Modified: 2012-08-06 19:25 UTC (History)
CC List: 10 users

Description Bergi 2011-12-13 22:14:56 UTC
As <pre> is a preprocessor tag, it is very easy to mess things up with it. Writing </pre>, or even making links in preformatted paragraphs, gives very ugly wikitext results. Please use the indentation syntax to build preformatted text.
Comment 1 Trevor Parscal 2011-12-13 22:25:13 UTC
In the Wikitext output, we do use a single space indentation for pre-formatted blocks. I'm confused what you are filing this bug about.
Comment 2 Bergi 2011-12-13 22:50:01 UTC
Oops, I've mixed up wikitext and html output.

However, typing "<pre>whatever</pre>" in a paragraph and applying a link to it leads to:
* wikitext output "<pre>what[[ever|ever]]</pre>", which doesn't match the visual preview (on the left)
* html/preview output "<p><pre>what<a href="/wiki/ever">ever</a></pre></p>", which is not what the current parser would render.
Comment 3 Inez Korczyński 2011-12-19 11:00:24 UTC
It is a problem in the HTML serializer: the <pre> tag should be escaped and handled as plain text, not as an HTML tag.
Comment 4 Bergi 2011-12-19 11:55:40 UTC
Yes, exactly. Both 2html and 2wikitext serializers should escape any preprocessor-handled tags, as well as template inclusions. It would be enough to map "<" to "&lt;" and "{" to "&#123;", maybe with a few rules to reduce it to the absolute essential.
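The mapping proposed in comment 4 can be sketched as a plain character substitution. This is a minimal Python sketch that assumes unconditional escaping (none of the "few rules" for trimming it down are implemented), and the function name `escape_preprocessor_syntax` is hypothetical, not taken from MediaWiki, Parsoid or VisualEditor code.

```python
# Characters that can open preprocessor syntax, mapped to HTML entities so the
# parser treats them as literal text. Unconditional: every occurrence is escaped.
PREPROCESSOR_ESCAPES = {
    "<": "&lt;",    # tag openers such as <pre>, <nowiki>, <ref>
    "{": "&#123;",  # template / parser-function openers such as {{...}}
}


def escape_preprocessor_syntax(text: str) -> str:
    """Replace every character that could begin preprocessor syntax."""
    return "".join(PREPROCESSOR_ESCAPES.get(ch, ch) for ch in text)


print(escape_preprocessor_syntax("<pre>whatever</pre> {{foo}}"))
# → &lt;pre>whatever&lt;/pre> &#123;&#123;foo}}
```

Because "&" itself is not escaped here, an entity that a user types literally (for example `&lt;`) would still be decoded when the page is rendered; that is the ampersand problem raised later in this thread.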
Comment 5 Liangent 2011-12-28 11:23:07 UTC
*** Bug 33090 has been marked as a duplicate of this bug. ***
Comment 6 Bergi 2011-12-28 11:53:26 UTC
I think we should distinguish between "preprocessor syntax" escaping (this bug), and "wikisyntax" escaping (Bug 33090).
Of course we could escape everything that looks even remotely like a parser instruction, but the output wouldn't be readable. Moreover, the conditions for what to escape, and when, differ a lot between preprocessor and wiki syntax, especially as we already have a DOM of the latter.
Comment 7 Liangent 2011-12-28 14:43:52 UTC
(In reply to comment #6)
> I think we should distinguish between "preprocessor syntax" escaping (this
> bug), and "wikisyntax" escaping (Bug 33090).
> Of course we could escape everything that looks even remotely like a parser
> instruction, but the output wouldn't be readable. Moreover, the conditions for
> what to escape, and when, differ a lot between preprocessor and wiki syntax,
> especially as we already have a DOM of the latter.

Yeah, but I don't really know how VisualEditor works. However, if bug 33090 is resolved, I guess this bug is resolved automatically. Maybe add a bug dependency?
Comment 8 James Forrester 2012-06-22 01:07:32 UTC
Triage: I believe that hand-entered wikitext is currently converted into its HTML equivalent and displayed as such on round-trip, but it shouldn't be.
Comment 9 Gabriel Wicke 2012-06-22 08:25:05 UTC
The Parsoid serializer tokenizes all text content from the DOM and wraps all non-text tokens (any wiki or html syntax) into <nowiki> blocks.
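The approach in comment 9 can be illustrated with a toy tokenizer. This is a hedged Python sketch, not Parsoid's implementation: Parsoid used a full wikitext tokenizer, whereas the hypothetical `SYNTAX_TOKEN` regular expression below recognizes only tags, links, template transclusions and quote runs, and treats everything it matches as a "non-text token".

```python
import re

# Hypothetical, much-simplified stand-in for a wikitext tokenizer. Anything it
# matches inside plain DOM text is treated as syntax that must be neutralized.
SYNTAX_TOKEN = re.compile(
    r"""
      </?[A-Za-z][^<>]*>   # HTML or extension tags, e.g. <pre> and </pre>
    | \[\[.*?\]\]          # wiki links
    | \{\{.*?\}\}          # template transclusions
    | '{2,}                # bold / italic quote runs
    """,
    re.VERBOSE,
)


def escape_text_content(text: str) -> str:
    """Wrap every syntax-like token found in plain text in <nowiki> tags."""
    return SYNTAX_TOKEN.sub(lambda m: f"<nowiki>{m.group(0)}</nowiki>", text)


print(escape_text_content("<pre>whatever</pre> and [[ever]]"))
# → <nowiki><pre></nowiki>whatever<nowiki></pre></nowiki> and <nowiki>[[ever]]</nowiki>
```

A real serializer must also cope with text that itself contains `</nowiki>`, and with syntax that is only significant in certain positions (for example `*`, `#` or `=` at the start of a line), which this regex ignores.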
Comment 10 Liangent 2012-06-22 20:41:17 UTC
@MZMcBride: Are we expected to repeat the component field in bug subject?
Comment 11 MZMcBride 2012-06-22 21:28:41 UTC
(In reply to comment #10)
> @MZMcBride: Are we expected to repeat the component field in bug subject?

Yes. The bug summary should be a short and succinct snippet that describes the bug. It may be a bit redundant, but including the component name (whether that's an extension, "MediaWiki core" or something else entirely) makes the bug summary vastly more informative and useful.

In this case, "Escape wikitext tags written by hand" doesn't tell me what this bug is about. "Escape wikitext tags written by hand in VisualEditor interface" does tell me what this bug is about.
Comment 12 James Forrester 2012-06-22 22:05:22 UTC
Mass-moving items into VisualEditor product
Comment 13 James Forrester 2012-06-23 01:33:57 UTC
Mass-move out of "General" to "Data Model".
Comment 14 Liangent 2012-06-23 09:25:38 UTC
(In reply to comment #9)
> The Parsoid serializer tokenizes all text content from the DOM and wraps all
> non-text tokens (any wiki or html syntax) into <nowiki> blocks.

One more thing is ampersands: <nowiki> doesn't work on them.

http://www.mediawiki.org/w/index.php?title=Project:Sandbox&diff=554116&oldid=553897 This is rendered as "<"
Comment 15 Gabriel Wicke 2012-06-23 18:30:47 UTC
(In reply to comment #14)
> (In reply to comment #9)
> > The Parsoid serializer tokenizes all text content from the DOM and wraps all
> > non-text tokens (any wiki or html syntax) into <nowiki> blocks.
> 
> One more thing is ampersands: <nowiki> doesn't work on them.
> 
> http://www.mediawiki.org/w/index.php?title=Project:Sandbox&diff=554116&oldid=553897
> This is rendered as "<"

This should be fixed with https://gerrit.wikimedia.org/r/#/c/12722/ once it is deployed. There are also a few other fixes to the wikitext escape algorithm in Parsoid that are now waiting for deployment.
Comment 16 Liangent 2012-06-23 18:42:12 UTC
(In reply to comment #15)
> This should be fixed with https://gerrit.wikimedia.org/r/#/c/12722/ once it is
> deployed. There are also a few other fixes to the wikitext escape algorithm in
> Parsoid that are now waiting for deployment.

Does it really help? It seems to handle only "<" and ">", not "&".
Comment 17 Gabriel Wicke 2012-06-23 18:59:05 UTC
For now it only handles those two special cases since they are the most urgent.

As described in the commit summary, the real fix will be to remove entity decoding in the tokenizer, and move it to a token stream transformer instead. This will give us 'html_entity' tokens which we can escape properly without having to escape all '&' characters that are not part of html entities.

Until then it makes little sense to escape ampersands in plain text content: ampersands that are actually part of entities have already been decoded by the tokenizer at that stage, so they would not be matched. We could instead pre-escape the input to the tokenizer, but that would produce ugly wikitext for '&' characters that are not part of entities.
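The target behaviour described above, escaping only the '&' characters that would begin an HTML entity while leaving other ampersands untouched, can be approximated with a lookahead regex applied to plain text on its way to wikitext. This is a minimal Python sketch, not the token-stream design from the commit; `escape_entity_ampersands` is a hypothetical name, and the pattern accepts any alphanumeric entity name, whereas a stricter implementation would check names against the list of known entities.

```python
import re

# An '&' that starts something shaped like a character reference: a named,
# decimal, or hexadecimal reference terminated by ';'. The lookahead consumes
# only the '&', so the rest of the reference is left in place.
ENTITY_LIKE = re.compile(r"&(?=(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def escape_entity_ampersands(text: str) -> str:
    """Escape only ampersands that would otherwise be decoded as entities."""
    return ENTITY_LIKE.sub("&amp;", text)


print(escape_entity_ampersands("AT&T writes &lt; for <"))
# → AT&T writes &amp;lt; for <
```

Bare ampersands such as the one in "AT&T" stay readable, which is the advantage over escaping every '&'.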
Comment 18 Gabriel Wicke 2012-06-29 21:25:32 UTC
HTML entities in plain text content entered in the VE are now escaped, while plain ampersands outside entities are not. HTML entities in wikitext are still decoded for display (as required), but round-tripped to their original form with a span wrapper.

Closing as fixed; please reopen if there are still issues after the next Parsoid code update in the VE demo install, or at http://parsoid.wmflabs.org/_html/ (which we can actually update quickly).
Comment 19 James Forrester 2012-08-06 19:25:42 UTC
Mass-moving bugs into the new 'Parsoid' product.
