Last modified: 2012-08-06 19:25:42 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T35052, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 33052 - Parsoid / Serializer: Escape wikitext tags written by hand in VisualEditor interface


Summary:	Parsoid / Serializer: Escape wikitext tags written by hand in VisualEditor interface

Status:	RESOLVED FIXED

Product:	Parsoid
Classification:	Unclassified
Component:	General (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Normal major
Target Milestone:	---
Assigned To:	Gabriel Wicke

URL:
Whiteboard:
Keywords:

Duplicates:	33090
Depends on:
Blocks:

Reported:	2011-12-13 22:14 UTC by Bergi
Modified:	2012-08-06 19:25 UTC
CC List:	10 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Bergi 2011-12-13 22:14:56 UTC

As <pre> is a preprocessor tag, it is very easy to break it. Writing </pre>, or even making links in preformatted paragraphs, gives very ugly wikitext results. Please use the indentation syntax to build preformatted text.

Comment 1 Trevor Parscal 2011-12-13 22:25:13 UTC

In the Wikitext output, we do use a single space indentation for pre-formatted blocks. I'm confused what you are filing this bug about.

Comment 2 Bergi 2011-12-13 22:50:01 UTC

Oops, I've mixed up wikitext and html output.

However, typing "<pre>whatever</pre>" in a paragraph and applying a link to "ever" leads to:
* wikitext output "<pre>what[[ever|ever]]</pre>", which doesn't match the visual preview (on the left)
* html/preview output "<p><pre>what<a href="/wiki/ever">ever</a></pre></p>", which is not what the current parser would render.

Comment 3 Inez Korczyński 2011-12-19 11:00:24 UTC

It is a problem in the to-HTML serializer. The <pre> tag should be escaped and handled as plain text, not as an HTML tag.
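As a minimal illustration of that fix (a Python sketch, not VisualEditor's actual serializer), text typed by the user is treated as data and HTML-escaped before it is placed in the output, so a hand-typed <pre> stays literal text while a link applied through the editor remains real markup. The helper name `text_to_html` is hypothetical.

```python
import html


def text_to_html(text: str) -> str:
    """Escape user-typed text so tags such as <pre> are shown literally, not parsed."""
    # quote=False: only &, < and > need escaping in element content.
    return html.escape(text, quote=False)


# Rebuild the example from comment 2: the link on "ever" is genuine markup,
# while the hand-typed <pre> tags are escaped as text.
paragraph = (
    "<p>"
    + text_to_html("<pre>what")
    + '<a href="/wiki/ever">'
    + text_to_html("ever")
    + "</a>"
    + text_to_html("</pre>")
    + "</p>"
)
print(paragraph)
# prints: <p>&lt;pre&gt;what<a href="/wiki/ever">ever</a>&lt;/pre&gt;</p>
```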

Comment 4 Bergi 2011-12-19 11:55:40 UTC

Yes, exactly. Both the 2html and 2wikitext serializers should escape any preprocessor-handled tags, as well as template inclusions. It would be enough to map "<" to "&lt;" and "{" to "&#123;", perhaps with a few rules to restrict the escaping to what is strictly necessary.
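The mapping suggested above can be sketched in a few lines of Python. This is the naive variant, with none of the reduction rules: every "<" and "{" is replaced, which is safe but makes the wikitext harder to read. The function name `escape_preprocessor_syntax` is hypothetical.

```python
# Naive mapping: every "<" and "{" becomes an entity, so the preprocessor
# never sees a tag opener or a template/parameter opener.
ESCAPES = {"<": "&lt;", "{": "&#123;"}


def escape_preprocessor_syntax(text: str) -> str:
    """Replace each '<' and '{' in text with its HTML entity."""
    return "".join(ESCAPES.get(ch, ch) for ch in text)


print(escape_preprocessor_syntax("<pre>what{{ever}}</pre>"))
# prints: &lt;pre>what&#123;&#123;ever}}&lt;/pre>
```

Note that this mapping leaves "&" untouched, so text that already looks like an entity (for example a typed "&lt;") would still be decoded on rendering.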

Comment 5 Liangent 2011-12-28 11:23:07 UTC

*** Bug 33090 has been marked as a duplicate of this bug. ***

Comment 6 Bergi 2011-12-28 11:53:26 UTC

I think we should distinguish between "preprocessor syntax" escaping (this bug) and "wikisyntax" escaping (Bug 33090).
Of course we could escape everything that looks even slightly like a parser instruction, but then the output wouldn't be readable. The conditions for what to escape, and when, differ a lot between preprocessor syntax and wiki syntax, especially as we already have a DOM of the latter.

Comment 7 Liangent 2011-12-28 14:43:52 UTC

(In reply to comment #6)
> I think we should distinguish between "preprocessor syntax" escaping (this
> bug), and "wikisyntax" escaping (Bug 33090).
> Of course we could escape everything that looks the least bit of a parser
> instruction, but the output wouldn't be readable. But the conditions of what to
> escape when differ a lot between preprocessor and wiki syntax, especially as we
> already have a DOM of the latter.

Yeah, but I don't really know how VisualEditor works. However, if bug 33090 is resolved, I guess this bug will be resolved automatically. Maybe mark it as a dependency?

Comment 8 James Forrester 2012-06-22 01:07:32 UTC

Triage: I believe that currently, wikitext entered by hand is converted into its HTML equivalent and displayed as such after a round-trip, but it shouldn't be.

Comment 9 Gabriel Wicke 2012-06-22 08:25:05 UTC

The Parsoid serializer tokenizes all text content from the DOM and wraps all non-text tokens (any wiki or html syntax) into <nowiki> blocks.
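A rough Python sketch of that approach, assuming a toy tokenizer: the regular expression below only recognizes HTML-like tags, {{templates}} and [[links]], whereas the real Parsoid tokenizer handles the full wikitext grammar. The names `SYNTAX` and `escape_text_content` are hypothetical.

```python
import re

# Hypothetical toy tokenizer: recognizes only HTML-like tags, {{templates}}
# and [[links]]. Everything else is treated as plain text.
SYNTAX = re.compile(r"</?[A-Za-z][^<>]*>|\{\{.*?\}\}|\[\[.*?\]\]")


def escape_text_content(text: str) -> str:
    """Wrap each substring that would parse as wiki/HTML syntax in <nowiki>."""
    out = []
    pos = 0
    for m in SYNTAX.finditer(text):
        out.append(text[pos:m.start()])  # plain text passes through unchanged
        token = m.group(0)
        if token.lower().startswith(("<nowiki", "</nowiki")):
            # A nowiki tag cannot be wrapped in nowiki; entity-escape its "<".
            out.append("&lt;" + token[1:])
        else:
            out.append("<nowiki>" + token + "</nowiki>")
        pos = m.end()
    out.append(text[pos:])  # trailing plain text
    return "".join(out)


print(escape_text_content("<pre>what[[ever]]</pre>"))
# prints: <nowiki><pre></nowiki>what<nowiki>[[ever]]</nowiki><nowiki></pre></nowiki>
```

A literal </nowiki> typed as text cannot itself be wrapped in <nowiki>, so the sketch falls back to entity-escaping its "<" in that one case.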

Comment 10 Liangent 2012-06-22 20:41:17 UTC

@MZMcBride: Are we expected to repeat the component field in bug subject?

Comment 11 MZMcBride 2012-06-22 21:28:41 UTC

(In reply to comment #10)
> @MZMcBride: Are we expected to repeat the component field in bug subject?

Yes. The bug summary should be a short and succinct snippet that describes the bug. It may be a bit redundant, but including the component name (whether that's an extension, "MediaWiki core" or something else entirely) makes the bug summary vastly more informative and useful.

In this case, "Escape wikitext tags written by hand" doesn't tell me what this bug is about. "Escape wikitext tags written by hand in VisualEditor interface" does tell me what this bug is about.

Comment 12 James Forrester 2012-06-22 22:05:22 UTC

Mass-moving items into VisualEditor product

Comment 13 James Forrester 2012-06-23 01:33:57 UTC

Mass-move out of "General" to "Data Model".

Comment 14 Liangent 2012-06-23 09:25:38 UTC

(In reply to comment #9)
> The Parsoid serializer tokenizes all text content from the DOM and wraps all
> non-text tokens (any wiki or html syntax) into <nowiki> blocks.

One more issue is ampersands: <nowiki> has no effect on them.

http://www.mediawiki.org/w/index.php?title=Project:Sandbox&diff=554116&oldid=553897 This is rendered as "<"

Comment 15 Gabriel Wicke 2012-06-23 18:30:47 UTC

(In reply to comment #14)
> (In reply to comment #9)
> > The Parsoid serializer tokenizes all text content from the DOM and wraps all
> > non-text tokens (any wiki or html syntax) into <nowiki> blocks.
> 
> One more thing is ampersands: <nowiki> doesn't work on them.
> 
> http://www.mediawiki.org/w/index.php?title=Project:Sandbox&diff=554116&oldid=553897
> This is rendered as "<"

This should be fixed with https://gerrit.wikimedia.org/r/#/c/12722/ once it is deployed. There are also a few other fixes to the wikitext escape algorithm in Parsoid that are now waiting for deployment.

Comment 16 Liangent 2012-06-23 18:42:12 UTC

(In reply to comment #15)
> This should be fixed with https://gerrit.wikimedia.org/r/#/c/12722/ once it is
> deployed. There are also a few other fixes to the wikitext escape algorithm in
> Parsoid that are now waiting for deployment.

Is it really helpful? It seems to handle "<", ">" only, without "&".

Comment 17 Gabriel Wicke 2012-06-23 18:59:05 UTC

For now it only handles those two special cases since they are the most urgent.

As described in the commit summary, the real fix will be to remove entity decoding in the tokenizer, and move it to a token stream transformer instead. This will give us 'html_entity' tokens which we can escape properly without having to escape all '&' characters that are not part of html entities.

Until then, it makes little sense to escape ampersands in plain text content, since the ampersands that are actually part of entities have already been decoded by the tokenizer at that stage and so would not be matched. We could instead pre-escape the input to the tokenizer, but that would produce ugly wikitext for non-entity '&' characters.
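A Python sketch of the proposed split, under stated assumptions: the tokenizer leaves "&...;" sequences undecoded inside text tokens, and a later transformer stage turns known entities into separate entity tokens (analogous to the 'html_entity' tokens mentioned above). The `Text`/`HtmlEntity` classes and the `decode`/`split_entities` functions are hypothetical, and Python's HTML5 entity table (`html.entities.html5`) stands in for MediaWiki's own entity list.

```python
import html
import re
from dataclasses import dataclass
from html.entities import html5
from typing import List, Optional, Union

# Candidate entity references: decimal, hexadecimal, or named, ending in ";".
ENTITY = re.compile(r"&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


@dataclass
class Text:
    value: str


@dataclass
class HtmlEntity:
    source: str   # original wikitext form, e.g. "&lt;", kept for round-tripping
    decoded: str  # displayed character(s), e.g. "<"


Token = Union[Text, HtmlEntity]


def decode(raw: str) -> Optional[str]:
    """Decode one entity reference, or return None if it is not a known entity."""
    if raw.startswith("&#"):
        return html.unescape(raw)
    return html5.get(raw[1:])  # named entity: look up e.g. "lt;"


def split_entities(tokens: List[Token]) -> List[Token]:
    """Transformer stage: split known entities out of Text tokens."""
    out: List[Token] = []
    for tok in tokens:
        if not isinstance(tok, Text):
            out.append(tok)
            continue
        pos = 0
        for m in ENTITY.finditer(tok.value):
            decoded = decode(m.group(0))
            if decoded is None:
                continue  # unknown name such as "&foo;": stays part of the text
            if m.start() > pos:
                out.append(Text(tok.value[pos:m.start()]))
            out.append(HtmlEntity(m.group(0), decoded))
            pos = m.end()
        if pos < len(tok.value):
            out.append(Text(tok.value[pos:]))
    return out


print(split_entities([Text("a &lt; b &foo;")]))
# prints: [Text(value='a '), HtmlEntity(source='&lt;', decoded='<'), Text(value=' b &foo;')]
```

With entities carried as their own tokens, a serializer can emit `source` for entities that came from the wikitext and escape "&" only inside genuine text, which is the distinction that decoding inside the tokenizer loses.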

Comment 18 Gabriel Wicke 2012-06-29 21:25:32 UTC

HTML entities in plain text content entered in the VE are now escaped, while plain ampersands outside entities are not. HTML entities in wikitext are still decoded for display (as required), but round-tripped to their original form with a span wrapper.
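The escaping rule in the first sentence can be sketched in Python: an ampersand typed in the editor is escaped as "&amp;" only when it starts a sequence that would otherwise decode as an entity, so text like "AT&T" stays readable. The function names are hypothetical, and Python's HTML5 entity table (`html.entities.html5`) is assumed as a stand-in for MediaWiki's entity list.

```python
import re
from html.entities import html5

# Candidate entity references: decimal, hexadecimal, or named, ending in ";".
ENTITY = re.compile(r"&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def is_entity(raw: str) -> bool:
    """True if raw (e.g. "&lt;" or "&#60;") would be decoded as an entity."""
    if raw.startswith("&#"):
        return True            # numeric references are treated as entities
    return raw[1:] in html5    # named: look up e.g. "lt;" in the HTML5 table


def escape_entities_in_text(text: str) -> str:
    """Escape only the ampersands that would otherwise start an entity."""
    return ENTITY.sub(
        lambda m: "&amp;" + m.group(0)[1:] if is_entity(m.group(0)) else m.group(0),
        text,
    )


print(escape_entities_in_text("AT&T writes &lt;pre&gt; and &foo;"))
# prints: AT&T writes &amp;lt;pre&amp;gt; and &foo;
```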

Closing as fixed, please reopen if there are still issues after the next Parsoid code update in the VE demo install, or at http://parsoid.wmflabs.org/_html/ (which we can actually update quickly).

Comment 19 James Forrester 2012-08-06 19:25:42 UTC

Mass-moving bugs into the new 'Parsoid' product.
