Last modified: 2013-07-30 10:07:32 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T45067, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 43067 - Parsoid: Square brackets sometimes (but not always) get escaped with <nowiki>s when PHP parser wouldn't
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
Component: General
Version: unspecified
Hardware: All All
Importance: Low normal
Target Milestone: ---
Assigned To: Mark Holmquist
Reported: 2012-12-13 12:34 UTC by Gregor Hagedorn
Modified: 2013-07-30 10:07 UTC

Description Gregor Hagedorn 2012-12-13 12:34:20 UTC
Visual Editor sometimes adds nowiki markup around simple square brackets in text. Interestingly, it does so only in some cases: for the same structure, it leaves the text unchanged in one case but adds nowiki in another. Test case:

# Remains unchanged: Test text for the VisualEditor: [[Link target|These are ['''just''' normal brackets]]].
# Superfluous nowiki-tags added on editing: Test text for the VisualEditor: [[Link target|These are '''just''' [normal brackets]]].


Edit with Visual Editor to see the effect.
Comment 1 James Forrester 2012-12-14 17:47:04 UTC
This appears to be a Parsoid bug; adjusting to reflect that.

If you use the RT form - http://parsoid.wmflabs.org/_rtform/ - on the above string:

[[Link target|These are '''just''' [normal brackets]]]
->
[[Link target|These are '''just'''<nowiki> [normal brackets]</nowiki>]]

However, it generates correct HTML; for the other string:

[[Link target|These are ['''just''' normal brackets]]]
->
<p data-parsoid="{&quot;dsr&quot;:[0,54]}"><a rel="mw:WikiLink" href="./Link_target" data-parsoid="{&quot;tsr&quot;:[0,53],&quot;a&quot;:{&quot;href&quot;:&quot;./Link_target&quot;},&quot;sa&quot;:{&quot;href&quot;:&quot;Link target&quot;},&quot;stx&quot;:&quot;piped&quot;,&quot;dsr&quot;:[0,53]}">These are [<b data-parsoid="{&quot;tsr&quot;:[25,28],&quot;dsr&quot;:[25,35]}">just</b> normal brackets</a>]</p>

Whereas canonical HTML from the PHP parser is:

<p><a href="/wiki/Link_target" title="Link target">These are [<b>just</b> normal brackets]</a></p>

[i.e., the closing not-used-for-a-link square bracket is inside the link, not outside it - however, this is probably a separate bug; happy to re-file.]
Comment 2 ssastry 2012-12-14 18:01:57 UTC
The nowiki-escaping is related to how Parsoid parses the text: [[Link target|These are ['''just''' normal brackets]]] and the fact that the last "]" is outside the link.  The nowiki escaping strategy is safe (but conservative) and tries not to accidentally create a wikilink when a "]" precedes a regular link bracket.  We (Parsoid team) should fix the parse first, add new tests, and then update the nowiki escaping.
Comment 3 Gregor Hagedorn 2012-12-14 18:37:34 UTC
(In reply to comment #2)
> and the fact that the last
> "]" is outside the link.  

The first ] is inside the link and the last 2 are the closing link.

I agree that there may be more pressing matters, but compare that in one case:

[[Link target|These are ['''just''' normal brackets]]

the parser already works correctly (it does not add superfluous nowiki). It adds nowiki only if there is no formatting inside the single square brackets. I suspect this is not very logical.

(... and annoying during the live testing on en.wikipedia.)
Comment 4 Gabriel Wicke 2013-01-28 23:22:26 UTC
One underlying issue is that our tokenizer is relatively liberal in what it accepts as an external link target, while a stricter validation is applied after templates etc are expanded. Our wikitext escaper also makes use of the tokenizer (plus some regexps), but does not (yet) consider external links that turn out to have invalid targets.

Current output:

echo "[[Link target|These are '''just''' [normal brackets]]]" | nodejs parse --wt2wt
[[Link target|These are '''just'''<nowiki> [normal brackets]</nowiki>]]
Comment 5 Gregor Hagedorn 2013-01-29 08:33:02 UTC
I believe adding illogical nowiki markup is somewhat irritating to existing editors.

Perhaps a simple solution: Is there any justification for the tokenizer to consider an opening [ as a token? I believe only [http://, [https://, [irc://, [ircs://, [ftp://, [news://, [mailto: and [gopher:// should be external link tokens.
Comment 6 ssastry 2013-01-29 15:08:34 UTC
The problem is that until templates are expanded, it is hard to know whether the content of a "[..]" is going to be a link or not.  That is the reason the tokenizer defers decisions for all [..] content until all templates are expanded, so it can all be handled in a single place.

But there is another ticket for addressing this a bit more selectively, which should address nowiki situations like this: https://bugzilla.wikimedia.org/show_bug.cgi?id=44449
Comment 7 ssastry 2013-03-01 20:56:14 UTC
https://gerrit.wikimedia.org/r/49190 added URL validation in the tokenizer
based on the wiki's configured URL protocols. This now makes the nowiki escaping more precise and fixes the snippet in this bug report.
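
To illustrate the idea of protocol-based validation, here is a minimal Python sketch. It is not Parsoid's code: the protocol list is an assumption copied from the prefixes named in comment 5 (a real wiki supplies its own configured list), and the names `URL_PROTOCOLS`, `is_extlink_target` and `spans_needing_escape` are hypothetical. It only looks at single-bracket spans and ignores templates, apostrophes and other wikitext.

```python
import re

# Assumed protocol list, taken from the prefixes named in comment 5.
# A real wiki would supply its own configured list instead.
URL_PROTOCOLS = (
    "http://", "https://", "irc://", "ircs://",
    "ftp://", "news://", "mailto:", "gopher://",
)

# A single "[" (not part of "[["), then text without brackets, then "]".
BRACKET_SPAN = re.compile(r"(?<!\[)\[(?!\[)([^\[\]]*)\]")


def is_extlink_target(text, protocols=URL_PROTOCOLS):
    """Return True if text starts with one of the given protocols."""
    lowered = text.lower()
    return any(lowered.startswith(p) for p in protocols)


def spans_needing_escape(wikitext, protocols=URL_PROTOCOLS):
    """Return single-bracket spans that would read as external links."""
    return [
        m.group(0)
        for m in BRACKET_SPAN.finditer(wikitext)
        if is_extlink_target(m.group(1), protocols)
    ]


if __name__ == "__main__":
    print(spans_needing_escape(
        "[[Link target|These are '''just''' [normal brackets]]]"))
    print(spans_needing_escape("See [http://example.org an example]"))
```

Under these assumptions, the snippet from this report yields no spans to escape, while a bracketed span that starts with a listed protocol is flagged.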

[subbu@earth lib] echo '[[Link target|These are '''just''' [normal brackets]]]' | node parse --wt2wt
[[Link target|These are just [normal brackets]]]

However, this won't be visible in the VE until a new deploy of Parsoid (another couple of weeks, I expect).
Comment 8 ssastry 2013-03-01 20:58:25 UTC
Note about the lost "'''" in the example above: that is a different bug unrelated to nowiki-escaping -- it should be fixed after the ongoing serializer refactor is complete.
Comment 9 Gabriel Wicke 2013-03-01 21:50:11 UTC
Much of this is now resolved. [foo] is no longer nowiki-escaped, but [''foo''] is due to the apostrophes.

There is still some room for (minor) improvement by moving the nowiki inside the brackets, although in some cases it is actually easier to read to wrap an entire line in nowiki instead of several nested nowikis.
Comment 10 Andre Klapper 2013-07-04 10:35:08 UTC
[Parsoid component reorg by merging JS/General and General. See bug 50685 for more information. Filter bugmail on this comment. parsoidreorg20130704]
