Last modified: 2014-01-02 19:36:20 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T60001, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 58001 - Tokenizer doesn't tokenize [foo] correctly when it is the trailing substring of wikilink content (Ex: [[Foo|[bar]]])
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
Component: General
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Gabriel Wicke
Depends on:
Blocks:
Reported: 2013-12-04 20:30 UTC by C. Scott Ananian
Modified: 2014-01-02 19:36 UTC
CC List: 1 user

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description C. Scott Ananian 2013-12-04 20:30:21 UTC
http://parsoid-lb.eqiad.wikimedia.org/enwiki/Durian?oldid=582846792 contains the following markup for endnote 'a' (in the "Flavour and odour" section):

<sup class="reference" id="ref_anone"><a rel="mw:WikiLink" href="./Durian#endnote_anone">[a</a>]</sup>

Note that the closing square bracket ends up outside the <a> tag instead of inside it.

The PHP parser generates the following (at https://en.wikipedia.org/wiki/Special:Redirect/revision/582846792 ):

<sup class="reference" id="ref_anone"><a href="#endnote_anone">[a]</a></sup>

Note:

1) the PHP output places the closing square bracket inside the link, as expected
2) the PHP output uses a document-relative (fragment-only) href, so following the link does not reload the page
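Point 2 can be checked with the WHATWG `URL` class built into Node.js. This is a minimal sketch, assuming the page is viewed at the Parsoid URL from the description and ignoring any `<base>` element in the output; `isSameDocument` is a hypothetical helper, not part of Parsoid. Resolving `./Durian#endnote_anone` replaces the `?oldid=582846792` query, so the browser treats it as a different document, while `#endnote_anone` changes only the fragment.

```javascript
// Assumption: the page is viewed at the Parsoid URL from the description,
// and no <base> element changes the resolution base.
const pageUrl = 'http://parsoid-lb.eqiad.wikimedia.org/enwiki/Durian?oldid=582846792';

// Hypothetical helper: true when following `href` from `base` changes only
// the fragment, i.e. the browser scrolls instead of loading a new document.
function isSameDocument(href, base) {
  const target = new URL(href, base);
  const current = new URL(base);
  target.hash = '';
  current.hash = '';
  return target.href === current.href;
}

// Parsoid-style href: the relative path replaces the ?oldid=... query.
console.log(new URL('./Durian#endnote_anone', pageUrl).href);
// prints: http://parsoid-lb.eqiad.wikimedia.org/enwiki/Durian#endnote_anone
console.log(isSameDocument('./Durian#endnote_anone', pageUrl)); // prints: false

// PHP-style fragment-only href: same document, only the fragment changes.
console.log(isSameDocument('#endnote_anone', pageUrl)); // prints: true
```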
Comment 1 ssastry 2013-12-04 20:36:18 UTC
Reproducible with this snippet:

[subbu@earth tests] echo 'a<ref>b</ref>{{Ref label|c|c|none}}' | node parse

<body data-parsoid='{"dsr":[0,36,0,0]}'><p data-parsoid='{"dsr":[0,35,0,0]}'>a<span about="#mwt2" class="reference" data-mw='{"name":"ref","body":{"html":"b"},"attrs":{}}' id="cite_ref-1-0" rel="dc:references" typeof="mw:Extension/ref" data-parsoid='{"src":"&lt;ref>b&lt;/ref>","dsr":[1,13,5,6]}'><a href="#cite_note-1">[1]</a></span><sup class="reference" id="ref_cnone" about="#mwt3" typeof="mw:Transclusion" data-mw='{"parts":[{"template":{"target":{"wt":"Ref label","href":"./Template:Ref_label"},"params":{"1":{"wt":"c"},"2":{"wt":"c"},"3":{"wt":"none"}},"i":0}}]}' data-parsoid='{"stx":"html","dsr":[13,35,null,null],"pi":[[{"k":"1","spc":["","","",""]},{"k":"2","spc":["","","",""]},{"k":"3","spc":["","","",""]}]]}'><a rel="mw:WikiLink" href="./Main_Page#endnote_cnone" data-parsoid='{"stx":"piped","a":{"href":"./Main_Page#endnote_cnone"},"sa":{"href":"#endnote_cnone"}}'>[c</a>]</sup></p>
</body>
Comment 2 ssastry 2013-12-04 20:39:10 UTC
It is actually a tokenizer issue.

[subbu@earth tests] echo "[[#endnote_anone|[a]]]" | node parse
<body data-parsoid='{"dsr":[0,23,0,0]}'><p data-parsoid='{"dsr":[0,22,0,0]}'><a rel="mw:WikiLink" href="./Main_Page#endnote_anone" data-parsoid='{"stx":"piped","a":{"href":"./Main_Page#endnote_anone"},"sa":{"href":"#endnote_anone"},"dsr":[0,21,17,2]}'>[a</a>]</p>
</body>

Discovered with: echo '{{Ref label|a|a|none}}' | node parse --dump tplsrc
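The symptom in comment 2 can be reproduced with a deliberately naive tokenizer. This is a hypothetical, simplified stand-in, not Parsoid's PEG grammar: `tokenizeWikilinkNaive` and `renderNaive` are illustrative names. Because the label is matched lazily up to the first `]]`, the final `]` is left over as plain text after the link, matching the `<a ...>[a</a>]` output above.

```javascript
// Hypothetical, simplified wikilink tokenizer (NOT Parsoid's PEG grammar).
// The label is matched lazily, so it ends at the FIRST "]]" it meets.
function tokenizeWikilinkNaive(src) {
  // [[target|label]] where the target contains no "|" or "]"
  const m = /^\[\[([^|\]]+)\|(.*?)\]\]/.exec(src);
  if (!m) return null;
  return { target: m[1], label: m[2], rest: src.slice(m[0].length) };
}

// Render the token like the buggy output: link first, leftover text after it.
function renderNaive(src) {
  const t = tokenizeWikilinkNaive(src);
  if (!t) return src;
  return `<a href="${t.target}">${t.label}</a>${t.rest}`;
}

console.log(renderNaive('[[#endnote_anone|[a]]]'));
// prints: <a href="#endnote_anone">[a</a>]
```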
Comment 3 Gabriel Wicke 2013-12-05 00:02:45 UTC
Not sure if this can be tweaked just by changing the precedence of nested external vs. internal links.

Do we have information on how common this issue is?
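One reading of the precedence idea in comment 3 is that a single-bracketed, extlink-like span inside the label is consumed as a unit before the tokenizer looks for the closing `]]`. The sketch below assumes that interpretation; `tokenizeWikilinkNested` is a hypothetical helper, not Parsoid code.

```javascript
// Hypothetical sketch of "nested [..] takes precedence over the closing ]]".
// A single-bracketed span "[...]" in the label is consumed whole, so its "]"
// cannot be mistaken for the first half of the link's closing "]]".
function tokenizeWikilinkNested(src) {
  if (!src.startsWith('[[')) return null;
  const pipe = src.indexOf('|', 2);
  if (pipe < 0) return null;
  const target = src.slice(2, pipe);
  let i = pipe + 1;
  let label = '';
  while (i < src.length) {
    if (src.startsWith(']]', i)) {
      return { target, label, rest: src.slice(i + 2) };
    }
    // A lone "[" (not "[[") opens an extlink-like span: take it up to its "]".
    if (src[i] === '[' && src[i + 1] !== '[') {
      const close = src.indexOf(']', i + 1);
      if (close > i) {
        label += src.slice(i, close + 1);
        i = close + 1;
        continue;
      }
    }
    label += src[i];
    i += 1;
  }
  return null; // no closing "]]" found
}

console.log(JSON.stringify(tokenizeWikilinkNested('[[#endnote_anone|[a]]]')));
// prints: {"target":"#endnote_anone","label":"[a]","rest":""}
```

A trade-off of giving the nested span priority: for `[[Foo|[bar]]` the inner `]` is consumed by `[bar]`, no closing `]]` remains, and the function returns `null` instead of producing a link.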
Comment 4 Gerrit Notification Bot 2013-12-31 00:07:28 UTC
Change 104693 had a related patch set uploaded by Subramanya Sastry:
WIP (Bug 58001) Handle trailing extlink-like text in a wikilink

https://gerrit.wikimedia.org/r/104693
Comment 5 Gerrit Notification Bot 2014-01-02 18:25:58 UTC
Change 104693 merged by jenkins-bot:
(Bug 58001) Handle trailing extlink-like text in a wikilink

https://gerrit.wikimedia.org/r/104693
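The merged change's title describes handling trailing extlink-like text in a wikilink. Below is a minimal post-processing sketch of that idea, modeled on the PHP output shown in the description rather than on the actual contents of change 104693; `tokenizeWikilinkFixed` is a hypothetical name. After a lazy match ends the label at the first `]]`, a leading `]` in the remaining text is moved back into the label only when the label contains an unclosed `[`.

```javascript
// Hypothetical fix-up (NOT the code from change 104693): start from the lazy
// match, then reclaim a trailing "]" when the label has an unclosed "[".
function tokenizeWikilinkFixed(src) {
  const m = /^\[\[([^|\]]+)\|(.*?)\]\]/.exec(src);
  if (!m) return null;
  let label = m[2];
  let rest = src.slice(m[0].length);
  // Unclosed "[" means the last "[" in the label comes after the last "]".
  if (rest.startsWith(']') && label.lastIndexOf('[') > label.lastIndexOf(']')) {
    label += ']';
    rest = rest.slice(1);
  }
  return { target: m[1], label, rest };
}

console.log(JSON.stringify(tokenizeWikilinkFixed('[[#endnote_anone|[a]]]')));
// prints: {"target":"#endnote_anone","label":"[a]","rest":""}
```

The unclosed-`[` check keeps ordinary trailing brackets outside the link: `[[Foo|bar]]]` still yields the label `bar` followed by a literal `]`.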


