Last modified: 2013-07-19 03:05:28 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T52841, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 50841 - <nowiki> tags should be applied to the minimal rather than maximal content (or well, a compromise thereof..)
<nowiki> tags should be applied to the minimal rather than maximal content (o...
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
General (Other open bugs)
unspecified
All All
: High normal
: ---
Assigned To: ssastry
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2013-07-05 22:29 UTC by James Forrester
Modified: 2013-07-19 03:05 UTC (History)
5 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description James Forrester 2013-07-05 22:29:00 UTC
Right now the following HTML entered by a user:

<p>Hello my [[name]] is Julian and I live in the city of Rochester, New York with my friend Sandy; we write [[books]].</p>

… turns into:

<p>Hello my <nowiki>[[name]] is Julian and I live in the city of Rochester, New York with my friend Sandy; we write [[books]].</nowiki></p>

… whereas ideally it should be minimal rather than maximal:

<p>Hello my <nowiki>[[</nowiki>name<nowiki>]]</nowiki> is Julian and I live in the city of Rochester, New York with my friend Sandy; we write <nowiki>[[</nowiki>books<nowiki>]]</nowiki>.</p>

This would significantly reduce issues with users accidentally inputting wikitext.  We're taking actions in VisualEditor-land to discourage such input, but if Parsoid could help it would be hugely useful.
Comment 1 ssastry 2013-07-05 22:41:30 UTC
Let us discuss more on irc monday, but the general problem of escaping minimal contexts can complicate the algorithm.  That said, we could implement some special cases maybe. 

Also, a lot of nowiki-tags (as in this example) looks ugly to me (personally), and could potentially induce greater grumbling among editors.  So, my suggestion is to minimize incidences of nowiki contexts via VE (however you do it) first, and then implement a version of this proposal in Parsoid after that.

Yet another (not fully thought out) alternative is to convert wikitext chars into HTML entities which eliminates nowikis altogether.  We currently use this entity-based solution when we have to escape pipes in url/ext. links in template arg contexts and can just use that more broadly.
Comment 2 Liangent 2013-07-05 22:46:11 UTC
I forgot whether I pointed this out or not: applying <nowiki> on "normal" text disabled language conversion on it too; we may want to <nowiki> wikitext meta characters only.
Comment 3 ssastry 2013-07-05 22:51:08 UTC
Actually, another way is to use the existing algorithm to detect contexts that would have been wrapped in <nowiki>..</nowiki> and apply nowiki-escaping to all possible wikitext chars in there (which is safe, but excessive).  This will not complicate the algorithm anymore, but will also introduce more nowiki tags than strictly necessary.
Comment 4 Robert Rohde 2013-07-05 22:55:29 UTC
It should be noted, that at present VE aggressively grabs normal text on either side of the offending elements as well.  For example, using VE to enter:

- I am the very model of a modern [[Major-General]], I've information vegetable, animal, and mineral, I know the kings of England, and I quote the fights historical

Becomes

- <nowiki>I am the very model of a modern [[Major-General]], I've information vegetable, animal, and mineral, I know the kings of England, and I quote the fights historical</nowiki>

Even though the plain text at either end of the wikilink could reasonably be ignored without increasing the number of nowikis.  Since VE can not currently handle nowiki elements, that also means that the entire sentence becomes uneditable in VE after saving.
Comment 5 Gabriel Wicke 2013-07-09 03:22:18 UTC
Maybe we could still use a single wrapper but exclude unsuspicious leading / trailing text from the nowiki block. I agree with Subbu that escaping each run of dangerous characters individually would often result in too much nowiki noise.

Re language conversion:
How often is nowiki used to suppress language conversion? Could -{ }- or some other syntax be used instead?
Comment 6 ssastry 2013-07-09 04:11:41 UTC
Actually, why not use entities for escaping?  That will eliminate nowikis, make them editable, and display as text.

[[Foo]] ==> &#91;&#91;Foo&#93;&#93;

and the like.  Am I missing something that makes this problematic?
Comment 7 Liangent 2013-07-09 04:43:08 UTC
(In reply to comment #5)
> Re language conversion:
> How often is nowiki used to suppress language conversion? Could -{ }- or some
> other syntax be used instead?

It's more like some side effect...
Comment 8 Gabriel Wicke 2013-07-09 16:51:08 UTC
(In reply to comment #6)
> Actually, why not use entities for escaping?  That will eliminate nowikis,
> make
> them editable, and display as text.
> 
> [[Foo]] ==> &#91;&#91;Foo&#93;&#93;
> 
> and the like.  Am I missing something that makes this problematic?

Numeric entities are very hard for users to decipher. While unwanted nowiki tags can simply be removed, fixing up unwanted entity escapes is much more work.

(In reply to comment #7)
> (In reply to comment #5)
> > Re language conversion:
> > How often is nowiki used to suppress language conversion? Could -{ }- or some
> > other syntax be used instead?
> 
> It's more like some side effect...

For us it would be helpful if we could separate those two aspects. If little existing content relies on language conversion being disabled in nowiki, then that might actually be possible.
Comment 9 ssastry 2013-07-09 16:52:49 UTC
Gabriel and I discussed the entity solution on IRC, and Gabriel made a good observation that fixing those in 'edit source' will become much harder compared to nowiki tags.  So, entity-based escaping makes fixups harder using the source editor, but makes it easier with VE.  With nowiki, it is the opposite.  But, presumably the noneditability of nowiki sections in VE is temporary.

I'll tackle this with nowikis today by only changing how identified text blocks are escaped without trying to optimize placement itself (Ex:, in reality it is sufficient to only escape "[[" or "]]" in [[Foo]], but that requires more global knowledge which will complicate this).

We'll try to wrap closely occurring wikitext chars in a single nowiki block.  So, "[[foo]]" will be wrapped as <nowiki>[[foo]]</nowiki>.  "Closely occurring" will be heuristic based, say within X chars.
Comment 10 Liangent 2013-07-09 17:18:30 UTC
(In reply to comment #8)
> For us it would be helpful if we could separate those two aspects. If little
> existing content relies on language conversion being disabled in nowiki, then
> that might actually be possible.

There shouldn't be much. It's the case just because in Parser.php,

$text = $this->mStripState->unstripNoWiki( $text );

is placed after

$text = $this->getConverterLanguage()->convert( $text );

but I'm not 100% sure this is just something accidental (or placed in this order delibrately).
Comment 11 Liangent 2013-07-09 17:20:06 UTC
(In reply to comment #10)
> (In reply to comment #8)
> > For us it would be helpful if we could separate those two aspects. If little
> > existing content relies on language conversion being disabled in nowiki, then
> > that might actually be possible.
> 
> There shouldn't be much. It's the case just because in Parser.php,
> 
> $text = $this->mStripState->unstripNoWiki( $text );
> 
> is placed after
> 
> $text = $this->getConverterLanguage()->convert( $text );
> 
> but I'm not 100% sure this is just something accidental (or placed in this
> order delibrately).

Oh I got it. We want to have <nowiki>-{ }-</nowiki> work, so we can't place -{ }- back until all other -{ }- markups have been processed, but this also mean other text in nowiki misses the chance to be converted...
Comment 12 Gabriel Wicke 2013-07-09 18:37:17 UTC
(In reply to comment #11)

> Oh I got it. We want to have <nowiki>-{ }-</nowiki> work, so we can't place
> -{
> }- back until all other -{ }- markups have been processed, but this also mean
> other text in nowiki misses the chance to be converted...

I guess -{ and }- in nowiki could be escaped before applying language conversion.
Comment 13 Gabriel Wicke 2013-07-09 18:38:08 UTC
Possibly interesting nowiki watch list to monitor nowiki application in live edits:
https://en.wikipedia.org/w/index.php?title=Special:AbuseLog&offset=&limit=500&wpSearchFilter=550
Comment 14 Gerrit Notification Bot 2013-07-09 23:10:04 UTC
Change 72858 had a related patch set uploaded by Subramanya Sastry:
(Bug 50841): First pass reducing scope of nowiki tags

https://gerrit.wikimedia.org/r/72858
Comment 15 Gerrit Notification Bot 2013-07-10 03:28:43 UTC
Change 72858 merged by jenkins-bot:
(Bug 50841): First pass reducing scope of nowiki tags

https://gerrit.wikimedia.org/r/72858
Comment 16 Gerrit Notification Bot 2013-07-17 03:51:53 UTC
Change 74110 had a related patch set uploaded by Subramanya Sastry:
Take #2: (Bug 50841) Reduce scope of nowiki tags

https://gerrit.wikimedia.org/r/74110
Comment 17 Gerrit Notification Bot 2013-07-18 21:04:46 UTC
Change 74110 merged by jenkins-bot:
Take #2: (Bug 50841) Reduce scope of nowiki tags

https://gerrit.wikimedia.org/r/74110
Comment 18 Gabriel Wicke 2013-07-18 21:10:53 UTC
This will likely be deployed later today.
Comment 19 ssastry 2013-07-18 21:15:35 UTC
There are still some scenarios where nowiki wrapping will be applied to a longer string than required.  I haven't tried to optimize it all, and if there are still issues, I can revisit it.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links