Last modified: 2014-11-07 20:12:13 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T74844, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 72844 - Fix nowiki heuristics for quotes
Status: PATCH_TO_REVIEW
Product: Parsoid
Classification: Unclassified
Component: serializer
Version: unspecified
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: ssastry
Reported: 2014-11-01 03:09 UTC by ssastry
Modified: 2014-11-07 20:12 UTC (History)

Description ssastry 2014-11-01 03:09:22 UTC
See https://fr.wikipedia.org/w/index.php?title=Univers_cin%C3%A9matographique_Marvel&curid=5796605&diff=108676945&oldid=108668984 for a particularly bad sequence of nowikis.

We should examine this and NicoV's suggestions below and see what is realistic / feasible. But, at least a couple of back-to-back nowikis should be avoidable.

NicoV says (on https://en.wikipedia.org/w/index.php?title=Wikipedia:VisualEditor/Feedback&oldid=631914857#Unnecessary_succession_of_nowiki)

-----
    There are 4 sets of nowiki tags:

        The first one is a combination of an opening tag and a closing tag <nowiki>...</nowiki>.
        The second one is a self-closing nowiki tag <nowiki/> (just after the previous set) : it's totally useless, and brings nothing except more complexity to the wikitext
        The third one is a self-closing nowiki tag <nowiki/> (just before the next set) : same remark as above
        The fourth one is a combination of an opening tag and a closing tag <nowiki>...</nowiki>.

    My report is that 2 sets are completely useless (first and second are redundant; third and fourth are redundant).
    Of course, there's also the problem, reported many times a long time ago, that if nowiki tags are added they should be added only around the part that needs them, not around a whole part of a sentence, but that was not the main issue I was reporting. For this, as has already been said so many times, my preferred methods are (in order):

        Use {{'}} (it should be possible to configure VE on each wiki to know if there's such a template, that's what I'm doing for example in WPCleaner's configuration)
        Use '<nowiki/>'' (nowiki between the single quote and the italic formatting)
        Use <nowiki>'</nowiki>'' (nowiki around the single quote)
-----
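
For illustration, the three options NicoV lists can be written out as a small Python sketch. This is not Parsoid or VisualEditor code; the function names and the `wiki_has_apostrophe_template` flag are hypothetical and only stand in for the per-wiki configuration he mentions.

```python
def apostrophe_before_italics(italic_text, strategy):
    """Return wikitext for a literal ' immediately followed by italic text.

    Illustrative only: the strategy names are made up for this sketch.
    """
    italics = "''" + italic_text + "''"
    if strategy == "template":
        # Option 1: relies on the wiki defining a {{'}} template.
        return "{{'}}" + italics
    if strategy == "empty_nowiki":
        # Option 2: an empty <nowiki/> separates the quote from the '' markup.
        return "'<nowiki/>" + italics
    if strategy == "wrapped_nowiki":
        # Option 3: wrap only the single quote itself in nowiki tags.
        return "<nowiki>'</nowiki>" + italics
    raise ValueError("unknown strategy: " + strategy)


def pick_strategy(wiki_has_apostrophe_template):
    """Follow the stated order of preference: template first, then <nowiki/>."""
    return "template" if wiki_has_apostrophe_template else "empty_nowiki"


if __name__ == "__main__":
    for has_template in (True, False):
        strategy = pick_strategy(has_template)
        print(apostrophe_before_italics("Netflix Original", strategy))
```

In all three cases only the single quote is protected, rather than a whole part of the sentence.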
Comment 1 ssastry 2014-11-04 23:22:13 UTC
This seems to be selser-specific.

This is what selser generates:
[subbu@earth lib] node parse --selser --html2wt --oldtextfile /tmp/quote.wt --oldhtmlfile /tmp/old.html < /tmp/new.html
*<nowiki> MARVEL Studios et ABC prépare cinq séries '</nowiki><nowiki/>''Netflix Original''<nowiki/><nowiki>'de </nowiki>[[Film de super-héros|super-héros]] pour [[Netflix]]. La diffusion de ces séries s'étalera sur plusieurs années : ''[[Daredevil (série télévisée)|Daredevil]]'' sera la première, puis ''Jessica Jones'', ''Iron Fist'' et enfin ''Luke Cage''.

This is what the full serializer generates:
[subbu@earth lib] node parse --html2wt < /tmp/new.html
* MARVEL Studios et ABC prépare cinq séries '<nowiki/>''Netflix Original''<nowiki/>'de [[Film de super-héros|super-héros]] pour [[Netflix]]. La diffusion de ces séries s'étalera sur plusieurs années : ''[[Daredevil (série télévisée)|Daredevil]]'' sera la première, puis ''Jessica Jones'', ''Iron Fist'' et enfin ''Luke Cage''.
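
As a rough illustration of why the back-to-back tags in the selser output are redundant, here is a minimal Python sketch that drops an empty `<nowiki/>` sitting directly next to a `<nowiki>...</nowiki>` pair. It is a string-level post-processing idea, not how Parsoid works, and it assumes the empty tag was emitted only for quote protection: a quote inside the nowiki pair is already literal and cannot combine with the `''` markup that follows. The function name is hypothetical.

```python
import re

# An empty <nowiki/> right after </nowiki>, or right before <nowiki>,
# adds no extra quote protection under the assumption stated above.
_AFTER_CLOSE = re.compile(r"(</nowiki>)<nowiki\s*/>")
_BEFORE_OPEN = re.compile(r"<nowiki\s*/>(<nowiki>)")


def drop_redundant_empty_nowikis(wikitext):
    """Remove empty nowiki tags adjacent to a nowiki open/close pair."""
    wikitext = _AFTER_CLOSE.sub(r"\1", wikitext)
    wikitext = _BEFORE_OPEN.sub(r"\1", wikitext)
    return wikitext


if __name__ == "__main__":
    selser_fragment = (
        "séries '</nowiki><nowiki/>''Netflix Original''"
        "<nowiki/><nowiki>'de </nowiki>"
    )
    print(drop_redundant_empty_nowikis(selser_fragment))
```

This only removes two of the four sets NicoV counts; it does not narrow the remaining nowiki pairs down to the quote character, which is what the full serializer output above achieves.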
Comment 2 Gerrit Notification Bot 2014-11-04 23:53:54 UTC
Change 171154 had a related patch set uploaded by Subramanya Sastry:
WIP: (Bug 72844): Fixes to quote nowiki protection in selser mode.

https://gerrit.wikimedia.org/r/171154
Comment 3 ssastry 2014-11-07 20:12:13 UTC
I may not be able to get this done quickly enough. I have an updated version up now, which eliminates the excess nowikis in this example by consolidating all heuristics in one place, but it might require more smarts. In this new version, it will insert a <nowiki/> whenever it sees a quote-char as the preceding/next sibling or first/last child of an I/B element in the DOM. That said, because of the selective serializer, this might only kick in when new quotes are needed in those positions.
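
The heuristic described above can be sketched in Python over a toy DOM. This is only an illustration under stated assumptions: Parsoid itself is not written in Python, the `Element` class and `serialize` function are hypothetical, and selser's reuse of unmodified source is not modelled at all.

```python
from dataclasses import dataclass, field
from typing import List, Union

NOWIKI = "<nowiki/>"
QUOTE_MARKUP = {"i": "''", "b": "'''"}  # wikitext markup for I/B elements


@dataclass
class Element:
    """Toy DOM node: a tag name plus child elements or text strings."""
    tag: str
    children: List[Union["Element", str]] = field(default_factory=list)


def serialize(nodes):
    """Serialize sibling nodes, guarding quote chars around I/B elements."""
    out = []
    for idx, node in enumerate(nodes):
        if isinstance(node, str):
            out.append(node)
            continue
        markup = QUOTE_MARKUP.get(node.tag)
        inner = serialize(node.children)
        if markup is None:
            out.append(inner)  # other tags are transparent in this sketch
            continue
        prev_sib = nodes[idx - 1] if idx > 0 else None
        next_sib = nodes[idx + 1] if idx + 1 < len(nodes) else None
        first = node.children[0] if node.children else None
        last = node.children[-1] if node.children else None
        # Quote char as the preceding sibling of the I/B element.
        if isinstance(prev_sib, str) and prev_sib.endswith("'"):
            out.append(NOWIKI)
        out.append(markup)
        # Quote char as the first child of the I/B element.
        if isinstance(first, str) and first.startswith("'"):
            out.append(NOWIKI)
        out.append(inner)
        # Quote char as the last child of the I/B element.
        if isinstance(last, str) and last.endswith("'"):
            out.append(NOWIKI)
        out.append(markup)
        # Quote char as the next sibling of the I/B element.
        if isinstance(next_sib, str) and next_sib.startswith("'"):
            out.append(NOWIKI)
    return "".join(out)


if __name__ == "__main__":
    dom = ["séries '", Element("i", ["Netflix Original"]), "'de "]
    print(serialize(dom))
```

For the fragment from this bug, the sketch yields the same `'<nowiki/>''Netflix Original''<nowiki/>'de` sequence as the full serializer output quoted in Comment 1.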

I still have to figure out some improvements to the heuristics, but given the context-sensitive nature of quote parsing, where content at the end of a line can change parse behavior earlier in the line, these might be elusive.

I am going to be travelling and so cannot get to this again before next week.
