Last modified: 2014-10-14 18:24:24 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T57461, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 55461 - Quotes and whitespace normalized in unedited <ref> tags
Quotes and whitespace normalized in unedited <ref> tags
Status: REOPENED
Product: Parsoid
Classification: Unclassified
serializer (Other open bugs)
unspecified
All All
: High normal
: ---
Assigned To: Gabriel Wicke
:
: 64162 64197 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2013-10-08 08:53 UTC by Elitre
Modified: 2014-10-14 18:24 UTC (History)
12 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Elitre 2013-10-08 08:53:29 UTC
Filing this since subbu wanted to take a closer look ("Yes, VE/Parsoid shouldn't be making dirty diffs -- and these kind of dirty diffs is an exception rather than the rule. This is a bug."):

<<VE seems to make unnecessary changes. I wouldn't mind so much if it always helped to maintain the layout of invisible elements, but normally it just screws them (like infoboxes, or position of templates), without any effect on the rendered page but with a less easily readable wikitext editing window.
What I notice now though are totally unnecessary and hard to explain changes to ref names. For some reason, VE decides that all ref names need to be in quotes, which is not true at all and doesn't help one bit. It also decides that between the final quote and the forward slash, there needs to be a space. All this has no effect on the rendered page, and doesn't make the wikitext page any easier (or harder) to read, so why does VE bother doing this, which only helps to make diffs a lot longer.
Take a look at e.g. [10]: can you easily see if any significant change has been made, and what that may be?
First difference; <ref name="globsec-arjun"/> becomes 
<ref name="globsec-arjun" /> Further: <ref name=nomorearjuns> becomes <ref name="nomorearjuns">
and so on. I have not found the actual significant difference between the old and new version. What is the purpose and benefit of this? If VE would always make the "invisible" layout clearer and better, fine, but VE doesn't care one bit about wikitext layout, so I don't get this useless editing. Fram (talk) 09:27, 7 October 2013 (UTC)>>

https://en.wikipedia.org/w/index.php?title=Arjun_%28tank%29&curid=7648336&diff=576092971&oldid=575064244
Comment 1 Roan Kattouw 2013-10-08 09:02:11 UTC
Yup, that's a bug. VE should not be normalizing <ref name="globsec-arjun"/> to <ref name="globsec-arjun" /> nor should it be normalizing <ref name=nomorearjuns> to <ref name="nomorearjuns">.

Since the revision wasn't tagged with the visualeditor-check tag, VE didn't detect any corruption on its end, which makes me suspect that this is a Parsoid bug.
Comment 2 Roan Kattouw 2013-11-06 01:40:57 UTC
This seems to be related to bugs in selser (Parsoid) when a cache URL is set. I've set a cache URL locally and will investigate.
Comment 3 ssastry 2013-11-06 05:44:19 UTC
Gabriel and I now think this happened when Parsoid incurred a cache miss for the HTML (rare) and also suffered a timeout on reparsing (possible on large pages since we had a timeout of 16 sec). But, a couple commits today should address this and further reduce the likelihood of selser not getting triggered on the edit.

The relevant commits are:

1. https://gerrit.wikimedia.org/r/#/c/93896/ (increases cache timeout to 60 sec)
2. https://gerrit.wikimedia.org/r/#/c/93902/ (improves parse time when selser suffers a cache miss fetching original HTML).
Comment 4 Wikifram@gmail.com (Account disabled) 2013-11-07 12:32:50 UTC
It seems that quotes are still being added to ref tags as of today: [https://en.wikipedia.org/w/index.php?title=Sting_%28wrestler%29&curid=655575&diff=580574777&oldid=580516764]
Comment 5 ssastry 2013-11-07 14:54:40 UTC
The new code I mentioned in #c3 hasn't been deployed to production yet. Will update this ticket when that happens.
Comment 6 ssastry 2013-11-08 06:39:48 UTC
New code has been deployed with a bunch of fixes related to cache misses that would cause Parsoid to run the regular serializer on the entire dom. This should fix this issue on most pages. 

We'll also make some fixes to minimize dirty diffs even when there is a cache miss -- we've identified some improvements we could make to the DOM diff algorithm.
Comment 7 Wikifram@gmail.com (Account disabled) 2013-11-12 13:53:10 UTC
...still happens apparently: [https://en.wikipedia.org/w/index.php?title=New_%28album%29&diff=581113730&oldid=581109664] (noted by another editor on [[en:WP:VEF]])
Comment 8 Wikifram@gmail.com (Account disabled) 2013-11-13 09:17:32 UTC
Not just the quotes, also the additional spaces: [https://en.wikipedia.org/w/index.php?title=Gray_wolf&curid=33702&diff=581450926&oldid=581450732] I can't judge whether this is fixed on most pages or not, but it certainly seems to happen quite reguilarly still.
Comment 9 ssastry 2013-11-15 00:51:05 UTC
Yes, those were probably from cache misses. We deployed newer code y'day that should eliminate these unrelated normalizations in most cases (barring bugs). 

Note however that if a <ref> is edited, Parsoid will still normalize the quotes and whitespace in the attributes of the edited ref.

Let us wait for a week to see if we still see unrelaed normalizations and if not, close this as resolved.
Comment 10 Wikifram@gmail.com (Account disabled) 2013-11-18 08:23:01 UTC
Still happened on the 16th: [https://en.wikipedia.org/w/index.php?title=KLM&diff=581951122&oldid=581950935]
Comment 11 ssastry 2013-11-18 15:24:40 UTC
This last diff seems to have happened in the context of some error the editor encountered. Recording here for further debugging.

https://en.wikipedia.org/w/index.php?title=Wikipedia:VisualEditor/Feedback&oldid=581952520#Error:Unknown_error
Comment 12 Gabriel Wicke 2013-11-19 01:14:53 UTC
The diff in comment #10 might have been caused by an unrelated issue with one of the Varnish caches in production.
Comment 13 Gabriel Wicke 2013-12-03 01:09:04 UTC
I believe this was fixed by 67fca5bdc7 and 923c79102a.

These test cases from comments above now round-trip fine:

* comment #1: https://en.wikipedia.org/wiki/User:Gwicke/Test/Arjun_%28tank%29
* comment #4: https://en.wikipedia.org/wiki/User:Gwicke/Test/Wrestler
* comment #8: https://en.wikipedia.org/wiki/User:Gwicke/Test/Gray_wolf

Please reopen if this problem still exists.
Comment 14 Wikifram@gmail.com (Account disabled) 2013-12-03 15:01:22 UTC
I've reopened this, since it appears that this still happens: [https://en.wikipedia.org/w/index.php?title=Triceratops&curid=54410&diff=584366006&oldid=584365568]
Comment 15 ssastry 2014-04-21 21:42:01 UTC
*** Bug 64197 has been marked as a duplicate of this bug. ***
Comment 16 ssastry 2014-04-21 21:42:25 UTC
*** Bug 64162 has been marked as a duplicate of this bug. ***
Comment 17 WhatamIdoing 2014-04-22 02:58:09 UTC
This is just a note that Bug 64197, merged here, is about {{sfn}} templates, which sort of are and sort of aren't ref tags.  See https://en.wikipedia.org/w/index.php?title=Jack_Parsons_%28rocket_engineer%29&diff=604788783&oldid=604787564
Comment 18 WhatamIdoing 2014-04-23 02:21:49 UTC
Here's another diff that is probably the same bug:

https://en.wikipedia.org/w/index.php?title=Rights&curid=51490&diff=603988349&oldid=601154438

In this one, an oddly named-but-unnamed ref tag (<ref name="">) has its whitespace normalized.
Comment 19 C. Scott Ananian 2014-07-14 20:34:56 UTC
Another one, found while looking at diffs after the deploy of d51e64097:
https://fr.wikipedia.org/w/index.php?title=Cendrillon_(Disney)&curid=3040885&diff=105440424&oldid=104520814
Comment 20 WhatamIdoing 2014-10-14 18:24:24 UTC
This is still happening regularly.  Another example is at  https://en.wikipedia.org/w/index.php?diff=629369061

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links