Last modified: 2014-10-14 18:24:24 UTC
Filing this since subbu wanted to take a closer look ("Yes, VE/Parsoid shouldn't be making dirty diffs -- and these kind of dirty diffs is an exception rather than the rule. This is a bug."): <<VE seems to make unnecessary changes. I wouldn't mind so much if it always helped to maintain the layout of invisible elements, but normally it just screws them (like infoboxes, or position of templates), without any effect on the rendered page but with a less easily readable wikitext editing window. What I notice now though are totally unnecessary and hard to explain changes to ref names. For some reason, VE decides that all ref names need to be in quotes, which is not true at all and doesn't help one bit. It also decides that between the final quote and the forward slash, there needs to be a space. All this has no effect on the rendered page, and doesn't make the wikitext page any easier (or harder) to read, so why does VE bother doing this, which only helps to make diffs a lot longer. Take a look at e.g. [10]: can you easily see if any significant change has been made, and what that may be? First difference; <ref name="globsec-arjun"/> becomes <ref name="globsec-arjun" /> Further: <ref name=nomorearjuns> becomes <ref name="nomorearjuns"> and so on. I have not found the actual significant difference between the old and new version. What is the purpose and benefit of this? If VE would always make the "invisible" layout clearer and better, fine, but VE doesn't care one bit about wikitext layout, so I don't get this useless editing. Fram (talk) 09:27, 7 October 2013 (UTC)>> https://en.wikipedia.org/w/index.php?title=Arjun_%28tank%29&curid=7648336&diff=576092971&oldid=575064244
Yup, that's a bug. VE should not be normalizing <ref name="globsec-arjun"/> to <ref name="globsec-arjun" /> nor should it be normalizing <ref name=nomorearjuns> to <ref name="nomorearjuns">. Since the revision wasn't tagged with the visualeditor-check tag, VE didn't detect any corruption on its end, which makes me suspect that this is a Parsoid bug.
This seems to be related to bugs in selser (Parsoid) when a cache URL is set. I've set a cache URL locally and will investigate.
Gabriel and I now think this happened when Parsoid incurred a cache miss for the HTML (rare) and also suffered a timeout on reparsing (possible on large pages since we had a timeout of 16 sec). But, a couple commits today should address this and further reduce the likelihood of selser not getting triggered on the edit. The relevant commits are: 1. https://gerrit.wikimedia.org/r/#/c/93896/ (increases cache timeout to 60 sec) 2. https://gerrit.wikimedia.org/r/#/c/93902/ (improves parse time when selser suffers a cache miss fetching original HTML).
It seems that quotes are still being added to ref tags as of today: [https://en.wikipedia.org/w/index.php?title=Sting_%28wrestler%29&curid=655575&diff=580574777&oldid=580516764]
The new code I mentioned in #c3 hasn't been deployed to production yet. Will update this ticket when that happens.
New code has been deployed with a bunch of fixes related to cache misses that would cause Parsoid to run the regular serializer on the entire dom. This should fix this issue on most pages. We'll also make some fixes to minimize dirty diffs even when there is a cache miss -- we've identified some improvements we could make to the DOM diff algorithm.
...still happens apparently: [https://en.wikipedia.org/w/index.php?title=New_%28album%29&diff=581113730&oldid=581109664] (noted by another editor on [[en:WP:VEF]])
Not just the quotes, also the additional spaces: [https://en.wikipedia.org/w/index.php?title=Gray_wolf&curid=33702&diff=581450926&oldid=581450732] I can't judge whether this is fixed on most pages or not, but it certainly seems to happen quite reguilarly still.
Yes, those were probably from cache misses. We deployed newer code y'day that should eliminate these unrelated normalizations in most cases (barring bugs). Note however that if a <ref> is edited, Parsoid will still normalize the quotes and whitespace in the attributes of the edited ref. Let us wait for a week to see if we still see unrelaed normalizations and if not, close this as resolved.
Still happened on the 16th: [https://en.wikipedia.org/w/index.php?title=KLM&diff=581951122&oldid=581950935]
This last diff seems to have happened in the context of some error the editor encountered. Recording here for further debugging. https://en.wikipedia.org/w/index.php?title=Wikipedia:VisualEditor/Feedback&oldid=581952520#Error:Unknown_error
The diff in comment #10 might have been caused by an unrelated issue with one of the Varnish caches in production.
I believe this was fixed by 67fca5bdc7 and 923c79102a. These test cases from comments above now round-trip fine: * comment #1: https://en.wikipedia.org/wiki/User:Gwicke/Test/Arjun_%28tank%29 * comment #4: https://en.wikipedia.org/wiki/User:Gwicke/Test/Wrestler * comment #8: https://en.wikipedia.org/wiki/User:Gwicke/Test/Gray_wolf Please reopen if this problem still exists.
I've reopened this, since it appears that this still happens: [https://en.wikipedia.org/w/index.php?title=Triceratops&curid=54410&diff=584366006&oldid=584365568]
*** Bug 64197 has been marked as a duplicate of this bug. ***
*** Bug 64162 has been marked as a duplicate of this bug. ***
This is just a note that Bug 64197, merged here, is about {{sfn}} templates, which sort of are and sort of aren't ref tags. See https://en.wikipedia.org/w/index.php?title=Jack_Parsons_%28rocket_engineer%29&diff=604788783&oldid=604787564
Here's another diff that is probably the same bug: https://en.wikipedia.org/w/index.php?title=Rights&curid=51490&diff=603988349&oldid=601154438 In this one, an oddly named-but-unnamed ref tag (<ref name="">) has its whitespace normalized.
Another one, found while looking at diffs after the deploy of d51e64097: https://fr.wikipedia.org/w/index.php?title=Cendrillon_(Disney)&curid=3040885&diff=105440424&oldid=104520814
This is still happening regularly. Another example is at https://en.wikipedia.org/w/index.php?diff=629369061