Last modified: 2014-10-28 20:46:51 UTC
$ echo '<link rel="mw:PageProp/Category" href="./Category:Toxine_bactérienne"/><link rel="mw:PageProp/Category" href="./Category:Toxine_bact%C3%A9rienne"/>' | node tests/parse.js --html2wt --apiURL=http://fr.wikipedia.org/w/api.php
[[MediaWiki:Badtitletext]] [[MediaWiki:Badtitletext]]

Serialization works correctly if the href matches data-parsoid, but only if the client hasn't URL-encoded the href. This is why VE is introducing corruption like https://fr.wikipedia.org/w/index.php?title=Exotoxine&diff=prev&oldid=107508831 , but only in Firefox because Firefox URL-encodes é in hrefs whereas Chrome doesn't.
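To illustrate the mismatch described above, here is a minimal sketch of comparing two hrefs after percent-decoding. The helpers `decodeHref` and `sameTarget` are hypothetical names made up for this example; they are not Parsoid functions, and this is not necessarily how the serializer should do the comparison.

```javascript
// Hypothetical helpers for illustration only; not part of Parsoid.

// Percent-decode an href, falling back to the raw string if it contains
// a malformed escape sequence (decodeURIComponent throws URIError then).
function decodeHref(href) {
  try {
    return decodeURIComponent(href);
  } catch (e) {
    return href;
  }
}

// Treat two hrefs as the same target if they are equal once decoded.
function sameTarget(hrefA, hrefB) {
  return decodeHref(hrefA) === decodeHref(hrefB);
}

// The two hrefs from the command above: unencoded (as Chrome sends it)
// and URL-encoded (as Firefox sends it).
const rawHref = './Category:Toxine_bactérienne';
const encodedHref = './Category:Toxine_bact%C3%A9rienne';

console.log(rawHref === encodedHref);          // plain string comparison
console.log(sameTarget(rawHref, encodedHref)); // comparison after decoding
```

A plain string comparison sees these as different hrefs, which is presumably why the encoded form is treated as a modified link; comparing the decoded forms would treat them as the same target.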
A quick look at wts.LinkHandler.js reveals that modified links go through wikilink content escaping, which is where this gets tripped up. Modification detection is based on inspecting data-parsoid and comparing it with the href, etc. So, something is broken in the state.env.isValidLinkTarget(linkTarget) function (used in escapeWikiLinkContentString).
(In reply to ssastry from comment #1)
> A quick look at wts.LinkHandler.js reveals that modified links go through
> wikilink content escaping which is where this gets tripped up. And
> modification detection is based on data-parsoid inspection and comparing
> with href, etc.
>
> So, something is broken in state.env.isValidLinkTarget(linkTarget) function
> (used in escapeWikiLinkContentString).

This is also broken for links with special characters whose hrefs then get URL-encoded. This leads to the links being normalized to underscore form.

$ echo '<a href="../Le_Maillon_faible_%28jeu_t%C3%A9l%C3%A9vis%C3%A9%29" rel="mw:WikiLink" data-parsoid="{"stx":"piped","a":{"href":"../Le_Maillon_faible_(jeu_télévisé)"},"sa":{"href":"Le Maillon faible (jeu télévisé)"},"dsr":[133,184,35,2]}" title="Le Maillon faible (jeu télévisé)">Maillon faible</a>' | node tests/parse.js --html2wt --prefix frwiki
[[Le Maillon_faible_(jeu_télévisé)|Maillon faible]]

$ echo '<a href="../Le_Maillon_faible_(jeu_télévisé)" rel="mw:WikiLink" data-parsoid="{"stx":"piped","a":{"href":"../Le_Maillon_faible_(jeu_télévisé)"},"sa":{"href":"Le Maillon faible (jeu télévisé)"},"dsr":[133,184,35,2]}" title="Le Maillon faible (jeu télévisé)">Maillon faible</a>' | node tests/parse.js --html2wt --prefix frwiki
[[Le Maillon faible (jeu télévisé)|Maillon faible]]
I thought it was weird the first space didn't get converted to an underscore there, but that seems to be happening in general:

$ echo '<a href="./Le_Maillon_faible_(jeu_télévisé)" rel="mw:WikiLink">Maillon faible</a>' | node tests/parse.js --html2wt --prefix frwiki
[[Le Maillon_faible_(jeu_télévisé)|Maillon faible]]

Happens without ./ too
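For reference, a rough sketch of the normalization one would expect here: both the URL-encoded and the unencoded href map to the same space-separated link target. The function `hrefToTitle` is a hypothetical name used only for this example; it is not the code in wts.LinkHandler.js, and it assumes the href is a relative page link with an optional ./ or ../ prefix.

```javascript
// Hypothetical helper for illustration only; not Parsoid's implementation.
// Assumes a relative wiki page href such as "./Foo_bar" or "../Foo_bar".
function hrefToTitle(href) {
  // Drop any leading "./" or "../" segments.
  let target = href.replace(/^(\.\.?\/)+/, '');
  try {
    // Undo percent-encoding (e.g. "%28" becomes "(", "%C3%A9" becomes "é").
    target = decodeURIComponent(target);
  } catch (e) {
    // Malformed escape sequence: keep the string as it is.
  }
  // Underscores and spaces are equivalent in titles; use spaces throughout.
  return target.replace(/_/g, ' ');
}

const encoded = '../Le_Maillon_faible_%28jeu_t%C3%A9l%C3%A9vis%C3%A9%29';
const unencoded = './Le_Maillon_faible_(jeu_télévisé)';

console.log(hrefToTitle(encoded));
console.log(hrefToTitle(unencoded));
console.log(hrefToTitle(encoded) === hrefToTitle(unencoded));
```

With a normalization along these lines, every underscore is converted rather than all but the first, and the presence or absence of the ./ prefix makes no difference.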
Change 160795 had a related patch set uploaded by Subramanya Sastry: (Bug 70894) Fix bugs serializing modified wikilinks https://gerrit.wikimedia.org/r/160795
Change 160795 merged by jenkins-bot: (Bug 70894) Fix bugs serializing modified wikilinks https://gerrit.wikimedia.org/r/160795
Change 161141 had a related patch set uploaded by Subramanya Sastry: (Bug 70894) Fix regressions introduced by 6e302233 (found in RT-testing) https://gerrit.wikimedia.org/r/161141
Change 161141 merged by jenkins-bot: (Bug 70894) Fix regressions introduced by 6e302233 (found in RT-testing) https://gerrit.wikimedia.org/r/161141
Change 163292 had a related patch set uploaded by Subramanya Sastry: New parser tests for lang/category/wiki links (wt2wt and html2wt modes) https://gerrit.wikimedia.org/r/163292
Change 163292 merged by jenkins-bot: New parser tests for lang/category/wiki links (wt2wt and html2wt modes) https://gerrit.wikimedia.org/r/163292