Last modified: 2014-10-21 20:44:39 UTC
To reproduce, go to https://de.wikipedia.org/wiki/Bundeswettbewerb_Mathematik and download it as a PDF file (using the new rdf2latex writer). Open the created PDF file (I only have Adobe Reader 10.1.2 to test with) and hover over the links. Links containing only ASCII characters work as expected (e.g. "Mathematikwettbewerb" links to "https://de.wikipedia.org/wiki/Mathematikwettbewerb"), but links containing non-ASCII characters don't, e.g. "Stifterverband für die Deutsche Wissenschaft" links to "file:///E|/þÿ" (this link seems to be relative to the PDF file). I also tested a random article from el.wikipedia, and all of its links were broken.
Hm, this seems to be an issue with Adobe Reader: the first online PDF-to-HTML converter I could find handled the links correctly. Anyway, Adobe Reader is important enough that the PDF files should be made compatible with it.
I can confirm the problem with evince/poppler on Linux. When clicking such a link I get: Error when getting information for file '/var/tmp/��': No such file or directory
*** Bug 71589 has been marked as a duplicate of this bug. ***
That þÿ at the start of the link seems to be a Byte Order Mark encoded as UTF-16 BE, but interpreted as ISO/IEC 8859-1.
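The byte-level reading in the comment above can be checked with a few lines of Python. This is only an illustration of the encoding mismatch, not code from the PDF pipeline; the variable names are made up for the example.

```python
import codecs

# U+FEFF is the byte order mark; UTF-16 BE serializes it as the bytes FE FF.
bom_bytes = "\ufeff".encode("utf-16-be")

# The standard library exposes the same two bytes as a constant.
same_as_stdlib = bom_bytes == codecs.BOM_UTF16_BE

# Reading those bytes as ISO/IEC 8859-1 (Latin-1) maps FE to U+00FE and
# FF to U+00FF, which are the two characters shown in the broken link.
misread = bom_bytes.decode("latin-1")

print(bom_bytes.hex(), same_as_stdlib, misread)
```

If the link target is written as a UTF-16 BE string with a leading BOM and a viewer reads it one byte per character, those two characters are the first thing it would see.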
Change 165983 had a related patch set uploaded by Cscott: PDF can't handle UTF-8 URLs. https://gerrit.wikimedia.org/r/165983
Change 165983 merged by jenkins-bot: PDF can't handle UTF-8 URLs. https://gerrit.wikimedia.org/r/165983
Fix merged and deployed.
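The merged change is not quoted in this thread, so the following is only a sketch of one common way to keep non-ASCII characters out of link targets: percent-encoding the URL as UTF-8 so that only ASCII reaches the PDF. The helper name `to_ascii_url` and the example article URL are hypothetical, and the actual patch may work differently.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Percent-encode non-ASCII characters in path, query and fragment.

    Hypothetical helper for illustration. Host names are left untouched,
    so internationalized domain names are not handled here.
    """
    parts = urlsplit(url)
    # quote() encodes as UTF-8 by default; "%" stays safe so that
    # already-escaped sequences are not encoded a second time.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


# Assumed article URL, used only as sample input.
example = "https://de.wikipedia.org/wiki/Stifterverband_f\u00fcr_die_Deutsche_Wissenschaft"
print(to_ascii_url(example))
```

URLs that are already pure ASCII pass through unchanged, which matches the observation that ASCII-only links worked before the fix.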