Last modified: 2014-10-10 13:10:19 UTC
Japanese characters are rendered as tofu when I export a page that was translated into Japanese with the Translate extension. An example is <https://meta.wikimedia.org/wiki/Tech/News/2014/40/ja?oldid=10019558&>, rendered as <https://meta.wikimedia.org/w/index.php?title=Special:Book&bookcmd=render_article&arttitle=Tech%2FNews%2F2014%2F40%2Fja&oldid=10019558&writer=rdf2latex>. In the linked PDF file, only Latin characters are rendered properly.
Just to add: the description above is based on what I see in GNOME Document Viewer (evince) 3.10.3.
The issue is that the Translate extension is not adding the proper "lang" attribute to the translated message, so we are trying to render the entire thing as English text.
What translated message? The content is wrapped inside <div id="mw-content-text" lang="ja" dir="ltr" class="mw-content-ltr"></div>. I do not understand what the problem is or how it is related to Translate.
Ah, sorry -- you're right. That's in the PHP parser output. It's missing from the Parsoid output, however: http://parsoid-lb.eqiad.wikimedia.org/metawiki/Tech/News/2014/40/ja?oldid=10019558 That has lang=en. Could you describe how your extension sets the lang attribute on the content (if you know) before I reassign this bug back over to Parsoid? Presumably I need to get this information via some API, as the desired content language is not present in https://meta.wikimedia.org/w/index.php?title=Tech/News/2014/40/ja&action=raw for instance (which I assume is the raw text which Parsoid is working with).
Via the hook: https://github.com/wikimedia/mediawiki-extensions-Translate/blob/master/tag/PageTranslationHooks.php#L67 I would be surprised if the page content language were not yet exposed in the API in any way. If it is not, let's add it.
Ok, great. I'm going to reassign it to Parsoid; I'll open a new bug if it turns out the page content language isn't exposed via some API.
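For reference, a minimal sketch of how a client might look up the page content language through the API, assuming it is exposed as a `pagelanguage` field under `prop=info` of `action=query` (whether the deployed version does this is exactly the open question above). The helper names and the sample response, including the page ID, are hypothetical; no network request is made.

```python
import json
from urllib.parse import urlencode

# Meta-Wiki API endpoint (assumed to follow the usual /w/api.php layout).
API_ENDPOINT = "https://meta.wikimedia.org/w/api.php"


def build_info_query(title):
    """Build an action=query&prop=info URL for a single page title."""
    params = {
        "action": "query",
        "prop": "info",
        "titles": title,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_page_language(response):
    """Return the first 'pagelanguage' value in a decoded response, or None.

    Assumes the response lists pages under query.pages keyed by page ID
    and that each page entry may carry a 'pagelanguage' field.
    """
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        lang = page.get("pagelanguage")
        if lang:
            return lang
    return None


# Hypothetical response body; the page ID and field layout are illustrative.
sample = json.loads(
    '{"query": {"pages": {"123": '
    '{"title": "Tech/News/2014/40/ja", "pagelanguage": "ja"}}}}'
)

url = build_info_query("Tech/News/2014/40/ja")
language = extract_page_language(sample)
```

If the field is absent, `extract_page_language` returns `None` rather than guessing a default, so the caller can decide whether falling back to the wiki's content language is appropriate.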