Last modified: 2014-10-10 13:10:19 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T73380, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 71380 - Japanese fonts on translated pages print as Tofu because "lang" attribute is missing.
Status: NEW
Product: Parsoid
Classification: Unclassified
Component: General
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Parsoid Team
Keywords: i18n
Depends on:
Blocks:
Reported: 2014-09-27 12:43 UTC by Yusuke Matsubara
Modified: 2014-10-10 13:10 UTC
CC List: 10 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Yusuke Matsubara 2014-09-27 12:43:12 UTC
Japanese-language characters are rendered as tofu when I try to export a page translated into Japanese using the Translate extension.

An example is <https://meta.wikimedia.org/wiki/Tech/News/2014/40/ja?oldid=10019558&>,
rendered as
<https://meta.wikimedia.org/w/index.php?title=Special:Book&bookcmd=render_article&arttitle=Tech%2FNews%2F2014%2F40%2Fja&oldid=10019558&writer=rdf2latex>. In the linked PDF file, only Latin characters are rendered properly.
Comment 1 Yusuke Matsubara 2014-09-27 12:46:38 UTC
Just to add: the description above is based on what I see in GNOME Document Viewer (evince) 3.10.3.
Comment 2 C. Scott Ananian 2014-10-09 17:44:36 UTC
The issue is that the Translate extension is not adding the proper "lang" attribute to the translated message, so we are trying to render the entire thing as English text.
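The failure mode described here (content declared as English but written in Japanese) can be detected mechanically. The sketch below is a hypothetical diagnostic, not part of Parsoid or the PDF renderer: it assumes that a renderer choosing fonts from the declared language would pick a Latin-only font for `en`, and it flags text whose declared primary language is English but which contains Hiragana, Katakana, or CJK ideographs. The function names and the list of Unicode ranges are illustrative and not exhaustive.

```python
# Unicode blocks commonly used for Japanese text (illustrative, not exhaustive).
JAPANESE_RANGES = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xFF66, 0xFF9F),  # Halfwidth Katakana
]


def has_japanese_script(text: str) -> bool:
    """Return True if any character falls in one of the ranges above."""
    return any(
        lo <= ord(ch) <= hi
        for ch in text
        for lo, hi in JAPANESE_RANGES
    )


def lang_mismatch(declared_lang: str, text: str) -> bool:
    """Hypothetical check: flag content declared as English that contains Japanese script.

    Only the primary subtag is compared, so "en" and "en-US" are both
    treated as English.
    """
    primary = declared_lang.split("-")[0].lower()
    return primary == "en" and has_japanese_script(text)


if __name__ == "__main__":
    print(lang_mismatch("en", "技術ニュース"))  # True: declared English, Japanese text
    print(lang_mismatch("ja", "技術ニュース"))  # False: declaration matches the script
```

A check like this only reports the symptom; the fix discussed in this bug is to carry the correct `lang` value through to the renderer.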
Comment 3 Niklas Laxström 2014-10-09 18:55:47 UTC
What translated message? The content is wrapped inside <div id="mw-content-text" lang="ja" dir="ltr" class="mw-content-ltr"></div>. I do not understand what the problem is or how it is related to Translate.
Comment 4 C. Scott Ananian 2014-10-09 20:24:09 UTC
Ah, sorry -- you're right.  That's in the PHP parser output.

It's missing from the Parsoid output, however:
http://parsoid-lb.eqiad.wikimedia.org/metawiki/Tech/News/2014/40/ja?oldid=10019558

That has lang=en.
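To compare the PHP parser output with the Parsoid output, one can read the `lang` attribute of the first element that declares one. The sketch below uses Python's standard `html.parser`. The name `first_declared_lang` is hypothetical. The `<html lang="en">` snippet is illustrative only: the report states that the Parsoid output has `lang=en`, but not which element carries it.

```python
from html.parser import HTMLParser
from typing import Optional


class LangFinder(HTMLParser):
    """Record the lang attribute of the first element (in document order) that has one."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased by HTMLParser.
        if self.lang is None:
            for name, value in attrs:
                if name == "lang":
                    self.lang = value
                    break


def first_declared_lang(html: str) -> Optional[str]:
    """Return the first lang attribute found in the markup, or None if there is none."""
    finder = LangFinder()
    finder.feed(html)
    finder.close()
    return finder.lang


if __name__ == "__main__":
    php_output = '<div id="mw-content-text" lang="ja" dir="ltr" class="mw-content-ltr"></div>'
    parsoid_like = '<html lang="en"><body><p>技術ニュース</p></body></html>'  # illustrative
    print(first_declared_lang(php_output))    # ja
    print(first_declared_lang(parsoid_like))  # en
```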

Could you describe how your extension sets the lang attribute on the content (if you know) before I reassign this bug back to Parsoid? Presumably I need to get this information via some API, since the desired content language is not present in the raw wikitext, for instance https://meta.wikimedia.org/w/index.php?title=Tech/News/2014/40/ja&action=raw (which I assume is what Parsoid works with).
Comment 5 Niklas Laxström 2014-10-09 22:02:57 UTC
Via the hook: https://github.com/wikimedia/mediawiki-extensions-Translate/blob/master/tag/PageTranslationHooks.php#L67
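Translation pages created by the Translate extension are named `<source page>/<language code>`, as in `Tech/News/2014/40/ja`, which is why the content language is not visible in the raw wikitext. The actual logic lives in the PHP hook linked above. The Python sketch below only illustrates the naming convention: it takes the last subpage component and accepts it if it is in a caller-supplied set of known codes. It does not verify that the page really is a translation of a translatable page, which the real hook would need to do. The function name is hypothetical.

```python
from typing import Iterable, Optional


def language_from_translation_title(title: str, known_codes: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: return the trailing subpage if it is a known language code.

    Simplification: any title ending in "/<known code>" is accepted, without
    checking that its parent is a translatable page.
    """
    if "/" not in title:
        return None
    suffix = title.rsplit("/", 1)[1]
    return suffix if suffix in set(known_codes) else None


if __name__ == "__main__":
    codes = {"en", "ja", "de"}  # illustrative subset of language codes
    print(language_from_translation_title("Tech/News/2014/40/ja", codes))  # ja
    print(language_from_translation_title("Tech/News/2014/40", codes))     # None
```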

I would be surprised if the page content language is not yet exposed in the API in any way. If it is not, let's add it.
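In current MediaWiki versions, the Action API's `action=query&prop=info` reports a `pagelanguage` field for each page; whether that field was available when this comment was written is not established by this report. The sketch below builds such a query URL for the Meta-Wiki endpoint and extracts `pagelanguage` from a JSON response (using `formatversion=2`, where `pages` is a list). To keep it runnable offline, it does not perform the HTTP request; the sample response in the test is hand-written for illustration, not captured output.

```python
import json
from typing import Optional
from urllib.parse import urlencode

API_ENDPOINT = "https://meta.wikimedia.org/w/api.php"


def page_info_url(title: str) -> str:
    """Build an Action API URL asking for page info (including the page language)."""
    params = {
        "action": "query",
        "prop": "info",
        "titles": title,
        "format": "json",
        "formatversion": "2",  # pages are returned as a list rather than a dict
    }
    return API_ENDPOINT + "?" + urlencode(params)


def page_language(response_text: str) -> Optional[str]:
    """Return the pagelanguage of the first page in a formatversion=2 response, if any."""
    data = json.loads(response_text)
    pages = data.get("query", {}).get("pages", [])
    if not pages:
        return None
    return pages[0].get("pagelanguage")


if __name__ == "__main__":
    print(page_info_url("Tech/News/2014/40/ja"))
```

A client such as Parsoid could then emit the returned code as the `lang` attribute instead of defaulting to the wiki's content language.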
Comment 6 C. Scott Ananian 2014-10-09 22:30:44 UTC
OK, great. I'm going to reassign it to Parsoid; I'll open a new bug if it turns out the page content language isn't exposed via some API.


