Last modified: 2014-10-31 16:35:39 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T73869, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 71869 - Bidirectional text sometimes in wrong order in PDF
Status: NEW
Product: OCG
Classification: Unclassified
Component: PDF renderer
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: C. Scott Ananian
Keywords: i18n
Depends on:
Blocks:
Reported: 2014-10-09 08:16 UTC by Michael M.
Modified: 2014-10-31 16:35 UTC



Description Michael M. 2014-10-09 08:16:24 UTC
Use [[de:Dad (Arabischer Buchstabe)]] as a test case. There are multiple issues with wrongly ordered text in the PDF:

Swapped words:

Should be: ‏نظام التشابه‎ / niẓām at-tašābuh / ‚Regel der Ähnlichkeit‘
Is: نظام التشابه / at-tašābuh niẓām / ‚Regel der Ähnlichkeit‘

Swapped letters:

Should be: ‏روادف‎ / rawādif / ‚Nachkömmlinge‘
Is: رواد ف / rawādfi / ‚Nachkömmlinge‘
Comment 1 man77-wiki 2014-10-09 18:01:53 UTC
In case it's not obvious: These errors occur when one of the templates {{ar}}, {{arF}}, {{arS}} (or similar) of :w:de: is in use. These templates take Arabic text, a transcription (parameter "w"), a transliteration (parameter "d") and a translation (parameter "b"). The wrongly ordered passages in the PDFs come from the parameters "w" and "d", whose values are, just like those of "b", always entirely LTR.
Comment 2 Michael M. 2014-10-11 07:58:31 UTC
I tracked this down to lang="ar-Latn", which is used to tag the Latin transcription of the Arabic text:

<span lang="ar-Latn">niẓām at-tašābuh</span>, <span lang="ar-Latn">rawādif</span>

is rendered as

at-tašābuh niẓām, rawādfi
Comment 3 C. Scott Ananian 2014-10-28 16:32:40 UTC
Michael: so you're saying that 'ar-Latn' text should be rendered LTR?
Comment 4 man77-wiki 2014-10-28 18:00:25 UTC
Latn stands for Latin script. Latin script is usually written from left to right, even if it is used for transliteration.
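
If a lang-to-direction mapping were kept at all, it would have to look at the script subtag rather than only the language subtag. The following is a minimal Python sketch of that idea, not the renderer's actual code: the function names, the short RTL_SCRIPTS list and the DEFAULT_SCRIPT fallback table are all assumptions made for illustration.

```python
import re

# Scripts written right-to-left; a deliberately short, illustrative list.
RTL_SCRIPTS = {"Arab", "Hebr", "Syrc", "Thaa", "Nkoo"}

# Hypothetical fallback for tags without a script subtag: the usual script
# of a few languages. Real data would have to come from a proper source.
DEFAULT_SCRIPT = {"ar": "Arab", "fa": "Arab", "he": "Hebr",
                  "de": "Latn", "en": "Latn"}


def script_of(lang_tag):
    """Return the script for a language tag such as 'ar-Latn', or None."""
    subtags = re.split(r"[-_]", lang_tag)
    for sub in subtags[1:]:
        if len(sub) == 1:
            break  # a singleton starts an extension or private-use section
        # A script subtag is exactly four letters (variants like '1901'
        # are also four characters long but contain digits).
        if len(sub) == 4 and sub.isalpha():
            return sub.title()
    return DEFAULT_SCRIPT.get(subtags[0].lower())


def script_direction(lang_tag):
    """Return 'rtl', 'ltr', or None if the script is unknown."""
    script = script_of(lang_tag)
    if script is None:
        return None
    return "rtl" if script in RTL_SCRIPTS else "ltr"
```

With this sketch, script_direction("ar-Latn") gives "ltr" while script_direction("ar") gives "rtl", which is the distinction the tagged transcription needs.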
Comment 5 Michael M. 2014-10-31 10:43:26 UTC
Actually, I don't think the lang attribute should *ever* have any influence on the direction of the text. It can be used to select an appropriate font, but only the dir attribute, <bdi> and <bdo> tags, and the unicode-bidi CSS property should have influence on the writing direction.

So I even expect

<span lang="ar">niẓām at-tašābuh</span>, <span lang="ar">rawādif</span>

to render as

niẓām at-tašābuh, rawādif

(and even if you treated lang="ar" as an override on the direction, the result should be

hubāšat-ta māẓin, fidāwar

as it would be for <bdo dir="rtl">, but not the strange partially reversed display that currently is produced.)
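
To illustrate the two behaviours described above, here is a small Python sketch using only the standard unicodedata module. It is a simplification under stated assumptions: first_strong_direction only looks for the first strongly directional character (roughly what dir="auto" does, ignoring isolates and embeddings), and rtl_override just reverses the characters, which approximates <bdo dir="rtl"> only for text without mirrored characters or non-composing combining marks. Both function names are made up for this example.

```python
import unicodedata


def first_strong_direction(text):
    """Return 'ltr' or 'rtl' from the first strong character, else None."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # Only digits, punctuation, whitespace or other weak/neutral characters.
    return None


def rtl_override(text):
    """Approximate the visual order produced by a full RTL override.

    The text is normalized to NFC first so that letters such as 'ā' are
    single code points and stay intact when the string is reversed.
    """
    return unicodedata.normalize("NFC", text)[::-1]


print(first_strong_direction("niẓām at-tašābuh"))  # Latin letters are class L
print(first_strong_direction("نظام التشابه"))      # Arabic letters are class AL
print(rtl_override("rawādif"))
```

The transliteration consists only of class L letters plus neutral or weak characters (space, hyphen, comma), so nothing in the text itself asks for right-to-left ordering; any reordering has to come from an externally imposed direction.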
Comment 6 C. Scott Ananian 2014-10-31 16:35:39 UTC
Hm. You're probably right. The lang->dir implication is left over from before we had proper Unicode bidi algorithm support. That can probably be removed now (but I should verify that we have top-level dir attributes in the Parsoid output where needed -- some wikis were adding these via various hacky methods on the outer <html> element IIRC).
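
The verification mentioned above could be sketched roughly as follows, using Python's standard html.parser. This is only an illustration: it assumes the top-level direction would be found on the <html> or <body> element, which the thread does not confirm, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TopLevelDirFinder(HTMLParser):
    """Record the dir attribute of the first <html> and <body> start tags."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag in ("html", "body") and tag not in self.found:
            self.found[tag] = dict(attrs).get("dir")


def top_level_dir(html_text):
    """Return 'ltr'/'rtl' from <html> or <body>, or None if neither sets it."""
    finder = TopLevelDirFinder()
    finder.feed(html_text)
    finder.close()
    for tag in ("html", "body"):  # assumption: these are the places to look
        value = finder.found.get(tag)
        if value:
            return value.lower()
    return None


print(top_level_dir('<html dir="rtl"><body><p>x</p></body></html>'))
```

A document for which this returns None would lose its base direction once the lang->dir implication is removed, so those are the cases worth checking first.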
