Last modified: 2013-02-06 17:51:11 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T46631, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 44631 - The collation order for non-Latin scripts changed by putting Latin in the back
Status: RESOLVED INVALID
Product: Wikimedia
Classification: Unclassified
Component: Site requests (Other open bugs)
Version: wmf-deployment
Hardware: All All
Importance: Normal normal (vote)
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: i18n
Reported: 2013-02-04 06:10 UTC by Gerard Meijssen
Modified: 2013-02-06 17:51 UTC (History)

Description Gerard Meijssen 2013-02-04 06:10:22 UTC
Unicode has changed the CLDR collation for non-Latin scripts so that Latin characters sort after the characters of the native script.

I blogged about this: http://ultimategerardm.blogspot.nl/2013/02/cldr-gets-sorting-right.html

The document with the languages involved: https://docs.google.com/spreadsheet/ccc?key=0Ag3w_MjvUEoRdHdOVGJCM0pjMEZhaTkzdl9CcWNtSHc#gid=0

Thanks,
     GerardM
Comment 1 Alex Monk 2013-02-04 15:52:13 UTC
What actually needs to be done in the site config for this bug?
Comment 2 Gerard Meijssen 2013-02-04 16:06:34 UTC
I do not know site config. I do not know how search works. 

What I have indicated is that the standard for collation has changed, and I pointed to the relevant documents. As I understand it, all Latin-based items need to go behind the items in the local script.

It is not my job to translate this to the site config.
Thanks,
    GerardM
Comment 3 Andre Klapper 2013-02-04 17:11:33 UTC
I guess the intention of the question was: Why or where do you think that something in Wikimedia needs changes (as you filed a bug report)?

Apart from that, where can info be found about this change specifically? So far I have not seen an authoritative source being mentioned. Is there something on http://cldr.unicode.org/ about this change in sort order?
Comment 4 Gerard Meijssen 2013-02-04 18:19:29 UTC
http://www.unicode.org/review/pri178/

Thanks,
    Gerard

PS I can also point to mail exchanges on the subject on the CLDR mailing list
Comment 5 Bawolff (Brian Wolff) 2013-02-06 00:34:40 UTC
Closing as invalid. This bug is not requesting anything.

Basically, updating the ICU libraries and rebuilding sortkeys would cause the sort order for different scripts to change quite a bit for tailored collations. But this would happen anyway on an ICU update (although usually not to the same extent; a small change and a big change are essentially equivalent from the code perspective).

(Furthermore, at the moment we don't actually use any tailored collations, so this doubly doesn't affect us currently. Although I hope we will be using tailored collations at some point in the future).

So in conclusion this doesn't affect us because:
*It's assumed that sort order changes on an ICU library update, so the change doesn't really impact us
*We don't use tailored collations (yet), which is where the change takes place, so it definitely doesn't affect us
*Only very few wikis even use the UCA based sorting algorithm (currently only ptwikipedia, ptwikibooks and mw.org )
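
The point about rebuilding sortkeys can be illustrated with a minimal Python sketch. It is not MediaWiki or ICU code: the functions key_v1 and key_v2 are hypothetical stand-ins for two versions of a collation library, and the stored dictionary stands in for sortkeys saved in a database. It only shows that stored keys reflect whichever version produced them, so a library update of any size implies the same rebuild step.

```python
from typing import Callable, Dict, List


def key_v1(title: str) -> bytes:
    """Hypothetical 'old library': plain code point order via UTF-8 bytes."""
    return title.encode("utf-8")


def key_v2(title: str) -> bytes:
    """Hypothetical 'new library': same idea, but case-insensitive."""
    return title.casefold().encode("utf-8")


def build_sortkeys(titles: List[str], key_fn: Callable[[str], bytes]) -> Dict[str, bytes]:
    """Compute and 'store' one sortkey per title, as a rebuild script would."""
    return {title: key_fn(title) for title in titles}


def listing(sortkeys: Dict[str, bytes]) -> List[str]:
    """Order titles purely by their stored sortkeys."""
    return sorted(sortkeys, key=lambda title: sortkeys[title])


titles = ["apple", "Banana", "cherry"]

# Keys built with the old version: uppercase sorts before lowercase.
stored = build_sortkeys(titles, key_v1)
order_before = listing(stored)

# After the library update the stored keys are stale until they are rebuilt.
stored = build_sortkeys(titles, key_v2)
order_after = listing(stored)

print(order_before)
print(order_after)
```

The rebuild call is identical whether the new version moves one title or reorders whole scripts, which is why the size of the change does not matter from the code perspective.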
Comment 6 Gerard Meijssen 2013-02-06 09:46:47 UTC
An example of the need for change in the collation order can be found on any of the special:allpages of the non-Latin wikis. For instance the one in Malayalam.

http://ml.wikipedia.org/wiki/%E0%B4%AA%E0%B5%8D%E0%B4%B0%E0%B4%A4%E0%B5%8D%E0%B4%AF%E0%B5%87%E0%B4%95%E0%B4%82:%E0%B4%8E%E0%B4%B2%E0%B5%8D%E0%B4%B2%E0%B4%BE%E0%B4%A4%E0%B4%BE%E0%B4%B3%E0%B5%81%E0%B4%95%E0%B4%B3%E0%B5%81%E0%B4%82

As you can see, Latin goes in front, and it should not.
Thanks,
    Gerard
Comment 7 Bartosz Dziewoński 2013-02-06 09:52:11 UTC
(In reply to comment #6)
> An example of the need for change in the collation order can be found on any
> of the special:allpages of the non-Latin wikis. For instance the one in
> Malayalam.

That's because no wikis except for Portuguese Wikipedia and Wikibooks are currently *using Unicode collation at all*. See bug 30996 for the work on enabling it.
Comment 8 Gerard Meijssen 2013-02-06 10:45:47 UTC
Bug 30996 excludes all non-Latin languages. This bug is EXACTLY about all the non-Latin languages.

I do not care too much about the exact technology used. What I care about is that Latin is at the back where Latin is NOT the dominant script.
Thanks,
    Gerard
Comment 9 Bawolff (Brian Wolff) 2013-02-06 17:51:11 UTC
(In reply to comment #6)
> An example of the need for change in the collation order can be found on any
> of the special:allpages of the non-Latin wikis. For instance the one in
> Malayalam.
> 
> http://ml.wikipedia.org/wiki/%E0%B4%AA%E0%B5%8D%E0%B4%B0%E0%B4%A4%E0%B5%8D%E0%B4%AF%E0%B5%87%E0%B4%95%E0%B4%82:%E0%B4%8E%E0%B4%B2%E0%B5%8D%E0%B4%B2%E0%B4%BE%E0%B4%A4%E0%B4%BE%E0%B4%B3%E0%B5%81%E0%B4%95%E0%B4%B3%E0%B5%81%E0%B4%82
> 
> As you can see Latin goes in front and it should not be.
> Thanks,
>     Gerard

We are not sorting special:allpages with collation whatsoever. There are other bugs about that; I believe they were wontfixed. (And the sorting of Latin first is the least of the problems with sorting on that list.)

----

This is still an invalid bug. What happened:
*A third party who makes a sorting algorithm changed their algorithm
*We don't use that algorithm at the moment, but plan to in some mysterious future
*When we do use that algorithm, how things are sorted will vary depending on which version of that algorithm we use. There is no requirement that we use the latest version, but if we use a recent enough version, the local script will sort before Latin on tailored collations.
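
To make the difference in ordering concrete, here is a rough Python sketch comparing plain code point order with a "local script before Latin" order. It is not the CLDR or ICU algorithm: native_first_key is a hypothetical helper that simply pushes every Latin letter behind all other characters, and the sample titles (two in Malayalam, written as escapes, and two in Latin) are made up for illustration.

```python
import unicodedata
from typing import List, Tuple


def is_latin(ch: str) -> bool:
    """Treat a character as Latin if its Unicode name starts with 'LATIN'."""
    return unicodedata.name(ch, "").startswith("LATIN")


def native_first_key(title: str) -> List[Tuple[int, int]]:
    """Hypothetical key: Latin letters rank after every other character.

    This only imitates the effect of script reordering; a real tailored
    collation assigns proper weights instead of raw code points.
    """
    return [(1 if is_latin(ch) else 0, ord(ch)) for ch in title]


titles = [
    "Wikipedia",
    "\u0d2e\u0d32\u0d2f\u0d3e\u0d33\u0d02",  # "Malayalam" in Malayalam script
    "Kerala",
    "\u0d15\u0d47\u0d30\u0d33\u0d02",        # "Keralam" in Malayalam script
]

# Code point order: Latin has lower code points, so it comes first.
codepoint_order = sorted(titles)

# Reordered: titles in the local script come before Latin titles.
native_first_order = sorted(titles, key=native_first_key)

print(codepoint_order)
print(native_first_order)
```

Code point order lists the two Latin titles first, while the reordered key lists the two Malayalam titles first, which is the behaviour requested in this report.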
