Last modified: 2013-02-06 17:51:11 UTC
Unicode has changed the CLDR collation for non-Latin scripts by putting Latin characters after the characters of the local script. I blogged about this: http://ultimategerardm.blogspot.nl/2013/02/cldr-gets-sorting-right.html The document with the languages involved: https://docs.google.com/spreadsheet/ccc?key=0Ag3w_MjvUEoRdHdOVGJCM0pjMEZhaTkzdl9CcWNtSHc#gid=0 Thanks, GerardM
What actually needs to be done in the site config for this bug?
I do not know site config. I do not know how search works. What I have indicated is that the standard for collation has changed, and I pointed to the relevant documents. As I understand it, all Latin-based items need to go behind the items in the local script. It is not my job to translate this to the site config. Thanks, GerardM
I guess the intention of the question was: why or where do you think that something in Wikimedia needs to change (as you filed a bug report)? Apart from that, where can information about this specific change be found? So far I have not seen an authoritative source mentioned. Is there something on http://cldr.unicode.org/ about this change in sort order?
http://www.unicode.org/review/pri178/ Thanks, Gerard PS I can also point to mail exchanges on the subject on the CLDR mailing list
Closing as invalid. This bug is not requesting anything. Basically, updating the ICU libraries and rebuilding sortkeys would cause the sort order of different scripts to shift quite a bit for tailored collations. But this would happen anyway on an ICU update (usually not to the same extent, but a small change and a big change are essentially equivalent from the code perspective). (Furthermore, at the moment we don't actually use any tailored collations, so this doubly doesn't affect us currently, although I hope we will be using tailored collations at some point in the future.) So in conclusion this doesn't affect us because:
* It's assumed that the sort order changes on an ICU library update, so the change doesn't really impact us
* We don't use tailored collations (yet), which is where the change takes place, so it definitely doesn't affect us
* Only very few wikis even use the UCA-based sorting algorithm (currently only ptwikipedia, ptwikibooks and mw.org)
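To illustrate why sortkeys have to be rebuilt whenever the collation library changes, here is a minimal sketch. It does not use ICU: `sortkey_v1` and `sortkey_v2` are hypothetical stand-ins for two library versions, and the `stored` dict is an assumed stand-in for a table of precomputed sortkeys.

```python
# Toy stand-ins for two versions of a collation library (NOT the ICU API).

def sortkey_v1(title):
    # "Old" version: plain code point order (uppercase before lowercase).
    return title.encode("utf-8")

def sortkey_v2(title):
    # "New" version: case-insensitive order, original spelling as tie-breaker.
    return title.casefold().encode("utf-8") + b"\x00" + title.encode("utf-8")

def ordered(stored):
    # Pages are listed by their stored, precomputed sortkey.
    return sorted(stored, key=stored.__getitem__)

# Sortkeys computed and stored while the old version was installed.
stored = {t: sortkey_v1(t) for t in ["apple", "Banana", "cherry"]}
before = ordered(stored)

# Library updated; a new page gets a v2 key, old keys are left untouched.
stored["Apricot"] = sortkey_v2("Apricot")
mixed = ordered(stored)

# Rebuilding every stored key with the new version makes the order consistent.
stored = {t: sortkey_v2(t) for t in stored}
rebuilt = ordered(stored)

print(before)   # ['Banana', 'apple', 'cherry']
print(mixed)    # ['Banana', 'apple', 'Apricot', 'cherry']
print(rebuilt)  # ['apple', 'Apricot', 'Banana', 'cherry']
```

The size of the algorithm change makes no difference here: any change leaves old and new keys incomparable until all of them are recomputed.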
An example of the need for change in the collation order can be found on any of the special:allpages of the non-Latin wikis. For instance the one in Malayalam. http://ml.wikipedia.org/wiki/%E0%B4%AA%E0%B5%8D%E0%B4%B0%E0%B4%A4%E0%B5%8D%E0%B4%AF%E0%B5%87%E0%B4%95%E0%B4%82:%E0%B4%8E%E0%B4%B2%E0%B5%8D%E0%B4%B2%E0%B4%BE%E0%B4%A4%E0%B4%BE%E0%B4%B3%E0%B5%81%E0%B4%95%E0%B4%B3%E0%B5%81%E0%B4%82 As you can see Latin goes in front and it should not be. Thanks, Gerard
(In reply to comment #6)
> An example of the need for change in the collation order can be found on any of
> the special:allpages of the non-Latin wikis. For instance the one in Malayalam.

That's because no wikis except for Portuguese Wikipedia and Wikibooks are currently *using Unicode collation at all*. See bug 30996 for the work on enabling it.
Bug 30996 excludes all non-Latin languages. This bug is EXACTLY about all the non-Latin languages. I do not care too much about the exact technology used. What I care about is that Latin is at the back wherever Latin is NOT the dominant script. Thanks, Gerard
(In reply to comment #6)
> An example of the need for change in the collation order can be found on any of
> the special:allpages of the non-Latin wikis. For instance the one in Malayalam.
>
> http://ml.wikipedia.org/wiki/%E0%B4%AA%E0%B5%8D%E0%B4%B0%E0%B4%A4%E0%B5%8D%E0%B4%AF%E0%B5%87%E0%B4%95%E0%B4%82:%E0%B4%8E%E0%B4%B2%E0%B5%8D%E0%B4%B2%E0%B4%BE%E0%B4%A4%E0%B4%BE%E0%B4%B3%E0%B5%81%E0%B4%95%E0%B4%B3%E0%B5%81%E0%B4%82
>
> As you can see Latin goes in front and it should not be.
> Thanks,
> Gerard

We are not sorting special:allpages with collation at all. There are other bugs about that; I believe they were wontfixed. (And Latin sorting first is the least of the problems with sorting on that list.)

----

This is still an invalid bug. What happened:
* A third party who makes a sorting algorithm changed their algorithm
* We don't use that algorithm at the moment, but plan to in some mysterious future
* When we do use that algorithm, how things are sorted will vary depending on which version of that algorithm we use. There is no requirement that we use the latest version, but if we use a recent enough version the local script will sort before Latin on tailored collations.
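For readers unfamiliar with what "the local script will sort before Latin" means in practice, here is a rough sketch of the effect. It is not how ICU or CLDR implement reordering: `LOCAL_SCRIPT`, `char_rank` and `local_first_key` are hypothetical names, and treating the Malayalam Unicode block as "the local script" is an assumption made only for this example.

```python
# Assumption: the Malayalam Unicode block stands in for "the local script".
LOCAL_SCRIPT = range(0x0D00, 0x0D80)

def char_rank(ch):
    # Local-script characters get rank 0; everything else (including Latin)
    # gets rank 1. Real collation data treats digits, punctuation and other
    # scripts separately; this sketch ignores that.
    return 0 if ord(ch) in LOCAL_SCRIPT else 1

def local_first_key(title):
    # Compare character by character: script rank first, then code point.
    return [(char_rank(ch), ord(ch)) for ch in title]

titles = ["Kerala", "കേരളം", "India", "ഇന്ത്യ"]

# Plain code point order: Latin titles come first.
default_order = sorted(titles)

# Reordered: local-script titles come first, Latin goes to the back.
local_first = sorted(titles, key=local_first_key)

print(default_order)
print(local_first)
```

Within each script the relative order is unchanged; only the position of the script groups relative to each other moves.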