Last modified: 2013-03-13 18:16:27 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T47444, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 45444 - Set $wgCategoryCollation to 'uca-uk' on Ukrainian Wikipedia and rebuild category sort keys
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: unspecified
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Bartosz Dziewoński
Keywords: shell
Depends on: 41040
Blocks: collations
Reported: 2013-02-26 20:22 UTC by Bartosz Dziewoński
Modified: 2013-03-13 18:16 UTC
CC List: 11 users

Description Bartosz Dziewoński 2013-02-26 20:22:36 UTC
Set $wgCategoryCollation to 'uca-uk' on Ukrainian Wikipedia and rebuild category sort keys.

Needs community notification and discussion.
Comment 1 Andre Klapper 2013-02-27 19:11:00 UTC
(In reply to comment #0)
> Needs community notification and discussion.

Who should start that?
Comment 2 Bartosz Dziewoński 2013-02-27 19:14:42 UTC
I thought Dmytro Dziuma would be interested in this? (He cc'd himself on this bug, and he was the one who filed bug 41040.)

Anybody from uk.wiki, actually.
Comment 3 Dmytro Dziuma 2013-02-27 19:17:37 UTC
What kind of discussion do you expect from uk.wiki? I will notify the community, but I don't see any reason why anybody would be against this fix.
Comment 4 Bartosz Dziewoński 2013-02-27 19:21:30 UTC
Dmytro: pretty much just a short discussion or vote on your wiki's equivalent of the Village Pump.

Here's what I did for the very same issue you're having here on pl.wiki: https://pl.wikipedia.org/wiki/Wikipedia:Kawiarenka/Propozycje#Zmiana_konfiguracji_.E2.80.93_w.C5.82.C4.85czenie_poprawnego_sortowania_artyku.C5.82.C3.B3w_na_stronach_kategorii
Comment 5 Dmytro Dziuma 2013-02-27 19:30:25 UTC
Is it possible to test the 'uca-uk' collation somewhere? It would be great if a publicly accessible test wiki could easily be set up, like the one you made for pl.wiki at http://users.v-lo.krakow.pl/~matmarex/testwiki
Comment 6 Bartosz Dziewoński 2013-02-27 21:15:18 UTC
I set up an open wiki for you at http://users.v-lo.krakow.pl/~matmarex/testwiki-uk/ . Feel free to use it however you wish, but be aware that it won't stay up forever, and that the server I'm running it on might have occasional hiccups.

Since I anticipate that I'm going to be setting up a lot of such testwikis :), I attached the script I used to do this to parent bug 45443.
Comment 8 Bartosz Dziewoński 2013-02-27 22:32:23 UTC
Also, please note that this also changes how non-Ukrainian characters will be sorted – accented letters such as Ä will be considered the same as their non-accented counterparts for the purposes of sorting (including being shown under one heading) – you can see this on the Polish testwiki. This is probably minor, but worth noting in the discussion. (I'd post there about this myself, but sadly I do not speak Ukrainian.)
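A rough way to picture the heading effect described above: at the primary strength used for category headings, a letter with an accent counts as its base letter. The sketch below approximates this with Unicode canonical decomposition (NFD) plus removal of combining marks. It does not use real UCA/ICU weights. The function name `heading_letter` and the `UKRAINIAN_DISTINCT` set are hypothetical. The set exists because NFD also splits Й into И + U+0306 and Ї into І + U+0308, and those are separate letters in Ukrainian. A naive fold would merge them, which is exactly the kind of detail a language-specific collation has to get right.

```python
import unicodedata

# Hypothetical list of letters this sketch must NOT fold: NFD decomposes
# them (Й = И + U+0306, Ї = І + U+0308), but in Ukrainian they are
# separate letters with their own headings.
UKRAINIAN_DISTINCT = frozenset("ЙЇ")


def heading_letter(title: str, distinct: frozenset = UKRAINIAN_DISTINCT) -> str:
    """Approximate the category heading a title would be listed under.

    Uppercases the first character, then drops combining marks via NFD
    unless the letter is listed in `distinct`. This only loosely mimics
    primary-strength comparison; it is not what ICU computes.
    """
    if not title:
        return ""
    first = title[0].upper()[0]  # upper() can expand, e.g. 'ß' -> 'SS'
    if first in distinct:
        return first
    decomposed = unicodedata.normalize("NFD", first)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base[:1] or first


for t in ("Äpfel", "Apfel", "Ёлка", "Їжак", "йод"):
    print(t, heading_letter(t))
```

Under this model "Äpfel" and "Apfel" both land under "A", while "Їжак" and "йод" keep their own headings.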
Comment 9 Dmytro Dziuma 2013-02-28 08:48:36 UTC
Just as a side note. Is this collation used only for sorting in categories? I'm asking because as far as I can see from http://users.v-lo.krakow.pl/~matmarex/testwiki-uk/index.php?title=Спеціальна:Усі_сторінки, in other places some other collation is used for sorting.

I doubt that this is important, but it could be nice to have consistent sorting across the whole wiki including API.
Comment 10 NickK 2013-02-28 09:04:21 UTC
As far as I can see from the test, this sorting treats the apostrophe (') as a regular letter. However, in Ukrainian the apostrophe is not a letter and it should have no impact on the sorting order (for example, the words в'яз and вяз should have the same key). Is it possible to take this into account as well?
Comment 11 Bawolff (Brian Wolff) 2013-02-28 14:54:57 UTC
(In reply to comment #9)
> Just as a side note. Is this collation used only for sorting in categories?
> I'm
> asking because as far I can see from
> http://users.v-lo.krakow.pl/~matmarex/testwiki-uk/index.php?title=Спеціальна:
> Усі_сторінки,
> in other places some other collation is used for sorting.
> 
> I doubt that this is important, but it could be nice to have consistent
> sorting
> across the whole wiki including API.

Collation only affects categories. There are other bugs about sorting in other places; most conclude that, while it would be nice, it's mostly not worth the effort.



(In reply to comment #10)
> As far as I can see from the test, this sorting takes into account an
> apostrophe (') as a regular letter. However, in Ukrainian apostrophe is not a
> letter and it should have no impact on the sorting order (for example, the
> words в'яз and вяз should have the same key). Is it possible to take this
> into
> account as well?


Hmm. It sounds like it should be primary ignorable (i.e. only used as a tie-breaker). This may be an upstream bug, but there are also some options related to such characters (variable characters), so it may just be a configuration issue on our end.
Comment 12 Anatoliy Goncharov 2013-02-28 15:28:49 UTC
This is the current order in the category:
* В'язь
* В’язь
* В`язь
* Воліючиневолію
* Вязь
so the apostrophe counts as a separate letter.

Otherwise (if the apostrophe were not counted) we would have
* Воліючиневолію
* В'язь
* В’язь
* В`язь
* Вязь
or at least
* В’язь
* В`язь
* Воліючиневолію
* Вязь
* В'язь
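The two orderings above can be reproduced, in shape, with a toy sort key. In the "non-ignorable" mode the apostrophe takes part in the first-level comparison like a very low letter. In the "shifted" mode it is ignored at the first level and only breaks ties, which is what "primary ignorable" means in comment 11.

This is a simplified Python model, not UCA or ICU. Letters are compared by code point, which works for these particular words but not for the whole Ukrainian alphabet, since Ґ, Є, І, Ї lie outside the А–Я code point range. The relative order of the three apostrophe variants also comes from code points and may differ from ICU's. The names `category_key` and `APOSTROPHES` are hypothetical.

```python
# Apostrophe variants from comment 12: ASCII apostrophe, U+2019 RIGHT
# SINGLE QUOTATION MARK, and the grave accent (backtick).
APOSTROPHES = frozenset("'\u2019`")


def _apostrophes_low(title: str) -> str:
    # Lowercase, and map every apostrophe variant to U+0000 so that it
    # sorts before any letter (a crude stand-in for a low punctuation weight).
    return "".join("\x00" if ch in APOSTROPHES else ch for ch in title.lower())


def category_key(title: str, shifted: bool) -> tuple:
    """Toy sort key; NOT real UCA/ICU weights.

    shifted=False: the apostrophe is compared like a (very low) letter.
    shifted=True:  the apostrophe is ignored at the first level and only
                   used to break ties between otherwise equal titles.
    """
    non_ignorable = _apostrophes_low(title)
    if not shifted:
        return (non_ignorable, title)
    primary = "".join(ch for ch in title.lower() if ch not in APOSTROPHES)
    return (primary, non_ignorable, title)


words = ["Вязь", "В'язь", "Воліючиневолію", "В\u2019язь", "В`язь"]
print(sorted(words, key=lambda w: category_key(w, shifted=False)))
# → ["В'язь", 'В`язь', 'В’язь', 'Воліючиневолію', 'Вязь']
print(sorted(words, key=lambda w: category_key(w, shifted=True)))
# → ['Воліючиневолію', "В'язь", 'В`язь', 'В’язь', 'Вязь']
```

In the shifted mode all four "в…язь" spellings share the same first-level key, so Воліючиневолію (о before я) moves ahead of them, matching the first alternative listed above.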
Comment 13 Dmytro Dziuma 2013-03-05 09:29:52 UTC
I think the current support from the ukwiki community is enough. I guess you can proceed with deploying this fix.
Comment 14 Bartosz Dziewoński 2013-03-05 11:08:54 UTC
Thanks. This will have to wait for the deployment of 1.21wmf11 for the Wikipedias, due on March 13 [https://www.mediawiki.org/wiki/MediaWiki_1.21/Roadmap]. I'll propose a configuration change afterwards.

I'll try to look into the behavior of the apostrophes.

----

I scanned through the uk.wiki discussion with the help of Google Translate:

* If I got it right, someone mentioned that other Ukrainian-language projects should have their category pages sorted in the same way. Please feel free to open similar "mini-votes" on them, and link those discussions here once we're sure there is consensus.

* If I got it right, someone said that Ё and Ў should be sorted separately from Е and У. I'm not sure if this comment has any merit (I don't speak the language, but neither of these letters is even mentioned on https://en.wikipedia.org/wiki/Ukrainian_alphabet); however, if it does, it's certainly an upstream issue in the ICU library.
Comment 15 Anatoliy Goncharov 2013-03-05 21:01:40 UTC
How long should the voting last to be acceptable?
Comment 17 Bawolff (Brian Wolff) 2013-03-05 22:28:55 UTC
On the apostrophe question, see http://www.unicode.org/reports/tr10/#Variable_Weighting for some background. Try using a locale identifier of uk-u-ka-shifted (I have not tested this; in theory there should be per-locale defaults that are most correct, so this may be an upstream bug).
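To unpack the suggested identifier: in a BCP 47 / Unicode locale ID, `-u-` introduces Unicode extension keywords. The collation key `ka` selects alternate handling, with the values `noignore` (variable characters such as punctuation are compared like ordinary characters) and `shifted` (they are pushed down to a tie-breaking level). The parser below is a simplified sketch for reading such IDs. It is not ICU's parser, and `parse_unicode_locale` is a hypothetical name. This thread does not establish whether MediaWiki's collation setting accepts such a suffix.

```python
def parse_unicode_locale(locale_id: str) -> tuple[str, dict[str, str]]:
    """Split a locale ID into its base tag and its -u- extension keywords.

    Simplified sketch (not ICU's parser): it ignores -u- attributes,
    other extensions and private-use subtags, and does no validation.
    """
    subtags = locale_id.lower().replace("_", "-").split("-")
    if "u" not in subtags:
        return "-".join(subtags), {}
    start = subtags.index("u")
    base = "-".join(subtags[:start])
    values: dict[str, list[str]] = {}
    key = None
    for sub in subtags[start + 1:]:
        if len(sub) == 1:       # another extension singleton begins
            break
        if len(sub) == 2:       # keys are exactly two characters
            key = sub
            values[key] = []
        elif key is not None:   # value subtag for the current key
            values[key].append(sub)
    # Per UTS #35, a key with no value subtags means "true".
    return base, {k: "-".join(v) if v else "true" for k, v in values.items()}


print(parse_unicode_locale("uk-u-ka-shifted"))  # ('uk', {'ka': 'shifted'})
```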

(In reply to comment #15)
> How long should the voting last to be acceptable?

A week or so, I suppose. There is no hard-and-fast rule, as long as the average interested party has a chance to object if they so desire. The main reason for such votes is to make sure the change is wanted. In this case it seems fairly obvious that it would be, but sometimes people request things that the relevant communities don't want, which causes drama. Vote-type things (or really any demonstration of community consensus) are good just to make sure everyone is on the same page and the change is actually wanted.
Comment 18 Bartosz Dziewoński 2013-03-06 13:35:51 UTC
(In reply to comment #16)
> "Mini votes" are here:
> <snip>

Actually, I split those to bug 45776, for clarity. :) Let's keep this one only about the Wikipedia.
Comment 19 Bohdan 2013-03-07 13:28:29 UTC
Will it also change sorting in sortable tables, AllPages, the API view of categories, and other lists available via special pages and the API?
Comment 20 Bartosz Dziewoński 2013-03-07 20:59:36 UTC
(In reply to comment #19)
> Will it also change sorting in sortable tables, AllPages, API view of
> Categories and in other lists available via special pages and API?

Just for the record, this has been replied to in bug 45776 comment 2. The answer is no, except for the API view of the categories (which is the same as "user view"), but there are suggestions (and maybe even bugs, I'd have to look) to implement the same for them.
Comment 21 Anatoliy Goncharov 2013-03-08 14:13:17 UTC
(In reply to comment #20)
> (In reply to comment #19)
> > Will it also change sorting in sortable tables, AllPages, API view of
> > Categories and in other lists available via special pages and API?
> 
> Just for the record, this has been replied to in bug 45776 comment 2. The
> answer is no, except for the API view of the categories (which is the same as
> "user view"), but there are suggestions (and maybe even bugs, I'd have to
> look)
> to implement the same for them.

Well, the CategoryViewer and ApiQueryCategoryMembers classes use the collation for the 'cl_sortkey' field of the 'categorylinks' table. What is the problem with using a collation for the 'page_title' field of the 'page' table for other purposes (e.g. ApiQueryAllPages)?
Comment 22 Bawolff (Brian Wolff) 2013-03-08 14:46:18 UTC
(In reply to comment #21)
> (In reply to comment #20)
> > (In reply to comment #19)
> > > Will it also change sorting in sortable tables, AllPages, API view of
> > > Categories and in other lists available via special pages and API?
> > 
> > Just for the record, this has been replied to in bug 45776 comment 2. The
> > answer is no, except for the API view of the categories (which is the same as
> > "user view"), but there are suggestions (and maybe even bugs, I'd have to
> > look)
> > to implement the same for them.
> 
> Well, in CategoryViewer and ApiQueryCategoryMembers classes we use collation
> for 'cl_sortkey' field in the table 'categorylinks'. What problem to use
> collation for 'page_title' field in the table 'page' for other purposes (i.e.
> ApiQueryAllPages)?

That's a bit of a simplification. There's a bit more overhead than that.

There's concern that the overhead is not worth it, given how few places people get a list of all articles (see related comments like bug 24574 comment 3 about the user list). I also imagine we would want to see how well this entire system works out for categories before moving on to other lists.
Comment 23 Bartosz Dziewoński 2013-03-08 19:58:16 UTC
For the time being, let's get the category collations deployed, and once this works, we'll consider how to go further. (I submitted a configuration change proposal as Ifd9b1dfe.)
Comment 24 Sam Reed (reedy) 2013-03-10 20:59:30 UTC
Done

mysql:wikiadmin@db1041 [ukwiki]> select count(cl_collation), cl_collation from categorylinks group by cl_collation ;
+---------------------+--------------+
| count(cl_collation) | cl_collation |
+---------------------+--------------+
|             2313095 | uca-uk       |
+---------------------+--------------+
1 row in set (1.23 sec)



2312046 rows processed

real    1130m5.081s
user    9m5.834s
sys     0m50.911s
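The figures in the output above can be turned into a rough throughput estimate. The helper below parses the `real`/`user`/`sys` fields in the format printed by the shell's `time` builtin, assumed here to be `<minutes>m<seconds>s`. It only does arithmetic on the numbers shown and says nothing about where the time went. `parse_time` is a hypothetical helper name.

```python
import re


def parse_time(value: str) -> float:
    """Convert a shell `time` field such as '1130m5.081s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if m is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


rows = 2312046                                   # "rows processed" above
real = parse_time("1130m5.081s")                 # wall-clock time
cpu = parse_time("9m5.834s") + parse_time("0m50.911s")  # user + sys

print(f"wall clock: {real / 3600:.1f} h")        # 18.8 h
print(f"throughput: {rows / real:.1f} rows/s")   # 34.1 rows/s
print(f"CPU share of wall clock: {cpu / real:.1%}")  # 0.9%
```

By these numbers the rebuild ran for roughly 18.8 hours at about 34 rows per second. The script's own CPU time (user + sys) was under 1% of the wall-clock time.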
Comment 25 Anatoliy Goncharov 2013-03-13 14:33:32 UTC
Sorting looks good, but category navigation is broken.

For example here
http://uk.wikipedia.org/wiki/Категорія:Футбольні_клуби_України
when I click 'Next 200', I move only 2 items forward instead of 200, and clicking 'Next 200' again brings me to the same page.
Comment 26 Bawolff (Brian Wolff) 2013-03-13 17:53:28 UTC
(In reply to comment #25)
> Sorting looks good, but category navigation is broken.
> 
> For example here
> http://uk.wikipedia.org/wiki/Категорія:Футбольні_клуби_України
> when I click 'Next 200' I go to 2 items forward instead of 200, and by next
> clicking 'Next 200' I go to the same page.

Is this still the case? It looks fine now to me.

Sorting may have been a little screwed up while the sorting order was being switched.
Comment 27 Bartosz Dziewoński 2013-03-13 18:01:02 UTC
It was broken for me 30 minutes ago and I even started digging in the code, but seems okay for me as well right now.

May have something to do with updateCollation.php's re-run per bug 46036. No idea if that's the case.

If this persists for more than 24 hours, please reopen :)
Comment 28 Bawolff (Brian Wolff) 2013-03-13 18:16:27 UTC
(In reply to comment #27)

> May have something to do with updateCollation.php's re-run per bug 46036. No
> idea if that's the case.
> 

Actually, that would make sense. The paging code assumes that cl_sortkey was encoded with the same version of ICU as is currently on the server. If that's not the case, the 'next 200' link could generate an SQL query where the paging condition doesn't correspond to the last element of the previous query. This is because the 'next 200' link has the last page name in the URL, not its cl_sortkey, which would be full of binary data and possibly quite long. (Also, using cl_sortkey in the URL would break people's "skip to letter foo" templates.)
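A toy model of the failure described above: rows are stored ordered by a sort key computed when they were written. The "next" link carries only a page title, not its cl_sortkey, so the server recomputes that title's key with whatever collation code it is running now and seeks to it. If the two key functions disagree, the seek lands in the wrong place.

The two key functions here are hypothetical stand-ins for keys produced by different ICU versions (real ICU sort keys are opaque binary strings). `next_page` is a simplified keyset-pagination sketch that continues from the first title of the next page. It is not MediaWiki's CategoryViewer code.

```python
from bisect import bisect_left


def old_icu_key(title: str) -> str:
    # Hypothetical stand-in for the sort key written by the ICU version
    # that populated the stored rows.
    return title.lower()


def new_icu_key(title: str) -> str:
    # Hypothetical stand-in for a different ICU version whose key for the
    # same title differs (modelled here as an extra low leading weight).
    return "\x01" + title.lower()


def build_rows(titles, key_fn):
    # Like categorylinks: (stored sort key, title), ordered by the key.
    return sorted((key_fn(t), t) for t in titles)


def next_page(rows, from_title, key_fn, limit):
    """Return one page and the title the 'next' link would carry.

    The link holds only a title, so the seek key is recomputed with the
    *current* key function and compared against the *stored* keys.
    """
    start = 0 if from_title is None else bisect_left(rows, (key_fn(from_title), ""))
    chunk = [title for _, title in rows[start:start + limit + 1]]
    following = chunk[limit] if len(chunk) > limit else None
    return chunk[:limit], following


titles = [f"Клуб {i:03d}" for i in range(10)]
rows = build_rows(titles, old_icu_key)
first, link = next_page(rows, None, old_icu_key, 3)
print(next_page(rows, link, old_icu_key, 3)[0])  # keys agree: Клуб 003..005
print(next_page(rows, link, new_icu_key, 3)[0])  # keys disagree: first page again
```

With matching key functions the second page starts at the right title. With mismatched ones the recomputed key sorts before every stored key, so the "next" page is the first page again, much like the symptom in comment 25. This is why the stored keys and the ICU version running on the server need to agree.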
