Last modified: 2013-05-05 11:38:53 UTC
Some pages like [[en:Sticks and Bones (film)]] and [[en:Bill Gates]] are incorrectly sorted in their categories. See for example http://en.wikipedia.org/w/index.php?title=Category:American_films&pagefrom=Szz which incorrectly shows Sticks and Bones (film). This seems to be caused by an incorrect cl_sortkey in the categorylinks table: the contents of the column are not all uppercase for those articles. Also, cl_collation is an empty string for those pages. My assumption is that during the change to the current case-insensitive sorting, some pages weren't updated for some reason. Purging or null-editing the affected pages doesn't seem to fix the issue.
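To illustrate why a sort key that was never uppercased could end up on the page starting at "Szz", here is a minimal sketch. It assumes that sort keys are compared byte by byte and that the pagefrom value is uppercased before comparison. Neither assumption is confirmed in this report, and the helper name `sorts_at_or_after` is made up for the example.

```python
# Sketch only: byte-wise comparison is an ASSUMED stand-in for how the
# category listing compares cl_sortkey against the pagefrom offset.

stale_key = "Sticks and Bones (film)"  # sort key left in mixed case (the reported symptom)
fresh_key = stale_key.upper()          # what an all-uppercase sort key would look like
page_from = "SZZ"                      # pagefrom=Szz, assumed to be uppercased


def sorts_at_or_after(key: str, start: str) -> bool:
    """Return True if `key` would be listed on a page starting at `start`."""
    return key.encode("utf-8") >= start.encode("utf-8")


# Lowercase 't' (0x74) is greater than 'Z' (0x5A), so the stale key lands
# after "SZZ"; uppercase 'T' (0x54) is less than 'Z', so the fresh key does not.
stale_listed = sorts_at_or_after(stale_key, page_from)
fresh_listed = sorts_at_or_after(fresh_key, page_from)
print(stale_listed, fresh_listed)
```

Under those assumptions the mixed-case key is listed after "Szz" while the uppercased key is not, which matches the behaviour described above.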
This should have been fixed when we ran the maintenance script way back. I guess the script (updateCollation.php) needs to be run again? (It should automatically only do things that need updating, so it should be fast if it's only a small minority that are old.) As an aside, the only way to fix this in the editing interface is to remove the category, save, and re-add the category. Purging won't do anything. On the other hand, the cl_timestamp is 2011-07-07 06:18:18, which is quite recent, so maybe there's a larger problem...
The issue doesn't seem to be visible at the linked category anymore.
You're right. If I search for categorylinks where cl_collation != 'uppercase' on the toolserver, the only ones are for deleted pages. I think having categorylinks for deleted pages is an issue, but separate from this one.

mysql> select count(*) from categorylinks where cl_collation != 'uppercase';
+----------+
| count(*) |
+----------+
|      279 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from categorylinks join page on cl_from = page_id where cl_collation != 'uppercase';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.05 sec)
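The difference between the two counts can be reproduced on toy data. The sketch below uses an in-memory SQLite database instead of the toolserver's MySQL replica, with cut-down `page` and `categorylinks` tables and invented rows. Only the column names used in the queries above are taken from this report; `cl_to` and `page_title` are included as assumed columns. The LEFT JOIN at the end lists the orphaned rows directly.

```python
import sqlite3

# Toy schema and rows, all hypothetical; only the column names used in the
# queries above (cl_from, cl_collation, page_id) come from this report.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_collation TEXT);
INSERT INTO page VALUES (1, 'Live_page');
INSERT INTO categorylinks VALUES
    (1, 'American_films', 'uppercase'),  -- live page, current collation
    (2, 'American_films', ''),           -- page 2 does not exist (deleted)
    (3, 'Some_category', '');            -- page 3 does not exist (deleted)
""")

# Same shape as the first query: all rows with a stale collation.
stale_total = conn.execute(
    "SELECT COUNT(*) FROM categorylinks WHERE cl_collation != 'uppercase'"
).fetchone()[0]

# Same shape as the second query: stale rows whose page still exists.
stale_live = conn.execute(
    "SELECT COUNT(*) FROM categorylinks JOIN page ON cl_from = page_id "
    "WHERE cl_collation != 'uppercase'"
).fetchone()[0]

# Rows with no matching page, i.e. links left behind by deleted pages.
orphan_ids = [
    row[0]
    for row in conn.execute(
        "SELECT cl_from FROM categorylinks "
        "LEFT JOIN page ON cl_from = page_id "
        "WHERE page_id IS NULL ORDER BY cl_from"
    )
]

print(stale_total, stale_live, orphan_ids)
conn.close()
```

On the toy data the stale rows are exactly the orphaned ones, which is the same pattern as the 279 versus 0 result above.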