Last modified: 2013-09-03 18:58:41 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T55670, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 53670 - Run cleanupTitles.php across Wikimedia wikis
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: wmf-deployment
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Sam Reed (reedy)
URL: https://bugs.php.net/bug.php?id=52981
Keywords: shell, utf8
Depends on:
Blocks:
Reported: 2013-09-02 04:21 UTC by MZMcBride
Modified: 2013-09-03 18:58 UTC
CC List: 6 users


Description MZMcBride 2013-09-02 04:21:53 UTC
As a follow-up to bug 22939, I'm seeing some strange behavior on the English Wikipedia:

MariaDB [enwiki_p]> select * from page where page_namespace = 2 and page_title = 'Ɑʀʇʉʀɵ/SmallCaps.charset'\G
*************************** 1. row ***************************
          page_id: 40422349
   page_namespace: 2
       page_title: Ɑʀʇʉʀɵ/SmallCaps.charset
page_restrictions: 
     page_counter: 0
 page_is_redirect: 0
      page_is_new: 1
      page_random: 0.582265889095
     page_touched: 20130902000030
      page_latest: 571150285
         page_len: 1107
1 row in set (0.09 sec)

MariaDB [enwiki_p]> select * from page where page_namespace = 2 and page_title = 'ɑʀʇʉʀɵ/SmallCaps.charset'\G
*************************** 1. row ***************************
          page_id: 8610689
   page_namespace: 2
       page_title: ɑʀʇʉʀɵ/SmallCaps.charset
page_restrictions: 
     page_counter: 0
 page_is_redirect: 0
      page_is_new: 0
      page_random: 0.6887380353870001
     page_touched: 20061226044311
      page_latest: 96503668
         page_len: 1107
1 row in set (0.10 sec)

"ɑʀʇʉʀɵ/SmallCaps.charset" is an inaccessible page title. It gets normalized to "Ɑʀʇʉʀɵ/SmallCaps.charset". Presumably the previous cleanupTitles.php run would have caught this, so... I'm not sure what's up.
Comment 1 Ori Livneh 2013-09-02 05:18:08 UTC
The majuscule form of 'ɑ' is U+2C6D Ɑ latin capital letter alpha. It was added to the Unicode standard for version 5.1, released in 2008. Up until version 5.3.3, PHP was using Unicode tables based on version 3.2 of the standard, released in 2002. When we last ran cleanupTitles.php (May 2012), we were still on PHP 5.3.2, which did not include the update.

See <https://bugs.php.net/bug.php?id=52981> for more details.

We should re-run cleanupTitles.php.
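The version dependence described in this comment can be modeled with two case tables. A minimal Python sketch, assuming hypothetical one-entry tables `OLD_UPPER` and `NEW_UPPER` that stand in for the Unicode 3.2 and 5.1+ data: a title saved while the old table was in use looks already normalized, but after the upgrade every lookup normalizes to a different string and misses the stored row.

```python
# Hypothetical stand-ins for two generations of case tables. Only the one
# mapping relevant to this bug is modeled; real tables hold thousands.
OLD_UPPER = {}                    # Unicode 3.2: no uppercase for U+0251
NEW_UPPER = {"\u0251": "\u2c6d"}  # Unicode 5.1+: U+0251 -> U+2C6D


def ucfirst(title: str, upper_table: dict) -> str:
    """Uppercase the first character using the given case table."""
    if not title:
        return title
    first = title[0]
    return upper_table.get(first, first) + title[1:]


title = "\u0251\u0280\u0287\u0289\u0280\u0275"  # 'ɑʀʇʉʀɵ'

# Under the old table the title is already "normalized", so it is saved as-is.
stored = ucfirst(title, OLD_UPPER)
# After the upgrade, lookups normalize differently and never hit the stored row.
lookup = ucfirst(title, NEW_UPPER)
print(stored == title, lookup == stored)
```

This is why the May 2012 run, still on the old tables, could not have flagged these titles.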
Comment 2 Sam Reed (reedy) 2013-09-03 06:32:03 UTC
http://noc.wikimedia.org/~reedy/53670.log.gz
Comment 3 MZMcBride 2013-09-03 18:54:20 UTC
Just for the record, these pages should now exist under "Broken/". The relevant results were:

$ grep "rows updated" 53670.log | grep -v "page... 0 of "
arwiki:  Finished page... 1 of 1416284 rows updated
bewikisource:  Finished page... 57 of 5972 rows updated
bgwiki:  Finished page... 3 of 343456 rows updated
brwiki:  Finished page... 3 of 96511 rows updated
bswiki:  Finished page... 34 of 222943 rows updated
bxrwiki:  Finished page... 2 of 4060 rows updated
cawiki:  Finished page... 1 of 1012795 rows updated
cewiki:  Finished page... 2 of 10910 rows updated
ckbwiki:  Finished page... 5 of 69631 rows updated
commonswiki:  Finished page... 4 of 25223432 rows updated
cswiki:  Finished page... 3 of 706918 rows updated
cuwiki:  Finished page... 1 of 4008 rows updated
cywikisource:  Finished page... 19 of 1104 rows updated
dawiki:  Finished page... 1 of 593711 rows updated
dewiki:  Finished page... 41 of 4537363 rows updated
dewikivoyage:  Finished page... 7 of 39145 rows updated
diqwiki:  Finished page... 17 of 18456 rows updated
dvwiktionary:  Finished page... 2 of 960 rows updated
elwiki:  Finished page... 2 of 241860 rows updated
enwiki:  Finished page... 157 of 31116095 rows updated
enwikinews:  Finished page... 1 of 731634 rows updated
enwikisource:  Finished page... 18 of 1447625 rows updated
eowiki:  Finished page... 1 of 401302 rows updated
eowikisource:  Finished page... 3 of 5680 rows updated
eowiktionary:  Finished page... 1 of 38034 rows updated
eswiki:  Finished page... 8 of 4325732 rows updated
etwiki:  Finished page... 1 of 293643 rows updated
fawiki:  Finished page... 3 of 1801439 rows updated
fiwiki:  Finished page... 3 of 886219 rows updated
fiwikisource:  Finished page... 310 of 12150 rows updated
frwiki:  Finished page... 44 of 5975233 rows updated
frwikibooks:  Finished page... 2 of 39266 rows updated
gdwiki:  Finished page... 1 of 19077 rows updated
glwiki:  Finished page... 1 of 230557 rows updated
guwiki:  Finished page... 1 of 42638 rows updated
hewiki:  Finished page... 1 of 629806 rows updated
hsbwiktionary:  Finished page... 4 of 5331 rows updated
huwiki:  Finished page... 3 of 835123 rows updated
hywiki:  Finished page... 1 of 280600 rows updated
idwiki:  Finished page... 3 of 1037501 rows updated
idwiktionary:  Finished page... 2 of 194297 rows updated
incubatorwiki:  Finished page... 1 of 563575 rows updated
itwiki:  Finished page... 4 of 3444211 rows updated
jawiki:  Finished page... 3 of 2418512 rows updated
kbdwiki:  Finished page... 3 of 3095 rows updated
kowiki:  Finished page... 2 of 809683 rows updated
kuwiki:  Finished page... 4 of 47462 rows updated
kuwikibooks:  Finished page... 2 of 531 rows updated
kuwikiquote:  Finished page... 5 of 1050 rows updated
kywiki:  Finished page... 1 of 36304 rows updated
lawiki:  Finished page... 4 of 179872 rows updated
metawiki:  Finished page... 3 of 2251097 rows updated
mhrwiki:  Finished page... 1 of 12486 rows updated
minwiki:  Finished page... 1 of 14535 rows updated
mlwikiquote:  Finished page... 1 of 3176 rows updated
nowiki:  Finished page... 10 of 943043 rows updated
plwiki:  Finished page... 6 of 1957700 rows updated
ptwiki:  Finished page... 5 of 3298730 rows updated
ruwiki:  Finished page... 13 of 3495752 rows updated
sawikisource:  Finished page... 1 of 11121 rows updated
skwiki:  Finished page... 1 of 395761 rows updated
sourceswiki:  Finished page... 1 of 37720 rows updated
srwiki:  Finished page... 2 of 689781 rows updated
svwiki:  Finished page... 3 of 3454807 rows updated
tawikisource:  Finished page... 11 of 4720 rows updated
test2wiki:  Finished page... 2 of 9689 rows updated
tewiktionary:  Finished page... 2 of 100105 rows updated
thwiki:  Finished page... 2 of 430332 rows updated
trwiki:  Finished page... 1 of 1085798 rows updated
ttwiki:  Finished page... 1 of 101891 rows updated
ukwiki:  Finished page... 6 of 1386835 rows updated
ukwikisource:  Finished page... 1 of 11063 rows updated
urwiki:  Finished page... 26 of 102498 rows updated
uzwiki:  Finished page... 14 of 635677 rows updated
zh_yuewiki:  Finished page... 1 of 78606 rows updated
zhwiki:  Finished page... 12 of 3086902 rows updated
zhwikibooks:  Finished page... 1 of 7431 rows updated
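The grep above keeps only wikis with a non-zero count. A minimal Python sketch of the same filter that also extracts the numbers, assuming every summary line has exactly the `wiki:  Finished page... N of M rows updated` shape shown; `parse_log` is a hypothetical helper, and the `aawiki` line in `sample` is made up to exercise the zero-count case.

```python
import re

# Matches lines such as "enwiki:  Finished page... 157 of 31116095 rows updated".
LINE_RE = re.compile(r"^(\S+):\s+Finished page\.\.\. (\d+) of (\d+) rows updated$")


def parse_log(lines):
    """Return {wiki: (updated, total)} for summary lines with updated > 0."""
    results = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip anything that is not a summary line
        wiki, updated, total = m.group(1), int(m.group(2)), int(m.group(3))
        if updated > 0:  # same effect as grep -v "page... 0 of "
            results[wiki] = (updated, total)
    return results


sample = [
    "enwiki:  Finished page... 157 of 31116095 rows updated",
    "fiwikisource:  Finished page... 310 of 12150 rows updated",
    "aawiki:  Finished page... 0 of 123 rows updated",  # hypothetical zero row
]
for wiki, (updated, total) in sorted(parse_log(sample).items()):
    print(f"{wiki}: {updated}/{total} ({updated / total:.4%})")
```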
Comment 4 MZMcBride 2013-09-03 18:58:41 UTC
(In reply to comment #3)
> Just for the record, these pages should now exist under "Broken/".

Note: pages retain their namespace. For example:

* (0,'ӷ') to (0,'Broken/Ӷ')
* (3,'ɑʀʇʉʀɵ') to (3,'Ɑʀʇʉʀɵ')

So the pages will exist under "Broken/", but it requires checking every namespace if you're using Special:PrefixIndex.
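A minimal Python sketch of a namespace-independent lookup, using an in-memory SQLite table as a hypothetical stand-in for the `page` table queried in the description. The first two rows come from the examples above; the other two are invented. Leaving out any `page_namespace` condition finds the `Broken/` pages in every namespace at once, while a page moved without the prefix, like (3,'Ɑʀʇʉʀɵ'), is not matched. On the replicas the analogous query would presumably be a `page_title LIKE 'Broken/%'` condition with no namespace filter.

```python
import sqlite3

# In-memory stand-in for the page table, keeping only the two columns needed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (page_namespace INTEGER, page_title TEXT)")
conn.executemany(
    "INSERT INTO page VALUES (?, ?)",
    [
        (0, "Broken/\u04f6"),                          # (0,'Broken/Ӷ') from above
        (3, "\u2c6d\u0280\u0287\u0289\u0280\u0275"),   # (3,'Ɑʀʇʉʀɵ'), no prefix
        (2, "Broken/Example"),                         # hypothetical row
        (0, "Main Page"),                              # hypothetical unrelated row
    ],
)

# No page_namespace condition: match the prefix in every namespace at once.
# GLOB is case-sensitive in SQLite, unlike SQLite's LIKE for ASCII letters.
rows = conn.execute(
    "SELECT page_namespace, page_title FROM page "
    "WHERE page_title GLOB 'Broken/*' ORDER BY page_namespace, page_title"
).fetchall()
for ns, title in rows:
    print(ns, title)
```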
