Last modified: 2011-10-25 22:40:13 UTC
We need a maintenance script and cron job to purge cu_changes rows older than $wgCUDMaxAge, say once per day. The current system has CheckUserHooks::updateCheckUserData() purging it on one in every 100 edits, which works just fine for heavily-edited wikis, but for rarely-edited wikis, cu_changes rows can be left in the table for long periods of time.
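To make the problem concrete, here is a minimal sketch (in Python; the actual hook is PHP in the CheckUser extension, and all names below are illustrative) of the current probabilistic purge pattern. Because the purge only fires on roughly 1 in 100 edits, a wiki that gets a handful of edits per week may go months without a purge:

```python
import random

CUD_MAX_AGE = 90 * 86400  # retention window in seconds; stands in for $wgCUDMaxAge
PURGE_ONE_IN = 100        # purge on roughly 1 in every 100 edits

def maybe_purge(now, row_timestamps, rng=random.random):
    """Probabilistic purge, as updateCheckUserData() does today:
    on ~1% of edits, drop rows older than the retention window.
    On a rarely-edited wiki this almost never fires, so stale rows linger."""
    if rng() >= 1.0 / PURGE_ONE_IN:
        return row_timestamps  # ~99% of edits: no purge at all
    cutoff = now - CUD_MAX_AGE
    return [ts for ts in row_timestamps if ts >= cutoff]
```

A daily cron job removes the dependence on edit rate entirely, which is the point of the maintenance script.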
Script added in r99187, r99193.
An ops person needs to add this to cron.d on hume. I'm assuming Tim will want a review of this first though.
Assigning to Tim for review
Reviewed, should I deploy this now?
(In reply to comment #4)
> Reviewed, should I deploy this now?

I'd go ahead, yes.
Aaron, could you backport this to the 1.18wmf1 branch, then lob a request in RT with pointers to the scripts? Tim, if this is still outstanding on Monday, could you push it out? Thanks!
Long running queries are causing replication lag.

*************************** 2. row ***************************
     Id: 2
   User: system user
   Host:
     db: enwiki
Command: Connect
   Time: 425
  State: updating
   Info: DELETE /* PurgeOldIPAddressData::prune */ FROM `cu_changes`
         WHERE (cuc_timestamp BETWEEN 20110726014800 AND 20110726014919)
The query is missing quotation marks around the timestamps, which prevents the index from being used correctly:

mysql> explain select count(*) from cu_changes where cuc_timestamp BETWEEN 20110726014800 AND 20110726014919\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cu_changes
         type: index
possible_keys: cuc_timestamp
          key: cuc_timestamp
      key_len: 16
          ref: NULL
         rows: 16846048
        Extra: Using where; Using index
1 row in set (0.00 sec)

mysql> explain select count(*) from cu_changes where cuc_timestamp BETWEEN '20110726014800' AND '20110726014919'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cu_changes
         type: range
possible_keys: cuc_timestamp
          key: cuc_timestamp
      key_len: 16
          ref: NULL
         rows: 167
        Extra: Using where; Using index
1 row in set (0.00 sec)
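For reference, a hedged sketch of the fix (the real code quotes values through MediaWiki's database layer in PHP; the helper names below are illustrative). cuc_timestamp holds 14-digit timestamp strings, so the bounds must be quoted as strings; an unquoted numeric literal makes MySQL compare the string column numerically, which defeats the range scan shown in the second EXPLAIN above:

```python
def quote_timestamp(ts):
    """Quote a 14-digit MediaWiki-style timestamp for use in SQL.
    Unquoted numeric literals force a numeric comparison against the
    string column, scanning the whole index instead of a narrow range."""
    ts = str(ts)
    if not (len(ts) == 14 and ts.isdigit()):
        raise ValueError("not a 14-digit timestamp: %r" % ts)
    return "'" + ts + "'"

def prune_sql(start, end):
    """Build the DELETE with properly quoted range bounds."""
    return ("DELETE FROM cu_changes WHERE cuc_timestamp BETWEEN "
            "%s AND %s" % (quote_timestamp(start), quote_timestamp(end)))
```

With the bounds quoted, the generated statement matches the second EXPLAIN's query shape (type: range, 167 rows) rather than the first (type: index, ~16.8M rows).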
Deployed, now we just need to confirm that it gets run correctly from crond at midnight UTC.
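For anyone checking the crond side, an entry firing at midnight UTC would look roughly like the following (the script path, user, and file name are assumptions for illustration, not the actual deployed entry on hume):

```
# Illustrative /etc/cron.d entry; real path/user on hume may differ
0 0 * * * apache php /path/to/maintenance/purge-script.php
```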
(In reply to comment #9)
> Deployed, now we just need to confirm that it gets run correctly from crond at
> midnight UTC.

Log is getting populated.