Last modified: 2011-10-25 22:40:13 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T33454, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 31454 - Cron job to purge cu_changes
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: CheckUser
Version: unspecified
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: Aaron Schulz
Keywords: ops, platformeng
Depends on:
Blocks:
Reported: 2011-10-07 00:21 UTC by Tim Starling
Modified: 2011-10-25 22:40 UTC

Description Tim Starling 2011-10-07 00:21:41 UTC
We need a maintenance script and cron job to purge cu_changes rows older than $wgCUDMaxAge, say once per day. The current system has CheckUserHooks::updateCheckUserData() purging it on one in every 100 edits, which works just fine for heavily-edited wikis, but for rarely-edited wikis, cu_changes rows can be left in the table for long periods of time.
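To put rough numbers on the gap described above, here is a minimal Python sketch. It assumes each edit independently triggers a purge with probability 1/100 (the report only says "one in every 100 edits", so the independence is an assumption), and the edit rates are made-up illustrations rather than measurements from any wiki. The function names are hypothetical and not part of CheckUser.

```python
# Assumption: each edit triggers a purge independently with this probability.
PURGE_CHANCE = 1 / 100


def expected_days_between_purges(edits_per_day: float) -> float:
    """Mean wait between purges, in days, for a given edit rate."""
    return 1 / (PURGE_CHANCE * edits_per_day)


def chance_of_no_purge(edits: int) -> float:
    """Probability that none of `edits` consecutive edits triggers a purge."""
    return (1 - PURGE_CHANCE) ** edits


# Illustrative edit rates only; not taken from real traffic data.
for label, rate in [("busy wiki", 10_000), ("quiet wiki", 2)]:
    days = expected_days_between_purges(rate)
    print(f"{label}: about {days:.2f} days between purges")

print(f"chance of no purge in 100 edits: {chance_of_no_purge(100):.3f}")
```

Under these assumptions a wiki with two edits per day waits about 50 days on average between purges, and even after 100 edits there is still roughly a 37% chance that no purge has run. A daily cron job removes that dependence on edit volume.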
Comment 1 Aaron Schulz 2011-10-07 04:58:42 UTC
Script added in r99187, r99193.
Comment 2 Aaron Schulz 2011-10-08 00:30:12 UTC
An ops person needs to add this to cron.d on hume. I'm assuming Tim will want a review of this first though.
Comment 3 Rob Lanphier 2011-10-11 20:58:24 UTC
Assigning to Tim for review
Comment 4 Tim Starling 2011-10-21 04:16:02 UTC
Reviewed, should I deploy this now?
Comment 5 Aaron Schulz 2011-10-21 04:28:09 UTC
(In reply to comment #4)
> Reviewed, should I deploy this now?

I'd go ahead, yes.
Comment 6 Rob Lanphier 2011-10-21 18:14:36 UTC
Aaron, could you backport this to the 1.18wmf1 branch, then lob a request in RT with pointers to the scripts?  Tim, if this is still outstanding on Monday, could you push it out?  Thanks!
Comment 7 Tim Starling 2011-10-24 02:00:33 UTC
Long-running queries are causing replication lag.

*************************** 2. row ***************************
     Id: 2
   User: system user
   Host: 
     db: enwiki
Command: Connect
   Time: 425
  State: updating
   Info: DELETE /* PurgeOldIPAddressData::prune  */ FROM `cu_changes` WHERE (cuc_timestamp BETWEEN 20110726014800 AND 20110726014919)
Comment 8 Tim Starling 2011-10-24 02:20:59 UTC
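The query is missing quotation marks around the timestamps, so MySQL cannot use the cuc_timestamp index for a range scan and walks the whole index instead (type: index, about 16.8 million rows, versus type: range and 167 rows with the quotes).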

mysql> explain select count(*) from cu_changes where cuc_timestamp BETWEEN 20110726014800 AND 20110726014919\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cu_changes
         type: index
possible_keys: cuc_timestamp
          key: cuc_timestamp
      key_len: 16
          ref: NULL
         rows: 16846048
        Extra: Using where; Using index
1 row in set (0.00 sec)

mysql> explain select count(*) from cu_changes where cuc_timestamp BETWEEN '20110726014800' AND '20110726014919'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cu_changes
         type: range
possible_keys: cuc_timestamp
          key: cuc_timestamp
      key_len: 16
          ref: NULL
         rows: 167
        Extra: Using where; Using index
1 row in set (0.00 sec)
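One way to see why the unquoted form cannot do a range scan: assuming cuc_timestamp is stored as a string type, MySQL compares it with a numeric literal numerically, and numeric order does not in general match the string order that an index on the column is kept in. The Python sketch below only illustrates that mismatch with made-up values. It is not MySQL and does not touch cu_changes.

```python
# Made-up values for illustration; none of these come from cu_changes.
values = ["9", "10", "100", "20110726014800"]

# Order in which an index on a string column keeps its keys.
string_order = sorted(values)
# Order that a numeric comparison (unquoted literal) would need.
numeric_order = sorted(values, key=int)

print("string order: ", string_order)
print("numeric order:", numeric_order)
print("orders agree: ", string_order == numeric_order)

# Fixed-width timestamps sort the same either way, but the database
# cannot assume every value in a string column has that shape.
timestamps = ["20110726014919", "20110726014800"]
timestamps_agree = sorted(timestamps) == sorted(timestamps, key=int)
print("timestamps agree:", timestamps_agree)
```

Because the two orders can disagree, a numeric bound cannot be mapped to a contiguous slice of a string index. Quoting the timestamps keeps the comparison string-to-string, which is what lets the second EXPLAIN report a range scan.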
Comment 9 Tim Starling 2011-10-24 04:29:48 UTC
Deployed, now we just need to confirm that it gets run correctly from crond at midnight UTC.
Comment 10 Aaron Schulz 2011-10-25 22:40:13 UTC
(In reply to comment #9)
> Deployed, now we just need to confirm that it gets run correctly from crond at
> midnight UTC.

Log is getting populated.
