Last modified: 2014-02-12 23:35:37 UTC
WikiExporter (as called via the dump code on snapshot2) is selecting the entire revision table joined with page in a single query:

  SELECT /* WikiExporter::dumpFrom 127.0.0.1 */ /*! STRAIGHT_JOIN */ *
  FROM `page` FORCE INDEX (PRIMARY)
  INNER JOIN `revision` ON ((page_id=rev_page))
  ORDER BY page_id ASC

This results in very long-running transactions that have been impacting replication.
The fix is to batch up these queries so we're not selecting the whole table at once. The "difficult" part is deciding what condition to continue from, since the code has many paths...
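As a rough sketch of what batching could look like (the range size of 1000 and the @last_page_id variable are illustrative, not from the actual code): iterating over page_id ranges, rather than using LIMIT, keeps all of a page's revisions together in one batch:

  SELECT /*! STRAIGHT_JOIN */ *
  FROM `page` FORCE INDEX (PRIMARY)
  INNER JOIN `revision` ON ((page_id=rev_page))
  WHERE page_id > @last_page_id AND page_id <= @last_page_id + 1000
  ORDER BY page_id ASC

Then advance @last_page_id by the range size and repeat until past MAX(page_id). Each batch is a short transaction, so replication isn't held up.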
Separating this query into batches means you'll get an inconsistent dump, unless you use a transaction to keep the read state frozen... in which case you're probably not gaining anything. Of course, if nobody cares about consistency, we could possibly ignore new revisions added after the dump begins, and it would be "mostly consistent", except that things deleted during the dump might fully or partially disappear instead of staying in the snapshot. It probably still needs to read the whole 'page' table in a consistent transaction, though.
It's already the case that the stub dump may contain revisions that are deleted by the time the page content dumps roll around. In other cases we wind up missing category links for pages because we hit the window between the delete and the insert when the rows for that page are updated (see LinksUpdate::incrTableUpdate). I'd like us to move towards more consistency, not less, though. Any thoughts about how we might get there?
I'm not yet familiar with WikiExporter code, so apologies if any of this is way off base: Firstly, in the short term could it simply be done using dbXX Tampa slaves only? Long-running queries still hurt there, but less so. For more consistency we could perhaps use one depooled Tampa slave per cluster. STOP SLAVE, wait for SQL thread, dump data to taste, START SLAVE. Binlogs hang around for 60 days so the process can be slow.
Er, sorry, that should have been: STOP SLAVE IO thread, wait for SQL thread...
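Spelled out, the corrected sequence on a depooled slave would look something like the following sketch (the exact status fields to watch vary by MySQL/MariaDB version):

  STOP SLAVE IO_THREAD;
  -- poll SHOW SLAVE STATUS until the SQL thread has applied
  -- the remainder of the relay log:
  SHOW SLAVE STATUS\G
  -- run the dump queries at the now-frozen replication position
  START SLAVE IO_THREAD;

Stopping only the IO thread lets the SQL thread drain what it already has, giving a stable snapshot position while the master's binlogs accumulate for replay afterwards.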
Not Tampa slaves, as this production service has now moved to eqiad (September is the deadline for all production services, iirc). The longest-running dump is a toss-up between en wp and de wp: 12 days or so for en wp, and I don't know yet for de wiki as it's set up now. I'm definitely interested in looking into this possibility. Dumps are run automatically on a rolling basis, so I'm not sure how we'd handle depooling slaves unless there were one slave from each cluster dedicated to this purpose, except for en wp, where it would be depooled only for the two weeks the dump runs. Needs some thought...