Last modified: 2014-09-29 16:24:40 UTC
We just had an issue where hung queries from the Web API were causing massive lag on one of the Wikidata servers. Example query snippet: --- SELECT /* ApiQueryAllUsers::execute */ ipb_deleted,COUNT(*) AS recentedits,user_name --- Lag as reported by <https://www.wikidata.org/w/api.php?action=query&meta=siteinfo&siprop=dbrepllag&sishowalldb=> was high: --- <?xml version="1.0"?> <api> <query> <dbrepllag> <db host="db1058" lag="0" /> <db host="db1005" lag="0" /> <db host="db1026" lag="26432" /> <db host="db1021" lag="0" /> </dbrepllag> </query> </api> --- Server log entry about killing the queries: <https://wikitech.wikimedia.org/w/index.php?title=Server_Admin_Log&diff=80744&oldid=80742&diffonly=1>. As soon as the queries were killed, lag decreased. ApiQueryAllUsers needs investigation.
Do we actually know what the problem query string was? The COUNT(*) AS recentedits looks slightly suspicious Depending on how bad this is, it might be worth disabling the module on wikidata for the time being.
For now I have a pt-kill job running to snipe these queries on s5 slaves after 10 mins. Will gather some examples.
https://gerrit.wikimedia.org/r/#/c/79761/
No more queries sniped yet. However, the example wikidata query in the comment 3 gerrit changeset showed MySQL changing execution plans between wikis due to quite different index cardinality values. ANALYZE TABLE helped there, so this might be a maintenance issue.
When did you do the ANALYZE TABLE? I was already seeing the sane results yesterday on wikidatawiki, using sql.php from terbium.
ANALYZE ran ~1h before comment 4. Were you connecting to the wikidatawiki master? Or one of the slaves?
Master, apparently, since I didn't specify the "slave" option to sql.php (in fact, I didn't know about that option until just now when I checked the code to find out what it did; I'll have to remember to use that in the future).
The query itself is listed in https://gerrit.wikimedia.org/r/#/c/79761/ but that patch is abandoned. Is there anything further to be done in this ticket? Or is this WORKSFORME now?
Is there anything further to be done in this ticket, as the patch is abandoned? Or is this WORKSFORME now?
Let's WORKSFORME this one and chalk it up to a bad query plan per comment 4. People can reopen if it's still a problem.