Last modified: 2012-05-06 20:22:44 UTC
In order to get some insight as to which queries need fixing or need indices upstream, I think we should gather some statistics. Our install is probably among the biggest out there. Since most testswarm requests are very similar there is no need to keep the monitor up for long. I'd say keep it for about 2-3 minutes. Hashar suggested the following: #log_slow_queries= /var/log/mysql/mysql-slow.log #long_query_time = 2 #log-queries-not-using-indexes
Moving bug back in the pool
I have submitted a basic change to let us log slow queries on gallium host: https://gerrit.wikimedia.org/r/4400 Pending ops approval and merging.
Now that Testswarm is back in action, we will soon face database slowness issues. Asher, WMF performance engineer, already reviewed change. So it is only a matter of having them deployed on production.
Checking with Ops to find out a maintenance window to apply this change on the `gallium` server
The slow query logger was enabled on the ci server.
Maybe I should have made it such as the slow logs end up in a readable file. We will see later.
Following the deployment on gallium, here are some slow queries. The first two are the most frequents: SELECT run_id, client_id, useragent_id FROM run_client, clients WHERE DATE_ADD(run_client.updated, INTERVAL 5 MINUTE) < NOW() AND clients.id = client_id AND run_client.status = 1; UPDATE run_useragent SET runs=0, completed=0, status=0 WHERE runs=max AND NOT EXISTS (SELECT * FROM run_client, clients WHERE run_client.run_id=run_useragent.run_id AND run_client.client_id=clients.id AND clients.useragent_id=run_useragent.useragent_id); And there are a few that looks like: SELECT useragents.engine as engine, useragents.name as name, (SELECT COUNT(*) FROM clients WHERE useragent_id=useragents.id AND updated > DATE_SUB(NOW(), INTERVAL 1 minute)) as clients, (engine='msie' AND '9.0' REGEXP version) as found FROM useragents WHERE active=1 AND mobile='0' ORDER BY engine, name; That one was with 'msie' and '9.0'. SELECT useragents.engine as engine, useragents.name as name, (SELECT COUNT(*) FROM clients WHERE useragent_id=useragents.id AND updated > DATE_SUB(NOW(), INTERVAL 1 minute)) as clients, (engine='' AND '' REGEXP version) as found FROM useragents WHERE active=1 AND mobile='1' ORDER BY engine, name;
Query plan for first query: explain SELECT run_id, client_id, useragent_id FROM run_client, clients WHERE DATE_ADD(run_client.updated, INTERVAL 5 MINUTE) < NOW() AND clients.id = client_id AND run_client.status = 1 \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: run_client type: ALL possible_keys: idx_run_client_client_id key: NULL key_len: NULL ref: NULL rows: 537930 Extra: Using where *************************** 2. row *************************** id: 1 select_type: SIMPLE table: clients type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: testswarm.run_client.client_id rows: 1 Extra: 2 rows in set (0.00 sec) `run_client` table having 440k rows, only one of them having status == 1 right now. So maybe the`run_client` table could use an index on `status` ?
The first part of the second query (an UPDATE statement) could be simplified as: SELECT * FROM run_useragent WHERE runs=max; which has the following query plan: explain SELECT * FROM run_useragent WHERE runs=max \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: run_useragent type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 480929 Extra: Using where 1 row in set (0.00 sec) Maybe an index on (runs, max) ? The second part of the query use the `run_client` table too but uses no key: > explain SELECT * FROM run_client, run_useragent, clients WHERE run_client.run_id=run_useragent.run_id AND run_client.client_id=clients.id AND clients.useragent_id=run_useragent.useragent_id \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: run_client type: ALL possible_keys: PRIMARY,idx_run_client_run_id,idx_run_client_client_id key: NULL key_len: NULL ref: NULL rows: 428455 Extra: *************************** 2. row *************************** id: 1 select_type: SIMPLE table: clients type: eq_ref possible_keys: PRIMARY,idx_clients_useragent_id key: PRIMARY key_len: 4 ref: testswarm.run_client.client_id rows: 1 Extra: *************************** 3. row *************************** id: 1 select_type: SIMPLE table: run_useragent type: eq_ref possible_keys: PRIMARY,idx_run_useragent_run_id,idx_run_useragent_useragent_id key: PRIMARY key_len: 8 ref: testswarm.run_client.run_id,testswarm.clients.useragent_id rows: 1 Extra: 3 rows in set (0.00 sec)
I guess you want to talk about the above queries with the jQuery team and close this bug report meanwhile. The slow query log is enabled on gallium so any op could provide a new trace from /var/lib/mysql/gallium-slow.log .
A lot of stuff has been rewritten in 1.0.0 so we may want to get back to this later.
Marking fixed for now.