Last modified: 2012-05-06 20:22:44 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T37028, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 35028 - Get some statistics about testswarm db performance
Get some statistics about testswarm db performance
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Continuous integration (Other open bugs)
unspecified
All All
: High enhancement (vote)
: ---
Assigned To: Antoine "hashar" Musso (WMF)
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-03-07 08:50 UTC by Krinkle
Modified: 2012-05-06 20:22 UTC (History)
3 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Krinkle 2012-03-07 08:50:10 UTC
In order to get some insight as to which queries need fixing or need indices upstream, I think we should gather some statistics.

Our install is probably among the biggest out there.

Since most testswarm requests are very similar there is no need to keep the monitor up for long. I'd say keep it for about 2-3 minutes.

Hashar suggested the following:

#log_slow_queries= /var/log/mysql/mysql-slow.log
#long_query_time = 2
#log-queries-not-using-indexes
Comment 1 Antoine "hashar" Musso (WMF) 2012-03-16 08:03:31 UTC
Moving bug back in the pool
Comment 2 Antoine "hashar" Musso (WMF) 2012-04-06 13:29:35 UTC
I have submitted a basic change to let us log slow queries on gallium host:

https://gerrit.wikimedia.org/r/4400

Pending ops approval and merging.
Comment 3 Antoine "hashar" Musso (WMF) 2012-04-25 07:32:28 UTC
Now that Testswarm is back in action, we will soon face database slowness issues.  Asher, WMF performance engineer, already reviewed change. So it is only a matter of having them deployed on production.
Comment 4 Antoine "hashar" Musso (WMF) 2012-04-25 07:38:39 UTC
Checking with Ops to find out a maintenance window to apply this change on the `gallium` server
Comment 5 Krinkle 2012-04-26 19:44:17 UTC
The slow query logger was enabled on the ci server.
Comment 6 Antoine "hashar" Musso (WMF) 2012-04-27 16:20:06 UTC
Maybe I should have made it such as the slow logs end up in a readable file.  We will see later.
Comment 7 Antoine "hashar" Musso (WMF) 2012-05-02 14:15:11 UTC
Following the deployment on gallium, here are some slow queries. The first two are the most frequents:

 SELECT run_id, client_id, useragent_id FROM run_client, clients WHERE DATE_ADD(run_client.updated, INTERVAL 5 MINUTE) < NOW() AND clients.id = client_id AND run_client.status = 1;

 UPDATE run_useragent SET runs=0, completed=0, status=0 WHERE runs=max AND NOT EXISTS (SELECT * FROM run_client, clients WHERE run_client.run_id=run_useragent.run_id AND run_client.client_id=clients.id AND clients.useragent_id=run_useragent.useragent_id);


And there are a few that looks like:

SELECT useragents.engine as engine, useragents.name as name, (SELECT COUNT(*) FROM clients WHERE useragent_id=useragents.id AND updated > DATE_SUB(NOW(), INTERVAL 1 minute)) as clients, (engine='msie' AND '9.0' REGEXP version) as found FROM useragents WHERE active=1 AND mobile='0' ORDER BY engine, name;

That one was with 'msie' and '9.0'.


SELECT useragents.engine as engine, useragents.name as name, (SELECT COUNT(*) FROM clients WHERE useragent_id=useragents.id AND updated > DATE_SUB(NOW(), INTERVAL 1 minute)) as clients, (engine='' AND '' REGEXP version) as found FROM useragents WHERE active=1 AND mobile='1' ORDER BY engine, name;
Comment 8 Antoine "hashar" Musso (WMF) 2012-05-02 14:21:19 UTC
Query plan for first query:

explain SELECT run_id, client_id, useragent_id FROM run_client, clients WHERE DATE_ADD(run_client.updated, INTERVAL 5 MINUTE) < NOW() AND clients.id = client_id AND run_client.status = 1 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: run_client
         type: ALL
possible_keys: idx_run_client_client_id
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 537930
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: clients
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: testswarm.run_client.client_id
         rows: 1
        Extra: 
2 rows in set (0.00 sec)


`run_client` table having 440k rows, only one of them having status == 1 right now. So maybe the`run_client` table could use an index on `status` ?
Comment 9 Antoine "hashar" Musso (WMF) 2012-05-02 14:31:54 UTC
The first part of the second query (an UPDATE statement) could be simplified as:

SELECT * FROM run_useragent WHERE runs=max; which has the following query plan:

explain SELECT * FROM run_useragent WHERE runs=max \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: run_useragent
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 480929
        Extra: Using where
1 row in set (0.00 sec)

Maybe an index on (runs, max) ?



The second part of the query use the `run_client` table too but uses no key:


> explain SELECT * FROM run_client, run_useragent, clients WHERE run_client.run_id=run_useragent.run_id AND run_client.client_id=clients.id AND clients.useragent_id=run_useragent.useragent_id \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: run_client
         type: ALL
possible_keys: PRIMARY,idx_run_client_run_id,idx_run_client_client_id
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 428455
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: clients
         type: eq_ref
possible_keys: PRIMARY,idx_clients_useragent_id
          key: PRIMARY
      key_len: 4
          ref: testswarm.run_client.client_id
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: run_useragent
         type: eq_ref
possible_keys: PRIMARY,idx_run_useragent_run_id,idx_run_useragent_useragent_id
          key: PRIMARY
      key_len: 8
          ref: testswarm.run_client.run_id,testswarm.clients.useragent_id
         rows: 1
        Extra: 
3 rows in set (0.00 sec)
Comment 10 Antoine "hashar" Musso (WMF) 2012-05-02 14:37:16 UTC
I guess you want to talk about the above queries with the jQuery team and close this bug report meanwhile.

The slow query log is enabled on gallium so any op could provide a new trace from /var/lib/mysql/gallium-slow.log .
Comment 11 Krinkle 2012-05-02 15:04:03 UTC
A lot of stuff has been rewritten in 1.0.0 so we may want to get back to this later.
Comment 12 Krinkle 2012-05-06 20:22:34 UTC
Marking fixed for now.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links