Last modified: 2014-08-08 12:16:36 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T71110, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 69110 - Database upgrade MariaDB 10: 600 seconds timeout
Status: NEW
Product: Wikimedia Labs
Classification: Unclassified
Component: Infrastructure
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Sean Pringle
Reported: 2014-08-04 17:27 UTC by Incola
Modified: 2014-08-08 12:16 UTC

Description Incola 2014-08-04 17:27:32 UTC
I'm one of the maintainers of the Lists tool: http://tools.wmflabs.org/lists/

This tool executes a series of queries every day. After each query, it runs a second query to record statistics about the run.

If the primary query runs for more than 600 seconds, the secondary one fails with the error "General error: 2006 MySQL server has gone away".

This issue began after the migration to MariaDB 10.
Comment 1 Andre Klapper 2014-08-04 18:52:57 UTC
Duplicate of bug 68753?
Comment 2 Incola 2014-08-04 20:53:58 UTC
I've found that the problem is related to the primary queries: some of them are now slower than before and exceed the 600-second limit.

For example, the query [1] used to run in 170 seconds and now takes more than 1000 seconds.

[1] http://tools.wmflabs.org/lists/itwiki/Voci/Voci_senza_uscita
Comment 3 Sean Pringle 2014-08-05 00:54:58 UTC
Please post examples of both queries.
Comment 4 Incola 2014-08-05 11:40:36 UTC
The query is at the bottom of the previous link. The query that fails is not important; the problem is that this query takes too long to run.
Comment 5 Sean Pringle 2014-08-05 13:05:55 UTC
I realize you think the speed is the problem, which I agree is an issue. However, there is no "over 600s" type kill mechanism, so I'm interested in establishing two things:

1. Why the first query is slow. Thanks, I see the example now.

2. Why the second query dies and whether it is, in fact, related to the speed of the first query, or to something else unexpected. Hence I asked to see it too...
Comment 6 Andre Klapper 2014-08-05 13:10:03 UTC
Incola: "not important" doesn't really exist when trying to find steps to reproduce. :)
Comment 7 Incola 2014-08-05 14:18:14 UTC
The second query is something like:

insert into `executions` (`query_id`, `time`, `duration`, `results`) values (23, '2014-08-05 14:01:05', 1879, 5290)
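
A minimal sketch of how such a statistics row could be written with bound parameters, so that values never need manual quoting. This is not the tool's actual code: the function name `record_execution`, the column types, and the use of an in-memory SQLite database as a stand-in for the real server are all assumptions made for illustration. Only the table and column names come from the statement above.

```python
import sqlite3


def record_execution(conn, query_id, started_at, duration_s, result_count):
    """Insert one row of run statistics using bound parameters (hypothetical helper)."""
    conn.execute(
        "INSERT INTO executions (query_id, time, duration, results) "
        "VALUES (?, ?, ?, ?)",
        (query_id, started_at, duration_s, result_count),
    )
    conn.commit()


# In-memory stand-in for the tool's real database (assumption).
conn = sqlite3.connect(":memory:")
# Column types are guesses; the thread does not show the real schema.
conn.execute(
    "CREATE TABLE executions ("
    "query_id INTEGER, time TEXT, duration INTEGER, results INTEGER)"
)

record_execution(conn, 23, "2014-08-05 14:01:05", 1879, 5290)
rows = conn.execute(
    "SELECT query_id, time, duration, results FROM executions"
).fetchall()
```

With a real MySQL or MariaDB driver the placeholder syntax may differ (many Python drivers use `%s` instead of `?`), but the idea of passing values separately from the SQL text is the same.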
Comment 8 Sean Pringle 2014-08-06 12:46:53 UTC
The first query is heavily dependent on disk I/O. It runs in ~1000s on both MariaDB 10 and 5.5 if data is cold, or if any other concurrent query is also bottlenecked on disk. This should be reviewed once the switch back to SSD is done (to be scheduled very shortly after labsdb1003 migrates).

Regarding the second query dying or losing connection, which still seems odd, it would be useful to know:

- If the first query always completes regardless of slow runtime, or sometimes fails/is-killed itself.

- If there is any delay between issuing the two queries on the same DB connection (seconds, minutes, etc ..).

- If there is any transaction in use, either via explicit BEGIN or autocommit=0.

- What client connector or library is used, and whether it could have any custom timeout settings.
Comment 9 Incola 2014-08-06 15:34:13 UTC
- The first query always runs correctly.

- They are on different connections.

- I don't know, because I'm not the original author of the code and I don't know how the framework that was used works.

- The first query runs via a shell command invoked by a PHP script, the second one via the PHP script directly. The script is this one: https://git.wikimedia.org/blob/labs%2Ftools%2Flists/d291a438ef6e1aa0e4630d501cd9a28bedb014cc/app%2Fcommands%2FExecCrontab.php
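
The thread does not establish why the second query loses its connection. One hypothesis consistent with the answers above is that the script's own connection sits idle while the shell command runs and is closed by the server before the statistics insert is issued. The sketch below simulates that situation and a reconnect-and-retry workaround. Everything in it is an assumption for illustration: the 600-second idle limit, the `FakeConnection` class, the `ServerGoneAway` exception, and the `execute_with_reconnect` helper are hypothetical and are not taken from the tool or from any MySQL client library.

```python
class ServerGoneAway(Exception):
    """Stand-in for client error 2006 (hypothetical exception type)."""


class FakeConnection:
    """Simulated connection that is dropped after idle_timeout seconds of inactivity."""

    def __init__(self, clock, idle_timeout=600):
        self.clock = clock
        self.idle_timeout = idle_timeout  # assumed value, not confirmed in the thread
        self.last_used = clock()

    def execute(self, sql):
        now = self.clock()
        if now - self.last_used > self.idle_timeout:
            raise ServerGoneAway("2006 MySQL server has gone away")
        self.last_used = now
        return "ok"


def execute_with_reconnect(conn, connect, sql):
    """Run sql; if the connection was dropped, open a fresh one and retry once."""
    try:
        return conn, conn.execute(sql)
    except ServerGoneAway:
        fresh = connect()
        return fresh, fresh.execute(sql)


# Simulated clock so the example finishes instantly.
now = [0.0]


def clock():
    return now[0]


def connect():
    return FakeConnection(clock)


stats_conn = connect()   # opened before the long primary query starts
now[0] += 1000           # the primary query runs for about 1000 s elsewhere

try:
    stats_conn.execute("INSERT INTO executions ...")  # placeholder statement
    failed = False
except ServerGoneAway:
    failed = True

# The same statement succeeds once a fresh connection is opened on failure.
stats_conn, outcome = execute_with_reconnect(
    stats_conn, connect, "INSERT INTO executions ..."
)
```

An alternative with the same effect would be to open the statistics connection only after the primary query has finished, so that it is never idle for long.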
Comment 10 Incola 2014-08-08 12:16:36 UTC
After switching back to SSD no error is reported and the queries are run with their previous timing.
