Last modified: 2014-09-29 02:05:06 UTC
After the cleanup process, two essential databases are missing:
- p50380g50769__wvs2
- p50380g50769__wvs2ds
Please bring them back online soon.
Added another listener for testing; it worked fine until 2014-09-22 05:39:56. Now:

ERROR 1290 (HY000) at line 1: The MariaDB server is running with the --read-only option so it cannot execute this statement

So it seems that someone is definitely fiddling with things here, but still no word of explanation.
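For anyone hitting the same error, a minimal way to confirm that the server really is in read-only mode (this needs no special privileges; whether the flag was set deliberately here is unknown):

SHOW GLOBAL VARIABLES LIKE 'read_only';
-- or, equivalently:
SELECT @@global.read_only;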
Documenting some things (though supposedly re-attached & inactive).

Mon Sep 22 18:49:01 UTC 2014

+--------------------+------------------------+--------+
| Database           | Table                  | In_use |
+--------------------+------------------------+--------+
| p50380g50769__wvs2 | v_import_grok          |      0 |
| p50380g50769__wvs2 | topinfo                |      3 |
| p50380g50769__wvs2 | projectmap             |     10 |
| p50380g50769__wvs2 | catmap                 |      1 |
| p50380g50769__wvs2 | rawstats2              |      1 |
| p50380g50769__wvs2 | v_daystats_unique_grok |      0 |
| p50380g50769__wvs2 | v_rawstats3            |      0 |
| p50380g50769__wvs2 | tmptop_month           |      2 |
| p50380g50769__wvs2 | v_topstats             |      0 |
| p50380g50769__wvs2 | daystats2              |      1 |
| p50380g50769__wvs2 | xstate                 |      1 |
| p50380g50769__wvs2 | daystatsimp            |      2 |
| p50380g50769__wvs2 | v_topstats_dev         |      0 |
| p50380g50769__wvs2 | l10n                   |      1 |
| p50380g50769__wvs2 | rawstats3              |      3 |
| p50380g50769__wvs2 | v_daystats_unique      |      0 |
| p50380g50769__wvs2 | v_daystats             |      0 |
| p50380g50769__wvs2 | v_tmptop_day           |      0 |
| p50380g50769__wvs2 | xlog                   |      1 |
| p50380g50769__wvs2 | xconfig                |      1 |
| p50380g50769__wvs2 | import_status          |      1 |
| p50380g50769__wvs2 | filter                 |      3 |
| p50380g50769__wvs2 | tmp                    |      1 |
| p50380g50769__wvs2 | v_daystatsimp          |      0 |
| p50380g50769__wvs2 | rawstats1              |      1 |
| p50380g50769__wvs2 | import_grok            |      2 |
| p50380g50769__wvs2 | meta                   |      1 |
| p50380g50769__wvs2 | import_dumps           |      1 |
| p50380g50769__wvs2 | tmptop_day             |      2 |
| p50380g50769__wvs2 | v_tmptop_month         |      0 |
| p50380g50769__wvs2 | xcache                 |      1 |
| p50380g50769__wvs2 | pagemap                |     12 |
| p50380g50769__wvs2 | topstats               |      3 |
| p50380g50769__wvs2 | import_requests        |      1 |
| p50380g50769__wvs2 | v_rawstats3top         |      0 |
+--------------------+------------------------+--------+
35 rows in set (0.00 sec)

+----------------------+---------------+--------+
| Database             | Table         | In_use |
+----------------------+---------------+--------+
| p50380g50769__wvs2ds | xlog          |      1 |
| p50380g50769__wvs2ds | _xlog_v1      |      1 |
| p50380g50769__wvs2ds | daystats_grok |      2 |
| p50380g50769__wvs2ds | daystats2     |      3 |
+----------------------+---------------+--------+
4 rows in set (0.00 sec)
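For reference, the listings above were presumably produced with something like the following (an assumption; SHOW OPEN TABLES also prints a Name_locked column, which may have been trimmed here):

SHOW OPEN TABLES FROM p50380g50769__wvs2;
SHOW OPEN TABLES FROM p50380g50769__wvs2ds;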
The issue is known, and should correct itself once the database merge is complete.
Database is now accessible through c3 again, but unusable because of a continuing lock:

Waiting for table metadata lock | SELECT * FROM p50380g50769__wvs2.v_daystats Limit 10

+----------------------+---------------+--------+
| Database             | Table         | In_use |
+----------------------+---------------+--------+
| p50380g50769__wvs2ds | _xlog_v1      |      0 |
| p50380g50769__wvs2ds | daystats_grok |      0 |
| p50380g50769__wvs2ds | xlog          |      1 |
| p50380g50769__wvs2ds | daystats2     |      1 |
+----------------------+---------------+--------+

I'm not able to unlock it or to see what kind of process is holding this lock. What is that process?

After yesterday's IRC conversation, the DB suddenly changed to the state it should be in if unattached/unused:

+--------------------+------------------------+--------+
| Database           | Table                  | In_use |
+--------------------+------------------------+--------+
| p50380g50769__wvs2 | xcache                 |      0 |
| p50380g50769__wvs2 | v_daystatsimp          |      0 |
| p50380g50769__wvs2 | pagemap                |      0 |
| p50380g50769__wvs2 | daystats2              |      0 |
| p50380g50769__wvs2 | import_dumps           |      0 |
| p50380g50769__wvs2 | v_import_grok          |      0 |
| p50380g50769__wvs2 | rawstats2              |      0 |
| p50380g50769__wvs2 | tmp                    |      0 |
| p50380g50769__wvs2 | meta                   |      0 |
| p50380g50769__wvs2 | topstats               |      0 |
| p50380g50769__wvs2 | v_daystats_unique      |      0 |
| p50380g50769__wvs2 | v_topstats             |      0 |
| p50380g50769__wvs2 | projectmap             |      0 |
| p50380g50769__wvs2 | v_rawstats3            |      0 |
| p50380g50769__wvs2 | import_status          |      0 |
| p50380g50769__wvs2 | filter                 |      0 |
| p50380g50769__wvs2 | xstate                 |      0 |
| p50380g50769__wvs2 | topinfo                |      0 |
| p50380g50769__wvs2 | v_rawstats3top         |      0 |
| p50380g50769__wvs2 | xlog                   |      0 |
| p50380g50769__wvs2 | catmap                 |      0 |
| p50380g50769__wvs2 | v_tmptop_day           |      0 |
| p50380g50769__wvs2 | rawstats3              |      0 |
| p50380g50769__wvs2 | v_tmptop_month         |      0 |
| p50380g50769__wvs2 | tmptop_day             |      0 |
| p50380g50769__wvs2 | l10n                   |      0 |
| p50380g50769__wvs2 | v_daystats             |      0 |
| p50380g50769__wvs2 | xconfig                |      0 |
| p50380g50769__wvs2 | rawstats1              |      0 |
| p50380g50769__wvs2 | import_grok            |      0 |
| p50380g50769__wvs2 | import_requests        |      0 |
| p50380g50769__wvs2 | daystatsimp            |      0 |
| p50380g50769__wvs2 | v_daystats_unique_grok |      0 |
| p50380g50769__wvs2 | v_topstats_dev         |      0 |
| p50380g50769__wvs2 | tmptop_month           |      0 |
+--------------------+------------------------+--------+

Still not sure what was going on here, but as said, the current persistent lock is blocking DB usage.
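For the record, this is roughly what someone with the PROCESS privilege could run to identify the lock holder; ordinary Labs users only see their own sessions, so this is a sketch, not something I can execute myself:

SHOW FULL PROCESSLIST;

-- narrow down to sessions stuck on (or likely holding) metadata locks:
SELECT ID, USER, STATE, TIME, INFO
  FROM information_schema.PROCESSLIST
 WHERE STATE LIKE '%metadata lock%'
    OR TIME > 600;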
As part of this outage[1], p50380g50769__wvs2 and p50380g50769__wvs2ds had to be dumped and reloaded into a new DB instance. Together they are very large, and processing is taking days. The dump process adds table locks for consistency. Presently the reload is up to:

INSERT INTO `daystats2` VALUES ('2013-12-31' ...

[1] https://lists.wikimedia.org/pipermail/labs-l/2014-September/002946.html
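At the SQL level, the consistency locking during the dump looks roughly like this (a sketch; the actual dump tool and its options aren't stated here):

LOCK TABLES daystats2 READ;  -- read lock held while the table is dumped
-- ... table contents are read and written out as INSERT statements ...
UNLOCK TABLES;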
That's exactly what I feared! Coren wearing sackcloth and ashes would be appropriate. A simple announcement *in advance* would have done it, like it happened in an exemplary manner for s1 and s2. I know that it's a big database, and I also know it has been wiped out three(!) times in the past without any announcement/notice/apology... So my hope was: yeah, we've learned from this; hey lads, we're going to do some maintenance; you have a big database here (the biggest on the cluster); do this and that; there may be some downtime... None of these things happened. /me shakes head and is going to reply to this labs-l posting.

Back to the databases:
- I assume daystats2 is still loading (as you mentioned before); much data is still missing.
- I also assume p50380g50769__wvs2.pagemap is finished (no locks, no activity). It used to have ~190 M records:
  2014-09-19 04:04:15, Status: max pagemap, 189,651,138
  Currently it has 6818(!) records:

MariaDB [p50380g50769__wvs2]> select count(*) from pagemap;
+----------+
| count(*) |
+----------+
|     6818 |
+----------+

- I didn't perform any other consistency check yet, but as of now the whole database is in an inconsistent, and therefore unusable, state.
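A quick way to sanity-check the reload per table, assuming access to the new instance (TABLE_ROWS is only an estimate for InnoDB, so use an exact COUNT(*) where it matters):

SELECT TABLE_NAME, TABLE_ROWS
  FROM information_schema.TABLES
 WHERE TABLE_SCHEMA = 'p50380g50769__wvs2'
 ORDER BY TABLE_ROWS DESC;

-- exact count for the table in question; expected ~189.6 M when complete:
SELECT COUNT(*) FROM p50380g50769__wvs2.pagemap;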
We have a full backup of p50380g50769__wvs2 and p50380g50769__wvs2ds. The loading processes were paused and adjusted to avoid the blocking table locks, and to load each month of data in parallel. More info to come.
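To illustrate what loading each month in parallel might look like from the client side (the file names and the split-by-month dump layout are assumptions, not the actual procedure used):

-- session 1:
MariaDB [p50380g50769__wvs2]> source /srv/dumps/daystats2-2013-12.sql
-- session 2, running at the same time:
MariaDB [p50380g50769__wvs2]> source /srv/dumps/daystats2-2014-01.sql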
This finished loading over the weekend and should be back to normal. Could you double-check?