Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links may be broken. See T73043, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 71043 - Replica MySQL: Wiki ViewStats databases completely missing!
Status: ASSIGNED
Product: Wikimedia Labs
Classification: Unclassified
Component: Infrastructure (Other open bugs)
Version: unspecified
Hardware: All
OS: All
Importance: Unprioritized critical
Target Milestone: ---
Assigned To: Sean Pringle
Depends on:
Blocks:

Reported: 2014-09-19 17:04 UTC by metatron
Modified: 2014-09-29 02:05 UTC
CC: 5 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description metatron 2014-09-19 17:04:38 UTC
After the cleanup process, two essential databases are missing:
- p50380g50769__wvs2
- p50380g50769__wvs2ds
Please bring them back online soon.
Comment 1 metatron 2014-09-22 18:15:13 UTC
Added another listener for testing; it worked fine until 2014-09-22 05:39:56.
Now:
ERROR 1290 (HY000) at line 1: The MariaDB server is running with the --read-only option so it cannot execute this statement

So it seems that someone is definitely fiddling here, but still no word of explanation.
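For reference, the replica's read-only state can be confirmed from any console session (standard MariaDB server variables, nothing specific to this host):

SHOW GLOBAL VARIABLES LIKE 'read_only';
-- or equivalently:
SELECT @@global.read_only;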
Comment 2 metatron 2014-09-22 18:51:28 UTC
Documenting some things (though supposedly re-attached and inactive).
Mon Sep 22 18:49:01 UTC 2014

+--------------------+------------------------+--------+
| Database           | Table                  | In_use |
+--------------------+------------------------+--------+
| p50380g50769__wvs2 | v_import_grok          |      0 |
| p50380g50769__wvs2 | topinfo                |      3 |
| p50380g50769__wvs2 | projectmap             |     10 |
| p50380g50769__wvs2 | catmap                 |      1 |
| p50380g50769__wvs2 | rawstats2              |      1 |
| p50380g50769__wvs2 | v_daystats_unique_grok |      0 |
| p50380g50769__wvs2 | v_rawstats3            |      0 |
| p50380g50769__wvs2 | tmptop_month           |      2 |
| p50380g50769__wvs2 | v_topstats             |      0 |
| p50380g50769__wvs2 | daystats2              |      1 |
| p50380g50769__wvs2 | xstate                 |      1 |
| p50380g50769__wvs2 | daystatsimp            |      2 |
| p50380g50769__wvs2 | v_topstats_dev         |      0 |
| p50380g50769__wvs2 | l10n                   |      1 |
| p50380g50769__wvs2 | rawstats3              |      3 |
| p50380g50769__wvs2 | v_daystats_unique      |      0 |
| p50380g50769__wvs2 | v_daystats             |      0 |
| p50380g50769__wvs2 | v_tmptop_day           |      0 |
| p50380g50769__wvs2 | xlog                   |      1 |
| p50380g50769__wvs2 | xconfig                |      1 |
| p50380g50769__wvs2 | import_status          |      1 |
| p50380g50769__wvs2 | filter                 |      3 |
| p50380g50769__wvs2 | tmp                    |      1 |
| p50380g50769__wvs2 | v_daystatsimp          |      0 |
| p50380g50769__wvs2 | rawstats1              |      1 |
| p50380g50769__wvs2 | import_grok            |      2 |
| p50380g50769__wvs2 | meta                   |      1 |
| p50380g50769__wvs2 | import_dumps           |      1 |
| p50380g50769__wvs2 | tmptop_day             |      2 |
| p50380g50769__wvs2 | v_tmptop_month         |      0 |
| p50380g50769__wvs2 | xcache                 |      1 |
| p50380g50769__wvs2 | pagemap                |     12 |
| p50380g50769__wvs2 | topstats               |      3 |
| p50380g50769__wvs2 | import_requests        |      1 |
| p50380g50769__wvs2 | v_rawstats3top         |      0 |
+--------------------+------------------------+--------+
35 rows in set (0.00 sec) 

+----------------------+---------------+--------+
| Database             | Table         | In_use |
+----------------------+---------------+--------+
| p50380g50769__wvs2ds | xlog          |      1 |
| p50380g50769__wvs2ds | _xlog_v1      |      1 |
| p50380g50769__wvs2ds | daystats_grok |      2 |
| p50380g50769__wvs2ds | daystats2     |      3 |
+----------------------+---------------+--------+
4 rows in set (0.00 sec)
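Listings like the above match the output format of SHOW OPEN TABLES, which is presumably how they were produced (same database names assumed):

SHOW OPEN TABLES FROM p50380g50769__wvs2;
-- or, restricted to tables currently in use:
SHOW OPEN TABLES FROM p50380g50769__wvs2 WHERE In_use > 0;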
Comment 3 Marc A. Pelletier 2014-09-22 19:11:51 UTC
The issue is known, and should correct itself once the database merge is complete.
Comment 4 metatron 2014-09-23 19:04:58 UTC
The database is now accessible through c3 again, but unusable because of a continuing lock.

 Waiting for table metadata lock | SELECT * FROM p50380g50769__wvs2.v_daystats Limit 10 

+----------------------+---------------+--------+
| Database             | Table         | In_use |
+----------------------+---------------+--------+
| p50380g50769__wvs2ds | _xlog_v1      |      0 |
| p50380g50769__wvs2ds | daystats_grok |      0 |
| p50380g50769__wvs2ds | xlog          |      1 |
| p50380g50769__wvs2ds | daystats2     |      1 |
+----------------------+---------------+--------+

I'm not able to unlock it or to see what kind of process is holding this lock.
What process is that?
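For reference, sessions stuck behind a metadata lock show up in the processlist; the lock holders themselves are only visible if the metadata_lock_info plugin happens to be loaded on the replica:

SHOW FULL PROCESSLIST;
-- only works with the metadata_lock_info plugin installed:
SELECT * FROM information_schema.METADATA_LOCK_INFO;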

After yesterday's IRC conversation, the db suddenly changed to the state it should be in if unattached/unused:

+--------------------+------------------------+--------+
| Database           | Table                  | In_use | 
+--------------------+------------------------+--------+
| p50380g50769__wvs2 | xcache                 |      0 | 
| p50380g50769__wvs2 | v_daystatsimp          |      0 | 
| p50380g50769__wvs2 | pagemap                |      0 | 
| p50380g50769__wvs2 | daystats2              |      0 | 
| p50380g50769__wvs2 | import_dumps           |      0 | 
| p50380g50769__wvs2 | v_import_grok          |      0 | 
| p50380g50769__wvs2 | rawstats2              |      0 | 
| p50380g50769__wvs2 | tmp                    |      0 | 
| p50380g50769__wvs2 | meta                   |      0 | 
| p50380g50769__wvs2 | topstats               |      0 | 
| p50380g50769__wvs2 | v_daystats_unique      |      0 | 
| p50380g50769__wvs2 | v_topstats             |      0 | 
| p50380g50769__wvs2 | projectmap             |      0 | 
| p50380g50769__wvs2 | v_rawstats3            |      0 | 
| p50380g50769__wvs2 | import_status          |      0 | 
| p50380g50769__wvs2 | filter                 |      0 | 
| p50380g50769__wvs2 | xstate                 |      0 | 
| p50380g50769__wvs2 | topinfo                |      0 | 
| p50380g50769__wvs2 | v_rawstats3top         |      0 | 
| p50380g50769__wvs2 | xlog                   |      0 | 
| p50380g50769__wvs2 | catmap                 |      0 | 
| p50380g50769__wvs2 | v_tmptop_day           |      0 | 
| p50380g50769__wvs2 | rawstats3              |      0 | 
| p50380g50769__wvs2 | v_tmptop_month         |      0 | 
| p50380g50769__wvs2 | tmptop_day             |      0 | 
| p50380g50769__wvs2 | l10n                   |      0 | 
| p50380g50769__wvs2 | v_daystats             |      0 | 
| p50380g50769__wvs2 | xconfig                |      0 | 
| p50380g50769__wvs2 | rawstats1              |      0 | 
| p50380g50769__wvs2 | import_grok            |      0 | 
| p50380g50769__wvs2 | import_requests        |      0 | 
| p50380g50769__wvs2 | daystatsimp            |      0 | 
| p50380g50769__wvs2 | v_daystats_unique_grok |      0 | 
| p50380g50769__wvs2 | v_topstats_dev         |      0 | 
| p50380g50769__wvs2 | tmptop_month           |      0 | 
+--------------------+------------------------+--------+

Still not sure what was going on here but, as said, the current persistent lock is blocking db usage.
Comment 5 Sean Pringle 2014-09-24 00:25:39 UTC
As part of this outage[1], p50380g50769__wvs2 and p50380g50769__wvs2ds had to be dumped and reloaded into a new db instance. Together they are really big and taking days to process. The dump process adds table locks for consistency.

Presently up to:

INSERT INTO `daystats2` VALUES ('2013-12-31' ...

[1] https://lists.wikimedia.org/pipermail/labs-l/2014-September/002946.html
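For context - this describes general dump behavior, not necessarily the exact tooling used here - a locking dump holds table read locks while each table is dumped, so writes to those tables block for the duration; for InnoDB tables the usual way to avoid the locks is a consistent snapshot instead:

-- what e.g. mysqldump --single-transaction does at the SQL level:
START TRANSACTION WITH CONSISTENT SNAPSHOT;
-- ... the dump's SELECTs run inside this snapshot; no table locks are taken ...
COMMIT;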
Comment 6 metatron 2014-09-24 18:04:56 UTC
That's exactly what I feared! Coren wearing sackcloth and ashes - that's what's called for. A simple announcement *in advance* would have done it - as happened in an exemplary manner for s1 and s2.
I know that it's a big database, and I also know it has been wiped out three(!) times in the past without any announcement/notice/apology...
So my hope was: yeah, we've learned from this; hey lads, we're going to do some maintenance; you have a big database here (the biggest on the cluster); do this and that; there may be some downtime... None of these things happened.

/me shakes head and is going to reply to this labs-l posting.

Back to the databases:
- I assume daystats2 is still loading (as you mentioned before); much data is still missing

- I also assume p50380g50769__wvs2.pagemap is finished (no locks, no activity).
It used to have ~190 M records:
    2014-09-19 04:04:15, Status: max pagemap, 189,651,138
Currently it has 6,818(!) records:
MariaDB [p50380g50769__wvs2]> select count(*) from pagemap;
+----------+
| count(*) |
+----------+
|     6818 |
+----------+

- I didn't perform any other consistency checks yet, but as of now the whole database is in an inconsistent - and therefore unusable - state.
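A quick way to eyeball the remaining tables without a full check, assuming InnoDB (TABLE_ROWS is only an estimate there, but gaps of this magnitude show up clearly):

SELECT TABLE_NAME, TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'p50380g50769__wvs2'
ORDER BY TABLE_ROWS DESC;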
Comment 7 Sean Pringle 2014-09-25 07:53:39 UTC
We have a full backup of p50380g50769__wvs2 and p50380g50769__wvs2ds. The loading processes were paused and adjusted to avoid the blocking table locks, and to load each month of data in parallel. More info to come.
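A hypothetical sketch of what such a parallel, non-blocking reload could look like - one session per month of data, plain inserts without LOCK TABLES; the file path and format here are purely illustrative:

-- one of several concurrent per-month loader sessions (illustrative file name):
LOAD DATA LOCAL INFILE '/tmp/daystats2-2013-12.tsv'
INTO TABLE p50380g50769__wvs2.daystats2;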
Comment 8 Sean Pringle 2014-09-29 02:05:06 UTC
This finished loading over the weekend and should be back to normal. Double check?
