Last modified: 2014-09-29 02:05:06 UTC
After the cleanup process, two essential databases are missing:
- p50380g50769__wvs2
- p50380g50769__wvs2ds
Please bring them back online soon.
Added another listener for testing; it worked fine until 2014-09-22 05:39:56. Now:

ERROR 1290 (HY000) at line 1: The MariaDB server is running with the --read-only option so it cannot execute this statement

So it seems that someone is definitely fiddling with things here, but still no word of explanation.
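For anyone hitting the same error, a minimal way to confirm that the server really is in read-only mode (this needs no special privileges; whether the flag was set deliberately here is unknown):

SHOW GLOBAL VARIABLES LIKE 'read_only';
-- or, equivalently:
SELECT @@global.read_only;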
Documenting some things (though supposedly re-attached & inactive).

Mon Sep 22 18:49:01 UTC 2014

+--------------------+------------------------+--------+
| Database           | Table                  | In_use |
+--------------------+------------------------+--------+
| p50380g50769__wvs2 | v_import_grok          |      0 |
| p50380g50769__wvs2 | topinfo                |      3 |
| p50380g50769__wvs2 | projectmap             |     10 |
| p50380g50769__wvs2 | catmap                 |      1 |
| p50380g50769__wvs2 | rawstats2              |      1 |
| p50380g50769__wvs2 | v_daystats_unique_grok |      0 |
| p50380g50769__wvs2 | v_rawstats3            |      0 |
| p50380g50769__wvs2 | tmptop_month           |      2 |
| p50380g50769__wvs2 | v_topstats             |      0 |
| p50380g50769__wvs2 | daystats2              |      1 |
| p50380g50769__wvs2 | xstate                 |      1 |
| p50380g50769__wvs2 | daystatsimp            |      2 |
| p50380g50769__wvs2 | v_topstats_dev         |      0 |
| p50380g50769__wvs2 | l10n                   |      1 |
| p50380g50769__wvs2 | rawstats3              |      3 |
| p50380g50769__wvs2 | v_daystats_unique      |      0 |
| p50380g50769__wvs2 | v_daystats             |      0 |
| p50380g50769__wvs2 | v_tmptop_day           |      0 |
| p50380g50769__wvs2 | xlog                   |      1 |
| p50380g50769__wvs2 | xconfig                |      1 |
| p50380g50769__wvs2 | import_status          |      1 |
| p50380g50769__wvs2 | filter                 |      3 |
| p50380g50769__wvs2 | tmp                    |      1 |
| p50380g50769__wvs2 | v_daystatsimp          |      0 |
| p50380g50769__wvs2 | rawstats1              |      1 |
| p50380g50769__wvs2 | import_grok            |      2 |
| p50380g50769__wvs2 | meta                   |      1 |
| p50380g50769__wvs2 | import_dumps           |      1 |
| p50380g50769__wvs2 | tmptop_day             |      2 |
| p50380g50769__wvs2 | v_tmptop_month         |      0 |
| p50380g50769__wvs2 | xcache                 |      1 |
| p50380g50769__wvs2 | pagemap                |     12 |
| p50380g50769__wvs2 | topstats               |      3 |
| p50380g50769__wvs2 | import_requests        |      1 |
| p50380g50769__wvs2 | v_rawstats3top         |      0 |
+--------------------+------------------------+--------+
35 rows in set (0.00 sec)

+----------------------+---------------+--------+
| Database             | Table         | In_use |
+----------------------+---------------+--------+
| p50380g50769__wvs2ds | xlog          |      1 |
| p50380g50769__wvs2ds | _xlog_v1      |      1 |
| p50380g50769__wvs2ds | daystats_grok |      2 |
| p50380g50769__wvs2ds | daystats2     |      3 |
+----------------------+---------------+--------+
4 rows in set (0.00 sec)
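For reference, the listings above were presumably produced with something like the following (an assumption; SHOW OPEN TABLES also prints a Name_locked column, which may have been trimmed here):

SHOW OPEN TABLES FROM p50380g50769__wvs2;
SHOW OPEN TABLES FROM p50380g50769__wvs2ds;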
The issue is known, and should correct itself once the database merge is complete.
Database is now accessible through c3 again, but unusable because of a continuing lock:

Waiting for table metadata lock | SELECT * FROM p50380g50769__wvs2.v_daystats Limit 10

+----------------------+---------------+--------+
| Database             | Table         | In_use |
+----------------------+---------------+--------+
| p50380g50769__wvs2ds | _xlog_v1      |      0 |
| p50380g50769__wvs2ds | daystats_grok |      0 |
| p50380g50769__wvs2ds | xlog          |      1 |
| p50380g50769__wvs2ds | daystats2     |      1 |
+----------------------+---------------+--------+

I'm not able to unlock it or to see what kind of process is holding this lock. What is that process?

After yesterday's IRC conversation, the DB suddenly changed to the state it should be in if unattached/unused:

+--------------------+------------------------+--------+
| Database           | Table                  | In_use |
+--------------------+------------------------+--------+
| p50380g50769__wvs2 | xcache                 |      0 |
| p50380g50769__wvs2 | v_daystatsimp          |      0 |
| p50380g50769__wvs2 | pagemap                |      0 |
| p50380g50769__wvs2 | daystats2              |      0 |
| p50380g50769__wvs2 | import_dumps           |      0 |
| p50380g50769__wvs2 | v_import_grok          |      0 |
| p50380g50769__wvs2 | rawstats2              |      0 |
| p50380g50769__wvs2 | tmp                    |      0 |
| p50380g50769__wvs2 | meta                   |      0 |
| p50380g50769__wvs2 | topstats               |      0 |
| p50380g50769__wvs2 | v_daystats_unique      |      0 |
| p50380g50769__wvs2 | v_topstats             |      0 |
| p50380g50769__wvs2 | projectmap             |      0 |
| p50380g50769__wvs2 | v_rawstats3            |      0 |
| p50380g50769__wvs2 | import_status          |      0 |
| p50380g50769__wvs2 | filter                 |      0 |
| p50380g50769__wvs2 | xstate                 |      0 |
| p50380g50769__wvs2 | topinfo                |      0 |
| p50380g50769__wvs2 | v_rawstats3top         |      0 |
| p50380g50769__wvs2 | xlog                   |      0 |
| p50380g50769__wvs2 | catmap                 |      0 |
| p50380g50769__wvs2 | v_tmptop_day           |      0 |
| p50380g50769__wvs2 | rawstats3              |      0 |
| p50380g50769__wvs2 | v_tmptop_month         |      0 |
| p50380g50769__wvs2 | tmptop_day             |      0 |
| p50380g50769__wvs2 | l10n                   |      0 |
| p50380g50769__wvs2 | v_daystats             |      0 |
| p50380g50769__wvs2 | xconfig                |      0 |
| p50380g50769__wvs2 | rawstats1              |      0 |
| p50380g50769__wvs2 | import_grok            |      0 |
| p50380g50769__wvs2 | import_requests        |      0 |
| p50380g50769__wvs2 | daystatsimp            |      0 |
| p50380g50769__wvs2 | v_daystats_unique_grok |      0 |
| p50380g50769__wvs2 | v_topstats_dev         |      0 |
| p50380g50769__wvs2 | tmptop_month           |      0 |
+--------------------+------------------------+--------+

Still not sure what was going on here, but as said, the current persistent lock is blocking DB usage.
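For the record, this is roughly what someone with the PROCESS privilege could run to identify the lock holder; ordinary Labs users only see their own sessions, so this is a sketch, not something I can execute myself:

SHOW FULL PROCESSLIST;

-- narrow down to sessions stuck on (or likely holding) metadata locks:
SELECT ID, USER, STATE, TIME, INFO
  FROM information_schema.PROCESSLIST
 WHERE STATE LIKE '%metadata lock%'
    OR TIME > 600;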
As part of this outage[1], p50380g50769__wvs2 and p50380g50769__wvs2ds had to be dumped and reloaded into a new DB instance. Together they are very large, and processing is taking days. The dump process adds table locks for consistency. Presently the reload is up to:

INSERT INTO `daystats2` VALUES ('2013-12-31' ...

[1] https://lists.wikimedia.org/pipermail/labs-l/2014-September/002946.html
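At the SQL level, the consistency locking during the dump looks roughly like this (a sketch; the actual dump tool and its options aren't stated here):

LOCK TABLES daystats2 READ;  -- read lock held while the table is dumped
-- ... table contents are read and written out as INSERT statements ...
UNLOCK TABLES;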
That's exactly what I feared! Coren wearing sackcloth and ashes would be appropriate. A simple announcement *in advance* would have done it, like it happened in an exemplary manner for s1 and s2. I know that it's a big database, and I also know it has been wiped out three(!) times in the past without any announcement/notice/apology... So my hope was: yeah, we've learned from this; hey lads, we're going to do some maintenance; you have a big database here (the biggest on the cluster); do this and that; there may be some downtime... None of these things happened. /me shakes head and is going to reply to this labs-l posting.

Back to the databases:
- I assume daystats2 is still loading (as you mentioned before); much data is still missing.
- I also assume p50380g50769__wvs2.pagemap is finished (no locks, no activity). It used to have ~190 M records:
  2014-09-19 04:04:15, Status: max pagemap, 189,651,138
  Currently it has 6818(!) records:

MariaDB [p50380g50769__wvs2]> select count(*) from pagemap;
+----------+
| count(*) |
+----------+
|     6818 |
+----------+

- I didn't perform any other consistency check yet, but as of now the whole database is in an inconsistent, and therefore unusable, state.
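A quick way to sanity-check the reload per table, assuming access to the new instance (TABLE_ROWS is only an estimate for InnoDB, so use an exact COUNT(*) where it matters):

SELECT TABLE_NAME, TABLE_ROWS
  FROM information_schema.TABLES
 WHERE TABLE_SCHEMA = 'p50380g50769__wvs2'
 ORDER BY TABLE_ROWS DESC;

-- exact count for the table in question; expected ~189.6 M when complete:
SELECT COUNT(*) FROM p50380g50769__wvs2.pagemap;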
We have a full backup of p50380g50769__wvs2 and p50380g50769__wvs2ds. The loading processes were paused and adjusted to avoid the blocking table locks, and to load each month of data in parallel. More info to come.
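To illustrate what loading each month in parallel might look like from the client side (the file names and the split-by-month dump layout are assumptions, not the actual procedure used):

-- session 1:
MariaDB [p50380g50769__wvs2]> source /srv/dumps/daystats2-2013-12.sql
-- session 2, running at the same time:
MariaDB [p50380g50769__wvs2]> source /srv/dumps/daystats2-2014-01.sql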
This finished loading over the weekend and should be back to normal. Could you double-check?