Last modified: 2013-12-19 13:15:57 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T59642, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 57642 - Replicated database of dewiki is corrupted
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: tools
Version: unspecified
Hardware: All All
Importance: Highest critical
Target Milestone: ---
Assigned To: Marc A. Pelletier
Duplicates: 57645
Depends on:
Blocks:
 
Reported: 2013-11-27 10:05 UTC by Liangent
Modified: 2013-12-19 13:15 UTC
CC List: 13 users

Description Liangent 2013-11-27 10:05:24 UTC
max(rc_timestamp) stuck at 20131126202307.
Comment 1 Liangent 2013-11-27 11:15:08 UTC
Looks like http://lists.wikimedia.org/pipermail/labs-l/2013-November/001883.html but I can't find another bug filed for this.
Comment 2 Christian Thiele 2013-11-28 18:17:44 UTC
Replication of wikidatawiki and dewiki started again yesterday, but then stopped again.

max(rc_timestamp) is 20131127003303 for both wikidatawiki and dewiki.
Comment 3 jeremyb 2013-11-29 01:50:08 UTC
*** Bug 57645 has been marked as a duplicate of this bug. ***
Comment 4 Christian Thiele 2013-11-30 14:53:33 UTC
Replication for dewiki seems to be working now, but two thirds of all revisions are still missing.

SELECT count(*) FROM revision;
-> 41,249,221 (Special:Statistics: 130,634,662)

SELECT count(*) FROM page WHERE page_namespace=0;
-> 2,777,194 (~correct)

SELECT count(*) FROM page, revision WHERE page_latest=rev_id AND page_namespace=0;
-> 374,962 (!)
Comment 5 Liangent 2013-12-01 12:22:28 UTC
(In reply to comment #4)
> Replication for dewiki seems to be working now, but two thirds of all
> revisions are still missing.

wikidatawiki does not seem to be affected.

MariaDB [wikidatawiki_p]> SELECT count(*) FROM page WHERE page_namespace=0;
+----------+
| count(*) |
+----------+
| 14117708 |
+----------+
1 row in set (3.12 sec)

MariaDB [wikidatawiki_p]> SELECT count(*) FROM page, revision WHERE page_latest=rev_id AND page_namespace=0;
+----------+
| count(*) |
+----------+
| 14117708 |
+----------+
1 row in set (6 min 46.84 sec)
Comment 6 Yellowcard 2013-12-02 12:46:22 UTC
Many tools on Tool Labs are still broken due to this bug. Please fix it as soon as possible.

See also discussion in de.wikipedia:
https://de.wikipedia.org/wiki/Wikipedia_Diskussion:Kurier#Toolserver.2FLabs-Probleme

I changed the priority back to "Highest", as a fix within "one to six months" is way too slow. The bug currently makes Tool Labs unusable for German tool users (and coders).
Comment 7 Erik Moeller 2013-12-02 18:23:37 UTC
Hi Andre, when you notice database issues, please CC Sean Pringle for investigation (doing so now).
Comment 8 Sean Pringle 2013-12-02 23:40:27 UTC
The issue is on labsdb1002 and is a flow-on effect from this incident http://lists.wikimedia.org/pipermail/labs-l/2013-November/001883.html . The labsdb1002:3308 dewiki.revision table is still being synced from upstream by pt-table-sync and it is affecting replication. Context:

- Originally replication was stopped completely and a full dump/restore from upstream dewiki was done, however labsdb1002:3308 mysqld crashed in the process (see below). The revision table was only partially restored.

- To avoid blatting labs user data with a full rebuild affecting all wikis, I switched to using pt-table-sync with replication on the weekend to bring revision back up to full row count.

However, labsdb1002 has since crashed again, with the kernel OOM killer sniping mysqld:3308. The sync process is batched and has a low footprint (which the dump method did not), but other labsdb transactions must still be slowed down enough to add up to an occasional memory usage spike.

Therefore yesterday I reduced the InnoDB buffer pool size for all three labsdb1002 mysqld instances by 25%. OOM killer has not struck since and based on row counts the dewiki.revision sync process should resolve within the next 12h.
Comment 9 Giftpflanze 2013-12-03 17:47:23 UTC
Will you also repair the externallinks table, which is incomplete as well?
Comment 10 metatron 2013-12-03 22:10:28 UTC
Current water line:

MariaDB [dewiki_p]> SELECT count(*) FROM revision union SELECT count(*) FROM archive;
+-----------+
| count(*)  |
+-----------+
| 114264012 |
|  10414140 |
+-----------+
2 rows in set (26.56 sec)

So sum is:              124,678,152
API site stats report:  130,742,732
Difference:              −6,064,580

Has replication already finished? If yes, how can this difference be explained?
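
The arithmetic above can be re-checked with a short Python sketch. The variable names are illustrative; the figures are the ones quoted in this comment.

```python
# Figures quoted in the comment above (dewiki_p replica vs. API site stats).
revision_rows = 114_264_012       # SELECT count(*) FROM revision
archive_rows = 10_414_140         # SELECT count(*) FROM archive
api_reported_total = 130_742_732  # value reported by the API site stats

replica_total = revision_rows + archive_rows
difference = replica_total - api_reported_total
missing_share = -difference / api_reported_total

print(f"replica total: {replica_total:,}")
print(f"difference:    {difference:,}")
print(f"share missing: {missing_share:.1%}")
```

Expressing the gap as a share of the reported total makes it easier to compare against later row counts while the sync is still running.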
Comment 11 Marc A. Pelletier 2013-12-03 23:13:44 UTC
Toolserver doesn't remove some rows that it should, and which are in fact removed on Tool Labs; these mostly have to do with revision deletion and suppression (so the data wouldn't be available anyway).
Comment 12 Christian Thiele 2013-12-05 14:50:59 UTC
There are still revisions missing for dewiki (e.g. 125050920, 125087630, 125137961...).

flaggedpages has errors. Example:

MariaDB [dewiki_p]> SELECT * FROM flaggedpages WHERE fp_page_id=8507;
+------------+-------------+-----------+------------+------------------+
| fp_page_id | fp_reviewed | fp_stable | fp_quality | fp_pending_since |
+------------+-------------+-----------+------------+------------------+
|       8507 |           1 | 125054895 |          0 | NULL             |
+------------+-------------+-----------+------------+------------------+
1 row in set (0.03 sec)

But 125054895 isn't the current stable version; it should be 125144014. See http://de.wikipedia.org/w/index.php?title=Reformation&action=history
Comment 13 Christian Thiele 2013-12-08 16:27:56 UTC
Replication stopped again for dewiki:

MariaDB [dewiki_p]> SELECT max(rc_timestamp) FROM recentchanges;
+-------------------+
| max(rc_timestamp) |
+-------------------+
| 20131207135755    |
+-------------------+
1 row in set (0.04 sec)
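
A stalled replica can be spotted by comparing max(rc_timestamp) with the current time. The following is a minimal Python sketch, assuming the 14-digit UTC timestamp format (YYYYMMDDHHMMSS) seen in the output above. The function name `replication_lag` is hypothetical, and the reference time is simply the time this comment was posted.

```python
from datetime import datetime, timedelta, timezone


def replication_lag(rc_timestamp: str, now: datetime) -> timedelta:
    """Return how far rc_timestamp (YYYYMMDDHHMMSS, UTC) lies behind now."""
    latest = datetime.strptime(rc_timestamp, "%Y%m%d%H%M%S")
    return now - latest.replace(tzinfo=timezone.utc)


# Reference time: when the comment above was posted (2013-12-08 16:27:56 UTC).
comment_time = datetime(2013, 12, 8, 16, 27, 56, tzinfo=timezone.utc)
lag = replication_lag("20131207135755", comment_time)
print(f"recentchanges is {lag} behind")
```

In practice `now` would be `datetime.now(timezone.utc)`. On a wiki as busy as dewiki, a lag of more than a few minutes would suggest that replication has stopped.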
Comment 14 Christian Thiele 2013-12-09 13:37:44 UTC
Replication for dewiki works again, flaggedpages seems to be correct, too.

But there are still missing revisions in the revision table: I don't know if these are the six million stated above (SELECT (SELECT count(*) FROM revision)+(SELECT count(*) FROM archive); vs. Special:Statistics), but there are missing revisions. Examples:

125050920 - https://de.wikipedia.org/?oldid=125050920

MariaDB [dewiki_p]> SELECT * FROM revision WHERE rev_id=125050920;
Empty set (0.04 sec)

Same for 125087630 and 125137961. These are all revisions from December 2nd or December 6th.

MariaDB [dewiki_p]> SELECT count(*) FROM page WHERE page_namespace=0;
+----------+
| count(*) |
+----------+
|  2782237 |
+----------+
1 row in set (0.73 sec)

MariaDB [dewiki_p]> SELECT count(*) FROM page, revision WHERE
    -> page_latest=rev_id AND page_namespace=0;
+----------+
| count(*) |
+----------+
|  2781856 |
+----------+
1 row in set (22.13 sec)

These two numbers should be the same.
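
To list the affected pages rather than only compare counts, an anti-join can be used. The following is a minimal sketch run against an in-memory SQLite database with invented rows, not against the dewiki_p replica. Only the table and column names (page, revision, page_id, page_namespace, page_latest, rev_id) come from the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_namespace INTEGER,
        page_latest INTEGER
    );
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER);
    -- Invented sample rows: page 3 points at a revision that is missing.
    INSERT INTO page VALUES (1, 0, 101), (2, 0, 102), (3, 0, 103), (4, 1, 104);
    INSERT INTO revision VALUES (101, 1), (102, 2), (104, 4);
""")

# Main-namespace pages whose latest revision has no row in `revision`.
missing = conn.execute("""
    SELECT page_id, page_latest
    FROM page
    LEFT JOIN revision ON rev_id = page_latest
    WHERE page_namespace = 0 AND rev_id IS NULL
    ORDER BY page_id
""").fetchall()

print(missing)
```

The SELECT uses only standard SQL, so it would be expected to run on the MariaDB replica as well, although probably slowly on tables of this size.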
Comment 15 Christian Thiele 2013-12-12 13:24:12 UTC
The revisions from Comment 14 are back, but there are still several issues with the dewiki database. Maybe it's possible to do a "full comparison" or something like that?

Three examples:

The Talk page of "Clear_Cola" has the page_id 8005309 and page_latest is 67643465. This revision exists in the revision table, but rev_page for this revision is 4934436, which should be 8005309. page_id 4934436 does not exist.

The article "Morrill_Gesetz" (page_id 8004783) was deleted three days ago. The revisions are gone, but the article is still in the page table.

page_latest for the article "Boris_Zemelman" (page_id 8005384) is 125330034, this revision is missing from the revision table.
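
The first two examples could be turned into general checks along the following lines. This is only a sketch against an in-memory SQLite database with invented rows, assuming the page and revision columns named above; it is not the comparison the operators actually ran. The third example (a page_latest with no matching revision row) is the count mismatch already described in Comment 14.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_latest INTEGER);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER);
    -- Invented sample rows:
    --   page 10 is consistent,
    --   page 20's latest revision claims to belong to page 99,
    --   page 30 has no revisions at all (as after an unreplicated deletion).
    INSERT INTO page VALUES (10, 1000), (20, 2000), (30, 3000);
    INSERT INTO revision VALUES (1000, 10), (1999, 20), (2000, 99);
""")

# Pattern 1: the latest revision exists but is attached to a different page.
wrong_owner = conn.execute("""
    SELECT page_id, rev_id, rev_page
    FROM page
    JOIN revision ON rev_id = page_latest
    WHERE rev_page <> page_id
""").fetchall()

# Pattern 2: the page row exists but no revision refers to it.
no_revisions = conn.execute("""
    SELECT page_id
    FROM page
    WHERE NOT EXISTS (SELECT 1 FROM revision WHERE rev_page = page_id)
    ORDER BY page_id
""").fetchall()

print("wrong rev_page:", wrong_owner)
print("no revisions:  ", no_revisions)
```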
Comment 16 Silke Meyer (WMDE) 2013-12-17 11:56:32 UTC
Could we please have an update on this?
Comment 17 Sean Pringle 2013-12-17 13:31:24 UTC
The dewiki database on labs was dumped and reloaded with the buffer pool still reduced in size as per comment 8 -- the earlier resync process was too slow.

Things should be back to normal, at least to the point of consistency with the upstream sanitarium after data redaction.
Comment 18 Christian Thiele 2013-12-19 13:15:57 UTC
Looks good now. I can't see any of the above-mentioned errors anymore. Marking this as resolved.
