Last modified: 2008-08-01 02:00:02 UTC
It appears that recently some edits on the English Wikipedia (possibly elsewhere too?) have resulted in revisions that are blank or contain text from other, unrelated pages. Oddly, the byte count reported in the page history (based on the rev_len field), as well as the corresponding information in the recentchanges table, matches the content that _should've_ been there. For example, the revision http://en.wikipedia.org/w/index.php?title=Talk:Pikachu&oldid=227969847 is blank, even though the page history reports its length as 22,396 bytes.
See also discussion at:
http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Bug:_revisions.2Fpagesizes.2Fpagerendering.2Fwikisource_not_matching_up.2C_resulting_in_blanking_or_page_replacements
http://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard/Incidents#SYSTEM_BUG:_rollback_replaced_a_page_by_an_irrelevant_page_instead_of_reverting
I'm marking this as critical in case this is a symptom of more serious database corruption. Feel free to downgrade if it turns out to be something more benign.
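The mismatch described above (reported length vs. the text actually served) can be checked mechanically. Below is a minimal Python sketch, not an official tool. It assumes the input is the parsed JSON of an api.php query with prop=revisions and rvprop=ids|size|content in the legacy layout, where the revision text sits under the "*" key. The function name and the sample data are hypothetical; only the Talk:Pikachu revision ID and its reported length come from this report.

```python
def find_size_mismatches(api_response):
    """Return (revid, reported_size, actual_size) for every revision whose
    reported size differs from the UTF-8 byte length of its text."""
    mismatches = []
    # Assumed legacy layout: query -> pages -> {pageid: {"revisions": [...]}}
    pages = api_response.get("query", {}).get("pages", {})
    for page in pages.values():
        for rev in page.get("revisions", []):
            text = rev.get("*", "")  # assumed key for the revision text
            actual = len(text.encode("utf-8"))
            reported = rev.get("size")
            if reported is not None and reported != actual:
                mismatches.append((rev.get("revid"), reported, actual))
    return mismatches


# Hypothetical sample: the first entry mimics the blank Talk:Pikachu revision,
# the second is a made-up healthy revision for contrast.
sample = {
    "query": {"pages": {"1": {"title": "Talk:Pikachu", "revisions": [
        {"revid": 227969847, "size": 22396, "*": ""},
        {"revid": 100, "size": 5, "*": "hello"},
    ]}}}
}

print(find_size_mismatches(sample))
```

A revision flagged this way is only a candidate: a blank page with a large reported size fits the symptom here, but the check cannot tell cached garbage from a genuinely wrong text row.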
From http://toolserver.org/~amidaniel/chanlogs/%23mediawiki/20080726.txt:
[09:35:22] <Sadik_Khalid> Hi, when I tried to edit this page (http://ml.wikipedia.org/wiki/%E0%B4%B2%E0%B5%82%E0%B4%AF%E0%B4%BF_%E0%B4%AA%E0%B4%BE%E0%B4%B8%E0%B5%8D%E0%B4%9A%E0%B4%B0%E0%B5%8D%E2%80%8D) I am getting Egypt page (http://ml.wikipedia.org/wiki/Egypt)
[09:37:45] <Sadik_Khalid> History page don't mach with the content of the article
Changing title since this occurs outside enwiki.
Possibly related to bug 14930
Also may be related to the recent ext. storage problems on one cluster (https://wikitech.leuksman.com/view/Server_admin_log)
OK, I can't find any relevant software changes. I'm almost sure this is due to the above issue. As things stand now, no *new* edits should be recorded wrongly anymore.
This happened due to a master switch on the external storage cluster. Apparently, the new master didn't have an up-to-date replica of the old master's data; a few records were missing. Because of this, the same text IDs were used twice. The edits saved on the old master that were not replicated to the new master are lost; there is no way to get them back. I have to close this bug as "FIXED" because there's no "CANTFIX".
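To illustrate how a stale replica leads to reused IDs, here is a toy Python simulation. It is a sketch under simplifying assumptions, not a model of the real external storage code: the BlobStore class, the "max ID plus one" allocation, and the revision names are all hypothetical.

```python
class BlobStore:
    """Toy stand-in for a storage server that hands out increasing IDs."""

    def __init__(self, blobs=None):
        self.blobs = dict(blobs or {})

    def insert(self, text):
        # Assumed allocation rule: next ID is the highest known ID plus one.
        new_id = max(self.blobs, default=0) + 1
        self.blobs[new_id] = text
        return new_id


old_master = BlobStore()
text_rows = {}  # stand-in for the core DB: revision name -> blob ID

for rev in ("rev A", "rev B"):
    text_rows[rev] = old_master.insert("text of " + rev)

# The replica copies the master at this point and then falls behind.
stale_replica = BlobStore(old_master.blobs)

# This write reaches the old master only and is never replicated.
text_rows["rev C"] = old_master.insert("text of rev C")

# Master switch: the stale replica is promoted and hands out ID 3 again.
new_master = stale_replica
text_rows["rev D"] = new_master.insert("text of rev D")

print(text_rows)
print(new_master.blobs[text_rows["rev C"]])
```

In this model, "rev C" and "rev D" end up pointing at the same ID, so looking up "rev C" on the new master returns the text of "rev D". That is the same shape as the reported symptom of a revision showing an unrelated page.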
It wasn't fixed. srv104 still had an old copy of the configuration (because it's not reachable by ssh), and so it was still writing blobs to srv101. I've taken srv104 out of LVS rotation now. Maybe we'll be able to recover the edits from srv101 at some point, but it looks like it might be hanging on I/O now.
Occurred here as well: http://no.wikipedia.org/w/index.php?title=Vinterkrigen&diff=next&oldid=4096373
I can confirm this on nl.wikipedia too. See e.g. http://nl.wikipedia.org/w/index.php?title=Yang_Yaozu&diff=13286529&oldid=13139324 According to the recent changes this revision added 15 bytes, but the page is empty: http://nl.wikipedia.org/w/index.php?title=Yang_Yaozu&action=edit&oldid=13286529 See also http://nl.wikipedia.org/w/index.php?title=Yang_Yaozu&action=history (2,159 bytes vs. 2,144 bytes).
OK, I didn't see that it was fixed.
Still happening: http://es.wikipedia.org/w/index.php?title=Wikipedia:Caf%C3%A9/Portal/Archivo/Propuestas/Actual&diff=19106298&oldid=19104167 Next two edits (both trying to revert it) also failed: http://es.wikipedia.org/w/index.php?title=Wikipedia:Caf%C3%A9/Portal/Archivo/Propuestas/Actual&diff=next&oldid=19106298 http://es.wikipedia.org/w/index.php?title=Wikipedia:Caf%C3%A9/Portal/Archivo/Propuestas/Actual&diff=next&oldid=19106364
Unsure if related, but these do not show revision #798283:
* http://meta.wikimedia.org/w/index.php?title=Help:Magic_words&oldid=798283
* http://meta.wikimedia.org/w/index.php?title=-&oldid=798283
* http://meta.wikimedia.org/w/index.php?title=Help:Magic_words&diff=prev&oldid=798283
And yet, these do (sort of):
* http://meta.wikimedia.org/w/index.php?title=Help:Magic_words&diff=798284&oldid=798283
* http://meta.wikimedia.org/w/api.php?action=query&prop=revisions&revids=798283&rvprop=size|content
Although, per VP/T, Tim said:
> It looks like the anomalous blank revisions are just cache pollution, and will
> fix themselves when the cache expires in a week. The revisions that show the
> wrong article are due to database corruption, and will need to be fixed manually.
This edit is attributed to my bot http://commons.wikimedia.org/w/index.php?title=Image%3AHyena_pup.jpg&diff=13062289&oldid=12189366 But it is pretty much impossible that the bot performed it (nothing remotely similar to CopyVio tagging is in the source code). Might be due to the same server issue, although the nature of the glitch seems different from the ones reported.
(In reply to comment #13)
> But it is pretty much impossible that the bot performed it (nothing remotely
> similar to CopyVio tagging is in the source code).
>
> Might be due to the same server issue, although the nature of the glitch seems
> different from the ones reported.
Also note that the length reported in the history is larger than the edit. I understand this happens because the write goes to the false master and then the real one reuses the same revision id. We could probably find, among the deleted revisions from around the same time, another one with that same content.
Another magic blanking: http://es.wikipedia.org/w/index.php?title=Wikipedia:Vandalismo_en_curso&diff=19107113&oldid=19107017
Should be fixed as of July 30, 03:00 UTC. Initially, ordinary edits processed by srv101/srv104 polluted the revision cache, which has an expiry of one week. This was identified and fixed (without me ever seeing this bug report) on July 27, by removing those servers from HTTP LVS. However, they continued to run the job queue, and refreshLinks jobs would have continued to pollute the revision cache. This was fixed on July 30, by firewalling srv101/104 from all core DB servers.
I'm running a script to fix the revision cache. This will make the old revision view and old revision edit work properly. Any broken diffs will have to be fixed manually by appending &action=purge to the diff URL.
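For anyone fixing many broken diffs by hand, the purge URL can be built mechanically. A minimal Python sketch of the "append &action=purge" step; the helper name is hypothetical, and it assumes the diff URL has no "#fragment" part and does not already carry a different action parameter.

```python
def with_purge(diff_url):
    """Return diff_url with action=purge appended to its query string."""
    if "action=purge" in diff_url:
        return diff_url  # already a purge URL, leave it alone
    separator = "&" if "?" in diff_url else "?"
    return diff_url + separator + "action=purge"


# Example using one of the diff URLs reported in this thread.
url = "http://nl.wikipedia.org/w/index.php?title=Yang_Yaozu&diff=13286529&oldid=13139324"
print(with_purge(url))
```

The result still has to be opened in a browser (or otherwise requested) for the purge to take effect; the helper only constructs the URL.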
Note that the script only affects page blankings (which are due to cache pollution), not replacement with unrelated text, which is due to corruption of the core DB with incorrect text rows referencing blob_ids on the old cluster17 master, srv101.