Last modified: 2014-10-28 11:36:16 UTC
Some time ago, we changed the serialization format of Wikidata items. For consistency, we implemented on-the-fly conversion to the new format in the exporter (using the ContentHandler::exportTransform facility). This seems to work fine with Special:Export, and when I try it with dumpBackup.php locally. However, dumps like wikidatawiki-20141009-pages-articles.xml.bz2 still contain revisions with the old-style format, both . Is this because new revisions get stitched into old dumps? That's the only explanation I currently have. If this is the case, how do we reset this, so that all revisions get re-exported? If this is not the case, how can we investigate what is going wrong? One alternative explanation would be that the host that generates the dump is running an old version of Wikibase, I suppose.
Bumping to critical, since it may result in data loss for clients that cannot process the old-style format. We really do not want them to implement that; we changed the format for a reason. Btw: to check for old-style serializations, grep for "entity". To detect new-style serializations, check for "descriptions" (plural).
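For anyone who wants to run that check over a whole dump rather than grep by hand, here is a rough Python sketch. The function names are made up for illustration. Matching the quoted key `"entity"` instead of the bare word is my own assumption, made so that new-style keys such as `"entity-type"` do not count as old style. It is a heuristic, not a validator: a label or description that happens to contain one of these quoted strings would be misclassified.

```python
import bz2
import xml.etree.ElementTree as ET
from collections import Counter


def classify_serialization(text):
    """Heuristic from this thread: old style has an "entity" key,
    new style has a "descriptions" (plural) key."""
    if '"entity"' in text:  # quoted match is an assumption, see note above
        return "old"
    if '"descriptions"' in text:
        return "new"
    return "unknown"


def count_styles(dump_path):
    """Stream a .xml.bz2 dump and count revisions per serialization style."""
    counts = Counter()
    with bz2.open(dump_path, "rb") as stream:
        for _event, elem in ET.iterparse(stream):
            # Tags carry the export schema namespace, so match on the suffix.
            if elem.tag == "text" or elem.tag.endswith("}text"):
                counts[classify_serialization(elem.text or "")] += 1
            # Drop element content we no longer need to limit memory use.
            elem.clear()
    return counts


if __name__ == "__main__":
    import sys
    print(count_styles(sys.argv[1]))
```

Run it with the dump file as the only argument. A non-zero "old" count would suggest the dump still carries unconverted revisions; "unknown" will include non-item pages.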
See also https://lists.wikimedia.org/pipermail/wikidata-l/2014-October/004843.html
Just confirming, this only applies to XML dumps, and not the new JSON dumps?
The reason seems to be backupTextPass.inc, see bug 72361.
*** Bug 72613 has been marked as a duplicate of this bug. ***