Last modified: 2014-10-28 11:36:16 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T74348, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 72348 - Wikidata dumps contain old-style serialization.
Status: NEW
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: High major
Target Milestone: ---
Assigned To: Ariel T. Glenn
Whiteboard: u=dev c=backend p=0
Duplicates: 72613
Depends on: 72361 72478
Reported: 2014-10-22 09:43 UTC by Daniel Kinzler
Modified: 2014-10-28 11:36 UTC (History)

Description Daniel Kinzler 2014-10-22 09:43:34 UTC
Some time ago, we changed the serialization format of Wikidata items. For consistency, we implemented on-the-fly conversion to the new format in the exporter (using the ContentHandler::exportTransform facility).

This seems to work fine with Special:Export, and when I try it with dumpBackup.php locally. However, dumps such as wikidatawiki-20141009-pages-articles.xml.bz2 still contain revisions with the old-style format, both .

Is this because new revisions get stitched into old dumps? That's the only explanation I currently have. If this is the case, how do we reset this, so all revisions get re-exported? If this is not the case, how can we investigate what is going wrong?

One alternative explanation would be that the host that generates the dump is running an old version of Wikibase, I suppose.
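
To make the "stitching" hypothesis above concrete, here is a toy Python sketch. It is not the actual dump code: build_dump, transform_on_export and the "old:"/"new:" prefixes are all hypothetical stand-ins. It only illustrates how revision text copied verbatim from a previous dump would bypass the export-time conversion, while text fetched fresh would be converted.

```python
def transform_on_export(text):
    """Hypothetical stand-in for the exporter's on-the-fly format conversion."""
    if text.startswith("old:"):
        return "new:" + text[len("old:"):]
    return text


def build_dump(revision_ids, previous_dump, fetch_from_storage):
    """Assemble revision texts, reusing text from a previous dump where possible.

    previous_dump maps revision id -> text as written by an earlier dump run.
    fetch_from_storage is a callable returning the stored text for a revision id.
    """
    result = {}
    for rev_id in revision_ids:
        if rev_id in previous_dump:
            # Reused text is copied as-is, so no conversion is applied to it.
            result[rev_id] = previous_dump[rev_id]
        else:
            # Only freshly fetched text passes through the conversion.
            result[rev_id] = transform_on_export(fetch_from_storage(rev_id))
    return result
```

Under these assumptions, passing an empty previous_dump forces every revision through the conversion, which is the kind of "reset" asked about above.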
Comment 1 Daniel Kinzler 2014-10-22 09:47:24 UTC
Bumping to critical, since it may result in data loss for clients that cannot process the old-style format. We really do not want them to implement that; we changed it for a reason...


Btw: To check for old-style serializations, grep for "entity". To detect the new-style serialization, check for "descriptions" (plural).
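
The check above could be automated roughly as follows. This is a minimal Python sketch, not a tested tool: the function names are made up for illustration, and matching the markers as quoted JSON keys ("entity" and "descriptions" including the quotes) is an assumption added here to avoid hits on longer strings that merely contain the word.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def classify_serialization(text):
    """Classify one revision text using the two markers from the comment above."""
    if '"descriptions"' in text:
        return "new"
    if '"entity"' in text:
        return "old"
    return "unknown"


def count_serializations(stream):
    """Count old-style and new-style revision texts in an XML dump stream."""
    counts = Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        # Dump elements are namespaced, so compare only the local tag name.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "text":
            counts[classify_serialization(elem.text or "")] += 1
        elif local_name == "page":
            # Drop the finished page's subtree to limit memory use.
            elem.clear()
    return counts
```

For a compressed dump, the stream can be opened with bz2.open(path, "rb") from the standard library and passed to count_serializations; a non-zero "old" count would indicate revisions still in the old format.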
Comment 3 John Mark Vandenberg 2014-10-22 11:30:00 UTC
Just confirming, this only applies to XML dumps, and not the new JSON dumps?
Comment 4 Daniel Kinzler 2014-10-22 15:52:35 UTC
The reason seems to be backupTextPass.inc, see bug 72361.
Comment 5 tobias.gritschacher 2014-10-28 09:37:21 UTC
*** Bug 72613 has been marked as a duplicate of this bug. ***
