Last modified: 2014-10-17 15:54:59 UTC
Dear Sir,

The CLI utility `maintenance/importDump.php` fails to process the incremental XML data dump files for `wikidatawiki`.

MediaWiki version: wmf/1.24wmf8
Dataset URL: <https://dumps.wikimedia.org/other/incr/wikidatawiki/>
Datasets tested: wikidatawiki-20140706-pages-meta-hist-incr.xml.bz2 through wikidatawiki-20140803-pages-meta-hist-incr.xml.bz2

Even after the incremental dump file for 20140706 is split into smaller dump files, each containing a single page, only about one in a hundred of these single-page files is processed successfully.
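The report does not say how the 20140706 dump was split. The hypothetical script `split_dump.sh` below is a minimal sketch of one way to do it. It assumes, as in the Wikimedia dumps, that `<page>` and `</page>` each sit on a line of their own. It copies the shared `<mediawiki>`/`<siteinfo>` header into every output file, so each single-page file is a complete dump that importDump.php can read on its own. The script name, the output directory, and the `page-NNNNNN.xml` naming are illustrative choices. They are not part of MediaWiki.

```shell
#!/bin/sh
# split_dump.sh: hypothetical helper (not part of MediaWiki) that splits a
# MediaWiki XML dump into one file per <page>, each file keeping the original
# <mediawiki>/<siteinfo> header so it can be imported on its own.
# Assumption: <page> and </page> sit on lines of their own, as in Wikimedia
# dumps; markup inside page text is entity-escaped, so it cannot match.

split_dump() {
    dump="$1"              # input dump, .xml or .xml.bz2
    outdir="${2:-pages}"   # output directory (default: ./pages)
    mkdir -p "$outdir" || return 1

    # Decompress on the fly for .bz2 input; read plain XML as-is.
    case "$dump" in
        *.bz2) reader="bzcat" ;;
        *)     reader="cat" ;;
    esac

    "$reader" "$dump" | awk -v outdir="$outdir" '
        BEGIN { header = ""; n = 0; inpage = 0; seen = 0 }
        # Start of a page: open a new numbered file and write the header first.
        /<page>/ {
            inpage = 1; seen = 1; n++
            out = sprintf("%s/page-%06d.xml", outdir, n)
            printf "%s", header > out
        }
        # Copy every line of the current page into its file.
        inpage { print > out }
        # Everything before the first <page> is the shared header.
        !seen { header = header $0 "\n" }
        # End of a page: close the root element and the file.
        /<\/page>/ {
            print "</mediawiki>" > out
            close(out)
            inpage = 0
        }
        END { print n " page file(s) written to " outdir }
    '
}

# Run only when arguments are given, e.g.:
#   sh split_dump.sh wikidatawiki-20140706-pages-meta-hist-incr.xml.bz2 pages
if [ "$#" -ge 1 ]; then
    split_dump "$@"
fi
```

Each resulting file can then be passed to the importDump_farm.php helper in place of the full dump. Splitting on lines rather than parsing the XML keeps memory use flat for large history dumps, at the cost of relying on the dump's one-tag-per-line layout.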
If it 'fails', what is the error? And what are the exact steps to reproduce?
To reproduce:

0) Set up a wiki farm using the Wikimedia method. See <https://www.mediawiki.org/wiki/Manual:Wiki_family#Wikimedia_Method>

1) Write a helper script:

    (rootshell)# cat /usr/share/mediawiki/maintenance/importDump_farm.php
    <?php
    # importDump_farm.php script
    #
    # Usage: /usr/bin/php /usr/share/mediawiki/maintenance/importDump_farm.php \
    #   zuwiki-20121002-pages-articles-p000001000-c000001000.xml \
    #   zu.wikipedia.site
    #
    # $argv[1] is the xchunk file-name
    $_SERVER['SERVER_NAME'] = $argv[2];
    #$_SERVER['DOCUMENT_ROOT'] = $argv[3]; #optional
    define( 'IMPORTDUMP_FARM', true);
    include('importDump.php');

2) Download the incremental XML data dump files (xincrs):

    (rootshell)# /usr/bin/wget https://dumps.wikimedia.org/other/incr/simplewiki/20140803/simplewiki-20140803-pages-meta-hist-incr.xml.bz2
    (rootshell)# /usr/bin/wget https://dumps.wikimedia.org/other/incr/wikidatawiki/20140803/wikidatawiki-20140803-pages-meta-hist-incr.xml.bz2

3) Import into the database:

    (rootshell)# /usr/bin/php /usr/share/wp-mirror-mediawiki/maintenance/importDump_farm.php simplewiki-20140803-pages-meta-hist-incr.xml.bz2 simple.wikipedia.site
    100 (8.01 pages/sec 12.58 revs/sec)
    100 (7.40 pages/sec 11.69 revs/sec)
    200 (10.14 pages/sec 15.42 revs/sec)
    Done!
    You might want to run rebuildrecentchanges.php to regenerate RecentChanges
    (rootshell)# /usr/bin/php /usr/share/wp-mirror-mediawiki/maintenance/importDump_farm.php wikidatawiki-20140803-pages-meta-hist-incr.xml.bz2 www.wikidata.site
    [4a104de5] [no req] Exception from line 1324 of /usr/share/wp-mirror-mediawiki/extensions/Wikidata/extensions/Wikibase/repo/Wikibase.hooks.php: To avoid ID conflicts, the import of Wikibase entities is currently not supported.
    Backtrace:
    #0 [internal function]: Wikibase\RepoHooks::onImportHandleRevisionXMLTag(WikiImporter, array, array)
    #1 /usr/share/wp-mirror-mediawiki/includes/Hooks.php(206): call_user_func_array(string, array)
    #2 /usr/share/wp-mirror-mediawiki/includes/GlobalFunctions.php(4056): Hooks::run(string, array, NULL)
    #3 /usr/share/wp-mirror-mediawiki/includes/Import.php(690): wfRunHooks(string, array)
    #4 /usr/share/wp-mirror-mediawiki/includes/Import.php(654): WikiImporter->handleRevision(array)
    #5 /usr/share/wp-mirror-mediawiki/includes/Import.php(507): WikiImporter->handlePage()
    #6 /usr/share/wp-mirror-mediawiki/maintenance/importDump.php(298): WikiImporter->doImport()
    #7 /usr/share/wp-mirror-mediawiki/maintenance/importDump.php(256): BackupReader->importFromHandle(resource)
    #8 /usr/share/wp-mirror-mediawiki/maintenance/importDump.php(102): BackupReader->importFromFile(string)
    #9 /usr/share/wp-mirror-mediawiki/maintenance/doMaintenance.php(109): BackupReader->execute()
    #10 /usr/share/wp-mirror-mediawiki/maintenance/importDump.php(303): require_once(string)
    #11 /usr/share/wp-mirror-mediawiki/maintenance/importDump_farm.php(12): include(string)
    #12 {main}
Hmm, Wikibase entities... adding the mailing list to the CC field.
We prevent this in Wikibase because importing Wikibase content usually doesn't work: entities refer to other entities by their entity IDs, and on the importing wiki those IDs probably don't exist or don't contain the intended content (see bug 63228). Of course, that doesn't apply if you already have *all* other entities from the wiki you're importing from (Wikidata)... Maybe we want to make it possible to import Wikibase content via the shell?