Last modified: 2014-10-17 15:54:59 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T72898, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 70898 - maintenance/importDump.php fails for wikidatawiki XML incremental dump files
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataRepo
Version: master
Hardware: All All
Importance: Normal major
Target Milestone: ---
Assigned To: Wikidata bugs
Whiteboard: u=dev c=backend p=0
Reported: 2014-09-16 19:15 UTC by wp mirror
Modified: 2014-10-17 15:54 UTC (History)

Description wp mirror 2014-09-16 19:15:44 UTC
Dear Sir,

The CLI utility `maintenance/importDump.php' fails to process XML incremental data dump files for `wikidatawiki'.

mediawiki version: wmf/1.24wmf8
dataset URL: <https://dumps.wikimedia.org/other/incr/wikidatawiki/>
datasets tested: wikidatawiki-20140706-pages-meta-hist-incr.xml.bz2, through
                 wikidatawiki-20140803-pages-meta-hist-incr.xml.bz2

Even after the incremental dump file for 20140706 is split into smaller dump files, each containing a single page, only about one in a hundred of these single-page dump files is processed successfully.
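
For reference, a per-page split like the one described above could be done roughly as follows. This is only a sketch, not the script the reporter used: the function name `split_dump_pages` and the `page-NNNNNN.xml` naming scheme are hypothetical, and it assumes the dump has already been decompressed and that every `<page>` and `</page>` tag sits on its own line.

```shell
#!/bin/sh
# split_dump_pages DUMP.xml OUTDIR   (hypothetical helper, not part of MediaWiki)
# Writes one small dump file per <page> element. Each output file repeats
# the original header (everything before the first <page>) and is closed
# with </mediawiki> so that it remains a well-formed dump on its own.
split_dump_pages() {
    dump=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    awk -v outdir="$outdir" '
        # Collect header lines until the first <page> is seen.
        !seen_page && !/<page>/ { header = header $0 "\n"; next }
        # A new page starts: open the next output file and emit the header.
        /<page>/ {
            seen_page = 1
            n++
            out = sprintf("%s/page-%06d.xml", outdir, n)
            printf "%s", header > out
            in_page = 1
        }
        # Copy every line that belongs to the current page.
        in_page { print > out }
        # The page ends: close the root element and the file.
        /<\/page>/ {
            print "</mediawiki>" > out
            close(out)
            in_page = 0
        }
    ' "$dump"
}
```

After `bunzip2 -k` on the downloaded file, a call such as `split_dump_pages wikidatawiki-20140706-pages-meta-hist-incr.xml pages` would leave one file per page in the `pages` directory.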
Comment 1 Andre Klapper 2014-09-17 08:52:21 UTC
If it 'fails', what is the error? And what are exact steps to reproduce?
Comment 2 wp mirror 2014-09-21 22:22:52 UTC
To reproduce:

0) set up wiki farm using wikimedia method

See <https://www.mediawiki.org/wiki/Manual:Wiki_family#Wikimedia_Method>

1) write helper script:

(rootshell)# cat /usr/share/mediawiki/maintenance/importDump_farm.php
<?php
# importDump_farm.php script
#
# Usage: /usr/bin/php /usr/share/mediawiki/maintenance/importDump_farm.php \
#        zuwiki-20121002-pages-articles-p000001000-c000001000.xml \
#        zu.wikipedia.site
#
# $argv[1] is the xchunk file-name
$_SERVER['SERVER_NAME'] = $argv[2];
#$_SERVER['DOCUMENT_ROOT'] = $argv[3]; #optional
define( 'IMPORTDUMP_FARM', true);
include('importDump.php');

2) download incremental XML data dump files (xincrs)

(rootshell)# /usr/bin/wget https://dumps.wikimedia.org/other/incr/simplewiki/20140803/simplewiki-20140803-pages-meta-hist-incr.xml.bz2
(rootshell)# /usr/bin/wget https://dumps.wikimedia.org/other/incr/wikidatawiki/20140803/wikidatawiki-20140803-pages-meta-hist-incr.xml.bz2

3) import into database

(rootshell)# /usr/bin/php /usr/share/wp-mirror-mediawiki/maintenance/importDump_farm.php simplewiki-20140803-pages-meta-hist-incr.xml.bz2 simple.wikipedia.site
100 (8.01 pages/sec 12.58 revs/sec)
100 (7.40 pages/sec 11.69 revs/sec)
200 (10.14 pages/sec 15.42 revs/sec)
Done!
You might want to run rebuildrecentchanges.php to regenerate RecentChanges

(rootshell)# /usr/bin/php /usr/share/wp-mirror-mediawiki/maintenance/importDump_farm.php wikidatawiki-20140803-pages-meta-hist-incr.xml.bz2 www.wikidata.site
[4a104de5] [no req]   Exception from line 1324 of /usr/share/wp-mirror-mediawiki/extensions/Wikidata/extensions/Wikibase/repo/Wikibase.hooks.php: To avoid ID conflicts, the import of Wikibase entities is currently not supported.
Backtrace:
#0 [internal function]: Wikibase\RepoHooks::onImportHandleRevisionXMLTag(WikiImporter, array, array)
#1 /usr/share/wp-mirror-mediawiki/includes/Hooks.php(206): call_user_func_array(string, array)
#2 /usr/share/wp-mirror-mediawiki/includes/GlobalFunctions.php(4056): Hooks::run(string, array, NULL)
#3 /usr/share/wp-mirror-mediawiki/includes/Import.php(690): wfRunHooks(string, array)
#4 /usr/share/wp-mirror-mediawiki/includes/Import.php(654): WikiImporter->handleRevision(array)
#5 /usr/share/wp-mirror-mediawiki/includes/Import.php(507): WikiImporter->handlePage()
#6 /usr/share/wp-mirror-mediawiki/maintenance/importDump.php(298): WikiImporter->doImport()
#7 /usr/share/wp-mirror-mediawiki/maintenance/importDump.php(256): BackupReader->importFromHandle(resource)
#8 /usr/share/wp-mirror-mediawiki/maintenance/importDump.php(102): BackupReader->importFromFile(string)
#9 /usr/share/wp-mirror-mediawiki/maintenance/doMaintenance.php(109): BackupReader->execute()
#10 /usr/share/wp-mirror-mediawiki/maintenance/importDump.php(303): require_once(string)
#11 /usr/share/wp-mirror-mediawiki/maintenance/importDump_farm.php(12): include(string)
#12 {main}
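
To put a number on how many files fail, and how many of those fail with the exception shown above, a small tally loop may help. The sketch below is an illustration only: `tally_imports` and the wrapper `import_wikidata` are hypothetical names, and treating a non-zero exit status as a failure is an assumption about how the import script behaves. The matched message text is taken from the exception output above.

```shell
#!/bin/sh
# tally_imports IMPORT_CMD FILE...   (hypothetical helper)
# Runs IMPORT_CMD once per file and classifies each run as:
#   blocked - output contains the Wikibase "not supported" message
#   ok      - exit status 0 (assumed to mean success)
#   failed  - any other non-zero exit status
tally_imports() {
    import_cmd=$1
    shift
    ok=0
    blocked=0
    failed=0
    for f in "$@"; do
        # Capture stdout and stderr together and remember the exit status.
        if output=$("$import_cmd" "$f" 2>&1); then status=0; else status=$?; fi
        case $output in
            *"import of Wikibase entities is currently not supported"*)
                blocked=$((blocked + 1)) ;;
            *)
                if [ "$status" -eq 0 ]; then
                    ok=$((ok + 1))
                else
                    failed=$((failed + 1))
                fi ;;
        esac
    done
    echo "ok=$ok blocked=$blocked failed=$failed"
}

# Example wrapper around the helper script from step 1 (paths as used above):
#   import_wikidata() {
#       /usr/bin/php /usr/share/wp-mirror-mediawiki/maintenance/importDump_farm.php "$1" www.wikidata.site
#   }
#   tally_imports import_wikidata pages/*.xml
```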
Comment 3 Andre Klapper 2014-09-22 11:00:31 UTC
Hmm, Wikibase entities... adding mailinglist to CC field.
Comment 4 Marius Hoch 2014-09-23 12:56:32 UTC
We prevent this in Wikibase because importing Wikibase content usually doesn't work: entities are referred to by entity IDs, which on the target wiki probably don't exist or don't contain the wanted content (see bug 63228). That of course doesn't apply if you already have *all* other entities from the wiki you're importing from (Wikidata)...

Maybe we want to make it possible to import Wikibase content via shell?
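
Until such an option exists, it may be useful to check whether a dump file contains Wikibase entity revisions before trying to import it. The following is a rough sketch under stated assumptions: that the dump follows the MediaWiki XML export format in which each revision carries a `<model>` element, and that entity revisions use the content models `wikibase-item` and `wikibase-property`. Both function names are hypothetical.

```shell
#!/bin/sh
# Assumption: entity revisions are marked with one of these content models.
WIKIBASE_MODEL_RE='<model>wikibase-(item|property)</model>'

# has_wikibase_entities DUMP.xml   (hypothetical helper)
# Exit status 0 if at least one revision uses a Wikibase entity model.
has_wikibase_entities() {
    grep -q -E "$WIKIBASE_MODEL_RE" "$1"
}

# count_wikibase_revisions DUMP.xml   (hypothetical helper)
# Prints the number of lines with a Wikibase entity <model> element,
# which equals the number of such revisions when each revision has
# its <model> element on a separate line.
count_wikibase_revisions() {
    grep -c -E "$WIKIBASE_MODEL_RE" "$1"
}
```

A file for which `has_wikibase_entities` succeeds would be expected to hit the exception from comment 2, while a file without such revisions (for example the simplewiki dump) would not.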
