Last modified: 2014-11-02 18:08:54 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and links other than those displaying bug reports and their history might be broken. See T74886, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 72886 - Recent XML dump files break mwxml2sql
Status: RESOLVED DUPLICATE of bug 66663
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Unprioritized blocker
Target Milestone: ---
Assigned To: Ariel T. Glenn
Reported: 2014-11-02 18:04 UTC by wp mirror
Modified: 2014-11-02 18:08 UTC

Description wp mirror 2014-11-02 18:04:51 UTC
Dear Sir or Madam,

0) Context

`mwxml2sql' is a utility for rapidly converting published XML dump files into SQL files for the `page', `revision', and `text' tables. These SQL files may then be rapidly imported into a database.

1) Breaking change in XML dump file schema

XML dump files using schema `export-0.8.xsd' are processed correctly by `mwxml2sql'.
XML dump files using schema `export-0.9.xsd' cause `mwxml2sql' to fail.

2) Example of error

(shell)$ rsync ftpmirror.your.org::wikimedia-dumps/simplewiki/20141025/simplewiki-20141025-pages-meta-current.xml.bz2 .
(shell)$ rsync ftpmirror.your.org::wikimedia-dumps/simplewiki/20141025/simplewiki-20141025-stub-meta-current.xml.gz .
(shell)$ bzcat simplewiki-20141025-pages-meta-current.xml.bz2 | head -n 1
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.9/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.9/ http://www.mediawiki.org/xml/export-0.9.xsd" version="0.9" xml:lang="en">
(shell)$ /usr/bin/mwxml2sql --stubs simplewiki-20141025-stub-meta-current.xml.gz --text simplewiki-20141025-pages-meta-current.xml.bz2 --mysqlfile simplewiki-20141025.gz --mediawiki 1.24 2>&1
WHINE: (none) no end siteinfo tag

WHINE: (none) no end siteinfo tag
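
One way to avoid running into the failure above is to check the schema version of a dump before invoking `mwxml2sql'. The following is a minimal sketch and not part of `mwxml2sql': the function names `dump_schema_version' and `schema_known_good' are hypothetical, the root element is assumed to sit on the first line of the dump (as in the `head -n 1' output above), and the list of accepted versions reflects only what this report observed (0.8 works, 0.9 fails).

```shell
# Hypothetical helper (not part of mwxml2sql): read a decompressed dump on
# standard input and print the version attribute of its <mediawiki> root
# element. Assumption: the root element is on the first line.
dump_schema_version() {
    head -n 1 | sed -n 's/.*<mediawiki[^>]* version="\([^"]*\)".*/\1/p'
}

# Assumption: only 0.8 is treated as known-good, because it is the only
# schema version this report saw mwxml2sql process successfully.
schema_known_good() {
    case "$1" in
        0.8) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example use (requires the dump file downloaded in the session above):
#   v=$(bzcat simplewiki-20141025-pages-meta-current.xml.bz2 | dump_schema_version)
#   schema_known_good "$v" || echo "schema $v is not known to work" >&2
```

Checking the version first gives a clear message about the schema instead of the less specific "no end siteinfo tag" complaint.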

3) Recent dumps

Wiki       Date     Schema mwxml2sql
simplewiki/20140220 0.8    OK
simplewiki/20140723 0.8    OK
simplewiki/20140814 0.8    OK
simplewiki/20140903 0.9    fail
simplewiki/20140927 0.9    fail
simplewiki/20141025 0.9    fail
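
The OK/fail column above could be filled in by scanning the combined output of `mwxml2sql' for the `WHINE:' prefix seen in section 2. This is only a sketch under assumptions: the report does not say what exit status `mwxml2sql' returns on this error, so the check looks at the output text alone, and the function name `classify_mwxml2sql_output' is hypothetical.

```shell
# Hypothetical helper: read tool output on standard input and print "fail"
# if any line starts with the WHINE: prefix shown in section 2, else "OK".
# Assumption: absence of WHINE: lines is taken to mean success; this report
# does not document the exit status or other error messages of mwxml2sql.
classify_mwxml2sql_output() {
    if grep -q '^WHINE:'; then
        echo fail
    else
        echo OK
    fi
}

# Example use (requires mwxml2sql and the dump files from section 2):
#   /usr/bin/mwxml2sql --stubs ... --text ... --mysqlfile ... --mediawiki 1.24 2>&1 \
#       | classify_mwxml2sql_output
```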

Sincerely Yours,
Kent
Comment 1 Sam Reed (reedy) 2014-11-02 18:08:54 UTC

*** This bug has been marked as a duplicate of bug 66663 ***
