Last modified: 2013-04-22 16:16:14 UTC
I used mwdumper from https://integration.mediawiki.org/ci/job/MWDumper/org.wikimedia$mwdumper/ and I got this:

java -client -classpath mwdumper-1.16.jar;mysql-connector-java-5.1.22\mysql-connector-java-5.1.22-bin.jar;commons-compress-1.4.1.jar org.mediawiki.dumper.Dumper "--output=mysql://127.0.0.1/huwiki?user=user&password=password" --format=sql:1.5 huwiki-latest-pages-meta-current.xml.bz2

Exception in thread "main" java.io.IOException: Stream is not in the BZip2 format
        at org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream.init(BZip2CompressorInputStream.java:255)
        at org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream.<init>(BZip2CompressorInputStream.java:138)
        at org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream.<init>(BZip2CompressorInputStream.java:111)
        at org.mediawiki.dumper.Tools.openBZip2Stream(Tools.java:42)
        at org.mediawiki.dumper.Tools.openInputFile(Tools.java:28)
        at org.mediawiki.dumper.Dumper.main(Dumper.java:124)
I think the cause is the header check at https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/tools/mwdumper.git;a=blob;f=src/org/mediawiki/dumper/Tools.java;hb=HEAD#l38 because BZip2CompressorInputStream also checks for the header itself: http://commons.apache.org/compress/apidocs/src-html/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.html#line.247
gerrit Ia6c811e8
I also wondered about that in the past, but as I recall, this check was not inside BZip2CompressorInputStream back then; maybe this was fixed in the past 2 years. You will find the same issue in [1], but I am not sure which is the right way to read bz2 data.

[1] https://trac.openstreetmap.org/browser/applications/utils/osmosis/trunk/src/org/openstreetmap/osmosis/core/xml/common/CompressionActivator.java?rev=18986
Found more about that:
http://lists.openstreetmap.org/pipermail/osmosis-dev/2009-December/000396.html
https://issues.apache.org/jira/browse/COMPRESS-69
Note that all this was before the release of commons-compress 1.0.

mwdumper originally contained a few copied classes from commons-compress. These required the caller to check for "BZ" before decompressing. (Compressing would, however, put the "BZ" into the output.) On 2012-02-10, that was removed from mwdumper, and it was changed to use the normal commons-compress Bzip2 compressor. That one has, since 1.0, always checked itself for "BZ" when decompressing.

http://svn.apache.org/viewvc?view=revision&revision=764502 (change, 2009-04-13)
http://archive.apache.org/dist/commons/compress/binaries/ (V1.0, 2009-05-21)

Looks like removing this manual check for "BZ" was just forgotten.
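
To illustrate why a leftover manual check would produce "Stream is not in the BZip2 format", here is a minimal sketch using only java.io. The class and method names (Bzip2HeaderPeek, consumeMagic, peekMagic) are hypothetical and are not taken from mwdumper's Tools.java or from commons-compress. It only shows the general effect: a caller that reads the two magic bytes leaves a stream that no longer starts with "BZ", so a decompressor that checks the header itself would reject it, while a mark/reset peek leaves the stream untouched.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only; not mwdumper code.
class Bzip2HeaderPeek {

    // Old-style check: reads and discards the two magic bytes.
    // After this call the stream is positioned past "BZ".
    static boolean consumeMagic(InputStream in) throws IOException {
        int first = in.read();
        int second = in.read();
        return first == 'B' && second == 'Z';
    }

    // Non-consuming check: looks at the two magic bytes and then
    // rewinds, so a later reader still sees "BZ" at the start.
    static boolean peekMagic(BufferedInputStream in) throws IOException {
        in.mark(2);
        int first = in.read();
        int second = in.read();
        in.reset();
        return first == 'B' && second == 'Z';
    }

    public static void main(String[] args) throws IOException {
        // First four bytes of a typical bzip2 file: "BZh9".
        byte[] header = {'B', 'Z', 'h', '9'};

        InputStream consumed = new ByteArrayInputStream(header);
        consumeMagic(consumed);
        // A decompressor that checks for "BZ" itself would now start at 'h'.
        System.out.println((char) consumed.read()); // prints h

        BufferedInputStream peeked =
                new BufferedInputStream(new ByteArrayInputStream(header));
        peekMagic(peeked);
        // The stream still starts with the magic bytes.
        System.out.println((char) peeked.read()); // prints B
    }
}
```

Under that assumption, either dropping the manual check entirely or replacing it with a non-consuming peek would let the library's own header check succeed.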