Processing of, for example:

    /usr/bin/php -q /apache/common/php-1.17/maintenance/dumpTextPass.php --wiki=enwiki \
        --stub=gzip:/mnt/data/xmldatadumps/public/enwiki/20110620/enwiki-20110620-stub-meta-history16.xml.gz \
        --prefetch=bzip2:/mnt/data/xmldatadumps/public/enwiki/20110405/enwiki-20110405-pages-meta-history8.xml.bz2 \
        --force-normal --report=1000 --spawn=/usr/bin/php \
        --output=bzip2:/mnt/data/xmldatadumps/public/enwiki/20110620/enwiki-20110620-pages-meta-history16.xml.bz2 \
        --full

produces:

    PHP Warning:  XMLReader::read(): compress.bzip2:///mnt/data/xmldatadumps/public/enwiki/20110405/enwiki-20110405-pages-meta-history8.xml.bz2:264777585: error: xmlSAX2Characters: huge text node: out of memory in /usr/local/apache/common-local/php-1.17/maintenance/backupPrefetch.inc on line 126
    Warning:  XMLReader::read(): compress.bzip2:///mnt/data/xmldatadumps/public/enwiki/20110405/enwiki-20110405-pages-meta-history8.xml.bz2:264777585: error: xmlSAX2Characters: huge text node: out of memory in /usr/local/apache/common-local/php-1.17/maintenance/backupPrefetch.inc on line 126
    PHP Warning:  XMLReader::read(): er CA}}{{User CA}}{{User CA}}{{User CA}}{{User CA}}{{User CA}}{{User CA}}{{User in /usr/local/apache/common-local/php-1.17/maintenance/backupPrefetch.inc on line 126

and then the bzip2 reader of the prefetch file dies.
Versions of libxml from 2.7.3 on impose a 10 MB cap on the size of a single text node. The revision in the above file where it chokes is revid 39456798, pageid 3976790, length 10369813 bytes, just over the cap. (I guess in 2006 you could still sneak in really huge text.) The cap can be overridden by passing the LIBXML_PARSEHUGE option to XMLReader::open when the constant is defined. That change was committed for Import.php and backupPrefetch.inc in trunk in r91967. I don't know whether we are going to need this in a writer someplace, so I'm leaving this bug open for now.
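For illustration, here is a minimal sketch of the guard described above; it is not the actual r91967 diff, and the $uri value is a placeholder:

    <?php
    // Sketch only (not the committed change): request LIBXML_PARSEHUGE when
    // the constant exists, so libxml 2.7.3+ accepts text nodes larger than
    // its default cap while reading the prefetch stream.
    $uri = 'compress.bzip2:///path/to/pages-meta-history.xml.bz2'; // placeholder

    $reader = new XMLReader();
    if ( defined( 'LIBXML_PARSEHUGE' ) ) {
        // The third argument to XMLReader::open() is a bitmask of LIBXML_* options.
        $opened = $reader->open( $uri, null, LIBXML_PARSEHUGE );
    } else {
        // Older libxml (pre-2.7) has no text-node cap, so a plain open suffices.
        $opened = $reader->open( $uri );
    }
    if ( !$opened ) {
        die( "Could not open $uri\n" );
    }
    while ( $reader->read() ) {
        // ... walk page/revision nodes as backupPrefetch.inc does ...
    }
    $reader->close();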
I'm running this code in production and the English Wikipedia dumps are happy, so closing.