Last modified: 2011-09-18 07:43:52 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T31846, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 29846 - on lucid, processing some prefetch history files die with "huge text node: out of memory"
Status: RESOLVED FIXED
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Ariel T. Glenn
Reported: 2011-07-12 15:49 UTC by Ariel T. Glenn
Modified: 2011-09-18 07:43 UTC (History)

Description Ariel T. Glenn 2011-07-12 15:49:29 UTC
Running, for example:
/usr/bin/php -q /apache/common/php-1.17/maintenance/dumpTextPass.php --wiki=enwiki --stub=gzip:/mnt/data/xmldatadumps/public/enwiki/20110620/enwiki-20110620-stub-meta-history16.xml.gz --prefetch=bzip2:/mnt/data/xmldatadumps/public/enwiki/20110405/enwiki-20110405-pages-meta-history8.xml.bz2 --force-normal --report=1000 --spawn=/usr/bin/php --output=bzip2:/mnt/data/xmldatadumps/public/enwiki/20110620/enwiki-20110620-pages-meta-history16.xml.bz2 --full

produces:

PHP Warning:  XMLReader::read(): compress.bzip2:///mnt/data/xmldatadumps/public/enwiki/20110405/enwiki-20110405-pages-meta-history8.xml.bz2:264777585: error: xmlSAX2Characters: huge text node: out of memory in /usr/local/apache/common-local/php-1.17/maintenance/backupPrefetch.inc on line 126
Warning: XMLReader::read(): compress.bzip2:///mnt/data/xmldatadumps/public/enwiki/20110405/enwiki-20110405-pages-meta-history8.xml.bz2:264777585: error: xmlSAX2Characters: huge text node: out of memory in /usr/local/apache/common-local/php-1.17/maintenance/backupPrefetch.inc on line 126
PHP Warning:  XMLReader::read(): er CA}}{{User CA}}{{User CA}}{{User CA}}{{User CA}}{{User CA}}{{User CA}}{{User  in /usr/local/apache/common-local/php-1.17/maintenance/backupPrefetch.inc on line 126

and then the bzip reader of the prefetch file dies.
Comment 1 Ariel T. Glenn 2011-07-12 15:53:07 UTC
Versions of libxml from 2.7.3 on have a 10 MB cap on the size of a text node. The revision in the above file where it chokes is revid 39456798, pageid 3976790, length 10369813. (I guess in 2006 you could still sneak in really huge text.)

The cap can be lifted by passing the LIBXML_PARSEHUGE option to XMLReader::open, provided the constant is defined.

This was committed for Import.php and backupPrefetch.inc in trunk in r91967. I don't know if we are going to need this in a writer someplace, so I'm leaving this bug open for now.
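
A minimal sketch of the override described in this comment, not the code committed in r91967. The helper name openHugeXml is hypothetical; it only illustrates passing LIBXML_PARSEHUGE to XMLReader::open when the constant is defined, and falling back to no extra flags otherwise.

```php
<?php
// Hypothetical helper for illustration; not taken from backupPrefetch.inc.
// Opens an XML source with XMLReader and asks libxml to lift its hardcoded
// parser limits (including the text-node size cap) when this build allows it.
function openHugeXml( $url ) {
	$reader = new XMLReader();

	// The constant is missing on builds that predate it, so test for it
	// before use and fall back to the default flags.
	$flags = defined( 'LIBXML_PARSEHUGE' ) ? LIBXML_PARSEHUGE : 0;

	// XMLReader::open( $uri, $encoding, $flags ); a null encoding leaves
	// encoding detection to the document itself.
	if ( !$reader->open( $url, null, $flags ) ) {
		return false;
	}
	return $reader;
}
```

With a reader opened this way, a caller would loop over $reader->read() as before. On a build without the constant, the behavior is unchanged, so an oversized revision would presumably still trigger the "huge text node" error.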
Comment 2 Ariel T. Glenn 2011-09-18 07:43:52 UTC
I'm running this code in production and en wikipedia dumps are happy, so closing.
