Last modified: 2014-02-20 23:31:25 UTC
Trying to import a full-revision dump of Uncyclopedia into a clean 1.20 install, it ran out of memory and crashed a short way in (17303 revisions):

php maintenance/importDump.php --memory-limit=500M pages_full.xml.gz

PHP Fatal error: Allowed memory size of 524288000 bytes exhausted (tried to allocate 131072 bytes) in /var/www/mediawiki/core/includes/objectcache/SqlBagOStuff.php on line 517

It was also running at something like 2 revisions/second, though I dunno if that had anything to do with anything.
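For scale: the 500M limit divided by the 17303 revisions that got through works out to roughly 30KB of memory growth per revision. A quick sketch (filename from the report; treating one <revision> element per revision is an assumption about the dump format) for counting revisions in a gzipped dump before attempting an import:

```python
import gzip

def count_revisions(path):
    """Count <revision> elements in a gzipped MediaWiki XML dump."""
    n = 0
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            n += line.count("<revision>")
    return n

def growth_per_revision(limit_bytes, revisions):
    """Rough upper bound on per-revision memory growth before the crash."""
    return limit_bytes / revisions

# 524288000 bytes / 17303 revisions is roughly 30 KB retained per revision
```

Multiplying that per-revision figure by count_revisions() gives a crude estimate of the memory limit needed to finish, assuming growth stays roughly linear.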
> full-revision dump of Uncyclopedia How big is that?
~6GB compressed. Also crashed for ?pedia, which is only ~100MB, though.
Same bug seen in MediaWiki 1.23-HEAD, importing from a recursive dump of the mediawiki.org/Template: namespace. The resulting XML file is only 6.8MB, but the memory used to import seems to go up superlinearly, at over 90KB/revision. There are memory leaks like a floating cardboard box.
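One way to see the superlinear growth directly is to sample the importer's resident set size as revisions go by. A minimal sketch, assuming Linux and its /proc filesystem (not part of any MediaWiki tooling):

```python
def rss_kb(pid):
    """Return the current resident set size of a process in kB,
    read from /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # VmRSS is reported in kB
    raise RuntimeError("VmRSS not found")

# Sample rss_kb() on the importDump.php process every N revisions; if the
# per-revision delta keeps rising instead of flattening out, memory is
# being retained rather than freed.
```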
I just tried it with the most recent version and it works for me. The Maintenance.php script just passes whatever you specify as $limit to PHP via ini_set( 'memory_limit', $limit );
physikerwelt: Can you let us know roughly what size your target wiki and output file were? Also, your PHP version would be helpful... And, if you are passing a new memory_limit, what do you specify? The bug isn't that it's impossible to run the dump script, it's about a memory leak which causes rapid memory exhaustion on even small data sets.
I used the most recent vagrant version. I assigned 8G main memory and 8 cores to the VM. The dataset was 500MB, a sample from the most recent version of enwiki (all pages that contain math). I set the memory limit to 8G, which would have been basically the same as no limit. And, this might be important: I used the --no-updates flag. Can you post your dataset?
I think I recall it working with the --no-updates flag since then as well. So if this is still broken, the bug may just be in how it handles updates. If this is the case, maybe just having it always run without updates would be in order - then have the option to run the appropriate scripts when it's done or something.
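If always importing with --no-updates and deferring the updates is the route taken, a wrapper might look something like this. This is a sketch only; the paths and the exact set of follow-up scripts are assumptions, though rebuildrecentchanges.php and initSiteStats.php are real maintenance scripts commonly run after an import:

```python
import shlex

def import_commands(dump_path, php="php", maintenance="maintenance"):
    """Build the shell commands for a no-updates import followed by the
    deferred maintenance scripts (hypothetical wrapper, not MediaWiki code)."""
    dump = shlex.quote(dump_path)
    return [
        f"{php} {maintenance}/importDump.php --no-updates {dump}",
        f"{php} {maintenance}/rebuildrecentchanges.php",
        f"{php} {maintenance}/initSiteStats.php",
    ]
```

Running the import itself without link-table and recent-changes updates sidesteps whatever the update path retains per revision, and the batch scripts afterwards rebuild the same data in one pass.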
(In reply to Isarra from comment #7) > I think I recall it working with the --no-updates flag since then as well. > So if this is still broken, the bug may just be in how it handles updates. > > If this is the case, maybe just having it always run without updates would > be in order - then have the option to run the appropriate scripts when it's > done or something. Can you give me a pointer to the dataset? I'd like to test how much memory it needs. Maybe 500M is just not enough for a complex nested structure. I'd lean towards adding a note about that to the manual page rather than changing the code. But that's just a first guess.
Well, there's this: http://dump.zaori.org/20121114_uncy_en_pages_full.xml.gz That's the file I was trying to import when I originally filed this bug, I believe, though it's probably not the best thing to test on due to its being enormous.