Last modified: 2014-02-20 23:31:25 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator; bug reports are now handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links may be broken. See T44095, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 42095 - importDump.php crashes with out of memory error
Status: RESOLVED WORKSFORME
Product: MediaWiki
Classification: Unclassified
Component: Maintenance scripts (Other open bugs)
Version: 1.20.x
Hardware/OS: All / All
Importance: High normal with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2012-11-14 06:18 UTC by Isarra
Modified: 2014-02-20 23:31 UTC
CC: 2 users
See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Isarra 2012-11-14 06:18:45 UTC
While trying to import a full-revision dump of Uncyclopedia into a clean 1.20 install, importDump.php ran out of memory and crashed a short way in (after 17303 revisions):

php maintenance/importDump.php --memory-limit=500M pages_full.xml.gz 

PHP Fatal error:  Allowed memory size of 524288000 bytes exhausted (tried to allocate 131072 bytes) in /var/www/mediawiki/core/includes/objectcache/SqlBagOStuff.php on line 517

Fatal error: Allowed memory size of 524288000 bytes exhausted (tried to allocate 131072 bytes) in /var/www/mediawiki/core/includes/objectcache/SqlBagOStuff.php on line 517


It was also only running at something like 2 revisions/second, though I don't know if that had anything to do with it.
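
A first thing to try in this situation is simply lifting the cap. Assuming the --memory-limit option accepts 'max' to remove the limit entirely (as later MediaWiki documentation describes), the retry would look like:

php maintenance/importDump.php --memory-limit=max pages_full.xml.gz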
Comment 1 Andre Klapper 2013-05-16 16:05:52 UTC
> full-revision dump of Uncyclopedia

How big is that?
Comment 2 Isarra 2013-05-16 17:56:01 UTC
~6GB compressed.

Also crashed for ?pedia, which is only ~100MB, though.
Comment 3 Adam Wight 2014-01-26 01:26:12 UTC
Same bug seen in MediaWiki 1.23-HEAD, importing from a recursive dump of the mediawiki.org/Template: namespace.  The resulting XML file is only 6.8MB, but the memory used to import seems to go up superlinearly, at over 90KB/revision.  There are memory leaks like a floating cardboard box.
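
One way to see growth like that, hypothetically, is to log memory_get_usage() at intervals during the import. A stand-alone sketch of the instrumentation pattern (illustrative only, not hooked into the real importer; the loop body is a placeholder):

<?php
// Illustrative pattern: report per-revision memory growth every 1000
// iterations, the way one might wrap an import loop to measure leakage.
$previous = memory_get_usage();
for ( $revision = 1; $revision <= 100000; $revision++ ) {
	// ... process one revision here (placeholder) ...
	if ( $revision % 1000 === 0 ) {
		$current = memory_get_usage();
		printf( "rev %d: %+.2f KB/revision since last report\n",
			$revision, ( $current - $previous ) / 1024 / 1000 );
		$previous = $current;
	}
}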
Comment 4 physikerwelt 2014-02-20 12:12:31 UTC
I just tried it with the most recent version and it works for me.
The Maintenance.php script just passes whatever you specify as $limit to PHP via
ini_set( 'memory_limit', $limit );
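
For illustration, a minimal stand-alone sketch of that pass-through behaviour (hypothetical and simplified, not the actual Maintenance.php code; only the option name matches the real script):

<?php
// Simplified stand-in for the behaviour described above: whatever value
// is passed via --memory-limit is handed straight to PHP's memory_limit.
$options = getopt( '', array( 'memory-limit:' ) );
if ( isset( $options['memory-limit'] ) ) {
	$limit = $options['memory-limit']; // e.g. "500M", "8G" or "max"
	if ( $limit === 'max' ) {
		$limit = -1; // -1 lifts the limit entirely
	}
	ini_set( 'memory_limit', $limit );
}
echo "memory_limit is now " . ini_get( 'memory_limit' ) . "\n";

Run as, say, php limit-sketch.php --memory-limit=8G (the file name is made up), it should report 8G back; the description's 500M cap corresponds to the 524288000 bytes in the fatal error above.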
Comment 5 Adam Wight 2014-02-20 17:29:37 UTC
physikerwelt: Can you let us know roughly what size your target wiki and output file were? Your PHP version would also be helpful. And if you are passing a new memory_limit, what value do you specify?

The bug isn't that it's impossible to run the dump script; it's about a memory leak which causes rapid memory exhaustion on even small data sets.
Comment 6 physikerwelt 2014-02-20 17:53:57 UTC
I used the most recent Vagrant version. I assigned 8G of main memory and 8 cores to the VM. The dataset was a 500 MB sample from the most recent version of enwiki (all pages that contain math). I set the memory limit to 8G, which would have been basically the same as max. I also used the --no-updates flag, which might be important. Can you post your dataset?
Comment 7 Isarra 2014-02-20 18:02:03 UTC
I think I recall it working with the --no-updates flag since then as well. So if this is still broken, the bug may just be in how it handles updates.

If that's the case, maybe having it always run without updates would be in order, with the option to run the appropriate update scripts afterwards.
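
A workflow along those lines might look like the following; rebuildrecentchanges.php and initSiteStats.php are the usual post-import maintenance scripts, but the exact follow-up steps may vary by MediaWiki version:

php maintenance/importDump.php --no-updates --memory-limit=max pages_full.xml.gz
php maintenance/rebuildrecentchanges.php
php maintenance/initSiteStats.php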
Comment 8 physikerwelt 2014-02-20 22:59:29 UTC
(In reply to Isarra from comment #7)
> I think I recall it working with the --no-updates flag since then as well.
> So if this is still broken, the bug may just be in how it handles updates.
>
> If that's the case, maybe having it always run without updates would be in
> order, with the option to run the appropriate update scripts afterwards.

Can you give me a pointer to the dataset? I'd like to test how much memory you need. Maybe 500M is just not enough for a complex nested structure. I tend towards writing a note about that on the man page rather than changing the code, but that's just a first guess.
Comment 9 Isarra 2014-02-20 23:31:25 UTC
Well, there's this: http://dump.zaori.org/20121114_uncy_en_pages_full.xml.gz 

That's the file I was trying to import when I originally filed this bug, I believe, though it's probably not the best thing to test on due to its being enormous.
