Bug 58531 - [grabbers] PHP Fatal error: Out of memory in mediawikibot.class.php
Status: RESOLVED FIXED
Product: Utilities
Classification: Unclassified
Component: grabbers
Version: unspecified
Hardware: All
OS: All
Priority: Normal
Severity: major
Assigned To: Nobody - You can work on this!
Reported: 2013-12-16 13:08 UTC by Nemo
Modified: 2014-03-04 08:34 UTC
CC List: 3 users

Description Nemo 2013-12-16 13:08:26 UTC
With
    php grabText.php --url=http://wikihow.com/api.php
I got
    PHP Fatal error:  Out of memory (allocated 1851260928) (tried to allocate 32 bytes)
in mediawikibot.class.php.

This wiki is rather huge, with about 3.5M pages (I was hoping to have better success than with dumpgenerator.py). Is there any way to limit the amount of data stored in RAM?
Comment 1 Kunal Mehta (Legoktm) 2013-12-16 16:45:00 UTC
How far did the script make it? What was the last output it gave before running out of memory?
Comment 2 Nemo 2013-12-16 17:14:34 UTC
(In reply to comment #1)
> How far did the script make it? What was the last output it gave before
> running out of memory?

It got through 260k pages in ns0, then printed a count for the next namespace, and presumably it was getting titles for the third.
Comment 3 Isarra 2013-12-16 18:29:30 UTC
So it looks like it's running out of memory storing the initial list before it starts grabbing the pages (the workflow being that it gets the list of namespaces, then the list of pages, and then fetches the revisions themselves and inserts everything into the database)? That would certainly be understandable at these sizes, so it should probably... what, write the page list to a temp file or something?

What's the best practice here? What should it be doing?
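
A rough sketch of the pattern described above, assuming the grabber walks list=allpages and keeps every title in one array before fetching any revisions. This is illustrative only, not the actual mediawikibot.class.php code, and it assumes the modern continue-style API:

    <?php
    // Hypothetical sketch of the memory-hungry pattern: every title from
    // list=allpages is collected into a single array before any revisions
    // are fetched. On a ~3.5M-page wiki this array alone can exhaust RAM.
    $api = 'http://wikihow.com/api.php';
    $titles = [];
    $continue = [];

    do {
        $params = array_merge([
            'action'   => 'query',
            'list'     => 'allpages',
            'aplimit'  => 500,
            'format'   => 'json',
            'continue' => '',
        ], $continue);

        $result = json_decode(file_get_contents($api . '?' . http_build_query($params)), true);

        foreach ($result['query']['allpages'] as $page) {
            $titles[] = $page['title'];   // grows without bound
        }

        $continue = isset($result['continue']) ? $result['continue'] : [];
    } while ($continue !== []);

    // Only after the whole list is in memory does revision grabbing start.
    echo count($titles) . " titles collected\n";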
Comment 4 Kunal Mehta (Legoktm) 2014-01-03 08:57:57 UTC
I think we should insert into the database in batches so we don't have to store everything in memory, just a few parts.
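
A minimal sketch of that batched approach: hand each allpages batch to the database (or any callback) as soon as it arrives, so at most one batch lives in memory at a time. The function and callback names here are illustrative, not the actual grabText.php code:

    <?php
    // Process each batch immediately instead of building one huge array.
    function grabPagesInBatches($api, callable $insertBatch, $batchSize = 500)
    {
        $continue = [];
        do {
            $params = array_merge([
                'action'   => 'query',
                'list'     => 'allpages',
                'aplimit'  => $batchSize,
                'format'   => 'json',
                'continue' => '',
            ], $continue);

            $result = json_decode(file_get_contents($api . '?' . http_build_query($params)), true);

            // Insert this batch (e.g. into the page table), then let it be freed.
            $insertBatch($result['query']['allpages']);

            $continue = isset($result['continue']) ? $result['continue'] : [];
        } while ($continue !== []);
    }

    // Example use: count pages batch by batch, holding only one batch at a time.
    $total = 0;
    grabPagesInBatches('http://wikihow.com/api.php', function (array $pages) use (&$total) {
        $total += count($pages);
        echo "processed $total pages so far\n";
    });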
Comment 5 Gerrit Notification Bot 2014-01-03 09:10:58 UTC
Change 105153 had a related patch set uploaded by Legoktm:
grabText: Don't store entire list of pages in memory

https://gerrit.wikimedia.org/r/105153
Comment 6 Gerrit Notification Bot 2014-01-07 13:10:52 UTC
Change 105153 merged by Jack Phoenix:
grabText: Don't store entire list of pages in memory

https://gerrit.wikimedia.org/r/105153
Comment 7 Andre Klapper 2014-03-01 01:34:34 UTC
Can somebody confirm this is FIXED by the commit in comment 6?
Nemo?
Comment 8 Nemo 2014-03-04 08:34:36 UTC
I don't know; my local checkout no longer works. Tentatively closing.
