Last modified: 2013-04-22 16:51:41 UTC
We're logging around 6000-10000 job queue OOMs per day:

$ for day in `seq 15 17`; do echo -n "August $day: "; zgrep -A2 'Allowed memory size of' fatal.log-201208$day.gz | grep unknown-host | wc -l ; done
August 15: 7239
August 16: 9737
August 17: 6492

They are OOMs from various points in the parser, with RefreshLinksJob2::run() as the ultimate caller. These cause collateral damage beyond the article that actually triggered the OOM, since the whole RefreshLinks2 batch is lost. Perhaps there is a memory leak.
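For anyone wanting to break these down further: a rough way to see which parser entry points the jobs die in is to pull the Class::method() frames out of the lines following the fatal message. This is only a sketch; it assumes the fatal log prints a backtrace frame within the two lines after 'Allowed memory size of' (adjust the -A window and pattern to the actual log format):

$ zgrep -A2 'Allowed memory size of' fatal.log-20120815.gz | grep -oP '\w+(::\w+)?\(\)' | sort | uniq -c | sort -rn | head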
Started looking at this a bit. Also, see https://gerrit.wikimedia.org/r/22497.
After speaking with Aaron just now, it seems this one may not be a problem anymore. Tim, does this still look like a problem to you?
Actually it looks just as frequent as before.
OK, more info:

aaron@fluorine:~/mw-log$ for day in `seq 25 30`; do echo -n "Sep $day: "; zgrep -A2 'Allowed memory size of' archive/fatal.log-201209$day.gz | grep unknown-host | wc -l ; done
Sep 25: 1359
Sep 26: 979
Sep 27: 823
Sep 28: 812
Sep 29: 769
Sep 30: 970
Seems lower the last few weeks.

aaron@fluorine:~/mw-log$ for day in `seq 25 31`; do echo -n "Dec $day: "; zgrep -A2 'Allowed memory size of' archive/fatal.log-201212$day.gz | grep unknown-host | wc -l ; done
Dec 25: 11
Dec 26: 11
Dec 27: 239
Dec 28: 46
Dec 29: 3
Dec 30: 4
Dec 31: 1
The job runner memory limits were doubled and the Wikidata job batch sizes were also halved (again) on Apr 16 (those were piling up OOMs of their own).
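For context, the runner-side knob looks roughly like this; the value shown is illustrative, not the exact change (runJobs.php accepts the standard maintenance-script --memory-limit option):

# Raise the per-process PHP memory limit for a job runner invocation
# (300M is a placeholder, not the deployed value):
$ php maintenance/runJobs.php --type refreshLinks2 --memory-limit 300M

Halving the Wikidata batch sizes helps for the same reason noted above: each batch touches fewer titles, so a single runaway parse wastes less work when it OOMs.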
aaron@fluorine:~/mw-log$ for day in `seq 10 21`; do echo -n "Day $day: "; zgrep -A2 'Allowed memory size of' archive/fatal.log-201304$day.gz | grep -P "mw10(0[1-9]|1[0-6])" | wc -l ; done
Day 10: 6665
Day 11: 16169
Day 12: 29571
Day 13: 1879
Day 14: 142
Day 15: 6
Day 16: 141
Day 17: 27
Day 18: 0
Day 19: 0
Day 20: 0
Day 21: 0