Last modified: 2013-02-21 06:33:31 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T45188, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 43188 - Enhance imagescaler processes containment
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
File management (Other open bugs)
1.21.x
All All
: High enhancement
: ---
Assigned To: Jan Gerber
: performance
Depends on:
Blocks:
Reported: 2012-12-17 04:32 UTC by Faidon Liambotis
Modified: 2013-02-21 06:33 UTC (History)
5 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Faidon Liambotis 2012-12-17 04:32:11 UTC
We often see scaler processes (convert, avconv etc.) hang for various reasons. For example, we've seen both convert and avconv hang in some cases when they can't allocate memory beyond the limit imposed on them by MediaWiki's ulimit call.

We should both handle such cases and also contain them better than we do now, for security reasons.

Tim Starling and I have discussed various approaches for this in the past. I proposed an LD_PRELOAD wrapper that would abort whenever malloc() is unable to allocate memory (hackish, but might work). We could also use cgroups, as they're better at tracking resources than ulimits and they can also prove useful at containing what those processes can do in case of security exploits.

Finally, we can probably also use something like SIGALRM or non-blocking pipe calls so that the parent process tracks how long it has been waiting for its child and kills it if it times out.
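The parent-side timeout idea above can be sketched in shell: start the scaler in the background, arm a watchdog, and kill the child if it is still running when the wall-clock limit expires. This is a hypothetical illustration, not MediaWiki code; the function name and limits are made up.

```shell
#!/bin/sh
# Sketch: run a (possibly hanging) command with a wall-clock timeout.
run_with_timeout() {
    limit=$1; shift
    "$@" &                                     # the possibly-hanging scaler process
    child=$!
    ( sleep "$limit"; kill -TERM "$child" 2>/dev/null ) &
    watchdog=$!
    wait "$child"                              # returns 143 (128+SIGTERM) if killed
    status=$?
    kill "$watchdog" 2>/dev/null               # disarm the watchdog on normal exit
    return "$status"
}

run_with_timeout 2 sleep 10                    # simulate a process hung for ~10s
echo "exit status: $?"
```

The same shape could be written in PHP with pcntl_fork()/pcntl_alarm(), which is essentially what the SIGALRM suggestion amounts to.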
Comment 1 Faidon Liambotis 2012-12-17 19:38:40 UTC
Bug 40099 (AppArmor profile for avconv) is also related to this. The AppArmor idea is nice and could be expanded to image scalers too. However, note that it probably won't be enough on its own, as AppArmor can't impose resource limits.
Comment 2 Antoine "hashar" Musso (WMF) 2012-12-18 00:10:42 UTC
Would it make sense to run such huge scaling processes on a dedicated box that would run them one at a time, and thus with a lot more memory?

We could probably get MediaWiki to catch such failing processes and create a dedicated job queue that would be processed by that server. Of course, if scaling fails on that one too, we would give up.
Comment 3 Jan Gerber 2012-12-20 12:50:42 UTC
One simple addition might be to use something like timelimit[1] to enforce a wall-clock time limit rather than a CPU time limit; that way, processes that hang and no longer consume any CPU would still get killed.

[1] http://devel.ringlet.net/sysutils/timelimit/
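For illustration, GNU coreutils' timeout(1) is a widely available tool comparable to timelimit: it sends TERM when the wall-clock limit expires, and optionally KILL shortly afterwards. The limits below are illustrative, not project settings; a simulated hung process stands in for convert/avconv.

```shell
# Send TERM after 2s of wall-clock time, then KILL 1s later if needed.
timeout -k 1 2 sleep 10
echo "exit status: $?"   # timeout(1) reports 124 when the limit was hit
```

In the scaler case the same wrapper would go around the convert or avconv invocation, so a process that is merely idle (using no CPU) still gets reaped.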
Comment 4 Tim Starling 2012-12-20 23:40:52 UTC
Cgroups are a fairly nice option for memory limiting. With a little bit of per-server setup, an unprivileged user can create a cgroup dynamically for each convert/avconv process. It measures memory in an intelligent way, better than RSS and much better than vsize. And when the limit is reached, the kernel OOM killer kills it, so there's no chance of a deadlock.
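The setup Tim describes can be sketched as a shell fragment using the cgroup v1 memory controller (the interface available in 2012). This is a hypothetical illustration, not the actual ulimit4.sh implementation: it assumes per-server setup has mounted the controller at /sys/fs/cgroup/memory and delegated a writable parent group (here called "mediawiki") to the web-server user, and all names, paths and limits are made up.

```shell
#!/bin/sh
# Sketch: create a per-process memory cgroup as an unprivileged user,
# assuming a delegated, writable parent group.
CG=/sys/fs/cgroup/memory/mediawiki/scaler-$$

mkdir "$CG"
echo $((400 * 1024 * 1024)) > "$CG/memory.limit_in_bytes"   # e.g. 400 MiB
echo $$ > "$CG/tasks"   # move this shell (and so its children) into the group

# If the scaler exceeds the limit, the kernel OOM killer terminates it
# instead of leaving it stuck retrying failed allocations.
convert input.tiff -resize 1024x768 output.jpg
status=$?

# The group can only be rmdir'd once empty, i.e. after this shell exits;
# cleanup would be left to a cron job or the cgroup release_agent.
exit "$status"
```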
Comment 5 Faidon Liambotis 2012-12-21 19:10:17 UTC
cgroups are nice and I agree that we should use them. We could even use LXC, which sits on top of them and provides an easy way to run processes in better isolation.

However, I also think that a timeout mechanism should be employed to kill processes that hang for whatever reason that can't otherwise be caught (e.g. maybe they're just sitting idle, consuming neither memory nor CPU). If we really feel like overengineering, we can even use cgroups to properly kill the whole group, including potentially runaway child processes :)

Jan, from a quick glance, timelimit is basically signal handlers plus alarm(), like what I suggested earlier. My idea was to write that in PHP, but using that helper might indeed be easier.
Comment 6 Jan Gerber 2012-12-27 12:13:10 UTC
Pushed a patch to use cgroup memory limits via ulimit4.sh where set up:

core:   https://gerrit.wikimedia.org/r/#/c/40785/
puppet: https://gerrit.wikimedia.org/r/#/c/40784/
Comment 7 Faidon Liambotis 2013-01-28 15:49:10 UTC
Tim, I know you were in the process of doing some work related to this. Can we get your opinion on Jan's patches either here or in Gerrit? I'd like to push this forward, let me know if I can help coordinating this in a better way.
Comment 8 Tim Starling 2013-02-21 06:33:31 UTC
All done now.


