Last modified: 2014-08-06 23:57:06 UTC
We currently confine image scaling jobs in cgroups. We have an upstart script in puppet (/modules/mediawiki/files/cgroup/mw-cgroup.conf) that basically does:

    pre-start script
        mkdir -p /sys/fs/cgroup/memory/mediawiki
        mkdir -m 0777 /sys/fs/cgroup/memory/mediawiki/job
        echo "/usr/local/bin/cgroup-mediawiki-clean" > /sys/fs/cgroup/memory/release_agent
    end script

When cgroup-bin gets reconfigured, e.g. during an upgrade, the cgroups go away (that looks like a bug of its own?) and the upstart job "mw-cgroup" is never run again, since it is already in the "started" upstart state. In the meantime, thumbnailing jobs fail, since they can't create their own job cgroup when the parent hierarchy (mediawiki/job) doesn't exist.

Although we could do all kinds of upstart tricks ("stop on cgconfig stop", for example), I can't see a reason why limit.sh can't check for the existence of mediawiki and mediawiki/job and, if they don't exist, create them itself. This would nicely solve the problem and be far more resilient.

Note that this issue caused a complete thumbnail outage for the past hour or so, and it is bound to happen again on the next cgroup-bin upgrade.
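To make the limit.sh suggestion concrete, here is a rough sketch of such a check. It is not the actual limit.sh code: the function name ensure_mw_cgroups and its optional root argument are made up for illustration, and it assumes the calling user is allowed to create directories under the memory hierarchy. That may not hold if limit.sh runs unprivileged. It also does not touch release_agent.

```shell
#!/bin/bash
# Hypothetical helper (not part of the real limit.sh): recreate the
# parent cgroups if they have disappeared, e.g. after cgroup-bin was
# reconfigured.
ensure_mw_cgroups() {
    # Root of the memory cgroup hierarchy. The optional argument is an
    # assumption added here so the function can be tried on a scratch
    # directory.
    local root="${1:-/sys/fs/cgroup/memory}"

    # Give up if the memory controller hierarchy is not there at all.
    [ -d "$root" ] || return 1

    # mkdir -p succeeds silently when the directory already exists.
    mkdir -p "$root/mediawiki" || return 1

    # Create the world-writable job parent only if it is missing. Another
    # job may create it concurrently, so a failed mkdir is acceptable as
    # long as the directory exists afterwards.
    if [ ! -d "$root/mediawiki/job" ]; then
        mkdir -m 0777 "$root/mediawiki/job" 2>/dev/null \
            || [ -d "$root/mediawiki/job" ] \
            || return 1
    fi
    return 0
}
```

limit.sh could call ensure_mw_cgroups once before creating its per-job cgroup and fall back to its current failure path if the call returns non-zero.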
One reason this is an upstart script is that it's run as root. Can you also restart it on the videoscalers? They are also out.
Change 83067 had a related patch set uploaded by J: restart mw-cgroup on cgconfig restart https://gerrit.wikimedia.org/r/83067
Change 83067 merged by Faidon Liambotis: restart mw-cgroup on cgconfig restart https://gerrit.wikimedia.org/r/83067
Sure, I guess this works too.
Maybe it would be better to use cgconfig.conf for this? It's better to use the standard configuration system than to start a wheel war with it, right?
cgconfig.conf is not flexible enough to accommodate the current setup. If it's possible to rework the cgroups use to fit within the options cgconfig.conf allows, moving to that would be an option. cgconfig.conf can be used to limit the overall resources for a group, but not to have per-process limits within a group. AFAIK it also does not provide an option to install a release agent. Given those limitations, having an upstart job that sets up the cgroups, as we do now, seems to be the best option. With the merged change, restarts are also no longer a problem. Not sure I would call that a wheel war; it's more that there was a bug in mw-cgroup.conf that got fixed.