Last modified: 2014-09-24 16:26:10 UTC
Before OCG was turned on for everyone, we had a 30k icinga warning limit for the job status queue. Since entries expire from the queue after 5 days and we expect around 10k jobs/day (roughly 50k entries at steady state), we raised the limit to a more reasonable 100k. But we should re-examine this once OCG goes live by default in production and see whether this limit makes sense.

We also have:

  output dir: warn 40GB, critical 50GB
  postmortem dir: warn 1G, critical 2G
  render jobs queue: warn 100, critical 500
  temp size: warn 1G, critical 5G

We should examine these as well. (If changes are needed, see https://gerrit.wikimedia.org/r/162623 )

Finally, is the 5 day expiry reasonable? Should we instead/also have a "# entries" limit, and expire things as needed until the status queue goes down below NNN entries?
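
To make the last question concrete, here is a rough Python sketch of combined age-based and count-based expiry, plus the steady-state arithmetic behind the 100k limit. It is not how OCG stores or expires its status queue; the `OrderedDict` queue, the function names, and the sample `max_entries` values are all hypothetical and chosen only for illustration.

```python
from collections import OrderedDict

EXPIRY_DAYS = 5          # expiry period, from the task description
JOBS_PER_DAY = 10_000    # expected load, from the task description
WARN_LIMIT = 100_000     # current icinga warning limit


def steady_state_entries(jobs_per_day, expiry_days):
    """Expected queue length when every entry lives for expiry_days."""
    return jobs_per_day * expiry_days


def expire(queue, now, max_age, max_entries):
    """Drop entries from the front of the queue (oldest first) while the
    oldest entry is older than max_age OR the queue holds more than
    max_entries. `queue` is a hypothetical OrderedDict mapping
    job id -> creation timestamp, ordered oldest first.
    Returns the list of removed job ids."""
    removed = []
    while queue:
        job_id, created = next(iter(queue.items()))
        too_old = now - created > max_age
        too_many = len(queue) > max_entries
        if not (too_old or too_many):
            break
        queue.popitem(last=False)  # remove the oldest entry
        removed.append(job_id)
    return removed


if __name__ == "__main__":
    expected = steady_state_entries(JOBS_PER_DAY, EXPIRY_DAYS)
    print(expected, WARN_LIMIT / expected)
```

With the numbers above, the expected steady-state length is 50k entries, so the 100k warning limit leaves about 2x headroom. A count-based cap would bound the queue even if the job rate spikes, at the cost of dropping status entries that are younger than 5 days.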