Last modified: 2014-09-24 16:26:10 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T73239, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 71239 - Re-examine icinga warning thresholds and job expiry.
Status: NEW
Product: OCG
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody
Depends on:
Blocks:
Reported: 2014-09-24 16:26 UTC by C. Scott Ananian
Modified: 2014-09-24 16:26 UTC

Description C. Scott Ananian 2014-09-24 16:26:10 UTC
Before OCG was turned on for everyone, we had a 30k Icinga warning limit for the job status queue.  Since entries expire from the queue after 5 days and we expect around 10k jobs/day (so roughly 50k entries in steady state), we raised the limit to a more reasonable 100k.
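
For reference, here is the arithmetic behind the 100k figure as a small Python sketch. The constant and function names are made up for illustration. The numbers are the estimates from this report, not measurements.

```python
# Back-of-the-envelope sizing for the job status queue.
# All names here are illustrative; the values are the estimates quoted above.
EXPIRY_DAYS = 5          # entries expire from the status queue after 5 days
JOBS_PER_DAY = 10_000    # expected load once OCG is on for everyone
WARN_LIMIT = 100_000     # current Icinga warning limit


def steady_state_entries(jobs_per_day: int, expiry_days: int) -> int:
    """Entries present once arrivals and expiries balance out."""
    return jobs_per_day * expiry_days


expected = steady_state_entries(JOBS_PER_DAY, EXPIRY_DAYS)
headroom = WARN_LIMIT / expected

print(f"expected steady-state entries: {expected}")
print(f"warning limit is {headroom:.1f}x the expected size")
```

If the real job rate turns out to be well above or below 10k/day, the same calculation gives a starting point for a new limit.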

But we should re-examine this once OCG goes live by default in production and see whether this limit makes sense.  We also have:

output dir: warn 40GB, critical 50GB
postmortem dir: warn 1GB, critical 2GB
render jobs queue: warn 100, critical 500
temp size: warn 1GB, critical 5GB

We should examine these as well.  (If changes are needed, see https://gerrit.wikimedia.org/r/162623 )
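
To make the current limits easier to compare against observed values, here is a minimal Python sketch that maps a measurement to an OK/WARNING/CRITICAL state. This is not the actual Icinga check. The metric names, the `>=` comparison, and treating 1G as 1024^3 bytes are all assumptions. No critical limit for the status queue is given here, so it is left unset.

```python
# Illustrative only: metric names and comparison semantics are assumptions,
# not taken from the real check configuration.
GB = 1024 ** 3  # assumption: sizes are binary gigabytes

# metric -> (warn, critical); critical is None where none is stated
THRESHOLDS = {
    "output_dir_bytes": (40 * GB, 50 * GB),
    "postmortem_dir_bytes": (1 * GB, 2 * GB),
    "render_queue_jobs": (100, 500),
    "temp_dir_bytes": (1 * GB, 5 * GB),
    "status_queue_entries": (100_000, None),
}


def classify(metric: str, value: int) -> str:
    """Return 'OK', 'WARNING' or 'CRITICAL' for a measured value."""
    warn, critical = THRESHOLDS[metric]
    if critical is not None and value >= critical:
        return "CRITICAL"
    if value >= warn:
        return "WARNING"
    return "OK"


print(classify("render_queue_jobs", 120))      # between warn and critical
print(classify("output_dir_bytes", 10 * GB))   # well under the warn limit
```

Running observed production values through something like this would show which limits fire too often and which never fire.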

Finally -- is the 5-day expiry reasonable?  Should we instead (or also) have a "# entries" limit, and expire things as needed until the status queue drops below NNN entries?
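
One possible shape for a combined policy, as a rough Python sketch: drop anything older than the age limit first, then drop the oldest survivors until the count is within the cap. This uses a plain in-memory list as a stand-in for the status queue. The function name, parameters, and tuple layout are hypothetical and not part of OCG.

```python
from typing import List, Tuple

Entry = Tuple[str, float]  # (job_id, created_at as a Unix timestamp); hypothetical layout


def expire_entries(entries: List[Entry], now: float,
                   max_age_seconds: float, max_entries: int) -> List[Entry]:
    """Apply an age limit, then a count limit, keeping the newest entries.

    Returns the surviving entries sorted oldest first.
    """
    # Age-based expiry: keep only entries no older than max_age_seconds.
    fresh = sorted(
        (e for e in entries if now - e[1] <= max_age_seconds),
        key=lambda e: e[1],
    )
    # Count-based expiry: drop the oldest entries beyond the cap.
    overflow = len(fresh) - max_entries
    if overflow > 0:
        fresh = fresh[overflow:]
    return fresh


# Small example with made-up timestamps (seconds).
queue = [("a", 0.0), ("b", 100.0), ("c", 200.0), ("d", 300.0)]
kept = expire_entries(queue, now=350.0, max_age_seconds=300.0, max_entries=2)
print(kept)  # "a" is too old; "b" is dropped to satisfy the cap of 2
```

With both limits in place, the age limit bounds how stale an entry can get and the count limit bounds queue size during bursts. Whether NNN should sit above or below the Icinga warning limit is a separate question.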
