Last modified: 2013-08-23 15:17:58 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T54737, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 52737 - Icinga alarms for stafford seem to be set too tight
Status: RESOLVED INVALID
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: ops
Depends on:
Blocks:
Reported: 2013-08-11 22:13 UTC by Tim Landscheidt
Modified: 2013-08-23 15:17 UTC (History)
Description Tim Landscheidt 2013-08-11 22:13:36 UTC
icinga-wm repeatedly complained about stafford on #wikimedia-operations today; each alert usually recovered one to two minutes later:

| [00:32:46] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [00:33:37] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time
| [02:30:55] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [02:31:45] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 6.529 second response time
| [04:27:50] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [04:28:40] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.137 second response time
| [08:43:30] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [08:44:21] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 5.129 second response time
| [08:52:30] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [08:54:20] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.175 second response time
| [09:21:30] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [09:24:20] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time
| [09:52:28] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [09:54:18] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.145 second response time
| [10:13:28] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [10:14:18] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.134 second response time
| [10:32:28] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [10:33:18] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time
| [10:36:28] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [10:37:14] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.131 second response time
| [10:52:24] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [10:53:15] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
| [11:22:24] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [11:23:15] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
| [12:02:28] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [12:05:18] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.146 second response time
| [12:08:28] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [12:09:18] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.178 second response time
| [12:13:28] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [12:14:19] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time
| [12:22:28] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [12:23:18] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.191 second response time
| [12:52:31] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [12:53:22] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time
| [13:31:41] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [13:33:31] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time
| [14:01:38] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [14:02:28] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.220 second response time
| [14:23:38] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [14:24:38] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 9.811 second response time
| [14:31:38] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [14:32:29] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time
| [15:57:35] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [15:59:26] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.242 second response time
| [17:27:45] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [17:29:35] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
| [17:51:24] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [17:53:14] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time
| [18:21:24] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [18:22:14] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time
| [18:52:19] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [18:53:09] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time
| [19:52:27] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [19:53:17] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
| [21:21:20] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [21:22:10] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time
| [21:39:21] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
| [21:40:11] <icinga-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.148 second response time
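
To check the "one to two minutes" estimate against a log like the one above, the gap between each PROBLEM line and the following RECOVERY line can be computed. The following is only a minimal sketch: the function name outage_durations and the regular expression are illustrative assumptions based on the line format quoted in this report, not part of any Icinga tooling, and the sample uses three PROBLEM/RECOVERY pairs copied from the log.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape, taken from the excerpt in this report:
# | [00:32:46] <icinga-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: ...
LINE_RE = re.compile(r"\[(\d{2}:\d{2}:\d{2})\]\s+<icinga-wm>\s+(PROBLEM|RECOVERY)\b")


def outage_durations(lines):
    """Pair each PROBLEM with the next RECOVERY and return the gaps as timedeltas.

    Hypothetical helper for illustration only.
    """
    durations = []
    problem_at = None
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not an icinga-wm alert line
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        if match.group(2) == "PROBLEM":
            # Keep the first PROBLEM if several arrive before a RECOVERY.
            if problem_at is None:
                problem_at = stamp
        elif problem_at is not None:
            delta = stamp - problem_at
            # The log has no dates, so treat a negative gap as a midnight wrap.
            if delta < timedelta(0):
                delta += timedelta(days=1)
            durations.append(delta)
            problem_at = None
    return durations


sample = [
    "| [00:32:46] <icinga-wm>\t PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds",
    "| [00:33:37] <icinga-wm>\t RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time",
    "| [08:52:30] <icinga-wm>\t PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds",
    "| [08:54:20] <icinga-wm>\t RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.175 second response time",
    "| [09:21:30] <icinga-wm>\t PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds",
    "| [09:24:20] <icinga-wm>\t RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time",
]

for gap in outage_durations(sample):
    print(int(gap.total_seconds()), "seconds")
```

For these three pairs the gaps are 51, 110 and 170 seconds. The reported gap is the time between the two IRC notifications, so it also depends on how often the check is re-run, not only on how long the puppetmaster was actually unresponsive.
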
Comment 1 Faidon Liambotis 2013-08-23 15:17:58 UTC
They're not too tight. These are real problems. stafford is on the edge of its capacity: a few more requests throw it into an outage, and with a few fewer it recovers. We plan to scale up the puppet infrastructure to address this. Until then, it isn't a huge deal, as this isn't realtime traffic.
