Last modified: 2014-07-29 22:25:30 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T70796, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 68796 - Packetloss was critical on 2014-07-29 ~2:00 for oxygen, analytics1003, erbium


Summary:	Packetloss was critical on 2014-07-29 ~2:00 for oxygen, analytics1003, erbium

Status:	RESOLVED WONTFIX

Product:	Analytics
Classification:	Unclassified
Component:	General/Unknown (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Unprioritized normal
Target Milestone:	---
Assigned To:	christian

URL:
Whiteboard:	u=Community c=General/Unknown p=0 s=2...
Keywords:

Depends on:
Blocks:

Reported:	2014-07-29 09:17 UTC by christian
Modified:	2014-07-29 22:25 UTC (History)
CC List:	5 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description christian 2014-07-29 09:17:43 UTC

On 2014-07-29 ~02:00, there were packet loss alarms for oxygen, analytics1003, erbium in the #wikimedia-operations IRC channel:

  [01:52:47] <icinga-wm> PROBLEM - Packetloss_Average on erbium is CRITICAL: packet_loss_average CRITICAL: 37.5854172414
  [01:56:47] <icinga-wm> RECOVERY - Packetloss_Average on erbium is OK: packet_loss_average OKAY: -0.0539559302326  
  [01:57:17] <icinga-wm> PROBLEM - Packetloss_Average on analytics1003 is CRITICAL: packet_loss_average CRITICAL: 14.0737649167  
  [02:01:17] <icinga-wm> RECOVERY - Packetloss_Average on analytics1003 is OK: packet_loss_average OKAY: 1.17930608333  
  [02:02:57] <icinga-wm> PROBLEM - Packetloss_Average on oxygen is CRITICAL: packet_loss_average CRITICAL: 9.18785825  
  [02:06:57] <icinga-wm> RECOVERY - Packetloss_Average on oxygen is OK: packet_loss_average OKAY: 1.15079566667

The packet loss periods were short, and there was a lot of monitoring noise in
the IRC channel around that time, so the alarms might have been flukes.
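
As a rough check on how short the alarm periods were, the sketch below pairs each PROBLEM line quoted above with the RECOVERY line for the same host and reports the time between them. It is only an illustration: the function name `alarm_durations` and the regular expression are assumptions made for this example, not part of any Wikimedia or Icinga tooling, and it assumes all timestamps fall on the same day.

```python
import re
from datetime import datetime

# Alert lines copied from the IRC excerpt in this report.
LOG = """\
[01:52:47] <icinga-wm> PROBLEM - Packetloss_Average on erbium is CRITICAL: packet_loss_average CRITICAL: 37.5854172414
[01:56:47] <icinga-wm> RECOVERY - Packetloss_Average on erbium is OK: packet_loss_average OKAY: -0.0539559302326
[01:57:17] <icinga-wm> PROBLEM - Packetloss_Average on analytics1003 is CRITICAL: packet_loss_average CRITICAL: 14.0737649167
[02:01:17] <icinga-wm> RECOVERY - Packetloss_Average on analytics1003 is OK: packet_loss_average OKAY: 1.17930608333
[02:02:57] <icinga-wm> PROBLEM - Packetloss_Average on oxygen is CRITICAL: packet_loss_average CRITICAL: 9.18785825
[02:06:57] <icinga-wm> RECOVERY - Packetloss_Average on oxygen is OK: packet_loss_average OKAY: 1.15079566667
"""

# Hypothetical pattern for the line format shown above (timestamp, kind, host).
LINE_RE = re.compile(
    r"\[(?P<time>\d{2}:\d{2}:\d{2})\] <icinga-wm> "
    r"(?P<kind>PROBLEM|RECOVERY) - Packetloss_Average on (?P<host>\S+) is "
)


def alarm_durations(log):
    """Return {host: seconds between a PROBLEM and its following RECOVERY}."""
    started = {}    # host -> time of the still-open PROBLEM
    durations = {}  # host -> length of the closed alarm in seconds
    for line in log.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # not an alert line in the expected format
        when = datetime.strptime(match["time"], "%H:%M:%S")
        host = match["host"]
        if match["kind"] == "PROBLEM":
            started[host] = when
        elif host in started:
            delta = when - started.pop(host)
            durations[host] = int(delta.total_seconds())
    return durations


if __name__ == "__main__":
    for host, seconds in alarm_durations(LOG).items():
        print(f"{host}: {seconds} s")
```

For the six lines quoted above, each host's alarm lasted 240 seconds, which matches the four-minute gap visible in the timestamps.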

Comment 1 christian 2014-07-29 22:25:30 UTC

The issue was a flapping esams link [1], which (depending on the stream)
dropped between half and all of the esams traffic (eqiad and ulsfo were
unaffected) to the udp2log instances between 2014-07-29T01:35:45 and
2014-07-29T01:42:00.

This issue affects all of our logging infrastructure, from TSVs to
webstatscollector to pagecounts.


[1] See
  http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-operations/20140729.txt
  between [01:36:07] and [02:02:19]
