Last modified: 2014-09-25 12:37:01 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from pages displaying bug reports and their history, links might be broken. See T73116, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 71116 - Packetloss_Average alarm on udp2log machines on 2014-09-20
Status: RESOLVED FIXED
Product: Analytics
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2014-09-22 11:33 UTC by christian
Modified: 2014-09-25 12:37 UTC (History)

Description christian 2014-09-22 11:33:00 UTC
There were Packetloss_Average alerts on 2014-09-20 for a few
minutes on erbium and oxygen [1].

What happened and how does it affect our files?


[1] See http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-operations/20140920.txt

  [07:18:51] <icinga-wm> PROBLEM - Packetloss_Average on erbium is CRITICAL: packet_loss_average CRITICAL: 10.7821244706
  [07:22:50] <icinga-wm> PROBLEM - Packetloss_Average on oxygen is CRITICAL: packet_loss_average CRITICAL: 8.09920889831
  [07:27:01] <icinga-wm> RECOVERY - Packetloss_Average on erbium is OK: packet_loss_average OKAY: 1.81911928571
  [07:37:18] <icinga-wm> RECOVERY - Packetloss_Average on oxygen is OK: packet_loss_average OKAY: 0.894191101695
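
To put a number on "a few minutes", the alert windows can be derived from the four log lines above. The following is only a minimal sketch: the regular expression and the helper name alert_windows are assumptions made for illustration, not part of any existing tooling. It measures the time between the PROBLEM and RECOVERY notifications per host, which is the duration of the alert and not necessarily the exact period of packet loss.

```python
import re
from datetime import datetime

# The four icinga-wm lines quoted above, copied from the IRC log.
LOG = """\
[07:18:51] <icinga-wm> PROBLEM - Packetloss_Average on erbium is CRITICAL: packet_loss_average CRITICAL: 10.7821244706
[07:22:50] <icinga-wm> PROBLEM - Packetloss_Average on oxygen is CRITICAL: packet_loss_average CRITICAL: 8.09920889831
[07:27:01] <icinga-wm> RECOVERY - Packetloss_Average on erbium is OK: packet_loss_average OKAY: 1.81911928571
[07:37:18] <icinga-wm> RECOVERY - Packetloss_Average on oxygen is OK: packet_loss_average OKAY: 0.894191101695
"""

# Hypothetical pattern matching only the line shape seen above.
LINE_RE = re.compile(
    r"\[(\d{2}:\d{2}:\d{2})\] <icinga-wm> (PROBLEM|RECOVERY)"
    r" - Packetloss_Average on (\S+) is"
)


def alert_windows(log):
    """Return {host: timedelta} between each PROBLEM and its RECOVERY."""
    opened = {}   # host -> time of the PROBLEM notification
    windows = {}  # host -> duration until the RECOVERY notification
    for line in log.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        kind, host = match.group(2), match.group(3)
        if kind == "PROBLEM":
            opened[host] = stamp
        elif host in opened:
            windows[host] = stamp - opened.pop(host)
    return windows


WINDOWS = alert_windows(LOG)
for host, duration in sorted(WINDOWS.items()):
    print(host, duration)
# erbium 0:08:10
# oxygen 0:14:28
```

Under these assumptions the alert lasted roughly 8 minutes on erbium and roughly 14 minutes on oxygen.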
Comment 1 christian 2014-09-22 11:36:43 UTC
Ops reported a ulsfo outage [1] that matches the time period, and Ops said they
would have a proper incident report today (2014-09-22).

Once that is out, we'll see how it matches the effect we saw.

[1] https://lists.wikimedia.org/mailman/private/ops/2014-September/040429.html
Comment 2 christian 2014-09-25 12:37:01 UTC
(In reply to christian from comment #1)
> and Ops said they would
> have a proper incident report today (2014-09-22).

Since I still could not find a proper report, I had a look
nonetheless, and the Packetloss_Average alerts closely match the
preliminary timeline from Ops.
Also, only ulsfo hosts were affected.

So the alert was just an artifact of the ulsfo issue and traffic
reshuffling.
