Last modified: 2014-09-25 12:37:01 UTC
There were Packetloss_Average alerts on 2014-09-20 for a few minutes on erbium and oxygen [1]. What happened and how does it affect our files?

[1] See http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-operations/20140920.txt

[07:18:51] <icinga-wm> PROBLEM - Packetloss_Average on erbium is CRITICAL: packet_loss_average CRITICAL: 10.7821244706
[07:22:50] <icinga-wm> PROBLEM - Packetloss_Average on oxygen is CRITICAL: packet_loss_average CRITICAL: 8.09920889831
[07:27:01] <icinga-wm> RECOVERY - Packetloss_Average on erbium is OK: packet_loss_average OKAY: 1.81911928571
[07:37:18] <icinga-wm> RECOVERY - Packetloss_Average on oxygen is OK: packet_loss_average OKAY: 0.894191101695
Ops reported an ulsfo outage [1] that matches the time period, and said they would have a proper incident report today (2014-09-22). Once that is out, we'll see how it matches the effect we saw.

[1] https://lists.wikimedia.org/mailman/private/ops/2014-September/040429.html
(In reply to christian from comment #1)
> and Ops said to
> have a proper incident report today (2014-09-22).

Since I still could not find a proper report, I had a look nonetheless, and the Packetloss_Average alerts closely match the preliminary timeline from Ops. Also, only ulsfo hosts were affected. So the alerts were just an artifact of the ulsfo issue and traffic reshuffling.
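
For anyone who wants to repeat the comparison against a timeline, here is a minimal sketch that derives the per-host alert windows from the icinga-wm lines quoted in the description. The names `LINE_RE` and `alert_windows` are made up for illustration, and the sketch assumes all timestamps fall on the same day and that each PROBLEM line is followed by one RECOVERY line for the same host.

```python
import re
from datetime import datetime

# Log excerpt copied from the description of this bug.
LOG = """\
[07:18:51] <icinga-wm> PROBLEM - Packetloss_Average on erbium is CRITICAL: packet_loss_average CRITICAL: 10.7821244706
[07:22:50] <icinga-wm> PROBLEM - Packetloss_Average on oxygen is CRITICAL: packet_loss_average CRITICAL: 8.09920889831
[07:27:01] <icinga-wm> RECOVERY - Packetloss_Average on erbium is OK: packet_loss_average OKAY: 1.81911928571
[07:37:18] <icinga-wm> RECOVERY - Packetloss_Average on oxygen is OK: packet_loss_average OKAY: 0.894191101695
"""

# Hypothetical pattern: timestamp, alert kind, and host name.
LINE_RE = re.compile(
    r"\[(\d{2}:\d{2}:\d{2})\] <icinga-wm> (PROBLEM|RECOVERY) - "
    r"Packetloss_Average on (\S+) is"
)


def alert_windows(log_text):
    """Pair each PROBLEM with the next RECOVERY for the same host.

    Returns a list of (host, start, end, duration) tuples in the
    order the recoveries appear. Assumes same-day timestamps.
    """
    opened = {}   # host -> time of the first unresolved PROBLEM
    windows = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a Packetloss_Average alert line
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        kind, host = match.group(2), match.group(3)
        if kind == "PROBLEM":
            opened.setdefault(host, stamp)
        elif host in opened:
            start = opened.pop(host)
            windows.append((host, start.time(), stamp.time(), stamp - start))
    return windows


for host, start, end, duration in alert_windows(LOG):
    print(f"{host}: {start} to {end} ({duration})")
```

For the excerpt above this gives a window of 8 minutes 10 seconds for erbium and 14 minutes 28 seconds for oxygen, which can then be compared by hand against the timeline from Ops.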