Last modified: 2014-10-22 12:01:24 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T74355, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 72355 - "ulsfo <-> eqiad" network issue on 2014-10-21 affecting udp2log streams


Summary:	"ulsfo <-> eqiad" network issue on 2014-10-21 affecting udp2log streams

Status:	RESOLVED WONTFIX

Product:	Analytics
Classification:	Unclassified
Component:	General/Unknown (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Unprioritized normal
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-10-22 11:40 UTC by christian
Modified:	2014-10-22 12:01 UTC
CC List:	5 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description christian 2014-10-22 11:40:07 UTC

Ops reported [1] a network issue between ulsfo and eqiad (according to the
IRC logs [2], alerts started at around 10:30 on 2014-10-21).

We did not see alerts on the udp2log pipeline.
However, we did see alerts from the tighter monitoring of the Kafka pipeline.

Did the issue affect the udp2log pipeline too?

[1] https://lists.wikimedia.org/mailman/private/ops/2014-October/042427.html
[2] http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-operations/20141021.txt

Comment 1 christian 2014-10-22 11:40:32 UTC

The udp2log pipeline showed the first sporadic ulsfo drop-outs at
2014-10-21T10:58 and continued to show ulsfo drop-outs until ulsfo was
depooled at 2014-10-21T11:43
(Ifc2a1f1abb7d532e01782b05df764bf4cd072014).

The per-host packet loss computation for the affected hour does not give a
meaningful result, because depooling ulsfo reduced the message volume
from ulsfo too much.
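To illustrate why a per-host loss figure stops being meaningful at low volume, here is a minimal sketch that estimates loss from gaps in per-host sequence numbers. It assumes each received log line carries a host name and a per-host sequence number that increments by one per message sent. The function per_host_loss, its min_expected cutoff, and the host names in the usage are hypothetical, not the actual udp2log packet-loss tooling.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def per_host_loss(
    records: Iterable[Tuple[str, int]],
    min_expected: int = 1000,
) -> Dict[str, Optional[float]]:
    """Estimate per-host packet loss (in percent) from sequence-number gaps.

    Hypothetical sketch: assumes every host numbers its messages
    consecutively and never resets its counter within the window.
    Hosts whose expected message count is below min_expected get None,
    because a tiny sample does not give a meaningful percentage.
    """
    seen: Dict[str, Set[int]] = defaultdict(set)
    for host, seq in records:
        seen[host].add(seq)  # a set ignores duplicate deliveries

    result: Dict[str, Optional[float]] = {}
    for host, seqs in seen.items():
        # Messages the host should have sent between its lowest and
        # highest observed sequence numbers (inclusive).
        expected = max(seqs) - min(seqs) + 1
        if expected < min_expected:
            result[host] = None
            continue
        lost = expected - len(seqs)
        result[host] = 100.0 * lost / expected
    return result


if __name__ == "__main__":
    # host-a: sequence numbers 1..2000, those ending in 5 are missing.
    busy = [("host-a", s) for s in range(1, 2001) if s % 10 != 5]
    # host-b: depooled, only a few messages seen.
    quiet = [("host-b", s) for s in (1, 2, 5)]
    print(per_host_loss(busy + quiet))  # prints {'host-a': 10.0, 'host-b': None}
```

With only a handful of messages, the span between the lowest and highest sequence number is tiny, so one or two missing lines swing the percentage wildly; the sketch therefore reports None rather than a number. Counter resets (for example a restarted logger) are not handled.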

Comment 2 christian 2014-10-22 12:01:24 UTC

(In reply to christian from comment #0)
> We did not see alerts on the udp2log pipeline.

That's wrong.
There were alerts [1]:

  [11:54:29] <icinga-wm>         PROBLEM - Packetloss_Average on erbium is CRITICAL: packet_loss_average CRITICAL: 9.11388505882
  [12:02:12] <icinga-wm>         PROBLEM - Packetloss_Average on analytics1026 is CRITICAL: packet_loss_average CRITICAL: 23.0722363964
  [12:06:06] <icinga-wm>         RECOVERY - Packetloss_Average on erbium is OK: packet_loss_average OKAY: 0.0
  [12:21:25] <icinga-wm>         RECOVERY - Packetloss_Average on analytics1026 is OK: packet_loss_average OKAY: 2.49366398305
  [12:27:01] <icinga-wm>         RECOVERY - Packetloss_Average on oxygen is OK: packet_loss_average OKAY: 1.85878847458


[1] http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-operations/20141021.txt
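To turn alert lines like the ones above into per-host alert windows, a small parser can pair each PROBLEM with the next RECOVERY for the same host. This is a hypothetical sketch: alert_windows and its regular expression are assumptions based only on the line format shown in the excerpt, and the timestamps are taken verbatim from the IRC log (its timezone is not stated here).

```python
import re
from typing import Dict, List, Tuple

# Hypothetical pattern, derived only from the icinga-wm lines quoted above.
ALERT_RE = re.compile(
    r"\[(?P<time>\d{2}:\d{2}:\d{2})\]\s+<icinga-wm>\s+"
    r"(?P<kind>PROBLEM|RECOVERY) - Packetloss_Average on (?P<host>\S+) is "
)


def alert_windows(lines: List[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Pair each PROBLEM with the next RECOVERY for the same host.

    A RECOVERY whose PROBLEM is not in the excerpt gets "?" as its start.
    Hosts still in PROBLEM at the end of the excerpt are not reported.
    """
    open_since: Dict[str, str] = {}
    windows: Dict[str, List[Tuple[str, str]]] = {}
    for line in lines:
        m = ALERT_RE.search(line)
        if m is None:
            continue  # not a Packetloss_Average alert line
        host, t = m["host"], m["time"]
        if m["kind"] == "PROBLEM":
            # Keep the earliest PROBLEM if several arrive before a RECOVERY.
            open_since.setdefault(host, t)
        else:
            start = open_since.pop(host, "?")
            windows.setdefault(host, []).append((start, t))
    return windows


if __name__ == "__main__":
    log = [
        "[11:54:29] <icinga-wm> PROBLEM - Packetloss_Average on erbium is CRITICAL: packet_loss_average CRITICAL: 9.11388505882",
        "[12:02:12] <icinga-wm> PROBLEM - Packetloss_Average on analytics1026 is CRITICAL: packet_loss_average CRITICAL: 23.0722363964",
        "[12:06:06] <icinga-wm> RECOVERY - Packetloss_Average on erbium is OK: packet_loss_average OKAY: 0.0",
        "[12:21:25] <icinga-wm> RECOVERY - Packetloss_Average on analytics1026 is OK: packet_loss_average OKAY: 2.49366398305",
        "[12:27:01] <icinga-wm> RECOVERY - Packetloss_Average on oxygen is OK: packet_loss_average OKAY: 1.85878847458",
    ]
    for host, spans in alert_windows(log).items():
        print(host, spans)
```

On the five quoted lines this yields erbium from 11:54:29 to 12:06:06 and analytics1026 from 12:02:12 to 12:21:25; oxygen only has a RECOVERY, so its PROBLEM lies outside the excerpt.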
