Last modified: 2014-10-22 12:03:31 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T74306, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 72306 - "ulsfo <-> eqiad" network issue on 2014-10-20 affecting udp2log streams
Status: RESOLVED WONTFIX
Product: Analytics
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2014-10-21 13:13 UTC by christian
Modified: 2014-10-22 12:03 UTC
CC List: 5 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description christian 2014-10-21 13:13:54 UTC
Ops reported [1] a network issue between ulsfo and eqiad (2014-10-20 ~13:07).

We did not see alerts on the udp2log pipeline.
However, we saw alerts from the tighter monitoring of the kafka pipeline.

Did the issue affect the udp2log pipeline too?

[1] https://lists.wikimedia.org/mailman/private/ops/2014-October/042274.html
Comment 1 christian 2014-10-21 13:14:53 UTC
(In reply to christian from comment #0)
> However, we saw alerts from the tighter monitoring of the kafka pipeline.

For the kafka pipeline, the bug is 72296.
Comment 2 christian 2014-10-21 17:21:38 UTC
The udp2log pipeline seems to have been affected between
2014-10-20T13:06 and 2014-10-20T13:27.

For the hour that covers the affected period, the per-host hourly
packet loss of the ulsfo caches ranges from 6% to 47%.

  +--------------------+--------------+
  |                    |     Per hour |
  |                    |   packetloss |
  | Host               | (in percent) |
  +--------------------+--------------+
  | cp4005.ulsfo.wmnet |           46 |
  | cp4006.ulsfo.wmnet |           12 |
  | cp4007.ulsfo.wmnet |           47 |
  | cp4008.ulsfo.wmnet |           42 |
  | cp4009.ulsfo.wmnet |           38 |
  | cp4010.ulsfo.wmnet |            8 |
  | cp4011.ulsfo.wmnet |           36 |
  | cp4012.ulsfo.wmnet |           37 |
  | cp4013.ulsfo.wmnet |            6 |
  | cp4014.ulsfo.wmnet |           44 |
  | cp4015.ulsfo.wmnet |            7 |
  | cp4016.ulsfo.wmnet |           22 |
  | cp4017.ulsfo.wmnet |           40 |
  | cp4018.ulsfo.wmnet |           12 |
  | cp4019.ulsfo.wmnet |           45 |
  | cp4020.ulsfo.wmnet |            9 |
  +--------------------+--------------+

Caches outside ulsfo don't show such a drop/rise.
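The 6-47% range can be re-derived from the table above. The following Python sketch copies the per-host numbers verbatim; the 30% cutoff used to split the hosts is an arbitrary, hypothetical choice for illustration and not a threshold taken from the udp2log monitoring.

```python
# Per-hour packet loss (percent) of the ulsfo caches, copied from the table.
ULSFO_LOSS = {
    "cp4005.ulsfo.wmnet": 46,
    "cp4006.ulsfo.wmnet": 12,
    "cp4007.ulsfo.wmnet": 47,
    "cp4008.ulsfo.wmnet": 42,
    "cp4009.ulsfo.wmnet": 38,
    "cp4010.ulsfo.wmnet": 8,
    "cp4011.ulsfo.wmnet": 36,
    "cp4012.ulsfo.wmnet": 37,
    "cp4013.ulsfo.wmnet": 6,
    "cp4014.ulsfo.wmnet": 44,
    "cp4015.ulsfo.wmnet": 7,
    "cp4016.ulsfo.wmnet": 22,
    "cp4017.ulsfo.wmnet": 40,
    "cp4018.ulsfo.wmnet": 12,
    "cp4019.ulsfo.wmnet": 45,
    "cp4020.ulsfo.wmnet": 9,
}


def summarize(loss_by_host, threshold):
    """Return (min loss, max loss, hosts at or above threshold).

    The host list is sorted by loss, highest first. `threshold` is a
    hypothetical illustrative cutoff, not a real alerting threshold.
    """
    values = loss_by_host.values()
    heavy = sorted(
        (host for host, pct in loss_by_host.items() if pct >= threshold),
        key=lambda host: loss_by_host[host],
        reverse=True,
    )
    return min(values), max(values), heavy


low, high, heavy = summarize(ULSFO_LOSS, threshold=30)
print(f"range: {low}-{high}%")
print(f"{len(heavy)} of {len(ULSFO_LOSS)} hosts at >= 30%: {heavy}")
```

Run against the table as given, this reports a range of 6-47% and shows 9 of the 16 ulsfo caches at or above 30%, with the remaining 7 between 6% and 22%.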
Comment 3 christian 2014-10-22 12:03:31 UTC
(In reply to christian from comment #0)
> We did not see alerts on the udp2log pipeline.

That's wrong.
There were alerts [1]:

  [13:19:04] <icinga-wm>         PROBLEM - Packetloss_Average on erbium is CRITICAL: packet_loss_average CRITICAL: 13.2572885542
  [13:27:37] <icinga-wm>         PROBLEM - Packetloss_Average on oxygen is CRITICAL: packet_loss_average CRITICAL: 25.0862913793
  [13:29:40] <icinga-wm>         PROBLEM - Packetloss_Average on analytics1026 is CRITICAL: packet_loss_average CRITICAL: 14.6411538136
  [13:32:00] <icinga-wm>         RECOVERY - Packetloss_Average on erbium is OK: packet_loss_average OKAY: 2.36820388235
  [13:42:20] <icinga-wm>         RECOVERY - Packetloss_Average on analytics1026 is OK: packet_loss_average OKAY: 2.73679050847
  [13:46:30] <icinga-wm>         RECOVERY - Packetloss_Average on oxygen is OK: packet_loss_average OKAY: 1.89986423729


[1] http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-operations/20141020.txt
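To see how long each host stayed CRITICAL, the quoted icinga-wm lines can be paired up, PROBLEM to RECOVERY, per host. This is a minimal Python sketch; it assumes all six lines fall on the same day and takes the IRC timestamps as-is, without assuming anything about their timezone.

```python
import re
from datetime import datetime

# The six icinga-wm lines quoted above, copied verbatim.
LOG = """\
[13:19:04] <icinga-wm>         PROBLEM - Packetloss_Average on erbium is CRITICAL: packet_loss_average CRITICAL: 13.2572885542
[13:27:37] <icinga-wm>         PROBLEM - Packetloss_Average on oxygen is CRITICAL: packet_loss_average CRITICAL: 25.0862913793
[13:29:40] <icinga-wm>         PROBLEM - Packetloss_Average on analytics1026 is CRITICAL: packet_loss_average CRITICAL: 14.6411538136
[13:32:00] <icinga-wm>         RECOVERY - Packetloss_Average on erbium is OK: packet_loss_average OKAY: 2.36820388235
[13:42:20] <icinga-wm>         RECOVERY - Packetloss_Average on analytics1026 is OK: packet_loss_average OKAY: 2.73679050847
[13:46:30] <icinga-wm>         RECOVERY - Packetloss_Average on oxygen is OK: packet_loss_average OKAY: 1.89986423729
"""

LINE_RE = re.compile(
    r"\[(?P<time>\d{2}:\d{2}:\d{2})\]\s+<icinga-wm>\s+"
    r"(?P<kind>PROBLEM|RECOVERY) - Packetloss_Average on (?P<host>\S+) is"
)


def critical_windows(log):
    """Map host -> (start, end, duration) of its CRITICAL period.

    Assumption: every line is from the same day, so times of day can be
    subtracted directly.
    """
    opened = {}
    windows = {}
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        t = datetime.strptime(m["time"], "%H:%M:%S")
        host = m["host"]
        if m["kind"] == "PROBLEM":
            opened[host] = t
        elif host in opened:
            start = opened.pop(host)
            windows[host] = (start.time(), t.time(), t - start)
    return windows


for host, (start, end, duration) in critical_windows(LOG).items():
    print(f"{host}: CRITICAL {start} -> {end} ({duration})")
```

On these six lines it reports CRITICAL periods of 12:56 for erbium, 12:40 for analytics1026 and 18:53 for oxygen.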
