Last modified: 2014-10-30 18:54:28 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T74548, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 72548 - Raw webrequest bits partition for 2014-10-26T21/1H not marked successful


Summary:	Raw webrequest bits partition for 2014-10-26T21/1H not marked successful

Status:	RESOLVED FIXED

Product:	Analytics
Classification:	Unclassified
Component:	Refinery (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Unprioritized normal
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	72298

Reported:	2014-10-27 07:53 UTC by christian
Modified:	2014-10-30 18:54 UTC (History)
CC List:	7 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Description christian 2014-10-27 07:53:12 UTC

The bits webrequest partition [1] for 2014-10-26T21/1H has not been marked
successful.

What happened?


[1]
_________________________________________________________________
qchris@stat1002 // jobs: 0 // time: 07:51:53 // exit code: 0
cwd: ~
~/cluster-scripts/dump_webrequest_status.sh 
  +------------------+--------+--------+--------+--------+
  | Date             |  bits  | mobile |  text  | upload |
  +------------------+--------+--------+--------+--------+
[...]
  | 2014-10-26T19/1H |    .   |    .   |    .   |    .   |
  | 2014-10-26T20/1H |    .   |    .   |    .   |    .   |
  | 2014-10-26T21/1H |    X   |    .   |    .   |    .   |
  | 2014-10-26T22/1H |    .   |    .   |    .   |    .   |
  | 2014-10-26T23/1H |    .   |    .   |    .   |    .   |
[...]
  +------------------+--------+--------+--------+--------+


Statuses:
  . --> Partition is ok
  M --> Partition manually marked ok
  X --> Partition is not ok (duplicates, missing, or nulls)
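For reference, a minimal sketch of how the status table above could be scanned for partitions that are not ok. It only parses the text printed by dump_webrequest_status.sh as shown in this report; the function name failed_partitions is hypothetical and not part of the cluster scripts.

```python
# Status table as printed in this report (elided rows left out).
TABLE = """\
  +------------------+--------+--------+--------+--------+
  | Date             |  bits  | mobile |  text  | upload |
  +------------------+--------+--------+--------+--------+
  | 2014-10-26T19/1H |    .   |    .   |    .   |    .   |
  | 2014-10-26T20/1H |    .   |    .   |    .   |    .   |
  | 2014-10-26T21/1H |    X   |    .   |    .   |    .   |
  | 2014-10-26T22/1H |    .   |    .   |    .   |    .   |
  | 2014-10-26T23/1H |    .   |    .   |    .   |    .   |
  +------------------+--------+--------+--------+--------+
"""


def failed_partitions(table):
    """Return (date, source) pairs whose status is 'X' (hypothetical helper)."""
    header = None
    failed = []
    for line in table.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and elisions such as [...]
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first data row holds the column names
            continue
        date, statuses = cells[0], cells[1:]
        for source, status in zip(header[1:], statuses):
            if status == "X":
                failed.append((date, source))
    return failed


print(failed_partitions(TABLE))
```

With the rows shown here, the only pair reported is the bits partition for 2014-10-26T21/1H.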

Comment 1 christian 2014-10-27 08:22:12 UTC

Only cp3019 is affected. For that host, ~55 seconds' worth of data got lost
in the ~1 minute between 2014-10-26T21:16:22 and 2014-10-26T21:17:24.
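As a quick sanity check on the window size, the two timestamps above can be subtracted directly. This is just arithmetic on the values quoted in this comment; the variable names are illustrative.

```python
from datetime import datetime

# Boundaries of the affected window, as quoted in this comment.
start = datetime.fromisoformat("2014-10-26T21:16:22")
end = datetime.fromisoformat("2014-10-26T21:17:24")

# Length of the window in seconds.
window_seconds = (end - start).total_seconds()
print(window_seconds)
```

The window comes out at 62 seconds, so a loss of ~55 seconds covers most of it.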

I could not find any changes in puppet, DNS, or SAL that look relevant.

cp3019 (like all other esams caches) is gone from ganglia, so it's hard
for non-Ops to see further data from cp3019 itself.

Icinga shows the “Varnishkafka Delivery Errors” service having status
WARNING since 2014-10-24 17:11:57 (but the same holds true for the
other esams caches too).

Comment 2 christian 2014-10-27 08:31:45 UTC

Kafka logs did not show peculiar entries in the relevant period of time.

Comment 3 christian 2014-10-28 22:26:12 UTC

ganglia again shows data for esams caches, but the data between
~2014-10-24T12 and ~2014-10-27T16 is missing (which contains the
minute where we had cp3019 issues).
Judging from the cumulative counters, neither varnish nor varnishkafka
got restarted on cp3019.

ottomata ... since I cannot find any explanation, does cp3019
or 2014-10-26T21:16 ring a bell for you?

Was there some other migration/testing/network issue that I am missing?

Comment 4 christian 2014-10-30 18:54:28 UTC

ottomata had a look at the logs on cp3019 and said that there were
produce errors about full buffers.
So we're writing it off as temporary network issues for now.
