Last modified: 2014-10-31 12:54:58 UTC
The bits and upload webrequest partitions [1] for 2014-10-30T21/1H have not been marked successful. What happened?

[1]
_________________________________________________________________
qchris@stat1002 // jobs: 0 // time: 08:12:06 // exit code: 130
cwd: ~
~/cluster-scripts/dump_webrequest_status.sh
+------------------+--------+--------+--------+--------+
| Date             | bits   | mobile | text   | upload |
+------------------+--------+--------+--------+--------+
[...]
| 2014-10-30T19/1H | .      | .      | .      | .      |
| 2014-10-30T20/1H | .      | .      | .      | .      |
| 2014-10-30T21/1H | X      | .      | .      | X      |
| 2014-10-30T22/1H | .      | .      | .      | .      |
| 2014-10-30T23/1H | .      | .      | .      | .      |
[...]
+------------------+--------+--------+--------+--------+
Statuses:
    . --> Partition is ok
    M --> Partition manually marked ok
    X --> Partition is not ok (duplicates, missing, or nulls)
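For reference, duplicates of this kind can be located per cache host by counting repeated sequence numbers in the raw data. A minimal sketch, assuming the raw import lives in a table like wmf_raw.webrequest with hostname and sequence fields and webrequest_source/year/month/day/hour partitions (the exact table and field names are an assumption here):

# Count duplicate sequence numbers per cache host for the flagged hour.
# Table/field names are assumed, not confirmed in this ticket.
hive -e "
  SELECT hostname,
         COUNT(*)                            AS messages,
         COUNT(DISTINCT sequence)            AS distinct_sequences,
         COUNT(*) - COUNT(DISTINCT sequence) AS duplicates
  FROM   wmf_raw.webrequest
  WHERE  webrequest_source = 'bits'
    AND  year = 2014 AND month = 10 AND day = 30 AND hour = 21
  GROUP  BY hostname
  HAVING COUNT(*) > COUNT(DISTINCT sequence);
"

Running the same query with webrequest_source = 'upload' covers the second flagged partition.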
For bits, it only affected cp3020. The affected period is 2014-10-30T21:25:41/2014-10-30T21:26:26. No lost messages, only 70660 duplicates, which is <2 seconds worth of data for bits.

For upload, it only affected cp3018. The affected period is 2014-10-30T21:25:18/2014-10-30T21:26:10. No lost messages, only 34087 duplicates, which is <2 seconds worth of data for upload.

(How such per-host windows can be derived is sketched at the end of this comment.)

I could not find anything relevant in puppet, nor in SAL. It's again only esams.

According to ganglia, the Max of kafka.rdkafka.brokers.*.rtt.avg went up during that time on
* cp3018 to 6.0M for analytics1018
* cp3020 to 12.6M for analytics1018
But other caches had even higher Max values for that average (
* cp3019 had 36.7M for analytics1021
* cp3010 had 11.8M for analytics1021
* cp3010 had 8.8M for analytics1022
), yet did not show duplicates.

According to ganglia, the Max of kafka.rdkafka.brokers.*.outbuf_cnt went up during that time on
* cp3018 to 334.9 for analytics1022 (not analytics1018! It had a Max of only 28.4 for analytics1018)
* cp3020 to 720.8 for analytics1018
But cp3019 had 479 for analytics1021 (i.e. a similar Max value), yet did not show duplicates.
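The affected periods and duplicate counts above come from the timestamps of the duplicated sequence numbers per host. A sketch, again against the assumed wmf_raw.webrequest schema (dt taken to be the request timestamp field):

# Affected window and number of extra copies per host.
# Inner query groups by (hostname, sequence); any group with cnt > 1
# is a duplicated message, and its timestamps bound the window.
hive -e "
  SELECT hostname,
         MIN(first_dt)       AS window_start,
         MAX(last_dt)        AS window_end,
         SUM(cnt) - COUNT(*) AS extra_copies
  FROM (
    SELECT hostname, sequence,
           MIN(dt) AS first_dt, MAX(dt) AS last_dt, COUNT(*) AS cnt
    FROM   wmf_raw.webrequest
    WHERE  webrequest_source = 'upload'
      AND  year = 2014 AND month = 10 AND day = 30 AND hour = 21
    GROUP  BY hostname, sequence
  ) seqs
  WHERE  cnt > 1
  GROUP  BY hostname;
"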