Last modified: 2014-10-31 12:54:58 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and beyond displaying bug reports and their history, links may be broken. See T74810, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 72810 - Raw webrequest partitions for 2014-10-30T21/1H not marked successful
Status: NEW
Product: Analytics
Classification: Unclassified
Component: Refinery (Other open bugs)
Version: unspecified
Hardware: All
OS: All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: 72809

Reported: 2014-10-31 12:54 UTC by christian
Modified: 2014-10-31 12:54 UTC (History)
CC List: 7 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description christian 2014-10-31 12:54:37 UTC
The bits and upload webrequest partitions [1] for 2014-10-30T21/1H have
not been marked successful.

What happened?


[1]
_________________________________________________________________
qchris@stat1002 // jobs: 0 // time: 08:12:06 // exit code: 130
cwd: ~
~/cluster-scripts/dump_webrequest_status.sh
  +------------------+--------+--------+--------+--------+
  | Date             |  bits  | mobile |  text  | upload |
  +------------------+--------+--------+--------+--------+
[...]
  | 2014-10-30T19/1H |    .   |    .   |    .   |    .   |
  | 2014-10-30T20/1H |    .   |    .   |    .   |    .   |
  | 2014-10-30T21/1H |    X   |    .   |    .   |    X   |
  | 2014-10-30T22/1H |    .   |    .   |    .   |    .   |
  | 2014-10-30T23/1H |    .   |    .   |    .   |    .   |
[...]
  +------------------+--------+--------+--------+--------+


Statuses:

  . --> Partition is ok
  M --> Partition manually marked ok
  X --> Partition is not ok (duplicates, missing, or nulls)
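
A minimal sketch (Python, not the actual refinery tooling) of scanning that
table for partitions that are not marked ok; it assumes exactly the layout
printed above and simply shells out to the script referenced in [1]:

import subprocess

STATUS_SCRIPT = "~/cluster-scripts/dump_webrequest_status.sh"  # path from [1]

def failed_partitions(table_text):
    """Yield (hour, source, status) for every cell that is neither '.' nor 'M'."""
    sources = []
    for line in table_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("|"):
            continue                      # skip the +---+ separator lines
        cells = [c.strip() for c in stripped.strip("|").split("|")]
        if cells[0] == "Date":
            sources = cells[1:]           # e.g. ['bits', 'mobile', 'text', 'upload']
            continue
        hour, statuses = cells[0], cells[1:]
        for source, status in zip(sources, statuses):
            if status not in (".", "M"):
                yield hour, source, status

if __name__ == "__main__":
    output = subprocess.run(["bash", "-c", STATUS_SCRIPT],
                            capture_output=True, text=True).stdout
    for hour, source, status in failed_partitions(output):
        print(hour, source, status)

For the table above, this would print the two failed cells:
2014-10-30T21/1H bits X
2014-10-30T21/1H upload X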
Comment 1 christian 2014-10-31 12:54:58 UTC
For bits, it only affected cp3020.
The affected period is 2014-10-30T21:25:41/2014-10-30T21:26:26.
No lost messages, only 70660 duplicates, which is <2 seconds' worth of
data for bits.

For upload, it only affected cp3018.
The affected period is 2014-10-30T21:25:18/2014-10-30T21:26:10.
No lost messages, only 34087 duplicates, which is <2 seconds' worth of
data for upload.
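
A back-of-envelope check of the "<2 seconds' worth" figures; the per-cluster
request rates below are illustrative assumptions, not measured values, while
the duplicate counts are the ones reported above:

# hypothetical request rates, chosen only to illustrate the arithmetic
assumed_requests_per_second = {"bits": 40000, "upload": 20000}
duplicates = {"bits": 70660, "upload": 34087}   # counts from this comment

for source, dup_count in duplicates.items():
    seconds = dup_count / assumed_requests_per_second[source]
    print(f"{source}: {dup_count} duplicates ~ {seconds:.1f}s of data")
# bits: 70660 duplicates ~ 1.8s of data
# upload: 34087 duplicates ~ 1.7s of data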

I could not find anything relevant in puppet, nor in SAL.

It's again only esams.

According to ganglia, kafka.rdkafka.brokers.*.rtt.avg's Max went up
during that time on
* cp3018 to 6.0M for analytics1018
* cp3020 to 12.6M for analytics1018

But other caches had even higher Max values for that average (
  cp3019 had 36.7M for analytics1021
  cp3010 had 11.8M for analytics1021
  cp3010 had  8.8M for analytics1022
), yet did not show duplicates.

According to ganglia, kafka.rdkafka.brokers.*.outbuf_cnt's Max went up
during that time on
* cp3018 to 334.9 for analytics1022 (not analytics1018! It had 28.4 max for analytics1018)
* cp3020 to 720.8 for analytics1018

But cp3019 had 479 for analytics1021 (i.e. a similar Max value) and did
not show duplicates.
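
A small sketch of that comparison, using only the rtt.avg maxima quoted in
this comment ("M" read as millions): the caches that produced duplicates are
not the ones with the highest values, so this metric alone does not explain
the duplicates.

# kafka.rdkafka.brokers.*.rtt.avg maxima per (cache, broker), from this comment
rtt_avg_max = {
    ("cp3018", "analytics1018"): 6.0e6,
    ("cp3020", "analytics1018"): 12.6e6,
    ("cp3019", "analytics1021"): 36.7e6,
    ("cp3010", "analytics1021"): 11.8e6,
    ("cp3010", "analytics1022"): 8.8e6,
}
caches_with_duplicates = {"cp3018", "cp3020"}

for (cache, broker), value in sorted(rtt_avg_max.items(), key=lambda kv: -kv[1]):
    marker = "duplicates" if cache in caches_with_duplicates else "no duplicates"
    print(f"{cache} -> {broker}: {value / 1e6:.1f}M ({marker})")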
