Last modified: 2014-10-29 17:08:55 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T74252, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 72252 - Raw webrequest partitions for 2014-10-20T02:xx:xx not marked successful
Status: RESOLVED WONTFIX
Product: Analytics
Classification: Unclassified
Component: Refinery
Version: unspecified
Hardware: All, OS: All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on: 72295
Blocks: 69667
Reported: 2014-10-20 10:31 UTC by christian
Modified: 2014-10-29 17:08 UTC (History)
CC List: 7 users




Description christian 2014-10-20 10:31:45 UTC
For the hour 2014-10-20T02:xx:xx, none [1] of the four sources'
buckets was marked successful.

What happened?




[1]
_________________________________________________________________
qchris@stat1002 // jobs: 0 // time: 10:29:22 // exit code: 0
cwd: ~
~/cluster-scripts/dump_webrequest_status.sh 
  +---------------------+--------+--------+--------+--------+
  | Date                |  bits  | mobile |  text  | upload |
  +---------------------+--------+--------+--------+--------+
[...]
  | 2014-10-20T00:xx:xx |    .   |    .   |    .   |    .   |    
  | 2014-10-20T01:xx:xx |    .   |    .   |    .   |    .   |    
  | 2014-10-20T02:xx:xx |    X   |    X   |    X   |    X   |    
  | 2014-10-20T03:xx:xx |    .   |    .   |    .   |    .   |    
  | 2014-10-20T04:xx:xx |    .   |    .   |    .   |    .   |    
[...]
  +---------------------+--------+--------+--------+--------+


Statuses:

  . --> Partition is ok
  M --> Partition manually marked ok
  X --> Partition is not ok (duplicates, missing, or nulls)

pass /home/qchris/cluster-scripts/dump_webrequest_status.sh
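The legend above only names the failure classes (duplicates, missing, nulls). A minimal sketch of how such an hourly check could work, assuming (this report does not say so) that each cache host stamps its requests with an increasing sequence number and that a partition is flagged as soon as any host shows missing, duplicate, or null sequence numbers. The functions host_stats and partition_status and the host names are hypothetical; the manual "M" status is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class HostStats:
    hostname: str
    count_actual: int         # rows received, including duplicates and nulls
    count_missing: int        # gaps between min and max sequence number
    count_duplicate: int      # rows repeating an already-seen sequence number
    count_null_sequence: int  # rows without a sequence number

    @property
    def ok(self) -> bool:
        # Mirrors the "X" legend: any missing, duplicate, or null row fails.
        return (self.count_missing == 0
                and self.count_duplicate == 0
                and self.count_null_sequence == 0)


def host_stats(hostname: str, sequences: Iterable[Optional[int]]) -> HostStats:
    seen = set()
    actual = duplicates = nulls = 0
    for seq in sequences:
        actual += 1
        if seq is None:
            nulls += 1
        elif seq in seen:
            duplicates += 1
        else:
            seen.add(seq)
    # Expect one row per sequence number between the smallest and largest seen.
    expected = (max(seen) - min(seen) + 1) if seen else 0
    return HostStats(hostname, actual, expected - len(seen), duplicates, nulls)


def partition_status(rows_by_host: Dict[str, List[Optional[int]]]) -> str:
    """Return "." if every host in the hourly partition is ok, else "X"."""
    stats = [host_stats(host, seqs) for host, seqs in rows_by_host.items()]
    return "." if all(s.ok for s in stats) else "X"


print(partition_status({"cp-a": [1, 2, 3, 4], "cp-b": [10, 11, 13]}))  # X
```

Here host "cp-b" lacks sequence number 12, so the whole hour is marked "X" even though "cp-a" is complete, which matches how a single short outage can flag every source for one hour.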
Comment 1 christian 2014-10-20 10:39:29 UTC
It seems that somewhere between 2014-10-20T02:05:00 and
2014-10-20T02:12:00 analytics1021 again got kicked out of its
partition leader role.

I have now run leader elections, so analytics1021 is ready to help
with esams bits this evening.
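For context on leader elections: Kafka treats the first broker in a partition's assigned replica list as that partition's preferred leader. When a broker such as analytics1021 drops out, leadership moves to another in-sync replica and, unless automatic leader rebalancing is enabled, stays there until a preferred replica election is run. The sketch below models only that decision; plan_preferred_election is a hypothetical helper, the topic name is made up, and nothing here talks to a real cluster.

```python
from typing import Dict, List, Tuple

# (topic, partition id); topic names here are made up for illustration.
Partition = Tuple[str, int]


def plan_preferred_election(
    assignment: Dict[Partition, List[str]],
    leaders: Dict[Partition, str],
    isr: Dict[Partition, List[str]],
) -> Dict[Partition, str]:
    """Map each partition whose leader would change to its preferred replica."""
    moves: Dict[Partition, str] = {}
    for part, replicas in assignment.items():
        preferred = replicas[0]  # Kafka's preferred leader: first assigned replica
        # Leadership can only move to a replica that is back in sync.
        if leaders.get(part) != preferred and preferred in isr.get(part, []):
            moves[part] = preferred
    return moves


assignment = {
    ("webrequest_example", 0): ["analytics1021", "analytics1022"],
    ("webrequest_example", 1): ["analytics1022", "analytics1021"],
}
# After analytics1021 dropped out, analytics1022 leads both partitions.
leaders = {part: "analytics1022" for part in assignment}
isr = {part: ["analytics1022", "analytics1021"] for part in assignment}
print(plan_preferred_election(assignment, leaders, isr))
# {('webrequest_example', 0): 'analytics1021'}
```

If analytics1021 were not yet back in the in-sync replica list, the plan would be empty: an election cannot hand leadership to a replica that is behind.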
Comment 2 christian 2014-10-20 11:01:40 UTC
From the logs, between 2014-10-20T02:05:08 and 2014-10-20T02:05:16,
less than 2 seconds' worth of data got lost.

It's noteworthy that we again did not see loss for the hosts that we
tuned the ACKs for. So I think we should move forward to roll out the
ACK experiment to more hosts, so we can get rid of issues when
analytics1021 drops out of its leader role again.
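Why tuned ACKs would avoid this loss can be illustrated with a toy failover model. It assumes, since this comment does not spell it out, that the experiment raised the producer's required acknowledgements from the leader alone (request.required.acks=1) to all in-sync replicas (request.required.acks=-1, written "all" below), and that unacknowledged messages are retried. lost_after_failover is a hypothetical function that counts only messages the producer considers delivered but the new leader never received.

```python
def lost_after_failover(acks: str, written: int, replicated: int) -> int:
    """Count messages acknowledged to the producer but missing on the new leader.

    Toy model: the old leader wrote `written` messages, its in-sync followers
    had copied the first `replicated` of them when it lost leadership, and a
    follower then became leader holding exactly those `replicated` messages.
    """
    if not 0 <= replicated <= written:
        raise ValueError("need 0 <= replicated <= written")
    if acks == "1":
        # The leader acknowledged every message as soon as it wrote it locally.
        acknowledged = written
    elif acks == "all":
        # Only fully replicated messages were acknowledged; the producer
        # retries the rest against the new leader, so they are not lost.
        acknowledged = replicated
    else:
        raise ValueError("acks must be '1' or 'all'")
    return acknowledged - replicated


print(lost_after_failover("1", written=100, replicated=90))    # 10
print(lost_after_failover("all", written=100, replicated=90))  # 0
```

Under this model, any replication lag at the moment the leader drops out turns directly into lost messages with acks=1, while acks=all loses nothing; the price of the stronger setting is higher produce latency.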
Comment 3 christian 2014-10-20 11:48:20 UTC
(In reply to christian from comment #2)
> So I think we should move forward to roll out the
> ACK experiment to more hosts, so we can get rid of issues when
> analytics1021 drops out of its leader role again.

Patches to roll out the ACK experiment have been uploaded to Gerrit

  https://gerrit.wikimedia.org/r/#/q/status:open+project:operations/puppet+branch:production+topic:kafka-acks,n,z

(for the parts not yet merged) and have been linked to big 69667.
Comment 4 christian 2014-10-20 11:50:48 UTC
s/big 69667/bug 69667/
