Last modified: 2014-10-29 17:11:43 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T71854, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 69854 - Raw webrequest partition monitoring did not flag data for 2014-08-18T13:..:.. as valid for text caches
Status: RESOLVED WONTFIX
Product: Analytics
Classification: Unclassified
Component: Refinery
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: 69666
Reported: 2014-08-21 15:52 UTC by christian
Modified: 2014-10-29 17:11 UTC (History)
CC List: 7 users

Attachments
kafka-requests-per-second-2014-08-17--2014-08-19 (23.14 KB, image/png)
2014-08-21 15:52 UTC, christian

Description christian 2014-08-21 15:52:11 UTC
The imported raw webrequests data from text caches for
2014-08-18T13:..:.. at

  hdfs://analytics-hadoop/wmf/data/raw/webrequest/webrequest_text/hourly/2014/08/18/13

was not marked as ok.

Is that valid?
What happened?
Comment 1 christian 2014-08-21 15:52:50 UTC
Created attachment 16256
kafka-requests-per-second-2014-08-17--2014-08-19
Comment 2 christian 2014-08-21 15:54:15 UTC
Monitoring worked as expected, as the data is missing sequence numbers:

  +-----------------------------+-----------+---------------------+---------------------+
  | Hostname                    | # missing | Start time          | End time            |
  +-----------------------------+-----------+---------------------+---------------------+
  | amssq37.esams.wmnet         |       155 | 2014-08-18T13:29:37 | 2014-08-18T13:29:39 |
  | amssq47.esams.wmnet         |       125 | 2014-08-18T13:29:37 | 2014-08-18T13:29:39 |
  | amssq48.esams.wikimedia.org |       149 | 2014-08-18T13:29:37 | 2014-08-18T13:29:39 |
  | amssq59.esams.wikimedia.org |        74 | 2014-08-18T13:29:37 | 2014-08-18T13:29:39 |
  | cp1052.eqiad.wmnet          |        96 | 2014-08-18T13:29:38 | 2014-08-18T13:29:39 |
  | cp4008.ulsfo.wmnet          |       173 | 2014-08-18T13:29:37 | 2014-08-18T13:29:38 |
  +-----------------------------+-----------+---------------------+---------------------+
  | Total                       |       772 | 2014-08-18T13:29:37 | 2014-08-18T13:29:39 |
  +-----------------------------+-----------+---------------------+---------------------+

Those hosts are all text caches, but they are not limited to a single datacenter.
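
For illustration only, a per-host count of missing sequence numbers like the one in the table could be computed roughly as follows. This is a minimal sketch, not the actual monitoring job: it assumes each cache host stamps its requests with consecutive sequence numbers, and the function name, the record layout, and the sample hosts and numbers are all hypothetical.

```python
from collections import defaultdict


def missing_per_host(records):
    """Count missing sequence numbers per host.

    `records` is an iterable of (hostname, sequence_number) pairs.
    Assumption: sequence numbers are consecutive per host, so every
    gap between the lowest and highest number seen is a lost record.
    """
    seen = defaultdict(set)
    for host, seq in records:
        seen[host].add(seq)

    missing = {}
    for host, seqs in seen.items():
        # Number of records we would expect if nothing was lost.
        expected = max(seqs) - min(seqs) + 1
        missing[host] = expected - len(seqs)
    return missing


# Made-up sample data: host-a lost sequence numbers 102 and 103.
records = [("host-a", s) for s in (100, 101, 104, 105)]
records += [("host-b", s) for s in (7, 8, 9)]

print(missing_per_host(records))
```

A count like this only sees gaps inside the examined window; records lost at the very start or end of an hour would need the neighbouring partitions to detect.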

The affected timespan matches a leader re-election.
See attachment kafka-requests-per-second-2014-08-17--2014-08-19.

There goes kafka's "at least once" guarantee :-D
Comment 3 Andrew Otto 2014-08-21 16:40:55 UTC
Ha, ah yes, ok, if this corresponds with an election, then this makes sense.  The producers themselves see errors during the time it takes for the partition leadership to change.  This shouldn't happen, and is something I need to look into for sure.
