Last modified: 2014-07-29 20:04:13 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T70819, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 68819 - Duplicates in Hadoop cluster's webrequest data around 2014-07-29 ~01:40
Status: RESOLVED WONTFIX
Product: Analytics
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
u=Cluster c=General/Unknown p=0 s=201...
Depends on:
Blocks:
Reported: 2014-07-29 20:03 UTC by christian
Modified: 2014-07-29 20:04 UTC (History)
CC List: 5 users

Description christian 2014-07-29 20:03:05 UTC
The duplicate monitoring for camus-imported data flagged a few datasets
as containing duplicates.

All of the flagged datasets were from esams, and all were for
2014-07-29 01:00.
Comment 1 christian 2014-07-29 20:04:13 UTC
It turned out that there was some link flapping around esams [1],
which matches the timestamp of the dupes. This timestamp also matches
message delivery errors in the kafka logs [2].

That explains where the dupes came from.
Since deduping is not in place, closing as WONTFIX.

----------------------

The dupes were real, and not monitoring hiccups.

The duplicated records agree not only in sequence number + host, but
also in all other fields (urls, dt, referer, ...). So the dupes look
like plain dupes and not like mangled packets.
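
The check described above can be illustrated with a minimal sketch. It
is not the monitoring job that actually ran; the field names (hostname,
sequence, dt, uri, referer), the host names, and the function
classify_duplicates are all assumptions made for illustration. It groups
records by (hostname, sequence) and reports whether each repeated key is
a plain dupe (every field agrees) or a mangled one (some field differs).

```python
from collections import defaultdict


def classify_duplicates(records):
    """Classify repeated (hostname, sequence) keys.

    Returns a dict mapping each key that occurs more than once to
    "plain" if all records sharing the key agree in every field,
    or "mangled" if at least one field differs.
    Field names are hypothetical, not taken from the real schema.
    """
    groups = defaultdict(list)
    for record in records:
        groups[(record["hostname"], record["sequence"])].append(record)

    result = {}
    for key, group in groups.items():
        if len(group) < 2:
            continue  # key is unique, so not a dupe
        first = group[0]
        all_equal = all(other == first for other in group[1:])
        result[key] = "plain" if all_equal else "mangled"
    return result


# Hypothetical sample data; hosts and values are made up.
sample = [
    {"hostname": "host-a", "sequence": 42, "dt": "2014-07-29T01:40:00",
     "uri": "/wiki/Foo", "referer": "-"},
    {"hostname": "host-a", "sequence": 42, "dt": "2014-07-29T01:40:00",
     "uri": "/wiki/Foo", "referer": "-"},
    {"hostname": "host-b", "sequence": 7, "dt": "2014-07-29T01:41:00",
     "uri": "/wiki/Bar", "referer": "-"},
    {"hostname": "host-b", "sequence": 7, "dt": "2014-07-29T01:41:00",
     "uri": "/wiki/Baz", "referer": "-"},
    {"hostname": "host-b", "sequence": 8, "dt": "2014-07-29T01:41:01",
     "uri": "/wiki/Qux", "referer": "-"},
]

classification = classify_duplicates(sample)
```

In this sketch, a result consisting only of "plain" entries would
correspond to the situation reported here: the same message delivered
more than once, rather than corrupted in transit.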



[1] See
  http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-operations/20140729.txt
  between [01:36:07] and [02:02:19].

[2] See
  http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-analytics/20140729.txt
  between [19:43:45] and [19:43:47].


