Last modified: 2014-07-29 20:04:13 UTC
Duplicate monitoring of camus imported data flagged a few datasets as duplicates. The datasets were all esams, and all for 2014-07-29 01:00.
It turned out, that there was some link flapping around esams [1], which matches the timestamp for the dupes. This timestamp also matches message delivery errors in the kafka logs [2]. Hence it explains where the dupes are coming from. Since deduping is not in place, closing as WONTFIX. ---------------------- The dupes were real, and not monitoring hiccups. The sequence number dupes not only agree in the sequence number + host, but all other fields (urls, dt, referer, ...) also agree. So the dupes look like plain dupes and not like mangled packets. [1] See http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-operations/20140729.txt between [01:36:07] [02:02:19] [2] See http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-analytics/20140729.txt between [19:43:45] and [19:43:47].