Last modified: 2014-09-19 20:09:26 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T66181, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 64181 - Kafkatee zero files having 10% less requests than udp2log zero files
Kafkatee zero files having 10% less requests than udp2log zero files
Status: NEW
Product: Analytics
Classification: Unclassified
General/Unknown (Other open bugs)
unspecified
All All
: Normal normal
: ---
Assigned To: Nobody - You can work on this!
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2014-04-21 12:12 UTC by christian
Modified: 2014-09-19 20:09 UTC (History)
6 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description christian 2014-04-21 12:12:46 UTC
While the kafkatee zero files and udp2log zero files previously
more or less agreed in number of lines, the kafkatee zero files
currently have ~10% less lines than udp2log zero files. As the
files start to run appart for the 2014-03-27 file [1], it seems
to have started around 2014-03-26.

Is this difference expected?



P.S.: Running

  zgrep '^cp3014.*http://en\.m\.wikipedia\.org/wiki/Mamunur_Rashid' /a/{squid/archive,log/webrequest}/zero/zero.tsv.log-20140419.gz

on stat1002 will give you a harmless looking log line that's only in the udp2log stream, and not in the kafka stream.




[1]
___________________________________________________________
qchris@stat1002 // 0 // 11:44:00
cwd: ~
for i in /a/squid/archive/zero/zero.tsv.log-20140[34]*.gz ; do echo "$i: $(zcat $i | wc -l)" ; done
[...]
/a/squid/archive/zero/zero.tsv.log-20140324.gz: 4092113
/a/squid/archive/zero/zero.tsv.log-20140325.gz: 3953345
/a/squid/archive/zero/zero.tsv.log-20140326.gz: 3944141
/a/squid/archive/zero/zero.tsv.log-20140327.gz: 4145574
/a/squid/archive/zero/zero.tsv.log-20140328.gz: 6191020
/a/squid/archive/zero/zero.tsv.log-20140329.gz: 8338602
/a/squid/archive/zero/zero.tsv.log-20140330.gz: 10344328
/a/squid/archive/zero/zero.tsv.log-20140331.gz: 11867766
/a/squid/archive/zero/zero.tsv.log-20140401.gz: 12191221
/a/squid/archive/zero/zero.tsv.log-20140402.gz: 13204630
/a/squid/archive/zero/zero.tsv.log-20140403.gz: 13793041
[...]

___________________________________________________________
qchris@stat1002 // 0 // 11:44:15
cwd: ~
for i in /a/log/webrequest/zero/zero.tsv.log-20140[34]*.gz ; do echo "$i: $(zcat $i | wc -l)" ; done
[...]
/a/log/webrequest/zero/zero.tsv.log-20140324.gz: 4087376
/a/log/webrequest/zero/zero.tsv.log-20140325.gz: 3946964
/a/log/webrequest/zero/zero.tsv.log-20140326.gz: 3939808
/a/log/webrequest/zero/zero.tsv.log-20140327.gz: 3918716
/a/log/webrequest/zero/zero.tsv.log-20140328.gz: 5545742
/a/log/webrequest/zero/zero.tsv.log-20140329.gz: 7500078
/a/log/webrequest/zero/zero.tsv.log-20140330.gz: 9303477
/a/log/webrequest/zero/zero.tsv.log-20140331.gz: 10671821
/a/log/webrequest/zero/zero.tsv.log-20140401.gz: 10971750
/a/log/webrequest/zero/zero.tsv.log-20140402.gz: 11885403
/a/log/webrequest/zero/zero.tsv.log-20140403.gz: 12417002
[...]
Comment 1 Bingle 2014-04-21 12:15:29 UTC
Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/cards/1560
Comment 2 Andrew Otto 2014-04-21 16:09:19 UTC
I found a line that was in udp2log output, but not in kafkatee output:

  cp1046.eqiad.wmnet2014-04-19T07:00:350.000164270X.X.X.Xhit/20017375GEThttp://en.m.wikipedia.org/wiki/Mobile_application_software-text/html; charset=UTF-8--XXXXenzero=410-01

(Note that the sequence number has been removed from this line.)

I checked the webrequest_mobile Hive table in hdfs for this line, and it does exist.  That means that the kafkatee missing logs were in the kafka log buffer, as they were consumed properly by Camus into HDFS.

Still sleuthing...
Comment 3 Andrew Otto 2014-04-21 16:22:01 UTC
Also of interest, the kafkatee process on analytics1003 has only been running since March 26th.
Comment 4 Toby Negrin 2014-04-21 18:30:28 UTC
Let's prioritize this -- the kafka-tee stuff is pretty critical to get right.
Comment 5 christian 2014-04-22 13:52:06 UTC
Since it was constantly missing 10% and we have 10 partitions, the
assumption is that some partition did not get properly fed into
varnishkafka.

Ottomata restarted varnishkafka, and the problem disappeared:
(...-$DATE.gz is filename,
  2014-04-20T$HOUR:xx:xx is the extracted hour, and
  difference is NUMBER_OF_LINES_IN_KAFKATEE_FILE * 100 / NUMBER_OF_LINES_IN_ UDP2LOG_FILE - 100
    (we want difference to be 0)
)


...-20140421.gz: 2014-04-20T07:xx:xx   -11% difference
...-20140421.gz: 2014-04-20T08:xx:xx   -11% difference
...-20140421.gz: 2014-04-20T09:xx:xx   -11% difference
...-20140421.gz: 2014-04-20T10:xx:xx   -11% difference
...-20140421.gz: 2014-04-20T11:xx:xx   -11% difference
...-20140421.gz: 2014-04-20T12:xx:xx   -11% difference
...-20140421.gz: 2014-04-20T13:xx:xx   -10% difference
...-20140421.gz: 2014-04-20T14:xx:xx   -11% difference
...-20140421.gz: 2014-04-20T15:xx:xx   -10% difference
...-20140421.gz: 2014-04-20T16:xx:xx   -11% difference
...-20140421.gz: 2014-04-20T17:xx:xx   -11% difference
...-20140421.gz: 2014-04-20T18:xx:xx   -11% difference
...-20140421.gz: 2014-04-20T19:xx:xx   -10% difference
...-20140421.gz: 2014-04-20T20:xx:xx   -10% difference
...-20140421.gz: 2014-04-20T21:xx:xx   -10% difference
...-20140421.gz: 2014-04-20T22:xx:xx   -11% difference
...-20140421.gz: 2014-04-20T23:xx:xx   -11% difference
...-20140421.gz: 2014-04-21T00:xx:xx   -11% difference
...-20140421.gz: 2014-04-21T01:xx:xx   -11% difference
...-20140421.gz: 2014-04-21T02:xx:xx   -10% difference
...-20140421.gz: 2014-04-21T03:xx:xx   -10% difference
...-20140421.gz: 2014-04-21T04:xx:xx   -11% difference
...-20140421.gz: 2014-04-21T05.xx:xx   -11% difference
...-20140421.gz: 2014-04-21T06:xx:xx   -25% difference <-- log rotation
...-20140422.gz: 2014-04-21T06:xx:xx    -4% difference <-- log rotation
...-20140422.gz: 2014-04-21T07:xx:xx   -11% difference
...-20140422.gz: 2014-04-21T08:xx:xx   -10% difference
...-20140422.gz: 2014-04-21T09:xx:xx   -11% difference
...-20140422.gz: 2014-04-21T10:xx:xx   -11% difference
...-20140422.gz: 2014-04-21T11:xx:xx    -6% difference <-- ?
...-20140422.gz: 2014-04-21T12:xx:xx   -11% difference
...-20140422.gz: 2014-04-21T13:xx:xx   -10% difference
...-20140422.gz: 2014-04-21T14:xx:xx   -10% difference
...-20140422.gz: 2014-04-21T15:xx:xx   -10% difference
...-20140422.gz: 2014-04-21T16:xx:xx   -10% difference
...-20140422.gz: 2014-04-21T17:xx:xx   -10% difference
...-20140422.gz: 2014-04-21T18:xx:xx    -9% difference <-- ottomata restarting kafkatee ~18:30
...-20140422.gz: 2014-04-21T19:xx:xx   -48% difference
...-20140422.gz: 2014-04-21T20:xx:xx     0% difference <-- back to normal.
...-20140422.gz: 2014-04-21T21:xx:xx     0% difference
...-20140422.gz: 2014-04-21T22:xx:xx    -1% difference
...-20140422.gz: 2014-04-21T23:xx:xx    -1% difference
...-20140422.gz: 2014-04-22T00:xx:xx    -1% difference
...-20140422.gz: 2014-04-22T01:xx:xx    -1% difference
...-20140422.gz: 2014-04-22T02:xx:xx    -1% difference
...-20140422.gz: 2014-04-22T03:xx:xx    -1% difference
...-20140422.gz: 2014-04-22T04:xx:xx    -1% difference
...-20140422.gz: 2014-04-22T05:xx:xx    -1% difference


How can we guard against partitions not being fed into varnishkafka in
future?
Comment 6 Toby Negrin 2014-06-12 17:15:03 UTC
We have other kafka issues we need to resolve before we can address this problem.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links