Last modified: 2013-09-06 19:24:26 UTC
Created attachment 11664 [details]
Screenshot of Beeswax showing parse failure

Sort out the field separator issue in your handling of squid logs first. To summarize:

1) Kafka byte offset is delimited from hostname by a tab (\t).
2) Other fields are delimited by a space (\0020).
3) The content-type field contains unescaped spaces.
4) Beeswax only supports splitting on a single character.

As a result:

1) Byte offset is not separable from the hostname ("316554683463cp1043.wikimedia.org").
2) Unescaped spaces in the content type field cause it to span a variable number of columns.
3) It is impossible to select the user agent field.

I'd like a solution to this that does not require that I provide a jar file for customized string processing.
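To make the failure concrete outside of Beeswax, here is a minimal Python sketch of the two behaviours described above. It is an illustration only, not the fix that was applied: the sample line is made up (only the offset and hostname come from this report), and `fields_before` / `fields_after` are hypothetical counts of space-delimited fields on either side of the content type, since the full log layout is not given here.

```python
def split_single_char(line, sep=" "):
    """Mimic a splitter that supports only one delimiter character."""
    return line.split(sep)


def split_record(line, fields_before, fields_after):
    """Split on the tab first, then on spaces, re-joining the content type.

    fields_before / fields_after are hypothetical counts of space-delimited
    fields that precede / follow the content type (hostname included).
    """
    # The byte offset is separated from everything else by the first tab.
    offset, _, rest = line.partition("\t")
    parts = rest.split(" ")
    if len(parts) < fields_before + fields_after + 1:
        raise ValueError("too few fields")
    end = len(parts) - fields_after
    head = parts[:fields_before]
    tail = parts[end:]
    # Whatever is left in the middle is the content type, spaces included.
    content_type = " ".join(parts[fields_before:end])
    return [offset] + head + [content_type] + tail


# Made-up sample line, not real log data.
sample = "316554683463\tcp1043.wikimedia.org GET text/html; charset=UTF-8 Mozilla/5.0"

naive = split_single_char(sample)
fixed = split_record(sample, fields_before=2, fields_after=1)
```

With the single-character split, the first column still holds the offset and hostname fused together and the content type occupies two columns. Counting fields from both ends only works if the content type is the only field with unescaped spaces; if another field has them too, the split is ambiguous and the log format itself needs to change.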
Fixed in https://gerrit.wikimedia.org/r/#/c/46942/