Last modified: 2013-09-06 19:24:26 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T46236, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 44236 - Inconsistent field separation makes Squid logs in Hadoop largely unusable
Status: RESOLVED FIXED
Product: Analytics
Classification: Unclassified
Component: Refinery
Version: unspecified
Hardware: All All
Importance: High critical
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2013-01-22 10:41 UTC by Ori Livneh
Modified: 2013-09-06 19:24 UTC (History)
CC List: 6 users



Attachments
Screenshot of Beeswax showing parse failure (76.15 KB, image/png)
2013-01-22 10:41 UTC, Ori Livneh

Description Ori Livneh 2013-01-22 10:41:05 UTC
Created attachment 11664
Screenshot of Beeswax showing parse failure

Sort out the field separator issue in your handling of squid logs first.

To summarize:

1) Kafka byte offset is delimited from hostname by a tab (\t).
2) Other fields are delimited by a space (U+0020).
3) The content-type field contains unescaped spaces.
4) Beeswax only supports splitting on a single character.

As a result:

1) The byte offset is not separable from the hostname ("316554683463cp1043.wikimedia.org").
2) Unescaped spaces in the content type field cause it to span a variable number of columns.
3) It is impossible to select the user agent field.
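
The three consequences above can be reproduced with a short Python sketch. Only the byte offset and hostname are taken from this report; the rest of the sample line (status code, content types, user agent, field order and field count) is invented for illustration and does not reflect the real log layout. The helper names make_line and split_single_char are hypothetical.

```python
# Sketch: what happens when a line with mixed delimiters is split on a
# single character, as Beeswax does.

def make_line(content_type: str, user_agent: str) -> str:
    """Build a hypothetical log line: tab after the byte offset, spaces elsewhere."""
    return f"316554683463\tcp1043.wikimedia.org 200 {content_type} {user_agent}"

def split_single_char(line: str, sep: str = " ") -> list[str]:
    """Split on exactly one delimiter character."""
    return line.split(sep)

with_space = split_single_char(make_line("text/html; charset=UTF-8", "Mozilla/5.0"))
without_space = split_single_char(make_line("image/png", "Mozilla/5.0"))

# 1) The byte offset and hostname stay glued together in column 0,
#    because the tab between them is not the split character.
print(repr(with_space[0]))

# 2) The number of columns depends on whether the content type contains a space.
print(len(with_space), len(without_space))

# 3) The same column index therefore holds different fields on different rows.
print(with_space[3], "|", without_space[3])
```

In this sketch, column 3 holds part of the content type on one row and the user agent on the other, which is why no fixed column index selects the user agent.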

I'd like a solution to this that does not require that I provide a jar file for customized string processing.
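
For illustration only, here is a minimal Python sketch of what a consistently delimited line would look like. It rewrites a line so that every field is separated by a single tab. It is not the change linked in the comment below, and it rests on assumptions this report does not confirm: that only the content-type field contains spaces, that no field is empty, and that the number of fields before and after the content type is known. The function normalize and its parameters fields_before and fields_after are hypothetical.

```python
import re

def normalize(line: str, fields_before: int, fields_after: int) -> str:
    """Rewrite one log line so that every field is separated by a single tab.

    fields_before / fields_after: assumed number of fields that precede and
    follow the content-type field (hypothetical; the real layout is not
    given in this report).
    """
    # Split on either delimiter that appears in the raw logs.
    tokens = re.split(r"[ \t]", line.rstrip("\n"))
    if len(tokens) < fields_before + fields_after + 1:
        raise ValueError("line has fewer fields than expected")
    end = len(tokens) - fields_after
    head = tokens[:fields_before]
    tail = tokens[end:]
    # Everything between head and tail belongs to the content type.
    content_type = " ".join(tokens[fields_before:end])
    return "\t".join(head + [content_type] + tail)

# Hypothetical sample line; only the offset and hostname come from the report.
raw = "316554683463\tcp1043.wikimedia.org 200 text/html; charset=UTF-8 Mozilla/5.0"
fields = normalize(raw, fields_before=3, fields_after=1).split("\t")
print(fields)
```

With a single tab as the only delimiter, a tool that splits on one character yields the same number of columns for every row, and the content type keeps its internal space.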
Comment 1 Diederik van Liere 2013-01-31 23:48:58 UTC
Fixed in https://gerrit.wikimedia.org/r/#/c/46942/
