Last modified: 2014-02-09 01:57:33 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T63063, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 61063 - Duplicate entries with missing referers in webrequest logs.
Status: RESOLVED INVALID
Product: Analytics
Classification: Unclassified
Component: Refinery
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2014-02-07 23:05 UTC by Oliver Keyes
Modified: 2014-02-09 01:57 UTC (History)

Description Oliver Keyes 2014-02-07 23:05:49 UTC
So, I've been noodling around in the request logs recently and I've seen a lot of rows that have null entries for some columns. Not too big a deal with most of em - some are things like referrer, or user language, where I can see it not being provided by the client.

Today, though, I encountered requests without a MIME type. Not even requests for weird things - I'm talking, pages like the enwiki main page, or the article on India. I've backtracked the examples I pulled out into hive itself and confirmed that the elements are blank there, too (happy to provide em in private to anyone investigating this).

I'm kinda confused about what's going on. It shouldn't really be possible to send a request and return it without that data, and the invalid requests are coming from both Android and iPhone devices. An investigation upstream (for example, checking if they're null in the varnish memstore, too) would be most welcome.
Comment 1 Bingle 2014-02-07 23:10:38 UTC
Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/cards/1439
Comment 2 Oliver Keyes 2014-02-08 00:41:51 UTC
Further investigation:

*I went through some of the request logs manually and found duplicate requests, about 9-10ms apart, the latter of which had the MIME type and referer stripped. This could be the source of both the MIME type data loss and the referer data loss we've seen with Special:BannerRandom hits. Matt Walker theorises that the problem may be us consuming requestlog data from multiple layers of varnish machines, and thus getting the same requests multiple times. I'm going to yank out the hostnames for the weird hinky hits I've noticed to see.
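
For anyone wanting to repeat the manual check above, here is a rough sketch of how such pairs could be picked out programmatically. It is an illustration only: the field names (`dt`, `client`, `uri_path`, `referer`, `content_type`) and the 15 ms window are assumptions, not the actual webrequest schema, and the sample rows are synthetic.

```python
from datetime import datetime, timedelta

# Hypothetical field names; the real webrequest schema may differ.
CHECKED_FIELDS = ("referer", "content_type")


def find_stripped_duplicates(rows, window=timedelta(milliseconds=15)):
    """Return (first, second) pairs where `second` repeats `first` within
    `window` and has lost a field that `first` still carried."""
    by_key = {}
    for row in sorted(rows, key=lambda r: r["dt"]):
        # Group requests by the same (assumed) client and path.
        by_key.setdefault((row["client"], row["uri_path"]), []).append(row)

    pairs = []
    for group in by_key.values():
        # Compare each request with the next one in time order.
        for first, second in zip(group, group[1:]):
            close = second["dt"] - first["dt"] <= window
            stripped = any(first[f] and not second[f] for f in CHECKED_FIELDS)
            if close and stripped:
                pairs.append((first, second))
    return pairs


# Synthetic rows, made up purely to exercise the function.
sample_rows = [
    {"dt": datetime(2014, 1, 20, 10, 14, 0, 0), "client": "a",
     "uri_path": "/wiki/Example", "referer": "http://example.org/",
     "content_type": "text/html"},
    {"dt": datetime(2014, 1, 20, 10, 14, 0, 9000), "client": "a",
     "uri_path": "/wiki/Example", "referer": "", "content_type": ""},
    {"dt": datetime(2014, 1, 20, 10, 14, 5, 0), "client": "b",
     "uri_path": "/wiki/Example", "referer": "http://example.org/",
     "content_type": "text/html"},
]
suspect_pairs = find_stripped_duplicates(sample_rows)
```

Grouping on client plus path is a guess at what makes two log lines "the same request"; if the hostname theory holds, adding the logging host to each row would show whether the two halves of a pair come from different varnish layers.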
Comment 3 christian 2014-02-08 15:08:15 UTC
(In reply to comment #0)
> Today, though, I encountered requests without a MIME type.

Requests without a MIME type are fine in many settings.
We're seeing many of them.

Since you seem to be able to reproduce this, could you provide a short snippet
that allows us to exhibit such a log line?

(I am not asking for the log line itself, but for some chain of actions that
allows us to see a log line that you are concerned about)
Comment 4 Oliver Keyes 2014-02-09 00:30:53 UTC
Alright, you want to hunt for:

*hits to uri_path /wiki/File:Thailand_Surin_locator_map.svg
*Between 2014-01-20T10:14:00 and 2014-01-20T10:15:00

(hopefully that's anonymised enough)
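
A minimal sketch of that hunt, assuming the log lines have already been loaded as dictionaries with a `uri_path` string and an ISO-style `dt` timestamp string. Both field names are assumptions rather than the real schema, and the timestamps are compared as given, with no timezone handling.

```python
from datetime import datetime

TARGET_PATH = "/wiki/File:Thailand_Surin_locator_map.svg"
WINDOW_START = datetime(2014, 1, 20, 10, 14, 0)
WINDOW_END = datetime(2014, 1, 20, 10, 15, 0)
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"  # assumed timestamp layout


def matching_hits(rows):
    """Keep rows for the target path whose timestamp falls in the window."""
    hits = []
    for row in rows:
        when = datetime.strptime(row["dt"], TIMESTAMP_FORMAT)
        if row["uri_path"] == TARGET_PATH and WINDOW_START <= when <= WINDOW_END:
            hits.append(row)
    return hits


# Synthetic rows for illustration only; not real log data.
example_rows = [
    {"dt": "2014-01-20T10:14:30", "uri_path": TARGET_PATH},
    {"dt": "2014-01-20T10:16:00", "uri_path": TARGET_PATH},
    {"dt": "2014-01-20T10:14:30", "uri_path": "/wiki/Main_Page"},
]
found = matching_hits(example_rows)
```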

From that particular example, it looks like the (intact) request was a MISS from the varnish cache's point of view, which explains the immediate repeat of the request. Whether it's also responsible for the lack of referrer data is too network engineer-y for me to know - but it is a potential limiter if we want to use MIME type filtering for, say, pageviews. The good news is that, assuming my data sample is representative (and it's probably off, since it's 128k mobile views from a specific date), this only happens about 0.03 percent of the time.
Comment 5 Oliver Keyes 2014-02-09 01:48:55 UTC
*Blinks* actually, looking at that example, the MIME type is intact, it's the referrer that's vanished. My brain is...clearly not on today.
Comment 6 Oliver Keyes 2014-02-09 01:57:33 UTC
Closing for now, since there doesn't seem to be any easy link to identify why the referers and such are missing. Blah. I need to do a lot more work before BZing things, I think.
