Last modified: 2014-10-20 20:20:22 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T73790, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 71790 - By counting HTTP redirects, webstatscollector reporting too high numbers
Status: RESOLVED FIXED
Product: Analytics
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2014-10-08 11:20 UTC by christian
Modified: 2014-10-20 20:20 UTC (History)

Description christian 2014-10-08 11:20:34 UTC
One of the longstanding issues with Webstatscollector is that it
counts redirects at the HTTP level.

So for example:
- Requesting a page with a lower case first letter [1],
- Requesting a page from the desktop site on a mobile device [2], or
- Requesting www.wikipedia.org (the first part is www, not a language code) [3]
causes two requests to the caches, and webstatscollector counts both,
although only a single page is actually shown to the user.
As a result, the reported numbers are too high.

Since we're about to deploy a new webstatscollector anyway, and this
double counting should not be too hard to fix, let's get it fixed too.

(Note that redirects above the HTTP level are not affected. So for example
  http://en.wikipedia.org/wiki/Michael_J_Fox
(no dot after the J) is, was and will be one request, although it shows
the content of
  http://en.wikipedia.org/wiki/Michael_J._Fox
(dot after the J). Such redirects at Wiki level are not affected.)
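
The double counting described above can be sketched as follows. This is only an illustration, not webstatscollector's code: the (url, status) records and the count_requests helper are hypothetical, and the set of redirect status codes (301, 302, 303) is the one named later in this thread.

```python
# Hypothetical (url, http_status) pairs; not webstatscollector's real input format.
REDIRECT_STATUSES = {301, 302, 303}


def count_requests(records, skip_redirects):
    """Count records, optionally ignoring HTTP-level redirect responses."""
    return sum(
        1
        for _url, status in records
        if not (skip_redirects and status in REDIRECT_STATUSES)
    )


# The lower-case request from example [1]: one redirect, then the page itself.
records = [
    ("http://en.wikipedia.org/wiki/main_page", 301),
    ("http://en.wikipedia.org/wiki/Main_page", 200),
]

old_count = count_requests(records, skip_redirects=False)  # both cache hits counted
new_count = count_requests(records, skip_redirects=True)   # only the 200 response
print(old_count, new_count)
```

Wiki-level redirects such as Michael_J_Fox are answered with a single 200 response, so a filter like this leaves them untouched.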





[1]
_________________________________________________________________
christian@spencer // jobs: 0 // time: 13:13:36 // exit code: 0
cwd: ~
wget -O /dev/null 'http://en.wikipedia.org/wiki/main_page'
--2014-10-08 13:13:39--  http://en.wikipedia.org/wiki/main_page
Resolving en.wikipedia.org... 91.198.174.192
Connecting to en.wikipedia.org|91.198.174.192|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://en.wikipedia.org/wiki/Main_page [following]
--2014-10-08 13:13:39--  http://en.wikipedia.org/wiki/Main_page
Reusing existing connection to en.wikipedia.org:80.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `/dev/null'

    [ <=>                                                                                                                  ] 67,779      --.-K/s   in 0.1s    

2014-10-08 13:13:39 (472 KB/s) - `/dev/null' saved [67779]



[2]
_________________________________________________________________
christian@spencer // jobs: 0 // time: 13:13:39 // exit code: 0
cwd: ~
wget -O /dev/null --user-agent 'iPhone' 'http://en.wikipedia.org/wiki/Main_Page'
--2014-10-08 13:13:44--  http://en.wikipedia.org/wiki/Main_Page
Resolving en.wikipedia.org... 91.198.174.192
Connecting to en.wikipedia.org|91.198.174.192|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://en.m.wikipedia.org/wiki/Main_Page [following]
--2014-10-08 13:13:44--  http://en.m.wikipedia.org/wiki/Main_Page
Resolving en.m.wikipedia.org... 91.198.174.204
Connecting to en.m.wikipedia.org|91.198.174.204|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `/dev/null'

    [ <=>                                                                                                                  ] 22,002      --.-K/s   in 0.05s   

2014-10-08 13:13:44 (416 KB/s) - `/dev/null' saved [22002]



[3]
_________________________________________________________________
christian@spencer // jobs: 0 // time: 13:13:44 // exit code: 0
cwd: ~
wget -O /dev/null 'http://www.wikipedia.org/wiki/Main_Page'
--2014-10-08 13:13:49--  http://www.wikipedia.org/wiki/Main_Page
Resolving www.wikipedia.org... 91.198.174.192
Connecting to www.wikipedia.org|91.198.174.192|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://en.wikipedia.org/wiki/Main_Page [following]
--2014-10-08 13:13:49--  http://en.wikipedia.org/wiki/Main_Page
Resolving en.wikipedia.org... 91.198.174.192
Reusing existing connection to www.wikipedia.org:80.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `/dev/null'

    [ <=>                                                                                                                  ] 67,565      --.-K/s   in 0.1s    

2014-10-08 13:13:49 (471 KB/s) - `/dev/null' saved [67565]
Comment 1 Nemo 2014-10-08 11:47:46 UTC
(In reply to christian from comment #0)
> Since we're about to deploy a new webstatscollector anyway, and this
> double counting should not be too hard to fix, let's get it fixed too.

+1. https://meta.wikimedia.org/w/index.php?title=Research_talk:Page_view&oldid=10069001#Special_namespace_and_actual_problems (I'll miss stats for Special:MyLanguage, but that was a dirty trick).

Are we talking of 301 and 302 or something more?
Comment 2 christian 2014-10-08 14:38:23 UTC
(In reply to Nemo from comment #1)
> I'll miss
> stats for Special:MyLanguage, [...]

Yup. I'll miss stats for Special:Random :-(

> Are we talking of 301 and 302 or something more?

301, 302, and 303.

303 basically only affects bots on Wikidata. But there, some requests [1]
see two 303s before any content gets sent.



[1]
_________________________________________________________________
christian@spencer // jobs: 0 // time: 16:34:01 // exit code: 0
cwd: ~
wget -O /dev/null --header='Accept: text/html' 'https://www.wikidata.org/entity/Q507970'
--2014-10-08 16:34:02--  https://www.wikidata.org/entity/Q507970
Resolving www.wikidata.org... 91.198.174.192
Connecting to www.wikidata.org|91.198.174.192|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://www.wikidata.org/wiki/Special:EntityData/Q507970 [following]
--2014-10-08 16:34:03--  https://www.wikidata.org/wiki/Special:EntityData/Q507970
Reusing existing connection to www.wikidata.org:443.
HTTP request sent, awaiting response... 303 See Other
Location: https://www.wikidata.org/wiki/Q507970 [following]
--2014-10-08 16:34:03--  https://www.wikidata.org/wiki/Q507970
Reusing existing connection to www.wikidata.org:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `/dev/null'

    [ <=>                                                                                                                  ] 81,443      --.-K/s   in 0.1s    

2014-10-08 16:34:04 (593 KB/s) - `/dev/null' saved [81443]
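
To make the chain in the transcript above easier to see, here is a small sketch that pulls the status codes out of wget output. The status_chain helper and its regular expression are assumptions made for this illustration; they only rely on the "awaiting response..." lines exactly as they appear in the transcript.

```python
import re

# Matches the status code wget prints after each request it sends.
STATUS_RE = re.compile(r"awaiting response\.\.\. (\d{3})")


def status_chain(wget_output):
    """Return the HTTP status codes in the order wget reported them."""
    return [int(code) for code in STATUS_RE.findall(wget_output)]


# Status and Location lines copied from the Wikidata transcript above.
transcript = """\
HTTP request sent, awaiting response... 303 See Other
Location: https://www.wikidata.org/wiki/Special:EntityData/Q507970 [following]
HTTP request sent, awaiting response... 303 See Other
Location: https://www.wikidata.org/wiki/Q507970 [following]
HTTP request sent, awaiting response... 200 OK
"""

chain = status_chain(transcript)
redirects = [code for code in chain if code in (301, 302, 303)]
print(chain, len(redirects))
```

For this request, counting every response gives three hits, while only the final 200 corresponds to content being sent.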
Comment 3 Yuvi Panda 2014-10-08 14:43:03 UTC
I'm sure we can count special page requests separately if we want them...
Comment 4 christian 2014-10-08 15:11:59 UTC
Oh. Counting of Special pages won't change per se.

It only affects those Special pages that happen to come with 301, 302,
or 303 HTTP status codes.

So for example Special:Search, or Special:Export come with HTTP status
code 200. They'll still be counted as usual.
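
A rough sketch of that distinction, assuming the decision depends only on the HTTP status and not on the namespace. The tally helper and the sample records are hypothetical; in particular, the 302 for Special:Random and the Main_Page target are assumptions made for illustration.

```python
from collections import Counter

REDIRECT_STATUSES = {301, 302, 303}


def tally(records):
    """Count page titles, skipping responses that are HTTP redirects."""
    counts = Counter()
    for title, status in records:
        if status in REDIRECT_STATUSES:
            continue  # redirect response: no page content was served
        counts[title] += 1
    return counts


records = [
    ("Special:Search", 200),
    ("Special:Export", 200),
    ("Special:Random", 302),  # assumed status: redirects to some page
    ("Main_Page", 200),       # hypothetical page the redirect led to
]

counts = tally(records)
print(sorted(counts.items()))
```

Special:Search and Special:Export keep their counts, while the redirecting Special page drops out and only the page it led to is counted.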
Comment 5 Gerrit Notification Bot 2014-10-08 21:38:31 UTC
Change 165351 had a related patch set uploaded by QChris:
Release fix that stops counting [uU]ndefined and redirects

https://gerrit.wikimedia.org/r/165351
Comment 6 Gerrit Notification Bot 2014-10-08 21:38:38 UTC
Change 165631 had a related patch set uploaded by QChris:
Stop counting 301, 302, 303 HTTP status codes

https://gerrit.wikimedia.org/r/165631
Comment 7 Gerrit Notification Bot 2014-10-09 14:31:22 UTC
Change 165725 had a related patch set uploaded by QChris:
Stop counting 301, 302, 303 HTTP status codes

https://gerrit.wikimedia.org/r/165725
Comment 8 Gerrit Notification Bot 2014-10-09 15:41:28 UTC
Change 165748 had a related patch set uploaded by QChris:
[webstatscollector] Add condition to not count redirects

https://gerrit.wikimedia.org/r/165748
Comment 9 Gerrit Notification Bot 2014-10-09 22:01:36 UTC
Change 165631 merged by jenkins-bot:
Stop counting 301, 302, 303 HTTP status codes

https://gerrit.wikimedia.org/r/165631
Comment 10 Gerrit Notification Bot 2014-10-09 22:02:19 UTC
Change 165351 merged by Ottomata:
Release fix that stops counting [uU]ndefined and redirects

https://gerrit.wikimedia.org/r/165351
Comment 11 Gerrit Notification Bot 2014-10-15 19:22:55 UTC
Change 165725 merged by QChris:
Stop counting 301, 302, 303 HTTP status codes

https://gerrit.wikimedia.org/r/165725
Comment 13 Nemo 2014-10-15 23:19:34 UTC
Are retroactive adjustments of stats.wikimedia.org pageview stats expected?
Comment 14 Gerrit Notification Bot 2014-10-20 20:20:22 UTC
Change 165748 merged by Ottomata:
[webstatscollector] Add condition to not count redirects

https://gerrit.wikimedia.org/r/165748
