Last modified: 2014-01-14 19:25:36 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from bug reports and their history, links might be broken. See T60876, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 58876 - SSL endpoints log %-encoded URLs as \x-encoded URLs
Status: RESOLVED FIXED
Product: Analytics
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2013-12-22 22:10 UTC by christian
Modified: 2014-01-14 19:25 UTC (History)

Description christian 2013-12-22 22:10:49 UTC
SSL endpoints log %-encoded URLs as \x-encoded URLs

When requesting %-encoded URLs like

  https://ru.wikipedia.org/wiki/1092_%D0%B3%D0%BE%D0%B4

(note: “https”) we get a log line for

  http://ru.wikipedia.org/wiki/1092_%D0%B3%D0%BE%D0%B4

(%-encoded) from the cache, but the SSL endpoint additionally adds a
log entry using the URL

  https://ru.wikipedia.org/wiki/1092_\xD0\xB3\xD0\xBE\xD0\xB4

(\x-encoded).
The latter, \x-encoded URL cannot be fetched and distorts the logs.

I'd prefer that we have no \x-encoded URLs in our logs.

Should we:
* try to fix the SSL endpoints so they do not log distorted URLs, or
* stop having SSL endpoints in the udp2log log stream altogether
  (currently, https requests get two entries in the log stream: one
  from the SSL endpoint and one from the responding cache)

?
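To illustrate how the two encodings relate, here is a minimal Python sketch that rewrites \x-encoded bytes back into %-encoded form. The function name x_to_percent is hypothetical, and the sketch assumes that every \xHH sequence in a logged URL stands for an escaped byte. It is not the fix that was applied to the SSL endpoints.

```python
import re

# Matches one \xHH escape (backslash, "x", two hex digits).
_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def x_to_percent(url: str) -> str:
    """Rewrite each \\xHH escape as %HH and leave everything else untouched."""
    return _HEX_ESCAPE.sub(lambda m: "%" + m.group(1).upper(), url)


logged = r"https://ru.wikipedia.org/wiki/1092_\xD0\xB3\xD0\xBE\xD0\xB4"
print(x_to_percent(logged))
# prints https://ru.wikipedia.org/wiki/1092_%D0%B3%D0%BE%D0%B4
```

A rewrite like this only repairs log lines after the fact. A URL that genuinely contained a backslash followed by "x" and two hex digits would be converted as well.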
Comment 1 christian 2013-12-22 22:11:28 UTC
Actual request and log entries:



* request:

___________________________________________________________
christian@spencer // 0 // 21:32:57
cwd: ~/tmp/encoding-test
LC_ALL=C wget https://ru.wikipedia.org/wiki/1092_год
--2013-12-22 21:34:14--  https://ru.wikipedia.org/wiki/1092_%D0%B3%D0%BE%D0%B4
Resolving ru.wikipedia.org... 91.198.174.192
Connecting to ru.wikipedia.org|91.198.174.192|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `1092_\320\263\320\276\320\264.2'

    [ <=>                                                            ] 80,537       510K/s   in 0.2s

2013-12-22 21:34:15 (510 KB/s) - `1092_\320\263\320\276\320\264' saved [80537]



* Corresponding log entries from udp2log stream:

amssq57.esams.wikimedia.org	4663480343	2013-12-22T20:34:15	0.596358538	$WIKIMEDIA_IP	miss/200	80537	GET	http://ru.wikipedia.org/wiki/1092_%D0%B3%D0%BE%D0%B4	-	text/html; charset=UTF-8	-	$MY_IP	Wget/ (linux-gnu)	-	-
ssl3002	454288692	2013-12-22T20:34:15.377	0.950	$MY_IP	-/200	81682	GET	https://ru.wikipedia.org/wiki/1092_\xD0\xB3\xD0\xBE\xD0\xB4	NONE/wikimedia	-	-	-	Wget/%20(linux-gnu)	-	-
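A rough Python sketch for spotting affected entries in such a stream follows. It assumes the lines are tab-separated and that the URL is the ninth field, as in the two entries above. The name url_is_x_encoded and the constant URL_FIELD are made up for this example, and the sample lines are shortened copies of the entries above.

```python
import re

URL_FIELD = 8  # assumption: zero-based position of the URL, as in the sample lines
HEX_ESCAPE = re.compile(r"\\x[0-9A-Fa-f]{2}")


def url_is_x_encoded(line: str) -> bool:
    """Return True if the URL field of a tab-separated log line contains \\xHH."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) <= URL_FIELD:
        return False  # too short to carry a URL field
    return HEX_ESCAPE.search(fields[URL_FIELD]) is not None


# Shortened copies of the two entries above (fields after the URL dropped).
cache_line = "\t".join([
    "amssq57.esams.wikimedia.org", "4663480343", "2013-12-22T20:34:15",
    "0.596358538", "$WIKIMEDIA_IP", "miss/200", "80537", "GET",
    "http://ru.wikipedia.org/wiki/1092_%D0%B3%D0%BE%D0%B4",
])
ssl_line = "\t".join([
    "ssl3002", "454288692", "2013-12-22T20:34:15.377", "0.950", "$MY_IP",
    "-/200", "81682", "GET",
    r"https://ru.wikipedia.org/wiki/1092_\xD0\xB3\xD0\xBE\xD0\xB4",
])

print(url_is_x_encoded(cache_line))  # False
print(url_is_x_encoded(ssl_line))    # True
```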
Comment 2 christian 2013-12-22 22:24:16 UTC
The problem does not show on the sampled-1000, mobile, and zero streams,
but it is visible on unsampled streams that do not filter by host, for
example the edit stream and webstatscollector output (and hence
stats.grok.se).

The exposure of this problem through webstatscollector seems especially
problematic, as people have started to add redirects for the non-existent
but seemingly requested \x-encoded URLs. :-/
(See bug 58316)
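As a sketch of what filtering by host could look like for a consumer of an unsampled stream, the following Python drops entries whose host field starts with "ssl". The prefix is an assumption based only on the hostname ssl3002 seen above, and drop_ssl_entries is a hypothetical helper, not part of udp2log or webstatscollector.

```python
from typing import Iterable, Iterator


def drop_ssl_entries(lines: Iterable[str], prefix: str = "ssl") -> Iterator[str]:
    """Yield only lines whose first tab-separated field does not start with prefix."""
    for line in lines:
        host = line.split("\t", 1)[0]
        if not host.startswith(prefix):  # assumption: SSL endpoints are named ssl*
            yield line


# Placeholder lines: only the host and sequence fields are filled in.
sample = [
    "amssq57.esams.wikimedia.org\t4663480343\t...",
    "ssl3002\t454288692\t...",
]
kept = list(drop_ssl_entries(sample))
print(len(kept))  # 1
```

Since each https request is logged once by the SSL endpoint and once by the responding cache, dropping the SSL endpoint lines leaves one entry per request in the filtered output.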
Comment 3 Bingle 2013-12-22 22:36:50 UTC
Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/1351
Comment 4 Gerrit Notification Bot 2014-01-04 18:20:09 UTC
Change 105449 had a related patch set uploaded by QChris:
Log correctly encoded url with parameters for nginx

https://gerrit.wikimedia.org/r/105449
Comment 5 Gerrit Notification Bot 2014-01-14 18:57:01 UTC
Change 105449 merged by Ottomata:
Log correctly encoded url with parameters for nginx

https://gerrit.wikimedia.org/r/105449
