Last modified: 2014-02-26 12:54:53 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T47262, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 45262 - Incorrect decoding of QSON
Status: RESOLVED FIXED
Product: Analytics
Classification: Unclassified
Component: EventLogging
Version: unspecified
Hardware: All All
Importance: High major
Target Milestone: ---
Assigned To: Ori Livneh
: utf8
Depends on:
Blocks:
Reported: 2013-02-22 00:49 UTC by Ori Livneh
Modified: 2014-02-26 12:54 UTC (History)

Description Ori Livneh 2013-02-22 00:49:33 UTC
16:43 <DarTar> ori-l: SELECT event_targetTitle FROM `GettingStarted_5243394` WHERE uuid = "51c6149554d85e77b665f303f28adf25";
16:43 <DarTar> Héctor Elizondo
Comment 1 Ori Livneh 2013-02-22 00:56:12 UTC
Event in question is in all-events-json.log-20130215.gz
Comment 2 Dario Taraborelli 2013-02-22 07:22:25 UTC
It turns out this is not a bug in json2sql but in the instrumentation of GettingStarted; updating the ticket accordingly.
Comment 3 Matthew Flaschen 2013-02-22 07:39:45 UTC
I tested again with a page called 'Some contrivéd page name!'()*~' (no quotes).

The JSON is:

{"event":{"action":"gettingstarted-click","funnel":"gettingstarted","targetTitle":"Some contrivéd page name!'()*~","experimentId":"ob3","userId":1,"isNew":false},"isValid":true,"revision":5219269,"schema":"GettingStarted","webHost":"127.0.0.1","wiki":"testwiki"}

Note that the logging for GettingStarted is in E3Experiments.  So if it were a client-side bug, it would probably be there.

For the record, the page in question above is https://en.wikipedia.org/wiki/H%C3%A9ctor_Elizondo

à is http://www.fileformat.info/info/unicode/char/00c3/index.htm
© is http://www.fileformat.info/info/unicode/char/00a9/index.htm
é (the correct one) is http://www.fileformat.info/info/unicode/char/00e9/index.htm

If you follow the last link, you will see the UTF-8 is:

UTF-8 (hex) 	0xC3 0xA9 (c3a9)

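So it looks like the two UTF-8 bytes are being separated and each one is projected out to its own UTF-16 code unit: 0xC3 becomes U+00C3 (Ã) and 0xA9 becomes U+00A9 (©). The code point in hex is also what that site happens to use in its URLs.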
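
The diagnosis above can be reproduced in a few lines. This is only a sketch of the symptom, not the actual EventLogging code path: the helper names utf8Bytes, bytesAsCodeUnits and repair are hypothetical, and it assumes the garbling is nothing more than treating each UTF-8 byte as its own code unit.

```javascript
// Hypothetical helpers for illustration; not part of EventLogging.

// Return the UTF-8 bytes of str as an array of numbers.
// encodeURIComponent percent-encodes non-ASCII characters as UTF-8 bytes,
// and encodes a literal '%' as '%25', so every '%' starts an escape.
function utf8Bytes( str ) {
	var encoded = encodeURIComponent( str ),
		bytes = [],
		i;
	for ( i = 0; i < encoded.length; i++ ) {
		if ( encoded.charAt( i ) === '%' ) {
			bytes.push( parseInt( encoded.slice( i + 1, i + 3 ), 16 ) );
			i += 2;
		} else {
			bytes.push( encoded.charCodeAt( i ) );
		}
	}
	return bytes;
}

// The suspected bug: widen every byte into a separate UTF-16 code unit.
function bytesAsCodeUnits( bytes ) {
	return String.fromCharCode.apply( null, bytes );
}

// Undo the damage, assuming every code unit is really a single byte.
function repair( garbledText ) {
	var encoded = '',
		hex,
		i;
	for ( i = 0; i < garbledText.length; i++ ) {
		hex = garbledText.charCodeAt( i ).toString( 16 );
		encoded += '%' + ( hex.length < 2 ? '0' + hex : hex );
	}
	return decodeURIComponent( encoded );
}

var original = 'H\u00e9ctor'; // "Héctor", written with an escape
var garbled = bytesAsCodeUnits( utf8Bytes( original ) );
var repaired = repair( garbled );
```

Here garbled contains U+00C3 followed by U+00A9 where the single é used to be, which matches the title in the report, and repaired is equal to original again.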

But for now, back to EventLogging.
Comment 4 Ori Livneh 2013-02-22 10:23:50 UTC
Nope, it wasn't GettingStarted. Fixed in change I0f4ea76b911e572405bcfbde23be74d29f7fd783.
Comment 5 Ori Livneh 2013-02-22 23:30:31 UTC
Adding a bit of documentation for future reference. If we run into Unicode / URL issues in the future, we can try replacing all code points above the ASCII range with Unicode escape sequences:

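	// Turn one UTF-16 code unit into its \uXXXX escape sequence.
	// ('char' is a reserved word in older ECMAScript editions, hence 'ch'.)
	function escapeChar( ch ) {
		var codePoint = '0000' + ch.charCodeAt( 0 ).toString( 16 );
		return '\\u' + codePoint.slice( -4 );
	}

	// Serialize obj to JSON, escaping everything from U+007F upwards.
	function toSafeJSON( obj ) {
		var json = $.toJSON( obj );
		return json.replace( /[\u007f-\uffff]/g, escapeChar );
	}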
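
For reference, a self-contained sketch of how that escaping behaves on the title from this report. It uses the built-in JSON.stringify in place of $.toJSON, which is an assumption made only so the example runs on its own; escapeNonAscii and toAsciiJSON are hypothetical names.

```javascript
// Sketch only: JSON.stringify stands in for $.toJSON (assumption).

// Turn one UTF-16 code unit into its \uXXXX escape sequence.
function escapeNonAscii( ch ) {
	var hex = '0000' + ch.charCodeAt( 0 ).toString( 16 );
	return '\\u' + hex.slice( -4 );
}

// Serialize obj to JSON with every code unit from U+007F upwards escaped.
function toAsciiJSON( obj ) {
	return JSON.stringify( obj ).replace( /[\u007f-\uffff]/g, escapeNonAscii );
}

var event = { targetTitle: 'H\u00e9ctor Elizondo' }; // "Héctor Elizondo"
var safe = toAsciiJSON( event );

// Parsing the escaped JSON should give back the original string.
var roundTripped = JSON.parse( safe );
```

Since the serialized form is then pure ASCII, whatever encoding step mangled the original bytes has nothing left to misread, and a JSON parser on the receiving side restores the é from the escape sequence. This sidesteps the symptom rather than explaining it.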
Comment 6 Matthew Flaschen 2013-02-23 11:46:13 UTC
If this problem does crop up again, let's try to figure out the underlying cause before trying something like toSafeJSON.
Comment 7 Andre Klapper 2014-02-26 12:54:53 UTC
[moving from MediaWiki extensions to Analytics product - see bug 61946]
