Last modified: 2014-02-26 12:54:53 UTC
16:43 <DarTar> ori-l: SELECT event_targetTitle FROM `GettingStarted_5243394` WHERE uuid = "51c6149554d85e77b665f303f28adf25";
16:43 <DarTar> Héctor Elizondo
The event in question is in all-events-json.log-20130215.gz.
It turns out this is not a bug in json2sql but in the instrumentation of GettingStarted. Updating the ticket accordingly.
I tested again with a page called 'Some contrivéd page name!'()*~' (no quotes). The JSON is:

{"event":{"action":"gettingstarted-click","funnel":"gettingstarted","targetTitle":"Some contrivéd page name!'()*~","experimentId":"ob3","userId":1,"isNew":false},"isValid":true,"revision":5219269,"schema":"GettingStarted","webHost":"127.0.0.1","wiki":"testwiki"}

Note that the logging for GettingStarted is in E3Experiments. So if it were a client-side bug, it would probably be there.

For the record, the page in question above is https://en.wikipedia.org/wiki/H%C3%A9ctor_Elizondo

Ã is http://www.fileformat.info/info/unicode/char/00c3/index.htm
© is http://www.fileformat.info/info/unicode/char/00a9/index.htm
é (the correct one) is http://www.fileformat.info/info/unicode/char/00e9/index.htm

If you follow the last link, you will see that the UTF-8 encoding of é is:

UTF-8 (hex) 0xC3 0xA9 (c3a9)

So it looks like the two UTF-8 bytes are being separated and each one projected out to its own UTF-16 code unit, 0xC3 to U+00C3 (Ã) and 0xA9 to U+00A9 (©). UTF-16 is the format that site happens to use for the URL. But for now, back to EventLogging.
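To illustrate the hypothesis above, here is a minimal sketch that reproduces the garbling by encoding a title as UTF-8 and then misreading each byte as its own character. The function names misreadUtf8AsLatin1 and repairLatin1Mojibake are made up for this example and are not part of EventLogging or GettingStarted; the sketch also assumes an environment that provides TextEncoder and TextDecoder. It only shows the symptom, not where in the pipeline the misreading actually happened.

```javascript
// Hypothetical helper (not part of EventLogging): encode the text as UTF-8,
// then treat every byte as a separate code unit, as a Latin-1 decoder would.
function misreadUtf8AsLatin1( text ) {
    var bytes = new TextEncoder().encode( text ); // UTF-8 bytes
    var out = '';
    for ( var i = 0; i < bytes.length; i++ ) {
        out += String.fromCharCode( bytes[ i ] );
    }
    return out;
}

// Hypothetical inverse: map each code unit back to one byte and decode the
// bytes as UTF-8. Only meaningful if every code unit is below 0x100.
function repairLatin1Mojibake( garbled ) {
    var bytes = new Uint8Array( garbled.length );
    for ( var i = 0; i < garbled.length; i++ ) {
        bytes[ i ] = garbled.charCodeAt( i ) & 0xff;
    }
    return new TextDecoder( 'utf-8' ).decode( bytes );
}

var title = 'H\u00e9ctor Elizondo';           // correct title, with U+00E9
var garbled = misreadUtf8AsLatin1( title );   // contains U+00C3 U+00A9 instead
var repaired = repairLatin1Mojibake( garbled );
```

Here garbled matches the value returned by the SQL query above, and repaired is the original title again.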
Nope, it wasn't GettingStarted. Fixed in change I0f4ea76b911e572405bcfbde23be74d29f7fd783.
Adding a bit of documentation for future reference. If we run into Unicode / URL issues in the future, we can try replacing all characters above the ASCII range with Unicode escape sequences:

    function escapeChar( char ) {
        // Left-pad the hex value of the UTF-16 code unit to four digits.
        var codePoint = '0000' + char.charCodeAt(0).toString(16);
        return "\\u" + codePoint.slice(-4);
    }

    function toSafeJSON( obj ) {
        var json = $.toJSON( obj );
        return json.replace( /[\u007f-\uffff]/g, escapeChar );
    }
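For reference, a self-contained variant of the toSafeJSON idea that can be run outside the extension. It assumes the built-in JSON.stringify in place of $.toJSON, which is a substitution made for this sketch and not what the snippet above uses; the sample event object is cut down from the test event earlier in this thread.

```javascript
// Replace one UTF-16 code unit with its \uXXXX escape sequence.
function escapeChar( ch ) {
    var hex = '0000' + ch.charCodeAt( 0 ).toString( 16 );
    return '\\u' + hex.slice( -4 );
}

// Serialize to JSON, then escape everything from U+007F upward so the
// output is plain ASCII. JSON.stringify stands in for $.toJSON here.
function toSafeJSON( obj ) {
    return JSON.stringify( obj ).replace( /[\u007f-\uffff]/g, escapeChar );
}

var sampleEvent = { targetTitle: 'Some contriv\u00e9d page name!\'()*~' };
var safe = toSafeJSON( sampleEvent );

// The escaped form is still valid JSON and parses back to the same title.
var parsed = JSON.parse( safe );
```

Because \uXXXX escapes are part of the JSON grammar, a consumer that parses the string gets the original characters back, while anything in between that mishandles non-ASCII bytes only ever sees ASCII.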
If this problem does crop up again, let's try to figure out the underlying cause before trying something like toSafeJSON.
[moving from MediaWiki extensions to Analytics product - see bug 61946]