Last modified: 2012-10-23 17:16:26 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from links that display bug reports and their history, links might be broken. See T41756, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 39756 - CSV log processing garbled on page names with '@'
Status: RESOLVED INVALID
Product: MediaWiki extensions
Classification: Unclassified
Component: E3 Experiments
Version: master
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:

Reported: 2012-08-29 03:37 UTC by spage
Modified: 2012-10-23 17:16 UTC (History)
Description spage 2012-08-29 03:37:54 UTC
While processing the output of (I think) ct2csv.py, some of the CSV lines have an excess field.  In all cases (see Google Doc) some script split the Page field into two.

For example, row 29063:

enwiki,editEvent@1,2012-08-27 11:39:57,1,8EQA1K5WdqfHMcH8TtsIoLiSDt5Bam2L4,0,27881,629,341,151,wpPreview,update,false,Einstein,Home,505769298

The page name was originally "Einstein@Home", which was split into the two fields "Einstein" and "Home".  The same happens to "News @ 1", "Folding@Home", etc.

I think the fix is to implement backslash escaping, and test it end-to-end.  The original page name should have been encoded as
  this@that@Einstein\@Home@other
and subsequent processing should then split only on '@', not on '\@'.  Zero-width negative look-behind FTW: conceptually you're splitting on "'@' not preceded by a backslash", which as a regex is (?<!\\)@.  In Python that is perhaps re.split(r'(?<!\\)' + re.escape(sep), line); note that str.split() does not accept a regular expression.
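
A minimal sketch of the backslash-escaping idea, for illustration only. The function names (escape_field, join_fields, split_lookbehind, split_fields) are hypothetical and are not taken from ct2csv.py or any other E3 script. It shows the look-behind split described above, plus a small scanner that keeps working if the writer also escapes backslashes themselves, a case the look-behind alone gets wrong when a field ends in a backslash.

```python
import re

SEP = "@"    # field separator used by the (assumed) homebrew format
ESC = "\\"   # escape character proposed in this report


def escape_field(value, sep=SEP):
    """Escape backslashes first, then the separator, so decoding is unambiguous."""
    return value.replace(ESC, ESC + ESC).replace(sep, ESC + sep)


def join_fields(fields, sep=SEP):
    """Encode a list of raw field values into one record string."""
    return sep.join(escape_field(f, sep) for f in fields)


def split_lookbehind(record, sep=SEP):
    """Split on a separator that is not preceded by a backslash.

    This is the look-behind approach from the report. It only undoes
    escaped separators, and it misfires on a field ending in a backslash.
    """
    parts = re.split(r"(?<!\\)" + re.escape(sep), record)
    return [p.replace(ESC + sep, sep) for p in parts]


def split_fields(record, sep=SEP):
    """Decode a record by scanning it once, treating ESC + char as a literal char."""
    fields, current = [], []
    chars = iter(record)
    for ch in chars:
        if ch == ESC:
            # Take the next character literally; keep a trailing lone backslash.
            current.append(next(chars, ESC))
        elif ch == sep:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


record = join_fields(["this", "that", "Einstein@Home", "other"])
print(record)
print(split_lookbehind(record))
print(split_fields(record))
```

For the end-to-end test, a round trip such as split_fields(join_fields(names)) == names over page names containing '@' and '\' would be a reasonable starting point.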

You could instead convert @ to some crazy escape using some encoding system, like &#64;, but that just introduces more complexity.
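
For comparison, a rough sketch of that alternative, assuming HTML-style character references are the encoding system. The helper names encode_name and decode_name are hypothetical. It shows where the extra complexity comes from: once '&#64;' means '@', a literal '&' in a page name has to be encoded as well, or decoding becomes ambiguous.

```python
import html


def encode_name(name):
    """Encode '&' first, then '@', so no raw '@' or ambiguous '&' remains."""
    return name.replace("&", "&amp;").replace("@", "&#64;")


def decode_name(encoded):
    """Reverse encode_name; html.unescape resolves both '&amp;' and '&#64;'."""
    return html.unescape(encoded)


for page in ["Einstein@Home", "News @ 1", "AT&T @ Home"]:
    encoded = encode_name(page)
    # The encoded form is safe to join with '@' and split naively.
    print(page, "->", encoded, "->", decode_name(encoded))
```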
Comment 1 Ori Livneh 2012-10-23 17:16:26 UTC
We're no longer using this homebrew encoding format, fortunately.
