Last modified: 2014-04-17 14:53:10 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T65836, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 63836 - Validation failure when uploading a new cohort
Status: RESOLVED FIXED
Product: Analytics
Classification: Unclassified
Component: Wikimetrics
Version: unspecified
Hardware: All Linux
Importance: High normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2014-04-11 21:20 UTC by Pete F
Modified: 2014-04-17 14:53 UTC (History)
CC List: 6 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
screenshot of error message (31.37 KB, image/png)
2014-04-11 21:25 UTC, Tighe
Tighe's cohort CSV (8.50 KB, text/csv)
2014-04-11 21:31 UTC, Tighe

Description Pete F 2014-04-11 21:20:23 UTC
I am trying to upload a cohort of 103 users and get a "validation failure" bug. I was successful with a subset of 4 users, but not with larger subsets (I tried splitting into two and uploading each "half" and still got validation failures).
Comment 1 Tighe 2014-04-11 21:24:45 UTC
I'm having a similar problem when I test upload a cohort. I uploaded my cohort yesterday and it worked (after Dan fixed my other problem). I upload the same cohort today and get "validation: FAILURE". Is it because of the size of the cohort? Pete's is 100+ and mine is 400+

(In reply to Pete F from comment #0)
> I am trying to upload a cohort of 103 users and get a "validation failure"
> bug. I was successful with a subset of 4 users, but not with larger subsets
> (I tried splitting into two and uploading each "half" and still got
> validation failures).
Comment 2 Bingle 2014-04-11 21:25:21 UTC
Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/cards/1539
Comment 3 Tighe 2014-04-11 21:25:34 UTC
Created attachment 15084 [details]
screenshot of error message

Same cohort uploaded yesterday and validated successfully.
Comment 4 Toby Negrin 2014-04-11 21:29:15 UTC
Hi Tighe -- do we have the cohorts?

thanks,

-Toby
Comment 5 Tighe 2014-04-11 21:31:05 UTC
Created attachment 15085 [details]
Tighe's cohort CSV

Some of the usernames will be invalid (known issue to me), but most should be valid and should not result in validation:FAILURE
Comment 6 Dan Andreescu 2014-04-12 05:29:07 UTC
Thanks for the bug report Tighe.

I tried to debug for an hour or so tonight and I have some progress but no resolution.  Unfortunately I won't be able to get back to this until Monday or Tuesday next week.

So, it's not the size of the cohort; wikimetrics accepts much larger cohorts than the ones you're mentioning.  And the problem seems totally unrelated to yesterday's bugs.  As far as I can tell, wikimetrics is choking on some non-standard characters that show up in your cohort.  These shouldn't have validated yesterday either, so I'm thinking somehow this file has changed since then.  If that's not true, then I'm very puzzled because the code hasn't changed at all and all the ghost process problems are gone.  Here's what I did:

I stripped every non-ASCII character from your cohort and validated that, and that worked fine.  About 80+ users were found valid.  That's obviously not a solution, but it goes to show that we're having strange character issues.

I will look into this in more depth and provide a fix early next week.  The really strange thing is that I expressly tested these kinds of characters and they worked, and they also work fine on my local machine.
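
A minimal sketch of the diagnostic described in this comment, written in Python 3 rather than taken from the wikimetrics code. The helper names (strip_non_ascii, split_by_ascii) and the sample usernames are hypothetical; the idea is only to separate names that survive ASCII stripping unchanged from names that do not, so the suspect rows can be inspected on their own.

```python
def strip_non_ascii(text):
    """Drop every character outside the 7-bit ASCII range."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def split_by_ascii(usernames):
    """Separate pure-ASCII names from names containing other characters."""
    ascii_only, non_ascii = [], []
    for name in usernames:
        if name == strip_non_ascii(name):
            ascii_only.append(name)
        else:
            non_ascii.append(name)
    return ascii_only, non_ascii


# Made-up usernames, not taken from the attached cohort.
sample = ["Alice", "Ren\u00e8", "Bob"]
clean, suspect = split_by_ascii(sample)
```

Stripping changes the usernames themselves, so as the comment says it is useful for narrowing down the problem, not as a fix.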
Comment 7 Dan Andreescu 2014-04-12 06:06:12 UTC
This may be useful if someone else decides to debug:

milimetric@wikimetrics-staging1:/srv/wikimetrics$ sudo tail -f /var/log/upstart/wikimetrics-queue.log
    return self.run(*args, **kwargs)
  File "/srv/wikimetrics/wikimetrics/models/validate_cohort.py", line 21, in async_validate
    validate_cohort.run()
  File "/srv/wikimetrics/wikimetrics/models/validate_cohort.py", line 109, in run
    self.validate_records(session, cohort)
  File "/srv/wikimetrics/wikimetrics/models/validate_cohort.py", line 177, in validate_records
    validate_users(wikiusers, project, self.validate_as_user_ids)
  File "/srv/wikimetrics/wikimetrics/models/validate_cohort.py", line 270, in validate_users
    raise e
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 13: ordinal not in range(128)


The problem is in wikimetrics/controllers/forms/cohort_upload.py:parse_username

Basically, character set handling in Python 2.x is unfairly difficult and seemingly randomly stops working.  We want to switch to Python 3.x, and I think this is even more proof that we should.
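
For readers unfamiliar with this class of error, here is a small Python 3 sketch that raises the same exception type as the traceback above. It is an illustration only, not the wikimetrics code path: in Python 2 the ASCII codec is applied implicitly when unicode and byte strings are mixed, whereas here the ASCII encode is requested explicitly. The function name and the sample string are hypothetical.

```python
def ascii_encode_error(text):
    """Return the UnicodeEncodeError raised by an ASCII encode, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


name = "Ren\u00e8"  # made-up username containing U+00E8
err = ascii_encode_error(name)

# The exception records which codec failed, where, and why.
failed_codec = err.encoding
failed_char = err.object[err.start]
failed_reason = err.reason

# Encoding with UTF-8 instead handles the same character without error.
utf8_bytes = name.encode("utf-8")
```

The point of the sketch is that the character itself is valid; the failure comes from routing it through the ASCII codec.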
Comment 8 Pete F 2014-04-12 15:39:19 UTC
Thanks Dan. In the short term I will try uploading a subset that excludes non-alphanumeric characters. I think that will allow me to learn what I need, and I can (I think?) expand my cohort in the future if and when this is resolved.
Comment 9 nuria 2014-04-14 13:21:24 UTC
The failure is actually happening on decode rather than encode, but since we are swallowing errors there, we see it on the line after.

File:
wikimetrics/controllers/forms/cohort_upload.py

Line where error is first present:
username = username.decode('utf8', errors='ignore')


Still looking into it.
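
A hedged Python 3 sketch of the two hazards this comment touches on; it is not the fix that was eventually merged. First, errors='ignore' silently drops bytes it cannot decode, which hides the problem until a later line. Second, in Python 2, calling .decode on a value that is already unicode first encodes it implicitly with the ASCII codec, which can raise a UnicodeEncodeError from what looks like a decode call; whether that is what happened here is not confirmed in this thread. The helper name to_text and the sample values are hypothetical.

```python
def to_text(value, encoding="utf-8"):
    """Return text, decoding only when the input is actually bytes."""
    if isinstance(value, bytes):
        # Strict decoding: malformed input raises instead of vanishing.
        return value.decode(encoding)
    return value


utf8_bytes = "Ren\u00e8".encode("utf-8")      # well-formed UTF-8
latin1_bytes = "Ren\u00e8".encode("latin-1")  # same name, different encoding

decoded = to_text(utf8_bytes)
already_text = to_text("Ren\u00e8")  # text passes through untouched

# With errors="ignore" the undecodable byte disappears without any error.
lossy = latin1_bytes.decode("utf-8", errors="ignore")
```

Checking the type before decoding and letting strict decoding fail loudly makes the failure show up on the line that caused it.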
Comment 10 Gerrit Notification Bot 2014-04-15 09:09:30 UTC
Change 125961 had a related patch set uploaded by QChris:
Fix type of user_name in SQLAlchemy's model of MediaWiki's user table

https://gerrit.wikimedia.org/r/125961
Comment 11 Gerrit Notification Bot 2014-04-15 20:57:59 UTC
Change 125961 abandoned by QChris:
Fix type of user_name in SQLAlchemy's model of MediaWiki's user table

Reason:
The Bug 63836 will get fixed by

  https://gerrit.wikimedia.org/r/#/c/125752/

instead.

https://gerrit.wikimedia.org/r/125961
Comment 12 Tighe 2014-04-16 16:28:55 UTC
The fix allowed me to validate my cohort (attached) but it appears to have rejected all usernames in Arabic script.
Comment 13 Dan Andreescu 2014-04-17 14:53:10 UTC
Fixed by https://gerrit.wikimedia.org/r/#/c/125752/

Tighe, I know you haven't confirmed yet so feel free to reopen if you find issues.  I'm closing because two other users confirmed the fix.
