Last modified: 2014-04-17 14:53:10 UTC
I am trying to upload a cohort of 103 users and get a "validation failure" error. I was successful with a subset of 4 users, but not with larger subsets (I tried splitting into two and uploading each "half" and still got validation failures).
I'm having a similar problem when I test upload a cohort. I uploaded my cohort yesterday and it worked (after Dan fixed my other problem). I upload the same cohort today and get "validation: FAILURE". Is it because of the size of the cohort? Pete's is 100+ and mine is 400+.

(In reply to Pete F from comment #0)
> I am trying to upload a cohort of 103 users and get a "validation failure"
> bug. I was successful with a subset of 4 users, but not with larger subsets
> (I tried splitting into two and uploading each "half" and still got
> validation failures).
Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/cards/1539
Created attachment 15084 [details]
Screenshot of error message

Same cohort uploaded yesterday and validated successfully.
Hi Tighe -- do we have the cohorts? thanks, -Toby
Created attachment 15085 [details]
Tighe's cohort CSV

Some of the usernames will be invalid (a known issue on my end), but most should be valid and should not result in "validation: FAILURE".
Thanks for the bug report, Tighe. I tried to debug for an hour or so tonight and I have some progress but no resolution. Unfortunately I won't be able to get back to this until Monday or Tuesday next week.

So, it's not the size of the cohort; wikimetrics accepts much larger cohorts than the ones you're mentioning. And the problem seems totally unrelated to yesterday's bugs. As far as I can tell, wikimetrics is choking on some non-standard characters that show up in your cohort. These shouldn't have validated yesterday either, so I'm thinking somehow this file has changed since then. If that's not true, then I'm very puzzled, because the code hasn't changed at all and all the ghost process problems are gone.

Here's what I did: I stripped every non-ASCII character from your cohort and validated that, and it worked fine; about 80+ users were found valid. That's obviously not a solution, but it suggests we're having strange character issues. I will look into this in more depth and provide a fix early next week. The really strange thing is that I expressly tested these kinds of characters and they worked, and they also work fine on my local machine.
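For anyone reproducing this: the "strip non-ASCII" workaround described above can be sketched roughly like the snippet below. Note this is a hypothetical illustration of the approach, not actual wikimetrics code, and `strip_non_ascii` is a made-up helper name.

```python
def strip_non_ascii(text):
    # Keep only 7-bit ASCII characters; accented letters are dropped
    # entirely, which mangles usernames but lets validation proceed.
    return "".join(ch for ch in text if ord(ch) < 128)

# A username containing U+00E8 (è) loses the accented letter:
print(strip_non_ascii("Andr\xe8"))  # -> Andr
```

As noted above, this is a diagnostic step, not a fix: usernames that differ only in accented characters become indistinguishable (or simply wrong) after stripping.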
This may be useful if someone else decides to debug:

milimetric@wikimetrics-staging1:/srv/wikimetrics$ sudo tail -f /var/log/upstart/wikimetrics-queue.log
    return self.run(*args, **kwargs)
  File "/srv/wikimetrics/wikimetrics/models/validate_cohort.py", line 21, in async_validate
    validate_cohort.run()
  File "/srv/wikimetrics/wikimetrics/models/validate_cohort.py", line 109, in run
    self.validate_records(session, cohort)
  File "/srv/wikimetrics/wikimetrics/models/validate_cohort.py", line 177, in validate_records
    validate_users(wikiusers, project, self.validate_as_user_ids)
  File "/srv/wikimetrics/wikimetrics/models/validate_cohort.py", line 270, in validate_users
    raise e
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 13: ordinal not in range(128)

The problem is in wikimetrics/controllers/forms/cohort_upload.py:parse_username. Basically, character set handling in Python 2.x is unfairly difficult and seemingly stops working at random. We want to switch to Python 3.x, and I think this is even more proof that we should.
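The error class in that traceback is easy to reproduce in isolation. The sketch below (Python 3 syntax; in Python 2 the same `encode('ascii')` happens implicitly whenever a unicode string is coerced to `str`) shows how any username containing a character like U+00E8 (è) trips the ASCII codec while encoding as UTF-8 succeeds. The username is invented for illustration.

```python
name = "Andr\xe8"  # hypothetical username containing U+00E8 (è)

def ascii_encode_fails(s):
    # True if the string cannot be represented in ASCII -- the same
    # condition that raises UnicodeEncodeError in the traceback above.
    try:
        s.encode("ascii")
        return False
    except UnicodeEncodeError:
        return True

print(ascii_encode_fails(name))   # -> True
print(name.encode("utf-8"))       # -> b'Andr\xc3\xa8' (UTF-8 handles it fine)
```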
Thanks Dan. In the short term I will try uploading a subset that excludes non-alphanumeric characters. I think that will allow me to learn what I need, and I can (I think?) expand my cohort in the future if and when this is resolved.
The failure is actually happening on decode rather than encode, but since we are swallowing errors there, we only see it on a later line.

File: wikimetrics/controllers/forms/cohort_upload.py
Line where the error is first present: username = username.decode('utf8', errors='ignore')

Still looking into it.
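To illustrate why `errors='ignore'` hides the real problem: if the uploaded CSV is not actually UTF-8 (for example latin-1, which spreadsheet exports commonly produce), the bad byte is silently discarded instead of raising, and the truncated username only blows up later. A minimal sketch, assuming latin-1 input as a plausible example:

```python
# 'Andrè' as a latin-1 byte string, e.g. from a spreadsheet CSV export.
raw = "Andr\xe8".encode("latin-1")  # b'Andr\xe8'

# 0xE8 is not valid UTF-8 here, so 'ignore' silently drops it:
print(raw.decode("utf8", errors="ignore"))   # -> Andr

# 'replace' at least makes the corruption visible:
print(raw.decode("utf8", errors="replace"))  # -> Andr\ufffd (Andr�)
```

So the decode succeeds with mangled output, and the failure surfaces on a later encode instead of at the true source.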
Change 125961 had a related patch set uploaded by QChris: Fix type of user_name in SQLAlchemy's model of MediaWiki's user table https://gerrit.wikimedia.org/r/125961
Change 125961 abandoned by QChris: Fix type of user_name in SQLAlchemy's model of MediaWiki's user table Reason: The Bug 63836 will get fixed by https://gerrit.wikimedia.org/r/#/c/125752/ instead. https://gerrit.wikimedia.org/r/125961
The fix allowed me to validate my cohort (attached) but it appears to have rejected all usernames in Arabic script.
Fixed by https://gerrit.wikimedia.org/r/#/c/125752/

Tighe, I know you haven't confirmed yet, so feel free to reopen if you find issues. I'm closing because two other users confirmed the fix.