Last modified: 2014-08-21 02:56:09 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator, where bug reports are now handled.
This static page is read-only and kept for historical purposes; links might be broken. See T68101, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 66101 - Multiple user_ids per username in account creation events from ServerSideAccountCreation log
Status: NEW
Product: Analytics
Classification: Unclassified
Component: EventLogging
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Toby Negrin
Whiteboard: u=noone c=none p=0 s=none
Depends on: 67175

Reported: 2014-06-03 23:35 UTC by Dario Taraborelli
Modified: 2014-08-21 02:56 UTC (History)

Description Dario Taraborelli 2014-06-03 23:35:30 UTC
There are ca. 14K account creation events generated in April (from both the mobile and the desktop site) with the same user_name associated with multiple user_ids in ServerSideAccountCreation:

SELECT event_userName,
       COUNT(event_userId) AS dupe,
       MIN(timestamp),
       MAX(timestamp)
FROM ServerSideAccountCreation_5487345
WHERE event_isSelfMade = 1
  AND LEFT(timestamp, 6) = '201404'
GROUP BY 1
HAVING dupe > 1
ORDER BY dupe DESC;

(all events for April seem to have been generated between April 4 and April 24)

The problem goes back to the very beginning of the log (June 2013) and continues until now, for a total of 200K entries with multiple user_ids over the course of a year.

I haven't done any further investigation but this affects anyone counting usernames as opposed to distinct user_ids.
Comment 1 nuria 2014-06-04 13:57:12 UTC
Given that the webhosts are different, I do not think this is a bug: in that schema, userNames are not unique per ServerSideAccountCreation event but rather per event and webhost.

Please see:

> select id, uuid,webHost from ServerSideAccountCreation_5487345 where event_userName='<removed>';
+---------+----------------------------------+------------------+
| id      | uuid                             | webHost          |
+---------+----------------------------------+------------------+
| 1969244 | 4b20276622cc59788f029aeb2099f9db | bg.wikipedia.org |
| 1970649 | cba01f0c458e590382b19ae4827b707d | www.wikidata.org |
+---------+----------------------------------+------------------+
2 rows in set (1.97 sec)

> select id, uuid,webHost from ServerSideAccountCreation_5487345 where event_userName='<removed>';
+---------+----------------------------------+------------------+
| id      | uuid                             | webHost          |
+---------+----------------------------------+------------------+
| 1968557 | 14fb6a363ac25f92bdd9efa4ed9ab40c | mr.wikipedia.org |
| 2320001 | f6b428c53ca450458d33074775c6a035 | ar.wikipedia.org |
+---------+----------------------------------+------------------+
2 rows in set (2.16 sec)
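
To make the point above concrete, here is a minimal sketch using an in-memory SQLite table as a stand-in for the log table. The table name account_creation, the usernames and the user_id values are all invented for illustration; only the column names and the webHost values come from this report. It suggests that grouping by username alone flags a name as duplicated, while grouping by username and webHost does not.

```python
import sqlite3

# Hypothetical miniature of the ServerSideAccountCreation table; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account_creation ("
    "event_userName TEXT, event_userId INTEGER, webHost TEXT)"
)
conn.executemany(
    "INSERT INTO account_creation VALUES (?, ?, ?)",
    [
        ("ExampleA", 101, "bg.wikipedia.org"),
        ("ExampleA", 57, "www.wikidata.org"),
        ("ExampleB", 300, "mr.wikipedia.org"),
    ],
)

# Grouping by username alone: ExampleA shows up with two creation events.
by_name = conn.execute(
    "SELECT event_userName, COUNT(*) FROM account_creation "
    "GROUP BY event_userName HAVING COUNT(*) > 1"
).fetchall()

# Grouping by (username, webHost): no pair occurs more than once.
by_name_and_host = conn.execute(
    "SELECT event_userName, webHost, COUNT(*) FROM account_creation "
    "GROUP BY event_userName, webHost HAVING COUNT(*) > 1"
).fetchall()

print(by_name)
print(by_name_and_host)
```

Anyone counting new accounts from this log would probably want the second grouping, or a count of distinct (webHost, event_userId) pairs, rather than a count of usernames.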
Comment 2 Dario Taraborelli 2014-06-04 14:19:23 UTC
I reached out to Chris Steipp and James Forrester to hear from them about SUL implications. Event IDs should be unique by design so that's as expected. Here I am referring to new user_names associated to multiple user_ids (which suggests that users still have the ability to register on multiple sites with the same username). If I get a confirmation that this is indeed related to the way SUL is implemented, I'll move the ticket to the corresponding owner.
Comment 3 nuria 2014-06-04 15:02:21 UTC
> Here I am referring to new user_names associated to multiple user_ids
Right, we see that on the records (same user name for two user_ids) but that should not be a bug on EL data, rather on the account creation process itself.


Now, there is a bug regarding encoding and storage on that table:
non-ASCII characters are stored as '???', which will produce false matches. If anyone is grabbing usernames from there, they have likely run into encoding issues.

Many records are of this type:
???????? ?????? ?????????         

I have filed a bug for the encoding issue:

https://bugzilla.wikimedia.org/show_bug.cgi?id=66123
Comment 4 Chris Steipp 2014-06-04 16:22:39 UTC
If you're pulling those user_ids from the local wikis, then it's totally expected for them to have different user_ids across wikis. The text of the username is how we relate accounts.

Each wiki will assign a new user_id (user.user_id in the local wiki db) to a username when the user is created or autocreated there-- sequential for that wiki. When a user is created the first time, they should get a CentralAuth global id (globaluser.gu_id in the centralauth database). There shouldn't be a way for the same username to have multiple gu_id's in the centralauth db. If that's happening, we have a very, very bad problem. But gu_name is declared as a unique key in the database, so someone would have had to manually update our centralauth db for that to happen.
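
A rough sketch of the two id spaces described above, using SQLite in place of the real databases. The table names wiki_a_user and wiki_b_user and the usernames are invented; globaluser, gu_id and gu_name are the names given in this comment, and the UNIQUE constraint stands in for the unique key on gu_name. This is an illustration under those assumptions, not the actual CentralAuth schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for centralauth.globaluser: gu_name carries a UNIQUE constraint.
conn.execute(
    "CREATE TABLE globaluser (gu_id INTEGER PRIMARY KEY, gu_name TEXT UNIQUE)"
)

# Stand-ins for two local wiki user tables (hypothetical names),
# each with its own sequential user_id.
wikis = ("wiki_a_user", "wiki_b_user")
for wiki in wikis:
    conn.execute(
        f"CREATE TABLE {wiki} (user_id INTEGER PRIMARY KEY, user_name TEXT)"
    )

conn.execute("INSERT INTO wiki_a_user (user_name) VALUES ('Someone')")
conn.execute("INSERT INTO wiki_a_user (user_name) VALUES ('Example')")
conn.execute("INSERT INTO wiki_b_user (user_name) VALUES ('Example')")
conn.execute("INSERT INTO globaluser (gu_name) VALUES ('Example')")

# The same username ends up with a different local user_id on each wiki.
local_ids = [
    conn.execute(
        f"SELECT user_id FROM {wiki} WHERE user_name = 'Example'"
    ).fetchone()[0]
    for wiki in wikis
]

# A second global row for the same name is rejected by the unique key.
try:
    conn.execute("INSERT INTO globaluser (gu_name) VALUES ('Example')")
    second_global_insert_rejected = False
except sqlite3.IntegrityError:
    second_global_insert_rejected = True

print(local_ids, second_global_insert_rejected)
```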
Comment 5 Dario Taraborelli 2014-06-05 03:58:23 UTC
I'm aware that local wikis have different user_ids, but should I still expect new users to be able to register on multiple wikis with the same user_name and without their account being automatically unified (and a record being created in the centralauth DB)? Shouldn't all new accounts be unified by default?

For example:

SELECT * FROM ServerSideAccountCreation_5487345 WHERE `event_username` = 'Jlmcnamara';

has 3 separate account creation events in 2014 (on enwikisourcewiki, commonswiki, specieswiki) but no record in globaluser.

Under what conditions does this happen?
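
One way to look for such cases in bulk is an anti-join between the creation log and globaluser. The sketch below is only an assumption about how that check could be written: it uses an in-memory SQLite database with invented rows, a hypothetical account_creation table with a simplified wiki column, and it ignores the fact that the log and the centralauth tables live in different databases in production.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account_creation (event_userName TEXT, wiki TEXT);
CREATE TABLE globaluser (gu_id INTEGER PRIMARY KEY, gu_name TEXT UNIQUE);

-- Invented rows: one name created on three wikis without a global record,
-- and one name that does have a global record.
INSERT INTO account_creation VALUES ('LocalOnly', 'enwikisource');
INSERT INTO account_creation VALUES ('LocalOnly', 'commonswiki');
INSERT INTO account_creation VALUES ('LocalOnly', 'specieswiki');
INSERT INTO account_creation VALUES ('Unified', 'enwiki');
INSERT INTO globaluser (gu_name) VALUES ('Unified');
""")

# Usernames with creation events but no matching row in globaluser.
missing_global = conn.execute("""
    SELECT e.event_userName, COUNT(*)
    FROM account_creation AS e
    LEFT JOIN globaluser AS g ON g.gu_name = e.event_userName
    WHERE g.gu_id IS NULL
    GROUP BY e.event_userName
""").fetchall()

print(missing_global)
```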
Comment 6 Chris Steipp 2014-06-05 17:53:20 UTC
Sorry, I didn't understand what you were saying. If no global account is being created, that's definitely a SUL bug. I don't think we intentionally allow that anywhere. I'll look into it.
Comment 7 nuria 2014-06-05 17:59:24 UTC
Could we move this bug to the corresponding category? It does not seem to be a bug in EL but rather in the account creation process itself.

u=nuria@wikimedia.org c=Wikimetrics p=0 s=2014-05-29
Comment 8 nuria 2014-06-05 18:00:58 UTC
u=nuria@wikimedia.org c=EventLogging p=0 s=2014-05-29
Comment 9 Andre Klapper 2014-06-05 18:56:50 UTC
(In reply to nuria from comment #8)
> u=nuria@wikimedia.org c=EventLogging p=0 s=2014-05-29

-> I've put this into the Whiteboard field so it can get picked up
Comment 10 Nemo 2014-06-05 20:44:51 UTC
(In reply to Dario Taraborelli from comment #5)
> Shouldn't all new accounts be
> unified by default?

Should, but are not: see bug 39996 and friends. There are over 15k broken accounts on Meta-Wiki only, see bug 61876.
Comment 11 Chris Steipp 2014-06-25 15:57:58 UTC
In https://bugzilla.wikimedia.org/show_bug.cgi?id=39996#c73 Aaron mentioned,

> Look at the addUser() method, which has the line:
> 
> if ( !$central->exists() && !$central->listUnattached() ) {
> ...
> }

Which I think is actually directly related to _this_ bug. This check means that for someone registering a new account, if there isn't a global account, but there are unattached accounts with this same name, the account isn't created in centralauth-- it's kept as a local only account.

I believe this is the behavior causing the issue that Analytics saw, right?

I think there's a question as to what the correct behavior should be in this case.
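
To spell out the condition, here is a small Python paraphrase of the quoted check. CentralState and should_create_global are hypothetical names invented for this sketch; it mirrors only the boolean logic of the quoted line and does not model what CentralAuth does in the other branches.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CentralState:
    """Toy stand-in for the central account state of one username (hypothetical)."""
    global_exists: bool = False
    unattached_wikis: List[str] = field(default_factory=list)


def should_create_global(central: CentralState) -> bool:
    # Paraphrase of: !$central->exists() && !$central->listUnattached()
    return not central.global_exists and not central.unattached_wikis


# Brand-new name: no global account and no unattached local accounts.
fresh = should_create_global(CentralState())

# Name already used by an unattached local account on another wiki:
# the new account stays local-only, which is the case seen in the log.
name_taken_locally = should_create_global(
    CentralState(unattached_wikis=["commonswiki"])
)

# A global account already exists, so no new one is created by this branch.
already_global = should_create_global(CentralState(global_exists=True))

print(fresh, name_taken_locally, already_global)
```

The middle case is the one in question here: a same-named unattached account anywhere is enough to skip global account creation.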
Comment 12 Aaron Halfaker 2014-08-19 18:54:05 UTC
You probably shouldn't be able to register an account name if someone already has a global account with that name.  You should need to pick a new name.
Comment 13 Matthew Flaschen 2014-08-20 23:17:37 UTC
(In reply to Aaron Halfaker from comment #12)
> You probably shouldn't be able to register an account name if someone
> already has a global account with that name.  You should need to pick a new
> name.

Chris's question was about a different scenario, when there isn't a global account, but there are local accounts on other wikis with the same name.

However, I think it's still better to not allow creation in this scenario.  This should reduce the number of account renames or people getting stuck with Example~xywiki usernames when we do Single User Finalization.
Comment 14 Chris Steipp 2014-08-20 23:58:42 UTC
(In reply to Matthew Flaschen from comment #13)
> However, I think it's still better to not allow creation in this scenario. 
> This should reduce the number of account renames or people getting stuck
> with Example~xywiki usernames when we do Single User Finalization.

If we prevent the local account from being created at this point, then any users who can't currently globalize their account (someone else with more edits has their name on another wiki, and hasn't globalized the name for some reason), then the user is prevented from doing any cross-wiki work. I'm not seeing a good way to resolve that pre-finalization.

Once we do finalize, merging one more ~wiki account isn't much extra work on our end.
Comment 15 Matthew Flaschen 2014-08-21 02:56:09 UTC
(In reply to Chris Steipp from comment #14)
> If we prevent the local account from being created at this point, then any
> users who can't currently globalize their account (someone else with more
> edits has their name on another wiki, and hasn't globalized the name for
> some reason), then the user is prevented from doing any cross-wiki work.

You're right.  I didn't think of that.  They could still create an account, of course, but they'd have to have different usernames across the cluster, which is definitely not ideal.

> Once we do finalize, merging one more ~wiki account isn't much extra work on
> our end.

True.
