Last modified: 2014-01-15 16:47:23 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T55489, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 53489 - Relating tech contributors with organizations
Status: RESOLVED FIXED
Product: Analytics
Classification: Unclassified
Component: Tech community metrics
Version: unspecified
Hardware: All All
Importance: Highest normal
Target Milestone: ---
Assigned To: Alvaro
Depends on:
Blocks: 53485
Reported: 2013-08-28 17:10 UTC by Quim Gil
Modified: 2014-01-15 16:47 UTC (History)


Description Quim Gil 2013-08-28 17:10:22 UTC
Currently we have no way to map tech contributors to their organizations. For instance, many WMF employees commit code with their personal email addresses. As long as this happens, we can't properly assess who contributes code:

https://www.mediawiki.org/wiki/Community_metrics#Who_contributes_code

So far we have published a form where contributors can enter their data:

https://docs.google.com/forms/d/1RFUa2zBAOolw78W-ozJPoYlR2lYbrAOYvOZYgjaAYQg/viewform

What needs to be done is to integrate that data at http://korma.wmflabs.org and document the process to update the data.

Not in the scope of this report: come up with a way to allow contributors to manage this data directly through their user profiles.
Comment 1 Quim Gil 2013-08-28 17:14:15 UTC
Note: what matters more in the short term are the organizations. Also, that information is public at least for WMF / WMDE employees, and other professionals. This is a never ending task, but in order to consider this report resolved we should have the top 100 code contributors identified with their orgs.

Contributors' location is completely opt-in. Whatever data we get will be good.
Comment 2 Quim Gil 2013-09-16 16:34:24 UTC
Bringing back the Highest priority since this is indeed a top task in our backlog.
Comment 3 Alvaro 2013-09-30 03:02:41 UTC
Top 100 code contributors using git commits or gerrit merged reviews?

Right now we have a total of 223095 commits with 62631 described as merged reviews. So if we use gerrit data we have 28% of total code contribution activity covered. 

If we analyze data from 2012 we have 61050 merged reviews out of 104399 total commits.
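
For reference, the coverage figures above can be checked with a few lines of Python. This is only a sketch of the arithmetic using the counts quoted in this comment; the function name is made up for illustration.

```python
def merged_share(merged_reviews: int, total_commits: int) -> float:
    """Return merged reviews as a percentage of all commits."""
    return 100.0 * merged_reviews / total_commits


# Counts quoted in this comment.
overall = merged_share(62631, 223095)
from_2012 = merged_share(61050, 104399)

print(f"all data:  {overall:.1f}%")    # about 28%
print(f"2012 data: {from_2012:.1f}%")  # about 58%
```

So the Gerrit data covers a much larger share of activity in the 2012 data set than in the full history.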
Comment 4 Quim Gil 2013-09-30 06:14:16 UTC
Good point. Let's start with the authors who get their code merged.
Comment 5 Alvaro 2013-09-30 07:13:18 UTC
Great! The email data is more accurate so it is easier to do the mapping.

Maybe the best way is to share a spreadsheet with the top 100 and a company field that is initially filled in automatically from the email domain and that we can then check by hand.

Then the companies-to-people mapping will be added to the Grimoire identities db and the companies report can be created.

I will report progress on this ticket!
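
A minimal sketch of the prefill step described in this comment, assuming the contributor list is available as (id, email) pairs. The addresses, field names and the `prefill_company` helper are hypothetical and are not taken from the Grimoire tools.

```python
def prefill_company(email: str) -> str:
    """Guess a company label from the domain part of an email address."""
    local, sep, domain = email.strip().rpartition("@")
    # No "@" means there is no domain to use; leave the field empty.
    return domain.lower() if sep else ""


# Hypothetical contributors; real data would come from Gerrit.
contributors = [
    ("alice", "alice@wikimedia.org"),
    ("bob", "bob@example.com"),
]

# One spreadsheet row per contributor. The address itself is left out,
# and "checked" stays False until somebody confirms the guess by hand.
rows = [
    {"id": ident, "company": prefill_company(email), "checked": False}
    for ident, email in contributors
]

for row in rows:
    print(row)
```

Leaving the email address out of the row keeps the sheet usable for manual review without collecting addresses in one place.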
Comment 6 Quim Gil 2013-09-30 15:41:24 UTC
(In reply to comment #5)
> Maybe the best way is to share a spreadsheet

Remember that we already have one spreadsheet associated with the form linked in the first post. Let's expand on it rather than create a new one, ok?

This is one of those situations where we are dealing with information which is 100% public but scattered. We need to be more careful before having a public spreadsheet where anybody can copy in two clicks the email addresses of the top 100 Wikimedia tech contributors.
Comment 7 Alvaro 2013-10-02 02:04:06 UTC
Sure Quim, I have been working with the spreadsheet from the beginning. But it seems it will serve more as a data source for confirming and completing the results of a first automatic analysis of the gerrit data.

But yes, we should try to reuse it and have only one spreadsheet.
Comment 8 Alvaro 2013-10-02 03:25:54 UTC
I have added a new sheet with the 45 people from the Top100 whose emails are not from wikimedia (for the other 55 the affiliation is WM) and researched their affiliation. Quim, could you take a look at the results?

Is it a good idea to join WMF and WMF DE?
Comment 9 Quim Gil 2013-10-02 04:54:02 UTC
(In reply to comment #8)
> I have added a new sheet with the 45 people from the Top100 whose emails are
> not from wikimedia (the other 55 the affiliation is WM) and research the
> affiliation. Quim, could you take a look to results?

Will do, hopefully tomorrow.

> Is it a good idea to join WMF and WMF DE?

Nope. The Wikimedia Foundation and Wikimedia Deutschland (without F of Foundation) are separate organizations.
Comment 10 Quim Gil 2013-10-02 15:59:32 UTC
Great! I made a few corrections for the most evident cases (WMF former employee, WMDE instead of WMF) but other cases are less clear to interpret, even knowing the data. For instance, Nishayn22 has worked as WMF contractor but honestly I have no idea what is done by him as volunteer and what is done as contractor, and I don't know his professional situation today - or tomorrow.

Proposal: add a spreadsheet with the people among the top 100 committers that didn't fill in the form (including their email addresses) and I will send them a request to fill in the form. There they can define their affiliation themselves. Those not answering will default to "independent".

That said, I think all this looks already good enough for a first version of Bug 53485 - Key performance indicator: analyze who contributes code. Once these metrics are public we can respond to feedback from the user with wrong/missing data.

Also, the "Name" column contains many no-real-names. Should I edit them?
Comment 11 Alvaro 2013-10-09 17:00:24 UTC
I won't spend time updating the "Name" column yet. I feel the current id could be enough for people to identify themselves in the report. Later we can provide a way for the users to change that.

Quim, using the data from our spreadsheet we have now a SCR companies report:

http://korma.wmflabs.org/browser/scr-companies.html

About the proposal of detecting those in the Top100 who did not complete the form: if they use "wikimedia.org/de" I think that is enough for now to relate them to a company. Do you feel uncomfortable with that? If yes, I will complete this list and send it to you.
Comment 12 Quim Gil 2013-10-09 17:21:47 UTC
Good to see some data! 

Let's call them "Organizations" instead of "Companies". This is a good idea for free software projects in general, and in our case it's even more needed: WMF and WMDE are non-profits. The story is a bit more complicated than that but calling these orgs companies will raise the eyebrows of some people.

What is "NA"? If it's Not Available then we should consider those "Unknown".

gmail and live should be "Unknown" as well.

Also, can we add instructions for contributors to get their data corrected? Or a link to a page with the instructions. It is in everybody's interest to keep that "Unknown" as small as possible.
Comment 13 Quim Gil 2013-10-09 18:44:24 UTC
There are more cases that should be aggregated to "Unknown":

users, adres, email, hotmail, yahoo, gmx, googlemail

About the rest, most of them could be aggregated to "Independent"

The end result could then be:

1. Wikimedia Foundation
2. Unknown
3. Wikimedia Deutschland
4. Independent
5. Wikia
6. WikiWorks
7. OmniTI

This would identify a lot better what we know about contributing organizations in the MediaWiki community.
Comment 14 Quim Gil 2013-10-09 22:12:09 UTC
I just learned that "NA" are bots (or the only bot so far). If this refers to the i18n bot then I believe it is fair to say that these contributions come from the TranslateWiki community, and it would be good to identify them as an organization.
Comment 15 Alvaro 2013-10-10 10:27:04 UTC
Ok Quim, so:

1. Wikimedia Foundation
2. Unknown
3. Wikimedia Deutschland
4. Independent
5. Wikia
6. WikiWorks
7. OmniTI
8. TranslateWiki

I will use this mapping and update the data following it!

users, adres, email, hotmail, yahoo, gmx, googlemail -> Unknown
rest -> Independent

As soon as the data is updated, I will comment here!
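
The aggregation rules agreed in this comment could be sketched as follows. The raw labels used as keys in `KNOWN_ORGS` are assumptions made for illustration; the thread does not say which labels the dashboard actually stores. "gmail" and "live" are added to the Unknown group following comment 12, and "NA" is mapped to TranslateWiki following comment 14.

```python
# Hypothetical raw labels mapped to the organization names listed above.
KNOWN_ORGS = {
    "wikimedia.org": "Wikimedia Foundation",
    "wikimedia.de": "Wikimedia Deutschland",
    "wikia": "Wikia",
    "wikiworks": "WikiWorks",
    "omniti": "OmniTI",
    "na": "TranslateWiki",  # per comment 14, "NA" entries are the bot
}

# Generic mail providers and placeholders that say nothing about affiliation.
UNKNOWN_LABELS = {
    "users", "adres", "email", "hotmail", "yahoo", "gmx", "googlemail",
    "gmail", "live",  # added per comment 12
}


def organization_for(label: str) -> str:
    """Map a raw label to one of the agreed organization names."""
    key = label.strip().lower()
    if key in KNOWN_ORGS:
        return KNOWN_ORGS[key]
    if key in UNKNOWN_LABELS:
        return "Unknown"
    # Everything else is aggregated, as agreed in this comment.
    return "Independent"


for raw in ["wikimedia.de", "Hotmail", "NA", "example"]:
    print(raw, "->", organization_for(raw))
```

Checking the known organizations first means a new organization only needs one new entry in `KNOWN_ORGS` to leave the "Independent" bucket.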
Comment 16 Alvaro 2013-10-11 07:38:26 UTC
The new report is available at:

http://korma.wmflabs.org/browser/scr-companies.html
Comment 17 Quim Gil 2013-10-11 15:23:36 UTC
Very good! This shows enough committers connected to an organization. From this point we "just" need to update the data based on the feedback received.

What are your thoughts about the other ideas in comment 12? Namely changing "companies" for "organizations" and adding instructions to correct your own data.

By the way, I'm removing the reference to countries in the summary since all the discussion here has been about organizations. I will file a new bug specific to countries.
Comment 18 Alvaro 2013-10-14 07:00:22 UTC
In order to close this issue:

* "companies" will be renamed to organizations
* In order to update contributors data, could we create a google form to store the changes? We will review it periodically and update mapping. In the future, we will need a web interface and do it directly over the database, but now, I will use this approach.
Comment 19 Quim Gil 2013-10-14 17:45:11 UTC
(In reply to comment #18)
> * In order to update contributors data, could we create a google form to
> store the changes? We will review it periodically and update mapping.

We already have a Google Form. Let's use it.
Comment 20 Alvaro 2013-10-15 05:44:04 UTC
Great! So a link to the webform will be added to the people.html page so that users can give feedback on the information the browser is showing.
Comment 21 Quim Gil 2013-10-25 19:48:20 UTC
Let's change "TranslateWiki" for "translatewiki.net". My fault. Thank you!

From nemobis at https://github.com/Bitergia/mediawiki-dashboard/issues/26

http://korma.wmflabs.org/browser/contributors.html mentions "TranslateWiki". This name is ambiguous; you probably meant translatewiki.net.
https://www.mediawiki.org/wiki/Translatewiki.net
Comment 22 Alvaro 2013-10-31 11:05:08 UTC
I have changed this name. In the next updates the org name will be changed from "TranslateWiki" to "translatewiki.net".
Comment 23 Quim Gil 2013-12-13 00:52:12 UTC
(In reply to comment #20)
> Great! So a link to the webform will be added to people.html page in order a
> user can feedback info about the info the browser is showing.

I believe the only bit missing to resolve this report as FIXED is to provide instructions to contribute your data to the form. We can open a new report about a better way for users to maintain their data. In the meantime, we will introduce the data manually.
Comment 24 Quim Gil 2013-12-13 07:20:14 UTC
We have decided in Bug 53485 comment 32 that translatewiki.net contributions aka L10n-bot self-merges shouldn't be counted. Please remove them. Sorry for the hassle.
Comment 25 Alvaro 2013-12-17 04:40:47 UTC
Ok, I will remove it in the current iteration!
Comment 26 Alvaro 2013-12-17 15:55:59 UTC
(In reply to comment #23)
> (In reply to comment #20)
> > Great! So a link to the webform will be added to people.html page in order a
> > user can feedback info about the info the browser is showing.
> 
> I believe the only bit missing to resolve this report as FIXED is to provide
> instructions to contribute your data to the form. We can open a new report
> about a better way for users to maintain their data. In the meantime, we will
> introduce the data manually.

Is this enough?

http://korma.wmflabs.org/browser/people.html?id=341&name=raymond

The people.html page has also been updated to the new design.
Comment 27 Quim Gil 2013-12-17 19:22:20 UTC
Yes, we can close this bug now. There is only a small detail on the wording and the position of the text: 

https://github.com/Bitergia/mediawiki-dashboard/pull/38
Comment 28 Nemo 2014-01-15 09:26:42 UTC
There is one thing I don't understand: is it possible for a person to be in multiple organisations? For instance, many work at the WMF for a period but are/were independent before and after that, or they change affiliation. If it's possible to have multiple affiliations, are they just double counted, or can one specify periods, and do those periods have to be non-overlapping?
Comment 29 Nemo 2014-01-15 09:33:13 UTC
Speaking of which, "affiliation" makes more sense than "organization", let alone "company" (!).
Comment 30 Quim Gil 2014-01-15 16:47:23 UTC
(In reply to comment #28)
> I don't understand a thing, is it possible for a person to be in multiple
> organisations? For instance, many work at the WMF for a period but are/were
> independent before and after that, or they change affiliation. If it's
> possible to have multiple affiliations, are they just double counted or can
> one specify periods and do those periods have to be non-overlapping?

According to https://www.mediawiki.org/wiki/Community_metrics#Contributors

"The classification supports periods of time to cover that a unique people has worked for several companies." I also remember Alvaro mentioning it, but I don't know whether this has been applied already to our metrics.

Ref "affiliations" yes, you are right, see 

Bug 60091 - Tech metrics should talk about "Affiliation" instead of organizations or companies

Thank you!
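
To make the idea of time-bounded affiliations concrete, here is a minimal sketch of one possible model. It assumes non-overlapping periods with an exclusive end date; the class, the function and the sample history are hypothetical and do not describe how the dashboard actually stores this data.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Affiliation:
    organization: str
    start: date
    end: Optional[date] = None  # None means the affiliation is still current

    def covers(self, day: date) -> bool:
        """True if `day` falls inside [start, end)."""
        return self.start <= day and (self.end is None or day < self.end)


def organization_on(history: List[Affiliation], day: date,
                    default: str = "Independent") -> str:
    """Return the organization a contribution made on `day` counts for."""
    for affiliation in history:
        if affiliation.covers(day):
            return affiliation.organization
    # No period matches: fall back to the default bucket.
    return default


# Hypothetical contributor: at the WMF during 2012, independent otherwise.
history = [Affiliation("Wikimedia Foundation", date(2012, 1, 1), date(2013, 1, 1))]

print(organization_on(history, date(2011, 6, 1)))
print(organization_on(history, date(2012, 6, 1)))
print(organization_on(history, date(2013, 6, 1)))
```

With non-overlapping periods each commit is attributed to exactly one organization, so nothing is double counted.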
