Last modified: 2014-09-24 00:50:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links may be broken. See T55485, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 53485 - Key performance indicator: analyze who contributes code
Status: ASSIGNED
Product: Analytics
Classification: Unclassified
Component: Tech community metrics (Other open bugs)
Version: unspecified
Hardware: All
OS: All
Priority: High
Severity: enhancement
Target Milestone: ---
Assigned To: Quim Gil
Depends on: 53374 53489
Blocks:
Reported: 2013-08-28 16:45 UTC by Quim Gil
Modified: 2014-09-24 00:50 UTC
CC List: 9 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Report for Wikimedia contributors per quarter (12.60 KB, text/plain)
2013-10-24 08:10 UTC, Alvaro
Report for Wikimedia orgs contributors per quarter (1.44 KB, text/plain)
2013-10-24 08:10 UTC, Alvaro

Description Quim Gil 2013-08-28 16:45:38 UTC
We need metrics to answer these questions.

Who is contributing merged code each quarter? How is the weight of the WMF evolving? What regions have a higher density of contributors? The evolution of the total amount of merged commits should be visible too.

The work is being done at 
https://www.mediawiki.org/wiki/Community_metrics#Who_contributes_code
Comment 1 Quim Gil 2013-09-16 16:33:17 UTC
Bringing this back to Highest priority since this is indeed a top task in our backlog.
Comment 2 MZMcBride 2013-09-18 01:40:32 UTC
What is KPI?
Comment 3 Quim Gil 2013-09-18 03:38:32 UTC
Sorry: Key Performance Indicator

https://www.mediawiki.org/wiki/Community_metrics#Key_performance_indicators
Comment 4 Quim Gil 2013-09-26 22:06:50 UTC
There are a couple of highest-priority issues that we want to fix before tackling this one.
Comment 5 Alvaro 2013-09-30 07:40:21 UTC
Is it ok to use Gerrit (vs Git) data to answer this question? 

commits = merged submissions

In

http://korma.wmflabs.org/browser/scr.html

we are close to covering this "who contributes code" KPI.

We need to add the metrics computed in

https://www.mediawiki.org/wiki/Community_metrics#Who_contributes_code

We also need the mapping between people, companies and countries, which is covered in

https://bugzilla.wikimedia.org/show_bug.cgi?id=53489

and reports will be created for both, as already exists for repositories:

http://korma.wmflabs.org/browser/scr-repos.html
Comment 6 Quim Gil 2013-09-30 15:42:55 UTC
(In reply to comment #5)
> Is it ok to use Gerrit (vs Git) data to answer this question? 

Yes, Bug 53489 is a blocker of this one. We are talking about the same thing.
Comment 7 Quim Gil 2013-10-02 16:02:59 UTC
FYI, I have included the delivery of this KPI in my monthly goals for October:

https://www.mediawiki.org/wiki/Engineering_Community_Team/Meetings#Monthly_goals

We are organizing an Engineering Community Team Showcase on Oct 29 via videoconference and it would be great to demo this KPI. Álvaro, all the better if you want to do it since you are the main developer working on this.
Comment 8 Quim Gil 2013-10-09 22:36:33 UTC
Let's clarify the vocabulary to be used in metrics reports: 

Authors: anybody contributing a patch, regardless of it being merged or not. 

Committers: the subset of authors extracted from the code that has been committed to the repository. (Alvaro, you seem to call these "mergers", which I find confusing.)

Reviewer: anybody commenting or evaluating a changeset in Gerrit (-2 to +2)

Merger: anybody able to exercise +2 permissions, merging proposed changesets to the repository.

Do you agree on this, or is there a better way to define these roles?
Comment 9 Alvaro 2013-10-14 07:11:02 UTC
"Mergers" right now are not committers. We are using committers in the git analysis but not in gerrit one. In gerrit we call mergers to contributors whose patch has being merged.

We are using authors for the people who create the contributions committed in git. And committers for the people who commit the contributions from authors. Sometime an author an a committer could be the same guy if the author has commit rights.

So thinking in a map between git and gerrit:

* Author -> Merger
* Committer -> Reviewer?
* -> Contributor (git does not have knowledge about contributions not committed)
Comment 10 Quim Gil 2013-10-22 22:11:30 UTC
Author, committer and reviewer are clear terms. "Merger" is confusing. 

As far as I know it's not a term used to describe a person, even less contributors whose patches have been merged. If anything, it sounds like contributors with +2 permissions in Gerrit who merge code from others.



But we really need to clean this terminology:

http://korma.wmflabs.org/browser/scr.html mentions "top openers" (what is this?) and "top mergers". 

Clicking a contributor, e.g.

http://korma.wmflabs.org/browser/people.html?id=278&name=Leslie%20Carr

you see "commits" and "closed reviews", and none of the values match.

I would start by renaming "mergers" to "authors of merged code". We don't need to force a single word if that word doesn't exist. It is more important to have self-explanatory labels in the metrics.

Alvaro, how do you think you are doing with this KPI? As of today, none of the questions from comment #0 has been answered yet. Do you think this will be ready by the ECT Showcase meeting next week?
Comment 11 Alvaro 2013-10-23 14:18:31 UTC
Taking a look at the original questions:

 * Who is contributing merged code each quarter?

We know who is contributing code and we can create SQL queries for this specific case. Right now we are showing it aggregated in:

http://korma.wmflabs.org/browser/scr.html

We also have the top contributors.

If we want the data grouped by quarter, the best thing right now is to create SQL queries and get the data as text lists. Is that ok, Quim?

Later, we can find the best way to visualize it. Probably just with tables.

 * How is the weight of the WMF evolving?

This is something answered in the organizations report:

http://korma.wmflabs.org/browser/scr-companies.html

But we cannot see how it is evolving. Maybe it is just a matter of doing the SQL analysis for:

2000, 2005, 2010, 2013.

 * What regions have a higher density of contributors? 

We don't have region information yet, so this is not possible for now.

 * The evolution of the total amount of merged commits should be visible too.

This is in:

http://korma.wmflabs.org/browser/scr.html

So, from the 4 questions, the final one is answered, and I can create queries to answer the other two. I will include them in a report.

You are right, Quim: we are putting a lot of effort into the web browser, but these questions were not answered yet!
Comment 12 Quim Gil 2013-10-23 20:59:27 UTC
Ok, then

**** Who is contributing merged code each quarter?

What about this:

* List of all-time top 100 individual "mergers" (as they are called now). Currently only 10 are shown.

* List of top 25 individual "mergers" of each quarter.

* List of all-time organizations, all of them.

* List of organizations of each quarter.


**** How is the weight of the WMF evolving?

> analysis for:
> 
> 2000, 2005, 2010, 2013.

Why not add the % of commits of each organization in each quarter, in the same "List of all-time organizations" and "List of organizations of each quarter"?
Comment 13 Alvaro 2013-10-24 05:28:03 UTC
Ok, we can do it that way. I am now working on SQL queries to get all this data, and we can see later how to present it. Maybe new HTML pages for Top Contributors and Top Orgs including this data.
Comment 14 Alvaro 2013-10-24 08:00:06 UTC
Quim, I already have the queries to get this information. Now we need to format it as a report. The latest SQL database is always available at:

http://korma.wmflabs.org/browser/data/db/

In this case, we need the reviews database so:

acs@lenovix:/tmp$ wget http://korma.wmflabs.org/browser/data/db/reviews.mysql.7z
acs@lenovix:/tmp$ 7zr x reviews.mysql.7z 
acs@lenovix:/tmp$ mysqladmin -u root create wikimedia_gerrit
acs@lenovix:/tmp$ mysql -u root wikimedia_gerrit < reviews.mysql


And now time to play SQL:

**** Who is contributing merged code each quarter?

acs@lenovix:/tmp$ mysql -u root wikimedia_gerrit
SELECT total, name, email, quarter, year 
FROM 
 (SELECT COUNT(i.id) AS total, upeople_id, p.id, name, email, user_id, QUARTER(submitted_on) as quarter, YEAR(submitted_on) year 
  FROM issues i, people p , people_upeople pup 
  WHERE i.submitted_by=p.id AND pup.people_id=p.id AND status='merged' 
  GROUP BY upeople_id,year,quarter ORDER BY year,quarter,total DESC
 ) t 
WHERE total>50;

With this query you get a list, ordered in time across all quarters, of contributors who have had more than 50 contributions merged.

We can create one query per quarter in order to get the top 25 "mergers", but with just one query we get the full picture.
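For example, a per-quarter variant of the query above could be sketched like this (the year and quarter values are just an example, and the schema assumptions are the same as in the query above; treat it as a starting point rather than a tested query):

-- Sketch: top 25 "mergers" for a single quarter (3Q 2013 used as an example);
-- change the YEAR/QUARTER values to get the list for any other quarter.
SELECT COUNT(i.id) AS total, name, email
FROM issues i, people p, people_upeople pup
WHERE i.submitted_by = p.id
  AND pup.people_id = p.id
  AND i.status = 'merged'
  AND YEAR(i.submitted_on) = 2013
  AND QUARTER(i.submitted_on) = 3
GROUP BY pup.upeople_id
ORDER BY total DESC
LIMIT 25;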

For organizations it is pretty similar, but in this case the query returns just 60 rows, so it is usable without writing specific queries for each quarter.

SELECT COUNT(i.id) AS total, c.name, QUARTER(submitted_on) as quarter, YEAR(submitted_on) year 
FROM issues i, people p , people_upeople pup, 
     acs_cvsanaly_mediawiki_2029_1.upeople_companies upc, acs_cvsanaly_mediawiki_2029_1.companies c
WHERE i.submitted_by=p.id AND pup.people_id=p.id 
 AND pup.upeople_id = upc.upeople_id AND upc.company_id = c.id
 AND status='merged'
GROUP BY year, quarter, c.id ORDER BY year, quarter, total DESC

Quim, I have attached to this issue the people-quarters.txt and orgs-quarters.csv files with the results of these queries.

With these results we can generate HTML tables.
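Regarding the percentage question from comment #12, a sketch of how the per-quarter share of each organization could be computed from the same tables (untested against the current dump, so only a starting point):

-- Sketch: % of merged reviews per organization and quarter (comment #12).
-- Reuses the tables from the organization query above; the denominator is
-- all merged reviews in the quarter, including people with no known company.
SELECT o.year, o.quarter, o.name, o.total,
       ROUND(100 * o.total / q.quarter_total, 1) AS pct_of_quarter
FROM
 (SELECT COUNT(i.id) AS total, c.name, QUARTER(submitted_on) AS quarter,
         YEAR(submitted_on) AS year
  FROM issues i, people p, people_upeople pup,
       acs_cvsanaly_mediawiki_2029_1.upeople_companies upc,
       acs_cvsanaly_mediawiki_2029_1.companies c
  WHERE i.submitted_by = p.id AND pup.people_id = p.id
    AND pup.upeople_id = upc.upeople_id AND upc.company_id = c.id
    AND i.status = 'merged'
  GROUP BY year, quarter, c.id) o
JOIN
 (SELECT COUNT(id) AS quarter_total, QUARTER(submitted_on) AS quarter,
         YEAR(submitted_on) AS year
  FROM issues
  WHERE status = 'merged'
  GROUP BY year, quarter) q
ON o.year = q.year AND o.quarter = q.quarter
ORDER BY o.year, o.quarter, o.total DESC;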
Comment 15 Alvaro 2013-10-24 08:07:09 UTC
(In reply to comment #14)
> Quim, I have already the queries to get this information. Now we need to
> format
> it a report. You have always the last SQL database in:
> 
> http://korma.wmflabs.org/browser/data/db/
> 
> In this case, we need the reviews database so:
> 
> acs@lenovix:/tmp$ wget
> http://korma.wmflabs.org/browser/data/db/reviews.mysql.7z
> acs@lenovix:/tmp$ 7zr x reviews.mysql.7z 
> acs@lenovix:/tmp$ mysqladmin -u root create wikimedia_gerrit
> acs@lenovix:/tmp$ mysql -u root wikimedia_gerrit < reviews.mysql

You will also need the unique identities DB:

acs@lenovix:/tmp$ wget http://korma.wmflabs.org/browser/data/db/source_code.mysql.7z
acs@lenovix:/tmp$ 7zr x source_code.mysql.7z
acs@lenovix:/tmp$ mysqladmin -u root create acs_cvsanaly_mediawiki_2029_1
acs@lenovix:/tmp$ mysql -u root acs_cvsanaly_mediawiki_2029_1 < source_code.mysql
Comment 16 Alvaro 2013-10-24 08:10:25 UTC
Created attachment 13554 [details]
Report for Wikimedia contributors per quarter
Comment 17 Alvaro 2013-10-24 08:10:53 UTC
Created attachment 13555 [details]
Report for Wikimedia orgs contributors per quarter
Comment 18 Quim Gil 2013-10-24 15:53:42 UTC
Ok, this is a start. Thank you!

Why does the data start only in 2011? Is it because we are only looking at code merged via Gerrit?

If we start with tables then the format we probably want is:

Columns: All-time | 4Q 2013 | 3Q 2013 | 2Q 2013 | ...

Starting with the most recent quarter and going backward, to avoid requiring horizontal scrolling to check the most recent data. If we have a graph then we can find better solutions, e.g. showing the tables for all-time and 4 quarters only, and then providing the rest of the data only through mouseover on the graph.

Rows: #position. Contributor - org #commits (%total). 

For instance: 

1. Leslie Carr (WMF) 1234 (12.7%) 

1. Wikimedia Foundation 12345 (72.1%)
Comment 19 Quim Gil 2013-10-24 16:02:22 UTC
(In reply to comment #13)
> we can see later how to present it. Maybe new HTML pages for Top
> Contributors and Top Orgs including this data.

Why not have everything at http://korma.wmflabs.org/browser/scr.html ?
Comment 20 Alvaro 2013-10-25 07:54:40 UTC
Ok, we can try it: include here the contributors (people and orgs) using the above criteria. The first step is to include the Top 100 lists with the search field. Then we can think about a GUI for filtering by quarter: your proposed solution, or maybe a list selector with all the quarters, so that once you select a quarter, all the page contents adjust to that quarter.

This is an idea we have been playing with: all the contents in the browser should be filterable by date. But you cannot easily compare quarters on the same page; you can open two tabs, but that is not ideal.

Ok, time to work on it. As soon as I get results, I will share them with you.
Comment 21 Alvaro 2013-10-25 07:56:37 UTC
(In reply to comment #18)
> Ok, this is a start. Thank you!
> 
> Why the data starts in 2011 only? Is it because we are only looking at code
> merged via Gerrit?
> 

Yes, as you can see in 

http://korma.wmflabs.org/browser/scr.html

SCR activity starts on Sep 2011.
Comment 22 Quim Gil 2013-10-25 16:43:17 UTC
But since the question is "Who contributes code", and in the SVN/Git history we also have the email addresses of the contributors, could we answer the question properly since development started?

Not for next week, but at some point?
Comment 23 Alvaro 2013-10-25 17:00:25 UTC
Yes, at some point we should mix SCR and SCM information, but right now we decided to focus on SCR as the data source for the contributions analysis. We also have the global view for SCM, which helps in understanding this, but the details are currently for SCR.

I will focus on getting everything right for SCR and then move effort to SCM, and work out how to join the information from the two data sources.

SCM (Source Code Management): git
SCR (Source Code Review): gerrit

Is that ok? Maybe we can open a new issue for it so we can move on closing tickets.
Comment 24 Quim Gil 2013-10-25 17:18:41 UTC
(In reply to comment #23)
> SCR (Source Code Review): gerrit
> 
> Is that ok? Maybe we can open a new issue for it so we can move on closing
> tickets.

Ok, let's nail down first this KPI in the context of Gerrit data. Once we resolve this task as FIXED we will figure out what is more important, adding the SVN history or moving on to the next KPI.
Comment 25 Alvaro 2013-10-25 19:15:21 UTC
SVN history ... Git history now :) SVN analysis is more ... elaborate. It is pretty good to have Git instead. Have a good weekend, Quim!
Comment 26 Alvaro 2013-10-26 05:39:34 UTC
Quim, after some development, first results:

http://korma.wmflabs.org/browser/contributors.html

You can see the total contributors and companies lists, and the companies' quarterly reports.

Including a quarter report in the HTML page is pretty easy:

<div class="Contribs" data-type="companies" data-search="false"
data-quarter="2012 1"></div>

It is the same for companies and people (and other items like countries in the future).

It is just a work in progress, but the basics (the data and the JavaScript logic to process it) are done.

Next week we can continue advancing the quarterly reports to make them more useful.
Comment 27 Quim Gil 2013-10-26 15:49:44 UTC
(In reply to comment #26)
> http://korma.wmflabs.org/browser/contributors.html

Good! 

Please follow the format proposed at Comment #18, or propose a better format.  :)

An "All-time" header for the corresponding lists is needed.

About the orgs by quarter: as suggested, it is better to start with the most recent complete quarter (2Q 2013) and then list the rest backward.

It is not clear where the per-quarter lists of individuals will fit. At least there is space for the last complete quarter next to the all-time list.
Comment 28 Quim Gil 2013-10-28 15:56:04 UTC
Álvaro, we will demo http://korma.wmflabs.org/browser/contributors.html in exactly 24 hours. 

That page still needs some work on low-hanging fruit. Did you see my pull requests on GitHub? If I did something wrong, please comment on them. Otherwise, why not take them. :)
Comment 29 Alvaro 2013-10-28 16:57:23 UTC
This is my next task: taking a look at the pull requests and pushing them. I hope I can do it in the next two hours!

Oops, contributors.html is in an early beta state, but ok, let's try to dress it up a bit.
Comment 30 Alvaro 2013-10-28 17:44:59 UTC
Quim, you have all your changes now in:

http://korma.wmflabs.org/browser/contributors.html

I tried to give you credit for the changes, but I had also changed contributors.html, so no automatic merge was possible, and during the manual process your commits do not appear. Sorry about that.
Comment 31 Quim Gil 2013-12-13 00:49:19 UTC
Status of this report:

(In reply to comment #0)
> Who is contributing merged code each quarter? 

Answered at http://korma.wmflabs.org/browser/contributors.html

> How is the weight of the WMF evolving? 

Pending. This will be answered by a graph like http://activity.openstack.org/dash/newbrowser/browser/scm-companies-summary.html

> What regions have a higher density of contributors? 

This is being addressed at Bug 55626

> The evolution of the total amount of merged commits should be visible too.

Hmm... We have the total amount of merged commits at http://korma.wmflabs.org/browser/scm.html , and we have graphs showing commits per month. Do we have a graph showing the total amount of commits? Do we actually need it?
Comment 32 Nemo 2013-12-13 06:43:39 UTC
(In reply to comment #31)
> Status of this report:
> 
> (In reply to comment #0)
> > Who is contributing merged code each quarter? 
> 
> Answered at http://korma.wmflabs.org/browser/contributors.html

This graph is obviously not excluding self-merges (there's even L10n-bot), so it's not particularly useful.
Comment 33 Quim Gil 2013-12-13 06:46:47 UTC
L10n-bot channels the contributions from the translatewiki.net community. Are these contributions we want to count? If not, why not?

Are there other self-merges? Should they be excluded?
Comment 34 Bawolff (Brian Wolff) 2013-12-13 06:59:27 UTC
(In reply to comment #33)
> L10n-bot channels the contributions from the translatewiki.net community. Are
> these contributions we want to count? If not, why not?

Certainly, but they are a different type of contributor and form a (mostly) separate community, so are probably best analyzed separately. Additionally from what I understand l10n-bot runs at regular intervals copying over all translations made in that interval. A commit by l10n-bot might represent a single contribution by a single translatewiki user, or it might represent the contribution of a hundred translatewiki users. Thus graphing l10n-bot commits tells us nothing about the translatewiki community.
Comment 35 Nemo 2013-12-13 06:59:48 UTC
(In reply to comment #33)
> Are there other self-merges? 

Several of those I see there look like self-merges; I've not run my own stats on it.

> Should be excluded?

Self-merges are not code review: they are either a routine practice on the repos where this is accepted (so they only measure commit activity there) or something discouraged where self-merges should generally not exist (like on core, where you don't want to "credit" them).
Comment 36 Quim Gil 2013-12-13 07:21:56 UTC
Re L10n-bot: ok, let's not count it. Requested at Bug 53489 comment 24. If you find more users that should be removed, please report them in that bug report. Thank you!
Comment 37 Nemo 2013-12-13 07:24:25 UTC
(In reply to comment #36)
> Ref L10n-bot: ok, let's not counted. Requested at Bug 53489 comment 24. If
> you
> find more users that should be removed please report them in that bug report.
> Thank you!

How about self-merges by human users?
Comment 38 Quim Gil 2013-12-13 15:26:21 UTC
I confess to not knowing anything about self-merges. Are they still code contributions? If so, they should be counted in our metrics about code contributions.

About the "time to respond" metrics: ok, if these are instant merges they are distorting the picture and we need to address this. Questions:

Won't they be discarded (or minimized) by calculating the median instead of a plain average?

If they still distort the data when calculating the median, I guess we can just remove all self-merges automatically. Are they marked as such in the database? Otherwise I guess we rely on other signals, like being reviewed only by Jenkins, or being resolved in less than x seconds (?).
Comment 39 Nemo 2013-12-13 18:42:40 UTC
(In reply to comment #38)
> I confess not knowing anything about self-merges.

"+2 is for code review, not merging your own stuff" ([[mw:Gerrit/+2]]). Self-merges i.e. +2 on own commits are not "real" +2.
Comment 40 Alvaro 2013-12-17 04:37:06 UTC
Guys, the evolution over time, and the global view, of merged reviews by company:

http://korma.wmflabs.org/browser/scr-companies-summary.html

It answers "How is the weight of the WMF evolving?".
Comment 41 Alvaro 2013-12-17 04:41:56 UTC
(In reply to comment #36)
> Ref L10n-bot: ok, let's not counted. Requested at Bug 53489 comment 24. If
> you
> find more users that should be removed please report them in that bug report.
> Thank you!

I will remove L10n-bot in the next iteration.
Comment 42 Alvaro 2013-12-17 04:45:51 UTC
> > How is the weight of the WMF evolving? 
> 
> Pending. This will be answered by a graph like
> http://activity.openstack.org/dash/newbrowser/browser/scm-companies-summary.
> html

http://korma.wmflabs.org/browser/scr-companies-summary.html

Sorry about not using the thread before!
Comment 43 Alvaro 2013-12-17 04:49:45 UTC
(In reply to comment #38)
> I confess not knowing anything about self-merges. Are they still code
> contributions? Then they should be counted in our metrics about code
> contributions.
> 
> About the "time to respond" metrics ok, if they are instant merges they are
> deforming the picture and we need to address this. Questions:

Which are "instant merges"? self-merges? We should check it.

> 
> Won't be they be discarded (or minimized) by the calculation of the median,
> instead of plain average?

The median is better in any case for time to review. We should switch to it.
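As an illustration only: MySQL has no built-in MEDIAN() function, so a lower median can be approximated with the GROUP_CONCAT trick. The merged_on column below is hypothetical (the actual merge timestamp column in the dump needs to be checked), so this is just a sketch of the technique:

-- Sketch: lower median of time-to-merge in days.
-- merged_on is a hypothetical column name for the merge timestamp.
-- GROUP_CONCAT output is capped by group_concat_max_len; raise it for
-- large result sets.
SELECT SUBSTRING_INDEX(
         SUBSTRING_INDEX(
           GROUP_CONCAT(days ORDER BY days ASC),
           ',', CEIL(COUNT(*) / 2)),
         ',', -1) AS median_review_days
FROM (
  SELECT DATEDIFF(merged_on, submitted_on) AS days
  FROM issues
  WHERE status = 'merged'
) t;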

> 
> If they still distort data when calculating the median, I guess be can just
> remove all self-merges automatically. Are they marked in the database as
> such?

I think we can just check whether the submitter and the +2 reviewer are the same person.
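A sketch of that check, assuming the reviews dump records review votes in a changes table with issue_id, field, new_value and changed_by columns (these table and column names are assumptions and should be verified against the actual schema):

-- Sketch: merged changesets where the +2 vote came from the submitter.
-- The changes table and its columns are assumptions about the dump.
SELECT i.id, i.summary, p.name
FROM issues i
JOIN people p ON p.id = i.submitted_by
JOIN changes ch ON ch.issue_id = i.id
WHERE i.status = 'merged'
  AND ch.field = 'Code-Review'
  AND ch.new_value = '2'
  AND ch.changed_by = i.submitted_by;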

> Otherwise I guess we rely on other facts, like they are only reviewed by
> Jenkins, or they are resolved in less than x seconds (?).
Comment 44 Bawolff (Brian Wolff) 2013-12-17 05:13:24 UTC
Nothing is reviewed by Jenkins. Things get verified by Jenkins, and things get submitted by Jenkins (sometimes the word "merged" is used for this act of submitting, which is technically correct but different from how we use the word casually), but nothing is reviewed by Jenkins.
Comment 45 Quim Gil 2013-12-17 05:58:14 UTC
(In reply to comment #40)
> Guys, the evolution in time and global of merged reviews by company:
> 
> http://korma.wmflabs.org/browser/scr-companies-summary.html
> 
> It answers "How is the weight of the WMF evolving?".

Interesting! Just a couple of formal details:

* The graph starts in January 2002 but there is no data before August 2011. Most of the graph is empty, which is not very useful. Please sync graph and data. By the way, the same happens in other Code Review graphs.

* Mouseover brings data on top of the graph, and currently on top of your mouse. Not very usable.

* The legend overlaps the graph.

These types of problems are quite common in our dashboard. We should check for them at least whenever we publish new graphs. Ideally, Grimoire would prevent these problems by design. Should I file separate bugs?
Comment 46 Nemo 2013-12-17 07:29:38 UTC
(In reply to comment #43)
> > About the "time to respond" metrics ok, if they are instant merges they are
> > deforming the picture and we need to address this. Questions:
> 
> Which are "instant merges"? self-merges? We should check it.

"Instant merges" are not defined, but fastest merges are usually either self-merges or consequence of team work (typically staff). After excluding self-merges (where owner and +2'er are the same person) the graph will get meaningful; median is ok, you could also add e.g. 95th percentile to see how many super-fast merges we have in case we're missing something.
Comment 47 Alvaro 2013-12-17 12:18:23 UTC
(In reply to comment #45)
> (In reply to comment #40)
> > Guys, the evolution in time and global of merged reviews by company:
> > 
> > http://korma.wmflabs.org/browser/scr-companies-summary.html
> > 
> > It answers "How is the weight of the WMF evolving?".
> 
> Interesting! Just a couple of formal details:
> 
> * The graph starts in January 2002 but there is no data before August 2011.
> Most of the graph is empty, which is not very useful. Please sync graph and
> data. By the way, the same happens in other Code Review graphs.

Ok, we have a parameter in the widgets that cuts off the part of the series with no data.

> 
> * Mouseover brings data on top of the graph, and currently on top of your
> mouse. Not very usable.

I will play a bit with it.

> 
> * The legend overlaps the graph.
> 

I will play a bit with it.

> The types of problems are quite common in our dashboard. We should check them
> at least whenever we publish new graphs. Ideally Grimoire would prevent this
> problems by design. Should I file separate bugs?

We can solve these issues in this ticket. We are migrating the dashboard to the new product version in the next few weeks, so it is better to improve this kind of thing globally there.
Comment 48 Alvaro 2013-12-17 14:08:18 UTC
(In reply to comment #45)
> (In reply to comment #40)
> > Guys, the evolution in time and global of merged reviews by company:
> > 
> > http://korma.wmflabs.org/browser/scr-companies-summary.html
> > 
> > It answers "How is the weight of the WMF evolving?".
> 
> Interesting! Just a couple of formal details:

http://korma.wmflabs.org/browser/scr-companies-summary.html

The new viz includes your suggestions.

Also, we have updated to the latest VizJS-lib version.
Comment 49 Bawolff (Brian Wolff) 2013-12-17 16:40:30 UTC
[Mildly off-topic suggestions] It might also be cool to further split up the WMF (since it's such a big contributor) into departments (Features, Platform, Ops, Other). It would also be interesting to see which groups are being reviewed by which other groups (e.g. does the WMF mostly review other WMFers' code, or does everyone's code get reviewed evenly?)
Comment 50 Nemo 2013-12-17 17:06:04 UTC
(In reply to comment #49)
> It would also be interesting to see which groups are being reviewed
> by
> which other groups (e.g. Do WMF mostly review other WMFers code or does
> everyone's code get reviewed evenly)

This would also be indirectly addressed by the "Time to review" metric (which AFAIK is among the defined KPIs?), if that were also split by org. Of course it's possible that volunteers review volunteers and staffers review staffers without this affecting the time; in practice, the unreviewed mediawiki.* commits are always in the range of 70-80% non-WMF ownership, although volunteers have far less than 70-80% of all commits.
Comment 51 Quim Gil 2013-12-17 19:32:39 UTC
(In reply to comment #49)
> [mildly off topic suggestions] It might also be cool to further split up WMF
> (since its such a big contributor) into departments (Features, Platform, Ops,
> Other).

Interesting idea, but out of scope in this bug / round. It might be interesting, although I'm not sure how much. In any case this would put more pressure on having reliable user data. Therefore, I see it as an improvement blocked by

Bug 58585 - Allow contributors to update their own details in tech metrics directly

> It would also be interesting to see which groups are being reviewed
> by which other groups (e.g. Do WMF mostly review other WMFers code or does
> everyone's code get reviewed evenly)

See Bug 37463 - Key performance indicator: Gerrit review queue + dependent reports.
Comment 52 Quim Gil 2013-12-18 17:14:43 UTC
(In reply to comment #32)
> This graph is obviously not excluding self-merges (there's even L10n-bot),
> it's not particularly useful.

Let's see. When it comes to code MERGED we have two options in relation to bots:

1. Remove all data from all identified bots altogether. Meaning that, in practical terms, their commits don't exist in our tech metrics.

2. Remove bots from rankings in order to "promote" the actual humans and coding tasks. However, their data still counts in the totals.

What should we do with L10n-bot? Do these i18n string commits count as code contributions or not?
Comment 53 Quim Gil 2014-01-07 20:24:32 UTC
We are in the final sprint:

http://korma.wmflabs.org/browser/who_contributes_code.html

Points for Alvaro:

Blockers

* The search boxes don't seem to work. What result is expected if I search "Quim"? Do we need two search boxes?

* Lists of people: let's have all-time and last quarter, instead of the two previous quarters.

* Just checking: "Siebrand 419" means that Siebrand is the author of 419 patches that have been merged to key Wikimedia projects in the last quarter, right?

* Based on comment 34, let's not compute translatewiki.net data in this KPI.

* In "What regions have a higher density of contributors?", "Unknown" should be added, as requested in Bug 55626


Non-blockers

* "How is the weight of the WMF evolving?" starts on March 2013, while "The evolution of the total amount of merged commits" starts on September 2011. Is there a reason to have different starting points? If you ask me, the longer history we can display the better...

* In fact, "How is the weight of the WMF evolving?" already shows implicitly the amount of commits merged every month. Could we simply add the total amounts and be done with one graph?

* I still think that adding another graph like "How is the weight of the WMF evolving?" but based on % would be useful to see clearly the trends of each organization (if any).


There is more work to be done in the descriptions and the organization of the page, but we can do this directly through pull requests at https://github.com/Bitergia/mediawiki-dashboard/blob/master/browser/who_contributes_code.html
Comment 54 Alvaro 2014-01-08 08:15:44 UTC
(In reply to comment #53)
> We are in the final sprint:
> 
> http://korma.wmflabs.org/browser/who_contributes_code.html
> 
> Points for Alvaro:
> 
> Blockers
> 
> * The search boxes don't seem to work. What result is expected if I search
> "Quim"? Do we need two search boxes?

Fixed the search box, and activated it only for the long list of all contributors.

> 
> * Lists of people: let's have all-time and last quarter, instead of the two
> previous quarters.

Done

> 
> * Just checking: "Siebrand 419" means that Siebrand is the author of 419
> patches that have been merged to key Wikimedia projects in the last quarter,
> right?

Right!

> 
> * Based on comment 34, let's not compute translatewiki.net data in this KPI.
> 

We have filtered it out for contributors, but... should we also filter it out for orgs?

> * In "What regions have a higher density of contributors?", "Unknown" should
> be
> added, as requested in Bug 55626

Done!
Comment 55 Quim Gil 2014-01-09 23:38:26 UTC
On a second look I have realized that the graphs at http://korma.wmflabs.org/browser/scr-countries.html count reviews. I think it would make more sense for them to count authors.

I mean, when it comes to organizations it does make sense to see which organization is funding how much work, and it is good to count that work in reviews. However, our interest in the location of contributors is based on the people, less on the amount of reviews. 

In the case of our community it is clear that most reviews come from the USA and Germany (when the devs fill in their data), because this is where most WMF and WMDE (professional, full-time) developers are located. Still, if there are a dozen developers with just a handful of commits in some other country, we definitely want to know. In this case, 10 developers with 5 merged commits each have more relevance than a single developer with 50 commits.

Conclusion: it would be good to have the data based on authors. If you want to keep the current graphs that is fine too. 

When it comes to http://korma.wmflabs.org/browser/who_contributes_code.html , we will swap "Submitted per country (aggregated)" for the graph by people as soon as it is available. But this is not a blocker for the KPI anymore, as agreed.
Comment 56 Quim Gil 2014-01-10 00:05:02 UTC
(In reply to comment #55)

Sorry, that comment belongs to bug 55626
Comment 57 Derk-Jan Hartman 2014-01-10 00:31:37 UTC
Can we get some KPIs on the smaller stuff ?
How about:

Which repositories get the fewest contribs/reviews?
Who contributes/reviews most of the code that no one else contributes/reviews (who takes care of the orphan projects)?
Who makes the most changes to the fields of a ticket (who does triage)?
Which repos have the longest time between first and last patch?
Which repos have the longest time between first/last patch and merge?

Usually I find these kinds of things more interesting than 'top performance' indicators.
Comment 58 Quim Gil 2014-01-13 22:50:12 UTC
(In reply to comment #57)
> Can we get some KPIs on the smaller stuff ?

Just a bit of background: we agreed to focus our work on five KPIs described at https://www.mediawiki.org/wiki/Community_metrics#Key_performance_indicators . Here we are working on the first one, related to code contributions. In addition to this http://korma.wmflabs.org offers more data.

The answers below refer to the data provided by the metrics dashboard in Korma alone.

> How about:
> 
> What repositories get the least contribs/reviews

Contributions: http://korma.wmflabs.org/browser/scm-repos.html?page=24 and up.

Reviews: http://korma.wmflabs.org/browser/scr-repos.html?page=24 and up.

> Who contributes/reviews most of the code that no on else contributes/reviews
> (who takes care of the orphan projects)

Currently we are not displaying authors per repo, and we don't have a way to distinguish the repositories an author or a reviewer contributes to. Is this what you mean? It is a good point, and it would be great if you could open a new report to cover it.

> Who makes the most changes to the fields of a ticket (who does triage?)

Do you mean triage in Bugzilla? There is not much now (http://korma.wmflabs.org/browser/its.html), but we want to work more on this when we address the Bugzilla response time KPI: https://www.mediawiki.org/wiki/Community_metrics#Bugzilla_response_time


> Which repo's have the longest time between first and last patch
> Which repo's have the longest time between first/last patch and merge ?

Do you mean the first and last patch in the review queue? We are measuring the time to review patches by repository at http://korma.wmflabs.org/browser/scr-repos.html . If you need something different that is not covered by the currently planned KPIs, then please open a report.
 
> Usually I find these kinds of things more interesting than 'top performance'
> indicators.

Yes, in these KPIs we attempt to look not only at top performers but also at bottlenecks and neglected areas. For instance, when looking at times to review in the Gerrit queue, we sort the lists starting with the repos with the longest time to review. See Bug 37463 - Key performance indicator: Gerrit review queue
Comment 59 Alvaro 2014-01-15 07:50:27 UTC
(In reply to comment #55)
> In a second look I have realized that the graphs at
> http://korma.wmflabs.org/browser/scr-countries.html count reviews. I think it
> would make more sense that they would count authors.
> 
> I mean, when it comes to organizations it does make sense to see which
> organization is funding how much work, and it is good to count that work in
> reviews. However, our interest in the location of contributors is based on
> the
> people, less on the amount of reviews. 
> 
> In the case of our community it is clear that most reviews come from USA and
> Germany (when the devs fills their data) because this is where most WMF and
> WMDE (professional, full time) developers are located. Still, if there are a
> dozen of developers with just a bunch of commits in some other country we
> definitely want to know. In this case, 10 developers with 5 merged commits
> each
> has more relevance than a single developer with 50 commits.
> 
> Conclusion: it would be good to have the data based on authors. If you want
> to
> keep the current graphs that is fine too. 
> 
> When it comes to http://korma.wmflabs.org/browser/who_contributes_code.html ,
> we will swap "Submitted per country (aggregated)" for the graph by people as
> soon as it is available. But this is not a blocker for the KPI anymore, as
> agreed.

Ok Quim, I will take a look and try to use authors also in this report.
Comment 60 Quim Gil 2014-01-21 15:39:50 UTC
As agreed with Alvaro, his only remaining task in this report is to edit the graph "The evolution of the total amount of merged commits" at

http://korma.wmflabs.org/browser/who_contributes_code.html

so that the starting date is the same as "Reviews merged".

After this is done the report will be assigned to me, since I still want to improve the titles and descriptions via HTML edits.

Alvaro, I intend to finish my part this week because I really really want to show a first complete KPI at the Engineering Community Team showcase next week. :)
Comment 61 Alvaro 2014-01-21 15:45:36 UTC
Quim, you will have this change tomorrow for the review meeting! :)

And I hope for some progress also on the Gerrit review queue KPI.
Comment 62 Quim Gil 2014-01-22 17:07:02 UTC
Alvaro has made the last change to http://korma.wmflabs.org/browser/who_contributes_code.html required from him in this report. I'm taking it now.
Comment 63 Nemo 2014-03-05 08:15:28 UTC
http://korma.wmflabs.org/browser/scr.html is quite mysterious to me; I'm unable to extract any meaningful information from it.
* The "pending" graph has no legend, and the ? icon does nothing. It is worth noting whether you're filtering out any repo, or -1/-2 commits, or not.
* "Review time in days": absolutely no idea what this is. The legend is just a tautology so it doesn't explain anything: "Review time in days: Median review time in days".
* "submitted vs. Merged changes vs. Abandoned": this is the only clear part of the page. :)
* "code reviews waiting for reviewer" doesn't make any sense, a code review (which is composed by comments and a label like +1, +2) always as an author. Perhaps this means "commits waiting for reviews", but from the examples below I can't tell.
* "code reviews waiting for submitter" presumably means "commits waiting for merge" ("submit" is ambiguous, better not use it). Note that merge depends on +2 which is one code review label. If that's the meaning, 
* "Top Successful submitters" per above, confusing: do you mean commit authors, or commit mergers/approvers/+2'er? 

Note that self-merges have not been excluded yet, but they're in the process of being filtered at last according to bug 37463 comment 30.
Comment 64 Alvaro 2014-03-05 09:26:45 UTC
Nemo, we will try to clarify all these points.

But our current effort is focused on closing the KPIs.

We will clarify and fix all of this once the KPIs are closed. Thank you very much for your valuable collaboration.
