Last modified: 2014-09-15 09:25:27 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and is kept for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T72278, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 70278 - "Volume of open changesets" graph should show reviews pending every month
"Volume of open changesets" graph should show reviews pending every month
Status: PATCH_TO_REVIEW
Product: Analytics
Classification: Unclassified
Component: Tech community metrics
Version: unspecified
Hardware: All  OS: All
Importance: High major
Target Milestone: ---
Assigned To: Alvaro
URL: http://korma.wmflabs.org/browser/gerr...
Depends on:
Blocks:
 
Reported: 2014-09-02 07:53 UTC by Quim Gil
Modified: 2014-09-15 09:25 UTC
CC List: 5 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Quim Gil 2014-09-02 07:53:28 UTC
I just realized that "Volume of open changesets" at http://korma.wmflabs.org/browser/gerrit_review_queue.html is not showing "How long is the Gerrit review queue over time?" -- that is, the amount of "Pending" and "Waiting for review" changesets every month. 

What I deduce is being shown is some kind of accumulation over time of the changesets *currently open*.

We should be able to see whether the queue is increasing or decreasing over time, very much like http://korma.wmflabs.org/browser/bugzilla_response_time.html lets us see whether our response time decreases or not every month.
Comment 1 Alvaro 2014-09-02 18:12:19 UTC
"Volume of open changesets" shows the number of changesets that were "Pending" and "Waiting for review" in a specific month.

For example:

* Jun 2014: 862 pending, 326 waiting for review. At the end of Jun 2014 there were 826 reviews in an open state, pending review, and 326 of them were waiting for a reviewer.

This metric is implemented by aggregating the number of reviews that remain pending at the end of each month plus the pending reviews inherited from previous months. In Gerrit, pending reviews is a metric that can be aggregated.

So for example:

* Jul 2013: 97 pending
* Aug 2013: 122 pending (25 new pending issues in Aug, plus 97 pending from Jul 2013)


At the end of Aug 2014 there are 1594 and, as expected, the current number of pending reviews is pretty similar: it is 1630, and you can see it at:

http://korma.wmflabs.org/browser/scr.html

In order to compute this number we just get the total open reviews now.

So, as a conclusion, this metric shows the evolution over time of the total pending reviews each month.
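
A minimal Python sketch of this aggregation (illustrative only; the helper name is made up and the real computation lives in the dashboard code):

# new_pending[m] is the net number of reviews that became pending in month m.
def backlog_series(new_pending):
    total, series = 0, []
    for monthly in new_pending:
        total += monthly          # inherit the backlog from previous months
        series.append(total)
    return series

print(backlog_series([97, 25]))   # [97, 122], matching the Jul/Aug 2013 example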
Comment 2 Quim Gil 2014-09-02 19:34:24 UTC
(In reply to Alvaro from comment #1)
> 
> This metric is implemented by aggregating the number of reviews that remain
> pending at the end of each month plus the pending reviews inherited from
> previous months.

If I understand this correctly...

Why add the inherited amount from the previous month? All we need to see is the total amount of changesets open at the end of the month, and how many of these are waiting for review. 

Then repeat the exercise at the end of the next month: how many are open in total, from which how many are waiting for review. If you can count how many are open now, it doesn't matter how many were open last month, there is no need to add these amounts, just reflect each amount in each month in the graph.

This way we can have a graph that goes up and down depending on how long the queue is every month. If we are always adding the previous months then we will have an ever-increasing graph, no matter how much we improve our rate of closing reviews.
Comment 3 Alvaro 2014-09-02 20:02:17 UTC
Ok Quim, you are right. 

Right now the metric is showing the backlog of the current open issues.

Let us think more about the metric you are proposing in order to be sure we are in sync with the metric definition and we will create a new metric showing that, "at the end of the month: how many are open in total, from which how many are waiting for review".
Comment 4 Alvaro 2014-09-02 20:33:12 UTC
(In reply to Alvaro from comment #3)
> Ok Quim, you are right. 

Yes, you are looking for a related but different metric.

Taking a look at the current metric, you can infer the relative speed at which pending reviews are added, but it is not easy to get the total speed for each month.

So we need to create this new metric.

The original name for this metric was "Number of pending submissions" and for us, the definition was the backlog of pending submissions. But it is clear we need a new metric to show what you are looking for.

We will implement it, and as soon as it is available we will share it with you on a testing page in the dashboard so you can check it.

> 
> Right now the metric is showing the backlog of the current open issues.
> 
> Let us think more about the metric you are proposing in order to be sure we
> are in sync with the metric definition and we will create a new metric
> showing that, "at the end of the month: how many are open in total, from
> which how many are waiting for review".
Comment 5 Quim Gil 2014-09-02 20:55:02 UTC
I don't understand the confusion, and I don't want to make you work before making sure that we are on the same page. "Number of pending submissions" is always the total number of open/waiting-for-review changesets NOW. We just check once a month because it would be very expensive to count on a daily basis.

I really see no point in adding what was open last month, because some of those will still be open and some will now be closed.

Our ideal goal is to have a queue of zero changesets waiting for review, and the purpose of that graph is to see whether we are getting closer or further from that goal.
Comment 6 Alvaro 2014-09-04 04:17:15 UTC
After a careful analysis of the metric we are showing in "Volume of open changesets": the graph shows the aggregation of the number of new pending issues that exist each month in Wikimedia Gerrit.

Why the aggregation? To understand the total work pending.

If you think it is better not to show the aggregation, but just how many new pending reviews are added each month, it is just a matter of removing the aggregation conversion param from the HTML.

We can talk about all the details or doubts in the next meeting. I am sorry for not being clear in my first comments in this ticket.
Comment 7 Alvaro 2014-09-04 04:24:15 UTC
(In reply to Quim Gil from comment #2)
> (In reply to Alvaro from comment #1)
> > 
> > This metric is implemented by aggregating the number of reviews that remain
> > pending at the end of each month plus the pending reviews inherited from
> > previous months.
> 
> If I understand this correctly...
> 
> Why add the inherited amount from the previous month?

To see how the total number of pending reviews is evolving. 

> All we need to see
> is the total amount of changesets open at the end of the month, and how many
> of these are waiting for review. 

In order to get this number, we get the total number of opened and closed at the end of the month, and the difference is the pending. 

> 
> Then repeat the exercise at the end of the next month: how many are open in
> total, from which how many are waiting for review. If you can count how many
> are open now, it doesn't matter how many were open last month, there is no
> need to add these amounts, just reflect each amount in each month in the
> graph.

In order to show that, we can just remove the aggregation and you will get just that.

> 
> This way we can have a graph that goes up and down depending on how long the
> queue is every month. If we are always adding the previous months then we
> will have an ever-increasing graph, no matter how much we improve our rate of
> closing reviews.

At the end of a month, if you have closed more reviews than you have opened, the number of new pending reviews for that month will be negative, the total number of pending issues will be lower, and the current graph will decrease.
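
A small sketch of that behaviour with made-up numbers (the real computation is in the dashboard code):

opened = [120, 90]   # hypothetical changesets opened in two consecutive months
closed = [80, 130]   # hypothetical changesets closed in those months
net = [o - c for o, c in zip(opened, closed)]   # [40, -40]; month 2 is negative

backlog, series = 0, []
for n in net:
    backlog += n
    series.append(backlog)
print(series)        # [40, 0] -- the aggregated graph decreases in month 2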
Comment 8 Quim Gil 2014-09-04 07:24:40 UTC
I still find your answer slightly confusing:

> In order to get this number, we get the total number of opened and closed at
> the end of the month, and the difference is the pending. 

Ok, if this is the case then the number is correct. But then, what does this have to do with adding the inherited amount of the past month?

The ultimate test:

1. Forget all history in our Gerrit instance and just look at all the changesets we have now as if they were created during September 2014.

2. Count how many changesets we have in total, from which how many have status:open, from which how many are waiting for review (no -1, no WIP).

If the numbers still are around the 1563 open - 933 waiting for review currently declared for August 2014, then the calculation is correct.

Sorry for being a skeptical pain in the ass with this metric, but we need to be absolutely sure about it. If it is true we will set an alarm because the trend is really alarming. The worst that could happen to the credibility of our metrics is that the data is found to be wrong after setting the alarm. Thank you for your understanding.
Comment 9 Alvaro 2014-09-05 12:53:21 UTC
(In reply to Quim Gil from comment #8)
> I still find your answer slightly confusing:
> 
> > In order to get this number, we get the total number of opened and closed at
> > the end of the month, and the difference is the pending. 
> 
> Ok, if this is the case then the number is correct. But then, what does this
> have to do with adding the inherited amount of the past month?

This is bad wording from my side.

In order to measure the total pending issues in a month, you need to add the new pending issues for this month to all the previous pending issues.

> 
> The ultimate test:
> 
> 1. Forget all history in our Gerrit instance and just look at all the
> changesets we have now as if they were created during September 2014.
> 
> 2. Count how many changesets we have in total, from which how many have
> status:open, from which how many are waiting for review (no -1, no WIP).
> 
> If the numbers still are around the 1563 open - 933 waiting for review
> currently declared for August 2014, then the calculation is correct.
> 
> Sorry for being a skeptical pain in the ass with this metric, but we need to
> be absolutely sure about it. If it is true we will set an alarm because the
> trend is really alarming. The worst that could happen to the credibility of
> our metrics is that the data is found to be wrong after setting the alarm.
> Thank you for your understanding.

I understand perfectly what you are saying. We have reviewed the metric in depth and the final number is right (1646 total pending reviews now), but the evolution over time was not right because we were using, as the close date, the date on which the review was submitted. After fixing this issue, I have updated the graph and now the trend is better. The queue is growing, but it is growing less. In some months, like March 2014, you have closed more reviews than you opened.

I need to recheck whether the WIP and -1 filters are being used here, because they were added to the MediaWiki new time measures, but this is a generic metric. 

But the "core" of the problem is fixed. I will continue reporting on this ticket the rest of open issues (WIP and -1) and cross checking data.
Comment 10 Alvaro 2014-09-06 08:42:12 UTC
(In reply to Quim Gil from comment #8)
> I still find your answer slightly confusing:
> 
> > In order to get this number, we get the total number of opened and closed at
> > the end of the month, and the difference is the pending. 
> 
> Ok, if this is the case then the number is correct. But then, what does this
> have to do with adding the inherited amount of the past month?
> 

Right now let's check this ultimate test:

> The ultimate test:
> 
> 1. Forget all history in our Gerrit instance and just look at all the
> changesets we have now as if they were created during September 2014.
> 
> 2. Count how many changesets we have in total, from which how many have
> status:open, from which how many are waiting for review (no -1, no WIP).

With the fix to the close date and more up-to-date data, we now have 1647 open reviews in Aug 2014 as the final point of the evolution of open reviews.

Taking a look at the current open reviews at:

http://korma.wmflabs.org/browser/scr.html

we have 1607, so during September we should have -40 net open reviews.

Looking at the data in the JSON:

http://korma.wmflabs.org/browser/data/json/scr-evolutionary.json

if you look at the pending time series:

"pending": [431, 111, 127, 141, 29, 136, 139, 72, 51, 74, -72, 69, 158, 80, 88, 13, -33]

we have -33.

So the difference is 7 reviews and the numbers are pretty similar. From my point of view this validates the metrics. We can look at these 7 reviews to understand the difference if really needed.
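
For reference, this check can be reproduced from the published JSON series with plain Python:

pending = [431, 111, 127, 141, 29, 136, 139, 72, 51, 74, -72,
           69, 158, 80, 88, 13, -33]
print(sum(pending[:-1]))  # 1647 -> open reviews at the end of Aug 2014
print(sum(pending))       # 1614 -> vs. 1607 on scr.html now, a difference of 7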

About reviews waiting for a reviewer, we are still checking that.
Comment 11 Quim Gil 2014-09-09 08:17:19 UTC
Ok, thank you for the double checking. The resolution of this bug is a mix of Invalid (the problem reported was not a problem) and Fixed (you found something wrong on the way). Let's call it Fixed.  :)
Comment 12 Alvaro 2014-09-11 11:48:41 UTC
About reviews waiting for a reviewer, we are still checking it ... we cannot mark this ticket as fixed yet. As soon as this analysis is completed I will post the results here.
Comment 13 Alvaro 2014-09-11 14:54:35 UTC
Ok, after reviewing the metric "pending reviews waiting for reviewer": we are doing the analysis on the backlog, so it is normal that the graph grows more in the last months.

We will change this graph to an "evolution over time" graph (as in other places) and then we can recheck its trend.

The logic for detecting reviews pending for a reviewer seems to be right, after checking the results against some repos.
Comment 14 Alvaro 2014-09-12 04:26:56 UTC
"pending reviews waiting for reviewer" (Waiting for Reviewer) is now implemented as a "evolution time metric" (not backlog anymore) and the results are now in:

http://korma.wmflabs.org/browser/gerrit_review_queue.html

To be consistent across both metrics, I have renamed "pending" to "Waiting".
Comment 15 Alvaro 2014-09-12 04:31:32 UTC
Hmmm, I am not sure about that last change from "pending" to "Waiting".

Changed "pending" to "New changesets" using the same label than in the second graph.
Comment 16 Quim Gil 2014-09-12 06:12:02 UTC
Alright, the shape of the green line makes more sense now. At least now we see a steady increase instead of a sudden change of trend in the last three months.

Still, the total numbers look inconsistent with the values provided by other graphs in that page.

The blue line in the first graph marks 1657 changesets, but the total sum of blue columns in the second graph is 1324. Why the difference of 333 changesets?

For the green line we have 994 in the first graph, 718 in the second: a difference of 276 changesets. I went through the 211 repositories listed below, adding the value of "Waiting for review", and the total for August is 755. 

Therefore, we have two metrics that point to a value in the 700s, while the first graph says close to 1000. Which one is right?
Comment 17 Alvaro 2014-09-12 08:10:43 UTC
(In reply to Quim Gil from comment #16)
> Alright, the shape of the green line makes more sense now. At least now we
> see a steady increase instead of a sudden change of trend in the last three
> months.
> 
> Still, the total numbers look inconsistent with the values provided by other
> graphs in that page.
> 
> The blue line in the first graph marks 1657 changesets, but the total sum
> of blue columns in the second graph is 1324. Why the difference of 333
> changesets?

You are comparing time evolution with the backlog time distribution. These two time series are the same at the final value, but for past months the time distribution will have more "new changesets".

new changesets in month A (time distribution) = new changesets in month A (backlog time distribution) + changesets from month A closed between A and now.

In "closed changesets between A and now"  if A = NOW, this value is zero and both numbers in NOW should be the same.

And doing this analysis:

"date": ["May 2013", "Jun 2013", "Jul 2013", "Aug 2013", "Sep 2013", "Oct 2013", "Nov 2013", "Dec 2013", "Jan 2014", "Feb 2014", "Mar 2014", "Apr 2014", "May 2014", "Jun 2014", "Jul 2014", "Aug 2014", "Sep 2014"]
"ReviewsWaiting_ts": [418, 546, 669, 798, 827, 998, 1113, 1191, 1233, 1286, 1258, 1316, 1455, 1565, 1670, 1657, 1571]
"new": [49, 21, 25, 24, 14, 46, 39, 33, 62, 76, 82, 64, 126, 161, 284, 267, 197]

where "new" is the backlog metric and "ReviewsWaiting_ts" is the time evolution time series. For Sep 2014 you have 1570 for "new" (adding all of its values) and 1571 for "ReviewsWaiting_ts" (taking the last point).

The same holds for the waiting for reviewer metric.

We are not showing the Sep 2014 data because it is incomplete, as we decided in the past.
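
For reference, this cross-check can be reproduced directly from the two series above:

ReviewsWaiting_ts = [418, 546, 669, 798, 827, 998, 1113, 1191, 1233,
                     1286, 1258, 1316, 1455, 1565, 1670, 1657, 1571]
new = [49, 21, 25, 24, 14, 46, 39, 33, 62, 76, 82, 64, 126, 161, 284, 267, 197]

print(sum(new))               # 1570 -> backlog time distribution, added up
print(ReviewsWaiting_ts[-1])  # 1571 -> last point of the time evolution series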

> 
> For the green line we have 994 in the first graph, 718 in the second: a
> difference of 276 changesets. I went through the 211 repositories listed
> below, adding the value of "Waiting for review", and the total for August is
> 755. 
> 
> Therefore, we have two metrics that point to a value in the 700s, while the
> first graph says close to 1000. Which one is right?
Comment 18 Quim Gil 2014-09-12 08:27:07 UTC
(In reply to Alvaro from comment #17)
> 
> > You are comparing time evolution with the backlog time distribution. These two
> time series are the same at the final value

Exactly, I'm just saying that in the graphs they are not.

In the first graph, the green line for August 2014 shows "Waiting for review: 994".

The second graph should show how these 994 are spread in time based on their submission date, from August 2014 backward, right?

08-14 230
07-14 208
06-14  57
05-14  60
04-14  33
03-14  20
02-14  39
01-14  19
12-13  13
11-13  15
10-13   7
09-13   3
08-13   8
07-13   5
06-13   1
05-13  12 (this date has no column but does have changesets!)

TOTAL 730

The data of September 2014 doesn't play any role here, because both data points are based in August 2014. I assume both metrics are counting the end of August 2014?
Comment 19 Alvaro 2014-09-12 10:46:56 UTC
(In reply to Quim Gil from comment #18)
> (In reply to Alvaro from comment #17)
> > 
> > You are comparing time evolution with the backlog time distribution. These two
> > time series are the same at the final value
> 
> Exactly, I'm just saying that in the graphs they are not.
> 
> In the first graph, the green line for August 2014 shows "Waiting for
> review: 994".
> 
> The second graph should show how these 994 are spread in time based on their
> submission date, from August 2014 backward, right?
> 
> 08-14 230
> 07-14 208
> 06-14  57
> 05-14  60
> 04-14  33
> 03-14  20
> 02-14  39
> 01-14  19
> 12-13  13
> 11-13  15
> 10-13   7
> 09-13   3
> 08-13   8
> 07-13   5
> 06-13   1
> 05-13  12 (this date has no column but does have changesets!)
> 
> TOTAL 730
> 
> The data of September 2014 doesn't play any role here, because both data
> points are based in August 2014. I assume both metrics are counting the end
> of August 2014?

Yes, both are counting the end of August 2014, but one of them is showing the number of reviews from August 2014 among the currently open reviews, and the other is showing the number of reviews that were open in August 2014 (some of which could already be closed now).
Comment 20 Alvaro 2014-09-12 10:51:12 UTC
Quim, Jesus has sent you a proposal for the definition of the metrics we are working on. Focusing on "Waiting for review", the way to detect it is to find the open reviews that do not have a "-1" with no patch uploaded after that "-1". In order to validate the current queries we need to close the definition and make sure the code implements it.

We are working with this proposed definition right now.
Comment 21 Alvaro 2014-09-12 20:09:02 UTC
With the proposed metric definition we have updated the "Waiting for review" metric in korma. You can check the new graph at:

http://korma.wmflabs.org/browser/gerrit_review_queue.html

There are now more reviews waiting for a reviewer than before with the simplified condition we have used: if the last change for a review is Code-Review -1, then the review is not waiting for a reviewer. In any other case, it is.

We are working on approaches to improve this method; probably we are counting some reviews as waiting for a reviewer that should not be counted. Consider the current line an upper bound on reviews waiting for a reviewer.
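
A minimal Python sketch of that simplified condition (the event representation here is invented for illustration; the production logic lives in our SQL queries):

# events: chronological (kind, value) pairs for one open review, e.g.
# ("patchset", 2) or ("Code-Review", -1). Hypothetical representation.
def waiting_for_reviewer(events):
    if not events:
        return True
    kind, value = events[-1]
    # Simplified rule: only a trailing Code-Review -1 disqualifies the review.
    return not (kind == "Code-Review" and value == -1)

print(waiting_for_reviewer([("patchset", 1), ("Code-Review", -1)]))  # False
print(waiting_for_reviewer([("Code-Review", -1), ("patchset", 2)]))  # True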
Comment 22 Quim Gil 2014-09-13 11:07:13 UTC
I will reply later today about the new definition, but let me comment here asap:

* We had agreed that "WIP" patches shouldn't be counted i.e. https://gerrit.wikimedia.org/r/#/c/134782/ "WIP: Have GettingStarted depend on CentralAuth"

* -2 in open changesets shouldn't be counted either, i.e. https://gerrit.wikimedia.org/r/#/c/158632/ "Mattflaschen 4:14 AM Patch Set 8: Code-Review-2"
Comment 23 Alvaro 2014-09-13 11:16:02 UTC
(In reply to Quim Gil from comment #22)
> I will reply later today about the new definition, but let me comment here
> asap:
> 
> * We had agreed that "WIP" patches shouldn't be counted i.e.
> https://gerrit.wikimedia.org/r/#/c/134782/ "WIP: Have GettingStarted depend
> on CentralAuth"

Ok, we will remove all current "WIP" reviews from waiting for reviewer. Reviews that are in "WIP" now but were not in the past will be counted as "WIP" for all of their history. Reviews that are not in "WIP" now but were in "WIP" in the past will be counted as non-"WIP" for all of their history. We can improve this later if needed.

> 
> * -2 in open changesets shouldn't be counted either, i.e.
> https://gerrit.wikimedia.org/r/#/c/158632/ "Mattflaschen 4:14 AM Patch Set
> 8: Code-Review-2"

Ok, we will include this condition also.
Comment 24 Quim Gil 2014-09-13 13:37:20 UTC
I have fine tuned the description proposed at https://www.mediawiki.org/wiki/Community_metrics#Backlog_of_open_changesets_.28monthly_snapshots.29 -- see the history of the page for the specific changes.

The strings proposed for the title of the chart and the blue/green lines are good and can be implemented. The descriptions can be used as the help shown when clicking "?".

I think the safest parameters to count the "Waiting for review" (green line) are:

Status == "Review in Progress" AND
NOT "WIP" AND
Code-Review AND	Verified == (blank OR +1)

The "blank OR +1" is a better filter than the opposite, because Gerrit can have -1, -2, and even +2 but "Can Merge No" for some reason external to the Gerrit change.
Comment 25 Alvaro 2014-09-13 18:30:15 UTC
Ok, right now what is implemented is 

NO "WIP" AND NOT Code-Review-2 NOT AND Code-Review-1

and this is what you have now in production.

All the data is checked, so the first graph (green line) and the second graph (green bars) have coherent data.

We will think about your last filtering proposal and compare it with the current one. To implement the filtering we should review the fields, because some fields are easy to use in SQL but others require more work.
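
A rough Python paraphrase of the implemented condition (not the actual SQL; the review fields are invented for illustration), similar to the sketch in comment 21 but adding the WIP and -2 conditions:

# review: hypothetical summary of one open changeset.
def waiting_for_review(review):
    return (not review["is_wip"]                   # NOT "WIP"
            and review["last_code_review"] != -2   # NOT Code-Review-2
            and review["last_code_review"] != -1)  # NOT Code-Review-1

print(waiting_for_review({"is_wip": False, "last_code_review": 0}))  # True
print(waiting_for_review({"is_wip": True, "last_code_review": 0}))   # False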
Comment 26 Alvaro 2014-09-13 18:30:43 UTC
(In reply to Quim Gil from comment #24)
> I have fine tuned the description proposed at
> https://www.mediawiki.org/wiki/Community_metrics#Backlog_of_open_changesets_.
> 28monthly_snapshots.29 -- see the history of the page for the specific
> changes.
> 
> The strings proposed for the title of the chart and the blue/green lines are
> good and can be implemented. The descriptions can be used as the help shown
> when clicking "?".
> 
> I think the safest parameters to count the "Waiting for review" (green line)
> are:
> 
> Status == "Review in Progress" AND
> NOT "WIP" AND
> Code-Review AND	Verified == (blank OR +1)

Is this Code-Review condition missing something?

> 
> The "blank OR +1" is a better filter than the opposite, because Gerrit can
> have -1, -2, and even +2 but "Can Merge No" for some reason external to the
> Gerrit change.
Comment 27 Quim Gil 2014-09-13 20:27:34 UTC
(In reply to Alvaro from comment #25)
> Ok, right now what is implemented is 
> 
> NO "WIP" AND NOT Code-Review-2 NOT AND Code-Review-1
> 
> and this is what you have now in production.

Note that apart from Code-Review (human evaluation) there is Verified (Jenkins). -1 (if it exists here, not sure) and -2 in Verified should be discarded in "Waiting for review".

See for instance https://gerrit.wikimedia.org/r/#/c/22167/ (no Code-Review, but -2 from Jenkins in "Verified").

All the rest in this chart looks good!

Question: where do you store these MySQL queries? I can only see the commits from the Owl Bot at https://github.com/Bitergia/mediawiki-dashboard/commits/master but I don't know where the bot gets the data from.
Comment 28 Alvaro 2014-09-13 20:43:24 UTC
(In reply to Quim Gil from comment #27)
> (In reply to Alvaro from comment #25)
> > Ok, right now what is implemented is 
> > 
> > NO "WIP" AND NOT Code-Review-2 NOT AND Code-Review-1
> > 
> > and this is what you have now in production.
> 
> Note that apart from Code-Review (human evaluation) there is Verified
> (Jenkins). -1 (if it exists here, not sure) and -2 in Verified should be
> discarded in "Waiting for review".
> 
> See for instance https://gerrit.wikimedia.org/r/#/c/22167/ (no Code-Review,
> but -2 from Jenkins in "Verified").

Ok, we can add these new events and try to include them in the SQL queries.

> 
> All the rest in this chart looks good!

Great!

> 
> Question: where do you store these MySQL queries? I can only see the commits
> from the Owl Bot at
> https://github.com/Bitergia/mediawiki-dashboard/commits/master but I don't
> know where the bot gets the data from.

What you are seeing in this repository is the activity directly in the dashboard (HTML+CSS+JS+JSON metrics), not the code used to generate the metrics (in JSON format).

This code is in GrimoireLib and we have a branch for Wikimedia that is synced with master (main product) frequently. 

In order to see this SQL, the latest commits are:

https://github.com/VizGrimoire/GrimoireLib/commit/8fe792fcb560a597df0d5f239e581eeeb89d3f5c

and we are modifying the SCR (Gerrit) metrics that are included from scr_metrics.py. In these commits you can see how the filters for "-2" and not-WIP have been added.
Comment 29 Alvaro 2014-09-15 09:25:27 UTC
Quim, we plan to add the Verified -1 and -2 filtering today, but the generation of the new metrics won't be available until tonight.

As soon as it is available, I will inform you.
