Last modified: 2013-02-18 11:02:40 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T47090, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 45090 - Gerrit REST API's changes module doesn't support offset
Status: RESOLVED WORKSFORME
Product: Wikimedia
Classification: Unclassified
Component: Git/Gerrit
Version: wmf-deployment
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
https://gerrit-review.googlesource.co...
Keywords: upstream
Depends on:
Blocks:
Reported: 2013-02-17 08:54 UTC by MZMcBride
Modified: 2013-02-18 11:02 UTC
CC List: 4 users




Description MZMcBride 2013-02-17 08:54:53 UTC
I'm trying to gather metadata about every Gerrit changeset.

I read <https://gerrit.wikimedia.org/r/Documentation/rest-api-changes.html>, which states "If the n query parameter is supplied and additional changes exist that match the query beyond the end, the last change object has a _more_changes: true JSON field set. Callers can resume a query with the n query parameter, supplying the last change’s _sortkey field as the value."

Here's what I tried:

---
$ curl -s "https://gerrit.wikimedia.org/r/changes/?q=age:1second&n=500" | tail -20
    "project": "mediawiki/extensions/Polyglot",
    "branch": "master",
    "topic": "tidyup",
    "change_id": "I35ebc242fcf04e5b527631d6be67d1a8c78ef251",
    "subject": "Add method parameter documentation",
    "status": "NEW",
    "created": "2013-01-24 18:41:09.000000000",
    "updated": "2013-01-24 21:44:15.000000000",
    "_sortkey": "0022a6180000b1ff",
    "_number": 45567,
    "owner": {
      "name": "Reedy"
    },
    "labels": {
      "Verified": {},
      "Code-Review": {}
    },
    "_more_changes": true
  }
]

$ curl -s "https://gerrit.wikimedia.org/r/changes/?q=age:1second&n=0022a6180000b1ff"
"0022a6180000b1ff" is not a valid value for "-n"
---

I also tried other URL parameters, such as &sortkey=, &_sortkey=, &sortkey_after, and &resume_sortkey, but none of them worked.

After I discussed this issue with qchris in #gerrit on freenode, it seemed that Gerrit's search functionality is broken (or perhaps restricted). qchris pointed to this (non-working) search example:

---
$ curl -s "https://gerrit.wikimedia.org/r/changes/?q=status:merged+project:mediawiki/core+sortkey_after:m"
)]}'
[]
---
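The empty result above starts with the line )]}'. Gerrit prepends this "magic prefix" to its JSON REST responses as protection against cross-site script inclusion (XSSI), so a script must drop it before handing the body to a JSON parser. A minimal Python sketch (the name parse_gerrit_json is hypothetical):

```python
import json

# Gerrit prepends this line to JSON REST responses to guard against
# cross-site script inclusion (XSSI); it is not valid JSON itself.
XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body: str):
    """Strip Gerrit's magic prefix (if present) and decode the JSON."""
    if body.startswith(XSSI_PREFIX):
        # json.loads tolerates the newline left behind after the prefix.
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


if __name__ == "__main__":
    # The empty result from the sortkey_after:m query above.
    print(parse_gerrit_json(")]}'\n[]\n"))
```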

It's unclear whether this issue has a corresponding bug in Gerrit's bug tracker. As it stands, it appears to be impossible to pull metadata for more than 500 changesets from the Gerrit REST API.

Without the ability to specify an offset (and consequently retrieve information about more than 500 changesets), I'm unable to generate Gerrit reports ([[mw:Gerrit/Reports]]). :-(
Comment 1 christian 2013-02-17 11:43:40 UTC
Some more lines from #gerrit, showing how to fetch the required
changes through the search query string.

09:54 <qchris> Susan: I screwed up before. It seems thinking is easier after
         breakfast. sortkey is not the bug title but the change's sort key.
         Stupid me.
09:54 <qchris> So what it comes down to is, that you could fetch all changes
         like this:
09:54 <qchris> Fetch
         https://gerrit.wikimedia.org/r/changes/?q=status:merged+project:mediawiki/core+limit:3
09:55 <qchris> (For whatever value of status, project you are interested in. Limit
         is just to get nice small file to look at by hand. You can drop that)
09:55 <qchris> Look for the _sortkey field of the last object in the result list
09:55 <qchris> And the fetch
         https://gerrit.wikimedia.org/r/changes/?q=status:merged+project:mediawiki/core+limit:3+sortkey_before:LAST_SORTKEY_OF_PREVIOUS_REQUEST
09:55 <qchris> Where LAST_SORTKEY_OF_PREVIOUS_REQUEST is the last sort key of
         the previous request
09:55 <qchris> so something like 002327db0000c0f1


To also get the _more_changes field set, limit the number of results
with the URL parameter &n= instead of the limit: search operator.
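qchris's recipe can be turned into a loop: fetch a page, take the _sortkey of its last change, and ask for changes with sortkey_before: that key until an empty page comes back. This is a minimal Python sketch, not a definitive client. It assumes the sortkey_before: operator behaves as described in the IRC excerpt and that results come back newest first. build_query_url, iter_changes, and the injected fetch callable are hypothetical names and are not part of Gerrit.

```python
import json
import urllib.parse
from typing import Callable, Iterator, Optional

BASE = "https://gerrit.wikimedia.org/r/changes/"
XSSI_PREFIX = ")]}'"


def build_query_url(query: str, sortkey_before: Optional[str] = None) -> str:
    """Build a /changes/ URL, optionally resuming below a given sort key."""
    if sortkey_before:
        query = f"{query} sortkey_before:{sortkey_before}"
    # quote_plus turns spaces into '+'; keep ':' and '/' readable.
    return BASE + "?q=" + urllib.parse.quote_plus(query, safe=":/")


def iter_changes(query: str, fetch: Callable[[str], str]) -> Iterator[dict]:
    """Yield every change matching `query`, one page at a time.

    `fetch` takes a URL and returns the raw response body, e.g. a thin
    wrapper around urllib.request.urlopen (hypothetical, not shown).
    """
    last_sortkey = None
    while True:
        body = fetch(build_query_url(query, last_sortkey))
        if body.startswith(XSSI_PREFIX):
            body = body[len(XSSI_PREFIX):]
        page = json.loads(body)
        if not page:
            return  # an empty page: nothing older is left
        yield from page
        # Results are assumed newest first, so the last one has the lowest key.
        last_sortkey = page[-1]["_sortkey"]
```

Passing fetch in as a parameter keeps the paging logic testable without network access.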
Comment 2 christian 2013-02-17 11:45:49 UTC
I pushed a change to correct the REST API documentation upstream
https://gerrit-review.googlesource.com/#/c/42421/
Comment 3 MZMcBride 2013-02-18 04:17:23 UTC
Thank you very much for your help, Christian.

I failed to realize that &n= is distinct from &N= in Gerrit's REST API.

I also failed to realize that "sortkey_after" and "sortkey_before" existed (they're documented at the bottom of <https://gerrit.wikimedia.org/r/Documentation/user-search.html>). It might be nice to mention these (or cross-reference them) at <https://gerrit.wikimedia.org/r/Documentation/rest-api-changes.html>.

My current understanding is that these are equivalent:

* ?q=limit:[integer] and &n=[integer]
* ?q=sortkey_after:[sortkey] and &N=[sortkey]
* ?q=sortkey_before:[sortkey] and &P=[sortkey]

Marking this bug resolved/worksforme.
Comment 4 christian 2013-02-18 11:02:40 UTC
(In reply to comment #3)
> My current understanding is that these are equivalent:
> 
> * ?q=limit:[integer] and &n=[integer]

Yes, they are more or less equivalent. However, &n=[integer] provides
you with a "_more_changes" field, while ?q=limit:[integer] does not.

You can, however, work around this difference: keep the limiting
integer below your queryLimit, but ask for one more result than you
need. If that extra result comes back, more changes exist.

In some edge cases, Gerrit will even give you one more result than
your queryLimit allows for.
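A minimal Python sketch of that workaround. It assumes page_size + 1 stays within your queryLimit. limited_query_url and split_page are hypothetical helper names. Because the check is "more than page_size", the occasional bonus result mentioned above is also handled: it is trimmed rather than breaking the logic.

```python
import urllib.parse

BASE = "https://gerrit.wikimedia.org/r/changes/"


def limited_query_url(query: str, page_size: int) -> str:
    """Ask for one more change than we want, via the limit: search operator."""
    full = f"{query} limit:{page_size + 1}"
    return BASE + "?q=" + urllib.parse.quote_plus(full, safe=":/")


def split_page(changes: list, page_size: int) -> tuple:
    """Return (page, more): `more` is True if changes beyond this page exist.

    Anything past `page_size` (the probe result, or an occasional extra
    one from the server) is trimmed off the returned page.
    """
    more = len(changes) > page_size
    return changes[:page_size], more
```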

> * ?q=sortkey_after:[sortkey] and &N=[sortkey]
> * ?q=sortkey_before:[sortkey] and &P=[sortkey]

It's actually the other way round.
  ?q=sortkey_after corresponds to &P=
  ?q=sortkey_before corresponds to &N=

* sortkey is increasing for new changes.
* &N= is for the /N/ext page of search results (i.e.: older changes,
  lower sortkeys, hence sortkey_before)
* &P= is for the /P/revious page of search results (i.e.: newer
  changes, higher sortkeys, hence sortkey_after)
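The direction rule could be expressed in code like this. The sketch assumes sort keys are fixed-width lowercase hex strings, as in every example in this thread, so plain string comparison matches numeric order. The helper names are hypothetical. Using min/max instead of "first" or "last" avoids depending on the order of the results.

```python
BASE = "https://gerrit.wikimedia.org/r/changes/"


def next_page_url(page: list, n: int) -> str:
    """Older changes: continue below the lowest sort key on this page (&N=)."""
    lowest = min(c["_sortkey"] for c in page)
    return f"{BASE}?N={lowest}&n={n}"


def previous_page_url(page: list, n: int) -> str:
    """Newer changes: continue above the highest sort key on this page (&P=)."""
    highest = max(c["_sortkey"] for c in page)
    return f"{BASE}?P={highest}&n={n}"
```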

As confusing as this is already, there are further
differences. ?q=sortkey_after skips the first search result. So when
comparing
https://gerrit.wikimedia.org/r/changes/?P=00232fbd0000bc17&n=3
https://gerrit.wikimedia.org/r/changes/?q=sortkey_after:00232fbd0000bc17&n=3
you'll get something like
  sortkey           in ?q=sortkey_after:...?  in ?P=...?
  ...               ...                       ...
  002330130000c1d5  no                        no
  0023300f0000c1d8  no                        no
  00232fff0000bc7b  yes                       no
  00232ff10000c1d6  yes                       yes
  00232fec0000c1d3  yes                       yes
  00232fd50000bf53  no                        yes
  00232fbd0000bc17  <---- used sortkey

Additionally, if you supply a &n=[integer] parameter to limit the
number of results, the result set of a ?P= query has the
"_more_changes" key set on the first object, while a
?q=sortkey_after: query has it set on the last object.

(Neither the result skipping nor the shuffling of "_more_changes"
occurs for ?q=sortkey_before: or ?N= queries.)

Bottom line: When processing the data automatically, I'd use
&n=[integer] to obtain the "_more_changes" marker, but I would not
rely on getting at most [integer] results. Be prepared for one
additional result in the result set. Furthermore, I'd use the &N=
and &P= variants, keeping in mind that the "_more_changes" marker
need not be on the last object.
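Following that advice, a minimal Python sketch of a full crawl over older changes. It uses &n= for the "_more_changes" marker and &N= to resume, and it accepts pages with one extra result. It looks for the marker on any object instead of only the last one. Assumptions: page_url, iter_all_changes, and the injected fetch callable are hypothetical. &N= is given the lowest sort key on the current page. Sort keys are fixed-width lowercase hex strings, as in the examples above.

```python
import json
import urllib.parse
from typing import Callable, Iterator, Optional

BASE = "https://gerrit.wikimedia.org/r/changes/"
XSSI_PREFIX = ")]}'"


def page_url(query: str, n: int, resume: Optional[str]) -> str:
    """Build a /changes/ URL; &N= asks for the next (older) page."""
    params = {"q": query, "n": n}
    if resume:
        params["N"] = resume
    return BASE + "?" + urllib.parse.urlencode(params, safe=":/")


def iter_all_changes(query: str, n: int, fetch: Callable[[str], str]) -> Iterator[dict]:
    """Yield all matching changes, newest to oldest, page by page."""
    resume = None
    while True:
        body = fetch(page_url(query, n, resume))
        if body.startswith(XSSI_PREFIX):
            body = body[len(XSSI_PREFIX):]
        page = json.loads(body)
        # Do not assume at most n results; Gerrit may return one extra.
        yield from page
        # The marker may sit on any object, not only the last one.
        if not page or not any(c.get("_more_changes") for c in page):
            return
        resume = min(c["_sortkey"] for c in page)
```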
