Last modified: 2013-06-12 00:25:29 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T45888, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 43888 - Batch API requests
Status: RESOLVED WONTFIX
Product: Parsoid
Classification: Unclassified
Component: token-stream transforms
Version: unspecified
Hardware: All
OS: All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Gabriel Wicke
Keywords: performance
Depends on:
Blocks:
Reported: 2013-01-11 22:32 UTC by Gabriel Wicke
Modified: 2013-06-12 00:25 UTC (History)
CC List: 8 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Gabriel Wicke 2013-01-11 22:32:04 UTC
We should amortize per-request overheads by batching API requests.

AFAIK there is no generic batching support in the API currently, and adding it would probably be a bit too time-consuming for now. Instead we can add a unique string as a separator, for example something like 

<nowiki>d41d8cd98f00b204e9800998ecf8427e</nowiki>

This would work for action=expandtemplates and action=parse, which are the main workarounds we currently use. In the longer term we should switch to explicit methods that don't involve parsing wikitext, which is probably also a good moment to add real batching support.
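
A rough sketch of how a client could use such a separator against the existing action=expandtemplates module, assuming a Python client with the requests library; the endpoint URL and the JSON response shape shown here are assumptions for illustration, not part of this proposal:

import requests

API_URL = "https://en.wikipedia.org/w/api.php"  # illustrative endpoint
# Unique string that is very unlikely to appear in expanded output
SEPARATOR = "d41d8cd98f00b204e9800998ecf8427e"

def expand_templates_batch(snippets):
    """Expand several wikitext snippets in one action=expandtemplates call.

    The snippets are joined with SEPARATOR, expanded in a single request,
    and the expanded text is split on the same separator afterwards."""
    joined = ("\n" + SEPARATOR + "\n").join(snippets)
    resp = requests.post(API_URL, data={
        "action": "expandtemplates",
        "text": joined,
        "format": "json",
    }).json()
    # Response shape assumed here: {"expandtemplates": {"*": "<expanded text>"}}
    expanded = resp["expandtemplates"]["*"]
    return [part.strip("\n") for part in expanded.split(SEPARATOR)]

# Example: expand two template calls with a single API round trip
results = expand_templates_batch(["{{echo|foo}}", "{{echo|bar}}"])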

Decisions about batch sizes could be based on wikitext source size initially (based on the assumption that templates with a bazillion parameters also take longer to expand). A fixed number of templates per batch would be another simple alternative. Really fancy batching could use stats of previous processing times (returned by the PHP preprocessor per transclusion and stored in HTML).
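
One simple shape the size-based grouping could take, purely as an illustration (the 10 kB threshold below is an arbitrary assumption):

def group_by_source_size(transclusions, max_batch_bytes=10 * 1024):
    """Group transclusion wikitext strings into batches whose combined
    source size stays below max_batch_bytes (arbitrary threshold)."""
    batches, current, current_size = [], [], 0
    for src in transclusions:
        size = len(src.encode("utf-8"))
        if current and current_size + size > max_batch_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(src)
        current_size += size
    if current:
        batches.append(current)
    return batches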

Ideally we would also enforce separation between batched requests to avoid non-deterministic behavior for stateful extensions.
Comment 1 Yuri Astrakhan 2013-02-03 21:30:05 UTC
action=query was built specifically to support batching, but other actions might choose not to implement it. Could you give a very specific example of:
* What the final API request should look like?
* What you expect in return?
Comment 2 Gabriel Wicke 2013-02-03 21:51:47 UTC
For us, the main use cases are template expansion using the PHP preprocessor and the expansion of extension tags. Currently, both of these are only available with a detour through the PHP parser (action=expandtemplates and action=parse respectively), so we'd like to add direct API endpoints to perform these actions.

A general batching mechanism would probably accept a posted JSON array, with each member being an object containing the API arguments for one action (action etc). The return value would be a JSON array with results corresponding to the request array. Instead of relying on positions in an array, the client could also supply UIDs for requests which are then mirrored in the result. This could be achieved with a reqid member in each request object.
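
To make the proposed shape a bit more concrete, a hypothetical payload and the client-side correlation step might look roughly like this; note that no such batch action exists in the API today, so the action name, field names and response shape below are made up for illustration:

import json
import requests

API_URL = "https://en.wikipedia.org/w/api.php"  # illustrative endpoint

# Hypothetical batch payload: one object per sub-request, each carrying its
# own API arguments plus a client-chosen reqid mirrored in the result.
batch = [
    {"reqid": "r1", "action": "expandtemplates", "text": "{{echo|foo}}"},
    {"reqid": "r2", "action": "parse", "text": "<ref>bar</ref>"},
]

resp = requests.post(API_URL, data={
    "action": "batch",               # hypothetical action, does not exist
    "requests": json.dumps(batch),
    "format": "json",
}).json()

# Match results back to requests via the mirrored reqid rather than position.
results_by_reqid = {r["reqid"]: r for r in resp["batch"]}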

The problem with general batching is that state in MediaWiki is not very well defined. Several extensions, for example, keep internal state between calls to tag hooks, and resetting that state between individual requests in a batch would probably produce a similar overhead to the regular per-request setup (30+ms).

In Parsoid, we can work around this issue for our specific use cases by maintaining a whitelist of stateless extensions which can be mixed in a batch without producing non-deterministic output. Non-whitelisted extensions would need to be expanded in a single action=parse request containing all extension tags for that particular extension in page order.
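
A minimal sketch of that client-side partitioning, with a made-up whitelist and a simple (name, wikitext) tag representation:

# Hypothetical whitelist of extensions known not to keep state between calls
STATELESS_EXTENSIONS = {"nowiki", "math", "syntaxhighlight"}

def partition_extension_tags(tags):
    """Split extension tags (name, wikitext) into those that can safely be
    mixed in a batch and those that must be expanded per extension,
    preserving page order within each group."""
    batchable = []
    per_extension = {}  # extension name -> its tags in page order
    for name, wikitext in tags:
        if name in STATELESS_EXTENSIONS:
            batchable.append((name, wikitext))
        else:
            per_extension.setdefault(name, []).append((name, wikitext))
    return batchable, per_extension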
Comment 3 Gabriel Wicke 2013-02-21 18:56:18 UTC
Adding custom batching support just for template expansion and extension tag calls would really be a workaround, and would add complexity to both clients and the server. If the performance of API requests could be improved in general, we could and should probably avoid that complex workaround in favor of the general solution.

In the longer term, our HTML storage and incremental re-parsing / expansion plans should reduce the number of template expansions and extension calls to a fraction of the level we'll produce initially. The round-trip testing we are currently performing is already close to the edit rate of the English Wikipedia and has not created any issues on the API cluster. A Parsoid installation tracking all edits on all wikis would probably create 2-3 times the number of API requests, which should still be doable for the current API cluster.

Overall it seems to be a good idea to hold off on a custom batching solution until it is clear that it is really needed and general API performance improvements are not possible.
Comment 4 Gabriel Wicke 2013-06-12 00:25:29 UTC
The current performance numbers, combined with several optimizations aimed at avoiding API requests, have lessened the need for API request batching on our end.

Speeding up individual API requests is a general architectural goal, and should be addressed in that context. Closing this bug as WONTFIX on our end.
