Last modified: 2013-06-12 00:25:29 UTC
We should amortize per-request overheads by batching API requests. AFAIK there is no generic batching support in the API currently, and adding it would probably be a bit too time-consuming for now. Instead we can join inputs with a unique string as a separator, for example something like <nowiki>d41d8cd98f00b204e9800998ecf8427e</nowiki>. This would work for action=expandtemplates and action=parse, which are the main workarounds we currently use. In the longer term we should switch to explicit methods that don't involve parsing wikitext, which would probably also be a good moment to add real batching support. Decisions about batch sizes could initially be based on wikitext source size (on the assumption that templates with a bazillion parameters also take longer to expand). A fixed number of templates per batch would be another simple alternative. Really fancy batching could use statistics of previous processing times (returned by the PHP preprocessor per transclusion and stored in the HTML). Ideally we would also enforce separation between batched requests to avoid non-deterministic behavior for stateful extensions.
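The separator trick and the size-based batch heuristic could be sketched roughly as follows. This is a minimal illustration only: `expand` stands in for a single action=expandtemplates round trip and is purely hypothetical, as is the `chunk_by_size` helper and its character limit.

```python
import uuid

def batch_expand(snippets, expand):
    """Expand several wikitext snippets in one round trip by joining them
    with a separator that cannot plausibly occur in the output, then
    splitting the combined result back apart. `expand` stands in for one
    action=expandtemplates API call (hypothetical here)."""
    # A fresh random hex string is effectively guaranteed not to collide
    # with any real wikitext content.
    sep = uuid.uuid4().hex
    combined = sep.join(snippets)
    expanded = expand(combined)
    results = expanded.split(sep)
    # If an expansion somehow swallowed or altered a separator, the split
    # would no longer line up with the inputs.
    assert len(results) == len(snippets), "separator damaged by expansion"
    return results

def chunk_by_size(snippets, max_chars=50000):
    """Group snippets into batches whose combined source size stays under
    max_chars, on the assumption that bigger wikitext takes longer to
    expand. The 50000-character default is an arbitrary placeholder."""
    batches, current, size = [], [], 0
    for s in snippets:
        if current and size + len(s) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(s)
        size += len(s)
    if current:
        batches.append(current)
    return batches

# Usage with a dummy expander that rewrites one template name:
parts = batch_expand(["{{foo}}", "{{bar|x}}"],
                     lambda t: t.replace("foo", "FOO"))
# → ["{{FOO}}", "{{bar|x}}"]
```

Note that this client-side trick only gives deterministic results when every expansion in the batch is stateless, which is exactly why the whitelist idea below matters.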
action=query was built specifically to support batching, but other actions might choose not to implement it. Could you give a very specific example of: * what the final API request should look like? * what you expect in return?
For us, the main use cases are template expansion using the PHP preprocessor and the expansion of extension tags. Currently, both of these are only available with a detour through the PHP parser (action=expandtemplates and action=parse respectively), so we'd like to add direct API endpoints to perform these actions. A general batching mechanism would probably accept a posted JSON array, with each member being an object containing the API arguments for one action (action etc.). The return value would be a JSON array with results corresponding to the request array. Instead of relying on positions in the array, the client could also supply UIDs for requests which are then mirrored in the results. This could be achieved with a reqid member in each request object. The problem with general batching is that state in MediaWiki is not very well defined. Several extensions, for example, keep internal state between calls to tag hooks, and resetting that state between individual requests in a batch would probably produce overhead similar to the regular per-request setup (30+ms). In Parsoid, we can work around this issue for our specific use cases by maintaining a whitelist of stateless extensions which can be mixed in a batch without producing non-deterministic output. Non-whitelisted extensions would need to be expanded in a single action=parse request containing all extension tags for that particular extension in page order.
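To make the request/response shape concrete, here is a rough sketch of the proposed reqid scheme. Everything here is hypothetical: `build_batch` and `dispatch` are illustration-only helpers, and no such batch endpoint exists in the API today.

```python
import json

def build_batch(requests):
    """Attach a client-chosen reqid to each request object so results can
    be matched up regardless of response ordering. The wire format
    sketched here (a JSON array of per-action argument objects) is a
    proposal, not an existing MediaWiki API."""
    return [dict(req, reqid=i) for i, req in enumerate(requests)]

def dispatch(batch, handlers):
    """Stand-in for the server side: run each sub-request through a
    handler keyed by its action, mirroring the reqid into each result."""
    results = []
    for req in batch:
        handler = handlers[req["action"]]
        results.append({"reqid": req["reqid"], "result": handler(req)})
    return results

# Usage: two expandtemplates sub-requests posted as one JSON body.
batch = build_batch([
    {"action": "expandtemplates", "text": "{{foo}}"},
    {"action": "expandtemplates", "text": "{{bar}}"},
])
payload = json.dumps(batch)  # what the client would POST

# A dummy handler that just upper-cases the input text:
results = dispatch(batch, {"expandtemplates": lambda r: r["text"].upper()})
by_id = {r["reqid"]: r["result"] for r in results}
# → by_id[0] == "{{FOO}}", by_id[1] == "{{BAR}}"
```

Matching by reqid rather than array position keeps the protocol robust if the server ever wants to return results out of order or stream them as they complete.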
Adding custom batching support just for template expansion and extension tag calls would really be a workaround, and would add complexity to both clients and the server. If the performance of API requests could be improved in general, we should avoid that complex workaround in favor of the general solution. In the longer term, our HTML storage and incremental re-parsing / expansion plans should reduce the number of template expansions and extension calls to a fraction of the level we'll produce initially. The round-trip testing we are currently performing is already close to the edit rate of the English Wikipedia and did not create any issues on the API cluster. A Parsoid installation tracking all edits on all wikis would probably create 2-3 times the number of API requests, which should still be manageable for the current API cluster. Overall it seems to be a good idea to hold off on a custom batching solution until it is clear that it is really needed and general API performance improvements are not possible.
The current performance numbers, combined with several optimizations aimed at avoiding API requests, have lessened the need for API request batching on our end. Speeding up individual API requests is a general architectural goal, and should be addressed in that context. Closing this bug as WONTFIX on our end.