Last modified: 2014-01-29 16:30:07 UTC
When site.preloadpages() (def preloadpages(self, pagelist, groupsize=50, templates=False, langlinks=False)) is called with templates=True and langlinks=True, not all language links and templates are returned:

    import pywikibot
    site = pywikibot.Site('en', 'wikipedia')
    page = pywikibot.Page(site, 'Main Page')
    for p in site.preloadpages([page], templates=True, langlinks=True):
        pass
    print 'p._templates', len(page._templates)
    print 'p._langlinks', len(page._langlinks)

There are actually more; see https://en.wikipedia.org/w/api.php?maxlag=5&format=jsonfm&rvprop=ids|flags|timestamp|user|comment|content&prop=revisions|info|categoryinfo|templates|langlinks&titles=Main+Page&meta=userinfo&indexpageids=&action=query&uiprop=blockinfo|hasmsg
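Output:

    Retrieving 1 pages from wikipedia:en.
    p._templates 10
    p._langlinks 10

Both counts stop at exactly 10, which is the API's default value for tllimit (prop=templates) and lllimit (prop=langlinks).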
The actual query used is https://en.wikipedia.org/w/api.php?maxlag=5&format=json&rvprop=ids%7Cflags%7Ctimestamp%7Cuser%7Ccomment%7Ccontent&prop=revisions%7Cinfo%7Ccategoryinfo%7Ctemplates%7Clanglinks&titles=Main+Page&meta=userinfo&indexpageids=&action=query&uiprop=blockinfo%7Chasmsg, i.e.:

    maxlag: 5
    format: json
    rvprop: ids|flags|timestamp|user|comment|content
    prop: revisions|info|categoryinfo|templates|langlinks
    titles: Main Page
    meta: userinfo
    indexpageids: (no value)
    action: query
    uiprop: blockinfo|hasmsg

It's clear that not all results are returned (see the query-continue section of the response), BUT according to Yuri, the continuation style used here is broken: it is https://www.mediawiki.org/wiki/API:Legacy_Query_Continue instead of https://www.mediawiki.org/wiki/API:Query#Continuing_queries.
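For a query like this one (no generator), following legacy query-continue could look roughly like the sketch below. It is not pywikibot code: the function name and the placeholder continuation values are hypothetical. It rebuilds each request from the original parameters plus the values in query-continue, and it assumes that a continuable module (templates, langlinks) that no longer appears in query-continue is finished, so it is dropped from prop instead of being fetched again from the start.

```python
from typing import Dict, Iterable, Optional

Params = Dict[str, str]


def legacy_next_request(
    base_params: Params,
    response: dict,
    continuable: Iterable[str] = ("templates", "langlinks"),
) -> Optional[Params]:
    """Build the next request from a legacy (query-continue) response.

    Assumes a query without a generator, so every continuation value can be
    merged into the next request. Returns None when nothing is left to fetch.
    """
    query_continue = response.get("query-continue")
    if not query_continue:
        return None
    # Start from the original request so stale continuation values from
    # earlier rounds are not carried along.
    next_params = dict(base_params)
    for module_params in query_continue.values():
        next_params.update(module_params)
    # Assumption: a continuable module missing from query-continue is done;
    # drop it from prop so it does not restart from its first result.
    finishable = set(continuable)
    next_params["prop"] = "|".join(
        module
        for module in base_params["prop"].split("|")
        if module not in finishable or module in query_continue
    )
    return next_params


# Hypothetical responses: the continuation values are placeholders.
base = {
    "action": "query",
    "titles": "Main Page",
    "prop": "revisions|info|templates|langlinks",
}
first = {"query-continue": {"templates": {"tlcontinue": "TL-1"},
                            "langlinks": {"llcontinue": "LL-1"}}}
second = {"query-continue": {"langlinks": {"llcontinue": "LL-2"}}}

round2 = legacy_next_request(base, first)   # both modules continue
round3 = legacy_next_request(base, second)  # templates finished, dropped from prop
done = legacy_next_request(base, {})        # nothing left: None
```

Note that revisions and info are still requested on every round in this sketch; dropping them after the first round would avoid fetching the page content repeatedly.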
Is it an option to migrate to https://www.mediawiki.org/wiki/API:Query#Continuing_queries? This is supported only from MediaWiki 1.21 onward.
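With the newer mechanism the client does not need to know which module produced which value: it re-sends the original request plus whatever is in the response's "continue" block, until that block disappears. A minimal sketch, assuming a hypothetical `send` callable that performs the HTTP request and returns the decoded JSON (the canned responses and tokens below are made up); the empty continue parameter on the first request asks for the new format.

```python
from typing import Callable, Dict, Iterator, List

Params = Dict[str, str]


def query_with_continue(send: Callable[[Params], dict], request: Params) -> Iterator[dict]:
    """Yield every response of a query, following new-style continuation.

    `send` is a hypothetical transport: it takes the request parameters and
    returns the decoded JSON response.
    """
    base = dict(request)
    base["continue"] = ""  # request new-style continuation
    last_continue: Params = {}
    while True:
        # Original request plus only the most recent "continue" block.
        params = dict(base)
        params.update(last_continue)
        result = send(params)
        yield result
        if "continue" not in result:
            break
        last_continue = result["continue"]


# Demo with canned responses; the continuation values are placeholders.
canned = iter([
    {"continue": {"tlcontinue": "TL-1", "llcontinue": "LL-1", "continue": "||"}},
    {"continue": {"llcontinue": "LL-2", "continue": "||"}},
    {"batchcomplete": ""},
])
sent: List[Params] = []


def fake_send(params: Params) -> dict:
    sent.append(params)
    return next(canned)


responses = list(query_with_continue(
    fake_send, {"action": "query", "titles": "Main Page", "prop": "templates|langlinks"}))
```

Each request is rebuilt from the original parameters, so a module that has finished (tlcontinue in the third request above) is simply no longer mentioned.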
After re-reading the Legacy Query Continue page, I think supporting it in this case is not a huge hassle: we don't use a generator, so there is no need to separate the different query-continue parameters.
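For contrast, this is roughly the separation a generator query would need under legacy continuation, based on our reading of the Legacy Query Continue page: while prop modules still have data for the current batch of pages, the generator's previous continuation is kept and its new value is ignored; only when no prop values remain is the generator advanced and the prop continuations dropped. The function and the placeholder values are hypothetical, and the sketch ignores prop modules that finish at different times.

```python
from typing import Dict, Optional, Tuple

Params = Dict[str, str]


def legacy_next_with_generator(
    base: Params, generator_state: Params, response: dict, generator: str
) -> Optional[Tuple[Params, Params]]:
    """Return (next request, generator continuation in effect), or None when done.

    `generator` is the module used as generator (e.g. "allpages"), which is
    also its key in the legacy query-continue block.
    """
    query_continue = response.get("query-continue", {})
    prop_continue = {m: v for m, v in query_continue.items() if m != generator}
    if prop_continue:
        # Prop modules still have data for the current batch of pages:
        # keep the generator where it was and advance only the prop modules.
        next_params = {**base, **generator_state}
        for module_params in prop_continue.values():
            next_params.update(module_params)
        return next_params, generator_state
    if generator in query_continue:
        # The batch is complete: advance the generator; the prop modules
        # start afresh on the new batch because we rebuild from `base`.
        new_state = dict(query_continue[generator])
        return {**base, **new_state}, new_state
    return None


base = {"action": "query", "generator": "allpages", "gaplimit": "2", "prop": "templates"}
# Hypothetical responses; continuation values are placeholders.
mixed = {"query-continue": {"allpages": {"gapcontinue": "B"},
                            "templates": {"tlcontinue": "TL-1"}}}
gen_only = {"query-continue": {"allpages": {"gapcontinue": "B"}}}

step1 = legacy_next_with_generator(base, {}, mixed, "allpages")
step2 = legacy_next_with_generator(base, step1[1], gen_only, "allpages")
finished = legacy_next_with_generator(base, step2[1], {}, "allpages")
```

Without a generator, the first branch is the only one that matters, which is why the non-generator case can merge all values at once.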
There are 2 issues:
1) the query does not continue, because self.continuekey is not recognized (see https://bugzilla.wikimedia.org/show_bug.cgi?id=55193);
2) even if it did, multiple chunks would be yielded for each page, and api.update_page() only records the last one returned.
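One way to address the second issue is to accumulate the chunks per page before updating the page object, so each page gets a single update with the complete lists instead of one update per chunk. The sketch below is hypothetical and is not the merged patch: it assumes the legacy JSON layout (query.pages keyed by page id), concatenates only the listed keys, and lets the latest chunk win for everything else.

```python
from typing import Dict, Iterable, Sequence


def merge_page_chunks(
    chunks: Iterable[dict],
    list_keys: Sequence[str] = ("templates", "langlinks"),
) -> Dict[str, dict]:
    """Combine continuation chunks into one dict per page id.

    Lists under `list_keys` are concatenated across chunks; any other value
    is overwritten by the latest chunk.
    """
    merged: Dict[str, dict] = {}
    for chunk in chunks:
        pages = chunk.get("query", {}).get("pages", {})
        for pageid, pagedata in pages.items():
            target = merged.setdefault(pageid, {})
            for key, value in pagedata.items():
                if key in list_keys:
                    # Copy into our own list so the input chunks stay untouched.
                    target.setdefault(key, []).extend(value)
                else:
                    target[key] = value
    return merged


# Two hypothetical chunks for the same page (ids and titles are made up).
chunk1 = {"query": {"pages": {"1": {
    "pageid": 1, "title": "Main Page",
    "templates": [{"ns": 10, "title": "Template:A"}],
    "langlinks": [{"lang": "de", "*": "Title-de"}],
}}}}
chunk2 = {"query": {"pages": {"1": {
    "pageid": 1, "title": "Main Page",
    "templates": [{"ns": 10, "title": "Template:B"}],
}}}}

merged = merge_page_chunks([chunk1, chunk2])
# merged["1"]["templates"] now holds both Template:A and Template:B.
```

Restricting concatenation to known list keys avoids duplicating data such as revisions if a module is returned again in a later chunk.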
Change 110067 had a related patch set uploaded by Mpaa: Bug 60206 - site.preloadpages does not preload all links and templates https://gerrit.wikimedia.org/r/110067
Change 110067 merged by jenkins-bot: Bug 60206 - site.preloadpages does not preload all links and templates https://gerrit.wikimedia.org/r/110067