Last modified: 2014-01-29 16:30:07 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T62206, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 60206 - site.preloadpages does not preload all links and templates
Status: RESOLVED FIXED
Product: Pywikibot
Classification: Unclassified
Component: General
Version: core-(2.0)
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Pywikipedia bugs
Depends on:
Blocks:
Reported: 2014-01-18 15:50 UTC by Mpaa
Modified: 2014-01-29 16:30 UTC (History)

Description Mpaa 2014-01-18 15:50:21 UTC
When preloadpages(self, pagelist, groupsize=50, templates=False, langlinks=False) is called with templates=True and langlinks=True, not all links/templates are returned.

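import pywikibot

site = pywikibot.Site('en', 'wikipedia')
page = pywikibot.Page(site, 'Main Page')

# Exhaust the generator so the preloaded data is stored on the page object.
for p in site.preloadpages([page], templates=True, langlinks=True):
    pass
# _templates and _langlinks are the private caches filled by the preload.
print('p._templates %d' % len(page._templates))
print('p._langlinks %d' % len(page._langlinks))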

There are actually more; see https://en.wikipedia.org/w/api.php?maxlag=5&format=jsonfm&rvprop=ids|flags|timestamp|user|comment|content&prop=revisions|info|categoryinfo|templates|langlinks&titles=Main+Page&meta=userinfo&indexpageids=&action=query&uiprop=blockinfo|hasmsg
Comment 1 Mpaa 2014-01-18 15:51:22 UTC
Retrieving 1 pages from wikipedia:en.
p._templates 10
p._langlinks 10
Comment 2 Merlijn van Deen (test) 2014-01-18 15:56:02 UTC
The actual query used is https://en.wikipedia.org/w/api.php?maxlag=5&format=json&rvprop=ids%7Cflags%7Ctimestamp%7Cuser%7Ccomment%7Ccontent&prop=revisions%7Cinfo%7Ccategoryinfo%7Ctemplates%7Clanglinks&titles=Main+Page&meta=userinfo&indexpageids=&action=query&uiprop=blockinfo%7Chasmsg


i.e.
maxlag:        5
format:        json
rvprop:        ids|flags|timestamp|user|comment|content
prop:          revisions|info|categoryinfo|templates|langlinks
titles:        Main Page
meta:          userinfo
indexpageids:
action:        query
uiprop:        blockinfo|hasmsg


It's clear that not all results are returned (see the continue header), but according to Yuri, the continue header used here is broken (this is https://www.mediawiki.org/wiki/API:Legacy_Query_Continue instead of https://www.mediawiki.org/wiki/API:Query#Continuing_queries).
Comment 3 Mpaa 2014-01-18 18:54:23 UTC
Is it an option to migrate to https://www.mediawiki.org/wiki/API:Query#Continuing_queries? This is supported only in MediaWiki ≥ 1.21.
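
For reference, the newer continuation scheme could be followed roughly as in the sketch below. The names query_all and fetch are hypothetical and not part of Pywikibot; fetch stands in for whatever sends the request and returns the decoded JSON. The sketch assumes the server opts in to the new format when an empty continue parameter is sent and then returns a flat "continue" element until the result set is exhausted.

```python
def query_all(fetch, params):
    """Yield every response chunk for a query, following 'continue'.

    fetch is a hypothetical callable: dict of request parameters -> dict
    (decoded JSON response). It is an assumption of this sketch.
    """
    # An empty 'continue' parameter asks for the new-style continuation.
    request = dict(params)
    request["continue"] = ""
    while True:
        response = fetch(request)
        yield response
        if "continue" not in response:
            break  # nothing left to fetch
        # Start from the original parameters and add every value the
        # server handed back, including its own 'continue' marker.
        request = dict(params)
        request.update(response["continue"])
```

Each yielded chunk may hold only part of the data for a given page, so the caller still has to combine the chunks.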
Comment 4 Merlijn van Deen (test) 2014-01-18 18:59:07 UTC
After re-reading the Legacy Query Continue page, I think supporting that in this case is not a huge hassle - we don't use a generator, so there is no need to separate the different query-continue parameters...
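
A minimal sketch of that idea, assuming the legacy response carries a "query-continue" element with one sub-dictionary per module (for example templates and langlinks). The function name next_legacy_params is hypothetical and not Pywikibot code. Since no generator is involved, all continue values are simply merged into the next request.

```python
def next_legacy_params(params, response):
    """Return the parameters for the next request, or None when done.

    Assumes the legacy format:
    {"query-continue": {"templates": {"tlcontinue": "..."},
                        "langlinks": {"llcontinue": "..."}}}
    """
    cont = response.get("query-continue")
    if not cont:
        return None  # no further chunks
    merged = dict(params)  # leave the caller's dict untouched
    for module_values in cont.values():
        # Without a generator, every module's continue value can be
        # sent back together in one request.
        merged.update(module_values)
    return merged
```

One caveat of this simple merge: a module that finished earlier than the others may start over from the beginning and return entries that were already seen, so the caller may need to drop duplicates.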
Comment 5 Mpaa 2014-01-19 09:22:41 UTC
There are two issues.
1) The query does not continue because self.continuekey is not recognized (see https://bugzilla.wikimedia.org/show_bug.cgi?id=55193).
2) Even if it did, multiple chunks would be yielded for each page, and api.update_page() records only the last one returned.
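
The second issue could be addressed by accumulating the per-page data across chunks before storing it, roughly as sketched below. This is an illustration only, not the patch that was merged; merge_page_chunks is a hypothetical helper, and the chunk layout (query, pages, then a list under templates and langlinks) is assumed from the API query shown in Comment 2.

```python
def merge_page_chunks(chunks):
    """Combine templates and langlinks for each page across chunks.

    chunks is an iterable of decoded API responses. Returns a dict
    mapping page id -> {"templates": [...], "langlinks": [...]}.
    """
    merged = {}
    for chunk in chunks:
        pages = chunk.get("query", {}).get("pages", {})
        for pageid, pagedata in pages.items():
            entry = merged.setdefault(
                pageid, {"templates": [], "langlinks": []})
            for key in ("templates", "langlinks"):
                for item in pagedata.get(key, []):
                    # Skip entries repeated in a later chunk.
                    if item not in entry[key]:
                        entry[key].append(item)
    return merged
```

With the lists merged this way, the page object would be updated once per page instead of once per chunk.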
Comment 6 Gerrit Notification Bot 2014-01-28 21:33:54 UTC
Change 110067 had a related patch set uploaded by Mpaa:
Bug 60206 - site.preloadpages does not preload all links and templates

https://gerrit.wikimedia.org/r/110067
Comment 7 Gerrit Notification Bot 2014-01-29 16:30:07 UTC
Change 110067 merged by jenkins-bot:
Bug 60206 - site.preloadpages does not preload all links and templates

https://gerrit.wikimedia.org/r/110067
