Last modified: 2014-10-16 01:51:23 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T65800, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 63800 - Page generators not working with Wikidata
Status: RESOLVED FIXED
Product: Pywikibot
Classification: Unclassified
Component: pagegenerators
Version: core-(2.0)
Hardware: All All
Importance: Unprioritized major
Target Milestone: ---
Assigned To: John Mark Vandenberg
Depends on:
Blocks:

Reported: 2014-04-11 00:49 UTC by sofardamngood@reallymymail.com
Modified: 2014-10-16 01:51 UTC (History)

Description sofardamngood@reallymymail.com 2014-04-11 00:49:03 UTC
The pagegenerators module has been broken for Wikidata scripts like harvest_template or claimit since February. The revision at http://git.wikimedia.org/blob/pywikibot%2Fcore.git/b9ddecb363a1c208b507dbfe5bc0774dfb7cd253/pywikibot%2Fpagegenerators.py is the last working version.

A command like
python pwb.py claimit -family:wikipedia -lang:en -transcludes:'Infobox video game' P19 Q30
is supposed to create a generator with pages transcluding the template on the given Wikipedia, but the current version of pagegenerators ignores the arguments and tries to fetch the pages from wikidatawiki instead, which of course fails.

More information about this bug is available here: https://de.wikipedia.org/w/index.php?title=Benutzer_Diskussion:Xqt&oldid=129234287#Page_generators
Comment 1 John Mark Vandenberg 2014-05-24 15:14:31 UTC
My guess is that you have your user-config.py family & lang set to wikidatawiki.

The problem is that these scripts, and many others, instantiate the GeneratorFactory class from pagegenerators.py before calling pywikibot.handleArgs(). handleArgs is where the command line family & lang are parsed and set up. GeneratorFactory instantiates a default site object in its constructor, which means handleArgs must have completed before the GeneratorFactory is instantiated.

If your user-config.py was defaulted to wikidatawiki, the generator factory would create generators against wikidatawiki.
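
To illustrate the ordering problem, here is a toy model. It is not pywikibot's code: config, handle_args, GeneratorFactory and run_script are simplified stand-ins invented for this sketch, and the ("wikidata", "wikidata") default is an assumed user-config.py setting.

```python
# Toy model only: these names mimic pywikibot but are NOT its real code.
# Assumed user-config.py defaults pointing at Wikidata.
config = {"family": "wikidata", "lang": "wikidata"}


def handle_args(args):
    """Apply global -family:/-lang: arguments to config; return the rest."""
    local_args = []
    for arg in args:
        if arg.startswith("-family:"):
            config["family"] = arg[len("-family:"):]
        elif arg.startswith("-lang:"):
            config["lang"] = arg[len("-lang:"):]
        else:
            local_args.append(arg)
    return local_args


class GeneratorFactory:
    """Captures the default site once, at construction time."""

    def __init__(self):
        self.site = (config["family"], config["lang"])


def run_script(args):
    """Mimic the broken order: factory first, argument parsing second."""
    gen_factory = GeneratorFactory()  # too early: command line not parsed yet
    handle_args(args)                 # updates config, but the factory is stale
    return gen_factory.site
```

In this model, run_script(["-family:wikipedia", "-lang:en", "-transcludes:Infobox video game"]) returns the Wikidata default rather than English Wikipedia, because the factory read the configuration before the command line was applied.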

It looks like this regression was caused by bug 54540 / https://gerrit.wikimedia.org/r/#/c/112436/

Before that change, a default site object was instantiated for each argument that the GeneratorFactory parsed.

There are three ways I can see to fix this:
1. The GeneratorFactory obtains a default site object for each argument again, if a site object wasn't provided in the constructor.
2. Change all the scripts to call pywikibot.handleArgs() before instantiating a GeneratorFactory (i.e. the same as delete.py).
3. bot.handleArgs is called transparently from the GeneratorFactory constructor, or the Site constructor, with nonGlobalArgs cached to be processed later when bot.handleArgs is called a second time.
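
As a rough sketch of what option 2 amounts to, the toy model below parses the global arguments first and only then builds the factory. All names here (config, handle_args, GeneratorFactory, main) are hypothetical stand-ins rather than pywikibot's real implementation, and the Wikidata default is an assumed user-config.py setting.

```python
# Toy model only: simplified stand-ins, NOT pywikibot's real code.
# Assumed user-config.py defaults pointing at Wikidata.
config = {"family": "wikidata", "lang": "wikidata"}


def handle_args(args):
    """Apply global -family:/-lang: arguments to config; return the rest."""
    local_args = []
    for arg in args:
        if arg.startswith("-family:"):
            config["family"] = arg[len("-family:"):]
        elif arg.startswith("-lang:"):
            config["lang"] = arg[len("-lang:"):]
        else:
            local_args.append(arg)
    return local_args


class GeneratorFactory:
    """Captures the default site once, at construction time."""

    def __init__(self):
        self.site = (config["family"], config["lang"])


def main(args):
    """Option 2: handle global arguments before creating the factory."""
    local_args = handle_args(args)    # command-line site is applied here
    gen_factory = GeneratorFactory()  # now picks up the command-line site
    return gen_factory.site, local_args
```

With this order, the factory's default site follows the command line no matter where -family: and -lang: appear among the arguments, and the script-specific arguments are left over for the script to process.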

Note that there has been the possibility of pagegenerators pulling in pages from multiple wikis using args:

 ... -family:wikipedia -lang:nl -transcludes:'Taxobox' -lang:en -transcludes:'Taxobox' 

However it also means the following are not identical:

 ... -family:wikipedia -lang:nl -transcludes:'Taxobox'
 ... -transcludes:'Taxobox' -lang:en -family:wikipedia


Options 2 and 3 above would prevent that hack from working, but would mean the order of global arguments is not important.
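
The order dependence can be shown with a small sketch of per-argument site resolution. The function generators_per_argument and its Wikidata default are assumptions made up for illustration; pywikibot's actual argument handling is more involved.

```python
def generators_per_argument(args, default=("wikidata", "wikidata")):
    """Pair each -transcludes: argument with the site in effect when it is read.

    Hypothetical helper: the site is looked up again for every generator
    argument, so -family:/-lang: only affect arguments that follow them.
    """
    family, lang = default
    result = []
    for arg in args:
        if arg.startswith("-family:"):
            family = arg.split(":", 1)[1]
        elif arg.startswith("-lang:"):
            lang = arg.split(":", 1)[1]
        elif arg.startswith("-transcludes:"):
            result.append((family, lang, arg.split(":", 1)[1]))
    return result


# Two wikis in one run: each -transcludes: sees the most recent -lang:.
multi = generators_per_argument(
    ["-family:wikipedia", "-lang:nl", "-transcludes:Taxobox",
     "-lang:en", "-transcludes:Taxobox"])

# Global arguments given last arrive too late for the generator argument.
late = generators_per_argument(
    ["-transcludes:Taxobox", "-lang:en", "-family:wikipedia"])
```

Here multi pairs the template with nl and then en Wikipedia, while late pairs it with the assumed default site, which is why the two argument orders are not equivalent under per-argument lookup.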

Option 3 makes the global arguments effective in any script which doesn't currently call handleArgs. There are no scripts in core that this would apply to, but scripts in the wild may break as a result if they have 're-purposed' a global argument name. (The three scripts in core which don't call handleArgs also don't use page generators.)

Option 2 looks to be the most efficient and the best at keeping the code self-documenting. I'm happy to do any of the options, or other options I haven't thought about.
Comment 2 Gerrit Notification Bot 2014-05-25 02:13:16 UTC
Change 135287 had a related patch set uploaded by John Vandenberg:
Bug 63800: Call handleArgs before GeneratorFactory

https://gerrit.wikimedia.org/r/135287
Comment 3 Gerrit Notification Bot 2014-05-25 09:37:30 UTC
Change 135287 merged by jenkins-bot:
Bug 63800: Call handleArgs before GeneratorFactory

https://gerrit.wikimedia.org/r/135287
