Last modified: 2014-07-18 14:45:27 UTC
Today my bot encountered https://www.wikidata.org/wiki/Q3211336 . This contains the deleted property P288. This crashes the bot: Traceback (most recent call last): File "C:\pywikibot\core\article_to_category.py", line 97, in <module> main() File "C:\pywikibot\core\article_to_category.py", line 94, in main bot.run() File "C:\pywikibot\core\article_to_category.py", line 52, in run if not u'P910' in itemB.get().get('claims').keys(): File "C:\pywikibot\core\pywikibot\page.py", line 2571, in get c = Claim.fromJSON(self.repo, claim) File "C:\pywikibot\core\pywikibot\page.py", line 2776, in fromJSON if claim.getType() == 'wikibase-item': File "C:\pywikibot\core\pywikibot\page.py", line 2715, in getType self.type = self.repo.getPropertyType(self) File "C:\pywikibot\core\pywikibot\site.py", line 3533, in getPropertyType dtype = data['entities'][prop.getID().lower()]['datatype'] KeyError: u'p288' If you look at https://git.wikimedia.org/blob/pywikibot%2Fcore.git/a1d63c8ba608b87d3a3aff6acc8dbbc29085eb8f/pywikibot%2Fpage.py from line 2776 and furthere, you'll see that it tries to lookup the type of the property. That fails because the property is deleted. I don't think the lookup is needed, see https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q3211336&format=json : "P288": [ { "id": "q3211336$00f2cba0-4c06-ee8d-9598-d8c034b663ed", "mainsnak": { "snaktype": "value", "property": "P288", "datavalue": { "value": { "entity-type": "item", "numeric-id": 2304194 }, "type": "wikibase-entityid" } }, "type": "statement", "rank": "normal" } ] This should be used to construct the claim instead of the extra lookup
Unfortunately that won't work in some cases like commonsMedia or url. If you look at https://www.wikidata.org/w/api.php?action=wbgetentities&ids=q76&format=jsonfm: "property": "P18", "datavalue": { "value": "President Barack Obama.jpg", "type": "string" } We could probably implement a fallback for the ones that will work like entity, coordinate and time, and emit a warning like "P288 does not exist" This error is probably rare to run into since the datatype is cached forever, but most scripts need to re-cache everything because the ids went uppercase.
It looks like the problem mentioned in comment 1 is no longer an problem, as the JSON now includes the mainsnak datatype, whereas I assume it didnt previously based on comment 1. e.g. q76 JSON now includes "datatype" "P18": [ { "id": "q76$AA25BE21-FFCA-4C0A-A7B4-C041CBE549F7", "mainsnak": { "snaktype": "value", "property": "P18", "datatype": "commonsMedia", "datavalue": { "value": "President Barack Obama.jpg", "type": "string" } }, I ran into a very tightly related bug while loading items from an xml dump, without access to the wiki the dump was from - such as with a firewall in the way. If I understand this correctly, any time the JSON parser finds a 'datatype' in a snak, it can use that instead of doing a lookup. The JSON *may* not include this, in which a lookup is required, and if a lookup isnt possible then the code should raise an appropriate exception.
(In reply to John Mark Vandenberg from comment #2) > It looks like the problem mentioned in comment 1 is no longer an problem, as > the JSON now includes the mainsnak datatype, whereas I assume it didnt > previously based on comment 1. > Awesome! > If I understand this correctly, any time the JSON parser finds a 'datatype' > in a snak, it can use that instead of doing a lookup. The JSON *may* not > include this, in which a lookup is required, and if a lookup isnt possible > then the code should raise an appropriate exception. Yup, sounds good.
Change 137737 had a related patch set uploaded by Ricordisamoa: initialize PropertyPage directly with datatype https://gerrit.wikimedia.org/r/137737
Change 137737 merged by jenkins-bot: initialize Property directly with datatype https://gerrit.wikimedia.org/r/137737