Last modified: 2012-11-29 13:13:10 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T40234, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 38234 - wbsetitem api action returns invalid xml on error
Status: VERIFIED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataRepo
Version: unspecified
Hardware: All All
Importance: Unprioritized blocker
Target Milestone: ---
Assigned To: Wikidata bugs
Reported: 2012-07-07 18:17 UTC by merl
Modified: 2012-11-29 13:13 UTC (History)

Description merl 2012-07-07 18:17:54 UTC
My bot gets an error when parsing the response of wbsetitem.

This bug is not about the error itself but about the wrong return format:

Request:
http://wikidata-test-repo.wikimedia.de/w/api.php?action=wbsetitem&format=xml&item=add&data={%22label%22%3A{%22en%22%3A{%22language%22%3A%22en%22%2C%22value%22%3A%22Sina%22}%2C%22de%22%3A{%22language%22%3A%22de%22%2C%22value%22%3A%22Sina%22}%2C%22la%22%3A{%22language%22%3A%22la%22%2C%22value%22%3A%22Sina%22}%2C%22fy%22%3A{%22language%22%3A%22fy%22%2C%22value%22%3A%22Sina%22}%2C%22nl%22%3A{%22language%22%3A%22nl%22%2C%22value%22%3A%22Sina%22}%2C%22mg%22%3A{%22language%22%3A%22mg%22%2C%22value%22%3A%22Sina%22}}%2C%22links%22%3A{%22en%22%3A{%22site%22%3A%22en%22%2C%22title%22%3A%22Sina%22}%2C%22de%22%3A{%22site%22%3A%22de%22%2C%22title%22%3A%22Sina%22}%2C%22la%22%3A{%22site%22%3A%22la%22%2C%22title%22%3A%22Sina%22}%2C%22fy%22%3A{%22site%22%3A%22fy%22%2C%22title%22%3A%22Sina+%28betsjuttings%29%22}%2C%22nl%22%3A{%22site%22%3A%22nl%22%2C%22title%22%3A%22Sina%22}%2C%22mg%22%3A{%22site%22%3A%22mg%22%2C%22title%22%3A%22Sina%22}}}

Response:

Fatal error: Call to a member function getPrefixedDBkey() on a non-object in /var/www/wikidata-test-repo.wikimedia.de/w/extensions/Wikibase/repo/includes/ItemContent.php on line 139 

So the returned content is plain text although the request contains 'format=xml'. This should _never_ happen, because most parsers expect valid XML. So please always catch all errors and wrap them in valid XML.
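
The request above amounts to a catch-all at the API boundary. A minimal Python sketch of that idea follows. It is not MediaWiki's code: `handle_request`, `render_xml_error`, `broken_action` and the error code `internal_api_error` are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET


def render_xml_error(code, info):
    """Serialize an error as <api><error code="..." info="..."/></api>."""
    api = ET.Element("api")
    ET.SubElement(api, "error", {"code": code, "info": info})
    return '<?xml version="1.0"?>' + ET.tostring(api, encoding="unicode")


def handle_request(action, params):
    """Run an API action; turn any unexpected exception into an XML error."""
    try:
        return action(params)
    except Exception as exc:  # catch-all on purpose: clients expect XML
        return render_xml_error("internal_api_error", str(exc))


def broken_action(params):
    # Hypothetical stand-in for a module that fails halfway through.
    raise RuntimeError("call to getPrefixedDBkey() on a non-object")


response = handle_request(broken_action, {"format": "xml"})
```

With a wrapper like this, the client always receives a well-formed document and can branch on the `code` attribute instead of failing in the parser.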

An example from the same module that hits a different error but returns valid XML, so that the requester can handle it:
http://wikidata-test-repo.wikimedia.de/w/api.php?action=wbsetitem&format=xml&item=add&data={%22label%22%3A{%22en%22%3A{%22language%22%3A%22en%22%2C%22value%22%3A%22ABC%22}}%2C%22links%22%3A{%22en%22%3A{%22site%22%3A%22en%22%2C%22title%22%3A%22ABC%22}}}
Response:
<?xml version="1.0"?><api><error code="internal_api_error_DBQueryError" info="Database query error" xml:space="preserve">

#0 /var/www/wikidata-test-repo.wikimedia.de/w/includes/db/Database.php(939): DatabaseBase-&gt;reportQueryError('Duplicate entry...', 1062, 'INSERT  INTO `w...', 'Wikibase\ItemSt...', false)
...
</error></api>
Comment 1 jeblad 2012-07-08 08:44:14 UTC
Short answer:
The first URL hits a bug in the code, which is now partly fixed in a new version. Any fatal error caused by faulty program flow may produce output in any format, including but not limited to free text.

In addition, you use language codes as site ids. This should have been caught, but this module bypasses the validity checks. Your call is flawed in this case.

The second URL tries to create a duplicate entry. This was previously allowed in some cases, but is no longer allowed.

Long answer:
The behavior of wbsetitem is mostly undocumented and undefined, and it may in the future include additional validation of the arguments. For the moment, _all_ actions that pass JSON to this module are an unsupported feature and may go away. ;)

The servers may or may not run in a debug mode where GET requests are allowed. If they are allowed, GET requests are limited in length, and a request that exceeds the limit is truncated. A truncated request fails because the JSON is invalid and the call to json_decode fails. The code is currently missing several validity checks in the transition from a JSON structure to an item structure (which is basically a JSON structure itself).
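
The truncation failure described above can be reproduced in a few lines. The following Python sketch is an illustration under assumptions, not the module's code: `decode_item_data` is a hypothetical helper, the 20-character cut merely simulates a length limit, and Python's `json.loads` stands in for PHP's `json_decode`.

```python
import json
from urllib.parse import urlencode

data = {"label": {"en": "Sina"}, "links": {"enwiki": "Sina"}}
encoded = json.dumps(data)

# Simulate a length limit cutting the parameter short.
truncated = encoded[:20]


def decode_item_data(raw):
    """Return the decoded dict, or None if the JSON is not valid."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


# Form-encoded body for a POST request, which avoids the URL length limit.
post_body = urlencode({"action": "wbsetitem", "format": "xml",
                       "item": "add", "data": encoded})
```

Here `decode_item_data(truncated)` yields `None` while `decode_item_data(encoded)` yields the original dictionary, which is why sending the data in a POST body is the safer choice for long payloads.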

The reason is that this module tries to map a JSON structure to an array structure representing an item. Later this array structure is recreated as a JSON structure in the item itself and as rows in special tables in the database. The mappings from JSON to the array are not well defined, and in particular the handling of requests that violate the existing constraints is not defined at all.

Note that the JSON used as input to wbsetitem is _not_ the same as the JSON you get as output.

In my opinion _all_ calls that violate _any_ constraint should fail.

Note especially that the repo does important normalization and validation when called through wbsetsitelink, and that these are bypassed when you use wbsetitem. It is highly likely that the same normalization and validation will be enforced in wbsetitem.

The first call reads something like
http://wikidata-test-repo.wikimedia.de/w/api.php?action=wbsetitem&format=xml&item=add&data=
{
	"label":{
		"en":{"language":"en","value":"Sina"},
		"de":{"language":"de","value":"Sina"},
		"la":{"language":"la","value":"Sina"},
		"fy":{"language":"fy","value":"Sina"},
		"nl":{"language":"nl","value":"Sina"},
		"mg":{"language":"mg","value":"Sina"}
	},
	"links":{
		"en":{"site":"en","title":"Sina"},
		"de":{"site":"de","title":"Sina"},
		"la":{"site":"la","title":"Sina"},
		"fy":{"site":"fy","title":"Sina (betsjuttings)"},
		"nl":{"site":"nl","title":"Sina"},
		"mg":{"site":"mg","title":"Sina"}
	}
}

It should be
http://wikidata-test-repo.wikimedia.de/w/api.php?action=wbsetitem&format=xml&item=add&data=
{
	"label":{
		"en":"Sina",
		"de":"Sina",
		"la":"Sina",
		"fy":"Sina",
		"nl":"Sina",
		"mg":"Sina"
	},
	"links":{
		"enwiki":"Sina",
		"dewiki":"Sina",
		"lawiki":"Sina",
		"fywiki":"Sina (betsjuttings)",
		"nlwiki":"Sina",
		"mgwiki":"Sina"
	}
}

The first form is invalid after a previous change to the internal transform from an array to the internal JSON structure. It does not fail, because the validation checks are bypassed, but it should fail.
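
For bot authors, the difference between the two forms can be expressed as a small conversion. The Python sketch below is only an illustration: `to_flat_form` is a hypothetical helper, and it assumes that a site id is the language code plus "wiki" ("en" becomes "enwiki"), which holds for the examples here but is not a general rule.

```python
def to_flat_form(data, site_suffix="wiki"):
    """Convert the verbose request form into the flat form shown above.

    Assumption: site id = language code + "wiki"; other site ids
    would need a real lookup instead of string concatenation.
    """
    labels = {lang: entry["value"]
              for lang, entry in data.get("label", {}).items()}
    links = {entry["site"] + site_suffix: entry["title"]
             for entry in data.get("links", {}).values()}
    return {"label": labels, "links": links}


verbose = {
    "label": {"en": {"language": "en", "value": "ABC"}},
    "links": {"en": {"site": "en", "title": "ABC"}},
}
flat = to_flat_form(verbose)
```

The result for this input is `{"label": {"en": "ABC"}, "links": {"enwiki": "ABC"}}`.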

The second call reads something like
http://wikidata-test-repo.wikimedia.de/w/api.php?action=wbsetitem&format=xml&item=add&data=
{
	"label":{
		"en":{"language":"en","value":"ABC"}
	},
	"links":{
		"en":{"site":"en","title":"ABC"}
	}
}

It should be
http://wikidata-test-repo.wikimedia.de/w/api.php?action=wbsetitem&format=xml&item=add&data=
{
	"label":{
		"en":"ABC"
	},
	"links":{
		"enwiki":"ABC"
	}
}
Comment 2 jeblad 2012-07-09 12:06:39 UTC
Note that upcoming changes will make the previous examples fail hard.
https://gerrit.wikimedia.org/r/#/c/14762/

See also the documentation on MediaWiki.org
http://www.mediawiki.org/wiki/Extension:Wikibase/API#wbsetitem

Use of wbsetitem to set sitelinks still bypasses normalization, and it is the bot operator's responsibility to check and verify that only valid canonical page names are used. Verification of the external page name will (probably) be added later.
Comment 3 merl 2012-07-09 12:22:28 UTC
As I already said, the bug is not about the reported error but about the invalid format returned because of an error (there can be many other kinds of errors in the future).

My SAXParser throws an exception while reading the input stream from the TCP socket, because it expects valid XML if the response header contains a 2xx status code.

If you wrap the error in an XML message, everything would be OK and the bot could do error handling based on the error code.
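
Until every error is wrapped, a client can guard against both cases itself. The following Python sketch shows one way to do so using the standard library's `xml.etree.ElementTree` rather than a SAX parser. `ApiError`, `parse_api_response`, `error_code_of` and the code `malformed_response` are hypothetical names, not part of any API.

```python
import xml.etree.ElementTree as ET


class ApiError(Exception):
    """An error reported by the API, or a response that is not XML."""

    def __init__(self, code, info):
        super().__init__(f"{code}: {info}")
        self.code = code
        self.info = info


def parse_api_response(body):
    """Return the <api> root element, or raise ApiError."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not XML at all, e.g. a bare "Fatal error: ..." line.
        raise ApiError("malformed_response", body.strip()[:200]) from None
    error = root.find("error")
    if error is not None:
        raise ApiError(error.get("code"), error.get("info"))
    return root


def error_code_of(body):
    """Return the error code for a response body, or None on success."""
    try:
        parse_api_response(body)
    except ApiError as err:
        return err.code
    return None
```

A wrapped error such as the `internal_api_error_DBQueryError` response quoted in the description then surfaces with its own code, while a plain-text fatal error surfaces as `malformed_response` instead of crashing the parser.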
Comment 4 merl 2012-07-09 13:31:49 UTC
Just to let you know:
* My bot uses POST requests
* All my submitted page titles should be normalised, because these strings are extracted from a previous API request containing a /page/@title attribute value.
Comment 5 jeblad 2012-07-09 14:25:42 UTC
Sounds good. Note also that the list "label" should use the keyword "labels", "description" should use "descriptions", and "links" should use "sitelinks".
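
A bot that already builds requests with the old keywords could apply the renaming in one place. This Python sketch assumes the renaming affects only the three top-level keys named above. `RENAMED_KEYS` and `rename_keys` are hypothetical names.

```python
# Old keyword -> new keyword, as listed in the comment above.
RENAMED_KEYS = {
    "label": "labels",
    "description": "descriptions",
    "links": "sitelinks",
}


def rename_keys(data):
    """Return a copy of the request data with the renamed top-level keys."""
    return {RENAMED_KEYS.get(key, key): value for key, value in data.items()}


updated = rename_keys({"label": {"en": "ABC"}, "links": {"enwiki": "ABC"}})
```

Keys that are not in the table pass through unchanged, so the helper is safe to apply to data that already uses the new keywords.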
