Last modified: 2013-09-04 10:33:56 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T51425, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 49425 - Invalid Timevalues stored in database
Status: VERIFIED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataRepo
Version: unspecified
Hardware: All All
Importance: High major
Target Milestone: ---
Assigned To: Wikidata bugs
Depends on: 49264
Blocks:
Reported: 2013-06-11 08:07 UTC by Byrial Jensen
Modified: 2013-09-04 10:33 UTC (History)

Description Byrial Jensen 2013-06-11 08:07:04 UTC
While parsing a database dump for Wikidata item [[Q441536]], I found this statement:

{"m":["value",570,"time",{"time":"+0000000 1998-10-21T00:00:00Z","timezone":0,"before":0,"after":0,"precision":11,"calendarmodel":"http://www.wikidata.org/entity/Q1985727"}],"q":[],"g":"q441536$3568592E-BA6D-48A3-9FBE-C558F8A47415","rank":1,"refs":[[["value",143,"wikibase-entityid",{"entity-type":"item","numeric-id":328}]]]}

There is a space in the year after the first 7 digits. When I look at the wiki page for the item, the date is shown as "October 21, 1 BCE", so only the leading digits (all zeroes) up to the space are used. The correct year is of course 1998.

You will also see the space in the diff for the insertion of the statement: http://www.wikidata.org/w/index.php?title=Q441536&diff=48523994&oldid=48523992
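
The malformed value can be caught with a simple pattern check. The following is a minimal sketch, not the check Wikibase uses: the layout (sign, 11-digit year, two-digit month and day, then a clock time and "Z") is an assumption inferred from the values quoted in this report, and `TIME_RE` and `is_well_formed` are illustrative names.

```python
import re

# Assumed layout, inferred from the values quoted in this report:
# sign, 11-digit year, "-MM-DD", "T", "hh:mm:ss", "Z".
TIME_RE = re.compile(r"[+-]\d{11}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def is_well_formed(time_str: str) -> bool:
    """Return True only if the whole string matches the assumed layout."""
    return TIME_RE.fullmatch(time_str) is not None


# The value from the statement above, with the stray space in the year.
print(is_well_formed("+0000000 1998-10-21T00:00:00Z"))  # False
# The same value with the space removed.
print(is_well_formed("+00000001998-10-21T00:00:00Z"))   # True
```

Using `fullmatch` rather than `match` matters here, since a prefix match would accept trailing garbage after an otherwise valid value.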
Comment 1 Byrial Jensen 2013-06-11 08:40:31 UTC
There is also a space between the day and "T" in some cases, like this "+00000001940-10-5 T00:00:00Z" (taken from Q1825747, diff for insertion: http://www.wikidata.org/w/index.php?title=Q1825747&diff=48577191&oldid=45206040 )

That gives this error text on the wiki page for the item: The value does not comply with the property's definition.
The value's data value type "ununserializable" does not match the property's data type's data value type "time".

There is a new complete database dump in progress right now. When it is done, I can make a list of all occurrences of invalid time values.
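
A list like that could be produced by walking the statements and testing each time value. The sketch below assumes each statement is available as one JSON string in the layout quoted in the description (main snak under "m", statement GUID under "g"); how statements are extracted from the dump is not covered, and `bad_time_values` and the sample statement are made up for illustration.

```python
import json
import re

# Assumed layout of a valid time string (inferred from this report).
TIME_RE = re.compile(r"[+-]\d{11}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def bad_time_values(statements):
    """Yield (guid, time string) for each statement with a malformed time."""
    for raw in statements:
        claim = json.loads(raw)
        snak = claim.get("m", [])
        # Main snak layout seen in the report:
        # ["value", property id, data value type, value]
        if len(snak) == 4 and snak[0] == "value" and snak[2] == "time":
            time_str = snak[3].get("time", "")
            if not TIME_RE.fullmatch(time_str):
                yield claim.get("g"), time_str


# Made-up sample statements; only the bad time string is from this report.
sample = [
    '{"m":["value",570,"time",{"time":"+00000001940-10-5 T00:00:00Z"}],"g":"example$1"}',
    '{"m":["value",570,"time",{"time":"+00000001998-10-21T00:00:00Z"}],"g":"example$2"}',
]
for guid, value in bad_time_values(sample):
    print(guid, repr(value))
```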
Comment 2 Daniel Kinzler 2013-06-11 09:43:41 UTC
This is caused by a bug in the bot making the edit. Please advise the bot's owner and block the bot if necessary.

Of course, Wikidata shouldn't accept broken dates. This is a known issue: time values are currently not properly validated by the API, see bug 49264. Since I6990983 is merged, this is technically fixed; I recommend a backport of the fix though, and a hotfix deployment, so let's keep the bug open until that is done.
Comment 3 Gerrit Notification Bot 2013-06-11 10:20:08 UTC
Related URL: https://gerrit.wikimedia.org/r/67962 (Gerrit Change I6990983ef0c0cad7c9d4f271bdf803902b94230b)
Comment 4 Byrial Jensen 2013-06-12 16:14:53 UTC
There are 55 cases of malformed time values in the new database dump dated 2013-06-10. There is a list at http://www.wikidata.org/wiki/User:Byrial/Bad_time_values
Comment 5 Gerrit Notification Bot 2013-06-13 14:26:11 UTC
Related URL: https://gerrit.wikimedia.org/r/68397 (Gerrit Change Ib3e7b16c203d08008d7465859af0e1e7f940db14)
Comment 6 Gerrit Notification Bot 2013-06-21 08:56:59 UTC
https://gerrit.wikimedia.org/r/68397 (Gerrit Change Ib3e7b16c203d08008d7465859af0e1e7f940db14) | change ABANDONED [by Daniel Kinzler]
Comment 7 Byrial Jensen 2013-06-29 07:01:27 UTC
There were 15 new cases of malformed time values in the database dump of 2013-06-23, all inserted by the same bot on 2013-06-14 and 2013-06-15. They were values like:

+0000000or-02-22T00:00:00Z
+00000001603.-01-01T00:00:00Z
+00000001239-06-17/18T00:00:00Z
+00000001650)-06-19T00:00:00Z
+00000001601/1602-05-02T00:00:00Z
+00000001869-09-26 (disputed)T00:00:00Z
+0000000or-07-31T00:00:00Z
+0000000January-01-16T00:00:00Z
+00000001878''(''Some-05-12T00:00:00Z
+00000001587/8-01-12T00:00:00Z
+00000001766?-03-16T00:00:00Z
+00000001985-10-22 correct date is October 27, 1985T00:00:00Z
+0000000or-12-17T00:00:00Z
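
The values above look as if unchecked source text was spliced into the year or day position of a fixed template. That is only a guess from the pattern; the bot's code is not shown in this report. As a sketch of how a client could avoid this, the hypothetical helper below parses each field as a number before formatting and refuses anything else. It handles only CE years at day precision and assumes the 11-digit year layout seen in this report.

```python
def format_time_value(year: str, month: str, day: str) -> str:
    """Build a day-precision time string from raw text fields.

    Hypothetical helper: raises ValueError instead of emitting a
    malformed value when a field is not a plain number.
    """
    numbers = []
    for name, text in (("year", year), ("month", month), ("day", day)):
        text = text.strip()
        if not (text.isascii() and text.isdigit()):
            raise ValueError(f"{name} is not a plain number: {text!r}")
        numbers.append(int(text))
    y, m, d = numbers
    if y > 99_999_999_999 or not (1 <= m <= 12 and 1 <= d <= 31):
        raise ValueError(f"date out of range: {y}-{m}-{d}")
    return f"+{y:011d}-{m:02d}-{d:02d}T00:00:00Z"


print(format_time_value("1940", "10", "5 "))  # +00000001940-10-05T00:00:00Z
for bad_year in ("1766?", "1601/1602", "or"):
    try:
        format_time_value(bad_year, "01", "01")
    except ValueError as err:
        print("rejected:", err)
```

Rejecting ambiguous input such as "1601/1602" outright, rather than guessing a year, leaves the decision to a human editor.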
Comment 8 Daniel Kinzler 2013-06-29 08:26:13 UTC
Note: with I67b9ae480c, this should not happen any more. I67b9ae480c provides input validation for time values.

It does not, however, enforce time format rules on TimeValue objects, so "bad" time values are not yet detected when found in the database, etc.

This bug should remain open until strict validation is implemented (see I72d6b6d890), but I don't think it's very urgent any more, since it should now be impossible to enter bad values via the API.
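
The difference between validating input at the API and enforcing the format on the value object itself can be sketched as follows. This is an illustration only, not the actual TimeValue class or the change referenced above: `StrictTimeValue` is a hypothetical name, and the pattern is the layout assumed from the values in this report.

```python
import re
from dataclasses import dataclass

# Assumed layout of a valid time string (inferred from this report).
TIME_RE = re.compile(r"[+-]\d{11}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


@dataclass(frozen=True)
class StrictTimeValue:
    """Hypothetical value object that cannot hold a malformed time string."""

    time: str

    def __post_init__(self):
        # Runs on every construction, so values loaded from storage are
        # checked as well, not only values arriving through an API.
        if not TIME_RE.fullmatch(self.time):
            raise ValueError(f"malformed time value: {self.time!r}")


ok = StrictTimeValue("+00000001998-10-21T00:00:00Z")
print(ok.time)

try:
    StrictTimeValue("+00000001869-09-26 (disputed)T00:00:00Z")
except ValueError as err:
    print("rejected:", err)
```

With the check in the constructor, a bad value already stored in the database surfaces as an error when it is loaded, instead of being rendered as a wrong date.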
Comment 9 Daniel Kinzler 2013-07-17 11:27:31 UTC
Keep this open until I72d6b6d89 is merged.
Comment 10 denny vrandecic 2013-07-18 09:32:45 UTC
Is merged. We will check the dump one more time and then close this.
Comment 11 Daniel Kinzler 2013-07-18 11:30:17 UTC
It's not merged.
Comment 12 Byrial Jensen 2013-07-30 19:06:19 UTC
Either this is fixed now, or the bots are better. There were no malformed time values in the 2013-07-17 database dump.
