Last modified: 2014-11-08 11:56:37 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T73459, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 71459 - Wrong parsing of centuries and millennia
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataRepo
Version: unspecified
Hardware: All All
Importance: High critical
Target Milestone: ---
Assigned To: Wikidata bugs
Whiteboard: u=dev c=backend p=0
Depends on:
Blocks:

Reported: 2014-09-30 08:03 UTC by Daniel Kinzler
Modified: 2014-11-08 11:56 UTC
CC List: 7 users

Description Daniel Kinzler 2014-09-30 08:03:19 UTC
Currently, entering "20. century" will result in the timestamp +00000002000-01-01T00:00:00Z with the precision set to "century"[1].
Similarly, entering "3. millennium" will result in the timestamp +00000003000-01-01T00:00:00Z with the precision set to "millennium"[2].

This is clearly wrong: the 20th century runs from the start of 1901 to the end of 2000,
and the 3rd millennium runs from the start of 2001 to the end of 3000.

So, "20. century" should result in the timestamp +00000001901-01-01T00:00:00Z, with precision "century" (and before=0 and after=1 [3]), to accurately represent the century between the start of 1901 and the end of 2000.
Similarly, "3. millennium" should result in the timestamp +00000002001-01-01T00:00:00Z.
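
The expected mapping is plain arithmetic. A minimal sketch in Python, assuming positive (CE) ordinals only; the helper names first_year_of_period and to_timestamp are hypothetical and are not part of Wikibase:

```python
def first_year_of_period(ordinal: int, years_per_period: int) -> int:
    """Return the first year of the n-th century (100) or millennium (1000).

    Hypothetical helper for illustration; only positive ordinals are handled.
    """
    if ordinal < 1:
        raise ValueError("ordinal must be >= 1")
    # The 1st period starts in year 1, the 2nd one period later, and so on.
    return (ordinal - 1) * years_per_period + 1


def to_timestamp(year: int) -> str:
    """Format a year like the timestamps quoted in this report (11-digit year)."""
    return f"+{year:011d}-01-01T00:00:00Z"


print(to_timestamp(first_year_of_period(20, 100)))   # 20. century
print(to_timestamp(first_year_of_period(3, 1000)))   # 3. millennium
```

Under these assumptions the two calls produce the timestamps proposed above for "20. century" and "3. millennium".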

An alternative fix would be to set before=1 and after=0, but then the timestamp would still be off by a year (the 20th century ended at 2000-12-31T23:59:59, not 2000-01-01T00:00:00).

When fixing this, we should also investigate how many century/millennium dates we already have in the database. Many of these are likely to have the wrong timestamp.

[1] https://www.wikidata.org/w/index.php?title=Q4115189&diff=160572560&oldid=160505889
[2] https://www.wikidata.org/w/index.php?title=Q4115189&diff=160674691&oldid=160586967
[3] We actually set both before and after to 0 at the moment. That's bug 65253. By default, after should be 1, otherwise the precision would be meaningless (as before and after are factors to be applied to the precision).
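
To illustrate footnote [3], here is a minimal sketch of how before and after could be read as multiples of the precision unit. The function name year_interval and the half-open [start, end) convention are assumptions made for this example; they are not taken from the Wikibase code.

```python
from typing import Tuple


def year_interval(year: int, unit_years: int, before: int, after: int) -> Tuple[int, int]:
    """Return the half-open range of years [start, end) described by a time value.

    Assumption: before/after count whole precision units (100 years for a
    century) to subtract from, or add to, the year of the timestamp.
    """
    return (year - before * unit_years, year + after * unit_years)


# Proposed fix: timestamp 1901, before=0, after=1 -> years 1901 through 2000.
print(year_interval(1901, 100, before=0, after=1))

# Alternative fix: timestamp 2000, before=1, after=0 -> years 1900 through 1999.
print(year_interval(2000, 100, before=1, after=0))

# Current behavior: before=0 and after=0 -> an empty range.
print(year_interval(2000, 100, before=0, after=0))
```

Read this way, the alternative with before=1 and after=0 is shifted by one year, and the current before=0, after=0 combination describes no span at all.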
Comment 1 Daniel Kinzler 2014-09-30 08:04:27 UTC
oops, should be "to accurately represent the century between the start of 1901 and the end of 2000."
Comment 2 Daniel Kinzler 2014-09-30 08:05:56 UTC
setting to "critical" since this behavior causes silent loss of information (the bad timestamp is only visible in the diff). The incorrect interval will only become apparent once we start to run queries against dates.
Comment 3 Henning 2014-10-28 16:08:22 UTC
This is an issue of the back-end parser. Submitting "20. century" to the parser results in returning {"time":"+0000000000002000-00-00T00:00:00Z","timezone":0,"before":0,"after":0,"precision":7,"calendarmodel":"http://www.wikidata.org/entity/Q1985727"} as data value.
Comment 4 Gordon P. Hemsley 2014-11-08 11:56:37 UTC
This will require substantial changes to the way such dates are calculated.

The code is in:
* https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FWikibase.git/7702b647e49cb445e34eb8a581adbbe0f9e29797/lib%2Fincludes%2Fparsers%2FMWTimeIsoParser.php
* https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FWikibase.git/7702b647e49cb445e34eb8a581adbbe0f9e29797/extensions%2FWikibase%2Flib%2Fincludes%2Fformatters%2FMwTimeIsoFormatter.php

Right now, the precision of dates is calculated by extracting only the significant digits. The insignificant digits are always 0. Turning "20. century" into 1901 instead of 2000 (and back again) will require additional special logic.
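
A rough sketch of why both directions need the special logic. This is a simplified model of the behavior described above, not the actual MWTimeIsoParser or MwTimeIsoFormatter code; all four function names are hypothetical, and only positive years are considered.

```python
def parse_century_digits(n: int) -> int:
    # Simplified model of the current approach: keep the significant digits
    # and pad the insignificant ones with zeros, so 20 becomes 2000.
    return n * 100


def format_century_digits(year: int) -> int:
    # Inverse of the above: drop the two insignificant digits.
    return year // 100


def parse_century_ordinal(n: int) -> int:
    # Proposed behavior: the n-th century starts in year (n - 1) * 100 + 1.
    return (n - 1) * 100 + 1


def format_century_ordinal(year: int) -> int:
    # Inverse of the ordinal mapping; years 1901 through 2000 all map to 20.
    return (year - 1) // 100 + 1


year = parse_century_ordinal(20)
print(year)                          # first year of the 20th century
print(format_century_ordinal(year))  # round-trips back to 20
print(format_century_digits(year))   # digit-based formatter applied to the new value
```

In this model, changing only the parser is not enough: the digit-based formatter would display a timestamp of 1901 as the 19th century, so parser and formatter have to change together.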
