Last modified: 2014-11-13 11:31:42 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T74183, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 72183 - New serialization code needs to support language fallback
Status: PATCH_TO_REVIEW
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataRepo
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Wikidata bugs
Whiteboard: u=dev c=backend p=0 s=2014-11-11
Depends on:
Blocks: 71170
Reported: 2014-10-17 13:48 UTC by Daniel Kinzler
Modified: 2014-11-13 11:31 UTC
CC List: 5 users

Description Daniel Kinzler 2014-10-17 13:48:09 UTC
The serializer needs to be able to represent language fallback: the key used for a label or description term can be different from the actual language of the term. For example, consider <https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q42&languages=ii&format=json&languagefallback>: the requested language is ii, but since there is no label in ii and ii falls back to zh-cn, we get back the zh-cn label under the ii key:

  "labels":{
    "ii":{
      "language":"zh-cn",
      "value":"\u9053\u683c\u62c9\u65af\u00b7\u4e9a\u5f53\u65af"
    }
  }

We will need to support at least this behavior, either in the serializer or in the model.
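A minimal sketch of the lookup behind the ii example, written in Python rather than the PHP of the actual code. The helper name `resolve_term`, the plain-dict term storage and the one-element fallback chain `["zh-cn"]` are assumptions for illustration only. The point it shows: the result is stored under the requested key, while its "language" field records the language the value is actually in.

```python
from typing import Dict, List, Optional


# Hypothetical helper; not the Wikibase implementation.
def resolve_term(terms: Dict[str, str], requested: str,
                 fallback_chain: List[str]) -> Optional[Dict[str, str]]:
    """Return the serialized term for `requested`, walking `fallback_chain` if needed.

    `terms` maps language codes to term values. The returned dict's
    "language" field is the language the value is actually in.
    """
    for lang in [requested, *fallback_chain]:
        if lang in terms:
            return {"language": lang, "value": terms[lang]}
    return None  # no term in the requested language or any fallback


# Assumed data: labels exist in zh-cn and en, but not in ii.
labels = {
    "zh-cn": "\u9053\u683c\u62c9\u65af\u00b7\u4e9a\u5f53\u65af",
    "en": "Douglas Adams",
}
# Stored under the requested key "ii", matching the API output above.
serialized = {"labels": {"ii": resolve_term(labels, "ii", ["zh-cn"])}}
```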

Note: According to bug 72038, "ii" should also be present in a separate field called "for-language". In cases where transliteration is involved, there may also be a third language (the original language).
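A sketch of how a serializer might emit the extra fields from that note, assuming a hypothetical `ResolvedTerm` value object. Two further assumptions: "for-language" is only written when fallback actually happened, and "source-language" is an invented name for the third (original) language in the transliteration case; bug 72038 is the authority on the real format.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ResolvedTerm:
    requested_language: str                # the key, e.g. "ii"
    actual_language: str                   # language of the value, e.g. "zh-cn"
    value: str
    source_language: Optional[str] = None  # original language if transliterated


def serialize_resolved(term: ResolvedTerm) -> Dict[str, str]:
    out = {"language": term.actual_language, "value": term.value}
    # Assumption: "for-language" is only emitted when fallback was applied.
    if term.actual_language != term.requested_language:
        out["for-language"] = term.requested_language
    # Hypothetical field name for the original language of a transliteration.
    if term.source_language is not None:
        out["source-language"] = term.source_language
    return out
```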
Comment 1 Adrian Lang 2014-10-23 11:21:13 UTC
After a discussion today we agreed to the following:

* The data model has to be aware that an entity can have a term under a language key other than the language the term is actually in
* The data model serializers and deserializers have to have knowledge of the fact that there are language fallbacks
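A sketch of what the first point could mean for the model, in Python. `Term`, `TermMap`, `set_term`, `get_term` and `is_fallback` are hypothetical names loosely mirroring the PHP classes discussed in this report, not their actual API. A list keyed implicitly by each term's own language cannot hold a zh-cn term under the key "ii"; storing the key separately can.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Term:
    language: str  # language the text is actually written in
    text: str


class TermMap:
    """Hypothetical map from a key (the requested language) to a Term."""

    def __init__(self, terms: Optional[Dict[str, Term]] = None) -> None:
        # Respect the keys of the argument instead of re-deriving them.
        self._terms: Dict[str, Term] = dict(terms or {})

    def set_term(self, language: str, term: Term) -> None:
        self._terms[language] = term

    def get_term(self, language: str) -> Term:
        return self._terms[language]

    def is_fallback(self, language: str) -> bool:
        # True when the term stored under this key is in another language.
        return self._terms[language].language != language


labels = TermMap()
labels.set_term("en", Term("en", "Douglas Adams"))
labels.set_term("ii", Term("zh-cn", "\u9053\u683c\u62c9\u65af\u00b7\u4e9a\u5f53\u65af"))
```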

Premises:
* We want to provide a view on our data which includes, for example, language fallbacks (for the API, for wbEntity, …)
* We want to enable users (for example the JavaScript frontend code) to work with these views
* Data model deserializers should return data model objects
* Data model deserializers should not lose information
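The last two premises can be illustrated with a round trip, sketched in Python with hypothetical `deserialize_terms` and `serialize_terms` helpers and plain dicts standing in for the PHP model objects. Keeping the serialization key lets the ii entry survive; re-keying terms by their own language, as a naive deserializer would, drops it.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Term:
    language: str
    text: str


def deserialize_terms(data: Dict[str, Dict[str, str]]) -> Dict[str, Term]:
    # Keep each serialization key, even where it differs from the term's language.
    return {key: Term(entry["language"], entry["value"]) for key, entry in data.items()}


def serialize_terms(terms: Dict[str, Term]) -> Dict[str, Dict[str, str]]:
    return {key: {"language": t.language, "value": t.text} for key, t in terms.items()}


api_labels = {"ii": {"language": "zh-cn", "value": "\u9053\u683c\u62c9\u65af\u00b7\u4e9a\u5f53\u65af"}}
terms = deserialize_terms(api_labels)
round_tripped = serialize_terms(terms)  # equal to api_labels: nothing lost

# Re-keying by each term's own language loses the "ii" key.
lossy = {t.language: t for t in terms.values()}
```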

Necessary steps:
* Make TermList a TermMap
* Make TermMap::__construct respect the keys of its parameter
* Make TermMap::setTerm expect a language parameter (adapt callers in the data model)
* Make EntityDeserializer::deserializeValuePerLanguageSerialization respect and pass the keys
* Make EntityDeserializer::setAliasesFromSerialization respect and pass the keys
* Make EntityDeserializer::assertIsValidValueSerialization assert on the key
* Make FingerprintSerializer::serializeValuePerLanguageArray (and all of its callers) aware that there are different ways to serialize a map of terms (with and without keys, with and without fallback terms included)
* Write high-level, implementation-independent documentation on this decision
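A sketch of the four serialization variants named in the FingerprintSerializer step, in Python. The function `serialize_term_map` and its `with_keys` / `include_fallback` flags are hypothetical, and reading "without keys" as "a plain list of terms" is an interpretation, not something this report specifies. A fallback term is identified here by its key differing from its own language.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Term:
    language: str
    text: str


Serialized = Union[Dict[str, Dict[str, str]], List[Dict[str, str]]]


def serialize_term_map(terms: Dict[str, Term], *, with_keys: bool = True,
                       include_fallback: bool = True) -> Serialized:
    """Serialize a key -> Term map in one of four hypothetical modes.

    with_keys: emit {key: term} if True, otherwise a plain list of terms.
    include_fallback: if False, drop entries whose key differs from the
    term's own language (terms obtained through fallback).
    """
    selected = {
        key: {"language": t.language, "value": t.text}
        for key, t in terms.items()
        if include_fallback or key == t.language
    }
    return selected if with_keys else list(selected.values())


terms = {
    "en": Term("en", "Douglas Adams"),
    "ii": Term("zh-cn", "\u9053\u683c\u62c9\u65af\u00b7\u4e9a\u5f53\u65af"),
}
```

Keeping one function with flags, rather than four serializers, mirrors the idea of a single serializer made "aware" of the variants; separate serializer classes would be an equally valid design.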
Comment 2 Jan Zerebecki 2014-11-04 15:31:10 UTC
Additional requirement: We want a facility that can test whether an object contains inferred information, such as language fallback, and therefore should not be written to the database. That way we can ensure this at runtime instead of only through code review.
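A sketch of such a runtime check in Python; `has_inferred_terms`, `assert_storable`, `InferredDataError` and the `inferred` flag on `Term` are all hypothetical. A key/language mismatch catches plain fallback, but a transliterated term may sit under its own language key, so the sketch assumes an explicit flag for that case.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Term:
    language: str
    text: str
    inferred: bool = False  # hypothetical flag, e.g. for transliterated terms


class InferredDataError(Exception):
    """Raised when inferred terms are about to be written to the database."""


def has_inferred_terms(terms: Dict[str, Term]) -> bool:
    # A key/language mismatch means fallback; the flag covers derived terms
    # (such as transliterations) stored under their own language key.
    return any(t.inferred or key != t.language for key, t in terms.items())


def assert_storable(terms: Dict[str, Term]) -> None:
    # Hypothetical guard to call right before saving an entity.
    if has_inferred_terms(terms):
        raise InferredDataError("terms contain inferred data such as language fallback")
```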
