Last modified: 2014-11-03 14:24:57 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T74430, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 72430 - Re-implement uniqueness constraint in a consistent and efficient way
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataRepo
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Wikidata bugs
Whiteboard: u=dev c=backend p=0
Reported: 2014-10-23 16:21 UTC by Daniel Kinzler
Modified: 2014-11-03 14:24 UTC (History)

Description Daniel Kinzler 2014-10-23 16:21:22 UTC
At present, we have several kinds of uniqueness constraints:

1) a property's label must be unique (per language, among properties)

2) an item's combination of label and description must be unique (per language, among items, if a description is given)

3) an item's sitelink must be unique (per target wiki).

Each of these is implemented separately; some are checked on every save, some are checked only on creation and modification of the respective part of the entity. Some checks are expensive or awkward (the label+description uniqueness requires a self-join on a big table with a very complex condition).


Proposed solution:

* We add a "fingerprint" table with two columns: fp_entity and fp_identity. fp_entity holds the id of the entity the fingerprint belongs to; fp_identity holds the fingerprint itself (as a string, which may be a hash). There is a composite unique key over both columns, and a separate index on the fp_identity column.

* To check for conflicts, we compute all fingerprints of the candidate, and check if we find any of them in the database. If so, there is a conflict.

* Fingerprints can be computed as sensitive or insensitive as we like (e.g. by converting to lower case or stripping whitespace before hashing).

* As an added bonus, we can look up any entity by a fingerprint (e.g. items by sitelink or properties by label) without touching the terms table.
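
The table and the conflict check described above could be sketched roughly as follows. This is only an illustration, not the actual implementation: an in-memory SQLite database (via PDO) stands in for the real database, the helper name findConflicts() and the index name are made up, and skipping rows that belong to the candidate entity itself is an assumption the proposal does not spell out.

```php
<?php
// Sketch only: in-memory SQLite stands in for the wiki's real database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Two columns, a composite unique key, and a separate index on fp_identity.
$db->exec('CREATE TABLE fingerprint (
    fp_entity   TEXT NOT NULL,
    fp_identity TEXT NOT NULL,
    UNIQUE (fp_entity, fp_identity)
)');
$db->exec('CREATE INDEX fp_identity_idx ON fingerprint (fp_identity)');

/**
 * Hypothetical helper: returns the ids of OTHER entities that already own
 * at least one of the candidate's fingerprints. Ignoring the candidate's
 * own rows is an assumption, so that re-saving an entity does not conflict
 * with itself.
 */
function findConflicts(PDO $db, string $entityId, array $fingerprints): array {
    if ($fingerprints === []) {
        return [];
    }
    // One "?" placeholder per fingerprint for the IN (...) list.
    $placeholders = implode(',', array_fill(0, count($fingerprints), '?'));
    $stmt = $db->prepare(
        "SELECT DISTINCT fp_entity FROM fingerprint
         WHERE fp_identity IN ($placeholders) AND fp_entity <> ?"
    );
    $stmt->execute(array_merge(array_values($fingerprints), [$entityId]));
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// Q1 already links to the page "Berlin" on enwiki.
$insert = $db->prepare('INSERT INTO fingerprint (fp_entity, fp_identity) VALUES (?, ?)');
$insert->execute(['Q1', 'sitelink:enwiki:' . sha1('Berlin')]);

// Q2 wants the same sitelink: conflict with Q1.
$conflicts = findConflicts($db, 'Q2', ['sitelink:enwiki:' . sha1('Berlin')]);
// Same title on a different wiki: no conflict.
$noConflicts = findConflicts($db, 'Q2', ['sitelink:dewiki:' . sha1('Berlin')]);
```

Because the lookup only touches fp_identity, the same query shape would serve all three kinds of constraint.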

Example fingerprints for the three use cases mentioned above:

1) property label:  $fp_identity = "label:{$language}:{$hashOfLabel}";

2) item label+description:  $fp_identity = "label+desc:{$language}:{$hashOfLabelAndDescription}";
// Note: if there is no description in that language, no fingerprint is generated for that language.

3) item sitelinks:  $fp_identity = "sitelink:{$site}:{$hashOfPageTitle}";
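
A rough sketch of how these three fingerprints might be computed. The function names, the choice of sha1(), and the normalisation (trim, collapse whitespace, ASCII lower-casing) are all assumptions for illustration; the proposal leaves the hash and the degree of sensitivity open.

```php
<?php
// Hypothetical normalisation: trim, collapse runs of whitespace, lower-case.
// strtolower() only folds ASCII here; real labels would need
// Unicode-aware case folding.
function normalizeText(string $text): string {
    return strtolower(preg_replace('/\s+/', ' ', trim($text)));
}

// 1) property label
function propertyLabelFingerprint(string $language, string $label): string {
    return "label:{$language}:" . sha1(normalizeText($label));
}

// 2) item label+description; returns null when there is no description,
//    in which case no fingerprint is generated for that language.
function itemLabelDescriptionFingerprint(
    string $language,
    string $label,
    ?string $description
): ?string {
    if ($description === null || trim($description) === '') {
        return null;
    }
    // The newline separator cannot occur inside a normalised part, so
    // ("ab", "c") and ("a", "bc") hash differently.
    $hash = sha1(normalizeText($label) . "\n" . normalizeText($description));
    return "label+desc:{$language}:{$hash}";
}

// 3) item sitelink; the title is hashed verbatim in this sketch.
function sitelinkFingerprint(string $site, string $pageTitle): string {
    return "sitelink:{$site}:" . sha1($pageTitle);
}

$a = propertyLabelFingerprint('en', '  Instance  OF ');
$b = propertyLabelFingerprint('en', 'instance of');
$same = ($a === $b);
$none = itemLabelDescriptionFingerprint('fr', 'Paris', null);
```

With this normalisation, labels that differ only in case or spacing yield the same fingerprint and would therefore be reported as conflicting.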
Comment 1 Daniel Kinzler 2014-10-23 16:41:08 UTC
Addendum: to selectively update specific fingerprints of a given entity, e.g. all sitelinks or the label+desc fingerprint in French, a prefix search on fp_identity can be used (e.g. all values starting with "sitelink:" or "label+desc:fr:").
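
One way such a selective update might look, as a sketch under assumptions: replaceFingerprints() is a hypothetical helper, in-memory SQLite via PDO stands in for the real database, and the short strings after the last colon are placeholder values rather than real hashes. Note that LIKE in SQLite is case-insensitive for ASCII by default; other databases may differ.

```php
<?php
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE fingerprint (
    fp_entity   TEXT NOT NULL,
    fp_identity TEXT NOT NULL,
    UNIQUE (fp_entity, fp_identity)
)');

/**
 * Hypothetical helper: drops all fingerprints of one entity that start
 * with $prefix and inserts the new ones, in a single transaction.
 */
function replaceFingerprints(PDO $db, string $entityId, string $prefix, array $newFingerprints): void {
    // Escape LIKE wildcards in case a prefix contains "%" or "_".
    $pattern = strtr($prefix, ['\\' => '\\\\', '%' => '\\%', '_' => '\\_']) . '%';

    $db->beginTransaction();
    try {
        $delete = $db->prepare(
            "DELETE FROM fingerprint WHERE fp_entity = ? AND fp_identity LIKE ? ESCAPE '\\'"
        );
        $delete->execute([$entityId, $pattern]);

        $insert = $db->prepare('INSERT INTO fingerprint (fp_entity, fp_identity) VALUES (?, ?)');
        foreach ($newFingerprints as $fingerprint) {
            $insert->execute([$entityId, $fingerprint]);
        }
        $db->commit();
    } catch (Exception $e) {
        $db->rollBack();
        throw $e;
    }
}

// Seed Q1 with two sitelinks and one French label+desc fingerprint.
$seed = $db->prepare('INSERT INTO fingerprint (fp_entity, fp_identity) VALUES (?, ?)');
foreach (['sitelink:enwiki:aaa', 'sitelink:dewiki:bbb', 'label+desc:fr:ccc'] as $fp) {
    $seed->execute(['Q1', $fp]);
}

// Replace only the sitelink fingerprints; the label+desc row is untouched.
replaceFingerprints($db, 'Q1', 'sitelink:', ['sitelink:enwiki:ddd']);

$remaining = $db->query(
    "SELECT fp_identity FROM fingerprint WHERE fp_entity = 'Q1' ORDER BY fp_identity"
)->fetchAll(PDO::FETCH_COLUMN);
```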
