Last modified: 2014-11-03 14:24:57 UTC
At present, we have several kinds of uniqueness constraints:

1) a property's label must be unique (per language, among properties)
2) an item's combination of label and description must be unique (per language, among items, if a description is given)
3) an item's sitelink must be unique (per target wiki).

Each of these is implemented separately; some are checked on every save, some only when the respective part of the entity is created or modified. Some checks are expensive or awkward (the label+description uniqueness check requires a self-join on a big table with a very complex condition).

Proposed solution:

* We add a "fingerprint" table with two columns: fp_entity and fp_identity. fp_entity holds the id of the entity that the fingerprint belongs to; fp_identity holds the fingerprint (as a string, which may be a hash). There is a composite unique key over both columns, and a separate index on the fp_identity column.
* To check for conflicts, we compute all fingerprints of the candidate and check whether any of them is already in the database for a different entity. If so, there is a conflict.
* Fingerprints can be computed as sensitive or insensitive as we like (e.g. by converting to lower case or stripping whitespace before hashing).
* As an added bonus, we can look up any entity by a fingerprint (e.g. items by sitelink or properties by label) without touching the terms table.

Example fingerprints for the three use cases mentioned above:

1) property label:
   $fp_fingerprint = "label:{$language}:{$hashOfLabel}";
2) item label+description:
   $fp_fingerprint = "label+desc:{$language}:{$hashOfLabelAndDescription}";
   // Note: if there is no description in that language, no fingerprint is generated for that language
3) item sitelinks:
   $fp_fingerprint = "sitelink:{$site}:{$hashOfPageTitle}";
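The proposal above can be sketched end to end as follows. This is only an illustration under stated assumptions, not the actual implementation: it is written in Python rather than PHP, uses an in-memory SQLite database as a stand-in for the real one, picks SHA-1 as the hash, and normalizes by collapsing whitespace and lower-casing. All function names (normalize, digest, item_fingerprints, find_conflicts, ...) are hypothetical.

```python
import hashlib
import sqlite3


def normalize(text):
    # Assumed normalization: collapse whitespace runs and lower-case.
    return " ".join(text.split()).lower()


def digest(*parts):
    # Join normalized parts with a newline so ("ab", "c") != ("a", "bc").
    joined = "\n".join(normalize(p) for p in parts)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def property_fingerprints(labels):
    # labels: {language: label}
    return [f"label:{lang}:{digest(label)}" for lang, label in labels.items()]


def item_fingerprints(labels, descriptions, sitelinks):
    fps = []
    for lang, label in labels.items():
        desc = descriptions.get(lang)
        if desc:  # no description in this language: no fingerprint
            fps.append(f"label+desc:{lang}:{digest(label, desc)}")
    for site, title in sitelinks.items():
        fps.append(f"sitelink:{site}:{digest(title)}")
    return fps


def create_schema(conn):
    # Composite unique key over both columns, separate index on fp_identity.
    conn.executescript("""
        CREATE TABLE fingerprint (
            fp_entity   TEXT NOT NULL,
            fp_identity TEXT NOT NULL,
            UNIQUE (fp_entity, fp_identity)
        );
        CREATE INDEX fp_identity_idx ON fingerprint (fp_identity);
    """)


def store(conn, entity_id, fps):
    conn.executemany(
        "INSERT OR IGNORE INTO fingerprint (fp_entity, fp_identity) VALUES (?, ?)",
        [(entity_id, fp) for fp in fps],
    )


def find_conflicts(conn, entity_id, fps):
    # A conflict is any candidate fingerprint already held by another entity.
    if not fps:
        return []
    marks = ",".join("?" * len(fps))
    rows = conn.execute(
        f"SELECT fp_entity, fp_identity FROM fingerprint "
        f"WHERE fp_identity IN ({marks}) AND fp_entity <> ?",
        [*fps, entity_id],
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
store(conn, "Q1", item_fingerprints(
    {"en": "Berlin"}, {"en": "capital of Germany"}, {"enwiki": "Berlin"}))

# Same label and description apart from case and whitespace.
candidate = item_fingerprints({"en": "berlin "}, {"en": "Capital of  Germany"}, {})
print(find_conflicts(conn, "Q2", candidate))
```

Because each fingerprint carries its kind as a prefix, a property label can never collide with an item's label+description fingerprint, and the same query also serves as a lookup of entities by fingerprint.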
Addendum: to selectively update specific fingerprints of a given entity, e.g. all sitelinks or the label+desc fingerprint in French, a prefix search on fp_identity can be used (e.g. for values starting with "sitelink:" or "label+desc:fr:").
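A minimal sketch of such a selective update, again in Python against an in-memory SQLite table and with hypothetical helper names (like_prefix, replace_fingerprints). The hash parts ("aaa", "bbb", ...) are placeholders, not real hashes. The sketch escapes LIKE wildcards on the assumption that site ids may contain underscores.

```python
import sqlite3


def like_prefix(prefix):
    # Escape LIKE wildcards; a site id such as "zh_min_nanwiki" contains "_".
    escaped = (prefix.replace("\\", "\\\\")
                     .replace("%", "\\%")
                     .replace("_", "\\_"))
    return escaped + "%"


def replace_fingerprints(conn, entity_id, prefix, new_fps):
    # Drop the entity's fingerprints under the prefix, then insert the new
    # ones, all in one transaction.
    with conn:
        conn.execute(
            "DELETE FROM fingerprint "
            "WHERE fp_entity = ? AND fp_identity LIKE ? ESCAPE '\\'",
            (entity_id, like_prefix(prefix)),
        )
        conn.executemany(
            "INSERT INTO fingerprint (fp_entity, fp_identity) VALUES (?, ?)",
            [(entity_id, fp) for fp in new_fps],
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fingerprint ("
    "fp_entity TEXT NOT NULL, fp_identity TEXT NOT NULL, "
    "UNIQUE (fp_entity, fp_identity))"
)
conn.executemany(
    "INSERT INTO fingerprint (fp_entity, fp_identity) VALUES (?, ?)",
    [
        ("Q1", "sitelink:enwiki:aaa"),
        ("Q1", "sitelink:dewiki:bbb"),
        ("Q1", "label+desc:fr:ccc"),
        ("Q2", "sitelink:enwiki:ddd"),
    ],
)

# Replace all of Q1's sitelink fingerprints; everything else stays untouched.
replace_fingerprints(conn, "Q1", "sitelink:", ["sitelink:enwiki:eee"])
rows = sorted(conn.execute("SELECT fp_entity, fp_identity FROM fingerprint"))
print(rows)
```

Passing "label+desc:fr:" as the prefix would update only the French label+description fingerprint in the same way.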