Last modified: 2012-11-28 14:07:44 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the pages displaying bug reports and their history, links might be broken. See T42295, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 40295 - EntityObject::equals should make a more consistent, strict comparison of entity IDs
Status: VERIFIED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataRepo
Version: unspecified
Hardware: All All
Importance: Low normal
Target Milestone: ---
Assigned To: Wikidata bugs
Depends on:
Blocks:
Reported: 2012-09-17 10:59 UTC by Daniel A. R. Werner
Modified: 2012-11-28 14:07 UTC
CC List: 6 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Daniel A. R. Werner 2012-09-17 10:59:13 UTC
Right now EntityObject::equals( entity ) will return true if one entity has an ID set and the other entity's ID is null. If both entities have IDs set and they differ, false is returned. This seems inconsistent.

A better solution could be to add a function comparing the content of two entities only, ignoring the ID entirely, e.g. EntityObject::same( entity ).
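The behavior described above can be sketched as follows. This is Python rather than the actual PHP, and the class, field and function names are hypothetical. The sketch also makes one concrete problem with the lenient rule visible: it is not transitive.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    """Hypothetical stand-in for EntityObject: an optional ID plus content."""
    entity_id: Optional[str]
    content: dict = field(default_factory=dict)


def lenient_equals(a: Entity, b: Entity) -> bool:
    """Lenient rule from the bug report: IDs only matter when both are set."""
    if a.entity_id is not None and b.entity_id is not None:
        if a.entity_id != b.entity_id:
            return False
    return a.content == b.content


stored = Entity("Q1", {"label": "Berlin"})
fresh = Entity(None, {"label": "Berlin"})
other = Entity("Q2", {"label": "Berlin"})

# The ID-less entity "equals" both stored entities, yet those two are not
# equal to each other, so the relation is not transitive.
print(lenient_equals(stored, fresh), lenient_equals(fresh, other), lenient_equals(stored, other))
# True True False
```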
Comment 1 Daniel Kinzler 2012-09-17 18:14:02 UTC
The intention was to allow an item that was already stored in the database (and thus has an ID) to be equal to a newly created "volatile" item that doesn't have an ID.

Using a different function for this kind of comparison is cleaner - I was trying to make this opaque to the caller, but that was probably a silly idea. So, I agree that we should go this route, but I would really like to have descriptive names. Between "equals" and "same", it's kind of non-obvious which does which. So, how about equals() for "total" equality and hasSameContent() for the version that ignores the ID?
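A minimal sketch of the proposed split, assuming an entity is just an optional ID plus a content mapping. It is written in Python, so hasSameContent() becomes has_same_content(); the class itself is hypothetical and not the Wikibase code.

```python
from typing import Any, Mapping, Optional


class Entity:
    """Hypothetical entity: an optional ID plus a mapping of content."""

    def __init__(self, entity_id: Optional[str], content: Mapping[str, Any]):
        self.entity_id = entity_id
        self.content = dict(content)

    def has_same_content(self, other: "Entity") -> bool:
        """Compare content only; the ID is ignored entirely."""
        return self.content == other.content

    def equals(self, other: "Entity") -> bool:
        """'Total' equality: IDs must match exactly (None only matches None)."""
        return self.entity_id == other.entity_id and self.has_same_content(other)


stored = Entity("Q1", {"label": "Berlin"})
fresh = Entity(None, {"label": "Berlin"})
print(stored.equals(fresh), stored.has_same_content(fresh))  # False True
```

With two explicitly named methods, each caller states which kind of comparison it wants instead of relying on equals() silently ignoring a missing ID.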
Comment 2 jeblad 2012-09-17 22:32:30 UTC
Could we perhaps use "similarity"? "Equals" suggests triple (strict) equality, and "same" suggests double (loose) equality. Similarity is used for correlation in higher-order statistics. We could say that we use a correlation function (or functions) to measure similarity, and that the function(s) run over a number of properties (or all of them). With a little thought we could make something fairly efficient: basically, run over a limited set of properties and test whether they are the same. If enough of them are the same, the two entities are "similar". Such similarity measures can also be extended in several ways, for example by using the Levenshtein distance between strings used as property values instead of double equality.
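A rough sketch of the kind of measure proposed here, assuming entities are flattened to a mapping from property names to string values (a simplification). The function names and the 0.8 threshold are illustrative and not from the discussion. Each property contributes a normalized Levenshtein similarity, and a property present on only one side contributes zero.

```python
from typing import Dict, Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def string_similarity(a: str, b: str) -> float:
    """1.0 for identical strings, falling toward 0.0 as edits accumulate."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def entity_similarity(a: Dict[str, str], b: Dict[str, str],
                      keys: Optional[Iterable[str]] = None) -> float:
    """Average per-property similarity over the given keys (default: all keys)."""
    keys = sorted(set(a) | set(b)) if keys is None else list(keys)
    if not keys:
        return 1.0
    total = sum(string_similarity(a[k], b[k]) for k in keys if k in a and k in b)
    return total / len(keys)


def is_similar(a: Dict[str, str], b: Dict[str, str], threshold: float = 0.8) -> bool:
    """Hypothetical yes/no test: similar enough once the score reaches the threshold."""
    return entity_similarity(a, b) >= threshold


print(is_similar({"label": "Berlin"}, {"label": "Berlín"}))  # True (score 5/6)
```

Because the result depends on the chosen threshold and property set, this is a tunable heuristic rather than an equivalence relation.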

It's not as difficult as it seems.
Comment 3 Daniel Kinzler 2012-09-18 09:10:45 UTC
(In reply to comment #2)
> We could perhaps use "similarity"? 

The notion of "similarity" and/or semantic proximity of items (topics) is very interesting and useful for information retrieval, natural language processing, etc., but I don't think that it is what we need here. Similarity is a complex beast, and how it should be defined depends highly on what it is intended to be used for, so I think the notion should be defined on the application level.

The equality function under consideration here is primarily used to decide whether a new version of an item is the same as the previous one, i.e. it's used to check whether an edit is a "null edit" and should thus be omitted from the page history. That's a fairly low-level check and should be pretty strict.
Comment 4 Daniel Kinzler 2012-09-18 09:17:03 UTC
Hm... thinking about equals() vs. hasSameContent(): Note that we are free to define the equals function in Entity as we like, but the equals function in EntityContent is defined by the Content interface and used by WikiPage::doEditContent (which is called by EntityContent::save). It's used to determine whether the new content is the same as the previous content, in which case a "null edit" is triggered and no new revision is created in the database.

The reason I made equals lenient about missing IDs is this: if I want to store a new revision of an item, and to do so I construct a fresh Entity and EntityContent and then tell WikiPage to save it, the new content object may not yet have an ID attached. So a strict equals() would fail needlessly when comparing the new version to the previous one.

Hm... this could also be solved by forcing the new content to have an ID. This should probably be done anyway, for consistency. And if it already has an ID that is different from the previous item's ID, the save should probably fail - I can't think of a valid reason to allow this.
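A minimal sketch of the approach suggested in this comment, in Python rather than the actual PHP. EntityPage, EntityIdMismatch and the dict layout are hypothetical. Fresh content without an ID inherits the page's ID, content with a different ID is rejected, and a strict comparison then decides whether the save is a null edit.

```python
from typing import Any, Dict, List


class EntityIdMismatch(Exception):
    """Hypothetical error: new content carries an ID different from the page's."""


class EntityPage:
    """Hypothetical page storing the revision history of a single entity."""

    def __init__(self, entity_id: str):
        self.entity_id = entity_id
        self.revisions: List[Dict[str, Any]] = []

    def save(self, content: Dict[str, Any]) -> bool:
        """Store content as a new revision; return False for a null edit."""
        content = dict(content)  # copy so the caller's object is not mutated
        if content.get("id") is None:
            # Force fresh, ID-less content to carry the page's ID.
            content["id"] = self.entity_id
        elif content["id"] != self.entity_id:
            raise EntityIdMismatch(
                f"content ID {content['id']!r} does not match page ID {self.entity_id!r}"
            )
        # With IDs normalized, a strict comparison is enough to detect a null edit.
        if self.revisions and self.revisions[-1] == content:
            return False
        self.revisions.append(content)
        return True


page = EntityPage("Q64")
print(page.save({"id": "Q64", "label": "Berlin"}))  # True: first revision
print(page.save({"label": "Berlin"}))               # False: null edit once the ID is filled in
```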
Comment 5 jeblad 2012-09-18 09:21:30 UTC
What you are talking about here is not equality, and it seems to me that it is not even sameness; it is similarity. If we call such tests equality, we add confusion to the soup and will run into problems later.
Comment 6 Daniel Kinzler 2012-09-18 09:26:03 UTC
(In reply to comment #5)
> What you talking about here is not equality and it seems to me that it is not
> even sameness, it is similarity. 

How is it not equality? The function is intended to determine whether the data structures are semantically equivalent with respect to our data model. To me, this is the definition of equality.
Comment 7 denny vrandecic 2012-10-11 10:35:05 UTC
Think about the naming of the function.
