Last modified: 2013-09-18 15:39:17 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T47351, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 45351 - improve sort order in entity selector
Status: VERIFIED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataRepo
Version: master
Hardware: All All
Importance: High critical with 4 votes
Target Milestone: ---
Assigned To: Wikidata bugs
Depends on:
Blocks:

Reported: 2013-02-25 11:34 UTC by Lydia Pintscher
Modified: 2013-09-18 15:39 UTC
CC List: 8 users

Description Lydia Pintscher 2013-02-25 11:34:06 UTC
It'd be great if we could be smarter about the order of items/properties in the entity selector and put the ones at the top that are likely to be relevant for the current statement.
Comment 1 Magnus Manske 2013-02-25 16:59:01 UTC
As we now have an increasing number of links, the easiest and fastest way might be to count the number of incoming wikilinks to each potential item, and sort them by most incoming links first. Paris (France) will have more incoming links than Paris (the god) or Paris (Texas), which will have more than other obscure uses.
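A minimal sketch of the ordering proposed above, assuming each candidate already carries a precomputed count of incoming wikilinks. The `Candidate` type, the item identifiers, and the link counts are invented for illustration and do not come from Wikidata.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    item_id: str         # hypothetical identifier, not a real Wikidata ID
    label: str
    incoming_links: int  # assumed to be precomputed elsewhere


def sort_by_incoming_links(candidates):
    """Return candidates with the most-linked item first."""
    return sorted(candidates, key=lambda c: c.incoming_links, reverse=True)


# Made-up counts, chosen only to show the ordering.
candidates = [
    Candidate("item-paris-texas", "Paris (Texas)", 300),
    Candidate("item-paris-france", "Paris (France)", 50000),
    Candidate("item-paris-god", "Paris (the god)", 1200),
]

ranked = sort_by_incoming_links(candidates)
for c in ranked:
    print(c.label, c.incoming_links)
```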
Comment 2 Magnus Manske 2013-02-26 10:51:36 UTC
Addendum: If no hits are available in the current language, try other languages.
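The fallback could look roughly like the sketch below. It assumes labels are available as a mapping from language code to `{item_id: label}` and uses a simple case-insensitive prefix match; the function name, the data layout, and the sample data are all hypothetical.

```python
def search_with_fallback(term, labels_by_language, preferred, fallbacks):
    """Search the preferred language first, then each fallback in order.

    Returns (language, [item_id, ...]) for the first language with hits,
    or (None, []) if no language matches.
    """
    needle = term.lower()
    for lang in [preferred, *fallbacks]:
        labels = labels_by_language.get(lang, {})
        hits = [
            item_id
            for item_id, label in labels.items()
            if label.lower().startswith(needle)
        ]
        if hits:
            return lang, hits
    return None, []


# Invented sample data: the item has a German label but no Norwegian one.
labels_by_language = {
    "no": {"item-1": "Oslo"},
    "de": {"item-2": "Köln", "item-1": "Oslo"},
}

print(search_with_fallback("Köl", labels_by_language, "no", ["en", "de"]))
```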
Comment 3 jeblad 2013-02-26 12:11:36 UTC
The problem is the same as recommending products to customers, where the customers are the properties and the products are the items used. It will also trigger the same scalability issues.

That is, simple counting is not enough to get good guesses about which items should be sorted first. For example, it would mean that all municipalities of Brazil (or France) are listed before municipalities in Norway, which is bad if you are trying to find municipalities in Norway.
Comment 4 Lydia Pintscher 2013-02-26 13:11:45 UTC
We're not going to make this sort order perfect. But taking the number of site-links should give a good-enough sort order most of the time. This is what counts.
Comment 5 jeblad 2013-02-26 13:16:43 UTC
Number of sitelinks doesn't really make sense in this case. A low number is actually an indication that a high count does not make sense, because the values you are looking for aren't common. That is, it is a feature with negative correlation with your wanted entries. It's a classic automatic data classifier problem.
Comment 6 Magnus Manske 2013-02-26 17:23:34 UTC
Addendum: Allow language prefixes, e.g. "de:Berlin", to show the item that has the language link for "Berlin" on de.wikipedia.
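One possible way to split such a query, assuming the set of valid language codes is known up front. `parse_prefixed_query` and `known_languages` are hypothetical names; a colon is treated as a prefix separator only when the part before it is a known language code, so ordinary labels that contain a colon are left alone.

```python
def parse_prefixed_query(query, known_languages):
    """Split "de:Berlin" into ("de", "Berlin").

    Returns (None, query) when there is no recognised language prefix.
    """
    prefix, sep, rest = query.partition(":")
    if sep and rest and prefix in known_languages:
        return prefix, rest
    return None, query


known_languages = {"de", "en", "fr"}  # assumed list, for illustration only

print(parse_prefixed_query("de:Berlin", known_languages))
print(parse_prefixed_query("Berlin", known_languages))
```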
Comment 7 Magnus Manske 2013-02-26 17:25:38 UTC
(In reply to comment #5)
> Number of sitelinks doesn't really make sense in this case. A low number is
> actually an indication that a high count does not make sense, because the
> values you are looking for aren't common. That is, it is a feature with
> negative correlation with your wanted entries. It's a classic automatic data
> classifier problem.


Not sure I understand. I want Paris, France, to show up on top for the search "Paris", as I am most likely adding a person's birth or death place, or the location of an object.
Comment 8 Magnus Manske 2013-03-01 16:46:43 UTC
Addendum: For each item, show the "is a(n)" field if no description is set.
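As a rough sketch of that fallback, assuming each search result is a plain dictionary with an optional `description` and an optional `instance_of` label (both field names are made up here, not taken from the entity selector):

```python
def display_description(item):
    """Prefer the item's description; fall back to its "is a(n)" value."""
    description = item.get("description")
    if description:
        return description
    # Hypothetical field holding the label of the "is a(n)" value.
    return item.get("instance_of") or ""


print(display_description({"label": "Paris", "description": "capital of France"}))
print(display_description({"label": "Paris", "instance_of": "city"}))
```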
Comment 9 Nilesh Chakraborty 2013-05-03 02:25:29 UTC
(In reply to comment #3)
> The problem is the same as recommending products to customers, where the
> customers are the properties and the products are the items used. It will
> also trigger the same scalability issues.

This is interesting. But I think you meant item=>product and property=>customer, since we are recommending properties to items (then based upon the recommendation scores we can sort the list), much like recommending "products" to "customers".

i) What kind of scalability issues and why?
ii) Do you think this would be a better method (accuracy-wise) for sorting than using incoming wikilinks as a metric?
Comment 10 Nilesh Chakraborty 2013-05-03 14:58:45 UTC
(In reply to comment #9)
> But I think you meant item=>product and
> property=>customer, since we are recommending properties to items (then based
> upon the recommendation scores we can sort the list), much like recommending
> "products" to "customers".

Sorry - my mistake. Please ignore the above section of my comment.
Comment 11 denny vrandecic 2013-09-05 11:42:18 UTC
Entity search is now weighted by number of sitelinks.
