Last modified: 2014-01-03 15:49:12 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T45238, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 43238 - Wrong ordering of search results
Status: VERIFIED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataRepo
Version: unspecified
Hardware: All All
Importance: Normal major with 3 votes
Target Milestone: ---
Assigned To: Wikidata bugs
URL: https://en.wikidata.org/w/index.php?s...
Depends on:
Blocks: 44529
Reported: 2012-12-18 19:07 UTC by Incarus
Modified: 2014-01-03 15:49 UTC (History)

Description Incarus 2012-12-18 19:07:17 UTC
Wikidata's ordering of search results is somewhat strange.
If you search for "Wine" in the English version (!), you get the Wine software as the first result, then some people called "Wine", and only as the 20th result the wiki article of the same name.
The same happens with other search terms.

That makes the search difficult.
Comment 1 Incarus 2012-12-27 21:26:41 UTC
A good example is "mead":
1. result: disambiguation page
2. - 31. result: people and things containing that word
32. result: wanted result (the actual "[...]wiki/Mead" article)

In my eyes the search bar is unusable.
Comment 2 Vogone 2012-12-27 22:06:39 UTC
Well, there is a JS solution. Directly beside the "normal" search field is a small triangle (Vector). If you click on this, another search field appears. Now it depends on the interface language. If "German" is set, the Item which is linked with the article "de:Wine" is found with the input of "Wine". If "English" is set, the item linked with "en:Wine" is found. This JS is set by default. Besides, there is a special page named "Special:ItemByTitle". You can use it as well.

Regards, eikes
Comment 3 jeblad 2012-12-28 00:18:33 UTC
Much of the problem comes from the fact that Wikidata lacks the text needed to establish the relevance of an article, so we need something else. Without it, which entries turn up first and last will be somewhat random.

One way around is to do the searching in Wikipedia and use the hits and ranking from there as hints for an internal search similar to "ItemByTitle". That way we get working relevance ranking by borrowing the values from Wikipedia. The remaining problem is how we should solve this for languages with very few articles.

We could do it the other way around and check relevance in Wikipedia for items found by searching for labels. That would work for all items that have any sitelinks, as we can use the highest-ranking article anyhow. Something like a prefixed term search to get a list of items, then getting the relevance of the linked Wikipedia articles, then sorting by the highest-ranking article. Mjæ..

To just throw up the ItemByTitle unmodified is in my opinion not a very good solution.

This should probably be done as a modification of the existing opensearch module as it will operate much faster that way.

Still, at some point we should build our own result sets from searches, but then we need to figure out how to make the relevance ranking work. The reason is that we can't jump out and search in Wikipedia for the stored queries.
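
A minimal sketch of the second approach above (prefix search on labels, then sorting by relevance borrowed from the linked Wikipedia articles). Everything here is an assumption for illustration: the function name, the item ids "A" to "D", and the relevance scores are made up, and no real Wikidata or Wikipedia API is called.

```python
def rank_by_borrowed_relevance(prefix, labels, sitelinks, relevance):
    """Rank items whose label starts with `prefix` by borrowed relevance.

    labels:    item id -> label
    sitelinks: item id -> list of linked article names
    relevance: article name -> score (hypothetical, supplied by the caller)
    """
    p = prefix.casefold()
    # Step 1: prefixed term search over the labels.
    candidates = [item for item, label in labels.items()
                  if label.casefold().startswith(p)]

    # Step 2: each item takes the score of its highest-ranking article;
    # items without sitelinks fall back to 0.0.
    def best_score(item):
        return max((relevance.get(article, 0.0)
                    for article in sitelinks.get(item, [])), default=0.0)

    # Step 3: highest score first (ties keep their original order).
    return sorted(candidates, key=best_score, reverse=True)


# Hypothetical sample data.
labels = {"A": "Wine (software)", "B": "Wine", "C": "Winery", "D": "Beer"}
sitelinks = {"A": ["en:Wine (software)"], "B": ["en:Wine", "de:Wein"], "C": []}
relevance = {"en:Wine": 0.9, "de:Wein": 0.7, "en:Wine (software)": 0.4}

print(rank_by_borrowed_relevance("wine", labels, sitelinks, relevance))
```

Items without any sitelinks sink to the bottom here, which is the weak spot already mentioned for languages with very few articles.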
Comment 4 Incarus 2012-12-28 00:32:52 UTC
As a first part, we could order the search results by their similarity to the search term. A disambiguation page, if there is one, can come second, right after the most similar result, as long as it is as similar as the first result.

The second part can be results that contain the search term in the title exactly as it is, with no letters before or after it.

The third part can be results where the search term is contained in the title in any way.

The fourth part would be results where only the result page contains the search term, but the title does not.
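
The four parts above could be sketched roughly as follows. This is only an illustration under assumptions: the names `tier` and `rank` are hypothetical, a trailing "(...)" qualifier in a title is ignored when testing for an exact match, and a disambiguation page is recognised simply by "(disambiguation)" in its title.

```python
import re


def tier(term, title, body=""):
    """Return 0-3 for the four parts described above, 4 for no match."""
    t = term.casefold()
    ti = title.casefold()
    # Assumption: drop a trailing qualifier such as "(disambiguation)".
    base = re.sub(r"\s*\([^)]*\)\s*$", "", ti)
    if base == t:
        return 0  # title is the search term itself
    if re.search(r"(?<!\w)" + re.escape(t) + r"(?!\w)", ti):
        return 1  # term in the title, no letters directly before or after
    if t in ti:
        return 2  # term contained in the title in any way
    if t in body.casefold():
        return 3  # only the page text contains the term
    return 4


def rank(term, results):
    """Sort (title, body) pairs by tier; disambiguation pages follow
    other results of the same tier."""
    def key(result):
        title, body = result
        return (tier(term, title, body),
                "(disambiguation)" in title.casefold())
    return sorted(results, key=key)


results = [
    ("Meadow", ""),
    ("Mead (disambiguation)", ""),
    ("George Mead", ""),
    ("Mead", ""),
    ("Honey", "used to make a drink called mead"),
]
for title, _ in rank("mead", results):
    print(title)
```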
Comment 5 jeblad 2012-12-28 00:41:31 UTC
Note that Wikipedia's ranking mechanism is, strictly speaking, not relevance ranking, but that is another discussion.
Comment 6 Incarus 2012-12-28 11:33:12 UTC
My point was to order the results by their similarity to the search term.
The most similar result should always be the wiki article of the same name, followed by the rest as described in comment 4.
Comment 7 Nemo 2013-01-13 10:32:38 UTC
(In reply to comment #6)
> My point was to order the results to their similarity to the searching term.
> The most similiar result should be always the same named wiki article and the
> rest as mentioned in comment 4.

This is definitely a valid bug; see e.g. the search for "canis lupus", where the correct result is in 18th/19th place.
http://lists.wikimedia.org/pipermail/wikispecies-l/2013-January/000076.html
Comment 8 Mushroom 2013-05-31 13:29:27 UTC
It may also be useful to order results by the number of pages that link to them. For instance, a few minutes ago I searched for "company" (meaning "business organization") and these were the first results:

1) Bad Company           (13 links)
2) Ford Motor Company    (200 links)
3) Hyundai Motor Company (64 links)

If we simply order by similarity we might get:

1) company (disambig)    (0 links)
2) Company (novel)       (0 links)
3) Company (magazine)    (0 links)

But most people would prefer this:

1) company (business)    (10000+ links)
2) company (military)    (2 links)
3) Company (novel)       (2 links)

Even without using similarity it would still be an improvement, i.e.:

1) company (business)    (10000+ links)
2) Ford Motor Company    (200 links)
3) Hyundai Motor Company (64 links)
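
The two orderings above can be sketched like this, using the link counts from this comment. The function names are hypothetical, "10000+" is entered as 10000, and "similarity" is reduced to a plain check that the title equals the search term once a trailing "(...)" qualifier is dropped.

```python
results = [
    ("Bad Company", 13),
    ("Ford Motor Company", 200),
    ("Hyundai Motor Company", 64),
    ("company (business)", 10000),  # "10000+" in the comment
    ("company (military)", 2),
    ("Company (novel)", 2),
]


def by_links(results):
    """Order by incoming links only, most-linked first."""
    return sorted(results, key=lambda r: -r[1])


def by_similarity_then_links(term, results):
    """Titles equal to the term (ignoring a qualifier) first,
    then most-linked first within each group."""
    def is_exact(title):
        return title.casefold().split(" (")[0] == term.casefold()
    return sorted(results, key=lambda r: (not is_exact(r[0]), -r[1]))


print([title for title, _ in by_links(results)[:3]])
print([title for title, _ in by_similarity_then_links("company", results)[:3]])
```

Ties, such as the two entries with 2 links, keep their input order because Python's sort is stable.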
Comment 9 Gerrit Notification Bot 2013-07-12 14:23:14 UTC
Change 73405 had a related patch set uploaded by Denny Vrandecic:
(bug 43238) Add very simple weighting for entity search (DO NOT MERGE)

https://gerrit.wikimedia.org/r/73405
Comment 10 Gerrit Notification Bot 2013-07-17 11:51:25 UTC
Change 73405 merged by jenkins-bot:
(bug 43238) Add very simple weighting for entity search

https://gerrit.wikimedia.org/r/73405
Comment 11 denny vrandecic 2013-07-17 11:56:40 UTC
A simple weighting and ranking is now merged, based on sitelinks. This should roll out to Wikidata soon, and then we can see whether it improves the current situation. In the long term, it is still the goal to replace it with something Lucene-based.
Comment 12 abraham.taherivand 2013-07-17 14:32:26 UTC
Verified in Wikidata demo time July 17th
Comment 13 Trijnstel 2013-07-25 10:12:40 UTC
I don't know whether it's related to this bug (or whether it has already been reported), but it seems that I can't search for any pages on Wikidata with special characters. See for example https://www.wikidata.org/w/index.php?search=+Ji%C5%99%C3%AD+Polnick%C3%BD&title=Special%3ASearch - while the page does exist: https://www.wikidata.org/wiki/Q1428346
