Last modified: 2014-05-05 16:59:28 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T46773, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 44773 - property search in entity selector should not be prefix-only
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataRepo
Version: master
Hardware: All All
Importance: Normal major with 1 vote
Target Milestone: ---
Assigned To: Wikidata bugs
Whiteboard: u=dev c=backend p=0
Depends on:
Blocks: 44529
Reported: 2013-02-08 00:58 UTC by Cristian Consonni
Modified: 2014-05-05 16:59 UTC
CC List: 8 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Cristian Consonni 2013-02-08 00:58:39 UTC
The search in the property name box seems to match only the first word of the property. For example, "place of birth" and "place of death" show up only if you type "place", but they should also be shown when you type "birth" (maybe because you were looking for "birthplace").
Adding aliases will help only partially. For example, I would like to type "birth" and be shown "place of birth" but also "date of birth"; while adding "birthplace" as an alias will help for "place of birth", it may not help for other properties. Creating aliases only to help searches doesn't look like The Right Thing To Do(TM) to me. To clarify, I'm perfectly fine with having "sound" aliases such as "birthplace" for "place of birth" or "birthday" for "date of birth"; what I'm not so sure is a good thing is using aliases to provide search shortcuts. Improving property search will improve user friendliness.
Further examples and discussion here: https://www.wikidata.org/w/index.php?title=Wikidata:Project_chat&oldid=6006546#Improve_property_search
Comment 1 jeblad 2013-02-08 17:04:10 UTC
I don't think it makes sense to break the phrase down into words and search on them for those lists. I could be wrong, but I find the usefulness of this proposal to be highly language-specific.

As long as the list of properties is limited this could work, but imagine searching for properties that share their initial letters with common words like "of" and "in", or even a property whose label is one of those common words. That will make the list explode.
Comment 2 Cristian Consonni 2013-02-08 19:34:51 UTC
I fail to see how it would be language-specific; I'm sure there are many cases in English where a property has no alias with the most significant word in first position. Just another example: if I type "language" I would like to be shown both "native language" and "official language". Heck, I would like to just type "lang" and be shown those two.

I agree that making lists (and maintaining them) makes very little sense. I would change the search algorithm to match *any* substring of the property name. I think it's better to trade off a little speed to be sure that the property one is looking for is found.
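To make the difference concrete, here is a minimal in-memory sketch contrasting the current prefix-only behaviour with the proposed "any substring" matching. It only filters a plain Python list of labels taken from the examples above; it is not how Wikibase queries its term table, and the function names are hypothetical.

```python
# Sample property labels taken from the examples in this thread.
LABELS = [
    "place of birth",
    "place of death",
    "date of birth",
    "native language",
    "official language",
]


def prefix_search(query: str, labels: list[str]) -> list[str]:
    """Prefix-only matching: the whole label must start with the query."""
    q = query.casefold()
    return [label for label in labels if label.casefold().startswith(q)]


def substring_search(query: str, labels: list[str]) -> list[str]:
    """Proposed behaviour: the query may occur anywhere inside the label."""
    q = query.casefold()
    return [label for label in labels if q in label.casefold()]


print(prefix_search("birth", LABELS))     # []
print(substring_search("birth", LABELS))  # ['place of birth', 'date of birth']
print(substring_search("lang", LABELS))   # ['native language', 'official language']
```

A linear scan like this is trivially correct but touches every label for every keystroke, which is the speed trade-off mentioned above.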
Comment 3 Lydia Pintscher 2013-11-04 12:32:39 UTC
Katie, Daniel: What's the status of this with the new search backend? Can we change this to a non-prefix-only search?
Comment 4 Daniel Kinzler 2013-12-10 16:38:33 UTC
@Lydia: Eventually, yes, but it needs more thinking, coding, backend processes, etc.
Comment 5 Thiemo Mättig 2014-02-21 15:57:26 UTC
I already worked on that code in Gerrit change #114165 and started doing more refactoring including a possible solution for this request in Gerrit change #114748 (currently a draft).

As Daniel said, this needs a lot more thinking. My change set is far from being a solution but I hope it could be a step in the right direction.
Comment 6 Thiemo Mättig 2014-03-19 11:52:21 UTC
I finished refactoring the related code in several patches but unfortunately had to abandon the draft that was supposed to fix this issue. My initial idea was to simply run an additional WHERE LIKE '%<search term>' query if the three queries that are currently run do not return enough results (basically WHERE id = '<search term>', concatenated with WHERE term = '<search term>', concatenated with WHERE term LIKE '<search term>%').
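The cascade described above can be sketched in memory as a list of match stages whose results are concatenated, best stage first, until enough hits are collected. This is a hypothetical illustration, not the actual Wikibase code: the `Term` rows stand in for the term table, and the fallback stage is modeled as a plain substring ("contains") test.

```python
from typing import NamedTuple


class Term(NamedTuple):
    entity_id: str  # e.g. "P19"
    text: str       # a label or alias


# Sample rows standing in for the term table.
TERMS = [
    Term("P19", "place of birth"),
    Term("P20", "place of death"),
    Term("P569", "date of birth"),
    Term("P103", "native language"),
    Term("P37", "official language"),
]


def cascading_search(query: str, terms: list[Term], limit: int) -> list[str]:
    """Concatenate the results of several match stages, best stage first."""
    q = query.casefold()
    stages = [
        lambda t: t.entity_id.casefold() == q,      # WHERE id = '<search term>'
        lambda t: t.text.casefold() == q,           # WHERE term = '<search term>'
        lambda t: t.text.casefold().startswith(q),  # WHERE term LIKE '<search term>%'
        lambda t: q in t.text.casefold(),           # extra "contains" fallback
    ]
    results: list[str] = []
    for matches in stages:
        if len(results) >= limit:
            break  # remaining stages are skipped once enough ids were found
        for term in terms:
            if matches(term) and term.entity_id not in results:
                results.append(term.entity_id)
    return results[:limit]


print(cascading_search("place", TERMS, limit=5))  # ['P19', 'P20']
print(cascading_search("birth", TERMS, limit=5))  # ['P19', 'P569']
```

Whatever the fallback stage finds is always appended after the exact and prefix matches, regardless of how relevant it is.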

This is a bad idea for multiple reasons:
1. Ranking will be bad. The "contains" results will always be hidden behind the "equals to" and "starts with" results.
2. It should probably be different for Items and Properties.
3. LIKE queries can't use an index if the pattern starts with a wildcard.

To make this a proper solution, the least we need to do is split labels into words (or come up with a cleverer solution, such as identifying common prefixes like "date of" and turning such labels into "birth, date of"). Then we can add these individual words to our term index.
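A minimal sketch of the word-splitting idea, assuming simple whitespace tokenization (which, as noted earlier in the thread, is language-dependent). Each label word is mapped to the ids that use it, and a sorted word list allows a prefix range lookup, much like a B-tree index would. The `COMMON_PREFIXES` list and `rotate_label` helper are hypothetical illustrations of the "birth, date of" idea, not part of Wikibase.

```python
import bisect
from collections import defaultdict

LABELS = {
    "P19": "place of birth",
    "P20": "place of death",
    "P569": "date of birth",
    "P103": "native language",
    "P37": "official language",
}

COMMON_PREFIXES = ("date of ", "place of ")  # hypothetical, hand-picked list


def rotate_label(label: str) -> str:
    """Move a known common prefix to the end: 'date of birth' -> 'birth, date of'."""
    for prefix in COMMON_PREFIXES:
        if label.startswith(prefix):
            return f"{label[len(prefix):]}, {prefix.strip()}"
    return label


def build_word_index(labels: dict[str, str]) -> dict[str, set[str]]:
    """Map every word of every label to the ids whose label contains that word."""
    index: dict[str, set[str]] = defaultdict(set)
    for entity_id, label in labels.items():
        for word in label.casefold().split():
            index[word].add(entity_id)
    return index


def word_prefix_search(query: str, index: dict[str, set[str]], sorted_words: list[str]) -> list[str]:
    """Return ids having at least one label word that starts with the query."""
    q = query.casefold()
    hits: set[str] = set()
    i = bisect.bisect_left(sorted_words, q)  # first word >= query
    while i < len(sorted_words) and sorted_words[i].startswith(q):
        hits |= index[sorted_words[i]]
        i += 1
    return sorted(hits)


INDEX = build_word_index(LABELS)
WORDS = sorted(INDEX)
print(word_prefix_search("birth", INDEX, WORDS))  # ['P19', 'P569']
print(rotate_label("date of birth"))              # birth, date of
```

Note that a query such as "of" matches three of the five sample properties here, which is the list explosion concern raised in comment 1.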

The current solution to do exactly that is very intuitive and simple: add aliases.
Comment 7 Marius Hoch 2014-05-05 16:59:28 UTC
The whole wb_terms table is something of a performance bottleneck, so we should probably find a way to use another database backend service for this (can we "abuse" CirrusSearch for this somehow?).

What could be done with our current setup is to add a new field to wb_terms, term_back (or similar), holding a reversed copy of the term. With that we could also search for terms ending with a word...
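The reversed-column trick turns an "ends with" search into a prefix search, which an index can serve: in SQL it would look like `WHERE term_back LIKE 'htrib%'` for the suffix "birth". A minimal in-memory sketch, assuming a sorted list of (reversed term, term) pairs stands in for an index on the hypothetical `term_back` column:

```python
import bisect

TERMS = ["place of birth", "date of birth", "native language", "official language"]

# Hypothetical term_back column: each term stored reversed, kept sorted so
# that a prefix lookup behaves like an index range scan.
TERM_BACK = sorted((term[::-1], term) for term in TERMS)


def suffix_search(suffix: str) -> list[str]:
    """Find terms ending with `suffix` via a prefix search on the reversed copies."""
    key = suffix[::-1]
    # (key,) sorts before every (key + ..., term) pair, so this is the first candidate.
    i = bisect.bisect_left(TERM_BACK, (key,))
    results = []
    while i < len(TERM_BACK) and TERM_BACK[i][0].startswith(key):
        results.append(TERM_BACK[i][1])
        i += 1
    return results


print(suffix_search("birth"))     # ['place of birth', 'date of birth']
print(suffix_search("language"))  # ['native language', 'official language']
```

This covers words at the end of a label only; a word in the middle, such as "of", is still not found this way.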
