Last modified: 2014-01-29 16:19:59 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T61841, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 59841 - CirrusSearch: near match doesn't prefer exact matches to unicode flattened ones
Status: VERIFIED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch
Version: unspecified
Hardware: All All
Importance: Normal normal with 1 vote
Target Milestone: ---
Assigned To: Nik Everett
Depends on:
Blocks:
Reported: 2014-01-08 23:06 UTC by Nik Everett
Modified: 2014-01-29 16:19 UTC (History)
3 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Nik Everett 2014-01-08 23:06:07 UTC
You can reproduce this by searching Wiktionary for "son".  It lands you on "són".
Comment 1 Nik Everett 2014-01-09 13:39:34 UTC
I was thinking about this last night and wondering whether it would be bad if we stopped near matches from doing ASCII flattening.  It already only does this for English, and this would be the simplest way of fixing this from a technical perspective.

The downside is that "go" search would get a little more confusing: you might type into the prefix search box and see "són" as the top response because it is linked more frequently than "son".
Comment 2 Nik Everett 2014-01-09 13:49:47 UTC
Wiktionary has 8 pages that all "near match" son with the current analysis setup:
sơn
Son
són
son
sön
SON
søn
soñ

Even with my proposed near match change it still has three:
Son
son
SON

So I'm pretty sure that is a bad idea.  Another proposal: restore sorting by the number of incoming links.  This would drop you on "son" as expected.

Another proposal: if there is more than a single "near match", then declare that there are none and drop the user to the search page.  That will give them more options and never _force_ them to the wrong page.  It may be less convenient.  Also, it can be done with or without removing asciifolding.  Personally I'd prefer to leave the folding in place so prefix matching, which really should have folding, still looks sane.
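
For illustration only, here is a rough Python sketch of the matching behaviour discussed in this comment. It is not the CirrusSearch implementation. The names ascii_fold, near_match_key, near_matches and resolve_go are hypothetical, the folding is approximated with Unicode decomposition plus a hand-written mapping for "ø", and the exact-title check in resolve_go is an assumption based on the bug title rather than on the linked change.

```python
import unicodedata

# Assumption: letters with no Unicode decomposition (such as "ø") need an
# explicit mapping; a real ASCII folding filter covers many more characters.
EXTRA_FOLDS = {"ø": "o", "Ø": "O"}

# The eight Wiktionary titles listed in this comment.
TITLES = ["sơn", "Son", "són", "son", "sön", "SON", "søn", "soñ"]


def ascii_fold(text):
    """Strip combining marks after NFKD decomposition, then apply EXTRA_FOLDS."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(EXTRA_FOLDS.get(ch, ch) for ch in stripped)


def near_match_key(title, fold=True):
    """Lowercase the title and optionally fold it to ASCII."""
    key = title.lower()
    return ascii_fold(key) if fold else key


def near_matches(query, titles, fold=True):
    """Return every title whose key equals the key of the query."""
    target = near_match_key(query, fold)
    return [t for t in titles if near_match_key(t, fold) == target]


def resolve_go(query, titles, fold=True):
    """Pick a page for "go" search, or None to fall back to the search page."""
    # Assumed behaviour: an exact title always wins over folded variants.
    if query in titles:
        return query
    candidates = near_matches(query, titles, fold)
    # More than one near match is treated as ambiguous: show search results.
    if len(candidates) == 1:
        return candidates[0]
    return None
```

With these definitions, near_matches("son", TITLES) returns all eight titles, near_matches("son", TITLES, fold=False) returns only Son, son and SON, and resolve_go("son", TITLES) returns "son" because the exact title is checked first. A query such as "Søn" has no exact title and eight near matches, so resolve_go returns None and the user would be sent to the search page.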
Comment 3 Nik Everett 2014-01-14 20:48:29 UTC
https://gerrit.wikimedia.org/r/#/c/107433/
Comment 4 Nik Everett 2014-01-29 16:19:59 UTC
Verified on enwiktionary.
