Last modified: 2013-10-09 17:23:33 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T43635, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 41635 - When editing sitelinks, target pages are suggested in a misleading order
Status: VERIFIED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataRepo
Version: unspecified
Hardware: All All
Importance: Low normal
Target Milestone: ---
Assigned To: Wikidata bugs
Reported: 2012-11-01 14:28 UTC by Jitrixis
Modified: 2013-10-09 17:23 UTC (History)


Description Jitrixis 2012-11-01 14:28:11 UTC
In [[Q5607]], (fr) Modem and (fr) MoDem are two different pages, but the editor doesn't see the difference and MoDem takes priority. It only happens on the preview page, though; if I reload, it's fine.
Comment 1 Daniel Kinzler 2012-11-02 13:09:46 UTC
This appears to be caused by the ranking/sorting system that generates the suggestions. I'm not sure whether we can do anything about this in Wikibase; it may be a bug in MediaWiki's OpenSearch implementation.

Anyway, here is what happens:

Select "French" as the target language for the link, then type "mode" into the input box. The suggestions will be something like:

 Mode
 Modene
 MoDem
 Mode (habillement)
 MOD
 Mod
 Modulation
 Modernisme

So, no "Modem" there, only "MoDem". But if you type in "modem", you get:

 MoDem
 Modem
 Modem ADSM
 ....

The sorting seems erratic to me. Can we just fetch the top 100, sort them alphabetically (ignoring case), and then show the top 10?
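
The proposal above could be sketched roughly as follows. This is only an illustration in Python, not Wikibase or MediaWiki code: the function name `top_alphabetical` and its parameters are hypothetical, and the input list stands in for whatever the search backend returns in relevance order.

```python
def top_alphabetical(ranked_titles, fetch_limit=100, show_limit=10):
    """Take the best `fetch_limit` titles (assumed to be in relevance order),
    sort them alphabetically ignoring case, and return the first `show_limit`."""
    best = ranked_titles[:fetch_limit]
    # casefold() gives a case-insensitive sort key; the title itself is used
    # as a tie-breaker so "MoDem" and "Modem" come out in a stable order.
    ordered = sorted(best, key=lambda title: (title.casefold(), title))
    return ordered[:show_limit]


# The suggestions quoted above for the input "modem".
print(top_alphabetical(["MoDem", "Modem", "Modem ADSM"]))
```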
Comment 2 jeblad 2012-11-03 16:05:46 UTC
The odd sorting is because the suggestions are sorted by relevance, if I remember correctly. Fetching the top 100 by relevance, then sorting alphabetically and showing a truncated list does not really give a meaningful list at all.

The bug comes from something that tries to turn the selected entry into a case-agnostic selection, and then searches through the list for this entry. The user's selection from the list should be retained with its upper-/lowercase intact.
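
To illustrate the suspected cause, here is a minimal Python sketch. It is not the actual Wikibase code; both helper names are hypothetical. It shows how a case-agnostic lookup can return a different entry from the one the user picked, while an exact lookup keeps the selection.

```python
def find_case_insensitive(selected, suggestions):
    """Return the first suggestion that equals `selected` when case is ignored."""
    wanted = selected.casefold()
    for title in suggestions:
        if title.casefold() == wanted:
            return title
    return None


def find_exact(selected, suggestions):
    """Return `selected` only if it appears with identical upper-/lowercase."""
    return selected if selected in suggestions else None


suggestions = ["MoDem", "Modem", "Modem ADSM"]
print(find_case_insensitive("Modem", suggestions))  # MoDem
print(find_exact("Modem", suggestions))             # Modem
```

With the list in the order reported above, the case-agnostic lookup lands on "MoDem" even though "Modem" was selected.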
Comment 3 Daniel Kinzler 2012-11-03 22:42:19 UTC
Sorting the result of a prefix search by relevance seems silly to me, but I guess we can't do much about how MWSearch returns that. You are right that just sorting the "best" 100 results doesn't really solve the problem - e.g. you may not see what you are looking for at the right place in the list, even if it exists, because it was not in the top 100. That would be misleading. However, I think it may still be better than what we have now.

Or can we get the search result directly in alphabetical order? That would be nice.
Comment 4 jeblad 2012-11-04 02:35:21 UTC
You may set this up as a user preference, but do not turn it on by default. While not completely wrong, it is extremely confusing for the end user. The only thing that works for short lists is a scoring mechanism, which is often some variation of relevance ranking. If you can present the _complete_ list within some subdomain, you can sort alphabetically, provided you add some visual cue for the scoring. This is often done for time series like newspapers, where you want to search within some timeframe.
Comment 5 jeblad 2012-11-04 10:28:51 UTC
The two scoring functions I know of that work on this kind of problem are: one for sorting on full terms, "sort all found terms on prefixed matches on probability or inverse document frequency or a similar function", possibly with some weighting on shorter terms to make absolute matches go first; and one for sorting on boundary effects, "sort all found terms on the probability that syllables start within the right side of the boundary (aka within the prefix) and continue into the found term", possibly with some simplification using Markov chains.

The first form is the most common and, as I recall from some comments, also the form used in the live search on Wikipedia, aka the existing Lucene-search.
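
A very rough Python sketch of the first form is given below. It makes several assumptions that are not part of this report: the function name `rank_prefix_matches` is hypothetical, `term_counts` is an invented mapping from title to some usage count, the score is a plain relative frequency rather than inverse document frequency, and the length weighting is just one possible choice. It is not how Lucene-search works.

```python
def rank_prefix_matches(prefix, term_counts, limit=10):
    """Rank titles that start with `prefix` (ignoring case) by a simple score."""
    needle = prefix.casefold()
    matches = {
        title: count
        for title, count in term_counts.items()
        if title.casefold().startswith(needle)
    }
    # Avoid dividing by zero when every matching title has a count of 0.
    total = sum(matches.values()) or 1

    def score(title):
        probability = matches[title] / total
        # Shorter titles get more weight; a title exactly as long as the
        # prefix gets the full weight of 1.
        length_weight = len(prefix) / max(len(title), 1)
        return probability * length_weight

    # Highest score first; ties are broken by the title itself.
    return sorted(matches, key=lambda title: (-score(title), title))[:limit]


# Hypothetical counts, chosen only for illustration.
counts = {"Mode": 100, "MoDem": 50, "Modem": 50, "Modem ADSM": 50}
print(rank_prefix_matches("modem", counts))
```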
