Last modified: 2014-01-29 16:18:31 UTC
1. Go to http://ca.wikipedia.org
2. Search "José Mourinho"

Or click here: https://ca.wikipedia.org/w/index.php?title=Especial%3ACerca&profile=advanced&search=Jos%C3%A9+Mourinho&fulltext=Search&ns0=1&ns4=1&ns10=1&ns12=1&redirs=1&profile=advanced

EXPECTED
If such a page exists, just show it.

ACTUALLY
Even if the exact page exists and it is listed first in the results, the first message displayed is "Did you mean: jose mourinho". Well no, I really meant José Mourinho. :)

This is the first time I've paid enough attention to notice this problem consciously (and report it), but I have seen more cases like this. Maybe the pattern is the accent in the search term? I will keep watching.
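A minimal Python sketch of the accent hypothesis. It assumes, and the report does not establish this, that the text the suggester works from is lowercased and has its diacritics stripped. Under that assumption "jose mourinho" is simply the normalized form of "José Mourinho", not a different title. `fold` is a hypothetical helper, not a CirrusSearch or Elasticsearch function.

```python
import unicodedata


def fold(text: str) -> str:
    """Hypothetical helper: lowercase and strip diacritics, roughly like an accent-folding analyzer."""
    # NFKD decomposes "é" into "e" followed by a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


query = "José Mourinho"
suggestion = "jose mourinho"  # the "Did you mean" text from the report

print(fold(query))                      # jose mourinho
print(fold(query) == fold(suggestion))  # True
```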
Elasticsearch won't make a suggestion unless the suggested text appears to be about 2x as likely as the provided text (that's our configuration), so I'm guessing this is caused by our getting suggestions from redirects as well as titles. I'll have a look at it soon.

Another thing: I believe the fix for this will be to not provide a suggestion when the entire title is matched. I think it'd be more appropriate for me to implement this in CirrusSearch even though I'm sure that LuceneSearch has the same problem. The reason for this is that if I implement the fix in Cirrus then, one day, when I find a really good excuse to violate the rule, I'll be able to do so without having to make more convoluted changes to core. I know, YAGNI, but my gut says do it in Cirrus and I'm going to trust it.
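The "about 2x as likely" rule can be sketched as a plain threshold check. It resembles the `confidence` factor of the Elasticsearch phrase suggester, which only returns candidates that score higher than the input phrase's score multiplied by that factor. Treating our configuration as exactly 2.0 is an assumption, `should_offer` is a hypothetical name, and the scores below are made up.

```python
# Assumed value: "about 2x as likely" read as a factor of exactly 2.0.
CONFIDENCE = 2.0


def should_offer(input_score: float, candidate_score: float,
                 confidence: float = CONFIDENCE) -> bool:
    """Offer a candidate only if it scores strictly more than confidence * input score."""
    return candidate_score > confidence * input_score


# Made-up scores for illustration only.
print(should_offer(1.0, 2.5))  # True: 2.5 beats the 2.0 threshold
print(should_offer(1.0, 1.5))  # False: 1.5 does not
```

With a rule like this, anything that inflates the candidate's score relative to the typed text, such as counting redirects as well as titles, can push an essentially identical string over the threshold.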
(In reply to comment #1)
> Elasticsearch won't make a suggestion unless the suggested text appears to be about 2x as likely as the provided text (our configuration) so I'm guessing this is caused by us getting suggestions from redirect as well as titles. I'll have a look at it soon.
>
> Another thing: I believe the fix for this will be to not provide a suggestion when the entire title is matched. I think it'd be more appropriate for me to implement this in CirrusSearch even though I'm sure that LuceneSearch has the same problem. The reason for this is that if I implement the fix in Cirrus then, one day, when I find a really good excuse to violate the rule, I'll be able to without having to make more convoluted changes to core. I know, YAGNI, but my gut says do it in Cirrus and I'm going to trust it.

I was about to say the exact same thing, except let's fix it in core for all search engines. It makes no sense to have "Did you mean Foo?" and "There is a page called 'Foo'" like 3 lines apart on the same page :)
I was really thinking about it in core too, but that little imp in my head said we'd want to break that rule one day. I dunno.

Also, I'd like to look into how that "There is a page called 'Foo'" thing comes up. Does it use the near match hook? If so, it'll work properly on wikis with Cirrus as primary come Monday because we're turning off TitleKey for them. I imagine there are cases where we'll return a fully highlighted title but not have a page that exactly matches the search term. Oooh, and check this out: if that fully highlighted title is on the first page of the search results, we'd start showing the "Did you mean" on the second page! Needs more investigation!
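A minimal sketch of the pagination pitfall, assuming a naive rule that only inspects the results on the page being rendered. `suggestion_for_page`, `paginate`, and the result records are hypothetical, and the data is made up.

```python
from typing import List, Optional

# Hypothetical result record: a title plus whether the engine highlighted all of it.
Result = dict


def suggestion_for_page(page: List[Result], suggestion: Optional[str]) -> Optional[str]:
    """Naive per-page rule: hide the suggestion if any result on this page is a full match."""
    if any(r["fully_highlighted"] for r in page):
        return None
    return suggestion


def paginate(results: List[Result], offset: int, limit: int) -> List[Result]:
    """Return the slice of results shown for one page."""
    return results[offset:offset + limit]


# Made-up results: the full match ranks first, so it only appears on page 1.
results = [
    {"title": "Foo", "fully_highlighted": True},
    {"title": "Foo (disambiguation)", "fully_highlighted": False},
    {"title": "Bar", "fully_highlighted": False},
]

page1 = suggestion_for_page(paginate(results, 0, 2), "example suggestion")
page2 = suggestion_for_page(paginate(results, 2, 2), "example suggestion")
print(page1)  # None: the full match is on this page
print(page2)  # example suggestion: it reappears once the full match scrolls away
```

One way to avoid this, which the thread does not settle on, would be to make the decision once from the top-ranked results regardless of which page is requested.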
OK! I had a look at it.
1. "There is a page called 'Foo'" comes from Title::newFromText( $term )->isKnown(). We certainly shouldn't provide a suggestion in that case.
2. It is still possible for CirrusSearch or LuceneSearch to provide a great match even though the term isn't a known title. Try searching for "pickett charge" or even "main pages". The top result is obviously good enough that a suggestion isn't worth showing, but Cirrus shows one anyway.
I think we should fix #1 in core. #2 we should probably do in Cirrus because it knows more about how it highlights. I'll do both.
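The two rules can be combined into one decision, sketched here in Python rather than PHP. `pick_suggestion` is hypothetical. `title_is_known` stands in for `Title::newFromText( $term )->isKnown()`, and the plain set lookup used below is a simplification that ignores MediaWiki title normalization (for example first-letter capitalization).

```python
from typing import Callable, Optional


def pick_suggestion(term: str,
                    suggestion: Optional[str],
                    title_is_known: Callable[[str], bool],
                    any_result_fully_matched: bool) -> Optional[str]:
    """Return the "Did you mean" text to show, or None to suppress it."""
    # Rule 1 (core): a page with exactly this title exists, the
    # "There is a page called 'Foo'" case.
    if title_is_known(term):
        return None
    # Rule 2 (engine-specific): a result's whole title was highlighted,
    # as with "pickett charge" or "main pages".
    if any_result_fully_matched:
        return None
    return suggestion


# Simplified stand-in for the title lookup.
known_titles = {"José Mourinho"}
is_known = known_titles.__contains__

print(pick_suggestion("José Mourinho", "jose mourinho", is_known, False))  # None
print(pick_suggestion("pickett charge", "some suggestion", is_known, True))  # None
print(pick_suggestion("something else", "some suggestion", is_known, False))  # some suggestion
```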
Change 105705 had a related patch set uploaded by Manybubbles: Don't suggest if the search term is a known title https://gerrit.wikimedia.org/r/105705
That patch covers #1, in core. #2, in Cirrus, is coming later.
Change 105705 merged by jenkins-bot: Don't suggest if the search term is a known title https://gerrit.wikimedia.org/r/105705
Change 106523 had a related patch set uploaded by Manybubbles: Don't suggest anything if a result is a full match https://gerrit.wikimedia.org/r/106523
Change 106523 merged by jenkins-bot: Don't suggest anything if a result is a full match https://gerrit.wikimedia.org/r/106523
Change 107663 had a related patch set uploaded by Chad: Don't suggest if the search term is a known title https://gerrit.wikimedia.org/r/107663
Change 107663 abandoned by Chad: Don't suggest if the search term is a known title Reason: Nevermind, just wait til tomorrow :) https://gerrit.wikimedia.org/r/107663