Last modified: 2013-10-18 14:08:29 UTC
When I first started to use "CirrusSearch" I was very happy that the "Search Suggestions" were working out of the box. In the recent versions (last 2-3 weeks) there are no search suggestions anymore. Can I somehow turn them on again? Thank you, Martin
Where can I see this? On a private wiki instance? On some Wikimedia website? What are the exact steps to reproduce this?
Yes, a private wiki. I have sent you a direct mail with the link. I will now also try to reproduce it on one of the Wikimedia test sites. Thank you!
In case you are referring to the search box in the upper right corner: search proposals work for me on your wiki (when I enter "Salz" it proposes one page whose name starts with Salz). Using Firefox 24 here.
OK, sorry. By "Search Suggestions" I meant searching for something that does NOT exist or is spelled incorrectly, like "waser" instead of "wasser". I am truly sorry that I did not make myself clear. Yes, "AutoComplete", as I would call it, works pretty well.
Works for me on production:
https://en.wikisource.org/w/index.php?title=Special%3ASearch&profile=default&search=pots&fulltext=Search
And in development:
http://solr-mw2.instance-proxy.wmflabs.org/w/index.php?search=noble+prize&title=Special%3ASearch

One thing that might have changed since the last time you checked: we only build suggestions from the titles and redirect titles. We used to build suggestions from titles and text. We felt that that produced too many false positives. Also, the search index required to do that took up a bunch of space.

I'm going to attach the query that I always use for debugging suggestion issues to this bug. If you could send it to Elasticsearch and attach the results I'll decipher them for you. So you aren't in suspense: it'll return a bunch of suggestions including the search phrase. Normally CirrusSearch configures Elasticsearch to only return suggestions that have a score of twice what the original search phrase had, so you can use the results to figure out if the suggestion that you expected was even being generated and, if so, how it scores.

So, options:
1. I can make generating suggestions from text a configurable thing. Going from off to on would require a reindex.
2. You can change the suggestion cutoff score and walk the false positive tuning line. The config value is $wgCirrusSearchPhraseSuggestConfidence - just make sure to keep it set to a number. You can change this as much as you like without breaking anything, but if you set it to less than 1 then I believe you'll end up getting your search query back as a suggestion all the time.
3. There is some kind of problem in Elasticsearch, your setup (did you rebuild the index when you pulled?), or gremlins.
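To make the cutoff described above concrete, here is a minimal Python sketch of the rule as the comment states it: a suggestion is kept only if its score beats the score of the original phrase multiplied by the confidence value. The function name `filter_suggestions`, the dict layout, and the scores are assumptions for illustration only. In a real setup Elasticsearch applies the cutoff itself, and CirrusSearch presumably just passes $wgCirrusSearchPhraseSuggestConfidence through to it.

```python
def filter_suggestions(options, original_text, confidence=2.0):
    """Keep options whose score beats confidence * (score of the original phrase)."""
    # Score the suggester assigned to the phrase exactly as the user typed it.
    original_score = next(
        (o["score"] for o in options if o["text"] == original_text), None
    )
    if original_score is None:
        # The original phrase was not scored, so there is nothing to compare against.
        return [o["text"] for o in options]
    threshold = confidence * original_score
    return [o["text"] for o in options if o["score"] > threshold]


# Hypothetical scores, chosen only to illustrate the rule.
options = [
    {"text": "catapul", "score": 0.010},
    {"text": "catapult", "score": 0.025},
    {"text": "catapults", "score": 0.015},
]

print(filter_suggestions(options, "catapul"))
print(filter_suggestions(options, "catapul", confidence=0.5))
```

With the default of 2.0 only "catapult" clears the cutoff. With a value below 1 the original phrase "catapul" clears its own cutoff as well, which matches the warning in option 2 about getting the search query back as a suggestion.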
Created attachment 13505 [details] Suggestion test query
I would go with option 1, as I only tried to search for words in the "text", not the title. This would solve the problem for me at least! Thanks again for responding so quickly... (as always).
Suggestion test query Result:
{
  took: 14
  timed_out: false
  _shards: { total: 8 successful: 8 failed: 0 }
  hits: { total: 0 max_score: null hits: [ ] }
  suggest: {
    title: [ { text: waser offset: 0 length: 5 options: [
      { text: waser highlighted: waser score: 0.08398465 }
    ] } ]
    redirect: [ { text: waser offset: 0 length: 5 options: [
      { text: waser highlighted: waser score: 0.27871963 }
    ] } ]
  }
}
Yup - no suggestions are coming up and you would have got your suggestions from the text. Let me see about getting that working again.
Created attachment 13506 [details] Suggestion test query
I found a problem with the query I posted earlier so I posted a second copy - this second one also builds the suggestions against the text. See if that provides the suggestions you need.
Created attachment 13507 [details] Suggestion test query v2
Created attachment 13508 [details] Suggestion test query
Sorry about all the updates, I just found the obsoletes field on the uploader and wanted to get rid of the duplicates.
Result:
{
  took: 30
  timed_out: false
  _shards: { total: 8 successful: 8 failed: 0 }
  hits: { total: 0 max_score: null hits: [ ] }
  suggest: {
    title: [ { text: waser offset: 0 length: 5 options: [
      { text: waser highlighted: waser score: 0.08398465 }
    ] } ]
    text_suggest: [ { text: waser offset: 0 length: 5 options: [
      { text: waser highlighted: waser score: 0.10791662 }
    ] } ]
    redirect: [ { text: waser offset: 0 length: 5 options: [
      { text: waser highlighted: waser score: 0.27871963 }
    ] } ]
  }
}
Created attachment 13509 [details] Suggestion test query
And one more bug. On the upside, the feature is almost done.
{
  took: 135
  timed_out: false
  _shards: { total: 8 successful: 8 failed: 0 }
  hits: { total: 0 max_score: null hits: [ ] }
  suggest: {
    title: [ { text: waser offset: 0 length: 5 options: [
      { text: waser highlighted: waser score: 0.08398465 }
    ] } ]
    text_suggest: [ { text: waser offset: 0 length: 5 options: [
      { text: waser highlighted: waser score: 0.10791662 }
      { text: wasser highlighted: <em>wasser</em> score: 0.042066924 }
      { text: water highlighted: <em>water</em> score: 0.017357524 }
      { text: wsser highlighted: <em>wsser</em> score: 0.011847182 }
      { text: wash highlighted: <em>wash</em> score: 0.009659772 }
      { text: wassers highlighted: <em>wassers</em> score: 0.00865565 }
    ] } ]
    redirect: [ { text: waser offset: 0 length: 5 options: [
      { text: waser highlighted: waser score: 0.27871963 }
    ] } ]
  }
}
Hmmm. This:
    { text: waser highlighted: waser score: 0.10791662 }
    { text: wasser highlighted: <em>wasser</em> score: 0.042066924 }
says that Elasticsearch thinks that "waser" is still a better option than "wasser".

I find that it actually works better for me when searching for phrases. I'm not super sure why at this point. For example, I have a page which contains the phrase "test catapult" but when I search for "catapul" I don't get a suggestion. I do get one when I search for "test catapul" or "tets catapul". I'll add it to my todo list to figure out why that happens.

For now, I'll proceed with making text suggestions configurable.
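To put numbers on the comparison above, the following Python sketch divides each text_suggest score from the pasted result by the score of the original phrase. The scores are copied from that result. The cutoff of 2.0 is an assumption taken from the "twice the original" default mentioned earlier in this bug, and may not be what was configured on the wiki in question.

```python
# Scores copied from the text_suggest section of the result above.
scores = {
    "waser": 0.10791662,
    "wasser": 0.042066924,
    "water": 0.017357524,
    "wsser": 0.011847182,
    "wash": 0.009659772,
    "wassers": 0.00865565,
}
original = "waser"
confidence = 2.0  # assumed cutoff: "twice what the original search phrase had"

# How each candidate scores relative to the phrase as typed.
ratios = {
    text: score / scores[original]
    for text, score in scores.items()
    if text != original
}
surviving = [text for text, ratio in ratios.items() if ratio > confidence]

for text, ratio in ratios.items():
    print(f"{text}: {ratio:.2f}x the score of '{original}'")
print("suggested:", surviving)
```

Every candidate scores below the original phrase, so nothing clears the cutoff and no "did you mean" suggestion would be shown for this result.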
Change 90132 had a related patch set uploaded by Manybubbles: Optionally pull suggestions from text https://gerrit.wikimedia.org/r/90132
With the referenced patch you can turn on getting suggestions from text by setting
    $wgCirrusSearchPhraseUseText = true;
and doing an in-place reindex:
    php updateSearchIndexConfig.php --reindexAndRemoveOk --indexIdentifier now
    php forceSearchIndex.php --forceUpdate
What I do not understand: how can Elasticsearch think "waser" is a better option even if the word "waser" is NOT FOUND at all?

So here are some more tests with more than one word: the correct spelling would be "geraspelter Schokolade bestreuen" but I entered "geraspelter Shokolade bestreuen", without the c in "Schokolade". Here is the output:
{
  took: 119
  timed_out: false
  _shards: { total: 8 successful: 8 failed: 0 }
  hits: { total: 0 max_score: null hits: [ ] }
  suggest: {
    title: [ { text: geraspelter Shokolade bestreuen offset: 0 length: 31 options: [
      { text: geraspelter shokolade bestreuen highlighted: geraspelter shokolade bestreuen score: 0.00026727197 }
    ] } ]
    text_suggest: [ { text: geraspelter Shokolade bestreuen offset: 0 length: 31 options: [
      { text: geraspelter schokolade bestreuen highlighted: geraspelter <em>schokolade</em> bestreuen score: 0.011861463 }
      { text: geraspelte schokolade bestreuen highlighted: <em>geraspelte schokolade</em> bestreuen score: 0.0057594595 }
      { text: geraspelter shokolade bestreuen highlighted: geraspelter shokolade bestreuen score: 0.0005670466 }
      { text: geraspelten schokolade bestreuen highlighted: <em>geraspelten schokolade</em> bestreuen score: 0.000060482846 }
      { text: geraspelt schokolade bestreuen highlighted: <em>geraspelt schokolade</em> bestreuen score: 0.00000123148 }
      { text: geraspelten shokolade bestreuen highlighted: <em>geraspelten</em> shokolade bestreuen score: 0.0000010980223 }
      { text: geraspelte shokolade bestreuen highlighted: <em>geraspelte</em> shokolade bestreuen score: 0.0000010932401 }
      { text: geraspelt 1 schokolade bestreuen highlighted: <em>geraspelt 1 schokolade</em> bestreuen score: 0.0000010556109 }
      { text: geraspelter schoklolade bestreuen highlighted: geraspelter <em>schoklolade</em> bestreuen score: 6.116578e-7 }
      { text: geraspelt shokolade bestreuen highlighted: <em>geraspelt</em> shokolade bestreuen score: 5.8213175e-7 }
      { text: geraspelt 1 shokolade bestreuen highlighted: <em>geraspelt 1</em> shokolade bestreuen score: 4.989969e-7 }
      { text: geraspelter schokoladen bestreuen highlighted: geraspelter <em>schokoladen</em> bestreuen score: 4.862056e-7 }
    ] } ]
    redirect: [ { text: geraspelter Shokolade bestreuen offset: 0 length: 31 options: [
      { text: geraspelter shokolade bestreuen highlighted: geraspelter shokolade bestreuen score: 0.009769141 }
    ] } ]
  }
}
Great Nik. Thank you again! I will try it out tomorrow...
(In reply to comment #20)
> What I do not understand: how can Elasticsearch think "waser" is a better
> option even if the word "waser" is NOT FOUND at all.
>
> so here some more tests with 2 words:
> correct spelling would be "geraspelter Schokolade bestreuen" but I entered
> "geraspelter Shokolade bestreuen" without the c in "Schokolade".
> <snip>

It gets it right that time, at least. You may want to try hitting the <wikiname>_content alias rather than the <wikiname> alias. I see that producing better results on my side. Still, I'll have to look into it.
When using <wikiname>_content and searching for "waser", this is the result, if this is any help:
{
  took: 14
  timed_out: false
  _shards: { total: 4 successful: 4 failed: 0 }
  hits: { total: 0 max_score: null hits: [ ] }
  suggest: {
    title: [ { text: waser offset: 0 length: 5 options: [
      { text: waser highlighted: waser score: 0.06573563 }
    ] } ]
    text_suggest: [ { text: waser offset: 0 length: 5 options: [
      { text: wasser highlighted: <em>wasser</em> score: 0.03317656 }
      { text: water highlighted: <em>water</em> score: 0.017357524 }
      { text: wsser highlighted: <em>wsser</em> score: 0.009198759 }
      { text: wassers highlighted: <em>wassers</em> score: 0.00865565 }
      { text: waser highlighted: waser score: 0.007820548 }
      { text: wash highlighted: <em>wash</em> score: 0.0075003416 }
    ] } ]
    redirect: [ { text: waser offset: 0 length: 5 options: [
      { text: waser highlighted: waser score: 0.27871963 }
    ] } ]
  }
}
And here is the other one:
{
  took: 63
  timed_out: false
  _shards: { total: 4 successful: 4 failed: 0 }
  hits: { total: 0 max_score: null hits: [ ] }
  suggest: {
    title: [ { text: geraspelter Shokolade bestreuen offset: 0 length: 31 options: [
      { text: geraspelter shokolade bestreuen highlighted: geraspelter shokolade bestreuen score: 0.00012816112 }
    ] } ]
    text_suggest: [ { text: geraspelter Shokolade bestreuen offset: 0 length: 31 options: [
      { text: geraspelter schokolade bestreuen highlighted: geraspelter <em>schokolade</em> bestreuen score: 0.009209847 }
      { text: geraspelte schokolade bestreuen highlighted: <em>geraspelte schokolade</em> bestreuen score: 0.004471939 }
      { text: geraspelten schokolade bestreuen highlighted: <em>geraspelten schokolade</em> bestreuen score: 0.000036463684 }
      { text: geraspelt schokolade bestreuen highlighted: <em>geraspelt schokolade</em> bestreuen score: 0.00000123148 }
      { text: geraspelt 1 schokolade bestreuen highlighted: <em>geraspelt 1 schokolade</em> bestreuen score: 0.0000010556109 }
      { text: geraspelter schoklolade bestreuen highlighted: geraspelter <em>schoklolade</em> bestreuen score: 6.116578e-7 }
      { text: geraspelt shokolade bestreuen highlighted: <em>geraspelt</em> shokolade bestreuen score: 5.8213175e-7 }
      { text: geraspelter shokolade bestreuen highlighted: geraspelter shokolade bestreuen score: 5.2390885e-7 }
      { text: geraspelten shokolade bestreuen highlighted: <em>geraspelten</em> shokolade bestreuen score: 5.1398877e-7 }
      { text: geraspelte shokolade bestreuen highlighted: <em>geraspelte</em> shokolade bestreuen score: 5.117502e-7 }
      { text: geraspelt 1 shokolade bestreuen highlighted: <em>geraspelt 1</em> shokolade bestreuen score: 4.989969e-7 }
      { text: geraspelter schokoladen bestreuen highlighted: geraspelter <em>schokoladen</em> bestreuen score: 4.862056e-7 }
    ] } ]
    redirect: [ { text: geraspelter Shokolade bestreuen offset: 0 length: 31 options: [
      { text: geraspelter shokolade bestreuen highlighted: geraspelter shokolade bestreuen score: 0.009769141 }
    ] } ]
  }
}
(In reply to comment #23)
> when using <wikiname>_content and searching for "waser" this is the result
> if this is any help:
> <snip>
> options: [
> {
> text: wasser
> highlighted: <em>wasser</em>
> score: 0.03317656
> }
> <snip>
> {
> text: waser
> highlighted: waser
> score: 0.007820548
> }
> <snip>

That is much better. See how "wasser"'s score is four times "waser"'s? That is enough to get it suggested.

Off the cuff, my guess is that the reason we see "waser" get a really high score when you use the <wikiname> alias is that every suggestion's score is MAX(per shard score), and the per shard score is based on the number of terms in the shard. Since the <wikiname> alias combines both the <wikiname>_content and the <wikiname>_general aliases, which might have vastly different sizes, you could end up with bogus scores.

The upshot from the perspective of a user is that suggestions work a lot less well when querying across content and non-content namespaces. Which I think is _reasonably_ rare.
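The guess above can be illustrated with a toy model in Python. This is only a sketch of the reasoning, not Elasticsearch's actual scoring: the smoothed relative frequency in `shard_score`, the MAX merge in `merged_score`, and all shard statistics are assumptions made up for the illustration.

```python
ALPHA = 0.5  # additive smoothing so that unseen terms still get a non-zero score


def shard_score(shard, term):
    """Smoothed relative frequency of `term` within one shard (toy model)."""
    count = shard["counts"].get(term, 0)
    return (count + ALPHA) / (shard["total_terms"] + ALPHA * shard["vocab_size"])


def merged_score(shards, term):
    """Combine shards the way the comment guesses: take the best per shard score."""
    return max(shard_score(shard, term) for shard in shards)


def ratio(shards):
    """How many times better "wasser" scores than the misspelled "waser"."""
    return merged_score(shards, "wasser") / merged_score(shards, "waser")


# Made-up statistics: a large content index and a tiny non-content one.
content = {"total_terms": 100_000, "vocab_size": 5_000, "counts": {"wasser": 300}}
general = {"total_terms": 200, "vocab_size": 100, "counts": {}}

print(f"content only:      {ratio([content]):.1f}x")
print(f"content + general: {ratio([content, general]):.1f}x")
```

In this model the unseen term "waser" gets a tiny score in the large shard but a comparatively large one in the tiny shard, simply because the denominator there is small. Taking the maximum across shards then lets the tiny shard set the score for "waser", and the ratio drops from several hundred to below 2, which would be under a confidence cutoff of 2.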
Change 90132 merged by jenkins-bot: Optionally pull suggestions from text https://gerrit.wikimedia.org/r/90132
1. I updated to the newest master on git.
2. I updated LocalSettings.php with
    $wgCirrusSearchPhraseUseText = true;
3. I then ran:
    php updateSearchIndexConfig.php --reindexAndRemoveOk --indexIdentifier now
    php forceSearchIndex.php --forceUpdate

I searched for "waser" and also for "shokolade" but I still did not get any suggestions :-( Am I forgetting something? Thanks again... Martin
One more thing: I also tried misspelling words that are not in the text but in the title of the page. Also no "did you mean" suggestions :-(
It might be simplest for me to connect to your wiki and es instance and have a look at what is going on. I'm really not sure. Did the script complete successfully? If you'd like me to have a look send me an email with connection information. I'm sorry this has been so much trouble!
E-Mail is on the way to you...
We worked this out over email - the problem was that the code had not been rebased. For posterity: suggestions don't work if the first letter isn't right. I'm not filing a bug about that yet but it should be noted.
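One plausible explanation for the first-letter limitation, which is an assumption and was not confirmed in this bug: Elasticsearch's candidate generator has a `prefix_length` setting, defaulting to 1, that requires candidates to share that many leading characters with the typed word. The Python sketch below mimics that behaviour with a plain Levenshtein distance; the `candidates` helper, its parameters, and the word list are hypothetical.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


def candidates(word, dictionary, prefix_length=1, max_edits=2):
    """Terms sharing the first `prefix_length` characters and within `max_edits` edits."""
    prefix = word[:prefix_length]
    return [
        term for term in dictionary
        if term.startswith(prefix) and edit_distance(word, term) <= max_edits
    ]


dictionary = ["wasser", "water", "wash"]  # hypothetical indexed terms

print(candidates("waser", dictionary))                    # first letter correct
print(candidates("easser", dictionary))                   # first letter wrong
print(candidates("easser", dictionary, prefix_length=0))  # prefix check disabled
```

With the prefix requirement in place, "easser" produces no candidates at all even though it is a single edit away from "wasser". Dropping the requirement recovers it, at the cost of comparing against many more terms.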