Last modified: 2013-10-18 14:08:29 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T57786, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 55786 - "did you mean" is not working
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch
Version: master
Hardware: Other Linux
Importance: Normal minor
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:

Reported: 2013-10-16 11:44 UTC by keyler
Modified: 2013-10-18 14:08 UTC (History)


Attachments
Suggestion test query (1.10 KB, application/json), 2013-10-16 13:45 UTC, Nik Everett
Suggestion test query (1.10 KB, application/json), 2013-10-16 14:08 UTC, Nik Everett
Suggestion test query v2 (1.56 KB, application/json), 2013-10-16 14:10 UTC, Nik Everett
Suggestion test query (1.56 KB, application/json), 2013-10-16 14:10 UTC, Nik Everett
Suggestion test query (1.56 KB, application/json), 2013-10-16 14:21 UTC, Nik Everett

Description keyler 2013-10-16 11:44:23 UTC
When I first started to use "CirrusSearch" I was very happy that the "Search Suggestions" were working out of the box. In the recent versions (the last 2-3 weeks) there are no search suggestions anymore. Can I somehow turn them on again?

Thank you
Martin
Comment 1 Andre Klapper 2013-10-16 11:55:52 UTC
Where can I see this? A private wiki instance? Some Wikimedia website?
What are exact steps to reproduce this?
Comment 2 keyler 2013-10-16 12:24:34 UTC
Yes, a private wiki. I have sent you a direct mail with the link.
I will now also try to reproduce it on one of the Wikimedia test sites.

Thank you!
Comment 3 Andre Klapper 2013-10-16 12:47:32 UTC
If you are referring to the search box in the upper right corner, search proposals work for me on your wiki (when I enter "Salz" it proposes one page with a name that starts with Salz). Using Firefox 24 here.
Comment 4 keyler 2013-10-16 12:50:18 UTC
Ok, sorry, by "Search Suggestions" I meant searching for something that does NOT exist or is spelled incorrectly, like "waser" instead of "wasser". I am truly sorry that I did not make myself clear. Yes, "AutoComplete", as I would call it, works pretty well.
Comment 5 Nik Everett 2013-10-16 13:45:37 UTC
Works for me on production: https://en.wikisource.org/w/index.php?title=Special%3ASearch&profile=default&search=pots&fulltext=Search
And in development: http://solr-mw2.instance-proxy.wmflabs.org/w/index.php?search=noble+prize&title=Special%3ASearch

One thing that might have changed from the last time you checked: we only build suggestions from the titles and redirect titles.  We used to build suggestions from titles and text.  We felt that that produced too many false positives.  Also, the search index required to do that took up a bunch of space.

I'm going to attach the query that I always use for debugging suggestions issues to this bug.  If you could send it to Elasticsearch and attach the results I'll decipher them for you.  So you aren't in suspense: it'll return a bunch of suggestions including the search phrase.  Normally CirrusSearch configures Elasticsearch to only return suggestions that have a score of twice what the original search phrase had so you can use the results to figure out if the suggestion that you expected was even being generated and, if so, how it scores.

So, options:
1.  I can make generating suggestions from text a configurable thing.  Going from off to on would require a reindex.
2.  You can change suggestion cutoff score and walk the false positive tuning line.  The config value is $wgCirrusSearchPhraseSuggestConfidence - just make sure to keep it set to a number.  You can change this as much as you like without breaking anything but if you set it to less than 1 then I believe you'll end up getting your search query back as a suggestion all the time.
3.  There is some kind of problem in Elasticsearch, your setup (did you rebuild the index when you pulled), or gremlins.
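
As a rough illustration of the cutoff described in option 2, here is a minimal Python sketch. It only models the rule as stated in this comment (a suggestion is returned when it scores more than the confidence factor times what the original phrase scores); it is not the actual CirrusSearch or Elasticsearch code. The function name `pick_suggestion` and the `text`/`score` dict layout are assumptions for the example.

```python
from typing import Optional


def pick_suggestion(options: list[dict], query: str, confidence: float = 2.0) -> Optional[str]:
    """Return the best option that beats the original query's score by the
    confidence factor, or None if nothing qualifies (hypothetical helper)."""
    normalized = query.lower()
    # Score the suggester assigned to the phrase the user actually typed.
    original_score = next(
        (o["score"] for o in options if o["text"] == normalized), 0.0
    )
    threshold = confidence * original_score
    # Strict comparison: with confidence >= 1 the original can never qualify,
    # with confidence < 1 the original phrase itself passes the cutoff.
    qualifying = [o for o in options if o["score"] > threshold]
    if not qualifying:
        return None
    return max(qualifying, key=lambda o: o["score"])["text"]
```

With a confidence below 1 the original phrase itself passes the cutoff, which matches the warning above about getting the search query back as a suggestion.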
Comment 6 Nik Everett 2013-10-16 13:45:59 UTC
Created attachment 13505 [details]
Suggestion test query
Comment 7 keyler 2013-10-16 13:50:53 UTC
I would go with option 1, as I only tried to search for words in the "text", not the title. This would solve the problem for me at least! Thanks again for responding in such a fast manner... (as always).
Comment 8 keyler 2013-10-16 13:59:53 UTC
Suggestion test query Result: 

{

    took: 14
    timed_out: false
    _shards: {
        total: 8
        successful: 8
        failed: 0
    }
    hits: {
        total: 0
        max_score: null
        hits: [ ]
    }
    suggest: {
        title: [
            {
                text: waser
                offset: 0
                length: 5
                options: [
                    {
                        text: waser
                        highlighted: waser
                        score: 0.08398465
                    }
                ]
            }
        ]
        redirect: [
            {
                text: waser
                offset: 0
                length: 5
                options: [
                    {
                        text: waser
                        highlighted: waser
                        score: 0.27871963
                    }
                ]
            }
        ]
    }

}
Comment 9 Nik Everett 2013-10-16 14:01:12 UTC
Yup - no suggestions are coming up and you would have got your suggestions from the text.  Let me see about getting that working again.
Comment 10 Nik Everett 2013-10-16 14:08:40 UTC
Created attachment 13506 [details]
Suggestion test query
Comment 11 Nik Everett 2013-10-16 14:09:33 UTC
I found a problem with the query I posted earlier so I posted a second copy - this second one also builds the suggestions against the text.  See if that provides the suggestions you need.
Comment 12 Nik Everett 2013-10-16 14:10:02 UTC
Created attachment 13507 [details]
Suggestion test query v2
Comment 13 Nik Everett 2013-10-16 14:10:43 UTC
Created attachment 13508 [details]
Suggestion test query

Sorry about all the updates, just found the obsoletes field on the uploader and wanted to get rid of the duplicates.
Comment 14 keyler 2013-10-16 14:13:36 UTC
Result:

{

    took: 30
    timed_out: false
    _shards: {
        total: 8
        successful: 8
        failed: 0
    }
    hits: {
        total: 0
        max_score: null
        hits: [ ]
    }
    suggest: {
        title: [
            {
                text: waser
                offset: 0
                length: 5
                options: [
                    {
                        text: waser
                        highlighted: waser
                        score: 0.08398465
                    }
                ]
            }
        ]
        text_suggest: [
            {
                text: waser
                offset: 0
                length: 5
                options: [
                    {
                        text: waser
                        highlighted: waser
                        score: 0.10791662
                    }
                ]
            }
        ]
        redirect: [
            {
                text: waser
                offset: 0
                length: 5
                options: [
                    {
                        text: waser
                        highlighted: waser
                        score: 0.27871963
                    }
                ]
            }
        ]
    }

}
Comment 15 Nik Everett 2013-10-16 14:21:08 UTC
Created attachment 13509 [details]
Suggestion test query

And one more bug.  On the upside, the feature is almost done.
Comment 16 keyler 2013-10-16 14:22:24 UTC
{

    took: 135
    timed_out: false
    _shards: {
        total: 8
        successful: 8
        failed: 0
    }
    hits: {
        total: 0
        max_score: null
        hits: [ ]
    }
    suggest: {
        title: [
            {
                text: waser
                offset: 0
                length: 5
                options: [
                    {
                        text: waser
                        highlighted: waser
                        score: 0.08398465
                    }
                ]
            }
        ]
        text_suggest: [
            {
                text: waser
                offset: 0
                length: 5
                options: [
                    {
                        text: waser
                        highlighted: waser
                        score: 0.10791662
                    }
                    {
                        text: wasser
                        highlighted: <em>wasser</em>
                        score: 0.042066924
                    }
                    {
                        text: water
                        highlighted: <em>water</em>
                        score: 0.017357524
                    }
                    {
                        text: wsser
                        highlighted: <em>wsser</em>
                        score: 0.011847182
                    }
                    {
                        text: wash
                        highlighted: <em>wash</em>
                        score: 0.009659772
                    }
                    {
                        text: wassers
                        highlighted: <em>wassers</em>
                        score: 0.00865565
                    }
                ]
            }
        ]
        redirect: [
            {
                text: waser
                offset: 0
                length: 5
                options: [
                    {
                        text: waser
                        highlighted: waser
                        score: 0.27871963
                    }
                ]
            }
        ]
    }

}
Comment 17 Nik Everett 2013-10-16 14:35:08 UTC
Hmmm.  This:
                    {
                        text: waser
                        highlighted: waser
                        score: 0.10791662
                    }
                    {
                        text: wasser
                        highlighted: <em>wasser</em>
                        score: 0.042066924
                    }
Says that Elasticsearch thinks that "waser" is still a better option than "wasser". I find that it actually works better for me when searching for phrases. I'm not sure why at this point. For example, I have a page which contains the phrase "test catapult", but when I search for "catapul" I don't get a suggestion. I do get one when I search for "test catapul" or "tets catapul".

I'll add it to my todo list to figure out why that happens.  For now, I'll proceed with making text suggestions configurable.
Comment 18 Gerrit Notification Bot 2013-10-16 15:08:02 UTC
Change 90132 had a related patch set uploaded by Manybubbles:
Optionally pull suggestions from text

https://gerrit.wikimedia.org/r/90132
Comment 19 Nik Everett 2013-10-16 15:09:59 UTC
With the referenced patch you can turn on getting suggestions from text by setting $wgCirrusSearchPhraseUseText = true; and doing an in place reindex:
 php updateSearchIndexConfig.php --reindexAndRemoveOk --indexIdentifier now
 php forceSearchIndex.php --forceUpdate
Comment 20 keyler 2013-10-16 15:11:17 UTC
What I do not understand: how can Elasticsearch think "waser" is a better option even if the word "waser" is NOT FOUND at all?

So here are some more tests with several words:
The correct spelling would be "geraspelter Schokolade bestreuen", but I entered "geraspelter Shokolade bestreuen" without the c in "Schokolade".

Here is the output:

{

    took: 119
    timed_out: false
    _shards: {
        total: 8
        successful: 8
        failed: 0
    }
    hits: {
        total: 0
        max_score: null
        hits: [ ]
    }
    suggest: {
        title: [
            {
                text: geraspelter Shokolade bestreuen
                offset: 0
                length: 31
                options: [
                    {
                        text: geraspelter shokolade bestreuen
                        highlighted: geraspelter shokolade bestreuen
                        score: 0.00026727197
                    }
                ]
            }
        ]
        text_suggest: [
            {
                text: geraspelter Shokolade bestreuen
                offset: 0
                length: 31
                options: [
                    {
                        text: geraspelter schokolade bestreuen
                        highlighted: geraspelter <em>schokolade</em> bestreuen
                        score: 0.011861463
                    }
                    {
                        text: geraspelte schokolade bestreuen
                        highlighted: <em>geraspelte schokolade</em> bestreuen
                        score: 0.0057594595
                    }
                    {
                        text: geraspelter shokolade bestreuen
                        highlighted: geraspelter shokolade bestreuen
                        score: 0.0005670466
                    }
                    {
                        text: geraspelten schokolade bestreuen
                        highlighted: <em>geraspelten schokolade</em> bestreuen
                        score: 0.000060482846
                    }
                    {
                        text: geraspelt schokolade bestreuen
                        highlighted: <em>geraspelt schokolade</em> bestreuen
                        score: 0.00000123148
                    }
                    {
                        text: geraspelten shokolade bestreuen
                        highlighted: <em>geraspelten</em> shokolade bestreuen
                        score: 0.0000010980223
                    }
                    {
                        text: geraspelte shokolade bestreuen
                        highlighted: <em>geraspelte</em> shokolade bestreuen
                        score: 0.0000010932401
                    }
                    {
                        text: geraspelt 1 schokolade bestreuen
                        highlighted: <em>geraspelt 1 schokolade</em> bestreuen
                        score: 0.0000010556109
                    }
                    {
                        text: geraspelter schoklolade bestreuen
                        highlighted: geraspelter <em>schoklolade</em> bestreuen
                        score: 6.116578e-7
                    }
                    {
                        text: geraspelt shokolade bestreuen
                        highlighted: <em>geraspelt</em> shokolade bestreuen
                        score: 5.8213175e-7
                    }
                    {
                        text: geraspelt 1 shokolade bestreuen
                        highlighted: <em>geraspelt 1</em> shokolade bestreuen
                        score: 4.989969e-7
                    }
                    {
                        text: geraspelter schokoladen bestreuen
                        highlighted: geraspelter <em>schokoladen</em> bestreuen
                        score: 4.862056e-7
                    }
                ]
            }
        ]
        redirect: [
            {
                text: geraspelter Shokolade bestreuen
                offset: 0
                length: 31
                options: [
                    {
                        text: geraspelter shokolade bestreuen
                        highlighted: geraspelter shokolade bestreuen
                        score: 0.009769141
                    }
                ]
            }
        ]
    }

}
Comment 21 keyler 2013-10-16 15:12:06 UTC
Great Nik. Thank you again! I will try it out tomorrow...
Comment 22 Nik Everett 2013-10-16 15:28:13 UTC
(In reply to comment #20)
> What I do not understand: how can Elasticsearch think "waser" is a better
> option even if the word "waser" is NOT FOUND at all. 
> 
> so here some more tests with 2 words:
> correct spelling would be "geraspelter Schokolade bestreuen" but I entered
> "geraspelter Shokolade bestreuen" without the c in "Schokolade".
> <snip>

It gets it right that time, at least.  You may want to try hitting the <wikiname>_content alias rather than the <wikiname> alias.  I see that producing better results on my side.

Still, I'll have to look into it.
Comment 23 keyler 2013-10-16 15:31:30 UTC
When using <wikiname>_content and searching for "waser", this is the result, in case it is of any help:

{

    took: 14
    timed_out: false
    _shards: {
        total: 4
        successful: 4
        failed: 0
    }
    hits: {
        total: 0
        max_score: null
        hits: [ ]
    }
    suggest: {
        title: [
            {
                text: waser
                offset: 0
                length: 5
                options: [
                    {
                        text: waser
                        highlighted: waser
                        score: 0.06573563
                    }
                ]
            }
        ]
        text_suggest: [
            {
                text: waser
                offset: 0
                length: 5
                options: [
                    {
                        text: wasser
                        highlighted: <em>wasser</em>
                        score: 0.03317656
                    }
                    {
                        text: water
                        highlighted: <em>water</em>
                        score: 0.017357524
                    }
                    {
                        text: wsser
                        highlighted: <em>wsser</em>
                        score: 0.009198759
                    }
                    {
                        text: wassers
                        highlighted: <em>wassers</em>
                        score: 0.00865565
                    }
                    {
                        text: waser
                        highlighted: waser
                        score: 0.007820548
                    }
                    {
                        text: wash
                        highlighted: <em>wash</em>
                        score: 0.0075003416
                    }
                ]
            }
        ]
        redirect: [
            {
                text: waser
                offset: 0
                length: 5
                options: [
                    {
                        text: waser
                        highlighted: waser
                        score: 0.27871963
                    }
                ]
            }
        ]
    }

}
Comment 24 keyler 2013-10-16 15:32:42 UTC
And here is the other one:

{

    took: 63
    timed_out: false
    _shards: {
        total: 4
        successful: 4
        failed: 0
    }
    hits: {
        total: 0
        max_score: null
        hits: [ ]
    }
    suggest: {
        title: [
            {
                text: geraspelter Shokolade bestreuen
                offset: 0
                length: 31
                options: [
                    {
                        text: geraspelter shokolade bestreuen
                        highlighted: geraspelter shokolade bestreuen
                        score: 0.00012816112
                    }
                ]
            }
        ]
        text_suggest: [
            {
                text: geraspelter Shokolade bestreuen
                offset: 0
                length: 31
                options: [
                    {
                        text: geraspelter schokolade bestreuen
                        highlighted: geraspelter <em>schokolade</em> bestreuen
                        score: 0.009209847
                    }
                    {
                        text: geraspelte schokolade bestreuen
                        highlighted: <em>geraspelte schokolade</em> bestreuen
                        score: 0.004471939
                    }
                    {
                        text: geraspelten schokolade bestreuen
                        highlighted: <em>geraspelten schokolade</em> bestreuen
                        score: 0.000036463684
                    }
                    {
                        text: geraspelt schokolade bestreuen
                        highlighted: <em>geraspelt schokolade</em> bestreuen
                        score: 0.00000123148
                    }
                    {
                        text: geraspelt 1 schokolade bestreuen
                        highlighted: <em>geraspelt 1 schokolade</em> bestreuen
                        score: 0.0000010556109
                    }
                    {
                        text: geraspelter schoklolade bestreuen
                        highlighted: geraspelter <em>schoklolade</em> bestreuen
                        score: 6.116578e-7
                    }
                    {
                        text: geraspelt shokolade bestreuen
                        highlighted: <em>geraspelt</em> shokolade bestreuen
                        score: 5.8213175e-7
                    }
                    {
                        text: geraspelter shokolade bestreuen
                        highlighted: geraspelter shokolade bestreuen
                        score: 5.2390885e-7
                    }
                    {
                        text: geraspelten shokolade bestreuen
                        highlighted: <em>geraspelten</em> shokolade bestreuen
                        score: 5.1398877e-7
                    }
                    {
                        text: geraspelte shokolade bestreuen
                        highlighted: <em>geraspelte</em> shokolade bestreuen
                        score: 5.117502e-7
                    }
                    {
                        text: geraspelt 1 shokolade bestreuen
                        highlighted: <em>geraspelt 1</em> shokolade bestreuen
                        score: 4.989969e-7
                    }
                    {
                        text: geraspelter schokoladen bestreuen
                        highlighted: geraspelter <em>schokoladen</em> bestreuen
                        score: 4.862056e-7
                    }
                ]
            }
        ]
        redirect: [
            {
                text: geraspelter Shokolade bestreuen
                offset: 0
                length: 31
                options: [
                    {
                        text: geraspelter shokolade bestreuen
                        highlighted: geraspelter shokolade bestreuen
                        score: 0.009769141
                    }
                ]
            }
        ]
    }

}
Comment 25 Nik Everett 2013-10-16 15:45:23 UTC
(In reply to comment #23)
> when using <wikiname>_content and searchingn for "waser" this is the result
> if this is any help:
> <snip>
>                 options: [
>                     {
>                         text: wasser
>                         highlighted: <em>wasser</em>
>                         score: 0.03317656
>                     }
> <snip>
>                     {
>                         text: waser
>                         highlighted: waser
>                         score: 0.007820548
>                     }
> <snip>

That is much better.  See how "wasser"'s score is four times "waser"'s?  That is enough to get it suggested.
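
To make the comparison concrete, this small Python snippet recomputes the ratio between the "wasser" and "waser" scores from the text_suggest output in comment 16 (the <wikiname> alias) and comment 23 (the <wikiname>_content alias). The factor of 2 is the cutoff described in comment 5 and is assumed to still be in effect; the variable and function names are made up for illustration.

```python
# text_suggest scores copied from the outputs pasted in this thread.
SCORES = {
    "<wikiname>": {"waser": 0.10791662, "wasser": 0.042066924},
    "<wikiname>_content": {"waser": 0.007820548, "wasser": 0.03317656},
}
CONFIDENCE = 2.0  # cutoff factor from comment 5 (assumed unchanged)


def ratio(alias: str) -> float:
    """How many times higher the correction scores than the typed term."""
    scores = SCORES[alias]
    return scores["wasser"] / scores["waser"]


for alias in SCORES:
    r = ratio(alias)
    verdict = "suggested" if r > CONFIDENCE else "not suggested"
    print(f"{alias}: wasser/waser = {r:.2f} -> {verdict}")
```

The ratio comes out at roughly 0.39 for the combined alias and roughly 4.24 for the content alias, so only the latter clears the cutoff.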

Off the cuff, my guess is that "waser" gets a really high score when you use the <wikiname> alias because every score is MAX(per-shard score), and the per-shard score is based on the number of terms in the shard. Since the <wikiname> alias combines the <wikiname>_content and <wikiname>_general aliases, which might have vastly different sizes, you could end up with bogus scores.

The upshot from the perspective of a user is that suggestions work a lot less well when querying across content and non-content namespaces.  Which I think is _reasonably_ rare.
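
The guess above can be sketched in Python as follows. This only models the hypothesis (per-candidate scores merged across shards by taking the maximum) and has not been checked against how Elasticsearch actually merges suggestions. The per-shard numbers are invented for illustration and are not taken from the outputs in this thread; `merge_max` is a hypothetical helper.

```python
def merge_max(*shard_scores: dict[str, float]) -> dict[str, float]:
    """Merge per-shard suggestion scores by keeping the maximum per candidate."""
    merged: dict[str, float] = {}
    for scores in shard_scores:
        for text, score in scores.items():
            merged[text] = max(score, merged.get(text, 0.0))
    return merged


# Invented numbers: a large content shard where the typo is rare, and a
# small non-content shard where a single occurrence weighs much more.
content_shard = {"waser": 0.01, "wasser": 0.04}
general_shard = {"waser": 0.10, "wasser": 0.02}

content_only = merge_max(content_shard)
combined = merge_max(content_shard, general_shard)

print("content only:", content_only)
print("combined:    ", combined)
```

In this made-up case the correction wins by a factor of four on the content shard alone, but once the small shard is merged in, the typo's inflated score from that shard dominates and the correction no longer clears the cutoff.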
Comment 26 Gerrit Notification Bot 2013-10-16 17:16:26 UTC
Change 90132 merged by jenkins-bot:
Optionally pull suggestions from text

https://gerrit.wikimedia.org/r/90132
Comment 27 keyler 2013-10-17 14:31:16 UTC
1. I updated to the newest master on git.
2. I updated LocalSettings.php with
 $wgCirrusSearchPhraseUseText = true;
3. I then ran: 
 php updateSearchIndexConfig.php --reindexAndRemoveOk --indexIdentifier now
 php forceSearchIndex.php --forceUpdate

I searched for "waser" and also for "shokolade" but I still did not get any suggestions :-(

Am I forgetting something? Thanks again... Martin
Comment 28 keyler 2013-10-17 14:33:22 UTC
One more thing: I also tried misspelling words that are not in the text but in the title of the page. Also no "did you mean" suggestions :-(
Comment 29 Nik Everett 2013-10-17 18:28:16 UTC
It might be simplest for me to connect to your wiki and es instance and have a look at what is going on.  I'm really not sure.  Did the script complete successfully?  If you'd like me to have a look send me an email with connection information.

I'm sorry this has been so much trouble!
Comment 30 keyler 2013-10-18 12:06:53 UTC
E-Mail is on the way to you...
Comment 31 Nik Everett 2013-10-18 14:08:29 UTC
We worked this out over email - the problem was that the code had not been rebased.

For posterity: suggestions don't work if the first letter isn't right.  I'm not filing a bug about that yet but it should be noted.
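
One plausible explanation, offered here as an assumption rather than something confirmed in this bug, is that candidate generation requires a minimum matching prefix (Elasticsearch's suggesters have a prefix_length setting that defaults to 1). The Python sketch below shows how such a rule would hide a correction whose first letter differs. The helpers `edit_distance` and `candidates` are illustrative only and are not the suggester's real implementation.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def candidates(term: str, vocabulary: list[str],
               prefix_length: int = 1, max_edits: int = 2) -> list[str]:
    """Words within max_edits of term that share its first prefix_length characters."""
    prefix = term[:prefix_length]
    return [
        word for word in vocabulary
        if word != term
        and word.startswith(prefix)
        and edit_distance(term, word) <= max_edits
    ]
```

Under these assumptions, `candidates("waser", ["wasser"])` finds the correction, while `candidates("aasser", ["wasser"])` finds nothing unless `prefix_length` is set to 0.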
