Last modified: 2014-04-16 21:24:41 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T55529, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 53529 - Wrong terms are highlighted / snippet does not contain search phrase
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nik Everett
Whiteboard: Elasticsearch_Open_Bug
Keywords: upstream
Depends on:
Blocks:
Reported: 2013-08-29 10:11 UTC by Remco de Boer
Modified: 2014-04-16 21:24 UTC (History)
CC List: 2 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Remco de Boer 2013-08-29 10:11:44 UTC
The highlighted terms in a CirrusSearch result are not always the most sensible ones.

Example: when searching for 'partial forms' [1], the top results include (sub)pages of Extension:Semantic_Forms. This is to be expected, since partial forms are a feature of that extension. However, the snippet shown in the search result for these pages does not contain the phrase 'partial forms', even though it occurs on the page. Instead, the snippet shown for every page is:

---
Semantic *Forms* - navigation (view) Basics Main page (talk) · Download and installation · Quick start
---

with 'Forms' being highlighted.

[1] https://www.mediawiki.org/w/index.php?search=partial+forms&button=&title=Special%3ASearch&srbackend=CirrusSearch
Comment 1 Nik Everett 2013-08-29 12:35:56 UTC
Triaging to normal because the results still make sense.  It is more important than some of the other bugs I've set to normal.  I might have to push those down to low and the low ones to lowest....
Comment 2 Nik Everett 2013-09-05 17:03:27 UTC
This one is fun.

First, we have to tell elasticsearch to order the highlights by score.  I was under the impression this is the default.  It isn't.  Document order is.  This is here: https://gerrit.wikimedia.org/r/#/c/82856/
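For illustration only, here is a minimal sketch of what "order the highlights by score" could look like in a search request body. It uses the `order` option of Elasticsearch highlighting; the field name `text` and the helper `build_highlight_request` are assumptions made up for this example and are not taken from the Gerrit change.

```python
import json


def build_highlight_request(search_text, field="text"):
    """Build a search body whose highlight fragments are sorted by score.

    `field` is a hypothetical field name; the real index mapping may differ.
    """
    return {
        "query": {"match": {field: search_text}},
        "highlight": {
            # Without "order", fragments come back in document order.
            "order": "score",
            "fields": {field: {}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_highlight_request("partial forms"), indent=2))
```

With `order` set to `score`, the fragment that best matches the query is returned first instead of the first fragment in the page, which is what the navigation-template snippet in the description appears to be.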

Next, we have to convince elasticsearch to really boost perfect phrase matches.  The change for this can't be merged yet because of a bug in elasticsearch (https://github.com/elasticsearch/elasticsearch/issues/3503) that will be fixed in the next release.  The commit has probably gone a bit stale because it has been sitting around, but eventually we'll be able to merge it:  https://gerrit.wikimedia.org/r/#/c/79087/

And finally it looks like elasticsearch doesn't take rescores into account when it highlights (https://github.com/elasticsearch/elasticsearch/issues/3630).  When that is released and we've merged the phrase boosts, then this bug should be solved.
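As a rough sketch of the phrase boost, assuming it is done with a rescore over the top hits: the body below runs a plain `match` query and then rescores the first window of results with a `match_phrase` query. The field name `text`, the window size, the weight, and the helper `build_phrase_rescore_request` are all hypothetical values chosen for the example, not the settings in the Gerrit change.

```python
import json


def build_phrase_rescore_request(search_text, field="text",
                                 window_size=50, phrase_weight=10.0):
    """Build a search body that boosts exact phrase matches via a rescore.

    All defaults are illustrative assumptions, not CirrusSearch's values.
    """
    return {
        "query": {"match": {field: search_text}},
        "rescore": {
            # Only the top `window_size` hits per shard are rescored.
            "window_size": window_size,
            "query": {
                # Hits containing the exact phrase get an extra score.
                "rescore_query": {"match_phrase": {field: search_text}},
                "query_weight": 1.0,
                "rescore_query_weight": phrase_weight,
            },
        },
        "highlight": {"order": "score", "fields": {field: {}}},
    }


if __name__ == "__main__":
    print(json.dumps(build_phrase_rescore_request("partial forms"), indent=2))
```

The point of the upstream issue is that the highlighter has to see the rescore query as well; otherwise fragments are scored against the plain `match` query only, and a fragment containing just one of the terms can still win.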

I'm whiteboarding this Elasticsearch_Open_Bug until https://github.com/elasticsearch/elasticsearch/issues/3630 is merged and I know which release it'll go with.
Comment 3 Nik Everett 2013-09-26 14:46:05 UTC
So just pushing sorting by score [1] seems to have helped the situation quite a bit.  I'm not closing this because I'm using it to track the open elasticsearch issue [2].  Also, the boost perfect phrase matches work [3] is ready to be merged _but_ merging it would cause Bug 54526 which I filed in preparation for the merge.

[1] https://gerrit.wikimedia.org/r/#/c/82856/
[2] https://github.com/elasticsearch/elasticsearch/issues/3630
[3] https://gerrit.wikimedia.org/r/#/c/79087/
Comment 4 Nik Everett 2013-10-02 20:09:07 UTC
https://github.com/elasticsearch/elasticsearch/issues/3630 has just been closed and will be released with 0.90.6.  There is light at the end of this bug!
Comment 5 Nik Everett 2013-11-08 17:51:54 UTC
Switching to Elasticsearch_0.90.7 because 0.90.6 has an issue that we don't want to suffer and they are cutting a new one "in a couple of days".
Comment 6 Nik Everett 2013-12-04 21:21:30 UTC
I'm taking this to work on the final leg: when we boost perfect phrase matches, that boost should also be considered when sorting the highlights.
Comment 7 Nik Everett 2013-12-09 16:25:59 UTC
Looks like there is a bug in Lucene that carries over into Elasticsearch and is stopping me from finishing this.  It stops highlighting from considering some boosts.  In my case, it stops it from considering the boosts I use to boost perfect phrase matches....
Comment 8 Nik Everett 2014-04-16 21:24:41 UTC
This should (finally) be all fixed.
