Last modified: 2013-09-30 18:00:53 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T56020, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 54020 - Quoting terms in CirrusSearch doesn't turn off stemming
Status: VERIFIED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch
Version: unspecified
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2013-09-11 16:52 UTC by Nik Everett
Modified: 2013-09-30 18:00 UTC (History)
CC List: 4 users

Description Nik Everett 2013-09-11 16:52:53 UTC
Quoting phrases in CirrusSearch doesn't turn off stemming, and the phrase slop is too high.  The slop should probably be 0, which is what people expect.

It might be nice to let users do stemmed phrase searches, maybe with "phrase"~.  We could also let them set the phrase slop with something like "phrase"~3 (set slop to 3 and use stemming) or "phrase"3 (set slop to 3).
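
For illustration only, here is a minimal sketch of what "quotes turn off stemming, slop defaults to 0" could look like as an Elasticsearch query clause. It is not the CirrusSearch implementation. The field names `text` (assumed to be stemmed) and `text.plain` (assumed to be unstemmed) and the helper `phrase_query` are hypothetical; the only thing taken from Elasticsearch itself is the `match_phrase` query with its `slop` parameter.

```python
# Hypothetical field names, assumed for this sketch only:
# "text" is analyzed with a stemmer, "text.plain" holds the same
# content analyzed without stemming.
STEMMED_FIELD = "text"
UNSTEMMED_FIELD = "text.plain"


def phrase_query(phrase, slop=0, stemmed=False):
    """Build a match_phrase clause for one quoted phrase.

    By default the phrase is matched against the unstemmed field with
    slop 0, i.e. the words must appear exactly and adjacently.
    """
    field = STEMMED_FIELD if stemmed else UNSTEMMED_FIELD
    return {"match_phrase": {field: {"query": phrase, "slop": slop}}}
```

Calling `phrase_query("running shoes")` targets the unstemmed field with slop 0, while `phrase_query("running shoes", slop=3, stemmed=True)` corresponds to the opt-in stemmed phrase search suggested above.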
Comment 1 Nik Everett 2013-09-19 13:45:43 UTC
Setting this aside for now as I _can_ implement turning off stemming but then I lose highlighting on the quoted term.

BTW, ~3 is the standard syntax for setting phrase slop so we shouldn't change that.  We can still have a syntax that turns on stemming for phrases but I'm not sure what it should be.
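
As a rough sketch of the standard syntax, the snippet below pulls quoted phrases and an optional `~N` slop suffix out of a query string, defaulting the slop to 0. The regular expression and the `parse_phrases` helper are assumptions made for this example and do not come from CirrusSearch or Lucene; escaped quotes and a stemming marker are deliberately not handled.

```python
import re

# A double-quoted phrase, optionally followed by ~N (the phrase slop).
PHRASE_RE = re.compile(r'"([^"]+)"(?:~(\d+))?')


def parse_phrases(query):
    """Return a list of (phrase, slop) pairs found in the query string.

    Slop is 0 when no ~N suffix follows the closing quote.
    """
    return [
        (match.group(1), int(match.group(2)) if match.group(2) else 0)
        for match in PHRASE_RE.finditer(query)
    ]
```

For the input `"quick fox"~3 jumps "lazy dog"` this yields the phrase `quick fox` with slop 3 and `lazy dog` with slop 0; unquoted words such as `jumps` are ignored by this helper.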
Comment 2 Nik Everett 2013-09-24 16:51:02 UTC
Raising importance because someone cared enough about the problem to send an email about it.
Comment 3 Nik Everett 2013-09-24 17:10:36 UTC
I'll add a fix for this even though it'll break highlighting for quoted terms.  This is the upstream issue that causes the loss of highlighting:  https://github.com/elasticsearch/elasticsearch/issues/3750
Comment 4 Gerrit Notification Bot 2013-09-24 20:17:20 UTC
Change 85908 had a related patch set uploaded by Manybubbles:
Quotes turn off stemming.

https://gerrit.wikimedia.org/r/85908
Comment 5 Gerrit Notification Bot 2013-09-24 20:19:19 UTC
Change 85910 had a related patch set uploaded by Manybubbles:
Tests for quotes turning of stemming.

https://gerrit.wikimedia.org/r/85910
Comment 6 Nik Everett 2013-09-24 21:26:39 UTC
Just for posterity:
The proposed solution to this bug, and every other solution I can think of, causes Bug 54526.  I'm happy to be told that fixing this isn't worth Bug 54526, and if so I'll make sure the commits for this are held in Gerrit until I can fix 54526 upstream _and_ we update to the version with the fix.  That will probably take at least a month.

Yes, I know "probably take at least" is very wishy-washy.  I can't predict Elasticsearch's release schedule or how long it'll take to fix the bug.  I can say that LuceneSearch seems to have figured out some kind of solution to the problem years ago with an old version of Lucene.
Comment 7 Gerrit Notification Bot 2013-09-26 16:27:08 UTC
Change 85910 merged by jenkins-bot:
Tests for quotes turning of stemming.

https://gerrit.wikimedia.org/r/85910
Comment 8 Gerrit Notification Bot 2013-09-26 16:28:06 UTC
Change 85908 merged by jenkins-bot:
Quotes turn off stemming.

https://gerrit.wikimedia.org/r/85908
Comment 9 Nik Everett 2013-09-26 16:41:25 UTC
Merged.
Comment 10 Nik Everett 2013-09-30 18:00:53 UTC
Verified on test2wiki.
