Last modified: 2013-09-26 15:09:38 UTC

Wikimedia migrated from Bugzilla to Phabricator, where bug reports are now handled. This static page is read-only and kept for historical purposes; links may be broken. See T54904, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 52904 - Page shows up in search results though it does not include a search term
Status: VERIFIED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch
Version: unspecified
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:

Reported: 2013-08-15 23:49 UTC by Sumana Harihareswara
Modified: 2013-09-26 15:09 UTC
CC List: 2 users

Description Sumana Harihareswara 2013-08-15 23:49:55 UTC
To reproduce:

1. Go to https://test2.wikipedia.org

2. In the upper-right-hand searchbox, search for    love message    and hit Enter.

3. Result page: https://test2.wikipedia.org/w/index.php?search=love+message&title=Special%3ASearch with a top result: https://test2.wikipedia.org/wiki/Love_issue

Problem: the [[Love issue]] page does not include the word "message", so it shouldn't be in the results.

This may be related to bug 40210.
Comment 1 Nik Everett 2013-08-16 01:21:06 UTC
I don't believe this is related to Bug 40210 but it is complicated because it highlights three things:

It looks like lsearchd finds documents that contain ALL the terms and then ranks them, whereas CirrusSearch finds all the documents that contain ANY OF the terms and then ranks them. What CirrusSearch does is actually normal behaviour for a search engine: both Google and Bing do it. It wouldn't be hard for me to change CirrusSearch to work just like lsearchd, but I'm not sure it'd be right.

This may be related to 52906.  I don't think so, though.

This page is ranked so highly because it contains a single-word title match and no other page does. Right now CirrusSearch is configured to weight any title match very highly compared to text matches.

All in all, I'm not sure what action to take on this bug, if any.
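
For illustration, the difference between the two matching modes and the effect of a heavy title weight can be sketched with a toy scorer. This is only a sketch under stated assumptions: the three pages, the weights, and the function names are all hypothetical and do not reflect the actual lsearchd or CirrusSearch ranking code.

```python
# Toy illustration only. The corpus, weights, and function names are
# hypothetical; they are not taken from lsearchd or CirrusSearch.
TITLE_WEIGHT = 10.0  # assumed: a title match counts far more than a text match
TEXT_WEIGHT = 1.0

PAGES = [
    {"title": "Love issue", "text": "lorem ipsum dolor sit amet"},
    {"title": "Lorem", "text": "a love message for the reader"},
    {"title": "Ipsum", "text": "a message about nothing"},
]


def tokens(s):
    """Lowercase and split on whitespace (no stemming or punctuation handling)."""
    return s.lower().split()


def score(page, terms):
    """Sum a weight for every term found in the title and in the text."""
    title = set(tokens(page["title"]))
    text = set(tokens(page["text"]))
    return sum(
        TITLE_WEIGHT * (t in title) + TEXT_WEIGHT * (t in text) for t in terms
    )


def search(query, require_all):
    """Return page titles, best score first.

    require_all=True keeps only pages containing ALL terms (AND).
    require_all=False keeps pages containing ANY term (OR).
    """
    terms = tokens(query)
    combine = all if require_all else any
    hits = []
    for page in PAGES:
        words = set(tokens(page["title"])) | set(tokens(page["text"]))
        if combine(t in words for t in terms):
            hits.append(page)
    hits.sort(key=lambda p: score(p, terms), reverse=True)
    return [p["title"] for p in hits]


print(search("love message", require_all=False))
print(search("love message", require_all=True))
```

With ANY-OF matching, the hypothetical "Love issue" page comes first on the strength of its title match alone, even though it never mentions "message". With ALL matching it drops out of the results entirely.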
Comment 2 Sumana Harihareswara 2013-08-16 02:25:51 UTC
(Just a tip: thank you for mentioning bug 52906. It's great to say "bug 52906" or "bug # 52906" if you mention a bug number, because then BZ automatically links to it! Magic.)

I think we'd want to distinguish among these cases:

* there are no results that include all the search terms, and some results that are partial matches
* there are very few results that include all the search terms, and then more partial matches
* there are lots of results that include all the search terms

I agree that if there are NO results that include all the search terms, then we should offer partial matches (and say that we are doing so, and why).

And if there are FEW full matches and a lot of partial matches, I believe that we would generally want to rank full matches above partial matches; that is also my personal preference. On real wikis with non-lorem-ipsum content, I'm sure that page title matches are pretty important and should be ranked pretty high. But maybe we should just test that again after rollout to mediawiki.org.
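
The three cases above could be handled by a tiered lookup along the following lines. This is a rough sketch, not a proposal for the real implementation: the page structure, the function names, and the wording of the notice are assumptions made for illustration, and pages within each tier are left in their input order rather than ranked.

```python
# Hypothetical sketch of "full matches first, then partial matches".
# Page dicts, function names, and the notice text are illustrative assumptions.

def tokens(s):
    """Lowercase and split on whitespace."""
    return s.lower().split()


def matched_terms(page, terms):
    """Count how many of the query terms occur in the page title or text."""
    words = set(tokens(page["title"])) | set(tokens(page["text"]))
    return sum(t in words for t in terms)


def tiered_search(query, pages):
    """Return (titles, notice).

    Pages containing every term come first, followed by pages containing
    only some of the terms. The notice is set only when there is no full
    match at all, so the user can be told that the results are partial.
    """
    terms = tokens(query)
    counts = [(matched_terms(p, terms), p) for p in pages]
    full = [p["title"] for n, p in counts if n == len(terms)]
    partial = [p["title"] for n, p in counts if 0 < n < len(terms)]
    notice = None
    if not full and partial:
        notice = "No page contains all of your search terms; showing partial matches."
    return full + partial, notice


pages = [
    {"title": "Love issue", "text": "lorem ipsum dolor sit amet"},
    {"title": "Lorem", "text": "a love message for the reader"},
    {"title": "Ipsum", "text": "a message about nothing"},
]

print(tiered_search("love message", pages))
print(tiered_search("love issue message", pages))
```

In the first call one page matches both terms, so it is listed ahead of the partial matches and no notice is needed. In the second call nothing matches all three terms, so only partial matches are returned together with the notice.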
Comment 3 Nik Everett 2013-08-19 15:25:23 UTC
I agree with testing again after we roll out to mediawiki.org.  We may not be able to be truly happy with testing until we deploy this to enwiki as the non-default search backend.

I'm setting the priority to high so I'm sure to look at it again.
Comment 4 Nik Everett 2013-09-05 00:53:04 UTC
So after some more research it looks like Google and Bing do default to AND and I was just mistaken. I'll switch it in the morning. I would like to one day do the whole "we couldn't find enough matches with your query so we tried some other queries for you" thing, but right now defaulting to AND is probably the right thing to do.
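
As a minimal sketch of what "defaulting to AND" can look like, assuming an Elasticsearch-style `query_string` query: the `default_operator` parameter of that query type controls whether space-separated terms are combined with AND or OR. The helper name `build_query` is hypothetical, and this is not necessarily how the actual change is written.

```python
import json


def build_query(user_query, default_operator="AND"):
    """Build a request body for an Elasticsearch-style query_string search.

    build_query is a hypothetical helper for illustration. With
    default_operator="AND", a query such as "love message" only matches
    documents that contain both terms.
    """
    if default_operator not in ("AND", "OR"):
        raise ValueError("default_operator must be 'AND' or 'OR'")
    return {
        "query": {
            "query_string": {
                "query": user_query,
                "default_operator": default_operator,
            }
        }
    }


print(json.dumps(build_query("love message"), indent=2))
```

Passing `default_operator="OR"` would restore the earlier any-of-the-terms behaviour, which could be useful later for a fallback query when the AND query finds too few matches.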
Comment 5 Nik Everett 2013-09-05 13:58:02 UTC
Implementation: https://gerrit.wikimedia.org/r/#/c/82825/
Regression tests: https://gerrit.wikimedia.org/r/#/c/82827/
Comment 6 Nik Everett 2013-09-26 15:09:38 UTC
Live and working.
