Last modified: 2014-10-31 14:36:30 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T74128, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 72128 - CirrusSearch: Accelerated regex searches that stop early do not signal that
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Whiteboard: Elasticsearch_1.4
Keywords: upstream
Depends on:
Blocks:
Reported: 2014-10-16 12:11 UTC by Nik Everett
Modified: 2014-10-31 14:36 UTC (History)
CC List: 4 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Nik Everett 2014-10-16 12:11:48 UTC
To keep load down on the search cluster, accelerated regex searches are only allowed to recheck a limited number of documents (10,000 right now).  When that limit is reached, all subsequent documents are treated as non-matching, and Cirrus gives the user no indication that this happened.  This means that results are less reliable.  OTOH this should only happen if your regex can't be accelerated down to a small subset of the wiki, which _should_ be reasonably rare.  It'd happen if the regex actually does match more documents than the recheck limit, or if it is specific but the trigram that we're able to extract from it still matches too many documents.

Example:
insource:/ {{/ will match a ton of pages and underreport the number
insource:/ {{..ca/ will match fewer pages, but the only trigram that can be extracted from it (" {{") is still on too many pages
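
To illustrate why both examples end up filtering on the same trigram, here is a minimal sketch of trigram extraction. It is not the CirrusSearch implementation: the function name literal_trigrams is hypothetical, and it assumes "." is the only metacharacter in the pattern.

```python
def literal_trigrams(pattern):
    """Collect trigrams from the literal runs of a simplified pattern.

    Assumption: "." is the only metacharacter. Real regex syntax
    (classes, repetition, alternation) is not handled here.
    """
    trigrams = set()
    # Each "." breaks the pattern into runs of literal characters.
    for run in pattern.split("."):
        # Runs shorter than three characters yield no trigram.
        for i in range(len(run) - 2):
            trigrams.add(run[i:i + 3])
    return trigrams


print(literal_trigrams(" {{"))
print(literal_trigrams(" {{..ca"))
```

Under these assumptions, the "ca" in the second pattern is too short to form a trigram, so the more specific regex narrows the candidate set no further than the first one does.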

The plan is to allow the recheck code to signal back to Cirrus that it gave up, so that Cirrus can let the user know that the results may not be consistent and tell them how to fix their regex.  Unfortunately, that first level of signalling requires Elasticsearch 1.4, which isn't quite released yet.
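
The intended behaviour can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual recheck code: RecheckResult, recheck, and the gave_up flag are hypothetical names, and candidates are assumed to arrive as (doc_id, text) pairs that already passed the trigram filter.

```python
import re
from dataclasses import dataclass, field


@dataclass
class RecheckResult:
    matches: list = field(default_factory=list)
    gave_up: bool = False  # True if candidates were left unchecked


def recheck(candidates, pattern, limit=10_000):
    """Run the full regex over at most `limit` candidate documents.

    `candidates` is an iterable of (doc_id, text) pairs (an assumption).
    """
    regex = re.compile(pattern)
    result = RecheckResult()
    for checked, (doc_id, text) in enumerate(candidates):
        if checked >= limit:
            # Remaining candidates are never examined, so report that
            # instead of silently treating them as non-matching.
            result.gave_up = True
            break
        if regex.search(text):
            result.matches.append(doc_id)
    return result


docs = [(i, "x {{cite") for i in range(5)]
result = recheck(docs, r" \{\{", limit=3)
print(result.matches, result.gave_up)
```

A caller that sees gave_up set could then warn the user that the result count may be too low and suggest a more specific regex.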
