Last modified: 2014-10-22 20:33:58 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T74381, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 72381 - Search: Add ability to search for special chars
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Search
Version: 1.25-git
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:

Reported: 2014-10-22 20:06 UTC by Rezonansowy
Modified: 2014-10-22 20:33 UTC (History)

Description Rezonansowy 2014-10-22 20:06:48 UTC
I mean characters like '<'. See the short discussion on-wiki: https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Search_for_special_chars
Comment 1 Nik Everett 2014-10-22 20:33:58 UTC
Technically this is already implemented with Cirrus's source regex queries, but it's super duper slow in production right now.  The default implementation brute-forces the regex over all the pages.  That takes, like, 10 minutes on enwiki if you can't reduce the set of considered pages some other way (title filter, other required text, smaller namespace, etc).  After about a minute of waiting on the search, Varnish normally chops the request and sends you a timeout, which is pretty lame.  So 10 minutes of compute time get wasted (kinda; we mitigate it a bit, but it's still lame).

Anyway, we're in the process of deploying trigram-accelerated regex searches, so we only actually have to run the regexes on pages that have a chance of matching in the first place.  In the common case it's something like 60 times faster than brute force, which turns 10 minutes into about 10 seconds.  10 seconds is OK to wait, if not great.  In the worst case we actually cut the query off at some point and don't let it take any more time.  This can cause weird results (Bug 72128), but at least you get results at all rather than waiting forever.
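
To illustrate the idea only (this is not the CirrusSearch code), here is a minimal Python sketch of trigram prefiltering. All names (trigrams, build_index, search, required_literal) are hypothetical, and the sketch assumes the caller supplies a literal substring that every match must contain; a real engine would derive that from the regex itself. The index maps each 3-character sequence to the pages containing it, so the regex only runs on pages that contain every trigram of the required literal.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(pages):
    """Map each trigram to the set of page titles whose text contains it."""
    index = defaultdict(set)
    for title, text in pages.items():
        for gram in trigrams(text):
            index[gram].add(title)
    return index


def search(pages, index, pattern, required_literal):
    """Run pattern only on pages containing every trigram of required_literal.

    Assumption: required_literal is a substring that any match must contain.
    It is supplied by hand here; a real engine would extract it from the regex.
    Returns (sorted matching titles, number of candidate pages scanned).
    """
    candidates = set(pages)
    for gram in trigrams(required_literal):
        # Pages missing any required trigram cannot match, so drop them.
        candidates &= index.get(gram, set())
    regex = re.compile(pattern)
    hits = sorted(t for t in candidates if regex.search(pages[t]))
    return hits, len(candidates)


if __name__ == "__main__":
    # Made-up sample pages, for illustration only.
    pages = {
        "A": "use <ref>cite</ref> here",
        "B": "plain text only",
        "C": "a <ref name=x/> tag",
    }
    index = build_index(pages)
    print(search(pages, index, r"<ref[^>]*>", "<ref"))
```

Note that a required literal shorter than three characters (a lone '<', say) yields no trigrams, so this sketch falls back to scanning every page, which is the brute-force case described above.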

The trigram searches aren't the default yet because we haven't built the trigram index for all the wikis.  The plan is to make them the default once the index is built everywhere, which will take another few days.
