Last modified: 2014-04-15 16:43:51 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T64058, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 62058 - CirrusSearch: Can't find text in specific heading
Status: ASSIGNED
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch
Version: unspecified
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: Nik Everett
Depends on:
Blocks:
Reported: 2014-02-28 15:27 UTC by Nik Everett
Modified: 2014-04-15 16:43 UTC
CC List: 3 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Comment 1 Nik Everett 2014-02-28 15:28:34 UTC
Filing high because I'm not sure what is up with it.
Comment 2 Nik Everett 2014-02-28 15:29:14 UTC
Added some See Also bugs which might be the cause.  Or might not.
Comment 3 Nik Everett 2014-03-05 15:24:44 UTC
Not quite sure what is going on, but this works in dev and not in production.  Neither enwiki nor dewiki splits on the ":", but my dev machines do.

http://localhost:1234/dewiki_content/_analyze?analyzer=text&text=Kategorie:Stolpersteine
{
  "tokens": [
    {
      "token": "kategorie:stolperstein",
      "start_offset": 0,
      "end_offset": 23,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
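
A minimal sketch of how the response above could be checked mechanically, assuming only the JSON shape shown in this comment. The helper names `tokens_from_analyze` and `unsplit_on` are hypothetical and not part of CirrusSearch or Elasticsearch; the script parses a pasted response body and makes no network calls.

```python
import json

# Response body copied from the _analyze output quoted in this comment.
ANALYZE_RESPONSE = """
{
  "tokens": [
    {
      "token": "kategorie:stolperstein",
      "start_offset": 0,
      "end_offset": 23,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
"""


def tokens_from_analyze(body):
    """Return the token strings from an _analyze response body (hypothetical helper)."""
    return [entry["token"] for entry in json.loads(body)["tokens"]]


def unsplit_on(tokens, separator=":"):
    """Return the tokens that still contain the separator, i.e. were not split on it."""
    return [token for token in tokens if separator in token]


tokens = tokens_from_analyze(ANALYZE_RESPONSE)
print("tokens:", tokens)
print("not split on ':':", unsplit_on(tokens))
```

A non-empty second list means the analyzer kept the namespace prefix and the title together as one token, which is the behaviour described for enwiki and dewiki.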
Comment 4 Nik Everett 2014-03-05 15:27:16 UTC
Ah, what is saving me in dev is the $wgCirrusSearchUseAggressiveSplitting setting, which _is_ enabled on mediawiki.org.  The problem with enabling it everywhere is that it only works in English right now and might make it harder to find things....  Let me see what I can do about that.
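
To illustrate why splitting matters here: a query for the word after the colon can only match if the analyzer emitted that word as its own token. The sketch below is a toy stand-in, not the analysis chain that CirrusSearch or Elasticsearch actually uses. `toy_analyze`, `matches`, and the regex are hypothetical, and stemming is left out entirely.

```python
import re


def toy_analyze(text, split_on_punctuation):
    """Toy analyzer: lowercase, split on whitespace, optionally on punctuation too."""
    lowered = text.lower()
    if split_on_punctuation:
        # \W+ matches runs of non-word characters, which includes ":".
        parts = re.split(r"\W+", lowered)
    else:
        parts = lowered.split()
    return [part for part in parts if part]


def matches(query, text, split_on_punctuation):
    """True if every query token also appears among the tokens of the text."""
    indexed = set(toy_analyze(text, split_on_punctuation))
    return all(token in indexed for token in toy_analyze(query, split_on_punctuation))


title = "Kategorie:Stolpersteine"
print(toy_analyze(title, split_on_punctuation=False))
print(toy_analyze(title, split_on_punctuation=True))
print(matches("Stolpersteine", title, split_on_punctuation=False))
print(matches("Stolpersteine", title, split_on_punctuation=True))
```

Without the extra splitting the only indexed token is the whole "kategorie:stolpersteine" string, so the toy query does not match; with it, the query matches.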
Comment 5 Nik Everett 2014-03-05 16:33:04 UTC
Stalling this for a moment while I wait on input from Dan and Chad.  The question is whether to get aggressive splitting working everywhere or to use a smaller fix that handles just colons.  I'd like to unify on aggressive splitting everywhere to make regression testing easier and to avoid the confusion of some environments having it and some not.
Comment 6 Nik Everett 2014-04-15 16:43:51 UTC
I've gotten input: we should push aggressive splitting everywhere we can sensibly do it.  I've filed https://github.com/elasticsearch/elasticsearch/issues/5648 upstream so we can more easily edit the analyzers built into Elasticsearch.  Right now editing them requires rebuilding them as "custom" analyzers by hand, which is error-prone.  Resolving the issue would let us instruct Elasticsearch to rebuild them as custom analyzers, and then we could make incremental changes to them.

We don't actually need the issue closed upstream to work on this here, but we will need it for a few languages because some of the language analyzers can't actually be rebuilt as custom analyzers: Persian, Thai, and German, I believe.
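
A rough sketch of what "rebuilding as a custom analyzer and then making an incremental change" could look like, written as a Python dict of index analysis settings. The filter list is modelled on the pattern the Elasticsearch documentation uses for re-creating the English analyzer; it is an assumption, may not match any given Elasticsearch version, and is not the CirrusSearch configuration. The `add_filter` helper, the `split_on_punctuation` filter name, and the choice to put the `word_delimiter` filter first in the chain are all hypothetical.

```python
import copy
import json

# Assumed hand-written equivalent of a built-in English analyzer.
# Writing this list out by hand is the error-prone step described above.
ENGLISH_AS_CUSTOM = {
    "filter": {
        "english_stop": {"type": "stop", "stopwords": "_english_"},
        "english_stemmer": {"type": "stemmer", "language": "english"},
        "english_possessive_stemmer": {
            "type": "stemmer",
            "language": "possessive_english",
        },
    },
    "analyzer": {
        "text": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
                "english_possessive_stemmer",
                "lowercase",
                "english_stop",
                "english_stemmer",
            ],
        }
    },
}


def add_filter(analysis, analyzer_name, filter_name, filter_def, position=0):
    """Return a copy of the analysis settings with one extra token filter.

    Hypothetical helper: registers filter_def under filter_name and inserts
    that name into the named analyzer's filter chain at the given position.
    """
    updated = copy.deepcopy(analysis)
    updated["filter"][filter_name] = filter_def
    updated["analyzer"][analyzer_name]["filter"].insert(position, filter_name)
    return updated


# Incremental change: add a word_delimiter filter ahead of the existing chain.
patched = add_filter(
    ENGLISH_AS_CUSTOM, "text", "split_on_punctuation", {"type": "word_delimiter"}
)
print(json.dumps(patched["analyzer"]["text"], indent=2))
```

The point of the sketch is the shape of the workflow: once the analyzer exists as an explicit filter chain, a change such as extra splitting is a one-line edit to that chain.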
