Last modified: 2014-04-15 16:43:51 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T64058, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 62058 - CirrusSearch: Can't find text in specific heading


Summary:	CirrusSearch: Can't find text in specific heading

Status:	ASSIGNED

Product:	MediaWiki extensions
Classification:	Unclassified
Component:	CirrusSearch (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	High normal (vote)
Target Milestone:	---
Assigned To:	Nik Everett

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-02-28 15:27 UTC by Nik Everett
Modified:	2014-04-15 16:43 UTC (History)
CC List:	3 users (show)

See Also:	61965 52905 https://github.com/elasticsearch/elasticsearch/issues/5648
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Nik Everett 2014-02-28 15:27:57 UTC

I'm not sure of the cause of this one: the old search returns three results, but Cirrus returns only two and should return all three.

Old: https://de.wikipedia.org/wiki/Special:Search?profile=advanced&search=stolpersteine+prefix%3APortal+Diskussion%3ANationalsozialismus%2F&fulltext=Search&ns0=1&ns4=1&ns10=1&ns12=1&redirs=1&profile=advanced
Cirrus: https://de.wikipedia.org/wiki/Special:Search?profile=advanced&search=stolpersteine+prefix%3APortal+Diskussion%3ANationalsozialismus%2F&fulltext=Search&ns0=1&ns4=1&ns10=1&ns12=1&redirs=1&profile=advanced&srbackend=CirrusSearch

Comment 1 Nik Everett 2014-02-28 15:28:34 UTC

Filing high because I'm not sure what is up with it.

Comment 2 Nik Everett 2014-02-28 15:29:14 UTC

Added some See Also bugs which might be the cause.  Or might not.

Comment 3 Nik Everett 2014-03-05 15:24:44 UTC

Not quite sure what is going on, but this actually works in dev and not in production.  Neither enwiki nor dewiki splits on the ":", but my dev machines do.

http://localhost:1234/dewiki_content/_analyze?analyzer=text&text=Kategorie:Stolpersteine
{
  "tokens": [
    {
      "token": "kategorie:stolperstein",
      "start_offset": 0,
      "end_offset": 23,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
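
For anyone repeating this check, here is a minimal Python sketch that builds the same kind of _analyze URL and pulls the token strings out of a response. The helper names (analyze_url, token_strings, unsplit_tokens) are made up for illustration. The query-string request shape simply mirrors the URL above, and other Elasticsearch versions may expect a different one. The sample body is the response pasted above, so nothing here touches the network.

```python
import json
from urllib.parse import urlencode


def analyze_url(base, index, analyzer, text):
    """Build a query-string _analyze URL like the one in the comment above."""
    query = urlencode({"analyzer": analyzer, "text": text})
    return f"{base}/{index}/_analyze?{query}"


def token_strings(response_body):
    """Return just the token text from an _analyze JSON response."""
    return [entry["token"] for entry in json.loads(response_body)["tokens"]]


def unsplit_tokens(tokens, delimiter=":"):
    """Return the tokens that still contain the delimiter, i.e. were not split."""
    return [token for token in tokens if delimiter in token]


# Response copied from the comment above (hypothetical variable name).
SAMPLE_RESPONSE = """
{
  "tokens": [
    {
      "token": "kategorie:stolperstein",
      "start_offset": 0,
      "end_offset": 23,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
"""

url = analyze_url(
    "http://localhost:1234", "dewiki_content", "text", "Kategorie:Stolpersteine"
)
tokens = token_strings(SAMPLE_RESPONSE)
unsplit = unsplit_tokens(tokens)
```

A non-empty unsplit list means the analyzer kept the namespace prefix and the word together as one token, which is the behaviour reported here.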

Comment 4 Nik Everett 2014-03-05 15:27:16 UTC

Ah, what is saving me in dev is the $wgCirrusSearchUseAggressiveSplitting setting, which _is_ enabled on mediawiki.org.  The problem with enabling it everywhere is that it only works in English right now and might make things harder to find.  Let me see what I can do about that.
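
To illustrate why splitting on the colon matters for this query, here is a rough Python sketch. It is a plain regex stand-in, not the CirrusSearch or Elasticsearch implementation, and the function names are hypothetical. It only shows that a stemmed query term cannot match an index token that still carries the "kategorie:" prefix.

```python
import re


def split_on_delimiters(tokens, delimiters=":"):
    """Split each token on any of the given delimiter characters.

    Stand-in for whatever aggressive splitting does in the real analysis
    chain; empty pieces are dropped.
    """
    pattern = "[" + re.escape(delimiters) + "]+"
    pieces = []
    for token in tokens:
        pieces.extend(part for part in re.split(pattern, token) if part)
    return pieces


def term_matches(query_term, index_tokens):
    """Exact-match lookup, which is roughly how a term hits an inverted index."""
    return query_term in index_tokens


production_tokens = ["kategorie:stolperstein"]  # token from the _analyze output
split_tokens = split_on_delimiters(production_tokens)

found_without_split = term_matches("stolperstein", production_tokens)
found_with_split = term_matches("stolperstein", split_tokens)
```

With the unsplit token the lookup fails. After splitting, the token list contains "stolperstein" and the lookup succeeds.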

Comment 5 Nik Everett 2014-03-05 16:33:04 UTC

Stalling this for a moment while I wait for input from Dan and Chad.  The question is whether to get aggressive splitting working everywhere or to use a smaller fix that handles just colons.  I'd like to unify on aggressive splitting everywhere to make regression testing easier and to avoid the confusion of some environments having it and some not.

Comment 6 Nik Everett 2014-04-15 16:43:51 UTC

I've gotten input: we should push aggressive splitting everywhere we can sensibly do it.  I've filed https://github.com/elasticsearch/elasticsearch/issues/5648 upstream so we can more easily edit the analyzers built into Elasticsearch.  Right now editing them requires rebuilding them as "custom" analyzers by hand, which is error-prone.  The issue would let us instruct Elasticsearch to rebuild them as custom analyzers, and then we could make incremental changes to them.

We don't actually need the issue closed upstream to work on this here, but we will need it for a few languages because some of the language analyzers can't actually be rebuilt as custom analyzers: Persian, Thai, and German, I believe.
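
As a rough picture of what "rebuilding by hand" involves, here is a Python sketch of index settings that approximate an English analyzer as a "custom" analyzer and then apply one incremental change. The filter chain is an assumption based on how Elasticsearch documents its language analyzers and may differ by version. Using the word_delimiter token filter for the splitting is also an assumption, not a statement of what CirrusSearch does. All names such as rebuilt_english and add_filter_first are hypothetical.

```python
import copy
import json

# Assumed approximation of a built-in English analyzer, written out as "custom".
REBUILT_ENGLISH = {
    "analysis": {
        "filter": {
            "english_possessive_stemmer": {
                "type": "stemmer",
                "language": "possessive_english",
            },
            "english_stop": {"type": "stop", "stopwords": "_english_"},
            "english_stemmer": {"type": "stemmer", "language": "english"},
        },
        "analyzer": {
            "rebuilt_english": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": [
                    "english_possessive_stemmer",
                    "lowercase",
                    "english_stop",
                    "english_stemmer",
                ],
            }
        },
    }
}


def add_filter_first(settings, analyzer_name, filter_name, filter_def):
    """Return a copy of settings with one extra token filter run first."""
    updated = copy.deepcopy(settings)
    analysis = updated["analysis"]
    analysis["filter"][filter_name] = filter_def
    analysis["analyzer"][analyzer_name]["filter"].insert(0, filter_name)
    return updated


# Incremental change: split tokens on delimiters before the other filters run.
tweaked = add_filter_first(
    REBUILT_ENGLISH,
    "rebuilt_english",
    "split_on_delimiters",
    {"type": "word_delimiter"},
)
settings_json = json.dumps(tweaked, indent=2)
```

Every filter in the chain has to be spelled out correctly before the one-line change can be made, which is the error-prone part.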
