Last modified: 2012-06-07 12:34:39 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T32840, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 30840 - Highlighting of non-Latin characters (SphinxMWSearch 0.8.1)
Status: RESOLVED WORKSFORME
Product: MediaWiki extensions
Classification: Unclassified
Component: SphinxSearch
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Svemir Brkic
Reported: 2011-09-10 02:17 UTC by MWJames
Modified: 2012-06-07 12:34 UTC
CC List: 1 user

Attachments
Display of Japanese search result after r96768 changes (53.60 KB, image/png)
2011-09-11 02:55 UTC, MWJames
Compare highlighting after r96768 changes (152.07 KB, image/png)
2011-09-11 03:07 UTC, MWJames

Description MWJames 2011-09-10 02:17:40 UTC
Conversion from [1]

While testing the new interface we encountered a problem with non-Latin characters (we tested Japanese and Chinese) and the highlighting of the search term on the output page. In earlier versions the search term was highlighted in the result output (for Latin as well as non-Latin characters). Now results are displayed on the standard MW search results page, but non-Latin search terms are not highlighted (highlighting does work with Latin characters). --[[User:MWJames|MWJames]] 07:03, 7 September 2011 (UTC)

: What is the value of $wgSphinxSearchMWHighlighter in your case? [[User:Svemir Brkic|Svemir Brkic]] 13:43, 7 September 2011 (UTC)

:: We did a combined search (a Latin-character word together with a Japanese character) with alternating settings, $wgSphinxSearchMWHighlighter = true; and $wgSphinxSearchMWHighlighter = false; (also with $wgAdvancedSearchHighlighting = true set). The result page showed results with both terms, but only the Latin-character word was highlighted.

::: I can replicate the issue, but the problem seems to be in the sphinxsearch engine itself. When I let MW do the highlighting, I get garbled characters (probably because I tested with Cyrillic characters in an otherwise English wiki), and when I let Sphinx do the highlighting, results look correct, but non-Latin words are not highlighted. The highlighting code is the same as before (again, the extension is not really doing any highlighting; it just gives the text and the terms to the sphinxsearch engine). Probably something related to UTF-8 encoding. [[User:Svemir Brkic|Svemir Brkic]] 03:11, 9 September 2011 (UTC)


[1] http://www.mediawiki.org/wiki/Extension_talk:SphinxSearch#SphinxMWSearch_0.8.3B_Highlighting_of_non-Latin_characters_doesn.27t_work
Comment 1 Svemir Brkic 2011-09-10 11:47:35 UTC
MW internal search works better if I set both of these to true:

$wgSphinxSearchMWHighlighter = true;
$wgAdvancedSearchHighlighting = true;

If it works for you this way, I am tempted to stop trying to reimplement the highlighter based on Sphinx buildExcerpts and let MW do it, since it already knows how to handle wiki markup etc.
Comment 2 MWJames 2011-09-10 14:51:31 UTC
Both parameters [1] are set, but non-Latin search terms are still not highlighted.

In general, you should not try to implement a workaround, since MW should handle this situation out of the box.

People might or might not ask about this, but it would be a good idea to ask one of the responsible MW developers or Sphinx developers why this isn't working as it is supposed to.

As far as I could tell from the available documentation, it is not really clear who is responsible for the MW search interface, which makes it a bit difficult to ask for specific assistance.

Anyhow, you might want to consider implementing a debug function, so that in such unclear cases one has a chance to see what comes from Sphinx and what goes into MW, in order to pinpoint any malfunction or misconfiguration.

[1] $wgSphinxSearchMWHighlighter = true; $wgAdvancedSearchHighlighting = true;
Comment 3 Svemir Brkic 2011-09-10 19:49:29 UTC
Fixed in r96735 - search terms were not split properly. $wgSphinxSearchMWHighlighter is not used anymore, but you should probably still set $wgAdvancedSearchHighlighting to true.
Comment 4 MWJames 2011-09-11 02:55:11 UTC
Created attachment 9050
Display of Japanese search result after r96768 changes

Even with the r96768 changes, one can see that in the result display, 34 hits for the term ビジネス, but only one term is highlighted in red.
Comment 5 MWJames 2011-09-11 03:07:21 UTC
Created attachment 9051
Compare highlighting after r96768 changes

As seen in this attachment, highlighting has changed dramatically between r96711 (before) and r96768 (after). The results are the same but the highlighting is totally different.
Comment 6 Svemir Brkic 2011-09-11 03:20:32 UTC
(In reply to comment #4)
> Even with the r96768 changes, one can see that in the result display, 34 hits
> for the term ビジネス, but only one term is highlighted in red.

The result it does highlight is the one where the term appears after non-Japanese text, which means it is probably caused by http://www.mediawiki.org/wiki/Manual:$wgSearchHighlightBoundaries - according to that page this var should be empty for CJK languages. Not sure how that would affect mixed wikis, though...
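
For illustration, the setting discussed in this comment could be tried in LocalSettings.php roughly as follows. This is only a sketch: the empty value is an assumption taken from the manual page's statement that the variable should be empty for CJK languages, it has not been verified against this bug, and its effect on wikis that mix CJK and non-CJK text is unknown.

```php
// LocalSettings.php fragment (sketch, not verified against this bug).
// Assumption: the wiki content is predominantly CJK, where words are not
// separated by spaces or punctuation, so no boundary pattern is required
// in front of a search term for it to be highlighted.
$wgSearchHighlightBoundaries = '';

// Already set in all test cases described in this report.
$wgAdvancedSearchHighlighting = true;
```

Removing the line again restores MediaWiki's default boundary pattern, so the change is easy to revert if highlighting of Latin-character terms gets worse.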
Comment 7 Svemir Brkic 2011-09-11 03:22:12 UTC
(In reply to comment #5)

Is the r96768 (after) screenshot taken with $wgAdvancedSearchHighlighting = true?
Comment 8 MWJames 2011-09-11 03:33:42 UTC
In all test cases described here $wgAdvancedSearchHighlighting = true; is set in LocalSettings.php.
Comment 9 Svemir Brkic 2011-09-12 00:14:51 UTC
It turns out the advanced search highlighting does not work well when matches are found via Sphinx. It could be something else I am doing wrong, but here is what I did for now:

In r96821 I made MW highlighting optional via $wgSphinxSearchMWHighlighter again, and brought back a slightly improved version of the Sphinx-based highlighter.

Please try these two options:

1. $wgSphinxSearchMWHighlighter = false;

2. $wgSphinxSearchMWHighlighter = true;
    $wgAdvancedSearchHighlighting = false;
Comment 10 Svemir Brkic 2012-06-07 12:34:39 UTC
I am resolving this since there has been no feedback since Sep. 2011. If it still does not work, please reopen the issue.
