Last modified: 2012-06-07 12:34:39 UTC
Converted from [1].

While testing the new interface, we encountered a problem with non-Latin characters (we tested Japanese and Chinese) and the highlighting of the search term on the output page. In earlier versions the search term was highlighted in the result output (for Latin as well as non-Latin characters). Now the results are displayed in the standard MW search layout, but the search term is not highlighted (highlighting still works for Latin characters). --[[User:MWJames|MWJames]] 07:03, 7 September 2011 (UTC)

: What is the value of $wgSphinxSearchMWHighlighter in your case? [[User:Svemir Brkic|Svemir Brkic]] 13:43, 7 September 2011 (UTC)

:: We ran a combined search (a word in Latin characters together with a Japanese word), alternating between $wgSphinxSearchMWHighlighter = true; and $wgSphinxSearchMWHighlighter = false; (with $wgAdvancedSearchHighlighting = true; also set). The result page showed results containing both terms, but only the Latin-character word was highlighted.

::: I can replicate the issue, but the problem seems to be in the sphinxsearch engine itself. When I let MW do the highlighting, I get garbled characters (probably because I tested with Cyrillic characters in an otherwise English wiki), and when I let Sphinx do the highlighting, the results look correct but non-Latin words are not highlighted. The highlighting code is the same as before (again, the extension does not really do any highlighting; it just passes the text and the terms to the sphinxsearch engine). It is probably related to UTF-8 encoding. [[User:Svemir Brkic|Svemir Brkic]] 03:11, 9 September 2011 (UTC)

[1] http://www.mediawiki.org/wiki/Extension_talk:SphinxSearch#SphinxMWSearch_0.8.3B_Highlighting_of_non-Latin_characters_doesn.27t_work
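The UTF-8 suspicion in the thread above can be made concrete with a small illustration. This is a hedged sketch in Python, not code from the extension, MediaWiki, or Sphinx: `cut_bytes` and `cut_chars` are hypothetical helpers. It shows two generic ways Cyrillic text ends up garbled: cutting a snippet at a byte offset that falls inside a multibyte character, and decoding UTF-8 bytes as a single-byte encoding such as Latin-1. Whether either of these is what actually happens here is not established in this thread.

```python
def cut_bytes(text: str, limit: int) -> str:
    """Truncate at a byte offset, ignoring character boundaries (unsafe)."""
    # A partial multibyte sequence left at the end decodes to U+FFFD.
    return text.encode("utf-8")[:limit].decode("utf-8", errors="replace")


def cut_chars(text: str, limit: int) -> str:
    """Truncate at a character offset (safe for multibyte text)."""
    return text[:limit]


word = "Поиск"  # Cyrillic: each character takes 2 bytes in UTF-8

print(cut_bytes(word, 5))  # 'По' plus a U+FFFD replacement character
print(cut_chars(word, 2))  # 'По'

# Reading UTF-8 bytes as Latin-1 turns each byte into its own character.
as_latin1 = word.encode("utf-8").decode("latin-1")
print(len(as_latin1))  # 10 characters instead of 5
```

Both failure modes only appear with multibyte (non-Latin) input, which is consistent with, though not proof of, the symptom described above.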
MW internal search works better if I set both of these to true: $wgSphinxSearchMWHighlighter = true; $wgAdvancedSearchHighlighting = true; If it works for you this way, I am tempted to stop trying to reimplement the highlighter based on Sphinx buildExcerpts and let MW do it, since it already knows how to handle wiki markup etc.
The parameters [1] are set, but non-Latin search terms are still not highlighted. In general, you should not have to implement a workaround, since MW should handle this situation out of the box. Whether or not other people ask about this, it would be a good idea to ask one of the responsible MW or Sphinx developers why this is not working as it is supposed to. As far as I can tell from the available documentation, it is not really clear who is responsible for the MW search interface, which makes it difficult to ask for specific assistance. Either way, you might consider implementing a debug function so that, in unclear cases like this one, one can see what comes back from Sphinx and what is passed into MW, and pinpoint any malfunction or misconfiguration. [1] $wgSphinxSearchMWHighlighter = true; $wgAdvancedSearchHighlighting = true;
Fixed in r96735: search terms were not split properly. $wgSphinxSearchMWHighlighter is no longer used, but you should probably still set $wgAdvancedSearchHighlighting to true.
Created attachment 9050: Display of Japanese search result after r96768 changes
Even with the r96768 changes, the result display shows 34 hits for the term ビジネス, but only one occurrence is highlighted in red.
Created attachment 9051: Compare highlighting after r96768 changes
As this attachment shows, highlighting changed dramatically between r96711 (before) and r96768 (after). The results are the same, but the highlighting is completely different.
(In reply to comment #4)
> Even with the r96768 changes, one can see that in the result display, 34 hits
> for the term ビジネス, but only one term is highlighted in red.

The one result it does highlight is where the term appears after non-Japanese text, which suggests the cause is $wgSearchHighlightBoundaries (http://www.mediawiki.org/wiki/Manual:$wgSearchHighlightBoundaries). According to that page, this variable should be empty for CJK languages. I am not sure how that would affect mixed wikis, though...
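The boundary effect described above can be illustrated with a simplified highlighter. This is a minimal sketch in Python, not MediaWiki's own highlighting code: `highlight` and `BOUNDARY` are hypothetical names, it assumes the boundary is checked on both sides of the term, and because Python's `re` module has no `\p{...}` classes, a small whitespace-and-punctuation class stands in for the real boundary regex. Japanese text has no spaces between words, so a term embedded in it is skipped, while the same term set off by a space after Latin text is highlighted.

```python
import re

# Hypothetical stand-in for a boundary regex such as $wgSearchHighlightBoundaries.
# Python's re has no \p{...} classes, so whitespace and some ASCII punctuation are used.
BOUNDARY = r"[\s.,;:!?()\[\]]"


def highlight(text: str, term: str, boundary: str = BOUNDARY) -> str:
    """Wrap occurrences of term in <b>...</b>.

    With a non-empty boundary, a match must be preceded by the start of the text
    or a boundary character, and followed by a boundary character or the end.
    With an empty boundary (as the manual suggests for CJK), any occurrence matches.
    """
    escaped = re.escape(term)
    if not boundary:
        return re.sub(f"({escaped})", r"<b>\1</b>", text, flags=re.IGNORECASE)
    # Group 1 keeps the preceding boundary (or the empty start) so it survives the
    # substitution; the lookahead checks the following boundary without consuming it.
    pattern = f"(^|{boundary})({escaped})(?={boundary}|$)"
    return re.sub(pattern, r"\1<b>\2</b>", text, flags=re.IGNORECASE)


mixed = "ビジネス情報とSphinx ビジネス"
print(highlight(mixed, "ビジネス"))      # → ビジネス情報とSphinx <b>ビジネス</b>
print(highlight(mixed, "ビジネス", ""))  # → <b>ビジネス</b>情報とSphinx <b>ビジネス</b>
```

In this sketch, the empty-boundary call mirrors the manual's advice for CJK wikis. The trade-off on a mixed wiki is that Latin terms would then also match inside longer words (for example, "sphinx" inside "sphinxsearch").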
(In reply to comment #5)
Was the r96768 (after) screenshot taken with $wgAdvancedSearchHighlighting = true?
In all test cases described here, $wgAdvancedSearchHighlighting = true; is set in LocalSettings.php.
It turns out that advanced search highlighting does not work well when matches are found via Sphinx. It could be something else I am doing wrong, but for now, in r96821, I made MW highlighting optional again via $wgSphinxSearchMWHighlighter and brought back a slightly improved version of the Sphinx-based highlighter. Please try these two options:
1. $wgSphinxSearchMWHighlighter = false;
2. $wgSphinxSearchMWHighlighter = true; $wgAdvancedSearchHighlighting = false;
I am resolving this since there has been no feedback since Sep. 2011. If it still does not work, please reopen the issue.