Last modified: 2010-05-15 15:59:51 UTC
Created attachment 5727 [details]
Patch against languages/Language.php to let it return results for utf-8 terms

Before being entered into the searchindex table, UTF-8 encoded strings are converted to a special notation: e.g. dämon becomes du8c3a4mon. The search form applies the same transform, but with an uppercase U8 escape, so the search fails in MySQL. The attached patch lets UTF-8 search terms return results.
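For readers following along, here is a minimal sketch of the notation described above. It is written in Python rather than MediaWiki's PHP and is not the code from Language.php. The function name `escape_for_index` and the `prefix` parameter are hypothetical, and the rule itself (prefix plus the hex of the character's UTF-8 bytes) is only inferred from the dämon example.

```python
def escape_for_index(term: str, prefix: str = "u8") -> str:
    """Replace each non-ASCII character with prefix + hex of its UTF-8 bytes.

    Hypothetical helper: the rule is inferred from the example in this
    report and is not copied from MediaWiki.
    """
    out = []
    for ch in term:
        if ord(ch) < 128:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        else:
            # bytes.hex() yields lowercase hex digits, e.g. "c3a4" for "ä".
            out.append(prefix + ch.encode("utf-8").hex())
    return "".join(out)


indexed = escape_for_index("dämon")               # lowercase escape
searched = escape_for_index("dämon", prefix="U8")  # uppercase escape
print(indexed, searched, indexed == searched)
```

The two forms differ only in the case of the escape prefix, which is the mismatch this report is about.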
The form returns results just fine for me...

SearchUpdate::doUpdate() takes the output of Language::stripForSearch() and does further processing to strip markup etc. This includes running it through strtolower() to make it entirely lowercase.

This extra lowercasing is *not* done by Special:Search, which produces the discrepancy you noted -- only the input data is being lowercased.

Searching "ééé FUNKY" hits this query:

SQL: SELECT /* WikiSysop */ page_id, page_namespace, page_title
  FROM `page`,`searchindex`
  WHERE page_id=si_page
    AND MATCH(si_title) AGAINST('+U8c3a9U8c3a9U8c3a9 +funky' IN BOOLEAN MODE)
    AND page_is_redirect=0
    AND page_namespace IN ('0')
  LIMIT 20

However the backend search engine is case-insensitive so it shouldn't make a difference. :)

Worth going ahead and fixing though, just in case.

Applied on trunk (for 1.15) in r46629
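The discrepancy can be sketched roughly as follows, in Python for illustration only. The names `strip_for_search`, `index_form` and `query_form` are hypothetical stand-ins, not MediaWiki functions, and the order of steps (lowercase, then escape with U8, then lowercase once more on the index side) is an assumption read off the query quoted above.

```python
def strip_for_search(text: str) -> str:
    """Hypothetical stand-in: lowercase, then escape non-ASCII as U8 + hex."""
    out = []
    for ch in text.lower():
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.append("U8" + ch.encode("utf-8").hex())
    return "".join(out)


def index_form(text: str) -> str:
    # Index side: the escaped text is lowercased again, so "U8" becomes "u8".
    return strip_for_search(text).lower()


def query_form(text: str) -> str:
    # Search side: no second lowercasing, so the "U8" prefix stays uppercase.
    return strip_for_search(text)


q = query_form("ééé FUNKY")
i = index_form("ééé FUNKY")
print(q)
print(i)
print(q == i, q.lower() == i.lower())
```

Under these assumptions the two strings are equal under a case-insensitive comparison and unequal under a case-sensitive one, so the mismatch only shows up when the backend compares case-sensitively.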
Thank you Brion, I guess that should not break other people's installations.

I moved a MediaWiki installation between servers and doctored the mysqldump so that the new location stores MediaWiki's UTF-8 strings in "utf8-...-ci" columns (previously iso...). On import, some rows produced duplicate-key errors, so I switched everything to a "utf-bin" charset instead. Apart from the search form, the wiki works fine so far.