Last modified: 2013-10-02 13:22:50 UTC
If I go to <https://meta.wikimedia.org/wiki/Special:CentralNoticeBanners> and enter the text "rory" in the filter field, I get no results. If I enter the text "Rory" in the filter field, I get one result: [x] PrivacyPolicyDiscussion_Rory1 (Preview on-wiki) Search should be case insensitive.
Fascinating. The prototype SQL schema and the one on metawiki differ! We will have to run an alter on the table to get it to be case insensitive. Metawiki cn_templates.tmp_name is varbinary(255) CentralNotice.sql cn_templates.tmp_name is varchar(255) So the alter would be: ALTER TABLE cn_templates MODIFY COLUMN tmp_name varchar(255); Wonder if I can just do this tomorrow -- I'll ask Asher.
The cn_templates table is small, so the above ALTER would be OK to just run on metawiki master. Thank you for asking first :-) Although, I notice that other metawiki cn_% tables with VARCHAR fields in the schema show as VARBINARY too. Does this deserve a more thorough investigation?
(In reply to comment #2) > Does this deserve a more thorough investigation? So, in doing a more thorough look through I noticed that in fact: * All table rows in CentralNotice are binary types * The default character set and collation for all the wikis that I checked is binary ** As CN does not specify a table charset/collation we take the databases which is binary -- which is why all our columns are binary. The answer to the historical question of "Why are all our tables by default binary charset/collation instead of utf8?" is apparently: (05:00:55 PM) ori-l: mysql's 'utf8' wasn't real utf8 until a fairly recent version (5.1 iirc) (05:00:55 PM) ori-l: it was restricted to the basic multilingual plane (05:01:24 PM) ori-l: which is a problem if you want to support certain languages (05:01:31 PM) ori-l: and we do -- so -- I'm going to send an email to the wikitech list about how to resolve this problem.
(In reply to comment #3) > I'm going to send an email to the wikitech list about how to resolve this > problem. For posterity -- http://www.gossamer-threads.com/lists/wiki/wikitech/394239
Any alterations to the database schema should be wrapped in a maintenance script and/or committed to a Git repo as an SQL patch. Other people presumably use this extension. :-)
Well my understanding is the listed ALTER was actually to bring the schema back inline with the schema already committed to the repo. Not an SQL patch per se; the production schema is simply wrong. But I agree it should become a maintenance script if the problem is more widespread or indicative of a design flaw with regard to character sets. Switching wholesale to utf8 has many implications.