Last modified: 2014-09-23 20:01:53 UTC
Currently only some characters are normalized to a "canonical" form. For example, although ccnorm("α") results in "A", ccnorm("ά") doesn't change anything. The function should support the conversion of more characters.

The following list is based on what is currently available at [[MediaWiki:Titleblacklist]], but maybe it is better to have different sets of characters depending on the case of the letter. For example, ⅅ for D, but ⅆ for d.

----
a: aαąăãàāάạậảấầẩắằẵẳẫặḁǟǡȁᾳὰᾀἁᾁἄᾄἂᾂἆᾆἅᾅἃᾃἇᾇáâäæåǻ٩4
b: bßβбв฿
c: cċĉ¢сćĉçč
d: dďḍðⅆ
e: éèëeęěĕėẻẹếềễểȨȩḝēḗȅȇệḙḛ3عڠẽə
f: fғ₣
g: gĝģğġɠǥǧǵḡԌ
h: hήĥħȞʰʱḣḥḧḩḫнңӈӉηἠἡἢἣἤἥἦἧὴᾐћⱧԋњһ
i: iìíîïĩļǐīĭḷŀιїɨ!łľį
k: kķкќқҝҡҟӄ
l: l₤ĺľḷłŀλлљ
m: mɯḿṁṃмӍμ₥
n: n₦ńñņňṇν
o: oóòôöõǒōŏǫőœøəόοωὸὀὁὄὂὅὃоөӧӫδσʘǿọ
p: pƥṕṗǷ₧þρр
q: qɊʠ
r: rŕŗřȑȓƦʳʴʵʶṙṛṝṟя®
s: s$śŝşšṣσѕ
t: tţťṭτтŧ
u: uúùûüũůǔūǖǘǚǜŭųű
w: wŵẁẃẅẇẉ₩
x: xҳχ
y: yýÿŷƴȲʸẏỳỵỷỹʊύυϋὑὓὕὗὺῠῡуϓ
z: zźžż
----
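To make the request concrete, here is a minimal Python sketch of table-driven normalization over equivalence sets like the ones listed above. It is only an illustration: `EQUIV_SETS`, `build_table` and `normalize` are hypothetical names, the two sets hold just a handful of sample characters, and none of this is AntiSpoof's actual data format or the real ccnorm implementation.

```python
# Hypothetical equivalence sets: canonical letter -> characters that should
# normalize to it. Only a few sample members are included.
EQUIV_SETS = {
    "A": "a\u03b1\u03ac\u00e0\u00e1",  # a, α, ά, à, á
    "D": "d\u010f\u1e0d",              # d, ď, ḍ
}


def build_table(equiv_sets):
    """Invert the sets into a per-character lookup table."""
    table = {}
    for canonical, members in equiv_sets.items():
        for ch in members:
            table[ch] = canonical
    return table


def normalize(text, table):
    """Replace every character that has an entry; leave the rest untouched."""
    return "".join(table.get(ch, ch) for ch in text)


TABLE = build_table(EQUIV_SETS)
```

With this toy table, `normalize("\u03ac", TABLE)` gives `"A"`, while any character without an entry passes through unchanged, which matches the behaviour described for the characters ccnorm does not yet cover.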
Created attachment 7800: Proposed patch

I've attached a proposed patch that would add the characters to the AntiSpoof checks (which are also used by AbuseFilter).
Changed extension to AntiSpoof, since that's where the change would have to be made (unless AbuseFilter was fixed by an independent re-implementation of the normalization, which seems pointless).
Are they the same ones you added [1] to [2]? I synchronized the SVN version with the list at mediawiki.org in r76484.

1- http://www.mediawiki.org/w/index.php?title=Extension%3AAntiSpoof%2FEquivalence_sets%2Fequivset_1&action=historysubmit&diff=361648&oldid=251667
2- http://www.mediawiki.org/wiki/Extension:AntiSpoof/Equivalence_sets
Yes, they're the same ones that I added to mediawiki.org in the edits you linked.
They were committed in r76484, then.
Okay, thanks.
The function still doesn't work with all of the characters mentioned in comment 0 above. Applying ccnorm to the string "ìíîïĩļǐīĭḷĿї!ľį₤ĺľḷĿΛЛљóòôöõǒōŏǫőόὸὀὁὄὂὅὃọ$śŝşšṣσ" doesn't change any of its characters.
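A check like the one described here can be automated. The Python sketch below lists the characters of a test string that a normalizer leaves unchanged. `unchanged_chars` is a hypothetical helper, and `toy_normalize` is only a stand-in with two made-up entries, not the real ccnorm.

```python
def unchanged_chars(text, normalize):
    """Return the characters of `text` that `normalize` maps to themselves."""
    return [ch for ch in text if normalize(ch) == ch]


# Toy stand-in for ccnorm (assumption: the real function lives in AntiSpoof).
TOY_TABLE = {"\u00ec": "I", "\u00ed": "I"}  # ì, í


def toy_normalize(s):
    return "".join(TOY_TABLE.get(ch, ch) for ch in s)


sample = "\u00ec\u00ed$"  # ì, í, $
leftover = unchanged_chars(sample, toy_normalize)
```

Here `leftover` holds only `"$"`, the one character in the sample without a table entry.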
EdoDodo, does your patch still apply? I recommend that you get a developer access account https://www.mediawiki.org/wiki/Developer_access so that you can commit your patches directly into the source control system in the future -- in fact, you could update and submit this patch, and get it reviewed faster. I'm sorry for the delay.
(In reply to comment #8)
> EdoDodo, does your patch still apply?

This was already applied; see comment 5.
(In reply to comment #7)
> The function still doesn't works with all characters mentioned in comment 0
> above.
>
> Using ccnorm in the string
> "ìíîïĩļǐīĭḷĿї!ľį₤ĺľḷĿΛЛљóòôöõǒōŏǫőόὸὀὁὄὂὅὃọ$śŝşšṣσ"
> doesn't change any of its characters.

Still reproducible.
It looks like all of the equivalents were added except for the ones corresponding to the letters I, L, O, and S. Of course this makes sense since those 4 letters have never worked in AntiSpoof due to bug 27987. I fixed bug 27987 in change I613f9917, so I'll do a follow-up commit to add the missing equivs.
Change 92154 had a related patch set uploaded by Kaldari:
Adding missing equivalents for I, L, O, and S.

https://gerrit.wikimedia.org/r/92154
I added all the missing equivalencies, except for 4 or 5 that either didn't make sense or would have conflicted with valid equivalencies for Greek. For example:

λ->L
л->L
љ->L
σ->S
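The conflict problem can be shown with a short Python sketch: if each character may normalize to only one canonical letter, any character listed under two letters needs a decision. In the list from comment 0, for instance, σ appears under both o: and s:. The helper name `find_conflicts` and the two-entry `proposed` sample are hypothetical, and this is not how AntiSpoof itself validates its equivalence sets.

```python
from collections import defaultdict


def find_conflicts(equiv_sets):
    """Map each character claimed by more than one canonical letter to those letters."""
    owners = defaultdict(set)
    for canonical, members in equiv_sets.items():
        for ch in members:
            owners[ch].add(canonical)
    return {ch: sorted(letters) for ch, letters in owners.items() if len(letters) > 1}


# Sample taken from the proposed list: σ (U+03C3) is listed under both o: and s:.
proposed = {"O": "o\u03c3", "S": "s\u03c3"}
conflicts = find_conflicts(proposed)
```

For this sample, `conflicts` maps σ to `["O", "S"]`. Running the same check over a full proposed list before committing would flag such overlaps early.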
Change 97304 had a related patch set uploaded by Kaldari:
Adding 2 new equivalencies (partial fix for bug 25619)

https://gerrit.wikimedia.org/r/97304
Since I haven't had any luck getting code review on https://gerrit.wikimedia.org/r/92154, I submitted https://gerrit.wikimedia.org/r/97304 as a simpler version. It adds only ! and $ and nothing else.
Both patches are still open. The first one got some reviews and now appears to be waiting for a new upload from Kaldari. The second, simpler one has received no reviews at all.
(In reply to Ryan Kaldari from comment #15)
> Since I haven't had any luck getting code review on
> https://gerrit.wikimedia.org/r/92154 I submitted
> https://gerrit.wikimedia.org/r/97304 as a simpler version. It only adds !
> and $ and nothing else.

I'm not sure whether sending a request to wikitech-l would help get reviews for these two patches, but pinging on the patches and here doesn't seem to be enough... Any ideas?