Last modified: 2013-12-17 10:16:50 UTC
The Cyrillic letter Д д is missing. The AbuseFilter extension uses your equivalence set for its normalization, and I'm trying to filter some Russian bad words, particularly "Пиздец", but equivset doesn't contain this character.
Confirming that maintenance/equivset.in, maintenance/equivset.txt and equivset.php indeed contain no 414 Д entry, nor 434 д; some other Cyrillic letters, such as Ш or Ц, are missing too. However, the extension description says "It blocks the creation of accounts with mixed-script, confusing and similar usernames", and I am not aware of Д being similar to any letter in another script. Could you clarify which other letter is similar to Д?
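One quick way to audit a table like this for gaps is to walk the uppercase Cyrillic block, U+0410 (А) to U+042F (Я), and list every code point without an entry; "414" is the hexadecimal code point of Д (U+0414). This is only a sketch: it uses a small, made-up `EQUIVSET_EXCERPT` dictionary (hypothetical entries, not copied from equivset.txt) instead of parsing the real files, whose exact layout is not assumed here.

```python
import unicodedata

# Hypothetical excerpt of an equivset-style table: code point -> equivalent character.
# These entries are illustrative only and do NOT come from the real equivset files.
EQUIVSET_EXCERPT = {
    0x0410: "A",  # А
    0x0412: "B",  # В
    0x0415: "E",  # Е
    0x041E: "0",  # О
}


def missing_cyrillic_capitals(table):
    """Return (hex code point, char, Unicode name) for each capital А..Я absent from table."""
    missing = []
    for cp in range(0x0410, 0x0430):  # U+0410 А through U+042F Я
        if cp not in table:
            ch = chr(cp)
            missing.append((f"{cp:X}", ch, unicodedata.name(ch)))
    return missing


for cp, ch, name in missing_cyrillic_capitals(EQUIVSET_EXCERPT):
    print(cp, ch, name)  # e.g. "414 Д CYRILLIC CAPITAL LETTER DE"
```

Pointed at a dictionary built from the real table, the same loop would list Д, Ш, Ц and any other uncovered letters in one pass.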
Seeing as the word "Говно" can be normalized into "R0BH0", I'd imagine "Д" would be listed under "A". This also raises a question about how such a character is handled by AbuseFilter's "norm" and "ccnorm" functions. If a character isn't in the list, as in this case, is it matched as itself, or is an error thrown because it's not in the list? If I wrote a normalized rule for the word I first mentioned, "Пиздец", I would end up with "ΠИ3дEU". Would this rule still match the word? One more question prompted by the above, though more of a side curiosity: why doesn't И or и match N?
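To make the pass-through question concrete, here is a minimal sketch of lookup-table normalization. It is not AbuseFilter's or AntiSpoof's code; it assumes that characters absent from the table are left unchanged rather than raising an error, which is how a plain character substitution behaves. `EQUIV` is a hypothetical table holding only the substitutions implied by "Пиздец" → "ΠИ3дEU", with д deliberately left out, and `normalize`/`rule_matches` are illustrative names.

```python
# Hypothetical table: only the substitutions implied by "Пиздец" -> "ΠИ3дEU".
# д is intentionally missing, mirroring the gap reported in equivset.
EQUIV = {
    "П": "\u03a0",  # GREEK CAPITAL LETTER PI (hypothetical entry)
    "и": "И",
    "з": "3",
    "е": "E",
    "ц": "U",
}


def normalize(text: str) -> str:
    """Substitute mapped characters; anything not in EQUIV passes through unchanged."""
    return "".join(EQUIV.get(ch, ch) for ch in text)


def rule_matches(user_text: str, bad_word: str) -> bool:
    """Normalize both sides with the same table, then do a substring check."""
    return normalize(bad_word) in normalize(user_text)


print(normalize("Пиздец"))  # ΠИ3дEU
print(rule_matches("\u03a0издец", "Пиздец"))  # True: Greek Π already equals the mapped value
print(rule_matches("ПизAец", "Пиздец"))  # False: the unmapped д only ever matches itself
```

Under this assumption, a rule normalized the same way as the input still matches the original word, because д is simply kept on both sides. The cost of the gap is that nothing substituted for д is ever folded onto it.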
The question is whether faux Cyrillic counts as similar; the same goes for R and Я, N and И, and so on... I think the original intention was to match only near-identical letters, but...
So it looks like this request would broaden the scope, since Д would then be considered similar to A, and N similar to И.