Last modified: 2013-12-17 10:16:50 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from bug reports and their history, links might be broken. See T60093, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 58093 - Cyrillic letter Д д (Д д) could match A
Status: UNCONFIRMED
Product: MediaWiki extensions
Classification: Unclassified
Component: AntiSpoof
Version: master
Hardware: PC Windows 7
Importance: Lowest enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:

Reported: 2013-12-06 09:36 UTC by Kanegasi
Modified: 2013-12-17 10:16 UTC (History)
CC List: 2 users

Description Kanegasi 2013-12-06 09:36:18 UTC
The Cyrillic letter Д д is missing from the equivalence set. The AbuseFilter extension uses your equivset for its normalization, and I'm trying to filter some Russian bad words, particularly "Пиздец", but equivset doesn't contain this character.
Comment 1 Andre Klapper 2013-12-06 10:15:12 UTC
Confirming that in
  maintenance/equivset.in
  maintenance/equivset.txt
  equivset.php
there is indeed no entry for Д (hex code point 414) nor for д (hex code point 434); some other Cyrillic letters like Ш or Ц are not there either.

However, the extension description says "It blocks the creation of accounts with mixed-script, confusing and similar usernames" and I am not aware of Д being similar to another letter in another script.

Could you clarify which other letter is similar to Д?
Comment 2 Kanegasi 2013-12-06 10:44:12 UTC
Seeing as the word "Говно" could be normalized into "R0BH0", I'd imagine "Д" would be listed under the "A" list.

This also brings up the question of actually trying to use the character in terms of the "norm" and "ccnorm" functions of AbuseFilter. If a character wasn't in the list, like this case, would it attempt to match it to the actual character or throw an error since it's not in the list? If I were to use a normalized rule for the word I first mentioned, "Пиздец", I would end up with "ΠИ3дEU". Would this rule still match the word?
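
The pass-through question above can be illustrated with a minimal sketch. It assumes that normalization is a plain per-character substitution in which characters missing from the table are left unchanged; nothing in this thread confirms that AbuseFilter or AntiSpoof behaves this way. The table `EQUIV` and the function `normalize` are hypothetical names, and the table holds only the pairs implied by the two examples quoted here, not the real equivset data.

```python
# Hypothetical excerpt of an equivalence table, built only from the
# examples quoted in this thread. It is NOT the real equivset data.
EQUIV = {
    "\u0413": "R",       # Cyrillic Г -> Latin R
    "\u041e": "0",       # Cyrillic О -> digit 0
    "\u0412": "B",       # Cyrillic В -> Latin B
    "\u041d": "H",       # Cyrillic Н -> Latin H
    "\u041f": "\u03a0",  # Cyrillic П -> Greek Π
    "\u0417": "3",       # Cyrillic З -> digit 3
    "\u0415": "E",       # Cyrillic Е -> Latin E
    "\u0426": "U",       # Cyrillic Ц -> Latin U
}


def normalize(text: str) -> str:
    """Uppercase the text, then substitute each character.

    Assumption: a character with no table entry (such as Д or И here)
    passes through unchanged instead of raising an error.
    """
    return "".join(EQUIV.get(ch, ch) for ch in text.upper())


if __name__ == "__main__":
    print(normalize("Говно"))
    print(normalize("Пиздец"))
    # Under the pass-through assumption, a rule and an input that are
    # normalized the same way still compare equal.
    print(normalize("ΠИ3дEU") == normalize("Пиздец"))
```

Under that assumption the unmapped Д simply survives normalization, so a rule written with Д would still match as long as both the rule and the input go through the same function. Unlike the string quoted above, this sketch uppercases everything, so д comes out as Д.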

Yet another question from what I wrote above, but more of an unrelated curiosity. Why doesn't И or и match N?
Comment 3 Marcin Cieślak 2013-12-06 10:58:38 UTC
The question is whether faux Cyrillic is considered similar; the same goes for R and Я, N and И, and so on. I think the original intention was to match almost identical letters, but...
Comment 4 Andre Klapper 2013-12-17 10:16:50 UTC
So it looks like this request would broaden the scope of the extension if Д is considered similar to A, and N similar to И.
