Last modified: 2014-03-08 16:06:42 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T48531, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 46531 - Inconsistent normalization of Æ and æ
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: AntiSpoof
Version: master
Hardware: All All
Importance: Low normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2013-03-25 09:52 UTC by Antoine "hashar" Musso (WMF)
Modified: 2014-03-08 16:06 UTC (History)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---

Description Antoine "hashar" Musso (WMF) 2013-03-25 09:52:44 UTC
The letter 'Æ' is currently normalized inconsistently depending on its case. The upper case form 'Æ' is normalized to 'AE', the lower case form 'æ' to 'A'.
Comment 1 Antoine "hashar" Musso (WMF) 2013-03-25 10:09:18 UTC
The root cause is the equivalence file on our wiki, http://www.mediawiki.org/wiki/AntiSpoof/Equivalence_sets, which is then copied to maintenance/equivset.in.

The file uses the format:
 <hexadecimal codepoint> <character> => [<hexadecimal codepoint>] <character>

The relevant part:

E6 æ => C6 Æ
E6 æ => 41 A
4D4 Ӕ => C6 Æ
4D5 ӕ => C6 Æ

Running maintenance/generateEquivset.php generates a PHP array from the list, using the character as the key. The codepoint E6 has two entries; I guess only the second one is taken into account, since an array can hold only one value per key.
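
The guess above can be illustrated with a minimal sketch in Python. It does not reproduce generateEquivset.php; it only assumes that each line is turned into one key/value pair keyed by the left-hand character, and that a later pair with the same key replaces the earlier one. The helper names parse_line, build_equivset and find_duplicates are hypothetical.

```python
# Illustration only: these helpers are hypothetical and are not part of
# the AntiSpoof extension or of generateEquivset.php.

def parse_line(line):
    """Split '<hex> <char> => [<hex>] <char>' into (source char, target char)."""
    left, right = line.split("=>")
    # The left-hand side always starts with the hexadecimal codepoint.
    source = chr(int(left.split()[0], 16))
    # The right-hand codepoint is optional, so take the last token as the character.
    target = right.split()[-1]
    return source, target

def build_equivset(lines):
    """Build a character-keyed mapping; a repeated key replaces the earlier value."""
    equivset = {}
    for line in lines:
        source, target = parse_line(line)
        equivset[source] = target
    return equivset

def find_duplicates(lines):
    """Return every source character that is mapped more than once."""
    seen = {}
    for line in lines:
        source, target = parse_line(line)
        seen.setdefault(source, []).append(target)
    return {src: targets for src, targets in seen.items() if len(targets) > 1}

# The four lines quoted in this comment.
RELEVANT = [
    "E6 æ => C6 Æ",
    "E6 æ => 41 A",
    "4D4 Ӕ => C6 Æ",
    "4D5 ӕ => C6 Æ",
]

equivset = build_equivset(RELEVANT)
duplicates = find_duplicates(RELEVANT)
print(equivset[chr(0xE6)])  # value kept for the E6 key
print(duplicates)           # keys with conflicting targets
```

Under these assumptions the second E6 line wins, which would match the lower case form ending up as 'A'. A check along the lines of find_duplicates could be run against equivset.in to spot other conflicting entries.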
Comment 2 Antoine "hashar" Musso (WMF) 2013-03-25 10:11:06 UTC
I have removed the always-failing test in https://gerrit.wikimedia.org/r/#/c/55553/
