Last modified: 2014-10-08 13:03:29 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T72899, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 70899 - Search box needs some normalization for Arabic Family languages
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Search
Version: unspecified
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: i18n
Depends on:
Blocks:
Reported: 2014-09-16 19:17 UTC by reza1615
Modified: 2014-10-08 13:03 UTC
CC List: 6 users


Description reza1615 2014-09-16 19:17:36 UTC
Some languages, such as Arabic, Persian, Urdu and Kurdish, share common letters whose glyphs look similar but have different Unicode code points. For example:
for ک (Kaf)
ك  Arabic U+0643
ڪ Urdu U+06AA
ﻙ Pushtu U+FED9
ﻚ Uyghur U+FEDA
ک Persian U+06A9
for ی (ya)
ی Persian U+06CC
ي  Arabic U+064A
ى Urdu U+0649
ۍ Pushtu U+06CD
ې Uyghur U+06D0
for ه (heh)
ہ Pushtu U+06C1
ە Kurdish U+06D5
ه Persian U+0647
These characters have different Unicode code points and are typed with different keyboard layouts.
Many users do not have a Persian or Urdu keyboard layout available by default in their OS (for example Windows XP, older versions of Android, iOS). So when they search for an article in the Wikipedia search box they cannot find it, even though it exists under a title written with the local characters.

For example, if you search fa.wikipedia for the article ويليام شكسپير (typed with the Arabic characters ي and ك), you cannot find it, because the Farsi article is titled ویلیام شکسپیر (written with the Persian characters ی and ک).
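
The mismatch in this example can be checked directly. The following is only a small illustrative sketch in Python, not part of MediaWiki or any search backend; the helper name differing_code_points is made up for this example. It compares the typed query with the article title position by position and reports where the code points differ.

```python
# Illustrative sketch only: shows why the two visually similar strings
# do not match. The helper name is hypothetical.

# Query typed on an Arabic keyboard (uses U+064A and U+0643).
typed_query = "\u0648\u064A\u0644\u064A\u0627\u0645 \u0634\u0643\u0633\u067E\u064A\u0631"
# Title as stored on fa.wikipedia (uses U+06CC and U+06A9).
article_title = "\u0648\u06CC\u0644\u06CC\u0627\u0645 \u0634\u06A9\u0633\u067E\u06CC\u0631"


def differing_code_points(a: str, b: str) -> list[tuple[int, str, str]]:
    """Return (index, code point in a, code point in b) where a and b differ.

    Only the overlapping prefix is compared; both sample strings here
    have the same length.
    """
    return [
        (i, f"U+{ord(x):04X}", f"U+{ord(y):04X}")
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


if __name__ == "__main__":
    print(typed_query == article_title)
    for index, typed, stored in differing_code_points(typed_query, article_title):
        print(index, typed, stored)
```

An exact string comparison treats each of these positions as a different letter, which is why the query finds nothing unless both sides are folded to one form first.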

For Farsi, please make it possible for the search tool to treat the following characters as equivalent:
 U+064A or U+0649 or U+06CD or U+06D0 or U+06CC > U+06CC
 U+0643 or U+06AA or U+FED9 or U+FEDA > U+06A9
 U+06C1 or U+06D5 > U+0647
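
The mapping requested above can be sketched as a simple character-folding table. This is a minimal Python illustration under stated assumptions, not how MediaWiki, CirrusSearch or LuceneSearch actually implement normalization; the names FA_FOLD_RULES, FA_FOLD_TABLE and fold_for_fa_search are hypothetical. It assumes the same folding would be applied both to the indexed titles and to the user's query.

```python
# Illustrative sketch only: folds the variant letters listed in this
# report to the Persian forms. All names here are hypothetical.

# Target code point -> variants that should be treated as the target.
FA_FOLD_RULES = {
    0x06CC: (0x064A, 0x0649, 0x06CD, 0x06D0),  # ya variants -> U+06CC
    0x06A9: (0x0643, 0x06AA, 0xFED9, 0xFEDA),  # kaf variants -> U+06A9
    0x0647: (0x06C1, 0x06D5),                  # heh variants -> U+0647
}

# Flatten into the {variant: target} table that str.translate expects.
FA_FOLD_TABLE = {
    variant: target
    for target, variants in FA_FOLD_RULES.items()
    for variant in variants
}


def fold_for_fa_search(text: str) -> str:
    """Replace each listed variant with its Persian target character."""
    return text.translate(FA_FOLD_TABLE)


if __name__ == "__main__":
    typed_query = "\u0648\u064A\u0644\u064A\u0627\u0645 \u0634\u0643\u0633\u067E\u064A\u0631"
    article_title = "\u0648\u06CC\u0644\u06CC\u0627\u0645 \u0634\u06A9\u0633\u067E\u06CC\u0631"
    print(fold_for_fa_search(typed_query) == fold_for_fa_search(article_title))
```

Folding both the query and the title keeps the comparison symmetric, so text that already uses the Persian characters is left unchanged. Other wikis, such as ckb.wikipedia, would presumably need a different table.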
Comment 1 Calak 2014-09-16 19:27:29 UTC
Yes, we have the same problem on ckb.wikipedia. This would be useful.
Comment 2 reza1615 2014-09-16 19:41:14 UTC
Maybe for fa.wikipedia or ckb.wikipedia we need some normalization like the one in

https://github.com/wikimedia/mediawiki-core/blob/master/languages/classes/LanguageAr.php
Comment 4 Andre Klapper 2014-10-08 11:47:46 UTC
Is this request about CirrusSearch or about LuceneSearch (deprecated)?
Comment 5 reza1615 2014-10-08 13:03:29 UTC
(In reply to Andre Klapper from comment #4)
> Is this request about CirrusSearch or about LuceneSearch (deprecated)?
We need normalization for the search box that is placed at the top of pages.
