Last modified: 2014-10-08 13:03:29 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T72899, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 70899 - Search box needs some normalization for Arabic Family languages


Summary:	Search box needs some normalization for Arabic Family languages

Status:	NEW

Product:	MediaWiki
Classification:	Unclassified
Component:	Search (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Normal enhancement (vote)
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:	i18n

Depends on:
Blocks:
	Show dependency tree / graph

Reported:	2014-09-16 19:17 UTC by reza1615
Modified:	2014-10-08 13:03 UTC (History)
CC List:	6 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Attachments

Description reza1615 2014-09-16 19:17:36 UTC

Some languages, such as Arabic, Persian, Urdu, Kurdish, ..., use common characters that have similar glyphs but different Unicode code points. For example:
for ک (Kaf)
ك  Arabic U+0643
ڪ Urdu U+06AA
ﻙ Pushtu U+FED9
ﻚ Uyghur U+FEDA
ک Persian U+06A9
for ی (ya)
ی Persian U+06CC
ي  Arabic U+064A
ى Urdu U+0649
ۍ Pushtu U+06CD
ې Uyghur U+06D0
for ه (heh)
ہ Pushtu U+06C1
ە Kurdish U+06D5
ه Persian U+0647
These characters have different Unicode code points and are typed from different keyboard layouts.
Many users do not have a Persian or Urdu keyboard available by default in their OS (for example Windows XP, older Android versions, iOS, ...), so when they search for an article in the Wikipedia search box they cannot find it, even though the article exists under a title written with the local characters.

For example, if you search fa.wikipedia for the article ويليام شكسپير (typed with the Arabic characters ي , ك), you cannot find it, because the article title in Farsi is ویلیام شکسپیر (written with the Persian characters ی , ک).

For Farsi, please make it possible for the search tool to treat the following characters as equivalent:
 U+064A or U+0649 or U+06CD or U+06D0 or U+06CC > U+06CC
 U+0643 or U+06AA or U+FED9 or U+FEDA > U+06A9
 U+06C1 or U+06D5 > U+0647
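
The mapping requested above could be sketched roughly as follows. This is only an illustration in Python, not MediaWiki or CirrusSearch code: the names FA_NORMALIZATION and normalize_fa_query are hypothetical, and the table contains only the code points listed in this report.

```python
# Hypothetical sketch: fold the variant code points listed in this report
# into the Persian forms before a search query is compared with titles.
FA_NORMALIZATION = {
    # yeh variants -> U+06CC
    0x064A: 0x06CC,
    0x0649: 0x06CC,
    0x06CD: 0x06CC,
    0x06D0: 0x06CC,
    # kaf variants -> U+06A9
    0x0643: 0x06A9,
    0x06AA: 0x06A9,
    0xFED9: 0x06A9,
    0xFEDA: 0x06A9,
    # heh variants -> U+0647
    0x06C1: 0x0647,
    0x06D5: 0x0647,
}


def normalize_fa_query(query: str) -> str:
    """Return the query with each listed variant replaced by its Persian form."""
    # str.translate accepts a dict mapping code points to code points.
    return query.translate(FA_NORMALIZATION)


if __name__ == "__main__":
    # The query typed with Arabic yeh (U+064A) and kaf (U+0643).
    typed = "\u0648\u064A\u0644\u064A\u0627\u0645 \u0634\u0643\u0633\u067E\u064A\u0631"
    print(normalize_fa_query(typed))
```

With such a table, the query from the example above (typed with U+064A and U+0643) would be folded to the same code points as the Farsi article title. A real search backend would presumably apply the same folding to the indexed titles as well, so that both sides are compared in normalized form.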

Comment 1 Calak 2014-09-16 19:27:29 UTC

Yes, we have the same problem on ckb Wikipedia. This would be useful.

Comment 2 reza1615 2014-09-16 19:41:14 UTC

Maybe for fa.wikipedia or ckb.wikipedia we need some normalization like

https://github.com/wikimedia/mediawiki-core/blob/master/languages/classes/LanguageAr.php

Comment 3 reza1615 2014-09-16 19:47:59 UTC

and https://github.com/wikimedia/mediawiki-core/blob/master/maintenance/language/generateNormalizerDataAr.php

Comment 4 Andre Klapper 2014-10-08 11:47:46 UTC

Is this request about CirrusSearch or about LuceneSearch (deprecated)?

Comment 5 reza1615 2014-10-08 13:03:29 UTC

(In reply to Andre Klapper from comment #4)
> Is this request about CirrusSearch or about LuceneSearch (deprecated)?
We need normalization for the search box that is placed at the top of pages.
