Last modified: 2013-07-09 01:29:51 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T52936, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 50936 - Block using U+200C for pagenames in Odia
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Page editing
Version: 1.22.0
Hardware: All
OS: All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://or.wikipedia.org
Depends on:
Blocks:
Reported: 2013-07-08 12:40 UTC by ansuman
Modified: 2013-07-09 01:29 UTC (History)
CC List: 4 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Screenshot of exact titled page in Odia - One (56.93 KB, image/png)
2013-07-08 12:40 UTC, ansuman
Screenshot of exact titled page in Odia - Two (49.59 KB, image/png)
2013-07-08 12:43 UTC, ansuman

Description ansuman 2013-07-08 12:40:35 UTC
Created attachment 12781
Screenshot of exact titled page in Odia - One

Hello, 

On Odia Wikipedia, we have encountered pages with exactly the same name (the same title).

This makes it difficult to distinguish pages on Odia Wikipedia. Is there any solution?

Link to Wikipedia pages, below:

http://or.wikipedia.org/s/5om and http://or.wikipedia.org/s/gj1


Also, the two articles http://or.wikipedia.org/wiki/ବାଲେଶ୍ୱର_ଜିଲ୍ଲା and http://or.wikipedia.org/w/index.php?title=ବାଲେଶ୍ଵର_ଜିଲ୍ଲା&redirect=no look exactly the same in script, and one redirects to the other!

Does this have anything to do with fonts? Do we need to redefine the Unicode font? :D

Thanks.
Comment 1 ansuman 2013-07-08 12:43:37 UTC
Created attachment 12782
Screenshot of exact titled page in Odia - Two
Comment 2 Santhosh Thottingal 2013-07-08 13:09:40 UTC
http://or.wikipedia.org/s/5om is for ସମୟ and has the Unicode code points:
U+0B38 U+0B2E U+0B5F 

http://or.wikipedia.org/s/gj1 is ସମ‌ୟ
U+0B38 U+0B2E U+200C U+0B5F

As you can see, both titles look the same but differ in their data by an extra U+200C.

U+200C is ZERO WIDTH NON-JOINER, an invisible character that has different functionality in different scripts.

I am not sure whether U+200C has a valid use in Odia (or). If it is unwanted, you need to treat it as a spelling mistake.
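The difference described above can be checked directly. A minimal Python sketch, assuming the two titles are written with explicit escape sequences (the variable and function names are illustrative), prints each title's code points and shows that Unicode normalization does not make them equal, because U+200C has no decomposition and survives both NFC and NFKC:

```python
import unicodedata

# Titles from the two short links, written with escapes so the invisible
# U+200C is visible in the source code.
title_a = "\u0B38\u0B2E\u0B5F"        # /s/5om
title_b = "\u0B38\u0B2E\u200C\u0B5F"  # /s/gj1, with ZERO WIDTH NON-JOINER


def code_points(s: str) -> str:
    """Return the string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


print(code_points(title_a))        # U+0B38 U+0B2E U+0B5F
print(code_points(title_b))        # U+0B38 U+0B2E U+200C U+0B5F
print(unicodedata.name("\u200C"))  # ZERO WIDTH NON-JOINER

# NFC and NFKC both keep U+200C, so the titles remain distinct strings.
for form in ("NFC", "NFKC"):
    same = unicodedata.normalize(form, title_a) == unicodedata.normalize(form, title_b)
    print(form, same)  # prints "NFC False", then "NFKC False"
```

This is why normalizing titles alone would not merge the two pages.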
Comment 3 Andre Klapper 2013-07-08 13:20:34 UTC
This request would probably amount to blocking U+200C from being used in page names.
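One way such a block could work is a check that runs before a page is created. The sketch below is hypothetical: the function name, the blocked set, and the error message are illustrative and are not MediaWiki's actual title validation (which is written in PHP). It simply rejects any proposed title that contains U+200C:

```python
# Hypothetical set of code points to refuse in new page titles.
BLOCKED_IN_TITLES = {"\u200C"}  # ZERO WIDTH NON-JOINER


def check_title(title: str) -> None:
    """Raise ValueError if the title contains a blocked code point."""
    for index, ch in enumerate(title):
        if ch in BLOCKED_IN_TITLES:
            raise ValueError(
                f"Title contains U+{ord(ch):04X} at position {index}; "
                "it is invisible and can make the title look like another one."
            )


check_title("\u0B38\u0B2E\u0B5F")  # accepted silently

try:
    check_title("\u0B38\u0B2E\u200C\u0B5F")
except ValueError as err:
    print(err)  # Title contains U+200C at position 2; ...
```

A blanket rule like this also rejects any spelling that genuinely needs U+200C.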
Comment 4 ansuman 2013-07-08 16:51:06 UTC
(In reply to comment #2)
> http://or.wikipedia.org/s/5om is for ସମୟ and has unicode code points:
> U+0B38 U+0B2E U+0B5F 
> 
> http://or.wikipedia.org/s/gj1 is ସମ‌ୟ
> U+0B38 U+0B2E U+200C U+0B5F
> 
> As you can see both titles look same but differs in data with an extra U+200C
> 
> U+200C is ZERO WIDTH NON-JOINER an invisible character having different
> functionality in different scripts.

Thanks.

> 
> I am not sure whether 200C has valid usage in or. If this is unwanted, you
> need to consider it as a spelling mistake.

Then how can we know whether or not U+200C has a valid use? I couldn't find U+200C in the Odia Unicode chart. We can ignore it if this is rare; so far there are 3 or 4 cases. We could wait and see whether we find more such cases.
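U+200C does not appear in the Odia chart because it is not an Odia character: it is encoded in the General Punctuation block (U+2000 to U+206F) and is shared by many scripts, while the Oriya block covers U+0B00 to U+0B7F. A small Python sketch (the helper name is illustrative) that reports which of these ranges each code point of a title falls in, together with its Unicode general category:

```python
import unicodedata

# Block ranges from the Unicode standard (end values are exclusive here).
ORIYA_BLOCK = range(0x0B00, 0x0B80)
GENERAL_PUNCTUATION = range(0x2000, 0x2070)


def describe(title: str) -> list[tuple[str, str, str]]:
    """Return (code point, block, general category) for each character."""
    rows = []
    for ch in title:
        cp = ord(ch)
        if cp in ORIYA_BLOCK:
            block = "Oriya"
        elif cp in GENERAL_PUNCTUATION:
            block = "General Punctuation"
        else:
            block = "other"
        rows.append((f"U+{cp:04X}", block, unicodedata.category(ch)))
    return rows


for row in describe("\u0B38\u0B2E\u200C\u0B5F"):
    print(*row)
# U+0B38 Oriya Lo
# U+0B2E Oriya Lo
# U+200C General Punctuation Cf
# U+0B5F Oriya Lo
```

The category "Cf" (format character) marks U+200C as an invisible control of rendering rather than a letter.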
Comment 5 Subhashish Panigrahi 2013-07-09 01:29:51 UTC
(In reply to comment #4)
> (In reply to comment #2)
> > http://or.wikipedia.org/s/5om is for ସମୟ and has unicode code points:
> > U+0B38 U+0B2E U+0B5F 
> > 
> > http://or.wikipedia.org/s/gj1 is ସମ‌ୟ
> > U+0B38 U+0B2E U+200C U+0B5F
> > 
> > As you can see both titles look same but differs in data with an extra U+200C
> > 
> > U+200C is ZERO WIDTH NON-JOINER an invisible character having different
> > functionality in different scripts.
> 
> Thanks.
> 
> > 
> > I am not sure whether 200C has valid usage in or. If this is unwanted, you
> > need to consider it as a spelling mistake.
> 
> Then how can we know whether or not 200C has valid usage? I couldn't find any
> 200C in Odia Unicode chart. We can ignore if this is rare, so far 3/4 cases.
> We could wait and see if we find more such cases.
I guess U+200C would be required. Using the typing tool Lekhani, typing s+m+Y results in ସମ୍ୟ, whereas typing s+m+_ (Shift + "-") + Y results in ସମୟ. In the latter case, Shift + "-" ("_") produces U+200C. Is there any other way to avoid this problem instead of blocking it? I feel it would be needed for some spellings.
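One alternative to an outright block is to keep U+200C allowed but flag titles that become identical once it is removed, so that editors can review them and merge or redirect real duplicates. A minimal Python sketch, assuming a plain list of existing titles (the function names are hypothetical, and this is not an existing MediaWiki feature):

```python
from collections import defaultdict

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER


def lookalike_key(title: str) -> str:
    """Map titles that differ only by U+200C to the same key."""
    return title.replace(ZWNJ, "")


def find_lookalikes(titles: list[str]) -> list[list[str]]:
    """Group titles that differ only by U+200C (and may render alike)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for title in titles:
        groups[lookalike_key(title)].append(title)
    return [group for group in groups.values() if len(group) > 1]


existing = [
    "\u0B38\u0B2E\u0B5F",        # /s/5om
    "\u0B38\u0B2E\u200C\u0B5F",  # /s/gj1
    "\u0B38\u0B2E\u0B4D\u0B5F",  # a different spelling with a virama (U+0B4D), kept separate
]

for group in find_lookalikes(existing):
    print([" ".join(f"U+{ord(c):04X}" for c in t) for t in group])
```

This leaves U+200C available for spellings that need it, at the cost of periodically reviewing the flagged groups.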
