Last modified: 2013-07-09 01:29:51 UTC
Created attachment 12781 [details] Screenshot of exact titled page in Odia - One

Hello,

On Odia Wikipedia we have encountered pages with exactly the same name (same title). This makes it difficult to tell the pages apart. Is there any solution?

Links to the Wikipedia pages: http://or.wikipedia.org/s/5om and http://or.wikipedia.org/s/gj1

There are also two articles, http://or.wikipedia.org/wiki/ବାଲେଶ୍ୱର_ଜିଲ୍ଲା and http://or.wikipedia.org/w/index.php?title=ବାଲେଶ୍ଵର_ଜିଲ୍ଲା&redirect=no , that appear to be written in exactly the same script, with one redirecting to the other.

Does this have anything to do with fonts? Do we need to redefine the Unicode font? :D

Thanks.
Created attachment 12782 [details] Screenshot of exact titled page in Odia - Two
http://or.wikipedia.org/s/5om is for ସମୟ and has the Unicode code points:
U+0B38 U+0B2E U+0B5F

http://or.wikipedia.org/s/gj1 is for ସମୟ and has the code points:
U+0B38 U+0B2E U+200C U+0B5F

As you can see, both titles look the same but differ in their data by an extra U+200C.

U+200C is ZERO WIDTH NON-JOINER, an invisible character that functions differently in different scripts.

I am not sure whether U+200C has a valid use in Odia (or). If it is unwanted here, you need to treat it as a spelling mistake.
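For anyone who wants to repeat the comparison, here is a minimal Python sketch that lists the code points of two titles. The two strings are built from the code points quoted in this comment; the helper name `describe` is made up for illustration and is not part of MediaWiki.

```python
import unicodedata

def describe(title):
    """Return (code point, Unicode name) pairs for each character in title."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in title
    ]

# Titles reconstructed from the code points listed in this comment.
plain = "\u0b38\u0b2e\u0b5f"             # page behind /s/5om
with_zwnj = "\u0b38\u0b2e\u200c\u0b5f"   # page behind /s/gj1

for label, title in (("5om", plain), ("gj1", with_zwnj)):
    print(label, describe(title))

# The strings render alike but are different data.
print("identical:", plain == with_zwnj)
```

The two titles compare as unequal, which is why the wiki stores them as separate pages even though they look the same on screen.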
This request will probably turn into a request to block U+200C from being used in page names in some way.
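As a rough illustration of what such a check could look like outside the wiki software, the sketch below reports titles that differ only by U+200C. It assumes a plain list of title strings as input; `has_zwnj` and `find_lookalikes` are hypothetical names, not MediaWiki functions, and the sketch only reports candidates rather than blocking anything, since it is not yet settled whether the character has valid uses in Odia.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def has_zwnj(title):
    """True if the title contains U+200C anywhere."""
    return ZWNJ in title

def find_lookalikes(titles):
    """Group titles that become identical once U+200C is removed.

    Returns a dict mapping the stripped title to the list of original
    titles, keeping only groups with more than one member.
    """
    groups = {}
    for title in titles:
        groups.setdefault(title.replace(ZWNJ, ""), []).append(title)
    return {key: members for key, members in groups.items() if len(members) > 1}

# Example input: the two titles from this report plus an unrelated one.
titles = ["\u0b38\u0b2e\u0b5f", "\u0b38\u0b2e\u200c\u0b5f", "Example"]
for stripped, members in find_lookalikes(titles).items():
    print(repr(stripped), "<-", [m.encode("unicode_escape").decode() for m in members])
```

Run against a dump of page titles, a report like this would show how many such pairs exist before any decision about restricting the character is made.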
(In reply to comment #2)
> http://or.wikipedia.org/s/5om is for ସମୟ and has unicode code points:
> U+0B38 U+0B2E U+0B5F
>
> http://or.wikipedia.org/s/gj1 is ସମୟ
> U+0B38 U+0B2E U+200C U+0B5F
>
> As you can see both titles look same but differs in data with an extra U+200C
>
> U+200C is ZERO WIDTH NON-JOINER an invisible character having different
> functionality in different scripts.

Thanks.

> I am not sure whether 200C has valid usage in or. If this is unwanted, you
> need to consider it as a spelling mistake.

Then how can we know whether or not U+200C has a valid use? I couldn't find U+200C in the Odia Unicode chart. We can ignore this if it is rare; so far there are 3 or 4 cases. We could wait and see whether we find more such cases.
(In reply to comment #4)
> (In reply to comment #2)
> > http://or.wikipedia.org/s/5om is for ସମୟ and has unicode code points:
> > U+0B38 U+0B2E U+0B5F
> >
> > http://or.wikipedia.org/s/gj1 is ସମୟ
> > U+0B38 U+0B2E U+200C U+0B5F
> >
> > As you can see both titles look same but differs in data with an extra U+200C
> >
> > U+200C is ZERO WIDTH NON-JOINER an invisible character having different
> > functionality in different scripts.
>
> Thanks.
>
> > I am not sure whether 200C has valid usage in or. If this is unwanted, you
> > need to consider it as a spelling mistake.
>
> Then how can we know whether or not 200C has valid usage? I couldn't find any
> 200C in Odia Unicode chart. We can ignore if this is rare, so far 3/4 cases.
> We could wait and see if we find more such cases.

I guess U+200C would be required. Using the typing tool Lekhani, typing s+m+Y produces ସମ୍ୟ, whereas typing s+m+_ (Shift + "-") + Y produces ସମୟ. In the latter case, Shift + "-" ("_") produces U+200C.

Is there any way to avoid this problem other than blocking the character? I feel it would be needed for some spellings.