Last modified: 2011-09-09 18:32:34 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the bug reports and their history, links might be broken. See T31371, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 29371 - jQuery.suggestions.js highlighting of UTF-8 characters like "äüöß" does not work if such a non-ASCII character is the first character
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: JavaScript
Version: 1.20.x
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: T. Gries
Depends on: 29368
Blocks:
Reported: 2011-06-13 12:02 UTC by T. Gries
Modified: 2011-09-09 18:32 UTC
CC List: 3 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Screenshot showing highlight vs non-highlight (9.04 KB, image/png)
2011-06-13 17:16 UTC, Brion Vibber

Description T. Gries 2011-06-13 12:02:18 UTC
If you have a page called "Österreich", the "Ö", or more generally "Öst"..., is not highlighted in the search suggestion interface.

1. When the non-ASCII character is not the first character, highlighting works.
2. Only results starting with a plain ASCII (single-byte in UTF-8) character are highlighted correctly.
3. If the non-ASCII characters are not in between other highlighted characters, they are not highlighted.

Page name - observation

Österreich - no highlight when entering ö, s, t, ...
Niederösterreich - highlights ok

See also Extension:Vector, which extends/uses jquery.suggestions.js.
Comment 1 Brion Vibber 2011-06-13 17:16:57 UTC
Created attachment 8645
Screenshot showing highlight vs non-highlight

Screenshot showing the problem -- the matching chars should be highlighted in the entries in the drop-down, but where we start with a non-ascii char it doesn't match up.
Comment 2 Brion Vibber 2011-06-13 17:25:25 UTC
Looks like the actual highlighting is passed from jquery.suggestions through jquery.autoEllipsis to jquery.highlightText, where a regex is used:

	// TODO - need to be smarter about the character matching here. 
	// non latin characters can make regex think a new word has begun. 
	// look for an occurence of our pattern and store the starting position
	var pos = node.data.search( new RegExp( "\\b" + $.escapeRE( pat ), "i" ) );

Looks like the \b (word boundary) gets confused at the 'Ö' despite it being a legit word character. WTF? :(
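A minimal reproduction of the search quoted above, in plain JavaScript and assuming `$.escapeRE` simply escapes regex metacharacters. The `escapeRE` and `findAtWordStart` names below are hypothetical stand-ins, not the jQuery or MediaWiki code. It shows the reported pattern: a leading "Ö" never matches, an ASCII first letter does, and a mid-title "ö" is wrongly treated as the start of a word.

```javascript
// Hypothetical stand-in for $.escapeRE (assumed: escape regex metacharacters).
function escapeRE(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Same search as the excerpt: case-insensitive match of pat right after a \b.
function findAtWordStart(text, pat) {
  return text.search(new RegExp("\\b" + escapeRE(pat), "i"));
}

// Start-of-string and "Ö" are both non-word for \b, so there is no boundary here.
console.log(findAtWordStart("Österreich", "Öst"));          // -1
// An ASCII first letter gives \b a boundary to match.
console.log(findAtWordStart("Niederösterreich", "Nieder")); // 0
// "r" is a word character and "ö" is not, so \b sees a new "word" mid-title.
console.log(findAtWordStart("Niederösterreich", "öst"));    // 6
```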
Comment 3 Bawolff (Brian Wolff) 2011-06-13 19:44:59 UTC
My understanding is that in JavaScript regexes, \b only considers [a-zA-Z0-9_] (aka what \w matches) to be word characters, so Ö is technically not a word character, and as far as \b is concerned the word only starts at the "s" after it. See http://bclary.com/2004/11/07/#a-15.10.2.6
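A sketch confirming the ASCII-only `\w`, plus one possible Unicode-aware substitute for the leading `\b`. This is an assumption for illustration, not necessarily what r90092 did: it replaces `\b` with a negative lookbehind over Unicode letters, digits and underscore, which needs an ES2018 engine (lookbehind and `\p{...}` with the `u` flag) and so would not have run in 2011 browsers. `escapeRegExp` and `findAtWordStartUnicode` are hypothetical names.

```javascript
// \w, and hence \b, only covers [A-Za-z0-9_]; the u flag alone does not change that.
console.log(/\w/.test("Ö"));  // false
console.log(/\w/u.test("Ö")); // false

// Escapes only regex syntax characters, so the result is also valid with the u flag.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical Unicode-aware "word start": pat must not follow a letter, digit or "_".
function findAtWordStartUnicode(text, pat) {
  const re = new RegExp("(?<![\\p{L}\\p{N}_])" + escapeRegExp(pat), "iu");
  return text.search(re);
}

console.log(findAtWordStartUnicode("Österreich", "öst"));       // 0
console.log(findAtWordStartUnicode("Niederösterreich", "öst")); // -1
```

With the `iu` flags, "ö" and "Ö" compare equal through Unicode case folding, so the typed prefix matches regardless of case.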
Comment 4 T. Gries 2011-06-13 21:51:09 UTC
(In reply to comment #3)
> My understanding is that in javascript regex's, \b only considers [a-zA-Z0-9_]
> (aka what \w matches) to be word characters, so Ö is technically not part of a
> word character, thus is really is a word boundary. See
> http://bclary.com/2004/11/07/#a-15.10.2.6

PHP, but not JavaScript, has multibyte-aware mb_ functions. I am pretty sure you know this, don't you?
Comment 5 Brion Vibber 2011-06-13 21:54:07 UTC
JavaScript strings are UTF-16, and hence inherently aware of Unicode.

In general, though, the substring matches here are kinda funky as well, as the suggestion engine might or might not actually be doing simple substring matches.
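To illustrate the UTF-16 point: JavaScript indexes strings by UTF-16 code units, not UTF-8 bytes, so "Ö" is a single character to string and regex code, and the bug lies in how `\w`/`\b` classify it rather than in byte handling. The one caveat worth knowing is that characters outside the Basic Multilingual Plane take two code units. A small sketch:

```javascript
const word = "Österreich";

// "Ö" (U+00D6) is one UTF-16 code unit.
console.log(word.length);                     // 10
console.log(word.charCodeAt(0).toString(16)); // "d6"

// Encoded as UTF-8, "Ö" needs two bytes, so the byte count is one higher.
console.log(new TextEncoder().encode(word).length); // 11

// Outside the BMP, a character is a surrogate pair of two code units.
const clef = "\u{1D11E}";
console.log(clef.length);      // 2
console.log([...clef].length); // 1 (string iteration yields code points)
```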
Comment 6 T. Gries 2011-06-13 21:57:17 UTC
(In reply to comment #3)
> My understanding is that in javascript regex's, \b only considers [a-zA-Z0-9_]
> (aka what \w matches) to be word characters, so Ö is technically not part of a
> word character, thus is really is a word boundary. See
> http://bclary.com/2004/11/07/#a-15.10.2.6

I also think that it has to do with the \w definition.
Comment 7 T. Gries 2011-06-14 21:43:06 UTC
fixed in r90092
Comment 9 Roan Kattouw 2011-09-09 11:44:06 UTC
Fix deployed. Typing "Öster" into the enwiki search box now works as expected.
Comment 10 T. Gries 2011-09-09 12:06:22 UTC
Yes, in de.wikipedia, but not yet in en.wikipedia (why?)

dewiki 1.17wmf1 (Version 96617)
enwiki 1.17wmf1 (r96617)
Comment 11 Roan Kattouw 2011-09-09 12:08:32 UTC
WFM per comment 9. Have you tried clearing your browser cache?
Comment 12 T. Gries 2011-09-09 12:19:20 UTC
iuiuiuuuiuiu, highlighting works (=> closing this bug now)


but search suggestions (for "Ös" in enwiki) are strange:

Ös

==>

Oslo
Osaka

...
...
Östereich (correctly highlighted)
...


I guess this has to do with TitleKey and transliteration? Where can I find some documentation about this?
Comment 13 Roan Kattouw 2011-09-09 18:32:34 UTC
(In reply to comment #12)
> I guess, this has to do with TitleKey and Transliteration ? Where can I find
> some documentataion about this ?
I would guess this is TitleKey's doing, yeah. Not sure where you'd find docs other than in TitleKey.
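One way the behaviour in comment 12 could arise, offered only as a hypothetical illustration and not as TitleKey's actual implementation: if the backend compares titles on a key with case and diacritics folded away, "Ös" reduces to "os" and matches "Oslo" and "Osaka" as well as "Österreich". `foldKey` and `suggest` are invented names for this sketch.

```javascript
// Hypothetical folding: NOT TitleKey's code, just one plausible scheme.
function foldKey(title) {
  return title
    .normalize("NFD")       // split "Ö" into "O" + U+0308 COMBINING DIAERESIS
    .replace(/\p{M}/gu, "") // drop combining marks
    .toLowerCase();
}

// Hypothetical prefix search on the folded keys.
function suggest(prefix, titles) {
  const key = foldKey(prefix);
  return titles.filter((t) => foldKey(t).startsWith(key));
}

console.log(suggest("Ös", ["Oslo", "Osaka", "Österreich", "Berlin"]));
// [ 'Oslo', 'Osaka', 'Österreich' ]
```

Under such a scheme the ranking, not the matching, decides why "Oslo" and "Osaka" appear before "Österreich".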
