Last modified: 2014-03-05 16:42:54 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T59832, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 57832 - Fails to find same word with an apostrophe before (French usage)
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2013-12-02 11:59 UTC by Akeron
Modified: 2014-03-05 16:42 UTC (History)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Akeron 2013-12-02 11:59:44 UTC
When I search for a word, the search engine fails to find the same word when it is preceded by an apostrophe, so a search for "apostrophe" doesn't find occurrences of "L'apostrophe".
In French the apostrophe is not part of the word; it marks an elision, so "L'apostrophe" is a contraction of "La apostrophe".

For example:
https://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=default&search=Arc-en-Ciel&fulltext=Search
A search for "Arc-en-Ciel" doesn't directly match any "L'Arc-en-Ciel" (with L').

Compare with a search "L'Arc-en-Ciel"
https://en.wikipedia.org/w/index.php?search=L%27Arc-en-Ciel&title=Special%3ASearch&fulltext=1

In French, as in English, the apostrophe should not be indexed as part of the word.

Note: it's the same bug as https://bugzilla.wikimedia.org/show_bug.cgi?id=9598 (old)
See also a different apostrophe usage in Ukrainian https://bugzilla.wikimedia.org/show_bug.cgi?id=21002
Comment 1 Dan Garry 2014-02-11 01:52:00 UTC
This problem still exists in CirrusSearch. Migrating bug to correct queue.
Comment 2 Nik Everett 2014-03-05 16:42:54 UTC
The problem here is that the language rules are customized for the wiki's language.  Elision is handled in French but not English.


I wonder how much harm it would do to just add it to English (and maybe other languages) as well.  Here are the term prefixes that would be removed:
l'
m'
t'
qu'
n'
s'
j'
d'
c'
jusqu'
quoiqu'
lorsqu'
puisqu'

We wouldn't add it to the plain analyzer, so if you search for "l'avion" then "l'avion" will be worth more than "avion".
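
To make the proposal concrete, here is a rough sketch in Python of what stripping those prefixes would do to indexed terms, next to a "plain" analysis that leaves them alone. This is not CirrusSearch's or Elasticsearch's actual code: the names `strip_elision`, `analyze_text` and `analyze_plain` are made up for illustration, the whitespace tokenizer is only a stand-in for a real one, and accepting the typographic apostrophe as well as the ASCII one is an assumption.

```python
import re

# Prefixes from the list above that the elision step would remove.
ELISION_PREFIXES = (
    "l", "m", "t", "qu", "n", "s", "j", "d", "c",
    "jusqu", "quoiqu", "lorsqu", "puisqu",
)

# Assumption: treat the ASCII and the typographic apostrophe alike.
APOSTROPHES = "'\u2019"

# Match one listed prefix plus an apostrophe at the start of a token,
# but only when at least one character follows it.
_ELISION_RE = re.compile(
    r"^(?:%s)[%s](?=.)"
    % ("|".join(sorted(ELISION_PREFIXES, key=len, reverse=True)), APOSTROPHES),
    re.IGNORECASE,
)


def tokenize(text):
    """Lowercase and split on whitespace (stand-in for a real tokenizer)."""
    return text.lower().split()


def strip_elision(token):
    """Drop a leading elided article such as l' or qu' from one token."""
    return _ELISION_RE.sub("", token)


def analyze_plain(text):
    """Hypothetical 'plain' analysis: tokens are kept exactly as typed."""
    return tokenize(text)


def analyze_text(text):
    """Hypothetical language analysis with the elision step added."""
    return [strip_elision(token) for token in tokenize(text)]


if __name__ == "__main__":
    for phrase in ("L'Arc-en-Ciel", "Arc-en-Ciel", "l'avion", "don't"):
        print(phrase, analyze_text(phrase), analyze_plain(phrase))
```

With the elision step, "L'Arc-en-Ciel" and "Arc-en-Ciel" end up as the same term, so either search can find both. The plain analysis still keeps "l'avion" distinct from "avion", which is what would let an exact match count for more.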
