Last modified: 2014-02-09 11:55:15 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T46350, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 44350 - No search results at all when searching in Javanese script on jv.wikipedia
Status: RESOLVED WONTFIX
Product: Wikimedia
Classification: Unclassified
Component: lucene-search-2
Version: wmf-deployment
Hardware: All All
Importance: High major with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: https://jv.wikipedia.org/w/index.php?...
Keywords: i18n
Depends on:
Blocks:
 
Reported: 2013-01-25 18:00 UTC by bennylin
Modified: 2014-02-09 11:55 UTC
CC List: 8 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description bennylin 2013-01-25 18:00:26 UTC
Note: To display the font correctly, visit http://jv.wikipedia.org/wiki/Pitulung:Aksara_Jawa#English

I can't search using the Javanese alphabet/script on sites like the Javanese Wikipedia or Wiktionary. The words I'm using in this example are: ꦱꦸꦒꦼꦁ (transliterated: "sugeng" or "sugêng") and ꦱꦸꦒꦼꦁꦮꦂꦱꦲꦺꦁꦒꦭ꧀ (transliterated: "sugeng warsa enggal" or "sugêng warsa enggal", written without spaces)

For example, jv.wikt has http://jv.wiktionary.org/wiki/sugêng_warsa_enggal and its script form http://jv.wiktionary.org/wiki/ꦱꦸꦒꦼꦁꦮꦂꦱꦲꦺꦁꦒꦭ꧀

I tried searching for "ꦱꦸꦒꦼꦁ" and "ꦱꦸꦒꦼꦁꦮꦂꦱꦲꦺꦁꦒꦭ꧀", but both searches return zero results (other than a title match for the second search term):
* http://jv.wiktionary.org/w/index.php?title=Astamiwa:Pencarian&search=ꦱꦸꦒꦼꦁ&fulltext=1
* http://jv.wiktionary.org/w/index.php?title=Astamiwa:Pencarian&search=ꦱꦸꦒꦼꦁꦮꦂꦱꦲꦺꦁꦒꦭ꧀&fulltext=1

Expected result: the search returns pages that contain the terms, i.e. [[sugêng]], [[sugêng warsa enggal]]

Note: Javanese script is written in scriptio continua, i.e. without spaces between words. I don't know whether that affects the Lucene search (http://en.wikipedia.org/wiki/Scriptio_continua)

Another example, on Wikipedia: http://jv.wikipedia.org/wiki/ꦠꦺꦃ Searching for the title ("ꦠꦺꦃ" - "tèh") or for any word in the content returns zero results.
Comment 1 Andre Klapper 2013-01-25 18:30:57 UTC
Confirming.

I am logged in, I go to http://jv.wikipedia.org/wiki/Kaca_Utama and enter
ꦠꦺꦃ
in the Search field ("Golèk") and click the dropdown ("ngisi") that pops up.

I get zero results:
"Wonten kaca kanthi nama "ꦠꦺꦃ" ing wiki punika"
however 
http://jv.wikipedia.org/wiki/%EA%A6%A0%EA%A6%BA%EA%A6%83 does exist.
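The percent-encoded path in that link is simply the UTF-8 encoding of the three-character title. A minimal Python sketch, assuming only the encoded string from the URL above (the variable names are illustrative):

```python
from urllib.parse import quote, unquote

# Percent-encoded page title, copied from the URL above.
encoded = "%EA%A6%A0%EA%A6%BA%EA%A6%83"

# unquote() decodes the escapes as UTF-8 by default.
title = unquote(encoded)

# List the Unicode code points that make up the title.
code_points = [f"U+{ord(ch):04X}" for ch in title]

# Encoding the decoded title again gives back the same path segment.
round_trip = quote(title)

print(code_points)
print(round_trip == encoded)
```

All three code points fall in the Javanese Unicode block (U+A980 to U+A9DF), so the link and the search term refer to the same title.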


Maybe this requires fixing bug 39381 and bug 43359 first, but I'm likely wrong.


As the summary for this component says 'For issues with settings of the deployed version on Wikimedia servers see "Wikimedia → lucene-search-2"', I am moving this report.
Comment 2 Andre Klapper 2013-04-11 13:11:46 UTC
The outcome of this problem has some similarities to bug 43663, hence CC'ing Ram who is investigating that other bug report too.
Comment 3 Andre Klapper 2013-06-25 13:08:57 UTC
Nowadays the output is always
    Kesalahan terjadi saat mencari: The search backend returned an error: 
when searching for a string.

CC'ing Nik on this.
Comment 4 Siebrand Mazeland 2013-10-01 12:07:25 UTC
Updated URL. The error (presumably with the ElasticSearch backend) is now (no error details given):

An error has occurred while searching: The search backend returned an error:
Comment 5 Nik Everett 2013-10-01 12:41:42 UTC
jvwiki is still trying to use MWSearch/lucene-search which doesn't support Javanese.  CirrusSearch/Elasticsearch doesn't have any explicit support for Javanese either.  I tried this morning on my sandbox and Javanese script doesn't cause CirrusSearch to crash which seems like an improvement over MWSearch.

Lack of explicit support in CirrusSearch's case means that it won't know how to segment the words in Javanese script, so it'll only be able to do things like match exact titles and whole sentences.

Fortunately, CirrusSearch is in a much better position to get a word segmenter for Javanese script because it is using a modern version of Lucene.  Unfortunately, I couldn't find one that already exists, and writing one is a project.
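
To illustrate why missing word segmentation limits matching to whole strings, here is a minimal Python sketch. It is a toy inverted index that splits on whitespace, not how Lucene or CirrusSearch actually analyze text; the function names are hypothetical, and the unspaced Latin string stands in for text written in scriptio continua.

```python
from collections import defaultdict

def whitespace_tokens(text):
    # Naive tokenizer: the only word boundary it knows about is whitespace.
    return text.split()

def build_index(docs):
    # Map each token to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in whitespace_tokens(text):
            index[token].add(doc_id)
    return index

def search(index, term):
    # Exact token lookup, with no segmentation of the query or the documents.
    return index.get(term, set())

docs = {
    "spaced": "sugeng warsa enggal",    # words separated by spaces
    "continua": "sugengwarsaenggal",    # stand-in for scriptio continua text
}
index = build_index(docs)

print(search(index, "sugeng"))             # finds only the spaced document
print(search(index, "sugengwarsaenggal"))  # only the whole run matches
```

Without a segmenter, the unspaced document is indexed as one long token, so a query for a single word inside it finds nothing.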
Comment 6 bennylin 2013-10-01 12:57:00 UTC
Updated URL: mixing Wiktionary entry (ꦱꦸꦒꦼꦁ - welcome) and Wikipedia entry (ꦠꦺꦃ - tea)

[And here I am, having just stumbled upon this error again a couple of days ago, wondering whether I had already submitted a bug or not :)]
Comment 8 Nik Everett 2013-10-01 13:03:02 UTC
I believe that worked because it found an exact page match and didn't actually dive into the full text search engine.  I believe the fulltext=Search parameter forces a full text search even if there is a matching article title.
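
A minimal Python sketch of building such a full text search URL, assuming the index.php parameters seen in the description (title, search, fulltext). The helper name is hypothetical, "Special:Search" is the canonical name of the search page (the description uses its localized alias), and the value "1" is taken from the description's URLs rather than the "Search" value mentioned above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def fulltext_search_url(host, term, special_page="Special:Search"):
    # Hypothetical helper: sets the fulltext parameter so that the request
    # asks for full text search results instead of an exact title match.
    query = urlencode({"title": special_page, "search": term, "fulltext": "1"})
    return f"https://{host}/w/index.php?{query}"

# Search term written with escapes: U+A9A0 U+A9BA U+A983 (the "tèh" title).
term = "\uA9A0\uA9BA\uA983"
url = fulltext_search_url("jv.wikipedia.org", term)

# Parse the query string back to check what was sent.
params = parse_qs(urlsplit(url).query)
print(url)
print(params["fulltext"])
```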
Comment 9 Dan Garry 2014-02-08 01:44:08 UTC
This should not be an issue for CirrusSearch as it has better support for non-English languages. Since we're in the process of migrating from Lucene to CirrusSearch, I'm marking this as RESOLVED WONTFIX.

If you continue to experience issues with searching in your language with CirrusSearch, feel free to open a bug under MediaWiki extensions -> CirrusSearch.
Comment 10 bennylin 2014-02-09 11:55:15 UTC
OK, thanks to everyone who has been working on this bug for the past year!
