Last modified: 2013-10-08 17:24:10 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T57467, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 55467 - UnavailableShardsException
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch
Version: master
Hardware: Other Linux
Importance: High normal
Assigned To: Nobody - You can work on this!
Reported: 2013-10-08 15:41 UTC by keyler
Modified: 2013-10-08 17:24 UTC

Description keyler 2013-10-08 15:41:15 UTC
Now that updateSearchIndexConfig.php is working again :-) I am stuck with forceSearchIndex.php.

This also worked about 4 weeks ago but now this happens:


/var/www/wiki/extensions/CirrusSearch/maintenance # php forceSearchIndex.php
...
...
...
...
index: /wikidb_mw1_test_general_first/page/495 caused UnavailableShardsException[[wikidb_mw1_test_general_first][3] [3] shardIt, [1] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@3915e7eb]
index: /wikidb_mw1_test_general_first/page/496 caused UnavailableShardsException[[wikidb_mw1_test_general_first][0] [3] shardIt, [1] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@bfb9758]
index: /wikidb_mw1_test_general_first/page/497 caused UnavailableShardsException[[wikidb_mw1_test_general_first][1] [3] shardIt, [1] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@4d2d1294]
index: /wikidb_mw1_test_general_first/page/498 caused UnavailableShardsException[[wikidb_mw1_test_general_first][2] [3] shardIt, [1] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@6ae84cfa]
index: /wikidb_mw1_test_general_first/page/499 caused UnavailableShardsException[[wikidb_mw1_test_general_first][3] [3] shardIt, [1] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@3915e7eb]

Indexed 475 pages ending at 499 at 4/second
Indexed a total of 475 pages at 4/second
Comment 1 Nik Everett 2013-10-08 15:56:36 UTC
Are those "..."s normal updates that worked or just the same exception repeated over and over and over again?
Comment 2 Nik Everett 2013-10-08 15:59:35 UTC
Also, what does Elasticsearch's health check (ES_HOST:9200/_cluster/health?pretty) say?

If it says RED then the problem is that elasticsearch isn't happy.  I'm happy to make the error message less obtuse and more helpful.
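
For readers unsure how to use the health check URL, here is a minimal Python sketch of one way to query it and read the result. It assumes Elasticsearch is reachable over HTTP on port 9200; the function names `fetch_cluster_health` and `describe_health` are made up for illustration and are not part of CirrusSearch or Elasticsearch.

```python
import json
from urllib.request import urlopen


def fetch_cluster_health(host="localhost", port=9200, timeout=10):
    """Fetch and parse the JSON returned by the _cluster/health endpoint.

    host and port are assumptions; substitute your own ES_HOST.
    """
    url = f"http://{host}:{port}/_cluster/health"
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def describe_health(health):
    """Turn the 'status' field of a health response into a short explanation."""
    status = health["status"]
    unassigned = health.get("unassigned_shards", 0)
    if status == "red":
        return "red: at least one primary shard is unassigned"
    if status == "yellow":
        return f"yellow: primaries are active, {unassigned} replica shard(s) unassigned"
    return "green: all primary and replica shards are active"
```

Calling `describe_health(fetch_cluster_health("localhost"))` on a running node prints a one-line summary. A yellow status alone does not block searches, but unassigned replicas can still matter for writes, depending on the write consistency setting.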
Comment 3 keyler 2013-10-08 16:29:42 UTC
the "..." are just the other 494 lines that said: Timeout waiting for [1m]
Comment 4 keyler 2013-10-08 16:33:38 UTC
Sorry, I do not know how to use this: "ES_HOST:9200/_cluster/health?pretty"
But with elasticsearch-head I get this (yellow)

{

    cluster_name: kc-alpha1
    status: yellow
    timed_out: false
    number_of_nodes: 1
    number_of_data_nodes: 1
    active_primary_shards: 8
    active_shards: 8
    relocating_shards: 0
    initializing_shards: 0
    unassigned_shards: 16

}

Just for your info: it worked a couple of weeks ago and I did not change my elasticsearch config. But you never know....
Comment 5 Nik Everett 2013-10-08 16:41:31 UTC
I've got it!

A few weeks ago we changed the default number of replicas from 1 to 2 so we could have more redundancy during updates.  Which is cool and all, but when you have a single node elasticsearch you can't have _any_ replicas.

The reason it was working before is that we are using elasticsearch's default write consistency: quorum.  My guess is quorum let the write through when the master was up but the other shards weren't.
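
The arithmetic behind that guess can be sketched in Python. This is a simplified model, not Elasticsearch's actual code: it assumes, as I understand the write consistency check of Elasticsearch versions from that period, that quorum is only enforced once a shard has more than two copies (primary plus replicas), and that otherwise a single active copy is enough. The function names are hypothetical.

```python
def required_active_copies(replicas, consistency="quorum"):
    """Number of shard copies that must be active before a write is accepted.

    Simplified model (assumption): quorum only applies when there are more
    than two copies of the shard in total.
    """
    copies = replicas + 1  # the primary plus its replicas
    if consistency == "all":
        return copies
    if consistency == "quorum" and copies > 2:
        return copies // 2 + 1
    return 1


def write_allowed(replicas, active_copies, consistency="quorum"):
    """True if enough copies are active for the write to go through."""
    return active_copies >= required_active_copies(replicas, consistency)
```

Under this model, a single node with 1 replica configured has 2 copies per shard and needs only 1 active, so `write_allowed(1, 1)` is true. With 2 replicas there are 3 copies and 2 must be active, so `write_allowed(2, 1)` is false, which is consistent with the "[3] shardIt, [1] active" message in the report.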

You can work around this by adding this to your LocalSettings:
$wgCirrusSearchContentReplicaCount = array( 'content' => 0, 'general' => 0 );


We're going to make that the default and add a note to the readme about changing it for production use.  That commit will go in later on today.

I'm going to usurp this bug to make the error message better.
Comment 6 keyler 2013-10-08 17:09:23 UTC
This is the LocalSettings.php now:

##Start --------------------------------------- CirrusSearch
require_once( "$IP/extensions/Elastica/Elastica.php" );
require_once( "$IP/extensions/CirrusSearch/CirrusSearch.php" );
#$wgDisableSearchUpdate = true;
$wgCirrusSearchServers = array( 'localhost' );
$wgCirrusSearchContentReplicaCount = array( 'content' => 0, 'general' => 0 );
#$wgSearchType = 'CirrusSearch';
##End   --------------------------------------- CirrusSearch

Input:
 php forceSearchIndex.php

Output:
...
...
...
index: /wikidb_mw1_test_general_first/page/499 caused UnavailableShardsException[[wikidb_mw1_test_general_first][3] [3] shardIt, [1] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@6a3b300c]

Indexed 475 pages ending at 499 at 4/second
Indexed a total of 475 pages at 4/second
Comment 7 Nik Everett 2013-10-08 17:18:02 UTC
Sorry, for replica count to take effect you have to rerun updateSearchIndexConfig.  I'll make sure variables like that are more clearly marked in future.
Comment 8 keyler 2013-10-08 17:24:10 UTC
now it only took 10 seconds..... :-)
you are the BEST!!!

php forceSearchIndex.php

Indexed 475 pages ending at 499 at 106/second
Indexed a total of 475 pages at 106/second
