Last modified: 2013-10-08 17:24:10 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T57467, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 55467 - UnavailableShardsException


Summary:	UnavailableShardsException

Status:	RESOLVED FIXED

Product:	MediaWiki extensions
Classification:	Unclassified
Component:	CirrusSearch
Version:	master
Hardware:	Other Linux

Importance:	High normal
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2013-10-08 15:41 UTC by keyler
Modified:	2013-10-08 17:24 UTC
CC List:	2 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description keyler 2013-10-08 15:41:15 UTC

Now that updateSearchIndexConfig.php is working again :-) I am stuck with forceSearchIndex.php.

This also worked about 4 weeks ago but now this happens:


/var/www/wiki/extensions/CirrusSearch/maintenance # php forceSearchIndex.php
...
...
...
...
index: /wikidb_mw1_test_general_first/page/495 caused UnavailableShardsException[[wikidb_mw1_test_general_first][3] [3] shardIt, [1] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@3915e7eb]
index: /wikidb_mw1_test_general_first/page/496 caused UnavailableShardsException[[wikidb_mw1_test_general_first][0] [3] shardIt, [1] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@bfb9758]
index: /wikidb_mw1_test_general_first/page/497 caused UnavailableShardsException[[wikidb_mw1_test_general_first][1] [3] shardIt, [1] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@4d2d1294]
index: /wikidb_mw1_test_general_first/page/498 caused UnavailableShardsException[[wikidb_mw1_test_general_first][2] [3] shardIt, [1] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@6ae84cfa]
index: /wikidb_mw1_test_general_first/page/499 caused UnavailableShardsException[[wikidb_mw1_test_general_first][3] [3] shardIt, [1] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@3915e7eb]

Indexed 475 pages ending at 499 at 4/second
Indexed a total of 475 pages at 4/second

Comment 1 Nik Everett 2013-10-08 15:56:36 UTC

Are those "..."s normal updates that worked or just the same exception repeated over and over and over again?

Comment 2 Nik Everett 2013-10-08 15:59:35 UTC

Also, what does Elasticsearch's health check (ES_HOST:9200/_cluster/health?pretty) say?

If it says RED then the problem is that Elasticsearch isn't happy. I'm happy to make the error message less obscure and more helpful.
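
The health check mentioned in Comment 2 is a plain HTTP request, so it can be run from a shell. The following is a minimal sketch, assuming Elasticsearch listens on port 9200 and that `curl` is installed; the helper names `health_url` and `check_health` are hypothetical and not part of CirrusSearch or Elasticsearch.

```shell
# Hypothetical helper: build the cluster health URL for a host
# (defaults to localhost; port 9200 is taken from Comment 2).
health_url() {
    printf 'http://%s:9200/_cluster/health?pretty\n' "${1:-localhost}"
}

# Hypothetical helper: fetch the health report with curl.
# -s suppresses the progress meter; --max-time avoids hanging
# if the node does not answer.
check_health() {
    curl -s --max-time 10 "$(health_url "$1")"
}
```

Running `check_health localhost` should print a JSON document whose `status` field is green (all shards allocated), yellow (all primaries allocated, some replicas not) or red (some primaries not allocated).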

Comment 3 keyler 2013-10-08 16:29:42 UTC

the "..." are just the other 494 lines that said: Timeout waiting for [1m]

Comment 4 keyler 2013-10-08 16:33:38 UTC

Sorry, I do not know how to use this: "ES_HOST:9200/_cluster/health?pretty"
But with elasticsearch-head I get this (yellow)

{
    cluster_name: kc-alpha1
    status: yellow
    timed_out: false
    number_of_nodes: 1
    number_of_data_nodes: 1
    active_primary_shards: 8
    active_shards: 8
    relocating_shards: 0
    initializing_shards: 0
    unassigned_shards: 16
}

Just for your info: it worked a couple of weeks ago and I did not change my elasticsearch config. But you never know....
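
The numbers in this health report can be cross-checked with a little arithmetic. The sketch below assumes that a node holds at most one copy of any given shard and that every index uses the same replica count; the function name `unassigned_shards` is hypothetical. With 8 primaries on a single node, 16 unassigned shards is what two replicas per primary would produce.

```shell
# unassigned_shards PRIMARIES REPLICAS NODES
# Assumption: each node can hold at most one copy of a given shard,
# so only min(copies, nodes) copies of each shard can be placed.
unassigned_shards() {
    primaries=$1; replicas=$2; nodes=$3
    copies=$(( replicas + 1 ))                        # primary + replicas
    placeable=$(( copies < nodes ? copies : nodes ))
    echo $(( primaries * (copies - placeable) ))
}

unassigned_shards 8 2 1   # prints 16, matching unassigned_shards above
unassigned_shards 8 0 1   # prints 0: no replicas, nothing left unassigned
```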

Comment 5 Nik Everett 2013-10-08 16:41:31 UTC

I've got it!

A few weeks ago we changed the default number of replicas from 1 to 2 so we could have more redundancy during updates. Which is cool and all, but when you have a single-node Elasticsearch you can't have _any_ replicas.

The reason it was working before is that we are using Elasticsearch's default write consistency: quorum. My guess is that, with only one replica, quorum let the write through when the primary was up but the replica wasn't.

You can work around this by adding this to your LocalSettings.php:
$wgCirrusSearchContentReplicaCount = array( 'content' => 0, 'general' => 0 );


We're going to make that the default and add a note to the readme about changing it for production use.  That commit will go in later on today.

I'm going to usurp this bug to make the error message better.
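
The quorum guess in Comment 5 can be illustrated with a small calculation. This is a sketch under an assumption about Elasticsearch of that era, not a statement of its actual implementation: that a quorum of `copies / 2 + 1` is only enforced when a shard has more than two copies, and that otherwise a single active copy is enough. The function names are hypothetical.

```shell
# required_active COPIES
# Assumption: quorum is only enforced when a shard has more than two copies.
required_active() {
    copies=$1
    if [ "$copies" -gt 2 ]; then
        echo $(( copies / 2 + 1 ))
    else
        echo 1
    fi
}

# write_allowed REPLICAS ACTIVE
# Succeeds (exit status 0) if a quorum write would be let through.
write_allowed() {
    copies=$(( $1 + 1 ))   # one primary plus the configured replicas
    [ "$2" -ge "$(required_active "$copies")" ]
}
```

Under that assumption, `write_allowed 1 1` succeeds (old default, one active copy), while `write_allowed 2 1` fails because three copies need two active ones. That is consistent with the "[3] shardIt, [1] active" part of the error message. With zero replicas, `write_allowed 0 1` succeeds again.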

Comment 6 keyler 2013-10-08 17:09:23 UTC

This is the LocalSettings.php now:

##Start --------------------------------------- CirrusSearch
require_once( "$IP/extensions/Elastica/Elastica.php" );
require_once( "$IP/extensions/CirrusSearch/CirrusSearch.php" );
#$wgDisableSearchUpdate = true;
$wgCirrusSearchServers = array( 'localhost' );
$wgCirrusSearchContentReplicaCount = array( 'content' => 0, 'general' => 0 );
#$wgSearchType = 'CirrusSearch';
##End   --------------------------------------- CirrusSearch

Input:
 php forceSearchIndex.php

Output:
...
...
...
index: /wikidb_mw1_test_general_first/page/499 caused UnavailableShardsException[[wikidb_mw1_test_general_first][3] [3] shardIt, [1] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@6a3b300c]

Indexed 475 pages ending at 499 at 4/second
Indexed a total of 475 pages at 4/second

Comment 7 Nik Everett 2013-10-08 17:18:02 UTC

Sorry, for the replica count to take effect you have to rerun updateSearchIndexConfig.php. I'll make sure variables like that are more clearly marked in future.
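
The order of operations described in Comment 7 can be sketched as a small wrapper. The maintenance directory is the path from the original report and may differ on other installs; `MAINT_DIR`, `DRY_RUN`, `run` and `apply_and_reindex` are hypothetical names. By default the wrapper only prints the commands it would run.

```shell
# Assumed location of the maintenance scripts (taken from the report).
MAINT_DIR="${MAINT_DIR:-/var/www/wiki/extensions/CirrusSearch/maintenance}"

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

apply_and_reindex() {
    # Push the changed replica count to Elasticsearch first,
    # then re-index the pages.
    run php "$MAINT_DIR/updateSearchIndexConfig.php" &&
    run php "$MAINT_DIR/forceSearchIndex.php"
}
```

Calling `apply_and_reindex` prints the two commands in order; `DRY_RUN=0 apply_and_reindex` would actually execute them.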

Comment 8 keyler 2013-10-08 17:24:10 UTC

now it only took 10 seconds..... :-)
you are the BEST!!!

php forceSearchIndex.php

Indexed 475 pages ending at 499 at 106/second
Indexed a total of 475 pages at 106/second
