Last modified: 2013-04-26 12:22:23 UTC
PoolCounter as currently deployed is a SPOF in our infrastructure. If it's enabled in MediaWiki and the poolcounterd server is completely down, an error page will be displayed for any article in need of parsing. There is a separate RT ticket to make poolcounterd redundant in our infrastructure, but we'd still like to make sure total failure is handled gracefully.
I thought I fixed this in r84322, which was deployed in March.
(In reply to comment #0)
> PoolCounter as currently deployed is a SPOF in our infrastructure. If it's
> enabled in MediaWiki and the poolcounterd server is completely down, an error
> page will displayed for any article in need of parsing.
>
> There is a separate RT ticket to make poolcounterd redundant in our
> infrastructure but we'd still like to make sure total failure is handled
> gracefully.

Are the conditions available for you to reproduce this bug (e.g. poolcounter server down), or can we trust Tim that it's been fixed in https://www.mediawiki.org/wiki/Special:Code/MediaWiki/84322 ?
A connection error returns a Status of type fatal, so with r84322 the apache instance does the work itself. The poolcounter failing won't result in downtime for the wiki *if* Michael Jackson doesn't die. If he does, we would be subject to the same overload as without the poolcounter (and the solution is just to restart it). Assuming that the server can cope with all those connections in an overload (fd max, tcp buffers...), this is fixed.
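For anyone reading along, here is a rough sketch of the fallback behaviour described above. It is not the code from r84322 and not MediaWiki's actual API: the `Status` class, `run_with_pool`, and the `acquire`/`release`/`do_work` callables are all hypothetical names made up for illustration, and the handling of the "pool busy" case is simplified to returning None.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Status:
    """Hypothetical stand-in for the result of asking the pool counter for a slot."""
    ok: bool
    fatal: bool = False
    message: str = ""


def run_with_pool(
    acquire: Callable[[], Status],
    release: Callable[[], None],
    do_work: Callable[[], str],
) -> Optional[str]:
    """Do the work under the pool counter, falling back if it is unreachable."""
    status = acquire()

    if status.fatal:
        # Pool counter is down (e.g. connection refused): do the work
        # ourselves, unthrottled, instead of showing an error page.
        # No slot was acquired, so there is nothing to release.
        return do_work()

    if not status.ok:
        # Non-fatal refusal (assumed here to mean the pool is busy):
        # leave it to the caller to wait, serve stale content, etc.
        return None

    try:
        return do_work()
    finally:
        # Always give the slot back, even if the work raised.
        release()


def poolcounter_down() -> Status:
    # Simulates a poolcounterd server that is completely down.
    return Status(ok=False, fatal=True, message="connection refused")


print(run_with_pool(poolcounter_down, lambda: None, lambda: "parsed article"))
```

The point of the sketch is only that the fatal branch does the work without throttling, which is why an overload while the poolcounter is down would hit the servers the same way as having no poolcounter at all.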
With the exception of the recent conversation I generated, this bug has not been touched in at least six months. With this in mind, I've been asked by the bugmeister to bump this bug's priority down from "High". Concerns should be addressed to mah@everybody.org.
Does not look like "high" priority to me, hence setting to normal. More general info: https://wikitech.wikimedia.org/wiki/PoolCounter