Last modified: 2013-09-11 17:38:06 UTC
Currently all the "check_job_queue" checks on Nagios fail with:

JOBQUEUE CRITICAL - check plugin (check_job_queue) or PHP errors

While investigating this, I found that the problem does not appear to be in "check_job_queue" itself but in CommonSettings.php, because check_job_queue runs into this:

PHP Warning: require(/home/wikipedia/common/php-1.18/../wmf-config/ExtensionMessages-1.18.php): failed to open stream: No such file or directory in /home/wikipedia/common/wmf-config/CommonSettings.php on line 2506

and:

PHP Fatal error: require(): Failed opening required '/home/wikipedia/common/php-1.18/../wmf-config/ExtensionMessages-1.18.php' (include_path='/home/wikipedia/common/php-1.20wmf2/extensions/OggHandler/PEAR/File_Ogg:/home/wikipedia/common/php-1.18:/home/wikipedia/common/php-1.18/lib:/usr/local/lib/php:/usr/share/php') in /home/wikipedia/common/wmf-config/CommonSettings.php on line 2506

Line 2506 in CommonSettings.php is:

require( "$wmfConfigDir/ExtensionMessages-$wmfExtendedVersionNumber.php" );

So it is looking in /php-1.18/ because $wmfExtendedVersionNumber is set to 1.18, and that setting seems outdated. Where should it be fixed?
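One way to confirm this kind of failure outside Nagios is to rebuild the path that line 2506 requires and test whether the file exists. Below is a minimal bash sketch under that assumption; the helper name check_extension_messages and the example version strings are hypothetical and not part of wmf-config.

```shell
#!/bin/bash
# Hypothetical helper: rebuilds the path used by
#   require( "$wmfConfigDir/ExtensionMessages-$wmfExtendedVersionNumber.php" );
# and reports whether that file is readable. Returns 0 if found, 1 if missing.
check_extension_messages() {
    local config_dir="$1"   # value of $wmfConfigDir, e.g. /home/wikipedia/common/wmf-config
    local version="$2"      # value of $wmfExtendedVersionNumber, e.g. 1.18
    local path="$config_dir/ExtensionMessages-$version.php"
    if [ -r "$path" ]; then
        echo "found: $path"
        return 0
    fi
    echo "missing: $path" >&2
    return 1
}
```

For example, check_extension_messages /home/wikipedia/common/wmf-config 1.18 should report the same missing file as the PHP warning above.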
15:37 < jeremyb> do you have anything in /home/wikipedia/common/wikiversion* ?
15:37 < mutante> where should it get the info from?
15:38 < mutante> yea, wikiversion.data
15:38 < mutante> .dat
15:38 < mutante> 2012-05-09
15:39 < mutante> the string "18" does not appear in the file
15:40 < mutante> and wikiversions.cdb , modified 05-10
15:41 < jeremyb> so, strace and find out which wikiversions file it's using? or if it's using one at all?
15:41 < jeremyb> 1.18 was once hardcoded into CommonSettings.php as a fallback. but not in the current cluster version so I'm looking elsewhere
15:43 < mutante> open("/usr/local/apache/common-local/wikiversions.cdb", O_RDONLY) = 3
15:43 < jeremyb> there you go
15:44 < jeremyb> does that (or it's .dat) have 1.18?
15:45 < mutante> yes
15:45 < mutante> so "getMWVersion" should be changed to use /home ?
15:46 < mutante> or add mechanism to copy to /usr/local
15:46 < jeremyb> or -local should be made to be reliably up to date

13:48 mutante: copying outdated wikiversions.dat/.cdb files from /home to /usr/local on spence, which fixes check_job_queue (thanks jeremyb)

./check_job_queue
JOBQUEUE OK - all job queues below 10,000
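The root cause in this exchange was that the host-local wikiversions files under /usr/local/apache/common-local had drifted from the copies under /home/wikipedia/common. A minimal bash sketch of a drift check follows, assuming both copies are expected to be byte-identical; the helper name wikiversions_in_sync is hypothetical, and the default paths are taken from the log above.

```shell
#!/bin/bash
# Hypothetical helper: compares the /home copies of the wikiversions files with
# the host-local copies. Returns 0 if all are identical, 1 if any differ or are missing.
wikiversions_in_sync() {
    local src_dir="${1:-/home/wikipedia/common}"
    local local_dir="${2:-/usr/local/apache/common-local}"
    local in_sync=0
    local f
    for f in wikiversions.dat wikiversions.cdb; do
        # cmp -s exits 0 when identical, 1 when different, 2 on error (e.g. missing file)
        if ! cmp -s "$src_dir/$f" "$local_dir/$f"; then
            echo "stale or missing: $local_dir/$f"
            in_sync=1
        fi
    done
    return "$in_sync"
}
```

This only detects the drift; jeremyb's point was that -local should be kept reliably up to date rather than fixed by copying files by hand.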
You should probably use /usr/local/apache/common/php/maintenance/showJobs.php in some way or another. Alternatively, we could just push all the MW files to spence...
Daniel: Is this still an issue, or can this be closed as obsolete?
What I wrote in 2012 is no longer an issue; since then we have switched to a single job_queue check. That check was OK at some point, but it may be broken again, because it reports OK even though MWScript.php cannot be opened:

Current Status: OK (for 72d 17h 59m 48s)
Status Information: Could not open input file: /home/wikipedia/common/multiversion/MWScript.php JOBQUEUE OK - all job queues below 10,000

https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=neon&service=check_job_queue
#!/bin/bash
# nagios plugin to check the mediawiki job queue

LARGEQUEUES=
while read wiki count
do
    if [ ! $(echo "$count" | grep -E "^[0-9]+$") ]; then
        echo "JOBQUEUE CRITICAL - check plugin (`basename $0`) or PHP errors - $wiki"
        exit 2
    elif [ $count -gt 9999 ]; then
        LARGEQUEUES="$LARGEQUEUES, $wiki ($count)"
    fi
# The line below is a bash-ism that's needed for the LARGEQUEUES variable above to be in the right scope
# If you do php ... | while read wiki count; do LARGEQUEUE=blah; done , then the LARGEQUEUE variable will
# be manipulated in a subshell and the changes won't be visible to the if check below
done < <( php /home/wikipedia/common/multiversion/MWScript.php extensions/WikimediaMaintenance/getJobQueueLengths.php )

if [ -z "$LARGEQUEUES" ]; then
    echo "JOBQUEUE OK - all job queues below 10,000"
    exit 0
else
    echo "JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: $LARGEQUEUES"
    exit 2
fi
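As written, the plugin only reaches its first CRITICAL branch when php prints something unparseable to stdout. If php writes nothing there (for example, when its error goes to stderr, which the Icinga output in the previous comment suggests), the loop body never runs, LARGEQUEUES stays empty and the plugin reports OK. A minimal sketch of the same parsing that treats empty output as CRITICAL follows; the function name evaluate_job_queues and its threshold parameter are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads "wiki count" lines on stdin, prints one Nagios
# status line and returns 0 (OK) or 2 (CRITICAL). Unlike the current plugin,
# receiving no data at all is treated as CRITICAL.
evaluate_job_queues() {
    local threshold="${1:-10000}"
    local large="" lines=0 wiki count
    while read -r wiki count; do
        if [ -z "$wiki" ]; then
            continue    # ignore blank lines
        fi
        lines=$((lines + 1))
        if ! [[ "$count" =~ ^[0-9]+$ ]]; then
            echo "JOBQUEUE CRITICAL - check plugin or PHP errors - $wiki"
            return 2
        elif [ "$count" -ge "$threshold" ]; then
            large="$large, $wiki ($count)"
        fi
    done
    if [ "$lines" -eq 0 ]; then
        echo "JOBQUEUE CRITICAL - no job queue data (check plugin or PHP errors)"
        return 2
    elif [ -z "$large" ]; then
        echo "JOBQUEUE OK - all job queues below $threshold"
        return 0
    fi
    # ${large#,} strips the leading comma added by the first append
    echo "JOBQUEUE CRITICAL - the following wikis have $threshold or more jobs:${large#,}"
    return 2
}
```

The plugin body could then feed it the same php command through the existing process substitution and finish with exit $?. Because the state now lives inside the function, a plain pipe into evaluate_job_queues would also work.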
root@neon:/usr/lib/nagios/plugins# ./check_job_queue
Could not open input file: /home/wikipedia/common/multiversion/MWScript.php
JOBQUEUE OK - all job queues below 10,000
root@neon:~# cd /h/w/
-bash: cd: /h/w/: No such file or directory

Of course, neon does not have /h/w; spence did. This check can never work on neon as long as it relies on that path.
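Since neon has no such checkout, the plugin could check for the script before running php, so that a missing dependency cannot surface as OK. A minimal sketch follows; the function name require_mwscript is hypothetical, and it returns 3 (UNKNOWN in the Nagios plugin convention), whereas the existing plugin reports its own errors as CRITICAL (2).

```shell
#!/bin/bash
# Hypothetical helper: verifies that MWScript.php is readable on this host.
# Returns 0 if present; otherwise prints an UNKNOWN status line and returns 3.
require_mwscript() {
    local mwscript="${1:-/home/wikipedia/common/multiversion/MWScript.php}"
    if [ ! -r "$mwscript" ]; then
        echo "JOBQUEUE UNKNOWN - $mwscript not found on ${HOSTNAME:-this host}"
        return 3
    fi
    return 0
}
```

In the plugin, a line such as require_mwscript || exit $? placed before the php call would stop the check early on hosts like neon.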
Is this still a problem?