Last modified: 2014-05-27 22:58:48 UTC
The deployment-jobrunner01 instance is quite busy. Looking at /data/project/logs/runJobs.log it keeps running 31 periodic queue tasks every few seconds for the simplewiki: 2014-04-08 15:44:51 deployment-jobrunner01 simplewiki: Executed 31 periodic queue task(s). 2014-04-08 15:44:55 deployment-jobrunner01 simplewiki: Executed 31 periodic queue task(s). 2014-04-08 15:45:03 deployment-jobrunner01 simplewiki: Executed 31 periodic queue task(s). 2014-04-08 15:45:07 deployment-jobrunner01 simplewiki: Executed 31 periodic queue task(s). 2014-04-08 15:45:10 deployment-jobrunner01 simplewiki: Executed 31 periodic queue task(s). 2014-04-08 15:45:16 deployment-jobrunner01 simplewiki: Executed 31 periodic queue task(s). 2014-04-08 15:45:19 deployment-jobrunner01 simplewiki: Executed 31 periodic queue task(s). I have no clue how to list the tasks. maintenance/showJobs.php does not give any clue nor getJobQueueLenghts :-/ Looking at the cli.log there is a bunch of: simplewiki-c91d295c: 2.0489 22.5M [memcached] get(simplewiki:jobqueuegroup:taskruns:v1) simplewiki-c91d295c: 2.0498 22.5M [memcached] add(simplewiki:jobqueuegroup:taskruns:v1) simplewiki-c91d295c: 2.0507 22.5M [memcached] result: NOT STORED Ccing Aaron who probably knows how to fix that :-]
Do the cas operations succeed? If they do, maybe memcached has a high eviction rate or something?
I don't see this probably locally or on production, so it seems specific to beta.
Here is the debug log for session simplewiki-6a538816 from cli.log: 1.7059 14.2M [caches] main: MemcachedPeclBagOStuff, message: MemcachedPeclBagOStuff, parser: MemcachedPhpBagOStuff 2.1357 18.2M [caches] LocalisationCache: using store LCStoreCDB 2.9236 22.0M Fully initialised 3.0402 22.0M [memcached] get(simplewiki:jobqueuegroup:taskruns:v1) 3.1017 22.5M Creating a new RedisConnectionPool instance with id 95797b002a3bb33394c101391fbb456c43b076e3. 3.1666 22.8M [memcached] get(simplewiki:jobqueuegroup:taskruns:v1) 3.1680 22.8M [memcached] add(simplewiki:jobqueuegroup:taskruns:v1) 3.1688 22.8M [memcached] result: NOT STORED 3.1690 22.8M [memcached] get(simplewiki:jobqueuegroup:taskruns:v1) 3.1696 22.8M [memcached] add(simplewiki:jobqueuegroup:taskruns:v1) 3.1703 22.8M [memcached] result: NOT STORED 3.1705 22.8M [memcached] get(simplewiki:jobqueuegroup:taskruns:v1) 3.1712 22.8M [memcached] add(simplewiki:jobqueuegroup:taskruns:v1) 3.1720 22.8M [memcached] result: NOT STORED 3.1722 22.8M [memcached] get(simplewiki:jobqueuegroup:taskruns:v1) 3.1730 22.8M [memcached] add(simplewiki:jobqueuegroup:taskruns:v1) 3.1744 22.8M [memcached] result: NOT STORED 3.1748 22.8M [memcached] get(simplewiki:jobqueuegroup:taskruns:v1) 3.1754 22.8M [memcached] add(simplewiki:jobqueuegroup:taskruns:v1) 3.1762 22.8M [memcached] result: NOT STORED 3.1763 22.8M [memcached] get(simplewiki:jobqueuegroup:taskruns:v1) 3.1769 22.8M [memcached] add(simplewiki:jobqueuegroup:taskruns:v1) 3.1776 22.8M [memcached] result: NOT STORED 3.1777 22.8M [memcached] get(simplewiki:jobqueuegroup:taskruns:v1) 3.1783 22.8M [memcached] add(simplewiki:jobqueuegroup:taskruns:v1) 3.1790 22.8M [memcached] result: NOT STORED 3.1791 22.8M [memcached] get(simplewiki:jobqueuegroup:taskruns:v1) 3.1797 22.8M [memcached] add(simplewiki:jobqueuegroup:taskruns:v1) 3.1809 22.8M [memcached] result: NOT STORED 3.1811 22.8M [memcached] get(simplewiki:jobqueuegroup:taskruns:v1) 3.1822 22.8M [memcached] add(simplewiki:jobqueuegroup:taskruns:v1) 3.1829 22.8M [memcached] result: NOT STORED 3.1831 22.8M [memcached] get(simplewiki:jobqueuegroup:taskruns:v1) 3.1838 22.8M [memcached] add(simplewiki:jobqueuegroup:taskruns:v1) 3.1848 22.8M [memcached] result: NOT STORED Looking at the memcached key simplewiki:jobqueuegroup:taskruns:v1: $ mwscript mcc.php > get simplewiki:jobqueuegroup:taskruns:v1 Getting simplewiki:jobqueuegroup:taskruns:v1[] array(31) { ["refreshLinks"]=> array(1) { ["recyclePruneAndUndelayJobs"]=> int(1396479307) } <snip> I get a 31 elements array, each element having a timestamp of 1396479307 which is Wed, 02 Apr 2014 22:55:07 GMT. Seems to me there is some logical error in JobQueueGroup::executeReadyPeriodicTasks() which never clear out the previous runs :-/ I am not deleting the key so we can keep investigating.
If the add() gave NOT STORED then you should have seen cas(). That sounds like some problem happens with memcached in BagOStuff::mergeViaCas().
Do we have any unit test covering that code or anyway to potentially reproduce the issue? I am not familiar at all with the cas stuff, much less with the job queue.
(still seeing those log entries)