Last modified: 2014-09-04 16:05:54 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T72374, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 70374 - beta labs job queue stuck (again)
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: deployment-prep (beta)
Version: unspecified
Hardware: All All
Importance: High major
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2014-09-04 00:28 UTC by Kunal Mehta (Legoktm)
Modified: 2014-09-04 16:05 UTC (History)

Description Kunal Mehta (Legoktm) 2014-09-04 00:28:10 UTC
http://en.wikipedia.beta.wmflabs.org/wiki/Special:GlobalRenameProgress/ZFilipin_%28WMF%29 had a bunch of jobs queued that were just sitting there. I pushed them through manually just now (24+ hours later) with runJobs.php.

legoktm@deployment-bastion:~$ mwscript showJobs.php --wiki=enwiki --group
refreshLinks: 3 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
htmlCacheUpdate: 10 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
enotifNotify: 11 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
cirrusSearchLinksUpdate: 2 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
cirrusSearchLinksUpdatePrioritized: 756 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
renameUser: 1 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
updateBetaFeaturesUserCounts: 1 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
ParsoidCacheUpdateJobOnEdit: 715 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
ParsoidCacheUpdateJobOnDependencyChange: 1305 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
EchoNotificationJob: 1311 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
Flow\Jobs\WatchTitle: 1422 queued; 0 claimed (0 active, 0 abandoned); 0 delayed

That looks high.
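
For anyone who wants to check this without eyeballing the output, here is a minimal Python sketch that parses `showJobs.php --group` lines in the format shown above and flags the "queued but nothing claimed" pattern. The function names and the stuck heuristic are illustrative assumptions, not part of MediaWiki.

```python
import re

# Matches one line of `showJobs.php --group` output as it appears in this report.
LINE_RE = re.compile(
    r"^(?P<type>\S+): (?P<queued>\d+) queued; (?P<claimed>\d+) claimed "
    r"\((?P<active>\d+) active, (?P<abandoned>\d+) abandoned\); "
    r"(?P<delayed>\d+) delayed$"
)


def parse_show_jobs(output):
    """Return {job_type: {"queued": int, ...}} for every line that matches."""
    counts = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip prompts, blank lines, anything unexpected
        fields = match.groupdict()
        job_type = fields.pop("type")
        counts[job_type] = {name: int(value) for name, value in fields.items()}
    return counts


def looks_stuck(counts):
    """Hypothetical heuristic: jobs are queued but nothing is claimed."""
    queued = sum(c["queued"] for c in counts.values())
    claimed = sum(c["claimed"] for c in counts.values())
    return queued > 0 and claimed == 0


# Three lines copied from the output above.
SAMPLE = r"""refreshLinks: 3 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
renameUser: 1 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
Flow\Jobs\WatchTitle: 1422 queued; 0 claimed (0 active, 0 abandoned); 0 delayed"""

if __name__ == "__main__":
    parsed = parse_show_jobs(SAMPLE)
    # Prints: 1426 True
    print(sum(c["queued"] for c in parsed.values()), looks_stuck(parsed))
```

A runner that is working normally should show some claimed jobs, so this check is only a rough signal and would need tuning for a mostly idle wiki.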
Comment 1 Bryan Davis 2014-09-04 01:32:16 UTC
Manually started jobrunner service on deployment-jobrunner01. 

$ mwscript showJobs.php --wiki=enwiki --group
Flow\Jobs\WatchTitle: 0 queued; 1422 claimed (1422 active, 0 abandoned); 0 delayed


Lots of log messages from the runner like this now:

2014-09-04T01:31:06+0000: Runner loop 0 process in slot 4 gave status '255':
nice -19 php /usr/local/apache/common/multiversion/MWScript.php runJobs.php --wiki='incubatorwiki' --type='LocalRenameUserJob' --maxtime='60' --memory-limit='300M' --result=json
        /usr/local/apache/common-local/wikiversions-labs.cdb has no version entry for `incubatorwiki`.
Comment 2 Bryan Davis 2014-09-04 01:43:57 UTC
$ tail -5000 /var/log/mediawiki/jobrunner.log|grep 'Fatal error'|grep 'has no version entry'|awk '{print $9}'|sort|uniq -c
     52 `afwikibooks`.
     51 `afwikiquote`.
     54 `afwiktionary`.
     52 `akwiki`.
     49 `alswiki`.
     37 `alswikibooks`.
     46 `alswiktionary`.
     43 `amwiki`.
     48 `amwikiquote`.
     64 `arwikibooks`.
     51 `arwikiquote`.
     40 `arwikiversity`.
     66 `incubatorwiki`.
     38 `mkwikibooks`.
     54 `nlwiki`.
     46 `nlwikibooks`.
     43 `nlwikiquote`.
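
The same tally can be sketched in Python, which may be easier to extend than the grep/awk pipeline. This is a rough equivalent under the assumption that the log lines look like the ones quoted in this report; `count_missing_wikis` is a made-up helper name. Unlike `awk '{print $9}'`, it strips the backticks and trailing period from the wiki name.

```python
import re
from collections import Counter

# Captures the wiki name between backticks in the multiversion error message.
ENTRY_RE = re.compile(r"has no version entry for `([^`]+)`")


def count_missing_wikis(lines):
    """Count 'has no version entry' fatal errors per wiki."""
    counts = Counter()
    for line in lines:
        if "Fatal error" not in line:
            continue  # mirrors the first grep in the pipeline
        match = ENTRY_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines modeled on the log excerpts in this report.
CDB = "/usr/local/apache/common-local/wikiversions-labs.cdb"
LOG_LINES = [
    "Fatal error: " + CDB + " has no version entry for `incubatorwiki`.",
    "        " + CDB + " has no version entry for `incubatorwiki`.",
    "Fatal error: " + CDB + " has no version entry for `nlwiki`.",
    "Fatal error: " + CDB + " has no version entry for `incubatorwiki`.",
]

if __name__ == "__main__":
    for wiki, count in sorted(count_missing_wikis(LOG_LINES).items()):
        print(count, wiki)
```

The indented line without the "Fatal error" prefix is skipped on purpose, so each failure is counted once, as in the original pipeline.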
Comment 3 Bryan Davis 2014-09-04 15:55:16 UTC
Tried to clean up bad jobs manually:

redis 127.0.0.1:6379> keys *:jobqueue:LocalRenameUserJob:l-*
 1) "nlwikiquote:jobqueue:LocalRenameUserJob:l-unclaimed"
 2) "amwikiquote:jobqueue:LocalRenameUserJob:l-unclaimed"
 3) "arwikibooks:jobqueue:LocalRenameUserJob:l-unclaimed"
 4) "amwiki:jobqueue:LocalRenameUserJob:l-unclaimed"
 5) "afwikiquote:jobqueue:LocalRenameUserJob:l-unclaimed"
 6) "arwikiversity:jobqueue:LocalRenameUserJob:l-unclaimed"
 7) "akwiki:jobqueue:LocalRenameUserJob:l-unclaimed"
 8) "nlwiki:jobqueue:LocalRenameUserJob:l-unclaimed"
 9) "nlwikibooks:jobqueue:LocalRenameUserJob:l-unclaimed"
10) "afwiktionary:jobqueue:LocalRenameUserJob:l-unclaimed"
11) "alswiktionary:jobqueue:LocalRenameUserJob:l-unclaimed"
12) "arwikiquote:jobqueue:LocalRenameUserJob:l-unclaimed"
13) "incubatorwiki:jobqueue:LocalRenameUserJob:l-unclaimed"
14) "afwikibooks:jobqueue:LocalRenameUserJob:l-unclaimed"
15) "mkwikibooks:jobqueue:LocalRenameUserJob:l-unclaimed"
16) "alswikibooks:jobqueue:LocalRenameUserJob:l-unclaimed"
17) "alswiki:jobqueue:LocalRenameUserJob:l-unclaimed"
redis 127.0.0.1:6379> del "nlwikiquote:jobqueue:LocalRenameUserJob:l-unclaimed"
...
redis 127.0.0.1:6379> del "alswiki:jobqueue:LocalRenameUserJob:l-unclaimed"
redis 127.0.0.1:6379> keys *:jobqueue:LocalRenameUserJob:l-*
(empty list or set)
redis 127.0.0.1:6379> save
OK


But deployment-jobrunner01 is still seeing them?

2014-09-04T15:53:41+0000: Runner loop 0 process in slot 4 gave status '255':
nice -19 php /usr/local/apache/common/multiversion/MWScript.php runJobs.php --wiki='incubatorwiki' --type='LocalRenameUserJob' --maxtime='60' --memory-limit='300M' --result=json
        /usr/local/apache/common-local/wikiversions-labs.cdb has no version entry for `incubatorwiki`.

Fatal error: /usr/local/apache/common-local/wikiversions-labs.cdb has no version entry for `incubatorwiki`.
 in /srv/common-local/multiversion/MWMultiVersion.php on line 358
Comment 4 Bryan Davis 2014-09-04 16:05:54 UTC
W00t figured it out. There is a special hash for the new jobrunner that tracks what queues to try and process:

redis 127.0.0.1:6379> hkeys "jobqueue:aggregator:h-ready-queues:v2"
 1) "webVideoTranscode/commonswiki"
 2) "LocalRenameUserJob/afwikiquote"
 3) "LocalRenameUserJob/afwiktionary"
 4) "LocalRenameUserJob/akwiki"
 5) "LocalRenameUserJob/alswiki"
 6) "LocalRenameUserJob/alswikibooks"
 7) "LocalRenameUserJob/alswiktionary"
 8) "LocalRenameUserJob/amwiki"
 9) "LocalRenameUserJob/amwikiquote"
10) "LocalRenameUserJob/arwikibooks"
11) "LocalRenameUserJob/arwikiquote"
12) "LocalRenameUserJob/arwikiversity"
13) "LocalRenameUserJob/incubatorwiki"
14) "LocalRenameUserJob/mkwikibooks"
15) "LocalRenameUserJob/nlwiki"
16) "LocalRenameUserJob/nlwikibooks"
17) "LocalRenameUserJob/nlwikiquote"
18) "gwtoolsetUploadMediafileJob/commonswiki"
19) "gwtoolsetUploadMetadataJob/commonswiki"
20) "cirrusSearchLinksUpdate/commonswiki"
21) "globalUsageCachePurge/commonswiki"
22) "cirrusSearchLinksUpdatePrioritized/enwiki"
redis 127.0.0.1:6379> hdel "jobqueue:aggregator:h-ready-queues:v2" "LocalRenameUserJob/afwikiquote" ...
(integer) 16

Restarted runner on deployment-jobrunner01 and log is not filling with junk now.
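
For future cleanups, the selection of stale fields could be scripted instead of typed by hand. Below is a minimal Python sketch, assuming the hash fields have the `jobType/wiki` form shown in the `hkeys` output above and that a set of valid wiki names is available from somewhere (the report does not say where). `stale_ready_queues` and `KNOWN_WIKIS` are hypothetical names; the sketch only computes the list and does not talk to redis.

```python
def stale_ready_queues(fields, known_wikis):
    """Return hash fields ("jobType/wiki") whose wiki is not in known_wikis."""
    stale = []
    for field in fields:
        job_type, sep, wiki = field.rpartition("/")
        if not sep:
            continue  # not in "jobType/wiki" form; leave it alone
        if wiki not in known_wikis:
            stale.append(field)
    return stale


# A few fields copied from the hkeys output above.
FIELDS = [
    "webVideoTranscode/commonswiki",
    "LocalRenameUserJob/incubatorwiki",
    "LocalRenameUserJob/nlwiki",
    "cirrusSearchLinksUpdatePrioritized/enwiki",
]

# Assumption: the wikis that actually exist on beta labs. Only these two
# names are taken from the report; a real list would come from elsewhere.
KNOWN_WIKIS = {"commonswiki", "enwiki"}

if __name__ == "__main__":
    for field in stale_ready_queues(FIELDS, KNOWN_WIKIS):
        print(field)
```

The returned fields are the ones that would be passed to `hdel "jobqueue:aggregator:h-ready-queues:v2" ...` as in the comment above, followed by a runner restart.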
