Last modified: 2014-05-29 17:23:20 UTC
Kirsten reports that some jobs she started in production with very small cohorts (like test2) are still idling hours later. There is currently no way to kill or restart them from the UI, because they are still marked as pending. I can't check what's going wrong with these requests on stat1001, but a query like this one, when run on the dev instance, takes less than 10 seconds to complete:
http://127.0.0.1:4000/cohorts/test2/threshold?project=enwiki&t=154&n=1&group=REGISTRATION&refresh
See also: https://bugzilla.wikimedia.org/show_bug.cgi?id=47236
Possibly related to: https://github.com/rfaulkner/E3_analysis/issues/86
Confirmed: the prod instance currently doesn't accept new requests; all new jobs (including very short ones) are stuck in the queue.
FYI: Andrew restarted the server and new requests are now completing successfully, but we still need to find the cause of this behavior, which doesn't seem to affect the dev instance on stat1 or local instances.
Lowering priority.
https://mingle.corp.wikimedia.org/projects/analytics/cards/1106
[moving tickets as per bug 65903]