Last modified: 2014-02-05 05:35:59 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T55629, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 53629 - jstart doesn't check existence of resubmitted tasks
Status: RESOLVED WORKSFORME
Product: Wikimedia Labs
Classification: Unclassified
Component: tools
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Marc A. Pelletier
Depends on:
Blocks:
Reported: 2013-08-31 12:52 UTC by Liangent
Modified: 2014-02-05 05:35 UTC (History)
CC List: 2 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---

Description Liangent 2013-08-31 12:52:05 UTC
I got this in qstat:

job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
 801291 0.32618 php_dispat local-liange Rr    08/28/2013 02:00:02 continuous@tools-exec-05.pmtpa     1        
 869600 0.26803 php_dispat local-liange r     08/27/2013 19:00:17 continuous@tools-exec-01.pmtpa     1

This is with a jstart call in my crontab. My guess is that jstart didn't see the task in state Rr and started a new one.

Category   State                SGE Letter Code
Running    running              r
Running    running, re-submit   Rr
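The failure mode suspected here can be illustrated with a minimal hypothetical sketch (not jstart's actual code): a check that only accepts the exact state letter r would miss a job in state Rr, whereas a check that treats any listed job with the same name as existing would not. The function names and the approach of parsing plain qstat text are assumptions. Note also that qstat truncates the name column (php_dispat above), so matching on truncated names could confuse two jobs whose names share a prefix.

```python
# Hypothetical sketch, not jstart's real implementation: decide whether a
# job is already queued by scanning plain `qstat` output. Assumes
# whitespace-separated columns and a name column truncated to `width`
# characters, as in the qstat listings in this report.

def job_exists(qstat_output: str, name: str, width: int = 10) -> bool:
    """Any listed job with a matching name counts, whatever its state."""
    wanted = name[:width]
    for line in qstat_output.splitlines()[2:]:  # skip header and dashes
        fields = line.split()
        if len(fields) >= 5 and fields[2] == wanted:
            return True  # r, Rr, qw, ... all mean "already there"
    return False


def job_running_naive(qstat_output: str, name: str, width: int = 10) -> bool:
    """Suspected bug: only the exact state "r" counts, so "Rr" is missed."""
    wanted = name[:width]
    for line in qstat_output.splitlines()[2:]:
        fields = line.split()
        if len(fields) >= 5 and fields[2] == wanted and fields[4] == "r":
            return True
    return False


# Listing copied from Comment 1 (job in state Rr after qmod -rj).
SAMPLE = """\
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
2415536 0.25000 sleep-test scfc         Rr    02/03/2014 03:04:38 continuous@tools-exec-03.pmtpa     1
"""

print(job_exists(SAMPLE, "sleep-test"))         # True
print(job_running_naive(SAMPLE, "sleep-test"))  # False
```

Comment 1 below indicates that jstart does in fact treat an Rr job as running, so this only shows what the reporter's guess would have looked like.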
Comment 1 Tim Landscheidt 2014-02-03 03:05:51 UTC
I can't reproduce that:

| scfc@tools-login:~$ echo sleep 10m > sleep-test.sh && chmod +x sleep-test.sh 
| scfc@tools-login:~$ jstart -N sleep-test ./sleep-test.sh 
| Your job 2415536 ("sleep-test") has been submitted
| scfc@tools-login:~$ qstat
| job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
| -----------------------------------------------------------------------------------------------------------------
| 2415536 0.25000 sleep-test scfc         r     02/03/2014 03:03:38 continuous@tools-exec-06.pmtpa     1        
| scfc@tools-login:~$ qmod -rj 2415536
| Pushed rescheduling of job 2415536 on host tools-exec-06.pmtpa.wmflabs
| scfc@tools-login:~$ qstat
| job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
| -----------------------------------------------------------------------------------------------------------------
| 2415536 0.25000 sleep-test scfc         Rr    02/03/2014 03:04:38 continuous@tools-exec-03.pmtpa     1        
| scfc@tools-login:~$ jstart -N sleep-test ./sleep-test.sh 
| scfc@tools-login:~$ qstat
| job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
| -----------------------------------------------------------------------------------------------------------------
| 2415536 0.25000 sleep-test scfc         Rr    02/03/2014 03:04:38 continuous@tools-exec-03.pmtpa     1        
| scfc@tools-login:~$
Comment 2 Liangent 2014-02-04 07:28:07 UTC
So is there any other possible cause for the original issue?
Comment 3 Marc A. Pelletier 2014-02-04 16:07:47 UTC
There is always the possibility of a race condition; there is no locking, so if you jstart twice within a very short period of time (a few seconds), both invocations would see none running and start; but that seems unlikely if you start with cron unless the interval is fairly short and tools-login was *really* loaded.
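The race described here is a check-then-act problem. A minimal hypothetical model follows: plain Python threads stand in for two jstart invocations and a list stands in for the grid engine queue; none of these names come from jstart. It shows how two invocations that both check before either submits end up starting the job twice. The barrier only makes the unlucky timing deterministic; in practice it depends on timing and load.

```python
import threading

# Hypothetical model of the race in Comment 3, not jstart's real code:
# each invocation checks for a running job, then submits one, with no lock.
grid_queue = []                       # stands in for the grid engine queue
both_checked = threading.Barrier(2)   # forces the worst-case interleaving


def start_once(name):
    already_running = name in grid_queue  # step 1: check
    both_checked.wait()                   # neither has submitted yet
    if not already_running:
        grid_queue.append(name)           # step 2: submit


threads = [threading.Thread(target=start_once, args=("dispatcher",)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(grid_queue)  # ['dispatcher', 'dispatcher']: the job was started twice
```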
Comment 4 Liangent 2014-02-04 16:30:00 UTC
(In reply to comment #3)
> There is always the possibility of a race condition; there is no locking, so
> if you jstart twice within a very short period of time (a few seconds), both
> invocations would see none running and start; but that seems unlikely if you
> start with cron unless the interval is fairly short and tools-login was
> *really* loaded.

That cron entry is "0/10 * * * * $HOME/mw/startLabsDispatchRC.sh". Is the interval too short?

Also, would a bug report about the lack of locking be worthwhile (i.e., not likely to be WONTFIXed)?
Comment 5 Marc A. Pelletier 2014-02-04 16:38:32 UTC
10 minutes seems long enough that I'm really surprised this could have happened at all; I would have expected it only with an interval of 1-2 minutes at most.

Locking would be a reasonable added safeguard, and it would remain useful even when cron gets replaced, but it has a few implementation gotchas that will be tricky to get right. Nevertheless, having a bug for it would not be a bad thing.
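One way to add such locking is to hold an exclusive lock across the whole check-then-submit sequence, so that a second invocation blocks until the first has submitted. The sketch below does this with flock(2) through Python's fcntl module (Unix-only). The lock file path, the function name and the in-memory queue are hypothetical. Whether a local lock file would be seen by invocations running on different Tool Labs hosts is one plausible example of the gotchas mentioned above, so treat this as an illustration under that assumption rather than a design for jstart.

```python
import fcntl
import os
import tempfile
import threading
import time

# Hypothetical sketch, not jstart's real code: serialize check-then-submit
# with an exclusive flock(2) on a per-job lock file (assumed path).
LOCK_PATH = os.path.join(tempfile.gettempdir(), "dispatcher.jstart.lock")
grid_queue = []  # stands in for the grid engine queue


def start_once_locked(name):
    # Each call opens the file separately, so each gets its own lock.
    with open(LOCK_PATH, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks while another call holds it
        try:
            if name not in grid_queue:   # check ...
                time.sleep(0.1)          # widen the race window on purpose
                grid_queue.append(name)  # ... then submit, still under the lock
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


threads = [threading.Thread(target=start_once_locked, args=("dispatcher",)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(grid_queue)  # ['dispatcher']: the second call saw the first submission
```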
Comment 6 Liangent 2014-02-05 05:35:59 UTC
Filed as bug 60862.
