Last modified: 2014-02-05 05:35:59 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T55629, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 53629 - jstart doesn't check existence of resubmitted tasks


Summary:	jstart doesn't check existence of resubmitted tasks

Status:	RESOLVED WORKSFORME

Product:	Wikimedia Labs
Classification:	Unclassified
Component:	tools
Version:	unspecified
Hardware:	All All

Importance:	Unprioritized normal
Target Milestone:	---
Assigned To:	Marc A. Pelletier

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2013-08-31 12:52 UTC by Liangent
Modified:	2014-02-05 05:35 UTC
CC List:	2 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Liangent 2013-08-31 12:52:05 UTC

I got in qstat:

job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
 801291 0.32618 php_dispat local-liange Rr    08/28/2013 02:00:02 continuous@tools-exec-05.pmtpa     1        
 869600 0.26803 php_dispat local-liange r     08/27/2013 19:00:17 continuous@tools-exec-01.pmtpa     1

while having a jstart call in my crontab. I guess it's because jstart didn't see the task in state Rr and started a new one.

Category 	State 	SGE Letter Code
Running 	running 	r
Running 	running, re-submit 	Rr
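
The guess above is that the existence check recognised state "r" but not "Rr". As a minimal sketch only (this is not jstart's actual code, and the function name job_is_running is hypothetical), a check that treats both state codes as running could parse qstat-style output from standard input like this. It assumes the column layout shown above, where the name column is cut to 10 characters:

```shell
#!/bin/sh
# Hypothetical helper, not part of jstart: succeeds if the qstat-style
# listing on stdin contains a job with the given name in state "r" or "Rr".
job_is_running() {
    # qstat shows only the first 10 characters of the job name,
    # so truncate the requested name the same way before comparing.
    awk -v name="$(printf '%.10s' "$1")" '
        # Skip the header line and the dashed separator line.
        NR > 2 && $3 == name && ($5 == "r" || $5 == "Rr") { found = 1 }
        END { exit !found }
    '
}
```

It would be used as "qstat | job_is_running sleep-test", with a new job submitted only when that command fails.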

Comment 1 Tim Landscheidt 2014-02-03 03:05:51 UTC

I can't reproduce that:

| scfc@tools-login:~$ echo sleep 10m > sleep-test.sh && chmod +x sleep-test.sh 
| scfc@tools-login:~$ jstart -N sleep-test ./sleep-test.sh 
| Your job 2415536 ("sleep-test") has been submitted
| scfc@tools-login:~$ qstat
| job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
| -----------------------------------------------------------------------------------------------------------------
| 2415536 0.25000 sleep-test scfc         r     02/03/2014 03:03:38 continuous@tools-exec-06.pmtpa     1        
| scfc@tools-login:~$ qmod -rj 2415536
| Pushed rescheduling of job 2415536 on host tools-exec-06.pmtpa.wmflabs
| scfc@tools-login:~$ qstat
| job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
| -----------------------------------------------------------------------------------------------------------------
| 2415536 0.25000 sleep-test scfc         Rr    02/03/2014 03:04:38 continuous@tools-exec-03.pmtpa     1        
| scfc@tools-login:~$ jstart -N sleep-test ./sleep-test.sh 
| scfc@tools-login:~$ qstat
| job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
| -----------------------------------------------------------------------------------------------------------------
| 2415536 0.25000 sleep-test scfc         Rr    02/03/2014 03:04:38 continuous@tools-exec-03.pmtpa     1        
| scfc@tools-login:~$

Comment 2 Liangent 2014-02-04 07:28:07 UTC

So is there any other possible cause for the original issue?

Comment 3 Marc A. Pelletier 2014-02-04 16:07:47 UTC

There is always the possibility of a race condition; there is no locking, so if you jstart twice within a very short period of time (a few seconds), both invocations would see none running and start. But that seems unlikely if you start with cron unless the interval is fairly short and tools-login was *really* loaded.

Comment 4 Liangent 2014-02-04 16:30:00 UTC

(In reply to comment #3)
> There is always the possibility of a race condition; there is no locking, so
> if you jstart twice within a very short period of time (a few seconds), both
> invocations would see none running and start. But that seems unlikely if you
> start with cron unless the interval is fairly short and tools-login was
> *really* loaded.

That cron entry is "0/10 * * * * $HOME/mw/startLabsDispatchRC.sh". Is the interval too short?

Also, do you think a bug report about the lack of locking would be a good one (so that it doesn't get WONTFIXed)?

Comment 5 Marc A. Pelletier 2014-02-04 16:38:32 UTC

10 minutes seems long enough that I'm really surprised this could have happened at all; I might have expected it to happen with intervals in the 1-2 minute range at most.

Locking would be a reasonable added safeguard, and it would remain useful even when cron gets replaced, but it has a few implementation gotchas that will be tricky to get right.  Nevertheless, having a bug for it would not be a bad thing.
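
To illustrate the kind of safeguard meant here, the following is a rough sketch under stated assumptions, not a proposal for how jstart should implement it. The function name with_lock and the lock path are hypothetical. It relies only on mkdir being atomic, so that of two invocations racing through "check, then submit", only one proceeds:

```shell
#!/bin/sh
# Hypothetical wrapper, not part of jstart: run a command only if this
# caller manages to create the lock directory first.
with_lock() {
    lockdir=$1
    shift
    # mkdir either creates the directory or fails; two callers cannot
    # both succeed, so it acts as the lock.
    mkdir "$lockdir" 2>/dev/null || return 1
    status=0
    "$@" || status=$?
    # Release the lock and report the command's own exit status.
    rmdir "$lockdir"
    return "$status"
}
```

A call might look like "with_lock "$HOME/sleep-test.lock" ./check-and-submit.sh", where check-and-submit.sh is a made-up script that does the existence check and the submission. One gotcha is visible even in this small sketch: if the process is killed between mkdir and rmdir, the directory stays behind and blocks every later run until someone removes it.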

Comment 6 Liangent 2014-02-05 05:35:59 UTC

So bug 60862.
