Before discussing job dependencies we need to point out that sbatch on biowulf, and therefore in the examples below, is a wrapper script that returns just the jobid. This is different from stock sbatch, which returns "Submitted batch job 123456". You can think of the wrapper as doing something equivalent to:
#!/bin/bash
sbr="$(/path/to/real/sbatch "$@")"
if [[ "$sbr" =~ Submitted\ batch\ job\ ([0-9]+) ]]; then
    echo "${BASH_REMATCH[1]}"
    exit 0
else
    echo "sbatch failed"
    exit 1
fi
Job dependencies are used to defer the start of a job until the specified dependencies have been satisfied. They are specified with the --dependency option to sbatch or swarm in the format

sbatch --dependency=<type:job_id[:job_id][,type:job_id[:job_id]]> ...
Dependency types:
after:jobid[:jobid...]        job can begin after the specified jobs have started
afterany:jobid[:jobid...]     job can begin after the specified jobs have terminated
afternotok:jobid[:jobid...]   job can begin after the specified jobs have failed
afterok:jobid[:jobid...]      job can begin after the specified jobs have run to completion with an exit code of zero (see the user guide for caveats)
singleton                     job can begin after all previously launched jobs with the same name and user have ended. This is useful to collate the results of a swarm or to send a notification at the end of a swarm.
See also the Job Dependencies section of the User Guide.
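Several dependencies can be combined in a single comma-separated list, in which case all of them must be satisfied before the job can start. For example, with hypothetical jobids:

sbatch --dependency=afterok:11254323:11254324,afterany:11254325 job.sh

This job becomes eligible only after jobs 11254323 and 11254324 complete successfully and job 11254325 terminates in any state.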
To set up pipelines using job dependencies, the most useful types are afterany, afterok, and singleton. The simplest approach is to use the afterok dependency for single consecutive jobs. For example:
b2$ sbatch job1.sh
11254323
b2$ sbatch --dependency=afterok:11254323 job2.sh
Now when job1 ends with an exit code of zero, job2 will become eligible for scheduling. However, if job1 fails (ends with a non-zero exit code), job2 will not be scheduled but will remain in the queue and needs to be canceled manually.
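If you would rather not clean up such stranded jobs by hand, sbatch also accepts the --kill-on-invalid-dep option, which tells Slurm to terminate a job whose dependency can never be satisfied (for example, an afterok dependency on a job that failed):

sbatch --dependency=afterok:11254323 --kill-on-invalid-dep=yes job2.sh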
As an alternative, the afterany dependency can be used, with the check for successful execution of the prerequisites done in the job script itself.
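For example, job1.sh could touch a marker file as its last successful step, and job2.sh could test for that file before doing any real work. The snippet below is a minimal sketch of that pattern; the function name and marker file are hypothetical, not biowulf conventions:

```shell
#!/bin/bash
# check_prereq: succeed only if the given marker file exists.
# A preceding job would "touch" the marker as its last successful step.
check_prereq() {
    [[ -e "$1" ]] || { echo "prerequisite marker $1 missing" >&2; return 1; }
}

# demonstration with a temporary marker file
marker=$(mktemp -u)                 # generates a path; file not created yet
check_prereq "$marker" && echo present || echo absent
touch "$marker"                     # simulate job1 finishing successfully
check_prereq "$marker" && echo present || echo absent
rm -f "$marker"
```

In a real job script submitted with afterany, the check would sit at the top and exit non-zero when the marker is missing, so the dependent steps are skipped cleanly.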
The sections below give more complicated examples of using job dependencies for pipelines in bash, perl, and python.
The following bash script is a stylized example of some useful patterns for using job dependencies:
#!/bin/bash

# first job - no dependencies
jid1=$(sbatch --mem=12g --cpus-per-task=4 job1.sh)

# multiple jobs can depend on a single job
jid2=$(sbatch --dependency=afterany:$jid1 --mem=20g job2.sh)
jid3=$(sbatch --dependency=afterany:$jid1 --mem=20g job3.sh)

# a single job can depend on multiple jobs
jid4=$(sbatch --dependency=afterany:$jid2:$jid3 job4.sh)

# swarm can use dependencies
jid5=$(swarm --dependency=afterany:$jid4 -t 4 -g 4 -f job5.sh)

# a single job can depend on an array job
# it will start executing when all array tasks have finished
jid6=$(sbatch --dependency=afterany:$jid5 job6.sh)

# a single job can depend on all jobs by the same user with the same name
jid7=$(sbatch --dependency=afterany:$jid6 --job-name=dtest job7.sh)
jid8=$(sbatch --dependency=afterany:$jid6 --job-name=dtest job8.sh)
sbatch --dependency=singleton --job-name=dtest job9.sh

# show dependencies in squeue output:
squeue -u $USER -o "%.8A %.4C %.10m %.20E"
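Because the biowulf wrapper prints only the jobid on success, a pipeline script can guard against failed submissions by verifying that the captured value is numeric before chaining further jobs. The following is a minimal sketch; submit_or_die is a hypothetical helper, not a biowulf command:

```shell
#!/bin/bash
# submit_or_die: run a submission command, print the returned jobid,
# and exit non-zero if the output is not a numeric jobid.
submit_or_die() {
    local jid
    jid=$("$@")
    if [[ "$jid" =~ ^[0-9]+$ ]]; then
        echo "$jid"
    else
        echo "submission failed: $*" >&2
        exit 1
    fi
}

# usage in a pipeline (job scripts assumed to exist):
# jid1=$(submit_or_die sbatch --mem=12g job1.sh)
# jid2=$(submit_or_die sbatch --dependency=afterany:$jid1 job2.sh)
```

Run the pipeline script with "set -e" so that a failed submission (which makes the command substitution return non-zero) stops the whole script instead of chaining dependencies on an empty jobid.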
A more complete example of a mock chipseq pipeline can be found here.
And here is a simple bash script that will submit a series of jobs for a benchmark test. This script submits the same job with 1 MPI process, 2 MPI processes, 4 MPI processes ... 128 MPI processes. The Slurm batch script 'jobscript' uses the environment variable $SLURM_NTASKS to specify the number of MPI processes that the program should start. The reason to use job dependencies here is that all the jobs write some temporary files with the same name, and would clobber each other if run at the same time.
#!/bin/sh

id=$(sbatch --job-name=factor9-1 --ntasks=1 --ntasks-per-core=1 \
     --output=${PWD}/results/x2650-1.slurmout jobscript)
echo "ntasks 1 jobid $id"

for n in 2 4 8 16 32 64 128; do
    id=$(sbatch --depend=afterany:$id --job-name=factor9-$n --ntasks=$n \
         --ntasks-per-core=1 --output=${PWD}/results/x2650-$n.slurmout jobscript)
    echo "ntasks $n jobid $id"
done
The batch script corresponding to this example:
#!/bin/bash

module load amber/14
module list
echo "Using $SLURM_NTASKS cores"
cd /data/user/amber/factor_ix.amber10
$(which mpirun) -np $SLURM_NTASKS $(which sander.MPI) -O -i mdin -c inpcrd -p prmtop
Here is a sample perl script that submits 3 jobs, each one dependent on the completion (in any state) of the previous job:
#!/usr/local/bin/perl

$num = 8;
$jobnum = `sbatch --cpus-per-task=$num myjobscript`;
chomp $jobnum;
print "Job number $jobnum submitted\n\n";

$jobnum = `sbatch --depend=afterany:${jobnum} --cpus-per-task=8 --mem=2g mysecondjobscript`;
chomp $jobnum;
print "Job number $jobnum submitted\n\n";

$jobnum = `sbatch --depend=afterany:${jobnum} --cpus-per-task=8 --mem=2g mythirdjobscript`;
chomp $jobnum;
print "Job number $jobnum submitted\n\n";

system("sjobs");
The sample Python script below submits 3 jobs that are dependent on each other, and shows the status of those jobs.
#!/usr/bin/env python3

import os
import subprocess

# submit the first job
cmd = "sbatch Job1.bat"
print("Submitting Job1 with command: %s" % cmd)
status, jobnum = subprocess.getstatusoutput(cmd)
if status == 0:
    print("Job1 is %s" % jobnum)
else:
    print("Error submitting Job1")

# submit the second job to be dependent on the first
cmd = "sbatch --depend=afterany:%s Job2.bat" % jobnum
print("Submitting Job2 with command: %s" % cmd)
status, jobnum = subprocess.getstatusoutput(cmd)
if status == 0:
    print("Job2 is %s" % jobnum)
else:
    print("Error submitting Job2")

# submit the third job (a swarm) to be dependent on the second
cmd = "swarm -f swarm.cmd --module blast --depend=afterany:%s" % jobnum
print("Submitting swarm job with command: %s" % cmd)
status, jobnum = subprocess.getstatusoutput(cmd)
if status == 0:
    print("Swarm job is %s" % jobnum)
else:
    print("Error submitting swarm job")

print("\nCurrent status:\n")
# show the current status with 'sjobs'
os.system("sjobs")
Running this script:
[user@biowulf ~]$ submit_jobs.py
Submitting Job1 with command: sbatch Job1.bat
Job1 is 25452702
Submitting Job2 with command: sbatch --depend=afterany:25452702 Job2.bat
Job2 is 25452703
Submitting swarm job with command: swarm -f swarm.cmd --module blast --depend=afterany:25452703
Swarm job is 25452706

Current status:

User     JobId            JobName   Part  St  Reason      Runtime  Walltime  Nodes  CPUs  Memory    Dependency
==============================================================================================================
user     25452702         Job1.bat  norm  PD  ---            0:00   4:00:00      1     1  2GB/cpu
user     25452703         Job2.bat  norm  PD  Dependency     0:00   4:00:00      1     1  2GB/cpu   afterany:25452702
user     25452706_[0-11]  swarm     norm  PD  Dependency     0:00   4:00:00      1    12  1GB/node  afterany:25452703
==============================================================================================================
cpus running = 0  cpus queued = 14
jobs running = 0  jobs queued = 14