Building pipelines using Slurm dependencies

Differences between stock sbatch and sbatch on Biowulf

Before discussing job dependencies, note that sbatch on Biowulf, and therefore in the examples below, is a wrapper script that returns just the job id. This differs from stock sbatch, which returns Submitted batch job 123456. You can think of the wrapper as doing something equivalent to:

#! /bin/bash

# call the real sbatch and capture its output
sbr="$(/path/to/real/sbatch "$@")"

# on success, print only the job id; otherwise report the failure on stderr
if [[ "$sbr" =~ Submitted\ batch\ job\ ([0-9]+) ]]; then
    echo "${BASH_REMATCH[1]}"
    exit 0
else
    echo "sbatch failed" >&2
    exit 1
fi
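
On clusters running stock sbatch, the same single-token job id can be obtained with sbatch's --parsable option, which prints just the job id (followed by the cluster name, where applicable) instead of the usual message. A minimal sketch:

# capture the job id directly; works with stock sbatch
jid=$(sbatch --parsable job1.sh)
sbatch --dependency=afterok:$jid job2.sh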
    
Introduction

Job dependencies are used to defer the start of a job until the specified dependencies have been satisfied. They are specified with the --dependency option to sbatch or swarm in the format

sbatch --dependency=<type:job_id[:job_id][,type:job_id[:job_id]]> ...

Dependency types:

after:jobid[:jobid...]       job can begin after the specified jobs have started
afterany:jobid[:jobid...]    job can begin after the specified jobs have terminated
afternotok:jobid[:jobid...]  job can begin after the specified jobs have failed
afterok:jobid[:jobid...]     job can begin after the specified jobs have run to completion with an exit code of zero (see the user guide for caveats)
singleton                    job can begin after all previously launched jobs with the same name and user have ended; this is useful to collate the results of a swarm or to send a notification at the end of a swarm (see the sketch below)
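
For instance, a collation job can be made to wait for an entire swarm by giving the swarm subjobs and the collation job the same name. A minimal sketch, assuming a (hypothetical) swarm command file process.swarm and collation script collate.sh, and that swarm's --job-name option is used to name all subjobs:

# all swarm subjobs and the collation job share the name 'myanalysis'
swarm -f process.swarm --job-name=myanalysis
sbatch --job-name=myanalysis --dependency=singleton collate.sh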

See also the Job Dependencies section of the User Guide.

To set up pipelines using job dependencies, the most useful types are afterany, afterok, and singleton. The simplest approach is to use the afterok dependency for single consecutive jobs. For example:

b2$ sbatch job1.sh
11254323
b2$ sbatch --dependency=afterok:11254323 job2.sh

Now, when job1 ends with an exit code of zero, job2 will become eligible for scheduling. However, if job1 fails (ends with a non-zero exit code), job2 will not be scheduled but will remain in the queue with reason DependencyNeverSatisfied and has to be canceled manually with scancel.
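
Alternatively, Slurm can clean up such jobs automatically: sbatch's --kill-on-invalid-dep=yes option cancels a job once its dependency can never be satisfied. A minimal sketch:

jid1=$(sbatch job1.sh)
# job2 is cancelled automatically if job1 fails
sbatch --kill-on-invalid-dep=yes --dependency=afterok:$jid1 job2.sh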

As an alternative, the afterany dependency can be used, and the check for successful execution of the prerequisites can be done in the job script itself.
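
A minimal sketch of this pattern, assuming job1.sh creates a (hypothetical) sentinel file job1.done as its final step, so that job2.sh, submitted with an afterany dependency, can verify that its prerequisite actually succeeded:

# last line of job1.sh - only reached if everything above succeeded
touch job1.done

# top of job2.sh, submitted with --dependency=afterany:<jobid of job1>
if [[ ! -e job1.done ]]; then
    echo "prerequisite job1 did not finish successfully" >&2
    exit 1
fi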

The sections below give more complicated examples of using job dependencies for pipelines in bash, Perl, and Python.

Bash

The following bash script is a stylized example of some useful patterns for using job dependencies:

#! /bin/bash

# first job - no dependencies
jid1=$(sbatch --mem=12g --cpus-per-task=4 job1.sh)

# multiple jobs can depend on a single job
jid2=$(sbatch --dependency=afterany:$jid1 --mem=20g job2.sh)
jid3=$(sbatch --dependency=afterany:$jid1 --mem=20g job3.sh)

# a single job can depend on multiple jobs
jid4=$(sbatch --dependency=afterany:$jid2:$jid3 job4.sh)

# swarm can use dependencies as well
jid5=$(swarm --dependency=afterany:$jid4 -t 4 -g 4 -f job5.sh)

# a single job can depend on an array job (such as a swarm);
# it will start executing when all array tasks have finished
jid6=$(sbatch --dependency=afterany:$jid5 job6.sh)

# a singleton job depends on all jobs by the same user with the same name
jid7=$(sbatch --dependency=afterany:$jid6 --job-name=dtest job7.sh)
jid8=$(sbatch --dependency=afterany:$jid6 --job-name=dtest job8.sh)
sbatch --dependency=singleton --job-name=dtest job9.sh

# show dependencies in squeue output
squeue -u $USER -o "%.8A %.4C %.10m %.20E"

A more complete example of a mock ChIP-seq pipeline can be found here.

And here is a simple bash script that submits a series of jobs for a benchmark test. It submits the same job with 1, 2, 4, ... 128 MPI processes. The Slurm batch script 'jobscript' uses the environment variable $SLURM_NTASKS to set the number of MPI processes the program should start. Job dependencies are used here because all the jobs write temporary files with the same names and would clobber each other if run at the same time.

#!/bin/sh

id=$(sbatch --job-name=factor9-1 --ntasks=1 --ntasks-per-core=1 \
      --output=${PWD}/results/x2650-1.slurmout jobscript)
echo "ntasks 1 jobid $id"

# each run depends on the previous one so that the jobs
# do not clobber each other's temporary files
for n in 2 4 8 16 32 64 128; do
    id=$(sbatch --depend=afterany:$id --job-name=factor9-$n --ntasks=$n \
          --ntasks-per-core=1 --output=${PWD}/results/x2650-$n.slurmout jobscript)
    echo "ntasks $n jobid $id"
done

The batch script corresponding to this example:

#!/bin/bash

module load amber/14
module list

echo "Using $SLURM_NTASKS cores"

cd /data/user/amber/factor_ix.amber10

$(which mpirun) -np $SLURM_NTASKS $(which sander.MPI) -O -i mdin -c inpcrd -p prmtop

Perl

A sample Perl script that submits 3 jobs, each dependent on the completion (in any state) of the previous job:

#!/usr/local/bin/perl

$num = 8;

# submit the first job and capture its job id
$jobnum = `sbatch --cpus-per-task=$num myjobscript`;
chomp $jobnum;                  # strip the trailing newline
print "Job number $jobnum submitted\n\n";

# each subsequent job depends on the previous one
$jobnum = `sbatch --depend=afterany:${jobnum} --cpus-per-task=8 --mem=2g mysecondjobscript`;
chomp $jobnum;
print "Job number $jobnum submitted\n\n";

$jobnum = `sbatch --depend=afterany:${jobnum} --cpus-per-task=8 --mem=2g mythirdjobscript`;
chomp $jobnum;
print "Job number $jobnum submitted\n\n";

# show the current status with 'sjobs'
system("sjobs");

Python

The sample Python script below submits 3 jobs, each dependent on the previous one, and shows the status of those jobs.

#!/usr/bin/env python3

import os
import subprocess

# submit the first job
cmd = "sbatch Job1.bat"
print("Submitting Job1 with command: %s" % cmd)
status, jobnum = subprocess.getstatusoutput(cmd)
if status == 0:
    print("Job1 is %s" % jobnum)
else:
    print("Error submitting Job1")

# submit the second job to be dependent on the first
cmd = "sbatch --depend=afterany:%s Job2.bat" % jobnum
print("Submitting Job2 with command: %s" % cmd)
status, jobnum = subprocess.getstatusoutput(cmd)
if status == 0:
    print("Job2 is %s" % jobnum)
else:
    print("Error submitting Job2")

# submit the third job (a swarm) to be dependent on the second
cmd = "swarm -f swarm.cmd --module blast --depend=afterany:%s" % jobnum
print("Submitting swarm job with command: %s" % cmd)
status, jobnum = subprocess.getstatusoutput(cmd)
if status == 0:
    print("Swarm job is %s" % jobnum)
else:
    print("Error submitting swarm job")

print("\nCurrent status:\n")
# show the current status with 'sjobs'
os.system("sjobs")

Running this script:

[user@biowulf ~]$ submit_jobs.py
Submitting Job1 with command: sbatch Job1.bat
Job1 is 25452702
Submitting Job2 with command: sbatch --depend=afterany:25452702 Job2.bat
Job2 is 25452703
Submitting swarm job with command: swarm -f swarm.cmd --module blast --depend=afterany:25452703
Swarm job is 25452706

Current status:

User    JobId            JobName   Part  St  Reason      Runtime  Walltime  Nodes  CPUs  Memory  Dependency      
==============================================================================================================
user    25452702         Job1.bat  norm  PD  ---            0:00   4:00:00      1   1   2GB/cpu
user    25452703         Job2.bat  norm  PD  Dependency     0:00   4:00:00      1   1   2GB/cpu  afterany:25452702
user    25452706_[0-11]  swarm     norm  PD  Dependency     0:00   4:00:00      1  12   1GB/node afterany:25452703
==============================================================================================================
cpus running = 0
cpus queued = 14
jobs running = 0
jobs queued = 14