High-Performance Computing at the NIH
Swarm on Biowulf

Swarm is a script designed to simplify submitting a group of commands to the Biowulf cluster. Some programs do not scale well or can't use distributed memory. Other programs may be 'embarrassingly parallel', in that many independent jobs need to be run. These programs are well suited to running 'swarms of jobs'. The swarm script simplifies these computational problems.

Swarm reads a list of command lines (termed "commands" or "processes") from a swarm command file (termed the "swarmfile"), then automatically submits those commands to the batch system to execute. Command lines in the swarmfile should appear just as they would be entered on a Linux command line. Swarm encapsulates each command line in a single temporary command script, then submits all command scripts to the Biowulf cluster as a Slurm job array. By default, swarm runs one command per core on a node, making optimum use of a node. Thus, a node with 16 cores will run 16 commands in parallel.

For example, create a file that looks something like this (NOTE: lines that begin with a # character are interpreted as comments and are not executed):

[biowulf]$ cat file.swarm
# My first swarmfile -- this file is file.swarm
uptime
uptime
uptime
uptime

Then submit to the batch system:

[biowulf]$ swarm -f file.swarm --verbose 1
4 commands run in 4 subjobs, each command requiring 1.5 gb and 1 thread
12345

This will result in a single job (jobid 12345) of four subjobs (subjobids 0, 1, 2, 3), with each swarmfile line being run independently as a single subjob. By default, each subjob is allocated 1.5 gb of memory and 1 core (consisting of 2 cpus). The subjobs will be executed within the same directory from which the swarm was submitted.

The following diagram visualizes how the job array will look:

------------------------------------------------------------
SWARM
├── subjob 0: 1 command (1 cpu, 1.50 gb)
|   ├──  uptime
├── subjob 1: 1 command (1 cpu, 1.50 gb)
|   ├──  uptime
├── subjob 2: 1 command (1 cpu, 1.50 gb)
|   ├──  uptime
├── subjob 3: 1 command (1 cpu, 1.50 gb)
|   ├──  uptime
------------------------------------------------------------

All output will be written to that same directory. By default, swarm will create two output files for each independent subjob, one for STDOUT and one for STDERR. The format is name_jobid_subjobid.{e,o}:

[biowulf]$ ls
file.swarm       swarm_12345_0.o  swarm_12345_1.o  swarm_12345_2.o  swarm_12345_3.o
swarm_12345_0.e  swarm_12345_1.e  swarm_12345_2.e  swarm_12345_3.e
Usage
Usage: swarm [swarm options] [sbatch options]

  -f,--file [file]       name of file with list of command lines to execute,
                         with a single command line per subjob

  -g,--gb-per-process    gb per process (can be fractions of GB, e.g. 3.5)
  [float]

  -t,                    threads per process (can be an integer or the word
  --threads-per-process  auto).  This option is only valid for multi-
  [int]/"auto"           threaded swarms (-p 1).

  -p,                    processes per subjob (default = 1).  This option is
  --processes-per-subjob only valid for single-threaded swarms (-t 1).
  [int]                  
                      
  --noht                 don't use hyperthreading, equivalent to slurm option
                         --threads-per-core=1

  -b,--bundle [int]      bundle more than one command line per subjob and run
                         sequentially

  --usecsh               use tcsh as the shell instead of bash
  --err-exit             exit the subjob immediately on first non-zero exit status

  --module               provide a list of environment modules to load
                         prior to execution, comma-delimited
  --no-comment           don't ignore text following comment character #
  --comment-char [chr]   use something other than # as the comment character

  --logdir               directory to which .o and .e files are to be written
                         (default is current working directory)

  --maxrunning           limit the number of simultaneously running subjobs

Development options:

  --help                 print this help message
  --keep-scripts         don't remove temporary swarm scripts
  --no-scripts           don't create temporary swarm scripts (with --debug
                         or --devel)
  --debug                don't actually run 
  --devel                combine --debug and --no-scripts, and be very chatty
  --verbose [int]        can range from 0 to 6, with 6 the most verbose
  --silent               don't give any feedback, just jobid

sbatch options:

  --job-name [str]       set the name of the job
  --dependency [str]     set up dependency (i.e. run swarm before or after)
  --time [str]           change the walltime for each subjob (default is
                         02:00:00, or 2 hours).  If a swarm is bundled,
                         then the time is multiplied by the bundle factor.
  --license [str]        obtain software licenses (e.g. --license=matlab)
  --partition [str]      change the partition (default is norm)
  --gres [str]           set generic resources for swarm
  --qos [str]            set quality of service for swarm

Other sbatch options

  --sbatch [string]      add sbatch-specific options to swarm.  These options
                         will be added last, which means that swarm options
                         for allocation of cpus and memory take precedence.

Environment variables

  The following environment variables will affect how sbatch allocates
  resources:

  SBATCH_JOB_NAME        Same as --job-name
  SBATCH_PARTITION       Same as --partition
  SBATCH_QOS             Same as --qos
  SBATCH_TIMELIMIT       Same as --time
  SBATCH_EXCLUSIVE       Same as --exclusive
Details
A node consists of a hierarchy of resources.
  • A socket is a receptacle on the motherboard for one physically packaged processor, which can contain one or more cores.
  • A core is a complete private set of registers, execution units, and retirement queues needed to execute programs. Nodes on the biowulf cluster can have 8, 16, or 32 cores.
  • A cpu has the attributes of one core, but is managed and scheduled as a single logical processor by the operating system. Hyperthreading is the implementation of multiple cpus on a single core. All nodes on the biowulf cluster have hyperthreading enabled, with 2 cpus per core.

Slurm allocates on the basis of cores. The smallest subjob runs on a single core, meaning the smallest number of cpus that swarm can allocate is 2.

Swarm reads a swarmfile and creates a single subjob per line. By default a subjob is allocated to a single core. Each line from a swarmfile has access to 2 cpus. Running swarm with the option -t 2 is thus no different than running swarm without the -t option, as both cpus (hyperthreads) are available to each subjob.
If commands in the swarmfile are multi-threaded, passing the -t option guarantees enough cpus will be available to the generated slurm subjobs. For example, if the commands require either 3 or 4 threads, giving the -t 3 or -t 4 option allocates 2 cores per subjob.
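
The rounding from threads to cpus described above can be sketched with shell arithmetic. This is only an illustration of the rule as stated here (2 cpus per core, allocation in whole cores); the function name cpus_for_threads is hypothetical and is not part of swarm.

```shell
# Illustrative only: estimate the cpus allocated for a given -t value,
# assuming 2 cpus (hyperthreads) per core and allocation in whole cores.
cpus_for_threads() {
    local threads="$1"
    local cores=$(( (threads + 1) / 2 ))   # round up to whole cores
    echo $(( cores * 2 ))
}

for t in 1 2 3 4 8; do
    echo "-t $t -> $(cpus_for_threads "$t") cpus"
done
```

Under these assumptions, -t 1 and -t 2 both map to 2 cpus, while -t 3 and -t 4 both map to 4 cpus.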

The nodes on the biowulf cluster are configured to constrain threads within the cores the subjob is allocated. Thus, if a multi-threaded command exceeds the cpus available, the command will run much slower than normal! This may not be reflected in the overall cpu load for the node.

Memory is allocated per subjob by swarm, and is strictly enforced by slurm. If a single subjob exceeds its memory allocation (by default 1.5 GB per swarmfile line), then the subjob will be killed by the batch system. See below for examples on how to allocate threads and memory.

More than one swarmfile line can be run per subjob using the -p option. This is only valid for single-threaded swarms (i.e. -t 1). Under these circumstances, all cpus are used. See below for more information on -p.

Input

The swarmfile

The only required input for swarm is a swarmfile, as designated by the -f or --file option. Each line in the swarmfile is run as a single command. For example, the swarmfile file.swarm

[biowulf]$ cat file.swarm
uptime
uptime
uptime
uptime

when submitted like this

[biowulf]$ swarm -f file.swarm

will create a swarm of 4 subjobs, with each subjob running the single command "uptime".

Bundling

There are occasions when running a single swarmfile line per subjob is inappropriate, such as when commands are very short (e.g. a few seconds) or when there are many thousands or millions of commands in a swarmfile. In these circumstances, it makes more sense to bundle the swarm. For example, a swarmfile of 10,000 commands when run with a bundle value of 40 will generate 250 subjobs (10000/40 = 250):

[biowulf]$ swarm --devel -f x -b 40
10000 commands run in 250 subjobs, each requiring 1 gb and 1 thread, running 40 commands serially per subjob

NOTE: If a swarmfile results in more than 1000 subjobs, swarm will automatically bundle the commands. In previous versions of swarm, this was enabled with --autobundle; this is now the default.
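
The number of subjobs produced by a given bundle value is the number of commands divided by the bundle value, rounded up. A minimal sketch of that arithmetic follows; the helper name subjobs_for_bundle is hypothetical, and swarm itself reports the real figure when run with --devel.

```shell
# Illustrative only: subjobs = ceil(commands / bundle), in integer arithmetic.
subjobs_for_bundle() {
    local commands="$1"
    local bundle="$2"
    echo $(( (commands + bundle - 1) / bundle ))
}

subjobs_for_bundle 10000 40   # prints 250, matching the example above
subjobs_for_bundle 10001 40   # prints 251; the last subjob runs a single command
```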

Comments

By default, any text on a single line that follows a # character is assumed to be a comment, and is ignored. For example,

[biowulf]$ cat file.swarm
# Here are my commands
uptime      # this gives the current load status
pwd         # this gives the current working directory
hostname    # this gives the host name

However, there are some applications that require a # character in the input:

[biowulf]$ cat odd.file.swarm
bogus_app -n 365#AX -w -another-flag=nonsense > output

The option --no-comment can be given to avoid removal of text following the # character. Alternatively, another comment character can be designated using the --comment-char option.

Command lists

Multiple commands can be run serially (one after the other) when they are separated by a semi-colon (;). This is also known as a command list. For example,

[biowulf]$ cat file.swarm
hostname ; date ; sleep 200 ; uptime
hostname ; date ; sleep 200 ; uptime
hostname ; date ; sleep 200 ; uptime
hostname ; date ; sleep 200 ; uptime

[biowulf]$ swarm -f file.swarm

will create 4 subjobs, each running independently on a single core. Each subjob will run "hostname", followed by "date", then "sleep 200", then "uptime", all in order.

Complex commands

Environment variables can be set, directory locations can be changed, subshells can be spawned, and conditional statements can be given, all within a single command list. For example, if you wanted to run some commands in a newly created random temporary directory, you could use this:

[biowulf]$ cat file.swarm
export d=/data/user/${RANDOM} ; mkdir -p $d ; if [[ -d $d ]] ; then cd $d && pwd ; else echo "FAIL" >&2 ; fi
export d=/data/user/${RANDOM} ; mkdir -p $d ; if [[ -d $d ]] ; then cd $d && pwd ; else echo "FAIL" >&2 ; fi
export d=/data/user/${RANDOM} ; mkdir -p $d ; if [[ -d $d ]] ; then cd $d && pwd ; else echo "FAIL" >&2 ; fi
export d=/data/user/${RANDOM} ; mkdir -p $d ; if [[ -d $d ]] ; then cd $d && pwd ; else echo "FAIL" >&2 ; fi

NOTE: By default, command lists are interpreted as bash commands. If a swarmfile contains tcsh- or csh-specific commands, swarm may fail unless --usecsh is included.

Line continuation markers

Application commands can be very long, with dozens of options and flags, and multiple commands separated by semi-colons. To ease file editing, line continuation markers can be used to break up the single swarm commands into multiple lines. For example, the swarmfile

cd /data/user/project; KMER="CCCTAACCCTAACCCTAA"; jellyfish count -C -m ${#KMER} -t 32 -c 7 -s 1000000000 -o /lscratch/$SLURM_JOB_ID/39sHMC_Tumor_genomic <(samtools bam2fq /data/user/bam/0A4HMC/DNA/genomic/39sHMC_genomic.md.bam ); echo ${KMER} | jellyfish query /lscratch/$SLURM_JOB_ID/39sHMC_Tumor_genomic_0 > 39sHMC_Tumor_genomic.telrpt.count

can be written like this:

cd /data/user/project; KMER="CCCTAACCCTAACCCTAA"; \
jellyfish count -C \
  -m ${#KMER} \
  -t 32 \
  -c 7 \
  -s 1000000000 \
  -o /lscratch/$SLURM_JOB_ID/39sHMC_Tumor_genomic \
  <(samtools bam2fq /data/user/bam/0A4HMC/DNA/genomic/39sHMC_genomic.md.bam ); \
echo ${KMER} | jellyfish query /lscratch/$SLURM_JOB_ID/39sHMC_Tumor_genomic_0 > 39sHMC_Tumor_genomic.telrpt.count
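
When a swarmfile uses continuation markers, the number of physical lines no longer equals the number of commands. The sketch below estimates the command count by joining lines that end in a backslash and skipping blank lines and lines that begin with #. It is an approximation written for this page (count_commands and demo.swarm are hypothetical names), not swarm's own parser; swarm --devel gives the authoritative count.

```shell
# Illustrative only: count logical commands in a swarmfile.
count_commands() {
    awk '
        { line = line $0 }                        # append the physical line
        /\\$/ { sub(/\\$/, "", line); next }      # continuation marker: keep reading
        { if (line !~ /^[ \t]*(#|$)/) n++; line = "" }
        END { print n + 0 }
    ' "$1"
}

cat > demo.swarm <<'EOF'
# two commands, the first split across three lines
cmdA -x 1 \
     -y 2 \
     -z 3
cmdB -x 4
EOF

count_commands demo.swarm   # prints 2
```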

Modules

Environment modules can be loaded for an entire swarm using the --module option, which takes a comma-delimited list of modules. For example:

swarm -f file.swarm --module python,tophat,ucsc,samtools,vcftools -g 4 -t 8
Output

Default output files

STDOUT and STDERR output from subjobs executed under swarm will be directed to files named swarm_jobid_subjobid.o and swarm_jobid_subjobid.e, respectively.

Please pay attention to the memory requirements of your swarm jobs! When a swarm job runs out of memory, the node stalls and the job is eventually killed or dies. At the bottom of the .e file, you may see a warning like this:

slurmstepd: Exceeded job memory limit at some point. Job may have been partially swapped out to disk.

If a job dies before it is finished, this output may not be available. Contact staff@hpc.nih.gov when you have a question about why a swarm stopped prematurely.

Renaming output files

The sbatch option --job-name can be used to rename the default output files.

[biowulf]$ swarm -f file.swarm --job-name programAOK
...
[biowulf]$ ls
programAOK_21381_0.e  programAOK_21381_2.e  programAOK_21381_4.e  programAOK_21381_6.e
programAOK_21381_0.o  programAOK_21381_2.o  programAOK_21381_4.o  programAOK_21381_6.o
programAOK_21381_1.e  programAOK_21381_3.e  programAOK_21381_5.e  programAOK_21381_7.e
programAOK_21381_1.o  programAOK_21381_3.o  programAOK_21381_5.o  programAOK_21381_7.o

Writing output files to a separate directory

By default, the STDOUT and STDERR files are written to the same directory from which the swarm was submitted. To redirect the files to a different directory, use --logdir:

swarm -f files.swarm --logdir /path/to/another/directory

Redirecting output

Input/output redirects (and everything in the swarmfile) should be bash compatible. For example,

[biowulf]$ cat bash_file.swarm
program1 -o -f -a -n 1 > output1.txt 2>&1
program1 -o -f -a -n 2 > output2.txt 2>&1
[biowulf]$ swarm -f bash_file.swarm

csh-style redirects like 'program >& output' will not work correctly unless the --usecsh option is included. For example,

[biowulf]$ cat csh_file.swarm
program1 -o -f -a -n 1 >& output1.txt
program1 -o -f -a -n 2 >& output2.txt
[biowulf]$ swarm --usecsh -f csh_file.swarm

Be aware of programs that write directly to a file using a fixed filename. A file will be overwritten and garbled if multiple processes are writing to the same file. If you run multiple instances of such programs then for each instance you will need to a) change the name of the file in each command or b) alter the path to the file. See the EXAMPLES section for some ideas.

Examples
To see how swarm works, first create a file containing a few simple commands, then use swarm to submit them to the batch queue:
[biowulf]$ cat > cmdfile
date
hostname
ls -l
^D
[biowulf]$ swarm -f cmdfile

Use sjobs to monitor the status of your request; an "R" in the "St"atus column indicates your job is running. This particular example will probably run to completion before you can give the sjobs command. To see the output from the commands, see the files named swarm_#_#.o.



A program that reads from STDIN and writes to STDOUT

For each invocation of the program the names for the input and output files vary:

[biowulf]$ cat > runbix
./bix < testin1 > testout1
./bix < testin2 > testout2
./bix < testin3 > testout3
./bix < testin4 > testout4
^D
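
For more than a handful of inputs, the same swarmfile can be generated with a loop rather than typed by hand. A minimal sketch, assuming the testinN/testoutN naming shown above:

```shell
# Write one command per input file into the swarmfile runbix.
: > runbix                      # start with an empty file
for i in 1 2 3 4; do
    echo "./bix < testin$i > testout$i" >> runbix
done
cat runbix
```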


Bundling large numbers of commands

If you have over 1000 commands, especially if each one runs for a short time, you should 'bundle' your jobs with the -b flag. If the swarmfile contains 2500 commands, the following swarm command will group them into bundles of 40 commands each, producing 63 subjobs (2500/40 = 62.5, rounded up); the last subjob runs the remaining 20 commands.

[biowulf]$ swarm -f cmdfile -b 40

Note that commands in a bundle will run sequentially on the assigned node.



Allocating memory and threads with -g and -t options

If the subjobs require significant amounts of memory (> 1.5 GB) or threads (> 1 per core), a swarm can run fewer subjobs per node than the number of cores available on a node. For example, if the commands in a swarmfile need up to 40 GB of memory each using 8 threads, running swarm with --devel shows what might happen:

[biowulf]$ swarm -f swarmfile -g 40 -t 8 --devel
14 commands run in 14 subjobs, each requiring 40 gb and 8 threads

If a command needs to use as many cpus on a node as possible, then the option -t auto should be added. This causes each subjob in the swarmfile to allocate an entire node exclusively to the subjob, allowing the subjob to use all available cpus on the node.

The default partition norm has nodes with a maximum of 120GB memory. If -g exceeds 120GB, swarm will give an error message:

[biowulf]$ swarm -f swarmfile -g 130
ERROR: -g 130 requires --partition largemem

To allocate more than 120GB of memory per command, include --partition largemem:

[biowulf]$ swarm -f swarmfile -g 500 --partition largemem

For more information about partitions, please see https://hpc.nih.gov/docs/userguide.html#partitions
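
In a wrapper script, the partition can be chosen from the memory request. The sketch below assumes the 120GB limit for norm quoted above, which may change over time; partition_for_gb is a hypothetical helper, and the script only prints the swarm command it would run.

```shell
# Illustrative only: pick a partition from the -g value.
# awk is used because -g may be fractional (e.g. 3.5), which shell integer tests reject.
partition_for_gb() {
    awk -v gb="$1" 'BEGIN { print ((gb > 120) ? "largemem" : "norm") }'
}

gb=130
echo "swarm -f swarmfile -g $gb --partition $(partition_for_gb "$gb")"
```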



Using -p option

By default, swarm allocates a single command line per subjob. If the command is single-threaded, then swarm wastes half the cpus allocated, because the slurm batch system allocates no less than a single core (or two cpus) per subjob. This effect can be seen using the jobload command for a 4-command swarm:

[biowulf]$ swarm -f swarmfile
219433
[biowulf]$ jobload -u user
         JOBID         TIME   NODES   CPUS  THREADS   LOAD             MEMORY
                                     Alloc  Running                Used/Alloc
      219433_3         0:37  cn0070      2        1    50%      1.0 GB/1.5 GB
      219433_2         0:37  cn0070      2        1    50%      1.0 GB/1.5 GB
      219433_1         0:37  cn0069      2        1    50%      1.0 GB/1.5 GB
      219433_0         0:37  cn0069      2        1    50%      1.0 GB/1.5 GB

USER SUMMARY
     Jobs: 2
    Nodes: 2
     CPUs: 4
 Load Avg: 50%

To use all the cpus allocated to a single-threaded swarm, use the -p option to set the number of commands run per subjob. With -p 2, half as many subjobs are created, each using twice as many cpus and twice as much memory:

[biowulf]$ swarm -f swarmfile -p 2
219434
[biowulf]$ jobload -u user
         JOBID         TIME   NODES   CPUS  THREADS   LOAD             MEMORY
                                     Alloc  Running                Used/Alloc
      219434_1         0:24  cn0069      2        2   100%      2.0 GB/3.0 GB
      219434_0         0:24  cn0069      2        2   100%      2.0 GB/3.0 GB

USER SUMMARY
     Jobs: 2
    Nodes: 2
     CPUs: 4
 Load Avg: 100%

NOTE: The cpus on the biowulf cluster are hyperthreads, and some programs run less efficiently when two processes share a core. Please test your application to see if it actually benefits from running two commands per core rather than one.

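Keep in mind that with -p, swarm writes a separate pair of output files for each process, with an extra index for the process within the subjob (name_jobid_subjobid_processid.{e,o}):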

[biowulf]$ swarm -f ../file.swarm -p 2
14 commands run in 7 subjobs, each command requiring 1.5 gb and 1 thread, packing 2 processes per subjob
221574
[biowulf]$ ls
swarm_221574_0_0.e  swarm_221574_1_1.e  swarm_221574_3_0.e  swarm_221574_4_1.e  swarm_221574_6_0.e
swarm_221574_0_0.o  swarm_221574_1_1.o  swarm_221574_3_0.o  swarm_221574_4_1.o  swarm_221574_6_0.o
swarm_221574_0_1.e  swarm_221574_2_0.e  swarm_221574_3_1.e  swarm_221574_5_0.e  swarm_221574_6_1.e
swarm_221574_0_1.o  swarm_221574_2_0.o  swarm_221574_3_1.o  swarm_221574_5_0.o  swarm_221574_6_1.o
swarm_221574_1_0.e  swarm_221574_2_1.e  swarm_221574_4_0.e  swarm_221574_5_1.e
swarm_221574_1_0.o  swarm_221574_2_1.o  swarm_221574_4_0.o  swarm_221574_5_1.o
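
The listing above suggests that with -p each process gets its own .o and .e file, indexed by subjob and then by process within the subjob. The sketch below generates the file names one would expect under that pattern, for example to check after a run that nothing is missing. The helper expected_outputs is hypothetical, and the assignment of commands to subjobs in swarmfile order is an assumption.

```shell
# Illustrative only: list the output files expected from a swarm run with -p.
# $1 = job name, $2 = jobid, $3 = number of commands, $4 = value given to -p
expected_outputs() {
    i=0
    while [ "$i" -lt "$3" ]; do
        stem="${1}_${2}_$(( i / $4 ))_$(( i % $4 ))"
        echo "$stem.o"
        echo "$stem.e"
        i=$(( i + 1 ))
    done
}

expected_outputs swarm 221574 14 2 | sort
```

For 14 commands and -p 2 this lists 28 names, from swarm_221574_0_0.e to swarm_221574_6_1.o, the same set shown in the directory listing.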


Setting walltime with --time

By default all jobs and subjobs have a walltime of 2 hours. If a swarm subjob exceeds its walltime, it will be killed! On the other hand, if your swarm subjobs have a very short walltime, then their priority on the queue may be elevated. Therefore, it is best practice to set a walltime using the --time option that reflects the estimated execution time of the subjobs. For example, if the command lines in a swarm are expected to require no more than half an hour to complete, the swarm command should be:

[biowulf]$ swarm -f swarmfile --time 00:30:00

Because a subjob is typically running a single command from the swarmfile, the value of --time can be considered the amount of time to run a single command. When a swarm is bundled, the value for --time is then multiplied by the bundle factor. For example, if a swarm of 14 commands is bundled to run 4 commands serially per subjob, the value of --time is multiplied by 4:

[biowulf]$ swarm -f swarmfile --time 00:30:00 -b 4 --devel
14 commands run in 3 subjobs, each command requiring 1.5 gb and 1 thread, running 4 processes serially per subjob, allocating 3 cpus
sbatch --array=0-3 --job-name=swarm --time=2:00:00 --cpus-per-task=2 --partition=norm --mem=1536
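
The walltime requested for a bundled subjob can be estimated ahead of time by multiplying the per-command time by the bundle factor. The sketch below handles only the HH:MM:SS form used in these examples (Slurm accepts other time formats as well); bundled_walltime is a hypothetical helper.

```shell
# Illustrative only: multiply an HH:MM:SS walltime by a bundle factor.
# $1 = per-command time as HH:MM:SS, $2 = bundle factor
bundled_walltime() {
    awk -v t="$1" -v n="$2" 'BEGIN {
        split(t, p, ":")
        total = (p[1] * 3600 + p[2] * 60 + p[3]) * n
        printf "%02d:%02d:%02d\n", int(total / 3600), int((total % 3600) / 60), total % 60
    }'
}

bundled_walltime 00:30:00 4   # prints 02:00:00
```

Comparing the result with the walltime limit of the chosen partition shows in advance whether a bundled swarm is likely to be rejected.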

If a swarm has more than 1000 commands and is autobundled, there is a chance that the time requested will exceed the maximum allowed. In that case, an error will be thrown:

ERROR: Total time for bundled commands is greater than partition walltime limit.
Try lowering the time per command (--time=04:00:00), lowering the bundle factor
(if not autobundled), picking another partition, or splitting up the swarmfile.

See the Biowulf User Guide for a discussion of walltime limits.



Handling job dependencies

If a swarm is run as a single step in a pipeline, job dependencies can be handled with the --dependency option. For example, a first script (first.sh) is to be run to generate some initial data files. Once this job is finished, a swarm of commands (swarmfile.txt) is run to take the output of the first script and process it. Then, a last script (last.sh) is run to consolidate the output of the swarm and further process it into its final form.

Below, the swarm is run with a dependency on the first script. Then the last script is run with a dependency on the swarm. The swarm will sit in a pending state until the first job (10001) is completed, and the last job will sit in a pending state until the entire swarm (10002) is completed.

[biowulf]$ sbatch first.sh
10001
[biowulf]$ swarm -f swarmfile.txt --dependency afterany:10001
10002
[biowulf]$ sbatch --dependency=afterany:10002 last.sh
10003

The jobid of a job can be captured from the sbatch command and passed to subsequent submissions in a script (master.sh). For example, here is a bash script which automates the above procedure, passing the variable $id to the first script. In this way, the master script can be reused for different inputs:

[biowulf]$ cat master.sh
#!/bin/bash
id=$1    # input identifier given on the command line
jobid1=$(sbatch first.sh "$id")
echo $jobid1
jobid2=$(swarm -f swarmfile.txt --dependency afterany:$jobid1)
echo $jobid2
jobid3=$(sbatch --dependency=afterany:$jobid2 last.sh)
echo $jobid3

Now, master.sh can be submitted with a single argument

[biowulf]$ bash master.sh mydata123
10001
10002
10003
[biowulf]$

You can check on the job status using squeue:

[biowulf]$ squeue -u user
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
       10002_[0-3]      norm    swarm     user PD       0:00      1 (Dependency)
             10003      norm  last.sh     user PD       0:00      1 (Dependency)
             10001      norm first.sh     user  R       0:33      1 cn0121

The dependency key 'afterany' means run only after the entire job finishes, regardless of its exit status. Swarm passes the exit status of the last command executed back to Slurm, and Slurm consolidates all the exit statuses of the subjobs in the job array into a single exit status.

The final statuses for the jobs can be seen with sacct. The individual subjobs from swarm are designated by jobid_subjobid:

[biowulf]$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
10001          first.sh       norm       user          2  COMPLETED      0:0
10001.batch       batch                  user          1  COMPLETED      0:0
10002_3           swarm       norm       user          2     FAILED      2:0
10002_3.bat+      batch                  user          1     FAILED      2:0
10003           last.sh       norm       user          2  COMPLETED      0:0
10003.batch       batch                  user          1  COMPLETED      0:0
10002_0           swarm       norm       user          2  COMPLETED      0:0
10002_0.bat+      batch                  user          1  COMPLETED      0:0
10002_1           swarm       norm       user          2  COMPLETED      0:0
10002_1.bat+      batch                  user          1  COMPLETED      0:0
10002_2           swarm       norm       user          2  COMPLETED      0:0
10002_2.bat+      batch                  user          1  COMPLETED      0:0

If any of the subjobs in the swarm failed, the job is marked as FAILED. In almost all cases, it is better to rely on afterany rather than afterok, since the latter may cause the dependent job to remain queued forever:

[biowulf]$ sjobs
                                                       ................Requested............................
User       JobId    JobName    Part  St      Runtime   Nodes  CPUs     Mem        Dependency     Features             Nodelist
user       10003    last.sh    norm   PD          0:00   1       1   2.0GB/cpu   afterok:10002_*   (null)               (DependencyNeverSatisfied)

See the Biowulf User Guide, or SchedMD for a discussion on how Slurm handles exit codes.

NOTE: Setting -p causes multiple commands to run per subjob. Because of this, the exit status of the subjob can come from any of the multiple processes in the subjob.
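
To find which subjobs of a swarm did not complete, the sacct listing can be filtered. The sketch below assumes the column layout shown in the sacct example above (State in the sixth column of the jobid_subjobid rows); incomplete_subjobs is a hypothetical helper, and the sample rows are copied from that listing so the example runs without access to the cluster.

```shell
# Illustrative only: print array subjobs whose State is not COMPLETED.
# Rows such as 10002_3.bat+ are skipped because their first field
# does not match the plain jobid_subjobid form.
incomplete_subjobs() {
    awk '$1 ~ /^[0-9]+_[0-9]+$/ && $6 != "COMPLETED" { print $1, $6, $7 }'
}

incomplete_subjobs <<'EOF'
10002_3           swarm       norm       user          2     FAILED      2:0
10002_3.bat+      batch                  user          1     FAILED      2:0
10002_0           swarm       norm       user          2  COMPLETED      0:0
10002_0.bat+      batch                  user          1  COMPLETED      0:0
EOF
```

On the cluster, the same filter could be fed from sacct directly, for example sacct -j 10002 | incomplete_subjobs. Subjobs that are still pending or running would also be listed, since their state is not COMPLETED.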



A program that writes to a fixed filepath

If a program writes to a fixed filename, then you may need to run the program in different directories. First create the necessary directories (for instance run1, run2), and in the swarmfile cd to the unique output directory before running the program. The cd can use either an absolute path beginning with "/" or a path relative to the directory from which the swarm was submitted. Lines with leading "#" are considered comments and ignored.

[biowulf]$ cat > batchcmds
# Run ped program using different directory
# for each run
cd pedsystem/run1; ../ped
cd pedsystem/run2; ../ped
cd pedsystem/run3; ../ped
cd pedsystem/run4; ../ped
...

[biowulf]$ swarm -f batchcmds
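
The run directories and the swarmfile can be created together in one loop. A minimal sketch, assuming the pedsystem/runN layout used above and four runs:

```shell
# Create one directory per run and write the matching swarmfile line.
: > batchcmds                   # start with an empty swarmfile
for i in 1 2 3 4; do
    mkdir -p "pedsystem/run$i"
    echo "cd pedsystem/run$i; ../ped" >> batchcmds
done
cat batchcmds
```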


Running mixed asynchronous and serial commands in a swarm

There are occasions when a single swarm command contains a mixture of asynchronous and serial commands, for example when the results of several commands are collated into a single output and another command is then run on the pooled results. If run interactively, it would look like this:

[biowulf]$ cmdA < inp.1 > out.1
[biowulf]$ cmdA < inp.2 > out.2
[biowulf]$ cmdA < inp.3 > out.3
[biowulf]$ cmdA < inp.4 > out.4
[biowulf]$ cmdB -i out.1 -i out.2 -i out.3 -i out.4 > final_result

It would be more efficient if the four cmdA commands could run asynchronously (in parallel), and then the last cmdB command would wait until they were all done and then run, all on the same node and in the same swarm command. This can be achieved by running the cmdA commands as background processes in a subshell, using this one-liner in a swarmfile:

( cmdA < inp.1 > out.1 & cmdA < inp.2 > out.2 & \
  cmdA < inp.3 > out.3 & cmdA < inp.4 > out.4 & wait ) ; \
  cmdB -i out.1 -i out.2 -i out.3 -i out.4 > final_result

Here, the cmdA commands are all run asynchronously in four background processes, and the wait command is given to prevent cmdB from running until all the background processes are finished. Note that line continuation markers were used for easier editing.
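
The pattern can be tried outside of swarm with stand-in commands. In the sketch below, cmdA and cmdB are hypothetical shell functions (cmdA takes one second to upper-case its input, cmdB concatenates the files given with -i); only the subshell, the background processes, and the wait are the point of the example.

```shell
# Hypothetical stand-ins so the example runs anywhere.
cmdA() { sleep 1; tr 'a-z' 'A-Z'; }
cmdB() { for arg in "$@"; do [ "$arg" = "-i" ] || cat "$arg"; done; }

for i in 1 2 3 4; do echo "input $i" > "inp.$i"; done

start=$(date +%s)
( cmdA < inp.1 > out.1 & cmdA < inp.2 > out.2 & \
  cmdA < inp.3 > out.3 & cmdA < inp.4 > out.4 & wait ) ; \
  cmdB -i out.1 -i out.2 -i out.3 -i out.4 > final_result
end=$(date +%s)

cat final_result
echo "elapsed: $(( end - start )) seconds"
```

Because the four cmdA processes run at the same time, the elapsed time should be close to that of a single cmdA rather than four, and final_result contains all four outputs because cmdB does not start until wait returns.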



Using --module option

It is sometimes difficult to set the environment properly before running commands. The easiest way to do this on Biowulf is with environment modules. Running commands via swarm complicates the issue, because the modules must be loaded prior to every line in the swarmfile. Instead, you can use the --module option to load a list of modules:

[biowulf]$ swarm -f testfile --module ucsc,matlab,python/2.7.1

Here, the environment is set to use the UCSC executables, Matlab, and an older, non-default version of Python.



Using --gres option

Local scratch disk space is NOT automatically available under Slurm. Instead, local scratch disk space is allocated using --gres. Here is an example of how to allocate 200GB of local scratch disk space for each swarm command:

[biowulf]$ swarm -f swarmfile --gres=lscratch:200

Including --gres=lscratch:N, where N is the number of GB required, will create a subdirectory on the node corresponding to the jobid, e.g.:

/lscratch/987654/

This local scratch directory can be accessed dynamically using the $SLURM_JOB_ID environment variable:

/lscratch/$SLURM_JOB_ID/

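Local scratch space is allocated per job. By default, that means each command or command list (single line in swarmfile) is allocated its own independent local scratch space. HOWEVER, some thought must be given to local scratch space in the two situations where more than one command shares a subjob, namely when a swarm is bundled (-b) or packed (-p): the commands in such a subjob share a single /lscratch/$SLURM_JOB_ID directory and a single space allocation.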



Setting environment variables

If an entire swarm requires one or more environment variables to be set, the sbatch option --export can be used to set the variables prior to running. In this example, we need to set the BOWTIE_INDEXES environment variable to the correct path for all subjobs in the swarm:

[biowulf]$ swarm -f swarmfile --sbatch "--export=BOWTIE_INDEXES=/fdb/igenomes/Mus_musculus/UCSC/mm9/Sequence/BowtieIndex/"

NOTE: Environment variables set with the --sbatch "--export=" option are defined PRIOR to the job being submitted. This prevents setting environment variables using Slurm-generated environment variables, such as $SLURM_JOB_ID or $SLURM_MEM_PER_NODE.

However, if each command line in the swarm requires a unique set of environment variables, this must be done in the swarmfile. For example, setting TMPDIR to a unique subdirectory of /lscratch/$SLURM_JOB_ID:

[biowulf]$ cat swarmfile
export TMPDIR=/lscratch/$SLURM_JOB_ID/xyz1; mkdir $TMPDIR; cmdxyz -x 1 -y 1 -z 1
export TMPDIR=/lscratch/$SLURM_JOB_ID/xyz2; mkdir $TMPDIR; cmdxyz -x 2 -y 2 -z 2
export TMPDIR=/lscratch/$SLURM_JOB_ID/xyz3; mkdir $TMPDIR; cmdxyz -x 3 -y 3 -z 3
export TMPDIR=/lscratch/$SLURM_JOB_ID/xyz4; mkdir $TMPDIR; cmdxyz -x 4 -y 4 -z 4
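
When a swarmfile like this is generated by a script, $SLURM_JOB_ID and $TMPDIR must be written literally so that they are expanded when the subjob runs, not when the swarmfile is created. A minimal sketch, in which the backslash before each $ defers the expansion while $i is expanded immediately:

```shell
# Generate the swarmfile shown above. \$ keeps the variable name literal in the
# file; $i is replaced by the loop counter at generation time.
: > swarmfile
for i in 1 2 3 4; do
    echo "export TMPDIR=/lscratch/\$SLURM_JOB_ID/xyz$i; mkdir \$TMPDIR; cmdxyz -x $i -y $i -z $i" >> swarmfile
done
cat swarmfile
```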

Further, if individual commands within each command line require unique environment variables, this can be done by prefacing the command itself with the variable set:

[biowulf]$ cat swarmfile
MYENV=1 command ; MYENV=2 command ; MYENV=3 command
MYENV=4 command ; MYENV=5 command ; MYENV=6 command
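
A variable set as a prefix in this way is visible only to the command it precedes. The short demonstration below uses bash -c as a stand-in for "command":

```shell
# The prefix assignment applies to one command only and does not persist.
unset MYENV
MYENV=1 bash -c 'echo "first command sees MYENV=$MYENV"'
MYENV=2 bash -c 'echo "second command sees MYENV=$MYENV"'
echo "afterwards MYENV is ${MYENV:-unset}"
```

The first two lines report 1 and 2 respectively, and the last line reports that MYENV is unset in the surrounding shell.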


Using sbatch flags

Swarm creates a single job array via the sbatch command; all valid sbatch command-line options are also valid for swarm. However, they must be passed with a single --sbatch option, surrounded by quotation marks. In this example some extra sbatch options are added.

[biowulf]$ swarm -f testfile --sbatch "--mail-type=FAIL --export=var=100,nctype=12 --workdir=/data/user/test"

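In this case, mail is sent when a subjob fails, the variables var and nctype are exported to the subjobs with the values 100 and 12, and the subjobs run in the directory /data/user/test.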



Using --devel and --verbose options

Before submitting a large complex swarm to the batch system, it is better to see what would happen before it's too late. In this case, the --devel option will display a good deal of information. This example shows a huge number of commands autobundled to run 346 command lines serially per core.

[biowulf]$ swarm -f file.swarm --devel
345029 commands run in 998 subjobs, each requiring 1 gb and 1 thread, running 346 commands serially per subjob

--verbose accepts a number between 0 (the same as --silent) and 6. Increasing the verbosity level with --verbose and including --devel will give a visual representation of the swarm, along with lots of information about the swarm:

[biowulf]$ swarm --devel --verbose 5 --file file.swarm -g 5 -p 4 -b 4
basedir = /spin1/swarm/user
script dir = /spin1/swarm/user/cF2_0V7N
------------------------------------------------------------
SWARM
├── subjob 0: 16 commands (4 cpus, 20.00 gb)
|   ├── cmd 0 ; cmd 1 ; cmd 2 ; cmd 3 ;
|   ├── cmd 4 ; cmd 5 ; cmd 6 ; cmd 7 ;
|   ├── cmd 8 ; cmd 9 ; cmd 10 ; cmd 11 ;
|   ├── cmd 12 ; cmd 13 ; cmd 14 ; cmd 15 ;
├── subjob 1: 16 commands (4 cpus, 20.00 gb)
|   ├── cmd 16 ; cmd 17 ; cmd 18 ; cmd 19 ;
|   ├── cmd 20 ; cmd 21 ; cmd 22 ; cmd 23 ;
|   ├── cmd 24 ; cmd 25 ; cmd 26 ; cmd 27 ;
|   ├── cmd 28 ; cmd 29 ; cmd 30 ; cmd 31 ;
------------------------------------------------------------
2 subjobs, 32 commands, 8 output files
32 commands run in 2 subjobs, each command requiring 5 gb and 1 thread, packing 4 processes per subjob, running 4 processes serially per subjob
sbatch --array=0-1 --job-name=swarm --output=/dev/null --error=/dev/null --cpus-per-task=4 --mem=20480 /spin1/swarm/user/cF2_0V7N.batch

This shows a swarm of 32 commands (shown as "cmd 0" ==> "cmd 31") within 2 subjobs. Each command requires 5 gb of memory, and the commands are bundled to run 4 commands sequentially on the cpus allocated.
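
The figures in this report follow from simple arithmetic on -g, -p and -b. The sketch below reproduces them under the assumption that each subjob runs p × b commands and is allocated p times the per-process memory; swarm_shape is a hypothetical helper, and swarm --devel remains the authoritative source.

```shell
# Illustrative only: estimate the shape of a swarm.
# $1 = commands, $2 = gb per process (-g), $3 = processes per subjob (-p), $4 = bundle (-b)
swarm_shape() {
    local per_subjob=$(( $3 * $4 ))
    local subjobs=$(( ($1 + per_subjob - 1) / per_subjob ))
    local mem
    mem=$(awk -v g="$2" -v p="$3" 'BEGIN { printf "%.2f", g * p }')
    echo "$subjobs subjobs, $per_subjob commands each, $mem gb per subjob"
}

swarm_shape 32 5 4 4     # prints: 2 subjobs, 16 commands each, 20.00 gb per subjob
swarm_shape 14 1.5 2 1   # prints: 7 subjobs, 2 commands each, 3.00 gb per subjob
```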



Users will typically want to write a script to create a large swarm file. This script can be written in any scripting language, such as bash, perl, or the language of your choice. Some examples are given below to get you started.

Example 1: processing all files in a directory
Suppose you have 800 image files in a directory. You want to set up a swarm job to run an FSL command (e.g. 'mcflirt') on each one of these files.

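# this file is make-swarmfile

cd /data/user/mydir || exit 1
# start with an empty swarmfile (touch would keep lines from an earlier run)
: > swarm.cmd
for file in *
do
    # skip the swarmfile itself, which lives in the same directory
    [ "$file" = swarm.cmd ] && continue
    echo "mcflirt -in $file -out $file.mcf1 -mats -plots -refvol 90 -rmsrel -rmsabs" >> swarm.cmd
done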

Execute this file with

bash make-swarmfile

You should get a file called swarm.cmd which is suitable for submission to the swarm command.

Example 2: Use swarm to pull sequences out of the NCBI nt blast database.
Suppose you have a file containing 1,000,000 GI numbers of sequences. You want to pull these sequences out of the Helix/Biowulf NCBI nt Blast database. You can divide your GI file into chunks, and run a swarm of jobs, each one working on one chunk of GIs, to pull these sequences out of the database.
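
A minimal sketch of the chunking step follows. The file name gi_list.txt, the chunk size, and the swarmfile name gi.swarm are hypothetical, and a tiny input is generated so that the example runs anywhere. The blastdbcmd options and the database name nt are assumptions as well; check blastdbcmd -help and the local database path before using them.

```shell
# Demo input: a hypothetical file with one GI number per line.
seq 1 25 > gi_list.txt

# split names its chunks xaa, xab, ... by default; 10 lines per chunk in this demo.
split -l 10 gi_list.txt

# Write one swarmfile line per chunk.
: > gi.swarm
for chunk in x??; do
    echo "blastdbcmd -db nt -entry_batch $chunk -out $chunk.fas" >> gi.swarm
done
cat gi.swarm
```

With a real GI file, a larger chunk size (for example a few thousand lines) keeps the number of subjobs manageable; the swarmfile is then submitted with swarm -f gi.swarm.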

Once the swarm jobs are complete, you could if desired combine all the sequences into a single file with

[biowulf]$ cat x*.fas > myseqs.fas

Monitoring a swarm

Monitoring a swarm is handled the same way as any other batch job on the cluster, using sjobs, squeue, jobload and sacct.

Deleting/Canceling a swarm

Because a swarm is treated as a single job by Slurm, deleting a swarm is handled with the same command as other batch jobs, scancel.

Downloading swarm

Swarm is available for download here. Keep in mind that swarm was written for our own systems. It will need to be adapted for other batch systems to work properly.