Online class: Introduction to Biowulf

Advanced Quiz

What's the problem with this sbatch submission command?
biowulf% sbatch  myscript.sh  --time=10:00:00  

Answer

The batch script name must be the last parameter on the sbatch command line. sbatch passes anything after the script name to the script as its own command-line arguments, not to the batch system. Thus, the batch system never sees --time=10:00:00 and schedules this job with the default walltime. The corrected command is:
biowulf% sbatch --time=10:00:00 myscript.sh
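A minimal local sketch of the ordering rule that runs without Slurm. It assumes only bash: the helper show_script_args is hypothetical and simply writes a throwaway script that reports its arguments, then runs it with the same trailing tokens, mimicking how tokens after the script name are handed to the script rather than to the scheduler.

```shell
#!/bin/bash
# Correct order on a Biowulf login node: sbatch options first, script last.
#   sbatch --time=10:00:00 myscript.sh
#
# show_script_args is a hypothetical helper (not part of Slurm): it writes a
# throwaway script that reports its own arguments, then runs it with the given
# tokens, the way tokens after the script name end up inside the script.
show_script_args() {
    local tmpdir
    tmpdir=$(mktemp -d)
    printf '%s\n' '#!/bin/bash' 'echo "$# arg(s): $*"' > "$tmpdir/myscript.sh"
    bash "$tmpdir/myscript.sh" "$@"
    rm -r "$tmpdir"
}

# The walltime option becomes $1 inside the script, not a job setting.
show_script_args --time=10:00:00    # prints: 1 arg(s): --time=10:00:00
```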


What's the difference between these three commands:
swarm -b 2 -f swarmfile

swarm -p 2 -f swarmfile

swarm -t 2 -f swarmfile

Answer

The -b 2 flag 'bundles' the commands so that 2 processes (2 lines in your swarm command file) run sequentially, one after the other, in the same subjob on the allocated core. This flag is valid for both single-threaded and multi-threaded processes. If the processes are single-threaded, one CPU on the allocated core will be idle.

The -p 2 flag 'packs' the commands so that 2 commands (2 lines in your swarm command file) run simultaneously on the 2 CPUs of an allocated core. The default is '-p 1'; the only valid values are '1' and '2', and '2' may only be used for single-threaded processes.

The -t 2 flag allocates 2 CPUs to each process (1 line in your swarm command file). The default is '-t 1', and it should only be increased if the swarm is running multi-threaded processes.
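A rough local sketch of the difference between bundling and packing, for intuition only. It is not swarm's actual implementation: run_bundled and run_packed are hypothetical helpers that show what one subjob does with two swarm-file lines, and -t is not modeled because it changes how many CPUs each line gets rather than how lines are grouped.

```shell
#!/bin/bash
# Hypothetical sketch, not swarm's real code: how one subjob might handle two
# swarm-file lines under -b 2 versus -p 2.

# -b 2 ("bundle"): the two commands run one after the other.
run_bundled() {
    bash -c "$1"
    bash -c "$2"
}

# -p 2 ("pack"): both commands start together, one per CPU of the core,
# and the function returns once both background processes have exited.
run_packed() {
    bash -c "$1" &
    bash -c "$2" &
    wait
}

# Usage: two 1-second commands take about 2 s bundled but about 1 s packed.
time run_bundled 'sleep 1' 'sleep 1'
time run_packed  'sleep 1' 'sleep 1'
```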


How many subjobs would you expect if you submitted the following script via swarm?
#!/bin/bash

cd /data/$USER
module load bam2mpg
bam2mpg --region chr1 --mpg aln.chr1.mpg.out ref.fasta aln.sort.bam

Answer

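That's a batch script, not a swarm command file. If you submit it with swarm, you will get 3 subjobs, one for each line that is neither blank nor a comment (the #!/bin/bash line counts as a comment). The results will probably not be what you wanted: each subjob runs its line independently, so the cd and module load in the first two subjobs have no effect on the third, and bam2mpg would run without the module loaded and outside /data/$USER. A swarm file to run bam2mpg should put everything a command needs on a single line, like this: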
cd /data/$USER; bam2mpg --region chr1 --mpg aln1.chr1.mpg.out ref.fasta aln1.sort.bam
cd /data/$USER; bam2mpg --region chr1 --mpg aln2.chr1.mpg.out ref.fasta aln2.sort.bam
[...other such commands, one line for each BAM file...]
and be submitted with
swarm -f swarmfile --module bam2mpg
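Writing one line per BAM file by hand gets tedious, so here is a sketch of generating the swarm file with a loop, plus a rough count of how many subjobs a file would produce. Both helpers are hypothetical; the *.sort.bam naming, the chr1 region, and ref.fasta are copied from the example lines and are assumptions about your data. The counter assumes swarm skips blank lines and lines starting with '#' at the default bundling, and it ignores line continuations.

```shell
#!/bin/bash
# Hypothetical helpers; file naming and the chr1 region follow the example
# swarm lines and are assumptions about your data.

# Print one swarm command line per *.sort.bam file in the given directory.
make_bam2mpg_swarmfile() {
    local dir=$1 bam base
    for bam in "$dir"/*.sort.bam; do
        [ -e "$bam" ] || continue            # no match: the glob stays literal
        base=$(basename "$bam" .sort.bam)    # aln1.sort.bam -> aln1
        echo "cd $dir; bam2mpg --region chr1 --mpg $base.chr1.mpg.out ref.fasta $base.sort.bam"
    done
}

# Approximate subjob count at the default -b 1: lines that are neither blank
# nor comments (a "#!" shebang starts with "#", so it is skipped too).
# Line continuations are not handled.
count_swarm_commands() {
    grep -cvE '^[[:space:]]*(#|$)' "$1"
}

# Usage on Biowulf (commented out because it needs your data and swarm):
#   make_bam2mpg_swarmfile "/data/$USER" > swarmfile
#   count_swarm_commands swarmfile      # should equal the number of BAM files
#   swarm -f swarmfile --module bam2mpg
```

Run against the batch script from the question, count_swarm_commands reports 3, matching the answer above.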


What's the problem with this swarm command file?
sbatch --time=10:00:00 job1
sbatch --time=10:00:00 job2
[...]

Answer

To submit a batch of sbatch commands, you do not need swarm at all. The file is itself a valid shell script, so you can simply run it from the command line on the Biowulf login node, which is intended for job submission.

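Submitting it with swarm adds an unnecessary layer of overhead: every sbatch line becomes its own swarm subjob, which has to be scheduled and run on a compute node just to submit another job, needlessly loading the batch system.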

In general, use either swarm or sbatch to submit a collection of jobs, not one nested inside the other.
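A minimal sketch of submitting the same jobs straight from the login node. submit_all is a hypothetical helper, job1 and job2 stand for your own batch scripts, and SBATCH_CMD is an illustration-only variable (not a Slurm setting) that defaults to sbatch and can be set to echo for a dry run where Slurm is unavailable.

```shell
#!/bin/bash
# submit_all is a hypothetical helper: one sbatch call per batch script,
# run directly on the login node with no swarm layer in between.
# SBATCH_CMD is an illustration-only override (defaults to sbatch); set it
# to echo to preview the commands without submitting anything.
submit_all() {
    local job
    for job in "$@"; do
        ${SBATCH_CMD:-sbatch} --time=10:00:00 "$job"
    done
}

# On a Biowulf login node:
#   submit_all job1 job2
# Equivalent, since a file of sbatch lines is already a shell script:
#   bash sbatch_commands.txt      (hypothetical file name)
# Dry run anywhere: prints the two sbatch argument lists.
SBATCH_CMD=echo submit_all job1 job2
```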