Online class: Introduction to Biowulf

Hands-On: Submit a multi-threaded batch job

In the previous Data Storage hands-on section, you should have copied the class scripts to your /data area. If you skipped or missed that section, run the following command now:

hpc-classes biowulf
This command copies the scripts and input files used in this online class to your /data area, and takes about 5 minutes.
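
Once the copy finishes, you can confirm that the bowtie exercise files are in place (the exact listing may vary):

ls /data/$USER/hpc-classes/biowulf/bowtie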

In the following session, you will submit a batch job that runs Bowtie2, a genome alignment program for short-read sequences. If you're not a genomicist, don't worry -- this is just an example; the basic principles of job submission are not specific to Bowtie2.

cd /data/$USER/hpc-classes/biowulf/bowtie

# look at the batch script -- these are the same commands you would type on the command line (a sketch of such a script appears after these steps)
cat bowtie.bat

# submit the job
sbatch --cpus-per-task=16 --mem=5g bowtie.bat

# check if it's in the queue
squeue -u $USER

# try the 'sjobs' command to see the status of your job
sjobs
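
For reference, the commands inside a batch script like bowtie.bat are the same ones you would type interactively, plus a shebang line. The sketch below is an illustration only -- the actual bowtie.bat in the class directory may differ, and the module version, index name, and file names are placeholders:

#!/bin/bash
# load bowtie2 (module name/version is a placeholder -- check 'module avail bowtie')
module load bowtie/2
# run from the directory the job was submitted from
cd $SLURM_SUBMIT_DIR
# spawn one alignment thread per allocated CPU
bowtie2 -p $SLURM_CPUS_PER_TASK -x genome_index -U reads.fastq -S alignments.sam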

Quiz

What Slurm environment variables are used in the bowtie.bat script?

Answer

$SLURM_SUBMIT_DIR = the directory from which you submitted the job
$SLURM_CPUS_PER_TASK = the number of allocated CPUs. This was specified on the command line as '--cpus-per-task=16', so Slurm will allocate 16 CPUs to this job. When the job runs, the environment variable $SLURM_CPUS_PER_TASK is set to 16.
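
If you want to see these values for yourself, you could add a couple of echo lines to the batch script before the bowtie2 command (a sketch -- these lines are not in the original bowtie.bat). Their output would appear in the job's slurm-<jobid>.out file:

echo "submitted from:  $SLURM_SUBMIT_DIR"
echo "CPUs allocated:  $SLURM_CPUS_PER_TASK"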

How many threads does the bowtie process spawn?

Answer

The bowtie2 flag '-p' within the bowtie.bat script is set to $SLURM_CPUS_PER_TASK, so bowtie2 will always spawn as many threads as there are allocated CPUs -- in this case, 16 threads.
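
Because the thread count comes from the environment rather than being hard-coded, resubmitting the same script with a different allocation adjusts the number of threads automatically. For example (assuming -p $SLURM_CPUS_PER_TASK as described above):

sbatch --cpus-per-task=8 --mem=5g bowtie.bat   # bowtie2 runs with 8 threads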

Why is the number of threads important?

Answer

If your job is overloaded (more threads than allocated CPUs), it will take longer, because the extra threads compete for the CPUs the job does have. If your job is badly underloaded (many fewer threads than allocated CPUs), you are wasting resources that other jobs could use. It's best to match the number of threads to the number of allocated CPUs.
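
A common way jobs end up overloaded or underloaded is a hard-coded thread count. The hypothetical lines below (not taken from bowtie.bat; file names are placeholders) contrast the two approaches for a job allocated 16 CPUs:

# overloaded: 32 threads compete for the 16 allocated CPUs
bowtie2 -p 32 -x genome_index -U reads.fastq -S alignments.sam

# matched: the thread count always equals the allocation
bowtie2 -p $SLURM_CPUS_PER_TASK -x genome_index -U reads.fastq -S alignments.sam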