Biowulf High Performance Computing at the NIH
Online class: Introduction to Biowulf

Hands-On: Job Monitoring

In the previous Data Storage hands-on section, you should have copied the class scripts to your /data area. If you skipped or missed that section, run the following command now:

hpc-classes biowulf

This command copies the scripts and input files used in this online class to your /data area, and takes about 5 minutes.
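Before moving on, it is worth confirming that the copy succeeded. A minimal check, assuming the default location used by hpc-classes:

```shell
# Sanity check: confirm the class files landed in your /data area.
# (Path assumes the default location used by hpc-classes.)
classdir="/data/$USER/hpc-classes/biowulf"
if [ -d "$classdir" ]; then
    status="found"
    ls "$classdir"
else
    status="missing"
    echo "Class files not found yet -- run 'hpc-classes biowulf' first."
fi
```

If the directory is missing, re-run hpc-classes biowulf and wait for it to finish before continuing.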

In the following session, you will submit a batch job for Bowtie2, a genome alignment program for short read sequences. If you're not a genomicist, don't worry -- this is just an example. The basic principles of job submission are not specific to Bowtie2.

cd /data/$USER/hpc-classes/biowulf/bowtie

# submit the job
sbatch --cpus-per-task=16 bowtie.bat

# check the status with 'jobload'
jobload -u $USER

# after the job finishes, use 'jobhist' to see how much memory it used
# (replace 'jobnumber' in the command below with the job number of your bowtie job)
jobhist jobnumber
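A convenience pattern that avoids copying the job number by hand: sbatch's standard --parsable flag makes it print just the job ID, which you can capture in a shell variable and reuse with jobhist. A sketch:

```shell
# Capture the job ID at submission time (--parsable is a standard sbatch
# flag that prints only the job ID). Guarded so it only submits on a
# cluster where sbatch exists.
if command -v sbatch >/dev/null 2>&1; then
    jobid=$(sbatch --parsable --cpus-per-task=16 bowtie.bat)
    echo "Submitted job $jobid"
    # ...and once the job has finished:
    # jobhist "$jobid"
else
    jobid=""
    echo "sbatch not found -- run this on a Biowulf login node."
fi
```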

Point your web browser to https://hpc.nih.gov/dashboard. You will need to log in with your NIH username and password. Click on the 'Job Info' tab.

You should see a list of your jobs from the last few days. Click on the job ID of the running or completed bowtie job, and you will see a plot showing the job's memory and CPU usage over time.

You can scroll down further and get additional information about the job.

Quiz

What was the maximum memory used by the job?

Answer

Jobload reports the memory usage at that moment only. For the maximum memory used, you need jobhist or the dashboard. This job should have used about 3.4 GB. An allocation of 5 GB would be best for this job, since it provides a small buffer beyond the actual usage. Future such jobs could therefore be submitted with:
sbatch --cpus-per-task=16 --mem=5g bowtie.bat
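You can also cross-check the peak memory straight from Slurm's accounting with sacct, a standard Slurm command (jobhist is a Biowulf-specific wrapper around the same data). As above, replace 'jobnumber' with your bowtie job's number:

```shell
# Query Slurm accounting for peak memory (MaxRSS) and elapsed time.
# Guarded so the example degrades gracefully off-cluster.
if command -v sacct >/dev/null 2>&1; then
    report=$(sacct -j jobnumber --format=JobID,Elapsed,MaxRSS,AllocCPUS 2>&1)
else
    report="sacct not available here -- use jobhist or the dashboard instead."
fi
echo "$report"
```

MaxRSS is the peak resident memory of the largest task, which is what you should compare against your --mem request.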

Did jobload show the job using the full 16 allocated CPUs?

Answer

During the first 20 seconds or so, the job is doing I/O and using only a single CPU. After that, the bowtie process multithreads across the allocated 16 CPUs, and you should be able to see that with 'jobload' or the dashboard. You might occasionally see it using 17 threads, which is fine.
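Because jobload reports a snapshot, catching the ramp-up from 1 thread to 16 means running it more than once. A simple polling loop, with a short sleep here for illustration (jobload is Biowulf-specific; on other Slurm clusters 'squeue -u $USER' gives a rougher at-a-glance view):

```shell
# Poll jobload a few times while the job runs; guarded so the loop still
# completes on systems without jobload.
polls=0
for i in 1 2 3; do
    if command -v jobload >/dev/null 2>&1; then
        jobload -u $USER
    else
        echo "poll $i: jobload not available outside Biowulf"
    fi
    polls=$((polls + 1))
    sleep 1    # use a longer interval, e.g. 30 seconds, for a real job
done
```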

Could this job run faster if more CPUs or memory were allocated?

Answer

There's no point allocating more memory: the job only used about 3.4 GB, and allocating more will not help at all.

It might help to allocate more CPUs if the bowtie process scales well, and the job is not limited by I/O. Try it by submitting with

sbatch --cpus-per-task=32 --mem=5g bowtie.bat

After it completes, use 'jobhist jobnumber' to check the walltime. In my tests, I got the following walltimes:
8 CPUs: 280 seconds
16 CPUs: 146 seconds
32 CPUs: 120 seconds
(exact times may vary depending on I/O performance at different times of day, the filesystem used, etc.)

So for this job, it is worth using 16 rather than 8 CPUs, given the speedup. But doubling the CPUs from 16 to 32 does not double the performance (i.e. halve the walltime), which suggests that I/O is a limiting factor. Thus, 16 CPUs is the sweet spot for this job.
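The reasoning above can be made concrete by computing speedup (walltime ratio relative to 8 CPUs) and parallel efficiency (speedup divided by the CPU ratio) from the example timings on this page; your own numbers will differ:

```shell
# Speedup and efficiency from the example walltimes above
# (8 CPUs: 280 s, 16 CPUs: 146 s, 32 CPUs: 120 s).
awk 'BEGIN {
    t8 = 280; t16 = 146; t32 = 120
    printf "16 CPUs: speedup %.2fx, efficiency %.0f%%\n", t8/t16, 100 * t8 / t16 / 2
    printf "32 CPUs: speedup %.2fx, efficiency %.0f%%\n", t8/t32, 100 * t8 / t32 / 4
}'
# -> 16 CPUs: speedup 1.92x, efficiency 96%
# -> 32 CPUs: speedup 2.33x, efficiency 58%
```

Efficiency near 100% at 16 CPUs but under 60% at 32 CPUs is exactly the pattern described above: the extra 16 CPUs mostly sit idle waiting on I/O.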