Online class: Introduction to Biowulf

Hands-On: Submit a simple batch job

In the previous Data Storage hands-on section, you should have copied the class scripts to your /data area. If you skipped or missed that section, type the following command now:

hpc-classes biowulf

This command will copy the scripts and input files used in this online class to your /data area, and will take about 5 minutes.

In the following session, you will submit a batch job for Plink, a whole-genome association analysis program. If you're not familiar with whole-genome analysis or Plink, don't worry -- this is just an example. The basic principles of job submission are not specific to Plink.

cd /data/$USER/hpc-classes/biowulf/plink

# look at the batch script -- these are the same commands you would type on a command line
#  (press 'q' or Ctrl-C to exit 'more')
more plink.bat

# submit the job
sbatch plink.bat

# check if it's in the queue
squeue -u $USER

# try the 'sjobs' command to see the status of your job
sjobs

Quiz

What is the job number (aka jobid) for this Plink job?

Answer

This would have been printed to your screen by the 'sbatch' command you ran. (If you contact the HPC staff about a job problem, it's very helpful if you can include the job number.)
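
If you want to keep the job number in a shell variable instead of copying it from the screen, something like the sketch below may help. The helper name extract_jobid is made up for this example. It assumes sbatch prints either the bare job number or a line such as "Submitted batch job 12345" (the stock Slurm message), so check what your sbatch actually prints before relying on it.

```shell
# Hypothetical helper: pull the job number out of whatever 'sbatch' printed.
# Assumption: the job number is the last word of the last non-empty line,
# possibly followed by ';clustername' (the --parsable style of output).
extract_jobid() {
    last_word=$(printf '%s\n' "$1" | awk 'NF { word = $NF } END { print word }')
    # drop anything from the first ';' onward
    printf '%s\n' "${last_word%%;*}"
}

# Demonstration on a canned message (no job is submitted here)
extract_jobid "Submitted batch job 12345"

# On the cluster, a submission could be captured like this:
#   jobid=$(extract_jobid "$(sbatch plink.bat)")
#   echo "My job number is $jobid"
```

With the job number in $jobid, later commands in the same shell session can refer to it directly.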

Which node is your job running on?

Answer

'sjobs' will show you this info. But you generally don't need to know which node your jobs run on. This is the power of the batch system; once you submit your job, the batch system will find the appropriate resources and start/end your job without you needing to know which nodes are available or which node your job ran on.
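
If you are curious anyway, the standard Slurm 'squeue' command can also report the node. The sketch below is one way to ask for it; the function name show_my_nodes is made up for this example, and the column choice is only a suggestion. The node column stays empty while a job is still pending.

```shell
# Hypothetical helper: list your jobs with the node(s) they are running on.
# squeue format fields used: %i = job ID, %j = job name,
#                            %T = job state, %N = node list
show_my_nodes() {
    squeue -u "$USER" -o "%i %j %T %N"
}

# Only call squeue where it exists (i.e. on the cluster)
if command -v squeue >/dev/null 2>&1; then
    show_my_nodes
else
    echo "squeue not found: run this on the cluster"
fi
```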

Where is the output file?

Answer

The output file is called slurm-#####.out, where ##### is the job number. It will be in the same directory from which you submitted the job.
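
As a small sketch of how you might look at that file once you know the job number: the helper name slurm_outfile and the job number 12345 below are placeholders, so substitute the number that sbatch printed for you.

```shell
# Hypothetical helper: build the default output file name for a job number
slurm_outfile() {
    printf 'slurm-%s.out\n' "$1"
}

jobid=12345                      # placeholder: use your own job number
outfile=$(slurm_outfile "$jobid")

# Show the end of the output file if it exists in the current directory
if [ -f "$outfile" ]; then
    tail -n 20 "$outfile"
else
    echo "$outfile not found in $(pwd)"
fi
```

Run this from the directory where you submitted the job, since that is where the default output file is written.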

How many cores/CPUs were allocated? How much memory?

Answer

You did not specify any CPUs or memory in the sbatch command, so the batch system allocated the defaults: 1 core (2 CPUs) and 4 GB of memory.
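
For jobs that need more than the defaults, sbatch accepts the standard Slurm options --cpus-per-task and --mem. The sketch below only prints the command line it would run, so nothing is submitted. The helper name build_sbatch_cmd, the script name myjob.bat, and the values 4 CPUs and 8g are all made up for illustration; the Plink job in this exercise does not need them.

```shell
# Hypothetical helper: assemble an sbatch command line with explicit resources.
#   $1 = number of CPUs, $2 = memory (e.g. 8g), $3 = batch script
build_sbatch_cmd() {
    printf 'sbatch --cpus-per-task=%s --mem=%s %s\n' "$1" "$2" "$3"
}

# Dry run: print the command instead of executing it
build_sbatch_cmd 4 8g myjob.bat
```

Printing the command first is a cheap way to check the options before you actually submit.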

How many cores/CPUs were used?

Answer

Plink is a single-threaded program, so it would use only a single CPU. One of the 2 allocated CPUs would have been idle.
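
To put a number on that idle CPU, you can compare the CPU time a job consumed with the CPU time it was allocated. The sketch below is a rough illustration: the helper name cpu_efficiency and the figures passed to it are made up, and it assumes you have already converted the times to seconds.

```shell
# Hypothetical helper: percent of the allocated CPU time that was used.
#   $1 = CPU seconds consumed by the job
#   $2 = number of allocated CPUs
#   $3 = wall-clock seconds the job ran
cpu_efficiency() {
    echo $(( 100 * $1 / ($2 * $3) ))
}

# Made-up figures: a single-threaded job that ran for 600 seconds
# on 2 allocated CPUs and consumed 590 CPU seconds
cpu_efficiency 590 2 600    # prints 49

# On the cluster, the raw figures for a finished job could be listed
# with the standard Slurm accounting command:
#   sacct -j <jobid> --format=JobID,AllocCPUS,Elapsed,TotalCPU,MaxRSS
```

A single-threaded program on 2 allocated CPUs cannot get much above 50% by this measure, which matches the one idle CPU described above.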

In most cases, the Biowulf application webpages will indicate whether an application is single-threaded or multi-threaded. You may also need to read the documentation for the application, as some applications like R have a mix of single-threaded and multi-threaded packages.