Information about job submission, job management and job monitoring on the
NIH HPC Biowulf cluster.
Acknowledgement/Citation
The continued growth and support of NIH's Biowulf cluster is dependent upon its demonstrable value to the NIH Intramural Research Program. If you publish research that involved significant use of Biowulf, please cite the cluster. Suggested citation text: This work utilized the computational resources of the NIH HPC Biowulf cluster (https://hpc.nih.gov).
Quick Links
Use 'ssh biowulf.nih.gov' to connect from the command line. See Connecting to the NIH HPC systems
Your username and password are your NIH login username and password, as on Helix.
Computationally demanding or memory intensive processes are not permitted on the Biowulf login node. See Interactive jobs below.
Email from the Biowulf batch system goes to $USER@biowulf.nih.gov, which is forwarded to your NIH email address.
Your /home, /data and shared space is set up exactly the same on Helix and Biowulf. See the storage section for details.
A summary of Biowulf job submission is available for download or printing (PDF).
Use the 'sbatch' or 'swarm' command to submit a batch script.
--partition=partname | Job to run on partition 'partname'. (default: 'norm') |
--ntasks=# | Number of tasks (processes) to be run |
--cpus-per-task=# | Number of CPUs required for each task (e.g. '8' for an 8-way multithreaded job) |
--ntasks-per-core=1 | Do not use hyperthreading (this flag typically used for parallel jobs) |
--mem=#g | Memory required for the job (Note the g (GB) in this option) |
--exclusive | Allocate the node exclusively |
--no-requeue | --requeue | If an allocated node hangs, whether the job should be requeued or not. |
--error=/path/to/dir/filename | Location of stderr file (by default, slurm######.out in the submitting directory) |
--output=/path/to/dir/filename | Location of stdout file (by default, slurm######.out in the submitting directory) |
--wrap="command arg1 arg2" | Submit a single command with arguments instead of a script (note quotes) |
--license=idl:6 | Request 6 IDL licenses (Minimum necessary for an instance of IDL) |
More useful flags and environment variables are detailed in the sbatch manpage, which can be read on the system by invoking man sbatch.
[biowulf ~] sbatch jobscript

This job will be allocated 2 CPUs and 4 GB of memory.

[biowulf ~] sbatch --cpus-per-task=# jobscript

The above job will be allocated '#' CPUs and (# * 2) GB of memory, e.g. with --cpus-per-task=4, the default memory allocation is 8 GB.
You should use the Slurm environment variable $SLURM_CPUS_PER_TASK within your script to specify the number of threads to the program. For example, to run a Novoalign job with 8 threads, set up a batch script like this:
#!/bin/bash
module load novocraft
novoalign -c $SLURM_CPUS_PER_TASK -f s_1_sequence.txt -d celegans -o SAM > out.sam
and submit with:
sbatch --cpus-per-task=8 jobscript
Note: when jobs are submitted without specifying the number of CPUs per task explicitly, the $SLURM_CPUS_PER_TASK environment variable is not set.
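Since the variable may be unset, a script can guard against this with a bash default value. A defensive sketch (the fallback of 1 thread is an arbitrary choice):

```shell
#!/bin/bash
# Use the allocated CPU count if the job was submitted with --cpus-per-task,
# otherwise fall back to a single thread
threads=${SLURM_CPUS_PER_TASK:-1}
echo "running with $threads threads"
```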
[biowulf ~] sbatch --mem=#g jobscript

The above job will be allocated # GB of memory. Note the g (GB) following the memory specification. Without it the job will be allocated # MB of memory; unless your job uses very little memory, this will likely cause it to fail.
The Biowulf batch system will allocate by core, rather than by node. Thus, your job may be allocated 4 cores and 8 GB of memory on a node which has 16 cores and 32 GB of memory. Other jobs may be utilizing the remaining 12 cores and 24 GB of memory, so that your jobs may not have exclusive use of the node. Slurm will not allow any job to utilize more memory or cores than were allocated.
The default Slurm allocation is 1 physical core (2 CPUs) and 4 GB of memory. For any jobs that require more memory or CPU, you need to specify these requirements when submitting the job. Examples:
Command | Allocation |
sbatch jobscript | 2 CPUs, 4 GB memory on a shared node. |
sbatch --mem=8g jobscript | 2 CPUs, 8 GB memory on a shared node. |
sbatch --mem=200g --cpus-per-task=4 jobscript | 4 CPUs, 200 GB memory on a shared node. |
sbatch --mem=24g --cpus-per-task=16 jobscript | 16 CPUs, 24 GB memory on a shared node. |
sinteractive --mem=Mg --cpus-per-task=C | interactive job with C cpus and M GB of memory on a shared node. |
Note: add --exclusive if you want the node allocated exclusively.
Options to sbatch that can be given on the command line can also be embedded into the job script as job directives. These are specified one to a line at the top of the job script file, immediately after the #!/bin/bash line, by the string #SBATCH at the start of the line, followed by the option that is to be set. For example, to have stdout captured in a file called "myjob.out" in your home directory, and stderr captured in a file called "myjob.err", the job file would start out as:
#!/bin/bash
#SBATCH -o ~/myjob.out
#SBATCH -e ~/myjob.err

Note that the #SBATCH must be in the first column of the file. Also, if an option given on the command line conflicts with a job directive inside the job script, the value given on the command line takes precedence.
Job arrays can be submitted on Biowulf using swarm. e.g.
swarm -g G -t T -f swarmfile --module afni

will submit a swarm job with each command (a single line in the swarm command file) allocated T CPUs (for T threads) and G GB of memory. You can use the environment variable $SLURM_CPUS_PER_TASK within the swarm command file to specify the number of threads to the program. See the swarm webpage for details, or watch the videos and go through the hands-on exercises in the Swarm section of the Biowulf Online Class.
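A swarm command file is simply plain text with one command per line, so it can be generated with a small loop. In this sketch, my_tool and the sample file names are hypothetical placeholders:

```shell
# Write one command line per sample into swarm.cmd;
# my_tool and the .fastq/.bam names are hypothetical
for sample in sampleA sampleB sampleC; do
    echo "my_tool --threads \$SLURM_CPUS_PER_TASK --in ${sample}.fastq --out ${sample}.bam"
done > swarm.cmd
```

The resulting swarm.cmd would then be submitted with swarm -g G -t T -f swarm.cmd.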
Video: Multinode parallel jobs on Biowulf (27 mins)
Making efficient use of Biowulf's multinode partition
Parallel (MPI) jobs that run on more than 1 node: Use the environment variable $SLURM_NTASKS within the script to specify the number of MPI processes. For example:
#!/bin/bash
module load meep/1.2/mpi/gige
cd /data/$USER/mydir
meme infile params -p $SLURM_NTASKS
Submit with, for example:
sbatch --ntasks=C --constraint=nodetype --exclusive --ntasks-per-core=1 [--mem-per-cpu=Gg] jobscript

where:
--ntasks=C | number of tasks (MPI processes) to run |
--constraint=nodetype | all nodes should be of the same type, e.g. 'x2650'. |
--exclusive | for jobs with interprocess communication, it is best to allocate the nodes exclusively |
--ntasks-per-core=1 | Most parallel jobs do better running only 1 process per physical CPU |
[optional] --mem-per-cpu=Gg | only needed if each process needs more than the default 2 GB per hyperthreaded core |
See the webpage for the application for more details.
Video: Slurm Resources, Partitions and Scheduling on Biowulf (14 mins).
Biowulf nodes are grouped into partitions. A partition can be specified when submitting a job. The default partition is 'norm'. The freen command can be used to see free nodes and CPUs, and available types of nodes on each partition.
Nodes available to all users | |
norm | the default partition. Restricted to single-node jobs |
multinode | Intended to be used for large-scale parallel jobs. Single node jobs are not allowed. See here for detailed information. |
largemem | Large memory nodes. Reserved for jobs with memory requirements that cannot fit on the norm partition. Jobs in the largemem partition must request a memory allocation of at least 350GB. |
unlimited | Reserved for jobs that require more than the default 10-day walltime. Note that this is a small partition with a low CPUs-per-user limit. Only jobs that absolutely require more than 10 days runtime, that cannot be split into shorter subjobs, or that are a first-time run where the walltime is unknown, should be run on this partition. |
quick | For jobs < 4 hours long. These jobs are scheduled at higher priority. They may run on the dedicated quick partition nodes, or on the buy-in nodes when they are free. |
gpu | GPU nodes reserved for applications that are built for GPUs. |
visual | Small number of GPU nodes reserved for jobs that require hardware accelerated graphics for data visualization. |
Buy-in nodes | |
ccr* | for NCI CCR users |
forgo | for individual groups from NHLBI and NINDS |
persist | for NIMH users |
Jobs and job arrays can be submitted to a single partition (e.g. --partition=ccr) or to two partitions (e.g. --partition=norm,ccr), in which case they will run on the first partition where they can be scheduled. Please note: when submitting to --partition=norm,quick, make sure that the walltime limit is no more than the maximum walltime of the quick partition.

Video: Allocating GPUs on Biowulf (7 mins)
To make use of GPUs, jobs have to be submitted to the gpu partition and specifically request the type and number of GPUs. For example:
# request one k80 GPU
[biowulf ~]$ sbatch --partition=gpu --gres=gpu:k80:1 script.sh

# request two k80 GPUs
[biowulf ~]$ sbatch --partition=gpu --gres=gpu:k80:2 script.sh

# request 1 k80 GPU and 8 CPUs on a single K80 node
[biowulf ~]$ sbatch --partition=gpu --cpus-per-task=8 --gres=gpu:k80:1 script.sh

# request all 4 k80 GPUs and 56 CPUs on a single K80 node
[biowulf ~]$ sbatch --partition=gpu --cpus-per-task=56 --gres=gpu:k80:4 script.sh

# request 2 P100 GPUs
[biowulf ~]$ sbatch --partition=gpu --gres=gpu:p100:2 script.sh
All GPU nodes have 4 GPUs. The 'freen' command can be used to see the CPUs and memory on each type of node. For example:
[biowulf ~]$ freen | grep -E 'Partition|----|gpu'
                                        .......Per-Node Resources......
Partition    FreeNds     FreeCPUs     FreeGPUs   Cores  CPUs  GPUs   Mem   Disk
-----------------------------------------------------------------------------------
gpu (v100x)   2 / 53   2042 / 3816   17 / 212     36    72     4    373g  1600g
gpu (v100)    0 / 8     178 / 448     3 / 32      28    56     4    121g   800g
gpu (p100)    9 / 48   2234 / 2688   75 / 192     28    56     4    121g   650g
gpu (k80)     5 / 67   1904 / 3752   25 / 268     28    56     4    247g   800g

will show you, for example, that the K80 nodes have 28 cores (56 CPUs) and 247 GB of allocatable memory. For each allocated GPU, no more than #CPUs / #GPUs of a node's CPUs can be allocated. For example, 56 / 4 = 14 CPUs can be allocated per allocated k80 or p100 GPU. Slurm will accept jobs that request a higher number of CPUs than possible, but such a job will remain in the queue indefinitely.
The request for the GPU resource is in the form
resourceName:resourceType:number
.
To allocate a GPU for an interactive session, e.g. to compile a program, use:
[biowulf ~]$ sinteractive --gres=gpu:k80:1

To request more than the default 2 CPUs, use
[biowulf ~]$ sinteractive --gres=gpu:k80:1 --cpus-per-task=8
Video: Interactive Jobs Biowulf (11 mins)
To allocate resources for an interactive job, use the sinteractive
command. The options are largely the
same as for the sbatch command. e.g.
[biowulf ~]$ sinteractive
salloc.exe: Granted job allocation 22261

[cn0004 ~]$ ...some interactive commands....
[cn0004 ~]$ exit
exit
salloc.exe: Relinquishing job allocation 22261
salloc.exe: Job allocation 22261 has been revoked.

[biowulf ~]$
The default sinteractive
allocation is 1 core (2 CPUs) and 768 MB/CPU (1.5 GB) of memory. You can request additional resources. e.g.
Command | Allocation |
sinteractive --cpus-per-task=4 | 4 CPUs (2 cores) on a single node |
sinteractive --constraint=ibfdr --ntasks=64 --exclusive | IB FDR nodes, 2 nodes exclusively allocated |
sinteractive --constraint=x2650 --ntasks=16 --ntasks-per-core=1 | 16 cores on an x2650 node |
sinteractive --mem=5g --cpus-per-task=8 | 8 CPUs and 5 Gigabytes of memory in the norm (default) partition |
sinteractive
supports, via the -T
/--tunnel
option, automatically creating SSH tunnels that can be used to access application servers you run within your job.
See SSH Tunneling on Biowulf for details.
Use sinteractive -h
to see all available options.
The number of concurrent interactive jobs is currently limited to 2 and the longest walltime is 36 hours.
To see all up-to-date limits that apply to sinteractive sessions, use the batchlim command.
Re-connecting to interactive sessions: Interactive sessions are terminated if the controlling Biowulf session exits (e.g. laptop drops off the VPN). To maintain the interactive sessions even when you disconnect, we recommend tmux (tmux crash course, quick guide) or screen for text-based sessions. Start your sinteractive session from a tmux/screen window and then disconnect from the tmux/screen session before logging out. Then, when you reconnect to the Biowulf login node, you can re-attach to the tmux/screen session where your interactive session will be waiting for you. To reconnect to graphical sessions, use NX. Please do not run tmux or screen inside of NX.
Note: When interactive jobs are submitted without specifying the number of CPUs per task explicitly, the $SLURM_CPUS_PER_TASK
environment variable is not set.
To allocate resources for an interactive job on a node that has been configured for remote visualization, use the svis command. These nodes are allocated in their entirety, so you will probably not need to supply any additional options. As demonstrated by the example below, the svis command prompts you to open an additional terminal and create an ssh tunnel to biowulf. Detailed instructions for using the visual partition can be found here
[biowulf ~]$ svis
salloc.exe: Pending job allocation 11051463
salloc.exe: job 11051463 queued and waiting for resources
salloc.exe: job 11051463 has been allocated resources
salloc.exe: Granted job allocation 11051463
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0655 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
error: unable to open file /tmp/slurm-spank-x11.11051463.0
slurmstepd: error: x11: unable to read DISPLAY value
[+] Loading TurboVNC
Starting VNC server ... please be patient...
VNC server started on display 1 port 5901
VNC configured with SSH forwarding.

After creating a tunnel from your workstation to biowulf.nih.gov port 38130,
connect your VNC client to localhost port 38130.
See https://hpc.nih.gov/nih/vnc for details.

The VNC connection will terminate when this shell exits.

Please create a SSH tunnel from your workstation to these ports on biowulf.
On Linux/MacOS, open a terminal and run:

    ssh -L 38130:localhost:38130 user@biowulf.nih.gov

For Windows instructions, see https://hpc.nih.gov/docs/tunneling
After establishing a TurboVNC session you can run your application with hardware graphics acceleration like so:
[cn0655 ~]$ module load virtualgl graphics-benchmarks
[+] Loading VirtualGL
[+] Loading graphics-benchmarks 0.0.1 on cn0655
[+] Loading singularity 3.7.2 on cn0655

[cn0655 ~]$ vglrun valley
Please see the visual partition user guide for detailed instructions and examples.
Video: Setting Walltimes on Biowulf (4 mins)
Most partitions have walltime limits. Use batchlim to see the default and max walltime limits for each partition.
If no walltime is requested on the command line, the job will be assigned the default walltime for the partition (shown by batchlim). To request a specific walltime, use the --time option to sbatch. For example:
sbatch --time=24:00:00 jobscript

will submit a job to the norm partition and request a walltime of 24 hrs. If the job runs over 24 hrs it will be killed by the batch system.
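As with other sbatch options, the walltime can instead be embedded in the job script as a job directive. A minimal sketch (the echo stands in for the real workload, and myjob.out is a hypothetical output file):

```shell
#!/bin/bash
#SBATCH --time=24:00:00       # job-directive form of --time
#SBATCH --output=myjob.out    # hypothetical stdout file

# placeholder for the actual analysis commands
msg="job script ran"
echo "$msg"
```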
To see the walltime limits and current runtimes for jobs, you can use the 'squeue' command.
[user@biowulf ~]$ squeue -O jobid,timelimit,timeused -u username
JOBID     TIME_LIMIT    TIME
1418444   10-00:00:00   5-05:44:09
1563535   5-00:00:00    1:35:12
1493019   3-00:00:00    2-17:03:27
1501256   5-00:00:00    2-03:08:42
1501257   5-00:00:00    2-03:08:42
1501258   5-00:00:00    2-03:08:42
1501259   5-00:00:00    2-03:08:42
1501260   5-00:00:00    2-03:08:42
1501261   5-00:00:00    2-03:08:42

For many more squeue options, see the squeue man page.
Several licensed software products are available on the cluster, including MATLAB, IDL, and Mathematica. Starting in June 2016, MATLAB licenses can only be allocated to interactive jobs. (See this announcement.) To use other licensed software in your batch job, you must specify the --license flag when submitting your job. This flag ensures that the batch system will wait until a license is available before starting the job. If you do not specify this flag, there is a risk that the batch system will start your job, the job will then be unable to get a license, and will exit immediately.
Example:
sbatch --license=idl:6 jobscript     (request the 6 licenses necessary to run a single instance of IDL)
sinteractive --license=idl:6         (interactive job that needs to run IDL)
It is no longer necessary to specify MATLAB licenses when running MATLAB in an interactive session via the sinteractive command. The current availability of licenses can be seen on the Systems Status page, or by typing 'licenses' on the command line.
The scancel command is used to delete jobs. Examples:
scancel 232323                             (delete job 232323)
scancel --user=username                    (delete all jobs belonging to user)
scancel --user=username --state=PENDING    (delete pending jobs belonging to user)
scancel --user=username --state=RUNNING    (delete running jobs belonging to user)
scancel --name=JobName                     (delete job with the name JobName)
scancel --nodelist=cn0005                  (delete any jobs running on node cn0005)
Common job states:
Job State Code | Means |
R | Running |
PD | Pending (Queued). Some possible reasons: Resources, Priority, Dependency |
CG | Completing |
CA | Cancelled |
F | Failed |
TO | Timeout |
NF | Node failure |
Use the sacct command to check on the states of completed jobs.
Show all your jobs in any state since midnight:
sacct
Show all jobs that failed since midnight
sacct --state f
Show all jobs that failed this month
sacct --state f --starttime 2015-07-01
Slurm will display a reason when a job is either not running or ended prematurely, for example QOSMaxCpuPerUserLimit or ReqNodeNotAvail. An explanation of these reason codes can be found at https://slurm.schedmd.com/job_reason_codes.html
The completion status of a job is essentially the exit status of the job script with all the complications that entails. For example take the following job script:
#! /bin/bash
module load GATK/2.3.4
GATK -m 5g -T RealignerTargetCreator ...
echo "DONE"
This script tries to load a non-existent GATK version and then calls GATK. This will fail. However, bash by default keeps executing even if commands fail, so the script will eventually print 'DONE'. Since the exit status of a bash script is the exit status of the last command, and echo returns 0 (SUCCESS), the script as a whole will exit with an exit code of 0, signaling success. The job state will therefore show COMPLETED, since Slurm uses the exit code to judge whether a job completed successfully.
Similarly, if a command in the middle of the job script were killed for exceeding its memory allocation, the rest of the job script would still be executed and could potentially return an exit code of 0 (SUCCESS), resulting again in a state of COMPLETED.
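The effect is easy to demonstrate in plain bash: a failing command followed by a successful echo still gives the script an overall exit status of 0:

```shell
# 'false' fails, but without 'set -e' bash keeps going,
# and the script's exit status is that of the final echo (0 = success)
bash -c 'false; echo "DONE"'
status=$?
echo "script exit status: $status"   # prints: script exit status: 0
```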
Conversely, in the following example a successful analysis is followed by a command that fails:
#! /bin/bash
module load GATK/3.4.0
GATK -m 5g -T RealignerTargetCreator ...
touch /file/in/non/existing/directory/DONE
Even though the actual analysis (here the GATK call) finished successfully, the last command will fail, resulting in a final state of FAILED for the batch job.
Some defensive bash programming techniques can help ensure that a job script
will show a final state of FAILED
if anything goes wrong.
Use set -e
Starting a bash script with set -e
will tell bash to stop
executing a script if a command fails and signal failure with a non-zero exit
code which will be reflected as a FAILED
state in SLURM.
#! /bin/bash
set -e
module load GATK/3.4.0
GATK -m 5g -T RealignerTargetCreator ...
echo "DONE"
One complication with this approach is that some commands return non-zero exit codes in normal operation, for example grepping for a string that does not exist.
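One way to handle such commands under set -e is to append || true so that an expected "failure" does not abort the script. A minimal sketch (input.txt is a throwaway example file):

```shell
set -e
printf 'alpha\nbeta\n' > input.txt
# grep -c prints 0 and exits 1 when nothing matches;
# '|| true' keeps set -e from terminating the script here
matches=$(grep -c 'gamma' input.txt || true)
echo "found $matches matching lines"   # prints: found 0 matching lines
```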
Check errors for individual commands
A more selective approach involves carefully checking the exit codes of the important parts of a job script. This can be done with conventional if/else statements or with conditional short circuit evaluation often seen in scripts. For example:
#! /bin/bash

function fail {
    echo "FAIL: $@" >&2
    exit 1  # signal failure
}

module load GATK/3.4.0 || fail "Could not load GATK module"
GATK -m 5g -T RealignerTargetCreator ... || fail "RealignerTargetCreator failed"
echo "DONE"
Special exit code when Slurm maintenance is occurring
Please note that the Slurm batch system may occasionally be shut down, either briefly for a necessary configuration change or for longer periods when system maintenance is underway. In these situations, a "downtime maintenance" script is installed in place of the normal Slurm commands (sbatch, squeue, etc.). This downtime script terminates with an exit code of 123, which gives job scripts, workflows, and pipelines an easy way to test whether the batch system is offline for maintenance: on exit code 123, try the request again later. For example:
#!/bin/bash

SLEEPTIME=120
sbatch job_script.sh
while [ $? -eq 123 ]; do
    echo "Batch system currently unavailable, trying again in $SLEEPTIME seconds..."
    sleep $SLEEPTIME
    sbatch job_script.sh
done
Video: Job Dependencies (9 mins)
You may want to run a set of jobs sequentially, so that the second job runs only after the first one has completed. This can be accomplished using Slurm's job dependencies options. For example, if you have two jobs, Job1.bat and Job2.bat, you can utilize job dependencies as in the example below.
[user@biowulf]$ sbatch Job1.bat
123213

[user@biowulf]$ sbatch --dependency=afterany:123213 Job2.bat
123214
The flag --dependency=afterany:123213 tells the batch system to start the second job only after completion of the first job. afterany indicates that Job2 will run regardless of the exit status of Job1, i.e. regardless of whether the batch system thinks Job1 completed successfully or unsuccessfully.
Once job 123213 completes, job 123214 will be released by the batch system and then will run as the appropriate nodes become available. Exit status: The exit status of a job is the exit status of the last command that was run in the batch script. An exit status of '0' means that the batch system thinks the job completed successfully. It does not necessarily mean that all commands in the batch script completed successfully.
There are several options for the '--dependency' flag that depend on the status of Job1. e.g.
--dependency=afterany:Job1 | Job2 will start after Job1 completes with any exit status |
--dependency=after:Job1 | Job2 will start any time after Job1 starts |
--dependency=afterok:Job1 | Job2 will run only if Job1 completed with an exit status of 0 |
--dependency=afternotok:Job1 | Job2 will run only if Job1 completed with a non-zero exit status |
Making several jobs depend on the completion of a single job is trivial. This is accomplished in the example below:
[user@biowulf]$ sbatch Job1.bat
13205

[user@biowulf]$ sbatch --dependency=afterany:13205 Job2.bat
13206

[user@biowulf]$ sbatch --dependency=afterany:13205 Job3.bat
13207

[user@biowulf]$ squeue -u $USER -S S,i,M -o "%12i %15j %4t %30E"
JOBID        NAME            ST   DEPENDENCY
13205        Job1.bat        R
13206        Job2.bat        PD   afterany:13205
13207        Job3.bat        PD   afterany:13205
Making a job depend on the completion of several other jobs: example below.
[user@biowulf]$ sbatch Job1.bat
13201

[user@biowulf]$ sbatch Job2.bat
13202

[user@biowulf]$ sbatch --dependency=afterany:13201,13202 Job3.bat
13203

[user@biowulf]$ squeue -u $USER -S S,i,M -o "%12i %15j %4t %30E"
JOBID        NAME            ST   DEPENDENCY
13201        Job1.bat        R
13202        Job2.bat        R
13203        Job3.bat        PD   afterany:13201,afterany:13202
Chaining jobs is most easily done by submitting the second dependent job from within the first job. Example batch script:
#!/bin/bash
cd /data/mydir
run_some_command
sbatch --dependency=afterany:$SLURM_JOB_ID my_second_job
More detailed examples are shown on a separate page.
Video: Utilizing local disk on Biowulf nodes (10 mins)
Each Biowulf node has some amount of local disk available for use; for most nodes this is 800 GB of fast solid-state storage, while a limited number have 2400 GB (2.4 TB). Use the freen command to see how much is available on each node type. For jobs that read/write many temporary files during the run, it may be advantageous to use the local disk as scratch or temp space.
The command
sbatch --gres=lscratch:500 jobscript
will allocate 500 GB of local scratch space from the /lscratch directory. Other jobs may allocate from the remaining 300 GB on that node.
For multi-node jobs, each node will have the amount specified in the command line reserved for the job.
To access the directory allocated to the job, refer to it as
/lscratch/$SLURM_JOB_ID
. Users will no longer be able to
read/write to the top level of /lscratch
but have full read/write
access to the /lscratch/$SLURM_JOB_ID
set up for the job.
Note that each subjob in a swarm will have a separate lscratch directory. That means that lscratch cannot be used to share data between subjobs. Commands bundled into a single subjob with the -b option to swarm will all share the same lscratch directory, however.
When the job is terminated, all data in /lscratch/$SLURM_JOB_ID
directory will be automatically deleted. Any data that needs to be saved
should be copied to your /data directory before the job concludes.
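A bash EXIT trap is one way to make that final copy happen even if a command fails partway through. The sketch below simulates the pattern with mktemp directories standing in for /lscratch/$SLURM_JOB_ID and a /data directory:

```shell
# mktemp dirs stand in for /lscratch/$SLURM_JOB_ID (scratch) and /data (permanent)
SCRATCH=$(mktemp -d)
SAVED=$(mktemp -d)
(
    # the EXIT trap runs when this (sub)shell finishes, copying results out
    trap 'cp "$SCRATCH"/result.txt "$SAVED"/' EXIT
    echo "analysis output" > "$SCRATCH"/result.txt
)
ls "$SAVED"   # prints: result.txt
```

In a real job script the trap would copy from /lscratch/$SLURM_JOB_ID to a /data directory, and would fire when the script exits for any reason.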
Performance of lscratch will suffer for all users on a node when large numbers of files are created in a single directory. Please avoid these situations by either removing files no longer needed for the ongoing job, or by structuring your data differently (subdirectories, an sqlite3 database, a python shelf, ...).
TMPDIR is a near-universally agreed-upon environment variable that defines where a program will write temporary files. By default, Unix systems set the value of TMPDIR to /tmp. On the Biowulf cluster, leaving TMPDIR set to /tmp can lead to problems.
Because of this, users are strongly encouraged to allocate local scratch disk for their jobs, as well as setting TMPDIR to that local scratch disk. Because local scratch is not defined until the job begins running, setting TMPDIR must be done either within the batch script:
#!/bin/bash
export TMPDIR=/lscratch/$SLURM_JOB_ID
... run batch commands here ...
or once an interactive session begins:
[biowulf ~]$ sinteractive --gres=lscratch:5
salloc.exe: Granted job allocation 12345

[cn1234 ~]$ export TMPDIR=/lscratch/$SLURM_JOB_ID
[cn1234 ~]$ ...some interactive commands....
[cn1234 ~]$ exit
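A defensive variant (an assumption on our part, not a documented Biowulf recipe) is to fall back to /tmp when the script happens to run without an lscratch allocation, since /lscratch/$SLURM_JOB_ID only exists if --gres=lscratch:N was requested:

```shell
# Use the job's lscratch directory when it exists; otherwise keep
# any pre-existing TMPDIR, defaulting to /tmp
if [ -n "$SLURM_JOB_ID" ] && [ -d "/lscratch/$SLURM_JOB_ID" ]; then
    export TMPDIR="/lscratch/$SLURM_JOB_ID"
else
    export TMPDIR="${TMPDIR:-/tmp}"
fi
echo "TMPDIR is $TMPDIR"
```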
To request more than one Generic Resource (GRES) like local scratch or GPUs, use the following format:
[biowulf ~]$ sinteractive --constraint=gpuk80 --gres=lscratch:10,gpu:k80:1
Note that this is not the same as using the --gres
option multiple times in which case only the last will be honored.
Video: Slurm Resources, Partitions and Scheduling (14 mins)
Cluster status info is available on the System Status page. The partitions page shows free and allocated cores for each partition over the last 24 hrs.
On the command line, freen will report free nodes/cores on the cluster, and batchlim will report the current per-user limits and walltime limits on the partitions.
Video: Job Monitoring tools on Biowulf (21 mins)
squeue will report all jobs on the cluster. squeue -u username will report your running jobs. An in-house variant of squeue is sjobs, which provides the information in a different format. Slurm commands like squeue are very flexible, so that you can easily create your own aliases.
Examples of squeue and sjobs:
[biowulf ~]$ squeue
JOBID  PARTITION      NAME   USER  ST   TIME  NODES  NODELIST(REASON)
22392       norm  meme_sho   user  PD   0:00      1  (Dependency)
22393       norm  meme_sho   user  PD   0:00      1  (Dependency)
22404       norm    stmv-1   user   R  10:04      1  cn0414
22391       norm  meme_sho   user   R  10:06      1  cn0413

[biowulf ~]$ sjobs
                                ................Requested.................
User  JobId  JobName     Part  St  Runtime  Nodes  CPUs  Mem        Dependency      Features  Nodelist
user  22391  meme_short  norm  R   10:09    1      32    1.0GB/cpu                  (null)    cn0413
user  22404  stmv-1      norm  R   10:07    1      32    1.0GB/cpu                  (null)    cn0414
user  22392  meme_short  norm  PD  0:00     1      2     1.0GB/cpu  afterany:22391  (null)    (Dependency)
user  22393  meme_short  norm  PD  0:00     1      4     1.0GB/cpu  afterany:22392  (null)    (Dependency)

[More about sjobs]
jobload will report running jobs, the %CPU usage, and the memory usage while they are running. See here for details and an example.
Jobhist will report the CPU and memory usage of completed jobs. Details and example.
An excellent web-based utility that allows you to monitor your running and completed jobs is the User Dashboard.
Using the --mail-type=<type>
option to sbatch, users can request email
notifications from SLURM as certain events occur. Email will be sent to $USER@biowulf.nih.gov which
is automatically forwarded to your NIH email address. Multiple event types can be
specified as a comma separated list. For example
[user@biowulf]$ sbatch --mail-type=BEGIN,TIME_LIMIT_90,END batch_script.sh
Available event types:
Event type | Description |
BEGIN | Job started |
END | Job finished |
FAIL | Job failed |
REQUEUE | Job was requeued |
ALL | BEGIN,END,FAIL,REQUEUE |
TIME_LIMIT_50 | Job reached 50% of its time limit |
TIME_LIMIT_80 | Job reached 80% of its time limit |
TIME_LIMIT_90 | Job reached 90% of its time limit |
TIME_LIMIT | Job reached its time limit |
If using --mail-user to direct notifications to a different address, please make sure to specify a literal, valid address.
After a job is submitted, some of the submission parameters can be modified using the scontrol command. Examples:
Change the job dependency:
scontrol update JobId=181766 dependency=afterany:18123

Request a matlab license:

scontrol update JobId=181755 licenses=matlab

Job was submitted to the norm partition, resend to ccr partition:

scontrol update JobID=181755 partition=ccr QOS=ccr

Walltimes on pending and running jobs can also be increased or decreased using the newwall command. Examples:
Reduce the walltime for job id 12345 to 2 hours.
newwall --jobid 12345 --time 2:00:00

Increase the walltime for job id 12345 to 8 hours.
newwall --jobid 12345 --time 8:00:00
See newwall --help for usage details.
Note: Users can only increase walltimes up to the walltime limit of the partition, which you can check using batchlim. If you need a longer walltime increase, contact staff@hpc.nih.gov