Biowulf High Performance Computing at the NIH
Frequently Asked Questions

Why is my job pending?

Why a job remains pending depends on a number of factors, roughly in this order of relevance:

  1. The user has reached their limits. This is shown using a variety of pending job reasons, such as 'QOSMaxCpusPerUserLimit', 'QOSJobLimit', or 'QOSMaxGRESPerUser', as seen with the squeue or sjobs commands. Each user is limited to a number of concurrent cpus and jobs, dependent on the partition requested. Use the command batchlim to see the current limits on each partition.
  2. No resources available. This is shown by the 'Resources' reason for pending jobs. This simply means that the CPU or memory requirements of the job cannot be fulfilled right now in the requested partition. Consider submitting to 2 partitions to increase the chances of resources becoming available, e.g. --partition quick,norm.
  3. Low priority of the job. This is shown by the 'Priority' reason for pending jobs. Priority is a score mostly dependent on the number of CPU hours accumulated in the recent past; the more CPU hours used within a time span, the lower the priority score will be for the next job. It is also dependent on the partition requested; the quick and interactive partitions have higher priority than the other partitions.
  4. Free cpus are prevented from being used by memory usage or scratch space. If a job allocates all the memory of a node, but only a fraction of the cpus, the remaining cpus are not available for other jobs. Similarly, if a job requires a large amount of local scratch space, but only a few cpus, those remaining cpus will be unavailable for other jobs that require scratch space.
  5. The cpus and memory are being held for higher-priority jobs. An example of this scenario would be a high-priority job requesting an entire node of 32 CPUs. In order for that job to run, the CPUs on that node are left idle as the previous jobs utilizing them end, until enough CPUs accumulate to satisfy the high-priority job's request. During the time in which the CPUs are kept idle, lower-priority jobs will be prevented from running on the idle CPUs.
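A quick way to see which of these reasons applies to your pending jobs is to ask squeue for the reason column and tally it. The sketch below is not a Biowulf tool: `pending_reasons` is a hypothetical helper name, and it assumes the reason is the last whitespace-separated column of its input, which holds for the squeue format string in the usage line.

```shell
# pending_reasons (hypothetical helper): read squeue output on stdin, skip the
# header line, and count pending jobs per reason (taken from the last column).
pending_reasons() {
    awk 'NR > 1 {
             r = $NF
             gsub(/[()]/, "", r)    # accept both Resources and (Resources)
             count[r]++
         }
         END { for (r in count) print r, count[r] }' | sort
}

# Usage on Biowulf (%i = job id, %P = partition, %r = reason):
#   squeue -u "$USER" -t PENDING -o "%.18i %.9P %r" | pending_reasons
```

If the tally is dominated by QOS... reasons, batchlim shows the per-partition limits you are running into.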

There are some simple steps you can take to speed up the scheduling of your job.

  1. Request two partitions. If a job can utilize the resources of two partitions, it can request both of them in a comma-delimited list. For example, swarm --partition=ccr,norm for NCI users. This increases the pool of nodes available for the job. Note, however, that submissions are limited to no more than two partitions, since longer lists of partitions have caused scheduling problems in the past.
  2. Estimate your resources accurately. If a job is expected to finish in less than 4 hrs, submit to the quick partition. If a job is expected to use 4 GB of memory and 4 CPUs and run for 5 hrs, request only what is needed plus a small memory and walltime buffer (e.g. 5 GB memory, 6 hrs). The more resources a job requests, the longer it will take for the batch system to carve out a slot for that job.
  3. Bundle large swarms of short-running processes. If there are 10,000 lines in a swarmfile and each process is expected to run for 5 minutes, bundle the swarm so that each job runs multiple processes serially. This cuts down the number of jobs and increases the efficiency of each job.
  4. Run packed swarms for simple single-threaded swarms. By default, each subjob in a swarm is allocated two CPUs. If the processes within a swarm each need only a single CPU, including -p 2 on the swarm command line packs two processes into each subjob, so both allocated CPUs are used rather than one sitting idle.
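For item 3, the bundle size is simple arithmetic: divide the target run time per subjob by the time per command and round up. The sketch below uses the 15-minute minimum from the best-practices list on this page as its default target; `bundle_factor` and `big.swarm` are hypothetical names, and swarm's -b option is what actually does the bundling.

```shell
# bundle_factor (hypothetical helper) MINUTES_PER_COMMAND [TARGET_MINUTES]
# Prints the smallest bundle size that makes each subjob run for at least
# TARGET_MINUTES (default 15), i.e. ceil(TARGET_MINUTES / MINUTES_PER_COMMAND).
bundle_factor() {
    local per_cmd=$1 target=${2:-15}
    if [ "$per_cmd" -le 0 ]; then
        echo "minutes per command must be positive" >&2
        return 1
    fi
    echo $(( (target + per_cmd - 1) / per_cmd ))
}

# Usage for the 10,000-line, 5-minute example: bundles of 3 commands,
# giving about 3,334 subjobs of roughly 15 minutes each.
#   swarm -f big.swarm -b "$(bundle_factor 5)"
```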

How do I resubmit a whole swarm?

If most subjobs of a swarm failed, it's probably simplest to resubmit the entire swarm. You can simply cancel the remaining jobs in the swarm with scancel jobid. If you have several swarms running, and you're not sure which swarm file corresponds to this swarm, you can determine that with 'jobhist jobnumber'. e.g.
# jobhist 46435925

JobId              : 46435925
User               : $USER
Submitted          : 20170801 15:36:28
Started            : 20170801 15:36:48
Submission Path    : /data/$USER/rna-seq/oligos
Submission Command : sbatch --array=0-459 --output=/data/$USER/rna-
                      seq/oligos/Oligos2_%A_%a.o --error=/data/$USER/rna-
                      seq/oligos/Oligos2_%A_%a.e --cpus-per-task=2
		      --job-name=Oligos2
                      --mem=4096 --partition=norm --time=6-16:00:00
                      /spin1/swarm/$USER/46435925/swarm.batch
Swarm Path         : /data/$USER/rna-seq/oligos
Swarm Command      : /usr/local/bin/swarm -f oligos.swarm1  -g 4 -t 2 --time 160:00:00 
This tells you that the swarmfile corresponding to this swarm was /data/$USER/rna-seq/oligos/oligos.swarm1. You can resubmit it by cd'ing to the Submission Path, and submitting using the Swarm Command from the info above. In this case:
cd /data/$USER/rna-seq/oligos
/usr/local/bin/swarm -f oligos.swarm1  -g 4 -t 2 --time 160:00:00 
If the swarm subjobs were failing due to lack of memory, of course you will want to increase the memory allocation, by increasing the -g # parameter. Likewise, if the failures were due to walltime limits, you would increase the walltime with --time xxxx.
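If you resubmit swarms often, the cd and swarm lines can be pulled out of the jobhist output automatically. This is a sketch that assumes jobhist prints 'Swarm Path' and 'Swarm Command' as 'Label : value' lines, exactly as in the output above; `resubmit_cmd` is a hypothetical helper name. It only prints the command, so you can review it (and raise -g or --time) before running it.

```shell
# resubmit_cmd (hypothetical helper): read jobhist output on stdin and print
# the "cd <Swarm Path> && <Swarm Command>" line needed to resubmit the swarm.
resubmit_cmd() {
    awk '
        /^Swarm Path[ \t]*: /    { path = substr($0, index($0, ": ") + 2) }
        /^Swarm Command[ \t]*: / { cmd  = substr($0, index($0, ": ") + 2) }
        END {
            sub(/[ \t]+$/, "", path)    # drop trailing blanks
            sub(/[ \t]+$/, "", cmd)
            if (path != "" && cmd != "") printf "cd %s && %s\n", path, cmd
        }'
}

# Usage:
#   jobhist 46435925 | resubmit_cmd
```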

How do I resubmit some subjobs of a swarm?

Suppose you submitted a swarm of 1000 commands, and you see (either from your swarm.o/swarm.e files, or from 'jobhist jobnumber') that subjobs 3, 5, 18 failed.

(Note: snakemake makes it easy to rerun only the failed steps in a pipeline. You might consider using snakemake if you have complicated pipelines and frequently need to rerun subjobs. This FAQ entry deals with an occasional such failure.)

# jobhist 46435925

JobId              : 46435925
User               : $USER
Submitted          : 20170801 15:36:28
Started            : 20170801 15:36:48
Submission Path    : /data/$USER/rna-seq/oligos
Submission Command : sbatch --array=0-459 --output=/data/$USER/rna-
                      seq/oligos/Oligos2_%A_%a.o --error=/data/$USER/rna-
                      seq/oligos/Oligos2_%A_%a.e --cpus-per-task=2
		      --job-name=Oligos2
                      --mem=4096 --partition=norm --time=6-16:00:00
                      /spin1/swarm/$USER/46435925/swarm.batch
Swarm Path         : /data/$USER/rna-seq/oligos
Swarm Command      : /usr/local/bin/swarm -f oligos.swarm1  -g 4 -t 2 --time 160:00:00 

Jobid        Partition       State  Nodes  CPUs      Walltime       Runtime         MemReq  MemUsed  Nodelist
46435925_0        norm   COMPLETED      1     2    6-16:00:00    1-01:38:35     4.0GB/node    3.0GB  cn3415
46435925_1        norm   COMPLETED      1     2    6-16:00:00    3-14:18:34     4.0GB/node    3.0GB  cn3444
46435925_2        norm   COMPLETED      1     2    6-16:00:00    3-14:14:52     4.0GB/node    3.0GB  cn3461
46435925_3        norm      FAILED      1     2    6-16:00:00    1-01:38:35     4.0GB/node    4.0GB  cn3415
[...]
46435925_5        norm      FAILED      1     2    6-16:00:00    1-01:38:35     4.0GB/node    4.0GB  cn3415
46435925_18       norm      FAILED      1     2    6-16:00:00    1-01:38:35     4.0GB/node    4.0GB  cn3415
[...]

In this case, the failed jobs used the full 4 GB of allocated memory, so they probably failed due to lack of memory (you can also check this on the dashboard). Since most of the jobs ran successfully with a 4 GB memory allocation, you probably don't want to resubmit the entire swarm with an increased memory allocation.
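With a large swarm, scanning the jobhist table by eye gets tedious. The sketch below pulls out the failed subjobs together with their memory use, assuming the column layout shown above (State in column 3, MemReq in column 8, MemUsed in column 9); `failed_subjobs` is a hypothetical helper name, and it would need adjusting if the jobhist layout differs.

```shell
# failed_subjobs (hypothetical helper): read jobhist output on stdin and print
# "index<TAB>MemUsed<TAB>MemReq" for each FAILED subjob of a swarm or job array.
failed_subjobs() {
    awk 'BEGIN { OFS = "\t" }
         $3 == "FAILED" && $1 ~ /_[0-9]+$/ {
             idx = $1
             sub(/^.*_/, "", idx)    # 46435925_18 -> 18
             print idx, $9, $8
         }'
}

# Usage:
#   jobhist 46435925 | failed_subjobs
```

Subjobs whose MemUsed equals MemReq are the likely out-of-memory candidates.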

Swarm creates its own set of batch scripts, one for each job, in a directory called /spin1/swarm/$USER/jobid and submits those to the batch system. That directory contains a swarm.batch file, a cmd.0 file corresponding to subjob 0, a cmd.1 file corresponding to subjob 1, and so on. You can resubmit only the failed jobs by pulling out the swarm-created commands corresponding to those subjobs and submitting them as a new swarm. For the example above, pull the commands for subjobs 3, 5, and 18 into a new swarm file:

% cd /spin1/swarm/$USER/46435925/
% cat cmd.3  cmd.5 cmd.18 > /data/$USER/new_swarm
Check the number of lines in the new_swarm file (it should be 3, one line for each failed subjob):
% wc -l /data/$USER/new_swarm
3
Resubmit requesting 8 GB of memory per subjob, and leaving the threads and walltime the same.
% swarm -f /data/$USER/new_swarm -g 8 -t 2 --time 160:00:00 
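The cat step above can be wrapped in a small function that stops if a cmd.N file is missing and makes sure every command ends up on its own line. This is a sketch, not part of swarm: `rebuild_swarm` is a hypothetical name, and it assumes each cmd.N file holds one command, as described above.

```shell
# rebuild_swarm (hypothetical helper) SWARM_DIR OUTFILE INDEX...
# Concatenates SWARM_DIR/cmd.INDEX for each INDEX into OUTFILE and prints the line count.
rebuild_swarm() {
    local swarm_dir=$1 outfile=$2 idx f
    shift 2
    : > "$outfile"    # start empty so a rerun does not append duplicates
    for idx in "$@"; do
        f="$swarm_dir/cmd.$idx"
        if [ ! -f "$f" ]; then
            echo "missing $f" >&2
            return 1
        fi
        cat "$f" >> "$outfile"
        # add a newline if the file did not end with one, so commands do not run together
        if [ -n "$(tail -c 1 "$f")" ]; then
            echo >> "$outfile"
        fi
    done
    wc -l < "$outfile"
}

# Usage for the example above:
#   rebuild_swarm /spin1/swarm/$USER/46435925 /data/$USER/new_swarm 3 5 18
#   swarm -f /data/$USER/new_swarm -g 8 -t 2 --time 160:00:00
```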

Can I get a copy of the files in /lscratch/$SLURM_JOBID if a job fails?

The local disk directory /lscratch/$SLURM_JOBID is set up when a job that requests local disk is started, and the directory is removed when that job exits.

If a job fails, you might want to examine the temporary files in /lscratch/$SLURM_JOBID. In that case, set up your batch script to check for failure of a command and, if it fails, copy the /lscratch/$SLURM_JOBID directory back to /scratch or your /data area before the job exits. There is no built-in mechanism in sbatch or Slurm to do this. Each user's definition of 'failure' may be different (e.g. a command may complete but produce an output file of size 0), and each user may want specific actions taken if a command in their script fails. The example below is one suggestion of how to set up a batch script that 'rescues' /lscratch/$SLURM_JOBID if a command fails.

This file jobscript.sh was submitted with 'sbatch jobscript.sh'.

#!/bin/bash

# allocate 50 GB of local disk on the allocated node
#SBATCH --gres=lscratch:50
# CPUs for HLA-PRG-LA (passed below as $SLURM_CPUS_PER_TASK; adjust as needed)
#SBATCH --cpus-per-task=4
# send email if the job fails
#SBATCH --mail-type=FAIL

cd /lscratch/$SLURM_JOBID || exit 1
module load samtools

# copy the data from the /data/$USER area to the local disk
cp /data/$USER/HLA-PRG-LA/NA12878.mini.cram .
# run samtools index (for a CRAM file this writes NA12878.mini.cram.crai)
samtools index NA12878.mini.cram
# test if the index exists and is non-zero size. If not, tar the entire /lscratch/$SLURM_JOBID area to /scratch/$USER
if [ ! -s NA12878.mini.cram.crai ]
then
    echo "samtools index failed."
    echo "Tarring and copying /lscratch/$SLURM_JOBID to /scratch/$USER/lscratch_${SLURM_JOBID}.tar.gz"
    tar -cvzf /scratch/$USER/lscratch_${SLURM_JOBID}.tar.gz /lscratch/$SLURM_JOBID
    exit 1
fi

# if the index exists and is non-zero size, run the next command
module load HLA-PRG-LA/f0833ed
HLA-PRG-LA.pl --BAM NA12878.mini.cram --graph PRG_MHC_GRCh38_withIMGT --sampleID NA12878 --maxThreads $SLURM_CPUS_PER_TASK --workingDir .

# copy the final output back to /data/$USER
cp -r NA12878.mini.hla /data/$USER/HLA-PRG-LA/

In the example batch script above, if samtools index fails and the index file NA12878.mini.cram.crai does not exist, or is a size-zero file, the script exits with status 1, so the user gets an email saying the job failed. The Slurm output file slurm-#####.out reports that samtools index failed and gives the location of the tar file that was created.
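An explicit check like the one above only protects the command it tests. A complementary approach, which is a different technique from the example, is a bash ERR trap: bash runs the trap whenever a command exits non-zero, except for commands tested by if or while and all but the last command of an && or || list, so an unexpected failure anywhere rescues the local disk. The sketch below assumes the same /lscratch and /scratch/$USER locations as the example; `rescue_workdir` and `enable_rescue_trap` are hypothetical helper names. An ERR trap cannot detect failures such as a zero-size output file, so keep explicit checks for those.

```shell
# rescue_workdir (hypothetical helper) WORK_DIR RESCUE_DIR [TAG]
# Tars WORK_DIR into RESCUE_DIR/lscratch_TAG.tar.gz; TAG defaults to $SLURM_JOB_ID.
rescue_workdir() {
    local work_dir=$1 rescue_dir=$2
    local tag=${3:-${SLURM_JOB_ID:-unknown}}
    local tarball="${rescue_dir}/lscratch_${tag}.tar.gz"
    echo "Saving ${work_dir} to ${tarball}" >&2
    mkdir -p "$rescue_dir"
    # -C switches to the parent directory so the archive stores a relative path
    tar -czf "$tarball" -C "$(dirname "$work_dir")" "$(basename "$work_dir")"
}

# enable_rescue_trap (hypothetical helper): on the first failing command, rescue
# /lscratch/$SLURM_JOB_ID to /scratch/$USER and exit with that command's status.
enable_rescue_trap() {
    set -o errtrace    # make the ERR trap apply inside shell functions too
    trap 'rc=$?; trap - ERR; rescue_workdir "/lscratch/$SLURM_JOB_ID" "/scratch/$USER"; exit "$rc"' ERR
}
```

In a batch script, define (or source) these functions and call enable_rescue_trap right after the #SBATCH lines. Because the script then exits non-zero, --mail-type=FAIL still sends the failure email.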

What are the best practices for HPC?

  1. Do not flood the cluster with a large number (> 100) of very short running jobs (< 15 minutes).
  2. When running a swarm of commands, each of which require very little time, bundle the swarm such that each subjob requires at least 15 minutes of time.
  3. I/O (reads, writes, downloads, uploads) cannot be easily parallelized.
  4. Wherever possible and practical, large files that are to be read by many individual jobs should be copied to /lscratch prior to reading.
  5. If a swarm or jobarray contains a step where a subset of data from a large file (> 8GB) is parsed out and used for subsequent steps, the parsing should be broken into its own job. Further, the large file should be copied to /lscratch prior to the read. The subset data can be written to files and used in a secondary swarm or jobarray.
  6. Filenames should not contain non-printable or blank characters.
  7. Use /lscratch for temporary files. This is typically done by first allocating /lscratch with your job with --gres=lscratch:N, and then setting TMPDIR=/lscratch/$SLURM_JOB_ID within your batch script or swarm.
  8. Do not overly decorate your startup files (~/.bashrc, ~/.bash_profile) with environment initializations and functions. Errors or incompatibilities can interfere with the login process and may prevent successful connections in unexpected ways.
  9. Do not store more than 1000 files in a single directory. Files should be stored in nested subdirectories instead. Too many files can significantly slow down basic read/write operations.
  10. Load only the minimum environment modules necessary to run your application, process, or job. Modules can sometimes scramble paths required, especially for Python-, Perl-, and R-dependent applications.
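Items 6 and 9 are easy to check with find. The sketch below relies on GNU find (its -printf option), as found on Linux; `crowded_dirs` and `bad_filenames` are hypothetical helper names, and the default limit of 1000 comes from item 9. bad_filenames catches blanks and control characters, which covers the common cases but not every non-printable byte sequence.

```shell
# crowded_dirs (hypothetical helper) DIR [LIMIT]
# Prints "count<TAB>directory" for every directory under DIR with more than LIMIT entries.
crowded_dirs() {
    # %h prints the directory that contains each entry; uniq -c then counts entries per directory
    find "$1" -mindepth 1 -printf '%h\n' |
        sort | uniq -c |
        awk -v lim="${2:-1000}" '$1 > lim { n = $1; sub(/^[ \t]*[0-9]+ /, ""); print n "\t" $0 }'
}

# bad_filenames (hypothetical helper) DIR
# Prints paths whose final component contains a blank or a control character.
bad_filenames() {
    find "$1" -name '*[[:space:][:cntrl:]]*'
}

# Usage:
#   crowded_dirs /data/$USER/myproject
#   bad_filenames /data/$USER/myproject
```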

Annoying popup: Mac keeps trying to connect to server

Occasionally a Mac may repeatedly try to connect to an old or current HPC server such as helixdrive, bringing up a connection popup again and again.
Here are some steps that can resolve the problem.
  • Check your login items that start automatically upon login (System Preferences -> Users and Groups -> Login Items). Is there anything that could be related to the problem mount? If so, delete it, then log out and back in.

  • Check your Finder window. Is there an icon relating to the server? If so, hold down the Command key and drag it out.

  • In the Finder's Go menu, choose 'Connect to Server' -> click on Recent Servers -> Clear Recent Servers. Reboot.

  • Log out. When logging out, un-check the 'Reopen windows when logging back in' button.

Why won't my graphics applications run (optimally, or at all)?

Your laptop or desktop will almost always do a better job displaying graphics than a remote computational resource like Biowulf. Graphics hardware is designed to be connected directly to a monitor.

If your application requires intensive (hardware-accelerated) graphics performance, you can use one of the dedicated visualization nodes in the visual partition, following the visual partition documentation.

When you use a graphical application on a remote system like Biowulf (outside of the visual partition), you are still using your local hardware to display graphics. Intensive graphics computations such as 3D rendering will run into bottlenecks because instead of a GPU performing the calculations and presenting the information directly on a local monitor, a CPU will need to package the required information and send it over the network. This means that all graphics performance will be degraded and that 3D graphics applications in particular will run very slowly if they run at all.

On Linux, there is a collection of libraries and applications that can be used to run graphical applications on remote resources (assuming those resources have graphics hardware). You can take advantage of these libraries by using the visual partition, which is composed of nodes containing GPU hardware set aside for graphics visualization. Please bear in mind that graphics performance will still be substantially lower on these nodes than the performance you can achieve on your local workstation connected directly to your monitor, because of network latency.

There are some common problems that users of graphical applications run into on Biowulf. Some of these can be improved using the following troubleshooting steps, while a few are just indicative of the nature of Biowulf as a remote computational resource.

  1. Conflicting environment variables or paths? Your environment may include libraries, programs or specific paths that are causing issues with the graphical application/library. To check if this is the case:
    mv ~/.bashrc ~/.bashrc.ORIG
    cp -p /etc/skel/.bashrc ~
    mv ~/.bash_profile ~/.bash_profile.ORIG
    cp -p /etc/skel/.bash_profile ~
    module purge
    
    Now start a new Biowulf session, load only the modules needed for your graphical application, and test again. If it works without problems, something in your .bashrc (e.g. a conda environment, see below), .bash_profile, or another module is causing the problem. Add any personal modifications back one by one to your .bashrc and .bash_profile until you identify the cause.

    If you still have errors or problems with your graphics, then the environment is not the cause. You can return your environment to its former state with:

    mv ~/.bashrc.ORIG ~/.bashrc
    mv ~/.bash_profile.ORIG ~/.bash_profile
    
    If you use a non-standard shell (csh, tcsh or zsh), you can follow the same general idea with the corresponding startup files (e.g. .cshrc). If you aren't sure what these files are, please contact staff@hpc.nih.gov.
  2. Do you have any conda environments activated? conda includes libraries that can interfere with the graphics applications on the host system. Make sure to deactivate any conda environments, and double check your ~/.bashrc and ~/.bash_profile files to make sure that a conda environment is not being automatically activated on your behalf when you start a new shell. conda sometimes writes to your .bashrc file automatically.
  3. Are you using NoMachine or the visual partition? There are several applications available to display graphics on a remote resource using X11. The Biowulf staff strongly recommends NoMachine, as it tends to have fewer issues than other X11 applications (like XQuartz for Mac) and works across platforms. More information is available in the Biowulf NoMachine documentation.

    More intensive graphics applications like medical imaging and molecular modeling can run tolerably well on the visual partition (see the visual partition documentation).

  4. How much memory have you allocated to your job? Graphics applications may require more memory to run, and may fail with errors like libGL error: failed to load driver: swrast when they run out of memory. You can use the user dashboard to track the memory utilization of your jobs.
  5. Are you trying to use the GPU partition? (Don't.) Counterintuitively, nodes in the GPU partition do not work well for visualization, since the GPUs are configured for computation and not visualization. The drivers can sometimes get in the way of displaying graphics remotely, causing graphics errors.
  6. Does your application (e.g. MATLAB) select software OpenGL rendering and then fail to display graphics properly? Applications running on a remote host can potentially utilize software OpenGL rendering to display the graphics on your desktop or laptop. Hardware OpenGL rendering is only possible on the Biowulf visual partition due to the availability of properly configured graphics hardware. Software OpenGL rendering will be sufficient for many users, but for those who are doing complex rendering on a Biowulf compute node, the graphics may be very slow or not work at all.

    For many users, using NX to display remote graphics will suffice. For those doing complex graphics, benchmarking suggests that the graphics performance obtained from running Matlab directly on your desktop or laptop will be much better than running Matlab remotely on Biowulf compute nodes. The visual partition provides an intermediate solution for users who cannot practically copy large data sets to a local resource for visualization.
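For item 2 above, the lines that make conda start automatically are easy to spot with grep. This sketch assumes the usual markers: the '# >>> conda initialize >>>' block that conda init writes, explicit 'conda activate' lines, and sourcing of conda.sh. `find_conda_autostart` is a hypothetical helper name, and other setups may use different lines.

```shell
# find_conda_autostart (hypothetical helper) FILE...
# Prints file:line:text for lines that set up or activate conda when a shell starts.
find_conda_autostart() {
    grep -HnE 'conda initialize|conda activate|conda\.sh' "$@" 2>/dev/null
}

# Usage:
#   find_conda_autostart ~/.bashrc ~/.bash_profile
```

Any lines it reports are candidates to comment out while you test your graphical application.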

'module load xxx' fails

  1. Are you logged on to Helix instead of Biowulf? People new to the systems often confuse which machine they are working on. Helix is designated for interactive data transfer, and scientific applications are not available on Helix. Instead, you should connect to biowulf.nih.gov, and if you are doing anything cpu- or memory-intensive, get an interactive session.
  2. Have you modified your .bashrc? If you have a syntax error in your ~/.bashrc, you might have removed the 'module' function. Revert to an older version of your ~/.bashrc or ask the HPC staff to take a look.
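For the .bashrc case, bash can check a startup file for syntax errors without running it (bash -n). The sketch below wraps that check; `check_startup_files` is a hypothetical helper name. It only finds syntax errors, not commands that parse correctly but misbehave.

```shell
# check_startup_files (hypothetical helper) FILE...
# Runs 'bash -n' (parse without executing) on each existing file and reports the result.
check_startup_files() {
    local f msg status=0
    for f in "$@"; do
        [ -f "$f" ] || continue
        if msg=$(bash -n "$f" 2>&1); then
            echo "OK: $f"
        else
            echo "SYNTAX ERROR: $f"
            echo "$msg" >&2    # bash's own message, including the line number
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   check_startup_files ~/.bashrc ~/.bash_profile
#   type -t module    # typically prints "function" when the module system is set up
```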

Can't find an application or command

Scientific applications on Biowulf are accessed via the modules system. You first need to load the appropriate module before the executable will be available in your $PATH.

My job got killed

Most likely your job got killed because it ran out of memory. This can be checked via the dashboard -> Job Info -> click on the Job id. You can also check the memory usage via jobhist jobid. You should resubmit your job with a larger memory allocation.

Why is my home directory full?

Your /home directory has a quota of 16 GB, which cannot be increased, and a lot of things can go wrong if it fills up. To find which directory is eating your /home space (the largest entries are listed last):

du -h --max-depth=1 ~ | sort -h
Here are some reasons your home directory might be full:

  1. You stored data in your home directory. The /home directory should not be used for biomedical data, but only for small programs, executables and configuration files. Please use your /data directory for your biomedical data.
  2. Your local software installations.
    1. Your conda environment. It is always a good idea to set up your conda environment in your /data directory, as shown in our python docs.
    2. Your R packages. Normally R packages do not take much space (you can run a quick check with du -hs ~/R), but if you are managing multiple versions of R and have installed a lot of packages, it's better to use your /data directory for your R packages too. Adding the following environment variable to ~/.bashrc switches the location:
      export R_LIBS_USER=/data/$USER/R/%v/library
      Just remember to create the R directory before installing your packages. If you have previously installed packages, you might have to reinstall them.
  3. Your cache files. Cache files are used by applications to speed up the loading of recently viewed data. Some cache files need to be cleaned up manually.
    1. The pip cache directory (~/.cache/pip) can get surprisingly large for frequent pip users. Quick solution:
      rm -rf ~/.cache/pip
    2. Your singularity cache folder. The default cache directory is under ~/.singularity/cache, and can readily be moved to your data directory by setting
      export SINGULARITY_CACHEDIR=/data/$USER/.singularity
      manually or in your ~/.bashrc.
    3. Your VEP cache directory. By default, VEP will create a cache directory in ~/.vep. Make sure to include --cache --dir_cache $VEP_CACHEDIR with all VEP commands.
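To check the usual suspects from this list in one go, the sketch below reports the size of each directory named above that exists, largest last. `home_culprits` is a hypothetical helper name; it does not look for a conda installation, since where that lives depends on how it was set up, and sort -h assumes GNU coreutils.

```shell
# home_culprits (hypothetical helper) [HOME_DIR]
# Shows the size of each common space hog from this FAQ that exists, largest last.
home_culprits() {
    local home=${1:-$HOME} p
    for p in .cache/pip .singularity/cache .vep R; do
        if [ -e "$home/$p" ]; then
            du -sh "$home/$p"
        fi
    done | sort -h
}

# Usage:
#   home_culprits
```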

Best practices when asking for help

When contacting the HPC staff with questions, the more relevant information you can provide, the better. It helps us resolve your problem quickly if we can replicate the problem ourselves or solve it with the information you provided, without asking many additional questions.

Possible shortcut: If there is a generic error message, doing a Google search on that error message will often produce the answer.

The minimum information we need

  • if an interactive session, are you working on Biowulf or Helix?
  • if Biowulf batch jobs, the jobids of failed and/or successful jobs.
  • the directory in which you’re working (pwd).
  • the modules you’ve loaded (module list).
  • the module versions, and whether the error persists if you switch versions.
  • the command you used or if it was caused by a script, the full path to the script.
  • the error message, including the full path to the error files if they were generated.
  • It's best not to send us a screenshot of the commands unless you’re using a graphics program (reason: it’s harder for us to copy text from a screenshot).
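Part of this list can be gathered automatically and pasted into your email as text. The sketch below is not an HPC-provided tool: `problem_report` and its default file name are hypothetical, and you still need to add the job IDs, the exact command, and the error messages yourself.

```shell
# problem_report (hypothetical helper) [OUTFILE]
# Writes host, working directory, date and loaded modules to OUTFILE and prints its name.
problem_report() {
    local out=${1:-problem_report.txt}
    {
        echo "host: $(uname -n)"
        echo "directory: $(pwd)"
        echo "date: $(date)"
        echo "modules loaded:"
        if type module >/dev/null 2>&1; then
            module list 2>&1    # capture stderr too, where the module system usually prints
        else
            echo "  (module command not available in this shell)"
        fi
    } > "$out"
    echo "$out"
}

# Usage (run in the directory where the problem occurs):
#   problem_report
```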

Examples of excellent problem reports:

From: xx.xx@nih.gov
To: staff@hpc.nih.gov
Subject: problem with XXX version 1.2.0

The module XXX/1.1.0 works, but 1.2.0 gives an error. The error file is in /data/user/mydir

[user@biowulf]$ sinteractive
[user@cn3133]$ module load XXX/1.1.0
[user@cn3133]$ XXX /A/B/C/YYY 2>XXX_1.1.0.noerror 
[user@cn3133]$ module load XXX/1.2.0
[user@cn3133]$ XXX /A/B/C/YYY 2>XXX_1.2.0.error 

From: xx.xx@nih.gov
To: staff@hpc.nih.gov
Subject: batch job problem

I ran two similar jobs, but one succeeded (job 12345) and one failed (job id 456789). I can't figure out why the failed job failed. The slurm output files are in directory /data/user/mydir. Could you help?

How do I create a swarm command file to run a command for each file in a directory?

A common task is to run a program on each image or each sequence file in a directory. You could write a script to loop over the files one by one, but this is a slow sequential process. Instead, you should use swarm for such large-scale independent jobs, and utilize the power of the Biowulf cluster to run all the commands simultaneously.

Swarm requires a 'swarm command file' with a command line for each run. In this case, each line would run the desired program against a single file. Here is a sample swarm command file:

blat /fdb/genome/hg19/chr10.fa indir/gi_615762_gb_T33664  -maxGap=3   outdir/gi_615762_gb_T33664.blat
blat /fdb/genome/hg19/chr10.fa indir/gi_615763_gb_T33665  -maxGap=3   outdir/gi_615763_gb_T33665.blat
blat /fdb/genome/hg19/chr10.fa indir/gi_615764_gb_T33666  -maxGap=3   outdir/gi_615764_gb_T33666.blat
blat /fdb/genome/hg19/chr10.fa indir/gi_615765_gb_T33667  -maxGap=3   outdir/gi_615765_gb_T33667.blat
If you have 1000 input files and therefore 1000 lines, you certainly don't want to create this swarm command file by hand; the risk of typos would be very large. Instead, write a script to create the swarm command file. This script can be written in bash, perl, python or any language of your choice.

Here is a sample bash script that creates a swarm command file to run Novoalign (a genomics alignment program) for each sequence file in a directory. The script only writes the swarm command file; you check it and then submit it yourself, as described below.

#!/bin/bash
# this script is called make_swarmfile.sh
# Sample bash script to create a swarm command file for the Biowulf batch system.
# It writes a swarmfile that runs a genomics alignment program on every sequence file in a directory.
# It does not submit the swarm: check the swarmfile first, then submit it yourself.
#
# Run this script with: bash make_swarmfile.sh
#
# Swarm docs at https://hpc.nih.gov/apps/swarm.html
# Remember! Before submitting a swarm, you should have checked
#   (a) the memory and CPUs required for a single command
#   (b) the total disk space (temporary files, output files) required for a single command
#   (c) and therefore the total disk space required for the whole swarm. Do you have enough disk space?
#       Use 'checkquota' to check!

# you will need to modify the two paths below
inputDir=/path/to/my/input/dir
outputDir=/path/to/my/output/dir
swarmFile=novo_swarm.sh

# start from an empty swarm file so that rerunning this script does not append duplicate lines
: > "$swarmFile"

# loop over each file and align against hg19 using Novoalign
for file in "${inputDir}"/*; do
    [ -f "$file" ] || continue    # skip subdirectories
    filename=$(basename "$file")
    echo "working on file $filename"
    # the escaped \$SLURM_CPUS_PER_TASK is expanded when the swarm command runs, not now
    echo "novoalign -c \$SLURM_CPUS_PER_TASK -f ${inputDir}/$filename -d /fdb/novoalign/chr_all_hg19.nbx -o SAM > ${outputDir}/$filename.sam" >> "$swarmFile"
done

At this point, you should check the generated swarm command file (novo_swarm.sh in this case) for sanity. If all looks well, submit with
swarm -f novo_swarm.sh  -g # -t #
where '-g #' is the number of gigabytes of memory required for each swarm command, and '-t #' is the number of threads that each instance of the program will use. In this example, Novoalign can multithread successfully to 8 threads, and previous tests have shown that it requires 4 GB of memory, so one would submit with
swarm -f novo_swarm.sh -g 4 -t 8
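Before running that swarm command, a quick sanity check is to compare the number of commands in the swarm file with the number of input files. This is a sketch: `check_swarmfile` is a hypothetical helper name. It counts non-blank lines that do not start with #, and it counts only regular, non-hidden files directly inside the input directory.

```shell
# check_swarmfile (hypothetical helper) SWARMFILE INPUTDIR
# Prints "<commands> commands for <files> input files" and returns non-zero if they differ.
check_swarmfile() {
    local swarmfile=$1 input_dir=$2 n_cmds n_inputs
    # non-blank lines that are not comments (grep -c still prints 0 when nothing matches)
    n_cmds=$(grep -cvE '^[[:space:]]*(#|$)' "$swarmfile" || true)
    n_inputs=$(find "$input_dir" -mindepth 1 -maxdepth 1 -type f ! -name '.*' | wc -l)
    echo "$((n_cmds)) commands for $((n_inputs)) input files"
    [ "$n_cmds" -eq "$n_inputs" ]
}

# Usage:
#   check_swarmfile novo_swarm.sh /path/to/my/input/dir && swarm -f novo_swarm.sh -g 4 -t 8
```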