Biowulf High Performance Computing at the NIH
parabricks on Biowulf

Parabricks (www.parabricks.com) provides high-performance, GPU-based bioinformatics software for the analysis of next-generation sequencing data, offering both high throughput and fast turnaround. Parabricks enables GPU-accelerated GATK and can analyze a 30x whole human genome in about 45 minutes, compared to roughly 30 hours for the equivalent CPU-based pipeline. Because the results match those of the commonly used CPU-based software, it is straightforward to verify the accuracy of the output.
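
Because the GPU results are expected to match those of the standard CPU-based tools, a quick sanity check is to compare summary statistics of the two BAM files. Below is a minimal sketch, assuming samtools is available as a module and that a CPU-generated BAM named cpu_pipeline.bam (a hypothetical file) already exists for comparison:

module load samtools
# Compare basic alignment statistics; these should agree between
# the GPU- and CPU-generated results
samtools flagstat output.bam       > gpu_flagstat.txt
samtools flagstat cpu_pipeline.bam > cpu_flagstat.txt
diff gpu_flagstat.txt cpu_flagstat.txt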

During the COVID-19 pandemic, Nvidia has generously provided a 90-day Parabricks license for Biowulf users. This license is temporary and will expire on July 5, 2020. More information.

Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. In the example below, a P100 GPU node is requested with all 4 GPUs, 56 CPUs, 120 GB of memory, and 100 GB of local disk. Temporary files are directed to the local disk with the --tmp-dir flag.

Sample session (user input in bold):

[user@biowulf]$ sinteractive --mem=120g  --cpus-per-task=56 --gres=gpu:p100:4,lscratch:100
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load parabricks

[user@cn3144 ~]$ cd /data/$USER/parabricks_test 

[user@cn3144 ~]$ tar xvzf /usr/local/apps/parabricks/parabricks_sample.tar.gz

[user@cn3144 ~]$ pbrun fq2bam \
    --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
    --tmp-dir=/lscratch/${SLURM_JOBID}/ \
    --in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz \
    --out-bam output.bam
 
[+] Loading parabricks 2.5.0 on cn3144
[+] Loading singularity  on cn3144
------------------------------------------------------------------------------
||                 Parabricks accelerated Genomics Pipeline                 ||
||                              Version v2.5.0                              ||
||              GPU-BWA mem, Sorting, Marking Duplicates, BQSR              ||
||                       Contact: info@parabricks.com                       ||
------------------------------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs

GPU-BWA mem
ProgressMeter	Reads		Base Pairs Aligned
[17:37:24]	5043564		590000000
[17:37:31]	10087128		1180000000
[...]
Total GPU-BWA Mem + Sorting + MarkingDups + BQSR Generation + BAM writing
Processing time: 187.084384 seconds

[main] CMD: PARABRICKS mem -Z ./pbOpts.txt /spin1/users/user/parabricks/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta /spin1/users/user/parabricks/parabricks_sample/Data/sample_1.fq.gz /spin1/users/user/parabricks/parabricks_sample/Data/sample_2.fq.gz @RG\tID:sample_rg1\tLB:lib1\tPL:bar\tSM:sample\tPU:sample_rg1
[main] Real time: 191.046 sec; CPU: 2790.274 sec

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
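
The fq2bam run above produces a sorted, duplicate-marked BAM (output.bam). A common next step is germline variant calling with the GPU-accelerated HaplotypeCaller. The sketch below is illustrative only; it assumes the pbrun haplotypecaller subcommand and the flags shown are supported by the installed Parabricks version:

pbrun haplotypecaller \
    --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
    --in-bam output.bam \
    --tmp-dir=/lscratch/${SLURM_JOBID}/ \
    --out-variants output.vcf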

Batch job
Most jobs should be run as batch jobs.
Create a batch input file (e.g. parabricks.sh). For example, the job below uses the sample data.

#!/bin/bash
set -e

# Create the working directory if needed and unpack the sample data
mkdir -p /data/$USER/parabricks_test
cd /data/$USER/parabricks_test
tar xvzf /usr/local/apps/parabricks/parabricks_sample.tar.gz

module load parabricks

# Align reads with GPU-accelerated BWA-MEM, then sort and mark duplicates,
# writing temporary files to local scratch
pbrun fq2bam \
    --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
    --tmp-dir=/lscratch/${SLURM_JOBID}/ \
    --in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz \
    --out-bam output.bam

Submit this job using the Slurm sbatch command.

sbatch  --partition=gpu --gres=gpu:p100:4,lscratch:100 --mem=120g --cpus-per-task=56 --time=##:##:## parabricks.sh
where:

--partition=gpu       Submit to the GPU partition. Required.
--gres=gpu:p100:4     Allocate all 4 GPUs on the node.
,lscratch:100         Part of the same --gres request; allocate 100 GB of local disk for temporary files.
--mem=120g            Allocate 120 GB of memory on the node.
--cpus-per-task=56    Allocate all 56 CPUs on the node.
--time=##:##:##       Walltime requested. Required if you need more than the default 2-hour walltime.
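
For example, to submit with a 4-hour walltime (an illustrative value; adjust it to your own data):

sbatch --partition=gpu --gres=gpu:p100:4,lscratch:100 --mem=120g --cpus-per-task=56 --time=4:00:00 parabricks.sh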
This test job ran in 180 seconds. Peak usage was 57 GB of memory and 22 CPUs, but since all 4 GPUs on the node are being allocated, there is no harm in also allocating all of the node's memory and CPUs. (The initial period before the GPUs kick in is due to the untarring of the sample data.)
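
After a job completes, its actual memory and CPU usage can be reviewed to guide future allocations. A minimal sketch using Biowulf's jobhist utility, with <jobid> standing in for the job ID reported by sbatch:

jobhist <jobid>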