Biowulf High Performance Computing at the NIH
guppy on Biowulf

Guppy is a basecaller from Oxford Nanopore Technologies. Current versions require GPUs to run.

References:

  • R. R. Wick, L. M. Judd, and K. E. Holt. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biology 2019, 20:129.
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session with a suitable GPU for this example. Note that the example data is a subset of data from a synthetic microbial community (see Nicholls et al) sequenced with the SQK-LSK109 1D sequencing kit on a FLO-MIN106 flowcell.
Sample session (user input in bold):

[user@biowulf]$ sinteractive --gres=gpu:p100:1,lscratch:200 --mem=16g --cpus-per-task=6
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn2369 are ready for job

[user@cn2369 ~]$ cd /lscratch/$SLURM_JOB_ID
[user@cn2369 ~]$ module load guppy
[user@cn2369 ~]$ cp -rL ${GUPPY_TEST_DATA:-none}/* .
[user@cn2369 ~]$ ls -lh
drwxr-xr-x 3 user group 4.0K Sep 13 09:13 Zymo-GridION-EVEN-BB-SN
[user@cn2369 ~]$ du -sh Zymo-GridION-EVEN-BB-SN
11G     Zymo-GridION-EVEN-BB-SN
[user@cn2369 ~]$ find Zymo-GridION-EVEN-BB-SN -name '*.fast5' -printf '.' | wc -c
160000
[user@cn2369 ~]$ guppy_basecaller --print_workflows | grep SQK-LSK109 | grep FLO-MIN106
FLO-MIN106 SQK-LSK109           dna_r9.4.1_450bps_hac
FLO-MIN106 SQK-LSK109-XL        dna_r9.4.1_450bps_hac

[user@cn2369 ~]$ guppy_basecaller --input_path Zymo-GridION-EVEN-BB-SN --recursive \
                       --flowcell FLO-MIN106 --kit SQK-LSK109 \
                       -x cuda:all \
                       --num_barcode_threads=4 \
                       --records_per_fastq 0 \
                       --compress_fastq \
                       --save_path fastq
ONT Guppy basecalling software version 3.2.2+9fe0a78
config file:        /opt/ont/guppy/data/dna_r9.4.1_450bps_hac.cfg
model file:         /opt/ont/guppy/data/template_r9.4.1_450bps_hac.jsn
input path:         Zymo-GridION-EVEN-BB-SN
save path:          fastq
chunk size:         1000
chunks per runner:  512
records per file:   0
fastq compression:  ON
num basecallers:    4
gpu device:         cuda:all
kernel path:
runners per device: 4

Found 160000 fast5 files to process.
Init time: 4748 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 1593158 ms, Samples called: 6901545514, samples/s: 4.33199e+06
Finishing up any open output files.
Basecalling completed successfully.

[user@cn2369 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
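
Note that /lscratch/$SLURM_JOB_ID is deleted when the job ends, so copy any basecalls you want to keep back to shared storage before exiting. A minimal sketch (the destination directory is an assumption):

cp -r /lscratch/$SLURM_JOB_ID/fastq /data/$USER/    # run this inside the session, before 'exit'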

guppy appears to number the first available GPU as GPU 0 even if it is not in fact the first GPU on the node (i.e. it behaves as if CUDA_VISIBLE_DEVICES=0). The way to use all allocated GPUs is to pass -x cuda:all.
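
To confirm which GPUs a job actually received before starting guppy, you can check the allocation first. A minimal sketch (commands only, output omitted):

echo $CUDA_VISIBLE_DEVICES     # GPU indices Slurm granted to this job
nvidia-smi -L                  # the same devices listed by name; guppy renumbers them starting at 0, so use -x cuda:all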

For this example data set, guppy_basecaller ran ~2.3x faster on our first-generation V100 (16 GB) cards than on the P100 GPUs with the same settings. guppy scales well to 2 GPUs but should not be run with more than two, as efficiency then falls below the 80% efficiency threshold.

[Benchmark plot: guppy_basecaller throughput on P100 vs. V100 (16 GB) GPUs; guppy runs ~2.3x faster on the V100]
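
A two-GPU run therefore only requires requesting a second GPU and keeping -x cuda:all. A sketch of such a session (the GPU type and CPU/memory values are illustrative, not benchmarked):

sinteractive --gres=gpu:v100:2,lscratch:200 --mem=16g --cpus-per-task=12
module load guppy
guppy_basecaller --input_path Zymo-GridION-EVEN-BB-SN --recursive \
                 --flowcell FLO-MIN106 --kit SQK-LSK109 \
                 -x cuda:all --records_per_fastq 0 --compress_fastq \
                 --save_path fastq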

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. guppy.sh). For example:

#!/bin/bash
set -e
module load guppy/3.2.2 || exit 1
guppy_basecaller --input_path $GUPPY_TEST_DATA/Zymo-GridION-EVEN-BB-SN --recursive \
                       --flowcell FLO-MIN106 --kit SQK-LSK109 \
                       -x cuda:all \
                       --num_barcode_threads=4 \
                       --records_per_fastq 0 \
                       --compress_fastq \
                       --save_path fastq

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=14 --mem=16g --gres=lscratch:200,gpu:p100:1 guppy.sh
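
Since the submission above allocates local scratch, a variant of guppy.sh that stages the input into /lscratch first can reduce load on the shared filesystem. A sketch, assuming the test data as input and /data/$USER as the place to keep results:

#!/bin/bash
set -e
module load guppy/3.2.2
cd /lscratch/$SLURM_JOB_ID
# stage the fast5 input locally; substitute your own run directory for the test data
cp -rL $GUPPY_TEST_DATA/Zymo-GridION-EVEN-BB-SN .
guppy_basecaller --input_path Zymo-GridION-EVEN-BB-SN --recursive \
                       --flowcell FLO-MIN106 --kit SQK-LSK109 \
                       -x cuda:all \
                       --records_per_fastq 0 \
                       --compress_fastq \
                       --save_path fastq
# copy the basecalls back before the job (and /lscratch) is cleaned up
cp -r fastq /data/$USER/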
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. guppy.swarm). For example:

guppy_basecaller --input_path indir1 --flowcell FLO-MIN106 --kit SQK-LSK109 --save_path outdir1 ...
guppy_basecaller --input_path indir2 --flowcell FLO-MIN106 --kit SQK-LSK109 --save_path outdir2 ...
...etc...
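
If each run sits in its own directory, the swarmfile can be generated with a short loop instead of being written by hand. A sketch, assuming a /data/$USER/runs layout with one fast5 directory per run (both paths are assumptions):

# write one guppy_basecaller command per run directory
for d in /data/$USER/runs/*; do
    run=$(basename "$d")
    echo "guppy_basecaller --input_path $d --recursive --flowcell FLO-MIN106 --kit SQK-LSK109 -x cuda:all --compress_fastq --save_path /data/$USER/basecalls/$run"
done > guppy.swarm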

Submit this job using the swarm command.

swarm -f guppy.swarm -g 16 -t 14 --gres=gpu:p100:1 --module guppy/3.2.2
where
  -g #             Number of Gigabytes of memory required for each process (1 line in the swarm command file)
  -t #             Number of threads/CPUs required for each process (1 line in the swarm command file)
  --module guppy   Loads the guppy module for each subjob in the swarm