Guppy is a basecaller from Oxford Nanopore Technologies. Current versions require GPUs to run.
References:
- For more documentation see the ONT community site
- Module Name: guppy (see the modules page for more information)
- Current versions of guppy_basecaller require GPUs. On biowulf, guppy can only run on P100 and V100 GPUs. guppy_aligner does not require GPU.
- Example files in
$GUPPY_TEST_DATA
Allocate an interactive session with
a suitable GPU for this example. Note that the example data is as subset of
data from a synthetic microbial community (see
Nicholls
et al) sequenced with the SQK-LSK109 1D sequencing kit in a FLO-MIN106 flowcell.
Sample session
(user input in bold):
[user@biowulf]$ sinteractive --gres=gpu:p100:1,lscratch:200 --mem=16g --cpus-per-task=6 salloc.exe: Pending job allocation 46116226 salloc.exe: job 46116226 queued and waiting for resources salloc.exe: job 46116226 has been allocated resources salloc.exe: Granted job allocation 46116226 salloc.exe: Waiting for resource configuration salloc.exe: Nodes cn2369 are ready for job [user@cn2369 ~]$ cd /lscratch/$SLURM_JOB_ID [user@cn2369 ~]$ module load guppy [user@cn2369 ~]$ cp -rL ${GUPPY_TEST_DATA:-none}/* . [user@cn2369 ~]$ ls -lh drwxr-xr-x 3 user group 4.0K Sep 13 09:13 Zymo-GridION-EVEN-BB-SN [user@cn2369 ~]$ du -sh Zymo-GridION-EVEN-BB-SN 11G Zymo-GridION-EVEN-BB-SN [user@cn2369 ~]$ find Zymo-GridION-EVEN-BB-SN -name '*.fast5' -printf '.' | wc -c 160000 [user@cn2369 ~]$ guppy_basecaller --print_workflows | grep SQK-LSK109 | grep FLO-MIN106 FLO-MIN106 SQK-LSK109 dna_r9.4.1_450bps_hac FLO-MIN106 SQK-LSK109-XL dna_r9.4.1_450bps_hac [user@cn2369 ~]$ guppy_basecaller --input_path Zymo-GridION-EVEN-BB-SN --recursive \ --flowcell FLO-MIN106 --kit SQK-LSK109 \ -x cuda:all \ --records_per_fastq 0 \ --compress_fastq \ --save_path fastq ONT Guppy basecalling software version 3.2.2+9fe0a78 config file: /opt/ont/guppy/data/dna_r9.4.1_450bps_hac.cfg model file: /opt/ont/guppy/data/template_r9.4.1_450bps_hac.jsn input path: Zymo-GridION-EVEN-BB-SN save path: fastq chunk size: 1000 chunks per runner: 512 records per file: 0 fastq compression: ON num basecallers: 4 gpu device: cuda:all kernel path: runners per device: 4 Found 160000 fast5 files to process. Init time: 4748 ms 0% 10 20 30 40 50 60 70 80 90 100% |----|----|----|----|----|----|----|----|----|----| *************************************************** Caller time: 1593158 ms, Samples called: 6901545514, samples/s: 4.33199e+06 Finishing up any open output files. Basecalling completed successfully. [user@cn2369 ~]$ exit salloc.exe: Relinquishing job allocation 46116226 [user@biowulf ~]$
guppy appears to number the first available GPU as GPU 0 even if it is in fact not the
first GPU (i.e. CUDA_VISIBLE_DEVICES=0). The way to use all allocated GPUs is to
use -x cuda:all
.
For this example data set, guppy_basecaller (5.0.7) run ~2.3x faster on V100(x) GPUs than on the P100 GPUs with the same settings. guppy scales well to 2 GPUs but should not be run with more than two as efficiency falls below the 80% threshold. Results were similar for guppy 6.0.1.

Create a batch input file (e.g. guppy.sh). For example:
#!/bin/bash set -e module load guppy/3.2.2 || exit 1 guppy_basecaller --input_path $GUPPY_TEST_DATA/Zymo-GridION-EVEN-BB-SN --recursive \ --flowcell FLO-MIN106 --kit SQK-LSK109 \ -x cuda:all \ --records_per_fastq 0 \ --compress_fastq \ --save_path fastq
Submit this job using the Slurm sbatch command.
sbatch --partition=gpu --cpus-per-task=14 --mem=16g --gres=lscratch:200,gpu:p100:1 guppy.sh
Create a swarmfile (e.g. guppy.swarm). For example:
guppy_basecaller --input_path indir1 --flowcell FLO-MIN106 --kit SQK-LSK109 --save_path outdir1 ... guppy_basecaller --input_path indir2 --flowcell FLO-MIN106 --kit SQK-LSK109 --save_path outdir2 ... ...etc...
Submit this job using the swarm command.
swarm -f guppy.swarm --partition=gpu -g 16 -t 14 --gres=gpu:p100:1 --module guppy/3.2.2where
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
--module guppy | Loads the guppy module for each subjob in the swarm |