dorado on Biowulf

Dorado is a basecaller for Oxford Nanopore reads.

Documentation

dorado on GitHub

Important Notes

Module Name: dorado (see the modules page for more information)
Requires a V100/V100x or newer GPU for basecalling. Alignment is not GPU-accelerated.
Use the POD5 input format for optimal performance.
Models are found in ${DORADO_MODELS}.
Example files are in ${DORADO_TEST_DATA}.
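
As a quick check after loading the module, the model directory can be listed to see which model names are available to pass to dorado basecaller. The sketch below is illustrative only: list_models is a hypothetical helper, not part of dorado or the module, and it assumes DORADO_MODELS points at a directory containing one entry per model.

```shell
# Hypothetical helper (not part of dorado): list the entries under a models root.
# With no argument it falls back to ${DORADO_MODELS}, which is set by the module.
list_models() {
    models_root="${1:-${DORADO_MODELS:-}}"
    if [ -z "$models_root" ] || [ ! -d "$models_root" ]; then
        echo "models directory not set or missing; was 'module load dorado' run?" >&2
        return 1
    fi
    # One model name per line
    ls -1 "$models_root"
}
```

Calling list_models with no argument after module load dorado should print the installed model names, for example the dna_r9.4.1_e8_sup@v3.3 model used in the examples on this page.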

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (commands typed by the user follow the $ prompt):

[user@biowulf ~]$ sinteractive --gres=gpu:v100x:1,lscratch:200 --mem=16g --cpus-per-task=6
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load dorado/0.8.1
[user@cn3144 ~]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144 ~]$ cp -rL "${DORADO_TEST_DATA:-none}" input
[user@cn3144 ~]$ ls -lh input
-rw-r--r--. 1 user group 20G Jun  2 17:21 reads.pod5
[user@cn3144 ~]$ # emits unaligned bam by default
[user@cn3144 ~]$ dorado basecaller --device cuda:all ${DORADO_MODELS}/dna_r9.4.1_e8_sup@v3.3 input > output.bam
[user@cn3144 ~]$ ls -lh output.bam
-rw-r--r-- 1 user group 2.1G Jun  2 20:02 output.bam
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Dorado scaling

Dorado scales well up to 4 V100x GPUs. For A100 GPUs, 3 or fewer are ideal, since efficiency drops to 88% with 3 GPUs and 85% with 4. Keep in mind that jobs requesting multiple GPUs may wait longer in the queue for resources.

GPU type	GPUs	Runtime [min], dorado 0.7.3	Runtime [min], dorado 0.8.1	Efficiency
V100x	1	90	90	100%
V100x	2	45	44	100%
V100x	3	30	30	100%
V100x	4	23	23	100%
A100	1	24	24	100%
A100	2	12	12	100%
A100	3	9	9	88%
A100	4	7	7	85%
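
The page does not state how the efficiency column is computed. A minimal sketch, assuming the usual definition of parallel efficiency (single-GPU runtime divided by the number of GPUs times the multi-GPU runtime), is shown below. The efficiency function is a hypothetical helper. Because the runtimes in the table are rounded to whole minutes, values computed this way can differ slightly from the published column.

```shell
# Hypothetical helper: parallel efficiency in percent.
# Assumed definition: 100 * T1 / (N * TN)
#   $1 = runtime on one GPU, $2 = number of GPUs, $3 = runtime on that many GPUs
efficiency() {
    awk -v t1="$1" -v n="$2" -v tn="$3" \
        'BEGIN { printf "%.1f\n", 100 * t1 / (n * tn) }'
}

# A100 rows from the table above: 24 min on 1 GPU, 9 min on 3 GPUs
efficiency 24 3 9
```

The same function can be applied to timings from your own runs to decide whether requesting an additional GPU is worth the longer queue wait.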

Batch job

Most jobs should be run as batch jobs.

Create a batch input file (e.g. dorado.sh). For example:

#!/bin/bash
set -e
module load dorado/0.8.1
cd /lscratch/$SLURM_JOB_ID
cp -rL "${DORADO_TEST_DATA:-none}" input
dorado basecaller --device cuda:all ${DORADO_MODELS}/dna_r9.4.1_e8_sup@v3.3 input > output.bam

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=6 --mem=16g --gres=lscratch:50,gpu:v100x:1 --partition=gpu dorado.sh
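
The example script writes output.bam into /lscratch/$SLURM_JOB_ID, which is node-local scratch space that is cleaned up when the job ends. For real data, the result would normally be copied to permanent storage as the last step of the script. The following is a sketch under stated assumptions: stage_out is a hypothetical helper, and the destination directory shown in the comment is only an illustration that should be replaced with your own data directory.

```shell
#!/bin/bash
# Hypothetical helper (not part of dorado): copy a result file out of
# node-local scratch into a persistent directory, creating it if needed.
stage_out() {
    src_file="$1"
    dest_dir="$2"
    # Refuse to copy a missing or empty file, e.g. after a failed basecalling run
    if [ ! -s "$src_file" ]; then
        echo "missing or empty: $src_file" >&2
        return 1
    fi
    mkdir -p "$dest_dir" && cp "$src_file" "$dest_dir/"
}

# Possible use as the last line of dorado.sh (destination path is an assumption):
# stage_out output.bam "/data/$USER/dorado_results"
```

Since the example script uses set -e, a failed copy would make the job exit with a non-zero status, so the problem shows up in the job's final state.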