Dorado is a basecaller for Oxford Nanopore reads.
Allocate an interactive session and run the program.
Sample session:
    [user@biowulf]$ sinteractive --gres=gpu:v100x:1,lscratch:200 --mem=16g --cpus-per-task=6
    salloc.exe: Pending job allocation 46116226
    salloc.exe: job 46116226 queued and waiting for resources
    salloc.exe: job 46116226 has been allocated resources
    salloc.exe: Granted job allocation 46116226
    salloc.exe: Waiting for resource configuration
    salloc.exe: Nodes cn3144 are ready for job

    [user@cn3144 ~]$ module load dorado/0.8.1
    [user@cn3144 ~]$ cd /lscratch/$SLURM_JOB_ID
    [user@cn3144 ~]$ cp -rL "${DORADO_TEST_DATA:-none}" input
    [user@cn3144 ~]$ ls -lh input
    -rw-r--r--. 1 user group 20G Jun 2 17:21 reads.pod5
    [user@cn3144 ~]$ # emits unaligned bam by default
    [user@cn3144 ~]$ dorado basecaller --device cuda:all ${DORADO_MODELS}/dna_r9.4.1_e8_sup@v3.3 input > output.bam
    [user@cn3144 ~]$ ls -lh output.bam
    -rw-r--r-- 1 user group 2.1G Jun 2 20:02 output.bam
    [user@cn3144 ~]$ exit
    salloc.exe: Relinquishing job allocation 46116226
    [user@biowulf ~]$
Dorado scales well up to 4 V100x GPUs. On A100 GPUs, efficiency drops to 88% with 3 GPUs and 85% with 4, so 3 or fewer GPUs are ideal. Keep in mind that jobs requesting multiple GPUs may wait longer in the queue for resources.
V100x GPUs | Dorado 0.7.3 runtime [min] | Dorado 0.8.1 runtime [min] | Efficiency |
---|---|---|---|
1 | 90 | 90 | 100% |
2 | 45 | 44 | 100% |
3 | 30 | 30 | 100% |
4 | 23 | 23 | 100% |

A100 GPUs | Dorado 0.7.3 runtime [min] | Dorado 0.8.1 runtime [min] | Efficiency |
---|---|---|---|
1 | 24 | 24 | 100% |
2 | 12 | 12 | 100% |
3 | 9 | 9 | 88% |
4 | 7 | 7 | 85% |
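
The table does not state how the Efficiency column was computed. A common definition of parallel efficiency is the single-GPU runtime divided by the product of the GPU count and the multi-GPU runtime. The following is a minimal sketch under that assumption; the function name `efficiency` is hypothetical. Because the runtimes in the table are rounded to whole minutes, values recomputed this way can differ from the published column by a few percentage points.

```shell
#!/bin/bash
# Parallel efficiency in percent, truncated to a whole number:
#   efficiency = 100 * t1 / (n * tn)
# t1: runtime on 1 GPU, n: number of GPUs, tn: runtime on n GPUs (same units).
# Assumption: this is the usual textbook definition, not necessarily the
# exact method used to produce the table.
efficiency() {
    local t1="$1" n="$2" tn="$3"
    echo $(( 100 * t1 / (n * tn) ))
}

# A100 runtimes from the table, in minutes
efficiency 24 2 12   # prints 100
efficiency 24 3 9    # prints 88
efficiency 24 4 7    # prints 85
```

Bash integer arithmetic truncates toward zero, so 88.9% is reported as 88.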
Create a batch input file (e.g. dorado.sh). For example:
    #!/bin/bash
    set -e
    module load dorado/0.8.1
    cd /lscratch/$SLURM_JOB_ID
    cp -rL "${DORADO_TEST_DATA:-none}" input
    dorado basecaller --device cuda:all ${DORADO_MODELS}/dna_r9.4.1_e8_sup@v3.3 input > output.bam
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=6 --mem=16g --gres=lscratch:50,gpu:v100x:1 --partition=gpu dorado.sh
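
The example script writes output.bam into /lscratch/$SLURM_JOB_ID. On clusters where node-local scratch is cleaned up when the job ends, the result has to be copied to persistent storage before the script exits. The following is a minimal sketch, assuming the submission directory is an acceptable destination. `stage_out` is a hypothetical helper, not part of Dorado or Slurm; `SLURM_SUBMIT_DIR` is set by Slurm to the directory from which sbatch was invoked.

```shell
#!/bin/bash
# Sketch: copy a result file out of node-local scratch.
# Assumption: the destination directory is on persistent storage.
stage_out() {
    local src="$1" dest_dir="$2"
    mkdir -p "$dest_dir"      # create the destination if it does not exist
    cp "$src" "$dest_dir/"    # copy rather than move, so a failed copy loses nothing
}

# In the batch script, after the dorado command has finished:
#   stage_out output.bam "${SLURM_SUBMIT_DIR:-$PWD}"
```

With `set -e` active in the batch script, a failed copy stops the job with a non-zero exit status, which makes the failure visible in the job accounting.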