medaka on Biowulf

medaka is a tool to create a consensus sequence from nanopore sequencing data. This task is performed using neural networks applied from a pileup of individual sequencing reads against a draft assembly.

Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.

Note - 2.0 and newer may require a v100 or newer GPU due to memory constraints

For this example we will use a P100 GPU.

[user@biowulf]$ sinteractive --mem=32g -c12 --gres=lscratch:200,gpu:p100:1
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID

As a first step, a draft assembly is created with pomoxis which includes a convenience wrapper arount miniasm.

[user@cn3144]$ module load pomoxis
[user@cn3144]$ cp -rL ${POMOXIS_TEST_DATA:-none} data
[user@cn3144]$ mini_assemble -m 1 -i data/R9_Ecoli_K12_MG1655_lambda_MinKNOW_0.51.1.62.all.fasta \
                   -o draft_asm -p asm -t $SLURM_CPUS_PER_TASK
Copying FASTX input to workspace: data/R9_Ecoli_K12_MG1655_lambda_MinKNOW_0.51.1.62.all.fasta > draft_asm/asm.fa.gz
Skipped adapter trimming.
Skipped pre-assembly correction.
Overlapping reads...
[M::mm_idx_gen::26.495*2.10] collected minimizers
...etc...
Waiting for cleanup.
rm: cannot remove ‘shuffled*’: No such file or directory
rm: cannot remove ‘*paf*’: No such file or directory
Final assembly written to draft_asm/asm_final.fa. Have a nice day.

[user@cn3144]$ ls -lh draft_asm
total 4.5M
-rw-r--r-- 1 user group 4.5M Sep 16 13:43 asm_final.fa
[user@cn3144]$ 

Polish the draft consensus with medaka. Note that a batch size (-b) of 150 works with our 16GB GPUs.

Note that there can be conflicts between pomoxis and medaka - they both include their own mini_align script - so pomoxis should be unloaded before running medaka.

[user@cn3144]$ module load medaka/2.0.1
[user@cn3144]$ module unload pomoxis
[user@cn3144]$ medaka_consensus -i data/R9_Ecoli_K12_MG1655_lambda_MinKNOW_0.51.1.62.all.fasta \
                      -d draft_asm/asm_final.fa -o consensus -t $SLURM_CPUS_PER_TASK -b 150
Using available GPU
Attempting to automatically select model version.
WARNING: Failed to detect a model version, will use default: 'r1041_e82_400bps_sup_v5.0.0'
Checking program versions
This is medaka 2.0.1
Program    Version    Required   Pass
bcftools   1.14       1.11       True
bgzip      1.14       1.11       True
minimap2   2.17       2.11       True
samtools   1.14       1.11       True
tabix      1.14       1.11       True
Aligning basecalls to draft
...etc...
[14:19:10 - DataIndex] Loaded sample-index from 1/1 (100.00%) of feature files.
[14:19:10 - Stitch] Processing utg000001l.
[14:19:17 - Stitch] Processing utg000002l.
[14:19:17 - Stitch] Processing utg000003l.
[14:19:17 - Stitch] Processing utg000004c.
Polished assembly written to consensus/consensus.fasta, have a nice day.


[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. medaka.sh), which uses the input file 'medaka.in'. For example:

#!/bin/bash
module load medaka/1.9.1 || exit 1
medaka_consensus -i basecalls.fq -d draft_assembly.fa -o consensus -t $SLURM_CPUS_PER_TASK -b 150

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=12 --mem=24g medaka.sh