From the documentation:
POD5 is a file format for storing nanopore DNA data in an easily accessible way. The format can be written in a streaming manner, which allows a sequencing instrument to write it directly. Data in POD5 is stored using Apache Arrow, allowing users to consume data in many languages using standard tools.
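Because POD5 files are Apache Arrow-based, the data can also be read programmatically rather than only through the CLI. Below is a minimal sketch using the pod5 Python package; it assumes the package is importable in your Python environment (e.g. after loading the pod5 module or installing it with pip), and "output.pod5" refers to the file produced in the sample session further down.

import pod5

# Open a POD5 file and iterate over its read records; each record
# exposes the read id and the number of raw signal samples.
with pod5.Reader("output.pod5") as reader:
    for read in reader.reads():
        print(read.read_id, read.sample_count)
        break  # only show the first read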
The --threads option is a bit of a misnomer for this tool: each "thread" is actually an independent multithreaded process. The tool does not scale efficiently beyond --threads=4 (75% parallel efficiency) and fails at --threads=12 due to Biowulf's ulimit settings.
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive --cpus-per-task=4 --mem=12g --gres=lscratch:150
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load pod5
[user@cn3144 ~]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144 ~]$ cp -rL ${POD5_TEST_DATA:-none} input
[user@cn3144 ~]$ pod5 convert fast5 --threads $SLURM_CPUS_PER_TASK --output output.pod5 --recursive input
[user@cn3144 ~]$ du -sh input
46G input
[user@cn3144 ~]$ ls -lh output.pod5
-rw-r--r-- 1 user group 38G Jun 2 14:29 output.pod5
[user@cn3144 ~]$ pod5 view --output summary.tsv output.pod5
[user@cn3144 ~]$ head summary.tsv
[user@cn3144 ~]$ pod5 inspect read output.pod5 0001297c-4c07-438e-a29b-6da3b0ad1260
read_id: 0001297c-4c07-438e-a29b-6da3b0ad1260
read_number: 11392
start_sample: 180540114
median_before: 220.86135864257812
channel data:
channel: 284
well: 1
pore_type: not_set
end reason:
name: unknown
forced: False
...
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
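The summary.tsv written by pod5 view above is a plain tab-separated table, so it can be explored with standard tools. A short sketch with pandas (assuming pandas is available in your Python environment; the exact column names depend on the pod5 version, so the snippet simply lists whatever is present):

import pandas as pd

# Load the per-read table produced by `pod5 view --output summary.tsv`
summary = pd.read_csv("summary.tsv", sep="\t")
print(summary.columns.tolist())  # available per-read fields
print(len(summary), "reads")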
Note that pod5 convert fast5 requires multi-fast5 input files.
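If you are unsure whether your fast5 files are multi-read, a quick check with h5py works, since multi-read fast5 files contain top-level groups named read_<uuid>. This is a sketch only: "input/example.fast5" is a hypothetical path and h5py must be installed.

import h5py

# Multi-read fast5 files have top-level groups named "read_<uuid>";
# single-read files have groups like "Raw" and "UniqueGlobalKey".
with h5py.File("input/example.fast5", "r") as f:
    is_multi = any(key.startswith("read_") for key in f.keys())
print("multi-read fast5" if is_multi
      else "single-read fast5 -- convert with ont-fast5-api first")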
Create a batch input file (e.g. pod5.sh). For example:
#!/bin/bash
set -e
module load pod5/0.3.6
cd /lscratch/$SLURM_JOB_ID
mkdir output
cp -rL ${POD5_TEST_DATA:-none} input
pod5 convert fast5 --threads $SLURM_CPUS_PER_TASK --output output input/*
cp
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=4 --mem=10g --gres=lscratch:150 pod5.sh