nanopack on Biowulf

nanopack is a collection of long-read processing and analysis tools created by Wouter De Coster and Rosa Rademakers. The examples below use NanoPlot, NanoComp, cramino, and chopper from this collection.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input in bold):

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load nanopack

[user@cn3144 ~]$ cd /data/$USER

[user@cn3144 user]$ cp -r $NANOPACK_TEST_DATA .

[user@cn3144 user]$ cd nanotest

[user@cn3144 nanotest]$ NanoPlot --summary sequencing_summary.txt --loglength -o summary-plots-log-transformed

[user@cn3144 nanotest]$ cramino alignment.bam
File name       alignment.bam
Number of alignments    1115
% from total reads      90.14
Yield [Gb]      0.01
Mean coverage   2.17
Yield [Gb] (>25kb)      0.00
N50     19271
N75     10270
Median length   6895.00
Mean length     11070
Median identity 90.84
Mean identity   89.96
Path    alignment.bam

[user@cn3144 nanotest]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
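The cramino report above is a set of key/value pairs, one metric per line (tab-separated in the actual output), so individual metrics can be pulled out with standard Unix tools. A minimal sketch, assuming that layout and using a small stand-in file in place of a real cramino run:

```shell
# Stand-in for: cramino alignment.bam > cramino_report.tsv
# (values mirror the example session above; a real report has more rows)
printf 'N50\t19271\nN75\t10270\nMean coverage\t2.17\n' > cramino_report.tsv

# Pull a single metric out of the tab-separated report
awk -F'\t' '$1 == "N50" {print $2}' cramino_report.tsv    # prints 19271
```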

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. nanopack.sh). For example:

#!/bin/bash
set -e
module load nanopack
cd /data/$USER/reads
NanoComp -t $SLURM_CPUS_PER_TASK \
    --fastq reads1.fastq.gz reads2.fastq.gz reads3.fastq.gz reads4.fastq.gz \
    --names run1 run2 run3 run4
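The script above relies on $SLURM_CPUS_PER_TASK, which Slurm sets only when --cpus-per-task is requested at submission time. If the script might also run without that option (or outside a Slurm allocation), the thread count can be guarded with a shell default; a minimal sketch (the fallback value of 2 is illustrative):

```shell
# Use the Slurm-provided CPU count when available, otherwise fall back to 2 threads
THREADS=${SLURM_CPUS_PER_TASK:-2}
echo "using $THREADS threads"
```

NanoComp -t $THREADS would then replace the -t $SLURM_CPUS_PER_TASK argument in the script.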

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] nanopack.sh

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. nanopack.swarm). For example:

zcat reads1.fastq.gz | chopper --tailcrop 10 | gzip > reads1_trimmed.fastq.gz
zcat reads2.fastq.gz | chopper --tailcrop 10 | gzip > reads2_trimmed.fastq.gz
zcat reads3.fastq.gz | chopper --tailcrop 10 | gzip > reads3_trimmed.fastq.gz
zcat reads4.fastq.gz | chopper --tailcrop 10 | gzip > reads4_trimmed.fastq.gz
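For more than a handful of read sets, the swarm file can be generated rather than written by hand. A minimal sketch, assuming the reads1..reads4 naming used above:

```shell
# Write one chopper command per read set into the swarm file
for i in 1 2 3 4; do
    echo "zcat reads${i}.fastq.gz | chopper --tailcrop 10 | gzip > reads${i}_trimmed.fastq.gz"
done > nanopack.swarm
```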

Submit this job using the swarm command.

swarm -f nanopack.swarm [-g #] [-t #] --module nanopack
where
-g # Number of Gigabytes of memory required for each process (1 line in the swarm command file).
-t # Number of threads/CPUs required for each process (1 line in the swarm command file).
--module nanopack Loads the nanopack module for each subjob in the swarm.