nanopack on Biowulf

nanopack is a collection of long-read processing and analysis tools created by Wouter De Coster and Rosa Rademakers. The examples below use NanoPlot, NanoComp, cramino, and chopper from this collection.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input in bold):

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load nanopack

[user@cn3144 ~]$ cd /data/$USER

[user@cn3144 user]$ cp -r $NANOPACK_TEST_DATA .

[user@cn3144 user]$ cd nanotest

[user@cn3144 nanotest]$ NanoPlot --summary sequencing_summary.txt --loglength -o summary-plots-log-transformed

[user@cn3144 nanotest]$ cramino alignment.bam
File name       alignment.bam
Number of alignments    1115
% from total reads      90.14
Yield [Gb]      0.01
Mean coverage   2.17
Yield [Gb] (>25kb)      0.00
N50     19271
N75     10270
Median length   6895.00
Mean length     11070
Median identity 90.84
Mean identity   89.96
Path    alignment.bam

[user@cn3144 nanotest]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
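The cramino report above is a set of key/value pairs, one metric per line (tab-separated in the actual output), so individual metrics can be pulled out with standard Unix tools. A minimal sketch, assuming that layout and using a small stand-in file in place of a real cramino run:

```shell
# Stand-in for: cramino alignment.bam > cramino_report.tsv
# (values mirror the example session above; a real report has more rows)
printf 'N50\t19271\nN75\t10270\nMean coverage\t2.17\n' > cramino_report.tsv

# Pull a single metric out of the tab-separated report
awk -F'\t' '$1 == "N50" {print $2}' cramino_report.tsv    # prints 19271
```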

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. nanopack.sh). For example:

#!/bin/bash
set -e
module load nanopack
cd /data/$USER/reads
NanoComp -t $SLURM_CPUS_PER_TASK \
    --fastq reads1.fastq.gz reads2.fastq.gz reads3.fastq.gz reads4.fastq.gz \
    --names run1 run2 run3 run4
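The script above relies on $SLURM_CPUS_PER_TASK, which Slurm sets only when --cpus-per-task is requested at submission time. If the script might also run without that option (or outside a Slurm allocation), the thread count can be guarded with a shell default; a minimal sketch (the fallback value of 2 is illustrative):

```shell
# Use the Slurm-provided CPU count when available, otherwise fall back to 2 threads
THREADS=${SLURM_CPUS_PER_TASK:-2}
echo "using $THREADS threads"
```

NanoComp -t $THREADS would then replace the -t $SLURM_CPUS_PER_TASK argument in the script.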

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] nanopack.sh

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. nanopack.swarm). For example:

zcat reads1.fastq.gz | chopper --tailcrop 10 | gzip > reads1_trimmed.fastq.gz
zcat reads2.fastq.gz | chopper --tailcrop 10 | gzip > reads2_trimmed.fastq.gz
zcat reads3.fastq.gz | chopper --tailcrop 10 | gzip > reads3_trimmed.fastq.gz
zcat reads4.fastq.gz | chopper --tailcrop 10 | gzip > reads4_trimmed.fastq.gz
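For more than a handful of read sets, the swarm file can be generated rather than written by hand. A minimal sketch, assuming the reads1..reads4 naming used above:

```shell
# Write one chopper command per read set into the swarm file
for i in 1 2 3 4; do
    echo "zcat reads${i}.fastq.gz | chopper --tailcrop 10 | gzip > reads${i}_trimmed.fastq.gz"
done > nanopack.swarm
```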

Submit this job using the swarm command.

swarm -f nanopack.swarm [-g #] [-t #] --module nanopack
where
-g # Number of Gigabytes of memory required for each process (1 line in the swarm command file).
-t # Number of threads/CPUs required for each process (1 line in the swarm command file).
--module nanopack Loads the nanopack module for each subjob in the swarm.