High-Performance Computing at the NIH
humann2 on Biowulf

From the humann2 home page:

HUMAnN is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads). This process, referred to as functional profiling, aims to describe the metabolic potential of a microbial community and its members. More generally, functional profiling answers the question "What are the microbes in my community-of-interest doing (or capable of doing)?"

Interactive job
Allocate an interactive session and run the program. Sample session:

[user@biowulf ~]$ sinteractive -c6 --mem=10g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load humann2
[user@cn3144 ~]$ cp -r /usr/local/apps/humann2/TEST_DATA demo
[user@cn3144 ~]$ humann2 --threads 6 --input demo/demo.fastq --output demo_out
Creating output directory: /spin1/users/wresch/test_data/humann2/demo_out
Output files will be written to: /spin1/users/wresch/test_data/humann2/demo_out

Running metaphlan2.py ........

Found g__Bacteroides.s__Bacteroides_dorei : 59.40% of mapped reads
Found g__Bacteroides.s__Bacteroides_vulgatus : 40.60% of mapped reads

Total species selected from prescreen: 2

Selected species explain 100.00% of predicted community composition

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
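
After a run like the one above, it can be useful to confirm that the main result tables were written. The sketch below assumes the standard HUMAnN2 output naming (`<sample>_genefamilies.tsv`, `<sample>_pathabundance.tsv`, `<sample>_pathcoverage.tsv`) and the `demo_out` directory from the sample session. The function name `check_humann2_outputs` is hypothetical and not part of humann2.

```shell
#!/bin/bash
# Sketch: verify that the three main HUMAnN2 tables exist and are non-empty.
# check_humann2_outputs is a hypothetical helper, not a humann2 command.

check_humann2_outputs() {
    local outdir=$1 sample=$2 missing=0 suffix
    for suffix in genefamilies pathabundance pathcoverage; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$outdir/${sample}_${suffix}.tsv" ]; then
            echo "missing or empty: $outdir/${sample}_${suffix}.tsv" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: check the demo run (assumes demo_out exists in the current directory)
if check_humann2_outputs demo_out demo; then
    echo "all three output tables are present"
else
    echo "some output tables are missing"
fi
```

The function returns a non-zero status when any table is missing, so it can also be used as a guard before downstream steps.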

Batch job
Create a batch script (e.g. humann2.sh). For example:

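#!/bin/bash

module load humann2/0.11.1 || exit 1
# work in node-local scratch space
cd "/lscratch/$SLURM_JOB_ID" || exit 1
cp "$HUMANN2_TEST_DATA/demo.fastq" . || exit 1
mkdir out

humann2 --threads "$SLURM_CPUS_PER_TASK" \
  --input demo.fastq \
  --output out

# /lscratch is removed when the job ends, so copy the results back
# (humann2_out is an example destination name)
cp -r out "$SLURM_SUBMIT_DIR/humann2_out"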

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] --gres=lscratch:# humann2.sh
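
Because `--cpus-per-task` is optional in the command above, note that stock Slurm exports `SLURM_CPUS_PER_TASK` only when that option is given. A minimal sketch of a fallback for the batch script follows. The helper name `resolve_threads` and the default of 1 thread are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch: pick a thread count that works with or without --cpus-per-task.
# resolve_threads is a hypothetical helper; the default of 1 is an assumption.

resolve_threads() {
    # Use the Slurm value when it is set and non-empty, otherwise fall back to 1
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

threads=$(resolve_threads)
echo "humann2 would be started with --threads $threads"
```

In the batch script, `--threads "$threads"` could then replace the direct use of `$SLURM_CPUS_PER_TASK`.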

Swarm of Jobs
Create a swarmfile (e.g. humann2.swarm). For example:

humann2 --threads $SLURM_CPUS_PER_TASK --input sample1.bam --output sample1.out
humann2 --threads $SLURM_CPUS_PER_TASK --input sample2.bam --output sample2.out
humann2 --threads $SLURM_CPUS_PER_TASK --input sample3.bam --output sample3.out
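
For more than a handful of samples, a swarmfile like this can be generated rather than typed by hand. The following is a minimal sketch. The function name `make_swarmfile` and the sample file names are placeholders, and each line passes `--threads $SLURM_CPUS_PER_TASK` on the assumption that the swarm is submitted with `-t`.

```shell
#!/bin/bash
# Sketch: write one humann2 command per input file into a swarmfile.
# make_swarmfile and the sample names below are hypothetical placeholders.

make_swarmfile() {
    local swarmfile=$1
    shift
    : > "$swarmfile"                  # create or truncate the swarmfile
    local input base
    for input in "$@"; do
        base=$(basename "$input")
        base=${base%.*}               # drop the last extension: sample1.bam -> sample1
        # Single quotes keep $SLURM_CPUS_PER_TASK literal, so it is expanded
        # inside each subjob rather than when this file is written.
        printf 'humann2 --threads $SLURM_CPUS_PER_TASK --input %s --output %s.out\n' \
            "$input" "$base" >> "$swarmfile"
    done
}

make_swarmfile humann2.swarm sample1.bam sample2.bam sample3.bam
cat humann2.swarm
```

In practice the explicit file names would typically be replaced by a glob such as `*.bam`.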

Submit this job using the swarm command.

swarm -f humann2.swarm -g 10 -t 4 --module humann2/0.11.1

-g #              Gigabytes of memory required for each process (one line in the swarmfile)
-t #              Number of threads/CPUs required for each process (one line in the swarmfile)
--module humann2  Loads the humann2 module for each subjob in the swarm