High-Performance Computing at the NIH
Subread on Biowulf & Helix

Subread package: high-performance read alignment, quantification and mutation discovery

The Subread package comprises a suite of software programs for processing next-generation sequencing read data, including the Subread aligner, the Subjunc exon-junction aligner, the featureCounts read-quantification program, and the exactSNP SNP caller.

Subread can be run multi-threaded on Biowulf by passing the -T flag. See the examples below.

Running on Helix

$ module load subread
$ cd /data/$USER/dir
$ subread-align -i indexfile -r inputfile -o output
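The indexfile argument above refers to the base name of an index that must be built once, beforehand, with subread-buildindex from the same package. A one-time setup sketch (the genome FASTA file name is hypothetical):

```shell
# Build a Subread index named "indexfile" from a reference genome
# (replace genome.fa with your actual reference FASTA).
subread-buildindex -o indexfile genome.fa
```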

Running a single batch job on Biowulf

Create a batch script similar to the following:

#!/bin/bash

module load subread
cd /data/$USER/
subread-align -i indexfile -r inputfile -o output

Submit the script on Biowulf:

$ sbatch jobscript   

To run multi-threaded Subread on Biowulf:

#!/bin/bash

module load subread
cd /data/$USER/
subread-align -T $SLURM_CPUS_PER_TASK -i indexfile -r inputfile -o output

Submit the script:

$ sbatch --cpus-per-task=4 jobscript

--cpus-per-task: allocate 4 CPUs. This number is assigned to $SLURM_CPUS_PER_TASK automatically
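Because $SLURM_CPUS_PER_TASK is only set inside an sbatch allocation, a script run directly (e.g. while testing) would pass an empty value to -T. A defensive sketch, not part of the original script:

```shell
# Fall back to a single thread when $SLURM_CPUS_PER_TASK is unset,
# e.g. when the script is run outside of sbatch.
THREADS=${SLURM_CPUS_PER_TASK:-1}
echo "running subread-align with $THREADS thread(s)"
# subread-align -T $THREADS -i indexfile -r inputfile -o output
```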

To request more memory than the default (4 GB), use the --mem flag:

$ sbatch --cpus-per-task=4 --mem=10g jobscript

Running a swarm of jobs on Biowulf

Set up a swarm command file, one command per line:

  cd /data/$USER/dir1; subread-align -T $SLURM_CPUS_PER_TASK -i indexfile -r inputfile -o output
  cd /data/$USER/dir2; subread-align -T $SLURM_CPUS_PER_TASK -i indexfile -r inputfile -o output
  cd /data/$USER/dir3; subread-align -T $SLURM_CPUS_PER_TASK -i indexfile -r inputfile -o output
	[......]
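For many samples, the swarm file can be generated with a short loop rather than written by hand. A sketch, assuming per-sample directories named dir1, dir2, dir3 as above:

```shell
# Write one subread-align command per sample directory into swarmfile.
# The backslashes keep $USER and $SLURM_CPUS_PER_TASK literal in the
# output so that swarm, not this loop, expands them at run time.
for dir in dir1 dir2 dir3; do
    echo "cd /data/\$USER/$dir; subread-align -T \$SLURM_CPUS_PER_TASK -i indexfile -r inputfile -o output"
done > swarmfile
```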

Submit the swarm file:

  $ swarm -f swarmfile -t 4 --module subread

-f: specify the swarm command file name
-t: specify the number of threads per process; this number is assigned to $SLURM_CPUS_PER_TASK in the swarm file automatically
--module: load the specified module (here, subread) for each command line in the file

To allocate more memory, use the -g flag:

  $ swarm -f swarmfile -t 4 -g 10 --module subread

-g: allocate memory per process, in GB (here, 10 GB)

For more information on running swarm, see swarm.html

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below, and run the interactive job there.

biowulf$ sinteractive 
salloc.exe: Granted job allocation 16535

cn999$ module load subread
cn999$ cd /data/$USER/dir
cn999$ subread-align -i indexfile -r inputfile -o output
cn999$ exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed, use the --mem flag. For example:

biowulf$ sinteractive --mem=10g

Documentation

http://subread.sourceforge.net/