High-Performance Computing at the NIH
Transabyss on Biowulf & Helix

Description

Transabyss performs de novo assembly of RNA-Seq data using ABySS. It comes with 3 main applications:

The hg19 annotation files are under /usr/local/apps/transabyss/hg19

Running on Helix

$ cd /data/$USER/dir
$ module load transabyss
$ transabyss --se 1.fq.gz 2.fq.gz --outdir /data/$USER/dir --name test --threads 2 --island 0 -c 1
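
Before launching an assembly, it can help to confirm that the input files are actually present. The following is a minimal sketch, not part of transabyss: check_inputs is a hypothetical helper name, and the file names are the placeholders used in the command above.

```shell
#!/bin/bash
# Sketch: confirm the input FASTQ files exist and are non-empty before starting.
# check_inputs is a hypothetical helper, not part of transabyss.
check_inputs() {
    local f
    for f in "$@"; do
        # -s is true only if the file exists and has a size greater than zero.
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            return 1
        fi
    done
}

if check_inputs 1.fq.gz 2.fq.gz; then
    echo "inputs look fine"
else
    echo "fix the inputs before running transabyss"
fi
```

The check only tests for existence and size; it does not validate the FASTQ format or the gzip compression.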

Running a single batch job on Biowulf

1. Create a batch script along the following lines:

#!/bin/bash 

module load transabyss
cd /data/$USER/dir
transabyss --se 1.fq.gz 2.fq.gz --outdir /data/$USER/dir --name test --threads $SLURM_CPUS_PER_TASK --island 0 -c 1

2. On the Biowulf login node, submit the job:

$ sbatch --cpus-per-task=8 jobscript

The value given to '--cpus-per-task' is automatically made available inside the script as $SLURM_CPUS_PER_TASK.
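
Slurm sets $SLURM_CPUS_PER_TASK only when '--cpus-per-task' is specified, so a script that is also run outside a batch job may want a fallback. The following is a minimal sketch under that assumption: pick_threads and its default of 2 are illustrative choices, not part of transabyss or Slurm, and the command is printed rather than executed so it can be inspected first.

```shell
#!/bin/bash
# Sketch: choose a thread count that works both inside and outside a Slurm job.
# pick_threads and the default of 2 are illustrative, not part of transabyss or Slurm.
pick_threads() {
    # Use the Slurm value when it is set and non-empty, otherwise fall back to 2.
    echo "${SLURM_CPUS_PER_TASK:-2}"
}

threads=$(pick_threads)

# Print the command instead of running it, so the line can be checked first.
echo "transabyss --se 1.fq.gz 2.fq.gz --outdir /data/$USER/dir --name test --threads $threads --island 0 -c 1"
```

Removing the echo and the surrounding quotes turns the last line into the real command.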

Running a swarm of batch jobs on Biowulf

1. Create a swarm file along the following lines:

cd /data/$USER/dir1; transabyss --se 1.fq.gz 2.fq.gz --outdir /data/$USER/dir1 --name test --threads $SLURM_CPUS_PER_TASK --island 0 -c 1
cd /data/$USER/dir2; transabyss --se 1.fq.gz 2.fq.gz --outdir /data/$USER/dir2 --name test --threads $SLURM_CPUS_PER_TASK --island 0 -c 1
cd /data/$USER/dir3; transabyss --se 1.fq.gz 2.fq.gz --outdir /data/$USER/dir3 --name test --threads $SLURM_CPUS_PER_TASK --island 0 -c 1
[....]

2. Submit this swarm with:

$ swarm -t 2 -g 10 -f swarmfile --module transabyss
  • The '-f swarmfile' tells swarm which command file to run.
  • The '-t 2' tells swarm to allocate 2 threads to each command; this value is available as $SLURM_CPUS_PER_TASK.
  • The '-g 10' requests 10 GB of memory for each command line in the swarm file.
  • The '--module transabyss' flag tells swarm to load the transabyss module, setting up its paths, for each command line.
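
When there are many sample directories, the swarm file can be generated with a short loop rather than written by hand. The following is a minimal sketch: make_swarm_lines is a hypothetical helper, the directory names dir1 to dir3 follow the example above, and using the directory name for both '--outdir' and '--name' is an assumption made here so that the runs do not write to the same place.

```shell
#!/bin/bash
# Sketch: write one swarm command line per sample directory.
# make_swarm_lines is a hypothetical helper, not part of swarm or transabyss.
make_swarm_lines() {
    local d
    for d in "$@"; do
        # \$USER and \$SLURM_CPUS_PER_TASK are escaped so that they are written
        # literally and expand only when the job runs on the compute node.
        echo "cd /data/\$USER/$d; transabyss --se 1.fq.gz 2.fq.gz --outdir /data/\$USER/$d --name $d --threads \$SLURM_CPUS_PER_TASK --island 0 -c 1"
    done
}

# Write the swarm file to the current directory.
make_swarm_lines dir1 dir2 dir3 > swarmfile
```

The resulting swarmfile can then be submitted with the swarm command shown above.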

For more information regarding running swarm, see swarm.html

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the interactive job there.

biowulf$ sinteractive --cpus-per-task=4  
salloc.exe: Granted job allocation 16535
cn999$ module load transabyss
cn999$ cd /data/$USER/dir
cn999$ transabyss --se 1.fq.gz 2.fq.gz --outdir /data/$USER/dir --name test --threads $SLURM_CPUS_PER_TASK --island 0 -c 1
[...etc...]

cn999$ exit
exit

biowulf$

Make sure to exit the job once you have finished your run.

If more memory is needed, it can be requested with --mem. For example, to request 12 GB of memory for the interactive session:

biowulf$ sinteractive --cpus-per-task=4 --mem=12g

Documentation

http://www.bcgsc.ca/platform/bioinfo/software/trans-abyss