High-Performance Computing at the NIH
ABySS on Biowulf & Helix

Description

ABySS is a de novo, parallel, paired-end sequence assembler designed for short reads. The parallel version is implemented using MPI and can assemble larger genomes than the single-processor version.

Running on Helix
$ module load abyss
$ `which abyss-pe` k=25 n=10 in='input1 input2' name=FileName
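Before starting an assembly, it can help to confirm that every file passed to in= exists and is readable, so a typo in a file name is caught before any work is done. The following is a minimal bash sketch, assuming the reads are plain files listed space-separated exactly as in the in='...' value; the function name check_reads is hypothetical and not part of ABySS.

```shell
# Hypothetical helper (not part of ABySS): verify that every read file in a
# space-separated list, as passed to abyss-pe's in=, exists and is readable.
check_reads() {
    local -a files
    local f status=0
    read -r -a files <<< "${1-}"          # split the list on whitespace
    if (( ${#files[@]} == 0 )); then
        echo "no read files given" >&2
        return 1
    fi
    for f in "${files[@]}"; do
        if [[ ! -r "$f" ]]; then
            echo "missing or unreadable: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `` check_reads 'input1 input2' && `which abyss-pe` k=25 n=10 in='input1 input2' name=FileName `` only starts the assembly when both input files are present.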

Running a single batch job on Biowulf

1. Create a batch script along the following lines:

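#!/bin/bash

module load abyss
cd /data/$USER/mydir || exit 1

# np: number of MPI processes (one per task requested with --ntasks)
# k: k-mer size; n: minimum number of read pairs required for building contigs
# in: paired-end read files; name: name of the assembly (prefix of output files)
`which abyss-pe` np=$SLURM_NTASKS k=25 n=10 in='in1.fq in2.fq' name=outname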

2. On the Biowulf login node, submit the job:

$ sbatch --ntasks=64 --exclusive ./slurmscript

This requests 64 tasks (CPUs). Slurm automatically sets $SLURM_NTASKS to the value given with --ntasks, so abyss-pe starts one MPI process per requested task.
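If the batch script is submitted without --ntasks, $SLURM_NTASKS may be unset and abyss-pe would then receive an empty np= value instead of the intended process count. A minimal bash sketch of a guard for the batch script, assuming you prefer the job to stop rather than run with an unintended setting; the function name require_ntasks is hypothetical and not part of ABySS or Slurm.

```shell
# Hypothetical guard (not part of ABySS or Slurm): print the number of MPI
# processes to pass as np=, or fail if no positive task count was requested.
require_ntasks() {
    if [[ ! "${SLURM_NTASKS:-}" =~ ^[1-9][0-9]*$ ]]; then
        echo "SLURM_NTASKS is unset or invalid; submit with --ntasks=N" >&2
        return 1
    fi
    echo "$SLURM_NTASKS"
}

# In the batch script, before calling abyss-pe:
#   np=$(require_ntasks) || exit 1
#   `which abyss-pe` np=$np k=25 n=10 in='in1.fq in2.fq' name=outname
```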

Running a swarm of jobs on Biowulf

Set up a swarm command file:

  cd /data/$USER/dir1; `which abyss-pe` np=$SLURM_NTASKS k=25 n=10 in='in1.fq in2.fq' name=outname
  cd /data/$USER/dir2; `which abyss-pe` np=$SLURM_NTASKS k=25 n=10 in='in1.fq in2.fq' name=outname
  cd /data/$USER/dir3; `which abyss-pe` np=$SLURM_NTASKS k=25 n=10 in='in1.fq in2.fq' name=outname
  [......]
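Writing one line per directory by hand becomes tedious with many samples. The following minimal bash sketch generates the swarm file with a loop, assuming every sample directory contains reads named in1.fq and in2.fq, as in the example lines above; the function name write_swarmfile is hypothetical. Note that `which abyss-pe` and $SLURM_NTASKS must be written literally so they are expanded when each swarm command runs, not when the file is generated.

```shell
# Hypothetical helper: write one abyss-pe command line per directory.
# Single quotes keep `which abyss-pe` and $SLURM_NTASKS literal in the file,
# so they are expanded when each swarm command runs.
write_swarmfile() {
    local out=$1 dir
    shift
    : > "$out"                               # start with an empty file
    for dir in "$@"; do
        # %q quotes the directory name only if it contains special characters
        printf 'cd %q; `which abyss-pe` np=$SLURM_NTASKS k=25 n=10 in='\''in1.fq in2.fq'\'' name=outname\n' \
            "$dir" >> "$out"
    done
}
```

For example, `write_swarmfile swarmfile /data/$USER/dir1 /data/$USER/dir2 /data/$USER/dir3` produces the three lines shown above.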

Submit the swarm file:

  $ swarm -f swarmfile --module abyss --sbatch "--exclusive"

-f: the name of the swarm file
--module: load the named environment module (here abyss) for each command line in the file
--sbatch: additional options that swarm passes through to sbatch. --exclusive allocates the whole node instead of sharing it with other users; use it when running MPI programs.

By default, each line in a swarm file is allocated one core, 1.5 GB of memory, and 4 hours of walltime.
To allocate more memory (in GB) to each command line in the swarm file, use the -g flag;
for more walltime, use the --time flag:

  $ swarm -f swarmfile -g 12 --time 08:00:00 --module abyss --sbatch "--exclusive"

For more information on using swarm, see swarm.html.

Running an interactive job on Biowulf

Running jobs interactively can be useful for debugging. Do not run such jobs on the Biowulf login node; instead, allocate an interactive node as described below and run the job there.

biowulf$ sinteractive --ntasks=32 --exclusive  

pXXX$ module load abyss
pXXX$ cd /data/$USER/dir
pXXX$ `which abyss-pe` np=$SLURM_NTASKS k=25 n=10 in='in1.fq in2.fq' name=outname
[...etc...]

pXXX$ exit
exit

biowulf$

Exit the interactive session once your run has finished, so that the allocated node is released.

Documentation

http://www.bcgsc.ca/platform/bioinfo/software/abyss