High-Performance Computing at the NIH
Pandaseq on Biowulf & Helix

PANDASEQ is a program to align Illumina reads, optionally with PCR primers embedded in the sequence, and reconstruct an overlapping sequence. Its name comes from PAired-eND Assembler for DNA sequences.

Running on Helix

$ module load pandaseq
$ cd /data/$USER/Examples
$ pandaseq -f forward.fq -r reverse.fq
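Before assembling, it can help to confirm that both FASTQ files hold the same number of reads, since paired-end assembly expects matching forward and reverse files. The following is a minimal sketch, assuming uncompressed FASTQ with exactly four lines per record. count_reads is a hypothetical helper, not part of pandaseq, and the tiny demo files exist only to make the sketch runnable.

```shell
#!/bin/bash
# Hypothetical helper: count reads in an uncompressed FASTQ file,
# assuming exactly four lines per record.
count_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Demo data only: two one-read FASTQ files in a temporary directory.
demo=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' > "$demo/forward.fq"
printf '@r1\nACGT\n+\nIIII\n' > "$demo/reverse.fq"

fwd=$(count_reads "$demo/forward.fq")
rev=$(count_reads "$demo/reverse.fq")

if [ "$fwd" -eq "$rev" ]; then
    echo "OK: $fwd read(s) in each file"
else
    echo "Mismatch: forward=$fwd reverse=$rev" >&2
fi
```

With real data, point count_reads at forward.fq and reverse.fq in /data/$USER/Examples instead of the demo files.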

Running a single batch job on Biowulf

1. Create a script file similar to the lines below.


#!/bin/bash
module load pandaseq
cd /data/$USER/Examples
pandaseq -f forward.fq -r reverse.fq

2. Submit the script on Biowulf:

$ sbatch jobscript

If the job needs more memory than the default (4 GB), use the --mem flag:

$ sbatch --mem=10g jobscript
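The memory request can also be recorded inside the script with an #SBATCH directive, so that a plain "sbatch jobscript" picks it up. The sketch below assumes the same example paths as above. It also assumes that pandaseq writes assembled sequences to standard output and its log messages to standard error; the redirect targets assembled.fasta and pandaseq.log are hypothetical names chosen for illustration. The last line only checks the script's shell syntax and does not submit anything.

```shell
# Write the batch script; the quoted 'EOF' keeps $USER unexpanded until the job runs.
cat > jobscript <<'EOF'
#!/bin/bash
#SBATCH --mem=10g
module load pandaseq
cd /data/$USER/Examples
# Hypothetical output names: sequences to a FASTA file, log messages to a log file.
pandaseq -f forward.fq -r reverse.fq > assembled.fasta 2> pandaseq.log
EOF

# Syntax check only; nothing is submitted or executed here.
bash -n jobscript && echo "jobscript: syntax OK"
```

Without the redirects, the assembled sequences would end up in the Slurm output file for the job rather than in a file of their own.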

Running a swarm of jobs on Biowulf

Set up a swarm command file:

  cd /data/$USER/dir1; pandaseq -f forward.fq -r reverse.fq
  cd /data/$USER/dir2; pandaseq -f forward.fq -r reverse.fq
  cd /data/$USER/dir3; pandaseq -f forward.fq -r reverse.fq
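When there are many directories, the command file above can be generated with a loop instead of being typed by hand. This is a sketch only: the directory names dir1 to dir3 and the file names forward.fq and reverse.fq follow the example above, and the fallback value "username" is a placeholder used when $USER is not set.

```shell
#!/bin/bash
# Sketch: write one swarm command line per sample directory.
# Assumption: data lives under /data/$USER, as in the example above.
base="/data/${USER:-username}"
swarmfile="swarmfile"

: > "$swarmfile"                 # start with an empty file
for d in dir1 dir2 dir3; do
    echo "cd $base/$d; pandaseq -f forward.fq -r reverse.fq" >> "$swarmfile"
done

# Show the generated command file for review before submitting it.
cat "$swarmfile"
```

Each line of the resulting file is an independent command, which is the layout swarm expects.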

Submit the swarm file:

  $ swarm -f swarmfile --module pandaseq

-f: specify the swarmfile name
--module: load the named module, which sets the environment variables needed by each command line in the file

To allocate more memory, use the -g flag:

  $ swarm -f swarmfile -g 10 --module pandaseq

-g: memory, in gigabytes, to allocate for each command in the swarm file

For more information regarding running swarm, see swarm.html

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead allocate an interactive node as described below, and run the interactive job there.

biowulf$ sinteractive 
salloc.exe: Granted job allocation 16535

cn999$ module load pandaseq
cn999$ cd /data/$USER/Examples
cn999$ pandaseq -f forward.fq -r reverse.fq
[...etc...]

cn999$ exit


Make sure to exit the job once finished.

If more memory is needed, use the --mem flag. For example:

biowulf$ sinteractive --mem=10g
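
For interactive debugging, a small subset of the reads is often enough to reproduce a problem quickly. The sketch below copies the first N read pairs from each file, assuming uncompressed FASTQ with four lines per record. subset_fastq and the sub_ file names are hypothetical, and the demo files stand in for real data.

```shell
#!/bin/bash
# Hypothetical helper: copy the first N reads of a FASTQ file
# (four lines per record assumed) to a new file.
subset_fastq() {
    local n=$1 src=$2 dst=$3
    head -n $(( n * 4 )) "$src" > "$dst"
}

# Demo data only: three-read forward and reverse files.
work=$(mktemp -d)
for f in forward reverse; do
    for i in 1 2 3; do
        printf '@r%s\nACGT\n+\nIIII\n' "$i"
    done > "$work/$f.fq"
done

# Keep the first 2 read pairs; taking the same N from both files keeps mates in step.
subset_fastq 2 "$work/forward.fq" "$work/sub_forward.fq"
subset_fastq 2 "$work/reverse.fq" "$work/sub_reverse.fq"

wc -l "$work"/sub_*.fq
```

The subset files can then be passed to pandaseq with -f and -r in place of the full files.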