High-Performance Computing at the NIH
SHRiMP on Biowulf & Helix

SHRiMP is a software package for aligning genomic reads against a target genome. It was primarily developed with the multitudinous short reads of next generation sequencing machines in mind, as well as Applied Biosystems' colourspace genomic representation.

Example files can be copied from:

$ cp -r /usr/local/apps/shrimp/example /data/$USER/

Running on Helix
$ module load shrimp
$ cd /data/$USER/example
$ gmapper-cs test_S1_F3.csfasta ch11_12_validated.fasta -N 8 -o 5 -h 80% >map.out 2>map.log

Running a single batch job on Biowulf

1. Create a script file containing lines similar to those below.

#!/bin/bash


module load shrimp
cd /data/$USER/example
gmapper-cs test_S1_F3.csfasta ch11_12_validated.fasta -N 8 -o 5 -h 80% >map.out 2>map.log

2. Submit the script on Biowulf:

$ sbatch jobscript

If more memory is required (the default is 4 GB), specify --mem=Mg, where M is the number of gigabytes, for example --mem=10g:

$ sbatch --mem=10g jobscript
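
As an alternative to passing --mem on the command line, Slurm also reads #SBATCH directives placed at the top of the script. The sketch below writes such a script and syntax-checks it without running it. The file name jobscript and the 10g value are only illustrative, taken from the example above; adjust both to your own job.

```shell
#!/bin/bash
# Sketch: write a job script that carries its own memory request.
# "jobscript" and the 10g value are illustrative, not required names or sizes.
# The quoted 'EOF' keeps $USER literal so it is expanded when the job runs.
cat > jobscript <<'EOF'
#!/bin/bash
#SBATCH --mem=10g

module load shrimp
cd /data/$USER/example
gmapper-cs test_S1_F3.csfasta ch11_12_validated.fasta -N 8 -o 5 -h 80% >map.out 2>map.log
EOF

# Syntax-check the generated script without executing any of its commands.
bash -n jobscript && echo "jobscript parses cleanly"
```

The script can then be submitted with a plain "sbatch jobscript". An option given on the sbatch command line takes precedence over the matching #SBATCH directive in the script.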

Running a swarm of jobs on Biowulf

Set up a swarm command file:

  cd /data/$USER/dir1; gmapper-cs test_S1_F3.csfasta ch11_12_validated.fasta -N 8 -o 5 -h 80% >map.out 2>map.log
  cd /data/$USER/dir2; gmapper-cs test_S1_F3.csfasta ch11_12_validated.fasta -N 8 -o 5 -h 80% >map.out 2>map.log
  cd /data/$USER/dir3; gmapper-cs test_S1_F3.csfasta ch11_12_validated.fasta -N 8 -o 5 -h 80% >map.out 2>map.log
	[......]
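
When there are many directories, the command file above can be generated with a short loop rather than typed by hand. This is a minimal sketch: the directory names dir1 to dir3 and the output name swarmfile are assumptions taken from the example, and it assumes every directory holds input files with the same names.

```shell
#!/bin/bash
# Sketch: write one swarm command line per directory.
# dir1..dir3 and "swarmfile" are illustrative names from the example above.
swarmfile=swarmfile

# Start with an empty command file.
: > "$swarmfile"

for d in dir1 dir2 dir3; do
    # Single quotes keep $USER literal in the file; %% prints a literal %.
    printf 'cd /data/$USER/%s; gmapper-cs test_S1_F3.csfasta ch11_12_validated.fasta -N 8 -o 5 -h 80%% >map.out 2>map.log\n' "$d" >> "$swarmfile"
done

# Show how many command lines were written.
grep -c . "$swarmfile"
```

Each line of the resulting file is an independent command, matching the hand-written layout shown above.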
  

Submit the swarm file. The -f option specifies the swarm file name, and --module loads the required module for each command line in the file:

  $ swarm -f swarmfile --module shrimp

If each command line needs more memory, use -g. The example below allocates 10 GB for each command:

  $ swarm -f swarmfile -g 10 --module shrimp

For more information on running swarm, see swarm.html

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead allocate an interactive node as described below, and run the interactive job there.

biowulf$ sinteractive 
salloc.exe: Granted job allocation 16535

cn999$ module load shrimp
cn999$ cd /data/$USER/dir
cn999$ gmapper-cs test_S1_F3.csfasta ch11_12_validated.fasta -N 8 -o 5 -h 80% >map.out 2>map.log
[...etc...]

cn999$ exit
exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed, use --mem. For example:

biowulf$ sinteractive --mem=8g

Documentation

http://compbio.cs.toronto.edu/shrimp/