High-Performance Computing at the NIH
GitHub YouTube @nih_hpc RSS Feed
stampy on Biowulf & Helix

Description

Stampy is a short read mapper for Illumina data with particular emphasis on mapping reads with a higher number of sequence differences.

There may be multiple versions of stampy available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail stampy 

To select a module use

module load stampy/[version]

where [version] is the version of choice.

Environment variables set

References

Documentation

Interactive job on Biowulf

Allocate an interactive session with sinteractive and use as follows

biowulf$ sinteractive --cpus-per-task=2
[...snip...]
salloc.exe: Nodes node are ready for job

node$ module load stampy samtools
node$ cp $STAMPY_TEST_DATA/* .
node$ ls -lh
total 68M
-rw-r--r-- 1 user group 56M Jun 21 08:54 ERR458495.fastq.gz
-rwxrwxr-x 1 user group 12M Jun 21 08:54 sacCer3.fa

Build a genome and a hash file

node$ stampy.py -G sacCer3 \
           --species Saccharomyces_cerevisiae \
           --assembly sacCer3 sacCer3.fa
stampy: Building genome...
stampy: Input files: ['sacCer3.fa']
stampy: Done
node$ stampy.py -g sacCer3 -H sacCer3
stampy: Building hash table...
stampy: Initializing...
stampy: Counting...
stampy: Initializing hash...         
stampy: Flagging high counts...           
stampy: Creating hash...            
stampy: Writing...                  
stampy: Finished building hash table
stampy: Done

node$ ls -lh
total 87M
-rw-r--r-- 1 user group  56M Jun 21 08:54 ERR458495.fastq.gz
-rwxrwxr-x 1 user group  12M Jun 21 08:54 sacCer3.fa
-rw-r--r-- 1 user group  16M Jun 21 08:58 sacCer3.sthash
-rw-r--r-- 1 user group 2.9M Jun 21 08:58 sacCer3.stidx

Align single end data

node$ stampy.py -g sacCer3 -h sacCer3 -M ERR458495.fastq.gz | samtools view -b > test.bam
stampy: Mapping...
stampy: # Nucleotides (all/1/2):        52874862        52874862        0
stampy: # Variants:                     329629  329629  0
stampy: # Fraction:                     0.0062  0.0062  0.0000
stampy: Done

End the sinteractive session

node$ exit
biowulf$
Batch job on Biowulf

Create a batch script similar to the following example:

#! /bin/bash
# this file is stampy.batch

module load stampy/1.0.31 samtools/1.4 || exit 1
stampy.py -g sacCer3 -h sacCer3 -M ERR458495.fastq.gz | samtools view -b > test.bam

Submit to the queue with sbatch:

biowulf$ sbatch --mem=4g stampy.batch

Note that larger genomes will require more memory.