High-Performance Computing at the NIH
Somaticsniper on Biowulf & Helix

The purpose of this program is to identify single nucleotide positions that differ between a tumor and a normal sample (or, in theory, between any two BAM files). It takes a tumor BAM file and a normal BAM file, compares them, and writes the differences to a file in a format very similar to the Samtools consensus format. It uses the genotype likelihood model of MAQ (as implemented in Samtools) and then calculates the probability that the tumor and normal genotypes are different. This probability is reported as a somatic score.

The somatic score is the Phred-scaled probability (from 0 to 255) that the tumor and normal genotypes are not different. In general, a score of $Q$ means the genotypes differ with probability $1 - 10^{-Q/10}$: a score of 0 means the probability that the genotypes are different is 0, and a score of 255 means there is a probability of $1 - 10^{-255/10}$ that the genotypes differ between tumor and normal. This is consistent with how the SAM format reports such probabilities. SomaticSniper is currently available as source code via GitHub or as a Debian APT package.
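The mapping between a somatic score and the probability that the genotypes differ can be checked with a one-line awk calculation. This is a minimal sketch; `phred_to_prob` is a hypothetical helper name, not part of SomaticSniper, and it simply evaluates $1 - 10^{-Q/10}$ for a given score $Q$.

```shell
#!/bin/bash
# Hypothetical helper (not part of SomaticSniper): convert a Phred-scaled
# somatic score Q into the probability that the tumor and normal genotypes
# differ, i.e. 1 - 10^(-Q/10), printed with 6 significant digits.
phred_to_prob() {
  awk -v q="$1" 'BEGIN { printf "%.6g\n", 1 - 10 ^ (-q / 10) }'
}

phred_to_prob 0     # 0
phred_to_prob 20    # 0.99
phred_to_prob 40    # 0.9999
phred_to_prob 255   # 1 (the difference from 1 is below double precision)
```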

This tool was developed by David E. Larson et al.

Running on Helix

$ module load somaticsniper
$ cd /data/$USER/dir
$ bam-somaticsniper -f ref.fasta tumor.bam normal.bam Outfile
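To look only at higher-confidence calls, the output file can be filtered on the somatic score. This is a minimal sketch that assumes the default classic tab-separated output, in which the somatic score is the sixth column; check the column layout against the documentation for the version you load. `filter_by_score` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print the calls whose somatic score is at least MIN.
# ASSUMPTION: classic tab-separated output with the somatic score in column 6.
# Usage: filter_by_score FILE MIN
filter_by_score() {
  local file=$1 min=$2
  # "+ 0" forces a numeric comparison; a pattern with no action prints the line
  awk -F '\t' -v min="$min" '$6 + 0 >= min + 0' "$file"
}
```

For example, `filter_by_score Outfile 40 > Outfile.score40` keeps only the calls with a somatic score of 40 or more.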

Running a single batch job on Biowulf

1. Create a script file (named jobscript in this example) with contents similar to the lines below.

#!/bin/bash

module load somaticsniper                 # put bam-somaticsniper on the PATH
cd /data/$USER/dir || exit 1              # stop if the data directory is missing
bam-somaticsniper -f ref.fasta tumor.bam normal.bam Outfile

2. Submit the script on Biowulf:

$ sbatch jobscript

If the job needs more than the default 4 GB of memory, use the --mem flag:

$ sbatch --mem=10g jobscript

Running a swarm of jobs on Biowulf

Set up a swarm command file (named swarmfile in this example) with one command per line:

  cd /data/$USER/dir1; bam-somaticsniper -f ref.fasta tumor.bam normal.bam Outfile
  cd /data/$USER/dir2; bam-somaticsniper -f ref.fasta tumor.bam normal.bam Outfile
  cd /data/$USER/dir3; bam-somaticsniper -f ref.fasta tumor.bam normal.bam Outfile
	[......]
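With many sample directories, the swarm command file can be generated instead of typed by hand. This is a minimal sketch, assuming each directory holds its own ref.fasta, tumor.bam and normal.bam as in the example lines above; `make_swarmfile` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: write one swarm command line per sample directory.
# Usage: make_swarmfile OUTFILE DIR [DIR ...]
make_swarmfile() {
  local out=$1
  shift
  : > "$out"                      # start with an empty swarm file
  local dir
  for dir in "$@"; do
    # %q shell-quotes the path so directories with spaces still work
    printf 'cd %q; bam-somaticsniper -f ref.fasta tumor.bam normal.bam Outfile\n' "$dir" >> "$out"
  done
}
```

For example, `make_swarmfile swarmfile /data/$USER/dir*/` writes one line for every directory matching the glob.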

Submit the swarm file:

  $ swarm -f swarmfile --module somaticsniper

-f: the name of the swarm command file
--module: load the named module, which sets its environment variables, for every command in the file

To allocate more memory, use the -g flag:

  $ swarm -f swarmfile -g 10 --module somaticsniper

-g: memory, in GB, to allocate for each command (process) in the file

For more information regarding running swarm, see swarm.html

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below, and run the interactive job there.

biowulf$ sinteractive 
salloc.exe: Granted job allocation 16535

cn999$ module load somaticsniper
cn999$ cd /data/$USER/dir
cn999$ bam-somaticsniper -f ref.fasta tumor.bam normal.bam Outfile
cn999$ exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed, use the --mem flag. For example:

biowulf$ sinteractive --mem=10g
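When debugging, it can help to confirm that the input files are in place before starting a run. This is a minimal sketch; `check_inputs` is a hypothetical helper, and it only checks that each file exists, is readable and is non-empty. It does not check that the BAM files are valid or sorted.

```shell
#!/bin/bash
# Hypothetical helper: report any input file that is missing, unreadable or
# empty. Returns 0 if every file passes, 1 otherwise.
# Usage: check_inputs FILE [FILE ...]
check_inputs() {
  local f status=0
  for f in "$@"; do
    if [ ! -r "$f" ] || [ ! -s "$f" ]; then
      echo "missing, unreadable or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_inputs ref.fasta tumor.bam normal.bam && bam-somaticsniper -f ref.fasta tumor.bam normal.bam Outfile` only starts the run when all three inputs pass the check.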

Documentation

http://gmt.genome.wustl.edu/packages/somatic-sniper/documentation.html