Biowulf High Performance Computing at the NIH
SomaticSeq on Biowulf

SomaticSeq is an ensemble approach to accurately detect somatic mutations. It incorporates multiple somatic mutation callers to obtain a combined call set, then uses machine learning to distinguish true mutations from false positives within that call set.


Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

This program is not suitable for running interactively. See below for an example of batch submission.

Batch job
Most jobs should be run as batch jobs.

First, copy the workflow configuration to your current directory.

$ cp $SOMATICSEQ_HOME/utilities/snakemake/config.yaml .

Edit config.yaml to your specifications. Then prepare your inputs and log directory.

$ mkdir log inputs
$ # intermediate mpileup files will be created in the same directory as the inputs,
$ # so use symlinks to a location where we have write access.
$ ln -s /fdb/DREAM/SMC/synthetic.challenge.set3.{tumor,normal}.bam* inputs/
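
Before moving on, it may be worth confirming that the links resolve and that each BAM has an index beside it. The following is a minimal sketch, not part of SomaticSeq: check_inputs is a hypothetical helper, and the index naming it looks for (file.bam.bai or file.bai) is an assumption.

```shell
# Report any BAM in a directory that is unreadable (for example a dangling
# symlink) or has no index next to it.
# Assumption: indexes are named file.bam.bai or file.bai.
check_inputs() {
    dir=${1:-inputs}
    status=0
    for bam in "$dir"/*.bam; do
        # An unmatched glob stays literal, so an empty directory is also reported.
        if [ ! -r "$bam" ]; then
            echo "unreadable or missing: $bam" >&2
            status=1
        elif [ ! -r "$bam.bai" ] && [ ! -r "${bam%.bam}.bai" ]; then
            echo "no index found for: $bam" >&2
            status=1
        fi
    done
    return $status
}

# Run the check only when an inputs directory is present.
if [ -d inputs ]; then
    if check_inputs inputs; then echo "inputs look usable"; fi
fi
```

A non-zero return status means at least one file was reported on standard error.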

Then create a batch input file. For example:

#!/bin/bash
#SBATCH --time 4-0
set -e

module load snakemake somaticseq

snakemake \
    -s $SOMATICSEQ_HOME/utilities/snakemake/Snakefile \
    --config \
        tumor=$PWD/inputs/synthetic.challenge.set3.tumor.bam \
        normal=$PWD/inputs/synthetic.challenge.set3.normal.bam \
        reference=/fdb/GATK_resource_bundle/b37/human_g1k_v37_decoy.fasta \
        dbsnp=/fdb/dbSNP/organisms/human_9606_b150_GRCh37p13/00-All.vcf.gz \
        gatk=$GATK_HOME/gatk-package-*-local.jar \
        varscan="$VARSCANHOME/varscan.jar" \
        caller_threads=36 \
    -j 10 \
    --cluster "sbatch --time {cluster.time} --cpus-per-task {threads} --mem {resources.mem_mb} --out={cluster.out}" \
    --cluster-config cluster.json

Where cluster.json contains:

{
    "__default__" :
    {
        "time" : "3-00:00:00",
        "mem"  : "4g",
        "out"  : "log/{rule}-%j.out"
    }
}
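
Snakemake reads the file given to --cluster-config as JSON or YAML, so a quick syntax check before submitting can save a failed job. The following is a minimal sketch: check_json is a hypothetical helper, and it assumes python3 is available on your PATH.

```shell
# Return 0 if the given file parses as JSON, non-zero otherwise.
# Assumption: python3 is on PATH; json.tool is part of its standard library.
check_json() {
    python3 -m json.tool "$1" > /dev/null
}

# Check cluster.json only when it is present in the current directory.
if [ -f cluster.json ]; then
    if check_json cluster.json; then echo "cluster.json parses"; fi
fi
```

If the file is malformed, json.tool prints the location of the first error on standard error.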

Submit this job using the Slurm sbatch command.