SomaticSeq is an ensemble approach to accurately detect somatic mutations. It incorporates multiple somatic mutation callers to obtain a combined call set, then uses machine learning to distinguish true mutations from false positives in that call set.
This program is not suitable for running interactively. See below for an example of batch submission.
First, copy the workflow configuration to your current directory.
$ cp $SOMATICSEQ_HOME/utilities/snakemake/config.yaml .
Edit config.yaml to your specifications. Then prepare your inputs and log directory.
$ mkdir log inputs
$ # Intermediate mpileup files will be created in the same directory as the inputs,
$ # so symlink the inputs into a location where you have write access.
$ ln -s /fdb/DREAM/SMC/synthetic.challenge.set3.{tumor,normal}.bam* inputs/
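A dangling symlink would otherwise only surface as an error once the workflow is running, so it can help to check the inputs directory before submitting. The following is a minimal sketch and not part of SomaticSeq; check_links is a hypothetical helper name.

```shell
# Hypothetical helper (not part of SomaticSeq): report entries in a
# directory that do not resolve to an existing file.
check_links() {
    dir=$1
    rc=0
    for f in "$dir"/*; do
        # -e follows symlinks, so it is false for a link whose target is missing.
        # An empty directory is also reported, because the unmatched glob
        # stays literal and does not exist.
        if [ ! -e "$f" ]; then
            echo "missing or dangling: $f" >&2
            rc=1
        fi
    done
    return $rc
}

# Usage (assumes the inputs directory created above):
#   check_links inputs && echo "inputs look ok"
```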
Then create a batch script (e.g. somaticseq.sh):
#!/bin/sh
#SBATCH --time 4-0
set -e
module load snakemake somaticseq
snakemake \
-s $SOMATICSEQ_HOME/utilities/snakemake/Snakefile \
${SNAKEFLAGS} \
--config \
tumor=$PWD/inputs/synthetic.challenge.set3.tumor.bam \
normal=$PWD/inputs/synthetic.challenge.set3.normal.bam \
reference=/fdb/GATK_resource_bundle/b37/human_g1k_v37_decoy.fasta \
dbsnp=/fdb/dbSNP/organisms/human_9606_b150_GRCh37p13/00-All.vcf.gz \
gatk=$GATK_HOME/gatk-package-*-local.jar \
varscan="$VARSCANHOME/varscan.jar" \
caller_threads=36 \
-j 10 \
--cluster "sbatch --time {cluster.time} --cpus-per-task {threads} --mem {resources.mem_mb} --out={cluster.out}" \
--cluster-config cluster.json \
somaticseq
Where cluster.json contains:
{
"__default__" :
{
"time" : "3-00:00:00",
"mem" : "4g",
"out" : "log/{rule}-%j.out"
}
}
Submit this job using the Slurm sbatch command.
$ sbatch somaticseq.sh
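To follow the job after submission, it can be convenient to capture its job ID. With the --parsable option, sbatch prints only the job ID, optionally followed by a semicolon and the cluster name. The sketch below shows one way to extract the ID; parse_jobid is a hypothetical helper name, not something provided by Slurm or SomaticSeq.

```shell
# Hypothetical helper: extract the job ID from `sbatch --parsable` output,
# which has the form "jobid" or "jobid;cluster".
parse_jobid() {
    # Strip everything from the first ';' onward, if present.
    printf '%s\n' "${1%%;*}"
}

# Usage (requires Slurm, so it is left commented out here):
#   jobid=$(parse_jobid "$(sbatch --parsable somaticseq.sh)")
#   squeue -j "$jobid"    # show the state of the submitted job
```

The per-rule jobs that snakemake submits write their output under log/, following the "out" pattern in cluster.json.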