SomaticSeq on Biowulf
SomaticSeq is an ensemble approach to accurately detect somatic mutations. It runs multiple somatic mutation callers to obtain a combined call set, and then uses machine learning to distinguish true mutations from false positives within that call set.
References:
- Li Tai Fang, Pegah Tootoonchi Afshar, Aparna Chhibber, Marghoob Mohiyuddin, Yu Fan, John C. Mu, Greg Gibeling, Sharon Barr, Narges Bani Asadi, Mark B. Gerstein, Daniel C. Koboldt, Wenyi Wang, Wing H. Wong, and Hugo Y.K. Lam. An Ensemble Approach to Accurately Detect Somatic Mutations Using SomaticSeq. Genome Biology, 16(1):197. (2015). DOI: 10.1186/s13059-015-0758-2
Documentation
Important Notes
- Module Name: somaticseq (see the modules page for more information)
- Environment variables set
  - SOMATICSEQ_HOME
- Example files in /fdb/DREAM/SMC
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
This program is not suitable for running interactively. See below for an example of batch submission.
Batch job
Most jobs should be run as batch jobs.
First, copy the workflow configuration to your current directory.
$ cp $SOMATICSEQ_HOME/utilities/snakemake/config.yaml .
Edit config.yaml to your specifications. Then prepare your inputs and log directory.
$ mkdir log inputs
$ # intermediate mpileup files will be created in the same directory as the inputs,
$ # so use symlinks to a location where we have write access.
$ ln -s /fdb/DREAM/SMC/synthetic.challenge.set3.{tumor,normal}.bam* inputs/
Then create a batch input file (e.g. somaticseq.sh). For example:
#!/bin/sh
#SBATCH --time 4-0
set -e
module load snakemake somaticseq
snakemake \
    -s $SOMATICSEQ_HOME/utilities/snakemake/Snakefile \
    ${SNAKEFLAGS} \
    --config \
        tumor=$PWD/inputs/synthetic.challenge.set3.tumor.bam \
        normal=$PWD/inputs/synthetic.challenge.set3.normal.bam \
        reference=/fdb/GATK_resource_bundle/b37/human_g1k_v37_decoy.fasta \
        dbsnp=/fdb/dbSNP/organisms/human_9606_b150_GRCh37p13/00-All.vcf.gz \
        gatk=$GATK_HOME/gatk-package-*-local.jar \
        varscan="$VARSCANHOME/varscan.jar" \
        caller_threads=36 \
    -j 10 \
    --cluster "sbatch --time {cluster.time} --cpus-per-task {threads} --mem {resources.mem_mb} --out={cluster.out}" \
    --cluster-config cluster.json \
    somaticseq
Where cluster.json contains:
{
    "__default__" : {
        "time" : "3-00:00:00",
        "mem" : "4g",
        "out" : "log/{rule}-%j.out"
    }
}
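Before submitting, it can be worth confirming that every file the workflow reads is actually present, since a dangling symlink in inputs/ would otherwise only surface once the job is running. The following is a minimal sketch, not part of SomaticSeq or Snakemake; check_inputs is a hypothetical helper name chosen for illustration.

```shell
#!/bin/sh
# Pre-flight sketch: report any path that is missing, empty, or a dangling symlink.
# check_inputs is a hypothetical helper, not provided by SomaticSeq or Snakemake.
check_inputs() {
    status=0
    for path in "$@"; do
        # -s follows symlinks and is true only for an existing, non-empty file
        if [ ! -s "$path" ]; then
            echo "missing or empty: $path" >&2
            status=1
        fi
    done
    return "$status"
}
```

Assuming the directory layout used in the example, it could be called as check_inputs inputs/*.bam config.yaml cluster.json, and a non-zero exit status means at least one file needs attention.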
Submit this job using the Slurm sbatch command.
$ sbatch somaticseq.sh
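Because cluster.json sends the output of each rule to log/{rule}-%j.out, the log directory is the first place to look when a step fails. The sketch below lists the log files that mention an error. list_failed_logs is a hypothetical helper, and the search word "error" is an assumption about what failing jobs print, so adjust it as needed.

```shell
#!/bin/sh
# list_failed_logs is a hypothetical helper: it prints the name of every
# *.out file in the given directory (default: log) that mentions "error".
# The search word is an assumption; change it to match what your jobs print.
list_failed_logs() {
    dir=${1:-log}
    # -l prints only file names, -i ignores case;
    # "no match" and "no log files yet" are not treated as failures
    grep -l -i 'error' "$dir"/*.out 2>/dev/null || true
}
```

Run list_failed_logs from the directory where the job was submitted. While the workflow is running, the Slurm command squeue -u $USER shows the jobs it has submitted on your behalf.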