SomaticSeq is an ensemble approach to accurately detecting somatic mutations. It combines the output of multiple somatic mutation callers into a single call set, then uses machine learning to distinguish true mutations from false positives within that call set.


There may be multiple versions of SomaticSeq available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail SomaticSeq

To select a module, type

module load SomaticSeq/[ver]

where [ver] is the version of choice.
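
The wrapper invocation in the batch example on this page reads SNPEFF_JARPATH, GATK_HOME, and SOMATICSEQ_HOME from the environment. Assuming the loaded module is what exports them, a small helper along these lines can confirm they are set before a job is submitted. The function name check_env is hypothetical and is not part of SomaticSeq or the module system.

```shell
# Hypothetical helper: report any of the named environment variables
# that are unset or empty. Returns 0 if all are set, 1 otherwise.
check_env() {
  local var missing=0
  for var in "$@"; do
    # printenv only sees exported variables, which is what a child
    # process such as the wrapper script would inherit.
    if [ -z "$(printenv "$var")" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible use after loading the module (assumes the module exports these):
#   module load SomaticSeq
#   check_env SNPEFF_JARPATH GATK_HOME SOMATICSEQ_HOME || exit 1
```

If any variable is reported missing, the loaded version may use different variable names, so the module's own description is the place to check.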

On Helix

The SomaticSeq wrapper script requires GATK, which cannot be used on Helix.

Batch job on Biowulf

Create a batch input file (e.g. somaticseq.sh). For example:

#!/bin/bash
module load SomaticSeq
mkdir -p results
SomaticSeq.Wrapper.sh \
  --snpeff-dir "$SNPEFF_JARPATH" \
  --gatk "$GATK_HOME" \
  --ada-r-script "$SOMATICSEQ_HOME/r_scripts/ada_model_builder.R" \
  --genome-reference /fdb/genome/human-feb2009/hg19.fa \
  --output-dir results \
  --normal-bam <normal.bam> \
  --tumor-bam <tumor.bam> \
  ... (additional parameters) ...

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=1 somaticseq.sh
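
To keep track of the submitted job, it can help to capture the job ID that sbatch reports. The sketch below wraps the same submission in a function and uses sbatch's --parsable option so that only the job ID is printed. The function name submit_job is hypothetical, and the script name is passed in as an argument rather than fixed.

```shell
# Hypothetical helper: submit a batch script with one CPU per task
# and print the resulting Slurm job ID on stdout.
submit_job() {
  local script="$1"
  local jobid
  # --parsable prints the job ID alone, optionally followed by ";cluster".
  jobid=$(sbatch --parsable --cpus-per-task=1 "$script") || return 1
  # Drop the optional ";cluster" suffix so only the numeric ID remains.
  echo "${jobid%%;*}"
}

# Possible use:
#   jobid=$(submit_job somaticseq.sh) && squeue -j "$jobid"
```

The captured ID can then be passed to squeue -j to check whether the job is still pending or running.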