High-Performance Computing at the NIH
SomaticSeq on Biowulf

SomaticSeq is an ensemble approach to accurately detect somatic mutations. It combines the output of multiple somatic mutation callers into a single call set, then uses machine learning to distinguish true mutations from false positives in that call set.


There may be multiple versions of SomaticSeq available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail SomaticSeq

To select a module, type

module load SomaticSeq/[ver]

where [ver] is the version of choice.

Environment variables set:

Example Data

On Helix

The SomaticSeq wrapper script requires GATK, which cannot be used on Helix.

Swarm of Jobs on Biowulf

Create a swarmfile following the swarm guide using the example commands on this page.
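As a rough sketch of what such a swarmfile could look like, the script below writes one SomaticSeq command per line for a set of tumor/normal pairs. The sample names, BAM file names, and per-sample output directories are hypothetical placeholders, and the additional parameters that the example command on this page elides are left out here as well. The environment variables are kept unexpanded in the swarmfile so that they are resolved when each job runs.

```shell
#!/bin/bash
# Sketch: build a swarmfile with one SomaticSeq command per tumor/normal pair.
# Sample names and BAM file names are hypothetical placeholders.

samples=(patient01 patient02)
swarmfile=somaticseq.swarm

# Single quotes keep $SNPEFF_JARPATH etc. literal, so they are expanded
# inside each job (after the module is loaded), not while writing the file.
common='--snpeff-dir "$SNPEFF_JARPATH" --gatk "$GATK_HOME" --ada-r-script "$SOMATICSEQ_HOME/r_scripts/ada_model_builder.R" --genome-reference /fdb/genome/human-feb2009/hg19.fa'

: > "$swarmfile"   # start from an empty file

for s in "${samples[@]}"; do
  # One line per job. Any additional SomaticSeq parameters your analysis
  # needs would be appended to this line.
  printf '%s\n' "mkdir -p results/${s} && SomaticSeq.Wrapper.sh ${common} --output-dir results/${s} --normal-bam ${s}_normal.bam --tumor-bam ${s}_tumor.bam" >> "$swarmfile"
done

# Submission (not run here); see the swarm guide for memory and thread options:
#   swarm -f somaticseq.swarm --module SomaticSeq
```

Writing each job to its own output directory avoids jobs overwriting one another's results; whether that layout suits your project is a judgment call.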

Batch job on Biowulf

Create a batch input file (e.g. somaticseq.sh). For example:

#!/bin/bash
module load SomaticSeq
# -p avoids an error if the directory already exists (e.g. on a rerun)
mkdir -p results
SomaticSeq.Wrapper.sh \
  --snpeff-dir "$SNPEFF_JARPATH" \
  --gatk "$GATK_HOME" \
  --ada-r-script "$SOMATICSEQ_HOME/r_scripts/ada_model_builder.R" \
  --genome-reference /fdb/genome/human-feb2009/hg19.fa \
  --output-dir results \
  --normal-bam <normal.bam> \
  --tumor-bam <tumor.bam> \
  ... (additional parameters) ...

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=1 somaticseq.sh
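
The example batch script reads three environment variables: SNPEFF_JARPATH, GATK_HOME, and SOMATICSEQ_HOME. If any of them is empty, the wrapper receives blank paths and is likely to fail in a less obvious way. The helper below is a minimal sketch of a pre-flight check that could be placed after the module load line; the function name require_vars is made up for illustration, and which module sets which variable is not confirmed here.

```shell
#!/bin/bash
# Sketch: report any required environment variable that is unset or empty.
# require_vars is a hypothetical helper, not part of SomaticSeq or Slurm.
require_vars() {
  local missing=0 name
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion with an empty default,
    # so the check also works when `set -u` is in effect.
    if [ -z "${!name:-}" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Intended use inside the batch script, after `module load SomaticSeq`:
#   require_vars SNPEFF_JARPATH GATK_HOME SOMATICSEQ_HOME || exit 1
```

Failing early this way puts the name of the missing variable at the top of the job's error output.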