High-Performance Computing at the NIH
SomaticSeq on Biowulf

SomaticSeq is an ensemble approach to accurately detecting somatic mutations. It combines the output of multiple somatic mutation callers into a single call set, and then uses machine learning to distinguish true mutations from false positives within that call set.

There may be multiple versions of SomaticSeq available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail SomaticSeq

To select a module, type

module load SomaticSeq/[ver]

where [ver] is the version of choice.
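After loading the module, it can be useful to confirm that the environment variables used by the batch example on this page are actually defined in the current session. The following is a minimal sketch, not part of SomaticSeq or the module system; check_env is a hypothetical helper name, and it only detects exported variables.

```shell
#!/bin/bash
# check_env: hypothetical helper, not provided by SomaticSeq or by modules.
# Reports every named variable that is missing from the environment and
# returns non-zero if at least one is missing.
check_env() {
    check_env_missing=0
    for check_env_var in "$@"; do
        # printenv exits non-zero when the variable is not exported
        if ! printenv "$check_env_var" > /dev/null; then
            echo "not set: $check_env_var" >&2
            check_env_missing=1
        fi
    done
    return "$check_env_missing"
}

# Possible usage after loading the module (variable names taken from the
# batch example on this page):
#   module load SomaticSeq
#   check_env SOMATICSEQ_HOME GATK_HOME SNPEFF_JARPATH
```

If any variable is reported as not set, the wrapper command in the batch example would receive an empty path for the corresponding option.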

Environment variables set:

On Helix

The SomaticSeq wrapper script requires GATK, which cannot be used on Helix.

Swarm of Jobs on Biowulf

Create a swarmfile following the swarm guide using the example commands on this page.
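A swarmfile contains one command per line, so for several tumor/normal pairs it can be generated from a sample sheet. The sketch below illustrates one way to do that. The file names pairs.tsv and somaticseq.swarm, the sample IDs, and the BAM names are all hypothetical, and the wrapper options are copied from the batch example on this page; the additional parameters that the batch example elides are not filled in here and would still need to be appended to each line.

```shell
#!/bin/bash
# Sketch: write one SomaticSeq command per tumor/normal pair to a swarmfile.
# pairs.tsv, somaticseq.swarm and all sample names are hypothetical.
set -eu

# Hypothetical sample sheet: sample ID, normal BAM, tumor BAM (tab-separated).
printf 'sampleA\tsampleA_normal.bam\tsampleA_tumor.bam\n'  > pairs.tsv
printf 'sampleB\tsampleB_normal.bam\tsampleB_tumor.bam\n' >> pairs.tsv

: > somaticseq.swarm    # start from an empty swarmfile

while IFS="$(printf '\t')" read -r sample normal tumor; do
    # \$VAR is written literally, so the variable is expanded inside the
    # job (after the module is loaded), not while generating the file.
    # Any further wrapper parameters still have to be appended to this line.
    printf '%s\n' "mkdir -p results_${sample} && SomaticSeq.Wrapper.sh --snpeff-dir \"\$SNPEFF_JARPATH\" --gatk \"\$GATK_HOME\" --ada-r-script \"\$SOMATICSEQ_HOME/r_scripts/ada_model_builder.R\" --genome-reference /fdb/genome/human-feb2009/hg19.fa --output-dir results_${sample} --normal-bam ${normal} --tumor-bam ${tumor}" >> somaticseq.swarm
done < pairs.tsv

cat somaticseq.swarm
```

Assuming the swarm --module option described in the swarm guide, the resulting file could then be submitted with something like: swarm -f somaticseq.swarm --module SomaticSeq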

Batch job on Biowulf

Create a batch input file (e.g. somaticseq.sh). For example:

#!/bin/bash
module load SomaticSeq
# Create the output directory (-p: no error if it already exists)
mkdir -p results
SomaticSeq.Wrapper.sh \
  --snpeff-dir "$SNPEFF_JARPATH" \
  --gatk "$GATK_HOME" \
  --ada-r-script "$SOMATICSEQ_HOME/r_scripts/ada_model_builder.R" \
  --genome-reference /fdb/genome/human-feb2009/hg19.fa \
  --output-dir results \
  --normal-bam <normal.bam> \
  --tumor-bam <tumor.bam> \
  ... (additional parameters) ...

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=1 somaticseq.sh

Documentation