High-Performance Computing at the NIH
Strelka on Biowulf & Helix

Description

Strelka is an analysis package designed to detect somatic SNVs and small indels from the aligned sequencing reads of matched tumor-normal samples.

Multiple versions of Strelka may be available. The easiest way to select a version is to use modules. To see the available modules, type

module avail strelka 

To select a module use

module load strelka/[version]

where [version] is the version of choice.

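Strelka can use multiple CPUs. Make sure the number of parallel processes (the -j option of make for Strelka 1.X, or of runWorkflow.py for Strelka 2.X) matches the number of CPUs allocated to the job, which Slurm exposes as $SLURM_CPUS_PER_TASK.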
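A pitfall worth guarding against: Slurm sets $SLURM_CPUS_PER_TASK only when --cpus-per-task is requested. If it is unset, `make -j $SLURM_CPUS_PER_TASK` expands to a bare `make -j`, which lets make start an unlimited number of parallel jobs. A minimal sketch of a guard follows; the helper name strelka_threads and the fallback of 1 process are assumptions, not something the Strelka module provides.

```bash
#!/bin/bash
# Hypothetical helper: print how many parallel processes to run.
# Uses the CPU count Slurm allocated (--cpus-per-task) and falls back to 1
# when SLURM_CPUS_PER_TASK is unset or empty, so "-j" always gets a number.
strelka_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Strelka 1.X:  make -C demo_analysis -j "$(strelka_threads)"
# Strelka 2.X:  demo_out/runWorkflow.py -m local -j "$(strelka_threads)"
```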

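Environment variables set

  $STRELKA_INSTALL_DIR: the Strelka installation directory (example configuration files are in $STRELKA_INSTALL_DIR/etc)
  $STRELKA_TEST_DATA: example data used in the interactive examples below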

Interactive job on Biowulf

Strelka 1.X

Running Strelka 1.X is a two-step process:

  1. Configure the workflow with configureStrelkaWorkflow.pl. Example configuration files for this step can be found in $STRELKA_INSTALL_DIR/etc.
  2. Run make on the Makefile generated in step 1 (in the output directory).
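For your own data, one way to start is to copy an example configuration file from $STRELKA_INSTALL_DIR/etc and edit it. The sketch below assumes a strelka/1.x module is loaded; the helper name copy_strelka_config and the default destination ./config.ini are hypothetical. Because the file names in etc may differ between versions, the helper lists the directory when the requested file is missing.

```bash
#!/bin/bash
# Hypothetical helper: copy an example Strelka 1.X configuration file from
# $STRELKA_INSTALL_DIR/etc so it can be edited for a real analysis.
# Usage: copy_strelka_config <file name in $STRELKA_INSTALL_DIR/etc> [destination]
copy_strelka_config() {
    local name=$1
    local dest=${2:-./config.ini}

    if [[ -z "${STRELKA_INSTALL_DIR:-}" ]]; then
        echo "STRELKA_INSTALL_DIR is not set; load a strelka/1.x module first" >&2
        return 1
    fi

    local src="$STRELKA_INSTALL_DIR/etc/$name"
    if [[ ! -f "$src" ]]; then
        # Show what the loaded version actually provides
        echo "No such example config: $src" >&2
        echo "Available files in $STRELKA_INSTALL_DIR/etc:" >&2
        ls "$STRELKA_INSTALL_DIR/etc" >&2
        return 1
    fi

    cp "$src" "$dest"
}
```

Typical use: run `ls $STRELKA_INSTALL_DIR/etc`, then `copy_strelka_config <file>.ini ./config.ini`, edit the copy, and pass `--config=./config.ini` to configureStrelkaWorkflow.pl.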

Allocate an interactive session with sinteractive, requesting 10 CPUs and 10 GB of memory:

biowulf$ sinteractive --cpus-per-task=10 --mem=10g
[...snip...]
salloc.exe: Nodes xxxx are ready for job
node$ module load strelka/1.0.15
node$ cp -r $STRELKA_TEST_DATA/demo . # copy example data
node$ cd demo
node$ configureStrelkaWorkflow.pl \
          --normal=data/NA12892_dupmark_chr20_region.bam \
          --tumor=data/NA12891_dupmark_chr20_region.bam \
          --ref=data/chr20_860k_only.fa \
          --config=./strelka_demo_config.ini \
          --output-dir=./demo_analysis
node$ make -C demo_analysis -j $SLURM_CPUS_PER_TASK
node$ exit
biowulf$

The files in the expected_results directory can then be compared with the files in demo_analysis/results.
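A minimal sketch of such a comparison, assuming each file in expected_results has a file of the same name in demo_analysis/results and that both are plain-text VCF files. Lines starting with ## are skipped because VCF meta-information (dates, command lines, file paths) is expected to differ between runs; the helper name compare_results is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: compare every file in an expected-results directory
# with the file of the same name in a results directory, ignoring lines
# that start with "##" (VCF meta-information such as dates and paths).
# Returns 0 only if every expected file exists and matches.
compare_results() {
    local expected_dir=$1 results_dir=$2 status=0 f name
    for f in "$expected_dir"/*; do
        [[ -f "$f" ]] || continue
        name=$(basename "$f")
        if [[ ! -f "$results_dir/$name" ]]; then
            echo "MISSING  $name"
            status=1
        elif diff -q <(grep -v '^##' "$f") <(grep -v '^##' "$results_dir/$name") >/dev/null; then
            echo "OK       $name"
        else
            echo "DIFFERS  $name"
            status=1
        fi
    done
    return "$status"
}
```

From inside the demo directory: `compare_results expected_results demo_analysis/results`.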

Strelka 2.X

Running Strelka 2.X is still a two-step process (configure, then run the generated workflow). However, the software has been rewritten and the command-line interface has changed. The current version does not support parallelization across more than one node: its workflow engine does not submit tasks as Slurm jobs, so all processes run on the single allocated node.

Illumina recommends first running Manta on the samples and supplying Manta's candidate indels to Strelka with the --indelCandidates command-line option. The example below skips this step.
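For reference, the Manta-first order can be sketched as a shell function. This is an outline under stated assumptions, not a tested pipeline: it assumes a manta module that provides configManta.py, that Manta writes its candidate indels to results/variants/candidateSmallIndels.vcf.gz in its run directory, that Manta's runWorkflow.py accepts the same -m/-j options as Strelka's, and that the loaded Manta and Strelka versions are compatible. The function name and its arguments are hypothetical.

```bash
#!/bin/bash
# Sketch: Manta first, then Strelka 2.X seeded with Manta's candidate small
# indels via --indelCandidates.
# Assumptions (not verified here): a "manta" module providing configManta.py,
# candidate indels at results/variants/candidateSmallIndels.vcf.gz, and
# compatible Manta/Strelka versions. Function name and arguments are hypothetical.
# Usage: manta_then_strelka normal.bam tumor.bam ref.fa /path/to/outdir
manta_then_strelka() {
    local normal=$1 tumor=$2 ref=$3 outdir=$4
    local threads=${SLURM_CPUS_PER_TASK:-1}

    module load manta strelka/2.7.1 || return 1

    # Step 1: configure and run Manta on the tumor/normal pair
    configManta.py \
        --normalBam="$normal" \
        --tumorBam="$tumor" \
        --referenceFasta="$ref" \
        --runDir="$outdir/manta" || return 1
    "$outdir/manta/runWorkflow.py" -m local -j "$threads" || return 1

    # Step 2: configure and run Strelka, passing Manta's candidate indels
    configureStrelkaSomaticWorkflow.py \
        --normalBam="$normal" \
        --tumorBam="$tumor" \
        --referenceFasta="$ref" \
        --indelCandidates="$outdir/manta/results/variants/candidateSmallIndels.vcf.gz" \
        --runDir="$outdir/strelka" || return 1
    "$outdir/strelka/runWorkflow.py" -m local -j "$threads"
}
```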

Allocate an interactive session with sinteractive, requesting 10 CPUs and 10 GB of memory:

biowulf$ sinteractive --cpus-per-task=10 --mem=10g
[...snip...]
salloc.exe: Nodes xxxx are ready for job
node$ module load strelka/2.7.1
node$ cp -r $STRELKA_TEST_DATA/demo . # copy example data
node$ cd demo
node$ # configure the workflow with 'configureStrelka${type}Workflow.py'
      # where type is Starling, Germline, or Somatic
node$ configureStrelkaSomaticWorkflow.py \
             --normalBam=data/NA12892_dupmark_chr20_region.bam \
             --tumorBam=data/NA12891_dupmark_chr20_region.bam \
             --referenceFasta=data/chr20_860k_only.fa \
             --runDir demo_out
node$ demo_out/runWorkflow.py -m local -j $SLURM_CPUS_PER_TASK
[...snip...]
node$ ls -lh demo_out
total 20K
drwxr-xr-x 4 user group 4.0K Apr 24 08:12 results
-rwxr-xr-x 1 user group 7.3K Apr 24 08:12 runWorkflow.py
-rw-r--r-- 1 user group 3.2K Apr 24 08:12 runWorkflow.py.config.pickle
-rw-r--r-- 1 user group    0 Apr 24 08:13 workflow.error.log.txt
-rw-r--r-- 1 user group    2 Apr 24 08:13 workflow.exitcode.txt
-rw-r--r-- 1 user group    0 Apr 24 08:13 workflow.warning.log.txt
drwxr-xr-x 3 user group 4.0K Apr 24 08:13 workspace
node$ exit
biowulf$
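The run directory records how the workflow ended: workflow.exitcode.txt holds the workflow's exit status and workflow.error.log.txt is empty on a clean run. A small sketch that checks both; the helper name check_strelka_run is hypothetical, and it assumes workflow.exitcode.txt only appears once the workflow has ended.

```bash
#!/bin/bash
# Hypothetical helper: report whether a Strelka 2.X run directory finished
# successfully, based on workflow.exitcode.txt (the workflow's exit status)
# and workflow.error.log.txt (empty on a clean run).
# Returns 0 on success, 1 on failure, 2 if no exit code file exists.
check_strelka_run() {
    local rundir=$1 code
    if [[ ! -f "$rundir/workflow.exitcode.txt" ]]; then
        echo "$rundir: no workflow.exitcode.txt (not started or not finished)" >&2
        return 2
    fi
    code=$(tr -d '[:space:]' < "$rundir/workflow.exitcode.txt")
    if [[ "$code" == "0" ]]; then
        echo "$rundir: workflow completed successfully"
        return 0
    fi
    echo "$rundir: workflow failed with exit code $code" >&2
    # Show the end of the error log, if there is one
    if [[ -s "$rundir/workflow.error.log.txt" ]]; then
        tail -n 20 "$rundir/workflow.error.log.txt" >&2
    fi
    return 1
}
```

For the demo above: `check_strelka_run demo_out`. For the somatic workflow, the calls should then be under demo_out/results/variants (somatic.snvs.vcf.gz and somatic.indels.vcf.gz).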

Batch job on Biowulf

Strelka 1.X

Create a batch script similar to the following example:

#!/bin/bash
# ----  this file is called strelka.sh ---------
module load strelka/1.0.15 || exit 1

configureStrelkaWorkflow.pl \
  --normal=/path/to/normal.bam \
  --tumor=/path/to/tumor.bam \
  --ref=/path/to/hg19.fa \
  --config=/path/to/config.ini \
  --output-dir=/path/to/myAnalysis || exit 1

cd /path/to/myAnalysis || exit 1

# -j N allows make to use up to N parallel processes; N should match
# the number of CPUs requested with sbatch --cpus-per-task
make -j $SLURM_CPUS_PER_TASK

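Several things must be changed in this script: the paths to the normal and tumor BAM files, the reference genome FASTA file, the configuration file, and the analysis output directory (used in both --output-dir and the cd command).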

Submit to the queue with sbatch:

biowulf$ sbatch --cpus-per-task=6 --mem=10g strelka.sh

Strelka 2.X

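For Strelka 2.X, adapt the batch script above to the 2.X command-line interface shown in the interactive example: configure with configureStrelkaSomaticWorkflow.py, then run the generated runWorkflow.py with -m local -j $SLURM_CPUS_PER_TASK.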
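As a sketch, using the same placeholder paths as the 1.X script, a 2.X batch script could look like the following. The function name is hypothetical, and the guard at the bottom skips the analysis with a message when the script is run outside a Slurm job. Submit it the same way as the 1.X script, for example with sbatch --cpus-per-task=6 --mem=10g strelka2.sh.

```bash
#!/bin/bash
# ---- strelka2.sh: sketch of a Strelka 2.X somatic batch job ----
# All /path/to/... values and the module version are placeholders to adapt.

run_strelka2_somatic() {
    module load strelka/2.7.1 || return 1

    local rundir=/path/to/myAnalysis

    configureStrelkaSomaticWorkflow.py \
        --normalBam=/path/to/normal.bam \
        --tumorBam=/path/to/tumor.bam \
        --referenceFasta=/path/to/hg19.fa \
        --runDir="$rundir" || return 1

    # Strelka 2.X runs on a single node: -m local, one process per allocated
    # CPU (falls back to 1 if --cpus-per-task was not given)
    "$rundir/runWorkflow.py" -m local -j "${SLURM_CPUS_PER_TASK:-1}"
}

# Only start the analysis inside a Slurm job, never on the login node
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
    run_strelka2_somatic || exit 1
else
    echo "strelka2.sh: not inside a Slurm job; submit it with sbatch" >&2
fi
```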