High-Performance Computing at the NIH
Manta on Biowulf & Helix

Description

Manta is a package used to discover structural variants and indels from next-generation sequencing data. It is optimized for rapid clinical analysis, calling structural variants, medium-sized indels, and large insertions. Manta makes use of split-read and paired-end information and includes scoring models optimized for germline analysis of diploid genomes and tumor-normal genome comparisons. The major use cases are listed in the manta manual.

There is also experimental RNA-Seq support.

Manta uses pyFlow to parallelize processing. On Biowulf this allows parallel processing on a single node, not across nodes.

There are multiple versions of manta available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail manta 

To select a module use

module load manta/[version]

where [version] is the version of choice.

Environment variables set

References

Documentation

Batch job on Biowulf

Manta is run in two stages: configuration and execution of the workflow. Execution is restartable, i.e. it can be resumed if interrupted. configManta.py creates a new configuration for a run. This can be done in an interactive session or on the login node:

b2$ module load manta
b2$ configManta.py \
  --normalBam=${MANTA_TEST_DATA}/HCC1954.NORMAL.30x.compare.COST16011_region.bam \
  --tumorBam=${MANTA_TEST_DATA}/G15512.HCC1954.1.COST16011_region.bam \
  --referenceFasta=${MANTA_TEST_DATA}/Homo_sapiens_assembly19.COST16011_region.fa \
  --region=8:107652000-107655000 \
  --region=11:94974000-94989000 \
  --candidateBins=4 --exome --runDir=./test
b2$ tree test
test
|-- [user   4.0K]  results
|   |-- [user   4.0K]  stats
|   `-- [user   4.0K]  variants
|-- [user   7.0K]  runWorkflow.py
|-- [user   3.0K]  runWorkflow.py.config.pickle
`-- [user   4.0K]  workspace
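
Because configuration and execution are separate steps, it can be useful to confirm that a run directory has been configured before trying to execute it. The following is a minimal sketch, assuming the layout shown in the listing above, where configManta.py leaves an executable runWorkflow.py in the run directory. The function name is_configured is hypothetical and not part of manta.

```shell
#!/bin/bash
# Hypothetical helper (not part of manta): succeed only if configManta.py
# has already generated an executable runWorkflow.py in the run directory.
is_configured() {
    local run_dir=$1
    [ -x "$run_dir/runWorkflow.py" ]
}

# Example: report on the ./test run directory used above.
if is_configured ./test; then
    echo "configured: ./test/runWorkflow.py is ready to run"
else
    echo "not configured: run configManta.py with --runDir=./test first"
fi
```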

The workflow is executed by running the generated runWorkflow.py script. In our case, this is wrapped into a slurm batch script:

#! /bin/bash

module load manta || exit 1
test/runWorkflow.py -m local -j $SLURM_CPUS_PER_TASK -g $((SLURM_MEM_PER_NODE / 1024))
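
The -g value in the script above is computed with shell integer arithmetic. As a rough sketch, assuming Slurm reports SLURM_MEM_PER_NODE in megabytes (hence the division by 1024) and that the variable may be unset outside of a job, the conversion could be wrapped with a fallback. The function name mem_gb and the 1 GB floor are hypothetical choices for illustration.

```shell
#!/bin/bash
# Hypothetical helper: convert a memory size in MB to whole GB,
# rounding down but never returning less than 1.
mem_gb() {
    local mem_mb=${1:-1024}
    local gb=$(( mem_mb / 1024 ))
    if (( gb < 1 )); then
        gb=1
    fi
    echo "$gb"
}

# Inside a job submitted with --mem=10g this is expected to print 10;
# outside of a job it falls back to 1.
mem_gb "${SLURM_MEM_PER_NODE:-1024}"
```

The result could then be passed on as -g "$(mem_gb "$SLURM_MEM_PER_NODE")".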

Then submit the script to the batch queue:

b2$ sbatch --mem=10g --cpus-per-task=4 manta_batch.sh
4412000
b2$ jobhist 4412000
JobId              : 4412000
User               : user
Submitted          : 20151019 12:46:55
Submission Path    : /spin1/users/wresch/test_data/manta
Submission Command : sbatch --mem=10g --cpus-per-task=4 manta_batch.sh


 Partition       State  Nodes  CPUs      Walltime       Runtime         MemReq  MemUsed  Nodelist
      norm   COMPLETED      1     4      04:00:00      00:00:11    10.0GB/node    0.0GB  cn0041

Swarm of jobs on Biowulf

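A simple example of using swarm to run manta: assuming we have a file with two columns where each line represents a tumor/normal pair, we can configure the workflows in a loop. The loop reads the pairs from the file, strips the .bam suffix so that the run directory names match the swarm file below, and passes the reference with --referenceFasta (the path shown is a placeholder):

b2$ head tumor_normal_list
normal1.bam  tumor1.bam
normal2.bam  tumor2.bam
normal3.bam  tumor3.bam
b2$ while read normal tumor; do
  configManta.py \
  --normalBam=$normal \
  --tumorBam=$tumor \
  --referenceFasta=/path/to/reference.fa \
  --exome --runDir=$(basename $normal .bam)_vs_$(basename $tumor .bam);
done < tumor_normal_list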

Then create a swarm file to run the workflows:

normal1_vs_tumor1/runWorkflow.py -m local -j $SLURM_CPUS_PER_TASK -g $((SLURM_MEM_PER_NODE / 1024))
normal2_vs_tumor2/runWorkflow.py -m local -j $SLURM_CPUS_PER_TASK -g $((SLURM_MEM_PER_NODE / 1024))
normal3_vs_tumor3/runWorkflow.py -m local -j $SLURM_CPUS_PER_TASK -g $((SLURM_MEM_PER_NODE / 1024))
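
For longer lists, the swarm file can be generated rather than typed by hand. The following is a sketch only, assuming the two-column tumor_normal_list shown above and run directories named as in the swarm file above. The function name make_swarm is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print one runWorkflow.py command per tumor/normal pair.
make_swarm() {
    local list=$1
    local normal tumor run_dir
    while read -r normal tumor; do
        # skip blank lines
        [ -z "$normal" ] && continue
        run_dir="$(basename "$normal" .bam)_vs_$(basename "$tumor" .bam)"
        # The single-quoted format keeps the Slurm variables unexpanded,
        # so they are evaluated when the job runs, not when the file is written.
        printf '%s/runWorkflow.py -m local -j $SLURM_CPUS_PER_TASK -g $((SLURM_MEM_PER_NODE / 1024))\n' "$run_dir"
    done < "$list"
}

# Usage: make_swarm tumor_normal_list > manta.swarm
```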

and submit it with swarm:

b2$ swarm -t 10 -g 10 -f manta.swarm --module manta

Interactive job on Biowulf

To use manta interactively on Biowulf, allocate an interactive session with sufficient resources to carry out your analysis:

b2$ sinteractive --mem=10g --cpus-per-task=10
salloc.exe: Pending job allocation 4413407
salloc.exe: job 4413407 queued and waiting for resources
salloc.exe: job 4413407 has been allocated resources
salloc.exe: Granted job allocation 4413407
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn1883 are ready for job
cn1883$ module load manta
cn1883$ configManta.py \
  --normalBam=${MANTA_TEST_DATA}/HCC1954.NORMAL.30x.compare.COST16011_region.bam \
  --tumorBam=${MANTA_TEST_DATA}/G15512.HCC1954.1.COST16011_region.bam \
  --referenceFasta=${MANTA_TEST_DATA}/Homo_sapiens_assembly19.COST16011_region.fa \
  --region=8:107652000-107655000 \
  --region=11:94974000-94989000 \
  --candidateBins=4 --exome --runDir=./test
cn1883$ test/runWorkflow.py -m local -j 10 -g 10
[...snip...]
cn1883$ exit