High-Performance Computing at the NIH
THetA on Biowulf & Helix

Description

Tumor Heterogeneity Analysis (THetA) is an algorithm used to estimate tumor purity and clonal/subclonal copy number aberrations simultaneously from high-throughput DNA sequencing data.

There may be multiple versions of THetA available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail theta

To select a module use

module load theta/[version]

where [version] is the version of choice.

THetA is a multithreaded application. Make sure the number of processes it runs (set with --NUM_PROCESSES) does not exceed the number of CPUs requested.
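The batch and swarm examples below compute --NUM_PROCESSES from the Slurm allocation instead of hard-coding it. Here is a minimal sketch of that calculation as a reusable bash function. Like those examples, it leaves one allocated CPU free. The function name theta_nproc is hypothetical and not part of THetA.

```bash
#!/bin/bash
# theta_nproc (hypothetical helper, not part of THetA):
# print a value for RunTHetA --NUM_PROCESSES.
# Reads the CPUs Slurm allocated to the job (SLURM_CPUS_PER_TASK),
# falls back to 4 when the variable is unset (e.g. outside a job),
# keeps one CPU free, and never returns less than 1.
theta_nproc() {
    local alloc_cpus=${SLURM_CPUS_PER_TASK:-4}
    local n=$((alloc_cpus - 1))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}
```

Inside a job, this could be used as RunTHetA example/Example.intervals --NUM_PROCESSES="$(theta_nproc)" followed by the other options. The lower bound of 1 prevents a single-CPU allocation from producing --NUM_PROCESSES=0.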

Environment variables set

References

Documentation

Batch job on Biowulf

Copy the example data to the current working directory:

helix$ cp -r /usr/local/theta/TEST_DATA/example/ .
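Before submitting, it can help to confirm that the copy produced the three input files read by the batch script below. The following is a minimal sketch. The function name check_theta_inputs is hypothetical, and the file names are taken from the RunTHetA command in that script.

```bash
#!/bin/bash
# check_theta_inputs (hypothetical helper): verify that a directory holds
# the three inputs used by the RunTHetA example. Reports each missing or
# empty file on stderr and returns non-zero if any is absent.
check_theta_inputs() {
    local dir=${1:-example}
    local f missing=0
    for f in Example.intervals TUMOR_SNP.formatted.txt NORMAL_SNP.formatted.txt; do
        # -s: file exists and is not empty
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_theta_inputs example && sbatch theta.sh submits the job only if all three files are present.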

Then create a batch script similar to the following example:

#!/bin/bash
#SBATCH --cpus-per-task=6
#SBATCH --mem=14g

# this file is theta.sh
module load theta || exit 1

# CPUs allocated by Slurm (falls back to 4 if run outside a job)
alloc_cpus=${SLURM_CPUS_PER_TASK:-4}
# run one process fewer than the number of allocated CPUs
nproc=$((alloc_cpus - 1))
RunTHetA example/Example.intervals \
  --NUM_PROCESSES=$nproc \
  --TUMOR_FILE example/TUMOR_SNP.formatted.txt \
  --NORMAL_FILE example/NORMAL_SNP.formatted.txt

And submit to the queue with sbatch:

biowulf$ sbatch theta.sh

The analysis creates a number of output files, including some graphs. For example, the figure below shows one of the models (2 components):

[Figure: THetA model n=2]

In addition to RunTHetA, this package includes several other tools:

helix$ ls /usr/local/apps/theta/0.7/bin
|-- [  274]  CreateExomeInput
|-- [ 294K]  getAlleleCounts.jar
|-- [  14K]  runBICSeqToTHetA.jar
`-- [  260]  RunTHetA
helix$ java -jar $THETA_JARPATH/runBICSeqToTHetA.jar
Error! Incorrect number of arguments.

Program: BICSeqToTHetA
USAGE (src): java BICSeqToTHetA <INPUT_FILE> [Options]
USAGE (jar): java -jar BICSeqToTHetA <INPUT_FILE> [Options]
<INPUT_FILE> [String]
         A file output by BIC-Seq.
-OUTPUT_PREFIX [STRING]
         Prefix for all output files.
-MIN_LENGTH [Integer]
         The minimum length of intervals to keep.
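Based on the usage message above, a conversion call takes the BIC-Seq output file followed by the options. The sketch below assembles that command in a bash array. The wrapper name bicseq_to_theta and the DRY_RUN switch are hypothetical. Passing each option's value as the next argument is an assumption drawn from the usage text, so check the manual before relying on it.

```bash
#!/bin/bash
# bicseq_to_theta (hypothetical wrapper, not part of THetA):
#   $1 = BIC-Seq output file, $2 = output prefix, $3 = minimum interval length
# Assumes THETA_JARPATH is set by the theta module, as in the java -jar call above.
# With DRY_RUN=1 the assembled command is printed instead of executed.
bicseq_to_theta() {
    local input=$1 prefix=$2 min_length=$3
    if [ -z "${THETA_JARPATH:-}" ]; then
        echo "THETA_JARPATH is not set; run 'module load theta' first" >&2
        return 1
    fi
    # option/value layout assumed from the usage message
    local cmd=(java -jar "$THETA_JARPATH/runBICSeqToTHetA.jar"
               "$input" -OUTPUT_PREFIX "$prefix" -MIN_LENGTH "$min_length")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

Running it once with DRY_RUN=1 and placeholder arguments shows the exact java command before anything is executed.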

For a more detailed manual, see

/usr/local/apps/theta/<version>/MANUAL.txt

Swarm of jobs on Biowulf

Create a swarm command file similar to the following example:

# this file is theta.swarm
RunTHetA sample1.intervals \
  --NUM_PROCESSES=$((SLURM_CPUS_PER_TASK - 1)) \
  --TUMOR_FILE sample1_TUMOR_SNP.formatted.txt \
  --NORMAL_FILE sample1_NORMAL_SNP.formatted.txt \
  -p sample1
RunTHetA sample2.intervals \
  --NUM_PROCESSES=$((SLURM_CPUS_PER_TASK - 1)) \
  --TUMOR_FILE sample2_TUMOR_SNP.formatted.txt \
  --NORMAL_FILE sample2_NORMAL_SNP.formatted.txt \
  -p sample2
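For more than a handful of samples, you can generate the swarm file instead of writing it by hand. The following is a minimal sketch. It assumes every sample's files follow the naming pattern used in the example above (<sample>.intervals, <sample>_TUMOR_SNP.formatted.txt, <sample>_NORMAL_SNP.formatted.txt). The function name make_theta_swarm is hypothetical.

```bash
#!/bin/bash
# make_theta_swarm (hypothetical helper): print a swarm command file with
# one RunTHetA command per sample name given as an argument.
make_theta_swarm() {
    local sample
    echo "# this file is theta.swarm"
    for sample in "$@"; do
        # The $((...)) expression is single-quoted so it is written literally
        # and evaluated later inside each swarm subjob, where
        # SLURM_CPUS_PER_TASK is set.
        printf '%s\n' \
            "RunTHetA ${sample}.intervals \\" \
            '  --NUM_PROCESSES=$((SLURM_CPUS_PER_TASK - 1)) \' \
            "  --TUMOR_FILE ${sample}_TUMOR_SNP.formatted.txt \\" \
            "  --NORMAL_FILE ${sample}_NORMAL_SNP.formatted.txt \\" \
            "  -p ${sample}"
    done
}
```

For example, make_theta_swarm sample1 sample2 > theta.swarm writes the same two commands as the file above.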

And submit to the queue with swarm, requesting 14 GB of memory (-g14) and 4 CPUs (-t4) per command:

biowulf$ swarm -f theta.swarm -g14 -t4 --module theta