THetA on Biowulf
Tumor Heterogeneity Analysis (THetA) is an algorithm used to estimate tumor purity and clonal/subclonal copy number aberrations simultaneously from high-throughput DNA sequencing data.
References:
- L. Oesper, G. Satas, and B.J. Raphael. Quantifying Tumor Heterogeneity in Whole-Genome and Whole-Exome Sequencing Data. Bioinformatics 2014, 30:3532-3540.
Documentation
Important Notes
- Module Name: theta (see the modules page for more information)
- THetA is a multithreaded application. Please match the number of processes with your allocation.
- Example files in /usr/local/apps/theta/TEST_DATA/example
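One way to keep the process count in line with the allocation is to derive it from SLURM_CPUS_PER_TASK. The sketch below follows the `SLURM_CPUS_PER_TASK - 1` convention used in the examples on this page. The function name `theta_nproc` and the fallback of 2 CPUs outside a Slurm job are assumptions made for illustration; they are not part of THetA or the theta module.

```shell
# Hypothetical helper: derive a value for --NUM_PROCESSES from the allocation.
# Takes an optional CPU count; otherwise reads SLURM_CPUS_PER_TASK
# (assumed fallback: 2 when the variable is not set).
theta_nproc() {
    cpus=${1:-${SLURM_CPUS_PER_TASK:-2}}
    n=$((cpus - 1))
    # Never request fewer than one process.
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

nproc_theta=$(theta_nproc)
echo "Using --NUM_PROCESSES=${nproc_theta}"
```

The result can then be passed as `--NUM_PROCESSES=${nproc_theta}` on the RunTHetA command line.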
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=14g --cpus-per-task=6
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load theta
[user@cn3144 ~]$ cp -r /usr/local/apps/theta/TEST_DATA/example/ .
[user@cn3144 ~]$ RunTHetA example/Example.intervals \
    --NUM_PROCESSES=$((SLURM_CPUS_PER_TASK - 1)) \
    --TUMOR_FILE example/TUMOR_SNP.formatted.txt \
    --NORMAL_FILE example/NORMAL_SNP.formatted.txt
=================================================
Arguments are:
    Query File: example/Example.intervals
    k: 3
    tau: 2
    Output Directory: ./
    Output Prefix: Example
    Num Processes: 4
    Graph extension: .pdf

Valid sample for THetA analysis:
    Ratio Deviation: 0.1
    Min Fraction of Genome Aberrated: 0.05
    Program WILL cluster intervals.
=================================================
Reading in query file...
[...snip...]

[user@cn3144 ~]$ ls -lh
drwxrwx--- 2 user group 4.0K May 3 2016 example
drwxrwx--- 9 user group 4.0K Aug 2 2017 Example_2_cluster_data
drwxrwx--- 10 user group 4.0K Aug 2 2017 Example_3_cluster_data
-rw-rw---- 1 user group 20K Aug 2 2017 Example_assignment.png
-rw-rw---- 1 user group 1.7K Aug 2 2017 Example.BEST.results
-rw-rw---- 1 user group 155K Aug 2 2017 Example_by_chromosome.png
-rw-rw---- 1 user group 24K Aug 2 2017 Example_classifications.png
-rw-rw---- 1 user group 18K Aug 2 2017 Example.n2.graph.pdf
-rw-rw---- 1 user group 1.7K Aug 2 2017 Example.n2.results
-rw-rw---- 1 user group 3.6K Aug 2 2017 Example.n2.withBounds
-rw-rw---- 1 user group 19K Aug 2 2017 Example.n3.graph.pdf
-rw-rw---- 1 user group 1.8K Aug 2 2017 Example.n3.results
-rw-rw---- 1 user group 3.6K Aug 2 2017 Example.n3.withBounds
-rw-rw---- 1 user group 251 Aug 2 2017 Example.RunN3.bash
The analysis will create a number of files including some graphs. For example, the following shows one of the models (2 components):

In addition to RunTHetA, there are several other tools included in this package:
helix$ ls /usr/local/apps/theta/0.7/bin
|-- [ 274] CreateExomeInput
|-- [ 294K] getAlleleCounts.jar
|-- [ 14K] runBICSeqToTHetA.jar
`-- [ 260] RunTHetA

helix$ java -jar $THETA_JARPATH/runBICSeqToTHetA.jar
Error! Incorrect number of arguments.

Program: BICSeqToTHetA
USAGE (src): java BICSeqToTHetA <INPUT_FILE> [Options]
USAGE (jar): java -jar BICSeqToTHetA <INPUT_FILE> [Options]
<INPUT_FILE> [String]
    A file output by BIC-Seq.
-OUTPUT_PREFIX [STRING]
    Prefix for all output files.
-MIN_LENGTH [Integer]
    The minimum length of intervals to keep.
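Based on the usage message above, a BIC-Seq conversion might be invoked as in the following sketch. The input file name `sample.bicseq`, the prefix `sample`, and the minimum length of 1000 are hypothetical placeholders. The helper `bicseq_to_theta_cmd` is not part of THetA; it only prints the command so that it can be inspected before it is run.

```shell
# Hypothetical helper: print (do not run) a runBICSeqToTHetA.jar command line.
# Only the options listed in the usage message above are used.
bicseq_to_theta_cmd() {
    input=$1        # file output by BIC-Seq
    prefix=$2       # value for -OUTPUT_PREFIX
    min_length=$3   # value for -MIN_LENGTH
    # If THETA_JARPATH is not set, the literal text $THETA_JARPATH is printed.
    printf 'java -jar %s/runBICSeqToTHetA.jar %s -OUTPUT_PREFIX %s -MIN_LENGTH %s\n' \
        "${THETA_JARPATH:-\$THETA_JARPATH}" "$input" "$prefix" "$min_length"
}

# Placeholder values only; replace them with a real BIC-Seq output file.
bicseq_to_theta_cmd sample.bicseq sample 1000
```

Printing the command first makes it easy to check the jar path and options before running it in a session where THETA_JARPATH is defined.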
For a more detailed manual, see /usr/local/apps/theta/<version>/MANUAL.txt
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Batch job
Most jobs should be run as batch jobs.
Create a batch script (e.g. THetA.sh). For example:
#! /bin/bash
module load theta || exit 1
alloc_cpus=${SLURM_CPUS_PER_TASK:-4}
nproc=$((alloc_cpus - 1))
RunTHetA example/Example.intervals \
    --NUM_PROCESSES=$nproc \
    --TUMOR_FILE example/TUMOR_SNP.formatted.txt \
    --NORMAL_FILE example/NORMAL_SNP.formatted.txt
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=6 --mem=14g THetA.sh
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.
Create a swarmfile (e.g. THetA.swarm). For example:
RunTHetA sample1/Example.intervals --NUM_PROCESSES=$((SLURM_CPUS_PER_TASK-1)) \
    --TUMOR_FILE sample1/TUMOR_SNP.formatted.txt --NORMAL_FILE sample1/NORMAL_SNP.formatted.txt
RunTHetA sample2/Example.intervals --NUM_PROCESSES=$((SLURM_CPUS_PER_TASK-1)) \
    --TUMOR_FILE sample2/TUMOR_SNP.formatted.txt --NORMAL_FILE sample2/NORMAL_SNP.formatted.txt
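For more than a handful of samples, the swarmfile can be generated rather than typed by hand. The following is a minimal sketch, assuming each sample directory contains Example.intervals, TUMOR_SNP.formatted.txt and NORMAL_SNP.formatted.txt, and that directory names contain no spaces. The function name `make_theta_swarm` is hypothetical.

```shell
# Hypothetical helper: write one RunTHetA command per sample directory.
# Usage: make_theta_swarm <swarmfile> <sample_dir> [<sample_dir> ...]
make_theta_swarm() {
    out=$1
    shift
    : > "$out"    # create or truncate the swarmfile
    for sample in "$@"; do
        # The single-quoted format keeps $((SLURM_CPUS_PER_TASK-1)) literal,
        # so it is expanded when the subjob runs, not when this file is written.
        printf 'RunTHetA %s/Example.intervals --NUM_PROCESSES=$((SLURM_CPUS_PER_TASK-1)) --TUMOR_FILE %s/TUMOR_SNP.formatted.txt --NORMAL_FILE %s/NORMAL_SNP.formatted.txt\n' \
            "$sample" "$sample" "$sample" >> "$out"
    done
}

make_theta_swarm THetA.swarm sample1 sample2
```

Each command is written on a single line, so no backslash continuations are needed in the generated file.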
Submit this job using the swarm command.
swarm -f THetA.swarm -g 14 -t 6 --module theta
where
-g # | Number of gigabytes of memory required for each process (1 line in the swarm command file) |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file) |
--module theta | Loads the theta module for each subjob in the swarm |