Biowulf High Performance Computing at the NIH
XHMM on Biowulf

The XHMM (eXome-Hidden Markov Model) C++ software suite was written to call copy number variation (CNV) from next-generation sequencing projects that use exome capture (or, more generally, targeted sequencing).

XHMM uses principal component analysis (PCA) to normalize read-depth data from targeted sequencing experiments, and a hidden Markov model (HMM) to detect and genotype CNVs from the normalized data.

XHMM was explicitly designed to be used with targeted exome sequencing at high coverage (at least 60x - 100x) using Illumina HiSeq (or similar) sequencing of at least ~50 samples. However, no part of XHMM explicitly requires these particular experimental conditions, just high coverage of genomic regions for many samples.


Important Notes

You will need to create a parameters file, params.txt. Here is an example:

1e-8	6	70	-3	1.00	0	1.00	3	1.00

A parameters file consists of the following 9 values (tab-delimited):

  1. Exome-wide CNV rate
  2. Mean number of targets in CNV
  3. Mean distance between targets within CNV (in KB)
  4. Mean of DELETION z-score distribution
  5. Standard deviation of DELETION z-score distribution
  6. Mean of DIPLOID z-score distribution
  7. Standard deviation of DIPLOID z-score distribution
  8. Mean of DUPLICATION z-score distribution
  9. Standard deviation of DUPLICATION z-score distribution
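
Because the file must be tab-delimited, it can be safer to generate it than to type it by hand, since many editors silently convert tabs to spaces. The following is a minimal sketch that writes the example values shown above with printf and then checks the field count; the output name params.txt is taken from the example, and the awk check is only an illustrative sanity test, not part of XHMM.

```shell
# Write the nine example values as a single tab-delimited line.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    1e-8 6 70 -3 1.00 0 1.00 3 1.00 > params.txt

# Sanity check: count the tab-separated fields on the line (prints 9).
awk -F'\t' '{ print NF }' params.txt
```

If the count is anything other than 9, the file most likely contains spaces where tabs were intended.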

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session:

[user@biowulf ~]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load XHMM
[user@cn3144 ~]$ xhmm -p params.txt

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job
Most jobs should be run as batch jobs.

Create a batch input file. For example:

#!/bin/bash
module load XHMM
xhmm --mergeGATKdepths -o ./DATA.RD.txt \
--GATKdepths group1.DATA.sample_interval_summary \
--GATKdepths group2.DATA.sample_interval_summary \
--GATKdepths group3.DATA.sample_interval_summary

Submit this job using the Slurm sbatch command, appending the name of your batch input file:

sbatch [--cpus-per-task=#] [--mem=#]
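
As an alternative to passing resource requests on the command line, they can be embedded in the batch file as #SBATCH directives. The sketch below writes such a file and checks it for shell syntax errors without running it. The file name xhmm_merge.sh and the values 2 CPUs and 4g of memory are hypothetical placeholders chosen for illustration, not recommendations for XHMM.

```shell
# Write a batch script (hypothetical name) with resource requests embedded.
# The CPU and memory values are placeholders; adjust them for your data.
cat > xhmm_merge.sh <<'EOF'
#!/bin/bash
#SBATCH --cpus-per-task=2
#SBATCH --mem=4g

module load XHMM
xhmm --mergeGATKdepths -o ./DATA.RD.txt \
--GATKdepths group1.DATA.sample_interval_summary \
--GATKdepths group2.DATA.sample_interval_summary \
--GATKdepths group3.DATA.sample_interval_summary
EOF

# Check the script for shell syntax errors without executing it.
bash -n xhmm_merge.sh && echo "syntax OK"

# On a Slurm cluster, the script would then be submitted with:
#   sbatch xhmm_merge.sh
```

Options given on the sbatch command line take precedence over #SBATCH directives in the script, so the embedded values act as defaults.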