High-Performance Computing at the NIH
XHMM (eXome-Hidden Markov Model)

Description

The XHMM C++ software suite calls copy number variation (CNV) from next-generation sequencing projects that use exome capture or, more generally, targeted sequencing.

XHMM uses principal component analysis (PCA) normalization and a hidden Markov model (HMM) to detect and genotype CNV from normalized read-depth data from targeted sequencing experiments.
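As a rough illustration of the state model, the sketch below labels a single normalized read-depth z-score with whichever of three states (deletion, diploid, duplication) gives it the highest density. It assumes each state's z-scores follow a normal distribution, and it borrows the means (-3, 0, 3) and standard deviations (1.00) from the example parameters file shown under How to Use. The names `STATES`, `normal_pdf`, and `most_likely_state` are hypothetical. This is not XHMM's implementation: the real HMM also weighs transitions between neighboring targets, which this per-target sketch ignores.

```python
import math

# (mean, standard deviation) of the z-score distribution for each state.
# Values are taken from the example params.txt on this page; treating the
# distributions as normal is an assumption of this sketch.
STATES = {
    "DELETION": (-3.0, 1.0),
    "DIPLOID": (0.0, 1.0),
    "DUPLICATION": (3.0, 1.0),
}


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def most_likely_state(z_score: float) -> str:
    """Return the state whose distribution gives z_score the highest density.

    Looks at one target in isolation; no transition probabilities are used.
    """
    return max(STATES, key=lambda name: normal_pdf(z_score, *STATES[name]))


if __name__ == "__main__":
    for z in (-2.8, 0.4, 3.5):
        print(z, most_likely_state(z))
```

A z-score near -3 is labeled DELETION and one near 3 is labeled DUPLICATION. In practice, a call depends on a run of consecutive targets rather than on any single value.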

XHMM was designed for targeted exome sequencing at high coverage (at least 60x to 100x) using Illumina HiSeq (or similar) sequencing of at least ~50 samples. However, nothing in XHMM strictly requires these experimental conditions; it needs only high coverage of the targeted genomic regions across many samples.

How to Use

XHMM uses environment modules. Type

module load XHMM

at the prompt. Then type

xhmm -p params.txt

You will need to create a params.txt file. Here is an example:

1e-8	6	70	-3	1.00	0	1.00	3	1.00

A parameters file consists of the following 9 values (tab-delimited):

  1. Exome-wide CNV rate
  2. Mean number of targets in CNV
  3. Mean distance between targets within CNV (in kilobases, kb)
  4. Mean of DELETION z-score distribution
  5. Standard deviation of DELETION z-score distribution
  6. Mean of DIPLOID z-score distribution
  7. Standard deviation of DIPLOID z-score distribution
  8. Mean of DUPLICATION z-score distribution
  9. Standard deviation of DUPLICATION z-score distribution
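
Before launching a run, it can help to confirm that params.txt really holds nine tab-delimited numbers in the order listed above. The following is a minimal sketch of such a check. The `XhmmParams` type, its field names, and the `parse_params` helper are hypothetical and not part of XHMM, and the range checks are assumptions of this sketch rather than rules XHMM is known to enforce.

```python
from typing import NamedTuple


class XhmmParams(NamedTuple):
    """The nine values of a parameters file, in file order (names are illustrative)."""
    cnv_rate: float          # 1. exome-wide CNV rate
    mean_targets: float      # 2. mean number of targets in CNV
    mean_distance_kb: float  # 3. mean distance between targets within CNV (kb)
    del_mean: float          # 4. mean of DELETION z-score distribution
    del_sd: float            # 5. standard deviation of DELETION z-score distribution
    dip_mean: float          # 6. mean of DIPLOID z-score distribution
    dip_sd: float            # 7. standard deviation of DIPLOID z-score distribution
    dup_mean: float          # 8. mean of DUPLICATION z-score distribution
    dup_sd: float            # 9. standard deviation of DUPLICATION z-score distribution


def parse_params(line: str) -> XhmmParams:
    """Parse one tab-delimited parameters line into named fields."""
    fields = line.strip().split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-delimited values, found {len(fields)}")
    params = XhmmParams(*(float(f) for f in fields))

    # Sanity checks chosen for this sketch (assumptions, not XHMM's own rules).
    if not 0.0 < params.cnv_rate < 1.0:
        raise ValueError("CNV rate should lie strictly between 0 and 1")
    for sd in (params.del_sd, params.dip_sd, params.dup_sd):
        if sd <= 0.0:
            raise ValueError("standard deviations must be positive")
    return params


# The example line from this page, with explicit tab separators.
example = "1e-8\t6\t70\t-3\t1.00\t0\t1.00\t3\t1.00"
params = parse_params(example)

if __name__ == "__main__":
    for name, value in params._asdict().items():
        print(f"{name}\t{value}")
```

To check a real file, read its first line and pass it to `parse_params`, for example `parse_params(open("params.txt").readline())`. A line separated by spaces instead of tabs is rejected by the field-count check.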

See the documentation below for more information.

Documentation