The XHMM (eXome-Hidden Markov Model) C++ software suite calls copy number variation (CNV) from next-generation sequencing projects that use exome capture or, more generally, targeted sequencing.
XHMM applies principal component analysis (PCA) to normalize read-depth data from targeted sequencing experiments, then uses a hidden Markov model (HMM) to detect and genotype CNVs from the normalized data.
XHMM was explicitly designed for targeted exome sequencing at high coverage (at least 60x to 100x) using Illumina HiSeq (or similar) sequencing of at least ~50 samples. However, no part of XHMM strictly requires these particular experimental conditions; it needs only high coverage of genomic regions across many samples.
References:
- Menachem Fromer, Jennifer L. Moran, Kimberly Chambert, Eric Banks, Sarah E. Bergen, Douglas M. Ruderfer, Robert E. Handsaker, Steven A. McCarroll, Michael C. O'Donovan, Michael J. Owen, George Kirov, Patrick F. Sullivan, Christina M. Hultman, Pamela Sklar, and Shaun M. Purcell. Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth. American Journal of Human Genetics, 91:597-607, Oct 2012.
- Christopher S. Poultney, Arthur P. Goldberg, Elodie Drapeau, Yan Kou, Hala Harony-Nicolas, Yuji Kajiwara, Silvia De Rubeis, Simon Durand, Christine Stevens, Karola Rehnstrom, Aarno Palotie, Mark J. Daly, Avi Ma'ayan, Menachem Fromer, and Joseph D. Buxbaum. Identification of small exonic CNV from whole-exome sequence data and application to autism spectrum disorder. American Journal of Human Genetics, 93(4):607-619, 2013.
- Menachem Fromer and Shaun M. Purcell. Using XHMM software to detect copy number variation in whole-exome sequencing data. In Current Protocols in Human Genetics. John Wiley and Sons, Inc., 2014.
- Module Name: XHMM (see the modules page for more information)
Before running XHMM, create a parameters file (e.g. params.txt). Here is an example:
1e-8 6 70 -3 1.00 0 1.00 3 1.00
The parameters file consists of the following 9 tab-delimited values, in this order:
- Exome-wide CNV rate
- Mean number of targets in CNV
- Mean distance between targets within CNV (in KB)
- Mean of DELETION z-score distribution
- Standard deviation of DELETION z-score distribution
- Mean of DIPLOID z-score distribution
- Standard deviation of DIPLOID z-score distribution
- Mean of DUPLICATION z-score distribution
- Standard deviation of DUPLICATION z-score distribution
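Because the values must be separated by tabs rather than spaces, it can be safer to generate the file than to type it by hand. The following is a minimal sketch that writes the example values shown above to params.txt and checks the field count; the sanity check is an illustration, not something XHMM itself requires.

```shell
# Write the nine example values as a single tab-separated line.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    1e-8 6 70 -3 1.00 0 1.00 3 1.00 > params.txt

# Sanity check (illustrative): exactly one line with nine tab-delimited fields.
awk -F'\t' 'NF != 9 { bad = 1 } END { exit (bad || NR != 1) }' params.txt \
    && echo "params.txt looks well-formed"
```

Replace the example values with ones suited to your data before running xhmm.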
Allocate an interactive session and run the program.
Sample session:
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load XHMM
[user@cn3144 ~]$ xhmm -p params.txt
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Create a batch input file (e.g. XHMM.sh). For example:
#!/bin/bash
module load XHMM
xhmm --mergeGATKdepths -o ./DATA.RD.txt \
    --GATKdepths group1.DATA.sample_interval_summary \
    --GATKdepths group2.DATA.sample_interval_summary \
    --GATKdepths group3.DATA.sample_interval_summary
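When there are many depth files, the repeated --GATKdepths arguments can be generated instead of typed out. The sketch below assumes the input files follow the group*.DATA.sample_interval_summary naming used in the example and that file names contain no spaces. build_merge_cmd is a hypothetical helper, not part of XHMM; it only prints the command so it can be reviewed before being pasted into the batch script.

```shell
# Hypothetical helper: print an xhmm merge command for the given output
# file and list of depth files. It does not run xhmm.
build_merge_cmd() {
    out=$1
    shift
    cmd="xhmm --mergeGATKdepths -o $out"
    # Append one --GATKdepths argument per input file.
    for f in "$@"; do
        cmd="$cmd --GATKdepths $f"
    done
    printf '%s\n' "$cmd"
}

# Assumed naming pattern; adjust the glob to match your own files.
build_merge_cmd ./DATA.RD.txt group*.DATA.sample_interval_summary
```

If no files match the glob, the pattern is printed unexpanded, which makes a wrong path easy to spot before the job is submitted.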
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] XHMM.sh