The XHMM (eXome-Hidden Markov Model) C++ software suite calls copy number variation (CNV) from next-generation sequencing projects that used exome capture (or, more generally, targeted sequencing).
XHMM uses principal component analysis (PCA) normalization and a hidden Markov model (HMM) to detect and genotype CNVs from normalized read-depth data from targeted sequencing experiments.
XHMM was explicitly designed for targeted exome sequencing at high coverage (at least 60x to 100x) using Illumina HiSeq (or similar) sequencing of at least ~50 samples. However, no part of XHMM strictly requires these experimental conditions; it needs only high coverage of the targeted genomic regions across many samples.
Create a parameters file named params.txt. Here is an example:
1e-8 6 70 -3 1.00 0 1.00 3 1.00
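A parameters file consists of the following 9 values (tab-delimited):
1. Exome-wide CNV rate
2. Mean number of targets in CNV
3. Mean distance between targets within CNV (in KB)
4. Mean of DELETION z-score distribution
5. Standard deviation of DELETION z-score distribution
6. Mean of DIPLOID z-score distribution
7. Standard deviation of DIPLOID z-score distribution
8. Mean of DUPLICATION z-score distribution
9. Standard deviation of DUPLICATION z-score distribution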
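Because the file must be tab-delimited, typing the values by hand in an editor that converts tabs to spaces can silently break it. The following is a minimal sketch of one way to write the example values with real tab characters and then count the fields. The awk check is an addition for illustration and is not part of XHMM.

```shell
# Write the nine example values as a single tab-delimited line.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    1e-8 6 70 -3 1.00 0 1.00 3 1.00 > params.txt

# Sanity check: print the number of tab-separated fields on each line.
awk -F'\t' '{ print NF }' params.txt   # prints 9
```

If the check prints anything other than 9, the file most likely contains spaces where tabs were intended.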
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load XHMM
[user@cn3144 ~]$ xhmm -p params.txt
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Create a batch input file (e.g. XHMM.sh). For example:
#!/bin/bash
module load XHMM
xhmm --mergeGATKdepths -o ./DATA.RD.txt \
    --GATKdepths group1.DATA.sample_interval_summary \
    --GATKdepths group2.DATA.sample_interval_summary \
    --GATKdepths group3.DATA.sample_interval_summary
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] XHMM.sh
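When the number of depth summary files varies between projects, the repeated --GATKdepths arguments in the batch file can be generated rather than typed out. The sketch below is one possible approach, not part of the XHMM distribution. The function name build_merge_cmd is hypothetical, and the glob pattern is an assumption based on the file names in the example batch file. It only prints the command so that it can be inspected first.

```shell
# Print an xhmm merge command with one --GATKdepths argument per
# matching summary file in the current directory.
# build_merge_cmd is a hypothetical helper; the file pattern is assumed.
build_merge_cmd() {
    set --                            # start with an empty argument list
    for f in group*.DATA.sample_interval_summary; do
        [ -e "$f" ] || continue       # skip the literal pattern if nothing matched
        set -- "$@" --GATKdepths "$f"
    done
    echo xhmm --mergeGATKdepths -o ./DATA.RD.txt "$@"
}

build_merge_cmd
```

Once the printed command looks right, removing the echo inside the function makes it run xhmm directly. Files are added in the shell's glob order, so zero-pad the group numbers if the order matters.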