High-Performance Computing at the NIH
MutSig

Description

MutSig analyzes lists of mutations discovered in DNA sequencing, to identify genes that were mutated more often than expected by chance given background mutation processes.

MutSigCV starts from the observation that the data is very sparse, and that there are usually too few silent mutations in a gene for its background mutation rate (BMR) to be estimated with any confidence. MutSigCV improves the BMR estimation by pooling data from 'neighbor' genes in covariate space. These neighbor genes are chosen because their genomic properties are similar to those of the central gene in question: properties such as DNA replication time, chromatin state (open/closed), and general level of transcription activity (e.g. highly transcribed vs. not transcribed at all). These genomic parameters have been observed to strongly correlate (co-vary) with background mutation rate. For instance, genes that replicate early in S-phase tend to have much lower mutation rates than late-replicating genes. Genes that are highly transcribed also tend to have lower mutation rates than unexpressed genes, due in part to the effects of transcription-coupled repair (TCR). Genes in closed chromatin (as measured by Hi-C or ChIP-Seq) have higher mutation rates than genes in open chromatin. Incorporating these covariates into the background model substantially reduces the number of false-positive findings.

Reference

Lawrence, M. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214-218 (2013).

How to Use

Set your environment using modules.

module load MutSig

Loading this module sets three environment variables. The examples below use two of them, $MUTSIG_EX and $MUTSIG_REF.
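
To inspect what the module defines in a given session, a small helper along these lines can list the variables. This is a sketch: it assumes the variables share the MUTSIG prefix seen in the example commands below, and the function name show_mutsig_vars is hypothetical, not part of MutSig or the module system.

```shell
# Hypothetical helper: print every exported variable whose name starts
# with MUTSIG (assumed prefix), sorted by name.
show_mutsig_vars() {
  env | grep '^MUTSIG' | sort
}
```

Run show_mutsig_vars after module load MutSig. The standard module show MutSig command prints the module definition itself, which is another way to see what it sets.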

Interactive Use

You will need to create a mutations file in MAF format. Here is an example command:

MutSigCV \
  $MUTSIG_EX/LUSC.mutations.maf \
  $MUTSIG_REF/exome_full192.coverage.txt \
  $MUTSIG_REF/gene.covariates.txt \
  output \
  $MUTSIG_REF/mutation_type_dictionary_file.txt \
  $MUTSIG_REF/chr_files_hg19

Successful execution will result in these files:

-rw-r--r-- 1 user user      510 Aug 29 12:47 output.categs.txt
-rw-r--r-- 1 user user  8989673 Aug 29 12:47 output.coverage.txt
-rw-r--r-- 1 user user 38760466 Aug 29 12:47 output.mutations.txt
-rw-r--r-- 1 user user      750 Aug 29 12:43 output.mutcateg_discovery.txt
-rw-r--r-- 1 user user  1397350 Aug 29 13:09 output.sig_genes.txt
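
A quick way to confirm that a run produced all five files listed above is to test each one for existence and non-zero size. The helper below is a minimal sketch. The function name check_mutsig_outputs is hypothetical, and it assumes the output prefix is the fourth argument given to MutSigCV (output in the example).

```shell
# Hypothetical helper: check_mutsig_outputs PREFIX
# Returns 0 only if all five expected files exist and are non-empty;
# otherwise reports each missing or empty file on stderr and returns 1.
check_mutsig_outputs() {
  _ms_status=0
  for _ms_suffix in categs coverage mutations mutcateg_discovery sig_genes; do
    if [ ! -s "$1.${_ms_suffix}.txt" ]; then
      echo "missing or empty: $1.${_ms_suffix}.txt" >&2
      _ms_status=1
    fi
  done
  return "$_ms_status"
}
```

For the example above, check_mutsig_outputs output would be run from the directory where MutSigCV was started.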

Batch Use

Create a batch file, for example MutSig.sh:

#!/bin/bash

module load MutSig
MutSigCV \
  $MUTSIG_EX/LUSC.mutations.maf \
  $MUTSIG_REF/exome_full192.coverage.txt \
  $MUTSIG_REF/gene.covariates.txt \
  output \
  $MUTSIG_REF/mutation_type_dictionary_file.txt \
  $MUTSIG_REF/chr_files_hg19

Then submit, allocating an appropriate amount of memory:

$ sbatch --mem=12gb MutSig.sh
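
Once the job has finished, the genes of interest can be pulled out of output.sig_genes.txt. The sketch below assumes that file is tab-delimited with a header row containing columns named gene and q (the q-value); check the header of your own output before relying on it. The function name list_sig_genes and the default cutoff of 0.1 are illustrative choices, not part of MutSigCV.

```shell
# Hypothetical helper: list_sig_genes FILE [QMAX]
# Prints gene and q-value for rows with q <= QMAX (default 0.1).
# Columns are located by header name, so their position does not matter.
list_sig_genes() {
  awk -F'\t' -v qmax="${2:-0.1}" '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "gene") g = i
        if ($i == "q") q = i
      }
      if (!g || !q) {
        print "expected columns gene and q not found" > "/dev/stderr"
        exit 1
      }
      next
    }
    ($q + 0) <= (qmax + 0) { print $g "\t" $q }
  ' "$1"
}
```

For example, list_sig_genes output.sig_genes.txt 0.05 applies a stricter cutoff. Finding the columns by name instead of by position keeps the helper working if the column order differs between versions.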

Documentation