MutSig analyzes lists of mutations discovered in DNA sequencing, to identify genes that were mutated more often than expected by chance given background mutation processes.
MutSigCV starts from the observation that the data are very sparse, and that there are usually too few silent mutations in a gene for its background mutation rate (BMR) to be estimated with any confidence. MutSigCV improves the BMR estimation by pooling data from 'neighbor' genes in covariate space. These neighbor genes are chosen on the basis of having genomic properties similar to those of the central gene in question: properties such as DNA replication time, chromatin state (open/closed), and general level of transcription activity (e.g. highly transcribed vs. not transcribed at all). These genomic parameters have been observed to correlate (co-vary) strongly with background mutation rate. For instance, genes that replicate early in S-phase tend to have much lower mutation rates than late-replicating genes. Genes that are highly transcribed also tend to have lower mutation rates than unexpressed genes, due in part to the effects of transcription-coupled repair (TCR). Genes in closed chromatin (as measured by Hi-C or ChIP-seq) have higher mutation rates than genes in open chromatin. Incorporating these covariates into the background model substantially reduces the number of false-positive findings.
You will need to create a mutations file in MAF format.
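Before running the program, it can help to confirm that the mutations file has the header columns you expect. The following is a minimal sketch, not part of MutSig: `check_maf_columns` is a hypothetical helper, and the column names in the usage comment are standard MAF field names chosen for illustration. Which columns MutSigCV actually requires is an assumption to verify against the MutSigCV documentation.

```shell
#!/bin/bash
# Hypothetical helper (not provided by the MutSig module): report which of the
# expected column names are missing from the header of a tab-delimited MAF file.
# Usage: check_maf_columns FILE COLUMN [COLUMN ...]
check_maf_columns() {
    local maf="$1"; shift
    local header col missing=0

    # Skip leading comment lines (starting with '#'), take the first remaining
    # line as the header, and strip any Windows carriage returns.
    header=$(grep -v '^#' "$maf" | head -n 1 | tr -d '\r')

    for col in "$@"; do
        # Put each header field on its own line and look for an exact match.
        if ! printf '%s\n' "$header" | tr '\t' '\n' | grep -qx "$col"; then
            echo "missing column: $col"
            missing=1
        fi
    done

    # Returns 0 if every column was found, 1 otherwise.
    return "$missing"
}

# Example call (file name and column list are illustrative assumptions):
#   check_maf_columns my.mutations.maf Hugo_Symbol Tumor_Sample_Barcode Variant_Classification
```

The function prints one line per missing column and returns a non-zero status if any are absent, so it can be used as a guard at the top of a batch script.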
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive --mem=10g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$
[user@cn3144 ~]$ ml MutSig
[user@cn3144 ~]$ MutSigCV \
    $MUTSIG_EX/LUSC.mutations.maf \
    $MUTSIG_REF/exome_full192.coverage.txt \
    $MUTSIG_REF/gene.covariates.txt \
    output \
    $MUTSIG_REF/mutation_type_dictionary_file.txt \
    $MUTSIG_REF/chr_files_hg19
[user@cn3144 ~]$ ls -l
-rw-r--r-- 1 user user      510 Aug 29 12:47 output.categs.txt
-rw-r--r-- 1 user user  8989673 Aug 29 12:47 output.coverage.txt
-rw-r--r-- 1 user user 38760466 Aug 29 12:47 output.mutations.txt
-rw-r--r-- 1 user user      750 Aug 29 12:43 output.mutcateg_discovery.txt
-rw-r--r-- 1 user user  1397350 Aug 29 13:09 output.sig_genes.txt

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Create a batch input file (e.g. MutSig.sh). For example:
#!/bin/bash

module load MutSig

MutSigCV \
    $MUTSIG_EX/LUSC.mutations.maf \
    $MUTSIG_REF/exome_full192.coverage.txt \
    $MUTSIG_REF/gene.covariates.txt \
    output \
    $MUTSIG_REF/mutation_type_dictionary_file.txt \
    $MUTSIG_REF/chr_files_hg19
Submit this job using the Slurm sbatch command.
sbatch --mem=10g MutSig.sh

10 GB of memory is sufficient for this example job. You may need to increase the memory allocation for your own MutSig jobs.
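Once the job finishes, the gene-level results are in output.sig_genes.txt. The sketch below shows one way to list genes at or below a q-value cutoff. `list_significant_genes` is a hypothetical helper, and it assumes the file is tab-delimited with a header row containing columns named `gene` and `q`; check the header of your own output before relying on it. The default cutoff of 0.1 is an illustrative choice, not a recommendation from the source.

```shell
#!/bin/bash
# Hypothetical helper (not provided by the MutSig module): print the gene name
# and q-value for every row whose q-value is at or below a cutoff.
# ASSUMPTION: tab-delimited input with header columns named "gene" and "q".
# Usage: list_significant_genes FILE [CUTOFF]
list_significant_genes() {
    local sig_file="$1"
    local cutoff="${2:-0.1}"   # illustrative default

    awk -F'\t' -v cutoff="$cutoff" '
        NR == 1 {
            # Locate the columns by name rather than by fixed position.
            for (i = 1; i <= NF; i++) {
                if ($i == "gene") g = i
                if ($i == "q")    q = i
            }
            if (!g || !q) {
                print "expected columns \"gene\" and \"q\" not found" > "/dev/stderr"
                exit 1
            }
            next
        }
        # Skip rows with an empty q field; compare the rest numerically.
        $q != "" && ($q + 0) <= (cutoff + 0) { print $g "\t" $q }
    ' "$sig_file"
}

# Example call:
#   list_significant_genes output.sig_genes.txt 0.1
```

Looking columns up by header name keeps the helper working even if the column order differs between MutSig versions.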