High-Performance Computing at the NIH
MuTect

Description

MuTect is a method developed at the Broad Institute for the reliable and accurate identification of somatic point mutations in next-generation sequencing data from cancer genomes.

NOTE: MuTect has been merged into the most recent versions of GATK, and is no longer supported as a separate application. Please see https://hpc.nih.gov/apps/GATK.html for information on using GATK.

Environment variables set

How to Use

MuTect uses environment modules. Type

module load muTect

at the prompt. Then type

muTect

Two extra options have been added to set the memory allocation and the temporary file directory:

  • --memory: memory allocated (default = 2gb)
  • --tmpdir: temporary file directory (default = /scratch)

By default, muTect uses 2gb of memory. To allocate 5gb of memory, include --memory 5g on the command line.

NOTE: muTect is built on the GATK code base and therefore shares many of its options. One of these options, -nt (--num_threads), DOES NOT work properly with muTect. DO NOT use this option.
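
One way to avoid passing the threading option by accident is to check the argument list before calling muTect. The following is a minimal sketch, not part of muTect or the module; `check_mutect_args` is a hypothetical helper name introduced here for illustration.

```shell
# Hypothetical helper (not provided by the muTect module):
# returns 1 if the unsupported threading option appears in the arguments.
check_mutect_args() {
    for arg in "$@"; do
        case "$arg" in
            -nt|--num_threads)
                echo "error: $arg does not work properly with muTect; remove it" >&2
                return 1
                ;;
        esac
    done
    return 0
}
```

In a wrapper script it could be used as `check_mutect_args "$@" && muTect "$@"`, so that muTect only starts when the option is absent.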

MuTect requires two BAM input files, one for the normal tissue and one for the tumor tissue. MuTect outputs a wiggle format coverage file. An additional wiggle file can be generated to display observed depth.

MuTect takes database files as parameters. Which files to use depends on the build of your alignments and on the dbSNP version you are using. These files are located in /fdb/muTect.

A typical batch script for an 8gb memory, single-threaded job would be as follows:
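
The commands below are a hedged sketch of such a script, not the site's official example. They write a file named muTect.run in the current directory. Everything in it is an assumption to adapt: the REFERENCE.fasta, DBSNP.vcf and COSMIC.vcf names are placeholders for the files that match your build under /fdb/muTect, and normal.bam, tumor.bam and the output names are placeholders as well. The --analysis_type, --input_file:normal, --input_file:tumor, --out and --coverage_file options are standard MuTect options that the muTect command is assumed to pass through. The --memory value is set to 7g on the assumption that leaving some headroom below the 8gb requested from the batch system is useful.

```shell
# Write a sketch of the batch script to muTect.run.
# The quoted 'EOF' keeps variables and backslashes literal in the file.
cat > muTect.run <<'EOF'
#!/bin/bash
# Sketch of a single-threaded muTect job. All file names are placeholders.

module load muTect

# Placeholders: replace with the database files under /fdb/muTect
# that match the reference build of your alignments.
REF=/fdb/muTect/REFERENCE.fasta
DBSNP=/fdb/muTect/DBSNP.vcf
COSMIC=/fdb/muTect/COSMIC.vcf

# Note: the -nt / --num_threads option is deliberately not used.
muTect --memory 7g \
    --analysis_type MuTect \
    --reference_sequence "$REF" \
    --dbsnp "$DBSNP" \
    --cosmic "$COSMIC" \
    --input_file:normal normal.bam \
    --input_file:tumor tumor.bam \
    --out call_stats.out \
    --coverage_file coverage.wig.txt
EOF
```
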

Then submit to the batch system:

sbatch --mem=8gb muTect.run

The reference files in the above example are for alignments against the UCSC reference genome. For alignments against the Ensembl/NCBI/1000genomes reference genome, use:

--reference_sequence /fdb/muTect/human_g1k_v37.fasta \
--dbsnp              /fdb/muTect/dbsnp_137.b37.vcf \
--cosmic             /fdb/muTect/cosmic_v67.b37.vcf \

Documentation