High-Performance Computing at the NIH
MUMmer on Biowulf & Helix

MUMmer is an open-source software package for the rapid alignment of very large DNA and amino acid sequences. MUMmer relies on a suffix tree data structure for efficient pattern matching. Suffix trees are well suited to large data sets because they can be constructed and searched in linear time and space. This allows mummer to find all 20-base-pair maximal exact matches between two ~5 million base pair bacterial genomes in 10 seconds, using 90 MB of RAM, on a typical Biowulf CPU.

MUMmer on Helix

To run MUMmer, type 'module load mummer', and then the desired mummer command. The sample session below compares human genome (hg19) chromosome 1 against chromosome 2.

helix% module load mummer

helix% mummer  /fdb/genome/hg19/chr1.fa   /fdb/genome/hg19/chr2.fa > chr1-x-chr2.mummer.out

Running a MUMmer batch job on Biowulf

Create a batch script along the following lines. This sample script compares chromosome X against the entire hg19 human genome.

#!/bin/bash

module load mummer

mummer /fdb/genome/hg19/chrX.fa   /fdb/genome/hg19/chr_all.fa > chrX-v-all.mummer.out

MUMmer is a single-threaded program and should therefore be submitted to a single CPU on Biowulf. However, MUMmer runs may require more than the default 4 GB of memory. An approximate memory requirement can be estimated from the human vs human table at the MUMmer website. An additional data point: comparing hg19 chrX against the entire genome (chr_all.fa), as in the example above, required 5.3 GB of memory and took 45 minutes to complete.

The job above should be submitted with:

sbatch --mem=6g mummer.bat
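
The 6g request above is consistent with rounding the observed 5.3 GB peak up to the next whole gigabyte. As a minimal sketch of that arithmetic, the helper below (mem_request_gb is a hypothetical name, not part of MUMmer or Slurm) turns an observed peak in GB into a --mem value. It only prints the sbatch command rather than running it, and the peak figure still has to come from your own runs.

```shell
#!/bin/bash
# Hypothetical helper: round an observed peak memory figure (in GB) up to the
# next whole gigabyte, for use as the value of sbatch --mem.
mem_request_gb() {
    awk -v peak="$1" 'BEGIN { gb = int(peak); if (gb < peak) gb++; print gb }'
}

# Peak memory reported above for chrX vs chr_all.fa.
peak_gb=5.3

# Print (do not run) the resulting submission command.
echo "sbatch --mem=$(mem_request_gb "$peak_gb")g mummer.bat"
```

Rounding up leaves little headroom, so a larger request may be safer for inputs that have not been run before.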

Running a swarm of MUMmer jobs

Set up a swarm command file along the following lines. This sample file compares mouse chromosome 1 against each of the other mouse chromosomes.

mummer   /fdb/genome/mm9/chr1.fa    /fdb/genome/mm9/chr2.fa > chr1x2.mummer.out
mummer   /fdb/genome/mm9/chr1.fa    /fdb/genome/mm9/chr3.fa > chr1x3.mummer.out
mummer   /fdb/genome/mm9/chr1.fa    /fdb/genome/mm9/chr4.fa > chr1x4.mummer.out
mummer   /fdb/genome/mm9/chr1.fa    /fdb/genome/mm9/chr5.fa > chr1x5.mummer.out
...etc...
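
Rather than typing each line by hand, a swarm command file like the one above can be generated with a short loop. This is a sketch under assumptions that are not stated on this page: that the remaining mm9 chromosomes are 2 through 19 plus X and Y, and that each is stored as chrN.fa in the same directory. Adjust the list to match the files actually present.

```shell
#!/bin/bash
# Sketch: write one mummer command per line into a swarm command file.
# Assumptions (not from the page above): chromosomes 2-19, X and Y exist as
# chrN.fa under genome_dir.

genome_dir=/fdb/genome/mm9
ref=chr1

# Start with an empty file so that reruns do not append duplicate lines.
: > swarmfile

for chr in $(seq 2 19) X Y; do
    echo "mummer ${genome_dir}/${ref}.fa ${genome_dir}/chr${chr}.fa > ${ref}x${chr}.mummer.out" >> swarmfile
done
```

Each generated line has the same form as the hand-written entries, so the resulting swarmfile is submitted in the same way.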

Submit it with:

swarm -g 5 -f swarmfile

This submission command tells swarm that each process (one line in the swarm command file above) requires 5 GB of memory. This is a best guess based on previous runs.

Documentation

MUMmer website