High-Performance Computing at the NIH
Genome Mapping and Assembly with MAQ on Biowulf & Helix

Maq stands for Mapping and Assembly with Quality. It builds assemblies by mapping short reads to a reference sequence.

Maq is software that builds mapping assemblies from short reads generated by next-generation sequencing machines. It is designed primarily for the Illumina-Solexa 1G Genetic Analyzer and has preliminary support for ABI SOLiD data.

Maq first aligns reads to reference sequences and then calls the consensus. At the mapping stage, maq performs ungapped alignment. For single-end reads, maq is able to find all hits with up to 2 or 3 mismatches, depending on a command-line option; for paired-end reads, it always finds all paired hits with one of the two reads containing up to 1 mismatch. At the assembling stage, maq calls the consensus based on a statistical model. It calls the base which maximizes the posterior probability and calculates a phred quality at each position along the consensus. Heterozygotes are also called in this process.
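
To make the phred quality mentioned above concrete, the sketch below converts an error probability P into a phred-scaled value using the standard definition Q = -10 * log10(P). It assumes Maq reports consensus quality on this standard scale; the phred helper function is hypothetical and is not part of Maq.

```shell
# Convert an error probability to a phred-scaled quality: Q = -10 * log10(P).
# awk has no log10 function, so compute it as log(p) / log(10).
# "phred" is an illustrative helper, not a Maq command.
phred() {
    awk -v p="$1" 'BEGIN { printf "%.0f\n", -10 * log(p) / log(10) }'
}

phred 0.001   # 1 error in 1000 calls; prints 30
phred 0.01    # 1 error in 100 calls; prints 20
```

A higher value therefore means a more confident consensus base call.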

Maq is a project hosted by SourceForge.net. The project page is available at http://sourceforge.net/projects/maq/.

Example files can be copied from /usr/local/apps/maq:

$ mkdir /data/$USER/maq
$ cp /usr/local/apps/maq/ref.fasta /data/$USER/maq
$ cp /usr/local/apps/maq/calib-36.dat /data/$USER/maq

Running on Helix

$ module load maq
$ cd /data/$USER/maq
$ maq.pl demo ref.fasta calib-36.dat

Running a single batch job on Biowulf

1. Create a script file containing lines similar to those below.

#!/bin/bash


module load maq
cd /data/$USER/maq
maq.pl demo ref.fasta calib-36.dat

2. Submit the script on Biowulf:

$ sbatch jobscript

If more memory is required (the default is 4 GB), specify --mem=Mg, where M is the amount of memory in GB. For example:

$ sbatch --mem=10g jobscript
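
As an alternative to passing --mem on the command line, Slurm also reads #SBATCH directives placed at the top of the script. The sketch below writes such a script and checks its syntax before submission. The file name jobscript matches the one used above, and the 10g value is only an example.

```shell
# Write the batch script. The quoted 'EOF' keeps $USER literal, so it is
# expanded when the job runs rather than when the file is created.
cat > jobscript <<'EOF'
#!/bin/bash
#SBATCH --mem=10g

module load maq
cd /data/$USER/maq
maq.pl demo ref.fasta calib-36.dat
EOF

# Parse the script without executing it to catch syntax errors early.
bash -n jobscript && echo "jobscript syntax OK"
```

With the directive in place, a plain sbatch jobscript requests the same amount of memory.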

Running a swarm of jobs on Biowulf

Set up a swarm command file:

  cd /data/$USER/dir1; maq.pl demo ref.fasta calib-36.dat
  cd /data/$USER/dir2; maq.pl demo ref.fasta calib-36.dat
  cd /data/$USER/dir3; maq.pl demo ref.fasta calib-36.dat
	[......]
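
When there are many run directories, the swarm command file can be generated with a loop instead of being typed by hand. This is a minimal sketch: the names dir1, dir2, and dir3 are placeholders for your own directories under /data/$USER.

```shell
# Start with an empty swarm command file.
: > swarmfile

# Append one command line per run directory. The backslash keeps $USER
# literal in the file so it is expanded when each swarm command runs.
for d in dir1 dir2 dir3; do
    echo "cd /data/\$USER/$d; maq.pl demo ref.fasta calib-36.dat" >> swarmfile
done

# Review the generated file before submitting it.
cat swarmfile
```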
  

Submit the swarm file. The -f option specifies the swarm file name, and --module loads the required module for each command line in the file:

  $ swarm -f swarmfile --module maq

If each command line needs more memory, use the -g option. The example below allocates 10 GB for each command:

  $ swarm -f swarmfile -g 10 --module maq

For more information about running swarm, see swarm.html

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the interactive job there.

biowulf$ sinteractive 
salloc.exe: Granted job allocation 16535

cn999$ module load maq
cn999$ cd /data/$USER/dir
cn999$ maq.pl demo ref.fasta calib-36.dat
[...etc...]

cn999$ exit
exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed, use --mem. For example:

biowulf$ sinteractive --mem=8g

Documentation

http://maq.sourceforge.net/index.shtml