High-Performance Computing at the NIH
bam2mpg on HPC Systems

The program "bam2mpg" calls genotypes from sequence reads of haploid or diploid DNA aligned to a closely related reference sequence. The program reads alignments in BAM format (http://samtools.sourceforge.net). The MPG (Most Probable Genotype) algorithm is based on a Bayesian model that simulates sampling from one or two alleles with sequencing error and then calculates the likelihood of each possible genotype given the observed sequence data. Using prior probabilities that depend on the expected heterozygosity of the sequence, MPG then predicts the "Most Probable Genotype" at each site, along with quality scores that estimate the accuracy of the calls.
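
To make the Bayesian step concrete, here is a generic textbook formulation of a genotype posterior under a simple per-base error model. It is only a sketch and is not necessarily the exact model implemented in bam2mpg. All symbols are assumptions introduced for illustration: D = (b_1, ..., b_n) are the bases observed at a site, G is a candidate genotype, P(G) is the heterozygosity-dependent prior, and \epsilon_i is the error probability of base b_i.

```latex
% Posterior probability of genotype G given the observed bases D
P(G \mid D) = \frac{P(G)\,\prod_{i=1}^{n} P(b_i \mid G)}
                   {\sum_{G'} P(G')\,\prod_{i=1}^{n} P(b_i \mid G')}

% Diploid genotype G = \{a_1, a_2\}: each read samples one of the two alleles
P(b_i \mid G) = \tfrac{1}{2}\,P(b_i \mid a_1) + \tfrac{1}{2}\,P(b_i \mid a_2)

% Haploid genotype G = \{a\}: P(b_i \mid G) = P(b_i \mid a)

% Per-allele emission with sequencing error \epsilon_i
P(b_i \mid a) =
  \begin{cases}
    1 - \epsilon_i        & \text{if } b_i = a \\
    \epsilon_i / 3        & \text{otherwise}
  \end{cases}
```

Under this formulation the most probable genotype is the G that maximizes P(G | D), and a quality score can be derived from how strongly that genotype is favored over the alternatives.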

bam2mpg was developed by Nancy Fisher Hansen at NHGRI (nhansen@mail.nih.gov).

On Helix

Helix is only suited for short, infrequent jobs. If you have lots of bam2mpg jobs to run, please use Biowulf.

Sample session:

helix%  module load bam2mpg
helix%  bam2mpg --bam_filter '-q31' --qual_filter 20 --only_nonref --snv_vcf aln.snv.vcf --div_vcf aln.div.vcf ref.fasta aln.sorted.bam

Batch job on Biowulf

Create a batch input file (e.g. bam2mpg.sh). For example:

#!/bin/bash

module load bam2mpg
bam2mpg --bam_filter '-q31' --qual_filter 20 --only_nonref ref.fasta aln.sorted.bam --mpg aln.mpg.out

Submit this job using the Slurm sbatch command.

sbatch --mem=12g bam2mpg.sh

The command above allocates 12 GB of memory for this job. You may need to modify this value for your own jobs.

Swarm of Jobs on Biowulf

Create a swarmfile (e.g. bam2mpg.swarm). For example:

bam2mpg --qual_filter 20 --region chr1 --mpg aln.chr1.mpg.out ref.fasta aln.sort.bam
bam2mpg --qual_filter 20 --region chr2 --mpg aln.chr2.mpg.out ref.fasta aln.sort.bam
bam2mpg --qual_filter 20 --region chr3 --mpg aln.chr3.mpg.out ref.fasta aln.sort.bam
bam2mpg --qual_filter 20 --region chr4 --mpg aln.chr4.mpg.out ref.fasta aln.sort.bam
[...etc...]
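
Rather than typing one line per chromosome, the swarmfile can be generated with a short shell loop. The sketch below assumes the reference uses the sequence names chr1 through chr22, chrX and chrY, and reuses the file names from the example above; adjust both to match your own ref.fasta and BAM file.

```shell
# Write one bam2mpg command per chromosome into bam2mpg.swarm.
# Assumption: sequence names are chr1..chr22, chrX, chrY (check your ref.fasta).
swarmfile=bam2mpg.swarm

# Start from an empty file so that reruns do not append duplicate lines.
: > "$swarmfile"

for chr in $(seq 1 22) X Y; do
    echo "bam2mpg --qual_filter 20 --region chr${chr} --mpg aln.chr${chr}.mpg.out ref.fasta aln.sort.bam" >> "$swarmfile"
done
```

Each line of the resulting file is an independent command, so every chromosome runs as its own process and writes its own output file.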

Submit this job using the swarm command.

swarm -g 12 -f bam2mpg.swarm

The -g 12 option requests 12 GB of memory for each bam2mpg process. You may need to modify this value for your own jobs.

Interactive job on Biowulf

Allocate an interactive node and run bam2mpg there. Sample session:

[user@biowulf ~]$ sinteractive
salloc.exe: Granted job allocation 4409909
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn1883 are ready for job

[user@cn1883 ~]$ module load bam2mpg
[+] Loading Perl 5.18.2 ...
[+] Loading samtools 1.2 ...

[user@cn1883 ~]$ bam2mpg --qual_filter 20 --snv_vcf aln.snv.vcf.gz --div_vcf aln.div.vcf.gz ref.fasta aln.sorted.bam
[...]

[user@cn1883 ~]$ exit
salloc.exe: Relinquishing job allocation 4409909
salloc.exe: Job allocation 4409909 has been revoked.
[user@biowulf ~]$

Documentation