The program "bam2mpg" calls genotypes from sequence reads of haploid or diploid DNA aligned to a closely related reference sequence. It reads alignments in BAM format (http://samtools.sourceforge.net). The MPG (Most Probable Genotype) algorithm is based on a Bayesian model that simulates sampling from one or two alleles with sequencing error and then calculates the likelihood of each possible genotype given the observed sequence data. Using prior probabilities that depend on the expected heterozygosity of the sequence, MPG then predicts the "Most Probable Genotype" at each site, along with quality scores that estimate the accuracy of the calls.
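As a rough illustration of the kind of Bayesian model described above (a sketch under stated assumptions, not necessarily the exact formulation used by bam2mpg), suppose a site is covered by observed bases $b_1, \dots, b_n$ with per-base error probabilities $e_i$, that reads are independent, and that an error is equally likely to produce any of the three other bases. The symbols $b_i$, $e_i$, $a_1$, $a_2$, $G$ and $D$ are introduced here for illustration only.

```latex
\begin{align}
P(b_i \mid a) &=
  \begin{cases}
    1 - e_i, & b_i = a \\
    e_i / 3, & b_i \neq a
  \end{cases} \\
P(b_i \mid G = a_1 a_2) &= \tfrac{1}{2}\, P(b_i \mid a_1) + \tfrac{1}{2}\, P(b_i \mid a_2) \\
P(D \mid G) &= \prod_{i=1}^{n} P(b_i \mid G) \\
P(G \mid D) &= \frac{P(D \mid G)\, P(G)}{\sum_{G'} P(D \mid G')\, P(G')} \\
\hat{G} &= \arg\max_{G} P(G \mid D)
\end{align}
```

Here $P(G)$ is the prior, which in a model of this kind would weight heterozygous genotypes according to the expected heterozygosity. For haploid data only genotypes with $a_1 = a_2$ would be considered. A quality score can then be derived from how strongly the posterior favors $\hat{G}$ over the alternatives.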
bam2mpg was developed by Nancy Fisher Hansen at NHGRI (nhansen@mail.nih.gov).
Create a batch input file (e.g. bam2mpg.sh). For example:
#!/bin/bash
module load bam2mpg
bam2mpg --bam_filter '-q31' --qual_filter 20 --only_nonref ref.fasta aln.sorted.bam --mpg aln.mpg.out
Submit this job using the Slurm sbatch command.
sbatch --mem=12g bam2mpg.sh
The command above allocates 12 GB of memory for this job. You may need to modify this value for your own jobs.
Create a swarmfile (e.g. bam2mpg.swarm). For example:
bam2mpg --qual_filter 20 --region chr1 --mpg aln.chr1.mpg.out ref.fasta aln.sort.bam
bam2mpg --qual_filter 20 --region chr2 --mpg aln.chr2.mpg.out ref.fasta aln.sort.bam
bam2mpg --qual_filter 20 --region chr3 --mpg aln.chr3.mpg.out ref.fasta aln.sort.bam
bam2mpg --qual_filter 20 --region chr4 --mpg aln.chr4.mpg.out ref.fasta aln.sort.bam
[...etc...]
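Because each line of a swarmfile like the one above differs only in the region name, the file can be generated with a short loop instead of being written by hand. The following is a minimal sketch, assuming the chromosome names chr1 to chr4 and the file names from the example; the variable names and the output file name bam2mpg.swarm are illustrative choices, so adjust them to match your own reference and alignments.

```shell
#!/bin/bash
# Sketch: write one bam2mpg command per region into a swarmfile.
# File names and the chromosome list are assumptions taken from the example.
ref=ref.fasta
bam=aln.sort.bam
swarmfile=bam2mpg.swarm

# Start with an empty swarmfile.
: > "$swarmfile"

for chr in chr1 chr2 chr3 chr4; do
    echo "bam2mpg --qual_filter 20 --region $chr --mpg aln.$chr.mpg.out $ref $bam" >> "$swarmfile"
done
```

Extend the list in the for loop to cover every sequence in your reference.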
Submit this job using the swarm command.
swarm -g 12 -f bam2mpg.swarm
The -g 12 option requests a memory allocation of 12 GB for each bam2mpg process. You may need to modify this value.
Allocate an interactive node and run bam2mpg there. Sample session:
[user@biowulf ~]$ sinteractive
salloc.exe: Granted job allocation 4409909
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn1883 are ready for job
[user@cn1883 ~]$ module load bam2mpg
[+] Loading Perl 5.18.2 ...
[+] Loading samtools 1.2 ...
[user@cn1883 ~]$ bam2mpg --qual_filter 20 --snv_vcf aln.snv.vcf.gz --div_vcf aln.div.vcf.gz ref.fasta aln.sorted.bam
[...]
[user@cn1883 ~]$ exit
salloc.exe: Relinquishing job allocation 4409909
salloc.exe: Job allocation 4409909 has been revoked.
[user@biowulf ~]$