Biowulf High Performance Computing at the NIH
MonoVar on Biowulf

Monovar is a statistical method for detecting and genotyping single-nucleotide variants in single-cell data. It does take onto account an allelic dropout, false-positive errors and a nonuniformity of coverage.

Reference:

Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input in bold):

[user@biowulf]$ sinteractive
[user@cn3144 ~]$ module load monovar
[+] Loading samtools 1.9  ...
[+] Loading bwa 0.7.17 on cn0868
[+] Loading monovar  20200403
Copy sample data to the current folder:
[user@cn3144 ~]$ cp $MONOVAR_DATA/* .
Define a reference sequence:
[user@cn3144 ~]$ ln -s /fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa ref.fa
Index the reference genome:
[user@cn3144 ~]$ bwa index ref.fa 
[bwa_index] Pack FASTA... 29.13 sec
[bwa_index] Construct BWT for the packed sequence...
[BWTIncCreate] textLength=6191387966, availableWord=447648912
[BWTIncConstructFromPacked] 10 iterations done. 99999998 characters processed.
[BWTIncConstructFromPacked] 20 iterations done. 199999998 characters processed.
[BWTIncConstructFromPacked] 30 iterations done. 299999998 characters processed.
[BWTIncConstructFromPacked] 40 iterations done. 399999998 characters processed.
...
[BWTIncConstructFromPacked] 650 iterations done. 6098386190 characters processed.
[BWTIncConstructFromPacked] 660 iterations done. 6127941854 characters processed.
[BWTIncConstructFromPacked] 670 iterations done. 6154206942 characters processed.
[BWTIncConstructFromPacked] 680 iterations done. 6177547390 characters processed.
[bwt_gen] Finished constructing BWT in 687 iterations.
[bwa_index] 2969.56 seconds elapse.
[bwa_index] Update BWT... 18.63 sec
[bwa_index] Pack forward-only FASTA... 18.17 sec
[bwa_index] Construct SA from BWT and Occ... 833.37 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index ref.fa
[main] Real time: 3869.670 sec; CPU: 3868.872 sec
Run Monovar on the provided bam files:
[user@cn3144 ~]$ samtools mpileup -BQ0 -d10000 -f ref.fa -q 40 -b filenames.txt | monovar.py -p 0.002 -a 0.2 -t 0.05 -m 2 -f ref.fa -b filenames.txt -o output.vcf
[mpileup] 3 samples in 3 input files
[mplp_func] Skipping because 146016580 is outside of 145138636 [ref:8]
[mplp_func] Skipping because 146016580 is outside of 145138636 [ref:8]
[mplp_func] Skipping because 146016580 is outside of 145138636 [ref:8]
...
A file output.vcf will be produced:
[user@cn3144 ~]$ tail -n 10 output.vcf
chrY    21276176        .       T       C       323     PASS    AC=2;AF=0.50;AN=4;BaseQRankSum=0.00;DP=348;QD=0.92816091954;SOR=0.00;MPR=311.21;PSARR=173.00     GT:AD:DP:GQ:PL  0/1:0,166:166:1:3240,0,3240     ./.     0/1:0,180:182:1:3240,0,3240      <1X1>
chrY    21276178        .       T       G       323     PASS    AC=2;AF=0.50;AN=4;BaseQRankSum=1.23;DP=348;QD=0.92816091954;SOR=0.00;MPR=311.21;PSARR=173.00     GT:AD:DP:GQ:PL  0/1:1,165:166:1:3240,0,3240     ./.     0/1:1,181:182:1:3240,0,3240      <1X1>
chrY    21276179        .       A       G       323     PASS    AC=2;AF=0.50;AN=4;BaseQRankSum=0.00;DP=348;QD=0.92816091954;SOR=0.00;MPR=312.58;PSARR=173.00     GT:AD:DP:GQ:PL  0/1:0,165:166:1:3240,0,3240     ./.     0/1:0,181:182:1:3240,0,3240      <1X1>
chrY    21276180        .       G       A       323     PASS    AC=2;AF=0.50;AN=4;BaseQRankSum=0.00;DP=348;QD=0.92816091954;SOR=0.00;MPR=311.21;PSARR=174.00     GT:AD:DP:GQ:PL  0/1:0,166:166:1:3240,0,3240     ./.     0/1:0,182:182:1:3240,0,3240      <1X1>
chrY    21276181        .       T       G       323     PASS    AC=2;AF=0.50;AN=4;BaseQRankSum=0.00;DP=348;QD=0.92816091954;SOR=0.00;MPR=311.21;PSARR=174.00     GT:AD:DP:GQ:PL  0/1:0,166:166:1:3240,0,3240     ./.     0/1:0,182:182:1:3240,0,3240      <1X1>
chrY    21276183        .       T       A       323     PASS    AC=2;AF=0.50;AN=4;BaseQRankSum=1.29;DP=344;QD=0.938953488372;SOR=0.00;MPR=319.70;PSARR=112.67    GT:AD:DP:GQ:PL  0/1:2,162:165:1:3240,0,3240     ./.     0/1:1,176:179:1:3240,0,3240      <1X1>
chrY    21276184        .       T       C       323     PASS    AC=2;AF=0.50;AN=4;BaseQRankSum=0.00;DP=344;QD=0.938953488372;SOR=0.00;MPR=318.34;PSARR=172.00    GT:AD:DP:GQ:PL  0/1:0,165:165:1:1342,0,3240     ./.     0/1:0,179:179:1:3240,0,3240      <1X1>
chrY    21276186        .       T       G       4.57124728771   PASS    AC=2;AF=0.50;AN=4;BaseQRankSum=-0.59;DP=329;QD=0.0138943686557;SOR=inf;MPR=17.42;PSARR=0.04      GT:AD:DP:GQ:PL  0/1:143,8:155:1:39,0,177        ./.     0/1:170,4:174:1:12,0,177 <1X1>
chrY    21276187        .       T       A       323     PASS    AC=4;AF=1.00;AN=4;BaseQRankSum=0.00;DP=150;QD=2.15333333333;SOR=0.00;MPR=736.46;PSARR=74.50      GT:AD:DP:GQ:PL  1/1:0,69:70:0:1820,32,0 ./.     1/1:0,80:80:0:2089,32,0 <2X2>
Exit an interactive session:
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. monovar.sh). For example:

#!/bin/bash
module load monovar
cp $MONOVAR_DATA/* .
ln -s /fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa ref.fa
bwa index ref.fa
samtools mpileup -BQ0 -d10000 -f ref.fa -q 40 -b filenames.txt | monovar.py -p 0.002 -a 0.2 -t 0.05 -m 2 -f ref.fa -b filenames.txt -o output.vcf

Submit this job using the Slurm sbatch command.

sbatch  [--mem=#g] monovar.sh