MonoVar on Biowulf
Monovar is a statistical method for detecting and genotyping single-nucleotide variants in single-cell data. It does take onto account an allelic dropout, false-positive errors and a nonuniformity of coverage.
Reference:
- Hamim Zafar, Yong Wang, Luay Nakhleh, Nicholas Navin and Ken Chen
Monovar: single-nucleotide variant detection in single cells.
Nature Methods 13, pages 505–507 (2016).
Documentation
Important Notes
- Module Name: MonoVar (see the modules page for more information)
- Sample data in /usr/local/apps/monovar/20200403/sample_data
- Unusual environment variables set
- MONOVAR_HOME installation directory
- MONOVAR_BIN executable directory
- MONOVAR_DATA sample testing data
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive [user@cn3144 ~]$ module load monovar [+] Loading samtools 1.9 ... [+] Loading bwa 0.7.17 on cn0868 [+] Loading monovar 20200403Copy sample data to the current folder:
[user@cn3144 ~]$ cp $MONOVAR_DATA/* .Define a reference sequence:
[user@cn3144 ~]$ ln -s /fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa ref.faIndex the reference genome:
[user@cn3144 ~]$ bwa index ref.fa [bwa_index] Pack FASTA... 29.13 sec [bwa_index] Construct BWT for the packed sequence... [BWTIncCreate] textLength=6191387966, availableWord=447648912 [BWTIncConstructFromPacked] 10 iterations done. 99999998 characters processed. [BWTIncConstructFromPacked] 20 iterations done. 199999998 characters processed. [BWTIncConstructFromPacked] 30 iterations done. 299999998 characters processed. [BWTIncConstructFromPacked] 40 iterations done. 399999998 characters processed. ... [BWTIncConstructFromPacked] 650 iterations done. 6098386190 characters processed. [BWTIncConstructFromPacked] 660 iterations done. 6127941854 characters processed. [BWTIncConstructFromPacked] 670 iterations done. 6154206942 characters processed. [BWTIncConstructFromPacked] 680 iterations done. 6177547390 characters processed. [bwt_gen] Finished constructing BWT in 687 iterations. [bwa_index] 2969.56 seconds elapse. [bwa_index] Update BWT... 18.63 sec [bwa_index] Pack forward-only FASTA... 18.17 sec [bwa_index] Construct SA from BWT and Occ... 833.37 sec [main] Version: 0.7.17-r1188 [main] CMD: bwa index ref.fa [main] Real time: 3869.670 sec; CPU: 3868.872 secRun Monovar on the provided bam files:
[user@cn3144 ~]$ samtools mpileup -BQ0 -d10000 -f ref.fa -q 40 -b filenames.txt | monovar.py -p 0.002 -a 0.2 -t 0.05 -m 2 -f ref.fa -b filenames.txt -o output.vcf [mpileup] 3 samples in 3 input files [mplp_func] Skipping because 146016580 is outside of 145138636 [ref:8] [mplp_func] Skipping because 146016580 is outside of 145138636 [ref:8] [mplp_func] Skipping because 146016580 is outside of 145138636 [ref:8] ...A file output.vcf will be produced:
[user@cn3144 ~]$ tail -n 10 output.vcf chrY 21276176 . T C 323 PASS AC=2;AF=0.50;AN=4;BaseQRankSum=0.00;DP=348;QD=0.92816091954;SOR=0.00;MPR=311.21;PSARR=173.00 GT:AD:DP:GQ:PL 0/1:0,166:166:1:3240,0,3240 ./. 0/1:0,180:182:1:3240,0,3240 <1X1> chrY 21276178 . T G 323 PASS AC=2;AF=0.50;AN=4;BaseQRankSum=1.23;DP=348;QD=0.92816091954;SOR=0.00;MPR=311.21;PSARR=173.00 GT:AD:DP:GQ:PL 0/1:1,165:166:1:3240,0,3240 ./. 0/1:1,181:182:1:3240,0,3240 <1X1> chrY 21276179 . A G 323 PASS AC=2;AF=0.50;AN=4;BaseQRankSum=0.00;DP=348;QD=0.92816091954;SOR=0.00;MPR=312.58;PSARR=173.00 GT:AD:DP:GQ:PL 0/1:0,165:166:1:3240,0,3240 ./. 0/1:0,181:182:1:3240,0,3240 <1X1> chrY 21276180 . G A 323 PASS AC=2;AF=0.50;AN=4;BaseQRankSum=0.00;DP=348;QD=0.92816091954;SOR=0.00;MPR=311.21;PSARR=174.00 GT:AD:DP:GQ:PL 0/1:0,166:166:1:3240,0,3240 ./. 0/1:0,182:182:1:3240,0,3240 <1X1> chrY 21276181 . T G 323 PASS AC=2;AF=0.50;AN=4;BaseQRankSum=0.00;DP=348;QD=0.92816091954;SOR=0.00;MPR=311.21;PSARR=174.00 GT:AD:DP:GQ:PL 0/1:0,166:166:1:3240,0,3240 ./. 0/1:0,182:182:1:3240,0,3240 <1X1> chrY 21276183 . T A 323 PASS AC=2;AF=0.50;AN=4;BaseQRankSum=1.29;DP=344;QD=0.938953488372;SOR=0.00;MPR=319.70;PSARR=112.67 GT:AD:DP:GQ:PL 0/1:2,162:165:1:3240,0,3240 ./. 0/1:1,176:179:1:3240,0,3240 <1X1> chrY 21276184 . T C 323 PASS AC=2;AF=0.50;AN=4;BaseQRankSum=0.00;DP=344;QD=0.938953488372;SOR=0.00;MPR=318.34;PSARR=172.00 GT:AD:DP:GQ:PL 0/1:0,165:165:1:1342,0,3240 ./. 0/1:0,179:179:1:3240,0,3240 <1X1> chrY 21276186 . T G 4.57124728771 PASS AC=2;AF=0.50;AN=4;BaseQRankSum=-0.59;DP=329;QD=0.0138943686557;SOR=inf;MPR=17.42;PSARR=0.04 GT:AD:DP:GQ:PL 0/1:143,8:155:1:39,0,177 ./. 0/1:170,4:174:1:12,0,177 <1X1> chrY 21276187 . T A 323 PASS AC=4;AF=1.00;AN=4;BaseQRankSum=0.00;DP=150;QD=2.15333333333;SOR=0.00;MPR=736.46;PSARR=74.50 GT:AD:DP:GQ:PL 1/1:0,69:70:0:1820,32,0 ./. 1/1:0,80:80:0:2089,32,0 <2X2>Exit an interactive session:
[user@cn3144 ~]$ exit salloc.exe: Relinquishing job allocation 46116226 [user@biowulf ~]$
Batch job
Most jobs should be run as batch jobs.
Create a batch input file (e.g. monovar.sh). For example:
#!/bin/bash module load monovar cp $MONOVAR_DATA/* . ln -s /fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa ref.fa bwa index ref.fa samtools mpileup -BQ0 -d10000 -f ref.fa -q 40 -b filenames.txt | monovar.py -p 0.002 -a 0.2 -t 0.05 -m 2 -f ref.fa -b filenames.txt -o output.vcf
Submit this job using the Slurm sbatch command.
sbatch [--mem=#g] monovar.sh