High-Performance Computing at the NIH
VarDictJava on NIH HPC Systems

VarDictJava is a variant discovery program written in Java and Perl. It is a partial Java port of the VarDict variant caller.

Citation: Lai Z, Markovets A, Ahdesmaki M, Chapman B, Hofmann O, McEwen R, Johnson J, Dougherty B, Barrett JC, and Dry JR. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research. Nucleic Acids Res. 2016, pii: gkw227.

Note: both VarDict and VarDictJava are installed, and both are made available by module load VarDictJava. To run the Perl version of VarDict, use the lowercase command vardict.

VarDict needs R, which cannot be run on the Biowulf login node. To run VarDict interactively, use Helix, Felix, or a Biowulf interactive session.
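Because VarDict cannot be run on the login node, a wrapper script may want to check where it is running before loading the module. The following is a minimal sketch, assuming the login node's short hostname is "biowulf" (as in the prompts shown on this page); on_login_node is a hypothetical helper, not part of VarDict or the Biowulf environment.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given (or current) host is the
# Biowulf login node. Assumes the login node's short hostname is "biowulf".
on_login_node() {
    local host="${1:-$(uname -n)}"
    host="${host%%.*}"            # strip any domain suffix
    [ "$host" = "biowulf" ]
}

if on_login_node; then
    echo "On the Biowulf login node: start an interactive session with sinteractive first." >&2
else
    echo "Not on the login node: OK to load VarDictJava here."
fi
```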

On Helix
Sample session:
[user@helix ~]$ module load VarDictJava
[+] Loading gdal 2.0 ...
[+] Loading proj 4.9.2 ...
[+] Loading gcc 4.9.1 ...
[+] Loading openmpi 1.10.0 for GCC 4.9.1
[+] Loading tcl_tk 8.6.3
[+] Loading pandoc 1.15.0.6 ...
[+] Loading R 3.2.3 on helix.nih.gov
[+] Loading VarDictJava 1.4.5 ...

[user@helix ~]$ export AF_THR="0.01"
[user@helix ~]$ VarDict -D -G /fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa \
-f $AF_THR -N sample_name -b /data/$USER/mydir/exampleBAM.bam -z -c 1 -S 2 -E 3 -g 4  \
/data/$USER/mydir/exampleBAM.bed | teststrandbias.R | var2vcf_valid.pl -N sample_name -E -f $AF_THR

[...]
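In the command above, the options -c 1 -S 2 -E 3 -g 4 tell VarDict which columns of the BED file hold the chromosome, region start, region end, and gene name. A quick pre-flight check can catch a BED file with fewer than four columns before the run starts. This is a sketch only: check_bed_columns is a hypothetical helper, and it assumes a tab-separated BED file.

```shell
#!/bin/bash
# Hypothetical helper (not part of VarDict): verify that every data line of
# a tab-separated BED file has at least the four columns referenced by
# -c 1 -S 2 -E 3 -g 4. Returns non-zero if any line is too short.
check_bed_columns() {
    local bed="$1"
    awk -F'\t' '
        /^(#|track|browser)/ { next }   # skip comment and header lines
        NF == 0 { next }                # skip blank lines
        NF < 4 { printf "line %d has only %d column(s)\n", NR, NF; bad = 1 }
        END { exit bad }
    ' "$bed"
}

# Example usage with the path from the sample session:
# check_bed_columns /data/$USER/mydir/exampleBAM.bed && echo "BED file has the expected columns"
```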
Batch Job on Biowulf
Create a batch script along the following lines:
#!/bin/bash

cd /data/$USER/mydir

module load VarDictJava

AF_THR="0.01"
VarDict -D -G /fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa \
-f $AF_THR -N sample_name -b /data/$USER/mydir/exampleBAM.bam -z -c 1 -S 2 -E 3 -g 4  \
/data/$USER/mydir/exampleBAM.bed | teststrandbias.R | var2vcf_valid.pl -N sample_name -E -f $AF_THR
Submit this job to the batch system with:
sbatch jobscript.sh
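When several samples need the same pipeline, one job script per sample can be generated from a template. The sketch below reuses the options from the batch script above; the sample names, the <sample>.bam / <sample>.bed / <sample>.vcf naming, and the write_jobscript helper are all hypothetical. It also redirects the pipeline output to a VCF file, which the batch script above leaves on standard output. The sbatch commands are printed rather than run.

```shell
#!/bin/bash
# Sketch only: sample names, file naming, and write_jobscript are
# assumptions; the genome path and VarDict options are copied from the
# batch script above.
GENOME=/fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa
AF_THR="0.01"

# Write a job script for one sample into the file given as the third argument.
write_jobscript() {
    local sample="$1" datadir="$2" out="$3"
    cat > "$out" <<EOF
#!/bin/bash
cd $datadir
module load VarDictJava
VarDict -D -G $GENOME \\
-f $AF_THR -N $sample -b $datadir/$sample.bam -z -c 1 -S 2 -E 3 -g 4 \\
$datadir/$sample.bed | teststrandbias.R | var2vcf_valid.pl -N $sample -E -f $AF_THR > $sample.vcf
EOF
}

# Generate scripts for two hypothetical samples. The data directory is
# single-quoted so that $USER is expanded when the job runs, not here.
workdir=$(mktemp -d)
for s in sampleA sampleB; do
    write_jobscript "$s" '/data/$USER/mydir' "$workdir/vardict_$s.sh"
    echo "sbatch $workdir/vardict_$s.sh"   # printed, not submitted
done
```

Each generated script can be inspected before it is submitted with sbatch.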
Using VarDict interactively on Biowulf

Sample session:

[susanc@biowulf ~]$ sinteractive
salloc.exe: Pending job allocation 17961915
salloc.exe: job 17961915 queued and waiting for resources
salloc.exe: job 17961915 has been allocated resources
salloc.exe: Granted job allocation 17961915
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn1517 are ready for job

[susanc@cn1517 ~]$ module load VarDictJava
[+] Loading gdal 2.0 ...
[+] Loading proj 4.9.2 ...
[+] Loading gcc 4.9.1 ...
[+] Loading openmpi 1.10.0 for GCC 4.9.1
[+] Loading tcl_tk 8.6.3
[+] Loading pandoc 1.15.0.6 ...
[+] Loading R 3.2.3 on helix.nih.gov
[+] Loading VarDictJava 1.4.5 ...

[susanc@cn1517 ~]$ export AF_THR="0.01"
[susanc@cn1517 ~]$ VarDict -D -G /fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa \
-f $AF_THR -N sample_name -b /data/$USER/mydir/exampleBAM.bam -z -c 1 -S 2 -E 3 -g 4  \
/data/$USER/mydir/exampleBAM.bed
[...]

[susanc@cn1517 ~]$ exit
logout

salloc.exe: Relinquishing job allocation 17961915
[susanc@biowulf ~]$

Documentation