Biowulf High Performance Computing at the NIH


ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes (including human genome hg18, hg19, as well as mouse, worm, fly, yeast and many others).


If you use ANNOVAR, please cite:

How to Use

There are multiple versions of ANNOVAR available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail annovar

To select a module, type

module load annovar/[ver]

where [ver] is the version of choice. This will set your $PATH variable, as well as $ANNOVAR_HOME and $ANNOVAR_DATA.
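
Before running anything, it can be worth confirming that the variables mentioned above are actually set in the current shell. The following is a minimal sketch, not part of ANNOVAR or the modules system; the helper name check_annovar_env is hypothetical.

```shell
# Hypothetical helper: report whether the variables set by the annovar
# module are present in the current shell. Returns non-zero if any is missing.
check_annovar_env() {
    status=0
    for name in ANNOVAR_HOME ANNOVAR_DATA; do
        # Look up the value of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set; has an annovar module been loaded?" >&2
            status=1
        else
            echo "$name=$value"
        fi
    done
    return "$status"
}
```

Calling check_annovar_env after module load prints the two locations, or a warning on standard error if a variable is missing.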

ANNOVAR takes text-based input files, where each line corresponds to one variant. On each line, the first five space- or tab-delimited columns represent chromosome, start position, end position, the reference nucleotides and the observed nucleotides. Here is the example file $ANNOVAR_HOME/example/ex1.avinput:

1	948921	948921	T	C	comments: rs15842, a SNP in 5' UTR of ISG15
1	1404001	1404001	G	T	comments: rs149123833, a SNP in 3' UTR of ATAD3C
1	5935162	5935162	A	T	comments: rs1287637, a splice site variant in NPHP4
1	162736463	162736463	C	T	comments: rs1000050, a SNP in Illumina SNP arrays
1	84875173	84875173	C	T	comments: rs6576700 or SNP_A-1780419, a SNP in Affymetrix SNP arrays
1	13211293	13211294	TC	-	comments: rs59770105, a 2-bp deletion
1	11403596	11403596	-	AT	comments: rs35561142, a 2-bp insertion
1	105492231	105492231	A	ATAAA	comments: rs10552169, a block substitution
1	67705958	67705958	G	A	comments: rs11209026 (R381Q), a SNP in IL23R associated with Crohn's disease
2	234183368	234183368	A	G	comments: rs2241880 (T300A), a SNP in the ATG16L1 associated with Crohn's disease
16	50745926	50745926	C	T	comments: rs2066844 (R702W), a non-synonymous SNP in NOD2
16	50756540	50756540	G	C	comments: rs2066845 (G908R), a non-synonymous SNP in NOD2
16	50763778	50763778	-	C	comments: rs2066847 (c.3016_3017insC), a frameshift SNP in NOD2
13	20763686	20763686	G	-	comments: rs1801002 (del35G), a frameshift mutation in GJB2, associated with hearing loss
13	20797176	21105944	0	-	comments: a 342kb deletion encompassing GJB6, associated with hearing loss
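
The five-column layout described above can be sanity-checked before annotation. The following is a minimal sketch using awk, not an ANNOVAR tool; the helper name check_avinput is hypothetical. It only checks that each non-empty line has at least five whitespace-delimited columns, that start and end are integers, and that start does not exceed end. It does not validate chromosome names or alleles.

```shell
# Hypothetical helper: basic structural check of an ANNOVAR-style input file.
# Prints one message per problem line and returns non-zero if any were found.
check_avinput() {
    awk '
        NF == 0 { next }    # ignore empty lines
        NF < 5 {
            printf "line %d: expected at least 5 columns, found %d\n", NR, NF
            bad = 1; next
        }
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
            printf "line %d: start and end must be integers\n", NR
            bad = 1; next
        }
        $2 + 0 > $3 + 0 {
            printf "line %d: start %s is greater than end %s\n", NR, $2, $3
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

For example, check_avinput ex1.avinput prints nothing and succeeds when every line passes these checks.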

Reference files are pre-installed in $ANNOVAR_DATA/{build}, where {build} can be either hg18 or hg19. If other builds are needed, contact

At the command line, type

[helix]$ cp $ANNOVAR_HOME/example/ex1.avinput .
[helix]$ --geneanno --dbtype refGene --buildver hg19 ex1.avinput $ANNOVAR_DATA/hg19

The script can annotate a single input file against multiple databases simultaneously, using multiple CPUs. Here is an example:

ex1.avinput $ANNOVAR_DATA/hg19 \
  --tempdir /path/to/temporary/directory \
  --thread 4 \
  --buildver hg19 \
  --outfile ex1.out \
  --remove \
  --protocol gene,avsift,ljb26_all,dbnsfp30a,cg46,dbscsnv11,cosmic64,cosmic70,exac03,exac03nontcga,1000g2015aug_all,1000g2012apr_all,snp138,avsnp147,clinvar_20160302 \
  --operation g,f,f,f,f,f,f,f,f,f,f,f,f,f,f \
  --nastring ''
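
In the example above, each entry in --protocol is paired by position with one entry in --operation, so the two comma-separated lists need the same number of entries. With long lists it is easy to miscount, so the following is a small sketch of a check that could be run before launching the job. The helper names count_entries and check_protocol_operation are hypothetical and not part of ANNOVAR.

```shell
# Hypothetical helper: count comma-separated entries in a string.
count_entries() {
    printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

# Hypothetical helper: succeed only if both lists have the same number of entries.
check_protocol_operation() {
    n_protocol=$(count_entries "$1")
    n_operation=$(count_entries "$2")
    if [ "$n_protocol" -ne "$n_operation" ]; then
        echo "mismatch: $n_protocol protocols but $n_operation operations" >&2
        return 1
    fi
    echo "$n_protocol protocols, $n_operation operations"
}

# Usage with a shortened pair of lists taken from the example above
check_protocol_operation 'gene,avsift,cg46' 'g,f,f'
```

The same call can be made with the full --protocol and --operation strings before they are pasted into the command.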

Type --help for more information about running.

Biowulf Cluster Use


Create an sbatch file (

module load annovar
--geneanno --dbtype gene --buildver hg19 ex1.avinput $ANNOVAR_DATA/hg19

Then submit, supplying the appropriate sbatch options to ensure 8 CPUs (to match the --thread option) on a single node:
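
One way to keep the CPU request in step with the thread count is to read the --thread value from the batch script and build the sbatch command from it. The following is a rough sketch under stated assumptions: the helper name sbatch_cmd_for is hypothetical, --cpus-per-task is the standard Slurm option for requesting CPUs for a task, and the helper only prints the command rather than submitting anything. It is not necessarily the exact command intended here.

```shell
# Hypothetical helper: print (do not run) an sbatch command whose CPU request
# matches the first --thread value found in the given batch script.
# Falls back to 1 CPU when the script contains no --thread option.
sbatch_cmd_for() {
    script=$1
    threads=$(sed -n 's/.*--thread[ =]\{1,\}\([0-9]\{1,\}\).*/\1/p' "$script" | head -n 1)
    threads=${threads:-1}
    echo "sbatch --cpus-per-task=$threads $script"
}
```

For a batch script that contains --thread 8, the helper prints an sbatch line requesting 8 CPUs, which can be reviewed and then run.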



The easiest way to run ANNOVAR with multiple VCF files is via swarm. Create a file containing these lines:

-format vcf4 file1.vcf > file1.inp; --geneanno --dbtype gene --buildver hg19 file1.inp $ANNOVAR_DATA/hg19
-format vcf4 file2.vcf > file2.inp; --geneanno --dbtype gene --buildver hg19 file2.inp $ANNOVAR_DATA/hg19
-format vcf4 file3.vcf > file3.inp; --geneanno --dbtype gene --buildver hg19 file3.inp $ANNOVAR_DATA/hg19
-format vcf4 file4.vcf > file4.inp; --geneanno --dbtype gene --buildver hg19 file4.inp $ANNOVAR_DATA/hg19

Then submit with the --module option:

swarm -f swarmfile --module annovar
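
For more than a handful of VCF files, the swarm file described above can be generated rather than typed by hand. The following is a minimal sketch; the helper name make_swarmfile is hypothetical. The names of the conversion and annotation commands are not shown in the swarm lines above, so the helper takes them as arguments. The usage comment assumes they are ANNOVAR's convert2annovar.pl and annotate_variation.pl scripts, which is an assumption rather than something stated on this page.

```shell
# Hypothetical helper: print one swarm line per VCF file.
#   $1 = conversion command, $2 = annotation command (both assumed, see above)
#   remaining arguments = VCF files
make_swarmfile() {
    convert_cmd=$1
    annotate_cmd=$2
    shift 2
    for vcf in "$@"; do
        inp="${vcf%.vcf}.inp"    # file1.vcf -> file1.inp
        # $ANNOVAR_DATA stays literal here so it is expanded when the swarm job runs
        printf '%s -format vcf4 %s > %s; %s --geneanno --dbtype gene --buildver hg19 %s $ANNOVAR_DATA/hg19\n' \
            "$convert_cmd" "$vcf" "$inp" "$annotate_cmd" "$inp"
    done
}

# Usage (command names are assumptions):
#   make_swarmfile convert2annovar.pl annotate_variation.pl file*.vcf > swarmfile
```

The resulting file can then be submitted with swarm -f swarmfile --module annovar as shown above.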