High-Performance Computing at the NIH
ANNOVAR

Description

ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected in diverse genomes (including human genome builds hg18 and hg19, as well as mouse, worm, fly, yeast, and many others).

Citation

If you use ANNOVAR, please cite:

How to Use

There are multiple versions of ANNOVAR available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail annovar

To select a module, type

module load annovar/[ver]

where [ver] is the version of choice. This will set your $PATH variable, as well as $ANNOVAR_HOME and $ANNOVAR_DATA.
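A quick way to confirm that the module set these variables is to test each one in the shell. The sketch below is illustrative only: the function name check_annovar_env is hypothetical and is not part of ANNOVAR or the module system.

```shell
#!/bin/bash
# Hypothetical helper: report whether the variables set by the annovar
# module are present in the current environment.
check_annovar_env() {
    local missing=0 var value
    for var in ANNOVAR_HOME ANNOVAR_DATA; do
        # printenv exits non-zero when the variable is not in the environment
        if value=$(printenv "$var"); then
            echo "$var=$value"
        else
            echo "$var is not set (has the module been loaded?)"
            missing=1
        fi
    done
    return "$missing"
}

check_annovar_env || echo "Load a module first: module load annovar/[ver]"
```

The function returns a non-zero status when either variable is missing, so it can also guard the start of a longer script.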

ANNOVAR takes text-based input files in which each line corresponds to one variant. On each line, the first five space- or tab-delimited columns give the chromosome, start position, end position, reference nucleotides, and observed nucleotides. An example file is provided at $ANNOVAR_HOME/example/ex1.avinput.
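To illustrate the five-column layout, the sketch below writes a small file of made-up variants (these are not the contents of ex1.avinput) and uses awk to check that every line has at least five columns and integer positions. The file name demo.avinput and the function check_avinput are hypothetical.

```shell
#!/bin/bash
# Made-up variants for illustration only; NOT the contents of ex1.avinput.
# Columns: chromosome, start, end, reference nucleotides, observed nucleotides.
cat > demo.avinput <<'EOF'
1 1000 1000 A G
2 2500 2500 C T
X 300 300 G A
EOF

# Hypothetical helper: fail if any line has fewer than five columns
# or a non-integer start or end position.
check_avinput() {
    awk '
        NF < 5 {
            printf "line %d: expected at least 5 columns, found %d\n", NR, NF
            bad = 1
            next
        }
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
            printf "line %d: start and end must be integers\n", NR
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

check_avinput demo.avinput && echo "demo.avinput looks well-formed"
```

A check like this only covers the layout described above; it does not verify that the reference nucleotides match the genome build.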

Reference files are pre-installed in $ANNOVAR_DATA/{build}, where {build} can be either hg18 or hg19. If other builds are needed, contact staff@hpc.nih.gov.

At the command line, type

[helix]$ cp $ANNOVAR_HOME/example/ex1.avinput .
[helix]$ annotate_variation.pl --geneanno --dbtype refGene --buildver hg19 ex1.avinput $ANNOVAR_DATA/hg19

table_annovar.pl

The table_annovar.pl script allows running annotate_variation.pl for a single input against multiple databases simultaneously using multiple cpus. Here is an example:

Type table_annovar.pl --help for more information about running.
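As a rough sketch of such a call, the script below assembles a table_annovar.pl command that annotates ex1.avinput against two databases. It rests on assumptions that do not come from this page: that table_annovar.pl accepts the --protocol, --operation, --out and --thread options described in the ANNOVAR documentation, and that a cytoBand database is installed alongside refGene in the hg19 data directory. Check table_annovar.pl --help and the contents of the data directory before relying on it.

```shell
#!/bin/bash
# Assumptions (not from this page): cytoBand is installed for hg19, and
# table_annovar.pl accepts --protocol, --operation, --out and --thread.
protocols="refGene,cytoBand"   # databases to annotate against
operations="g,r"               # g = gene-based, r = region-based
threads=2                      # should not exceed the cpus allocated to the job

# table_annovar.pl expects exactly one operation per protocol.
n_protocols=$(echo "$protocols" | tr ',' '\n' | wc -l)
n_operations=$(echo "$operations" | tr ',' '\n' | wc -l)
if [ "$n_protocols" -ne "$n_operations" ]; then
    echo "protocol and operation counts differ" >&2
    exit 1
fi

# Placeholder path is used only when the module has not been loaded.
data_dir="${ANNOVAR_DATA:-/path/to/annovar/data}/hg19"
set -- table_annovar.pl ex1.avinput "$data_dir" --buildver hg19 --out ex1 \
    --protocol "$protocols" --operation "$operations" --thread "$threads"

if command -v "$1" >/dev/null 2>&1; then
    "$@"
else
    echo "table_annovar.pl not on PATH; after loading the module this would run:"
    echo "$*"
fi
```

Building the argument list first and running it only when the script is found makes it easy to inspect the command before committing cluster time to it.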

Biowulf Cluster Use

sbatch

Create an sbatch file (script.sh):

#!/bin/bash
module load annovar
annotate_variation.pl --geneanno --dbtype refGene --buildver hg19 ex1.avinput $ANNOVAR_DATA/hg19

Then submit. If the script runs a multithreaded command, supply the sbatch options that allocate a matching number of cpus on a single node (for example, 8 cpus to match a thread count of 8):

sbatch --cpus-per-task=8 script.sh
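One way to keep the thread count and the allocation in step is to read the SLURM_CPUS_PER_TASK variable inside the batch script instead of hard-coding a number. The fragment below is a minimal sketch of that idea; it assumes the script goes on to run a multithreaded command such as table_annovar.pl with a --thread option, which is an assumption and not something the example script on this page does.

```shell
#!/bin/bash
# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is passed to sbatch.
# Fall back to 1 thread when the variable is absent (e.g. outside a job).
threads="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with $threads thread(s)"

# In a real batch script the annotation command would follow, for example:
#   module load annovar
#   table_annovar.pl [arguments] --thread "$threads"
```

Submitted with sbatch --cpus-per-task=8 script.sh, this fragment would pick up 8 threads without further edits to the script.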

swarm

The easiest way to run ANNOVAR with multiple VCF files is via swarm. Create a file containing these lines:

Then submit with the --module option:

swarm -f swarmfile --module annovar
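A swarm file holds one independent command per line, so for many inputs it is convenient to generate it with a loop. The sketch below is illustrative: the VCF file names are hypothetical, and it assumes table_annovar.pl's --vcfinput, --protocol, --operation and --out options as described in the ANNOVAR documentation. Confirm them with table_annovar.pl --help before submitting.

```shell
#!/bin/bash
# Hypothetical input names; replace with your own VCF files.
vcfs="sample1.vcf sample2.vcf sample3.vcf"

: > swarmfile    # start with an empty swarmfile
for vcf in $vcfs; do
    out="${vcf%.vcf}"    # output prefix: the file name without .vcf
    # Single quotes keep $ANNOVAR_DATA literal so it expands when the job runs.
    printf 'table_annovar.pl %s $ANNOVAR_DATA/hg19 --buildver hg19 --out %s --protocol refGene --operation g --vcfinput\n' \
        "$vcf" "$out" >> swarmfile
done

cat swarmfile
```

Each line of the resulting file is one table_annovar.pl command, and the file can be passed to swarm with the -f option.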

Notes On Reference Files

Some ANNOVAR reference files are updated on a regular basis. The environment variable $ANNOVAR_DATA points to the reference files as they existed when ANNOVAR was last updated, so some of those files may not be current. To use the most current reference files, use /fdb/annovar/current as the base directory for reference files. For example,

annotate_variation.pl --geneanno --dbtype refGene --buildver hg19 ex1.avinput /fdb/annovar/current/hg19

Alternatively, the environment variable $ANNOVAR_DATA_CURRENT can be used instead:

annotate_variation.pl --geneanno --dbtype refGene --buildver hg19 ex1.avinput $ANNOVAR_DATA_CURRENT/hg19

Please note that the reference files in /fdb/annovar/current are subject to change. This means that identical ANNOVAR jobs run on different days may give different results. For more information, contact staff@hpc.nih.gov.
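Because the files under /fdb/annovar/current can change, it may help to record which files a job actually saw. The sketch below is one possible approach and not an established site convention: a hypothetical snapshot_refs function writes the date, the directory name, and a listing with sizes and modification times to a manifest file kept with the results. The manifest file name is also hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: snapshot_refs <reference_dir> <manifest_file>
snapshot_refs() {
    local dir=$1 manifest=$2
    if [ ! -d "$dir" ]; then
        echo "reference directory not found: $dir" >&2
        return 1
    fi
    {
        echo "date: $(date)"
        echo "reference directory: $dir"
        ls -l "$dir"    # sizes and modification times of the reference files
    } > "$manifest"
}

# Example call; the directory exists only on systems that provide it.
ref_dir="${ANNOVAR_DATA_CURRENT:-/fdb/annovar/current}/hg19"
snapshot_refs "$ref_dir" annovar_refs.manifest \
    || echo "skipped manifest for $ref_dir"
```

Comparing the manifests of two runs shows whether a difference in results could come from changed reference files.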

Documentation