ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected in diverse genomes (including the human genome builds hg18, hg19 and hg38, as well as mouse, worm, fly, yeast and many others).
ANNOVAR takes text-based input files, where each line corresponds to one variant. On each line, the first five space- or tab-delimited columns represent the chromosome, start position, end position, reference nucleotides and observed nucleotides. Any further columns are carried along as comments. Here is the example file $ANNOVAR_HOME/example/ex1.avinput:
1 948921 948921 T C comments: rs15842, a SNP in 5' UTR of ISG15
1 1404001 1404001 G T comments: rs149123833, a SNP in 3' UTR of ATAD3C
1 5935162 5935162 A T comments: rs1287637, a splice site variant in NPHP4
1 162736463 162736463 C T comments: rs1000050, a SNP in Illumina SNP arrays
1 84875173 84875173 C T comments: rs6576700 or SNP_A-1780419, a SNP in Affymetrix SNP arrays
1 13211293 13211294 TC - comments: rs59770105, a 2-bp deletion
1 11403596 11403596 - AT comments: rs35561142, a 2-bp insertion
1 105492231 105492231 A ATAAA comments: rs10552169, a block substitution
1 67705958 67705958 G A comments: rs11209026 (R381Q), a SNP in IL23R associated with Crohn's disease
2 234183368 234183368 A G comments: rs2241880 (T300A), a SNP in the ATG16L1 associated with Crohn's disease
16 50745926 50745926 C T comments: rs2066844 (R702W), a non-synonymous SNP in NOD2
16 50756540 50756540 G C comments: rs2066845 (G908R), a non-synonymous SNP in NOD2
16 50763778 50763778 - C comments: rs2066847 (c.3016_3017insC), a frameshift SNP in NOD2
13 20763686 20763686 G - comments: rs1801002 (del35G), a frameshift mutation in GJB2, associated with hearing loss
13 20797176 21105944 0 - comments: a 342kb deletion encompassing GJB6, associated with hearing loss
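Before annotating a hand-made input file, it can help to confirm that every line follows the five-column layout described above. The following is a minimal sketch, not part of ANNOVAR: the function name check_avinput is hypothetical, and the checks (at least five columns, integer positions, start not greater than end) are only the basic ones implied by the format description.

```shell
# Hypothetical helper (not shipped with ANNOVAR): basic sanity check of an
# .avinput file. Returns non-zero if any line breaks the five-column layout.
check_avinput() {
    awk '
        # awk splits on runs of spaces or tabs by default, matching the format
        NF < 5 {
            printf "line %d: fewer than 5 columns\n", NR; bad = 1; next
        }
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
            printf "line %d: start/end is not an integer\n", NR; bad = 1; next
        }
        $2 + 0 > $3 + 0 {
            printf "line %d: start is greater than end\n", NR; bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

For example, `check_avinput ex1.avinput && echo "format looks OK"` prints the message only when no problems are reported.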
Reference files are pre-installed in $ANNOVAR_DATA/{build}, where {build} can be either hg18, hg19 or hg38. If other builds are needed, contact staff@hpc.nih.gov. To list all builds currently available, type
ls $ANNOVAR_DATA
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive -t 4 --gres=lscratch:20
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load annovar
[user@cn3144 ~]$ cp $ANNOVAR_HOME/example/ex1.avinput .
[user@cn3144 ~]$ annotate_variation.pl --geneanno --dbtype refGene --buildver hg38 ex1.avinput $ANNOVAR_DATA/hg38
[user@cn3144 ~]$ table_annovar.pl ex1.avinput $ANNOVAR_DATA/hg38 \
    --tempdir /lscratch/$SLURM_JOB_ID \
    --thread $SLURM_CPUS_ON_NODE \
    --buildver hg38 \
    --outfile ex1.out \
    --remove \
    --protocol gene,clinvar_20220320,cosmic70,ljb26_all,avsnp150,cadd_1.5 \
    --operation g,f,f,f,f,f \
    --nastring ''

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
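The table_annovar.pl step in the session above should leave a tab-delimited table with a header row, by ANNOVAR convention named ex1.out.hg38_multianno.txt. A quick way to summarize it is to count variants per value of one column. The sketch below is an illustration only: the function name count_by_column is hypothetical, and the default column name Func.refGene is an assumption about how the gene-based annotation column is labeled in your output, so check the header line of your own file first.

```shell
# Hypothetical helper: count rows per distinct value of a named column in a
# tab-delimited table with a header line, most frequent value first.
count_by_column() {
    local file="$1" col="${2:-Func.refGene}"   # default column name is an assumption
    awk -F'\t' -v col="$col" '
        NR == 1 {
            # locate the requested column by its header text
            for (i = 1; i <= NF; i++) if ($i == col) c = i
            if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        { n[$c]++ }
        END { for (k in n) print n[k] "\t" k }
    ' "$file" | sort -k1,1nr
}
```

For example, `count_by_column ex1.out.hg38_multianno.txt` prints one line per category, with the count in the first column.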
Create a batch input file (e.g. annovar.sh). For example:
#!/bin/bash
set -e
module load annovar
annotate_variation.pl --geneanno --dbtype gene --buildver hg38 ex1.avinput $ANNOVAR_DATA/hg38
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] annovar.sh
Create a swarmfile (e.g. annovar.swarm). For example:
convert2annovar.pl -format vcf4 file1.vcf > file1.inp; annotate_variation.pl --geneanno --dbtype gene --buildver hg38 file1.inp $ANNOVAR_DATA/hg38
convert2annovar.pl -format vcf4 file2.vcf > file2.inp; annotate_variation.pl --geneanno --dbtype gene --buildver hg38 file2.inp $ANNOVAR_DATA/hg38
convert2annovar.pl -format vcf4 file3.vcf > file3.inp; annotate_variation.pl --geneanno --dbtype gene --buildver hg38 file3.inp $ANNOVAR_DATA/hg38
convert2annovar.pl -format vcf4 file4.vcf > file4.inp; annotate_variation.pl --geneanno --dbtype gene --buildver hg38 file4.inp $ANNOVAR_DATA/hg38
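With many VCF files, writing the swarmfile by hand becomes tedious. The following sketch generates one line per VCF file in the same form as the example above. The function name make_annovar_swarm and its two arguments (a directory and a build name) are hypothetical, and it assumes input files end in .vcf. Note that $ANNOVAR_DATA is deliberately written out literally, so it is expanded only when each subjob runs.

```shell
# Hypothetical helper: print one swarm command line per *.vcf file found in
# a directory (default: current directory) for a given build (default: hg38).
make_annovar_swarm() {
    local dir="${1:-.}" build="${2:-hg38}" vcf base
    for vcf in "$dir"/*.vcf; do
        [ -e "$vcf" ] || continue   # skip the unexpanded pattern when no VCF files exist
        base="${vcf%.vcf}"          # strip the .vcf extension to name the .inp file
        # single quotes keep $ANNOVAR_DATA unexpanded in the generated line
        printf 'convert2annovar.pl -format vcf4 %s > %s.inp; annotate_variation.pl --geneanno --dbtype gene --buildver %s %s.inp $ANNOVAR_DATA/%s\n' \
            "$vcf" "$base" "$build" "$base" "$build"
    done
}
```

For example, `make_annovar_swarm . hg38 > annovar.swarm` writes the swarmfile for all VCF files in the current directory.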
Submit this job using the swarm command.
swarm -f annovar.swarm [-g #] [-t #] --module annovar
where
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
--module annovar | Loads the annovar module for each subjob in the swarm |