High-Performance Computing at the NIH
VEP

Description

VEP (Variant Effect Predictor) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.

Source/Citation

How to Use

There are multiple versions of VEP available. An easy way of selecting the version is to use environment modules. To see the versions available, type

module avail VEP

To select a version, type

module load VEP/[ver]

where [ver] is the version of choice.

NOTE: By default, VEP requires internet connectivity to the Ensembl databases. THIS IS NOT POSSIBLE ON THE BIOWULF CLUSTER! Instead, the databases have been cached locally in a version-specific directory ($VEPCACHEDIR, as set by the VEP module), allowing for offline analysis.

Offline analysis requires these options on every command:

--offline --cache --dir_cache $VEPCACHEDIR
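
As a minimal sketch, the three options can be kept in one shell variable and appended to each command. The file names input.vcf and output.txt are placeholders, and the fallback cache path only exists so that the snippet prints something when the VEP module has not been loaded.

```shell
# VEPCACHEDIR is normally set by "module load VEP/[ver]".
# The fallback below is a placeholder, not a real location.
: "${VEPCACHEDIR:=/path/to/vep_cache}"

# Options required for every offline run on the cluster.
VEP_OFFLINE_OPTS="--offline --cache --dir_cache $VEPCACHEDIR"

# Build the command line; input.vcf and output.txt are hypothetical names.
# Versions up to 88 use variant_effect_predictor.pl instead of vep.
VEP_CMD="vep -i input.vcf -o output.txt $VEP_OFFLINE_OPTS"

# Print the command for inspection; run it with: eval "$VEP_CMD"
echo "$VEP_CMD"
```

Printing the command first makes it easy to confirm that $VEPCACHEDIR expanded to the expected version-specific directory.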

Commands

NOTE: In VEP versions later than 88, the gtf2vep.pl command has been removed and variant_effect_predictor.pl has been replaced by vep. See https://github.com/Ensembl/ensembl-vep for more details.

Reference Files

--assembly is required for human sequences because two assemblies are available (GRCh37 and GRCh38). For cat, dog, and mouse, no assembly needs to be specified.
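
The rule above could be wrapped in a small shell helper, sketched here. The function name assembly_opt is hypothetical, and defaulting to GRCh38 when no human assembly is given is an assumption made for this illustration, not a VEP or Biowulf default.

```shell
# Hypothetical helper: print the --assembly option only when it is needed.
# Usage: assembly_opt SPECIES [ASSEMBLY]
assembly_opt() {
    case "$1" in
        human|homo_sapiens)
            # Two human assemblies are cached, so one must be named.
            # Falling back to GRCh38 is an assumption of this sketch.
            echo "--assembly ${2:-GRCh38}"
            ;;
        *)
            # Other cached species (cat, dog, mouse) need no --assembly.
            :
            ;;
    esac
}

echo "human: $(assembly_opt human GRCh37)"
echo "mouse: $(assembly_opt mouse)"
```

The output of the helper can then be appended to a vep command line, for example: vep ... --species human $(assembly_opt human GRCh37)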

Example

Plugins

A large number of plugins are available for use with VEP. Some of these plugins require third-party reference data. Most of this data is available within $VEPCACHEDIR, but some is available in the /fdb tree. Here is an example of using plugins:

ver=90
module load VEP/${ver}
vep \
 -i $VEPHOME/examples/homo_sapiens_GRCh38.vcf \
 -o example.out \
 --offline \
 --cache \
 --force_overwrite \
 --dir_cache $VEPCACHEDIR \
 --species human \
 --assembly GRCh38 \
 --fasta $VEPCACHEDIR/GRCh38.fa \
 --plugin CSN \
 --plugin Blosum62 \
 --plugin Carol \
 --plugin Condel,$VEPCACHEDIR/Plugins/config/Condel/config,b \
 --plugin Phenotypes \
 --plugin ExAC,$VEPCACHEDIR/ExAC.r0.3.sites.vep.vcf.gz \
 --plugin GeneSplicer,$GS/bin/genesplicer,$GS/human,context=200 \
 --plugin CADD,$VEPCACHEDIR/whole_genome_SNVs.tsv.gz,$VEPCACHEDIR/InDels.tsv.gz \
 --plugin Downstream \
 --plugin LoFtool \
 --plugin Gwava,tss,$VEPCACHEDIR/gwava_scores.bed.gz \
 --plugin FATHMM,"python $VEPCACHEDIR/fathmm.py" \
 --af_gnomad \
 --custom $VEPCACHEDIR/gnomad.exomes.r2.0.1.sites.GRCh38.noVEP.vcf.gz,gnomADg,vcf,exact,0,AF_AFR,AF_AMR,AF_ASJ,AF_EAS,AF_FIN,AF_NFE,AF_OTH

For more information about plugins, type

perldoc $VEPCACHEDIR/Plugins/[name].pm

where [name] is the name of the plugin.
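
To find out which values of [name] are available, the plugin directory can be listed. This is a sketch that assumes every plugin is stored as a [name].pm file directly under $VEPCACHEDIR/Plugins, as the perldoc path above suggests; the function name list_plugins is hypothetical.

```shell
# List plugin names found in a directory of [name].pm files.
# Usage: list_plugins DIR
list_plugins() {
    for f in "$1"/*.pm; do
        # Skip the unexpanded pattern when the directory has no .pm files.
        [ -e "$f" ] || continue
        basename "$f" .pm
    done
}

# VEPCACHEDIR is set by the VEP module; if it is unset, ./Plugins is
# searched instead and nothing is printed unless that directory exists.
list_plugins "${VEPCACHEDIR:-.}/Plugins"
```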

Sample Swarm

NOTE: By default, vep (variant_effect_predictor.pl in versions up to 88) writes to the same output file ("variant_effect_output.txt") unless directed to do otherwise using the --output_file (-o) option. For swarms of multiple runs, be sure to include this option with a different file name for each run.
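
One way to build such a swarm file is sketched below: it writes one vep command per input VCF, each with its own output file. The function name make_swarmfile, the input names sample1.vcf and sample2.vcf, the .vep.txt suffix, and the choice of human GRCh38 are all assumptions made for illustration.

```shell
# Write one VEP command per input VCF, each with a distinct output file.
# Usage: make_swarmfile SWARMFILE VCF...
make_swarmfile() {
    sw="$1"
    shift
    : > "$sw"    # start with an empty swarm file
    for vcf in "$@"; do
        name=$(basename "$vcf" .vcf)
        # \$VEPCACHEDIR is written literally so that it expands when the
        # job runs with the VEP module loaded, not when this script runs.
        echo "vep -i $vcf -o $name.vep.txt --offline --cache --dir_cache \$VEPCACHEDIR --species human --assembly GRCh38" >> "$sw"
    done
}

# Hypothetical inputs; replace with real VCF paths.
make_swarmfile swarmfile sample1.vcf sample2.vcf
cat swarmfile
```

Each line of the resulting file is an independent command, so no two runs share an output file.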

Once the swarm file (here named swarmfile) has been created, submit it to swarm:

swarm --module VEP --file swarmfile

Documentation