High-Performance Computing at the NIH
snpEff

Description

snpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of variants on genes (such as amino acid changes).

Typical usage :

clinEff is a professional version of the snpEff and SnpSift packages, suitable for production use in clinical labs. It is also available on Helix and Biowulf, and its usage mirrors that of snpEff.

How to Use

snpEff

snpEff uses environment modules. Type

module load snpEff

at the prompt.

Environment variables set by the module include $SNPEFF_HOME, $SNPEFF_JAR, and $SNPSIFT_JAR, which are used in the commands on this page.

snpEff is a Java application. See https://hpc.nih.gov/development/java.html for information about running Java applications.

To see the help menu, type

java -jar $SNPEFF_JAR

at the prompt.

By default, snpEff uses 1 GB of memory. For large VCF input files, this may not be enough. To allocate 20 GB of memory, use:

java -Xmx20g -jar $SNPEFF_JAR

A full command also names the database and the input VCF file:

java -Xmx20g -jar $SNPEFF_JAR -v [ database ] [ vcf file ] 

To see what databases are available, type:

ls $SNPEFF_HOME/data
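
Before starting a long annotation run, it can help to confirm that the database name you plan to use is present. The following is a minimal sketch, assuming each database appears as an entry under $SNPEFF_HOME/data as the listing above suggests; have_db is a hypothetical helper name, not something provided by the module.

```shell
# Return success if an entry named $1 exists under $SNPEFF_HOME/data.
# SNPEFF_HOME is expected to be set by `module load snpEff`.
# have_db is a hypothetical helper, not part of snpEff or the module.
have_db() {
    [ -n "${SNPEFF_HOME:-}" ] && [ -e "$SNPEFF_HOME/data/$1" ]
}
```

For example, have_db GRCh37.71 || echo "database not found" >&2 prints a message if the name is missing.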

snpSift

SnpSift is a collection of tools for manipulating VCF (Variant Call Format) files, such as the filter and dbnsfp commands shown in the examples below.

Examples

These examples are available from $SNPEFF_HOME/../protocols, downloaded from http://snpeff.sourceforge.net.

Annotate against GRCh37.71:

module load snpEff
ln -s $SNPEFF_HOME/../protocols .
java -Xmx12g -jar $SNPEFF_JAR -v -lof -motif -hgvs -nextProt GRCh37.71 protocols/ex1.vcf > ex1.eff.vcf

Pull out 'HIGH' or 'MODERATE' impact variants that also satisfy the case/control conditions in the filter expression:

cat ex1.eff.vcf | java -jar $SNPSIFT_JAR filter "(Cases[0] = 3) & (Controls[0] = 0) & ((EFF[*].IMPACT = 'HIGH') | (EFF[*].IMPACT = 'MODERATE'))" > ex1.filtered.vcf
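
A quick way to see how many variants passed a filter is to count the non-header lines of the output. This is a minimal sketch that relies only on VCF header lines starting with '#'; count_variants is a hypothetical helper name.

```shell
# Count data records in a VCF file by skipping header lines (which start
# with '#') and blank lines. count_variants is a hypothetical helper.
count_variants() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}
```

For example, count_variants ex1.filtered.vcf prints the number of records that remain after filtering.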

Annotate against the dbNSFP database:

java -jar $SNPSIFT_JAR dbnsfp -v -db /fdb/dbNSFP2/dbNSFP2.9.txt.gz ex1.eff.vcf > file.annotated.vcf

How to run on Biowulf

When run on the Biowulf cluster, snpEff automatically matches the memory allocated to the job when the --mem= option is given. Under these circumstances, the -m option is not needed.
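
For illustration only, the idea of matching the Java heap to the job allocation can be sketched by hand. This is not how the Biowulf installation is implemented; it assumes the SLURM_MEM_PER_NODE variable that Slurm sets (in MB) when --mem is given, and heap_flag is a hypothetical helper name.

```shell
# Print a JVM heap flag derived from the Slurm memory allocation.
# Assumption: SLURM_MEM_PER_NODE holds the --mem value in MB.
# Outside a job the variable is unset, so fall back to 1g, the
# default mentioned earlier. heap_flag is a hypothetical helper.
heap_flag() {
    if [ -n "${SLURM_MEM_PER_NODE:-}" ]; then
        echo "-Xmx${SLURM_MEM_PER_NODE}m"
    else
        echo "-Xmx1g"
    fi
}
```

It could be used as java $(heap_flag) -jar $SNPEFF_JAR -v hg19 file.vcf > file.eff.vcf, where file.vcf is a placeholder. The JVM needs some memory beyond the heap, so a slightly smaller value than the full allocation may be safer.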

As a batch job

Create a batch script, for example:
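
A minimal sketch of such a script, written to snpEff.sh from the prompt. It assumes a placeholder input named file.vcf and the hg19 database used in the swarm example; adjust both to your data.

```shell
# Write a minimal batch script to snpEff.sh. The quoted 'EOF' keeps
# $SNPEFF_JAR unexpanded here, so it is resolved inside the job after
# the module is loaded. file.vcf is a placeholder input name.
cat > snpEff.sh <<'EOF'
#!/bin/bash
module load snpEff
java -Xmx8g -jar $SNPEFF_JAR -v hg19 file.vcf > file.eff.vcf
EOF
```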

Then submit it to the cluster. Since the job requires 8 GB of memory, include --mem=8g:

sbatch --mem=8g snpEff.sh

As a swarm job

Create a swarmfile containing one command per line (in this example, the file is named "swarmfile"):

java -Xmx8g -jar $SNPEFF_JAR -t -v hg19 file1.vcf > file1.eff.vcf
java -Xmx8g -jar $SNPEFF_JAR -t -v hg19 file2.vcf > file2.eff.vcf
java -Xmx8g -jar $SNPEFF_JAR -t -v hg19 file3.vcf > file3.eff.vcf
java -Xmx8g -jar $SNPEFF_JAR -t -v hg19 file4.vcf > file4.eff.vcf
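
With many input files, the swarmfile can be generated rather than typed by hand. The following is a minimal sketch, assuming the inputs are the *.vcf files in the current directory and that the hg19 database is wanted; make_swarmfile is a hypothetical helper name.

```shell
# Write one snpEff command per VCF file in the current directory to
# "swarmfile". Single quotes around the printf format keep $SNPEFF_JAR
# literal, so it is expanded when the swarm job runs.
# make_swarmfile is a hypothetical helper name.
make_swarmfile() {
    : > swarmfile
    for vcf in *.vcf; do
        [ -e "$vcf" ] || continue                    # glob matched nothing
        case "$vcf" in *.eff.vcf) continue ;; esac   # skip earlier outputs
        printf 'java -Xmx8g -jar $SNPEFF_JAR -t -v hg19 %s > %s.eff.vcf\n' \
            "$vcf" "${vcf%.vcf}" >> swarmfile
    done
}
```

Running make_swarmfile in the directory that holds the VCF files overwrites any existing file named swarmfile.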

Then submit it to swarm like this, again allocating 8 GB of memory, this time with the -g option:

swarm -g 8 --module snpEff --file swarmfile

Notes on previous versions

Versions before 3.5 used plain tab-delimited annotation database files. More recent versions require tabix-indexed database files.
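
To check whether a database file already has a tabix index, look for the index file next to it. This is a minimal sketch, assuming the default tabix naming in which the index is the database file name plus .tbi; has_tabix_index is a hypothetical helper name.

```shell
# Return success if a tabix index (.tbi) sits next to the given file.
# Assumes default tabix index naming; has_tabix_index is hypothetical.
has_tabix_index() {
    [ -f "$1.tbi" ]
}
```

For example, has_tabix_index /fdb/dbNSFP2/dbNSFP2.9.txt.gz || echo "no tabix index found" >&2 reports a missing index.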

Documentation