Biowulf High Performance Computing at the NIH
snpEff on Biowulf

snpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of variants on genes (such as amino acid changes).

Important Notes

snpEff is a Java application. See https://hpc.nih.gov/development/java.html for information about running Java applications.

To see the help menu, type

java -jar $SNPEFF_JAR

at the prompt.

By default, snpEff uses 1 GB of memory, which may not be enough for large VCF input files. To allocate 20 GB, set the Java maximum heap size with -Xmx:

java -Xmx20g -jar $SNPEFF_JAR

The general form of an annotation command is:

java -Xmx20g -jar $SNPEFF_JAR -v [ database ] [ vcf file ] 

To see what databases are available, type:

ls $SNPEFF_HOME/data
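
The listing can be long, so it may help to search it for a genome name before starting a job. The following is a minimal sketch, assuming each database appears as an entry under $SNPEFF_HOME/data named after the genome version, as the ls command above suggests. The function name find_snpeff_db is hypothetical and is not part of snpEff.

```shell
#!/bin/bash
# find_snpeff_db is a hypothetical helper, not part of snpEff.
# Usage: find_snpeff_db PATTERN [DATA_DIR]
# Prints the entries of DATA_DIR whose names contain PATTERN (case-insensitive).
# DATA_DIR defaults to $SNPEFF_HOME/data (assumed to be set by "module load snpEff").
find_snpeff_db() {
    local pattern="$1"
    local data_dir="${2:-$SNPEFF_HOME/data}"
    # The return status is grep's: non-zero when nothing matches
    ls "$data_dir" | grep -i -- "$pattern"
}

# Example: list every installed GRCh37 release
# find_snpeff_db GRCh37
```

Because the function returns a non-zero status when nothing matches, it can also guard a script, e.g. find_snpeff_db hg19 > /dev/null || exit 1.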

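SnpSift

SnpSift is a collection of tools to manipulate VCF (variant call format) files, for example to filter variants or to add annotations from dbNSFP. It is loaded together with snpEff and is run as java -jar $SNPSIFT_JAR.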

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf ~]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load snpEff
[user@cn3144 ~]$ ln -s $SNPEFF_HOME/../protocols .
[user@cn3144 ~]$ java -Xmx12g -jar $SNPEFF_JAR -v -lof -motif -hgvs -nextProt GRCh37.71 protocols/ex1.vcf > ex1.eff.vcf
[user@cn3144 ~]$ cat ex1.eff.vcf | java -jar $SNPSIFT_JAR filter "(Cases[0] = 3) & (Controls[0] = 0) & ((EFF[*].IMPACT = 'HIGH') | (EFF[*].IMPACT = 'MODERATE'))"  > ex1.filtered.vcf
[user@cn3144 ~]$ java -jar $SNPSIFT_JAR dbnsfp -v -db /fdb/dbNSFP2/dbNSFP2.9.txt.gz ex1.eff.vcf > ex1.annotated.vcf

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
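
snpEff adds annotations to existing variant lines, so the annotated file should normally contain as many data lines as the input, and a filtered file should contain at most that many. A quick sanity check along those lines is sketched below. It assumes uncompressed VCF files, and count_variants is a hypothetical helper, not part of snpEff or SnpSift.

```shell
#!/bin/bash
# count_variants is a hypothetical helper: it prints the number of data lines
# (lines that do not start with '#') in an uncompressed VCF file.
count_variants() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 1
    fi
    # grep -c exits non-zero when the count is 0, so mask that status
    grep -vc '^#' "$1" || true
}

# Example, using the file names from the session above:
# count_variants protocols/ex1.vcf
# count_variants ex1.eff.vcf
# count_variants ex1.filtered.vcf
```

If the first two counts differ, the annotation step probably stopped early, for example because the Java heap was too small.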

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. snpEff.sh). For example:

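#!/bin/bash
# -- this file is snpEff.sh --

module load snpEff
# Link the example VCF file into the working directory
ln -s $SNPEFF_HOME/example/file.vcf .
# Annotate against hg19. SLURM_MEM_PER_NODE is in megabytes, hence the "m" suffix
java -Xmx${SLURM_MEM_PER_NODE}m -jar $SNPEFF_JAR -v hg19 file.vcf > file.eff.vcf
# Keep variants annotated with a HIGH-impact effect
cat file.eff.vcf | java -jar $SNPSIFT_JAR filter "( EFF[*].IMPACT = 'HIGH' )" > file.filtered.vcf
# Add dbNSFP annotations to the snpEff output
java -jar $SNPSIFT_JAR dbnsfp -v -db /fdb/dbNSFP2/dbNSFP3.2a.txt.gz file.eff.vcf > file.annotated.vcf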

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] snpEff.sh
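
The --mem value given to sbatch is what the batch script sees as SLURM_MEM_PER_NODE, which Slurm reports in megabytes. The JVM needs some memory beyond its heap, so one option is to give Java slightly less than the full allocation. The following is a minimal sketch, assuming a 10% reserve (an arbitrary choice for illustration, not a Biowulf recommendation). java_heap_flag is a hypothetical helper, not part of snpEff or Slurm.

```shell
#!/bin/bash
# java_heap_flag is a hypothetical helper, not part of snpEff or Slurm.
# Usage: java_heap_flag MEM_MB [RESERVE_PERCENT]
# Prints a -Xmx flag that leaves RESERVE_PERCENT (default 10) of MEM_MB
# outside the Java heap.
java_heap_flag() {
    local mem_mb="$1"
    local reserve_pct="${2:-10}"
    local heap_mb=$(( mem_mb * (100 - reserve_pct) / 100 ))
    echo "-Xmx${heap_mb}m"
}

# Inside a batch script this could be used as:
# java $(java_heap_flag "$SLURM_MEM_PER_NODE") -jar $SNPEFF_JAR -v hg19 file.vcf > file.eff.vcf
```

For a job submitted with sbatch --mem=20g snpEff.sh, SLURM_MEM_PER_NODE should be 20480, and the helper would then print -Xmx18432m.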

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. snpEff.swarm). For example:

java -Xmx${SLURM_MEM_PER_NODE}m -jar $SNPEFF_JAR -v hg19 file1.vcf > file1.eff.vcf
java -Xmx${SLURM_MEM_PER_NODE}m -jar $SNPEFF_JAR -v hg19 file2.vcf > file2.eff.vcf
java -Xmx${SLURM_MEM_PER_NODE}m -jar $SNPEFF_JAR -v hg19 file3.vcf > file3.eff.vcf
java -Xmx${SLURM_MEM_PER_NODE}m -jar $SNPEFF_JAR -v hg19 file4.vcf > file4.eff.vcf
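
For more than a handful of inputs, the swarmfile can be generated with a loop rather than written by hand. The following is a minimal sketch, assuming the input files sit in one directory, end in .vcf, and have no spaces in their names. make_snpeff_swarm is a hypothetical helper, not part of snpEff or swarm.

```shell
#!/bin/bash
# make_snpeff_swarm is a hypothetical helper, not part of snpEff or swarm.
# Usage: make_snpeff_swarm GENOME VCF_DIR > snpEff.swarm
# Prints one snpEff command per *.vcf file found in VCF_DIR.
make_snpeff_swarm() {
    local genome="$1"
    local vcf_dir="$2"
    local vcf
    for vcf in "$vcf_dir"/*.vcf; do
        # Skip the unexpanded pattern when the directory has no .vcf files
        [ -e "$vcf" ] || continue
        # Skip files that are already snpEff output from an earlier run
        case "$vcf" in *.eff.vcf) continue ;; esac
        # The single-quoted format string keeps ${SLURM_MEM_PER_NODE} and
        # $SNPEFF_JAR literal, so they are expanded inside each subjob.
        # SLURM_MEM_PER_NODE is in megabytes, hence the "m" suffix.
        printf 'java -Xmx${SLURM_MEM_PER_NODE}m -jar $SNPEFF_JAR -v %s %s > %s\n' \
            "$genome" "$vcf" "${vcf%.vcf}.eff.vcf"
    done
}

# Example: one hg19 annotation command per VCF file in the current directory
# make_snpeff_swarm hg19 . > snpEff.swarm
```

Each generated line has the same shape as the hand-written swarmfile entries, so the result is submitted in the same way.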

Submit this job using the swarm command.

swarm -f snpEff.swarm [-g #] --module snpEff

where

-g #               Number of gigabytes of memory required for each process (1 line in the swarm command file)
--module snpEff    Loads the snpEff module for each subjob in the swarm