Biowulf High Performance Computing at the NIH
snpEff on Biowulf

snpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of variants on genes (such as amino acid changes).

Typical usage :

clinEff is a professional version of the snpEff and SnpSift packages, suitable for production use in clinical labs. It is also available on Helix and Biowulf, and its usage mirrors that of snpEff.

References:

Documentation
Important Notes

snpEff is a Java application. See https://hpc.nih.gov/development/java.html for information about running Java applications.

To see the help menu, type

java -jar $SNPEFF_JAR

at the prompt.

By default, snpEff uses 1 GB of memory. For large VCF input files, this may not be enough. To allocate 20 GB of memory, use:

java -Xmx20g -jar $SNPEFF_JAR

The general form of an annotation command is:

java -Xmx20g -jar $SNPEFF_JAR -v [ database ] [ vcf file ] 
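
Inside a Slurm job, the heap size can be derived from the job's memory allocation instead of being hard-coded. The following is a minimal sketch, assuming SLURM_MEM_PER_NODE holds the allocation in megabytes; the helper name heap_flag and the 90% headroom factor are illustrative choices, not part of snpEff or of the Biowulf setup.

```shell
# heap_flag: hypothetical helper, not part of snpEff.
# Prints a -Xmx flag sized to 90% of the given memory (in MB), leaving
# some room inside the allocation for the JVM's non-heap overhead.
heap_flag() {
    local mem_mb=${1:-1024}    # fall back to 1024 MB if no value is given
    echo "-Xmx$(( mem_mb * 9 / 10 ))m"
}
```

It could then be used as: java $(heap_flag "$SLURM_MEM_PER_NODE") -jar $SNPEFF_JAR -v [ database ] [ vcf file ]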

To see what databases are available, type:

ls $SNPEFF_HOME/data
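
A script can test for a database before starting a long annotation run. This is a minimal sketch, assuming each installed database appears as a subdirectory of $SNPEFF_HOME/data named after the database; db_available is a hypothetical helper, not a snpEff command.

```shell
# db_available: hypothetical helper, not part of snpEff.
# Succeeds if a directory named after the database exists under the data
# directory (second argument, defaulting to $SNPEFF_HOME/data).
db_available() {
    local db=$1
    local data_dir=${2:-$SNPEFF_HOME/data}
    [ -d "$data_dir/$db" ]
}
```

After module load snpEff, a call such as db_available GRCh37.71 || echo "database not found" >&2 would report a missing database before any job time is spent.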

SnpSift

SnpSift is a collection of tools to manipulate VCF (variant call format) files. Here's what you can do:

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load snpEff
[user@cn3144 ~]$ ln -s $SNPEFF_HOME/../protocols .
[user@cn3144 ~]$ java -Xmx12g -jar $SNPEFF_JAR -v -lof -motif -hgvs -nextProt GRCh37.71 protocols/ex1.vcf > ex1.eff.vcf
[user@cn3144 ~]$ cat ex1.eff.vcf | java -jar $SNPSIFT_JAR filter "(Cases[0] = 3) & (Controls[0] = 0) & ((EFF[*].IMPACT = 'HIGH') | (EFF[*].IMPACT = 'MODERATE'))" > ex1.filtered.vcf
[user@cn3144 ~]$ java -jar $SNPSIFT_JAR dbnsfp -v -db /fdb/dbNSFP2/dbNSFP2.9.txt.gz ex1.eff.vcf > file.annotated.vcf

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
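
A quick sanity check on a session like this is to compare the number of variant records before and after the filter step. The following is a minimal sketch, assuming uncompressed VCF files; count_variants is a hypothetical helper, not a snpEff or SnpSift command.

```shell
# count_variants: hypothetical helper, not part of snpEff or SnpSift.
# Counts the non-empty lines of a VCF file that are not header lines
# (header lines start with '#').
count_variants() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}
```

For the files in the sample session, count_variants ex1.eff.vcf and count_variants ex1.filtered.vcf give the record counts before and after filtering.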

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. snpEff.sh). For example:

#!/bin/bash
# -- this file is snpEff.sh --

module load snpEff
ln -s $SNPEFF_HOME/example/file.vcf .
# SLURM_MEM_PER_NODE is given in megabytes, so -Xmx needs the "m" suffix
java -Xmx${SLURM_MEM_PER_NODE}m -jar $SNPEFF_JAR -v hg19 file.vcf > file.eff.vcf
cat file.eff.vcf | java -jar $SNPSIFT_JAR filter "( EFF[*].IMPACT = 'HIGH' )" > file.filtered.vcf
java -jar $SNPSIFT_JAR dbnsfp -v -db /fdb/dbNSFP2/dbNSFP3.2a.txt.gz file.eff.vcf > file.annotated.vcf

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] snpEff.sh

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. snpEff.swarm). For example:

java -Xmx${SLURM_MEM_PER_NODE}m -jar $SNPEFF_JAR -t -v hg19 file1.vcf > file1.eff.vcf
java -Xmx${SLURM_MEM_PER_NODE}m -jar $SNPEFF_JAR -t -v hg19 file2.vcf > file2.eff.vcf
java -Xmx${SLURM_MEM_PER_NODE}m -jar $SNPEFF_JAR -t -v hg19 file3.vcf > file3.eff.vcf
java -Xmx${SLURM_MEM_PER_NODE}m -jar $SNPEFF_JAR -t -v hg19 file4.vcf > file4.eff.vcf
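
With many input files, the swarmfile can be generated by a loop rather than written by hand. This is a minimal sketch, assuming the inputs are the *.vcf files in one directory and that SLURM_MEM_PER_NODE is given in megabytes (hence the m suffix on -Xmx); make_swarm is a hypothetical helper, not part of snpEff or swarm.

```shell
# make_swarm: hypothetical helper, not part of snpEff or swarm.
# Prints one snpEff command line per *.vcf file in the given directory.
make_swarm() {
    local dir=${1:-.} db=${2:-hg19} vcf
    for vcf in "$dir"/*.vcf; do
        [ -e "$vcf" ] || continue                   # nothing matched the pattern
        case $vcf in *.eff.vcf) continue ;; esac    # skip output of earlier runs
        # Single quotes keep the variables unexpanded until each subjob runs.
        printf 'java -Xmx${SLURM_MEM_PER_NODE}m -jar $SNPEFF_JAR -t -v %s %s > %s\n' \
            "$db" "$vcf" "${vcf%.vcf}.eff.vcf"
    done
}
```

Running make_swarm . hg19 > snpEff.swarm in the directory holding the VCF files writes a swarmfile of the same shape as the example above.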

Submit this job using the swarm command.

swarm -f snpEff.swarm [-g #] [-t #] --module snpEff

where

-g #              Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t #              Number of threads/CPUs required for each process (1 line in the swarm command file)
--module snpEff   Loads the snpEff module for each subjob in the swarm