snpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of variants on genes (such as amino acid changes).
Typical usage:
snpEff is a Java application. See https://hpc.nih.gov/development/java.html for information about running Java applications.
To see the help menu, type
java -jar $SNPEFF_JAR
at the prompt.
By default, snpEff uses 1 GB of memory. For large VCF input files, this may not be enough. To allocate 20 GB of memory, use:
java -Xmx20g -jar $SNPEFF_JAR
In more detail, the general form of an annotation command is:
java -Xmx20g -jar $SNPEFF_JAR -v [ database ] [ vcf file ]
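Rather than hard-coding the heap size, it can be derived from the memory granted to the job. The following is a minimal sketch, not part of snpEff itself: `snpeff_heap_mb` is a hypothetical helper, it assumes `SLURM_MEM_PER_NODE` (when set) holds the allocation in megabytes, and the 10% headroom left for JVM overhead is an arbitrary illustrative margin.

```shell
#!/bin/bash
# Hypothetical helper: choose a Java heap size (in MB) for snpEff.
# Assumption: SLURM_MEM_PER_NODE, when set, is the job memory in MB.
snpeff_heap_mb() {
    total_mb="${SLURM_MEM_PER_NODE:-}"
    case "$total_mb" in
        # Unset, zero, or non-numeric: fall back to the 1 GB default.
        ''|0|*[!0-9]*) echo 1024 ;;
        # Otherwise keep 10% back for JVM overhead (illustrative margin).
        *) echo $(( total_mb * 9 / 10 )) ;;
    esac
}

# Print the command that would be run, without running it.
echo "java -Xmx$(snpeff_heap_mb)m -jar \$SNPEFF_JAR -v [ database ] [ vcf file ]"
```

Inside a job, the computed value can be passed directly, for example `java -Xmx$(snpeff_heap_mb)m -jar $SNPEFF_JAR ...`.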
To see what databases are available, type:
ls $SNPEFF_HOME/data
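Checking for a database before starting a long annotation run avoids a late failure. The sketch below assumes that each installed database appears as an entry with the database's name under `$SNPEFF_HOME/data`, as the listing command above suggests; `snpeff_has_db` is a hypothetical helper, not a snpEff command.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the named database has an entry
# under $SNPEFF_HOME/data (assumed layout; see the ls command above).
snpeff_has_db() {
    [ -n "${SNPEFF_HOME:-}" ] && [ -e "$SNPEFF_HOME/data/$1" ]
}

if snpeff_has_db GRCh37.71; then
    echo "GRCh37.71 is available"
else
    echo "GRCh37.71 not found; check: ls \$SNPEFF_HOME/data" >&2
fi
```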
SnpSift is a collection of tools to manipulate VCF (Variant Call Format) files.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load snpEff
[user@cn3144 ~]$ ln -s $SNPEFF_HOME/../protocols .
[user@cn3144 ~]$ java -Xmx12g -jar $SNPEFF_JAR -v -lof -motif -hgvs -nextProt GRCh37.71 protocols/ex1.vcf > ex1.eff.vcf
[user@cn3144 ~]$ cat ex1.eff.vcf | java -jar $SNPSIFT_JAR filter "(Cases[0] = 3) & (Controls[0] = 0) & ((EFF[*].IMPACT = 'HIGH') | (EFF[*].IMPACT = 'MODERATE'))" > ex1.filtered.vcf
[user@cn3144 ~]$ java -jar $SNPSIFT_JAR dbnsfp -v -db /fdb/dbNSFP2/dbNSFP2.9.txt.gz ex1.eff.vcf > file.annotated.vcf
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
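A quick way to sanity-check a filter step like the one in the session above is to compare the number of variant records before and after it. This is a minimal sketch using only standard tools; `count_variants` is a hypothetical helper that counts every line not starting with `#`, which assumes the VCF has no blank lines. The demo file is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: count data records (non-header lines) in a VCF.
count_variants() {
    # grep -c exits non-zero when the count is 0, so ignore its status.
    grep -vc '^#' "$1" || true
}

# Self-contained demo on a tiny made-up VCF with two records.
demo=$(mktemp)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\n1\t100\t.\tA\tG\n1\t200\t.\tC\tT\n' > "$demo"
echo "records: $(count_variants "$demo")"   # prints "records: 2"
rm -f "$demo"
```

In the session above, the same helper could be applied to `ex1.eff.vcf` and `ex1.filtered.vcf`; the filtered file should never contain more records than the annotated one.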
Create a batch input file (e.g. snpEff.sh). For example:
#!/bin/bash
# -- this file is snpEff.sh --
module load snpEff
ln -s $SNPEFF_HOME/example/file.vcf .
java -Xmx${SLURM_MEM_PER_NODE}m -jar $SNPEFF_JAR -v hg19 file.vcf > file.eff.vcf
cat file.eff.vcf | java -jar $SNPSIFT_JAR filter "( EFF[*].IMPACT = 'HIGH' )" > file.filtered.vcf
java -jar $SNPSIFT_JAR dbnsfp -v -db /fdb/dbNSFP2/dbNSFP3.2a.txt.gz file.eff.vcf > file.annotated.vcf
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] snpEff.sh
Create a swarmfile (e.g. snpEff.swarm). For example:
java -Xmx${SLURM_MEM_PER_NODE}m -jar $SNPEFF_JAR -v hg19 file1.vcf > file1.eff.vcf
java -Xmx${SLURM_MEM_PER_NODE}m -jar $SNPEFF_JAR -v hg19 file2.vcf > file2.eff.vcf
java -Xmx${SLURM_MEM_PER_NODE}m -jar $SNPEFF_JAR -v hg19 file3.vcf > file3.eff.vcf
java -Xmx${SLURM_MEM_PER_NODE}m -jar $SNPEFF_JAR -v hg19 file4.vcf > file4.eff.vcf
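For more than a handful of inputs, the swarmfile can be generated instead of typed by hand. The sketch below writes one line per VCF in the same form as the example above; `make_swarm` is a hypothetical helper, the `hg19` database is taken from that example, and file names are assumed to contain no spaces.

```shell
#!/bin/bash
# Hypothetical helper: print one swarm line per input VCF.
make_swarm() {
    for vcf in "$@"; do
        out="${vcf%.vcf}.eff.vcf"
        # Single quotes keep ${SLURM_MEM_PER_NODE} and $SNPEFF_JAR literal,
        # so they are expanded inside each subjob rather than now.
        printf 'java -Xmx${SLURM_MEM_PER_NODE}m -jar $SNPEFF_JAR -v hg19 %s > %s\n' "$vcf" "$out"
    done
}

# Reproduces the four-line swarmfile shown above on standard output.
make_swarm file1.vcf file2.vcf file3.vcf file4.vcf
```

Redirecting the output, for example `make_swarm *.vcf > snpEff.swarm`, would produce a file ready for submission.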
Submit this job using the swarm command.
swarm -f snpEff.swarm [-g #] --module snpEff
where
-g # | Number of gigabytes of memory required for each process (1 line in the swarm command file) |
--module snpEff | Loads the snpEff module for each subjob in the swarm |