High-Performance Computing at the NIH
Jannovar on Biowulf & Helix

Jannovar: A Java library for Exome Annotation
Transcript-based annotation and pedigree analysis are two basic steps in the computational analysis of whole-exome sequencing experiments, whether for disease-gene discovery or for diagnostics. Jannovar is a stand-alone Java application as well as a Java library designed to be used in larger software frameworks for exome analysis. Jannovar uses an interval tree to identify all transcripts affected by a given variant, and provides HGVS-compliant annotations for variants affecting coding sequences and splice junctions, as well as UTR sequences and non-coding RNA transcripts. Jannovar can also perform family-based pedigree analysis on VCF files containing data from members of a family segregating a Mendelian disorder. On a desktop computer, Jannovar takes a few seconds to annotate a typical VCF file with exome data.

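Example data and input files can be copied into a working directory of your own. Create the directory first, so that data/ and examples/ end up as subdirectories of it:

$ mkdir -p /data/$USER/dir
$ cp -r /usr/local/apps/jannovar/data /data/$USER/dir
$ cp -r /usr/local/apps/jannovar/examples /data/$USER/dir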

Running on Helix
$ module load jannovar
$ cd /data/$USER/dir
$ java -Xmx4g -jar $JANNOVARPATH/jannovar-cli-0.12.jar annotate data/hg19_ucsc.ser examples/small.vcf

Running a single batch job on Biowulf

1. Create a script file similar to the lines below.

#!/bin/bash


module load jannovar
cd /data/$USER/dir
java -Xmx4g -jar $JANNOVARPATH/jannovar-cli-0.12.jar annotate data/hg19_ucsc.ser examples/small.vcf

2. Submit the script on Biowulf:

$ sbatch jobscript

To request more memory than the default (4 GB), use the --mem flag,
and at the same time change -Xmx4g in your script to the matching value (here, -Xmx4g becomes -Xmx10g):
$ sbatch --mem=10g jobscript
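A possible refinement, not part of the original instructions: rather than editing -Xmx by hand whenever --mem changes, the job script can read the allocation from SLURM_MEM_PER_NODE, which Slurm sets (in megabytes) when --mem is given. The xmx_for_mem helper and the 512 MB of headroom are assumptions for illustration; the JVM uses some memory outside its heap, so a heap equal to the full allocation can push a job over its limit. Jannovar itself only runs when SLURM_JOB_ID is set, i.e. inside a Slurm job.

```shell
#!/bin/bash
#SBATCH --mem=10g
# Sketch (assumption: 512 MB headroom covers JVM memory outside the heap).
# Derives the java -Xmx option from the Slurm allocation so --mem and -Xmx stay in step.

# Print a -Xmx option for an allocation of $1 MB, reserving $2 MB (default 512) of headroom.
xmx_for_mem() {
  local mem_mb=$1
  local headroom_mb=${2:-512}
  local heap_mb=$(( mem_mb - headroom_mb ))
  if [ "$heap_mb" -lt 256 ]; then
    heap_mb=256   # floor so the JVM can still start on very small allocations
  fi
  echo "-Xmx${heap_mb}m"
}

# SLURM_MEM_PER_NODE is set by Slurm when --mem is used; fall back to 4096 MB outside a job.
xmx=$(xmx_for_mem "${SLURM_MEM_PER_NODE:-4096}")
echo "JVM heap option: $xmx"

# Run Jannovar only inside a Slurm job (SLURM_JOB_ID is set by Slurm).
if [ -n "${SLURM_JOB_ID:-}" ]; then
  module load jannovar
  cd /data/$USER/dir
  java "$xmx" -jar $JANNOVARPATH/jannovar-cli-0.12.jar annotate data/hg19_ucsc.ser examples/small.vcf
fi
```

Submitted with "sbatch jobscript", the #SBATCH line requests 10 GB and the script passes -Xmx9728m to java. A --mem value given on the sbatch command line overrides the #SBATCH line, and the heap size follows it automatically.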

Running a swarm of jobs on Biowulf

Set up a swarm command file, with one command per line:

  cd /data/$USER/dir1; java -Xmx4g -jar $JANNOVARPATH/jannovar-cli-0.12.jar annotate data/hg19_ucsc.ser examples/small.vcf
  cd /data/$USER/dir2; java -Xmx4g -jar $JANNOVARPATH/jannovar-cli-0.12.jar annotate data/hg19_ucsc.ser examples/small.vcf
  cd /data/$USER/dir3; java -Xmx4g -jar $JANNOVARPATH/jannovar-cli-0.12.jar annotate data/hg19_ucsc.ser examples/small.vcf
	[......]
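When there are many working directories, the swarm file can be generated with a loop instead of typed by hand. This is a sketch: the directory names dir1 to dir3 are placeholders, and mem_gb is a hypothetical setting that should match the value later passed to swarm -g.

```shell
#!/bin/bash
# Sketch: write one Jannovar command line per working directory into a swarm file.
# dir1..dir3 are placeholder directory names; mem_gb should match "swarm -g".
mem_gb=4
swarmfile=swarmfile

: > "$swarmfile"   # create or truncate the swarm file
for d in dir1 dir2 dir3; do
  # Single quotes keep $USER and $JANNOVARPATH literal; they are expanded when swarm runs each line.
  printf 'cd /data/$USER/%s; java -Xmx%sg -jar $JANNOVARPATH/jannovar-cli-0.12.jar annotate data/hg19_ucsc.ser examples/small.vcf\n' \
    "$d" "$mem_gb" >> "$swarmfile"
done

cat "$swarmfile"
```

With mem_gb=4, the generated lines are the same as the three command lines listed in the swarm file example.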
  

Submit the swarm file:

  $ swarm -f swarmfile -g 4 --module jannovar

-f: specifies the swarm command file
-g: memory (in GB) for each command line
--module: loads the required module for each command line in the file

To request more memory than the default (1.5 GB per command line), use the -g flag,
and at the same time change -Xmx4g in your swarm file to the matching value (here, -Xmx4g becomes -Xmx10g):

  $ swarm -f swarmfile -g 10 --module jannovar

For more information regarding running swarm, see swarm.html

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead allocate an interactive node as described below, and run the interactive job there.

biowulf$ sinteractive
salloc.exe: Granted job allocation 16535

cn999$ module load jannovar
cn999$ cd /data/$USER/dir
cn999$ java -Xmx4g -jar $JANNOVARPATH/jannovar-cli-0.12.jar annotate data/hg19_ucsc.ser examples/small.vcf
[...etc...]

cn999$ exit
exit

biowulf$

Make sure to exit the job once finished.

To request more memory than the default (4 GB), use the --mem flag,
and at the same time change -Xmx4g in your command to the matching value (here, -Xmx4g becomes -Xmx10g):

biowulf$ sinteractive --mem=10g

Documentation

http://charite.github.io/jannovar/