High-Performance Computing at the NIH
Gemini on Biowulf & Helix

GEMINI (GEnome MINIng) is a flexible framework for exploring genetic variation in the context of the wealth of genome annotations available for the human genome. By integrating genetic variants, sample genotypes, and genome annotations into a single database, GEMINI provides a simple, flexible, and powerful system for exploring genetic variation in disease and population genetics.

Gemini was developed in the Quinlan lab at the University of Virginia. See the Gemini website for details.
Gemini paper: Paila U, Chapman BA, Kirchner R, Quinlan AR (2013) GEMINI: Integrative Exploration of Genetic Variation and Genome Annotations. PLoS Comput Biol 9(7): e1003153. doi:10.1371/journal.pcbi.1003153

Gemini is installed in /usr/local/apps/gemini, and its associated data is in /fdb/gemini. The easiest way to add the Gemini executables to your PATH is to run 'module load gemini'.
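To confirm that the module actually put Gemini on your PATH, a small shell function can load the module and report which `gemini` executable will run. This is a minimal sketch: the function name `check_gemini` is hypothetical, and it assumes the `module` command (a shell function provided by Environment Modules/Lmod) is defined in your shell, as it normally is in a login shell on Helix and Biowulf.

```shell
#!/bin/bash
# Hypothetical helper: load the gemini module (if the module command exists)
# and report which gemini executable is now on PATH.
check_gemini() {
    # "module" is a shell function, not a binary, so test for it with "type".
    if type module >/dev/null 2>&1; then
        module load gemini
    fi
    if command -v gemini >/dev/null 2>&1; then
        echo "gemini found: $(command -v gemini)"
    else
        echo "gemini not found on PATH; try 'module load gemini'" >&2
        return 1
    fi
}
```

After pasting the function into your shell (or sourcing a file that contains it), run `check_gemini`; a non-zero return status means Gemini is still not on your PATH.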

Running Gemini on Helix

Below is a sample session that runs the test scripts provided with Gemini. Note: Python may print warnings labelled ImportWarning; these can be ignored.

helix% mkdir /data/$USER/gemini
helix% cd /data/$USER/gemini
helix% cp -r /fdb/gemini/gemini/test .
helix% cp /fdb/gemini/gemini/master-test.sh .
helix% module load gemini
helix% bash master-test.sh
Bgzipping test.query.vcf into test.query.vcf.gz.
Indexing test.query.vcf.gz with grabix.
Loading 879 variants.
Breaking test.query.vcf.gz into 2 chunks.
Loading chunk 0.
Loading chunk 1.
Done loading 879 variants in 1 chunks.
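Once a database has been loaded, it can be explored with `gemini query`, which takes an SQL statement with `-q`. The sketch below wraps one such query in a hypothetical function, `high_impact_variants`, that lists variants whose predicted impact is severe. It assumes the database was built from a VCF annotated with VEP or snpEff, so that the `variants` table has `gene`, `impact`, and `impact_severity` columns; the database file name is yours to supply.

```shell
#!/bin/bash
# Hypothetical helper: print high-impact variants from a GEMINI database.
# Assumes the VCF was annotated with VEP or snpEff before "gemini load",
# so the variants table contains gene, impact and impact_severity columns.
high_impact_variants() {
    local db="$1"   # path to a GEMINI database file created by "gemini load"
    gemini query -q "SELECT chrom, start, end, ref, alt, gene, impact
                     FROM variants
                     WHERE impact_severity = 'HIGH'" "$db"
}
```

For example, with the module loaded, `high_impact_variants my.db` (where `my.db` is a placeholder for your own database) prints one row per matching variant.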

Batch job on Biowulf

The following sample batch script copies and runs the test scripts provided with Gemini.

#!/bin/bash
#SBATCH --job-name="Gemini"
# this file is called gemini.bat

module load gemini
mkdir -p /data/$USER/gemini
cd /data/$USER/gemini
cp -r /fdb/gemini/gemini/test .
cp /fdb/gemini/gemini/master-test.sh .
bash master-test.sh

Submit this job with:

sbatch --mem=5g gemini.bat
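The test job runs on a single CPU. When loading your own VCF, `gemini load` can split the work into chunks processed in parallel with its `--cores` option. The hypothetical helper `write_gemini_load_job` below writes such a batch script. It is a sketch under stated assumptions: the VCF is taken to be VEP-annotated (hence `-t VEP`; use `-t snpEff` for snpEff output), the VCF is assumed to sit in /data/$USER/gemini, and the CPU and memory requests are placeholders to size for your data.

```shell
#!/bin/bash
# Hypothetical helper: write a Slurm batch script that loads one VCF into a
# new GEMINI database, using every CPU allocated to the job.
# Assumptions: VEP-annotated VCF (-t VEP) located in /data/$USER/gemini;
# the --cpus-per-task and --mem values are placeholders.
write_gemini_load_job() {
    local vcf="$1" db="$2" jobfile="$3"
    # Unquoted EOF: $vcf and $db are expanded now, while \$USER and
    # \$SLURM_CPUS_PER_TASK are kept literal for the batch job to expand.
    cat > "$jobfile" <<EOF
#!/bin/bash
#SBATCH --job-name=gemini_load
#SBATCH --cpus-per-task=8
#SBATCH --mem=10g

module load gemini
cd /data/\$USER/gemini
gemini load -v "$vcf" -t VEP --cores \$SLURM_CPUS_PER_TASK "$db"
EOF
}
```

For example, `write_gemini_load_job my.vcf.gz my.db gemini_load.bat` followed by `sbatch gemini_load.bat` submits the job. Options given on the sbatch command line override the matching #SBATCH lines in the script.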


For full documentation, see the Gemini Wiki.