High-Performance Computing at the NIH
GEMMA on Biowulf & Helix
GEMMA is software implementing the Genome-wide Efficient Mixed Model Association algorithm for a standard linear mixed model and some of its close relatives, for genome-wide association studies (GWAS).

It fits a univariate linear mixed model (LMM) for testing marker associations with a single phenotype while accounting for population stratification and sample structure, and for estimating the proportion of phenotypic variance explained (PVE) by typed genotypes (i.e. "chip heritability").

It fits a multivariate linear mixed model (mvLMM) for testing marker associations with multiple phenotypes simultaneously while controlling for population stratification, and for estimating genetic correlations among complex phenotypes.

It fits a Bayesian sparse linear mixed model (BSLMM) using Markov chain Monte Carlo (MCMC) for estimating PVE by typed genotypes, predicting phenotypes, and identifying associated markers by jointly modeling all markers while controlling for population structure.

It estimates variance components (chip heritability) and partitions them by SNP functional category. When individual-level data are available, it uses Haseman-Elston (HE) regression or the REML average information (AI) algorithm to estimate variance components; when only summary statistics are available, it uses MQS.
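
For reference, the univariate LMM that GEMMA fits for each marker can be summarized as below. The notation follows the GEMMA user manual as we understand it; treat this as a reading aid rather than the definitive specification.

```latex
\mathbf{y} = W\boldsymbol{\alpha} + \mathbf{x}\beta + \mathbf{u} + \boldsymbol{\epsilon},
\qquad
\mathbf{u} \sim \mathrm{MVN}_n\!\left(\mathbf{0},\ \lambda\tau^{-1}K\right),
\qquad
\boldsymbol{\epsilon} \sim \mathrm{MVN}_n\!\left(\mathbf{0},\ \tau^{-1}I_n\right),
\qquad
H_0:\ \beta = 0
```

Here y is the n-vector of phenotypes, W is an n x c matrix of covariates (including an intercept) with coefficients alpha, x holds the genotypes of the marker being tested and beta is its effect size, u is a random effect whose covariance is driven by the n x n relatedness matrix K (the matrix produced by -gk), epsilon is the residual error with variance tau^{-1}, and lambda is the ratio of the two variance components. The association test for each marker asks whether beta = 0.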

GEMMA was developed in the Zhou Lab at the University of Michigan; see the GEMMA website for details.

Batch job on Biowulf

Set up a batch script along the following lines:

#!/bin/bash
# this file is called gemma.sh

module load GEMMA

# generate a relatedness matrix ([num]: 1 = centered, 2 = standardized)
gemma -bfile [prefix] -gk [num] -o [prefix]

# generate the S matrix
gemma -bfile [prefix] -gs -o [prefix]
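
Because -bfile reads a PLINK binary fileset ([prefix].bed, [prefix].bim, [prefix].fam), a quick check before submitting can save a wasted job. The sketch below defines two hypothetical helper functions (they are not part of GEMMA or Biowulf). The second assumes GEMMA's default behavior of writing results to an output/ subdirectory of the working directory, naming the relatedness matrix [prefix].cXX.txt for -gk 1 and [prefix].sXX.txt for -gk 2; adjust it if your GEMMA version or options write elsewhere.

```shell
# Hypothetical pre-flight helpers (not part of GEMMA) for use before
# submitting gemma.sh.

# check_plink_prefix PREFIX
# Returns 0 only if PREFIX.bed, PREFIX.bim and PREFIX.fam all exist,
# i.e. the PLINK binary fileset that 'gemma -bfile PREFIX' reads.
check_plink_prefix() {
    local prefix=$1 ext missing=0
    for ext in bed bim fam; do
        if [ ! -f "${prefix}.${ext}" ]; then
            echo "missing: ${prefix}.${ext}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# kinship_file MODE OUT_PREFIX [OUTDIR]
# Prints where 'gemma ... -gk MODE -o OUT_PREFIX' is expected to write the
# relatedness matrix: .cXX.txt for MODE 1 (centered), .sXX.txt for
# MODE 2 (standardized). OUTDIR defaults to GEMMA's "output" directory.
kinship_file() {
    local mode=$1 out_prefix=$2 outdir=${3:-output}
    case "$mode" in
        1) echo "${outdir}/${out_prefix}.cXX.txt" ;;
        2) echo "${outdir}/${out_prefix}.sXX.txt" ;;
        *) echo "unsupported -gk mode: $mode" >&2; return 1 ;;
    esac
}
```

For example, `check_plink_prefix mydata && sbatch gemma.sh` submits only when the fileset is complete, and after the job `kinship_file 1 mydata` prints the path you would pass to -k in a later association run.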

Submit this job with

sbatch gemma.sh
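
gemma.sh runs its steps one after another inside a single job. When a later step consumes an earlier step's output (for example, an LMM run that reads the relatedness matrix written by -gk), one option is to submit the steps as separate batch scripts and let Slurm enforce the order with --dependency=afterok. The function below is a hypothetical sketch of that pattern; the script names in the usage line are placeholders. It relies on sbatch --parsable, which prints only the job ID (optionally followed by ";cluster").

```shell
# Hypothetical helper (not part of GEMMA or Biowulf): submit two batch
# scripts so the second starts only after the first exits successfully.
# Usage: submit_gemma_chain gemma_kinship.sh gemma_lmm.sh
submit_gemma_chain() {
    local first=$1 second=$2 first_id second_id
    # --parsable prints only "jobid" or "jobid;cluster"
    first_id=$(sbatch --parsable "$first") || return 1
    first_id=${first_id%%;*}   # drop an optional ";cluster" suffix
    # afterok: hold the second job until the first completes with exit code 0
    second_id=$(sbatch --parsable --dependency=afterok:"$first_id" "$second") || return 1
    # print both job IDs, e.g. for a later squeue or scancel
    echo "${first_id} ${second_id%%;*}"
}
```

If the first job fails, the dependent job stays pending with an unsatisfiable dependency and should be cancelled with scancel.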

Online documentation is available by typing 'gemma -h'; for example, 'gemma -h 1' prints the following summary of common commands:

[susanc@biowulf docs]$ gemma -h 1
 to generate a relatedness matrix:
         ./gemma -bfile [prefix] -gk [num] -o [prefix]
         ./gemma -g [filename] -p [filename] -gk [num] -o [prefix]
 to generate the S matrix:
         ./gemma -bfile [prefix] -gs -o [prefix]
         ./gemma -p [filename] -g [filename] -gs -o [prefix]
         ./gemma -bfile [prefix] -cat [filename] -gs -o [prefix]
         ./gemma -p [filename] -g [filename] -cat [filename] -gs -o [prefix]
         ./gemma -bfile [prefix] -sample [num] -gs -o [prefix]
         ./gemma -p [filename] -g [filename] -sample [num] -gs -o [prefix]
 to generate the q vector:
         ./gemma -beta [filename] -gq -o [prefix]
         ./gemma -beta [filename] -cat [filename] -gq -o [prefix]
 to generate the ldsc weights:
         ./gemma -beta [filename] -gw -o [prefix]
         ./gemma -beta [filename] -cat [filename] -gw -o [prefix]
 to perform eigen decomposition of the relatedness matrix:
         ./gemma -bfile [prefix] -k [filename] -eigen -o [prefix]
         ./gemma -g [filename] -p [filename] -k [filename] -eigen -o [prefix]
 to estimate variance components:
         ./gemma -bfile [prefix] -k [filename] -vc [num] -o [prefix]
         ./gemma -p [filename] -k [filename] -vc [num] -o [prefix]
         ./gemma -bfile [prefix] -mk [filename] -vc [num] -o [prefix]
         ./gemma -p [filename] -mk [filename] -vc [num] -o [prefix]
         ./gemma -beta [filename] -cor [filename] -vc [num] -o [prefix]
         ./gemma -beta [filename] -cor [filename] -cat [filename] -vc [num] -o [prefix]
         options for the above two commands: -crt -windowbp [num]
         ./gemma -mq [filename] -ms [filename] -mv [filename] -vc [num] -o [prefix]
         or with summary statistics, replace bfile with mbfile, or g or mg; vc=1 for HE weights and vc=2 for LDSC weights
         ./gemma -beta [filename] -bfile [filename] -cat [filename] -wsnp [filename] -wcat [filename] -vc [num] -o [prefix]
         ./gemma -beta [filename] -bfile [filename] -cat [filename] -wsnp [filename] -wcat [filename] -ci [num] -o [prefix]
 to fit a linear mixed model:
         ./gemma -bfile [prefix] -k [filename] -lmm [num] -o [prefix]
         ./gemma -g [filename] -p [filename] -a [filename] -k [filename] -lmm [num] -o [prefix]
 to fit a linear mixed model to test g by e effects:
         ./gemma -bfile [prefix] -gxe [filename] -k [filename] -lmm [num] -o [prefix]
         ./gemma -g [filename] -p [filename] -a [filename] -gxe [filename] -k [filename] -lmm [num] -o [prefix]
 to fit a univariate linear mixed model with different residual weights for different individuals:
         ./gemma -bfile [prefix] -weight [filename] -k [filename] -lmm [num] -o [prefix]
         ./gemma -g [filename] -p [filename] -a [filename] -weight [filename] -k [filename] -lmm [num] -o [prefix]
 to fit a multivariate linear mixed model:
         ./gemma -bfile [prefix] -k [filename] -lmm [num] -n [num1] [num2] -o [prefix]
         ./gemma -g [filename] -p [filename] -a [filename] -k [filename] -lmm [num] -n [num1] [num2] -o [prefix]
 to fit a Bayesian sparse linear mixed model:
         ./gemma -bfile [prefix] -bslmm [num] -o [prefix]
         ./gemma -g [filename] -p [filename] -a [filename] -bslmm [num] -o [prefix]
 to obtain predicted values:
         ./gemma -bfile [prefix] -epm [filename] -emu [filename] -ebv [filename] -k [filename] -predict [num] -o [prefix]
         ./gemma -g [filename] -p [filename] -epm [filename] -emu [filename] -ebv [filename] -k [filename] -predict [num] -o [prefix]
 to calculate correlations between SNPs:
         ./gemma -bfile [prefix] -calccor -o [prefix]
         ./gemma -g [filename] -p [filename] -calccor -o [prefix]
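
The "to fit a linear mixed model" recipe above needs a relatedness matrix first, so a typical univariate run chains two of the listed commands and then inspects the association results. The sketch below assumes a PLINK fileset with the phenotype in the sixth .fam column, GEMMA's default output/ directory, and a whitespace-delimited .assoc.txt whose header names the p-value columns (with -lmm 4, which requests the Wald, likelihood-ratio and score tests together, these should be p_wald, p_lrt and p_score; check the header of your own output). Both function names are hypothetical.

```shell
# Hypothetical helpers (not part of GEMMA); run 'module load GEMMA' first.

# gemma_lmm_workflow PREFIX OUT
#   step 1: -gk 1 writes a centered relatedness matrix to output/OUT.cXX.txt
#   step 2: -lmm 4 runs Wald, LRT and score tests, writing output/OUT.assoc.txt
gemma_lmm_workflow() {
    local prefix=$1 out=$2
    gemma -bfile "$prefix" -gk 1 -o "$out" || return 1
    gemma -bfile "$prefix" -k "output/${out}.cXX.txt" -lmm 4 -o "$out" || return 1
}

# top_hits ASSOC_FILE [N] [PCOL]
#   Prints the N data rows (default 10) with the smallest values in the
#   column whose header is PCOL (default p_wald), e.g. p_lrt or p_score.
top_hits() {
    local file=$1 n=${2:-10} pcol=${3:-p_wald}
    awk -v pcol="$pcol" '
        NR == 1 {                        # find the p-value column by name
            for (i = 1; i <= NF; i++) if ($i == pcol) c = i
            if (!c) { print "no column " pcol > "/dev/stderr"; exit 1 }
            next
        }
        { print $c "\t" $0 }             # prefix each row with its p-value
    ' "$file" | LC_ALL=C sort -k1,1g | head -n "$n" | cut -f2-
}
```

For example, `gemma_lmm_workflow mydata mydata_lmm && top_hits output/mydata_lmm.assoc.txt 20` would run both steps and list the 20 markers with the smallest Wald p-values. Sorting with -g (general numeric) handles the scientific notation that small p-values are usually written in.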