SAIGE is installed as a container with its own R environment on the Biowulf cluster. Please do not load an R module when running SAIGE. If you see conflicts or errors involving R, check the loaded modules with 'module list'.
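The check described above can be sketched in bash as follows. The helper name has_r_module is hypothetical, and the pattern assumes that 'module list' prints entries in the usual name/version form (for example R/x.y.z); adjust it if your output looks different. The block only inspects the module list and prints a warning; it does not unload anything.

```shell
#!/bin/bash
# Hypothetical helper: reads 'module list' output on stdin and succeeds
# if a module named exactly "R" (with or without a version) is present.
has_r_module() {
    grep -Eq '(^|[[:space:]])R(/|[[:space:]]|$)'
}

# Only run the check where the 'module' command exists (e.g. on the cluster).
# 'module list' usually writes to stderr, hence the 2>&1.
if command -v module >/dev/null 2>&1; then
    if module list 2>&1 | has_r_module; then
        echo "An R module is loaded; consider 'module unload R' before running SAIGE" >&2
    fi
fi
```

If the warning appears, running 'module unload R' (or 'module purge' followed by 'module load SAIGE') is one way to get back to a clean environment.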
SAIGE is an R package developed with Rcpp for genome-wide association tests in large-scale data sets and biobanks. The method:
- accounts for sample relatedness using generalized linear mixed models
- allows for model fitting with either full or sparse genetic relationship matrix (GRM)
- works for quantitative and binary traits
- handles case-control imbalance of binary traits
- is computationally efficient for large data sets
- performs single-variant association tests
- provides effect size estimation through Firth's Bias-Reduced Logistic Regression
- performs conditional association analysis
SAIGE-GENE (now known as SAIGE-GENE+) is a method extension in the R package for set-based tests of rare variants. The extension:
- performs BURDEN, SKAT, and SKAT-O tests
- allows for tests on multiple minor allele frequencies cutoffs and functional annotations
- allows for specifying weights for markers in the set-based tests
- performs conditional analysis to identify associations independent of nearby GWAS signals
The package takes genotype file input in the following formats:
- PLINK (bed, bim, fam), BGEN, VCF, BCF, SAV
References:
- Zhou W et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet. 2018 Sep;50(9):1335-1341.
- Zhou W et al. Scalable generalized linear mixed model for region-based association tests in large biobanks and cohorts. Nat Genet. 2020;52(6):634-639.
- Zhou W et al. SAIGE-GENE+ improves the efficiency and accuracy of set-based rare variant association tests. Nat Genet. 2022;54(10):1466-1469.
- Module Name: SAIGE (see the modules page for more information)
step1_fitNULLGLMM.R --help
- Environment variables set
- $SAIGE_TEST_DATA
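Because $SAIGE_TEST_DATA is only defined once the module is loaded, it can help to verify it before copying the test data. The following is a minimal sketch; the function name check_test_data is hypothetical, and the extdata subdirectory is taken from the sample session on this page.

```shell
#!/bin/bash
# Hypothetical helper: prints the path to the SAIGE test data directory,
# or returns non-zero with a message if the variable or directory is missing.
check_test_data() {
    if [ -z "${SAIGE_TEST_DATA:-}" ]; then
        echo "SAIGE_TEST_DATA is not set; run 'module load SAIGE' first" >&2
        return 1
    fi
    if [ ! -d "${SAIGE_TEST_DATA}/extdata" ]; then
        echo "No extdata directory under ${SAIGE_TEST_DATA}" >&2
        return 1
    fi
    echo "${SAIGE_TEST_DATA}/extdata"
}

# Example use after 'module load SAIGE':
#   src=$(check_test_data) && cp -r "$src" .
```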
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive --cpus-per-task=6 --mem=4G
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load SAIGE
[user@cn3144 ~]$ cp -r ${SAIGE_TEST_DATA:-none}/extdata .
[user@cn3144 ~]$ cd extdata
[user@cn3144 ~]$ step1_fitNULLGLMM.R \
    --plinkFile=./input/nfam_100_nindep_0_step1_includeMoreRareVariants_poly \
    --phenoFile=./input/pheno_1000samples.txt_withdosages_withBothTraitTypes.txt \
    --phenoCol=y_binary \
    --covarColList=a9 \
    --sampleIDColinphenoFile=IID \
    --traitType=binary \
    --outputPrefix=./output/example_binary_includenonAutoforvarRatio \
    --nThreads=4 \
    --LOCO=FALSE \
    --relatednessCutoff=0.0 \
    --FemaleCode=2 \
    --MaleCode=1 \
    --IsOverwriteVarianceRatioFile=TRUE

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Create a batch input file (e.g. SAIGE.sh). For example:
#!/bin/bash
#SBATCH --cpus-per-task=6
#SBATCH --mem=4G
#SBATCH --time=2:00:00
#SBATCH --partition=norm

set -e
module load SAIGE
step1_fitNULLGLMM.R \
    --plinkFile=./input/nfam_100_nindep_0_step1_includeMoreRareVariants_poly \
    --phenoFile=./input/pheno_1000samples.txt_withdosages_withBothTraitTypes.txt \
    --phenoCol=y_binary \
    --covarColList=a9 \
    --sampleIDColinphenoFile=IID \
    --traitType=binary \
    --outputPrefix=./output/example_binary_includenonAutoforvarRatio \
    --nThreads=4 \
    --LOCO=FALSE \
    --relatednessCutoff=0.0 \
    --FemaleCode=2 \
    --MaleCode=1 \
    --IsOverwriteVarianceRatioFile=TRUE
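The batch example requests 6 CPUs and passes a fixed --nThreads=4. If you would rather have the thread count follow the allocation, one option is to derive it from SLURM_CPUS_PER_TASK, which Slurm sets when --cpus-per-task is given. This is a sketch only; the helper name saige_threads is hypothetical, and whether to use every allocated CPU is a judgment call for your job.

```shell
#!/bin/bash
# Hypothetical helper: prints the number of threads to pass to --nThreads.
# Uses the CPU count Slurm allocated to the task, falling back to 1
# when run outside a Slurm job.
saige_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# In the batch script, the fixed value could then be replaced with:
#   --nThreads="$(saige_threads)" \
echo "Would run with --nThreads=$(saige_threads)"
```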
Submit the job:
sbatch SAIGE.sh
Create a swarmfile (e.g. job.swarm). For example:
cd dir1; step1_fitNULLGLMM.R --help
cd dir2; step1_fitNULLGLMM.R --help
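For more than a handful of directories, the swarmfile can be generated with a short loop rather than typed by hand. The sketch below reproduces the two-line example above; the directory names dir1 and dir2 come from that example, and the command on each line is a placeholder for your real SAIGE invocation.

```shell
#!/bin/bash
# Start with an empty swarmfile, then append one command line per directory.
: > job.swarm
for d in dir1 dir2; do
    echo "cd ${d}; step1_fitNULLGLMM.R --help" >> job.swarm
done

# Show the result so it can be checked before submitting.
cat job.swarm
```

Each line of job.swarm becomes one subjob, so it is worth inspecting the file before running swarm.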
Submit this job using the swarm command.
swarm -f job.swarm [-g #] --module SAIGE
where
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
--module | Loads the module for each subjob in the swarm |