SAIGE is installed as a container with its own R environment on the Biowulf cluster. Please do not load an R module when running SAIGE. If you see conflicts or errors involving R, check the loaded modules with 'module list'.
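The check described above can be scripted. The following is a minimal sketch, not an official Biowulf tool: the helper name has_r_module is hypothetical, and it assumes that loaded modules appear in the 'module list' output as name/version entries such as R/<version>.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the text on stdin contains a module
# entry beginning with "R/" (assumed name/version listing format).
has_r_module() {
  grep -Eq '(^|[[:space:]])R/'
}

# Only run the check where the module command is available.
# 'module list' prints to stderr, hence the 2>&1 redirect.
if command -v module >/dev/null 2>&1; then
  if module list 2>&1 | has_r_module; then
    echo "An R module is loaded; consider 'module unload R' before running SAIGE." >&2
  fi
fi
```

If the warning appears, unloading the R module and then loading SAIGE again should leave only the container's own R environment in use.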
SAIGE is an R package developed with Rcpp for genome-wide association tests in large-scale data sets and biobanks. The method:
SAIGE-GENE (now known as SAIGE-GENE+) is a method extension in the R package for set-based tests of rare variants.
The package takes genotype file input in the following formats
step1_fitNULLGLMM.R --help
Allocate an interactive session and run the program.
Sample session (user input follows the $ prompt):
[user@biowulf]$ sinteractive --cpus-per-task=6 --mem=4G
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load SAIGE
[user@cn3144 ~]$ cp -r ${SAIGE_TEST_DATA:-none}/extdata .
[user@cn3144 ~]$ cd extdata
[user@cn3144 ~]$ step1_fitNULLGLMM.R \
    --plinkFile=./input/nfam_100_nindep_0_step1_includeMoreRareVariants_poly \
    --phenoFile=./input/pheno_1000samples.txt_withdosages_withBothTraitTypes.txt \
    --phenoCol=y_binary \
    --covarColList=a9 \
    --sampleIDColinphenoFile=IID \
    --traitType=binary \
    --outputPrefix=./output/example_binary_includenonAutoforvarRatio \
    --nThreads=4 \
    --LOCO=FALSE \
    --relatednessCutoff=0.0 \
    --FemaleCode=2 \
    --MaleCode=1 \
    --IsOverwriteVarianceRatioFile=TRUE
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Create a batch input file (e.g. SAIGE.sh). For example:
#!/bin/bash
#SBATCH --cpus-per-task=6
#SBATCH --mem=4G
#SBATCH --time=2:00:00
#SBATCH --partition=norm
set -e
module load SAIGE
step1_fitNULLGLMM.R \
    --plinkFile=./input/nfam_100_nindep_0_step1_includeMoreRareVariants_poly \
    --phenoFile=./input/pheno_1000samples.txt_withdosages_withBothTraitTypes.txt \
    --phenoCol=y_binary \
    --covarColList=a9 \
    --sampleIDColinphenoFile=IID \
    --traitType=binary \
    --outputPrefix=./output/example_binary_includenonAutoforvarRatio \
    --nThreads=4 \
    --LOCO=FALSE \
    --relatednessCutoff=0.0 \
    --FemaleCode=2 \
    --MaleCode=1 \
    --IsOverwriteVarianceRatioFile=TRUE
Submit the job:
sbatch SAIGE.sh
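The batch script above sets --cpus-per-task and --nThreads separately. One possible way to keep the two in sync is to derive the thread count from the SLURM_CPUS_PER_TASK environment variable, which Slurm sets inside a job when --cpus-per-task is given. This is a sketch only: the helper name saige_threads is hypothetical, and the fallback of 4 simply mirrors the value used in the example.

```shell
#!/bin/bash
# Hypothetical helper: print the CPU count Slurm allocated to this task,
# or 4 (the value from the example script) when run outside a Slurm job.
saige_threads() {
  echo "${SLURM_CPUS_PER_TASK:-4}"
}

# Build the option string that would replace the fixed --nThreads=4 line.
echo "--nThreads=$(saige_threads)"
```

In SAIGE.sh, the fixed --nThreads=4 line could then be written as --nThreads=$(saige_threads), assuming the function is defined earlier in the same script.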
Create a swarmfile (e.g. job.swarm). For example:
cd dir1; step1_fitNULLGLMM.R --help
cd dir2; step1_fitNULLGLMM.R --help
Submit this job using the swarm command.
swarm -f job.swarm [-g #] --module SAIGE
where
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
--module | Loads the module for each subjob in the swarm |
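For more than a few directories, the swarmfile can be generated rather than typed by hand. The sketch below writes one line per directory in the same form as the example swarmfile; the function name make_swarm is hypothetical, and dir1 and dir2 are the placeholder directory names from that example.

```shell
#!/bin/bash
# Hypothetical helper: print one swarm command line per directory argument.
# Each line changes into the directory and runs the same command as the
# example swarmfile.
make_swarm() {
  for d in "$@"; do
    printf 'cd %s; step1_fitNULLGLMM.R --help\n' "$d"
  done
}

# Write the swarmfile for the two placeholder directories.
make_swarm dir1 dir2 > job.swarm
```

The resulting job.swarm can be submitted with the swarm command shown above. In a real analysis the --help call would be replaced by the full step1_fitNULLGLMM.R command line for each directory.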