regenie: whole genome regression modelling of large genome-wide association studies
regenie is a C++ program for whole genome regression modelling
of large genome-wide association studies. It is developed and supported
by a team of scientists at the Regeneron Genetics Center.
regenie employs the BGEN library.
Important Notes
- Module Name: regenie (see the modules page for more information)
- Unusual environment variables set
  - REGENIE_HOME: installation directory
  - REGENIE_BIN: executable directory
  - REGENIE_SRC: source code directory
  - REGENIE_DATA: sample data and checkpoints directory
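The variables listed above can be checked from the shell once the module has been loaded. The following is a minimal sketch: the function name report_regenie_env is hypothetical (it is not provided by the regenie module), and it only assumes that each variable should point at an existing directory.

```shell
# report_regenie_env: hypothetical helper, not part of the regenie module.
# Prints each REGENIE_* variable and flags any that is unset or that
# does not point at an existing directory.
report_regenie_env() {
    local name value rc=0
    for name in REGENIE_HOME REGENIE_BIN REGENIE_SRC REGENIE_DATA; do
        # Indirect lookup: read the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set (has the module been loaded?)"
            rc=1
        elif [ -d "$value" ]; then
            echo "$name=$value"
        else
            echo "$name=$value (directory not found)"
            rc=1
        fi
    done
    # Non-zero return status if any variable was missing or invalid.
    return "$rc"
}
```

Calling report_regenie_env after the module load command should list all four directories; a non-zero return status suggests the module is not loaded in the current shell.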
Interactive job
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive
[user@cn3101 ~]$ module load regenie/3.0.3
[+] Loading singularity 3.10.0 on cn3063
[+] Loading regenie 3.0.3

The available executables are:
[user@cn3101]$ ls $REGENIE_BIN
bgenix  cat-bgen  edit-bgen  regenie  zstd

In particular, the command line options of the executable regenie are as follows:
[user@cn3101]$ regenie --help

              |============================|
              |      REGENIE v3.0.3        |
              |============================|

Copyright (c) 2020-2022 Joelle Mbatchou, Andrey Ziyatdinov and Jonathan Marchini.
Distributed under the MIT License.

Usage:
  /regenie/regenie [OPTION...]

  -h, --help      print list of available options
      --helpFull  print list of all available options

 Main options:
      --step INT     specify if fitting null model (=1) or association testing (=2)
      --bed PREFIX   prefix to PLINK .bed/.bim/.fam files
      --pgen PREFIX  prefix to PLINK2 .pgen/.pvar/.psam files
      --bgen FILE    BGEN file
      --sample FILE  sample file corresponding to BGEN file
      --ref-first    use the first allele as the reference for ...

To try regenie on the sample data, first copy the data to the current folder:
[user@cn3101]$ cp $REGENIE_DATA/* .

A sample command to run regenie step 1 (fitting the null model):
[user@cn3101]$ regenie \
    --step 1 \
    --bgen example.bgen \
    --out my_output \
    --bsize 200 \
    --phenoFile phenotype_bin.txt
Start time: Tue Aug 16 13:24:00 2022

              |============================|
              |      REGENIE v3.0.3        |
              |============================|

Copyright (c) 2020-2022 Joelle Mbatchou, Andrey Ziyatdinov and Jonathan Marchini.
Distributed under the MIT License.

Log of output saved in file : my_output.log

Options in effect:
  --bgen example.bgen \
  --out my_output \
  --step 1 \
  --bsize 200 \
  --phenoFile phenotype_bin.txt

Fitting null model
 * bgen : [example.bgen]
   -summary : bgen file (v1.2 layout, zlib compressed) with 500 named samples and 1000 variants with 8-bit encoding.
   -index bgi file [example.bgen.bgi]
 * phenotypes : [phenotype_bin.txt] n_pheno = 2
   -keeping and mean-imputing missing observations (done for each trait)
   -number of phenotyped individuals = 500
 * number of individuals used in analysis = 500
   -residualizing and scaling phenotypes...done (0ms)
 * # threads : [55]
 * block size : [200]
 * # blocks : [5] for 1000 variants
 * # CV folds : [5]
 * ridge data_l0 : [5 : 0.01 0.25 0.5 0.75 0.99 ]
 * ridge data_l1 : [5 : 0.01 0.25 0.5 0.75 0.99 ]
 * approximate memory usage : 2MB
 * setting memory...done

Chromosome 1
 block [1] : 200 snps (4ms)
   -residualizing and scaling genotypes...done (3ms)
   -calc working matrices...done (420ms)
   -calc level 0 ridge...done (79ms)
 block [2] : 200 snps (2ms)
   -residualizing and scaling genotypes...done (1ms)
   -calc working matrices...done (439ms)
   -calc level 0 ridge...done (79ms)
 block [3] : 200 snps (2ms)
   -residualizing and scaling genotypes...done (1ms)
   -calc working matrices...done (483ms)
   -calc level 0 ridge...done (81ms)
 block [4] : 200 snps (3ms)
   -residualizing and scaling genotypes...done (1ms)
   -calc working matrices...done (366ms)
   -calc level 0 ridge...done (78ms)
 block [5] : 200 snps (2ms)
   -residualizing and scaling genotypes...done (1ms)
   -calc working matrices...done (485ms)
   -calc level 0 ridge...done (78ms)

 Level 1 ridge...
   -on phenotype 1 (Y1)...done (0ms)
   -on phenotype 2 (Y2)...done (0ms)

Output
------
phenotype 1 (Y1) :
  0.01 : Rsq = 0.00292408, MSE = 0.995083<- min value
  0.25 : Rsq = 0.00619743, MSE = 0.998022
  0.5 : Rsq = 0.00679147, MSE = 1.00153
  0.75 : Rsq = 0.00753375, MSE = 1.00367
  0.99 : Rsq = 0.00733694, MSE = 1.01373
  * making predictions...writing LOCO predictions...done (9ms)

phenotype 2 (Y2) :
  0.01 : Rsq = 0.012437, MSE = 0.98745<- min value
  0.25 : Rsq = 0.00739346, MSE = 0.997094
  0.5 : Rsq = 0.00612812, MSE = 1.00169
  0.75 : Rsq = 0.00621549, MSE = 1.00343
  0.99 : Rsq = 0.0082828, MSE = 1.00621
  * making predictions...writing LOCO predictions...done (9ms)

List of blup files written to: [my_output_pred.list]

Elapsed time : 2.66076s
End time: Tue Aug 16 13:24:02 2022

Another sample command, which runs step 2 (association testing) using the predictions written by step 1:
[user@cn3101]$ regenie \
    --bgen example.bgen \
    --step 2 \
    --bsize 200 \
    --threads 1 \
    --covarFile covariates.txt \
    --phenoFile phenotype_bin_wNA.txt \
    --bt --firth --approx \
    --pred my_output_pred.list \
    --out my_output_step2.txt

Association testing mode with fast multithreading using OpenMP
 * bgen : [example.bgen]
   -summary : bgen file (v1.2 layout, zlib compressed) with 500 named samples and 1000 variants with 8-bit encoding.
   -index bgi file [example.bgen.bgi]
 * phenotypes : [phenotype_bin_wNA.txt] n_pheno = 2
   -number of phenotyped individuals = 500
 * covariates : [covariates.txt] n_cov = 3
   -number of individuals with covariate data = 500
 * number of individuals used in analysis = 500
 * case-control counts for each trait:
   - 'Y1': 111 cases and 339 controls
   - 'Y2': 115 cases and 385 controls
 * LOCO predictions : [my_output_pred.list]
   -file [/vf/users/denisovga/regenie/test/my_output_1.loco] for phenotype 'Y1'
   -file [/vf/users/denisovga/regenie/test/my_output_2.loco] for phenotype 'Y2'
 * # threads : [1]
 * block size : [200]
 * # blocks : [5]
 * approximate memory usage : 2MB
 * using minimum MAC of 5 (variants with lower MAC are ignored)
 * using fast Firth correction for logistic regression p-values less than 0.05

Chromosome 1 [5 blocks in total]
   -reading loco predictions for the chromosome...done (0ms)
   -fitting null logistic regression on binary phenotypes...done (1ms)
   -fitting null Firth logistic regression on binary phenotypes...done (0ms)
 block [1/5] : done (10ms)
 block [2/5] : done (8ms)
 block [3/5] : done (8ms)
 block [4/5] : done (7ms)
 block [5/5] : done (8ms)

Association results stored separately for each trait in files :
 * [my_output_step2.txt_Y1.regenie]
 * [my_output_step2.txt_Y2.regenie]

Number of tests with Firth correction : 108
Number of failed tests : (0/108)
Number of ignored tests due to low MAC : 0

Elapsed time : 0.086111s
End time: Mon Dec 16 15:21:50 2024

End the interactive session:
[user@cn3101 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
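
Step 2 finds the step 1 predictions through the list file passed to --pred (my_output_pred.list in the session above), and the step 2 log shows that the LOCO files are referenced by absolute path. If the output directory is moved or cleaned up between the two steps, those paths may no longer resolve. The sketch below checks a list file before step 2 is launched. The function name check_pred_list is hypothetical, and the code assumes that each line of the list holds a phenotype name followed by the path to its .loco file; inspect your own list file to confirm this layout.

```shell
# check_pred_list FILE: hypothetical helper, not part of regenie.
# Assumption: each non-empty line of FILE is "<phenotype> <path to .loco file>".
# Returns 0 if every listed file exists and is non-empty, 1 if any is
# missing or empty, and 2 if the list itself cannot be read.
check_pred_list() {
    local list="$1" missing=0 pheno path
    if [ ! -r "$list" ]; then
        echo "cannot read prediction list: $list" >&2
        return 2
    fi
    # The "|| [ -n ... ]" clause also handles a last line without a trailing newline.
    while read -r pheno path || [ -n "$pheno" ]; do
        [ -n "$pheno" ] || continue          # skip blank lines
        if [ -s "$path" ]; then
            echo "ok       $pheno  $path"
        else
            echo "MISSING  $pheno  $path"
            missing=1
        fi
    done < "$list"
    return "$missing"
}
```

For example, running check_pred_list my_output_pred.list in the working directory before the step 2 command reports one line per phenotype, and a non-zero status indicates that step 1 should be rerun or the paths in the list corrected.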