TensorQTL: a GPU-enabled, ultrafast QTL mapper
TensorQTL leverages general-purpose libraries and graphics processing units (GPUs) to achieve high computational efficiency at low cost. Using PyTorch or TensorFlow, it allows > 200-fold decreases in runtime and ~ 5–10-fold reductions in cost when running on GPUs relative to CPUs.
References:
- Amaro Taylor-Weiner, François Aguet, Nicholas J. Haradhvala, Sager Gosai, Shankara Anand, Jaegil Kim, Kristin Ardlie, Eliezer M. Van Allen and Gad Getz
Scaling computational genomics to millions of individuals with GPUs.
Genome Biology (2019) 20:228.
Documentation
Important Notes
- Module Name: TensorQTL (see the modules page for more information)
- Single-threaded
- Unusual environment variables set
- TQTL_HOME installation directory
- TQTL_BIN executable directory
- TQTL_DATA test data directory
- Example files in $TQTL_EXAMPLE
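After `module load TensorQTL`, a short loop can confirm that the variables listed above are defined. This is a minimal sketch, assuming a POSIX-compatible shell; the variable names are taken from the notes above, and the helper name `report_tqtl_env` is hypothetical.

```shell
#!/bin/sh
# Report whether each TensorQTL module variable is set.
# Variable names come from the notes above; the function name is hypothetical.
report_tqtl_env() {
    for name in TQTL_HOME TQTL_BIN TQTL_DATA TQTL_EXAMPLE; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$name" "$value"
        else
            printf '%s is not set (is the module loaded?)\n' "$name"
        fi
    done
}

report_tqtl_env
```

Any variable reported as not set suggests the module was not loaded in the current shell.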
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=24g --gres=gpu:p100:1,lscratch:50 -c8
[user@cn4199 ~]$ module load TensorQTL
[+] Loading singularity 3.8.5-1
[+] Loading cuDNN/7.6.5/CUDA-10.2 libraries...
[+] Loading CUDA Toolkit 10.2.89 ...
[+] Loading TensorQTL 1.0.7 ...
Usage:
[user@cn4199 ~]$ tensorqtl -h
usage: tensorqtl [-h] [--mode {cis,cis_nominal,cis_independent,cis_susie,trans,trans_susie}]
                 [--covariates COVARIATES] [--paired_covariate PAIRED_COVARIATE]
                 [--permutations PERMUTATIONS] [--interaction INTERACTION]
                 [--cis_output CIS_OUTPUT] [--phenotype_groups PHENOTYPE_GROUPS]
                 [--window WINDOW] [--pval_threshold PVAL_THRESHOLD]
                 [--maf_threshold MAF_THRESHOLD]
                 [--maf_threshold_interaction MAF_THRESHOLD_INTERACTION]
                 [--dosages] [--return_dense] [--return_r2] [--best_only]
                 [--output_text] [--batch_size BATCH_SIZE] [--chunk_size CHUNK_SIZE]
                 [--susie_loci SUSIE_LOCI] [--disable_beta_approx]
                 [--warn_monomorphic] [--max_effects MAX_EFFECTS] [--fdr FDR]
                 [--qvalue_lambda QVALUE_LAMBDA] [--seed SEED] [-o OUTPUT_DIR]
                 genotype_path phenotypes prefix

tensorQTL: GPU-based QTL mapper

positional arguments:
  genotype_path         Genotypes in PLINK format
  phenotypes            Phenotypes in BED format (.bed, .bed.gz, .bed.parquet),
                        or optionally for 'trans' mode, parquet or tab-delimited.
  prefix                Prefix for output file names

options:
  -h, --help            show this help message and exit
  --mode {cis,cis_nominal,cis_independent,cis_susie,trans,trans_susie}
                        Mapping mode. Default: cis
  --covariates COVARIATES
                        Covariates file, tab-delimited, covariates x samples
  --paired_covariate PAIRED_COVARIATE
                        Single phenotype-specific covariate. Tab-delimited file,
                        phenotypes x samples
  --permutations PERMUTATIONS
                        Number of permutations. Default: 10000
  --interaction INTERACTION
                        Interaction term(s)
  --cis_output CIS_OUTPUT
                        Output from 'cis' mode with q-values. Required for
                        independent cis-QTL mapping.
  --phenotype_groups PHENOTYPE_GROUPS
                        Phenotype groups. Header-less TSV with two columns:
                        phenotype_id, group_id
  --window WINDOW       Cis-window size, in bases. Default: 1000000.
  --pval_threshold PVAL_THRESHOLD
                        Output only significant phenotype-variant pairs with a
                        p-value below threshold. Default: 1e-5 for trans-QTL
  --maf_threshold MAF_THRESHOLD
                        Include only genotypes with minor allele frequency >=
                        maf_threshold. Default: 0
  --maf_threshold_interaction MAF_THRESHOLD_INTERACTION
                        MAF threshold for interactions, applied to lower and
                        upper half of samples
  --dosages             Load dosages instead of genotypes (only applies to
                        PLINK2 bgen input).
  --return_dense        Return dense output for trans-QTL.
  --return_r2           Return r2 (only for sparse trans-QTL output)
  --best_only           Only write lead association for each phenotype
                        (interaction mode only)
  --output_text         Write output in txt.gz format instead of parquet
                        (trans-QTL mode only)
  --batch_size BATCH_SIZE
                        GPU batch size (trans-QTLs only). Reduce this if
                        encountering OOM errors.
  --chunk_size CHUNK_SIZE
                        For cis-QTL mapping, load genotypes into CPU memory in
                        chunks of chunk_size variants, or by chromosome if
                        chunk_size is 'chr'.
  --susie_loci SUSIE_LOCI
                        Table (parquet or tsv) with loci to fine-map
                        (phenotype_id, chr, pos) with mode 'trans_susie'.
  --disable_beta_approx
                        Disable Beta-distribution approximation of empirical
                        p-values (not recommended).
  --warn_monomorphic    Warn if monomorphic variants are found.
  --max_effects MAX_EFFECTS
                        Maximum number of non-zero effects in the SuSiE
                        regression model.
  --fdr FDR             FDR for cis-QTLs
  --qvalue_lambda QVALUE_LAMBDA
                        lambda parameter for pi0est in qvalue.
  --seed SEED           Seed for permutations.
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Output directory
Running the test example:
[user@cn4199 ~]$ git clone https://github.com/broadinstitute/tensorqtl
[user@cn4199 ~]$ cd tensorqtl/example
[user@cn4199 ~]$ module load jupyter
[user@cn4199 ~]$ jupyter nbconvert --to script tensorqtl_examples.ipynb
[user@cn4199 ~]$ python-tqtl tensorqtl_examples.py &
[user@cn4199 ~]$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  | 00000000:46:00.0 Off |                    0 |
| N/A   36C    P0             133W / 400W |   2492MiB / 81920MiB |     25%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   1733667      C   /usr/bin/python3                           2478MiB |
+---------------------------------------------------------------------------------------+
torch: 2.1.2+cu121 (CUDA 12.1), device: cuda
pandas 2.1.4
cis-QTL mapping: nominal associations for all variant-phenotype pairs
  * 445 samples
  * 301 phenotypes
  * 26 covariates
  * 367759 variants
  * cis-window: ±1,000,000
  * checking phenotypes: 301/301
  * Computing associations
    Mapping chromosome chr18
    processing phenotype 301/301
    time elapsed: 0.02 min
    * writing output
done.
cis-QTL mapping: empirical p-values for phenotypes
  * 445 samples
  * 301 phenotypes
  * 26 covariates
  * 367759 variants
  * cis-window: ±1,000,000
  * using seed 123456
  * checking phenotypes: 301/301
  * computing permutations
    processing phenotype 301/301
  Time elapsed: 0.19 min
done.
Computing q-values
  * Number of phenotypes tested: 301
  * Correlation between Beta-approximated and empirical p-values: 1.0000
  * Calculating q-values with lambda = 0.850
  * Proportion of significant phenotypes (1-pi0): 0.76
  * QTL phenotypes @ FDR 0.05: 205
  * min p-value threshold @ FDR 0.05: 0.135284
trans-QTL mapping
  * 445 samples
  * 19836 phenotypes
  * 26 covariates
  * 367759 variants
    processing batch 37/37
    elapsed time: 0.01 min
  * 210838 variants passed MAF >= 0.05 filtering
done.
[user@cn4199 ~]$ exit
salloc.exe: Relinquishing job allocation 59748321
[user@biowulf ~]$
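The example above drives the mapping through the converted notebook script. The same three modes that appear in its output (cis_nominal, cis, trans) can presumably also be launched with the `tensorqtl` command, using the positional arguments and options from the usage text. The following is a minimal sketch under stated assumptions: every file and directory name is a hypothetical placeholder, the helper `run_or_print` is not part of TensorQTL, and the script only prints the commands unless `RUN=1` is set.

```shell
#!/bin/sh
# Hypothetical input names (not from the original page); replace with real files.
genotype_path="my_genotypes"          # genotypes in PLINK format
phenotypes="my_phenotypes.bed.gz"     # phenotypes in BED format
covariates="my_covariates.txt"        # tab-delimited, covariates x samples
prefix="my_study"                     # prefix for output file names
outdir="tensorqtl_out"                # passed to -o / --output_dir

# Print the command by default; execute it only when RUN=1 is set.
run_or_print() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# One invocation per mode seen in the example output above.
for mode in cis_nominal cis trans; do
    run_or_print tensorqtl "$genotype_path" "$phenotypes" "$prefix" \
        --covariates "$covariates" --mode "$mode" -o "$outdir"
done
```

With `RUN` unset, the script prints three command lines for inspection. Setting `RUN=1` inside a session where the TensorQTL module is loaded would execute them instead.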