TensorQTL: a GPU-enabled, ultrafast QTL mapper

TensorQTL leverages general-purpose libraries and graphics processing units (GPUs) to achieve high computational efficiency at low cost. Using PyTorch or TensorFlow, it allows a more than 200-fold decrease in runtime and a roughly 5–10-fold reduction in cost when running on GPUs relative to CPUs.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive  --mem=24g --gres=gpu:p100:1,lscratch:50 -c8
[user@cn4199 ~]$ module load TensorQTL
[+] Loading singularity  3.8.5-1 
[+] Loading cuDNN/7.6.5/CUDA-10.2 libraries...
[+] Loading CUDA Toolkit  10.2.89  ...
[+] Loading TensorQTL 1.0.7  ...
Usage:
[user@cn4199 ~]$ tensorqtl -h
usage: tensorqtl [-h] [--mode {cis,cis_nominal,cis_independent,cis_susie,trans,trans_susie}] [--covariates COVARIATES]
                 [--paired_covariate PAIRED_COVARIATE] [--permutations PERMUTATIONS] [--interaction INTERACTION]
                 [--cis_output CIS_OUTPUT] [--phenotype_groups PHENOTYPE_GROUPS] [--window WINDOW]
                 [--pval_threshold PVAL_THRESHOLD] [--maf_threshold MAF_THRESHOLD]
                 [--maf_threshold_interaction MAF_THRESHOLD_INTERACTION] [--dosages] [--return_dense] [--return_r2]
                 [--best_only] [--output_text] [--batch_size BATCH_SIZE] [--chunk_size CHUNK_SIZE]
                 [--susie_loci SUSIE_LOCI] [--disable_beta_approx] [--warn_monomorphic] [--max_effects MAX_EFFECTS]
                 [--fdr FDR] [--qvalue_lambda QVALUE_LAMBDA] [--seed SEED] [-o OUTPUT_DIR]
                 genotype_path phenotypes prefix

tensorQTL: GPU-based QTL mapper

positional arguments:
  genotype_path         Genotypes in PLINK format
  phenotypes            Phenotypes in BED format (.bed, .bed.gz, .bed.parquet), or optionally for 'trans' mode, parquet or
                        tab-delimited.
  prefix                Prefix for output file names

options:
  -h, --help            show this help message and exit
  --mode {cis,cis_nominal,cis_independent,cis_susie,trans,trans_susie}
                        Mapping mode. Default: cis
  --covariates COVARIATES
                        Covariates file, tab-delimited, covariates x samples
  --paired_covariate PAIRED_COVARIATE
                        Single phenotype-specific covariate. Tab-delimited file, phenotypes x samples
  --permutations PERMUTATIONS
                        Number of permutations. Default: 10000
  --interaction INTERACTION
                        Interaction term(s)
  --cis_output CIS_OUTPUT
                        Output from 'cis' mode with q-values. Required for independent cis-QTL mapping.
  --phenotype_groups PHENOTYPE_GROUPS
                        Phenotype groups. Header-less TSV with two columns: phenotype_id, group_id
  --window WINDOW       Cis-window size, in bases. Default: 1000000.
  --pval_threshold PVAL_THRESHOLD
                        Output only significant phenotype-variant pairs with a p-value below threshold. Default: 1e-5 for
                        trans-QTL
  --maf_threshold MAF_THRESHOLD
                        Include only genotypes with minor allele frequency >= maf_threshold. Default: 0
  --maf_threshold_interaction MAF_THRESHOLD_INTERACTION
                        MAF threshold for interactions, applied to lower and upper half of samples
  --dosages             Load dosages instead of genotypes (only applies to PLINK2 bgen input).
  --return_dense        Return dense output for trans-QTL.
  --return_r2           Return r2 (only for sparse trans-QTL output)
  --best_only           Only write lead association for each phenotype (interaction mode only)
  --output_text         Write output in txt.gz format instead of parquet (trans-QTL mode only)
  --batch_size BATCH_SIZE
                        GPU batch size (trans-QTLs only). Reduce this if encountering OOM errors.
  --chunk_size CHUNK_SIZE
                        For cis-QTL mapping, load genotypes into CPU memory in chunks of chunk_size variants, or by
                        chromosome if chunk_size is 'chr'.
  --susie_loci SUSIE_LOCI
                        Table (parquet or tsv) with loci to fine-map (phenotype_id, chr, pos) with mode 'trans_susie'.
  --disable_beta_approx
                        Disable Beta-distribution approximation of empirical p-values (not recommended).
  --warn_monomorphic    Warn if monomorphic variants are found.
  --max_effects MAX_EFFECTS
                        Maximum number of non-zero effects in the SuSiE regression model.
  --fdr FDR             FDR for cis-QTLs
  --qvalue_lambda QVALUE_LAMBDA
                        lambda parameter for pi0est in qvalue.
  --seed SEED           Seed for permutations.
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Output directory
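
The help text above lists three positional arguments followed by optional flags. As a minimal sketch of how those pieces fit together, the Python snippet below assembles a command line using only flags that appear in the help output. The helper name build_tensorqtl_command and all file names (geno_prefix, phenotypes.bed.gz, covariates.txt, run1, results) are hypothetical placeholders, not files shipped with the module. The snippet only prints the command; it does not run it.

```python
import shlex

# Modes copied from the --mode choices in the help text above.
ALLOWED_MODES = {"cis", "cis_nominal", "cis_independent",
                 "cis_susie", "trans", "trans_susie"}


def build_tensorqtl_command(genotype_path, phenotypes, prefix, mode="cis",
                            covariates=None, window=None, seed=None,
                            output_dir=None):
    """Assemble a tensorqtl argument list (hypothetical helper, illustration only)."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown mode: {mode}")
    # Positional arguments come in the order shown in the usage line.
    cmd = ["tensorqtl", genotype_path, phenotypes, prefix, "--mode", mode]
    if covariates is not None:
        cmd += ["--covariates", covariates]
    if window is not None:
        cmd += ["--window", str(window)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    return cmd


# All paths below are placeholders; substitute your own input files.
cmd = build_tensorqtl_command(
    "geno_prefix", "phenotypes.bed.gz", "run1",
    mode="cis_nominal", covariates="covariates.txt", output_dir="results",
)
print(shlex.join(cmd))
```

The printed line can be pasted into the interactive session after the module is loaded, or the list could be passed to subprocess.run(cmd, check=True) from a script.
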
Running the test example:
[user@cn4199 ~]$ git clone https://github.com/broadinstitute/tensorqtl
[user@cn4199 ~]$ cd tensorqtl/example
[user@cn4199 ~]$ module load jupyter
[user@cn4199 ~]$ jupyter nbconvert --to script tensorqtl_examples.ipynb
[user@cn4199 ~]$ python tensorqtl_examples.py &
[user@cn4199 ~]$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  | 00000000:46:00.0 Off |                    0 |
| N/A   36C    P0             133W / 400W |   2492MiB / 81920MiB |     25%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   1733667      C   /usr/bin/python3                           2478MiB |
+---------------------------------------------------------------------------------------+

torch: 2.1.2+cu121 (CUDA 12.1), device: cuda
pandas 2.1.4
cis-QTL mapping: nominal associations for all variant-phenotype pairs
  * 445 samples
  * 301 phenotypes
  * 26 covariates
  * 367759 variants
  * cis-window: ±1,000,000
  * checking phenotypes: 301/301
  * Computing associations
    Mapping chromosome chr18
    processing phenotype 301/301
    time elapsed: 0.02 min
    * writing output
done.
cis-QTL mapping: empirical p-values for phenotypes
  * 445 samples
  * 301 phenotypes
  * 26 covariates
  * 367759 variants
  * cis-window: ±1,000,000
  * using seed 123456
  * checking phenotypes: 301/301
  * computing permutations
    processing phenotype 301/301
  Time elapsed: 0.19 min
done.
Computing q-values
  * Number of phenotypes tested: 301
  * Correlation between Beta-approximated and empirical p-values: 1.0000
  * Calculating q-values with lambda = 0.850
  * Proportion of significant phenotypes (1-pi0): 0.76
  * QTL phenotypes @ FDR 0.05: 205
  * min p-value threshold @ FDR 0.05: 0.135284
trans-QTL mapping
  * 445 samples
  * 19836 phenotypes
  * 26 covariates
  * 367759 variants
    processing batch 37/37
    elapsed time: 0.01 min
  * 210838 variants passed MAF >= 0.05 filtering
done.
[user@cn4199 ~]$ exit
salloc.exe: Relinquishing job allocation 59748321
[user@biowulf ~]$
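
The trans-QTL log above reports how many variants passed "MAF >= 0.05 filtering", and the --maf_threshold option applies the same kind of cutoff. As a rough illustration of what such a filter means, the sketch below computes the minor allele frequency of a biallelic variant from 0/1/2 genotype codes and keeps variants at or above a threshold. It is a plain-Python illustration of the definition with made-up toy genotypes, not tensorQTL's own implementation, which works on GPU tensors and may treat missing genotypes differently.

```python
def minor_allele_frequency(genotypes):
    """MAF of one biallelic variant from 0/1/2 alternate-allele counts.

    None marks a missing genotype and is skipped (an assumption of this sketch).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    # Each diploid sample carries two alleles.
    alt_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(variants, maf_threshold=0.05):
    """Keep variants whose MAF is >= maf_threshold."""
    return {vid: g for vid, g in variants.items()
            if minor_allele_frequency(g) >= maf_threshold}


# Toy genotypes (hypothetical data, one list entry per sample).
toy_variants = {
    "var_common": [0, 1, 2, 1, 0, 1, 1, 0],
    "var_rare": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    "var_flipped": [2, 2, 2, 1, 2, 2, 2, 2],
}

kept = filter_by_maf(toy_variants, maf_threshold=0.05)
print(sorted(kept))
```

In this toy data, var_rare is dropped because only 1 of its 24 alleles is the alternate allele, while var_flipped is kept because its reference allele is the minor one with frequency 1/16.
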