Biowulf High Performance Computing at the NIH
LDpred on Biowulf

Adjust GWAS summary statistics for the effects of linkage disequilibrium (LD).

References:

Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=10g --gres=lscratch:10
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ module load ldpred
[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ LDpred.py coord \
                    --gf=LDpred_data_p0.001_test_0 \
                    --ssf=LDpred_data_p0.001_ss_0.txt --N=10000 \
                    --ssf-format STANDARD \
                    --out=coord.hdf5
=============================== LDpred v. 1.0.6 ================================

Parsing summary statistics file: LDpred_data_p0.001_ss_0.txt
100.00%
SS file loaded, now sorting and storing in HDF5 file.
Coordinating datasets (Summary statistics and LD reference genotypes).
100.00%

========================= Summary of coordination step =========================
Summary statistics filename:                         LDpred_data_p0.001_ss_0.txt
LD reference genotypes filename:                       LDpred_data_p0.001_test_0
Coordinated data output filename:                                     coord.hdf5
------------------------------ Summary statistics ------------------------------
Num SNPs parsed from sum stats file                                        10000
--------------------------------- Coordination ---------------------------------
Num individuals in LD Reference data:                                       2000
SNPs in LD Reference data:                                                 10000
Num chromosomes used:                                                          1
SNPs common across datasets:                                               10000
SNPs retained after filtering:                                             10000
SNPs w MAF<0.010 filtered:                                                     0
SNPs w allele freq discrepancy > 0.100 filtered:                               0
-------------------------------- Running times ---------------------------------
Run time for parsing summary stats:                           0 min and 3.47 sec
Run time for coordinating datasets:                           0 min and 3.07 sec
================================================================================


[user@cn3144]$ LDpred.py gibbs --cf coord.hdf5 --ldr 50 --ldf ldf_file \
                    --out gibbs --N 10000
=============================== LDpred v. 1.0.6 ================================

Calculating LD information w. radius 50
Storing LD information to compressed pickle file
Applying LDpred with LD radius: 50
10000 SNP effects were found
Heritability used for inference: 0.0738
Calculating LDpred-inf weights
Calculating SNP weights for Chromosome 1
Starting LDpred gibbs with f=1.0000
100.00%
Starting LDpred gibbs with f=0.3000
100.00%
Starting LDpred gibbs with f=0.1000
100.00%
Starting LDpred gibbs with f=0.0300
100.00%
Starting LDpred gibbs with f=0.0100
100.00%
Starting LDpred gibbs with f=0.0030
100.00%
Starting LDpred gibbs with f=0.0010
100.00%

=========================== Summary of LDpred Gibbs ============================
Coordinated data filename                                             coord.hdf5
SNP weights output file (prefix)                                           gibbs
LD data filename (prefix)                                               ldf_file
LD radius used                                                                50
-------------------------------- LD information --------------------------------
Average LD score:                                               9.51933114886284
Genome-wide (LDscore) estimated heritability:                             0.0738
Chi-square lambda (inflation statistic).                                  1.7022
Running time for calculating LD information:                 0 min and 1.61 secs
----------------------------- LDpred Gibbs sampler -----------------------------
Gibbs sampler fractions used
                                         [1, 0.3, 0.1, 0.03, 0.01, 0.003, 0.001]
Convergence issues (for each fraction)
                                      ['No', 'No', 'No', 'No', 'No', 'No', 'No']
Running time for Gibbs sampler(s):                          1 min and 12.24 secs
================================================================================

[user@cn3144]$ LDpred.py score --gf=LDpred_data_p0.001_test_0 --rf gibbs --out score
=============================== LDpred v. 1.0.6 ================================

Parsing phenotypes
LDpred_data_p0.001_test_0.fam
Parsed 2000 phenotypes successfully

...
Calculating LDpred risk scores using f=1.000e-03
Parsing PLINK bed file: LDpred_data_p0.001_test_0
100.00%
Variance explained (Pearson R2) by PRS: 0.0887
Writing polygenic scores to file score_LDpred_p1.0000e-03.txt
The highest (unadjusted) Pearson R2 was 0.0887, and provided by LDpred_p1.0000e-03

=============================== Scoring Summary ================================
Validation genotype file (prefix):                     LDpred_data_p0.001_test_0
Input weight file(s) (prefix):                                             gibbs
Output scores file(s) (prefix):                                            score
---------------------------------- Phenotypes ----------------------------------
Phenotype file (plink format):                     LDpred_data_p0.001_test_0.fam
Individuals with phenotype information:                                     2000
Running time for parsing phenotypes:                         0 min and 0.00 secs
----------------------------------- Scoring ------------------------------------
LDpred_inf (unadjusted) Pearson R2:                                       0.0377
Best LDpred (f=1.00e-03) (unadjusted) R2:                                 0.0887
Running time for calculating scores:                        0 min and 24.48 secs
--------------------------- Optimal polygenic score ----------------------------
Method with highest (unadjusted) Pearson R2:                  LDpred_p1.0000e-03
Best (unadjusted) Pearson R2:                                             0.0887
================================================================================


[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. LDpred.sh), which uses the input file 'LDpred.in'. For example:

#!/bin/bash
module load LDpred/1.0.6
LDpred.py score --gf=LDpred_data_p0.001_test_0 --rf gibbs --out score

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] LDpred.sh