Biowulf High Performance Computing at the NIH
Pascal on Biowulf

Pascal (Pathway scoring algorithm) is an easy-to-use tool for gene scoring and pathway analysis from GWAS results. Pascal uses external data to estimate linkage disequilibrium, so the user only needs to supply genome-wide SNP p-values. Pascal then derives p-values for genes and predefined pathways. Because Pascal does not use Monte Carlo simulation to derive gene p-values, gene scoring is faster and more accurate, and this speed is then leveraged to control the false positive rate in pathway scoring. For pathway scoring, the Pascal authors implemented and tested enrichment strategies that compared very favorably with hypergeometric enrichment. This comparison was done on a large collection of GWAS results, which gives confidence in recommending Pascal for downstream analysis of GWAS results. Pascal is mainly written in Java and has been tested on Unix systems and Mac OS X.


Important Notes

Pascal requires that a file containing internal settings (settings.txt) and a directory containing reference files (resources) be present in the working directory. The defaults are provided as $PASCAL_HOME/settings.txt and $PASCAL_HOME/resources and can be symlinked into the working directory prior to running (see examples below). Alternatively, users can copy the original files and maintain their own versions.
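
The requirement above can be sketched as a small shell helper. This is an illustrative sketch only: the function name link_pascal_defaults is hypothetical, and it assumes that PASCAL_HOME has already been set (on Biowulf, by loading the Pascal module). It links the defaults into the current directory and leaves any existing local copy or link untouched.

```shell
# Hypothetical helper (not part of Pascal): link the default settings file
# and resources directory into the current working directory.
# Assumes PASCAL_HOME is set, e.g. by `module load Pascal`.
link_pascal_defaults() {
    if [ -z "${PASCAL_HOME:-}" ]; then
        echo "PASCAL_HOME is not set; load the Pascal module first" >&2
        return 1
    fi
    for item in settings.txt resources; do
        # -e is false for a dangling symlink, so test -L as well
        if [ -e "$item" ] || [ -L "$item" ]; then
            echo "keeping existing $item"
        else
            ln -s "$PASCAL_HOME/$item" . && echo "linked $item"
        fi
    done
}
```

Calling link_pascal_defaults from the working directory before running Pascal keeps a user-maintained settings.txt in place while still linking anything that is missing.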

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session:

In this example, the default settings.txt and resources directory are symlinked into the working directory prior to running:

[user@biowulf ~]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$

[user@cn3144 ~]$ module load Pascal
[user@cn3144 ~]$ ln -s $PASCAL_HOME/settings.txt .
[user@cn3144 ~]$ ln -s $PASCAL_HOME/resources .
[user@cn3144 ~]$ Pascal --pval=resources/gwas/EUR.CARDIoGRAM_2010_lipids.HDL_ONE.txt --chr=22

A directory named output will be created (if it does not already exist), containing the results files:

[user@cn3144 ~]$ ls output
EUR.CARDIoGRAM_2010_lipids.HDL_ONE.sum.genescores.chr22.txt   settingsOut.txt
EUR.CARDIoGRAM_2010_lipids.HDL_ONE.sum.numSnpError.chr22.txt
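
In a script, the presence of results can be checked rather than inspected by eye. The following is a minimal sketch, assuming that gene-score files follow the naming seen in the listing above (a name containing .genescores. and ending in .txt). The function name has_genescores is hypothetical.

```shell
# Hypothetical helper: succeed if at least one gene-score file exists in
# the given directory (default: ./output, as in the session above).
has_genescores() {
    dir=${1:-output}
    for f in "$dir"/*.genescores.*.txt; do
        # If the glob matches nothing, $f is the literal pattern and -e fails
        [ -e "$f" ] && { echo "found $f"; return 0; }
    done
    echo "no gene-score files in $dir" >&2
    return 1
}
```

For example, has_genescores output || exit 1 would stop a script when the run produced no gene scores.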

The default settings.txt file can be copied and edited to allow alternatives:

[user@cn3144 ~]$ cp $PASCAL_HOME/settings.txt my_settings.txt
[user@cn3144 ~]$ pico my_settings.txt

Then run Pascal, using the new settings file:

[user@cn3144 ~]$ Pascal --set=my_settings.txt ...

And finally, end the interactive session:

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job
Most jobs should be run as batch jobs.

Create a batch input file. For example:

#!/bin/bash
module load Pascal
ln -s $PASCAL_HOME/settings.txt .
ln -s $PASCAL_HOME/resources .
Pascal --pval=resources/gwas/EUR.CARDIoGRAM_2010_lipids.HDL_ONE.txt --chr=22

Submit this job using the Slurm sbatch command, giving the batch input file as the final argument. Pascal benefits from plenty of memory, so allocate at least 8g (the example below requests 32g):

sbatch --cpus-per-task=1 --mem=32g
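
As an alternative sketch, the resource requests can be placed in the batch file itself as #SBATCH directives, so that sbatch needs only the file name. The file name pascal_job.sh is a hypothetical choice for illustration, not a name taken from this page. The submission step is guarded so that the snippet only writes the file on machines without Slurm.

```shell
# Write a hypothetical batch file, pascal_job.sh, with the resource
# requests from the sbatch command above embedded as #SBATCH directives.
cat > pascal_job.sh <<'EOF'
#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --mem=32g

module load Pascal
ln -s $PASCAL_HOME/settings.txt .
ln -s $PASCAL_HOME/resources .
Pascal --pval=resources/gwas/EUR.CARDIoGRAM_2010_lipids.HDL_ONE.txt --chr=22
EOF

# Submit only where Slurm is available.
if command -v sbatch >/dev/null 2>&1; then
    sbatch pascal_job.sh
else
    echo "sbatch not found; wrote pascal_job.sh without submitting"
fi
```

Options given on the sbatch command line take precedence over #SBATCH directives in the file, so the memory request can still be raised at submission time.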