Biowulf High Performance Computing at the NIH
EPACTS on Biowulf

EPACTS (Efficient and Parallelizable Association Container Toolbox) is a versatile software pipeline that performs a range of statistical tests for genome-wide association analysis of sequence data. It provides a user-friendly interface for both scientific analysts and method developers.

Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf ~]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load EPACTS

[user@cn3144 ~]$ epacts single --vcf  ${EPACTS_DIR}/data/1000G_exome_chr20_example_softFiltered.calls.vcf.gz   \
       --ped  ${EPACTS_DIR}/data/1000G_dummy_pheno.ped    --min-maf 0.001 --chr 20 --pheno DISEASE \
       --cov AGE --cov SEX --test b.score --anno  --out test --run 2
Detected phenotypes with 2 unique values - 1 and 2 - considering them as binary phenotypes... re-encoding them into 1 and 2
Successfully written phenotypes and 2 covariates across 266 individuals
Processing chromosome 20...
Finished generating EPACTS Makefile
Running 2 parallel jobs of EPACTS
forkExecWait(): make -f /scratch/$USER/test.Makefile -j 2
Rscript /usr/local/apps/EPACTS/3.2.6/share/EPACTS/epactsSingle.R --vanilla /usr/local/apps/EPACTS/3.2.6 /scratch/$USER/test.phe /scratch/$USER/test.cov /scratch/$USER/test.ind /usr/local/apps/EPACTS/3.2.6/share/data/1000G_exome_chr20_example_softFiltered.calls.vcf.gz 20:1-10000000 /scratch/$USER/test.20.1.10000000.epacts GT 0.001 1 3 1000000000 0.5 0 FALSE single.b.score
Rscript /usr/local/apps/EPACTS/3.2.6/share/EPACTS/epactsSingle.R --vanilla /usr/local/apps/EPACTS/3.2.6 /scratch/$USER/test.phe /scratch/$USER/test.cov /scratch/$USER/test.ind /usr/local/apps/EPACTS/3.2.6/share/data/1000G_exome_chr20_example_softFiltered.calls.vcf.gz 20:10000001-20000000 /scratch/$USER/test.20.10000001.20000000.epacts GT 0.001 1 3 1000000000 0.5 0 FALSE single.b.score
Loading required package: epactsR
Loading required package: epactsR
NULL
Sucessfully wrote ( 1248 * 10 ) matrix
NULL
[....]
zcat /scratch/$USER/test.epacts.gz | awk '$9 != "NA" { print $0 }' | sort -g -k 9 | head -n 5000 > /scratch/$USER/test.epacts.top5000
touch /scratch/$USER/test.epacts.OK
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
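
The last pipeline in the log above drops rows whose ninth column is NA and sorts the remaining rows on that column. A minimal sketch of the same idea as a reusable shell function is given here. The function name top_hits and its arguments are hypothetical, and the sketch assumes that the ninth column holds the p-value, as the sort key in the log suggests.

```shell
#!/bin/bash
# Sketch only: list the leading rows of a gzipped, whitespace-delimited
# result table after sorting on column 9 (assumed to be the p-value).
# top_hits is a hypothetical helper, not part of EPACTS.
top_hits() {
    local results_gz="$1"   # e.g. /scratch/$USER/test.epacts.gz
    local n="${2:-10}"      # number of lines to print (default 10)

    # gzip -dc is equivalent to zcat; awk drops rows with NA in column 9;
    # sort -g orders the rest by general numeric value of column 9.
    gzip -dc "$results_gz" | awk '$9 != "NA"' | sort -g -k 9 | head -n "$n"
}
```

For example, top_hits /scratch/$USER/test.epacts.gz 20 prints the first 20 lines of the table after sorting on column 9.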

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. EPACTS.sh). For example:

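#!/bin/bash
# Stop at the first command that fails
set -e

# Work in scratch space; the output files (test.*) are written here
cd /scratch/$USER
module load EPACTS

# Single-variant score test (b.score) on the example chr20 data, run as 2 parallel jobs
epacts single --vcf ${EPACTS_DIR}/data/1000G_exome_chr20_example_softFiltered.calls.vcf.gz \
       --ped ${EPACTS_DIR}/data/1000G_dummy_pheno.ped --min-maf 0.001 --chr 20 --pheno DISEASE \
       --cov AGE --cov SEX --test b.score --anno --out test --run 2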

Submit this job using the Slurm sbatch command.

sbatch [--mem=#] EPACTS.sh
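
Because the example passes --run 2, EPACTS starts two parallel jobs, so the batch job benefits from having two CPUs allocated. One way to keep the two numbers in step is sketched here. It relies on the SLURM_CPUS_PER_TASK variable, which Slurm sets when --cpus-per-task is requested. The helper name epacts_njobs and the 4g memory value are illustrative assumptions, not site recommendations.

```shell
#!/bin/bash
# Sketch only: derive the value for --run from the Slurm allocation.
# epacts_njobs is a hypothetical helper; it prints the number of CPUs
# Slurm allocated per task, or 2 when the variable is not set.
epacts_njobs() {
    echo "${SLURM_CPUS_PER_TASK:-2}"
}

# In EPACTS.sh, the end of the epacts command would then read:
#     ... --anno --out test --run "$(epacts_njobs)"
# and the job could be submitted with, for example (memory value assumed):
#     sbatch --cpus-per-task=2 --mem=4g EPACTS.sh
```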

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

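Create a swarmfile (e.g. EPACTS.swarm) with one command per line. For example, to run the example analysis with three different tests (the output prefixes are illustrative):

cd /scratch/$USER && epacts single --vcf ${EPACTS_DIR}/data/1000G_exome_chr20_example_softFiltered.calls.vcf.gz --ped ${EPACTS_DIR}/data/1000G_dummy_pheno.ped --min-maf 0.001 --chr 20 --pheno DISEASE --cov AGE --cov SEX --test b.score --anno --out test_b.score --run 2
cd /scratch/$USER && epacts single --vcf ${EPACTS_DIR}/data/1000G_exome_chr20_example_softFiltered.calls.vcf.gz --ped ${EPACTS_DIR}/data/1000G_dummy_pheno.ped --min-maf 0.001 --chr 20 --pheno DISEASE --cov AGE --cov SEX --test b.wald --anno --out test_b.wald --run 2
cd /scratch/$USER && epacts single --vcf ${EPACTS_DIR}/data/1000G_exome_chr20_example_softFiltered.calls.vcf.gz --ped ${EPACTS_DIR}/data/1000G_dummy_pheno.ped --min-maf 0.001 --chr 20 --pheno DISEASE --cov AGE --cov SEX --test b.firth --anno --out test_b.firth --run 2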

Submit this job using the swarm command.

swarm -f EPACTS.swarm [-g #] [-t #] --module EPACTS

where

-g #              Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t #              Number of threads/CPUs required for each process (1 line in the swarm command file)
--module EPACTS   Loads the EPACTS module for each subjob in the swarm
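
When a swarm contains many near-identical commands, the swarmfile can be generated rather than typed by hand. The sketch below writes one epacts command per test name, reusing the example data and options from the batch script above. The function name make_swarm and the test_<name> output prefixes are hypothetical, and the test names passed in are assumed to be ones that the installed EPACTS version supports.

```shell
#!/bin/bash
# Sketch only: write one epacts command line per test name into a swarmfile.
# make_swarm is a hypothetical helper, not part of swarm or EPACTS.
make_swarm() {
    local swarmfile="$1"; shift
    local t

    : > "$swarmfile"    # create or truncate the swarmfile
    for t in "$@"; do
        # The single-quoted format keeps $USER and ${EPACTS_DIR} literal,
        # so they are expanded when each subjob runs, not when the file is written.
        printf 'cd /scratch/$USER && epacts single --vcf ${EPACTS_DIR}/data/1000G_exome_chr20_example_softFiltered.calls.vcf.gz --ped ${EPACTS_DIR}/data/1000G_dummy_pheno.ped --min-maf 0.001 --chr 20 --pheno DISEASE --cov AGE --cov SEX --test %s --anno --out test_%s --run 2\n' "$t" "$t" >> "$swarmfile"
    done
}
```

For example, make_swarm EPACTS.swarm b.score b.wald writes a two-line swarmfile. Since every line uses --run 2, submitting it with -t 2 (for instance swarm -f EPACTS.swarm -g 4 -t 2 --module EPACTS, where the -g value is an assumption) gives each subjob two CPUs.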