High-Performance Computing at the NIH
EPACTS on Biowulf & Helix
EPACTS (Efficient and Parallelizable Association Container Toolbox) is a versatile software pipeline for performing statistical tests that identify genome-wide associations from sequence data. Its user-friendly interface is designed for both scientific analysts and method developers.

EPACTS was developed in the Abecasis lab at the University of Michigan. [EPACTS webpage]

On Helix

The following example uses the sample data provided with EPACTS. Sample session (user input follows each $ prompt):

[$USER@helix ~]$ module load EPACTS
[+] Loading gdal 2.0 ...
[+] Loading proj 4.9.2 ...
[+] Loading gcc 4.9.1 ...
[+] Loading openmpi 1.10.0 for GCC 4.9.1
[+] Loading tcl_tk 8.6.3
[+] Loading pandoc 1.15.0.6 ...
[+] Loading R 3.2.3 on helix.nih.gov
[+] Loading EPACTS 3.2.6 ...

[$USER@helix ~]$ epacts single --vcf  ${EPACTS_DIR}/data/1000G_exome_chr20_example_softFiltered.calls.vcf.gz   \
	--ped  ${EPACTS_DIR}/data/1000G_dummy_pheno.ped    --min-maf 0.001 --chr 20 --pheno DISEASE \
	--cov AGE --cov SEX --test b.score --anno  --out test --run 2
Detected phenotypes with 2 unique values - 1 and 2 - considering them as binary phenotypes... re-encoding them into 1 and 2
Successfully written phenotypes and 2 covariates across 266 individuals
Processing chromosome 20...
Finished generating EPACTS Makefile
Running 2 parallel jobs of EPACTS
forkExecWait(): make -f /scratch/$USER/test.Makefile -j 2
Rscript /usr/local/apps/EPACTS/3.2.6/share/EPACTS/epactsSingle.R --vanilla /usr/local/apps/EPACTS/3.2.6 /scratch/$USER/test.phe /scratch/$USER/test.cov /scratch/$USER/test.ind /usr/local/apps/EPACTS/3.2.6/share/data/1000G_exome_chr20_example_softFiltered.calls.vcf.gz 20:1-10000000 /scratch/$USER/test.20.1.10000000.epacts GT 0.001 1 3 1000000000 0.5 0 FALSE single.b.score
Rscript /usr/local/apps/EPACTS/3.2.6/share/EPACTS/epactsSingle.R --vanilla /usr/local/apps/EPACTS/3.2.6 /scratch/$USER/test.phe /scratch/$USER/test.cov /scratch/$USER/test.ind /usr/local/apps/EPACTS/3.2.6/share/data/1000G_exome_chr20_example_softFiltered.calls.vcf.gz 20:10000001-20000000 /scratch/$USER/test.20.10000001.20000000.epacts GT 0.001 1 3 1000000000 0.5 0 FALSE single.b.score
Loading required package: epactsR
Loading required package: epactsR
NULL
Sucessfully wrote ( 1248 * 10 ) matrix
NULL
[....]
zcat /scratch/$USER/test.epacts.gz | awk '$9 != "NA" { print $0 }' | sort -g -k 9 | head -n 5000 > /scratch/$USER/test.epacts.top5000
touch /scratch/$USER/test.epacts.OK

[$USER@helix ~]$ more test.epacts.top5000
#CHROM	BEGIN	END	MARKER_ID	NS	AC	CALLRATE	MAF	PVALUE	SCORE	NS.CASE	NS.CTRL	AF.CASE	AF.CTRL
20	1610894	1610894	20:1610894_G/A_Synonymous:SIRPG	266	136	1	0.25564	0.0001097	3.8681	145	121	0.32069	0.17769
20	4162411	4162411	20:4162411_T/C_Intron:SMOX	266	204	1	0.38346	0.00055585	-3.4523	145	121	0.31379	0.46694
20	34061918	34061918	20:34061918_T/C_Intron:CEP250	266	39	1	0.073308	0.0011231	3.2577	145	121	0.1069	0.033058
20	4155948	4155948	20:4155948_G/A_Intron:SMOX	266	215	1	0.40414	0.0020791	-3.0787	145	121	0.34138	0.47934
20	4680251	4680251	20:4680251_A/G_Nonsynonymous:PRNP	266	186	1	0.34962	0.0025962	3.0119	145	121	0.40345	0.28512
20	36668874	36668874	20:36668874_G/A_Synonymous:RPRD1B	266	96	1	0.18045	0.003031	2.9646	145	121	0.22414	0.1281
[...]
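The top5000 file is produced by the last command in the log above: EPACTS drops rows whose PVALUE (column 9) is NA, sorts the rest numerically on that column, and keeps the first 5000 lines. To list only the markers below a chosen significance threshold, a small shell function along the same lines can be used. This is a sketch, not part of EPACTS: top_hits is a hypothetical helper, the default threshold of 0.001 is an arbitrary choice, and it assumes the column layout shown in the header above.

```shell
#!/bin/bash
# top_hits FILE [THRESHOLD]
# Hypothetical helper, not part of EPACTS. Prints the header line, then every
# marker whose PVALUE (column 9 of the EPACTS single-variant output) is below
# THRESHOLD (default 0.001), most significant first. FILE may be gzipped
# (e.g. test.epacts.gz) or plain text (e.g. test.epacts.top5000).
top_hits() {
    local file=$1
    local threshold=${2:-0.001}

    # gzip -dcf decompresses gzipped input and copies plain text unchanged
    gzip -dcf < "$file" | grep '^#' || true

    # Same filter as the EPACTS log (skip NA p-values, general numeric sort
    # on column 9), plus a cutoff on the p-value itself
    gzip -dcf < "$file" \
        | awk -v t="$threshold" '!/^#/ && $9 != "NA" && ($9 + 0) < (t + 0)' \
        | sort -g -k 9
}
```

For example, top_hits /scratch/$USER/test.epacts.gz 1e-4 would scan the full compressed results rather than only the top 5000 rows.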
Batch job on Biowulf
Create a batch input file (e.g. epacts.sh). For example:

#!/bin/bash
#   this file is called epacts.sh

# Work in your scratch directory; stop if it cannot be entered
cd /scratch/$USER || exit 1
module load EPACTS
# --run 2 runs 2 EPACTS jobs in parallel
epacts single --vcf ${EPACTS_DIR}/data/1000G_exome_chr20_example_softFiltered.calls.vcf.gz \
       --ped ${EPACTS_DIR}/data/1000G_dummy_pheno.ped --min-maf 0.001 --chr 20 --pheno DISEASE \
       --cov AGE --cov SEX --test b.score --anno --out test --run 2

Submit this job using the Slurm sbatch command.

sbatch epacts.sh
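With --run 2, EPACTS runs make with -j 2 (see the forkExecWait line in the logs), so the job should be allocated at least that many CPUs. One way to keep the two in step is to request CPUs with the standard Slurm --cpus-per-task option and pass the allocated count to --run through SLURM_CPUS_PER_TASK, which Slurm sets only when --cpus-per-task is given. This is a sketch of a variant of epacts.sh; the request of 4 CPUs is an illustrative assumption, not a recommended value for real data.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=4
#   variant of epacts.sh; the CPU count above is only an example

# Slurm sets SLURM_CPUS_PER_TASK only when --cpus-per-task is requested;
# fall back to 2, the value used in the original epacts.sh, if it is unset
njobs=${SLURM_CPUS_PER_TASK:-2}

cd /scratch/$USER || exit 1
module load EPACTS
# Run as many parallel EPACTS jobs as CPUs were allocated
epacts single --vcf ${EPACTS_DIR}/data/1000G_exome_chr20_example_softFiltered.calls.vcf.gz \
       --ped ${EPACTS_DIR}/data/1000G_dummy_pheno.ped --min-maf 0.001 --chr 20 --pheno DISEASE \
       --cov AGE --cov SEX --test b.score --anno --out test --run "$njobs"
```

Options given on the sbatch command line override #SBATCH directives in the script, so sbatch --cpus-per-task=8 epacts.sh would run 8 parallel EPACTS jobs without editing the file.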

Interactive job on Biowulf

Allocate an interactive node and run EPACTS there. Sample session:

[$USER@biowulf ~]$ sinteractive 
salloc.exe: Pending job allocation 15813673
salloc.exe: job 15813673 queued and waiting for resources
salloc.exe: job 15813673 has been allocated resources
salloc.exe: Granted job allocation 15813673
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0198 are ready for job

[$USER@cn0198 ~]$ module load EPACTS
[+] Loading gdal 2.0 ...
[+] Loading proj 4.9.2 ...
[+] Loading gcc 4.9.1 ...
[+] Loading openmpi 1.10.0 for GCC 4.9.1
[+] Loading tcl_tk 8.6.3
[+] Loading pandoc 1.15.0.6 ...
[+] Loading R 3.2.3 on cn1663
[+] Loading EPACTS 3.2.6 ...

[$USER@cn0198 ~]$  epacts single --vcf  ${EPACTS_DIR}/data/1000G_exome_chr20_example_softFiltered.calls.vcf.gz   \
       --ped  ${EPACTS_DIR}/data/1000G_dummy_pheno.ped    --min-maf 0.001 --chr 20 --pheno DISEASE \
       --cov AGE --cov SEX --test b.score --anno  --out test --run 2

Detected phenotypes with 2 unique values - 1 and 2 - considering them as binary phenotypes... re-encoding them into 1 and 2
Successfully written phenotypes and 2 covariates across 266 individuals
Processing chromosome 20...
Finished generating EPACTS Makefile
Running 2 parallel jobs of EPACTS
forkExecWait(): make -f /scratch/$USER/test.Makefile -j 2
Rscript /usr/local/apps/EPACTS/3.2.6/share/EPACTS/epactsSingle.R --vanilla /usr/local/apps/EPACTS/3.2.6 /scratch/$USER/test.phe /scratch/$USER/test.cov /scratch/$USER/test.ind /usr/local/apps/EPACTS/3.2.6/share/data/1000G_exome_chr20_example_softFiltered.calls.vcf.gz 20:1-10000000 /scratch/$USER/test.20.1.10000000.epacts GT 0.001 1 3 1000000000 0.5 0 FALSE single.b.score
Rscript /usr/local/apps/EPACTS/3.2.6/share/EPACTS/epactsSingle.R --vanilla /usr/local/apps/EPACTS/3.2.6 /scratch/$USER/test.phe /scratch/$USER/test.cov /scratch/$USER/test.ind /usr/local/apps/EPACTS/3.2.6/share/data/1000G_exome_chr20_example_softFiltered.calls.vcf.gz 20:10000001-20000000 /scratch/$USER/test.20.10000001.20000000.epacts GT 0.001 1 3 1000000000 0.5 0 FALSE single.b.score
Loading required package: epactsR
Loading required package: epactsR
NULL
Sucessfully wrote ( 1248 * 10 ) matrix
NULL
[....]
zcat /scratch/$USER/test.epacts.gz | awk '$9 != "NA" { print $0 }' | sort -g -k 9 | head -n 5000 > /scratch/$USER/test.epacts.top5000
touch /scratch/$USER/test.epacts.OK


[$USER@cn0198 ~]$ exit
exit
srun: error: cn0198: task 0: Exited with exit code 130
salloc.exe: Relinquishing job allocation 15813673
[$USER@biowulf ~]$
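In both sessions, the last command EPACTS runs is touch /scratch/$USER/test.epacts.OK, which suggests the .OK file marks a run that reached its final step. Before reading results from a batch job, a check along these lines can confirm that. This is a sketch under that assumption: epacts_done is a hypothetical helper, not an EPACTS command, and it takes the output prefix (the --out value, including its directory).

```shell
#!/bin/bash
# epacts_done PREFIX
# Hypothetical helper, not part of EPACTS. Succeeds only if PREFIX.epacts.OK
# exists, i.e. the run that wrote PREFIX.epacts.* reached its final step.
epacts_done() {
    local prefix=$1
    if [ -f "${prefix}.epacts.OK" ]; then
        return 0
    fi
    echo "EPACTS run '${prefix}' has not finished (no ${prefix}.epacts.OK)" >&2
    return 1
}
```

For example: epacts_done /scratch/$USER/test && more /scratch/$USER/test.epacts.top5000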
Documentation