EPACTS (Efficient and Parallelizable Association Container Toolbox) is a versatile software pipeline to perform various statistical tests for identifying genome-wide association from sequence data through a user-friendly interface, both to scientific analysts and to method developers.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive salloc.exe: Pending job allocation 46116226 salloc.exe: job 46116226 queued and waiting for resources salloc.exe: job 46116226 has been allocated resources salloc.exe: Granted job allocation 46116226 salloc.exe: Waiting for resource configuration salloc.exe: Nodes cn3144 are ready for job [user@cn3144 ~]$ module load EPACTS [user@cn3144 ~]$ cp ${EPACTS_DIR}/1000G_* . [user@cn3144 ~]$ epacts single --vcf 1000G_exome_chr20_example_softFiltered.calls.vcf.gz \ --ped 1000G_dummy_pheno.ped --min-maf 0.001 --chr 20 --pheno DISEASE \ --cov AGE --cov SEX --test b.score --anno --out test --run 2 Detected phenotypes with 2 unique values - 1 and 2 - considering them as binary phenotypes... re-encoding them into 1 and 2 Successfully written phenotypes and 2 covariates across 266 individuals Processing chromosome 20... Finished generating EPACTS Makefile Running 2 parallel jobs of EPACTS forkExecWait(): make -f /data/user/test.Makefile -j 2 Rscript /usr/local/share/EPACTS/epactsSingle.R --vanilla /usr/local /data/user/test.phe /data/user/test.cov /data/user/test.ind /data/user/1000G_exome_chr20_example_softFiltered.calls.vcf.gz 20:1-10000000 /data/user/test.20.1.10000000.epacts GT 0.001 1 3 1000000000 0.5 0 FALSE single.b.score Rscript /usr/local/share/EPACTS/epactsSingle.R --vanilla /usr/local /data/user/test.phe /data/user/test.cov /data/user/test.ind /data/user/1000G_exome_chr20_example_softFiltered.calls.vcf.gz 20:10000001-20000000 /data/user/test.20.10000001.20000000.epacts GT 0.001 1 3 1000000000 0.5 0 FALSE single.b.score Loading required package: epactsR Loading required package: epactsR NOTICE - Reading VCF took 1 seconds [....] zcat /data/user/test.epacts.gz | awk '$9 != "NA" { print $0 }' | sort -g -k 9 | head -n 5000 > /data/user/test.epacts.top5000 touch /data/user/test.epacts.OK [user@cn3144 ~]$ exit salloc.exe: Relinquishing job allocation 46116226 [user@biowulf ~]$
Create a batch input file (e.g. EPACTS.sh). For example:
#!/bin/bash set -e cd /data/$USER module load EPACTS cp ${EPACTS_DIR}/1000G_* . epacts single --vcf 1000G_exome_chr20_example_softFiltered.calls.vcf.gz \ --ped 1000G_dummy_pheno.ped --min-maf 0.001 --chr 20 --pheno DISEASE \ --cov AGE --cov SEX --test b.score --anno --out test --run 2
Submit this job using the Slurm sbatch command.
sbatch [--mem=#] EPACTS.sh
Create a swarmfile (e.g. EPACTS.swarm). For example:
epacts single --vcf input1.vcf --ped pheno.ped --out out1 --run 2 epacts single --vcf input2.vcf --ped pheno.ped --out out2 --run 2 epacts single --vcf input3.vcf --ped pheno.ped --out out3 --run 2 epacts single --vcf input4.vcf --ped pheno.ped --out out4 --run 2
Submit this job using the swarm command.
swarm -f EPACTS.swarm [-g #] [-t #] --module EPACTSwhere
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
--module EPACTS | Loads the EPACTS module for each subjob in the swarm |