High-Performance Computing at the NIH
xwas on Biowulf & Helix

Description

XWAS is used in the analysis of the X chromosome in association studies. From the XWAS home page:

The X chromosome plays an important role in complex human traits and diseases, especially those with sexually dimorphic characteristics. Special attention needs to be given to analysis of X due to its unique inheritance pattern and X-inactivation. These lead to several analytical complications that have resulted in the majority of genome-wide association studies (GWAS) to date having excluded the X chromosome or otherwise mishandled it by applying the same tools designed for non-sex chromosomes. With XWAS, we hope to provide the tools and incentive to incorporate the X chromosome into GWAS, hence enabling discoveries of novel loci implicated in many diseases and in their sexual dimorphism.

There may be multiple versions of xwas available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail xwas 

To select a module use

module load xwas/[version]

where [version] is the version of choice.

Environment variables set
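
The examples on this page refer to XWAS_BIN, XWAS_DB, XWAS_IMPUTE_BIN, XWAS_TEST_DATA, and XWAS_TEST_DATA_IMPUTE. The following is a minimal sketch for confirming that they are defined once the module is loaded. The function name check_xwas_env is hypothetical and not part of xwas, and the list covers only the variables used on this page; the module may set others.

```shell
#!/bin/bash
# Hypothetical helper (not part of xwas): report which of the variables
# used in the examples on this page are unset or empty.
check_xwas_env() {
    _missing=0
    for _name in XWAS_BIN XWAS_DB XWAS_IMPUTE_BIN XWAS_TEST_DATA XWAS_TEST_DATA_IMPUTE; do
        # look up the value of the variable whose name is stored in _name
        eval "_value=\${${_name}:-}"
        if [ -z "${_value}" ]; then
            echo "unset: ${_name}"
            _missing=1
        else
            echo "${_name}=${_value}"
        fi
    done
    # non-zero exit status if at least one variable is missing
    return "${_missing}"
}
```

A possible use is to run `module load xwas && check_xwas_env` at the start of a session or script.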

References

For a detailed list of references covering different aspects of XWAS, please see the XWAS home page.

Documentation

Interactive job on Biowulf

Allocate an interactive session with sinteractive and use xwas as described below.

First, an example of running the included imputation pipeline:

biowulf$ sinteractive --mem=6g
salloc.exe: Pending job allocation 33579755
salloc.exe: job 33579755 queued and waiting for resources
salloc.exe: job 33579755 has been allocated resources
salloc.exe: Granted job allocation 33579755
[...snip...]
node$ module load xwas
[+] Loading xwas 1.1
node$ # copy the input data
node$ cp -r ${XWAS_TEST_DATA_IMPUTE} .
node$ # create required directories
node$ mkdir -p example_X_final example_X_step1 example_X_step2
node$ # create a parameter file; note that trailing slashes are required
node$ cat > example.par <<END_OF_CONFIG
FILE: hapmap3_euro_all
OUTPUT: hapmap3_chr23_imputed
NJOBS: 31
BUILD: 18
FILELOC: example_X_loc/
REFLOC: ${XWAS_DB}/impute_ref/
TOOLSLOC: ${XWAS_IMPUTE_BIN}/
RESLOC1: example_X_step1/
RESLOC2: example_X_step2/
FINALRESLOC: example_X_final/
MAFRULE: EUR.MAF<=0.005
END_OF_CONFIG

node$ # create all the scripts
node$ make_imputation_files example.par
node$ # run first step; output will be in RESLOC1
node$ ./hapmap3_euro_all_preimpute.sh
node$ # run step 2 - run in parallel using swarm
node$ swarm -t1 -p2 -g2 --time=60 -f hapmap3_euro_all_impute2_run_all.sh
node$ # wait for swarm to finish
node$ # run step 3
node$ ./hapmap3_euro_all_impute2_cat.sh
biowulf$
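
Because the imputation parameter file requires trailing slashes on directory values, it can help to check the file before generating the scripts. The following is a minimal sketch, assuming that the directory-valued keys are those with LOC in their name (FILELOC, REFLOC, TOOLSLOC, RESLOC1, RESLOC2, FINALRESLOC), as in example.par above. The function name check_trailing_slashes is hypothetical and not part of xwas.

```shell
#!/bin/bash
# Hypothetical helper (not part of xwas): flag directory entries in a
# "KEY: value" parameter file whose value does not end in a slash.
# Assumption: directory-valued keys are the ones containing "LOC".
check_trailing_slashes() {
    # $1: path to the parameter file
    awk -F': *' '
        BEGIN { bad = 0 }
        $1 ~ /LOC/ {
            value = $2
            sub(/[ \t\r]+$/, "", value)      # ignore trailing whitespace
            if (value !~ /\/$/) {
                printf "%s: missing trailing slash (%s)\n", $1, value
                bad = 1
            }
        }
        END { exit bad }
    ' "$1"
}
```

For example, `check_trailing_slashes example.par && make_imputation_files example.par` would only generate the scripts when every such entry ends in a slash.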

Then run the post-imputation QC pipeline. Note that the build in the parameter file is now 19, whereas the imputation parameter file used build 18.

node$ cd example_X_final
node$ cat > qc.par <<END_OF_CONFIG
filename hapmap3_chr23_imputed
xwasloc ${XWAS_BIN}/
eigstratloc ${XWAS_BIN}/
excludexchrPCA	YES
build 19
alpha 0.05
plinkformat bed
exclind 0
maf 0.005
missindiv 0.1
missgeno 0.1
numpc 10
related 0.125
quant 0
END_OF_CONFIG
node$ xwas_qc.post_imputation.sh qc.par
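
Before starting the QC run, it may be worth confirming that the input files named in the parameter file are present. The following is a minimal sketch, assuming that with plinkformat set to bed the pipeline reads a PLINK binary fileset (.bed, .bim, .fam) named after the filename entry in the current directory. That assumption and the function name check_plink_fileset are illustrative and not taken from the xwas documentation.

```shell
#!/bin/bash
# Hypothetical helper (not part of xwas): check that the PLINK binary
# fileset named by the "filename" entry of a "key value" parameter file
# exists in the current directory.
check_plink_fileset() {
    # $1: path to the QC parameter file
    _prefix=$(awk '$1 == "filename" { print $2; exit }' "$1")
    if [ -z "${_prefix}" ]; then
        echo "no filename entry in $1"
        return 2
    fi
    _status=0
    for _ext in bed bim fam; do
        if [ ! -f "${_prefix}.${_ext}" ]; then
            echo "missing: ${_prefix}.${_ext}"
            _status=1
        fi
    done
    return "${_status}"
}
```

Run from the directory that holds qc.par, `check_plink_fileset qc.par` prints one line per missing file and returns a non-zero status if any are absent.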

Next, an example of how to run a gene-based association test:

node$ cp -r ${XWAS_TEST_DATA} .
node$ cd example/gene_test
node$ cat > example_params_gene_test.txt <<END_OF_CONFIG
filename	example_gene_test
xwasloc	${XWAS_BIN}/
genescriptloc	${XWAS_BIN}/
genelistname	example_gene_test_list.txt
assocfile	example_gene_test_snp_pval.txt
numindiv 50
buffer	15000
output	example_gene_test_result.txt
END_OF_CONFIG

node$ gene_based_test.sh example_params_gene_test.txt

xwas allows many different analyses. Please see the manual for more documentation.

Batch job on Biowulf

Create a batch script similar to the following example to carry out an imputation:

#! /bin/bash
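# this file is impute.sh

# load xwas so that its scripts and the XWAS_* variables are available
module load xwas || exit 1

# the input data must already be in the working directory
# (copied as in the interactive example)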

# create required directories
mkdir -p example_X_final example_X_step1 example_X_step2

# create a parameter file
cat > example.par <<END_OF_CONFIG
FILE: hapmap3_euro_all
OUTPUT: hapmap3_chr23_imputed
NJOBS: 31
BUILD: 18
FILELOC: example_X_loc/
REFLOC: ${XWAS_DB}/impute_ref/
TOOLSLOC: ${XWAS_IMPUTE_BIN}/
RESLOC1: example_X_step1/
RESLOC2: example_X_step2/
FINALRESLOC: example_X_final/
MAFRULE: EUR.MAF<=0.005
END_OF_CONFIG

# create all the scripts
make_imputation_files example.par

par="--time=60 --partition=quick"
# run step 1
jid1=$(sbatch --mem=2g ${par} hapmap3_euro_all_preimpute.sh)
[[ $? -eq 0 ]] || exit 1
echo "Submitted step 1: ${jid1}"

# run step 2
jid2=$(swarm -t1 -p2 -g2 ${par} --dependency=afterany:${jid1} -f hapmap3_euro_all_impute2_run_all.sh)
[[ $? -eq 0 ]] || exit 1
echo "Submitted step 2: ${jid2}"

# run step 3
jid3=$(sbatch --dependency=afterany:${jid2} ${par} --mail-type=ALL --mem=6g hapmap3_euro_all_impute2_cat.sh)
[[ $? -eq 0 ]] || exit 1
echo "Submitted step 3: ${jid3}"

sleep 5s
squeue -j ${jid1},${jid2},${jid3}
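
The script above chains the steps with afterany, which releases a dependent job once the earlier job has ended, whatever its exit status. Slurm also accepts afterok, which releases the dependent job only if the earlier job completed successfully. The following is a minimal sketch of a helper that builds the dependency option for either type. The function name dependency_flag is hypothetical and not part of xwas, swarm, or Slurm.

```shell
#!/bin/bash
# Hypothetical helper: build a Slurm --dependency option from a
# dependency type and one or more job ids.
dependency_flag() {
    # $1: dependency type (for example afterok or afterany)
    # remaining arguments: job ids
    _type=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "dependency_flag: no job ids given" >&2
        return 1
    fi
    # join the job ids with ":" as Slurm expects
    _ids=$(IFS=:; echo "$*")
    echo "--dependency=${_type}:${_ids}"
}
```

Under these assumptions, step 3 could be submitted with `sbatch $(dependency_flag afterok "${jid2}") ...` so that the concatenation only starts if the imputation jobs succeeded.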

Submit to the queue with sbatch:

biowulf$ sbatch impute.sh

The other functions of xwas can be run as batch jobs in the same way.