High-Performance Computing at the NIH
Eigensoft on Biowulf & Helix

***************************************************************************************************
Note: the programs eigenstrat and eigenstratQTL of EIGENSOFT version 2.0
have been replaced by smarteigenstrat.perl. Please refer to the documentation.
***************************************************************************************************

The EIGENSOFT package combines functionality from population genetics methods (Patterson et al. 2006) and the EIGENSTRAT stratification correction method (Price et al. 2006). The EIGENSTRAT method uses principal components analysis to explicitly model ancestry differences between cases and controls along continuous axes of variation; the resulting correction is specific to a candidate marker’s variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. The EIGENSOFT package has a built-in plotting script and supports multiple file formats and quantitative phenotypes. Eigensoft was developed at the Harvard Genetics Department and the Broad Institute.

Running on Helix

$ module load eigensoft
$ cd /data/$USER/dir
$ smarteigenstrat.perl ....

Running a single batch job on Biowulf

1. Create a script file containing lines similar to those below.

#!/bin/bash


module load eigensoft
cd /data/$USER/dir
smarteigenstrat.perl .......

2. Submit the script on Biowulf:

$ sbatch jobscript

If more memory is required (the default is 4 GB), specify --mem=Mg, where M is the number of gigabytes. For example:

$ sbatch --mem=10g jobscript
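
The memory request can also be kept inside the script itself. The following is a minimal sketch, assuming Slurm's standard #SBATCH directive lines, which sbatch reads when they appear before the first command in the script. The here-document is used only to keep the snippet self-contained, and the smarteigenstrat.perl arguments are left elided, as in the script above.

```shell
# Write the job script to a file named "jobscript" (the name used above).
cat > jobscript <<'EOF'
#!/bin/bash
#SBATCH --mem=10g
# The directive above requests 10 GB instead of the 4 GB default.

module load eigensoft
cd /data/$USER/dir
smarteigenstrat.perl ...   # arguments elided; supply your own
EOF
```

With the directive in place, a plain "sbatch jobscript" requests 10 GB. A --mem value given on the sbatch command line takes precedence over the directive.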

Running a swarm of jobs on Biowulf

Set up a swarm command file, with one command line per job:

  cd /data/$USER/dir1; eigensoft command
  cd /data/$USER/dir2; eigensoft command
  cd /data/$USER/dir3; eigensoft command
	[......]
  

Submit the swarm file. The -f option specifies the swarm file name, and --module eigensoft loads the eigensoft module for each command line in the file:

  $ swarm -f swarmfile --module eigensoft

If each command line needs more memory, use -g. The example below allocates 10 GB for each command:

  $ swarm -f swarmfile -g 10 --module eigensoft

For more information about running swarm, see swarm.html
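
When there are many working directories, the swarm command file can be generated rather than typed by hand. The following is a minimal sketch: make_swarmfile is a hypothetical helper, not part of swarm or EIGENSOFT, and "eigensoft command" remains a placeholder for the actual command, as in the swarm file above.

```shell
#!/bin/bash
# Hypothetical helper: write one swarm line per working directory.
# Usage: make_swarmfile OUTFILE DIR [DIR ...]
make_swarmfile() {
    local outfile=$1
    shift
    : > "$outfile"    # create the file, or empty it if it already exists
    local dir
    for dir in "$@"; do
        # "eigensoft command" is a placeholder for the real command line.
        printf 'cd %s; eigensoft command\n' "$dir" >> "$outfile"
    done
}

make_swarmfile swarmfile "/data/$USER/dir1" "/data/$USER/dir2" "/data/$USER/dir3"
```

The resulting swarmfile has one line per directory and can be submitted with the swarm commands shown above.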

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as shown below and run the interactive job there.

biowulf$ sinteractive 
salloc.exe: Granted job allocation 16535

cn999$ module load eigensoft
cn999$ cd /data/$USER/dir
cn999$ eigensoft commands
[...etc...]

cn999$ exit
exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed, use --mem. For example:

biowulf$ sinteractive --mem=8g

Documentation

https://github.com/DReichLab/EIG