Biowulf High Performance Computing at the NIH
EMIM on Biowulf

PREMIM and EMIM are tools for the estimation of parental and child genetic effects, based on genotype data from a variety of different child-parent configurations. PREMIM allows the extraction of child-parent genotype data from standard-format pedigree data files, while EMIM uses the extracted genotype data to perform subsequent statistical analysis.

Documentation
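Important Notes
Module name: emim
Environment variable set by the module: EMIM_HOME
Example files in $EMIM_HOME: testpedigree.ped, emimmarkers.dat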

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf ~]$ sinteractive --gres lscratch:1
salloc.exe: Pending job allocation 52908283
salloc.exe: job 52908283 queued and waiting for resources
salloc.exe: job 52908283 has been allocated resources
salloc.exe: Granted job allocation 52908283
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3153 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
[user@cn3153 ~]$ cd /lscratch/$SLURM_JOBID
[user@cn3153 52908283]$ module load emim
[+] Loading emim, version 3.22...
[user@cn3153 52908283]$ premim $EMIM_HOME/testpedigree.ped

PREMIM: Pedigree file processing program for EMIM, v3.22
--------------------------------------------------------
Copyright 2011-2016 Richard Howey, GNU General Public License, v3
Institute of Genetic Medicine, Newcastle University

Log file: premim.log
Input file: /usr/local/apps/emim/3.22/testpedigree.ped
Child trend analysis set for parameter file (emimparams.dat).

Number of subjects: 490
          Males: 320 (65.3061%)
          Females: 170 (34.6939%)
          Unknown sex: 0 (0%)
          Affected: 92 (18.7755%)
          Unaffected: 398 (81.2245%)
Number of SNPs: 1

Number of pedigrees: 130
Mean pedigree size: 3.76923
Standard deviation of pedigree size: 1.81098

File name: caseparenttrios
Number of counted case parent trios (all SNPs): 20
Average number of counted case parent trios (per SNP): 20
Number of uncounted (Mendelian error) case parent trios: 0

File name: casemotherduos
Number of counted case mother duos (all SNPs): 10
Average number of counted case mother duos (per SNP): 10
Number of uncounted (Mendelian error) case mother duos: 0

File name: casefatherduos
Number of counted case father duos (all SNPs): 10
Average number of counted case father duos (per SNP): 10
Number of uncounted (Mendelian error) case father duos: 0

File name: cases
Number of counted cases (all SNPs): 10
Average number of counted cases (per SNP): 10

File name: caseparents
Number of counted case parents (all SNPs): 10
Average number of counted case parents (per SNP): 10
Number of uncounted (Mendelian error) case parents: 0

File name: casemothers
Number of counted case mothers (all SNPs): 10
Average number of counted case mothers (per SNP): 10

File name: casefathers
Number of counted case fathers (all SNPs): 10
Average number of counted case fathers (per SNP): 10

File name: conparents
Number of counted control parents (all SNPs): 20
Average number of counted control parents (per SNP): 20
Number of uncounted (Mendelian error) control parents: 0

File name: conmotherduos
Number of counted control mother duos (all SNPs): 10
Average number of counted control mother duos (per SNP): 10
Number of uncounted (Mendelian error) control mother duos: 0

File name: confatherduos
Number of counted control father duos (all SNPs): 10
Average number of counted control father duos (per SNP): 10
Number of uncounted (Mendelian error) control father duos: 0

File name: cons
Number of counted controls (all SNPs): 10
Average number of counted controls (per SNP): 10

Number of uncounted groups: 0

Run time: less than one second

[user@cn3153 52908283]$ ls
casefatherduos.dat  casemothers.dat      cases.dat          conparents.dat  premim.log
casefathers.dat     caseparents.dat      confatherduos.dat  cons.dat
casemotherduos.dat  caseparenttrios.dat  conmotherduos.dat  emimparams.dat
[user@cn3153 52908283]$ cp $EMIM_HOME/emimmarkers.dat .
[user@cn3153 52908283]$ emim
 ANALYSING SNP NUMBER           1 , SNPID=   1.0000000000000000     
[user@cn3153 52908283]$ exit
exit
salloc.exe: Relinquishing job allocation 52908283
[user@biowulf ~]$
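
In the session above, emim is started with no arguments and reads the files that premim wrote (plus the copied emimmarkers.dat) from the current directory. A minimal sketch of a guard that could be run before emim is shown below. The function name check_emim_inputs is hypothetical, and the list of files it checks is an assumption taken from the directory listing above, not a statement of what emim strictly requires.

```shell
#!/bin/sh
# Hypothetical helper: confirm that files seen in the sample session are
# present in a directory before emim is started there.
# The file list is an assumption based on the listing in the session.
check_emim_inputs() {
    dir=${1:-.}
    missing=0
    for f in emimparams.dat emimmarkers.dat caseparenttrios.dat; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A possible use inside the working directory is: check_emim_inputs . && emim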

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. emim.sh). For example:

#!/bin/sh
set -e
module load emim

premim $EMIM_HOME/testpedigree.ped
cp $EMIM_HOME/emimmarkers.dat .
emim
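
The interactive session works in /lscratch/$SLURM_JOBID, while the batch script above writes into the submission directory. A batch job can follow the same pattern as the interactive one. The sketch below is one way to do that, not the documented method: the function name run_emim_in_scratch, its three arguments (all assumed to be absolute paths), and the fallback to a temporary directory when no lscratch allocation exists are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: run premim and emim in node-local scratch, then copy results back.
# Assumes the job was submitted with an lscratch allocation so that
# /lscratch/$SLURM_JOBID exists; otherwise a temporary directory is used.
run_emim_in_scratch() {
    ped=$1        # pedigree file (absolute path)
    markers=$2    # marker file (absolute path)
    dest=$3       # directory that receives the results (absolute path)
    if [ -n "${SLURM_JOBID:-}" ] && [ -d "/lscratch/$SLURM_JOBID" ]; then
        work=/lscratch/$SLURM_JOBID
    else
        work=$(mktemp -d)
    fi
    mkdir -p "$dest" &&
    ( cd "$work" &&
      premim "$ped" &&                   # writes emimparams.dat and the *.dat files
      cp "$markers" emimmarkers.dat &&   # emim reads the markers from the cwd
      emim &&
      cp ./* "$dest"/ )                  # copy everything produced back
}
```

In emim.sh, after module load emim, this could be called as run_emim_in_scratch "$EMIM_HOME/testpedigree.ped" "$EMIM_HOME/emimmarkers.dat" "$PWD/results", with the job submitted using sbatch --gres=lscratch:1 emim.sh.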

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] emim.sh

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. emim.swarm). For example:

mkdir sample1 && cd sample1 && premim ../input1.bed && emim
mkdir sample2 && cd sample2 && premim ../input2.bed && emim
mkdir sample3 && cd sample3 && premim ../input3.bed && emim
mkdir sample4 && cd sample4 && premim ../input4.bed && emim
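
For more than a handful of inputs, the swarmfile can be generated rather than typed, which keeps the directory name identical in the mkdir and cd steps of each line. The sketch below is illustrative only: the function name make_emim_swarm, the input*.bed naming pattern, and the choice to name each run directory after its input file are assumptions, not part of the original example.

```shell
#!/bin/sh
# Sketch: write one swarm line per pedigree file.
# Assumptions: inputs match input*.bed in the current directory, and each
# run gets its own directory named after the input file.
make_emim_swarm() {
    out=${1:-emim.swarm}
    : > "$out"                       # start with an empty swarmfile
    for bed in input*.bed; do
        [ -e "$bed" ] || continue    # no matches: the glob stays literal, skip it
        name=${bed%.bed}             # input1.bed -> input1
        echo "mkdir -p $name && cd $name && premim ../$bed && emim" >> "$out"
    done
}
```

Running make_emim_swarm emim.swarm in the directory that holds the input files produces a swarmfile that can be submitted as described in the next step.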

Submit this job using the swarm command.

swarm -f emim.swarm [-g #] [-t #] --module emim
where
  -g #           Number of gigabytes of memory required for each process (1 line in the swarm command file)
  -t #           Number of threads/CPUs required for each process (1 line in the swarm command file)
  --module emim  Loads the emim module for each subjob in the swarm