High-Performance Computing at the NIH
mach2dat

mach2dat performs logistic regression, using imputed SNP dosage data and adjusting for covariates.

mach2dat is intended to be used interactively on Helix. If large numbers of mach2dat jobs are required (> 2-3 simultaneous jobs), or the jobs are expected to run for a long time, the Biowulf cluster may be more appropriate. Please contact the Helix staff (staff@hpc.nih.gov) if you have questions about where to run mach2dat.

On Helix
Sample session:
helix% module load mach2dat
helix% mach2dat -d sample.dat -p sample.ped -i sample.mlinfo -g sample.mldose > test.out

helix% more test.out
mach2dat 1.0.24 -- Disease-snp Association Tests with Imputed Dosages
(c) 2008 Yun Li, Wei Chen, Goncalo Abecasis 


The following parameters are in effect:

Available Options
         Phenotypic Data : --datfile [pheno.dat], --pedfile [pheno.ped]
   Imputed Genotype Data : --infofile [sample.mlinfo],
                           --dosefile [sample.mldose]
        Analysis Options : --useCovariates [ON], --likelihoodratio [ON],
                           --samplesize [ON], --verboseSampleSize,
                           --nrrounds [20], --rsqcutoff [1.0e-04],
                           --method [newton]
                  Output : --frequency


Loading marker information ...
   100 markers will be analyzed

Processing dosage file ...


2335 individuals are found in both dosage and phenotype files


                INFORMATION FROM .info FILE              Disease ASSOCIATION
                ======================================   ============================================================
TRAIT           MARKER               ALLELES  FREQ1    RSQR   EFFECT1  OR      STDERR  WALDCHISQ PVALUE     LRCHISQ LRPVAL
cases           SAMPLE-SIZE     1161 cases      1174 controls
cases           rs11861870            4,2      1.000   .8697   -0.092   0.912   0.090   1.0386    0.3081     1.0395  0.3079 
cases           rs1861869             3,2      .9999   .9543    0.006   1.006   0.060   0.0112    0.9156     0.0112  0.9156 
cases           rs1861868             4,2      .9999   .9995    0.010   1.010   0.059   0.0263    0.8711     0.0263  0.8711 
cases           rs1075440             1,3      .9999   .9997    0.068   1.071   0.060   1.3098    0.2524     1.3106  0.2523 
cases           rs1077128             2,1      .9999   .9922   -0.106   0.899   0.081   1.7331    0.188      1.7353  0.1877 
cases           rs11643744            1,3      1.000   .9981    0.068   1.071   0.060   1.3101    0.2524     1.3108  0.2522 
cases           rs7184874             4,2      1.000   .9286    0.009   1.009   0.061   0.0220    0.8822     0.0220  0.8822 
cases           rs7186521             3,1      1.000   .9573    0.009   1.009   0.060   0.0229    0.8797     0.0229  0.8797 
cases           rs7191566             1,3      1.000   .9390   -0.078   0.925   0.080   0.9510    0.3295     0.9516  0.3293 
cases           rs13333228            2,4      1.000   .9571    0.056   1.057   0.062   0.8220    0.3646     0.8223  0.3645 
cases           rs9940700             3,2      .8306   .9221   -0.075   0.928   0.081   0.8511    0.3562     0.8516  0.3561 
cases           rs13334933            1,3      .8552   .8252   -0.065   0.937   0.091   0.5088    0.4757     0.5090  0.4756 
cases           rs16952517            3,1      .9476   .2338    0.118   1.125   0.270   0.1903    0.6627     0.1906  0.6624 
cases           rs6499642             2,4      .9560   .4568    0.141   1.151   0.212   0.4409    0.5067     0.4421  0.5061 
cases           rs6499643             4,2      .8303   .5103    0.066   1.069   0.109   0.3687    0.5437     0.3689  0.5436 
cases           rs4784323             1,3      .3521   .6323   -0.012   0.988   0.077   0.0240    0.877      0.0240  0.877  
cases           rs7206790             2,3      .5503   .6514   -0.026   0.974   0.072   0.1339    0.7144     0.1339  0.7144 
cases           rs8047395             3,1      .5106   .8216   -0.058   0.943   0.064   0.8203    0.3651     0.8206  0.365  
cases           rs9937053             3,1      .5705   .8601   -0.055   0.946   0.063   0.7699    0.3802     0.7702  0.3802 
cases           rs9928094             1,3      .5705   .8606   -0.055   0.946   0.063   0.7715    0.3798     0.7717  0.3797 
cases           rs9930333             4,3      .5671   .8642   -0.054   0.948   0.063   0.7211    0.3958     0.7213  0.3957 
cases           rs12446228            1,3      .3922   .9127   -0.029   0.972   0.063   0.2047    0.651      0.2047  0.651  
cases           rs9939973             3,1      .5705   .8616   -0.055   0.946   0.063   0.7707    0.38       0.7710  0.3799 
cases           rs9940646             2,3      .5654   .8444   -0.055   0.946   0.064   0.7539    0.3852     0.7541  0.3852 
....

Analysis took 0 seconds

Batch Job on Biowulf

Sample batch script:

#!/bin/bash
# this file is called job.sh

cd /data/$USER/mydir
module load mach2dat
mach2dat -d sample.dat -p sample.ped -i sample.mlinfo -g sample.mldose > test.out

Submit with:

sbatch  job.sh
By default, the job is allocated 2 CPUs and 4 GB of memory. If more than 4 GB of memory is required, specify the memory with
sbatch --mem=#g  job.sh
where '#' is the number of gigabytes of memory required.

Swarm of jobs on Biowulf

Set up a swarm command file along the following lines:

mach2dat -d sample1.dat -p sample1.ped -i sample.mlinfo -g sample.mldose > sample1.out
mach2dat -d sample2.dat -p sample2.ped -i sample.mlinfo -g sample.mldose > sample2.out
mach2dat -d sample3.dat -p sample3.ped -i sample.mlinfo -g sample.mldose > sample3.out
[...]
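
When there are many samples, the command file can be generated instead of typed by hand. The following is a minimal sketch, assuming each sample has its own .dat and .ped file sharing a common prefix and that all samples use the same sample.mlinfo and sample.mldose files, as in the lines above. The function name make_swarm_lines is hypothetical and not part of mach2dat or swarm.

```shell
#!/bin/bash
# Hypothetical helper: print one mach2dat command line per sample prefix.
# Each argument is a file prefix such as "sample1" (sample1.dat, sample1.ped).
make_swarm_lines() {
  for prefix in "$@"; do
    echo "mach2dat -d ${prefix}.dat -p ${prefix}.ped -i sample.mlinfo -g sample.mldose > ${prefix}.out"
  done
}

# Print the three command lines shown above.
make_swarm_lines sample1 sample2 sample3
```

Redirect the output to a file (for example, make_swarm_lines sample1 sample2 sample3 > swarmfile) to produce the swarm command file.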

Submit this swarm with:

swarm -f swarmfile --module mach2dat
By default, each mach2dat command is allocated a maximum of 1.5 GB of memory. If you need more than the default, you can specify it with
swarm -f swarmfile -g # --module mach2dat
where '#' is the number of gigabytes of memory required.

Interactive job on Biowulf

Allocate an interactive session and run mach2dat there. For example:

biowulf% sinteractive

salloc.exe: Pending job allocation 6016303
salloc.exe: job 6016303 queued and waiting for resources
salloc.exe: job 6016303 has been allocated resources
salloc.exe: Granted job allocation 6016303
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn1808 are ready for job

[cn1808 ~]$ module load mach2dat

[cn1808 ~]$ mach2dat -d sample.dat -p sample.ped -i sample.mlinfo -g sample.mldose

[cn1808 ~]$ exit
salloc.exe: Relinquishing job allocation 6016303

biowulf%

Documentation
The README file from this package is available at /usr/local/apps/mach2dat/1.0.24/README and is displayed below:
mach2dat
========
Performs logistic regression, using imputed SNP dosage data and adjusting for covariates

Sample Command:
--------------
executables/mach2dat -d examples/pheno.dat -p examples/pheno.ped -i examples/sample.mlinfo -g examples/sample.mldose

Notes
-----
(1) accepts gzipped file format

(2) categorical covariates should be coded as (number of categories - 1) dummy variables

(3) Affection status: 0 = missing; 1 = control; 2 = case
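
As an illustration of note (2), a covariate with three categories is represented by two 0/1 dummy variables, with the remaining category acting as the reference (all zeros). The sketch below only shows the coding itself. The helper name dummy_code and the category labels A, B, and C are hypothetical, and how the resulting columns are placed into the .ped and .dat files is not covered here.

```shell
#!/bin/bash
# Hypothetical helper: read one categorical value per line on stdin and print
# one 0/1 dummy per non-reference category given as an argument.
# Any value not listed (the reference category) is printed as all zeros.
dummy_code() {
  awk -v levels="$*" '
    BEGIN { n = split(levels, lv, " ") }
    {
      out = ""
      for (i = 1; i <= n; i++) {
        sep  = (i > 1 ? " " : "")
        flag = ($1 == lv[i] ? 1 : 0)
        out  = out sep flag
      }
      print out
    }'
}

# Three categories A, B, C with A as the reference: two dummies (is_B, is_C).
printf '%s\n' A B C | dummy_code B C
```

Here A is coded as "0 0", B as "1 0", and C as "0 1".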

Output
------
"INFORMATION FROM .info FILE" are taken from .mlinfo or .info file fed in,
including 
(1) MARKER: SNP name, 
(2) ALLELES: 2 alleles (order matters),
(3) FREQ1:  frequency for allele one (the allele before the comma under ALLELES), and 
(4) RSQR: rsq from info file, an imputation quality measure. 
          We normally recommend discarding results for SNPs with RSQR <0.3. 
          This threshold removes ~70% of badly imputed SNPs at the cost of ~0.5% well-imputed SNPs.
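
The recommended RSQR filter can be applied to the output file with awk. This is a minimal sketch, assuming result rows have the 12 whitespace-separated columns shown in the sample session, with RSQR in column 5. The function name filter_rsqr is hypothetical, and header and SAMPLE-SIZE lines are dropped because they do not match that layout.

```shell
#!/bin/bash
# Hypothetical helper: keep only result rows whose RSQR (column 5) is at or
# above the cutoff given as $1. Reads mach2dat output on stdin.
filter_rsqr() {
  awk -v cutoff="$1" 'NF == 12 && $5 ~ /^[0-9.]+$/ && $5 + 0 >= cutoff + 0'
}

# Three rows copied from the sample session output.
sample_rows='cases rs9940700 3,2 .8306 .9221 -0.075 0.928 0.081 0.8511 0.3562 0.8516 0.3561
cases rs16952517 3,1 .9476 .2338 0.118 1.125 0.270 0.1903 0.6627 0.1906 0.6624
cases rs6499642 2,4 .9560 .4568 0.141 1.151 0.212 0.4409 0.5067 0.4421 0.5061'

# rs16952517 (RSQR .2338) is removed; the other two rows are kept.
printf '%s\n' "$sample_rows" | filter_rsqr 0.3
```

On a real output file the same filter would be used as filter_rsqr 0.3 < test.out.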

"Disease ASSOCIATION":
(1) EFFECT1  : effect for allele one. 
               A negative value means allele one has protective effect;

(2) OR       : Odds ratio = exp(EFFECT1)

(3) STDERR   : standard error for the above point estimate EFFECT1.

(4) WALDCHISQ: Wald chi-square test statistic = (EFFECT1 / STDERR)^2

(5) PVALUE   : p-value associated with the above Wald test statistic.

(6) LRCHISQ  : likelihood ratio chi-square.

(7) LRPVAL   : likelihood ratio p-value.

We recommend using the likelihood ratio test results, which are reported by default.
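
The two formulas above can be checked against a row of the sample output. This is a minimal sketch using the printed EFFECT1 and STDERR values for rs11861870 (-0.092 and 0.090). The function name recompute_stats is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: recompute OR and the Wald chi-square from
# EFFECT1 ($1) and STDERR ($2) using the formulas in the README.
recompute_stats() {
  awk -v b="$1" -v se="$2" \
    'BEGIN { printf "OR=%.3f WALDCHISQ=%.4f\n", exp(b), (b / se) ^ 2 }'
}

recompute_stats -0.092 0.090
# prints: OR=0.912 WALDCHISQ=1.0449
```

The recomputed OR matches the reported 0.912. The recomputed Wald statistic (1.0449) differs slightly from the reported 1.0386, presumably because the output file shows EFFECT1 and STDERR rounded to three decimal places.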

Reference:
------------
Please cite: 
Li Y, Willer CJ, Sanna S, and Abecasis GR (2009). Genotype imputation. Annu Rev Genomics Hum Genet. 10: 387-406.