High-Performance Computing at the NIH
GitHub YouTube @nih_hpc RSS Feed
mach2qtl on Biowulf & Helix

Description

Mach2qtl uses dosages/posterior probabilities inferred with mach or minimac as predictors in a linear regression to test association with quantitative traits.

References

Web sites

Running mach2qtl on Helix

Load the module that sets up the environment for mach2qtl:

helix$ module load mach2qtl
helix$ module list

Currently Loaded Modules:
  1) mach2qtl/1.1.3

Example run of mach2qtl:

helix$ cd /data/$USER/test_data/mach2qtl
helix$ mach2qtl -d sample.dat -p sample.ped -i sample.mlinfo \
          --probfile sample.mlprob \
          --samplesize > test.out
helix$ less test.out

Mach2Qtl V1.1.3 (2013-04-25) -- QTL Association Mapping with Imputed Allele Counts
(c) 2007 Goncalo Abecasis, Yun Li


The following parameters are in effect:

Available Options
         Phenotypic Data : --datfile [sample.dat], --pedfile [sample.ped]
   Imputed Allele Counts : --infofile [sample.mlinfo], --dosefile [],
                           --probfile [sample.mlprob]
        Analysis Options : --useCovariates [ON], --quantileNormalization,
                           --dominant, --recessive, --additive [ON]
                  Output : --samplesize [ON], --rsqcutoff [0.00]

FITTED MODELS (for covariate adjusted residuals)
======================================================
          Trait     Raw Mean Raw Variance         Mean     Variance
        lg10CRP      0.52383      0.13666     -0.00000      0.13180

Loading marker information ...
   1001 markers will be analyzed

Processing prob file ... 

Executing first [ass analysis of imputed genotypes ... 
Executing second pass analysis of imputed genotypes ... 

                    INFORMATION FROM .info FILE             QTL ASSOCIATION ADDITIVE         
                    =====================================   =================================
TRAIT               MARKER               ALLELES  FREQ1  RSQR   EFFECT2  STDERR  CHISQ     PVALUE N
lg10CRP             rs9977217            A,G      .3950  .9805   0.020   0.018   1.2320     0.267    867
lg10CRP             rs9977094            A,C      .9894  .3454   0.179   0.133   1.8130    0.1782    867
lg10CRP             rs10854272           C,T      .4169  .9562   0.014   0.018   0.6349    0.4256    867
...
Analysis took 1 seconds

helix$
helix$ mach2qtl -d sample.dat -p sample.ped -i sample.mlinfo \
          --probfile sample.mlprob --dominant --recessive \
          --samplesize > test.out

Running a single mach2qtl batch job on Biowulf

To run a single mach2qtl analysis on biowulf, a batch script similar to the exammple below will be required:

#! /bin/bash
set -e

cd /data/$USER/test_data/mach2qtl
module load mach2qtl/1.1.3
mach2qtl -d sample.dat -p sample.ped -i sample.mlinfo \
    --probfile sample.mlprob --dominant --recessive \
    --samplesize > test.out

This script can then be submitted to the slurm batch queue with a command like

biowulf$ sbatch --mem=2g batch_script.sh

Running a swarm of mach2qtl batch jobs on Biowulf

It is easiest to run a large number of simultaneous mach2qtl jobs via swarm. Set up a swarm command file along the following lines:

#------- this file is swarmcmd ------------------
mach2qtl -d sample1.dat -p sample1.ped -i sample1.mlinf --probfile sample1.mlp  > sample1.out
mach2qtl -d sample2.dat -p sample2.ped -i sample2.mlinf --probfile sample2.mlp  > sample2.out
mach2qtl -d sample1.dat -p sample3.ped -i sample3.mlinf --probfile sample3.mlp  > sample3.out
mach2qtl -d sample4.dat -p sample4.ped -i sample4.mlinf --probfile sample4.mlp  > sample4.out
[...]

If each mach2qtl process requires less than 1 GB of memory, submit this to the batch system with the command:

swarm -f swarmcmd --module mach2qtl

If each mach2qtl process requires more than 1 GB of memory, use

swarm -g # -f cmdfile --module mach2qtl

where '#' is the number of Gigabytes of memory required by each mach2qtl process. The swarm program will package the commands for best efficiency and send them to the batch system as a job array.

Running an interactive job on Biowulf

After allocating an interactive session with sinteractive, mach2qtl can be used essentially like described above for helix:

biowulf$ sinteractive --mem=2g
salloc.exe: Granted job allocation 17520
slurm stepprolog here!
srun: error: x11: no local DISPLAY defined, skipping
                                                    Begin slurm taskprolog!
                                                    End slurm taskprolog!

node$ cd /data/$USER/test_data/mach2qtl
node$ mach2qtl -d sample.dat -p sample.ped -i sample.mlinfo \
          --probfile sample.mlprob --dominant --recessive \
          --samplesize > test.out
node$ exit
biowulf$

Documentation

Mach2qtl readme:


mach2qtl
========
QTL analysis based on imputed dosages/posterior_probabilities


Options:
--------

    --datfile : merlin marker info file for phenotypes (required)

    --pedfile : merlin pedigree file for phenotypes (required)

    --infofile : mach1 output .info or .mlinfo file (required)

    --dosefile : mach1 output .dose or .mldose file (required if probfile not available)

    --probfile : mach1 output .prob or .mlprob file (required for non-additive model)

    --useCovariates : adjust for covariates (default is ON)

    --quantileNormalization : apply inverse normal transformation to quantitative trait(s) (default if OFF)

    --dominant : dominant model (default if OFF)

    --recessive : recessive model (default if OFF)

    --additive : additive model (default if ON)

Sample command line & Example
-----------------------------

    see sub-folder examples/

Coding
======
For recessive model: AL1/AL1                    1
                     AL1/AL2 & AL2/AL2          0
    so a positive beta means AL1/AL1 is associated with higher trait values)

For dominant model:  AL2/AL2                    1
                     AL1/AL2 & AL1/AL1          0
    so a positive beta means AL2/AL2 is associated with higher trait values)

For additive model:  AL1            0
             AL2            1
             (notice the negative sign in "double effect = -numerator / denominator;"
    so a positive beta means AL2 is associated with higher trait values)

Procedure:
----------
Calculate residuals (if related, use VC)
Perform residual-SNP association

Selected Output
----------------
Raw Mean/Variance vs Mean/Variance
        former for y (could be transformed)
        latter for residuals (even if no residuals, Raw Mean and Mean could be different b/c latter would be centered and thus would be ZERO)


ref:
----
Li Y, Willer CJ, Sanna S, and Abecasis GR (2009). Genotype imputation. Annu Rev Genomics Hum Genet. 10: 387-406. 
Chen WM and Abecasis GR (2007). Family-based association tests for genomewide association scans. Am J Hum Genet 81:913-26