High-Performance Computing at the NIH
GitHub YouTube @nih_hpc RSS Feed
mach2qtl on Biowulf

Description

Mach2qtl uses dosages/posterior probabilities inferred with mach or minimac as predictors in a linear regression to test association with quantitative traits.

References

Web sites

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load mach2qtl

[user@cn3144 ~]$ cd /data/$USER/test_data/mach2qtl

[user@cn3144 ~]$ mach2qtl -d sample.dat -p sample.ped -i sample.mlinfo \
          --probfile sample.mlprob \
          --samplesize > test.out

[user@cn3144 ~]$ less test.out
Mach2Qtl V1.1.3 (2013-04-25) -- QTL Association Mapping with Imputed Allele Counts
(c) 2007 Goncalo Abecasis, Yun Li


The following parameters are in effect:

Available Options
         Phenotypic Data : --datfile [sample.dat], --pedfile [sample.ped]
   Imputed Allele Counts : --infofile [sample.mlinfo], --dosefile [],
                           --probfile [sample.mlprob]
        Analysis Options : --useCovariates [ON], --quantileNormalization,
                           --dominant, --recessive, --additive [ON]
                  Output : --samplesize [ON], --rsqcutoff [0.00]

FITTED MODELS (for covariate adjusted residuals)
======================================================
          Trait     Raw Mean Raw Variance         Mean     Variance
        lg10CRP      0.52383      0.13666     -0.00000      0.13180

Loading marker information ...
   1001 markers will be analyzed

Processing prob file ...

Executing first [ass analysis of imputed genotypes ...
Executing second pass analysis of imputed genotypes ...

                    INFORMATION FROM .info FILE             QTL ASSOCIATION ADDITIVE
                    =====================================   =================================
TRAIT               MARKER               ALLELES  FREQ1  RSQR   EFFECT2  STDERR  CHISQ     PVALUE N
lg10CRP             rs9977217            A,G      .3950  .9805   0.020   0.018   1.2320     0.267    867
lg10CRP             rs9977094            A,C      .9894  .3454   0.179   0.133   1.8130    0.1782    867
lg10CRP             rs10854272           C,T      .4169  .9562   0.014   0.018   0.6349    0.4256    867
...
Analysis took 1 seconds

[user@cn3144 ~]$ mach2qtl -d sample.dat -p sample.ped -i sample.mlinfo \
          --probfile sample.mlprob --dominant --recessive \
          --samplesize > test.out

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Running a single mach2qtl batch job on Biowulf

To run a single mach2qtl analysis on biowulf, a batch script similar to the exammple below will be required:

#! /bin/bash
set -e

cd /data/$USER/test_data/mach2qtl
module load mach2qtl/1.1.3
mach2qtl -d sample.dat -p sample.ped -i sample.mlinfo \
    --probfile sample.mlprob --dominant --recessive \
    --samplesize > test.out

This script can then be submitted to the slurm batch queue with a command like

biowulf$ sbatch --mem=2g batch_script.sh

Running a swarm of mach2qtl batch jobs on Biowulf

It is easiest to run a large number of simultaneous mach2qtl jobs via swarm. Set up a swarm command file along the following lines:

#------- this file is swarmcmd ------------------
mach2qtl -d sample1.dat -p sample1.ped -i sample1.mlinf --probfile sample1.mlp  > sample1.out
mach2qtl -d sample2.dat -p sample2.ped -i sample2.mlinf --probfile sample2.mlp  > sample2.out
mach2qtl -d sample1.dat -p sample3.ped -i sample3.mlinf --probfile sample3.mlp  > sample3.out
mach2qtl -d sample4.dat -p sample4.ped -i sample4.mlinf --probfile sample4.mlp  > sample4.out
[...]

If each mach2qtl process requires less than 1 GB of memory, submit this to the batch system with the command:

swarm -f swarmcmd --module mach2qtl

If each mach2qtl process requires more than 1 GB of memory, use

swarm -g # -f cmdfile --module mach2qtl

where '#' is the number of Gigabytes of memory required by each mach2qtl process. The swarm program will package the commands for best efficiency and send them to the batch system as a job array.

Documentation

Mach2qtl readme:


mach2qtl
========
QTL analysis based on imputed dosages/posterior_probabilities


Options:
--------

    --datfile : merlin marker info file for phenotypes (required)

    --pedfile : merlin pedigree file for phenotypes (required)

    --infofile : mach1 output .info or .mlinfo file (required)

    --dosefile : mach1 output .dose or .mldose file (required if probfile not available)

    --probfile : mach1 output .prob or .mlprob file (required for non-additive model)

    --useCovariates : adjust for covariates (default is ON)

    --quantileNormalization : apply inverse normal transformation to quantitative trait(s) (default if OFF)

    --dominant : dominant model (default if OFF)

    --recessive : recessive model (default if OFF)

    --additive : additive model (default if ON)

Sample command line & Example
-----------------------------

    see sub-folder examples/

Coding
======
For recessive model: AL1/AL1                    1
                     AL1/AL2 & AL2/AL2          0
    so a positive beta means AL1/AL1 is associated with higher trait values)

For dominant model:  AL2/AL2                    1
                     AL1/AL2 & AL1/AL1          0
    so a positive beta means AL2/AL2 is associated with higher trait values)

For additive model:  AL1            0
             AL2            1
             (notice the negative sign in "double effect = -numerator / denominator;"
    so a positive beta means AL2 is associated with higher trait values)

Procedure:
----------
Calculate residuals (if related, use VC)
Perform residual-SNP association

Selected Output
----------------
Raw Mean/Variance vs Mean/Variance
        former for y (could be transformed)
        latter for residuals (even if no residuals, Raw Mean and Mean could be different b/c latter would be centered and thus would be ZERO)


ref:
----
Li Y, Willer CJ, Sanna S, and Abecasis GR (2009). Genotype imputation. Annu Rev Genomics Hum Genet. 10: 387-406. 
Chen WM and Abecasis GR (2007). Family-based association tests for genomewide association scans. Am J Hum Genet 81:913-26