High-Performance Computing at the NIH
Mach1.0, Minimac and ChunkChromosome on Biowulf & Helix

MACH 1.0 is a Markov Chain based haplotyper. It can resolve long haplotypes or infer missing genotypes in samples of unrelated individuals. [Mach website]

Minimac is a low memory, computationally efficient implementation of the MaCH algorithm for genotype imputation. NOTE: Minimac now has its own module file [Minimac website].

ChunkChromosome is a helper utility for minimac and MaCH. It facilitates analyses of very large datasets by splitting them into overlapping slices. It is loaded as part of the Mach or Minimac modules. [ChunkChromosome website]

If you get segmentation faults with ChunkChromosome while passing in data from a different directory, try running it from the directory that contains the data. This issue has been reported to the developer.
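
The workaround above can be wrapped in a small helper. This is a minimal sketch, assuming a bash shell; the function name run_in_data_dir is hypothetical and not part of Mach, Minimac or ChunkChromosome. It changes into the data directory inside a subshell, runs whatever command it is given, and leaves your own working directory untouched.

```shell
# Hypothetical helper (bash): run a command from inside the data directory.
# Usage: run_in_data_dir <data directory> <command> [arguments...]
run_in_data_dir() {
    local dir="$1"
    shift
    if [ ! -d "$dir" ]; then
        echo "run_in_data_dir: not a directory: $dir" >&2
        return 1
    fi
    # The parentheses start a subshell, so the 'cd' does not affect the caller.
    ( cd "$dir" && "$@" )
}
```

For example, pass the directory first and then the ChunkChromosome command line you would normally use, referring to the input files by bare file name rather than by full path.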

All 3 programs were developed in the lab of Goncalo Abecasis at the University of Michigan.

The utility FCgene, a format-conversion tool for genotyped data (e.g. PLINK to MACH, MACH to PLINK), is also available. Type 'module load fcgene' to add the binary to your path, and then 'fcgene' to run it.

Reference panels (1000 Genomes) for Minimac are available in /fdb/minimac/

On Helix

Use 'module load mach1' to add the Mach1.0 executables to your path. Use 'module load minimac' to add the latest minimac executables to your path. Either command will also add the ChunkChromosome executables to your path.

Sample session:

helix% module load mach1

helix% mach1 --datfile /usr/local/mach/examples/sample.dat \
    --pedfile /usr/local/mach/examples/sample.ped
Mach 1.0.12 -- Markov Chain Haplotyping
(c) 2005-2007 Goncalo Abecasis, with thanks to Yun Li, Paul Scheet


The following parameters are in effect:

Available Options
       Input Files : --datfile [/usr/local/mach/examples/sample.dat],
                     --pedfile [/usr/local/mach/examples/sample.ped],
                     --mask [0.00]
    Optional Files : --crossoverMap [], --errorMap [], --physicalMap []
       Phased Data : --snps [], --haps [], --hapmapFormat, --autoFlip,
                     --greedy
    Markov Sampler : --seed [123456], --burnin, --rounds
   Mapping Options : --npl, --association
        Haplotyper : --states, --errorRate [1.0e-03], --weighted, --compact
        Imputation : --geno, --quality, --dosage, --mle
      Output Files : --prefix [mach1.out], --phase, --mldetails
    Interim Output : --sampleInterval, --interimInterval

Loaded pedigree with:
    500 individuals to be haplotyped at 46 markers

Formating genotypes and allocating memory for haplotyping
                          Pedigree file ... 114.3 kb
               Haplotyping engine (max) ... 88.6 mb
            Haplotyping engine (actual) ... 88.6 mb
Memory allocated successfully

Found initial haplotype set

Wrote out file [mach1.out.rec] with mosaic crossover rates ...
Wrote out file [mach1.out.erate] with per marker error rates ...

Estimated mismatch rate in Markov model is: 0.00100

helix%

Swarm of jobs on Biowulf

Set up a swarm command file along the following lines:

#------- this file is swarmcmd ------------------
mach1 --datfile sample1.dat --pedfile sample1.ped
mach1 --datfile sample2.dat --pedfile sample2.ped
mach1 --datfile sample3.dat --pedfile sample3.ped
mach1 --datfile sample4.dat --pedfile sample4.ped
[...]
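
For many input files, the command file can be generated rather than typed by hand. The following is a minimal sketch, assuming a bash shell and that each .dat file sits next to a .ped file with the same base name, as in the listing above; the function name make_swarmcmd is hypothetical.

```shell
# Hypothetical helper (bash): write one mach1 command per .dat/.ped pair
# found in the current directory.
# Usage: make_swarmcmd [output file]   (default output file: swarmcmd)
make_swarmcmd() {
    local out="${1:-swarmcmd}"
    local dat base
    : > "$out"                          # create or truncate the command file
    for dat in *.dat; do
        [ -e "$dat" ] || continue       # no .dat files matched the pattern
        base="${dat%.dat}"
        if [ -f "${base}.ped" ]; then
            echo "mach1 --datfile ${dat} --pedfile ${base}.ped" >> "$out"
        else
            echo "skipping ${dat}: no ${base}.ped found" >&2
        fi
    done
}
```

Run 'make_swarmcmd swarmcmd' in the directory that holds the data and check the resulting file before submitting it. Inputs without a matching .ped file are reported on standard error and left out.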

Submit the swarm with:

swarm -f swarmcmd --module mach1/1.0.18

This will run each command on a single core (2 CPUs) with 4 GB of memory. If your mach1 jobs require more than 4 GB of memory, use the '-g' flag for swarm:

swarm -g # -f swarmcmd --module mach1/1.0.18

where '#' is the number of gigabytes of memory required.
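
One way to pick a starting value for '#' is to read the "Haplotyping engine (max)" line that mach1 prints, as in the sample session above. The following is a minimal sketch, assuming a bash shell and that you have saved the mach1 output from a short test run to a file; the function name mach_mem_gb is hypothetical. The sample session shows only 'kb' and 'mb' units, so the 'gb' case is an assumption. The reported figure covers the haplotyping engine alone, so treat the result as a lower bound and add headroom.

```shell
# Hypothetical helper (bash): read mach1 output on standard input and print
# the "Haplotyping engine (max)" figure rounded up to whole gigabytes.
# Returns a non-zero status if the line is not found.
mach_mem_gb() {
    awk '
        /Haplotyping engine \(max\)/ {
            value = $(NF - 1)           # numeric part, e.g. 88.6
            unit  = tolower($NF)        # unit part, e.g. mb
            if (unit == "kb")      mb = value / 1024
            else if (unit == "gb") mb = value * 1024   # assumed unit
            else                   mb = value
            gb = int(mb / 1024)
            if (gb * 1024 < mb) gb++    # round up to the next whole GB
            if (gb < 1) gb = 1
            print gb
            found = 1
            exit
        }
        END { if (!found) exit 1 }
    '
}
```

For example, if the output of a test run was saved as mach1.log (a hypothetical file name), 'mach_mem_gb < mach1.log' prints a whole number that can be used as the minimum for '-g'.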

Documentation

Mach 1.0 Tutorial
Minimac documentation
Minimac2 documentation
Minimac3 documentation
ChunkChromosome docs