Biowulf High Performance Computing at the NIH
GIGI: Genotype Imputation Given Inheritance

GIGI implements an approach that makes imputation computationally efficient in large pedigrees. It uses inheritance vectors (IVs) sampled by a Markov chain Monte Carlo (MCMC) sampler conditional on genotypes at a sparse set of framework markers. Missing genotypes are then inferred probabilistically from these IVs together with the dense-marker genotypes observed on a subset of subjects.
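To make the idea concrete, here is a toy sketch of inference for one person at one di-allelic marker. An IV fixes which founder gene each person inherited. For each IV, the code enumerates allele assignments to the founder genes, keeps only those that reproduce the observed genotypes, and weights each assignment by the allele frequencies. It then averages the resulting genotype probabilities over the IVs that are consistent with the data. Several things here are assumptions made for illustration and are not GIGI's implementation or file formats: the IV representation (person mapped to a pair of founder-gene labels), the brute-force enumeration, the equal-weight average over consistent IVs, and the function names `probs_given_iv` and `impute`.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Genotype = Tuple[int, int]        # unordered genotype stored sorted, e.g. (1, 2)
IV = Dict[str, Tuple[int, int]]   # hypothetical: person -> (paternal, maternal) founder-gene labels


def _genotype(iv: IV, allele_of: Dict[int, int], person: str) -> Genotype:
    """Genotype of `person` once every founder gene has been given an allele."""
    a, b = sorted(allele_of[g] for g in iv[person])
    return (a, b)


def probs_given_iv(iv: IV, observed: Dict[str, Genotype], target: str,
                   p1: float) -> Optional[Dict[Genotype, float]]:
    """P(genotype of `target` | observed genotypes, one IV) at a di-allelic marker.

    Returns None when no allele assignment fits the observed genotypes,
    i.e. the IV is inconsistent with the data at this marker.
    """
    labels = sorted({g for pair in iv.values() for g in pair})
    weights: Dict[Genotype, float] = {}
    total = 0.0
    # Enumerate every way of giving allele 1 or 2 to each founder gene.
    for alleles in product((1, 2), repeat=len(labels)):
        allele_of = dict(zip(labels, alleles))
        # Discard assignments that contradict any observed genotype.
        if any(_genotype(iv, allele_of, p) != g for p, g in observed.items()):
            continue
        # Founder genes are treated as independent draws from the allele frequencies.
        weight = 1.0
        for a in alleles:
            weight *= p1 if a == 1 else 1.0 - p1
        g = _genotype(iv, allele_of, target)
        weights[g] = weights.get(g, 0.0) + weight
        total += weight
    if total == 0.0:
        return None
    return {g: w / total for g, w in weights.items()}


def impute(ivs: List[IV], observed: Dict[str, Genotype], target: str,
           p1: float) -> Tuple[Optional[Dict[Genotype, float]], int]:
    """Average per-IV probabilities over consistent IVs (equal weights: a simplification)."""
    per_iv = [probs_given_iv(iv, observed, target, p1) for iv in ivs]
    consistent = [d for d in per_iv if d is not None]
    if not consistent:
        return None, 0
    avg: Dict[Genotype, float] = {}
    for d in consistent:
        for g, p in d.items():
            avg[g] = avg.get(g, 0.0) + p / len(consistent)
    return avg, len(consistent)


# Toy trio: father F, mother M, child C; founder genes are labelled 0-3.
ivs = [
    {"F": (0, 1), "M": (2, 3), "C": (0, 2)},
    {"F": (0, 1), "M": (2, 3), "C": (1, 3)},
]
probs, n_consistent = impute(ivs, {"F": (1, 1), "M": (2, 2)}, target="C", p1=0.3)
print(probs, n_consistent)  # {(1, 2): 1.0} 2
```

The second return value plays the same role as the per-marker count of consistent IVs that GIGI reports in its `impute.consistentIV` output. Exhaustive enumeration grows as 2 to the power of the number of founder genes, so this only works for toy pedigrees.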


Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=4g
[user@cn3406 ~]$ module load GIGI
[+] Loading GIGI 1.05
Copy the sample data to the current directory:
[user@cn3406]$ cp -r $GIGI_DATA/* .
Run GIGI on the sample data, passing it a parameter file as input:
[user@cn3406]$ GIGI example/param.txt

<<GIGI: Genotype Imputation Given Inheritance (v1.05)>>

1. Reading the input files:
 a. Now reading the Pedigree Meiosis file from 'example/ped52.pedMeio'
     Pedigree has 52 individuals, from Subject1=101 to Subject52=516 and contains 80 meioses in the file.

 b. Now reading the framework IV file from 'example/framework.IVs'
     The framework IV file contains 100 sampled IVs.

 c. Now reading from the Framework marker position file from 'example/framework.map'
     Framework marker positions (cM): 200 markers in the range 0 to 99.5

 d. Now reading the Dense marker position file from 'example/dense.map'
     Dense marker positions (cM): 50 markers in the range 50.004 to 50.2

 e. Now reading the Dense genotype file from 'example/dense.genotypes'
     reminder: allelic types should be coded as 1,2,... (0 for missing)

 f. Now reading the Allele frequency file from 'example/dense.afreq'
     The file includes N markers with (M allelic types): 50(2)
    *[Reminder!] In each line of the allele frequency file, the first number should correspond to the allele frequency of the allele labeled "1" in the genotype file, the second number should correspond to the allele frequency of allele labeled "2". This is a common mistake!

    *The markers to be imputed are all di-allelic markers. The dosage file will be generated along with the imputed genotype file and the probability file.
     Dosage is the expected percent of 1-allele in a genotype.
      [eg. 1 if P(genotype is 1/1)=1]

2. Options:
 a. Call method: confidence-based thresholds: t1=0.8, t2=0.9
 b. Seed=1234 (default)


3. The output files will be saved to:
 a. impute.geno (called genotypes)
 b. impute.prob (estimated genotype probabilities)
 c. impute.consistentIV (number of consistent IVs by marker)
 d. impute.dosage (dosage file of imputed genotypes)

4. Start imputing genotypes: [Progress shown every 2000th markers]
 Done calculating genotype probabilities for 50 markers!

5. Now calling genotypes.
 Done!
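The log defines dosage as the expected share ("percent") of allele 1 in a genotype, equal to 1 when P(genotype is 1/1) = 1. Under that definition, a di-allelic marker's dosage is P(1/1) + 0.5 × P(1/2). The sketch below computes this from three genotype probabilities. The function name is hypothetical, and it does not parse `impute.prob` or `impute.dosage`, whose column layouts are not described here.

```python
def dosage(p11: float, p12: float, p22: float) -> float:
    """Expected fraction of allele 1 in a di-allelic genotype.

    p11, p12, p22 are the probabilities of genotypes 1/1, 1/2 and 2/2.
    Genotype 1/1 is all allele 1 (fraction 1), 1/2 is half (0.5), 2/2 is none (0).
    """
    total = p11 + p12 + p22
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"genotype probabilities sum to {total}, not 1")
    return p11 + 0.5 * p12


print(dosage(1.0, 0.0, 0.0))    # 1.0, the "P(genotype is 1/1)=1" case from the log
print(dosage(0.25, 0.5, 0.25))  # 0.5
```

Some other tools express dosage as an expected allele count between 0 and 2. That is twice the fraction computed here.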

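To call the most likely genotype at each marker instead of applying confidence thresholds, run GIGI with a parameter file that selects that call method:
[user@cn3406]$ GIGI param_MostLikelyGenotypeCalling.txt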

<<GIGI: Genotype Imputation Given Inheritance (v1.05)>>

1. Reading the input files:
 a. Now reading the Pedigree Meiosis file from 'example/ped52.pedMeio'
     Pedigree has 52 individuals, from Subject1=101 to Subject52=516 and contains 80 meioses in the file.

 b. Now reading the framework IV file from 'example/framework.IVs'
     The framework IV file contains 100 sampled IVs.

 c. Now reading from the Framework marker position file from 'example/framework.map'
     Framework marker positions (cM): 200 markers in the range 0 to 99.5

 d. Now reading the Dense marker position file from 'example/dense.map'
     Dense marker positions (cM): 50 markers in the range 50.004 to 50.2

 e. Now reading the Dense genotype file from 'example/dense.genotypes'
     reminder: allelic types should be coded as 1,2,... (0 for missing)

 f. Now reading the Allele frequency file from 'example/dense.afreq'
     The file includes N markers with (M allelic types): 50(2)
    *[Reminder!] In each line of the allele frequency file, the first number should correspond to the allele frequency of the allele labeled "1" in the genotype file, the second number should correspond to the allele frequency of allele labeled "2". This is a common mistake!

    *The markers to be imputed are all di-allelic markers. The dosage file will be generated along with the imputed genotype file and the probability file.
     Dosage is the expected percent of 1-allele in a genotype.
      [eg. 1 if P(genotype is 1/1)=1]

2. Options:
 a. Call method: most likely genotypes
 b. Seed=1234 (default)


3. The output files will be saved to:
 a. impute.geno (called genotypes)
 b. impute.prob (estimated genotype probabilities)
 c. impute.consistentIV (number of consistent IVs by marker)
 d. impute.dosage (dosage file of imputed genotypes)

4. Start imputing genotypes: [Progress shown every 2000th markers]
 Done calculating genotype probabilities for 50 markers!

5. Now calling genotypes.
 Done!
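According to the logs, the two runs differ only in the call method: confidence-based thresholds (t1=0.8, t2=0.9) in the first and most likely genotypes in the second. The sketch below contrasts the two on one marker's genotype probabilities. Most-likely calling simply picks the highest-probability genotype. The two-threshold rule is an assumed interpretation of t1 and t2: call the full genotype when its probability reaches t1; otherwise call a single allele if the probability of carrying it reaches t2; otherwise leave the genotype missing. Check the GIGI documentation for the rule it actually applies and for how calls are written to `impute.geno`. The function names are hypothetical.

```python
from typing import Dict, Tuple

Genotype = Tuple[int, int]   # allele pair; 0 marks a missing allele (as in GIGI's input coding)
MISSING: Genotype = (0, 0)


def call_most_likely(probs: Dict[Genotype, float]) -> Genotype:
    """Return the genotype with the highest probability."""
    return max(probs, key=probs.get)


def call_with_thresholds(probs: Dict[Genotype, float],
                         t1: float = 0.8, t2: float = 0.9) -> Genotype:
    """ASSUMED two-threshold rule (hypothetical, not taken from GIGI's source).

    1. Call the most likely genotype if its probability is at least t1.
    2. Otherwise call one allele a, written (a, 0), if P(genotype carries a) >= t2.
    3. Otherwise leave the genotype missing.
    """
    best = call_most_likely(probs)
    if probs[best] >= t1:
        return best
    for allele in (1, 2):
        p_carrier = sum(p for g, p in probs.items() if allele in g)
        if p_carrier >= t2:
            return (allele, 0)
    return MISSING


marker = {(1, 1): 0.50, (1, 2): 0.45, (2, 2): 0.05}
print(call_most_likely(marker))      # (1, 1)
print(call_with_thresholds(marker))  # (1, 0): allele 1 is carried with probability 0.95
```

Most-likely calling always returns a genotype, while a threshold rule trades completeness for accuracy by leaving low-confidence calls partial or missing.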
End the interactive session:
[user@cn3406 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$