
Chapter 5 IBD and MIBD Computation

5.1 Types of Linkage Analysis and Matrices

Two basic types of linkage analysis are available: twopoint and multipoint. Twopoint linkage analysis requires only the IBDs at specific markers or candidate loci. Multipoint linkage analysis requires multipoint IBDs (MIBDs), which in turn require that the marker locations have been mapped. IBD stands for Identity By Descent, and MIBD stands for Multipoint Identity By Descent.

IBDs and MIBDs are matrices that contain one value for each pair of individuals in a pedigree. To save space, pairs of unrelated individuals are omitted, and their value is assumed to be zero. The matrices are stored in files that SOLAR compresses automatically with the GNU gzip program. Even with compression, the files can take up a substantial amount of disk space. The first line of each matrix file created by SOLAR looks like strange data but is actually a checksum that ensures the matrix is used only with the correct pedigree. This checksum line is created by the matcrc command. User-created matrix files do not need a checksum line, but adding one is recommended and easy to do with the matcrc command.

SOLAR uses an approximate method for computing multipoint IBDs. This method is discussed in detail in:

Almasy L, Blangero J (1998) Multipoint quantitative trait linkage analysis in general pedigrees. Am J Hum Genet 62:1198-1211.

It is also possible to use the multipoint IBDs computed by another genetic analysis program. This is discussed in Section 5.4.

5.2 Marker-specific IBD Computation

Preparation of marker-specific IBD files requires the following commands (in this order):

	load pedigree pedigree-filename
	load freq freq-filename           ;# Optional
	load marker marker-filename
	freq mle                          ;# Required if no load freq
	ibddir ibd-dirname
	ibd

(Depending on the situation, some of these commands are not required. A simplified version of the following discussion appears in Section 3.7 of the tutorial.)

Use the load freq command if you have prior knowledge of the allele frequencies. Otherwise, the load marker command determines the allele frequencies by a simple counting method, identical to that used by the PEDSYS program genefreq. If you use these simple-count allele frequencies, it is recommended that you compute maximum likelihood estimates (MLEs) of the frequencies with the freq mle command, although that process can take a lot of computer time and memory. If the allele frequencies you load with load freq are already maximum likelihood estimates, you may skip the freq mle command.
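The simple-count estimate mentioned above can be illustrated with a short Python sketch. It is not the genefreq or SOLAR implementation: it assumes that every typed individual contributes both of its alleles to the tally, and the function name and genotype representation (a dictionary of allele pairs, with None for untyped individuals) are hypothetical. Because untyped individuals and the family structure are ignored, counts like these are only a starting point for the freq mle maximization.

```python
from collections import Counter

def count_allele_frequencies(genotypes):
    """Simple-count allele frequencies for one marker (hypothetical helper).

    genotypes: dict mapping individual ID -> (allele1, allele2), or None
    if the individual is untyped. Each typed individual contributes both
    alleles to the tally; family structure is ignored.
    """
    counts = Counter()
    for genotype in genotypes.values():
        if genotype is None:          # untyped individuals add nothing
            continue
        counts.update(genotype)       # count both alleles
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical genotypes for a single marker.
genotypes = {
    "John": ("123", "127"),
    "Karen": ("123", "123"),
    "Tom": None,
}
print(count_allele_frequencies(genotypes))  # {'123': 0.75, '127': 0.25}
```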

The order of the commands is important. The pedigree data must be loaded first so that SOLAR can determine the family structure and generate its internal indexing. The marker data and frequency data must be loaded next. If marker allele frequencies have been determined previously, these can be loaded either before or after loading the marker data. If allele frequencies are not known ahead of time, simply omit the load freq step.

Only one pedigree file can be loaded at a time. Issuing the load pedigree command causes any previously loaded pedigree data to be unloaded. Pedigree data need only be loaded once. Thereafter, each time SOLAR is run from the same working directory, the same pedigree data will still be loaded. Pedigree data must be loaded before marker data can be loaded, and when pedigree data is unloaded, any currently loaded marker data will also be unloaded.

Only one marker file can be loaded at a time. The marker data will remain loaded until either a new marker file is loaded or the marker unload command is given. Once loaded, the marker file does not have to be reloaded in subsequent SOLAR runs. Similarly, only one freq file can be loaded at a time. The allele frequency data will remain loaded until either a new freq file is loaded or the freq unload command is given. Once loaded, the freq file does not have to be reloaded in subsequent SOLAR runs.

By default, the markers in the marker file are assumed to be autosomal. If they are X-linked, the XLinked option must be set with the ibdoption command before the marker file is loaded. Alternatively, the -xlinked option can be specified in the load marker command. Since there is no mechanism for declaring the X-linked status of individual markers, the markers in the marker file must either all be autosomal or all be X-linked.

NOTE: At present, marker-specific IBDs for X-linked loci can be computed only with the Curtis and Sham method. This means that twopoint linkage analysis of X-linked loci is restricted to non-inbred pedigrees with limited looping. Also, it is not currently possible to conduct a multipoint linkage analysis for X-linked markers.

Computing the maximum likelihood estimates for the allele frequencies (using the freq mle command) will improve the accuracy with which marker genotypes are imputed for those individuals who are not typed. By default, IBDs will not be computed for markers with simple-count allele frequencies generated by the load marker command. That is, MLE allele frequencies are required when prior frequency data is not available. This behavior can be overridden either by setting the NoMLE option with the ibdoption command, or by giving the -nomle argument to the ibd command.

SOLAR uses an external program, allfreq, to compute MLE allele frequencies. Allfreq is an extension of the program MENDEL. The marker genotypes are expected to follow the rules of Mendelian inheritance. If a marker discrepancy is encountered, the following error message will be displayed:

	Mendelian inconsistency found near individual ID = ID

If the pedigree file contains a family ID (FAMID) field, the family ID will be included in the error message. This information may be helpful in diagnosing the discrepancy, but in general you will need to use other software to ensure that your marker data is clean.

During the maximization procedure, the freq mle command displays the number of iterations and the improvement in the log likelihood obtained thus far. If a SOLAR script which includes the freq mle command is run as a background job, this display may cause the job to hang or abort. To turn off the display, use the verbosity min command.

Once computed, MLE allele frequencies remain in effect while the marker data is loaded. When new marker data is loaded, the allele frequencies for previously loaded markers are not retained (unless the frequencies were loaded from a file by the load freq command). To keep a record of the allele frequencies that were used to compute IBDs, save the MLE allele frequencies to a file with the freq save command. The load freq command can be used to restore the MLE allele frequencies from this file later if desired. New marker data will not be loaded until either the MLE allele frequencies are saved or the previous marker data is unloaded with the marker unload -nosave command. Likewise, the load freq command will not replace unsaved MLE allele frequencies with new frequency data unless the -nosave option is specified.

The directory where the IBD files are to be created must be specified with the ibddir command before the IBD computation can be started. While it is perfectly legal to store the IBD files in the current working directory (specified with ibddir .), you may find it more convenient to create a subdirectory to hold the files.

SOLAR uses one of two methods to compute marker-specific IBDs, depending on the family structure. The first method is the Curtis and Sham algorithm, in which the LINKAGE/FASTLINK package is used to compute the required likelihoods. This method is not applicable in the presence of inbreeding. Furthermore, although LINKAGE/FASTLINK can handle multiple loops, we have chosen to sidestep the issue of selecting multiple loopbreakers. Therefore, SOLAR will not use the Curtis and Sham method if inbreeding is present, or if a pedigree without inbreeding contains loops that require more than one loopbreaker.

There is an important caveat regarding IBD computation using the LINKAGE-based method. Because the method requires many invocations of the LINKAGE programs unknown and mlink, there will be a high volume of I/O as input and output files are read and written. As a result, the speed of file I/O will be the limiting performance factor. If possible, avoid the use of remote file systems, e.g. NFS, for which the file I/O must be done over a network. On systems that support an in-memory file system, e.g. Solaris' /tmp, consider running the IBD calculations there. We have observed a two-fold or better performance increase by running in /tmp.

The second method, which is applicable in all cases, is a Monte Carlo algorithm. First, all missing genotypes are imputed at random, and the likelihood of the imputed genotype vector is calculated. A recursive algorithm due to Davis and Weeks is then used to compute IBDs given the imputed genotype vector. This process is repeated, and a weighted average of the IBDs is accumulated for each pair of individuals, where the weight is the likelihood of observing the imputed genotype vector. The number of iterations defaults to 200 but may be changed with the ibdoption command. When all individuals are typed, as in a simulated data set, there are no missing genotypes to impute, so there is no need to iterate and the number of iterations may be set to one. Because the Davis and Weeks algorithm is quite fast, the Monte Carlo method gives the best performance for completely-typed data. For this reason, the Monte Carlo method is chosen automatically to compute IBDs for completely-typed markers.
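The weighted-average step can be sketched in Python as follows. The imputation routine, the genotype likelihood, and the Davis and Weeks recursion are passed in as hypothetical callables, since their details are not described here; the sketch shows only how each imputed vector's likelihood serves as the weight for the IBDs computed from it.

```python
import itertools

def monte_carlo_ibd(impute, likelihood, ibd_given, n_iter=200):
    """Likelihood-weighted average of IBDs over imputed genotype vectors.

    All three callables are hypothetical stand-ins:
    impute():       returns one random imputation of the missing genotypes
    likelihood(g):  likelihood of observing imputed genotype vector g
    ibd_given(g):   dict (id1, id2) -> IBD computed given g
                    (the role played by the Davis and Weeks recursion)
    """
    weighted_sum = {}
    total_weight = 0.0
    for _ in range(n_iter):
        g = impute()
        w = likelihood(g)
        total_weight += w
        for pair, ibd in ibd_given(g).items():
            weighted_sum[pair] = weighted_sum.get(pair, 0.0) + w * ibd
    if total_weight == 0.0:
        raise ValueError("every imputed genotype vector had zero likelihood")
    return {pair: s / total_weight for pair, s in weighted_sum.items()}

# Toy stand-ins: two possible imputations, visited alternately.
imputations = itertools.cycle(["A", "B"])
toy_likelihood = {"A": 0.3, "B": 0.1}
toy_ibd = {"A": {("p1", "p2"): 0.5}, "B": {("p1", "p2"): 1.0}}

result = monte_carlo_ibd(lambda: next(imputations), toy_likelihood.get,
                         toy_ibd.get, n_iter=2)
# Pair (p1, p2): (0.3*0.5 + 0.1*1.0) / (0.3 + 0.1) = 0.625
print(result)
```

If every individual is typed, impute always returns the same vector, so a single iteration already yields that vector's IBDs; this is why one iteration suffices for completely-typed markers.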

The ibdoption command lets you display or modify the options in effect related to IBD and MIBD calculation. These options are:

	XLinked   select this option for X-linked marker data
	NoMLE     if this option is chosen, MLE allele frequencies are not
	             required for IBD calculation
	MCarlo    if this option is chosen, the Monte Carlo method will be
	             used to calculate IBDs
	MibdWin   size (in cM) of the multipoint IBD window - the MIBDs at
	             a given chromosome location depend only on markers inside
	             or on the boundary of the window centered at that location
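To make the MibdWin option concrete, the Python sketch below selects the markers that would contribute to the MIBDs at one location. It assumes that MibdWin is the total width of the window, so the window extends MibdWin/2 cM on each side of the location; check help ibdoption for the exact definition. The function name, marker names, and positions are hypothetical.

```python
def markers_in_window(marker_positions, location, window_cm):
    """Markers inside or on the boundary of a window of total width
    window_cm centered at location (all positions in cM).
    Hypothetical helper; assumes the window spans window_cm/2 each side."""
    half = window_cm / 2.0
    return [name for name, pos in marker_positions.items()
            if location - half <= pos <= location + half]

# Hypothetical marker map for one chromosome (positions in cM).
markers = {"D6S1": 80.0, "D6S2": 88.0, "D6S3": 98.0, "D6S4": 120.0}
print(markers_in_window(markers, 93.0, 10.0))  # ['D6S2', 'D6S3']
```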

Use the command help ibdoption for more information.

During the IBD computation procedure, the ibd command displays the number of pedigrees processed thus far. If a SOLAR script which includes the ibd command is run as a background job, this display may cause the job to hang or abort. To turn off the display, use the verbosity min command.

5.3 Multipoint IBD (MIBD) Computation

Preparation of multipoint IBD (MIBD) files requires the following commands:

	load pedigree pedigree-filename
	load map map-filename
	ibddir ibd-dirname
	mibddir mibd-dirname
	mibd relate               ;# Not required
	mibd merge                ;# Not required
	mibd means                ;# Not required
	mibd [from to] incr

As in the case of marker-specific IBD preparation, the order of the commands is important. First the pedigree data is loaded and then the map data. The marker-specific IBDs must have already been computed and reside in the directory specified by the ibddir command.

Since multipoint IBDs are computed on a per-chromosome basis, each map file includes only the markers on a single chromosome. To compute MIBDs for more than one chromosome at a time, you can create a Tcl script that performs the steps listed above once for each chromosome. You won't need to reload the pedigree data for each chromosome, nor will it be necessary to run the mibd relate command more than once, since the relative-class file depends only on pedigree information, not on marker or map data.

When computing MIBDs, SOLAR converts the distances between pairs of markers to recombination fractions. By default, SOLAR assumes the mapping function to be Kosambi. It is also possible to use the Haldane mapping function. Use the command help map for more information.
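For reference, the Python sketch below applies the standard Haldane and Kosambi map functions to a distance in cM. These are the textbook formulas, not code taken from SOLAR, and the function names are illustrative. Both return recombination fractions that approach 0.5 as the distance grows; Kosambi's function models crossover interference and gives the larger value at any positive distance.

```python
import math

def haldane(d_cm):
    """Haldane map function: map distance in cM -> recombination fraction."""
    d = d_cm / 100.0                      # cM -> Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d))

def kosambi(d_cm):
    """Kosambi map function: map distance in cM -> recombination fraction."""
    d = d_cm / 100.0                      # cM -> Morgans
    return 0.5 * math.tanh(2.0 * d)

for dist in (1, 10, 50, 200):
    print(f"{dist:>4} cM  Haldane {haldane(dist):.4f}  Kosambi {kosambi(dist):.4f}")
```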

The directory where the MIBD files are to be created must be specified with the mibddir command before the multipoint IBD computation can be started. While it is perfectly legal to store the MIBD files in the current working directory (you would use the command mibddir . ), you may find it more convenient to create a subdirectory to hold the files.

The relative-class file is created with the mibd relate command. Next, the marker-specific IBDs are merged into a single file with the mibd merge command. The mibd means command computes the mean IBD for each relative class. The last mibd command listed above computes multipoint IBDs at incr cM intervals from location from through location to. If only the incr argument is given, multipoint IBDs are computed at incr cM intervals from location 0 through the location of the last marker in the map. For convenience, you need only give the last of these mibd commands; the first three (mibd relate, mibd merge, mibd means) will be issued automatically.
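The set of locations implied by the from, to, and incr arguments can be sketched in Python. When only incr is given, the sketch uses 0 and the position of the last map marker as the range. Whether SOLAR also adds a point at the last marker when that position does not fall on the incr grid is not stated here, so this hypothetical helper simply stops at the last grid point that does not exceed to.

```python
def mibd_locations(start, stop, incr):
    """Locations (cM) from start through stop, spaced incr cM apart,
    mirroring the form `mibd from to incr` (hypothetical helper)."""
    if incr <= 0:
        raise ValueError("incr must be positive")
    locations = []
    k = 0
    while start + k * incr <= stop:   # multiply, don't accumulate, to limit drift
        locations.append(start + k * incr)
        k += 1
    return locations

# `mibd 90 100 2`: explicit range
print(mibd_locations(90, 100, 2))    # [90, 92, 94, 96, 98, 100]
# `mibd 5` with the last map marker at 23.4 cM: range 0 .. 23.4
print(mibd_locations(0, 23.4, 5))    # [0, 5, 10, 15, 20]
```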

Since the identification of relative classes requires knowledge of the family structure but not marker or map information, the mibd relate command can be issued as soon as the pedigree data has been loaded. The pedigree classes command can then be used to display a tally of the relative classes present in the data set. It is possible that the mibd relate command will fail because an unknown relationship is encountered. SOLAR cannot handle arbitrary relationships, relying instead on information that has been worked out in advance for an extensive set of relative classes. While the set of known relationships has been expanding with new SOLAR releases, occasionally a data set will contain relative classes not yet handled by SOLAR. If your data set contains such a class, you must contact the SOLAR developers for assistance.

NOTE: At present it is not possible to compute multipoint IBDs for X-linked markers.

During the MIBD computation procedure, the mibd command displays the marker location currently being processed. If a SOLAR script which includes the mibd command is run as a background job, this display may cause the job to hang or abort. To turn off the display, use the verbosity min command.

5.4 MIBD Computation Using Another Program

You are not limited to using the multipoint IBDs computed by SOLAR. A number of genetic analysis programs are available that can compute multipoint IBDs. If your pedigree data contains relative classes not supported by SOLAR, or if you prefer exact multipoint IBDs to SOLAR's approximation, you can compute MIBDs with one of these programs and then use those MIBDs in your SOLAR multipoint analyses.

Using another program to compute MIBDs suitable for SOLAR requires two steps. First, the input files required by that program must be created. For a number of genetic analysis programs, this step can be performed with Mega2, which is available at:

http://watson.hgen.pitt.edu/mega2.html

The second step is to convert the output of the other program into SOLAR-ready MIBD files. (See Section 8.6 for a description of MIBD files.) Some genetic analysis programs may provide support for this step; for example, Loki has the capability to generate SOLAR-ready MIBD files directly. In general, however, some post-processing of the program output will be necessary. SOLAR provides a mechanism, described in Section 5.5, by which multipoint IBDs can be read from a comma-delimited file and used to generate SOLAR-ready MIBD files. This process is referred to as importing MIBDs. Thus, if a genetic analysis program outputs a file in the appropriate comma-delimited format, or if the program output is translated into this format, then the creation of MIBD files can be performed by SOLAR.

SOLAR currently provides direct support for computing multipoint IBDs using SimWalk2, Loki, GeneHunter, and Merlin. The input files needed to compute MIBDs are created by the mibd prep command, while the resulting output files are processed with the mibd import command.

For SimWalk2, the necessary commands are:

	load pedigree pedigree-filename
	load freq freq-filename          ;# Optional
	load marker marker-filename
	freq mle                         ;# Required if no load freq
	load map map-filename
	mibd prep simwalk

The following SimWalk2 input files will be created:

	BATCH2.DAT          - control file
	swmibd.map          - map file
	swmibd.loc          - locus file
	swmibd.ped          - pedigree/genotype data
	mibdchr<chr>.loc    - map file for SOLAR plots

The file mibdchr<chr>.loc (where <chr> is the chromosome number from the map file) should be moved to the directory where the SimWalk2-computed multipoint IBDs will be stored. SimWalk2 (version 2.91 or higher) can then be run to compute the MIBDs. A number of output files will be generated by SimWalk2. The files which contain the MIBD results needed by SOLAR are named IBD-01.nnn, where nnn is the pedigree number, e.g. 001. These files must be combined into a single output file, which can be gzipped to save space if desired. This file can then be processed by SOLAR using the commands:

	mibddir mibd-dirname
	mibd import simwalk chr -f output-filename

The chromosome number chr must be given so that SOLAR knows how to name the MIBD files it creates (see Section 8.6).
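The IBD-01.nnn files described above can be combined with a short Python script such as the one below. It assumes that simple concatenation in pedigree-number order is an acceptable way to combine them and that each file ends with a newline; the function name and the decision to gzip the result are illustrative choices, not part of SOLAR or SimWalk2.

```python
import glob
import gzip
import os
import shutil

def combine_simwalk_ibd(directory, output_path):
    """Concatenate SimWalk2 IBD-01.nnn files, in pedigree-number order,
    into a single gzipped file (hypothetical helper). Returns the files used."""
    candidates = glob.glob(os.path.join(directory, "IBD-01.*"))
    # Keep only names whose extension is a pedigree number, e.g. ".001".
    paths = [p for p in candidates if os.path.splitext(p)[1][1:].isdigit()]
    paths.sort(key=lambda p: int(os.path.splitext(p)[1][1:]))
    with gzip.open(output_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)   # byte-for-byte copy
    return paths
```

For example, combine_simwalk_ibd("sw_run", "swmibd.ibd.gz") would combine the files in a hypothetical sw_run directory; the resulting file would then be given to mibd import simwalk with the -f option.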

For Loki, the necessary commands are:

	load pedigree pedigree-filename
	load freq freq-filename        ;# Optional
	load marker marker-filename
	freq mle                       ;# Required if no load freq
	load map map-filename
	mibd prep loki

The following Loki input files will be created:

	lkmibd.data         - pedigree/genotype data
	lkmibd.prep         - prep parameter file
	lkmibd.loki         - loki parameter file
	mibdchr<chr>.loc    - map file for SOLAR plots

The file mibdchr<chr>.loc should be moved to the directory where the Loki-computed multipoint IBDs will be stored. Loki can then be run to compute the MIBDs. The MIBDs will be stored in a gzipped output file named loki.ibd.gz. This file can be processed by SOLAR using the commands:

	mibddir mibd-dirname
	mibd import loki

Unlike with SimWalk2, the chromosome number does not need to be given, because it is stored in one of the Loki input files.

For GeneHunter, the necessary commands are:

	load pedigree pedigree-filename
	load freq freq-filename          ;# Optional
	load marker marker-filename
	freq mle                         ;# Required if no load freq
	load map map-filename
	mibd prep genehunter

The following GeneHunter input files will be created:

	ghmibd.cmd          - control file
	ghmibd.loc          - locus file
	ghmibd.ped          - pedigree/genotype data
	mibdchr<chr>.loc    - map file for SOLAR plots

The file mibdchr<chr>.loc should be moved to the directory where the GeneHunter-computed multipoint IBDs will be stored. GeneHunter can then be run to compute the MIBDs. To use the control file ghmibd.cmd, start GeneHunter and enter the command run ghmibd.cmd. The MIBDs will be stored in an output file named ghmibd.ibd. This file can be processed by SOLAR using the commands:

	mibddir mibd-dirname
	mibd import genehunter chr

The chromosome number chr must be given so that SOLAR knows how to name the MIBD files it creates (see Section 8.6).

For Merlin, the necessary commands are:

	load pedigree pedigree-filename
	load freq freq-filename          ;# Optional
	load marker marker-filename
	freq mle                         ;# Required if no load freq
	load map map-filename
	mibd prep merlin

The following Merlin input files will be created:

	mlmibd.cmd          - Merlin IBD command
	mlmibd.dat          - data description file
	mlmibd.ped          - pedigree/genotype data
	mlmibd.frq          - allele frequency file
	mlmibd.map          - map file
	mibdchr<chr>.loc    - map file for SOLAR plots

The file mibdchr<chr>.loc should be moved to the directory where the Merlin-computed multipoint IBDs will be stored. Merlin can then be run to compute the multipoint IBDs by entering the Unix command contained in the file mlmibd.cmd. The MIBDs will be stored in an output file named merlin.ibd. This file can be processed by SOLAR using the commands:

	mibddir mibd-dirname
	mibd import merlin chr

The chromosome number chr must be given so that SOLAR knows how to name the MIBD files it creates (see Section 8.6).

5.5 Importing and Exporting IBDs and MIBDs

We have noted that it is OK to use externally-computed multipoint IBDs, i.e. MIBDs computed by another genetic analysis program, in a SOLAR multipoint analysis. It is also OK to use externally-computed marker-specific IBDs in a twopoint analysis. However, the use of IBDs/MIBDs computed by a program other than SOLAR is complicated by two issues. The first issue, the format of the IBD/MIBD files, is trivial. The more difficult problem is that SOLAR's IBD/MIBD files are keyed by the indexed IDs (IBDIDs) assigned by SOLAR rather than the IDs from the pedigree file.

To facilitate the use of externally-computed IBDs and MIBDs, SOLAR provides an import feature for the ibd and mibd commands. The ibd import command reads marker-specific IBDs, keyed by real IDs, from a comma-delimited file, and creates a SOLAR-formatted IBD file keyed by IBDIDs. See Section 4.1 for a description of the format of comma-delimited files. The comma-delimited input file must contain at least the following fields: ID1, ID2, and IBD. If the pedigree file contains a family ID (FAMID) field, then a FAMID field must appear in the input file as well. The input file can contain IBDs for more than one marker provided a MARKER field is present which contains the marker names. An input file which contains IBDs for more than one marker must be sorted on the MARKER field. If the input file contains a D7 field, the d7 values from that field will be included in the IBD file(s) that are created. See Section 8.5 for a description of d7 coefficients. Following are the first few lines from an example input file:

	MARKER,FAMID,ID1,ID2,IBD,D7
	D1S53,Smith,John,John,1,1
	D1S53,Smith,John,Karen,0.500135,0.07112
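If another program's output has already been parsed, a file in this format can be written with Python's csv module, as in the sketch below. The field names come from the description above; the function name, the row dictionaries, and the example values other than those in the listing are hypothetical.

```python
import csv

def write_ibd_import_file(path, rows,
                          fields=("MARKER", "FAMID", "ID1", "ID2", "IBD", "D7")):
    """Write IBDs keyed by real IDs for `ibd import` (hypothetical helper).

    rows:   iterable of dicts keyed by the names in `fields`
    fields: output columns; ID1, ID2 and IBD are required, FAMID is needed
            when the pedigree file has family IDs, MARKER and D7 are optional
    """
    rows = list(rows)
    if "MARKER" in fields:
        # A multi-marker file must be sorted on MARKER; list.sort is stable,
        # so pairs keep their original order within each marker.
        rows.sort(key=lambda r: r["MARKER"])
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(fields),
                                extrasaction="ignore", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)

# Hypothetical IBDs from another program, not yet grouped by marker.
rows = [
    {"MARKER": "D1S53", "FAMID": "Smith", "ID1": "John", "ID2": "John", "IBD": 1, "D7": 1},
    {"MARKER": "D1S2", "FAMID": "Smith", "ID1": "John", "ID2": "Karen", "IBD": 0.5, "D7": 0.0},
    {"MARKER": "D1S53", "FAMID": "Smith", "ID1": "John", "ID2": "Karen", "IBD": 0.500135, "D7": 0.07112},
]
```

Calling write_ibd_import_file("ibd_import.csv", rows) writes the header line followed by the rows grouped by marker.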

The mibd import command reads multipoint IBDs, keyed by real IDs, from a comma-delimited file, and creates SOLAR-formatted MIBD files keyed by IBDIDs. The comma-delimited input file must contain at least the following fields: CHROMO, LOCATION, ID1, ID2, and IBD. If the pedigree file contains a family ID (FAMID) field, then a FAMID field must appear in the input file as well. The CHROMO field contains the chromosome numbers, while the LOCATION field contains the chromosomal locations in cM. An input file which contains MIBDs for more than one chromosomal location must be sorted on the CHROMO field first and then on the LOCATION field. If the input file contains a D7 field, the d7 values from that field will be included in the MIBD file(s) that are created. Following are the first few lines from an example input file:

	CHROMO,LOCATION,ID1,ID2,IBD
	6,93,A0457,A1082,0.369
	6,93,A0457,A0119,0.14576
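Because SOLAR requires the rows to be sorted on CHROMO and then LOCATION, it can help to sort a file before importing it. The Python sketch below does this. Treating both fields as numbers and sorting them in ascending order is an assumption of this sketch; it avoids a plain text sort placing chromosome 10 before chromosome 6, or location 100 before location 93. The function name is hypothetical.

```python
import csv

def sort_mibd_import_file(in_path, out_path):
    """Sort a comma-delimited MIBD file on CHROMO, then LOCATION
    (hypothetical helper; both fields assumed numeric)."""
    with open(in_path, newline="") as f:
        reader = csv.DictReader(f)
        fields = reader.fieldnames
        rows = list(reader)
    # list.sort is stable: pairs stay in input order within a location.
    rows.sort(key=lambda r: (int(r["CHROMO"]), float(r["LOCATION"])))
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
```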

When importing IBDs (and MIBDs), SOLAR assumes that the input file contains an entry for every pair of individuals whose IBD value is non-zero. Any pair of individuals without an entry in the input file is assumed to have IBD = 0. However, if the input file contains no entries for any pair of individuals in a particular pedigree, an IBD value of -1 is assigned to the main diagonal entries of the IBD matrix for that pedigree. In a linkage analysis, all pairs of individuals in that pedigree will then be treated as having IBD = phi2, the expected IBD allele sharing at a locus chosen at random.
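The rules above can be summarized in a small Python sketch that returns the IBD value an analysis would use for one pair. The function name, the dictionary layout, the phi2 function, and the assumption that a pair may appear in either ID order are hypothetical; the sketch returns phi2 directly rather than reproducing SOLAR's internal -1 marking. Here phi2 is twice the kinship coefficient.

```python
def effective_ibd(imported, famid, id1, id2, phi2):
    """IBD used in a linkage analysis for one pair (hypothetical helper).

    imported: dict mapping (famid, id1, id2) -> IBD read from the input file
    phi2:     function giving twice the kinship coefficient for a pair
    """
    for key in ((famid, id1, id2), (famid, id2, id1)):   # pair order may vary
        if key in imported:
            return imported[key]
    if all(k[0] != famid for k in imported):
        # No entries at all for this pedigree: SOLAR marks its diagonal with
        # -1 and the analysis uses phi2 for every pair in the pedigree.
        return phi2(id1, id2)
    return 0.0   # pedigree present but pair absent: IBD assumed to be zero

def phi2(a, b):
    """Hypothetical phi2: 1 for self-pairs, 0.5 for the pairs used below."""
    return 1.0 if a == b else 0.5

imported = {("Smith", "John", "Karen"): 0.500135}
print(effective_ibd(imported, "Smith", "Karen", "John", phi2))  # 0.500135
print(effective_ibd(imported, "Smith", "John", "Pete", phi2))   # 0.0
print(effective_ibd(imported, "Jones", "Ann", "Bob", phi2))     # 0.5
```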

SOLAR also provides an export feature for the ibd and mibd commands. The ibd export command writes the IBDs and d7 coefficients from one or more marker-specific IBD files to a comma-delimited file. A MARKER field is included which contains the marker name(s). The IBDIDs in the SOLAR IBD file are translated to real IDs, with a FAMID field added if family IDs are present in the pedigree file. The mibd export command writes the IBDs and d7 coefficients at every chromosomal location (for which a SOLAR MIBD file exists) on one or more chromosomes to a comma-delimited file. A CHROMO field is included to identify the chromosome(s), and a LOCATION field is included which gives the chromosomal locations in cM.

Since the exported IBD (MIBD) files can easily become quite large, it may be a good idea to limit the number of markers (chromosomes) exported to any one file. The mibd export command has a -byloc option, which specifies that a separate export file is to be created for each chromosomal location. This feature can be handy if you plan to merge files containing MIBDs exported from different pedigrees, as in the situations described in the following paragraphs. Before MIBDs can be imported from the merged files, the files must be sorted on chromosomal location. Sorting is not necessary, however, if each merged file contains MIBDs for a single location.

The ability to export and import IBDs/MIBDs is useful whenever the SOLAR indexing must be modified. An obvious example is when one or more pedigrees in a data set are altered after IBDs have been computed. The indexed IDs (IBDIDs) assigned to individuals in the altered pedigrees, as well as to individuals in subsequent pedigrees, will likely change. This makes the existing SOLAR IBD files, which are keyed by IBDIDs, unusable. In this case, you can export the existing IBDs to a comma-delimited file. Delete from this file the IBDs for pairs of individuals in the pedigrees to be altered, because those IBDs may change. Next, load the altered pedigrees into SOLAR, compute new IBDs for those pedigrees, and export the new IBDs, appending them to the comma-delimited file. Make sure the comma-delimited file is properly sorted: by marker in the case of IBDs, or by chromosomal location in the case of MIBDs. Finally, load the entire pedigree file, including the altered pedigrees, into SOLAR to generate the new indexing for the data set, and import the IBDs from the comma-delimited file.
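The editing and sorting steps in this procedure can be sketched in Python, assuming the pedigree file has family IDs so that every exported row carries a FAMID field. The function names, the file names, and the numeric treatment of CHROMO and LOCATION are illustrative assumptions, not SOLAR features.

```python
import csv

NUMERIC_FIELDS = {"CHROMO", "LOCATION"}   # compared as numbers when sorting

def read_rows(path):
    """Return (header, rows) from a comma-delimited file."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        return reader.fieldnames, list(reader)

def rebuild_export(old_path, new_path, out_path, altered_famids, sort_fields):
    """Drop rows for altered pedigrees from an old export, append the rows
    exported for the recomputed pedigrees, sort, and write the result.
    Hypothetical helper.

    sort_fields: ("MARKER",) for IBDs or ("CHROMO", "LOCATION") for MIBDs.
    Returns the number of data rows written.
    """
    fields, old_rows = read_rows(old_path)
    _, new_rows = read_rows(new_path)
    kept = [r for r in old_rows if r["FAMID"] not in altered_famids]
    merged = kept + new_rows
    merged.sort(key=lambda r: tuple(float(r[f]) if f in NUMERIC_FIELDS else r[f]
                                    for f in sort_fields))
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(merged)
    return len(merged)
```

For example, rebuild_export("old_ibds.csv", "new_ibds.csv", "all_ibds.csv", {"Smith"}, ("MARKER",)) would replace the rows for a hypothetical altered Smith pedigree and leave the result grouped by marker, ready for ibd import.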

Another situation in which it is necessary to export and import IBDs (or MIBDs) is when there are two or more subgroups within your data set for which the marker allele frequencies are different. In order to use the correct allele frequencies when computing IBDs, you must load each subgroup into SOLAR separately, compute the IBDs using the frequency data for that subgroup, and export the IBDs to a comma-delimited file. Merge the exported IBD files, making sure the merged file is sorted appropriately. Finally, load the entire data set into SOLAR, and import the IBDs from the merged file.
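If each subgroup's MIBD export is already sorted on CHROMO and then LOCATION, the merge step can be done without re-sorting the whole data set, for example with heapq.merge in Python as sketched below. That the exports are sorted, and that CHROMO and LOCATION are numeric, are assumptions; if you are not sure, sort the merged file instead. The function names are hypothetical.

```python
import csv
import heapq
from contextlib import ExitStack

def location_key(row):
    """Sort key: chromosome, then location in cM (both assumed numeric)."""
    return (int(row["CHROMO"]), float(row["LOCATION"]))

def merge_sorted_exports(paths, out_path):
    """Merge MIBD export files that are each sorted on CHROMO, LOCATION into
    one file with the same ordering, streaming rows instead of loading all.
    Hypothetical helper."""
    with ExitStack() as stack:
        readers = [csv.DictReader(stack.enter_context(open(p, newline="")))
                   for p in paths]
        fields = readers[0].fieldnames          # header of the first file
        out = stack.enter_context(open(out_path, "w", newline=""))
        writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        # heapq.merge keeps rows from earlier files first on equal keys.
        writer.writerows(heapq.merge(*readers, key=location_key))
```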