Go to Main Index
Go to Table of Contents

Chapter 10

Simulation-Based Commands

10.1 Quantitative Trait Simulation

In SOLAR, quantitative trait simulation is done with the simqtl command. There are two steps to simulating a quantitative trait; each step requires a separate call to simqtl. In the first step, a model is specified which describes the phenotypic distribution (mean and variance), the genetic components of the trait, and the effect of any covariates, e.g. age and sex. The genetic components consist of the proportion of phenotypic variance (heritability) due to an underlying quantitative trait locus (QTL) and the heritability attributable to polygenes. Optionally, a multi-allelic marker locus can be specified which is linked to the QTL. The simulation model is stored in the file simqtl.par.

NOTE: The simulation model is not as general as a "true" SOLAR model, i.e. one created with the model command, and is not formatted in the same way. At present, there is no mechanism for simulating a trait based directly on a SOLAR model file, although this could be added in a future release.

The second step is the actual simulation of the quantitative trait. This step can be run as many times as required without having to re-specify the model. In each simulation run, the following files are created:

	simqtl.phn     - a comma-delimited file containing the simulated
	                 trait and any covariates specified in the model
	simqtl.qtl     - a comma-delimited file containing the simulated
	                 QTL genotypes
	simqtl.mrk     - a comma-delimited file containing the simulated
	                 genotypes for the linked marker (if specified)
	simqtl.dat     - a working file which contains pedigree data

For historical reasons, there are two requisite covariates, SEX and AGE. The terms included in the model for these covariates are: sex, age(male), age(female), age^2(male), and age^2(female). Although these terms are always included in the simulation model, the effects of these covariates on the simulated trait are set to zero by default. SEX is a required SOLAR field, so it is always available. If AGE is present in the phenotype file, it will be included. If AGE is not available, the age terms will still appear in the simulation model, but no adjustment to the trait mean involving age can be made.

Additional covariates from the phenotype file can be added to the simulation model with the -cov option of the simqtl command. A mean value can optionally be subtracted from each covariate with the -cmean and -mage options. The effects of covariates, including the requisite covariates, are specified with the -beta option. For each covariate, there will be a single regression coefficient (or beta) included in the model. (This does not apply to AGE, for which there are the four beta terms described above.) To be more precise, each beta term in the model is actually a vector, with one element for each possible QTL genotype. In this way, genotype-specific covariate effects can be modeled.

Here is an example of a simulation model specification. Note that a backslash is used to continue the simqtl command on a second line.

	solar> simqtl -freq {.5 .35} -mean {70 75 80 90 95 110} -sdev 10 \
	-cov covar -h2r .3 -mfreq {.2 .2 .2 .2} -theta .01

This model calls for a QTL with 3 alleles, call them 1, 2, and 3, having frequencies .5, .35, and .15, respectively. The allele frequencies must sum to 1, so only the first two frequencies are given. The 3 alleles give rise to 6 genotypes: 1/1, 2/1, 2/2, 3/1, 3/2, and 3/3. The ordering of the genotypes is fixed; also, there is no distinction between genotypes i/j and j/i. The rule for ordering genotypes is explained in detail in the description of the simqtl command in Appendix 1. A mean trait value is specified for each of the genotypes. The trait variance is not allowed to vary by genotype, so there is a single phenotypic standard deviation. Besides the obligatory sex and age, there is a covariate covar, although in this example the beta term for covar is left at its default value of 0. A marker with 5 equally-frequent alleles is specified and it has a recombination fraction of .01 with the QTL.

Here is a display of the simulation model:

	solar> simqtl -model
	#qtl = 1  #all = 3
	#mrk = 1  #all = 5
	#gen = 6
	#trt = 1
	#cov = 3 (including sex and age)
	qtl:
	  freq = 0.5 0.35 0.15
	marker:
	  freq = 0.2 0.2 0.2 0.2 0.2
	theta = 0.01
	trait 1:
	  mean = 70. 75. 80. 90. 95. 110.
	  sdev = 10.
	  h2r  = 0.3
	  covariate sex:
	    beta = 0. 0. 0. 0. 0. 0.
	  covariate male age:
	    mean = 0.
	    beta = 0. 0. 0. 0. 0. 0.
	  covariate female age:
	    mean = 0.
	    beta = 0. 0. 0. 0. 0. 0.
	  covariate male age^2:
	    mean = 0.
	    beta = 0. 0. 0. 0. 0. 0.
	  covariate female age^2:
	    mean = 0.
	    beta = 0. 0. 0. 0. 0. 0.
	  covariate 1:
	    mean = 0.
	    beta = 0. 0. 0. 0. 0. 0.

In the model, there are 6 trait means, ranging from 70 for genotype 1/1 to 110 for genotype 3/3. Similarly, each covariate effect (beta) has 6 genotype-specific values.

Multiple quantitative traits controlled by the same QTL can be simulated. In the model specification there will be a set of genotypic means, a phenotypic standard deviation, and a set of covariate effects (betas) for each trait. There will also be a genetic correlation and environmental correlation specified for each pair of traits. It is also possible to simulate quantitative traits that are controlled by more than one locus. In that case, the genotypes for each QTL cannot be generated by simqtl, and so must be read from a file. See the simqtl command description for details on multi-locus trait simulation.

10.2 LOD Adjustment

For traits with distributions that differ significantly from multivariate normality, it may be necessary to make an adjustment to LOD scores which have been computed under the assumption of normality. A theoretical treatment of LOD adjustment is given in:

Blangero J, Williams JT, Almasy L (2000) Robust LOD scores for variance component-based linkage analysis. Genet Epidemiol 19 Suppl 1:S8-14.

Blangero J, Williams JT, Almasy L (2001) Variance component methods for detecting complex trait loci. Adv Genet 42:151-81.

The LOD adjustment required for a particular quantitative trait is computed with the lodadj command. Basically, lodadj runs a simulation to build up the distribution of LOD scores one can expect to observe under the null hypothesis of no linkage. This simulation consists of some number of trials (the default number is 10000) in each of which a fully-informative marker, completely unlinked to the trait, is simulated and trait linkage is then tested at that marker. Under the assumption of multivariate normality, the theoretical null distribution is well-known. The lodadj command regresses the observed LOD scores on the LOD scores expected for a multivariate normal trait. The inverse of the slope of the regression line is the LOD adjustment.

Without any arguments, lodadj simply turns the LOD adjustment on, meaning that the LOD scores reported in subsequent analyses of the selected trait will have been adjusted. The LOD adjustment can be turned off with the command lodadj -off. The madj command can be used to apply the LOD adjustment to a previous multipoint run.

Here is an example of a LOD adjustment run:

	solar> trait Cz
	solar> lodadj -calc
	Computing LOD adjustment, replicate ...
                                                    
	LOD Correction Constant =   0.882731

	LOD          % Original    % Adjusted    % Normal
	---------    ----------    ----------    --------
	=  0          0.516830      0.516830      0.5000
	>= 0.357      0.114450      0.100580      0.1000
	>= 0.588      0.061900      0.051430      0.0500
	>= 0.834      0.035110      0.027080      0.0250
	>= 1.175      0.015670      0.011020      0.0100
	>= 2.074      0.002050      0.001260      0.0010
	>= 3          0.000370      0.000170      0.0001

	#Reps: 10000    Cutoff: .05    Null Model: Cz/null0.mod

In the output above, the column labeled % Original displays the proportion of observed LOD scores falling in the range shown in the first column. The % Normal column shows the expected proportions assuming multivariate normality for the trait distribution. After the observed LODs are multiplied by the correction constant, they are distributed as shown in the % Adjusted column. Note that the distribution of adjusted LOD scores is closer to the expected distribution.

The cutoff value, in this case the default of .05, is the fraction of observed LOD scores at the high end of the distribution that are ignored when fitting the regression line. This is done so that those data points do not unduly influence the fit. The cutoff value can be changed using the lodadj command's -cutoff option.

Note: lodadj is not supported for bivariate models! You can calculate an adjustment (very slowly), but it may be inaccurate and it will not be used.

10.3 Empirical p-Values

The observed and expected LOD scores computed by lodadj are stored in a file named lodadj.lods in the trait (or outdir) directory. The observed LODs, which are stored in sorted order, are used by the empp command to determine the p-values associated with unadjusted LOD scores. For example, if a LOD of 6.13 appears as the 9996th entry in a file containing 10000 replicates, then we can say that, based upon our simulation, if the hypothesis of no linkage is true, we would expect only 5 out of 10000 observed LOD scores to be as high or higher than 6.13, giving us an empirical p-value of 0.0005.

Here is an example of an empirical p-value based on the same trait as shown in the LOD adjustment example above:

	solar> trait Cz
	solar> empp 3
	p = 0.00037

10.4 Power Calculation

SOLAR can perform power calculations with the following assumptions: (1) the trait to be studied is quantitative, (2) the trait to be studied is influenced by a single bi-allelic QTL with, optionally, a residual additive genetic effect due to polygenes, (3) there will be fully informative marker genotype data available for all study subjects, and (4) all study subjects will be phenotyped for the trait to be studied. For the currently loaded pedigree data, SOLAR uses simulation to estimate the LOD score one would expect to obtain for a QTL having a certain effect size (i.e. QTL heritability). The expected LOD is calculated for a range of effect sizes. The ELODs, in turn, are used to compute the power to detect a QTL having these various effect sizes with, for example, a LOD of 3.

The power command gives the user control over some aspects of the power calculation procedure. For example, the precise grid of QTL heritabilities for which ELODs are estimated can be specified. By default, the grid runs from 0.05 to 0.5 in increments of 0.05. A polygenic background can be added, either as a constant residual heritability added to the QTL heritability or by specifying that the total heritability (QTL plus residual) be some constant. Also, it is assumed that all members of the pedigree will be phenotyped for the quantitative trait. However, it is possible to specify the set of phenotyped individuals by giving the name of a field in a phenotypes file (which must have been loaded prior to the power command). Individuals who are missing data for that phenotype will not be included in the power calculations.

When the power calculation procedure is complete, three files will have been created. The file power.info serves as a record of the run, including the grid and other options chosen and the ELODs that were estimated for each effect size. The file power.out contains a line for each grid point in the format X Y1 Y2 ... and is suitable for input to plotting packages such as xmgr. In the first (or X) column is the QTL heritability. The succeeding columns contain the power estimates, each corresponding to a different LOD. By default, the LODs are 3 and 2, in that order, but you are free to choose a different set of LODs. The third file, power.lods, stores the results of the ELOD simulations. In the event a power calculation is interrupted, this file, along with power.info, can be used to restart the run.

The following sample power.info file is from a power calculation in which a grid interval of 0.025 was chosen and the total heritability was set to a constant value of 0.5. Only the first few lines of the file are shown. Only the 604 individuals having data for the trait Cz were included.

	data = Cz  navail = 604  phenf = example.phn
	nreps = 500  seed = 123
	h2t = .5  freq = .2112
	grid = .05 .5 .025  lod = 3 2
	h2q = 0.05  h2r = 0.45  ELOD = 0.292533
	h2q = 0.075  h2r = 0.425  ELOD = 0.382151
	h2q = 0.1  h2r = 0.4  ELOD = 0.535781

The following lines are from the power.out file for the same power calculation. The file is free-format with blank space between the columns. The second column shows the power to detect, with a LOD of 3, a QTL of the effect size given in column 1. The third column shows the power for a LOD of 2. In this example, we have 80% power to detect, with a LOD of 3, a QTL which accounts for approximately 38 or 39% of the trait variance.

	0.3    0.502568   0.754433
	0.325  0.578311   0.810474
	0.35   0.696662   0.884325
	0.375  0.759302   0.917146
	0.4    0.849706   0.957032