In SOLAR, quantitative trait simulation is done with
the
simqtl
command. There are two steps to simulating a quantitative
trait; each step requires a separate call to
simqtl
. In the first step, a model is
specified which describes the phenotypic distribution (mean
and variance), the genetic components of the trait, and the
effect of any covariates, e.g. age and sex. The genetic
components consist of the proportion of phenotypic variance
(heritability) due to an underlying quantitative trait locus
(QTL) and the heritability attributable to polygenes.
Optionally, a multi-allelic marker locus can be specified
which is linked to the QTL. The simulation model is stored in
the file simqtl.par
.
NOTE: The simulation model is not as general as a "true"
SOLAR model, i.e. one created with the
model
command, and is not formatted in the
same way. At present, there is no mechanism for simulating a
trait based directly on a SOLAR model file, although this
could be added in a future release.
The second step is the actual simulation of the quantitative trait. This step can be run as many times as required without having to re-specify the model. In each simulation run, the following files are created:
simqtl.phn - a comma-delimited file containing the simulated trait and any covariates specified in the model simqtl.qtl - a comma-delimited file containing the simulated QTL genotypes simqtl.mrk - a comma-delimited file containing the simulated genotypes for the linked marker (if specified) simqtl.dat - a working file which contains pedigree data
For historical reasons, there are two requisite covariates,
SEX and AGE. The terms included in the model for these
covariates are: sex
,
age(male)
,
age(female)
,
age^2(male)
, and
age^2(female)
. Although these terms are
always included in the simulation model, the effects of these
covariates on the simulated trait are set to zero by default.
SEX is a required
SOLAR field, so it is always available. If AGE is
present in the phenotype file, it will be included. If AGE is
not available, the age terms will still appear in the
simulation model, but no adjustment to the trait mean
involving age can be made.
Additional covariates from the phenotype file can be added to
the simulation model with the -cov
option
of the simqtl
command. A mean value can
optionally be subtracted from each covariate with the
-cmean
and -mage
options. The effects of covariates, including the requisite
covariates, are specified with the -beta
option. For each covariate, there will be a single regression
coefficient (or beta) included in the model. (This does not
apply to AGE, for which there are the four beta terms
described above.) To be more precise, each beta term in the
model is actually a vector, with one element for each possible
QTL genotype. In this way, genotype-specific covariate
effects can be modeled.
Here is an example of a simulation model specification. Note that a backslash is used to continue the simqtl command on a second line.
solar> simqtl -freq {.5 .35} -mean {70 75 80 90 95 110} -sdev 10 \ -cov covar -h2r .3 -mfreq {.2 .2 .2 .2} -theta .01
This model calls for a QTL with 3 alleles, call them 1, 2, and 3,
having frequencies .5, .35, and .15, respectively. The allele
frequencies must sum to 1, so only the first two frequencies are
given. The 3 alleles give rise to 6 genotypes: 1/1, 2/1, 2/2, 3/1,
3/2, and 3/3. The ordering of the genotypes is fixed; also, there
is no distinction between genotypes i/j and j/i. The rule for
ordering genotypes is explained in detail in the description of the
simqtl
command in Appendix 1. A mean trait value is specified for
each of the genotypes. The trait variance is not allowed to
vary by genotype, so there is a single phenotypic standard
deviation. Besides the obligatory sex and age, there is a
covariate covar
, although in this example
the beta term for
covar
is left at its default value of 0.
A marker with 5 equally-frequent alleles is specified and it
has a recombination fraction of .01 with the QTL.
Here is a display of the simulation model:
solar> simqtl -model #qtl = 1 #all = 3 #mrk = 1 #all = 5 #gen = 6 #trt = 1 #cov = 3 (including sex and age) qtl: freq = 0.5 0.35 0.15 marker: freq = 0.2 0.2 0.2 0.2 0.2 theta = 0.01 trait 1: mean = 70. 75. 80. 90. 95. 110. sdev = 10. h2r = 0.3 covariate sex: beta = 0. 0. 0. 0. 0. 0. covariate male age: mean = 0. beta = 0. 0. 0. 0. 0. 0. covariate female age: mean = 0. beta = 0. 0. 0. 0. 0. 0. covariate male age^2: mean = 0. beta = 0. 0. 0. 0. 0. 0. covariate female age^2: mean = 0. beta = 0. 0. 0. 0. 0. 0. covariate 1: mean = 0. beta = 0. 0. 0. 0. 0. 0.
In the model, there are 6 trait means, ranging from 70 for genotype 1/1 to 110 for genotype 3/3. Similarly, each covariate effect (beta) has 6 genotype-specific values.
Multiple quantitative traits controlled by the same QTL can be
simulated. In the model specification there will be a set of
genotypic means, a phenotypic standard deviation, and a set of
covariate effects (betas) for each trait. There will also be
a genetic correlation and environmental correlation specified
for each pair of traits. It is also possible to simulate
quantitative traits that are controlled by more than one
locus. In that case, the genotypes for each QTL cannot be
generated by simqtl
, and so must be read
from a file. See the
simqtl
command description for details on multi-locus trait
simulation.
For traits with distributions that differ significantly from multivariate normality, it may be necessary to make an adjustment to LOD scores which have been computed under the assumption of normality. A theoretical treatment of LOD adjustment is given in:
Blangero J, Williams JT, Almasy L (2000) Robust LOD scores for variance component-based linkage analysis. Genet Epidemiol 19 Suppl 1:S8-14.
Blangero J, Williams JT, Almasy L (2001) Variance component methods for detecting complex trait loci. Adv Genet 42:151-81.
The LOD adjustment required for a particular quantitative trait
is computed with the
lodadj
command. Basically, lodadj
runs a
simulation to build up the distribution of LOD scores one can
expect to observe under the null hypothesis of no linkage.
This simulation consists of some number of trials (the default
number is 10000) in each of which a fully-informative marker,
completely unlinked to the trait, is simulated and trait
linkage is then tested at that marker. Under the assumption
of multivariate normality, the theoretical null distribution
is well-known. The lodadj
command
regresses the observed LOD scores on the LOD scores expected
for a multivariate normal trait. The inverse of the slope of
the regression line is the LOD adjustment.
Without any arguments, lodadj
simply turns
the LOD adjustment on, meaning that the LOD scores reported in
subsequent analyses of the selected trait will have been
adjusted. The LOD adjustment can be turned off with the
command lodadj -off
. The
madj
command can be used to apply the LOD
adjustment to a previous multipoint run.
Here is an example of a LOD adjustment run:
solar> trait Cz solar> lodadj -calc Computing LOD adjustment, replicate ... LOD Correction Constant = 0.882731 LOD % Original % Adjusted % Normal --------- ---------- ---------- -------- = 0 0.516830 0.516830 0.5000 >= 0.357 0.114450 0.100580 0.1000 >= 0.588 0.061900 0.051430 0.0500 >= 0.834 0.035110 0.027080 0.0250 >= 1.175 0.015670 0.011020 0.0100 >= 2.074 0.002050 0.001260 0.0010 >= 3 0.000370 0.000170 0.0001 #Reps: 10000 Cutoff: .05 Null Model: Cz/null0.mod
In the output above, the column labeled %
Original
displays the proportion of observed LOD
scores falling in the range shown in the first column. The
% Normal
column shows the expected
proportions assuming multivariate normality for the trait
distribution. After the observed LODs are multiplied by the
correction constant, they are distributed as shown in the
% Adjusted
column. Note that the
distribution of adjusted LOD scores is closer to the expected
distribution.
The cutoff value, in this case the default of .05, is the
fraction of observed LOD scores at the high end of the
distribution that are ignored when fitting the regression
line. This is done so that those data points do not unduly
influence the fit. The cutoff value can be changed using the
lodadj
command's
-cutoff
option.
Note: lodadj
is not supported for
bivariate models! You can calculate an adjustment (very
slowly), but it may be inaccurate and it will not be used.
The observed and expected LOD scores computed by
lodadj
are stored in a file named
lodadj.lods
in the trait (or outdir)
directory. The observed LODs, which are stored in sorted
order, are used by the
empp
command to determine the p-values
associated with unadjusted LOD scores. For example, if a LOD
of 6.13 appears as the 9996th entry in a file containing 10000
replicates, then we can say that, based upon our simulation,
if the hypothesis of no linkage is true, we would expect only
5 out of 10000 observed LOD scores to be as high or higher
than 6.13, giving us an empirical p-value of 0.0005.
Here is an example of an empirical p-value based on the same trait as shown in the LOD adjustment example above:
solar> trait Cz solar> empp 3 p = 0.00037
SOLAR can perform power calculations with the following assumptions: (1) the trait to be studied is quantitative, (2) the trait to be studied is influenced by a single bi-allelic QTL with, optionally, a residual additive genetic effect due to polygenes, (3) there will be fully informative marker genotype data available for all study subjects, and (4) all study subjects will be phenotyped for the trait to be studied. For the currently loaded pedigree data, SOLAR uses simulation to estimate the LOD score one would expect to obtain for a QTL having a certain effect size (i.e. QTL heritability). The expected LOD is calculated for a range of effect sizes. The ELODs, in turn, are used to compute the power to detect a QTL having these various effect sizes with, for example, a LOD of 3.
The power
command gives the user control over some aspects of the power
calculation procedure. For example, the precise grid of QTL
heritabilities for which ELODs are estimated can be specified.
By default, the grid runs from 0.05 to 0.5 in increments of
0.05. A polygenic background can be added, either as a
constant residual heritability added to the QTL heritability
or by specifying that the total heritability (QTL plus
residual) be some constant. Also, it is assumed that all
members of the pedigree will be phenotyped for the
quantitative trait. However, it is possible to specify the
set of phenotyped individuals by giving the name of a field in
a phenotypes file (which must have been loaded prior to the
power
command). Individuals who are
missing data for that phenotype will not be included in the
power calculations.
When the power calculation procedure is complete, three files
will have been created. The file
power.info
serves as a record of the run,
including the grid and other options chosen and the ELODs that
were estimated for each effect size. The file
power.out
contains a line for each grid
point in the format X Y1 Y2 ... and is suitable for input to
plotting packages such as xmgr. In the first (or X) column is
the QTL heritability. The succeeding columns contain the
power estimates, each corresponding to a different LOD. By
default, the LODs are 3 and 2, in that order, but you are free
to choose a different set of LODs. The third file,
power.lods
, stores the results of the ELOD
simulations. In the event a power calculation is interrupted,
this file, along with power.info, can be used to restart the
run.
The following sample power.info file is from a power calculation in which a grid interval of 0.025 was chosen and the total heritability was set to a constant value of 0.5. Only the first few lines of the file are shown. Only the 604 individuals having data for the trait Cz were included.
data = Cz navail = 604 phenf = example.phn nreps = 500 seed = 123 h2t = .5 freq = .2112 grid = .05 .5 .025 lod = 3 2 h2q = 0.05 h2r = 0.45 ELOD = 0.292533 h2q = 0.075 h2r = 0.425 ELOD = 0.382151 h2q = 0.1 h2r = 0.4 ELOD = 0.535781
The following lines are from the power.out file for the same power calculation. The file is free-format with blank space between the columns. The second column shows the power to detect, with a LOD of 3, a QTL of the effect size given in column 1. The third column shows the power for a LOD of 2. In this example, we have 80% power to detect, with a LOD of 3, a QTL which accounts for approximately 38 or 39% of the trait variance.
0.3 0.502568 0.754433 0.325 0.578311 0.810474 0.35 0.696662 0.884325 0.375 0.759302 0.917146 0.4 0.849706 0.957032