SOLAR has four major high level genetic modeling commands:
Command      Modeling Performed
-------      ------------------
polygenic    Quantitative Genetic (and/or covariate screening)
twopoint     Twopoint
multipoint   Multipoint
bayesavg     Bayesian Model Averaging (covariates or linkage elements)
These commands (which are actually SOLAR/Tcl scripts)
create, maximize, and compare many models automatically.
Modeling can also be done more manually using the elementary
SOLAR commands. The basic command underlying all the large
analysis scripts is maximize, which adjusts model parameter
values to achieve the maximum likelihood.
In rare cases, SOLAR may be unable to maximize models for which the data are barely adequate, or the curvature of the likelihood surface exceeds the numerical limits. The loglikelihood and/or the parameter values may go to illegal numerical values. This is called Convergence Failure. Techniques for overcoming convergence failure are discussed below in Section 6.8.
Quantitative Genetic or Polygenic analysis can be done using
the polygenic command. First, you must load the pedigree and
phenotypes files (if you have not done so already), specify
your trait, and specify your covariates. Then you can run
polygenic. (If you have created any other models before, you
should give the model new command first.)
solar> example
solar> load pedigree gaw10.ped
solar> load phenotypes gaw10.phen
solar> model new
solar> trait q4
solar> covariate sex age age*sex age^2 age^2*sex
solar> polygenic
Covariates may include sex (found in the pedigree file)
and any of the variables included in the phenotypes file.
Variables may be raised to any power using the ^ operator
(for example, age^2 is age squared).
Interactions are specified using the *
operator (so age*sex
is the interaction of
age and sex, usually called age by sex). You may list
all the covariates in one command (as above) and/or use
several commands; the list of covariates is cumulative. If
you simply want to include ALL the variables (other than the
trait) in the phenotypes file as covariates, you may use the
automodel
or allcovar
commands (which will also include all the age and sex
interactions and squares as shown above). In the covariate
command, the pound sign (#) is a shortcut for both variables
and their interactions, and commas can be used to separate
multiple exponents. So the above list of 5 covariates could
be abbreviated like this:
covariate age^1,2#sex
The polygenic command will calculate H2r (the polygenic
heritability), the significance of H2r, and the
proportion of variance caused by the covariates. The
coefficients of the covariates are the values of the
parameters in the model itself. You can display the entire
current model with the model command, or just the parameters
with the parameters command. If we wanted to see the value of
just one parameter, bage (the beta coefficient or parameter
for age), we could use the command:
solar> parameter bage =
0.006262359675
To see the calculated standard error for this parameter, we
use the se selector:

solar> parameter bage se
0.002376826451
A popular option, covariate screening, can be added to this
with the -screen argument:

solar> polygenic -screen

(-screen may be abbreviated as -s)
Covariate screening will determine the statistical
significance of each covariate, and remove it from the
final model if it falls below a specified level. By default,
the significance criterion level is 0.1 (chosen to retain
covariates unless they are unlikely to be important). The
significance criterion can be changed by using the -prob
argument:
solar> polygenic -screen -prob 0.05
If you would like to determine the significance of each
covariate, but would not like to remove any of them
from the final model, you can force all the covariates into
the final model with the -all
argument:
solar> model new
solar> trait q4
solar> covar age^1,2#sex
solar> polygenic -screen -all
You can also selectively force covariates into the final model
using one or more -fix (or -f) arguments. Each one is followed
by the name of one covariate:
solar> polygenic -s -fix sex -fix age
When completed, polygenic will provide a summary of the results, which will look something like this:
Pedigree:    gaw10.ped
Phenotypes:  gaw10.phen
Trait:       q4          Individuals:  1000

H2r is 0.5502247  p = 2.8143773e-29  (Significant)
H2r Std. Error:  0.0573958

      age  p = 0.0085273  (Significant)
      sex  p = 0.2913649  (Not Sig., but fixed)
  age*sex  p = 0.2032678  (Not Significant)
    age^2  p = 0.2947688  (Not Significant)
age^2*sex  p = 0.5929531  (Not Significant)

The following covariates were removed from final models:
    age*sex  age^2  age^2*sex

Proportion of Variance Due to All Final Covariates Is  0.0069282

Output files and models are in directory q4/
Summary results are in q4/polygenic.out
Loglikelihoods and chi's are in q4/polygenic.logs.out
Best model is named poly and null0 (currently loaded)
Final models are named poly, spor, nocovar
Constrained covariate models are named no<covariate>

Residual Kurtosis is -0.0538, within normal range
Starting with SOLAR version 2.0.5, a Residual Kurtosis test
is performed. The residuals of the trait
(after the covariate influences have been removed) are
written to a file named polygenic.residuals
and then the
kurtosis is computed. If there are no covariates, only
the kurtosis of the trait itself is tested. If the Residual
Kurtosis exceeds 0.8, you are given a warning. In that case,
you are advised to see Note 5 in the help message for polygenic
.
High kurtosis indicates a trait distribution not well suited
for SOLAR linkage analysis, because unrealistically high LOD
scores might be produced. However, you can mitigate the effects
of high kurtosis using the tdist or lodadj commands, discussed
below.
Output files will be written to a subdirectory named by the
trait or outdir commands. They will include a file named
polygenic.out, which is like the summary shown above. There
will also be a file named polygenic.logs.out which will show
all the loglikelihoods used in the calculations. There will
also be many models, including the final model, which will be
named poly.mod. For convenience, a copy of that model named
null0.mod will be made for use in twopoint and multipoint
linkage scans. The final polygenic model will be left in
memory at the conclusion. If screening was done, it will
include only the covariates which met the significance
criterion or were fixed by the user. The information summary
shown above mentions the important files and models written.
A twopoint scan is done by the twopoint command (itself a
SOLAR/Tcl script), which will scan all the markers currently
in the ibddir. The LOD score and H2q1 (the proportion of
variance associated with a linkage element) will be reported
for each marker. The report is displayed on your terminal and
also written to the file twopoint.out. If you later get
additional markers, you may append them to the previous
twopoint.out file using the -append option.
Twopoint scanning requires a model named null0 in the output
directory. The usual way to create this is with the polygenic
command. If you are appending new markers to an existing
twopoint.out, you must be sure you are using the same null
model. If not, you should use the -overwrite option instead
of the -append option, and scan the markers all over again.
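For orientation, a minimal twopoint session might look like
the following sketch. It reuses the example data loaded
earlier; the ibddir name gaw10ibd is hypothetical, standing in
for wherever your IBD files were prepared:

solar> load pedigree gaw10.ped
solar> load phenotypes gaw10.phen
solar> trait q4
solar> covariate age
solar> polygenic
solar> ibddir gaw10ibd
solar> twopoint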
Twopoint scanning can be done with custom parameterizations. See Section 9.5 for a discussion. One special case of this is the analysis of dominance, which is described in Section 9.4.
The multipoint
command will scan the selected chromosomes (or the whole
genome) for the highest scoring quantitative trait locus
(QTL). If a criterion LOD score is specified,
and that criterion is met by the highest LOD score,
multipoint will lock in the highest QTL
and perform another scan to find another QTL. An example was
given in Chapter 3. If no criterion is specified,
multipoint
will simply make one pass to
find the best single QTL and stop.
Multipoint requires a model named null0 in the output
directory, and the usual way to create this is with the
polygenic command. It also requires MIBD files (whose
preparation is described in Section 5.3).
Before giving the multipoint command, the following other commands are also required:
mibddir      select multipoint IBD directory
chromosome   specify chromosome(s)
interval     select scan interval and range
The following commands are OPTIONAL:
finemap      establish LOD threshold for finemapping (defaults to 0.588)
Here is another example multipoint scan:
mibddir gaw10mibd
chromosome 6-10 12 14-23
interval 5
multipoint 3 1.9
In this example, multipoint will perform one pass; then, if the LOD for the best QTL is greater than 3, it will make another pass. After the second pass, it will continue making passes as long as the LOD for the best QTL reaches 1.9 or higher.
As it is proceeding, multipoint
will show
the LOD score and H2q1
for each locus
tested. When it is done, it will show the highest scoring
locus for the last pass, and the QTL's included in the final model.
Summary output is written to a file named
multipoint.out
. This simply summarizes
the loci that were scanned in each pass, and the QTL's
included in the final model. The detailed results for each
pass are written to a file named
multipoint<pass>.out
, for example,
multipoint1.out
for the first pass,
multipoint2.out
for the second pass, and
so on. The best model containing all loci that met the
specified criteria is saved in link.mod
.
The best model for each pass is also saved as the null model
for the next pass, named null<pass>, for example, null1 is the
model with the best QTL from pass 1, null2 is the model with
the best QTL from pass 2 along with the best QTL from pass 1,
and so on.
The Bayesian Model Averaging method identifies important
elements (either linkage elements or covariates) by computing
a statistic called the Bayesian Information Criterion (BIC)
for a model containing each possible combination (or set) of
those elements, and then performing additional statistical
tests on the models having the highest BICs and the elements
they contain. The SOLAR command which uses this method is
bayesavg.
bayesavg has been modified recently. Rather than computing
the effective log(n) by maximizing a fully saturated model
with all the candidate elements, an estimated log(n) derived
from the number of elements is used. Later, after the window
of important models has been identified, an exact log(n) is
computed from the model with the best BIC. Not only is this
approach more accurate, it bypasses the step of having to
maximize a fully saturated model. It
does, however, require rewriting the main output file to
correct the BIC values. During scanning, the main output file
is bayesavg.est (or bayesavg_cov.est for covariate analysis).
Later this file is copied to a file named bayesavg.nose (or
bayesavg_cov.nose) which uses the final log(n) but does not
have standard error estimates. Finally, the file bayesavg.out
(or bayesavg_cov.out) is written, which has standard error
estimates for all the models in the window (unless the -nose
option has been specified, in which case the .nose file is
simply renamed to .out). A file named bayesavg.avg (or
bayesavg_cov.avg) has the final Bayesian model averages.
One way of starting bayesavg assumes that you have already
created a model with all the elements in it. This would be
most likely in the case of linkage elements. After having run
a multipoint scan finding several important linkage elements,
you could immediately proceed to bayesavg. The default mode
is linkage model averaging:
solar> mibddir gaw10mibd
solar> model new
solar> trait q4
solar> covar age
solar> polygenic
solar> multipoint -ov 1
solar> bayesavg
It's also possible to use bayesavg with a starting model
containing a slew of covariates. However, there are also
alternatives where you start with a base model containing
only fixed features, to which you want to add a set of
variable features. Perhaps the most useful option for this is
the -list option, where you specify a file containing all the
covariates (or linkage elements) you want tested. The file
contains the name of each item on a separate line. When you
specify a list, the covariates already in the model become
fixed elements.
model new
trait q4
covar age
polygenic
bayesavg -cov -list testcovars.lst
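The contents of testcovars.lst are simply the candidate names,
one per line. For example, a hypothetical list of three
candidate covariates (using phenotype names that appear in a
later example) might be:

height
weight
BMI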
SOLAR QTN analysis is based on Bayesian covariate model
averaging. Each candidate single nucleotide polymorphism
(SNP) is coded as a covariate in a phenotypes file. The -qtn
option has all the features suitable for this application.
It automatically invokes the -cov and -stop options. The
-stop option stops the analysis when no models of the current
size (that is, degrees of freedom) have been included in the
window. When this happens, it is impossible for any models of
larger size to be included in the window either. (This option
will probably become the default in a future release.) This
is very important for QTN analysis because otherwise the set
of all possible models could become astronomically huge.
Also, the -qtn option includes all covariates with the prefix
snp_ as variable elements, and all the others as fixed
elements. Thus, you can add all the regular covariates and
the SNPs to the starting model (which need not be maximized).
Also, a special window file is produced named bayesavg_cov.win
which contains a matrix of the SNPs included in each window
model.
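As a sketch of such a run (the snp_ covariate names here are
hypothetical, and the phenotypes file containing them is
assumed to be loaded already):

solar> model new
solar> trait q4
solar> covar age sex snp_1 snp_2 snp_3
solar> bayesavg -qtn

Here age and sex would be treated as fixed elements, while
snp_1, snp_2, and snp_3 would be the variable elements.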
Before using bayesavg on a large set of SNPs, screening
should be done using qtnm and plotqtn to eliminate redundant
SNPs. Once again, this is done to help prevent the number of
models that need to be tested from becoming astronomically
huge.
When there are going to be more than 20 SNPs tested (which
would correspond to 1,048,575 models with 1 or more elements
in them), it is advisable to use the -max option, because the
amount of storage necessary to keep all the possible
combinations in computer memory (RAM) will start becoming a
significant problem. SOLAR might crash when starting the
analysis if your computer does not have sufficient memory.
Generally, it will not be necessary to run through all the
possible degrees of freedom anyway. In most cases, -max 10 or
less is sufficient. (Use a progressively smaller number as
the number of SNPs increases.) When using -max, however, be
sure that the BIC has started to decline by the last degree
of freedom actually tested; otherwise a larger -max may be
required.
If you are using the standard SOLAR parameterization, but the
polygenic script does not perform exactly the quantitative
genetic analysis you require, you can probably do it at a
lower level by using the maximize command to maximize models
you have set up exactly as you want using the parameter,
constraint, omega, and mu commands. This is less necessary
now than it has been in the past, because polygenic can now
handle constrained covariate beta parameters, which
previously was one of its more glaring limitations. (See
Note 4 in the documentation for polygenic for further
details.)
The direct use of the maximize command was introduced in
Section 3.5 of the Tutorial (Chapter 3). If you need to use a
custom parameterization, that is discussed in Section 9.5.
Similar ideas can be employed to do a custom linkage analysis.
Underlying all modeling commands in SOLAR is the maximize
command, which will maximize the loglikelihood of the model
by adjusting the values of the parameters, such as the
variance components and covariate betas. In addition, the
standard errors for each parameter will be computed, though
this is a time consuming computation which may be switched
off with the standerr option:

option standerr 0
If there are covariates, it is usually best to maximize a
sporadic model (having no genetic elements) first
because the determination of covariate betas can then be done
analytically. Then you can maximize the corresponding
polygenic model. If you are using the standard
parameters, you can use the spormod command to set up the
variance parameters, constraints, and omega as required to
make a model sporadic, and polymod to set up the parameters,
constraints, and omega as required to make a model polygenic.
You can call these commands one after another, in either
order, and as many times as you like (at least that was the
design intent). There is also a linkmod command which sets up
the parameters, constraints, and omega as required for a
linkage model. It is similar to the other commands, but it
requires that a polygenic model has been set up first. After
linkmod, you could go back to a polygenic model with the
polymod command.
If you have been working with other models in the same
session, it is safest to wipe the slate clean with the
model new
command.
solar> model new
solar> trait q4
solar> covariate age
solar> spormod
solar> maximize
solar> polymod
solar> maximize poly.out
solar> save model q4/null0
Illustrated above is what the polygenic command does in its
most minimal form, without running any extra models to compute
statistics, etc. The most important thing is that a null0
model has been created to use as the null in linkage analysis.
To make it easy to compare different models, SOLAR has
commands such as loglike (return the natural log likelihood
of the most recently maximized model), chi (compute the
probability of rejecting the null hypothesis given a
chi-square value and its degrees of freedom), and clod
(compute the LOD score given two loglikelihoods). The
following example will compute the p value for the covariate
age in a polygenic model (similar to what is done by the
polygenic -screen command). Notice that chi here returns a
string such as "p = 0.0153740". The Tcl set command is used
to save intermediate results in variables, and the Tcl expr
command is used to evaluate mathematical expressions.
solar> model new
solar> trait q4
solar> covariate age
solar> spormod
solar> maximize
solar> polymod
solar> maximize
solar> set with_age [loglike]
solar> parameter bage = 0
solar> constraint bage = 0
solar> maximize
solar> set without_age [loglike]
solar> set chisq [expr 2 * ($with_age - $without_age)]
solar> set deg_freedom 1
solar> chi $chisq $deg_freedom
p = 0.0153740
In the next example, the p value of H2r in a polygenic model
will be computed, similar to the way it is done by the
polygenic command. In this case, we will use the chi command
with the -number option to return a number, so that it can be
divided by two (which is done because H2r is fixed to a
boundary).
solar> spormod
solar> maximize
solar> set spor_ll [loglike]
solar> polymod
solar> maximize
solar> set poly_ll [loglike]
solar> set chisq [expr 2 * ($poly_ll - $spor_ll)]
solar> set p2 [chi -number $chisq 1]
solar> set p [expr $p2 / 2]
solar> puts "p = $p"
p = 3.6566122e-29
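The clod command mentioned above can be used in the same
style. A hypothetical continuation of this session, assuming
clod takes the two loglikelihoods with the null model's first:

solar> clod $spor_ll $poly_ll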
After a model has been maximized, all the parameter values in
memory are updated. The current value of a particular
parameter can be displayed with the parameter command, or the
entire model can be displayed with the model command. Models
can also be saved with the save model command, and loaded
with the load model command.

6.8 Techniques for Resolving Convergence Failure
The optimization algorithms used by SOLAR are very flexible
and robust, and do not strictly require trait or covariate
values to follow a multivariate normal distribution (though
one would expect SOLAR to operate best when all the data are
normally distributed and there is plenty of them). SOLAR
includes a number of heuristics based on many years of
experience with data that has not always been entirely
adequate. However, despite all this, there is no guarantee of
success. SOLAR maximization will sometimes be unable to find
the optimal parameter values. This is called Convergence
Failure.
Convergence Failure happens most frequently with linkage
models. In the scan output files produced by the multipoint
command, such as multipoint1.out, there is a final column
(with no header) which gives an error abbreviation, if
applicable. This is where you may see the abbreviation
ConvrgErr. Convergence errors, like lousy weather, should not
stop SOLAR from scanning the genome, at least once. If
convergence errors have occurred during one scan, however,
additional scans conditioned on that scan will not be
performed, and a warning message will be displayed. When you
plot the results, points having convergence errors will be
shown with an asterisk symbol. (The symbol is placed at an
interpolated position on the LOD curve; however, you must
remember that the LOD at any point where convergence failure
occurred is actually unknown.) Unfortunately, convergence
failure is probably more likely to occur at interesting
points, such as peaks having the highest LOD score. However,
sometimes convergence failure occurs right next to zero LOD
scores on both sides. Such convergence failures can probably
be ignored: even if they are hiding single point peaks, those
peaks probably reflect random noise rather than an underlying
biological process. So, the first thing one can do about SOME
convergence errors is: (1) When convergence errors occur at
unimportant QTL's (surrounded by zero LOD's on both sides),
they can probably be ignored.

6.8.1 Get Better Data
(2) When possible, one of the most effective ways to solve
convergence failures is with better data.
Better data can be better in many ways. It can simply
be a larger data set. It can also be a data set in which
outliers have been removed. SOLAR does not
automatically remove outliers. It can also be a better-cleaned
data set, in which Mendelian errors have been removed. The
cleaning of genetic data should be considered a part of every
project based on genetic data. Statistical genetics programs,
SOLAR included, are very sensitive to and can be thrown off
by even a small number of errors. Here at SFBR, data cleaning
is a major process involving many people, and some staff are
dedicated to it. SOLAR already checks for Mendelian errors,
but it does not necessarily detect all errors. Another
program which is very useful for finding data errors is
Simwalk.
6.8.2 Transform Data

Often it is possible to avoid convergence failure by
transforming the data in some way. The ideal data for
SOLAR would have a multivariate normal distribution,
with a mean of zero and a standard deviation of 1.0. Usually
larger values for the standard deviation are OK but frequently
smaller values cause problems. In particular, a trait
standard deviation smaller than 0.5 has often been found to
cause problems, so SOLAR now has a warning for that. One
simple and often useful change is to multiply the trait
values by some constant to overcome a small standard
deviation. Other traditional transformations, such
as taking the natural log or square root of each value, may
also be useful. Typically one looks at a histogram of the
data and/or gets some statistics such as the standard
deviation and kurtosis. (SOLAR has a command stats to get the
statistics, including kurtosis, for any variable.) Then one
decides upon a suitable transformation. Kurtosis less than
0.8 (where zero is defined as the value for a standard normal
distribution) is highly desirable. Users of Pedsys can use
the tally program. Other usable programs include Gauss and
Excel. The transformation of data prior to statistical
analysis is discussed in many textbooks, including Genetics
and Analysis of Quantitative Traits by Lynch and Walsh
(1998, Sinauer Associates, Inc.).
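For instance, one might inspect the example trait used
earlier, in a sketch that assumes stats takes the variable
name as its argument:

solar> load phenotypes gaw10.phen
solar> stats q4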
NOTE: The data transformations described below are intended
for use with quantitative traits only. The things you can
do with discrete traits are limited and not technically
rigorous. However, sometimes you can do exploratory research
by pretending your discrete trait is a poorly distributed
quantitative trait. That can be accomplished by giving the
command

option EnableDiscrete 0

before running polygenic.
6.8.2.1 zscore
SOLAR has a command zscore to perform a simple transformation
of the trait so that at least the Mean and SD are near
optimal values. The formula is:
zscore = (value - Mean) / SD
To use the zscore command, give the command zscore after
specifying your trait. Then run polygenic and multipoint as
usual. Be sure to read the documentation for zscore.
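A minimal sketch of that ordering, reusing the example trait
from above:

solar> trait q4
solar> zscore
solar> polygenic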
Inverse normalization is an important new feature added to SOLAR version 4. When a variable (trait or covariate) is inverse normalized, it resembles a normally distributed variable, yet retains much of the original information as far as a linkage analysis is concerned. As with zscore, the variable will have a mean near 0.0 and a standard deviation near 1.0, but it will also have a distribution that approximates normal.
SOLAR provides two ways to inverse normalize. The most
convenient is to inverse normalize on-the-fly with the
define
command, for example:
define iapoe = inormal_apoe
trait iapoe
The prefix "inormal_" (the underscore is mandatory) causes the prefixed variable to be inverse normalized during maximization. Note that if the phenotype name includes special characters, the entire term must be included in angle brackets, for example:
define iapoe = <inormal_apoe-e4>
The define command can also be used with the simple mathematical operators (+ - * /) and with the math functions from C++ (such as log, log10, sqrt, and exp) which are listed in the help documentation for the define command. Multiple phenotypes may be included in a single define.
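For instance, these features might be combined as in the
following sketch; the defined name lnbmi and the arithmetic
are hypothetical, assuming the operator and function syntax
described in the define documentation, and height and weight
are phenotypes from the example below:

define lnbmi = log(weight / (height * height))
trait lnbmi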
The other way to inverse normalize is using the inormal
command,
which allows you to create a new file with the inormalized
variable for use with SOLAR or other programs.
Another approach is to residualize the trait. To do this, you
start the analysis by doing a polygenic analysis of your
trait and covariates. After the polygenic command, run the
residual command.
This will produce a file named
residual.out
with a residualized trait now
called residual
. Now load the file
residual.out as a phenotypes file, and select the variable
residual
as your trait. Now you do not
need to select any covariate, as all covariate effects are
already incorporated into the residual value, at least as
they appeared in the polygenic model.
trait apoe
covar age sex age*sex height weight BMI
polygenic
residual
load phenotypes residual.out
model new
trait residual
polygenic
...
multipoint
Models using the standard parameterization set up by SOLAR automatically have the parameter boundaries managed by SOLAR. This is done differently for the variance component parameters and the covariate beta parameters.
The boundaries of covariate beta parameters are almost always
handled quite well by SOLAR. In most cases they can be left
alone. If uninitialized, the parameter boundaries are estimated
and set in one of the earliest phases of the
maximize
command. The initial boundaries
are based on the maximum and minimum values of the covariate
variable and the trait. The covariate boundaries are
initially set so that the covariate variable could explain
twice the variation of the trait variable. (This default
ratio is controlled by the option AutoCovarBound.)
If the covariate boundaries are too narrow, they are
automatically increased by covboundincr each time, up to a
maximum of covboundretries times. You can adjust the option
AutoCovarBound using the option command, and adjust the other
factors using the covboundincr and covboundretries commands.
But generally you will not need to mess with these (I have
never known anyone to try), unless you are having trouble
with your sporadic or polygenic models.
SOLAR already has fairly complicated automatic management of
variance component parameter boundaries based on our
experience. However, it is not necessarily perfect, and may
require some adjustment to handle the most difficult
convergence situations. The basic command to use is boundary,
and you should read the documentation for the boundary
command as well as the documentation under boundary-notes.
You can browse that documentation at any time with the help
command.
The standard variance component parameters of SOLAR (such as
e2 and h2r) have natural boundaries of 0 and 1. But allowing
the parameters
this entire range frequently causes convergence failure, so
artificial boundaries for each parameter are positioned
and moved within that range. At the "end" of the Fortran
maximization algorithm, if any of the variance component
parameters has hit its boundary, the hit boundary is moved and
then the maximization algorithm is called again. On the other
hand, if there is a convergence failure, boundaries around all the
variance components are pulled-in (crunched), and
maximization is tried again. If boundaries have already been
crunched and there is still a convergence failure, a second level
of more intense crunching is done. During
multipoint
scanning, there are additional
types of retries which may be tried. Ultimately the
effect of all retries will either be a successful convergence,
which is what usually happens, or a convergence failure.
Prior to running multipoint
, you can give
boundary
commands to alter the way the
automatic boundary management is done. For example, the
command boundary wide start can be given to use the wide
natural boundaries whenever possible (unless an actual
convergence failure occurs anyway). For more
examples and discussion, see the documentation for the
boundary command.
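A sketch of where such a command fits, assuming the polygenic
null model has already been created as described earlier:

solar> boundary wide start
solar> multipoint -ov 3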
The Fortran maximization algorithm Search.f also has a
number of convergence settings which can be controlled through
the
option
command. Generally most users
should not be touching these settings. Improper adjustment of
the convergence settings could lead to invalid results.
There are four basic settings available, named after the
Fortran variables they control: MaxStep, Conv, NConv, and
Tol. It has proven useful (starting with version 2.0.4) to
use a larger value for Conv for discrete models, so there is
now a separate option for Conv(Discrete). There are also two
new options to deal with discrete models: BCliff and
MaxCliffs.
Since options are model dependent, you should specify
or change an option AFTER giving the trait
command, but before giving a maximization command such as
polygenic. If you load a model, option values are taken from
the model, with defaults used otherwise. If you give the
model new command, all options are set to their default
values.
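For example, the required ordering looks like this (the
MaxStep value shown is arbitrary, chosen only to illustrate
placement):

solar> model new
solar> trait q4
solar> option MaxStep 10
solar> polygenic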
MaxStep
(which is called MXSTEP in
Search.f) determines the number of step decrements which are
allowed in each iteration. At the beginning of each
iteration, Search.f tries to predict the optimal point by solving
the quadratic equation. However, often there is more
curvature in the likelihood function than expected, and the
predicted point goes over a cliff where likelihood gets worse
rather than better. When this happens, Search.f backs up, only
going a fraction of the distance in the direction of the
predicted optimal point. Once again, that can fail. The
number of times SOLAR is allowed to "back up" is limited by
MaxStep
. Sometimes it helps to increase
this value. You can see the number of steps that are taken in
each iteration in the maximization output files or in the
verbose maximization output.
Conv
and NConv
work
together to determine when maximization has completed. When
the loglikelihood has changed by less than
Conv
for NConv
iterations, maximization is deemed complete. It sometimes
helps to make Conv larger to allow maximization to complete when
the likelihood surface is extremely curved. This is similar
to increasing the tolerances for parameters, which is
addressed by the Tol
setting.
Maximization involves computing the loglikelihood of one set
of parameter values and then trying to find a set of parameter
values that has an even higher loglikelihood. Unfortunately,
sometimes a test point of parameter values will lead to
an incomputable loglikelihood. Such an incomputable value,
like division by zero, is called Not A Number or
NaN. The code directing discrete trait maximizations,
Optima.f, had a tendency to ignore these problems under some
circumstances. So an additional mechanism has been introduced
which handles this problem in the same way that Search.f
backs up from unexpectedly large (but not incomputable) drops
in loglikelihood value. The bad "point" is considered to be
"over the cliff", so we back up from it. Since the situation
has apparently gotten very bad, we may need to back up more
than usual, so there is a new MaxCliffs setting which takes
the place of the smaller MaxStep setting. There is also a
factor, BCliff, which determines how "much" to back up. The
default value, 0.1, means that each time we back up from a
cliff, we back up by a factor of 10. This is the same (not
adjustable) value that is used under "ordinary" circumstances
where the likelihood has fallen by a very large but still
computable amount.
Sometimes a multipoint run will terminate with a "Constraints
not Satisfied" error. This seems very strange at first,
because your model started with satisfied constraints, so why
should it not end with satisfied constraints? Unfortunately,
during maximization there can be numerical problems (such as
when very small numbers are added to very large numbers) such
that, at the end of a maximization step the constraints are
no longer satisfied. This is best understood as a kind of
convergence error resulting from numerical limits. There is
an EXPERIMENTAL option intended to correct this problem. You
can try the command

option EnforceConstraints 1

before running polygenic.
Unfortunately, in our experience, this simply leads to other
kinds of convergence errors later on. Residualization as
described above generally works better, if you can use it.
SOLAR has two mechanisms to deal with trait
distributions that are significantly different from
multivariate normality. These are also applicable
when the Residual Kurtosis (calculated by the
polygenic
command) is greater than 0.8.
These mechanisms are embodied in the commands
tdist
and lodadj
.
Using either of these commands will result in more accurate
LOD scores when the trait distribution is not ideal.
The tdist command
The tdist
command sets up an extra parameter,
t_param
, and enables it by setting the
option tdist
to 1. All you need to do is
give the tdist
command prior to running
polygenic
or another maximization command:
solar> model new
solar> trait apoe1
solar> covar age sex age*sex
solar> tdist
solar> polygenic
Although tdist is quick and easy to use, the extra parameter
will slow down maximization somewhat, which will slow down
genome scans. To turn it off, give the tdist -off command.
Empirical LOD Adjustment (lodadj)
Empirical LOD Adjustment, as realized by the lodadj command,
is one of the most important features of SOLAR. It corrects
for the inaccuracy caused by a non-normal trait distribution,
but unlike tdist it does not make maximization slower or less
robust.
To use LOD adjustment, you must first calculate the required
adjustment by simulation. This is done by the lodadj -calc
command, and it may require a considerable amount of time to
complete. Ultimately you end up with a factor less than 1.0
(such as 0.9) by which all your LOD scores are (automatically)
multiplied. After the LOD adjustment is computed, you simply
need to turn it on when desired with the lodadj command and
off with the lodadj -off command.
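A sketch of the overall flow, assuming the trait and polygenic
null model are already set up as described earlier (any
additional arguments to lodadj -calc are not shown here):

solar> lodadj -calc
solar> lodadj
solar> multipoint -ov 3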
There is a full discussion of lodadj
in
Section 10.2.
Advanced modeling topics including discrete traits, bivariate (two traits), dominance, household groups, and arbitrary parameterization are now discussed in Chapter 9.