This appendix is a concatenation of all the change notes from the SOLAR change-notes command. More recent changes appear earlier.
**** New in version 8.1.1
1. Speed improvements in all SOLAR operations thanks to improved
compiler optimization. Polygenic runs 1.73 times faster compared
with previous official version 7.6.4. The biggest improvement is
in C++ intensive mathmatrix operations based on Eigen; as a
result of this and other changes "fphi -p" is now 25 times faster
than in version 8.0.6.
2. The implementation of fphi p value estimation, working since 8.0.6,
is made cleaner and slightly faster. A new mathmatrix operation,
PermuteY, eliminates the need to create an intermediate shuffled
vector for each column of the permuted matrix (as had been done
in 8.0.6). Far fewer temporary matrices
are created. These changes had only a small effect on speed
compared with the improved compiler optimizations.
3. fphi p value estimation now uses better random numbers to shuffle the
Y matrix: Mersenne Twister mt19937, which has a period of 2^19937-1.
AFAIK this is the best quality fast random generator available, and
it is included in the recent C++11 standard so is widely available.
4. fphi p value calculation numerator was off by one. Now p values
from "fphi -p" are slightly smaller (better).
5. Other changes to fphi: Tcl code is cleaned up with much junk removed,
variables given meaningful names, obsolete -method argument
removed, -testonly is now called -indicator.
6. option ShuffleReseeding controls how the mt19937 random number
generator used for matrix shuffling is seeded. The default value
is 1 which means the generator is reseeded to the default value
5489u at the beginning of each matrix shuffle operation. This
gives consistent p value results every time, and the most
comparable results when the same sample is used across a range of
models. The options are:
1...seeded on every shuffle to 5489u for consistent results (DEFAULT)
0...seeded first shuffle to 5489u, then free running
-1...seeded every shuffle to time() for purely stochastic results
-2...seeded first shuffle to time(), then free running
Other values: seed to this value at beginning of each shuffle
7. Errors in test models in mga (either the regular test model or the
interaction test model) would cause mga to exit. Now such errors are
reported to the screen and a line with SNP name but blank results is
written to the output file.
8. option MatrixNumberFormat controls the formatting precision of numbers
output by matrix operations such as "show". The default of 15 eliminates
long unrounded numbers such as 0.4999999999999999.
9. This version is the first ever to have a public source code
release, along with the usual binary releases for Linux, Mac, and
Microsoft Windows. The binary releases are fully tested and
replace Official Version 7.6.4 on the general download links at
http://solar.txbiomedgenetics.org. Due to the now distributed
nature of SOLAR Eclipse development, this release is not called
Official but instead General as in intended-for-general-use. An
Official version with coordinated updates from all developers may
be available soon.
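The ShuffleReseeding settings described in item 6 above can be sketched as follows. This is only an illustration in Python, not SOLAR's C++ code: the class name ShuffleSeeder is hypothetical, and although Python's random.Random is also a Mersenne Twister, it is seeded differently from C++ std::mt19937, so the shuffled orders will not match those produced by SOLAR.

```python
import random
import time

DEFAULT_SEED = 5489  # the default seed value named in item 6


class ShuffleSeeder:
    """Hypothetical model of the ShuffleReseeding policies (not SOLAR code)."""

    def __init__(self, reseeding=1):
        self.reseeding = reseeding
        self.rng = random.Random()
        self.started = False

    def _seed_value(self):
        if self.reseeding in (1, 0):
            return DEFAULT_SEED
        if self.reseeding in (-1, -2):
            return int(time.time())
        return self.reseeding  # any other value is used directly as the seed

    def shuffle(self, values):
        """Return a shuffled copy of values, reseeding as the policy requires."""
        free_running = self.reseeding in (0, -2)
        if not self.started or not free_running:
            self.rng.seed(self._seed_value())
            self.started = True
        shuffled = list(values)
        self.rng.shuffle(shuffled)
        return shuffled
```

With the default policy (1), two consecutive calls to shuffle on the same input return the same order, which is what makes p values repeatable from run to run. With policy 0 only the first call matches, and later calls continue the random sequence.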
**** Skipped versions 8.0.7 to 8.1.0
These versions were experiments in how to increase speed of fphi p
value estimation. Sadly it is far easier to make it slower than
faster, and so relatively minor code changes were chosen for 8.1.1 that
help a bit but don't attempt to tackle the biggest bottleneck. The
biggest bottleneck is the big matrix multiply with dimensions (nS x
nS) times (nS x nP). For a typical 1000 person sample and 5000 nP,
this would require 15 billion floating point calculations. Eigen
already optimizes this by a factor of 5. Additional optimizations
specially applicable to a permuted matrix could also improve
performance by a factor of 5. But to be better than Eigen already
is, we would need to combine their optimization with the special
ones, which is very difficult. So until that can be done we are
just letting Eigen optimize the multiplication as it already does.
**** New in version 8.0.6
1. fphi -p has been corrected and appears to give reasonable p
values, BUT this is still Highly Experimental. (1) Speed is an
issue here, but 95% of the time is spent doing a single large
matrix multiplication in C++/Eigen. This will be addressed with
a virtual-shuffle-matrix-multiplication in the next version(s),
and also by adding a completely different implementation of fphi
to SOLAR. (2) We are also concerned about the p values possibly
not being accurate, and it is true they do not match another
implementation of fphi. We are continuing to investigate. The
h2r values in both implementations match and we believe they are
"correct" as far as an fphi approximation can do.
2. MathMatrix objects (such as used by fphi) can be written to files
using the new "output" command. Matrices are written without csv header.
output $m ;# write out matrix as csv file
This can be combined with "evdout" to write out the X, Y, and Z
matrices computed by SOLAR EVD:
model new
trait q4
covar age
evdout
set X [evdinx]
set Y [evdiny]
set Z [evdinz]
output $X Xmatrix.csv
output $Y Ymatrix.csv
output $Z Zmatrix.csv
3. evdinz command to obtain Z matrix from evddata.out.
4. shuffle command added which can shuffle a vector into a vector,
or shuffle a vector into n columns of a matrix (MathMatrix).
shuffle $v ;# shuffle the elements of vector v
shuffle $v n ;# shuffle the elements of vector v into n-1 cols
;# retaining first column unshuffled
5. identity creates an identity matrix with the specified number
of rows.
6. "mathmatrix debug" shows lots of info and especially prints before and
after each multiplication. "fphi -debug -p" turns on this feature to
highlight the time taken by multiplication of the shuffled matrix.
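The two forms of the shuffle command in item 4 above can be illustrated with a short Python sketch. The function name shuffle_into_columns and the seed argument are hypothetical, and it is an assumption here that every shuffled column is an independent shuffle of the original vector. The generator is Python's, so the orders will not match SOLAR's.

```python
import random


def shuffle_into_columns(v, n, seed=5489):
    """Return n columns as lists: column 0 is v unchanged, the other n-1 are shuffles of v."""
    rng = random.Random(seed)
    columns = [list(v)]          # first column is retained unshuffled
    for _ in range(n - 1):
        col = list(v)            # assumption: each column starts from the original order
        rng.shuffle(col)
        columns.append(col)
    return columns
```

Every column holds the same values as the original vector; only the order differs, which is what a permutation test of the Y values needs.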
**** New in version 8.0.5
1. fphi now computes p value for h2r (but not yet correct in 8.0.5).
2. concatenate command concatenates two matrices, either vertically
or horizontally.
3. power command still works as before to perform power calculations,
but has been extended to also do elementwise power (square, cube,
inverse, etc) operations on MathMatrix objects. This occurs if the
first argument to the power command is a MathMatrix, the second
argument will then be interpreted as the power to which each element
of the matrix should individually be raised or lowered, for example:
power $M 2
4. max command has been extended to perform an elementwise maxing operation
which returns a matrix with each element either the original value or
the second argument to the command, whichever is larger. For example the following command
would truncate all the negative values in a matrix to zero:
max $M 0
If there is no element in the matrix which needs to be changed, the
original matrix is returned. Otherwise a new matrix is created.
5. Documented traditional sample matrix format better for "help matrix"
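The elementwise behavior of power and max described in items 3 and 4 above can be sketched in Python. The function names and the list-of-rows matrix representation are illustrative assumptions, not SOLAR's implementation (which uses Eigen in C++).

```python
def elementwise_power(matrix, exponent):
    """Raise every element of a list-of-rows matrix to the given power."""
    return [[x ** exponent for x in row] for row in matrix]


def elementwise_max(matrix, floor):
    """Replace every element smaller than floor with floor.

    The original object is returned when nothing needs to change;
    otherwise a new matrix is built.
    """
    if all(x >= floor for row in matrix for x in row):
        return matrix
    return [[max(x, floor) for x in row] for row in matrix]
```

For example, elementwise_max(M, 0) corresponds to "max $M 0" and truncates negative values to zero, while elementwise_power(M, -1) takes the reciprocal of each element.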
**** New in version 8.0.4
1. fphi is now producing reasonable approximations of heritability.
Several implementation errors from 8.0.3 had to be corrected to make
this possible, and the procedure now uses "Method 1" consistently.
See "help fphi" for more information on FPHI methods 1 and 2.
2. "option ResetRandom 1" will reset the random number sequence used during
maximization to its initial starting value at the beginning of
maximization for the current model. This is for experimental use in
solving convergence inconsistencies.
3. fphi has a -method2 option which attempts to use "Method 2". This
does not appear to be working yet.
4. The first value returned by fphi is now correctly labeled an
"indicator variable." The actual test-statistic is still being
implemented. The primary significance of the indicator variable is
that heritability estimation is impossible if it is non-positive.
5. evdinx now loads the X matrix without a leading column of 1's as
required for Method 1. To get the column of 1's as required for
Method 2, give the command "evdinx -method2", or simply use the
command "fphi -method2" which takes care of this.
6. fphi now allows the direct specification of X, Y, and Z matrices.
If all three are specified, it is not necessary to specify trait and
covariates, and "evdout" is not invoked.
**** New in version 8.0.3
1. New procedure fphi calculates a test statistic and fast estimated h2r.
The model trait and covariates must have already been chosen. If the
-testonly option is specified, only the test-statistic is returned and
h2r is not estimated.
2. Matrix operations "ols" and "solve" now have X and Y arguments reversed
as is common practice (octave, matlab, R). So now the commands are
"ols y x" and "solve y x" corresponding to the standard description
"regress y on x" where Y is the dependent variable and X is the design
matrix.
3. The diagonal command now does one of two things: it returns a vector
when given a rectangular matrix, or (new) it returns a "diagonal
matrix" when given a vector. A "diagonal matrix" is a square matrix
with non-zero values only on the diagonal. A "rectangular matrix" has
more than 1 row and more than 1 column. A vector has either 1 row or
1 column. SOLAR does not keep track of which matrices are diagonal.
4. New dinverse command does a fast matrix inversion on a diagonal matrix.
The matrix must be a diagonal matrix and dinverse does not check this.
5. The plus and minus commands now permit one or two scalar arguments
so you can add or subtract: 1) matrix and matrix, 2) scalar and matrix,
and 3) scalar and scalar. The times command already allowed scalars like
this.
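The two modes of the diagonal command (item 3) and the shortcut taken by dinverse (item 4) can be sketched in Python. The list-of-rows representation is an assumption made for illustration. A vector is taken to be a matrix with 1 row or 1 column, as defined in item 3.

```python
def diagonal(m):
    """Vector in: return a diagonal matrix. Rectangular matrix in: return its diagonal as a row vector."""
    n_rows, n_cols = len(m), len(m[0])
    if n_rows == 1 or n_cols == 1:
        values = m[0] if n_rows == 1 else [row[0] for row in m]
        size = len(values)
        return [[values[i] if i == j else 0.0 for j in range(size)]
                for i in range(size)]
    return [[m[i][i] for i in range(min(n_rows, n_cols))]]


def dinverse(d):
    """Invert a diagonal matrix by taking reciprocals of the diagonal.

    Like the command described above, this does not check that d is diagonal.
    """
    size = len(d)
    return [[1.0 / d[i][i] if i == j else 0.0 for j in range(size)]
            for i in range(size)]
```

Because only the diagonal is visited, the inversion costs n divisions rather than a general matrix inversion, which is why the input must really be diagonal for the result to be meaningful.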
**** New in version 8.0.2
1. evdout is entirely redesigned to output just the EVD transformed
variables, unless the -evectors option is used. The transformed variables
are written to evddata.out.
2. New procedures evdinx (reads in X matrix), evdiny (reads in Y matrix), and
evdinev (reads in eigenvectors matrix) added to read EVD data written
by evdout into matrixes.
3. evdmat is obsoleted. Use evdout then evdinx, evdiny, and/or evdinev.
Memory-to-matrix commands evdmatx, evdmaty, and evdmatev were designed
but not finished in time for version 8.0.2.
**** New in version 8.0.1
1. Interactive and scriptable matrix algebra is now supported. Commands
include new, load, show, row, col, diagonal, rows, cols, times, plus,
minus, transpose, inverse, ols, solve, evalues, evectors, mean, min,
and max. See "help mathmatrix" for details. MathMatrix objects are
different from the sample relationship matrixes used during maximization.
They are implemented in C++ using Eigen.
2. evdout will output phi2 eigenvectors and eigenvalues as
used by SOLAR for EVD2 maximization. See 'help evdout' for details
on evdout, evdin, and evdmat.
3. evdin will read in the matrix file(s) created by evdout and
create MathMatrix object(s) representing them.
4. evdmat creates MathMatrix object(s) directly from the
current phi2 eigenvectors without writing or reading a file.
See 'help evdmat' for details.
5. zscoring using a command like "define zt = zscore_trait" generated
spurious messages about zscores being deleted. These messages had no
useful meaning and have been removed. The zscoring feature works fine.
**** New in version 8.0.0
1. Version 8.0.0 is compiled with 64 bit memory model (linux releases
only) to enable handling larger numbers of traits, parameters, and/or
individuals per pedigree. However, memory beyond 2 Gb can not usually
be allocated because contiguous memory is currently required. That may
be addressed in future updates. However 64 bit compilation is a
necessary first step, it sometimes helps, and also failed memory
allocation is now usually reported with an error message rather than
causing an unexplained crash as usually happened before.
Maximization is about 2% faster for all model types.
**** New in version 7.6.6
1. polygenic -residinor fixed, was mistakenly using previous null output.
**** New in version 7.6.5
1. polygenic -residinor computes a residual trait from the final
sporadic model and inormalizes it as the final trait to determine
heritability.
2. restore_phen restores the original phenotypes file after running
polygenic -residinor
**** New in version 7.6.4
1. residual now works correctly with long SNP names (snp_*) when the
snp variable only has two consecutive values.
2. mga now permits having snp covariates in the null model while defaulting
the list of snps to all those in the phenotypes files. Previously this
would cause either the snps to be double included or erroneously removed.
Previously users were expected to use -snps or -snplists options to
restrict the list of snps to be tested to those not in the null model in
cases like this, but that requirement was too easily overlooked.
**** New in version 7.6.3
1. p value produced by polygenic for C2 parameter no longer truncated
at 0.0000001 due to obsolete formatting code.
**** New in version 7.6.2
1. Additional results have been added to mga.out:
est_maf (estimated minor allele frequency: mean/2)
est_mac (estimated minor allele copies: mean*samplesize)
dosage_sd (SD of the SNP dosage variable)
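The three added quantities follow directly from the dosage values, as this Python sketch shows. The function name is hypothetical, and it is an assumption that the sample standard deviation (n-1 denominator) is the one reported. The first two formulas are the ones given in the list above.

```python
import statistics


def dosage_summary(dosages):
    """Summaries of a SNP dosage variable (0 to 2 minor allele copies per person)."""
    n = len(dosages)
    mean = sum(dosages) / n
    return {
        "est_maf": mean / 2,                     # mean/2
        "est_mac": mean * n,                     # mean*samplesize
        "dosage_sd": statistics.stdev(dosages),  # assumption: sample SD
    }
```

For dosages 0, 1, 2, 0, 1 the mean is 0.8, so est_maf is 0.4 and est_mac is 4, the total number of minor allele copies in the sample.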
**** New in version 7.6.1
1. The formula used to compute standard errors for mga has been
corrected. The correct formula is sqrt($beta*$beta/$chi).
The variable "chi" is actually "chi-square" and does not
need to be squared.
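A small worked example in Python shows the difference between the corrected formula and the earlier one given in the 7.4.2 note, sqrt(beta^2/chi^2). The function names are illustrative only.

```python
import math


def mga_se(beta, chi):
    """Corrected formula: chi is already a chi-square, so it is not squared again."""
    return math.sqrt(beta * beta / chi)


def mga_se_old(beta, chi):
    """Earlier formula from the 7.4.2 note, which squared chi a second time."""
    return math.sqrt(beta * beta / (chi * chi))
```

With beta = 0.5 and chi = 4, the corrected formula gives 0.25 while the earlier one gives 0.125. The corrected value is |beta|/sqrt(chi), consistent with the usual relation chi-square = (beta/SE)^2 for a one degree of freedom test.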
**** New in version 7.6.0
1. vcfselect extracts genotype per sample data for a single SNP from
vcf files. vcfinfo extracts genotype information, with or
without the per sample data. These are very experimental and
feedback is requested.
2. 32000 mztwins and 32000 individuals now supported (was 20000 mztwins).
3. 400 simultaneous traits now permitted (was 20). Standard polygenic
models with >50 traits may cause memory exhaustion during maximization
due to the large number of rho*_ij parameters (ntraits*ntraits). This
memory exhaustion problem will be addressed in future versions. With
standard parameterization, memory required during maximization is a
cubic function of the number of traits multiplied by the square of the
size of largest family.
4. Memory leak related to use of the zscore_ operator in defined
expressions now fixed.
5. Now linked with 2014 update to imaging libraries.
6. Improperly formatted mibd file names now cause error message rather
than crashing multipoint with no explanation, or skipping name.
7. fakedata generates large pedigree/phenotypes files with small
families and random data.
**** New in version 7.5.9
1. Up to 20000 mztwins are now supported.
2. "load pedigree" will use 5 columns for the mztwin id in the pedindex
if more than 999 mztwin groups are found in the pedigree.
Otherwise, it will use only 3 columns, to ensure compatibility with
previous checksums stored in matrixes.
**** New in version 7.5.8
1. Up to 15000 mztwins are now supported (7500 pairs of twins).
2. CSV matrixes can now be loaded even if they have ID's not in the
current pedigree. Warnings are displayed and written to a file
named matrix.load.err, but in the end the matrix is loaded and
may be useable.
**** New in version 7.5.7
1. Commands "house" and "polygenic" now preserve an existing loaded
house matrix filename and options. So they save the new "-sample"
and "-allow" options.
**** New in version 7.5.6
1. Matrix options -allow and -sample are saved in model files, as
needed for correct operation with many commands.
**** New in version 7.5.5
1. Matrixes are now checked for completeness during maximization. All
individuals in the sample must have at least a diagonal matrix entry.
If not, an error occurs and the missing individuals are printed. There
are two options to modify this. "load matrix -allow" permits missing
individuals and defaults their diagonal to 1.0. "load matrix -sample"
removes the individuals from the sample, and a count of individuals
removed for not being in matrix is written to the maximization output
file.
2. Documentation for CSV matrix files is added to "help matrix".
3. mga -ixsnp now computes SE's for the bIX by default. Previously
they could only be estimated if the now obsolescent -evdse or -slowse
options were used.
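The completeness check in item 1 above, and the effect of the -allow and -sample options, can be sketched in Python. The function name, the mode strings, and the dictionary of diagonal entries are assumptions made for illustration. SOLAR's own check happens inside maximization.

```python
def check_matrix_coverage(sample_ids, diagonal, mode="strict"):
    """Check that every sampled individual has a diagonal matrix entry.

    mode "strict": raise an error listing the missing individuals
    mode "allow":  default missing diagonals to 1.0 (like "load matrix -allow")
    mode "sample": drop missing individuals (like "load matrix -sample")
    Returns (kept_ids, diagonal, n_removed).
    """
    missing = [i for i in sample_ids if i not in diagonal]
    if not missing:
        return list(sample_ids), dict(diagonal), 0
    if mode == "allow":
        filled = dict(diagonal)
        for i in missing:
            filled[i] = 1.0
        return list(sample_ids), filled, 0
    if mode == "sample":
        kept = [i for i in sample_ids if i in diagonal]
        return kept, dict(diagonal), len(missing)
    raise ValueError("individuals missing from matrix: "
                     + ", ".join(str(i) for i in missing))
```

The count returned in "sample" mode corresponds to the count of removed individuals that is written to the maximization output file.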
**** New in version 7.5.4
1. stats now has -sample option, which calculates stats for the current
model sample.
**** New in version 7.5.3
1. SOLAR is now called SOLAR Eclipse in the startup message, in honor of
its use in imaging research where the Eclipse name is used.
2. A private option is added to the key command to help determine
the linux system compatibility on different linux distributions.
This is intended to make the SOLAR installer more reliably select
the best binary version on a particular system.
**** New in version 7.5.2
1. "matrix debug" now reads matrixes in memory, rather than relying on
statistics accumulated as the matrix file was being read. This way it
can find values that are defaulted to 0 or -1. Matrixes are traversed
from 1,1 to max,max.
**** New in version 7.5.1
1. "matrix debug" now shows the minimum and maximum values for both
on and off the diagonal. For each such minimum or maximum, it shows
the first pair of IBDID's found having that value.
**** New in version 7.5.0
1. Matrices can now be read in CSV format, using user ID (not pedindex)
as the index. If FAMID is required to disambiguate ID's, famid
should be included for each individual (famid1, famid2) since not
all matrices are limited to family interactions. The required fields
are id1,id2,matrix1. The optional fields are matrix2,famid1,famid2.
All other fields in a csv matrix file are ignored. The mapping from
ID to pedindex is obtained from the currently loaded pedigree
pedindex.out.
2. CSV matrices must be gzipped just like original format matrixes and
have filenames ending in ".gz". The matrix commands are identical, the
actual type of matrix file is autodetected.
3. Matrix reading code has been largely rewritten to attain much greater
speed than before, despite now also having to detect and process two
different kinds of matrix files and also having to translate ID's to
pedindex, which might have made it far slower. Notably matrix files
are now usually read in one pass whereas it used to require two passes.
Also, the association from ID's to pedindex is now done using an
advanced C++ object known as unordered_map, which has only officially
become part of C++ in recent years. This is a hashtable equivalent in
function to the associative array in Tcl, whose speed of access does
not decline exponentially with pedigree size, nor does it require
NxN memory increase. This may be applied to other machinery inside
SOLAR in the future to obtain further speed increases and better
functionality.
4. Though matrix handling is only a small part of MGA, matrix reading has
gotten sufficiently faster that MGA overall runs about 3% faster. (This
test was done using original format matrices. The speed increases should
apply to both types, but original format will generally be faster.)
5. A CSV matrix may have a checksum field comparable to the one used for
original format. The checksum field is optional (not required). The
matcrc command will prepend a checksum to a CSV matrix just
as it does for an original format matrix. The checksum must be in the
first data record of the file, and it has id1 named "checksum" and id2
named "checksum". The actual checksum value is in the matrix1 field,
preceded by decimal point. The checksum comes from running cksum on
the pedindex.out file, so that any changes to the pedindex following
creation of the matrix will give an error. This is useful as often
people forget to update their matrices after updating a pedigree.
6. "matrix debug" now shows the sum of all matrix values, as taken from the
matrix file.
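Reading a CSV matrix as described in items 1, 2, 3, and 5 above can be sketched in Python, with a dict standing in for the unordered_map that translates IDs to pedindex numbers. The function name is hypothetical. This sketch uses only the required fields id1, id2, and matrix1, skips the optional checksum record without verifying it, and assumes every ID is present in the mapping.

```python
import csv
import gzip


def read_csv_matrix(path, id_to_index):
    """Read a gzipped CSV matrix into {(index1, index2): value} in one pass.

    id_to_index maps user IDs to pedindex numbers; dict lookups are hash
    table lookups, so their cost does not grow with pedigree size.
    """
    entries = {}
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle):
            if row["id1"] == "checksum":
                continue  # optional checksum record, not verified in this sketch
            i = id_to_index[row["id1"]]
            j = id_to_index[row["id2"]]
            entries[(i, j)] = float(row["matrix1"])
    return entries
```

Fields other than the three used here (matrix2, famid1, famid2, and any others) are simply ignored by csv.DictReader lookups, matching the rule that unrecognized fields do not matter.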
**** Versions 7.4.6 through 7.4.9 reserved
**** New in version 7.4.5
1. matrix debug now shows the actual minimum matrix value for both
one data column and two data column matrix files. Previously, for
one column matrix files, it showed the minimum value that was greater
than zero.
**** New in version 7.4.4
1. polygenic now has option -testcovar, to test a single covariate.
All other covariates are fixed and untested. The tested covariate
is not removed from the final model in any case. The default
probability level for declared significance is changed to 0.05.
The proportion of variance is reported only for the tested covariate.
**** New in version 7.4.3
1. mga now starts covariate beta parameters at 0.001 instead of 0.0.
This has not changed results in any regression tests, but might help
increase sensitivity for troublesome models.
2. solar -niskey is now available for use on clusters or clouds where
a fixed username and home directory are not available. -niskey uses
the nisdomainname shell command to obtain the nis identity, for which
a key should be requested. If the name is dotted, only the second
to last dotted portion is used for identification. For example on
solar.txbiomedgenetics.org only txbiomedgenetics is used as identity.
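The rule in item 2 for reducing a dotted NIS domain name to an identity can be written in a few lines of Python. The function name is hypothetical, and returning an undotted name unchanged is an assumption.

```python
def nis_identity(domain_name):
    """Return the second-to-last dotted portion of a NIS domain name."""
    parts = domain_name.split(".")
    if len(parts) > 1:
        return parts[-2]
    return domain_name  # assumption: an undotted name is used as is
```

For the example in the note, nis_identity("solar.txbiomedgenetics.org") returns "txbiomedgenetics".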
**** New in version 7.4.2
1. mga now by default outputs standard errors for the snp beta
that are calculated from the chi and the beta value
(sqrt(beta^2/chi^2)). You can still use the -evdse and
-slowse options for estimated standard errors as before.
If the standard error cannot be computed because chi is exactly zero,
which should never happen, the SE will be reported as 10e20.
2. polyclass -maxsnp will now output computed standard errors as is
the default for mga. (Note: as a result, -maxsnp will use EVD1
fast maximization without standard error estimation. Previously
-maxsnp had invoked -slowse.)
**** New in version 7.4.1
1. A new session option ExpNotation has been added, to force output in
exponential notation (such as -1.004e-4) in specific cases where needed.
Normally, many commands such as mga output in fixed point notation as
long as there are a few digits shown, and then they switch to exponential
for tiny values which would otherwise be reported as zero. So if a value
is shown as 0.00000 by mga it must actually be zero. But sometimes
people wonder if that is actually correct. So the ExpNotation
option has been added to force output of all numbers in exponential
notation, so there can be no doubt. Currently this option only affects
the output of mga, but it may be applied to other commands in the future.
It defaults to 0 (zero) meaning to use the default auto mode, and is set
to 1 to force exponential:
option ExpNotation 1
This option remains in effect during a single session of SOLAR. It is
not saved to models, and will return to default for subsequent or
concurrent sessions of SOLAR.
**** New in version 7.4.0
1. When parameters are constrained to zero, it is necessary (per the design
of the Fisher/Mendel Search program) to set the lower boundary slightly
below zero. Previously that adjustment had been -0.01. This has been
changed to -0.001. A new global variable SOLAR_constraint_tol is
set to the absolute value (0.001) of the adjustment when SOLAR starts.
A user can change that global variable before running polygenic to change
the automatic boundary adjustment, which cannot be made smaller than
0.00011 without changing Fisher code. Generally the automatic boundary
adjustment for constraints can now be bypassed by setting the boundary to
a non-zero value before calling polygenic or maximize.
**** New in version 7.3.9
1. Sometimes e2 would get lower boundary set incorrectly to -0.01. This
has been fixed. It would especially happen when household effect was
being added to a polygenic model in which e2 was previously estimated
to be zero.
**** New in version 7.3.8
1. Some maximization errors that could occur during twopoint could cause
the scan to stop. Errors are now caught, reported, and the twopoint
scan continues.
**** New in version 7.3.7
1. polyclass -maxsnp now writes results to file named polyclass.snpname.out
in addition to the usual polyclass.out, where snpname is the name of
the snp with the leading snp_ removed. If the -append option is used,
previous contents of polyclass.snpname.out are retained, so one can accumulate
results from a large number of SNP tests when polyclass -maxsnp -append is
used. Files are written to the output directory (usually named after the
trait).
**** New in version 7.3.6
1. polyclass -maxsnp (without -comb) now does not include the SNP
covariate in the initially estimated model or residuals. SNP covariates
are added later when running mga on the residuals.
2. polyclass -maxsnp now checks to ensure the snp name is prefixed with
snp_ or SNP_ since that is required by mga.
**** New in version 7.3.5
1. polyclass -maxsnp (without -comb) has been fundamentally changed. Now
it produces residuals for a fully loaded model, then uses mga to
evaluate each snp for each class with other classes blanked.
**** New in version 7.3.4
1. polyclass -comb -maxsnp was broken by the changes in 7.3.3 but
is now fixed.
**** New in version 7.3.3
1. polyclass -maxsnp now produces residuals in a file named
polyclass.residuals.out in the output directory. The file
includes fields ID, residual, trait, and all covariate variables.
2. twopoint could fail if some particular model failed in a particular way
when testing the log likelihood. That is fixed now.
3. intraitclass command added.
**** New in version 7.3.2
1. Under unusual circumstances, SOLAR was exhausting available
logical file units because of leaving many copies of
polygenic.logs.out open. This is now fixed.
The problem only occurred with thousands of back-to-back
runs of "polygenic" with very small sample size. In the end,
when no additional logical file units were available, SOLAR would
crash. The exact cause of the problem was unclear, and was not
fixed by creating an exception handling wrapper around polygenic
to ensure that close is invoked on the polygenic.logs.out logical
unit, but instead has been resolved by creating a new set of file
writing procedures (see below) to be used for polygenic.logs.out.
2. A new set of file writing procedures (putsnew, putsa, and putsat)
is available which obviates the need for the user to open and close
an output file, or to write to both terminal and output file at the
same time. These procedures are intended to replace the old, tricky,
and inefficient "putsout" procedure that many users found too
convenient not to use. They are not a replacement for the built-in
Tcl procedures open, puts, and close, which should still be used in
cases where efficiency is paramount and a tight writing loop is
possible without intervening maximizations.
**** New in version 7.3.1
1. polyclass -maxsnp has been changed so that when not in -comb
mode, only one snp p value is calculated and only snp
related statistics are reported. All classwise snp betas
are constrained to be the same value, and the p value is
calculated by constraining all of them to zero, a one degree
test.
2. polyclass now reports the class number that fails when attempting
to initialize parameters. Users may get an error that says there
are no individuals available for analysis which may seem unbelievable
unless the class number is also reported. It is easy to overlook
the possibility that some variable is not defined for a particular
class.
**** New in version 7.3.0
1. polyclass -intrait is corrected. Previously it summed the first
specified class twice, resulting in SD of 2. The defect had
been present since version 7.1.4.
**** New in version 7.2.9
1. polyclass -maxsnp now works with -comb.
2. the formatting of chi and varexp for polyclass -maxsnp has been
corrected to the formatting used for other outputs.
**** New in version 7.2.8
1. polyclass now has option -maxsnp which will do a
classwise association analysis of one snp. Results reported
include the snp beta value, chi square, p value, and variance
explained. The residual heritabilities are shown with and without
the snp.
**** New in version 7.2.7
1. This version uses the new C++0x (2011) C++ standard libraries, so
that unordered_map (hash table) can be used in a future version.
**** New in version 7.2.6
1. polyclass now works with discrete traits, and for bivariate models
with one or two discrete traits.
**** New in version 7.2.5
1. New option RicVolOffset allows adjustment to the volume numbers in
the phenotype file, and it defaults to 1 instead of the previous
fixed zero adjustment. This means that if the volume "1" is specified
in the phen file, 1 is subtracted to get the actual array index of "0" in
the RicVolumeSet image data file. This prevents segmentation violation
in test file split_csv_0029.csv which has 859 volumes and the last one
listed is 859. With the new default the range of array indexes used is
0-858 which is typical C programming. In addition, it produces a
non-zero heritability for the trait FA_0029.
2. RicVolumeSet volume specifications in the phenotypes file are now tested
(after being adjusted by the new RicVolOffset) to make sure they do not
exceed the range actually found in the image file. This replaces the
previous segmentation violation with an error message.
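The offset adjustment in item 1 and the range check in item 2 amount to the following Python sketch. The function name is hypothetical, and the offset argument stands in for the RicVolOffset option.

```python
def volume_index(volume, n_volumes, offset=1):
    """Convert a phenotype-file volume number to a zero-based array index.

    Raises an error instead of indexing outside the image file.
    """
    index = volume - offset
    if not 0 <= index < n_volumes:
        raise ValueError(
            f"volume {volume} is outside the image file ({n_volumes} volumes)")
    return index
```

With 859 volumes and the default offset of 1, volume 859 maps to index 858, the last valid index. With the previous fixed offset of zero it would map to index 859, which is out of range.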
**** New in version 7.2.4
1. polyvoxel command does routine polygenic analysis on voxel
data.
**** New in version 7.2.3
1. polyclass -resmax -comb now works.
**** New in version 7.2.2
1. polyclass -resmax option maximizes the polyclass model using residuals
which it computes first. If -intrait is also specified, it is the
residuals which are inormalized (inormalization is not done on the
initial model from which the residuals are derived).
2. residual now works on polyclass models if the -class option is specified.
3. polyclass -maxi option is renamed to -max for consistency with -resmax.
The original -maxi will continue to work. The maximization output file
is renamed to polyclassmax.out from polyclass.all.out and the model
saved is now called polyclassmax.mod.
**** New in version 7.2.1
1. polyclass -g -intrait is now working correctly for univariate and
bivariate. Although declared working for univariate in 7.1.4, it was
not including all classes in the inormalized traits.
**** New in version 7.2.0
1. Boundaries used by polyclass -maxi have been corrected to fix
convergence issues. When parameter h2r is constrained to zero,
parameter e2 has to have an upper bound of 1.01 to permit
numerical imprecision.
**** New in version 7.1.9
1. polyclass -maxi now writes results to file named polyclass.out in the
output directory defined by trait name or outdir command.
**** New in version 7.1.8
1. mga -ixsnp has been fixed, the null model always has the bixsnp,
the test model has the two snps and their interaction, and the
interaction test model just has the two snps (and no interaction).
Previously it was all wrong.
**** New in version 7.1.7
1. polyclass -maxi now computes p value for each h2r by class.
**** New in version 7.1.6
1. mga -ixsnp option added for interaction analysis. An interaction
covariate is added to the test model, and a 3rd model is run in which
the SNP is included but not the interaction. Additional chi, p value,
ix beta, ix beta se, and variance are added to output file.
**** New in version 7.1.5
1. polyclass with the -intrait and -g options at the same time is now
fixed for univariate models but not yet bivariate models.
2. imout polygenic volume assignments have changed, allowing 8 reserved
volumes for linkage or association use. Covariate information now
starts at volume 20. See 'help polyimout' for the assignments and
more information.
3. imout now uses the dimensions of the loaded mask if available. The
mask must be loaded first with the mask command. If a mask is loaded,
the only arguments the user need specify are the output filename and
the number of volumes:
imout image2 -nvol 40
If the dimensions are defaulted to the mask, all required arguments
must be given in one imout command like the one above (since
that creates the imout object). Within that same command line, it is
possible to override the mask dimensions individually. It is also
possible to ignore the current mask file with the -ignoremask argument,
in which case all the dimensions must be given but multiple lines could
be used as when there is no loaded mask.
4. imout -ncovar allows the number of volumes to be specified as what
would be required for that number of covariates. For example, the
above imout command could also be given like this:
imout image2 -ncovar 5
which would allow for 5 covariates (which requires 40 volumes). There
is also a -ntrait argument but it defaults to 1 and any number other than
1 is currently an error, as the polygenic multivariate volume
assignments have not yet been made.
5. Attempting to use voxel coordinates outside of the range of the imout
now returns an error message rather than crashing SOLAR.
**** New in version 7.1.4
1. polyclass -intrait is now working correctly with univariate and
multivariate traits. Previously it derived all inverse normals from
Class 1 only and therefore generated hugely incorrect models. One
remaining problem is that the combination of -intrait and -g doesn't
yet work; it causes an error. This will be fixed asap.
2. The equation parser used for omega, mu, and defines has been enhanced
to permit unavailable variables if they are multiplied by zero (the zero
must come first) similar to the way this works in C++. For example
the following definition permits the class based inormal functions
to be used in a definition so that it has a value for both classes:
define i_q4 = (class==1)*inormalc_1_q4 + (class==2)*inormalc_2_q4
Depending on whether class is 1 or 2, only inormalc_1_q4 or
inormalc_2_q4 need be defined (and only one is defined for any
individual, depending on their class). If the class is 1, the
term (class==2) becomes zero, so the following multiplicand
inormalc_2_q4 is ignored.
3. polyclass -comb and -maxi options added. -comb creates a combined
class model using the class-specific traits (if -intrait is used).
-maxi performs ordinary maximization after the polyclass model
is created.
4. polyclass now gives error message when classes are not specified.
Previously, if no classes were specified, polyclass would crash
SOLAR.
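The zero-multiplication rule described in item 2 above can be sketched as
follows. This is an illustrative Python sketch, not SOLAR's equation parser;
the function name eval_terms and the dict-based variable lookup are
assumptions made for the example.

```python
def eval_terms(terms, variables):
    """Sum (multiplier * variable) terms, skipping the lookup when the multiplier is zero.

    terms: list of (multiplier, variable_name) pairs; the multiplier comes first.
    variables: dict holding the variables available for one individual.
    """
    total = 0.0
    for multiplier, name in terms:
        if multiplier == 0:
            # The zero comes first, so the (possibly unavailable) variable is ignored.
            continue
        if name not in variables:
            raise KeyError(f"variable {name} is not available")
        total += multiplier * variables[name]
    return total


# An individual in class 1: only inormalc_1_q4 is defined for this person.
person = {"class": 1, "inormalc_1_q4": 0.37}
i_q4 = eval_terms(
    [
        (int(person["class"] == 1), "inormalc_1_q4"),
        (int(person["class"] == 2), "inormalc_2_q4"),
    ],
    person,
)
```

In this sketch an unavailable variable is only an error when its multiplier
is non-zero, which mirrors the behavior described for the define above.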
**** New in version 7.1.3
1. Linked with new version of RicVolumed.a libraries.
**** New in version 7.1.2
1. EVD2 models now initialize beta variable boundaries in the same way as
is done for standard models. Previously, boundaries were set by the
default mechanism as applied to the Stage 2 model, with EVD
transformed variables, which generally resulted in larger boundaries
than desirable. Boundary setting for EVD2 models is now done in Tcl to
avoid having to go through the internal data packing required for
"maximize -initpar" which would be simpler to program but less
efficient. This is the first time the boundary algorithm has been
coded in Tcl.
2. polyclass and sporclass now have arguments -intrait and -incovar to
inormalize all trait and class variables respectively. -incovar is
not currently working due to mu command limitations that will be
corrected in the next version.
3. Previously, polyclass simply omitted all covariates if the -g option
was used. This has been fixed.
4. inormal now has a -class option, which restricts the sample and the
means to individuals with the specified class number.
**** New in version 7.1.1
1. Old style text file line termination is now detected when phenotypes
files are opened, and an error message is given:
File old.txt has unsupported text line terminators
Use retext command to fix file before using
It is not possible to handle this issue automatically as is done
with Windows style text files. But now a translation program is
provided...which is really just a call to the system program tr.
Previously using files like these would produce this useless message:
Short record in input file
2. New command "retext" translates files in old Mac style which is still
used by some Mac programs to modern Mac and Unix/Linux style.
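As a rough illustration of what such a translation involves, here is a
minimal Python sketch. It is not the retext implementation (which, as noted
above, calls the system program tr); the function names are hypothetical and
the sketch assumes the old Mac style means carriage-return-only line endings.

```python
def needs_retext(data: bytes) -> bool:
    """Return True if the data uses CR-only line terminators (old Mac style)."""
    return b"\r" in data and b"\n" not in data


def retext_bytes(data: bytes) -> bytes:
    """Translate every carriage return to a line feed (Unix/Linux style)."""
    return data.replace(b"\r", b"\n")


old_style = b"id,q4\r1,5.0\r2,6.1\r"
if needs_retext(old_style):
    new_style = retext_bytes(old_style)
```

The detection step only flags files that contain no line feeds at all, so
Windows style files (CR followed by LF) are left alone in this sketch.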
**** New in version 7.1.0
1. imout command added to select and enable binary image output. See
"help imout" for command description and options.
2. The fixupper and fixlower parameter options introduced in version
7.0.8 are extended to multivariate polygenic models, and also
to a few other obscure cases that were not working as intended.
3. imout now enables image output for the polygenic command. See
"help polyimout" for how this would be used and what volumes are used
for each type of polygenic analysis results. Currently only univariate
polygenic models are supported.
4. polymod no longer forces all models to start with h2r set to 0.1.
Instead, if a model already has valid e2 and h2r, they are left alone.
This does not change the operation of the polygenic command because it
always starts with a sporadic model anyway. However, a future update
to polygenic will save the initial variance component parameter values,
if any, in order to use them to start the polygenic model.
**** New in version 7.0.9
1. Some bivariate multipoint runs would fail with an error:
"No parameter H2r has been created." This was because one
section of the retry code had not been generalized to
multivariate parameter names, such as H2r(q4).
**** New in version 7.0.8
1. Parameters now have optional fixed upper and/or lower boundaries,
specified with the "fixupper" or "fixlower" identifier. For
example: parameter h2r fixlower 0.1 fixupper 0.5
If a fixed boundary is set, it will not be changed by automatic
boundary adjustment procedures. However, it can be reset to
another fixed or non-fixed bound by the user. Regardless of whether
a fixed or adjustable boundary is in effect, the regular boundary query
returns the boundary value. However, the "fixlower" or "fixupper"
queries return null ("") if the current boundary is not fixed. Fixed
boundaries are saved to and read from models, so long as the solar
version is 7.0.8 or later.
[this feature is not yet fully supported for multivariate models when
running polygenic or polymod]
2. mga now has -fixupper and -fixlower arguments which control the
boundaries of the snp beta parameters.
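The fixed boundary semantics described in item 1 above can be modeled with a
small Python sketch. This is a toy model for illustration only, not SOLAR
code; the class and method names are hypothetical.

```python
class Parameter:
    """Toy model of a parameter with optionally fixed boundaries."""

    def __init__(self, name, lower, upper):
        self.name = name
        self.lower, self.upper = lower, upper
        self.lower_fixed = self.upper_fixed = False

    def set_lower(self, value, fixed=False):
        # The user may always reset a bound to a fixed or non-fixed value.
        self.lower, self.lower_fixed = value, fixed

    def set_upper(self, value, fixed=False):
        self.upper, self.upper_fixed = value, fixed

    def auto_adjust(self, lower, upper):
        # Automatic boundary adjustment leaves fixed bounds untouched.
        if not self.lower_fixed:
            self.lower = lower
        if not self.upper_fixed:
            self.upper = upper

    def query(self, what):
        # Regular queries return the value; fix* queries return "" if not fixed.
        if what == "lower":
            return self.lower
        if what == "upper":
            return self.upper
        if what == "fixlower":
            return self.lower if self.lower_fixed else ""
        if what == "fixupper":
            return self.upper if self.upper_fixed else ""
        raise ValueError(f"unknown query {what}")


h2r = Parameter("h2r", 0.0, 1.0)
h2r.set_lower(0.1, fixed=True)   # like: parameter h2r fixlower 0.1
h2r.auto_adjust(0.0, 0.9)        # only the non-fixed upper bound moves
```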
**** New in version 7.0.7
1. An error in accessing the RicVolumeSet object while reading
binary data has been fixed.
**** New in version 7.0.6
1. The mask command is created to read binary image files as masks and
use them to set the current voxel value, which is then used in the
reading of binary image phenotypes.
2. The voxel command can be used to set or read the current voxel value.
3. The format for specifying binary image files within CSV files has
changed. Now all fields which point to binary files must identify
themselves as type "nifti" in the header line following a colon
delimiter from the field name. For example, a valid CSV header
could look like this:
ID,age,count:nifti
As with the field names, the field type is not case sensitive. Other
types are currently ignored. Once a header like the above has been
read, all the non-blank fields identified as type NIFTI must have fields
which are the filenames of RicVolumeSet files in NIFTI format. Following
the filename there must be a colon separator followed by the number of
the Volume Set (volset) corresponding to the individual in that record,
for example:
A001,19,images.gz:1
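The header and record format described in item 3 above could be parsed along
these lines. This is a minimal Python sketch under the stated format, not
SOLAR's reader; the function names are hypothetical and no image file is
actually opened.

```python
def parse_header(line):
    """Split a CSV header into (field_name, field_type) pairs.

    The type follows a colon and is not case sensitive; it is "" if absent.
    """
    fields = []
    for column in line.strip().split(","):
        name, _, field_type = column.partition(":")
        fields.append((name, field_type.lower()))
    return fields


def parse_record(line, header):
    """Parse one record; non-blank nifti fields become (filename, volset)."""
    record = {}
    for (name, field_type), value in zip(header, line.strip().split(",")):
        if field_type == "nifti" and value != "":
            filename, separator, volset = value.rpartition(":")
            if not separator:
                raise ValueError(f"missing volset number in field {name}")
            record[name] = (filename, int(volset))
        else:
            record[name] = value
    return record


header = parse_header("ID,age,count:nifti")
record = parse_record("A001,19,images.gz:1", header)
```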
**** Version 7.0.5 was preliminary to 7.0.6.
**** New in version 7.0.4
1. RicVolumeSet is now accepted for phenotypic data. To use this feature,
the regular text phenotypes file has alphanumeric field(s) which specify
the filename of the RicVolumeSet and the 4 arguments required to
identify the specific voxel. The leading delimiter to specify this
kind of data is "<" preceding the filename and ":" for delimiting each
numeric index field required. For example, to specify file named
input.gz, volume 1 and 2 for ID's 1 and 2, and x,y,z coordinates 2,3,4,
you would have a phenotypes file like this:
id,qtrait,voxel
1,5.0,>40)
covariate sample()
This would have the effect of blanking records for which the age
value is greater than 40. Since the "sample()" covariate is
declared as having null trait, it is not actually included in
maximization, but helps to delimit the allowed sample. There is
nothing special about the definition name "sample", any other name
could be used.
The blank constant must be multiplied only by 0, 1, or a conditional
expression as shown or multiple conditional expressions. If blank is
operated on numerically, it will just become a small number with no
special meaning. The actual blank value is -1e-20 which has always had
this special meaning in Fisher, Mendel, and other classic programs.
4. read_output now has a -d option which reads whether a variable was
determined to be discrete by maximize or maximize -initpar.
**** New in version 6.6.0
1. The Pearson Residual has been corrected by subtracting from the 0,1
scaled discrete trait value, so it is actually a residual now rather
than the predicted value. You get Pearson Residuals automatically
when you run residual on a discrete trait model.
2. New command "maximize -initpar" does not maximize but reads the
phenotype variables, determines the sample, and initializes
parameter starts and bounds exactly as a normal maximize does.
3. If an attempt is made to maximize a discrete trait model for which
the omega has not been defined, the user now gets a more helpful message,
as with quantitative trait models, stating that the polygenic or polymod
command will do the required setup. Previously, the error message complained
about the lack of SD constraint, which was not helpful because typically
without calling polygenic or polymod there is no SD parameter to be
constrained.
4. If an attempt is made to maximize a quantitative trait model for which
the omega has not been defined, the message used to suggest using
the polygenic command but now also suggests polymod, the command which
only does setup rather than a full test.
**** New in version 6.5.9
1. Within EVD2 phase 2 models (using evd2 transformed variables) option
MergeAllPeds, if present in the original model, is not included
because it is not applicable to the phase 2 models which are all
unrelateds.
2. Within EVD2 phase 2 models, all the transformed covariates are scaled
to zero, as required by the mathematics.
3. Within EVD2 phase 2 models, the phony phi2.gz matrix containing all
unrelateds is no longer loaded as it is not needed. This makes the
models more efficient (matrix loading skipped) but does not change
the results.
**** New in version 6.5.8
1. mga (mgassoc) no longer requires the -noevd option to
evaluate an evd2 model. Simply give the command
"option modeltype evd2" before running mga. In this
case, standard errors may be specified with either -evdse
or -slowse arguments, either way it is the evd2 standard
errors for an evd2 model.
**** New in version 6.5.7
1. mga (mgassoc) now handles multivariate models. There is
a beta value, a beta se, and a varexp value for each
trait in the output file. The chi square is evaluated
as the number of parameters, and there is one beta
parameter for each trait, so the p value is computed
accordingly. To use evd2 give the "option modeltype evd2"
prior to running mga, and specify the -noevd or -slowse
option to suppress the evd1 default.
2. mga (mgassoc) output data columns are reordered so that
NAv (number of available individuals in sample) now follows the
SNP. Otherwise the columns are unchanged, except that the last 3
(beta, betase, and Varexp) are repeated for each trait if there
are multiple traits. In the comma delimited output file, the
trait names are suffixed, for example, Varexp(q4). In the fortran
style tab delimited output file, a dot suffix indicates the trait index
if there are multiple traits, for example, Varexp.2 for trait #2
(because otherwise the columns could be too wide to be readable).
3. A small problem with snphap fixed.
**** New in version 6.5.6
1. EVD2 model translation from the EVD domain back to the original user
model has been simplified for better speed, reliability, and
maintainability. Now it is simple: the original user model is
re-loaded, then the maximized values from the phase2 model are
assigned to model parameters, and rather than parsing the phase 2
model file to get the parameter values, they are now put into a Tcl list
after maximization. The mapping between the original user
parameter names and the phase 2 names is remembered through an
associative array. Previously the translation code attempted to parse
and convert the entire phase 2 model file back to the user's model,
which was very complicated and could fail in some unusual cases, and
in all cases the parameters would get re-ordered (which no longer
happens).
2. parameter command has new option "parameter -return" that returns all
current parameter information as a Tcl list.
3. matrix command has new option "matrix -return" that returns all current
matrix information as a Tcl list.
4. Trait specific covariates (such as "covariate age(q4)") are now
translated correctly from the original user model with _evd applied
both to the covariate variable name and the trait qualifier.
**** New in version 6.5.5
1. EVD2 trivariate omega fixed.
**** New in version 6.5.4
1. EVD2 now correctly preserves matrices, constraints, and options, and
correctly translates them to the EVD2 parameterization and back.
As a result, sporadic models are now analyzed as sporadic, and so
on. Previously all EVD2 models were treated as polygenic, and
pre-existing constraints were ignored and eventually lost.
2. As a result of #1, some EVD2 models that converged incorrectly will now
not converge. Models including squared covariates like age^2 appear
to have difficulty converging.
3. EVD2 now correctly preserves starting values and bounds for covariate
parameters. As a result, retry operations which move these bounds now
work as intended.
**** New in version 6.5.3
1. The bivariate omega used for EVD2 now applies abs() to all parameters
for which a sqrt() is taken, to prevent domain errors from negative
numbers very close to zero.
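The kind of domain error this guards against can be shown with a short
Python sketch. It only illustrates the abs() before sqrt() idea; it is not
the EVD2 omega itself, and the function name is hypothetical.

```python
import math


def safe_sqrt(x):
    """Square root that tolerates tiny negative values from rounding error."""
    return math.sqrt(abs(x))


tiny_negative = -1e-17  # e.g. a value that should be zero but is not, numerically

try:
    math.sqrt(tiny_negative)
    domain_error = False
except ValueError:
    # math.sqrt rejects any negative argument, however small.
    domain_error = True

guarded = safe_sqrt(tiny_negative)
```

One consequence of this approach is that a genuinely negative value is also
accepted silently, so it is best suited to quantities that are expected to be
non-negative apart from rounding error.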
**** New in version 6.5.2
1. EVD2 now handles interaction covariates and covariates with exponents
correctly.
**** New in version 6.5.1
1. Users of Condor parallel system can specify -condor option when
starting SOLAR. Then the userid will be obtained through Condor.
The key should then be specified with a -key argument, which
must follow the -condor argument. For example:
solar -condor -key userkey
Condor users should not change the current directory (using cd) from
the default either before or while running solar.
**** New in version 6.5.0
1. The EVD2 omegas have been simplified, reducing the need for sqrt
operators and eliminating sqrt entirely in the univariate case.
**** New in version 6.4.9
1. EVD2 maximization is now seamlessly integrated so that it seems
just like regular maximization. You just set "option modeltype evd2"
and run "maximize", "polygenic" or other model maximizing command
and you end up with a normal looking model, with the original
phenotypes and pedigree files loaded, hiding the way that temporary
transformed variables and special parameters were actually used during
the second phase of maximization. Results from the EVD domain get
translated back into the original model with the original pedigree and
phenotypes.
2. The EVD2 transformation of the sex variable was incorrect in the
same way as other transformations were prior to version 6.4.8.
That was fixed, but it was still not correct until changing the
interpretation of male,female as 0,1. Now it works.
**** New in version 6.4.8
1. EVD2 now appears to produce correct phenotype transformations, and
EVD2 maximization is now a reality. However it is still very clunky
to use (see "help evd2"). Now that it actually works, EVD2
maximization will be streamlined in the next version.
2. "mga" is the new official name of the mgassoc command. Files such as
mgassoc.out are renamed mga.out. The old command can continue to be
used. Output file is mga.out for "mga" command and mgassoc.out for
"mgassoc". The command mgassoc_topedsys is renamed mg_topedsys.
3. If the user key is found to be invalid (not matching username) the error
message now prints the username, actual key filename, and key string.
When using batch queuing systems, sometimes a different username or
home directory is used and this will help sort out problems more quickly.
**** New in version 6.4.7
1. tmean values corrected in evddata.out (see help evd2). Both the
tmean values and the lambdas produced match the reference data.
This shows the eigenvalues and vectors are being computed correctly.
But the transformed phenotypes are still not computed correctly.
2. The stats command now accepts files with Fortran D style exponents.
3. d2e and d2e2 commands handle conversion of D style exponents to the E
form more generally understood. Generally this is not needed anymore
since SOLAR understands D style exponents in phenotypes files.
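For readers unfamiliar with Fortran D style exponents, the conversion amounts
to replacing the D exponent marker with E. The following Python sketch shows
the idea; it is not the d2e implementation, and the function name and regular
expression are assumptions made for the example.

```python
import re

# A "D" or "d" that follows a digit or decimal point and precedes an
# optionally signed digit is treated as an exponent marker.
_D_EXPONENT = re.compile(r"(?<=[0-9.])[dD](?=[+-]?[0-9])")


def d_to_e(text):
    """Rewrite Fortran D style exponents (1.5D+03) in E form (1.5E+03)."""
    return _D_EXPONENT.sub("E", text)


line = "ID,q4\n1,1.5D+03\n2,0.25d-1\n"
converted = d_to_e(line)
```

This simple pattern would also rewrite an alphanumeric code such as A1D2, so
a real converter would need to restrict itself to numeric fields.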
**** New in version 6.4.6
1. invert inverts the rows and columns of a comma delimited file.
2. timediff returns the difference between two system time strings.
startclock and stopclock conveniently measure time using timediff.
3. countfields shows the largest and smallest record sizes in a file,
helping determine if it is internally consistent.
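The ideas behind invert and countfields can be sketched in Python as follows.
These are illustrative stand-ins, not the SOLAR commands; the function names
are hypothetical, and the sketch assumes that "record size" means the number
of comma delimited fields in a record.

```python
import csv
import io


def count_fields(text):
    """Return (smallest, largest) number of fields over all non-empty records."""
    sizes = [len(row) for row in csv.reader(io.StringIO(text)) if row]
    return min(sizes), max(sizes)


def invert_rows(rows):
    """Swap rows and columns of a rectangular table (a transpose)."""
    return [list(column) for column in zip(*rows)]


smallest, largest = count_fields("id,a,b\n1,2,3\n2,5\n")
# A file is internally consistent in this sense when smallest == largest.
consistent = smallest == largest

inverted = invert_rows([["id", "a"], ["1", "2"], ["3", "4"]])
```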
**** New in version 6.4.5
1. The command line editing feature no longer requires the creation of
a temporary "starting" file when starting solar, eliminating the danger
of these files accumulating.
2. A recent update could lead to multipoint failing with "invalid command
name last_maximize_goodlod". This is fixed.
**** New in version 6.4.4
1. Additional changes to the command line editing feature added in version
6.4.2. It is now possible to turn it off in 3 different ways. This is
in case it causes problems or inefficiencies in parallel or batch
operations (though no such problems or inefficiencies have yet been
found). 1. You may invoke solar with the leading argument -noce.
2. You may define a shell variable SOLAR_noce to any non-null value.
3. You may run solar with arguments (solar commands) which is one
variation of batch mode (non-interactive) operation.
2. Parallel margin is increased to 3500 to prevent problems with current
parallel operation on medusa2 when too many machines are started at
once.
**** New in version 6.4.3
1. Fixes to the command line editing feature added in version 6.4.2 in
certain obscure cases. For example, when TERM=dumb as inside the
emacs shell mode, in which case command editing is impossible anyway,
rlwrap is no longer run, avoiding a useless error message.
**** New in version 6.4.2
1. Command line editing is now available. This means you can recall
previous commands with up arrow, and use right and left arrows to
position cursor and edit them, just as with most command shells.
A program called rlwrap is used to provide this capability, and it
uses the readline library.
**** New in version 6.4.1
1. An improvement has been made to solar file input routines
for better speed with large files.
2. Use of matrix checksums created with matcrc is now documented for
the ibd and mibd commands. Previously it was only mentioned within
the documentation for the checksum in section 8.3 of the
manual which actually concerns the phi2.gz matrix file. The
documentation for matcrc itself has also been clarified.
3. If a user has not registered, the "please register" message now
prints the short form of the user's login name which is needed
in order to obtain a working SOLAR key. Many systems now hide this
login name behind the user's full given name.
**** New in version 6.4.0
1. Negative LOD scores during some multipoint runs have been fixed by
adding a new set of retry strategies if the likelihood is smaller than
the null model likelihood. The actual retry strategies and means of
controlling them is described by "help maximize_goodlod". This is a
private procedure not normally documented in the manual.
**** New in version 6.3.9
1. The "load phenotypes" command now only shows the first two lines of
phenotypes for each file. If there are more phenotypes, the list is
ended with an ellipsis (...) and a comment is given that you may show
all the phenotypes with the "phenotypes" command. Previously, for
files with thousands of phenotypes, the mere act of listing all the
phenotypes could take a long time, and such a long list would not be
ordinarily read anyway. This makes the "load phenotypes" command
much faster in these cases. It also keeps your terminal session
from scrolling off the window. A phenotypes file may have up to
40,000 phenotypes.
2. The method EVD2 uses for ensuring IBDID's are in increasing order has
been greatly improved in efficiency for large pedigrees. It used to
require NxN data shuffling, which is time consuming for large
pedigrees. Now the data ordering requires less than N operations.
3. The testing of duplicate individuals for EVD2 is no longer an NxN test,
but an N test based on the assumption that pedindex.out is in sorted
order, as it always is when created by the "load pedigree" command.
Also, unnecessary pedigree tests during maximization are skipped for
EVD2.
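The difference between an NxN test and an N test on sorted data, as in item 3
above, can be illustrated with a short Python sketch. It is not the SOLAR
implementation; the function name is hypothetical and the input is assumed to
be already sorted.

```python
def has_duplicates_sorted(ids):
    """Detect duplicates in a sorted list with a single pass.

    Because the list is sorted, equal IDs must be adjacent, so comparing each
    entry with its neighbor (about N comparisons) is enough. An unsorted list
    would need every pair compared (about N*N/2 comparisons).
    """
    return any(a == b for a, b in zip(ids, ids[1:]))


sorted_ids = [1, 2, 3, 3, 4]
duplicate_found = has_duplicates_sorted(sorted_ids)
```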
**** New in version 6.3.8
1. IBDID's in evddata.out are now in consistently increasing order, and
as a result the IBDID and lambda's match those in the reference dataset.
**** New in version 6.3.7
1. When a specified phenotype name is found in several loaded phenotype
files, the error message now correctly identifies the phenotype name
rather than calling it (null).
**** New in version 6.3.6
1. Precision for output of parameter values to model files and script
queries is increased from 10 to 16 digits. This fixes some model
problems identified with mgassoc. The precision can be controlled
with a new option ParameterFormat which defaults to 16.
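The effect of writing a parameter with 10 versus 16 digits can be seen in a
short Python sketch. This only illustrates the precision issue; the function
name is hypothetical, and it assumes the digit count means significant
digits, which the note above does not specify.

```python
def write_parameter(value, digits):
    """Format a parameter value with the given number of significant digits."""
    return f"{value:.{digits}g}"


h2r = 0.123456789012345678

# Error introduced by writing the value to a model file and reading it back.
error_10_digits = abs(float(write_parameter(h2r, 10)) - h2r)
error_16_digits = abs(float(write_parameter(h2r, 16)) - h2r)
```

With 10 digits the reloaded value differs from the original around the 11th
significant digit, which can matter when a saved model is reloaded and
compared with a closely related one.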
**** New in version 6.3.5
1. mgassoc now handles discrete models which had not previously been
maximized. Previously you would get an error about SD not being
constrained.
2. mgassoc now updates the "mgassoc_start" model to reflect changes needed,
such as polygenic omega and constraint of SD for discrete models.
This makes it faster to create new "mgassoc_null" models when needed,
avoiding incomplete maximizations, and resulting in a 1% overall
increase in speed in a large test.
**** New in version 6.3.4
1. mgassoc now tests sample regardless of whether using EVD or not. So in
non-evd cases, it need not re-run null models for each SNP *unless*
sample has changed. Previously it was only able to test the sample
if evd was used, and therefore was forced to run null models for each
SNP with discrete traits.
2. -runwho is a new option for "maximize" which does the maximization and
produces a who.out file listing all the individuals in the analysis.
**** New in version 6.3.3
1. If option EnableDiscrete is 0, an otherwise discrete trait univariate
model can now be handled by EVD. This also means that such models
can be handled by the default EVD mode of mgassoc. HOWEVER, a
preferred way of handling discrete traits as if they were quantitative
is to multiply by a useful factor to increase the standard deviation
above 0.5, such as: define qt = dt * 5.
**** New in version 6.3.2
1. mgassoc will detect model types which do not permit EVD1 processing
(discrete and multivariate models) and revert to -noevd mode
automatically. This prevents errors resulting from sample
differences when user forgets to specify -noevd.
2. By default, mgassoc no longer runs standard errors, and the standard
error field in the output file will be all zeros in this case. There
are two mutually exclusive standard error options, -slowse and -evdse.
-evdse runs EVD if possible, and gives a warning that EVD1 computation
of standard errors may be inaccurate in some cases. -slowse runs
standard maximization for the model being analyzed.
3. "option samplesametrustme 1" may be used to bypass null models for each
SNP when EVD is not being used (either because -noevd is specified or
not applicable). This allows faster processing for non-EVD models.
EVD models always do automatic fast sample checking, and new null models
are forced by sample changes. The -notsame switch to mgassoc overrides
option samplesametrustme. Previously some of these features were
documented but not working.
4. mgassoc output for null models is streamlined and better formatted to
make it easier to follow null models and SNP results.
5. If EVD1 is not used during maximization because model is discrete or
multivariate, this is noted in the solar.out file. Previously this
information would only appear in terminal output.
**** New in version 6.3.1
1. bayesavg changed so that it will run with any N value rather than crashing
with insufficient memory for N > 25. It now only computes the first
few million combinations upfront, adding more as needed. The -stop
rule set with -qtn will hopefully be satisfied before more than 30
million combinations need to be computed and maximized (which would
take years anyway), as that many combinations would exhaust memory.
2. mgassoc p values and EVD1 (option modeltype evd) likelihoods fixed for
certain difficult examples by using the proper quantitative conv
value. Previously conv(discrete) was being used even though these
models are actually quantitative.
3. Added -saveall option to mgassoc so that all solar.out files are saved.
4. Changed occurrences of sfbrgenetics to txbiomedgenetics.
5. Changed matrix file CRC to start with "0" so that matrix writers
can line up decimal point of CRC with their data.
6. Changed the display format of the omega (print ) operator to remove
trailing imprecision 99999's with matrix single precision data.
**** New in version 6.3.0
1. The Pearson residuals (which you get when running "residual" with a
discrete trait) have been corrected.
2. EVD2 now writes transformed phenotypes file correctly when a defined
trait is used. Previously, it messed up the file header. Thus you can
now use inormalized traits.
3. "help evd2" is now available though it doesn't show up in command list
because this is still being debugged.
4. If the sample includes no individuals of any kind, you no longer get
the confusing error message "No non-probands have complete data" which
is especially confusing if you have no probands in the first place.
Instead you get this message:
No individuals have complete data. Check that ped and phen IDs match.
It mentions the possibility of ped and phen mismatch because that is
one error that can cause this, say, if the formatting of ID's is
different, as sometimes happens. The situation where there ARE
individuals in the sample, but they are ALL probands, gets a different
message:
No non-probands have complete data. Every ped must have one non-proband.
5. Updated internal contact addresses to solar@txbiomedgenetics.org.
**** New in version 6.2.9
New features which assume use of Grid Engine software as at Texas Biomedical
Research Institute.
1. A new parallel option "ignore" is added to deal with users who submit
jobs running in automatic fallback mode. Such jobs previously restricted
machines available for "stepup -par" and "stepfor -par" which could not
compensate for automatic fallback jobs. Other parallel parameter
defaults have also been adjusted a bit.
2. New command "howmanyuser" shows how many jobs a particular user is
currently running on parallel resources. To see jobs run by everyone
use existing command "whoranch".
3. Online documentation of whoranch and howmanyuser added.
**** New in version 6.2.8
1. Sometimes, usually because of some mistake in datafile preparation,
a trait ends up having just one non-blank value in the phenotypes file.
When that happened in previous versions, you used to get a very
confusing error message (terminating maximization) complaining about
discrete trait coding which many people could not understand because
they weren't intending to have a discrete trait anyway.
Now if a trait has just one value, you get an error that a trait has just
one value. The error message also suggests a new option SingularTrait
which you can set to 1 to have such trait(s) considered discrete, and 2
to have such trait(s) considered quantitative (as now documented in
'help option'). It was never possible to maximize such models before.
2. The determination of whether a trait is considered discrete or
quantitative is now tightened up to the documented rule, and it
is possible to maximize models when a quantitative trait only has two
non-blank values. Now to be considered "discrete" the two non-blank
values do actually have to be integer values (as documented) which
means having no non-zero fractional part. (This is not affected
by the presence or absence of decimal points in the data file or
"data types" in a code file.) If there are just two values in the
file, but either they are not integers or they are not consecutive
integers you will just get a warning and the trait will be considered
quantitative. (Previously if there were just two values, integer or
not, but they did not have a difference of 1, you got an error about
discrete trait coding and maximization could not be done.)
3. The doc command now simply returns URL's where the solar manual may
be found locally and at the Texas Biomed SOLAR website. It no longer
tries to run Netscape.
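The discrete versus quantitative rule described in items 1 and 2 above can be
summarized in a Python sketch. This is an illustration of the rule as stated
in these notes, not the SOLAR code; the function name is hypothetical, and
the singular_trait argument stands in for option SingularTrait.

```python
import warnings


def classify_trait(values, singular_trait=0):
    """Classify non-blank trait values as "discrete" or "quantitative"."""
    distinct = sorted(set(values))
    if not distinct:
        raise ValueError("trait has no non-blank values")
    if len(distinct) == 1:
        if singular_trait == 1:
            return "discrete"
        if singular_trait == 2:
            return "quantitative"
        raise ValueError("trait has just one value")
    if len(distinct) == 2:
        low, high = distinct
        # Integer means no non-zero fractional part; 1.0 counts as an integer.
        both_integers = float(low).is_integer() and float(high).is_integer()
        if both_integers and high - low == 1:
            return "discrete"
        warnings.warn("two values but not consecutive integers; "
                      "treating trait as quantitative")
    return "quantitative"


affected = classify_trait([0, 1, 1, 0])
```

In this sketch, two values such as 0 and 2, or 0.5 and 1.5, produce a warning
and a quantitative classification rather than an error.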
**** New in version 6.2.7
1. Very rare internal error in 'chi' command fixed. This only appeared
on the Linux system we currently use, and appears to be a GNU Fortran
compiler optimization bug.
**** New in version 6.2.6
1. EVD2 has been completely rewritten for efficiency and transparency in
Fortran, attached directly to the main phenotype input and sample
sizing routine pinput and called immediately after sample is determined.
2. Phenotypes files are now permitted to have fortran "D" style floating
point numbers, so the new Fortran-written evddata.out file can be
read.
3. EVD2 now creates fake pedindex.out and pedindex.cde and phi2.gz. (It
also saves the old files for later restoration via evd2_restore_phen.)
Thus the evddata.out file does not need to be "loaded," a potentially
very time consuming operation for large pedigrees.
**** New in version 6.2.5
1. Fixes pedselect for multiple pedigrees on linux. Fixes "model new"
not clearing out entire list of pedselects.
2. Bypasses writing unbalanced trait individuals to evddata.out.
**** New in version 6.2.4
1. Additional changes for EVD2. lambda variable is added to evddata.out.
Model file required for second phase is automatically generated.
**** New in version 6.2.3
1. Additional corrections to evddata.out files for EVD2.
**** New in version 6.2.2
1. Highly experimental EVD2 now generates evddata.out files for
multivariate models.
**** New in version 6.2.1
1. Highly experimental EVD2 is now available by specifying
"option evdphase 1" in addition to "option modeltype evd". This
is intended for internal development only at this time.
**** New in version 6.2.0
1. mgassoc has been rewritten to meet the needs of more users. nGtypes is
no longer used (wasn't always useful) to outline the sample. Instead,
the first SNP is used as a null covariate in the null model to accomplish
this. If all SNP's have the same sample, everything works as before.
However if any SNP has a different sample, this is now detected, and
it forces a new null model for each such SNP. The user can also
specify -notsame as an option. Note: automatic sample validation is
not available with -noevd option, so -notsame mode is assumed unless
you set option SampleSameTrustMe 1. Because of reduced memory usage,
the MergeAllPeds option is no longer used by default.
2. EVD model processing has been improved in a few ways. The ID validation
no longer uses phi2, it simply compares IBDID's, which is far faster and
uses less storage (an NxN comparison of floats has been replaced with an
N comparison of ints). Thus in many cases there is no significant
improvement by using the SampleSameTrustMe option anymore. A new
option, DontAllowSampleChange forces an error during EVD processing
if the sample *has* changed. That error is now caught and handled by
mgassoc automatically (as described above).
3. evd flush is a new command that flushes out all memory used to store
EVD matrices. This should not be needed generally, except it is
also invoked by "load pedigree" to allow for the fact that IBDID's
could change, rendering stored values incorrect.
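Item 2 above replaces an NxN comparison of floats with an N comparison
of ints. The idea can be sketched roughly as follows in Python, assuming
each sample is held as an ordered list of integer IBDIDs. The function
name and data layout are illustrative assumptions, not SOLAR's internal
code.

```python
def same_sample(ibdids_a, ibdids_b):
    """Return True if two samples list the same IBDIDs in the same order.

    This is an O(N) comparison of integers, in contrast to comparing
    two N x N kinship (phi2) matrices element by element.
    """
    if len(ibdids_a) != len(ibdids_b):
        return False
    return all(a == b for a, b in zip(ibdids_a, ibdids_b))


# Hypothetical IBDID lists for illustration only.
previous = [1, 2, 3, 5, 8]
unchanged = [1, 2, 3, 5, 8]
changed = [1, 2, 3, 5, 9]   # one individual differs

print(same_sample(previous, unchanged))
print(same_sample(previous, changed))
```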
**** Versions 6.1.1 through 6.1.9 skipped
**** New in version 6.1.0
1. The selectrecords command now permits the use of Tcl variables with
$$ operator. The single dollar sign $ operator is used for file
fields.
2. The documentation for using "snp effnum liji" has been clarified.
**** New in version 6.0.9
1. Pedigrees can now be added to the option pedselect list using a
new + modifier. For example, to include the two pedigrees 1 and 2
you can do this:
option pedselect 1
option pedselect + 2
To clear list of pedigrees selected this way, you can select 0 again:
option pedselect 0
If option pedselect is currently 0, you can start pedselecting either
with or without the plus sign. Instead of the first example shown above,
you could do this:
option pedselect + 1
option pedselect + 2
The list of pedselects is written to file when model is saved, and
restored when the model is loaded. Like most other options,
model new will restore the default value 0 which means all pedigrees
are included.
**** Version 6.0.8 skipped
**** New in version 6.0.7
1. Discrete and mixed-trait models cannot currently be analyzed by EVD.
If the "option modeltype evd" command is given for a discrete or
mixed trait model, it will now have no effect, other than to generate
a warning message in the maximization output file.
2. Mainly intended for testing purposes, a new option "SampleSameTrustMe"
disables the phi2 comparisons normally done during EVD processing.
This could make a series of EVD maximizations faster, but only if the
user is absolutely sure that the underlying sample of individuals has
not changed. Note that the selection of a different set of covariates
often changes a sample. Testing has shown that bypassing the phi2
comparisons during large scale mgassoc processing improves performance
by about 1%, which is not considered worthwhile in view of the added
risk of user error, except inside well written scripts.
3. If "premax" verbosity (0x800ff) is selected, the size for each EVD
matrix will be reported.
4. residual now completes successfully when some individuals are missing
the variable sex and sex is a covariate. Previously it would abort on
encountering the first such individual because missing sex was not
allowed.
5. residual now computes residuals for discrete traits (previously you
would get an error message). These residuals are adjusted to be
"Pearson Residuals."
6. Added documentation of options for "string plot".
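Item 5 above mentions Pearson residuals. For reference, the textbook
Pearson residual for a binary outcome is (y - p) / sqrt(p * (1 - p)),
where p is the fitted probability. The Python sketch below shows only
that formula. How SOLAR obtains the fitted probability for a discrete
trait is not described here, so the function and its inputs are
assumptions for illustration.

```python
import math


def pearson_residual(y, p):
    """Textbook Pearson residual for a binary outcome.

    y is the observed value (0 or 1) and p is the fitted probability
    that y equals 1, with 0 < p < 1.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return (y - p) / math.sqrt(p * (1.0 - p))


# An affected individual (y = 1) with fitted probability 0.5.
print(pearson_residual(1, 0.5))
```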
**** New in version 6.0.6
1. mgassoc command added for measured genotype association analysis.
2. "z" format added to fformat to permit output files having identical
precision in fixed and csv formats.
3. ped2csv command added to convert pedsys files to comma delimited format.
4. selectrecords now ignores BLANK fields and elides them from the
comma delimited output. Previously BLANK fields would cause an error.
This fix makes it possible to use selectrecords to reformat pedsys
files to comma delimited, as in the new ped2csv command.
5. mgassoc_topedsys converts mgassoc.out from comma delimited format
to pedsys format for those more comfortable with handling pedsys
files. This can be used after the mgassoc analysis has been run.
**** New in version 6.0.5
1. "fastmod" sets up polygenic model with evd for fastest maximization
using evd.
2. iteration information is not output for normal EVD models to improve
speed. To restore iteration information, use "evdd" (EVD Debug)
instead of "evd" in "option modeltype".
**** New in version 6.0.4
1. Omega equation only evaluated once for each likelihood evaluation
for normal diagonals (phi2=1), and for all non-normal diagonals.
**** New in version 6.0.3
1. Omega equation only evaluated once, unless sample has diagonal
members whose self phi2 is not 1.0. (Evaluation only, replaced by
approach used in 6.0.4.)
**** New in version 6.0.2
1. EVD model maximization available for sporadic models.
2. Omega equation only evaluated for diagonal elements where i and j
are equal.
**** New in version 6.0.1
1. A redundant internal copy of the phi2 matrix created during EVD model
maximization is eliminated. This reduces memory requirements greatly
for very large pedigrees and may even prevent a SOLAR crash on some
machines. Performance improved by 2x over 6.0.0 version.
**** New in version 6.0.0
1. EVD modeltype with greatly enhanced speed for polygenic models.
To enable this experimental feature, give the command:
option modeltype evd
This option is only applicable to polygenic models; for all other
models it will be ignored.
2. .solar_model_new file may be created by user. If present, this file
will be executed at the beginning of a SOLAR session and whenever
the "model new" command is given. This file must only contain the
following types of commands:
option trait parameter covariate constraint omega define
Any other command, or any invalid command, will cause the entire
.solar_model_new file to be rejected and an error returned.
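The whole-file rejection rule in item 2 above can be sketched as a
simple check in Python. This only illustrates the rule as stated. The
function name is hypothetical, and details such as whether SOLAR
accepts command abbreviations, comments, or blank lines in this file
are assumptions of the sketch.

```python
ALLOWED = {"option", "trait", "parameter", "covariate",
           "constraint", "omega", "define"}


def check_model_new(lines):
    """Return the first offending line, or None if every command is allowed.

    Blank lines are skipped here; that is an assumption of this sketch.
    """
    for line in lines:
        words = line.split()
        if not words:
            continue
        if words[0] not in ALLOWED:
            return line
    return None


good = ["option modeltype evd", "option standerr 0"]
bad = ["option modeltype evd", "polygenic"]

print(check_model_new(good))   # no offending line found
print(check_model_new(bad))    # the line that would cause rejection
```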
**** New in version 5.0.0
1. Version 5 was developed to improve maximization performance by
using pre-compiled FORTRAN code for the polygenic omega. This
was successful in boosting performance about 3x. However, the
EVD modifications used in version 6 boost performance by about 60x.
Version 5 was only used for certain internal projects.
**** New in version 4.4.0
1. zscore_ prefix now available in define statements to specify
zscoring of prefixed phenotype. Such definitions can be used
as traits and covariates. Mean and Standard Deviation are
obtained from the actual maximization sample. This is meant
to replace the old and clunky zscore command (though that
command is still available). "help zscore" describes the new
prefix operator.
2. -rhopse option added to polygenic to compute and save SE of
rhop to SOLAR_RhoP_SE.
[Versions 4.3.4 through 4.3.9 were reserved for development of 4.4.0]
**** New in version 4.3.3
1. If a rho or other variance component is constrained, it will
not trigger a retry if it is also at a real boundary.
**** New in version 4.3.2
1. Use of x_variables (variable means) in mu and omega commands for
discrete and mixed models is now fixed (never worked before).
**** New in version 4.3.1
1. qtld now writes the output file qtld.out with a one line header
identifying the field columns. Existing scripts which read this
file should either skip one line or skip the line which begins
with "Trait" (the first column).
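For scripts that read qtld.out outside of SOLAR, the second suggestion
above (skip the line which begins with "Trait") might look like this in
Python. The function name is hypothetical, and the example lines are
placeholders rather than real qtld.out content.

```python
def qtld_data_lines(lines):
    """Return the lines of qtld.out without the one-line header.

    The header is recognized as the line that begins with "Trait",
    as suggested in the note above.
    """
    return [line for line in lines if not line.startswith("Trait")]


# Placeholder content: the real column names and values are not shown here.
example = [
    "Trait <remaining header fields>",
    "q4 <result fields>",
    "q5 <result fields>",
]
for line in qtld_data_lines(example):
    print(line)
```

Matching on the "Trait" prefix, rather than always dropping the first
line, lets the same script read files written before and after this
change.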
**** New in version 4.3.0
1. snp ld -plot is fixed so that it reflects the latest updates to the
map file (as loaded with the load map command). Note that it is
still not necessary even to load a map file, but if you do it will
control the horizontal axis of the plot.
2. joinfiles was supposed to ignore fieldnames duplicated in a single file
such as BLANK which can't be selected anyway. However, if such a
field occurred an odd number of times, it didn't get ignored, leading
to problems. This has been fixed.
3. The version of selectfields from version 4.2.9 (which allows
multiple files including the loaded phenotypes file) could not
handle a few cases that the old version could: files in which
there is no actual ID field. So a -noid option has been added
to selectfields which makes it work like the old version. Also,
selectfields no longer complains when loaded phenotype files are
also listed explicitly.
4. Clarify the rules regarding the user-construction of matrix files
in the manual and documentation for matrix and matcrc commands. The
data values should begin in character position 14 or higher.
5. Added option -d to polymod.
**** New in version 4.2.9
1. stepfor -par parallel option for stepfor, similar in design to
parallel option for stepup. Only available on SFBR compute ranch
using GridEngine.
2. Useless parameter rhoe is removed from the rhop test model created if
the -testrhop option of polygenic is selected. Otherwise, this would
cause errors in standard error estimation if later attempted by user.
3. multipoint -restart was not updating models that had previously
terminated with convergence error, but maximized correctly this
time. This has now been fixed.
4. At the beginning of each pass through the selected genome, multipoint
calls a user script named multipoint_user_start_pass which takes one
argument, the pass number (which starts at 1 for the first pass).
Within this routine, the user can change the selected chromosomes or
interval.
5. Now you get an error message if you try to compute residuals for
other than a univariate quantitative model. That was the original
design of the residual command, but it was inadvertently broken by
the addition of new model types. You also now get an intelligible
error message if you try to compute residuals for a non-univariate
model, or any model that does not have a mean parameter.
6. Documented the fields in qtld.out and mgeno.out
**** New in version 4.2.8 ****
1. selectfields can now work with multiple input files, and includes the
currently loaded phenotypes file(s) by default. New option -np
excludes the phenotypes file, option -list allows you to specify a
list of fields from a file, and -sample limits the output to
individuals with complete data.
2. solar can be started with -key argument to specify user key.
This is required for some job queuing systems when .solar_reg file
cannot be accessed. This feature was originally introduced in version
2.0.4 but was broken by a later update.
3. New command showspace uses new command option "doranch finduser -all"
to create a sorted list of all /tmp space on the ranch.
4. New options -dash and -linestyle for stringplot (plot -string).
5. Fixed using .CDE files in parallel stepfor. Previously code files
had to have .cde extension in lower case for this procedure.
6. To keep standard errors turned off during the polygenic command, you
can now simply give the command "option standerr 0" before running
polygenic.
**** New in version 4.2.7 ****
1. New option -finishlogn for stepup allows completion of previous stepup
run which failed due to inability to compute logn.
2. stepup now reports log(n) actually used in stepup.avg as well as the
incredibly verbose stepup.history.
3. New retry method in multipoint prevents many convergence errors.
4. "snp ld -plot" is now working again, having been broken by snp update
in version 4.2.6.
5. Changes have been made to "plot -all" to make it useable again.
plot -all now produces and saves all the individual chromosome plots,
and a page of "miniplots" if there is more than one chromosome plot.
It does not try to display the postscript files, but instead lists
the names of the postscript files produced, and suggests using lp to
print them. It also suggests the alternative genome plot command
"plot -string" which most people have been using instead.
6. stepup -par now handles problem of running out of space in
/var/tmp (used by Unix sort) by using tmp subdirectory of home
directory instead, if necessary.
**** New in version 4.2.6 ****
1. If snp.genocov is created outside of SOLAR, the maximum number of
markers allowed by "snp ld" and "snp qtld" is now increased from 3000
to 10000. The limit is now tested rather than causing error.
2. Changed the default method for computing the effective number of
SNPs to that by Moskvina & Schmidt, which is more conservative. The
previous method due to Li & Ji may now be selected by option. And even
results from Li & Ji method would have previously been incorrect because
of an error in reading snp.ld.dat file. Note that all these "effective
number" calculations are highly experimental at this time.
3. snp.ld.pos was previously written when genotypic correlations are
computed. Now it is written whenever LD plot is created, so that
any changes to the SNP locations in the map file will show up.
4. A check has been added to be sure that there is no SNP in snp.genocov
that is not also in the loaded map file, if any. Previously a missing
SNP in the map file would cause a crash.
**** New in version 4.2.5 ****
1. For qtld, mean results will be output to the measured genotype file
mgeno.out even if the standard errors cannot be calculated.
Individual standard errors which cannot be calculated will be
reported as zero. Slightly negative values for h2m will be
truncated to zero. Other error handling has been improved.
2. "snp ld" and "snp qtld" commands can now be used even if the SNP
genotypes file is not loaded. All that is required is the snp.genocov
file produced by "snp covar".
3. "snp ld -plot" can now be done without loading a map file. Without a
map file, however, dummy sequential integer basepair locations will be
used.
4. "snp ld" now has a window to limit the number of correlations that
will be computed. By default, the window is 1 million basepairs wide
so that only those pairs of SNPs separated by no more than 1 million
basepairs will be considered. The window size can be adjusted with
a new -window option. A map file with basepair locations must be
loaded to use this option.
5. New command "snp effnum" will estimate the effective number of SNPs
based on the genotypic correlations among the SNPs. This is an estimate
of the number of independent statistical tests performed using these
SNPs and can be used to determine an appropriate significance level
for the analysis. At this time the only method implemented is the
method due to Li and Ji.
6. Derived estimate of RhoP from version 4.2.4 was incorrect. It was
fixed just before Feb 19 so any result before Feb 19 should be
rerun.
7. Documentation added for define command so that it is
understood how to enquote phenotype names with special characters in
angle brackets <>. Documentation for constraint and omega commands
also added.
8. Under some circumstances, the snp command would produce an erroneous
error message about "Name not found: snp_d". That has now been fixed.
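The window rule in item 4 above (only pairs of SNPs separated by no
more than the window size are considered) can be sketched in Python as
follows. This is an illustration under stated assumptions: positions
are basepair locations on a single chromosome, sorted in ascending
order, and the function name is hypothetical rather than part of SOLAR.

```python
def pairs_within_window(positions, window=1_000_000):
    """Return index pairs (i, j), i < j, whose basepair distance <= window.

    positions must be sorted in ascending order.
    """
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if positions[j] - positions[i] > window:
                break  # later SNPs are even farther away
            pairs.append((i, j))
    return pairs


# Four hypothetical SNP locations in basepairs.
locations = [0, 500_000, 1_000_000, 2_500_000]
print(pairs_within_window(locations))
```

Because the positions are sorted, the inner loop can stop at the first
SNP outside the window, so the work grows with the number of pairs
kept rather than with all possible pairs.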
**** New in version 4.2.4 ****
1. polygenic now computes a derived estimate of RhoP, the phenotypic
correlation, for bivariate polygenic models only. An additional
-testrhop option determines the p value for rhop being different from
zero by maximizing models with a rhop parameter (rhop.mod) and
a rhop parameter constrained to zero (rhop0.mod). Global variables
SOLAR_RhoP and SOLAR_RhoP_P are set to access the rhop and p values
after running polygenic, and variable SOLAR_RhoP_OK should be
checked to ensure the likelihood of the rhop parameterized model
is correct.
**** New in version 4.2.3 ****
1. Under very unusual circumstances relating to retries with multivariate
models, or loading unmaximized multivariate models, the SD parameter
upper bound could get initialized incorrectly to 1. This could cause
multivariate models to maximize incorrectly to a likelihood which is
too small, however there are no reports of this actually happening.
Now the standard SD parameter bound initialization will be applied
whenever the current SD upper bound is obviously wrong.
2. When given an ambiguous command abbreviation, "help" was not helpful
in listing all the possible completions. Now it does.
**** New in version 4.2.2 ****
1. Redesign of overlapping commands "pedigree classes", "relatives", and
"relpairs" provides information in a more useful way. Now classes are
listed in descending order of phi2, and, by default, the counts
for relationships of 3rd degree or higher, as well as some 1st and
2nd degree relationships, are combined, except for "relpairs" and
if the -full option is used. When using -full, an additional option
-phi2 adds a column containing the phi2 value for each class.
A -model option restricts the scope of the tally to those individuals
who have data for the null0 model of current trait. There is also a
-meanf option for computing "Mean f". The "relatives" and "relpairs"
commands now simply invoke "pedigree classes" with options (-model
for both and -meanf and -full for "relpairs"). An error in the
calculation of "Mean f" has been fixed. If the -full option is used,
a warning is displayed if any of the relative classes cannot be
handled by SOLAR's native method for computing MIBD's.
2. Quite a few new relationship classes have been added, and missing
phi2 values inserted.
3. The array size limits in "mibd relate" (also used by "pedigree
classes" and related commands described above) are eliminated,
except for one limit which may be increased by the user with the
new -mxnrel option.
4. Error messages are now properly provided for "maximize -who" and
"maximize -sampledata", such as for missing covariate or missing
omega. And since these commands are also used by other commands
(such as "relatives" and "stepfor") those commands now show proper
error messages rather than terminating mysteriously on user errors.
5. "selectfields" command added to select some fields from a file and
copy them to a new file. In addition to the previously existing
"selectrecords" and "joinfiles" commands, this permits the user to
do basic database operations on comma delimited and Pedsys files
from within SOLAR.
6. "multipoint -restart" lost records in multipoint1.out under some
circumstances. That is now fixed.
7. "stats" can now compute statistics for variables created with a
"define" command. This also fixes the use of some other commands
with defined variables.
**** New in version 4.2.1 ****
1. Fix for a multipoint -restart problem.
**** New in version 4.2.0 ****
1. New -max option for stepfor sets the maximum dimension. stepfor
now displays Chi^2 and p value for each model tested.
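As background for the Chi^2 and p value columns, a standard likelihood
ratio test between two nested models can be sketched in Python as
below. That stepfor computes its statistics exactly this way, and with
one degree of freedom per added covariate, is an assumption of this
sketch and is not stated in the note above. The function names are
hypothetical.

```python
import math


def lrt_chi2(loglike_reduced, loglike_full):
    """Likelihood ratio statistic: twice the gain in log likelihood."""
    return max(0.0, 2.0 * (loglike_full - loglike_reduced))


def chi2_pvalue_1df(chi2):
    """Upper tail probability of a chi-square variate with 1 df.

    For one degree of freedom this equals erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical log likelihoods without and with one extra covariate.
chi2 = lrt_chi2(-100.0, -98.0)
print(chi2, chi2_pvalue_1df(chi2))
```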
**** New in version 4.1.9 ****
1. New Official version.
2. Cleanup routines for /tmp directories on SFBR compute ranch have been
added. See "help doranch". These are a set of utilities which can
also be used to create custom cleanup procedures, such as
the new "stepup -parclean" for cleaning up after an emergency
shutdown of stepup -par.
3. ID names in pedigree files were unnecessarily limited in size on
linux and mac distributions. Combination of ID and FAMID is now
correctly limited to 36 characters on all distributions.
**** New in version 4.1.8 ****
1. The residual command now properly accounts for snp_ and hap_ variables
(which are "noscaled" by default) and for the use of the "scale"
and "noscale" commands. Previously, residuals from models which
used snp_ and hap_ variables or scale/noscale commands would
be wrong by a constant, which would be of little consequence for most
users (for example, it would not affect the kurtosis).
2. The file snp.ld.dat now always contains signed genotypic correlations
rather than the previous options (absolute value, squared). The
"snp plot" command now has options to display either the absolute
value or square of the correlations.
3. "pedigree show" now displays a new column, labeled #bit, which gives
the number of bits of complexity for each pedigree. This number is
equal to 2 x #non-founders - #founders.
4. Discrete traits were broken in experimental version 4.1.7 (only)
because the mean boundaries were not initialized correctly. That
is now fixed.
5. When snp names are misspelled in the list file, stepup used to fail
with only a "file missing" message. Now better messages are given, and
if a covariate variable is missing that message is shown. Also, all
names in list files (read with the "listfile" command) are trimmed of
extra spaces on the left and right sides.
6. A fix has been added for a potential divide-by-zero bug in "snp ld"
which would occur when everybody is missing one or the other of a
pair of SNPs.
7. "stepup -par" had two problems that are now fixed. If any snp name
in the snp list is misspelled, a cryptic error would occur. Now
that error is detected and reported. Also, for some user
environments, parallel stepup would not run because the program
name couldn't be determined from the usual convention, so a symbol
is used instead. The symbol, SOLAR_PRGRAM_NAME, had been created some
time ago in the "solar" startup script.
8. The "snp covar" and "snp qtld" commands now produce files
snp.geno-list and snp.haplo-list (respectively) which list
all the SNPs processed. These list files are useful for bayesavg
and stepup. They may be used as-is or edited to list the SNPs
desired in a particular case. Previously it was tricky to get
such a list.
9. The "snp ld" command now automatically computes and displays the
"effective number" of SNPs, which is useful in correcting
association-test p-values for multiple testing.
10. New cleanup routines for removing /tmp junk from SFBR Ranch machines
have been added. See "help doranch" for details. Also "stepup -parclean".
doranch is an open-ended tool that permits users to easily write their
own cleanup routines.
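The #bit formula given in item 3 above (2 x #non-founders - #founders)
is simple enough to check by hand. A small Python sketch, with a
hypothetical function name, is:

```python
def pedigree_bits(n_nonfounders, n_founders):
    """Bits of complexity for one pedigree: 2 x #non-founders - #founders."""
    return 2 * n_nonfounders - n_founders


# A nuclear family with two founder parents and three children.
print(pedigree_bits(n_nonfounders=3, n_founders=2))
```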
**** New in version 4.1.7 ****
1. Trait and covariate names up to 80 characters can now be handled
without any problem. Also, the trait command now tests to ensure
names are not any longer than 80 characters to prevent problems
later. Previously the limit was 18 characters.
2. If a standard parameter (mean, sd, e2, h2r, h2q1) is given a starting
value by the user, but the bounds are not initialized, SOLAR will
initialize the bounds. This was supposed to happen in previous
versions but didn't in many cases.
3. If option maxiter is set to 1, which only provides the likelihood for
the starting values, no retries are done for boundary errors,
misconvergence, incorrect quadratic, or any other such issue in
maximization. This is generally what you would expect for having
maxiter set to one, and in this case all retries would be futile anyway
since the starting values are never changed no matter how many retries
are done.
4. Problem with freq command on linux fixed.
5. All the remaining error messages that prompted the user to report
problems to "solar@darwin.sfbr.org" have been changed to prompt
the user to report problems to "solar@txbiomedgenetics.org". These
were for very unlikely scenarios that have probably never occurred.
This change had already been made in all the documentation.
6. If restarting a multipoint analysis in which there were convergence,
boundary, or other errors, the results with errors will be removed
and replaced by the new results, rather than being left in as
duplicates.
7. The -logn option now works properly in stepup.
**** New in version 4.1.6 ****
1. Pedigree files need no longer include mother and father ID fields if
the "load pedigree" command is used with the "-founders" option.
2. Missing value is now allowed for sex. The missing value may be coded
as 0, U, u, or blank. If a parent is found with missing or incorrect
sex code, e.g. female for a father, the code is automatically fixed
in the pedindex.out file and a warning is given instead of stopping
with an error.
3. If a parent record is missing, a founder record is automatically
created in pedindex.out and a warning is given instead of stopping
with an error.
4. More robust handling of SNPs or markers for which there is no genotypic
data in the genotypes file.
5. The "snp ld" command now offers 3 options for correlations: rho,
abs(rho), and rho^2. rho^2 is now the default. The -locn option
is removed.
6. Maximum size of phenotypes (and tablefile) records is increased to
800,000 characters, appropriate to recently increased maximum of
40,000 fields.
7. It is now possible to specify terms having parentheses in constraints
without using angle brackets. For example, the constraint:
constraint <e2(q4)> + <h2r(q4)> = 1
may now be specified much more nicely as:
constraint e2(q4) + h2r(q4) = 1
However, this is still not true for the omega specification, and
the polygenic and other commands still use the angle brackets by
default. Those qualifications will be addressed in future updates.
**** New in version 4.1.5 ****
1. multipoint -cparm is now much less likely to cause convergence errors
because the prototype model is reloaded at the beginning of each
chromosome, and whenever there is a gap greater than 11 cM.
2. The linux edition version of mlink (used during ibd file creation) had
the default fastlink limits instead of the bigger SOLAR limits. This
has now been fixed. (For example, in the linux edition you used to be
limited to 25 alleles per marker instead of 100, and 1000 individuals
instead of 10000.)
**** New in version 4.1.4 ****
1. power now works for discrete traits, and a -prev option
has been added to specify the prevalence of the trait. A new
more accurate method of estimating power as a function of
observed h2r is used. The method for smoothing the power curve
has also been improved.
2. h2power now catches the error in which "simqtl.phn" appears to be
the loaded phenotypes file after an earlier h2power run was aborted.
3. chinc now catches errors that could occur in the case of a very
large non-centrality parameter (lambda > ~740).
**** New in version 4.1.3 ****
1. polygsd is now fixed for multivariate models. The problem was that the
external kinship matrix was not being loaded, which is required for the
standard handling of unbalanced traits.
**** New in version 4.1.2 ****
1. There is now a parallel stepup procedure (stepup -par) designed
for use at SFBR's gridware based GCC "ranch". This is highly
EXPERIMENTAL and not supported elsewhere.
2. Further improvements have been made to stepup so it produces the
same results as bayesavg in even larger cases. The "strict" rule
is now suspended until the very end. This keeps models that may
seed higher dimension models that appear in the final window.
3. The maximum numbers of phenotypes, parameters, and covariates
have been increased to 40000.
4. h2power now works for discrete traits, and a -prev option
has been added to specify the prevalence of the trait. A new
more accurate method of estimating power as a function of
observed h2r is used. The method for smoothing the power curve
has also been improved.
5. Added -qter option to "mibd prep" which extends the range of MIBD
estimates to the end of the chromosome when using imported MIBD's.
"mibd prep" no longer requires the map file to be ordered by
marker location.
6. Added "snp unload" command to unload both marker and map files. Error
handling is improved for the "snp load" command.
7. Added "map names" command which displays the names of the markers in
the map as a list on a single line. Added "map chrnum" command which
returns the chromosome identifier from the map file.
8. Previously only 2000 markers were allowed for MIBD computation. This
has been increased to 3000 markers.
**** New in version 4.1.1 ****
1. The EXPERIMENTAL stepup algorithm has been improved to capture
the same models as bayesavg, and it now creates window and average
files just like bayesavg. The new algorithm applies a looser selection
rule, including all models better than the null model, up to the
3rd degree of freedom, which can be set to other df's with the
-cornerdf option. THIS IS STILL EXPERIMENTAL.
2. Both stepfor and stepup now retain the fully_typed() covariate upon
successful completion, however it may be removed using the new
"stepclean" command, which also unloads the fully_typed.out phenotypes
file.
3. There was a problem during the "sort" phase of bayesavg on some linux
systems. This was supposed to be fixed in 4.1.0 but wasn't completely
fixed. Now it is.
4. Multiple runs of stepup or stepfor could cause a fatal error related to
the multiple loading of the fully_typed.out phenotypes files which is
now fixed.
5. polygenic and residual commands had problem with ID names containing
an embedded space. That is now fixed.
6. SOLAR now works on some 64-bit and other linux systems on which it
previously crashed during user registration.
**** New in version 4.1.0 ****
1. Important fix made for multivariate use of traits created with the
define command. Under certain circumstances, maximization would
fail with a cryptic message, or trait values would get replaced with
covariate values during maximization. The latter would typically
result in convergence errors, but could also cause extremely strange
results.
(Two basic circumstances would elicit this problem: 1) First trait
defined, second trait ordinary phenotype, combined with two or more
phenotypes files loaded simultaneously in which second trait is NOT
found in first phenotypes file but in a later one, 2) First trait
defined, combined with two or more other traits (trivariate or higher)
at least one of which is an ordinary phenotypic value.)
2. If zscore is in effect and the trait is changed without invoking
"model new", zscore will be canceled. This is considered more
intuitive than having "zscore" apply to subsequent traits by
default, but only if "model new" is not given. Note that it is
recommended to give the "model new" command before changing traits
anyway. (Note: more changes to zscore were made in version 4.0.9.)
3. On some of the newest versions of linux, bayesavg would not sort its
output file, resulting in incorrect operation. This has been fixed
by using the correct "sort" arguments depending on what system is being
used.
4. Full plotting support is now provided for Intel Macs.
5. The linux edition of SOLAR is now provided in two binary forms: static
and dynamic linked. It appears that some linux distributions work with
one better than the other, and vice versa.
6. Sun's newest compilers are used for the Sun/SPARC edition of SOLAR.
It has been tested, but keep your fingers crossed.
**** New in version 4.0.9 ****
1. zscore has been changed. The mean and SD used are taken from
the actual sample that will be used in maximization, instead of
from all trait values in the phenotypes file. Often the actual
sample is a smaller set, restricted by the availability of
covariates. zscore is also much less verbose. Now, only one
line is displayed for each trait showing the trait name, mean
and standard deviation. Even that line is not displayed if zscore
is called in a script, or if the "zs" abbreviation is used to
invoke the command. The warnings about zscore being obsolescent have
been removed.
2. stepfor command added to do forward stepwise covariate screening.
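The zscore change in item 1 above (mean and SD taken from the actual
maximization sample rather than from all trait values) can be
illustrated with a small Python sketch. The data layout and function
name are hypothetical, and the use of the n - 1 denominator for the
standard deviation is an assumption, since the note does not say which
form SOLAR uses.

```python
import math


def zscore_from_sample(trait, complete_ids):
    """Z-score a trait using only individuals in the analysis sample.

    trait maps an ID to a value (or None if missing); complete_ids is
    the set of IDs having all required covariates. Needs at least two
    individuals in the sample. Assumes the n - 1 form of the SD.
    """
    sample = {i: v for i, v in trait.items()
              if v is not None and i in complete_ids}
    n = len(sample)
    mean = sum(sample.values()) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in sample.values()) / (n - 1))
    return mean, sd, {i: (v - mean) / sd for i, v in sample.items()}


# Individual 4 has a trait value but lacks covariates, so it is excluded.
trait = {1: 1.0, 2: 2.0, 3: 3.0, 4: 100.0}
mean, sd, z = zscore_from_sample(trait, complete_ids={1, 2, 3})
print(mean, sd, z)
```

In this example the outlying value for individual 4 would dominate the
mean and SD if all trait values were used, which is the situation the
change avoids.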
**** New in version 4.0.8 ****
1. Residual fixed when there are many many phenotypes files loaded
(total characters in filenames exceeding 1000). Residual now uses
currently active phenotypes files rather than list found in maximization
output file because the latter gets truncated at 1000 characters.
2. joinfiles -all and -list options added which permit the use of
wildcards, directories, and a file containing names of files to be
joined. These options also overcome the system limit on open files
so you can join thousands of files at a time. For example:
joinfiles -all phen.*
3. polygsd and linkqsd commands have been extended to support multivariate
models with the esd, gsd, qsd1 parameterization. In this version,
however, linkqsd doesn't work with trivariate and above...that will
require changes to the omega command in a future version.
**** New in version 4.0.7 ****
1. Version 4.0.7 is now an Official public release, the first
Official release since 2.1.4. All incremental updates that
went into 4.0.6 over several months are included. Early
beta copies of 4.0.6 did not include all of these updates,
so a new version number is now used.
2. Added warning about using the zscore command. See "help zscore".
**** New in version 4.0.6 *****
1. Definitions may now be used as covariates as well as traits.
This includes the ability to apply the inverse normal
transformation to variables used as covariates.
2. The variable "sex" may be used in all definitions.
3. Problems with multivariate discrete and mixed quantitative and
discrete models have been resolved through a refinement in the
conditioning step of the discrete likelihood evaluation code.
4. lodadj is now permitted for multivariate models.
5. multipoint -cparm now permits multivariate models.
6. bayesavg -qtn and -cov now report h2q1 results if h2q1 is
present and not constrained to zero.
7. Maximization errors during qtld with some snps don't prevent it
from going on to other snps.
8. On Mac OS X, all forms of plotting are now supported.
See notes in README.Mac in Mac distributions dated
December 22, 2006 or later.
9. plot -title is now supported for regular multipoint plots. Also
plot -subtitle is available to set subtitle. For regular and
string plots either "" or " " may be used to specify a blank title.
10. For "plot -max -overlay", test for plot width is done correctly
to allow overlay of same or narrower width. Incorrect error message
for attempted wider overlay is corrected and reworded for clarity.
**** New in version 4.0.5 *****
1. Very large pedigrees now require much less memory for loading because
unrelated individual pairs are no longer included in the phi2.gz matrix
file.
2. simqtl no longer requires that user provided genotype files be in
exactly the same order as pedindex.out.
3. The command "ibs" was unknowingly broken in Version 4.0.0; it is
now fixed. Probably nobody noticed.
4. Show actual file error when reading phenotypes file. Previously,
the "missing FAMID" error message was shown incorrectly in many cases.
5. toscript command is nicer. Command numbers and script name and -ov
argument can be in any order (see updated documentation for details).
Invalid argument formatting checked in advance. If there is an error,
a bad script (which prevents SOLAR from starting) does not get created.
6. pedlike now works for discrete and mixed models
**** New in version 4.0.4 *****
1. Mac edition of SOLAR 4 is now available (on request).
2. "newmod" command takes place of "model new", performing
"outdir -default" automatically, and optionally setting trait.
(model new command itself remains unchanged.)
3. The linux edition of SOLAR is now statically linked, in the hope of
resolving library conflicts that sometimes occur with linux.
4. max # markers increased to 3000, max # in a pedigree increased to 30000
5. "snp qtld" now 2x faster and -xlinked option allows X-linked SNPs.
6. "mibd prep loki" now defaults to letting Loki estimate allele freqs,
but -usefreq option forces use of currently loaded freqs.
7. map file need not be sorted by user, it is automatically sorted during
loading.
8. New deputy registration feature allows select sites to make
solar keys for local use only. See "deputy" command for more
details.
9. register command is more user friendly. You can overwrite invalid
key immediately, or choose to overwrite one believed to be valid.
Key is validated immediately. Lengthy "registration help" message
is only displayed if there is no .solar_reg file so that error
message can be seen easily.
10. Trait qualification for covariates now checked for redundancy and
inconsistency.
11. bayesavg defaults to polygenic model type if the omega is undefined.
Previously, bayesavg gave a confusing error message unless the
user knew to run the command polymod first.
12. It is now no longer necessary to give the command "verbosity min"
when running commands to compute freq mle's, ibd's, or mibd's. The
normal verbose output is automatically suspended during the operation
of these commands. Also, load freq and load marker have improved
error handling.
13. plotqtld command added to plot various qtld results.
**** New in version 4.0.3 *****
1. unbalanced trait-specific covariates now permitted in sample
Trait-specific covariate variables are now only required in the sample
of the trait to which they apply, so you get the maximum sample size
possible for every trait in a multivariate analysis even when you have
covariate variables that are only defined when one of the traits is
defined (i.e., unbalanced covariates).
Though they were handled as documented, there was a serious flaw in the
practical usage of the previous implementation of trait-specific covariates
which some might have overlooked. Sample sizes might have been
unnecessarily reduced because trait-specific covariate variables were
required even in the sample of traits to which they did not apply. For
example, the following did not work well if there were some drinkers who
never started smoking:
trait drink smoke
covar age_started_smoking(smoke)
Even though the "age_started_smoking" variable would never be used in
estimating the mean for "drink", it used to be required in the sample
for "drink" anyway, thus rendering moot the normal ability of SOLAR
to handle unbalanced traits. Now "age_started_smoking" would not
be required in the sample for "drink".
As is further described in Note 7 in the documentation for the covariate
command, in a UNIVARIATE analysis ALL covariate variables are still
required in the sample even if they are specific to a different
trait. This behavior may be changed in a future update, and for that
reason the use of trait-specific covariates is not recommended in a
univariate analysis. There is no real need for them anyway. Also,
variables which appear in a user-specified mu or omega command are (at
this time) required for all traits.
"null" covariates (such as clinic() ) will always be required for all
traits because "null" covariates were designed for the specific purpose of
restricting the sample to those having the null covariate variable.
2. unbalanced defined traits now permitted
SOLAR now only requires the phenotypic variables actually used in the
definition of any trait (using the "define" command) in the sample
of that trait. It determines the variables used in the definition
of any trait, and specifically requires those variables in the sample
of that trait, but not necessarily any other trait in a multivariate
analysis. Missing any of the variable(s) used in the definition of
a trait is equivalent to missing that defined trait.
Previously, all the variables in all definitions were required in the
sample of every trait, even if those definitions were unused.
3. h2power
A new command "h2power" is added to compute the power to detect a trait
with a given heritability (h2r) rather than the power to detect a given effect
size (h2q1). Currently this is restricted to a single trait (univariate).
There is also a new plot option, "plot -h2power" to plot the results.
See the h2power documentation for more details.
4. Upgrade to Tcl 8.4.12
For the first time in 6 years, the version of Tcl used by SOLAR has been
upgraded to version 8.4.12. The motivation for this was that the newer
Tcl now supports 64 bit integers, which may be useful in dealing with
numbers and lists larger than 2 billion. But there are also many other
new features in the latest Tcl which we may in time come to appreciate.
Fortunately, performance and reliability enhancements in Tcl 8.4.x have
eliminated problems that arose in intermediate versions 8.1.x through 8.3.x
(some of those versions were notoriously slow, for example).
5. Fix embedded blanks in ID for inormal
Previous 4.x versions could not handle ID or FAMID fields with embedded
blanks using the "inormal" command or "inormal_" in trait definitions.
That is now fixed.
**** New in version 4.0.2 *****
1. NEW DISCRETE TRAIT CODE
The discrete trait code has been extensively rewritten and is far better
than before. The "mean" parameter now accurately tracks the "threshold"
as intended, and can range from -8 to 8 in standard deviation units
(which is about all that can be done in double precision). Operation of
the new discrete trait modeling code has been verified through extensive
simulation. Heritabilities and likelihoods may differ significantly from
previous versions when discrete traits are involved. The "mean" parameter
is now correctly initialized (when maximization starts) to the inverse
normal of (1 - prevalence).
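As a rough illustration of the initialization described above, the following
Python sketch computes a starting value for the discrete trait "mean"
parameter from a sample prevalence. It is not SOLAR's own code: the function
name initial_discrete_mean, and the clamping to the -8..8 range at this
stage, are assumptions made for illustration only.

```python
from statistics import NormalDist


def initial_discrete_mean(prevalence, bound=8.0):
    """Starting value for the discrete-trait "mean": inverse normal of (1 - prevalence).

    Hypothetical helper, not part of SOLAR. `bound` mirrors the -8..8 range
    mentioned in the notes; applying it here is an assumption.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(1.0 - prevalence)
    # Keep the starting value inside the representable range.
    return max(-bound, min(bound, z))


if __name__ == "__main__":
    for p in (0.5, 0.1, 0.025):
        print(p, round(initial_discrete_mean(p), 4))
```

With this formula a prevalence of 0.5 gives a starting value of 0, and lower
prevalences give larger positive starting values.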
Unfortunately, the new code does not appear to handle multivariate mixed trait
(quantitative and discrete) or multivariate discrete models correctly.
Trait correlations appear to go too high. So in these cases, use an earlier
(or later) version of SOLAR.
2. "print" function for debugging omega, define, and mu
You can now use a "print" function in SOLAR omega, mu, and define statements.
This function prints and then returns the current value of its argument, which
can be any expression valid for the command. For example, in place of the
standard omega:
omega = pvar*(phi2*h2r + I*e2)
You could have:
omega = pvar*(print(phi2)*h2r + I*e2)
and this would print each phi2 value as it is used. An omega,
mu, or define statement can include any number of print functions.
They are evaluated in the order of evaluation, starting with the
innermost subexpression. If you simply want to print some value
without including it in the rest of the expression, you can multiply
the print function by zero, for example:
omega = pvar*(phi2*h2r + I*e2 + 0*print(delta7))
As each print function is evaluated, it is printed in your terminal,
and you may press RETURN to continue to the next. To skip past a lot
of prints, simply hold down the RETURN key.
3. globals for accessing polygenic results
The polygenic command creates 4 global variables which may
be accessed later (which is often useful in scripts). The
variables are:
SOLAR_H2r_P          p value for h2r
SOLAR_Kurtosis       residual trait kurtosis
SOLAR_Covlist_P      list of p values for covariates
SOLAR_Covlist_Chi    list of chi values for covariates
The covariate lists are created only if the -screen option
is used. All screened variables are included, regardless of
whether they were retained in the final model. Before you
can access any of these variables in a script, you must
use a "global" command. For example:
global SOLAR_Kurtosis
if {$SOLAR_Kurtosis > 4} {puts "Very bad kurtosis!"}
4. chromosome command options: all * show showm
It is now possible to select "all" available chromosomes for multipoint
scanning using the command "chromosome all" or "chromosome *". This will
include numeric as well as alphanumeric (such as 11p) chromosomes. Also,
a new command "chromosome show" will display all available chromosomes,
and a command "chromosome showm" will show all mibd files selected by
the current "chromosome" and "interval" commands.
5. custom omegas for trivariate and above models
It is now possible to create custom trivariate and above models without
including the previously mandatory rho parameters such as rhoe_12, rhog_12,
rhoc_12, rhoq1_12. Sometimes it is convenient to write custom omegas where
such rho's are implicitly assumed to be 0, 1, -1 or some other value.
Now, only if the particular rho appears in the omega, or if the
corresponding generic parameter (such as rhoe_ij) appears in the omega,
are the rho parameters required. Also, if a parameter required by the
omega is missing, an error message specifically identifies the
missing parameter by name (such as mean(q1)).
Also, there are now extra generic rho's available for custom use. The
available generic rhos are now rhoa_ij to rhog_ij and rhoq1_ij to rhoq10_ij.
Thus, there are generic parameters rhoa_ij, rhob_ij, rhod_ij and rhof_ij
which have no previously reserved function.
Also, there are now special "si" and "sj" variables which return the indexes
of traits i and j (1..ntraits).
6. normal -inverse
Not to be confused with the "inormal" command and "inormal_" define command
operator (described below as "new in version 4.0.0") there is also a new
"normal" command which currently has only one option: -inverse. This will
return the inverse normal for any single p value.
7. maximize -out
Previously "maximize" only permitted "-o" and "-output" options to identify
the output filename. This was inconsistent with most other SOLAR commands,
which use "-out". So, maximize now allows "-out" also.
8. option BounDiff
One small part of the improvement in handling discrete traits is how
maximizing is done near parameter boundaries. Part of the improved
method may be altered for unusual "hard" boundaries using the
"option boundiff" command. See "help option" for details.
9. Missing mean parameter message
A missing mean parameter is now identified by name, such as
"Missing parameter mean(q1)" instead of by the confusing phrase
"Missing mean parameter for trait q1".
10. New mlink
A new version of mlink from the latest fastlink release is used during
mibd creation. This was necessary to run on recent linux releases, but other
than that, we are not aware of any operational difference.
11. qtld
A problem with snp names longer than 10 characters was fixed on Dec 13 2005.
12. define inormal_
inormal_ prefixed variables in define statements can now be used in expressions such
as:
define a = inormal_q4 * 10
or
define a = inormal_q4 >= 1
13. "ibd prep merlin" and "ibd import merlin"
Added commands for using merlin to compute IBD files.
14. "mibd prep" starting location 0
"mibd prep" now requests starting MIBDs with location 0 instead of
first marker.
**** New in version 4.0.0 *****
1. The "define" command
The "define" command enables you to define an arithmetic expression to use
as a trait. The expression may include phenotypic variables, math operators,
functions, and constants. For example, given a quantitative phenotype named
"q4" you could define an expression named "alpha" based on a normalized log
of that phenotype:
define alpha = 100*log(q4/11.43)
trait alpha
covariate age sex age*sex
polygenic -s
Now you can apply any desired transformation to a trait within SOLAR itself.
Definitions become part of models, and are saved to them and loaded from
them. Eventually, the use of definitions will be extended to covariates.
The math functions available include everything available in C++. See
"help define" in solar4 for a list.
By default, all definitions result in "quantitative" traits, unless
the top level operator is a comparison operator such as >=.
Comparison operators return 0 (false) and 1 (true) and so produce
"discrete" traits. The comparison operators are available in two
varieties: Fortran and C-like:
FORTRAN    C-like
.eq.       ==
.ne.       !=
.gt.       >>
.lt.       <<
.ge.       >=
.le.       <=
(Note the use of >> for > and << for < in the C-like operators. This was
necessary to be compatible with other parts of SOLAR syntax: < and > are
already used.)
2. Comparison operators in mu and omega
The comparison operators described above are now also available for
use in the "mu" and "omega" commands.
3. Inverse Normal Transformations
Within the "define" command, there is now a built-in "inormal_" (inverse normal)
transformation which has quickly become quite popular since it makes
traits with problem distributions (such as extremely high kurtosis)
behave as if having something close to a standard normal distribution.
define a = inormal_q4
trait a
See "help inormal" for a complete description of this transformation.
It is also possible to do relational tests such as >= on expressions
to produce expressions which are like discrete phenotypes.
There is also a separate command "inormal" which performs the inverse
normal transformation on a variable in a file and writes out a new file.
For traits, it is more convenient to use the "inormal_" operator in
a define command, but if you need to use this transformation for
covariates, you must currently use the "inormal" command to write the
transformed values to a file first. This should not be confused with
obtaining the inverse normal for a single p value, which is done by
the "normal -inverse" command described above (new in version 4.0.2).
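A rank-based inverse normal transformation of the kind described in this item
can be sketched in Python as follows. This is an illustration only, not the
SOLAR implementation: the function name, the rank/(n+1) scaling, and the
averaging of tied ranks are assumptions, and "help inormal" remains the
authority on what SOLAR actually does.

```python
from statistics import NormalDist


def inverse_normal_transform(values):
    """Replace each value by the normal quantile of its (tie-averaged) rank.

    Hypothetical sketch: ranks are scaled by 1/(n+1) so that every quantile
    lies strictly inside (0, 1).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the current group of tied values.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


if __name__ == "__main__":
    # An extreme outlier (250.0) is pulled in to an ordinary normal score.
    print(inverse_normal_transform([10.0, 250.0, 12.0, 11.0, 9.5]))
```

Because only the ranks are used, the transformed values are the same no matter
how extreme the original outliers are, which is why the transformation tames
traits with very high kurtosis.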
4. plotqtn (and qtnm) handle huge X values better
The qtnm and plotqtn commands now handle large numbers (up to 2
billion) nicely. Tick labels are kept in "integer" format as
long as possible (up to 2 billion), and extra spacing is allowed for
large X tick labels so they don't run into each other.
5. Phenotypes file change detected and handled automatically
Changes made to the "phenotypes" file(s) after they are
first loaded are now detected and "fixed" at the beginning of
maximization. If necessary, the phenotypes file is automatically
re-indexed. Also, a change in the field mapping of FAMID is detected
and handled automatically. Previously, changing a phenotypes file
outside SOLAR or changing the field mapping of FAMID could have caused
SOLAR to crash or give incorrect results.
6. FAMID tested more consistently in multiple phenotypes files
Previously you might get inconsistent and incorrect results, or even
cause SOLAR to crash, if one phenotypes file had a field named "FAMID"
in it but the others didn't.
Now the presence or absence of a FAMID field is handled more
consistently across all phenotypes files as a group.
If any phenotypes file lacks the FAMID field, and FAMID is required,
that is properly detected as an error. If one or more phenotypes
files have a field named FAMID, but the ID's are unique anyway, that
will not be a problem.
7. maximize -sampledata
There is also a new option to the maximize command called
"-sampledata" and it causes maximize to write out the sample data,
including computed trait values (if applicable), to a file called
sampledata.out in the current maximization output directory. IT DOES
NOT MAXIMIZE, IT ONLY WRITES OUT THE SAMPLE DATA THAT WOULD BE
MAXIMIZED. This was needed in order to seamlessly compute the
"residual kurtosis" for the polygenic command when definitions are
used, but you might also find it useful for other purposes. It is a
comma delimited file that can be read by "solarfile" and "tablefile"
and could be used as another phenotypes file by SOLAR. This file
includes only the individuals that would actually be included in
the analysis.
8. Stringplot overlays
The "plot -string" (and "stringplot") command now accepts options
which facilitate overlaying multiple LOD plots. The -name option lets
you save any stringplot for later overlay usage. The -layers option lets
you add previous plots to any new plot. The -replay option lets you
display any combination of previous plots. Also, a -color option is
available for stringplots now (replacing the previous -colorname)
and accepts X11 defined color names. Each layer specification
may use the default color, use a previously stored color, or
specify a new color.
9. Pedigree Matrix Checking
Matrix files are now automatically checked to be sure that they were created
using the currently loaded pedigree. To get the benefit of checking, you
must consistently use SOLAR version 4.x for both creating and using matrix
files. Otherwise, to allow both forward and backward compatibility, no
checking is done.
A polynomial CRC (Cyclic Redundancy Check), sometimes called "checksum"
but much harder to fake than a simple sum, is added to the beginning of
each matrix file as the 1,1 (or first) element. Since it is immediately
followed by the real 1,1 value, it has no effect on matrix operations
and is backwards compatible with previous versions of SOLAR. Currently,
the lack of a CRC is ignored to allow older matrix files to be used, but
if the CRC is present it must match. You can also bypass checking
by editing out the CRC line at the top of the matrix. The private
"matcrc" command is used to add CRC's to matrix files. The possibility
of the CRC not detecting a pedigree/matrix mismatch is astronomically small.
Since the checking is based on the actual data content, it is
unaffected by file copying and creation dates.
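The checking logic described above can be sketched in Python. This is a
hypothetical illustration: SOLAR's actual CRC polynomial, the pedigree data it
is computed over, and the matcrc internals are not given in these notes, so
the sketch substitutes the standard CRC-32 from Python's zlib module, and the
names pedigree_crc and matrix_matches_pedigree are invented.

```python
import zlib


def pedigree_crc(pedigree_lines):
    """CRC-32 over the pedigree content (assumed input: one text line per person)."""
    crc = 0
    for line in pedigree_lines:
        # Feed each normalized line into the running CRC.
        crc = zlib.crc32(line.strip().encode("utf-8") + b"\n", crc)
    return crc


def matrix_matches_pedigree(matrix_crc, pedigree_lines):
    """Return True if the matrix may be used with this pedigree.

    matrix_crc is None for an older matrix file that carries no CRC; as in
    the notes, a missing CRC is accepted, but a present CRC must match.
    """
    if matrix_crc is None:
        return True
    return matrix_crc == pedigree_crc(pedigree_lines)


if __name__ == "__main__":
    ped = ["1,0,0,M", "2,0,0,F", "3,1,2,M"]
    stored = pedigree_crc(ped)
    print(matrix_matches_pedigree(stored, ped))                      # same pedigree
    print(matrix_matches_pedigree(stored, ped[:2] + ["3,1,2,F"]))    # changed pedigree
```

Since the value depends only on the data content, copying the files or
changing their dates does not affect the comparison.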
**** New in version 3.0.4 *****
1. Household effects are now supported for multivariate models with
more than 2 traits. (Attempt to run household analysis with
more than 2 traits in previous versions of SOLAR 3.x would have
yielded invalid results.)
2. When exporting ibd or mibd files, it is no longer necessary to
re-enter the ibddir or mibddir in the current session.
3. mibd export is now compatible with SimWalk2 Version 2.91. An option
"-version 2.82" supports the earlier version. Also, marker names
are checked for uniqueness in 8 characters as required by SimWalk2.
4. The snp and snphap commands no longer limit SNP allele names to only
one character. The snphap command checks for uniqueness in 8 characters
as required for SimWalk2.
5. Multivariate household models are now supported.
6. The "multipoint" command now works for multivariate models without
using the -cparm option.
7. The "qtld" command now produces a measured genotype file, mgeno.out.
The format of the output produced by qtld has been greatly improved.
Analyses run with SOLAR versions prior to July 19 2005 should be re-run.
An additional problem for big qtld runs was fixed on November 18 2005.
An error message "Unable to open simple output file" could appear.
A problem with snp names containing hyphen was fixed on Nov 28 2005.
8. mibd import/prep
"mibd import" now checks unrelateds for non-zero MIBD.
Kosambi->Haldane conversion is now done in SOLAR because Loki did not do it correctly.
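For reference, a Kosambi to Haldane conversion can be sketched in Python using
the standard textbook map functions: convert the Kosambi distance to a
recombination fraction, then convert that fraction to a Haldane distance. The
function names are invented for this example, and this is not necessarily how
SOLAR performs the conversion internally.

```python
import math


def kosambi_cm_to_theta(d_cm):
    """Recombination fraction for a Kosambi map distance given in centimorgans."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)


def theta_to_haldane_cm(theta):
    """Haldane map distance in centimorgans for a recombination fraction.

    Requires theta < 0.5; extremely long distances would hit log(0).
    """
    return -50.0 * math.log(1.0 - 2.0 * theta)


def kosambi_to_haldane_cm(d_cm):
    """Convert a Kosambi distance to the equivalent Haldane distance (both in cM)."""
    return theta_to_haldane_cm(kosambi_cm_to_theta(d_cm))


if __name__ == "__main__":
    for d in (1.0, 10.0, 50.0):
        print(d, round(kosambi_to_haldane_cm(d), 3))
```

For short intervals the two map functions nearly agree; the Haldane distance
grows faster than the Kosambi distance as the interval lengthens.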
9. ibd import
"ibd import" now checks unrelateds for non-zero IBD; -xlinked
option added to do appropriate checks for that case.
10. plot -min
The "plot -min" option now works properly, adjusting the scale and
removing markers that are out of range. You can use the -min and -max
options to zoom in on a particular part of the plot. Using the xmgr
menus to do this is likely to mess up the display of marker positions.
11. Mixing ID and EGO in multiple files
Loading multiple pedigree files where one used "ID" as the identifier
and another used "EGO" previously caused a problem with "residual"
because the redundant EGO field was not removed from an intermediate
file by the joinfiles command.
12. joinfiles fixes
In addition to the above EGO/ID fix for joinfiles, another problem with
multiple occurrences of the same named field in the same file was also
fixed. This commonly occurs when PEDSYS files have a field named BLANK.
13. dominance
The command "dominance-notes" gives pointers on how to do analysis of
dominance (it's described in Section 9.4 of the documentation).
14. fformat
fformat is a replacement for the Tcl "format" command which enforces
field widths for floating point numbers better than "format". The
modified %f and %e formats do a better job enforcing field widths
by reducing precision slightly when required, and a new %y format
is added which works much like the %g format, except it reverts to exponential
format and reduces precision to maintain column width much better.
Even while maintaining column width as well as possible, fformat
prevents a non-zero value from being printed as zero (except for the
true fixed width %f format). Also, a center justification feature
is added. fformat is used by qtld to make for nicer printing of
results.
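The general idea of trading precision for column width can be sketched in
Python. This is not fformat itself (which is a SOLAR Tcl command with its own
%f, %e, and %y rules); the function name fit_float and its exact fallback
order are assumptions chosen only to illustrate the approach.

```python
def fit_float(value, width, precision=6):
    """Format value in `width` characters, reducing precision as needed.

    Hypothetical sketch: try fixed-point first, then exponential notation,
    and never print a non-zero value as zero.
    """
    # Fixed-point: drop decimals one at a time until the text fits.
    for p in range(precision, -1, -1):
        text = f"{value:.{p}f}"
        if len(text) <= width and (value == 0 or float(text) != 0):
            return text.rjust(width)
    # Exponential: same strategy, used when fixed-point cannot fit
    # or would round a non-zero value to zero.
    for p in range(precision, -1, -1):
        text = f"{value:.{p}e}"
        if len(text) <= width:
            return text.rjust(width)
    # Nothing fits: return the shortest form and let the column overflow.
    return f"{value:.0e}"


if __name__ == "__main__":
    for v in (3.14159265, 0.0000001234, 123456789.0):
        print("[" + fit_float(v, 8) + "]")
```

The second loop is what keeps a tiny value such as 0.0000001234 from being
shown as 0.000000 in an 8 character column.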
15. allsnp hint
A hint is added to qtnm to use the allsnp command first.
**** New in version 3.0.3 *****
1. loaded "map" file used in qtnm and plotqtn
solar3 now uses a loaded map file for most stages of QTN analysis,
including the qtnm and plotqtn commands. In earlier SOLAR versions,
you would encode the SNP location into the snp name, but those names
did not match up with the common SNP names. So now the name and the
location are entirely separate, but related through a map file, which
is now a standard map file (though the "basepair" function would
typically, if not necessarily, be specified instead of Kosambi or
Haldane).
If you are using the old-style SNP names, and don't have a
map file for them, you can invoke the qtnm command with the "-nomap"
option.
2. qtld command does association analysis for SNPs
3. plot -max and -min arguments fixed
The -min and -max options to plot were not working as intended, but now they
are fixed in version 3.0.3. Previously, if you specified -max or -min,
several bad things might happen. Now:
a) Marker labels will NOT get pushed to the right of the plot window, and
possibly off the page.
b) Marker ticks outside the specified range will not be included.
Previously they could go on and on and outside the plot window.
c) The LOD curve will be truncated at the exact -max and -min points
specified. Now there is a 5% blank margin at the beginning and end of the
plot window, as intended and as there is when -max and -min are not
specified. This gives you complete control over what the displayed range
is, and makes plots look neater.
4. allsnp
The "allsnp" command includes all covariates which are SNPs in the
current model. This makes it easier to run qtnm.
5. Plotting for qtnm (plotqtn) now scales the X axis automatically.
Previously the X axis was simply set by the qtn.gr file, which often
caused range and "too many tics" problems because people didn't bother
to edit the file as needed. Note: If you have a custom qtn.gr file,
it will override automatic scaling. I recommend you fetch a new
qtn.gr file from /opt/appl/solar/3.0.3/lib/qtn.gr and add your
customizations to that, or comment out the following lines by
prepending # in front of them:
@xaxis tick major 2500
@xaxis tick minor 500
@xaxis ticklabel start type spec
@xaxis ticklabel start 0
6. Memory leaks in simqtl and tablefile fixed.
7. Maximum number of MZTWINs per pedigree increased from 1 to 10.
8. Save Hardy-Weinberg Equilibrium (HWE) and allele freq Standard Errors
(SE) in external freq files created with 'freq save' which are
loaded when freq files are loaded.
9. simqtl gets kinship coefficients from phi2.gz rather than computing
on-the-fly to allow user-supplied values.
10. Fix bug in 'snp ld -plot' which caused command to fail in some cases.
11. (11-Apr-05) Fix error in implementation of algorithm for computing
QTLD covariates. Analyses using these covariates should be rerun
from scratch because this will affect results. Create QTLD covariates
file from SNP genotype covariates file rather than directly from SNP
data for efficiency.
12. Change snp plot x-axis label to "Nucleotide Position".
**** New in version 3.0.2 *****
1. Experimental support for bivariate models with "mixed" quantitative
and discrete traits or multiple discrete traits. Trait types should
be autodetected correctly now, so no option is required for simple
mixed trait models.
2. The "mean parameter" in a discrete trait model has changed so that
the sign is now reversed from before. The "mean parameter" in a
discrete trait model is now the distance of the threshold from
the mean, whereas previously it was the distance of the mean from
the threshold.
3. Multiple phenotypes files supported. Simply list them all in
"load phenotypes" command.
4. SNP processing is now available with snp and snphap commands.
5. -nose (skip standard error calculations) and -hwe (test Hardy-Weinberg
Equilibrium) options have been added to "freq mle" command. When -hwe
option is chosen, the test results are shown by the "marker show"
command.
6. "lod" command can compute multivariate LODs corrected to 1df for any
number of traits. "lod" command now allows arguments and options
with an improved syntax replacing the cumbersome "clod" command.
A new -v option shows exactly how the LOD is calculated.
7. "option discreteorder" can be used to reverse ordering for
multivariate discrete, or print out ordered data to file
discrete.out.
8. The -cparm and -ctparm options of multipoint now permit embedded
expressions as well as parameters to be printed to the output.
Also, a -se option has been added to multipoint and linkmod to
enable the computation of standard errors, which may be output
using either -cparm or -ctparm option. See multipoint documentation
for an example.
9. "loadbinary" command added to load Tcl binary packages. (Most Tcl
applications use a "load" command for this, but we already use "load"
for other things.) Many Tcl binary packages are available, including
one which provides command line editing.
10. selectrecords command added to select records which match one or more
conditions and copy them to a new file.
11. Properly handle c2 (house effect) boundaries in bivariate models during
multipoint (where they should float), do_boundaries, and perturb. Previously,
c2 boundaries were untouched. Identify null model as "household
polygenic" in bivariate multipoint (as for univariate multipoint).
12. SNP covariate names (after the snp_ or hap_ prefix) can now be
descriptive alphanumeric names, and can be mapped to their location
through a snp.map file handled by the snpmap command. See details in
the documentation for snpmap and qtnm. The qtnm.out file now has a
fifth column to allow for the SNP descriptive name, though plotqtn will
also accept the original 4 column qtnm.out files. The descriptive names
will appear at the top of the qtn plot.
13. Correctly report null0 values for bivariate models at the
beginning of the multipoint command, as for univariate. Previously,
if multipoint were started without having the null0 model in memory,
the values would not have been correctly reported.
14. Multivariate mixed and discrete trait types are now detected
properly when there are multiple phenotypes files.
15. Bogus warnings from perturb are now suppressed. When there is one
fewer constraint than there are variance components (such as in
sporadic models) perturb is not possible anyway, so perturb now
doesn't complain about it.
16. Attempting to re-load pedindex.out as a pedigree is disallowed and
there are some comments about this in the help for pedigree.
17. There is a new "snpdir" that works like ibddir and mibddir for snp
related information (though this is not supported yet by the snp
command).
18. Fields named PEDNO and GEN in the marker file are ignored, and the
help for marker now lists all ignored fields.
19. The snp command now creates "covariates" using the snp names instead
of locations.
**** New in version 3.0.1 *****
1. Support for multivariate up to 20 traits
**** New in version 3.0.0 *****
1. Added basic support for trivariate models in polymod, linkmod, maximize,
and multipoint (using -cparm) commands.
2. Added basic support for multivariate in polymod command.
3. Added basic support for multivariate in linkmod command.
4. Improved rho "tracking" by linkmod command for trivariate linkage
models
**** New in version 2.1.4 *****
1. A major advance in maximization: most convergence errors are "fixed"
internally by the mechanism previously used to handle very large
negative changes in likelihood: a step is taken 0.1 times the
previous stepsize. (This applies to quantitative traits; a similar
change was made last year for discrete traits, but it seems to
work even better for quantitative traits.)
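The step reduction idea can be illustrated with a toy one-parameter maximizer
in Python. This is only a sketch under simplifying assumptions (plain gradient
ascent, and any failure to improve the likelihood triggers the reduction);
SOLAR's actual maximization code is far more elaborate, and the names used
here are hypothetical.

```python
def maximize_with_backoff(loglik, grad, x, step=1.0, shrink=0.1,
                          tol=1e-10, max_iter=1000):
    """Gradient ascent that retries with 0.1 times the step after a bad move."""
    value = loglik(x)
    for _ in range(max_iter):
        candidate = x + step * grad(x)
        new_value = loglik(candidate)
        if new_value <= value:
            # The move did not improve the likelihood: stay where we are
            # and retry with a step 0.1 times the previous stepsize.
            step *= shrink
            if step < 1e-12:
                break
            continue
        if new_value - value < tol:
            return candidate  # improvement is negligible: converged
        x, value = candidate, new_value
    return x


if __name__ == "__main__":
    # Toy log likelihood with its maximum at x = 3.
    best = maximize_with_backoff(lambda x: -(x - 3.0) ** 2,
                                 lambda x: -2.0 * (x - 3.0),
                                 x=0.0)
    print(round(best, 3))
```

In this toy problem the initial step of 1.0 overshoots the maximum and gives
no improvement, so the first iteration falls back to a step of 0.1, after
which the search converges normally.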
2. New -ctparm option for multipoint works more reliably than -cparm
in some cases by rebuilding every linkage model from the prototype
model.
3. Maximum number of constraints increased from 99 to 999.
4. Bug in sorting probands when there is more than one proband per
pedigree (or group, if "house" command is used) is now fixed.
PREVIOUSLY YOU WOULD HAVE GOTTEN INCORRECT RESULTS IF THERE WAS
MORE THAN ONE PROBAND PER PEDIGREE OR "HOUSE" GROUP! This bug
had been present in SOLAR for many years.
5. joinfiles command added to join data files horizontally.
6. Nonexistent mibddir or ibddir doesn't cause error message on startup
or cause mibddir.info file to get deleted. Error is detected
and gives nice message when twopoint or multipoint is run.
7. Correctly report null0 values for bivariate models at the
beginning of the multipoint command, as for univariate. Previously,
if multipoint were started without having the null0 model in memory,
the values would not have been correctly reported.
8. When bivariate sporadic models don't converge on the first attempt,
the "perturb" operator now doesn't violate constraints. It properly
checks for constraints when the constrained parameters are <> quoted.
**** New in version 2.1.3 *****
1. Clarification in the documentation that lodadj is not (yet) applicable
to bivariate models, and that lodadj is ignored when computing LODs
for bivariate models. Previously, lodadj was ignored when computing
1dF equivalent LODs for bivariate models, now it is ALWAYS ignored
when computing LODs for bivariate models, and a warning message is
shown when previously the LOD adjustment factor would have been
reported, such as in the multipoint.out file.
2. -font option has been added to string plot. For example,
plot -string -font *bold-r*
There is also a -titlefont option to just change the font used in
the string plot title. Also new: if the script "stringplotk" is
copied into the working directory and modified, it supersedes the
standard one in the SOLAR bin directory.
Available fonts are listed with the command "xlsfonts | more".
3. "residual" command now works properly with phenotypes files having
"ego" as ID specifier. "residual" command now also works properly
for phenotype names including "." (period) and "-" (hyphen).
4. As a result of the above change, the residual kurtosis is now
computed properly when "ego" is used as ID specifier. In any
case where residual kurtosis is NOT computed, a warning message
is given.
5. If marker file does not have FAMID field, but pedigree file does,
error messages from "load marker" will not attempt to include
FAMID.
6. Allele names are mapped to consecutive integers for input to
Merlin.
7. The power command will now automatically re-do a bad replicate
(when maximization had convergence failure) rather than just skipping
it.
8. A bug in the rewrite of the "relate" code (introduced in version
2.0.8 Beta 1) was fixed. In an inbred pedigree, this bug could have
resulted in an incomplete enumeration of the ways in which a pair
of individuals are related. For example, a parent-offspring pair
who are also 1st cousins once removed might have been classified as
simply parent-offspring. The effect of such a misclassification would
have been limited to an incorrect tally of relative classes by the
"mibd relate" command. Since SOLAR is not able to compute its
approximate Multipoint IBD's for an inbred pedigree anyway, this
bug could not have produced incorrect linkage results.
9. Multiple solar jobs running in parallel from the same working directory
used to crash because the "phenotypes.info" file was deleted and
recreated on startup. This no longer happens: the phenotypes.info
file is deleted ONLY if there is an error with an explicit "load
phenotypes" command, and it is re-written only when a different
phenotypes file is selected. IT IS STILL NOT GOOD TO CHANGE THE phenotypes
file loaded IN A PARALLEL JOB, but simply re-specifying the phenotypes
file already loaded is harmless.
10. Likewise, use of mibddir or ibddir command in a script would cause
multiple jobs running from the same working directory to fail because
state file was always being deleted and rewritten. Now it is deleted
ONLY if there is an error, and re-written only if changed. IT IS
STILL NOT GOOD TO CHANGE ibddir OR mibddir IN A PARALLEL JOB, but
simply re-specifying the directory already selected is harmless.
11. Coefficients for class 49 (Double 2nd Cousins, 1 rem) have been added.
12. Bug in polygenic fixed: if all covariates are removed by screening,
"residual kurtosis" computation failed.
13. "mibd prep merlin" now correctly handles alpha allele names (i.e.
A/B).
14. "mibd prep simwalk" now handles consecutive integer locations in
the map file correctly.
15. If an imported mibd contains an ID which is NOT in the SOLAR pedigree
file, a helpful error message is given.
16. "hap_" prefix has been added to qtn (in addition to "snp_") but
is not completely supported in this release because "noscale" is
not correctly being defaulted for "hap_". This will be corrected
in 2.1.4.
17. Buffer limitation in handling map files fixed.
18. Maximum number of markers per chromosome (or marker file) is
increased from 500 to 2000.
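Item 6 of the 2.1.3 notes above (allele names mapped to consecutive integers for Merlin) can be illustrated with a short Python sketch. This is not SOLAR's code: the function name map_alleles, the sorted ordering, and numbering from 1 are assumptions made purely for illustration.

```python
def map_alleles(genotypes):
    """Map allele names to consecutive integers.

    genotypes: list of (allele1, allele2) name pairs.
    Returns (recoded_genotypes, name_to_int).
    Sorted order and 1-based numbering are assumptions of this sketch.
    """
    # Collect every distinct allele name seen in any genotype.
    names = sorted({allele for pair in genotypes for allele in pair})
    # Assign 1, 2, 3, ... in sorted order.
    name_to_int = {name: i for i, name in enumerate(names, start=1)}
    recoded = [(name_to_int[a], name_to_int[b]) for a, b in genotypes]
    return recoded, name_to_int


recoded, table = map_alleles([("A", "B"), ("B", "B"), ("103", "A")])
print(table)
print(recoded)
```

Keeping the name-to-integer table alongside the recoded genotypes makes it possible to translate results back to the original allele names.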
**** New in version 2.1.2 *****
1. Fixed bug which caused simqtl to compute a value of rhoe (for a
bivariate trait simulation) which was too small.
2. Fixed bug which caused SOLAR to crash whenever the omega includes
a x_ min_ max_ prefixed variable, or an _i _j suffixed variable when
that variable doesn't exist in the phenotypes file. Now you get an
explanatory error message, and another SOLAR prompt.
**** New in version 2.1.1 *****
1. All chapters of the documentation have been reviewed, clarified,
corrected, updated, enhanced, reformatted to the new style (if
necessary). The documentation for some commands has also been
clarified.
2. The change-notes have been updated.
**** New in version 2.1.0 Beta 3 *****
1. The restriction about giving matrices only lowercase names has
been removed. (This refers to their identifiers in the "matrix"
command, the actual filenames don't matter. None of the built-in
commands such as "multipoint" will do this anyway, but if you
are building up a model by hand you just might.) This bug had been
introduced in 2.1.0 Beta 1. You might notice the bug if you
manually tried to load an imported MIBD file with a command like
this (note uppercase "MIBD"):
load matrix mibddir/mibd.10.1.gz MIBD
**** New in version 2.1.0 Beta 2 *****
1. Documentation updates. The "help" command message summary and html
command index have been cleaned up. Chapter 7, which introduces SOLAR
scripting, has been largely rewritten and reformatted. The new commands
toscript and showproc are described. There is an all new section 7.7
which explains how to break-out of SOLAR to run a Unix command with
wildcards. There is also a new section 7.8 describing commands which
might be useful in writing scripts. A number of previously undocumented
commands are now documented (with more intuitive names and interfaces in
some cases). The newly documented commands are:
read_model Read a parameter value or likelihood from a saved model
read_output Read variable statistics from a maximization output file
read_arglist Read hyphenated optional arguments
solarfile Read datafile allowing for optional field name mapping
is_nan Check if a value is NaN (Not a Number)
if_parameter_exists Check if a parameter exists
if_global_exists Check if a Tcl global variable exists
remove_global Remove a Tcl global variable
catenate Concatenate strings
string_imatch Case insensitive string match testing
remlist Remove element from a list by name
setappend Append only new elements to a list (keeping it like a set)
stringsub Simple verbatim string substitution
**** New in version 2.1.0 Beta 1 *****
1. When using imported MIBD files using the "-1" convention to indicate
pedigrees with no genotypic data, you would get invalid LOD scores
because the -1 convention was not being applied to MIBD files.
Warning: at this time, you must not give matrices uppercase names such
as IBD or MIBD1 when using imported matrix files. (This applies to the
matrix identifier name used in the model files, not the matrix filenames,
which have always been case sensitive.) None of the built-in commands
will do this anyway, but you might do this if you were editing scripts or
models by hand. There is now a case sensitivity bug in applying only
the -1 convention. This will be fixed in a future update.
**** New in version 2.0.9 Beta 2 *****
1. Starting lower boundary for bivariate e2 parameters is now 0.03 (or
"e2lower") just as for univariate. This artificial boundary prevents
some convergence problems by preventing heritabilities from shooting up
to 1.0 (where there are often singularities) in one go. If maximization
stops on this artificial boundary, the boundary is moved, ultimately all
the way to zero.
**** New in version 2.0.9 Beta 1 *****
1. Fixed memory leak in "twopoint", and also any "load model/maximize"
sequence. Expansion per "load model" was equivalent to the length
of all ID's and FAMID's in the sample, typically about 20kB per
maximization. This could possibly lead to 100MB or more of wasted
space in a long running analysis, and therefore also to
paging/swapping and a major slowdown. "multipoint" had not been
as much affected by this leak, since it doesn't much reload models
except when there are convergence problems. This leak had been
present since 2.0.0 in 2002.
2. Alphanumeric entries in phenotype fields are now detected as errors.
Previously, if you had an alphameric string such as "null" it would
get read as "0", leading to incorrect results. Also, invalid
exponents (such as "D4") and other alphameric suffixes will get
detected as errors. Previously, invalid exponents and alphameric
suffixes would be ignored, having the same effect as "e0".
3. Tabs in most comma delimited files, including the phenotypes and
pedigree files, are now ignored just as spaces are. (Previously,
because of the bug fixed by change 2, trailing tabs were
inadvertently ignored but not leading tabs.) Note that the commas
are still required; tabs are not (as yet) read as delimiters in
phenotypes and pedigree files.
4. Clarified comments regarding kurtosis (see: stats) and the warning
about high residual kurtosis (see: polygenic note 5). Also updated
html documentation to include 2.0.7 and 2.0.8 updates.
5. Updated classes.tab and relate.tab to reflect some new relationship
names and indexing.
6. Re-write of relate code so it handles all possible relative classes
correctly. Previously, some complex relationships could have been
incorrectly characterized. However, since such relationships fall
into the "Unknown relationship" category and are not supported for
computing SOLAR's multipoint IBDs, the problems with the old relate
code did not have any impact on previously obtained results.
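Changes 2 and 3 of the 2.0.9 Beta 1 notes above concern how phenotype fields are validated. The following Python sketch shows one way such strict parsing could look. It is not SOLAR's implementation: the names parse_phenotype_field and NUMBER_RE, the regular expression itself, and the treatment of a blank field as missing are all assumptions.

```python
import re

# Optional sign, digits with an optional decimal point, optional e/E exponent.
# A Fortran-style "D" exponent such as 1.5D4 deliberately does not match.
NUMBER_RE = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def parse_phenotype_field(field):
    """Return a float, or None for a blank field.

    Leading and trailing spaces and tabs are ignored. Anything else that is
    not a plain decimal number raises ValueError instead of being read as 0.
    """
    text = field.strip(" \t")
    if text == "":
        return None  # assumption: a blank field means "missing"
    if NUMBER_RE.fullmatch(text) is None:
        raise ValueError(f"invalid numeric field: {field!r}")
    return float(text)


print(parse_phenotype_field("\t 12.5 "))  # 12.5
print(parse_phenotype_field("3e2"))       # 300.0
```

With this approach, strings such as "null" or "1.5D4" raise an error rather than silently becoming 0 or losing their exponent.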
**** New in version 2.0.8 Beta 2 *****
1. With all pedigree individuals typed, "ibd" would choose the "Monte Carlo
method" even with the "Xlinked" ibdoption. Now it correctly chooses the
Curtis and Sham method if you select the Xlinked ibdoption.
2. Memory violation with "ped show all" under certain circumstances fixed.
**** New in version 2.0.8 Beta 1 ******
1. Bug in freq mle (usually only seen on linux systems) fixed
2. Added -nostop option for bayesavg -qtn
3. Added c0v to bayesavg_cov.win file for bayesavg -qtn, showing posterior
probability of cov0.
**** New in version 2.0.7 Beta 3 ******
1. Bug in use of simqtl for bivariate traits fixed (caused crash)
2. Bug in loading haldane map fixed
3. Corrected max bits for Merlin and Genehunter
4. Enforced "ibddir or mibddir must be declared in session when writing
new ibd/mibd files" rule consistently.
5. Documentation Chapters 1-6 and 9 have been revised and formatted to
an improved style for clarity.
6. If simqtl genotypes were read from a file, that file would not get
closed. After multiple runs, user could run out of available files.
This bug has been fixed.
7. Output frequency of 10 chosen for Loki, which makes using Loki to
calculate MIBDs faster than it (Loki) was previously. This change
was recommended by Simon Heath, the author of Loki.
***** New in version 2.0.7 Beta 2 ******
1. Support for Merlin and Genehunter for mibd prep and mibd import
2. Added -byloc for mibd export to create a separate mibd export file
for each solar mibd (otherwise, the file can get so large as to
make file management difficult)
***** New in version 2.0.7 Beta 1 ******
1. Plotting for power calculation (plot -power) is now available which
plots power vs QTL heritability.
2. The restart for power calculations (power -restart) now permits changing
the number of replicates and grid sizes.
3. Dominance matrices (delta7 and d7) are now supported as intended by
twopoint and multipoint. Previously these "second column" matrices were
ignored by the custom parameterization (-cparm) options of twopoint
and multipoint.
4. Other problems with "second column" matrices are fixed, including one which
caused SOLAR to crash if you used one second column matrix to override
a previously loaded second column matrix, then did "model new".
Now, when you override a matrix through a second column entry, the old
one is deleted (or renamed if it is the first column and the second
column is loaded also) as required so that the result is unambiguous.
5. The -cparm options of twopoint and multipoint automatically
seek out the highest indexed ibd* or mibd* matrix for scanning, which
means you can scan custom parameterized models with multiple linkage
elements.
6. The linkmod -noparm option has been renamed "-cparm" for consistency
with twopoint and multipoint. This has the new behavior required
to support dominance and other second column matrices, and multiple
linkage elements.
7. The broken "-saveall" option of multipoint has been fixed, and a
-saveall option has been added to twopoint. These options save
all the linkage models tested.
8. The "Advanced Modeling Topics" Chapter 9 of the documentation has
been updated, clarified, and the examples presented (regarding
dominance, household groups, and custom parameterization)
have been tested to actually work as presented. There is now
advice on how to handle covariates with multiple "classes."
The command documentation for related commands has also been
clarified and updated.
***** New in version 2.0.6 Beta 3 ******
1. It is now possible to do "polygenic -testrhoe" and "polygenic -testrhog"
on bivariate household polygenic models. Also there is a new test,
"polygenic -testrhoc" that tests the rhoc parameter.
2. "twopoint -parm" has been renamed "twopoint -cparm" for consistency
with multipoint and the documentation.
***** New in version 2.0.6 Beta 2 ******
1. A "saved state" has been added to the field, mibddir, and ibddir
commands. This means that once you give one of these commands, it
remains in effect in all future sessions within the same working
directory. The state is saved in the files field.info, mibddir.info,
and ibddir.info, respectively.
In the case of mibddir and ibddir, the saved state only works for
reading purposes, not writing new ibd or mibd files. In order to
write ibd or mibd files, you must explicitly enter the ibddir or
mibddir commands within the current session. This is to prevent
you from accidentally overwriting previous files.
For the field command, it is now possible to restore a field to its
"full default" state using the -default argument. For the ID field,
for example, the "full default" allows either ID or EGO.
1. For the polygenic command, the "Residual Kurtosis" test now works as
intended. Previously, you were only getting the stats on the trait itself,
and no test was done if there were no covariates. Now you are
actually getting the stats on the residuals, unless there
are no covariates, in which case you get the stats on the trait
itself. Also there was a bug in writing the first line of the residuals
file if the -q option (used by polygenic) was used. And the warning
about residual kurtosis being too high points you to some helpful
suggestions which have been added to the "help polygenic" message.
2. sporadic command now properly does variance due to all covariates and
computes sporadic residual kurtosis. A bug in the last update broke
these. However, sporadic does NOT save or overwrite null0.out. It
is not appropriate to use a sporadic model as the null in linkage
testing.
3. rhoc is used for bivariate household models in place of rhoh (since
household parameter is c2).
4. The residual command defaults to using the output file null0.out
rather than poly.out. This makes more sense.
5. The "residual" command will not run on discrete models. It is not
appropriate. Likewise, the polygenic command will not attempt to
report the residual kurtosis for discrete models.
6. If you are analyzing a discrete trait as quantitative, polygenic
will not now attempt to compute the Kullback-Leibler R squared.
7. The note about loglikelihoods and chi^2's being written to
polygenic.logs.out has been made neater and is written to the terminal
as well as the polygenic.out file.
***** New in version 2.0.6 ******
1. Bivariate household (household polygenic, etc.) models are now supported.
(VERY EXPERIMENTALLY...NOT BETA TESTED BY REAL USERS YET.)
Previously, you would get a cryptic error message if you attempted
to add household effects to a bivariate model. That was because
none of the "house" and "polygenic" code had any inkling of how to
deal with bivariate household models. Much work was needed to add
this capability, it kept falling to the bottom of our priority
list. Also there is a new "-keephouse" option which forces the C2
parameter into models even if C2 is estimated at 0 in the household
polygenic model.
2. "polygenic" now permits covariate beta variables to be constrained
to particular values. When a model contains such beta variables,
the "variance due to all covariates" or the "Kullback-Leibler R-squared"
cannot be computed. Also, covariates with constrained betas cannot
be screened, but other covariates in effect may be.
3. One unanticipated form of convergence error could cause multipoint
scans to crash. This convergence error would take the form of
constraints not being satisfied. It was hoped to be able to fix
the problem altogether, but that turned out not to be possible within
current numerical limits. But now the problem is correctly handled
as an individual convergence error, and multipoint scans continue.
4. Various bugs and inconsistencies in the "stats" command have been fixed.
It should not crash in any eventuality. In some cases, uncomputable stats
are correctly shown as NaN (not 0, or crash). Alphanumeric fields are
identified as such. Discrete fields are identified as such, and discrete
coding errors are identified correctly following the same algorithm as
during maximization.
5. Discrete trait handling improved: An inconsistency in the identification
of discrete variables has been fixed. zscore is permitted for "discrete"
variables when they are analyzed quantitatively (by setting option
enablediscrete to 0). When discrete variables are being analyzed
quantitatively, a warning is given by polygenic. The discrete notes
have been updated to remove some areas of confusion. The warning
about discrete traits being analyzed quantitatively appears (just)
once in the multipoint.out file, and not in between every line of
multipoint scanning.
6. A bug in the use of the "field" command to permit a different label
to be used for famid has been fixed.
7. simqtl now properly closes input file after each use (eventually, you
could have lost the ability to open more files). map unloads current
map before loading new one. "mibd prep" checks for markers in map but
not in marker file, and also checks for zero recombination between
markers.
8. sporadic command has been merged with polygenic command to be more
consistent (and easier for us to update!). Some of the polygenic
output messages have been clarified.
****** New in version 2.0.5 or before *****
(Some of the following changes might actually have been made in releases
prior to 2.0.5, but were only documented for the 2.0.5 release.)
1. QTN
Quantitative Trait Nucleotide (QTN) analysis is provided for with the
"bayesavg -qtn" command and the qtnm command to do marginal testing and
plot results.
2. SOLAR user identification (matching key)
SOLAR KEY consistent with "su -c" (but possibly not "su")
Previous versions of SOLAR based the KEY verification on the
"username" or "login name" you ACTUALLY logged in on (regardless
of any "su" change) and the key found in the directory pointed
to by the HOME environment variable. In many cases, this worked
with "su" (because that doesn't update your HOME directory), but
not "su -c" (which updates your HOME directory). After much
agonizing, I decided it was more consistent to support "su -c".
So, the key can now match the identity you assume with
"su -c" or a similar command, so long as the matching key is found in
the current HOME directory (which is pointed to by the environment
variable HOME...use the shell command "echo $HOME" to see it).
3. Cryptic errors from the residual command when the phenotypes file
contains special characters (such as %) have been fixed.
****** New in version 2.0.5 ******
Note: This is the 3rd release of SOLAR 2.0.5 (beta), properly known
as SOLAR 2.0.5 "c".
Here is a summary of the updates in version 2.0.5 as compared with 2.0.4
(details follow):
* trait command changed (-noparm not needed)
* Univariate models compatible with 1.7.3
* Alternate parameterization incompatible with 2.0.4
* ibd,mibd import and export
* zscore command
* stats command
* empp command
* New Batch Interface
* multipoint -saveall
* showproc command (really new)
* bad script identified
* missing FAMID checked for
* "undefined" phenotypes file
* multipoint -cparm
* multipoint -link
* esd,gsd,qsd parameterization
* spurious multipoint LODs fixed
* polygenic command tests residual kurtosis
* gridh2r
* d7 convergence problem fixed
* Automatically updates "last updated on" date
* multipoint -plot fixed on linux
Now, here are the details (some of them, anyway):
* trait command changed (-noparm not needed)
It is no longer necessary to give the "-noparm" option to the trait
command when setting up models with non-standard parameterization.
The trait command no longer adds or removes parameters in your model
to correspond with the trait(s). If you are attempting to change from
a univariate to a bivariate model or vice versa, and trait dependent
parameters have already been created, you are now given an error
message telling you that you must use the "model new" command first.
At that point, you cannot inadvertently continue with invalid
parameters or the previous trait; any attempt to continue with a
maximization will repeat the message about the need for using the
"model new" command first. (The trait takes on a special value
"Must_give_command_model_new" in order to assure this. If you need to
recover this invalid model anyway, for the sake of previously maximized
parameters, you can save it to a file and edit it.)
You can still change from one univariate trait to another if desired,
but this is only recommended if the two traits have similar behavior.
During a univariate trait change, some parameter values and boundaries
will be adjusted. The documentation for the trait command gives more
details.
If you wish to set up the standard parameters (mean, sd, e2, h2r) for
the currently selected trait(s), you may use the "polygenic" command,
the "polymod" command (to set up the parameters without maximization),
or you may set up the parameters, constraints, and omega required by
hand. This was how it worked in earlier releases of SOLAR, but the
original solar2 assumed you wanted the standard parameters and set
them up for you whenever the trait command was given. I thought this
would simplify things for most people, but in the end this assumption
proved to cause too many problems and misunderstandings.
* Univariate models compatible with 1.7.3
As a result of the changes to the trait command, univariate models from
SOLAR 2.0.5 are now compatible with SOLAR version 1.7.3 and earlier releases.
* Alternate parameterization incompatible with 2.0.4
If non-standard parameters (such as esd, gsd, and qsd) are in your
2.0.5 model, it will not be compatible with 2.0.4, since that version
required the "trait -noparm" option for such models to be permitted.
* ibd,mibd import and export
The ibd and mibd commands now have new options that permit ibd and mibd
files from other software packages to be used in SOLAR, and SOLAR ibd and
mibd files to be used in other software packages. The new commands are:
ibd export
ibd import
mibd export
mibd import
mibd prep
See the documentation for further details.
* zscore command
The zscore command will compute statistics about your selected trait
and then automatically "zscore" it during maximization. This is a
transformation in which the trait mean is subtracted from each trait
value and then the result is divided by the standard deviation.
zscored = (value - Mean) / SD
The zscore status is stored in several model options (zscore, zmean1,
zsd1, zmean2, and zsd2, the latter two being for a second trait) and
can be adjusted if desired.
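The transformation can be sketched in a few lines of Python. This only illustrates the formula above and is not SOLAR's code. The function name zscore is reused here for convenience, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since these notes do not say which form SOLAR uses.

```python
import math


def zscore(values):
    """Return (zscored_values, mean, sd) for a list of trait values.

    Uses the sample standard deviation (n - 1 denominator); this choice
    is an assumption of the sketch.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0:
        raise ValueError("standard deviation is zero")
    # zscored = (value - Mean) / SD, applied to every value
    return [(v - mean) / sd for v in values], mean, sd


z, mean, sd = zscore([2.0, 4.0, 6.0])
print(mean, sd, z)
```

Returning the mean and SD alongside the transformed values mirrors the idea of keeping them (as in the zmean1 and zsd1 options) so the transformation can be inspected or undone.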
* stats command
The stats command can report basic statistics including mean, minimum,
maximum, standard deviation, skewness, and kurtosis about any (or all)
phenotypic variables. The default variable is the current trait. For
use in scripts, a helper procedure stats_get pulls named statistics out
of the record returned by stats.
* empp command
empp computes an empirical p-value from lodadj results.
* New Batch Interface
There is a new "batch" interface to solar. From the shell, you can now
invoke SOLAR with solar command(s) or the name of your script procedure
followed by its arguments, just as you would do at the "solar>" prompt:
solar2 [<command> [<argument>]+ ]
For example, you could invoke the tutorial script "makemibd" like this:
solar2 makemibd
You can even string together multiple SOLAR commands using the semicolon (;)
separator:
solar2 trait q4 \; covar age \; polygenic -s
You need to put a backslash in front of each semicolon because otherwise
the shell will interpret those as requiring the execution of new shell
commands.
You can still do things like this (as many Unix experts know):
solar2 << EOF
trait q4
covar age
polygenic -s
EOF
That is a feature of Unix shells and has nothing to do with SOLAR.
This is a change from how the argument(s) to solar2 were handled.
Previously, the arguments only allowed you to specify the NAME OF A
FILE containing a LIST OF SOLAR COMMANDS to process, not the name of a
SOLAR procedure ("proc"). This meant that if you wanted to execute a
predefined SOLAR proc, you had to create a second file simply
containing the name of that proc. This was very cumbersome and also
very hard to explain to people, who just wanted to use commands like
they do at the SOLAR prompt. Now you can do that.
* multipoint -saveall
The -saveall option of multipoint saves all models tested in a multipoint
analysis. It was present before but not documented.
* showproc command (was writeproc)
There is now a "showproc" command which will display any built-in or
user-defined procedure ("proc" or what most people call "scripts")
that is implemented in Tcl. You can also copy the script to a file.
The proc name will be suffixed with .copy so as not to collide with
the built-in script name.
showproc <procname> [<filename>]
If you do not specify a filename, you will view the procedure in your terminal
window using the more pager. For example, if you want to view the
"polygenic" procedure (which is very complicated!), you could give the
command:
showproc polygenic
Even if you don't fully understand the built-in procedure, it is
sometimes useful to look at the code. Unfortunately showproc does not
show comments or the exact "pretty" formatting of the code. The real
code looks a little nicer than what gets shown.
* bad script identified
When SOLAR finds a seriously bad script when starting, it will identify
the bad script by name (rather than giving a cryptic message like it
used to).
* missing FAMID field in phenotypes file now checked for
Often people use ID's which are only unique in each family. When
doing this, they must also include a FAMID field to disambiguate ID's
in both the pedigree and phenotypes files. Previously, however, only
pedigree files would be checked for having unique ID's. Sometimes people
would forget to include FAMID in the phenotypes file, or forget to
name the applicable field in the phenotypes file as FAMID. Now
SOLAR checks for missing FAMID when loading the phenotypes file by
searching for duplicate ID's. If FAMID is already present in the
phenotypes file, or if the pedigree file does not have FAMID, this
check is not required. It is usually very fast.
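The duplicate-ID search described above can be sketched as follows in Python. This is an illustration only, not SOLAR's implementation: the function names duplicate_ids and famid_missing and their arguments are hypothetical.

```python
from collections import Counter


def duplicate_ids(ids):
    """Return the sorted IDs that occur more than once."""
    counts = Counter(ids)
    return sorted(i for i, c in counts.items() if c > 1)


def famid_missing(phenotype_ids, pedigree_has_famid, phenotypes_have_famid):
    """Decide whether a FAMID field appears to be missing.

    The check is skipped when the phenotypes file already has FAMID or
    when the pedigree file has no FAMID, as described in the note above.
    """
    if phenotypes_have_famid or not pedigree_has_famid:
        return False
    # Duplicate IDs without a FAMID to disambiguate them signal a problem.
    return bool(duplicate_ids(phenotype_ids))


print(duplicate_ids(["1", "2", "1", "3", "2"]))
print(famid_missing(["1", "2", "1"], True, False))
```

Counting IDs in a single pass takes time proportional to the number of records, which is consistent with the check usually being very fast.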
* "undefined" phenotypes file
If there is an error loading the phenotypes file, the phenotypes file
status becomes "undefined," even if you exit from and restart SOLAR.
Previously, if you restarted SOLAR this "undefined" status would be
lost, and you might have continued with an obsolete phenotypes file.
* multipoint -cparm
The "-parm" option for multipoint has been renamed "-cparm" which signifies
"custom parameterization". See below for an example.
* multipoint -link
The -link option to multipoint allows you to specify a custom "linkage
model creation" procedure. This is useful if you use a custom
parameterization (such as esd,gsd,qsd) which is not supported by the
built-in linkage model creation procedure linkmod which uses the
standard parameterization. Two such procedures are "linkqsd" and
"linkqsd0" (see below).
* esd,gsd,qsd parameterization
The custom parameterization using esd, gsd, and qsd parameters (in
place of the standard e2, h2r, and h2q1 parameters) is now
EXPERIMENTALLY supported by built-in procedures polygsd, linkqsd,
linkqsd0, and gsd2h2r. Here is a sample analysis:
model new
trait q4
covar sex
polygsd ;# sets up esd and gsd parameters
maximize
save model q4/null0
gsd2h2r ;# reports equivalent h2r value
chromosome 9
interval 5
mibddir gaw10mibd
multipoint -link linkqsd0 -cparm {esd gsd qsd}
* spurious multipoint LODs fixed; polygenic h2r over 0.99
In some multipoint scans, positive LOD scores were being shown where
the h2q1 was zero. The true culprit was that the h2r in some polygenic
models was being improperly estimated as 0.99, which was not the
highest likelihood estimate. These invalid estimates were being
caused by singularities in the covariance matrix as h2r gets very
close to 1.0 when there are monozygotic twins.
This has been addressed in two ways. The default e2lower has been
raised from 0.01 to 0.03 which prevents the sum of all heritabilities
from being estimated over 0.97 on the first attempt. This stopped the
problem in the dataset at hand because h2r doesn't get close enough to
1.0 to cause singularity problems. Also, if h2r is estimated above
0.90 during the "polygenic" command, you now get a warning:
Warning. Unexpectedly high heritabilities might result from
numerical problems, especially if mztwins are present.
* polygenic command tests residual kurtosis
During a univariate polygenic analysis in which there are covariates, the
kurtosis of the residual is tested and reported. If it is above 0.8, you
get a special warning:
WARNING! Residual Kurtosis is 0.81, which is too high
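A minimal Python sketch of such a test is shown here. The 0.8 threshold is taken from the note above. Everything else is an assumption: the function names, the use of excess kurtosis (0 for a normal distribution) computed from population moments, and the exact message formatting. SOLAR's own estimator may differ.

```python
KURTOSIS_LIMIT = 0.8  # threshold quoted in the note above


def excess_kurtosis(residuals):
    """Population excess kurtosis: m4 / m2**2 - 3."""
    n = len(residuals)
    if n == 0:
        raise ValueError("no residuals")
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n  # second central moment
    m4 = sum((r - mean) ** 4 for r in residuals) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("residuals have zero variance")
    return m4 / m2 ** 2 - 3.0


def kurtosis_warning(residuals):
    """Return a warning string if the kurtosis exceeds the limit, else None."""
    k = excess_kurtosis(residuals)
    if k > KURTOSIS_LIMIT:
        return f"WARNING! Residual Kurtosis is {k:.2f}, which is too high"
    return None


print(kurtosis_warning([-1.0, 1.0, -1.0, 1.0]))
print(kurtosis_warning([0.0] * 8 + [-3.0, 3.0]))
```

The second sample has a few extreme values among many zeros, which is the heavy-tailed pattern that pushes kurtosis above the limit.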
* gridh2r
You run gridh2r after running polygenic to take a look at the
likelihoods for fixed h2r values around estimated h2r if you are
suspicious there was a problem with the h2r estimation. You can set
the upper bound, lower bound, and steps, or simply use the defaults.
* d7 convergence problem fixed
There was a bug in the code which was supposed to default missing d7 to the
delta7 value. This caused misconvergence in some dominance linkage
models.
* Automatically updates "last updated on" date
When starting, SOLAR now scans all the SOLAR binary files and the solar.tcl
file to determine the correct "last updated on" date shown in the
copyright message.
* multipoint -plot fixed on linux
Previously, multipoint -plot did not work on linux. In fact, once you did
a plot on linux, many things wouldn't work. That has now been fixed.
****** SPECIAL 2.0.4f update ******
1) h2q1 values for bivariate models were too low by an unpredictable
amount (some reports say a very tiny amount, but we're not sure
that's always true). LOD scores are unchanged. Sometimes convergence
is better.
2) hlod fixed for bivariate models or other models where phi2.gz is
explicitly included.
****** SPECIAL 2.0.4b update ******
This distribution includes the 2.0.4b bugfix update which fixes the following
bugs. When running, this version will identify itself as version 2.0.4,
but if you check the solar.tcl file it will be dated March 28.
1) p values for rhoe and rhog being different from zero were not being
calculated correctly. The previous values were too high. Better
(and correct) values are reported now.
2) If C2 in household polygenic models went to zero, the polygenic
command crashed. That has been fixed, and now also the
documentation clarifies that if C2 is 0, it is removed from the
model, regardless of whether the -screen option is in effect.
3) Some standard errors might be reported as 0.0000000. Now they are
reported as "not computable."
4) The Kullback-Leibler R^2 might be reported for non-discrete models
when the user has constrained SD to zero. Fixed.
****** NEW in SOLAR 2.0.4 *******
1. A bug disabling the variance component boundary management for bivariate
models, which was introduced in version 2.0.2, has been fixed in this
release. THIS MIGHT HAVE PRODUCED INVALID *BIVARIATE* MODELS and results
related to them. If you have any important models that were created by
release 2.0.2 or 2.0.3, you should do one of the following:
1) recreate those models using version 2.0.4
2) check all variance component parameters (e2 h2r h2q1 rhoe rhog rhoq1)
that they are not sitting on an artificial boundary. If you don't
know what this means, see option 3.
3) Reload those models in the new SOLAR 2.0.4, and run the
check_artificial_boundaries command. It returns a list of errors.
If it returns nothing, your model is OK.
2. Maximization of troublesome discrete models has been improved,
partly by relaxing the required convergence precision slightly, and
partly by handling "over the cliff" problems consistently in a
useful way. There may be very insignificant changes (in the third
or lesser significant digit) in typical cases, but that is a small
price to pay for much greater chance of proper convergence in the
difficult cases. (I can still concoct some models which don't
converge, but it's fairly difficult.) If you are still getting
convergence errors with any type of model, please bring them to the
attention of the SOLAR developers at solar@txbiomedgenetics.org. We are
getting a better handle on these things.
3. Unsatisfied constraints involving more than one parameter (such as
e2 + h2r + h2q1 = 1) are now identified clearly and conveniently during
maximization and don't cause bouts of unnecessary retries and are not
mistakenly identified as "CONVERGENCE FAILURE." (Unsatisfied
constraints involving only one parameter are automatically fixed...that
has been a feature of SOLAR for some time now. The parameter is simply
set to the value it is constrained to. But with multiple parameters, it
is non-obvious how this should be done, and is probably best left to the
user, though I may invent some script to make it easier for the user to
deal with in the future.)
4. Bugs in multipoint scanning on linux have been fixed. Any convergence
error would abort the scan, and proper sorting of the output files wasn't
done.
5. "marker discrep" and "freq mle" will complete an entire list of markers
rather than terminating at the first error.
6. detection and reporting of various errors has been improved for pedigree,
marker, ibd, and mibd commands.
7. New relative classes have been added.
8. Internal Tcl search PATH has been simplified, which may improve efficiency
when NFS is used and eliminate conflicts on sites where other Tcl
versions have been installed locally.
9. Bivariate LOD now automatically adjusted for constraint of rhoq1 (or
rhoqN) if present. Minor improvements to clod and lodp interface.
Buggy auto-constraint of rhoq1 for convergence purposes eliminated.
10. Bug in perturb fixed. Perturb also now has multiple phases to deal with
difficult cases where there isn't much "room" between the boundaries.
11. -key argument to SOLAR permits operation without a permanent
.solar_reg file. This is useful for remote job execution under some
circumstances, and reduces related NFS traffic.
12. install_solar has been improved, now permitting you to give special
names to solar releases. Internally we call SOLAR version 2.0.x
"solar2".
******* NEW in SOLAR 2.0.3 ******
SOLAR 2.0.3 now allows the same flexibility for bivariate models which
had been available for univariate models. In addition, there is a new
command "toscript" which automatically writes your commands into
scripts, and support of arbitrary non-standard parameterization and
better "mu" editing. Details are discussed below.
1. toscript command (writes commands to a script file)
2. Trait-specific covariates
3. Null covariates (useful for locking-in phenotypes)
4. Trait change cleanup
5. Constraint cleanup
6. Arbitrary parameterization
7. -parm option for multipoint and twopoint (and linkmod -noparm)
8. Omega "(ti)" and "(tj)" variables generalized
9. Bivariate (and Univariate) Mu's
10. Bugs fixed
1. toscript
The new command "toscript" will automatically create a script based on
the commands previously entered in this session or a set of ranges of
those commands. For example:
solar> trait q1 q2
solar> covar age sex
solar> toscript begin
Will create a script named "begin" including the two previous
commands. You may begin using the new script immediately, and it is
also saved in a file (begin.tcl in this example) in the current
directory. You may also specify ranges of commands to use in the
script. To see the numbered list of all commands (up to 200) in the
current session, use the Tcl "history" command. To overwrite a
previous script, use the "-ov" argument. For example:
solar> history
1 trait q1 q2
2 covar age sex
3 toscript begin
solar> toscript -ov begin 1-2
You can use any number of single command numbers or hyphenated ranges:
solar> toscript analysis 1-5 11-15 25 20
As the script is being created, you also see it displayed on your terminal.
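The way a list of single numbers and hyphenated ranges maps onto history entries can be sketched in Python. This is only an illustration, not SOLAR's implementation: the function name expand_command_numbers is hypothetical, and keeping the numbers in the order typed is an assumption, since the notes do not say how toscript orders them.

```python
def expand_command_numbers(specs):
    """Expand tokens such as "1-5" or "25" into a flat list of command numbers.

    Assumption: numbers are kept in the order given on the command line.
    """
    numbers = []
    for spec in specs:
        if "-" in spec:
            start_text, end_text = spec.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {spec}")
            # A hyphenated range is inclusive at both ends.
            numbers.extend(range(start, end + 1))
        else:
            numbers.append(int(spec))
    return numbers


# Mirrors the arguments of "toscript analysis 1-5 11-15 25 20".
selected = expand_command_numbers("1-5 11-15 25 20".split())
print(selected)
```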
2. Trait specific covariates
It is now possible to specify exactly which trait a particular covariate
should be applied to. Previously, all covariates were "generic" and applied
to all traits, which was not always desired. To specify the trait to which a
covariate should be applied, include the trait name in parentheses following
the covariate. For example:
solar> covariate sex age(q1) age*sex(q2)
Here "sex" is a "generic" covariate to be applied to all traits, "age"
is only applied to trait q1, and age*sex is only applied to trait q2.
If you change trait(s), the specific covariate betas will change
automatically (see section 4). The help documentation for the covariate
command has been updated and has more detail.
3. Null covariates (useful for locking-in phenotypes).
In addition to trait specific covariates, it is now possible to create
"null covariates" which don't apply to any trait. These are not true
covariates in the usual sense, but serve the same function of
"locking-in" a phenotype so that only individuals including that
phenotype are included in the analysis. For example:
solar> covariate dob()
This forces the requirement that all individuals in the analysis have
the variable dob, but no beta variable is created or estimated.
In the past, people have tried using "suspended" covariates (which
were really intended only for temporary hypothesis testing) and other
problematic ways to do this. "Null covariates" accomplish what is
required in an elegant way, and it fits in perfectly with how
trait specific covariates are handled when the trait(s) are changed.
4. Automatic trait change parameter cleanup
Previously if you ever changed trait(s) without giving the "model new"
command you could get into serious trouble, including fatal errors.
Now pretty much everything you absolutely need to do is taken care of
automatically. Covariates are "renewed" automatically. This means
that the old beta parameters are removed and new beta parameters are
created which correspond to the new traits. The covariate beta values
and boundaries are also reset (which is likely to be appropriate).
Finally, note that beta parameters are created only for those
covariates which are applicable to the new trait(s). Covariates which
were declared as specific to trait(s) not in the current model will not
have betas. For example:
solar> covariate sex age(q1) age*sex(q2)
solar> trait q1
Since trait q1 is now in effect, beta parameters are created for sex and
age, but not age*sex.
solar> trait q2
Since trait q2 is now in effect, beta parameters are created for sex and
age*sex, but not age.
solar> trait q1 q2
Since traits q1 and q2 are now in effect, all three covariates are
applicable, and sex is applicable to both traits.
Even if a covariate is "inapplicable" to the current trait, the underlying
phenotypic variable is "locked-in." This is appropriate to most analysis
situations. If this is not what you want, you can always delete the
covariate or start from scratch with "model new".
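The rule for which covariates receive beta parameters can be sketched in Python. This is a model of the behavior described above, not SOLAR code: the function names are hypothetical, and the parsing covers only the three forms shown in these notes (generic "sex", trait-specific "age(q1)", and null "dob()").

```python
import re


def parse_covariate(spec):
    """Return (name, trait): trait is None for a generic covariate,
    "" for a null covariate such as "dob()", else the trait name."""
    match = re.fullmatch(r"([^()]+)(?:\(([^()]*)\))?", spec)
    if match is None:
        raise ValueError(f"unrecognized covariate: {spec}")
    return match.group(1), match.group(2)


def covariates_with_betas(covariate_specs, traits):
    """List the covariates that get beta parameters for the given traits."""
    applicable = []
    for spec in covariate_specs:
        name, trait = parse_covariate(spec)
        # Generic covariates always apply; specific ones only when their
        # trait is in the model; null covariates ("") never get a beta.
        if trait is None or trait in traits:
            applicable.append(name)
    return applicable


specs = ["sex", "age(q1)", "age*sex(q2)", "dob()"]
for traits in (["q1"], ["q2"], ["q1", "q2"]):
    print(traits, covariates_with_betas(specs, traits))
```

In every case the phenotypes behind all four covariates would still be "locked-in" as described above; the sketch only models which betas exist.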
The variance parameters, trait parameters, constraint, and omega are
also re-created as needed for the new trait(s). Optional constraints
on any of the above parameters are removed when the parameters are
deleted and then re-constructed. (The removal of optional constraints
might not be what you want in all cases, but it does prevent a number
of problems. For example, if a previous univariate trait was discrete
and the new trait is quantitative, the old constraint on parameter SD
would get you into trouble.)
If you don't want the trait command messing with any of your
parameters, omega, or constraint, use the -noparm option. In that
case you had better know what you are doing. You could end up with
models that will not maximize or worse.
Another alternative is that you can still use "model new" to start from
scratch when you change the trait.
5. Automatic constraint cleanup
Whenever a parameter is removed, constraints on that parameter are removed
also.
6. Arbitrary Parameterization
SOLAR is now designed so that the "standard" parameterization is no
longer required. Parameters also need not necessarily appear in a
particular order (as was also previously required). If you want to create
a model with arbitrary parameterization, be sure to use the -noparm
option of the trait command when specifying traits. You will notice that
models also include this option because they might or might not include
non-standard parameterization.
solar> trait -noparm q1
Otherwise the standard parameters will be created in the standard order,
which is more convenient for most people most of the time.
It is not even necessary to have "mean" and "SD" parameters in models
anymore. Of course, all models must still have "omega" and "mu"
equations; those are actually required for variance analysis to work.
The "mean" could simply be a constant in the mu.
To facilitate the use of linkage models with arbitrary parameterization,
the linkmod command (which builds linkage models) now has a -noparm
option. With this option, linkmod adds a linkage matrix to the current model,
replacing the previous matrix (if applicable), but does not touch the
parameters, constraints, and omega in the model. It then becomes your
responsibility to set those things up correctly.
7. -parm option for twopoint and multipoint
To enable linkage scanning for models with non-standard parameterization,
the twopoint and multipoint commands now include a "-parm" option which
allows you to specify a list of alternative parameters to be displayed for
each model. For example:
solar> multipoint -parm {esd gesd}
This would print the values of parameters esd and gesd for each linkage model
(in addition to the loglikelihood and LOD, which are always included).
If you are using non-standard parameters, but there are no particular
parameters you need in the output file, you can indicate this with the
"empty list":
solar> multipoint -parm {}
When you use the -parm option, a non-standard parameterization is assumed.
The starting model MUST BE A CORRECTLY CONFIGURED LINKAGE MODEL. Scanning
is done simply by substituting one ibd or mibd matrix after another
into the model (using the "linkmod -noparm" command). There must also be
a maximized null0.mod model in the output directory, but it is only read
to find the null loglikelihood for LOD calculation. It is not used as
a basis for creating new linkage models, since SOLAR only knows how to do
that for the standard parameterization.
8. Omega "(ti)" and "(tj)" variables generalized
Bivariate models require (ti) and (tj) subscripted variables in the
omega for each variance component. Previously these (ti) and (tj)
variables were provided for only e2, h2r, and h2q1. Now they are
available for ANY parameter for which there is a subscripted version
for each trait. This allows for household effect parameters and any
other variance terms you can imagine. For example:
solar> trait q1 q2
solar> parameter c2(q1)
solar> parameter c2(q2)
After defining c2 for each trait, you can use variables c2(ti) and
c2(tj) in the omega.
9. Bivariate and Univariate Mu's
Previously there was no practical way to use mu in bivariate models.
They were broken in several different ways. Now there are "t1" and
"t2" variables which reflect the trait being estimated at the time.
(For example, t1 is 1 if the first trait is current, 0 otherwise).
solar> trait q1 q2
solar> covar sex
solar> mu
mu = \{t1*(+*Female) + t2*(+*Female)\}
There is now a much better explanation in the help message for the
mu command, but I'll go over it briefly here.
First notice that the default mu has separate terms corresponding to
traits q1 and q2, and that they are activated by the t1 and t2
variables. Following this example, you can create custom mu's for
bivariate models. (You don't actually have to follow this example
precisely, you could repeat the t1 and t2 terms as many times as you
like. But I thought this clear division of the parts for each trait
is easier to read. I would have preferred to create entirely separate
mu's for each trait, but that turned out to be much harder to do. I
hope to do that in a future release.)
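How the t1 and t2 indicators select the part of the mu belonging to the current trait can be sketched in Python. Everything named here is hypothetical: the per-trait mean and sex beta values are illustrative stand-ins, and the sketch assumes each trait's part is a mean plus a sex effect.

```python
def bivariate_mu(current_trait, female, mean, beta_sex):
    """Evaluate a two-trait mu for one individual.

    current_trait: 0 for the first trait, 1 for the second.
    female: 1 for female, 0 for male.
    mean, beta_sex: pairs of hypothetical per-trait parameter values.
    """
    # t1 is 1 only while the first trait is current; t2 likewise for the second.
    t1 = 1 if current_trait == 0 else 0
    t2 = 1 if current_trait == 1 else 0
    return (t1 * (mean[0] + beta_sex[0] * female)
            + t2 * (mean[1] + beta_sex[1] * female))


# Only the term for the current trait contributes to the result.
print(bivariate_mu(0, 1, (10.0, 20.0), (0.5, -1.0)))
print(bivariate_mu(1, 1, (10.0, 20.0), (0.5, -1.0)))
```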
Second I'll explain why you see the "\{" and "\}" (quoted curly brace)
delimiters around the entire default mu. YOU DO NOT NEED TO ENTER
THESE DELIMITERS IF YOU ARE BUILDING UP A NEW MU FROM SCRATCH. They
merely delimit the "default" portion of the mu which is determined by
the mean parameters and covariates. Previously I used "[" and "]" to
delimit the default mu, but that made it impossible to use verbatim.
You always had to edit those characters out. Now, you can keep them,
and the default portion of the mu in, if you would like, simply by
enclosing that portion between "\{" and "\}" delimiters as shown. It
is convenient to leave the default portion of the mu in because it
automatically changes based on the covariates selected. (I suspect
people are using "residuals" and other features because they have
never known they could do this.)
Suppose I want to keep the default part of the mu, but add an additional
"log(g)" term. I can do that like this:
solar> parameter g
solar> mu = mu + log(g)
solar> mu
mu = \{t1*(+*Female) + t2*(+*Female)\}+log(g)
This will *augment* the default mu, which is (now) still shown when you
display the mu. (Previously it became the cryptic "mu" term, and most
people didn't understand what was going on.) You can keep on adding terms
to the mu in this way. You can also cut and paste the entire mu shown and
edit it as desired. For example, imagine that I have pasted and then edited the
previous line as shown to change log(g) into log(1-g):
solar> mu = \{t1*(+*Female) + t2*(+*Female)\}+log(1-g)
solar> mu
mu = \{t1*(+*Female) + t2*(+*Female)\}+log(1-g)
SOLAR will accept this, and continue to understand that the portion between
the quoted curly brace delimiters is the default part which is dependent on the
covariates chosen. Editing that part will have no effect. If you want to
edit the entire mu, you must remove the delimiters.
10. HLOD now available.
Homogeneity/heterogeneity testing is now available in SOLAR. See the
documentation for the "hlod" command.
11. Bugs fixed
A cryptic error occurred when special characters (such as %) were included
in phenotype names and the "residual" command was used. An appropriate
error message is now given. Also there *may* be special characters in
the data file (for example, in strings) and the residual command will now
handle those properly.
Other bugs have been fixed also, but this is a very big update and it is
unfortunately quite likely that new bugs have been added.
Please report all bugs.
****** NEW in SOLAR 2.0.1 ******
Bivariate LODS are now reported as "1df equivalent" LODs which are comparable
to univariate LODS. See documentation for clod and lodp commands. You
should use the lodp command if you are intentionally constraining rhoq1
also.
pedlod and pedlike now provide accurate results so that when all
numbers are added up they should equal scores for the entire pedigree set
(except for small differences due to rounding errors). They are also
*much* faster, especially for large numbers of pedigrees.
There are now -testrhoe and -testrhog options for "polygenic" for
bivariate models.
ibd and mibd code is now fully updated with latest fixes (for x-linked
markers and some new relationship classes).
Probabilities are now reported with scientific notation if they get very
small.
linkmod -2p option for setting up twopoint models (replacing linkmod2p).
(Or, you can simply use the "twopoint" command; this now works).
To set traits with no parameters use "-noparam" option; this permits
arbitrary parameterization. Models are also saved this way, which is
why they can't be read by earlier releases.
Bivariate with FAMID fixed.
Do not use new "hlod" command yet. It is only experimental now.
Changes from the earliest distributions of 2.0.0:
Documentation is somewhat updated. The tutorial (Chapter 3) is revised
and corrected in a number of important places.
Too many fixes to remember.
**** New in version 2.0.0 ****
BIVARIATE ANALYSIS
Two traits may now be specified by the trait command, and the
resulting bivariate models may be used in standard SOLAR commands
as with univariate models.
"Unbalanced" traits (individuals having one trait but not the other) are
supported by default. However, there is an "UnbalancedTraits" option to
turn this support off if required.
X-LINKED MARKERS ARE NOW WORKING
Note: Changes between beta versions prior to version 1.7.3 are NOT
listed.
Changes to SOLAR 1.7.3 from 1.6.6
MUCH FASTER IBD CALCULATION
The Monte Carlo method of IBD computation (used when the Curtis and
Sham method is inappropriate) has been made faster by use of a
local likelihood approximation. In the imputation step, possible
genotypes are weighted by probabilities conditional on the
genotypes of immediate family members (parents, siblings, and
offspring) rather than conditional on the entire pedigree. For
large, complicated pedigrees the speedup can be quite substantial
(many fold). Though tiny differences might be observed in IBD
files, we have not found the approximation to change our analysis
results, which is the bottom line. Curtis and Sham IBD computation
has also been made faster through various efficiency measures.
MULTIPLE MARKERS MAY BE SPECIFIED
Multiple marker names may now be specified in IBD and MARKER
commands.
MATRIX MEMORY USAGE GREATLY REDUCED...FIXING SOME MATRIX BUGS TOO
When loaded into SOLAR, some matrices could become quite large:
hundreds of megabytes or more. Under some circumstances, you would
get an error about a matrix "not found" or "empty" when in reality
the problem was insufficient memory for loading. Improvements were
made to greatly reduce memory usage by matrices. In one typical
case, memory usage is reduced from 300 megabytes to 20 megabytes.
(The greatest reduction of memory usage comes when you have a large
number of separate families.) Also, if there is a memory shortage
when loading matrices, it is now correctly reported. In addition,
the Delta7 and D7 matrices are not loaded unless Delta7 or D7 is
actually included in the Omega. (This cuts matrix memory usage in
half regardless of family sizes.)
MORE DISCRETE MODELS NOW WORK
The changes made for matrices seem to have made some previously
uncomputable discrete models now maximize properly. Also, some work
was done to prevent discrete models from "hanging." It is still
true, however, that not all discrete models can be maximized by
SOLAR.
REDUCED NETWORK LOADING AND FASTER MAXIMIZATION
We found that dozens of large simultaneous SOLAR jobs were running
our network into the ground. Several changes have been made to
SOLAR to greatly reduce network loading. Some of these concern how
SOLAR finds its libraries and ancillary programs; it is now done
without unnecessarily searching directories (which on our network
and maybe yours are NFS mounted). File operations are now done
internally rather than using external programs such as "cp" and
"rm" which also caused much path searching over NFS. A very big
improvement was also made by eliminating the storage of pedigree
and phenotypic data in "scratchfiles" during analysis. This was a
carryover from core SOLAR routines which had been written in the
1980's to run on memory strapped PC's. These scratchfiles might be
read over and over hundreds of times for a single maximization,
resulting in tens of megabytes of file I/O. On a single machine,
this was not much of a problem because of Unix file caching. But,
across a network using NFS mounted directories, it was disastrous.
Elimination of the scratchfiles has also improved maximization
speed by about 20% even when the working directory is not NFS
mounted.
MUCH FASTER AND GENERALLY IMPROVED BAYESIAN MODEL AVERAGING
The bayesavg algorithm has changed so that it is not necessary to
compute standard errors for every model, particularly for the
"saturated" model (for which standard errors might not even be
computable). This doubles speed in typical cases. Also, there is
now a -max option to limit the maximum subset size, and this can
reduce computation time astronomically (!) when it is applicable.
Models are now built in an intuitive bottom up order. Much less
time and memory are required to generate the subsets
("combinations"). (These changes are also reflected in the
combinations command.) Additional changes include: starting models
with interactions or household effects can be used, all models in
the window are saved by default, saved models use constrained betas
instead of "suspended" covariates, the "best" model is loaded at
conclusion, the log_n and other information is shown during
operation. If you have used the bayesavg command before, it may be
a good idea to re-read the help documentation before using the
new and improved version in this release. Also, the numerical
results may differ slightly from previous versions of SOLAR.
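The effect of limiting the maximum subset size can be illustrated with a short Python sketch. It is not the bayesavg algorithm itself: the function names are hypothetical, and whether bayesavg counts the empty subset or orders subsets of equal size this way is an assumption. It only shows bottom-up enumeration and why a size limit shrinks the number of models so sharply.

```python
from itertools import combinations
from math import comb


def covariate_subsets(covariates, max_size=None):
    """Yield subsets in bottom-up order: smallest subsets first."""
    limit = len(covariates) if max_size is None else min(max_size, len(covariates))
    for size in range(limit + 1):
        for subset in combinations(covariates, size):
            yield subset


def count_subsets(n, max_size=None):
    """Count the subsets of n covariates having at most max_size members."""
    limit = n if max_size is None else min(max_size, n)
    return sum(comb(n, size) for size in range(limit + 1))


print(list(covariate_subsets(["age", "sex", "bmi"], max_size=1)))
print(count_subsets(20), count_subsets(20, max_size=3))
```

With 20 covariates there are 1048576 subsets in total, but only 1351 with at most 3 members.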
SPORADIC command
A new "sporadic" command has been added having most of the same
features (including optional covariate screening) as the
"polygenic" command. For some data, it is preferable to do
covariate screening with a sporadic model.
SIMQTL
simqtl genotypes are now written to a file named simqtl.qtl.
Previously they weren't being written out at all. simqtl has been
upgraded to handle MZ twins and multiple QTL's. A linux-specific
bugfix (also included in linux release 1.6.7) is added. AGE
covariate no longer required (nor is a dummy covariate).
ALPHANUMERIC CHROMOSOME NAME BUGS FIXED
Alphanumeric chromosome names (e.g. 2p and 2q) now work properly
with multipoint analysis and plotting. In multipoint output files,
a leading zero is now automatically prepended to single digit
chromosome names so that ordering is done properly with or without
the alphanumeric suffixes. However, you need not specify leading
zeros in your commands.
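Why a leading zero makes the ordering come out right can be seen in a small Python sketch. The function name padded_chromosome is hypothetical, and padding to two digits is an assumption based on "single digit chromosome names" above.

```python
import re


def padded_chromosome(name):
    """Prepend a zero to a single-digit chromosome number, keeping any suffix."""
    match = re.fullmatch(r"(\d+)(\w*)", name)
    if match is None:
        return name  # leave names that do not start with digits unchanged
    digits, suffix = match.groups()
    return digits.zfill(2) + suffix


names = ["10", "2q", "1", "2p"]
# A plain string sort puts "10" before "2p"; sorting on the padded form does not.
print(sorted(names))
print(sorted(names, key=padded_chromosome))
```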
SCALE command
There is now a command "scale" to change the adjustment of covariate
variables. Previously they were always adjusted to the sample
mean, and there was no way you could change that.
PLOT -STRING
Options -colorname, -linewidth, and -datestamp added.
MU NOW EXPLICIT
The MU (Mean Equation) is now written to all saved models.
It also shows all the actual covariate adjustments used.
CONSTRAINT DELETION MORE CONSISTENT
You can now delete a constraint by specifying its full left hand
side. Previously, you had to specify either an internal index, or
(possibly ambiguously) specify just one left hand term.
TWOPOINT WORKS WITH PEDLOD
TWOPOINT saves and loads best model when finished. As a result,
PEDLOD or PEDLIKE may be run afterwards.
999 MZ TWINS SUPPORTED (previously only 99)
BUG FIXES:
RELATIVES (and RELPAIRS) command returns correct counts (previously
counts were off by one per family so the error could be quite
large for analyses based on nuclear families or sib pairs).
PEDLOD and PEDLIKE work when FAMID is used.
MULTIPOINT -EPISTASIS fixed for discrete traits.
BAYESAVG gets better results because boundary problems fixed.
PEDLOD may be run after TWOPOINT.
Convergence errors now shown on string plots (previously crashed).
Fewer than 10 chromosomes may be string plotted (previously garbled).
Better handling of files passed through Microsoft Windows.
Changes to SOLAR 1.6.6 from 1.5.7
DISCRETE TRAITS:
SOLAR now supports a single discrete trait using the liability threshold
model. A discrete trait is automatically detected as two integer values
separated by 1. THIS IS CURRENTLY EXPERIMENTAL, AND IS KNOWN TO PRODUCE
INCORRECT RESULTS IN SOME CASES. There are two discrete trait
implementations included whose results can be compared (the default one
is more reliable but possibly less accurate due to numerical problems).
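The detection rule stated above (two integer values separated by 1) can be written as a short Python check. This is a sketch of the rule as worded here, not SOLAR's code: the function name is hypothetical, and missing values are assumed to have been removed beforehand.

```python
def looks_discrete(values):
    """Return True if the trait has exactly two distinct integer values
    that differ by 1 (for example 0/1 or 1/2)."""
    distinct = set(values)
    if len(distinct) != 2:
        return False
    # Both values must be whole numbers, whether stored as int or float.
    if any(float(v) != int(v) for v in distinct):
        return False
    low, high = sorted(distinct)
    return high - low == 1


print(looks_discrete([0, 1, 1, 0]))
print(looks_discrete([0.5, 1.5, 0.5]))
```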
STRING PLOTS AND MINI PLOTS:
A "string plot" is a condensed method of showing the results of a
genome scan (one pass) on one page. The command is "plot -string."
Miniplots are ordinary one-chromosome plots shrunken down so that an
entire genome scan can fit on one page (even if some of the legends,
etc, are unreadable). The command is "plot -all." Miniplots
require that Python 1.5.2 or later be installed on the user's system.
We prefer string plots, for which no external software is required.
EMPIRICAL LOD ADJUSTMENTS can now be calculated and applied to
all LOD scores. See "help lodadj" for more details. (This feature
may have been present but was not supported and was incomplete in
earlier releases.) There is also a command "madj" to apply a
LOD adjustment to previously computed multipoint files.
DOCUMENTATION BROWSER:
The documentation has been restructured and formatted in html. The
"doc" command starts a Netscape browser (if available) to read the
documentation. The "example" command copies the standard example
to the working directory.
TRAIT SIMULATION:
"simqtl" command added to simulate one or more quantitative traits
controlled by a QTL with (optionally) a linked marker locus. This
is an early release and does not yet handle MZ twins correctly,
multiple QTLs are not implemented, and QTL genotypes cannot be read
in from a file.
MARKER TESTING:
Marker files can now be checked using the "markertest" command. If
errors are found, markertest will try to find the discrepant
individual, family, pedigree, or pair.
MATH FUNCTIONS IN OMEGA AND MU EQUATIONS:
Math functions (all of C's math library including exp, log, sin, and
many more) now work in both Omega and Mu equations. (This was an
advertised feature that didn't work previously.)
PHENOTYPIC VARIABLES AVAILABLE IN OMEGA
Phenotypic variables now allowed in the omega equation. Must specify
varname_i or varname_j for individual i or j. Also may use
x_varname, min_varname, max_varname. For sex, use female_i, male_i,
female_j or male_j.
PER-PEDIGREE LODS:
Added new command "pedlod" which computes pedigree-specific LOD's.
PER-PEDIGREE LIKELIHOODS:
Added new command "pedlike" which computes pedigree-specific
loglikelihoods.
RELATIVE PAIRS:
Added new command "relatives" which lists relationships of relative
pairs included in an analysis (those having all variables).
PLOTTING LIABILITY:
"plot -liability" command plots discrete trait liability vs. age. This
is automatically separated into male and female curves if there is
a sex or age*sex covariate.
MITOCHONDRIAL IBDS:
Added new command "ibd mito" for computing mitochondrial IBDs.
MORE NEW PLOT FEATURES:
Added -min and -max arguments to ordinary "plot" command to zoom in on
a desired range. If no chromosome is specified, the one with the
highest LOD score is plotted. Plot files may be written to postscript
files with the -write option. (See also new types of plots above.)
Plot command checks for most errors before putting up ACE/gr window.
Error checks now include check for presence of specified chromosome
number in multipoint run. On Solaris and Alpha platforms up to
1000 marker ticks can be shown. Marker labels and/or ticks can be
optionally suppressed.
USING VARIABLE STATISTICS:
Basic statistics (mean, min, max, std deviation) about variables
can be retrieved from maximization output files using the "getvar"
command. (There has long been an "oldmodel" command to retrieve
parameter information from model files.) This makes it easier to
write certain kinds of scripts (like the "residual" command).
UNKNOWN RELATIONSHIPS:
Pedigrees containing "unknown" relationships (usually involving
inbreeding) are now permitted for twopoint analyses. The user is
warned that multipoint analyses are not possible.
CODING MISSING GENOTYPES:
Missing genotypes may now be coded as '-/-' as well as blank and '0/0.'
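A script that pre-checks marker data could recognize the three missing codes as follows. This Python sketch is illustrative only: the function name is hypothetical, and stripping surrounding whitespace before comparing is an assumption.

```python
# The three ways of coding a missing genotype listed above.
MISSING_CODES = {"", "0/0", "-/-"}


def is_missing_genotype(field):
    """Return True if a genotype field uses one of the missing codes."""
    return field.strip() in MISSING_CODES


for field in ["-/-", "0/0", "   ", "123/127"]:
    print(repr(field), is_missing_genotype(field))
```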
IDENTITY-BY-STATE:
Added a new command, ibs, which computes IBS (identity-by-state) matrices
for markers in the ibddir directory. This command should be considered
preliminary.
HALDANE MAPPING FUNCTION:
By default, the locations in the map file are mapped to recombination
fractions using the Kosambi mapping function. It is now possible to
use the Haldane mapping function instead (see "file-map").
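For reference, the two mapping functions in their standard textbook forms are sketched below in Python. The function names are hypothetical, map distances are assumed to be in centimorgans, and how SOLAR applies the functions internally is not described in these notes.

```python
import math


def kosambi_theta(distance_cm):
    """Kosambi: recombination fraction = 0.5 * tanh(2d), d in Morgans."""
    morgans = distance_cm / 100.0
    return 0.5 * math.tanh(2.0 * morgans)


def haldane_theta(distance_cm):
    """Haldane: recombination fraction = 0.5 * (1 - exp(-2d)), d in Morgans."""
    morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * morgans))


# Kosambi allows for interference, so it gives the larger recombination
# fraction of the two at the same map distance.
for cm in (1, 10, 50):
    print(cm, round(kosambi_theta(cm), 4), round(haldane_theta(cm), 4))
```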
CASE-INSENSITIVE MARKER NAMING
Marker names are now treated as case insensitive in commands. The
name as specified in the marker file is the "official" name used by
the IBD files created. The map file may also use a different case.
MIBD WINDOW:
You can now specify an MIBD window. The MIBDs at a given chromosomal
location will depend only on those markers that lie within the window.
This may give a large speedup when computing MIBDs but should be used
with caution.
RESIDUAL COMMAND FIXED:
The "residual" command had several bugs which were fixed. In some
cases you might have gotten only a few lines of results or no results
at all.
ID "BUG" FIXED:
It is permitted now for the ID fields in pedigree and marker files
to have different max widths. (This can happen if either file contains
more ID's than the other.) Differences in formatted width (e.g. in
a PEDSYS file) are ignored.
NEW TRAIT "BUG" FIXED:
Scripts resetting trait (without doing a full "model new") will now work
although this is still not recommended procedure unless traits are
similar in genetic properties (variance component parameters are not
reset).
IMPROVED BOUNDARY HANDLING:
To handle problems with difficult likelihood spaces better, existing
artificial boundary handling has been improved and there are new
options to the "boundary" command: "boundary wide," "boundary null,"
and "boundary quadratic tol." In some cases the quadratic values
were tested when they shouldn't have been, leading to premature
"convergence error" failures; in other cases restarts occurred on
boundaries (leading to infinite runs that appeared like SOLAR was
stuck), and so on. There are now two tiers of boundary "crunching"
instead of just one. The loglikelihood must remain within 9 digits of
accuracy from the "best" iteration to the last one or a warning is given
(for "maximize" or with "verbosity max").
COVARIATE BOUNDARY ERRORS FIXED:
In some cases, covariate boundary errors were occurring. The defaults
for the covariate boundary adjustment and retry mechanism (from 1.4.x)
were increased to 10 retries with an expansion factor of 5. These
values are now adjustable through the 'boundary cov retries' and
'boundary cov incr' commands. The new default values handle currently
known problem cases without further adjustment.
ALPHANUMERIC CHROMOSOME LABELS:
Chromosome labels may now include alphabetic and numeric characters
and underscore. Alphanumeric labels may be used in the "chromosome"
command and plot command, but may not be included in chromosome "ranges"
(e.g. instead of chromosome 1-23, now say chromosome 1 2 2p 3-23).
HEADER SPACING ALLOWED:
Comma delimited files with padded 'ID' columns now work OK
PEDSYS files with padded 'ID' columns (matched with unpadded ID comma
delimited marker files) now work OK
COMMENTS ALLOWED IN DATA FILES:
Comments (leading #) and blank lines now OK in comma delimited files
MISSING PEDIGREE FILES HANDLED NICELY:
Improved handling of corrupted/missing state files (pedigree.info,
freq.info, marker.info). In particular, pedindex.out isn't deleted
if one of these files is missing or corrupt.
MARKER FILES MAY CONTAIN PEDIGREE FIELDS:
The marker load command now ignores fields in a marker file which
are named FA, MO, SEX, MZTWIN, HHID, or AGE. Previously, all fields
other than ID and FAMID were considered to be genotype fields.
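
The field-selection rule can be sketched in a few lines. The names
below are hypothetical, and case-insensitive matching of field names is
an assumption not stated in the note.

```python
from typing import Iterable, List

# Fields that are not genotype data, per the note above.
PEDIGREE_FIELDS = {"ID", "FAMID", "FA", "MO", "SEX", "MZTWIN", "HHID", "AGE"}


def genotype_fields(header: Iterable[str]) -> List[str]:
    """Return the header names that would be treated as genotype fields."""
    return [
        name for name in header
        if name.strip().upper() not in PEDIGREE_FIELDS
    ]
```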
MULTIPOINT PROMPTING
You can now enter the "multipoint" command and be prompted to enter
chromosome, mibddir, and interval rather than just getting error
messages for each item you forgot to enter.
IBD BUGS FIXED:
A bug in the computation of IBD's for some MZ twins was fixed. (This
bug had not been reported by any users.) Portability is improved by
use of printf in the domcibd script.
IBD CALCULATION MORE EFFICIENT:
IBD calculation (command: 'ibd') has been made more efficient when the
Monte Carlo method is used.
IBD MESSAGES FIXED:
If parental IDs were longer than ego IDs, this caused 'load pedigree' to
give a misleading error message. The message is now helpful.
"LOAD MARKER" MESSAGES IMPROVED:
The "load marker" command now displays a message indicating whether the
marker allele frequencies were read from a previously loaded freq
file or are being estimated from the marker data. Also, if the
allele freqs are MLEs computed from old (now unloaded) marker data,
a warning to that effect is given. Such allele freqs are now called
"stale MLEs" rather than "old MLEs" as before.
FREQ BUGS FIXED:
If allele freqs are loaded from a freq file rather than being estimated
from the marker data, it is possible to have frequency data for more
alleles than are actually present in the marker data. (The converse
would be an error - every allele in the marker data must have an entry
in the freq file.) If MLE allele frequencies are then computed, the
alleles not present in the marker data will have frequencies of 0.
These 0 frequencies caused problems with other commands, such as "ibd"
when using the Monte-Carlo method. These problems have been fixed.
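
To see how zero frequencies arise, consider the sketch below. It uses
simple allele counting rather than the MLE procedure (a deliberate
simplification), and the function name and "A/B" genotype coding are
assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_allele_freqs(
    genotypes: Iterable[str], freq_file_alleles: List[str]
) -> Dict[str, float]:
    """Estimate frequencies by counting; alleles absent from the data get 0."""
    counts: Counter = Counter()
    for genotype in genotypes:
        for allele in genotype.split("/"):
            allele = allele.strip()
            if allele:
                counts[allele] += 1
    # The converse case is an error: every allele seen in the marker data
    # must have an entry in the freq file.
    unknown = sorted(set(counts) - set(freq_file_alleles))
    if unknown:
        raise ValueError(f"alleles missing from freq file: {unknown}")
    total = sum(counts.values())
    return {
        allele: (counts[allele] / total if total else 0.0)
        for allele in freq_file_alleles
    }
```

Any downstream code that divides by, or takes the logarithm of, an
allele frequency has to allow for the 0.0 entries this produces.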
MIBD CHECKING:
The mibd command now ensures that the mean IBD file (mibdchrN.mean) is
newer than the merged marker-IBD file (mibdchrN.mrg.gz). If the mean
file is older (or not found), it is recomputed. The mibd command also
ensures that the merged marker-IBD file is newer than any of the IBD
files for markers in the map file. If the merged marker-IBD file is out
of date, an error message is displayed, and "mibd" cannot be done until
"mibd merge" is done (which is described in the message). If, in fact,
the IBD files have not changed, but merely have a more recent date due
to copying, the user can simply run 'mibd merge' and proceed. There is
also more extensive checking for "mibd merge."
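
The timestamp checks described above amount to a make-style freshness
test. The sketch below illustrates the idea using file modification
times; the function names and return strings are hypothetical, and
treating a missing merged file as out of date is an assumption.

```python
import os
from typing import Iterable


def is_stale(target: str, sources: Iterable[str]) -> bool:
    """True when `target` is missing or older than any of `sources`."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_time for src in sources)


def check_mibd_files(mean_file: str, merged_file: str, ibd_files: Iterable[str]) -> str:
    """Decide what has to happen before multipoint IBDs can be computed."""
    # The merged marker-IBD file must be newer than every marker IBD file.
    if is_stale(merged_file, ibd_files):
        return "error: run 'mibd merge' first"
    # The mean IBD file is recomputed if it is older or not found.
    if is_stale(mean_file, [merged_file]):
        return "recompute mean"
    return "ok"
```

Because only modification times are compared, copying the IBD files
makes them look newer, which is why rerunning 'mibd merge' clears the
error in that case.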
RESOURCE LEAKS FIXED:
File descriptor "leak" in polygenic command fixed. (If you ran
polygenic 50 times, you would run out of file descriptors.)
"FILE" COMMANDS
The file formats are described by "file-*" commands (e.g. file-freq)
replacing the old "notes-*" commands.
HOUSEHOLD-PEDIGREE MERGING INFORMATION
If household-pedigree groups were merged, this information is shown in
the results of the "polygenic" command. A household-pedigree
group merging bug (unusual case) was fixed. There is an option to
show the merged groups ("option HouseGroupShow on").
CONSTRAINTS EASIER TO MODIFY AND DELETE
You can now "replace" a constraint with a new one...the obsolete one
is automatically removed. Parameters having embedded *'s may be
included in constraints using <>.
ALNORM command added to evaluate the tail of a normal curve.
AND MANY MORE BUG FIXES AND CHANGES
And no doubt some new bugs too.
Changes to SOLAR 1.5.7 (from 1.5.6)
1. Faster twopoint performance
A bug was fixed which had been causing 4x slower twopoint analyses in
version 1.5.5 than in 1.4.0.
Changes to SOLAR 1.5.6 (from 1.5.5)
1. DEC Alpha zombie bug fixed
A bug was fixed (for DEC Alpha systems only) in which SOLAR would create
large numbers of zombie processes.
Changes from SOLAR 1.5.5 to 1.5.4
1. Twin Bug fixed
Linkage analyses could not be done for pedigrees including MZ Twins.
The table-driven ibd code (introduced in version 1.5.0) did not handle
mztwins at all, due to a bug in multipnt.c and classes.tab, now fixed in
SOLAR 1.5.5.
2. Linux edition IBD bugs fixed
Bugs making IBD files using the "Curtis and Sham" method (the default
in most cases) fixed on Linux (bugs existed only in early Linux versions).
Changes from SOLAR 1.5.4 to 1.5.3
1. The documentation was updated and enhanced for version 1.5.x. Notably
full documentation for the input file requirements (which used to only
be available in help messages in earlier releases) and matrix file
contents were added (ibd, mibd, phi2) for the benefit of those doing
more sophisticated analyses (dominance, etc.).
2. Attempting to do a quantitative analysis of a discrete trait results
in a warning (as intended for 1.5.3), not a fatal error (a bug that
crept into 1.5.3).
3. If the correct model parameters have not been set up, the user is now
advised to run the "polygenic" script first (previously automodel and
polymodel were advised, but "polygenic" is now the recommended approach).
4. Support for Linux (Intel) added.
Changes from SOLAR 1.5.3 to 1.4.0
1. Household effects
There is a new command 'house' which sets up a C2 parameter
for any common environmental effect. The pedigree file should have
a 'HHID' field in order for the required 'house.gz' matrix file
to be created during the 'load pedigree' command.
The 'polygenic' command will determine the significance of the
household effect if the 'house' command was given. Multipoint and
other commands will create models including the household effect.
Note that in some cases individuals in different "pedigrees" may
share the same household. (This happens for "marry-ins," for
example.) In order to get the best estimate of the household
effect, pedigrees sharing households are automatically merged
during maximization. (This feature was added after version 1.5.0.)
This feature can be turned off with the MergeHousePeds option.
Alternatively, all individuals may be included in the same
pedigree with the 'MergeAllPeds' option. Either kind of merging
has little or no effect on hereditary estimates, but can make SOLAR
run more slowly.
2. No arbitrary limits on sizes
Previously there were some hard-coded limits for the number of
individuals in a pedigree, number of alleles, etc., particularly
for commands related to creation of IBD and MIBD matrices. The
limits arose because some of the old FORTRAN programs used had
fixed array sizes. These programs have been rewritten to use
dynamic memory allocation instead, so there are no arbitrary limits
on any sizes (or if there are, they are way out there). You should
not run into any arbitrary limits any more.
3. Epistasis and other special kinds of covariance
Epistasis effects are automatically handled by the new -epistasis
argument of multipoint. In this case, the starting model is not
a null0 polygenic model, but a model including one or more linkage
elements. One of those is chosen by the -epistasis argument to
be applied to each QTL in the multipoint scan with an added epistatic
interaction component (h2qe1).
The multipoint and twopoint procedures now preserve special
elements in the covariance (omega) equation and constraints. This
makes possible the inclusion of household effects, epistasis, and
other effects such as dominance (though dominance terms must still
be set up manually by the user). Previously, multipoint simply
clobbered whatever special elements the user had set up with a
standard e2, h2r, h2q1, h2q2, ... series.
4. Grid command and twopoint -grid
The highest likelihood in the vicinity of a marker can be found with
the new grid command which searches recombination fractions of
the marker. There is also a twopoint -grid option which will do
this for every marker in a twopoint scan.
5. Multipoint and twopoint require 'polygenic' be run first
Multipoint and twopoint procedures now require that 'polygenic' is
run first to do a polygenic analysis. 'polygenic' now creates a
model named null0 which is now required by multipoint and twopoint.
Most users were running polygenic first anyway. If you forget,
multipoint and twopoint will now remind you to run polygenic first.
Previously twopoint and multipoint would create a null0 model using
a procedure similar to (but not necessarily identical with)
polygenic and possibly clobbering the null0 model that had
previously been created.
6. chi -inverse and better chi
There is now a chi -inverse option, and the chi procedure itself
is a better one (you may notice changes in the last printed decimal
place in reported p values).
7. Fully automatic array allocations during maximization
Memory allocation for maximization is now fully automatic. Previously,
with some very large pedigrees or other input data, the user might have
to set some obscure options to increase array allocations, and to
make matters worse...the required allocations were sometimes
underestimated...resulting in fatal or other errors. Now the arrays
are tested after maximization to be sure memory overwriting did not
occur.
8. Support for Digital Alpha Unix added
9. Relative-class info in tables
Information about the types of relatives handled by SOLAR used to
be hard-coded but is now read from tables. This means that adding
new classes or modifying existing classes can be done by
updating these tables; it isn't necessary to recompile SOLAR. A
few new relative classes have been added since 1.4.0, and we can
(relatively) easily add new classes if needed by any users. SOLAR
will tell you if you need a new relative class.
10. Quadratic tested
The final normalized quadratic (which should be close to 1.0) is
now checked. With some data, maximization would abort prematurely
with poor quadratic values. Now there is an automatic retry mechanism
to force the quadratic to a good value, or fail with an error message
if that proves impossible. There is also a "quadratic" command to
get the last quadratic value.
11. Comma Delimited Input File Support Improved
Comma delimited files may now have type:name pairs in the first (header)
line. Segmentation violations related to reading some files fixed.
12. PEDSYS Input File Support Improved
PEDSYS 'standard' mnemonics (e.g. INFERD) are now properly ignored in
any mnemonic position (previously, they were only ignored in the last
position).
PEDSYS mnemonic names in successive fields are concatenated even if
they are shorter than 6 characters. Many users liked to segment their
field names this way.
13. Pedigrees identified by first PID
Pedigrees skipped (because of missing data) or having errors detected
during the maximization phase (rare) are now identified by the first
PID rather than by the PEDINDEX number. (The PEDINDEX number had no
connection to the user's pedigree numbering.)
14. A warning is given if trait is discrete
The analysis of discrete traits by liability threshold model is not
yet fully integrated with the Tcl-based public version of SOLAR.
If the user attempts to analyze a discrete trait, a warning is now
given, though the analysis is done anyway without liability threshold
model. (Previously SOLAR would ignore whether the trait was discrete
or not.)
15. Added drand command to get random numbers
16. linkmod and linkmod2p interfaces improved
'linkmod' is the command which sets up (but doesn't test or maximize)
a linkage model with a particular MIBD matrix. linkmod2p does this
for a twopoint IBD matrix. These commands have been made relatively
easy to use by users (they no longer require global variables set up
by the multipoint and twopoint procedures). Thus it is now much easier
to write custom IBD/MIBD scanning procedures.
17. Help documentation for SOLAR model options added
18. Bugs in the summary statistics for excluded pedigrees fixed
19. Solaris Workshop 5 C++ library check and other startup issues
SOLAR is now compiled (on Solaris SPARC) with Solaris Workshop 5
compiler. This may provide better performance and reliability, but
it also required considerable recoding to comply with the new C++
standards. Also, a C++ library patch is required from Sun for Solaris.
SOLAR checks for the required library patch on startup and tells you
what is needed. The SOLAR main binary is now called 'solarmain' to
avoid confusion with the 'solar' shell script which starts SOLAR.
20. Convergence, Boundary, and other maximization errors properly named.
21. Output formatting of polygenic command improved.
22. Fixed bug when not reloading phenotypes after loading pedigree.
23. Fixed bug setting omega to 3 character expression.
24. Fixed bugs with residual command: when phenotypes file has famid field,
case sensitivity, skipping too many individuals, and the expression
used to calculate residual was fundamentally wrong.
Changes from SOLAR 1.4.0 to 1.3.0
The changes in this version were considered so vital that the previously
anticipated public release was delayed for a few weeks so that they
could be included (rather than wait for the next major release in a few
months).
INCREASED LOD SCORES AND H2Q COMPONENT VALUES
LOD scores might increase (in rare cases) and QTL positions might even
change (in very rare cases) as a result of these changes.
It turned out that the old retry mechanism did not detect all artificial
boundary conditions. In particular, supposedly maximized linkage models
could hit an artificial lower bound for E2, silently preventing the H2Q
component from reaching their maximum values. This would cause reduced LOD
scores as well.
You can check earlier models to see if E2 is at an artificial lower bound
(higher than 0.0). If it is, then you need to re-maximize the model using
version 1.4.0 or higher.
Models with this problem would be likely to have unusually steep LOD score
curves (and relatively high LOD scores) in the first place. This is
not going to cause new LOD score peaks to arise, just make very steep peaks
even steeper (and possibly even change slightly the positions of the
summits -- in other words -- the QTL positions).
BOUNDARY COMMAND AND AUTOMATIC RETRIES
A new 'boundary' command has been added, replacing the slew of various
artificial boundary heuristics (h2qf h2rf e2lower and e2squeeze) used to
assist in convergence control through the use of artificial boundaries.
But more importantly, there is now a retry mechanism so that whenever an
artificial variance component boundary is hit the boundary is automatically
changed. Because of this mechanism, it is believed that 'Boundary Errors'
(at least the ones related to variance components) should never occur
again, and that if 'Convergence Errors' ever occur they can be dealt with
more easily. Because of the retry mechanism, the default heuristic
values have been set very low to make convergence errors very unlikely
also.
The heuristically set upper bound for h2q parameters now floats from
one locus to the next. This floating action (governed by the command
'boundary float upper') has replaced the old 'h2qf' command (which turned
out to be very problematical in many cases...there was no one value which
worked across the genome). The h2qf command is now silently ignored.
The other heuristic commands still exist, but they are best used through
the more intuitive 'boundary' command interface.
The new retry mechanism now applies to all maximization commands, including
a single 'maximize' command (for which previously no heuristics or retries
were applied in earlier versions).
MULTIPOINT -RESTART FIXED
It turned out that if a multipoint scan was restarted (with the command
'multipoint -restart'), it would ignore models for which a boundary or
convergence error occurred. This meant that if you wanted to redo those
models, you needed to edit the files such as multipoint1.out and remove
those models so they would be redone. Now a restart will re-maximize all
erroneous models automatically, and no file editing is needed.
Changes from SOLAR 1.3.0 to 1.2.1
COMMAND SHORTCUTS, USAGE, AND HELP
Shortcuts for command names may now be used in scripts. For example, a
script may use the command 'mul' for 'multipoint' just as at the command
line. (This applies only to SOLAR commands, not Tcl commands.)
There is now a 'usage' command which gives an abbreviated form of help for
any command. Unlike 'help,' the usage information stays on screen as a
memory aid while you type the next command.
There is also a 'shortcut' command which shows the legal shortcuts allowed
for any SOLAR command. The legal shortcuts will also be shown by the
'help' and 'usage' commands.
Previously some commands required plural forms (such as 'phenotypes') while
others required singular forms (such as 'parameter'). Now, both singular
and plural forms are allowed for many commands, and, since they can all be
abbreviated in scripts anyway, you may stick with the singular forms.
PLOT improvements
'plotmulti' is now officially the 'plot' command (though you may still use
'plotmulti').
RESIDUALS
There is now a 'residual' command which postprocesses a maximization
output file and a phenotypes file to get the residual after the
application of covariates. The result is a new phenotypes file giving
the residual value for each individual.
PROBANDS
Use of the proband field may now be turned on and off without reloading the
phenotypes file using the commands 'field probnd -none' and
'field probnd probnd' (or whatever).
SOLAR now detects proband fields named 'proband,' and 'prband,' as well as
the PEDSYS standard PROBND.
TDIST
There is now a 'tdist' command which sets up the 't' distribution option
for robust estimation of mean and variance for non-normal distributions.
SCORE TEST
'multipoint -score' uses a score based test (in place of maximum
likelihood) to find QTLs. The resulting measure is SLOD (Score based
LOD) which is not a real LOD score but should be analogous to one.
'multipoint -scoredebug' gives additional information.
NEW BAYESAVG OPTIONS and BUGFIXES
command 'bayesavg' now accepts a '-sporadic' option which forces the use
of sporadic models in cases where the polygenic models don't converge.
command 'bayesavg' now accepts a '-fix' option to fix some covariates.
cases where H2R=0 are now handled properly.
IBD-related PROCESSING improvements
It is no longer necessary to load the pedigree file in every SOLAR run in
order to do IBD-related processing. Pedigree data remain loaded until a
new 'load pedigree' command is entered. This is now true for the marker
file and the freq file as well. Two new SOLAR files, marker.info and
freq.info, have been added to preserve marker and frequency information
between SOLAR runs.
To be consistent, the term "locus-information file" has been changed to
"freq file". Since 'load pedigree' loads a pedigree file, it makes sense
that 'load freq' loads a freq file.
The marker file and the freq file no longer have to contain exactly the
same set of markers, nor is the order of the markers important. As before,
if a marker is loaded and no allele frequency info is available from a
prior 'load freq', a simple counting method is used to estimate the allele
frequencies. Now, however, frequency info can be loaded in advance for
some of the markers in the marker file and not others. Or a single freq
file might contain allele frequencies for all the markers in a study, not
just those in a particular marker file.
When a marker is loaded for which no frequency info has been loaded in
advance, the simple-count allele frequencies are added to the file freq.info
and will be displayed by 'freq show'. But the freq file itself is never
modified, even when MLE allele frequencies are computed. Nor is a default
freq file created (previously the file locfile.dat was created.) To save
frequency information, a 'freq save' command has been added which creates
a file that can be loaded later with 'freq load'.
There is now a 'marker unload' command. When marker data are unloaded,
the allele frequencies for these markers are removed from freq.info, and
the marker-specific subdirectories created by 'marker load' are deleted.
If MLE allele frequencies have been computed for any of the markers but
have not been saved to a file, the unload will not proceed unless the
-nosave option is given in the unload command. It is not necessary to
unload current marker data before loading a new marker file. The unload
will be done automatically (unless MLE allele frequencies have not been
saved, in which case an explicit 'marker unload' with the -nosave option
is required.)
There is also a 'freq unload' command which removes all currently loaded
frequency information, except for any markers with currently loaded
genotype data. That is, the allele frequencies for markers in the marker
file are not deleted. They can only be removed by 'marker unload'. It
is not necessary to unload current frequency info before loading a new
freq file; the unload will be done automatically.
When MLE allele frequencies are computed, this fact is recorded in the
file freq.info. An attempt to run 'freq mle' on markers for which MLE
allele frequencies have already been computed simply returns a warning
message. By default, MLE allele frequencies are now required before
marker-specific IBDs can be computed. The ibd command has a -nomle option
to get around this. Alternatively, a new IBD-processing option, NoMLE,
can be set with the ibdoption command.
The 'freq show' display has been updated to indicate markers for which
MLE allele frequencies have been computed, and whether the frequencies
have been saved to a file. Also, X-linked markers are labeled as such.
The 'marker load' command now has a -xlinked option which can be used to
load X-linked marker data. Alternatively, the XLinked option can be set
with the ibdoption command, as before. Male genotypes for X-linked
markers can now be coded as one allele, e.g. " /A" or "A/ ", or as a
"homozygote", e.g. "A/A".
Previously, the method of IBD computation had to be chosen before loading
marker data. This is no longer true. If both methods are applicable,
i.e. there is no inbreeding and multiple loopbreakers are not required,
then either method, Monte Carlo or Curtis and Sham, can be chosen at any
time. For performance reasons, the Monte Carlo method is now used
automatically for markers that are completely typed, i.e. markers for
which there is no missing genotype data.
The information displayed by 'pedigree show' now includes the number of
loopbreakers required for each pedigree, and whether the pedigree is
inbred.
IBD and multipoint IBD (MIBD) matrix files are no longer created in the
current working directory by default. The directory in which to create
the IBD files must now be specified with the ibddir command. Similarly,
the directory in which to create the MIBD files must be specified with
the mibddir command. Also, since IBD files are used to compute MIBDs,
the ibddir command is now required before the mibd command can be run.
When MIBDs are computed, a copy of the map file is now placed in the
mibddir directory. This file is used by the plot command.
After the relative-class file has been created by 'mibd relate', a new
command, 'pedigree classes', will display a tally of the relative classes.
Several IBD-related commands write status info to the screen while they
are running, namely 'freq mle', 'ibd', and 'mibd'. For example, 'freq mle'
displays the current iteration and the change in the likelihood. These
displays can cause SOLAR scripts to hang or abort when run as background
jobs. Since the status info serves no purpose in a background job, the
'verbosity min' command can now be used to turn off these displays.
Another problem with IBD-related scripts has been the use of prompts.
For example, the 'load marker' command would not overwrite an existing
locinfo.dat file without an OK from the user. These prompts have been
eliminated.
MZ twin IDs no longer have to be sequential integers, but can be any
unique identifier.
If the pedigree file contains a household ID field, a matrix file (named
'house.gz') will be created so that household effects can be incorporated
into a variance components analysis. The household ID can be any unique
identifier. The standard SOLAR name for this field is HHID, but this can
be changed with the field command.
In the pedigree file, sex can be coded "m/f" as well as "M/F" and "1/2".
The manual already made this claim, but the code did not allow it.
TCL patchlevel
TCL patch level 5 (8.0.5) is now used, and the startup script forces the
use of the init files in SOLAR_LIB rather than using whatever happens to
be installed in the OS.
Changes from SOLAR 1.2.1 to 1.2.0
PLOT improvements
The 'Plot' command accepts the -map option which allows the use of a user
map file to specify marker locations. The map file may be in the same
format as used for the 'load map' command (see file-map).
If there is a convergence error, this is plotted as a Star for each
non-convergence location, and a legend box is put up to identify the
Star. The symbol used may be changed by editing multipoint.gr.
The chromosome number may simply be specified without the -chrom
identifier:
plot 6
will plot chromosome 6.
The -set and -graph arguments are no longer documented as they are probably
not going to be needed.
Changes from SOLAR 1.2.0 to 1.1.2
PLOTMULTI and MULTIPOINT -PLOT
There is now a PLOTMULTI command to plot multipoint LOD scores vs.
chromosome position in cM. The plots also show marker locations if
the mibdchr.loc files are found in the mibddir (as they
should be).
Plots may be drawn during a multipoint scan by using the '-plot' argument
to multipoint.
Several plots may be overlaid by using the -overlay argument for each
subsequent PLOTMULTI command. The color of each plot may be specified
with the -color argument.
The multipoint pass and chromosome number may be specified with -pass and
-chrom arguments.
The multipoint files are read from the 'trait' or 'outdir' directory.
Example:
trait bmi
plotmulti -chrom 5
plotmulti -overlay -chrom 5 -pass 2 -color 1
A custom version of XMGR (included with SOLAR) is used to do the
plotting. The plot may be modified using the XMGR
GUI interface after plotting has been done, or by modifying the file
'multipoint.gr' beforehand. The file describes all the useful settings.
Plots may be printed or saved to postscript files.
CONVERGENCE
The default value of H2QF has been changed to 1.25 which helps convergence
in many cases.
BAYESAVG improvements
A 'strict' Occam's window is now the default. The 'symmetric' Occam's
window (which used to be used) is now an option using the -symmetric
argument. The models within the window are now reported.
Only the most important models and output files are saved. (Previously,
all the models and output files were saved, and this could require
gigabytes.) The saturated and unsaturated models now have more mnemonic
names (e.g. c.sat.mod or cov.sat.mod). Options -saveall and -savewindow
may be used to save more of the models. A separate command 'bayesmod'
may be used to regenerate any of the models.
Because only the most important models are saved, large changes had to
be made for the restart function. There is now a -redo option to allow
for the case when non-converging models were edited out of the output
file. The regular -restart begins after the last model processed.
PEDIGREE Handling
A message is printed if pedigrees needed to be skipped because
there were no non-probands having a full set of the required variables.
A list of skipped pedigrees is printed, and statistics are provided for
the skipped as well as the unskipped pedigrees. A bug which caused some
of the statistics to be incorrect if there were skipped pedigrees has
been fixed. (Note: these messages do not appear at the low verbosity
levels used by 'multipoint' and 'polygenic,' but are written to the
maximization output files such as 'poly.out' and 'null0.out'.)
Changes from SOLAR 1.0.1 to 1.1.2
Covariate interactions may now specify any number of quantitative and/or
binary variables specified in any order. The * operator is now used to
signify an interaction between variables (replacing : used in version 1.0.1,
however : is still permitted for compatibility). For example, the following
covariates could be specified:
covariate age^2*sex*diabetes
covariate waist*height^2*age
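
The structure of such a term can be made explicit with a small sketch
that splits it into (variable, power) factors. The function name is
hypothetical and the parsing rules are inferred from the examples above;
this is not the parser SOLAR uses.

```python
from typing import List, Tuple


def parse_covariate(term: str) -> List[Tuple[str, int]]:
    """Split a covariate term into (variable, power) factors.

    Both '*' and the older ':' are accepted as the interaction operator.
    """
    factors: List[Tuple[str, int]] = []
    for piece in term.replace(":", "*").split("*"):
        name, caret, power = piece.strip().partition("^")
        if not name:
            raise ValueError(f"empty factor in covariate term: {term!r}")
        # A factor without '^' has an implied power of 1.
        factors.append((name, int(power) if caret else 1))
    return factors
```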
Covariate Boundary Detection and Automatic Retries: If a covariate is
maximized to a boundary, there will be up to 3 retries (increasing the
boundary each time) to correct the problem. If the boundary problem
persists after 3 retries, an error will be reported and the current command
will terminate.
The bayesavg command has been fixed to give correct results for covariates
and restart properly for covariates. Posterior probabilities are now shown
in the final output.
A default finemapping of 0.588 LOD is now in effect for multipoint.
Previously, the default was simply to finemap around the single highest
peak, which was decided to be not useful (and potentially misleading for
inexperienced users).
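
The note does not say why 0.588 was chosen. One plausible reading,
offered here as an assumption, is that it corresponds to a nominal
pointwise p value of about 0.05 when the LOD is referred to a 50:50
mixture of a chi-square with 1 degree of freedom and a point mass at
zero (a common convention for testing a single variance component).
The function name is hypothetical.

```python
import math


def lod_to_pvalue(lod: float) -> float:
    """Pointwise p value for a LOD under the assumed 50:50 mixture."""
    if lod <= 0.0:
        return 1.0
    # Convert LOD (log base 10) to a likelihood ratio chi-square statistic.
    chi_square = 2.0 * math.log(10.0) * lod
    # Upper tail of chi-square with 1 df is erfc(sqrt(x / 2)); halve it
    # for the mixture.
    return 0.5 * math.erfc(math.sqrt(chi_square / 2.0))
```

Under this assumption lod_to_pvalue(0.588) is very close to 0.05.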
The default verbosity for commands such as multipoint and polygenic hides
all the maximization detail (so you don't need to remember to do
"verbosity min"). If you really want to see all the maximization detail
there is a new verbosity level, "verbosity plus" which shows it.
(verbosity max shows even more detail such as memory usage.)
twopoint now runs sporadic and polygenic models and reports their results
(see file twopoint0.out) before beginning to maximize linkage models.
Field names EGO, SIRE, and DAM are permitted in place of ID, FA, MO.
User errors involving field names are more fully reported and the user
is told what field command to give. Also, the following bug has been
fixed: if a file hadn't been read because of a mislabeled field, you had
to re-start SOLAR to read it (even after giving a field command).
There is now no limit on the length of parameter names. Names shorter than
40 characters are recommended for neater output, however.
Variable names longer than 6 characters specified in code files need not be
divided by the spaces ordinarily required for PEDSYS mnemonics. SOLAR now
allows the names to be divided by spaces or not. Warning: PEDSYS programs
do not have this feature. They will write spaces into the 6th and
12th character positions whether you have spaces there or not.
The 'maximize' command now writes to the current 'outdir' (which defaults
to the name of the trait). It accepts -quiet and -output options. -output
is required when giving a particular output file name, e.g.:
maximize -quiet -output nocov.out
Probability levels are reported as '=' (not <) unless below reportable
precision.
The upgrade command now makes copies (*.old) of the model or script files
which are upgraded.
Changes from SOLAR 0.9.100 to 1.0.1:
The new SOLAR 1.0.1 has much more powerful and intuitive covariate
commands, better covariate interaction (e.g. age by sex) screening,
the new bayesavg (Bayesian Oligogenic Model Averaging) command, and
some changed command options (more self-explanatory). It also has
a number of bug fixes.
The changes are described in the following sections of this
announcement:
1. Improved help
2. 18 char variable names and 40 char parameter names
3. Model and Script Upgrading
4. New covariate command
5. Bayesian Model Averaging
6. Some changed commands and arguments
7. Comma delimited file bugs
Improved Help
-------- ----
The 'help' command now lists all the available commands and scripts
and gives a one-line summary of each. You may find it easier to
find the command you are looking for. All help messages now use
unix 'more' to page through the documentation.
(Note: If you use Sun's Open Windows cmdtool terminal, you may lose
a line or two at the top of each help message because of a bug in
Sun's cmdtool which occurs only if you have scrolling turned on.
It is annoying but not critical. If you have scrolling turned on,
you can always scroll back to see the top line, which is always a
'Purpose' description.)
18 character variable names and 40 character parameter names
-- --------- -------- ----- --- -- --------- --------- ----
Data variable names are now useful up to 18 characters long and
model parameter names are useful up to 40 characters long. They
may actually be as long as you like, but they must be unique within
the new specified limits. In the most important SOLAR messages,
the full names of variables and parameters are printed. In some
older messages, the names are truncated on display to fit nicely in
columns.
When using PEDSYS files, the first 3 'mnemonics' are concatenated
and used as variable names. Previously, only the first mnemonic
was used.
Spaces are not allowed in variable names (as before), so the name
is terminated by the first space.
The longer parameter names make possible more meaningful covariate
beta parameter names, which are described in a later section.
Model and Script Upgrading
----- --- ------ ---------
The new SOLAR does not require any changes to pedigree or phenotype
data files. But it does require that previously created models
be upgraded to use the new covariate command syntax. SOLAR will
automatically detect when models need upgrading and tell you how
to use the 'upgrade' command, e.g.:
solar> load model oldie
Must use upgrade command to upgrade this model: upgrade oldie.mod
solar> upgrade oldie
solar> load model oldie
solar>
Some command option names have changed (-f is now -overwrite and
-r is now -restart). Existing scripts may need to be upgraded as
well. To upgrade scripts, you may also use the upgrade command,
but you MUST include the ".tcl" suffix:
solar> upgrade doit.tcl
The New Covariate Command
--- --- --------- -------
You can now:
a. create or delete many covariates in one command line
b. specify interactions between any quantitative variable
and any binary variable (not just sex)
c. specify any exponent (1-9)
The syntax is also designed to be more intuitive. Some examples:
covariate age                  age as a simple covariate
covariate age:sex              the age by sex interaction ONLY
covariate age#sex              age, sex, and the age:sex interaction
                               i.e. both vars AND their interaction
covariate age^2                age squared as a simple covariate
covariate age^1,2              age and age^2
covariate age^1,2#sex          all combinations of age^1,2 and sex
                               i.e., all of the following:
                               age sex age:sex age^2 age^2:sex
covariate age^1,2,3:diabet     age^1,2,3 by diabetes interactions ONLY
All covariate beta parameters begin with 'b', followed by the
complete covariate name (including the exponent and interactor
variable name if applicable).
For example, covariate age:sex has a beta parameter named:
bage:sex
Covariate age^2:diabetes has a beta parameter named:
bage^2:diabetes
This is made possible by the fact that variable names may now be up
to 18 characters long, and parameter names may now be up to 40
characters long.
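As an illustration of the syntax and naming rules above, the following
Python sketch expands a single covariate specification into its terms and
derives the beta parameter names. It is not SOLAR's own code: the function
names expand_covariate and beta_name are made up for this example, and the
order of the terms simply follows the listing above.

```python
def expand_covariate(spec):
    """Expand one covariate specification into the terms it stands for."""
    # Split off an interactor, if any:
    #   '#' means both variables AND their interaction,
    #   ':' means the interaction ONLY.
    op, interactor = None, None
    for candidate in ("#", ":"):
        if candidate in spec:
            spec, interactor = spec.split(candidate, 1)
            op = candidate
            break

    # Split off the exponent list, e.g. "age^1,2" -> ("age", [1, 2]).
    if "^" in spec:
        name, exps = spec.split("^", 1)
        exponents = [int(e) for e in exps.split(",")]
    else:
        name, exponents = spec, [1]

    terms = []
    for i, exp in enumerate(exponents):
        # Exponent 1 is written as the plain variable name.
        base = name if exp == 1 else f"{name}^{exp}"
        if op != ":":
            terms.append(base)            # the variable itself
        if op == "#" and i == 0:
            terms.append(interactor)      # the interactor, listed once
        if op is not None:
            terms.append(f"{base}:{interactor}")  # the interaction term
    return terms


def beta_name(term):
    """Beta parameter name: 'b' followed by the complete covariate name."""
    return "b" + term


if __name__ == "__main__":
    for spec in ("age", "age:sex", "age#sex", "age^1,2#sex", "age^1,2,3:diabet"):
        print(spec, "->", [beta_name(t) for t in expand_covariate(spec)])
```

Keeping the expansion separate from the naming mirrors the text: the
covariate name is formed first, and the beta parameter name is that name
with a leading 'b'.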
If you enter 'covariate age#sex', there is one beta term (bage)
applied to males, and three (bsex, bage, and bage:sex) applied to
females. 'bage:sex' isn't a female age term, but rather the
age-related sex difference between males and females.
This new parameterization allows SOLAR to test variables and their
interactions separately, which is what will be done during
covariate screening. 'age' might be found to be significant while
its interaction with sex isn't, or vice versa. Also, the squared
terms might drop out separately. All tests use one degree of
freedom now.
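The parameterization described above can be made concrete with a small
Python sketch. It assumes the sex indicator is 0 for males and 1 for
females (consistent with one beta applying to males and three to females)
and ignores any centering or scaling SOLAR may apply to covariates. The
function name and the beta values are hypothetical.

```python
def covariate_effect(age, is_female, bage, bsex, bage_sex):
    """Covariate contribution to the expected trait for 'covariate age#sex'."""
    female = 1 if is_female else 0   # assumed coding: 0 = male, 1 = female
    # Males receive only bage*age; females additionally receive
    # bsex and the interaction term bage:sex * age.
    return bage * age + female * (bsex + bage_sex * age)


# Hypothetical beta values, chosen only to keep the arithmetic simple.
betas = dict(bage=2, bsex=10, bage_sex=3)

male = covariate_effect(40, False, **betas)
female = covariate_effect(40, True, **betas)

# The female-minus-male difference is bsex + bage:sex * age, which is why
# bage:sex is an age-related sex difference rather than a female age slope.
difference = female - male
```

Under these assumptions the age slope for females is bage + bage:sex, so
a test of bage:sex alone asks whether the age slope differs between the
sexes.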
If you mistakenly enter something like:
covariate sex age^1,2#sex
the repetition of the sex covariate by itself is ignored silently.
This simplifies the entry of more complex commands such as:
covariate age^1,2#sex weight^1,2#sex
Technically, sex by itself is being specified twice, but SOLAR
ignores that.
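The effect of ignoring the repeated term can be sketched in Python as
follows. The expansions are written out by hand from the syntax rules
above, and the helper name merge_terms and the resulting order are
assumptions for illustration, not SOLAR's actual behavior.

```python
# Terms produced by each specification, written out by hand from the
# syntax rules above (the ordering is an assumption).
expansions = {
    "age^1,2#sex": ["age", "sex", "age:sex", "age^2", "age^2:sex"],
    "weight^1,2#sex": ["weight", "sex", "weight:sex", "weight^2", "weight^2:sex"],
}


def merge_terms(specs):
    """Combine the terms of several specifications, keeping each term once."""
    merged = []
    for spec in specs:
        for term in expansions[spec]:
            if term not in merged:   # a repeated term such as 'sex' is skipped
                merged.append(term)
    return merged


terms = merge_terms(["age^1,2#sex", "weight^1,2#sex"])
```

Here 'sex' is requested by both specifications but appears only once in
the merged list.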
Bayesian Model Averaging
-------- ----- ---------
The bayesavg procedure performs a Bayesian Oligogenic Model Averaging
analysis on the linkage components of the current model.
It tests each combination of the linkage components, and finds the
set of models within Occam's window based on their Bayesian
Information Criterion. Then it computes weighted averages of the
linkage components based on their weighted average. The summary
output file is named 'bayesavg.info', while the final
weighted-average model is described by 'bayesavg.out'.
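The general idea of averaging over an Occam's window can be sketched in
Python as follows. This is a generic illustration, not the bayesavg
implementation: it assumes that a lower BIC is better, that weights are
proportional to exp(-BIC/2), and that the window is a fixed BIC difference
from the best model. The function name, the cutoff of 6, and all numbers
are hypothetical.

```python
import math


def occam_window_average(models, window=6.0):
    """Average component estimates over the models inside the window.

    models: list of dicts with a 'bic' value and a 'components' dict
            mapping linkage component names to estimates.
    window: assumed maximum BIC difference from the best model.
    """
    best = min(m["bic"] for m in models)
    kept = [m for m in models if m["bic"] - best <= window]

    # Weights proportional to exp(-BIC/2), normalized to sum to 1.
    raw = [math.exp(-0.5 * (m["bic"] - best)) for m in kept]
    total = sum(raw)
    weights = [r / total for r in raw]

    # A component missing from a model counts as zero in that model.
    names = sorted({n for m in kept for n in m["components"]})
    averaged = {
        n: sum(w * m["components"].get(n, 0.0) for w, m in zip(weights, kept))
        for n in names
    }
    return kept, weights, averaged


# Hypothetical models: two equally supported, one far outside the window.
models = [
    {"bic": 100.0, "components": {"h2q1": 0.2}},
    {"bic": 100.0, "components": {"h2q1": 0.4, "h2q2": 0.1}},
    {"bic": 120.0, "components": {"h2q2": 0.5}},
]
kept, weights, averaged = occam_window_average(models)
```

With these made-up inputs the third model falls outside the window, the
first two receive equal weight, and each averaged component is the mean of
its estimates in those two models.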
Some Changed Commands and Arguments
---- ------- -------- --- ---------
Several commands and arguments have been changed to be more
consistent and intuitive.
'covariate <var> delete_fully' is now 'covariate delete <var>'
to be more consistent with the delete commands for parameters and
constraints. The covariate suspend and restore commands are
similarly reordered, and any of these commands may list more than
one covariate to operate on. If you happen to give the operator
after the variable name, that will be understood also, so long as
you are operating on only one variable.
covariate delete sex ;# this form is preferred now
covariate delete sex age ... ;# because you can have a list
covariate sex delete ;# but this is still OK
All hyphenated arguments now have full readable names (such as
-restart and -overwrite). The most common hyphenated arguments
also have shorter abbreviations (such as -r and -ov). This
is the approach preferred by most Tcl programmers. The confusing
"-f" arguments have been removed (replaced by -overwrite or -ov).
Some commands now have a few additional options. Multipoint now
has -renew and -nullbase as options as well, clarifying some
special case behaviors explicitly.
Twopoint no longer requires a '-m' argument. The default is to use
the previously stored null model, or the model currently in memory
if there is one. This is similar to what is done by multipoint.
Polygenic now takes fully named arguments (such as -screen and
-fix) as well as the old abbreviated arguments (-s and -f).
Comma Delimited File Bugs
----- --------- ---- ----
A number of bugs have been fixed that might cause segmentation
violations under some circumstances. The worst involved the
use of comma delimited files. If you are using comma delimited
files (instead of PEDSYS files), the new version is strongly
recommended.
Changes from Solar 0.9.16 to 0.9.100:
THIS IS A BIG CHANGE! Please read carefully.
TRANSLAT is no longer needed! SOLAR will now run directly from PEDSYS
files or the Comma Delimited files. At first, there are many changes you
might need to learn about. In the long run, these changes should make
Solar much easier to use for everybody.
Here is an itemized summary of the changes.
1) Do not run TRANSLAT or use phenfiles created by TRANSLAT. Previously
created model files will need to have the 'phenfile load' command
removed.
2) The LOAD PHENFILE command no longer exists. It has been replaced by the
LOAD PHENOTYPES command, which should be used in conjunction with the
LOAD PEDIGREE command to load pedigree and phenotype information
separately:
solar> load pedigree Pedfile
solar> load phenotypes Phenofile
(Or, you can also use 'pedigree load' and 'phenotypes load'.)
Both Pedfile and Phenofile can now be Pedsys files. They can also be
the same file, if that file contains both the pedigree and phenotype
information. (They can also be comma delimited files: Solar
automatically figures out the file type.)
Once a load pedigree or load phenotype command has been done within a
particular working directory, it need not be done again, unless you
start working with a different pedigree or phenotype file. Within a
particular directory, you will always default to the pedigree and
phenotypes files you used last. The phenotypes file can contain all
the phenotypes; Solar lets you select which ones you want to use in
a particular analysis and excludes families when (and only when) that
is required.
Since the phenotypes file can be the whole phenotype database, there is
no reason why it needs to be identified in models anymore.
3) The FIELD command lets you map the mnemonics of the fields in your
data files to the names that Solar requires in the Pedigree, Marker,
and Phenotype files (which can all now be PEDSYS or Comma Delimited
Files).
Once you determine the field commands which will be required, it is
useful to put them in a .solar file in the working directory.
By itself, the FIELD command lists the Fields that Solar expects:
solar> field
ID: ID ;Individual Permanent ID
FA: FA ;Father's Permanent ID
MO: MO ;Mother's Permanent ID
SEX: SEX ;Sex: M/F or 1/2
PROBND: PROBND ;Proband Status (optional)
MZTWIN: MZTWIN ;Monozygotic Twin Group (optional)
FAMID: FAMID ;Family ID (optional)
You can change the mapping using the FIELD command. For example, if
your Twin field is called TWIN instead of MZTWIN, you can use the
following FIELD command:
solar> field mztwin twin
If your data has no probands or no twins, you should declare this with
commands like the following:
solar> field probnd -none
solar> field mztwin -none
If you have a PROBND field, any value other than 0 (zero) or blank
makes the individual a proband.
FAMID is only required if ID is not unique in the entire pedigree.
If there is no FAMID field, or field mapped to FAMID with a field
command, solar will assume it isn't necessary.
4) Previous "fixed-width" pedigree and marker files can still be used if
PEDSYS code files are created to go along with them.
Fields other than those listed above in the marker-data file are taken
to be genotypes, and the field names are taken to be the marker
names. Marker names and the order of markers must still agree with that
in the locus-info and map data files.
5) The names of the following files have been changed:
old name       new name
-----------    ---------------
relate.in      mibdrel.in
prep.ped       mibdrel.ped
prep.cde       mibdrel.cde
chr.ibd        mibdchr.mrg
chr.loc        mibdchr.loc
chr.mean       mibdchr.mean
Names of marker directories have been changed from .d_
to d_, i.e. the leading period has been dropped.
6) Bugs have been fixed in IBD calculations involving three or more
genetically identical individuals.
7) Fields determined to be binary are marked in the description of
variables in the maximize command.
8) Error reporting is improved.
9) Many help messages have been improved.
10) The AUTOMODEL and ALLCOVAR commands ignore most pedigree fields IF THEY
ARE CORRECTLY IDENTIFIED with field commands. If this doesn't work as
intended, you may end up having some pedigree fields as covariates
which will cause Solar to behave badly. You should check which
variables have been automatically selected as covariates.
There is also a new command, EXCLUDE, which lets you specifically
exclude certain fields from being selected as covariates by automodel
or allcovar:
solar> exclude pedno groupno cseq
11) AUTOMODEL no longer includes a LOADKIN command (which loaded a phi2.gz
file, if present). If you are doing a special analysis which requires
the use of a hacked phi2.gz, you should give a MATRIX or LOADKIN command.
This is not the typical operation, and Fisher's built-in phi2 handling
is much faster than loading a matrix each time.
12) TWOPOINT has been fixed to work with the way that twopoint files are
actually compressed.
13) .solar files are now read in only when Solar is started, not whenever
a model is loaded or created from scratch. This means you can now load
models or do just about anything in a .solar file without causing an
infinite recursion. You will also not have your session-specific changes
overwritten. About the only negative impact might be to the 'model new'
command. 'model new' will take you back to a nearly empty model, lacking
any parameters you may have defined in your .solar file. You will need
to load those back in yourself. It would be best to define such things
as procedures in the .solar file so that they can be readily re-loaded.
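Returning to the field mapping described in item 3, the rules given there
can be sketched in Python as follows. This is only an illustration of the
documented behavior, not SOLAR's code: the function names are made up, and
treating field names as case-insensitive is an assumption based on the
'field mztwin twin' example.

```python
# Default mapping, as listed by the FIELD command with no arguments.
DEFAULT_FIELDS = {
    "ID": "ID", "FA": "FA", "MO": "MO", "SEX": "SEX",
    "PROBND": "PROBND", "MZTWIN": "MZTWIN", "FAMID": "FAMID",
}


def apply_field_commands(commands):
    """Return the mapping after a list of (expected name, file name) pairs."""
    mapping = dict(DEFAULT_FIELDS)
    for expected, actual in commands:
        # '-none' declares that the data has no such field.
        mapping[expected.upper()] = None if actual == "-none" else actual.upper()
    return mapping


def is_proband(value):
    """Any PROBND value other than 0 (zero) or blank marks a proband."""
    return str(value).strip() not in ("", "0")


# Equivalent of 'field mztwin twin' followed by 'field probnd -none'.
mapping = apply_field_commands([("mztwin", "twin"), ("probnd", "-none")])
```

A value of None in the mapping stands for a field declared absent with
-none; fields that were not remapped keep their default names.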