SOLAR generates many files which are re-used by SOLAR itself. Normally there is no need to mess with these files, and it might cause problems if you do. But in some cases, users want or need to know about these files.
FORTRAN temporary files with names such as:
tmp.FAAA3Baa8z
may be left behind if a SOLAR command is terminated unexpectedly. Temporary files are normally deleted automatically at the end of some operation or at the end of a SOLAR session. Files with names such as this (prefixed by tmp.) may be deleted when SOLAR is not running if you wish to recover disk space (otherwise, they are harmless). You should exit from SOLAR first to be safe. Do not summarily delete any other files created by SOLAR.
The load pedigree command creates a file named pedindex.out (among other files; see section 8.2.3 for more) in the current working directory. Other SOLAR commands look for this file when they need to access pedigree information. This simplifies most SOLAR commands because they can read pedigree information in a simplified, verified, and canonical form. (One should not work with more than one pedigree file from within the same working directory, as this can cause serious problems.)
If you change the pedigree file, you will need to give the load pedigree command again to create a new pedindex.out file. If you have created IBD and/or MIBD files, you will also need to create them all over again, even if no genotypic information has changed, since they depend upon the specific indexing in the pedindex.out file, which might change even if you do not change any of your IDs.
Once a pedigree file has been loaded, it is not necessary to load
it again within the same working directory, even when starting
SOLAR again at a later date. SOLAR detects the
presence of an already existing
pedindex.out
file. If you reload the same
pedigree file, nothing will be damaged because the same
pedindex.out
will be re-created each time
you load the exact same source pedigree file. But unless you
have changed the pedigree data file, reloading it is just a
waste of time.
On the other hand, you must NOT try to load pedindex.out itself as a pedigree file using load pedigree. Not only is this unnecessary, it will not work as you might expect. The pedigree loading procedure is not idempotent. In other words, the individuals in pedindex.out will be ordered differently than in your original pedigree file, and loading pedindex.out itself would result in yet another ordering.
The pedindex.out file associates each ID in your pedigree file with a sequential ID used by all SOLAR commands. The field name for these sequential IDs is IBDID. IBDIDs are unique in the entire file. Fathers and mothers are identified by fields named FIBDID and MIBDID, which provide the respective parent's IBDID. (Note: the field names are found inside the companion code file pedindex.cde, described in the next section.)
An all-new sequential pedigree identification number named PEDNO is also created, which is independent of any PEDNO you might already have in your pedigree file. The new PEDNO assignments are based entirely on the father and mother information available in your file, to ensure consistency. These new PEDNO assignments are the ones to which SOLAR commands such as pedlod refer.
If you already have a
PEDNO
field in your pedigree file, it is
ignored, and the PEDNO
assignments in
pedindex.out
are likely to be different.
If your IDs are sequential within numbered families, and not unique across all your data, you must use a field named FAMID to identify the family to which each individual belongs. If present in your pedigree file, the FAMID field will be carried over unchanged into the pedindex.out file. But SOLAR may divide your families further if, for example, it detects individuals such as marry-ins who have no genetic relation to other individuals in a family. Marry-ins will become singleton pedigrees with their own unique PEDNO in the pedindex.out file. But this will not affect the FAMID, which is used simply for individual identification purposes.
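Because the new PEDNO values depend only on father and mother links, one way to picture the assignment is as finding the groups of individuals joined by parent-child links and numbering those groups sequentially. The Python sketch below does this with a union-find structure. It is an illustration, not SOLAR's actual algorithm: the input dictionary `parents` is hypothetical, maps each individual to a (father, mother) pair with 0 meaning an unknown parent, and must list every referenced parent as a key.

```python
def assign_pedno(parents):
    """Number the groups of individuals linked by father/mother relations.

    parents: hypothetical dict {id: (father_id, mother_id)}, 0 = unknown parent.
    Returns {id: pedno}, with pedigrees numbered 1, 2, ... in order of the
    smallest id each pedigree contains.
    """
    root = {ind: ind for ind in parents}

    def find(x):
        # Follow pointers to the group representative (with path halving).
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            root[rb] = ra

    # Every child is joined to each of its known parents.
    for child, (father, mother) in parents.items():
        for parent in (father, mother):
            if parent:
                union(child, parent)

    pedno, numbering = {}, {}
    for ind in sorted(parents):
        group = find(ind)
        if group not in numbering:
            numbering[group] = len(numbering) + 1
        pedno[ind] = numbering[group]
    return pedno


# Founders 1 and 2 with child 3; individual 4 has no recorded relatives.
print(assign_pedno({1: (0, 0), 2: (0, 0), 3: (1, 2), 4: (0, 0)}))
# {1: 1, 2: 1, 3: 1, 4: 2}
```

In this sketch an individual with no recorded parents or children ends up in a pedigree of its own, while anyone connected through a parent-child link shares a PEDNO, regardless of any PEDNO or FAMID in the input.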
A PEDSYS code file named pedindex.cde is created, which identifies the fields in the fixed-width pedindex.out file. Because of this file, pedindex.out can be read with PEDSYS programs such as browse. Within SOLAR, you can write scripts that read pedindex.out using the tablefile command, which can read files in either PEDSYS or comma-delimited format.
A state file named pedigree.info
is
created which points to the loaded pedigree file and contains
basic information about the pedigree. This basic information
is displayed by the pedigree show
command.
The load pedigree command also builds one or two matrix files: the Kinship Matrix phi2.gz and, optionally, the Household Matrix house.gz. These matrix files are described in the following sections.
The load pedigree command also creates a file named phi2.gz, which contains twice the kinship coefficient matrix. Currently this file is not used in the normal course of operations (for quantitative traits) because kinship coefficients are generated on the fly by SOLAR in a way that reduces memory requirements. However, it is used during the analysis of discrete traits, because the discrete trait modeling code was simplified by removing the on-the-fly kinship computation.
In some cases, very sophisticated SOLAR users can substitute a modified phi2.gz file to perform a special kind of analysis. In order to force SOLAR to use the external matrix file for a quantitative trait, you need to give a suitable load matrix command, or use the loadkin command, which does this using the standard matrix identifiers phi2 and delta7 (described below). This should be done before running polygenic. During model maximization, if a matrix has been loaded with the identifier name phi2, it supersedes any on-the-fly calculated values for phi2.
It is instructive to understand the
phi2.gz
file as it is a template for all
matrix files, such as the IBD and MIBD files described below.
All matrix files are compressed with GNU gzip to save a considerable amount of space (a high compression factor is achieved.) Even in their compressed form a full set of MIBD files for the whole genome can take a lot of space (possibly 100's of megabytes.)
When uncompressed, every SOLAR matrix file is found to
have a very simple format. There are three or four columns of
numbers which must be space delimited. The first two columns
are sequential identifiers, based on the IBDID
sequencing in pedindex.out
(and not
necessarily any user sequencing, as described above.) There
must be one (and only one) line for each pair of individuals.
The last one or two columns are the coefficients related to
that pair of individuals. Each type of matrix file has
different types of coefficients. The coefficients should begin
in the fourteenth character column, or higher, counting the
first character column as number one.
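To make this layout concrete, here is a minimal Python sketch that writes and reads a gzipped matrix file of this shape. The field widths are an assumption chosen for illustration (two right-aligned IBDID fields of width 5, two spaces, then 10-character coefficient fields), which keeps the first coefficient at character column 14 or later. The pair ordering in the usage example (larger IBDID first) is also an assumption, and no CRC line is written.

```python
import gzip


def write_matrix(path, entries):
    """Write (ibdid1, ibdid2, coef1[, coef2]) tuples to a gzipped matrix file.

    Assumed layout: two IBDID fields of width 5, two spaces, then each
    coefficient in a 10-character field, so coefficients start at column 14+.
    """
    with gzip.open(path, "wt") as out:
        for id1, id2, *coefs in entries:
            fields = " ".join(f"{c:10.7f}" for c in coefs)
            out.write(f"{id1:5d} {id2:5d}  {fields}\n")


def read_matrix(path):
    """Read a gzipped, space-delimited matrix file into {(id1, id2): [coefs]}.

    If a pair appears more than once, the later line replaces the earlier one.
    """
    matrix = {}
    with gzip.open(path, "rt") as src:
        for line in src:
            id1, id2, *coefs = line.split()
            matrix[(int(id1), int(id2))] = [float(c) for c in coefs]
    return matrix
```

For example, `write_matrix("example.gz", [(1, 1, 1.0, 1.0), (2, 1, 0.5, 0.25), (2, 2, 1.0, 1.0)])` writes phi2 and delta7 style values for two full siblings, one line per pair, and `read_matrix("example.gz")` reads them back.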
Starting with SOLAR Version 4, a "checksum" is added as the first line in every matrix file, in the form of a fake 1,1 element. It is immediately followed, and overwritten, by the real 1,1 element (or by a fake 1,1 guard element with value 0). This "checksum" is actually a polynomial Cyclic Redundancy Check (CRC) computed with the Unix cksum command on the pedindex.out file. This enables the command that loads a matrix file (used often in model files) to check that the matrix file was created with the exact same pedigree. Even the smallest change to a pedigree file will change the IBDIDs computed when the pedigree is loaded.
If you are creating your own matrix files, you can generate and prepend the required CRC using the matcrc command, which should be run on the matrix file after you have gzipped it (matcrc will gunzip the file, prepend the CRC, and then re-gzip the file). This must be run in a directory where the matching pedigree file is loaded. Version 4 of SOLAR does not require that all matrices have CRCs, though this requirement may be added in some future SOLAR version, and the pedigree/matrix checking is desirable anyway. Without this checking, it is all too easy to use an obsolete matrix file with a mismatching modified pedigree.
The phi2.gz file contains the phi2 and delta7 coefficients. phi2 is the kinship coefficient phi times 2, a term which occurs frequently in genetic covariance equations. (We might have liked to name this 2phi, but many computer programs don't like names which begin with numbers.)
For pedigrees without inbreeding, the coefficient we call delta7 is the same as delta7 from the Jacquard condensed coefficients of identity, delta1 through delta9. When inbreeding is not present, Jacquard's delta7 is the probability that a pair of individuals share exactly two alleles identical by descent (IBD) at a randomly chosen locus. For pedigrees with inbreeding, however, our delta7 may differ from Jacquard's delta7 and should not be used.
For non-inbred pedigrees, we can also express IBD-allele
sharing using the Cotterman coefficients,
K0 - K2.
K0 is the probability, at a random locus,
of sharing no alleles IBD, K1 is the
probability of sharing exactly one allele IBD, and
K2 is the probability of sharing two
alleles IBD. Hence, in the case of no inbreeding, our
delta7
is equivalent to
K2.
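For a non-inbred pedigree, both coefficients can be illustrated with the textbook recursion for the kinship coefficient phi, together with the commonly used non-inbred identity K2 = phi(Fi,Fj) * phi(Mi,Mj) + phi(Fi,Mj) * phi(Mi,Fj), where Fi and Mi are the father and mother of individual i. The Python sketch below follows that recursion; it is not necessarily how SOLAR computes phi2.gz. It assumes a hypothetical `parents` dictionary mapping each individual to (father, mother), with 0 for an unknown parent and every parent numbered lower than its children.

```python
from functools import lru_cache


def make_kinship(parents):
    """Return phi(a, b), the kinship coefficient, for the given pedigree.

    parents: hypothetical dict {id: (father_id, mother_id)}, 0 = unknown,
    with every parent numbered lower than each of its children.
    """

    @lru_cache(maxsize=None)
    def phi(a, b):
        if a == 0 or b == 0:          # an unknown parent contributes nothing
            return 0.0
        if a == b:                    # self-kinship: 1/2 (1 + phi(father, mother))
            father, mother = parents[a]
            return 0.5 * (1.0 + phi(father, mother))
        if a < b:                     # the higher-numbered one cannot be an ancestor
            a, b = b, a
        father, mother = parents[a]
        return 0.5 * (phi(father, b) + phi(mother, b))

    return phi


def phi2_delta7(parents, i, j):
    """Return (phi2, delta7) for individuals i and j, assuming no inbreeding."""
    phi = make_kinship(parents)
    if i == j:
        return 2.0 * phi(i, i), 1.0
    fi, mi = parents[i]
    fj, mj = parents[j]
    # Both alleles IBD: paternal with paternal and maternal with maternal, or crossed.
    delta7 = phi(fi, fj) * phi(mi, mj) + phi(fi, mj) * phi(mi, fj)
    return 2.0 * phi(i, j), delta7
```

With parents 1 and 2, full siblings 3 and 4, and a half sibling 6 (parents 1 and 5), the sketch gives (0.5, 0.25) for the full siblings, (0.5, 0.0) for parent and offspring, and (0.25, 0.0) for the half siblings, matching phi2 and K2 for those relationships.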
phi2 and delta7 are named terms which may be used in the SOLAR omega (covariance) equation. Normally the omega is set up for you automatically by other commands (such as polygenic or multipoint), but it is available for examination or modification for custom analysis (see Section 9.5 for an introduction to custom analyses). Only phi2 is currently set up automatically; analysis involving dominance would involve using delta7 as well, and we have not automated that yet (but it is described in Section 9.4).
When a load pedigree command is executed, the matrix file house.gz will be created automatically if the pedigree file has an HHID field (or a field mapped to HHID with the field command). The household matrix file contains only one meaningful coefficient at this time. It is simply 1 if the two individuals are members of the same household, and 0 if they are not. We name this coefficient house.
The household matrix need not refer to households; it could refer to any other shared environmental grouping, such as neighborhood, city, tribe, favorite musical genre, etc.
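The house coefficient is simple enough to state as code. This Python sketch assumes a hypothetical dictionary of HHID values keyed by IBDID (None for an individual with no household ID, treated here as sharing with no one) and lists only the pairs whose coefficient is 1, larger IBDID first. Whether SOLAR writes zero entries, and how it orders pairs, are not specified here.

```python
def house_coefficient(hhid, i, j):
    """1 if individuals i and j have the same household ID, else 0."""
    hi, hj = hhid.get(i), hhid.get(j)
    return int(hi is not None and hi == hj)


def household_pairs(hhid):
    """Yield (i, j, 1) for each pair with i >= j that shares a household."""
    ids = sorted(hhid)
    for index, i in enumerate(ids):
        for j in ids[: index + 1]:
            if house_coefficient(hhid, i, j):
                yield (i, j, 1)


# Individuals 1 and 2 share household "A"; 3 lives alone; 4 has no HHID.
print(list(household_pairs({1: "A", 2: "A", 3: "B", 4: None})))
# [(1, 1, 1), (2, 1, 1), (2, 2, 1), (3, 3, 1)]
```

Because the grouping key is just a label, the same function works unchanged for any other shared environmental grouping mapped to HHID.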
To include household effects in an analysis, use the house command prior to
giving the polygenic
command.
The ibd command creates files named ibd.<marker>.gz, where <marker> is the marker name. These files contain the ibd and d7 coefficients for each pair of related individuals for which ibd > 0. The coefficient ibd is computed as .5 * p(1) + p(2), where p(i) is the probability of sharing exactly i alleles identical by descent at this marker locus. In non-inbred relationships, ibd gives the expected proportion of alleles shared identical by descent at this marker locus. d7 is simply p(2). These coefficients are the marker-specific analogues of the phi2 and delta7 coefficients described in the previous section. If no member of a pedigree has been genotyped for the marker, an IBD value of -1 is assigned to the main diagonal entries in the IBD matrix for that pedigree. In a linkage analysis, all pairs of individuals in that pedigree will be treated as having ibd = phi2.
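The rules above can be written out directly. In this Python sketch, `ibd_coefficients` applies ibd = .5 * p(1) + p(2) and d7 = p(2), and the two helper functions (hypothetical names, working on hypothetical in-memory dictionaries) show the -1 diagonal flag being used to fall back to phi2 for pedigrees with no genotyped member.

```python
def ibd_coefficients(p0, p1, p2):
    """Return (ibd, d7) from the probabilities of sharing 0, 1 or 2 alleles IBD."""
    if abs(p0 + p1 + p2 - 1.0) > 1e-9:
        raise ValueError("p(0) + p(1) + p(2) must sum to 1")
    ibd = 0.5 * p1 + p2   # expected proportion shared IBD (non-inbred pairs)
    d7 = p2               # probability of sharing both alleles IBD
    return ibd, d7


def untyped_pedigrees(diagonal, pedno):
    """PEDNOs whose main-diagonal IBD entries are -1 (no genotyped member).

    diagonal: {ibdid: value on the main diagonal}; pedno: {ibdid: PEDNO}.
    """
    return {pedno[ind] for ind, value in diagonal.items() if value == -1}


def linkage_coefficient(ibd, phi2, pair_pedno, untyped):
    """Coefficient used in a linkage analysis: phi2 stands in for untyped pedigrees."""
    return phi2 if pair_pedno in untyped else ibd


# Full siblings with the Mendelian expectations p = (1/4, 1/2, 1/4):
print(ibd_coefficients(0.25, 0.5, 0.25))   # (0.5, 0.25)
```

The full-sibling example reproduces the phi2 and delta7 values for that relationship, which is what makes ibd and d7 their marker-specific analogues.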
IBD files will be created in the directory specified by the ibddir command. For example, the command ibddir ibd causes IBD files to be created in the subdirectory named ibd. You can refer to the current directory with the name . (dot).
The IBD files are used by the twopoint command when doing twopoint analyses. twopoint uses the command linkmod -2p to set up the parameters in each twopoint model. linkmod -2p uses the load matrix command to load a particular IBD file.
d7 is only available if the IBDs were computed by the Curtis and Sham method (or if IBDs were imported from a package which computed d7). Pedigrees which contain inbreeding or multiple marriage loops, or for which all individuals have been typed, result in the Monte Carlo method being used by SOLAR, and in those cases d7 is not available. In those cases, it is suggested that you use another genetics package to compute IBDs; it might give more exact numbers anyway, which is especially advantageous for analyses of dominance. (Analysis of dominance is discussed in Chapter 9.)
A number of working files are required for the IBD calculation
process. These files exist in subdirectories created by the
load marker
command, one directory for
each marker. For more information, use the command help marker.
The first line in IBD matrix files created by SOLAR looks like strange data but is actually a checksum to ensure that the matrix is used only with the correct pedigree. This checksum line is created by the matcrc command. It is not necessary for user-created matrix files to have this checksum line, but it is recommended and easy to do with the matcrc command.
The mibd command creates files named mibd.<chr>.<loc>.gz, where <chr> is the chromosome number and <loc> is the chromosomal location in cM. These files contain the mibd coefficients for each pair of related individuals for which mibd > 0. MIBD files will be created in the directory specified by the mibddir command. For example, the command mibddir mibd causes MIBD files to be created in the subdirectory named mibd. You can refer to the current directory with the name . (dot).
For historical reasons, the second coefficient column of MIBD files created by SOLAR contains a copy of phi2 from the phi2 matrix. This is no longer needed, and it is an obsolescent feature, because there is now a separate phi2.gz file. Imported MIBD files may contain d7, the MIBD analogue of delta7, in their second coefficient column, if the genetic package computes it. It now makes more sense for that column to contain d7 than anything else (though it is unlikely that SOLAR will compute d7 for MIBDs). See section 5.5 for information on how to use MIBDs computed by another program.
The first line in MIBD matrix files created by SOLAR looks like strange data but is actually a checksum to ensure that the matrix is used only with the correct pedigree. This checksum line is created by the matcrc command. It is not necessary for user-created matrix files to have this checksum line, but it is recommended and easy to do with the matcrc command.
mibd is the multipoint analogue of the ibd coefficient in the marker-specific IBD files described earlier. If no member of a pedigree has been genotyped for any of the markers on this chromosome, an MIBD value of -1 is assigned to the main diagonal entries in the MIBD matrix for that pedigree. In a linkage analysis, all pairs of individuals in that pedigree will be treated as having mibd = phi2.
The MIBD files are used by the multipoint
command when doing multipoint analysis.
multipoint
uses the
linkmod
command to set up the parameters in
each multipoint model. linkmod
uses the
load matrix
command to load a particular
MIBD file.
In addition to the files above, four other files are created and used to compute MIBDs: mibdrel.ped, which stores the relative class (e.g., full sib, parent-offspring) for each pair of individuals in a pedigree; mibdchr<chr>.mrg, which stores the marker-specific IBDs for all the markers on chromosome <chr>; mibdchr<chr>.mean, which stores, for each relative class, the mean IBD value at each marker on chromosome <chr>; and mibdchr<chr>.loc, which stores the location in cM of each marker on chromosome <chr>.
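The summary kept in mibdchr<chr>.mean can be pictured as a group-by-and-average over pairs. This Python sketch assumes the marker-specific IBDs are available as (relative class, marker, ibd) records, a hypothetical in-memory form; it does not reproduce the file's actual layout, and it does not say whether SOLAR excludes any pairs (for example, those in untyped pedigrees) from the averages.

```python
from collections import defaultdict


def mean_ibd_by_class(records):
    """Average IBD per (relative class, marker).

    records: iterable of (relative_class, marker, ibd) tuples, one per pair.
    Returns {(relative_class, marker): mean ibd}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rel_class, marker, ibd in records:
        totals[(rel_class, marker)] += ibd
        counts[(rel_class, marker)] += 1
    return {key: totals[key] / counts[key] for key in totals}


records = [
    ("full sib", "D1S1", 0.5),
    ("full sib", "D1S1", 1.0),
    ("full sib", "D1S2", 0.25),
    ("parent-offspring", "D1S1", 0.5),
]
print(mean_ibd_by_class(records))
```

Each key pairs a relative class with a marker, mirroring the description of the .mean file as one mean IBD value per relative class at each marker.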
Models are constructed automatically by commands such as polygenic, twopoint, and multipoint. In the course of these commands, important models are automatically saved, and may be reloaded later with the load model command. Models may also be saved at any time with the save model command. To identify models, SOLAR automatically tacks on a .mod extension to all model filenames (unless they already have a .mod extension).
The model file itself is a text file consisting of a sequence of SOLAR commands such as solarmodel, parameter, constraint, trait, covariate, omega, mu, matrix, and option.
The first command must be a solarmodel
command (which identifies the SOLAR version under which the
model was created so that any upgrade issues can be resolved).
Model files do not contain load pedigree
or load phenotypes
commands. Those
commands must be given prior to using a model at least once in
any working directory.
There is usually some current model in effect in SOLAR. Usually this is the best model created by the previous command. When SOLAR starts, the empty model is in effect. (To clear any previous model and start over with a new empty model, use the model new command.)
If the current model has been maximized (so that its parameters have been set to their maximum likelihood estimates), the model will include a loglike command giving the (natural) log likelihood associated with the maximized model. The basic maximization command is maximize, but commands such as polygenic, multipoint, and twopoint automatically do maximization for you, so you may not need to use the maximize command itself except in special cases such as those described in Chapter 9.
When any maximization starts, the starting model is saved as last.mod, which might permit you to go back to the previous model if something goes wrong. In many cases, however, you cannot depend on the last.mod file to take you back to the state before your last command. Sometimes commands create and maximize several intermediate models, and in that case last.mod will represent the last of these intermediate models. Even the maximize command itself sometimes creates intermediate models in the process of trying to resolve convergence difficulties.
It is more useful to load specially named models that may result from particular commands such as polygenic and multipoint. For example, polygenic always creates a model named null0.mod, and multipoint creates a model named null1.mod (and null2.mod, etc. if oligogenic scanning is performed). For a detailed description of the models created by any particular command, see the documentation for that command.
The default directory for loading and saving models is the
current working directory. However, commands such as
multipoint
and
twopoint
save models into the
maximization output directory named by the
trait
or outdir
command. Therefore, if you are going to load any of the
models created by a previous command, you need to either
specify that directory name explicitly:
solar> load model q4/null0
Or, if you have previously specified the trait or outdir, or loaded another model (which specifies the trait), you can use the full_filename command:
solar> load model [full_filename null0]
(The full_filename command is most useful in scripts where you don't know what the trait is going to be, or whether the user has used the outdir command.)
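The naming rules in this section (a .mod extension added when it is missing, and models saved by commands such as multipoint landing in the maximization output directory) can be summarized in a small Python sketch. The helper names are hypothetical and only mirror the behavior described here; within SOLAR itself you would use load model and full_filename as shown above.

```python
import os


def with_mod_extension(name):
    """Append .mod unless the name already ends with it."""
    return name if name.endswith(".mod") else name + ".mod"


def model_file(name, outdir=None):
    """Path of a saved model: inside the maximization output directory
    (for example, the trait name) when one applies, otherwise in the
    current working directory."""
    filename = with_mod_extension(name)
    return os.path.join(outdir, filename) if outdir else filename


print(model_file("null0", "q4"))   # q4/null0.mod on POSIX systems
print(model_file("last"))          # last.mod
```

This matches the earlier example, where load model q4/null0 refers to the file null0.mod in the q4 output directory.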
There is also a command read_model which lets you read the likelihood and/or parameter values from previously stored models without loading them.
If you maximize a model with the maximize command, or if you use the verbosity max command before running a command which maximizes many models, such as polygenic or multipoint, you will see a lot of information about the model, the data, and the history of maximization displayed on your terminal. This information may also be written to a file for you to read later (even if you haven't used the verbosity max command).
The default name for this maximization output file, if you use the maximize command, is solar.out, and it is written to the maximization output directory described in the previous section. Other commands which do maximization name these files after the models they create, but with a .out suffix. For example:
polygenic: poly.out, spor.out, nocovar.out
multipoint: null1.out, null2.out
Maximization output files show the final values for all parameters, the standard errors (unless standard error calculations were turned off), the Loglikelihood, the Normalized Quadratic value, Descriptive Statistics for the Quantitative Variables, and the Iteration History. (Note: if there are retries because of maximization convergence difficulties, the Iteration History will only correspond to the last maximization retry.) Some of these values are saved within the corresponding model file itself, but a few are not. For that reason, there is a command to read certain information from the maximization output files which is not available in model files: read_output.