At the solar> prompt, you have immediate access to three powerful command systems. First, you have access to all the SOLAR commands described elsewhere in this manual. Second, you have access to all the commands built in to Tcl. Third, you have access to all the Unix (or Linux) commands on your system (except for a few which are actually shell commands defined by your shell). Note that you cannot use shell wildcards like * directly because of Tcl limitations, but there are ways around this, described below in section 7.7.
You can also create your own Tcl scripts which can be invoked from SOLAR just as though they were built-in commands. That is what the rest of this chapter is about.
You may find that you rarely have to exit from SOLAR. For example, you can list your current directory just as you would do in your Unix shell:
solar> ls
The Public Release of SOLAR uses an open source command and script interpreter named Tcl (officially pronounced "tickle") which has been available since the 1980's and is now in use by thousands of varied applications throughout the world, especially in science and engineering.
Tcl, which stands for "Tool Command Language," was developed by John Ousterhout starting in the mid 1980's at UC Berkeley. Sun later adopted the Tcl project, but in 1998 Dr. Ousterhout started his own company, Scriptics, which later changed its name to Ajuba. The definitive Tcl book (which we use) is Tcl and the Tk Toolkit by John Ousterhout himself, though it is now slightly outdated (mostly regarding some very esoteric features you are unlikely to be concerned with), and there are also several other good Tcl books now. The reference documentation for the version of Tcl used by SOLAR (8.0.5) is available online:
http://www.tcl.tk/man/tcl8.0/TclCmd/contents.htm
SOLAR and Tcl command names may be abbreviated to the shortest unambiguous string when entered at the solar> prompt. This is also true for most SOLAR commands in scripts (however, it is not true for Tcl commands; Tcl commands cannot be abbreviated in scripts). Also, in scripts you must use the exec command to execute Unix commands, even though, as shown above, exec is not required at the solar> prompt. (Abbreviation and automatic exec are special convenience features for interactive use.) For example, at the solar> prompt, you could copy a file like this:
solar> cp solar.out save.out
But in a script, you would have to enter the same command like this:
exec cp solar.out save.out
(For this trivial example, however, you could also use the Tcl command file copy, which doesn't require exec.)
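The file copy alternative mentioned above can be sketched as follows. This is a minimal illustration rather than anything SOLAR-specific: the file names demo_solar.out and demo_save.out are hypothetical, and the script creates its own input so that it can be run in any scratch directory.

```tcl
# Hypothetical file names, used only for illustration.
set src demo_solar.out
set dst demo_save.out

# Create a small source file so the sketch is self-contained.
set f [open $src w]
puts $f "example output"
close $f

# Tcl's built-in copy; -force overwrites the destination if it already exists.
file copy -force $src $dst

# Read the copy back to confirm that the contents match.
set f [open $dst r]
set copied [read $f]
close $f
puts "Copied contents: $copied"

# Clean up the demonstration files.
file delete $src $dst
```

Because file copy is part of Tcl itself, it behaves the same way in a script as at the prompt and does not depend on which Unix commands are installed.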
Any SOLAR command can be entered at the SOLAR prompt (as in the Tutorial in Chapter 3). But for serious and systematic usage, once you have become familiar with the SOLAR commands, it is usually more efficient to put SOLAR commands into script files so that the same operations may be reused with updated data sets, with other data, or with variations.
Also, since many SOLAR commands or sequences of commands can run for quite a long time (minutes, hours, days, or even weeks) it may be best to put them in scripts so you don't have to sit around waiting for each command to finish before starting the next. (Note: You can actually enter new SOLAR commands before the previous ones have finished. But there are limits to this, and it is messy if SOLAR is meanwhile writing output to your terminal. It is safest to write scripts for long running sequences of commands.)
Examples of some simple SOLAR scripts are provided in the doc/Example subdirectory of your SOLAR installation. These are short and easy to understand. You can copy the entire Example directory (including these scripts) to your working directory using the SOLAR example command. The example script files are named makemibd.tcl and doanalysis.tcl.
You can write SOLAR scripts with any text editor (such as vi or emacs or even the CDE Text Editor which may be selected through the Workspace Menu/Applications on Sun systems). You should save scripts with the filename extension .tcl. When SOLAR starts up, it scans for all .tcl files, and installs all the procedures defined in those files as new commands. If you have already started SOLAR, you can use the newtcl command to make SOLAR look for new or modified .tcl files in your working directory. You can also put your SOLAR script files in a lib subdirectory of your home directory; that way they will be found regardless of what working directory you are in.
A SOLAR script file should be organized into one or more procedures, defined using the Tcl proc command (shown in examples below). That is the way to define new commands which can be invoked from SOLAR. Procedures can also be invoked from other procedures, allowing you to build up collections of reusable scripts.
Procedures can be invoked at the solar> prompt, just as with other commands. You simply specify the procedure name as a command, followed by any mandatory and optional arguments. For a procedure named myproc and two arguments arg1 and arg2, this would look like this:
solar> myproc arg1 arg2
However, sometimes it is useful or convenient to be able to run SOLAR simply to run one command or script. That way the command can be run in the background or through a job queuing system. To run the above procedure this way, you can give a command like this from your shell prompt (shown as $ below, though for those using the C shell it might be % instead):
$ solar myproc arg1 arg2
The above syntax also works fine from within a shell script, for those who know how to write shell scripts. However, in some cases, people who write a shell script want to execute several SOLAR commands in a row. You could do that by writing a Tcl script containing all those commands, and then executing that Tcl script as above in the shell script. However, often people find it more convenient or useful to put everything in a single shell script. You can do that using the "<< END" syntax allowed by shell scripts as in the following example:
solar << END
model new
trait mytrait
covar mycovar
polygenic
END
Take a look at the example script named makemibd.tcl included in the example:
proc makemibd {} {
    #
    # Purpose: Makes ibd and mibd files for chromosome 9 and 10 in subdirectories
    #
    #
    # Remove old directories (if any) and make new ones
    #
    exec rm -rf gaw10ibd
    exec rm -rf gaw10mibd
    exec mkdir gaw10ibd
    exec mkdir gaw10mibd
    ibddir gaw10ibd
    mibddir gaw10mibd
    #
    # Make ibd files...simple case where all genotypes are known
    #
    load pedigree gaw10.ped
    load marker mrk9
    ibd
    load marker mrk10
    ibd
    #
    # Make mibd files
    #
    load marker mrk9
    load map map9
    mibd 1
    load marker mrk10
    load map map10
    mibd 1
}
For the most part, this is simply a list of SOLAR commands exactly as you might have entered them at the solar> prompt. The first line:
proc makemibd {} {
tells Tcl that a procedure named makemibd is being defined. The adjacent pair of curly braces {} indicates that there are no arguments to the procedure (if there were arguments, their names would be listed between the braces). Then the open brace { indicates that several command lines follow. The close brace } at the bottom ends the procedure definition. The lines which begin with the pound sign # are comment lines.
toscript
It is often useful to save the commands you have used in a SOLAR session to a script. The toscript command lets you do this. You can either write all of the commands used in the current session (or at least, the most recent 200, since the history buffer is set to 200 by default) to a script file, or select which particular commands you want included. Often it is useful to write a script with toscript and then edit it further with a text editor to add additional refinements and/or corrections. It is often useful to review the previous commands using the history command first. For example:
solar> example
solar> load pedigree gaw10.ped
solar> load phenotypes gaw10.phen
solar> trait q4
solar> covar age sex age*sex
solar> help polygenic
solar> polygenic -s
solar> history
1 example
2 load pedigree gaw10.ped
3 load phenotypes gaw10.phen
4 trait q4
5 covar age sex age*sex
6 help polygenic
7 polygenic -s
8 history
solar> toscript startup 2-5 7
proc startup {} {
    load pedigree gaw10.ped
    load phenotypes gaw10.phen
    trait q4
    covar age sex age*sex
    polygenic -s
}
As the script is being written to a file, it is also displayed on your terminal. To overwrite an existing script, you must use the -ov option. newtcl is invoked automatically to add the new procedure. Be careful not to use the name of any built-in procedure (which won't work anyway).
All Tcl commands are similar to Unix commands in that they consist of a command name followed by arguments. For example:
solar> puts Hello
Hello
is an example of the puts command, and will, as you might expect, display the word Hello on your terminal. This command is shown with one argument: the word Hello. If the string you wanted to display has spaces in it, you would have to enclose it in double quotes:
solar> puts "Hello, there."
Hello, there.
You can assign values to variables in Tcl using the set command. The first argument is the variable name, and the second is the value, which can be a number or a text string. Tcl does not require you to declare the type of any variable.
solar> set weight 175
solar> set name Charles
To use the value of variables, you precede their names with the $ operator.
solar> puts "My name is $name, and my weight is $weight."
My name is Charles, and my weight is 175.
To evaluate an ordinary arithmetic expression, you must use the expr command. Tcl uses precedence rules (such as multiplication having higher precedence than addition) similar to other programming languages such as C, and also has built-in math functions:
solar> expr 10 - 6 / 3 + 2 * sqrt(4)
12.0
You can put square brackets around any Tcl command to have it evaluated in the context of another command. Tcl replaces the square bracketed command by the result it returned.
solar> puts "Oops! My weight increased to [expr 10 + $weight]."
Oops! My weight increased to 185.
You can use this technique to assign a variable to the value of an expression:
solar> set length 3
solar> set width 4.5
solar> set area [expr $length * $width]
Tcl has the features of a modern structured computer language, such as if commands with optional else and elseif clauses, while commands, and for commands similar to those in the C programming language. Note that curly braces must be used to enclose the condition if it is more than one word, and also to allow a command (or part of a command) to extend for more than one line. Curly braces also let you enter multiline commands at the solar> prompt.
solar> if {$area > 12} {
    puts "Area is greater than 12"
} elseif {$area == 12} {
    puts "Area is equal to 12"
} else {
    puts "Area is not allowed to be less than 12"
}
Area is greater than 12
solar>
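The while and for commands mentioned above follow the same brace conventions as if. The following is a minimal sketch in plain Tcl; the variable names are chosen only for illustration, and the expressions are enclosed in braces, a common Tcl habit that the shorter examples in this chapter omit.

```tcl
# Sum the integers 1..5 with a for loop (start; condition; step), as in C.
set total 0
for {set i 1} {$i <= 5} {incr i} {
    set total [expr {$total + $i}]
}
puts "Sum of 1..5 is $total"

# Count how many times 100 can be halved before it drops below 1.
set value 100.0
set halvings 0
while {$value >= 1} {
    set value [expr {$value / 2}]
    incr halvings
}
puts "Halvings needed: $halvings"
```

Note that the loop condition must be in braces: without them, the condition would be substituted once before the loop starts and would never change.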
Tcl also has many built-in string and list operations. Any string in Tcl can also be interpreted as a list of elements separated by spaces. An iteration command, foreach, can be used to perform one or more commands on each element of a list.
solar> set friends "Ralph Bill Fred"
solar> lindex $friends 0
Ralph
solar> lappend friends Sally
Ralph Bill Fred Sally
solar> lsearch $friends Fred
2
solar> foreach friend $friends {
    puts "Hi, $friend."
}
Hi, Ralph.
Hi, Bill.
Hi, Fred.
Hi, Sally.
In most cases, a Tcl list may be used where you might use an array in another programming language for representing a simple aggregate to be indexed by position. Tcl also has an "array" type, but it is a more sophisticated associative array than most people are familiar with. Rather than indexing values by position, the Tcl array associates values with names, their position not being important. But since the names could also be numbers, you can still use a Tcl array in the usual way, with numeric indexes. But in many cases the transparency of a list (it is viewable as a text string), and the simplicity of extending it with lappend and iterating over it with the foreach command make the list the better choice.
Anyway, here is an example of using the Tcl array, in conjunction with the list created above:
solar> set age(Sally) 44
solar> set age(Fred) 45
solar> set age(Ralph) 46
solar> set age(Bill) 47
solar> foreach friend $friends {
    puts "$friend's age is $age($friend)."
}
Ralph's age is 46.
Bill's age is 47.
Fred's age is 45.
Sally's age is 44.
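Because a Tcl array is indexed by name rather than by position, it is often useful to ask which names it contains. The sketch below rebuilds the age array from the example and uses the standard Tcl commands array names, array size, and info exists; the name Zelda is hypothetical and is deliberately absent from the array.

```tcl
# Ages keyed by name, mirroring the age array in the text.
set age(Sally) 44
set age(Fred) 45
set age(Ralph) 46
set age(Bill) 47

# array names returns the keys in no particular order, so sort them first.
foreach name [lsort [array names age]] {
    puts "$name is $age($name)"
}

# info exists tests for a key without raising an error when it is missing.
# The name Zelda is hypothetical and deliberately absent from the array.
if {[info exists age(Zelda)]} {
    puts "Zelda is $age(Zelda)"
} else {
    puts "No age recorded for Zelda"
}

# array size reports how many elements are stored.
puts "Number of entries: [array size age]"
```

Reading a missing element directly (for example $age(Zelda)) raises an error, which is why the info exists check comes first.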
Beyond a few lines, you will probably want to put any significant Tcl programming into procedures which can be called and incorporated into other procedures. Procedures are defined with the proc command, and can look like this:
proc times {a b} {
    return [expr $a * $b]
}
It is recommended, though not necessary, for every proc to have a return statement. If there is nothing particular to return, return an empty string:
return ""
This prevents your procedure from returning something from the last statement, which might not be what you expect.
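The difference can be seen with two small procedures. This is a sketch using hypothetical procedure names (remember_implicit and remember_explicit) and a hypothetical global variable named saved; it shows only the standard Tcl behavior described above.

```tcl
# Without an explicit return, a procedure returns the result of the last
# command it ran; here that is the value produced by set.
proc remember_implicit {value} {
    global saved
    set saved $value
}

# With return "", the caller always receives an empty string.
proc remember_explicit {value} {
    global saved
    set saved $value
    return ""
}

puts "implicit: '[remember_implicit 42]'"
puts "explicit: '[remember_explicit 42]'"
```

At the interactive prompt the implicit version would echo 42 after every call, which is usually just noise for a procedure whose purpose is a side effect.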
Once you have defined a procedure in a file that SOLAR loads, you can run the procedure just like any built-in Tcl command:
solar> puts "The area is [times $width $length]."
The area is 13.5.
Procedures can have zero or more arguments. The argument names are associated with values when the procedure is invoked. An argument named args can be used to allow a variable number of arguments. All the values are assigned to args as a list:
solar> proc sumof {args} {
    set sum 0
    foreach arg $args {
        set sum [expr $sum + $arg]
    }
    return $sum
}
solar> sumof 1 2 3 4 5
15
If any command in a script generates an error, the script will terminate, and an error message will be displayed. If you want to intercept such errors instead, you can enclose one or more commands inside a catch command, which returns 0 if there was no error, or 1 if there was an error, but in either case lets the script continue past the end of commands in the catch command itself:
if {[catch {exec rm temp.out}]} {
    puts "The file did not exist"
}
On the other hand, if you need to raise an error to terminate a script before completion, you can use the error command. The string you provide will be displayed on the terminal:
error "This should not have happened."
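The catch and error commands work well together. The sketch below uses a hypothetical helper named safe_sqrt to raise an error, and relies on the optional second argument of catch, a variable name that receives either the result or the error message.

```tcl
# A hypothetical helper that raises an error for invalid input.
proc safe_sqrt {x} {
    if {$x < 0} {
        error "cannot take the square root of a negative number: $x"
    }
    return [expr {sqrt($x)}]
}

# catch returns 0 on success and stores the result in the named variable.
set status [catch {safe_sqrt 16} result]
puts "status=$status result=$result"

# On failure catch returns 1 and the variable holds the error message.
set status [catch {safe_sqrt -4} message]
puts "status=$status message=$message"
```

Capturing the message this way lets a script report what went wrong and then decide whether to continue, rather than simply noting that something failed.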
If you need to continue a Tcl command past one line, you can use the \ operator at the end of the line (which escapes the newline). Blocks of commands can be enclosed within curly braces, as shown in many of the examples above.
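Line continuation with the backslash can be sketched as follows. The values and variable names are arbitrary and serve only to show where the backslash goes.

```tcl
# The backslash must be the very last character on the line; Tcl replaces
# the backslash-newline (and leading whitespace on the next line) with a space.
set total [expr {10 + 20 + 30 + \
                 40 + 50}]
puts "Total: $total"

# The same technique splits a long argument list across lines.
set colors [list red orange yellow \
                 green blue violet]
puts "Number of colors: [llength $colors]"
```

A space or tab after the backslash defeats the continuation, which is a common source of puzzling errors in scripts.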
Of course, there is much more to Tcl than can be shown here. Get the book! Meanwhile, you can check out all the commands in the online Tcl reference documentation cited earlier in this chapter.
The Tcl interpreter permits every command to define its own syntax. Some SOLAR commands use a syntax which is intended to be more intuitive than that in basic Tcl. For example, consider the parameter command which defines model parameters and/or allows you to set their starting point or boundaries:
parameter bsex = 0
The parameter command is defined so that if its second argument is an equal sign, it sets the current (starting) value to the following argument. This looks like an assignment statement as in programming languages like C or Fortran, but not Tcl. (Note that it is safest to put spaces before and after the equal sign.) Other SOLAR commands which let you use the equal sign are omega, mu, and constraint. However, for these commands the equal sign represents a fixed equality rather than an immediate (and temporary) assignment.
constraint E2 + H2r = 1
omega = pvar*(Phi2*h2r + I*E2)
mu = Mean
The syntax allowed for each of these commands differs from what is required for Tcl commands such as expr. See the documentation for each command for more details. For now, note that parameters and matrices are listed by name without using the $ for dereferencing, and names are treated without case sensitivity. The omega and mu commands use a special SOLAR expression parser which allows either ** or ^ to indicate exponentiation, any math function defined for C, and special variables whose meaning is context dependent. The constraint command is limited to simple linear constraints because those are the only kind we are able to support.
Though this is not intended to replace writing ordinary script files (as described above), you can create a file of SOLAR option-setting commands in your working directory named .solar and SOLAR will execute them when starting. For example, you could have a .solar file containing the commands:
chromosome 1-22
finemap 1
interval 5
option MergeAllPeds 0
boundary wide start
This is intended only for commands which set up the environment or options for later commands! You should not use it to perform significant work in SOLAR, such as by using the ibd, polygenic, multipoint, or maximize commands. Under some circumstances, major SOLAR commands reload the .solar file and this could cause an infinite recursion. It is not even a good idea to load models in the .solar file.
When you are using Unix commands at the solar> prompt or in SOLAR scripts, you will find that you can't use shell wildcards such as * directly. SOLAR uses Tcl as its command interpreter, and Tcl does not understand wildcards except in glob commands, which will be described below. Note that the character * has two special meanings in Tcl: inside an expr command it is the multiplication symbol, and inside a glob command it is the wildcard symbol.
If all you want to do is execute some Unix command with wildcards interactively (not in a script), the easiest thing to do may be to break out of SOLAR temporarily. This works with most common Unix shells. You can simply enter the keystroke CTRL-z (pressing the Control key and the z key at the same time) and you will temporarily break out of SOLAR back to the Unix shell from which you started SOLAR. Then you can use wildcards as you usually do. Then, to return to your SOLAR session, use the shell built-in command fg (which stands for foreground).
Note that CTRL-z echoes as ^Z on your terminal, and most shells then print a cryptic Unix message which you can ignore unless you are going to break out recursively. Then, when you use fg to return to SOLAR, the shell will print the name of the program you are returning to (i.e. solar) but you will not get another solar> prompt automatically. Never mind that; you can simply enter a command anyway, or, if it makes you feel better, hit the RETURN key again and you will get another prompt. (From the point of view of SOLAR, you never entered a command to the last solar> prompt, so it has no reason to give you another prompt. The CTRL-z and everything else you entered in the meantime went back to your original shell and SOLAR itself never saw any of it.) Here's what it would look like (for shell Csh running with the default prompt on a machine named mendel):
solar> ^Z[1] + Stopped (SIGTSTP) solar
mendel% rm *.tmp
mendel% fg
solar
load phenotypes gaw10.phen
solar>
A similar but alternative way to get into your shell is to launch your shell again from inside SOLAR. This is easy to do if you know the name of the command which launches your shell. If you are using the C shell, the command is csh. If you are using Linux, and you don't know any different, your shell is likely to be bash, which is launched with the command bash. Another possibility is the Korn Shell, which is launched with the command ksh. If you don't know, ask your system administrator. Once you know the command to launch your shell, the rest is easy. You simply enter that command, and then you will be running your shell, from which you can enter any command using wildcards the way you usually do. When you are done, you exit from the shell with the command exit. Then you will be back inside SOLAR with another solar> prompt. This might look something like this:
solar> csh
mendel% rm *.tmp
mendel% exit
solar>
Using the Tcl command glob is not difficult to understand by itself, but the consequences of using it often require a little thought, and you may also need to understand a few other Tcl commands. The glob command simply returns a list of all the filenames that match the patterns provided as arguments (which may include wildcards such as * to match any string, and ? to match any single character). You could then use that list in a command such as foreach to iterate through that list and do one or more things to each of its elements:
set tempfiles [glob -nocomplain *.tmp tmp/*]
foreach tempfile $tempfiles {
    catch {file delete $tempfile}
}
The -nocomplain option prevents glob from raising an error if there are no matching files. The foreach command takes each list element in turn and assigns it to the name given as its first argument for each pass through the list of commands. The file delete command is a command built in to Tcl for deleting files. The catch prevents the loop from terminating early if there are any files protected from deletion. (Alternatively, you could have invoked the file delete command with the -force option, which would cause it to delete all files you are able to delete, and therefore generally doesn't raise any errors except in exceptional conditions that you would probably want to know about anyway.) (Note also that the Tcl file command has many variants which allow you to do many other things besides deleting files. Some of the more useful file commands have to do with extracting pathnames and extensions from filenames. The file command works on both files and filenames. See the Tcl file documentation for more details.)
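A few of the filename-oriented file subcommands mentioned above can be sketched as follows. The path results/run1/solar.out is hypothetical; these subcommands only parse the name, so the file does not need to exist. The results in the usage note assume Unix-style paths.

```tcl
# A hypothetical path; these subcommands only parse the name, so the
# file does not need to exist.
set path results/run1/solar.out

puts "directory: [file dirname $path]"
puts "tail:      [file tail $path]"
puts "extension: [file extension $path]"
puts "rootname:  [file rootname $path]"

# Build a sibling filename with a different extension.
set backup "[file rootname $path].bak"
puts "backup:    $backup"
```

Here file dirname yields results/run1, file tail yields solar.out, file extension yields .out, and file rootname yields results/run1/solar, so the backup name becomes results/run1/solar.bak.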
The above approach works best if you are going to do more than one thing with each file, and it is also useful if you want to have the most control over what is going on, such as in not forcing the deletion of every file, but nevertheless completing the entire list of files, and deleting all the ones which are not protected from deletion. If all you want to do is forcibly delete each file, you don't need a loop to do this. But, at the same time, you can't simply pass the list of filenames to either the file delete command or its Unix equivalent, rm. The list of files looks like one big long string, and if you simply try to use it in another command, that command will only see one big name consisting of all the other names put together. For example, if [glob *.tmp] matches files a.tmp and b.tmp, it will return a list which looks like the string:
"a.tmp b.tmp"
If that string is passed to the file delete command, it will try to find one filename with all 11 characters and a space in the middle of the name (which is possible in Unix). But this is probably not what you want it to do.
Besides using the foreach command to divide up the list, it is also possible to use the eval command to operate on your command. What eval does is evaluate all the commands and dereference all the variables in your command, and then evaluate a new command with the results of all those commands and variables put together as though they were separate arguments. (This sounds a bit tricky at first, but it is ultimately one of the most useful features of Tcl to be able to do things like this.) The following command might do what we want:
eval file delete -force [glob -nocomplain *.tmp]
All we had to do here is add the command eval in front of our command as it might have seemed we could have written it. eval will then create a command like the following command and evaluate it:
file delete -force a.tmp b.tmp
(Note: In this example, the -force argument is used simply because if some file actually is protected from deletion, it will cause the entire command to exit, possibly before deleting all the other files. Using catch will prevent it from exiting the entire script, but it will not prevent the file delete command itself from exiting prematurely.)
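The effect of eval on a list can be seen without deleting anything. The sketch below uses a hypothetical helper named count_args that reports how many arguments it received, and a fixed string standing in for the result of glob.

```tcl
# A hypothetical helper that reports how many arguments it received.
proc count_args {args} {
    return [llength $args]
}

# Stand-in for the result of glob: one string that is also a two-element list.
set files "a.tmp b.tmp"

# Passed directly, the whole string arrives as a single argument.
puts "without eval: [count_args $files]"

# eval concatenates its arguments and re-parses them as a command, so each
# list element becomes a separate argument.
puts "with eval:    [eval count_args $files]"
```

The first call reports 1 and the second reports 2. One caveat is that eval re-parses everything it is given, so a filename that itself contains spaces or other characters special to Tcl may be split or substituted in ways you did not intend.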
Now, if instead of using file delete we wanted to use the Unix command rm inside a script, we would have to put the exec command in front of rm so that Tcl knows we want to evaluate a Unix command. If we're not sure that any files are going to match the pattern, we had better add a -nocomplain option, and we had also better put the entire command inside a catch because rm will generate an error if it isn't actually given any names to delete (file delete doesn't generate an error in such a case). Also, we need to force the deletion of all files with the -f option of rm. So, the resulting command looks like this:
catch {eval exec rm -f [glob -nocomplain *.tmp]}
Once you understand what all the terms do, this isn't really that difficult, but often people want to do these things without thinking much about them, which is why the easiest way to use wildcards interactively is probably to launch a new shell to do it, as described in the previous section. Then you only have to think about all these details when you are writing scripts.
In the course of developing SOLAR, we have developed certain utility procedures which are helpful in writing SOLAR scripts. (These are in addition to the extensive set of utility commands provided by Tcl.) The most useful of these are documented in the command listing and are summarized below:
tablefile | Read data file in comma delimited or PEDSYS format |
solarfile | Read data file as with "tablefile" but applying "field" name mapping |
putsout | Write message to terminal and/or file |
drand | Return a random floating point number between 0 and 1 |
chi | Compute probability for a chi-square value |
chinc | Compute probability for a noncentral chi-square value |
alnorm | Evaluate the tail of a normal curve |
if_parameter_exists | Check if a parameter exists |
read_model | Return a parameter value from any saved model |
read_output | Read variable statistics from maximization output file |
read_arglist | Read hyphenated optional arguments and argument-value pairs |
is_nan | Check if a value is NaN (Not a Number) |
if_global_exists | Check if a Tcl global variable exists |
remove_global | Remove a global variable (so it no longer exists) |
catenate | Concatenate strings |
string_imatch | Case insensitive string match testing |
setappend | Append only new elements to a list (keeping it like a set) |
remlist | Remove element from list |
stringsub | Simple verbatim string substitution (not regsub) |
full_filename | Prepend the maximization output directory name to a filename |
clod | Calculate a LOD score |
tclgr | Create xmgr session with pipe connection to SOLAR |
stats | Get and/or show statistics for any variable in a file |
combinations | Make a list of all combinations (sets) of integers 1..N |
showproc
Many of the commands in SOLAR are actually implemented as SOLAR scripts themselves. All the standard SOLAR scripts are defined in the file solar.tcl found in the lib subdirectory of your SOLAR installation. You can examine the standard scripts using the showproc command. By itself, this will simply display the entire procedure definition of any SOLAR command which is defined by a script. If the script is more than one page long, it will be displayed using the more pager, which shows only a page at a time and lets you advance to the next page by pressing space. For example:
solar> showproc twopoint
The formatting shown by showproc may not be as pretty as it actually is in the source file because it will concatenate lines which are extended by using backslash. showproc is based on the Tcl command info body which has this feature.
You can also write copies of SOLAR procedures to files in your working directory. You can then edit the procedures to fit special requirements. To prevent any confusion with the built-in commands, the name of the newly created procedure is suffixed with .copy, regardless of the name you choose for the output file. If you would rather not use the .copy suffixed name, you can change that when you are editing the copied procedure. (But do not attempt to give the procedure the exact same name as a built-in procedure!)
solar> showproc twopoint twopoint.tcl
solar> newtcl
solar> twopoint.copy
SOLAR is designed so that even if you were to create procedures with the same names as built-in procedures, they would be ignored, and the built-in procedures would be used anyway. Otherwise, you could foul up SOLAR operation unintentionally. Even if you were careful not to create procedures with any documented command names, you might create a procedure with the same name as some undocumented internal procedure.
If you must know how this works, read this paragraph. The scripts in the directory containing the active solar.tcl file are given the highest precedence. So, actually, you could overcome the safeguard by simply copying the solar.tcl file to your working directory; then all the scripts in your working directory will have the highest precedence. But in that case, you might as well edit the entire solar.tcl file in your directory. If you were to copy solar.tcl into your working directory, then make a copy of a built-in procedure using the showproc command, and edit that copy to remove the .copy suffix, you would have two copies of the same script name in the highest priority directory. This ambiguity would be resolved in a somewhat unpredictable way: by which file is found first while traversing the directory. This is not necessarily the alphabetic order in which files are displayed by the ls command. To see which file's version of a script is actually being used, take a look at the tclIndex file created by SOLAR in your working directory.