Chapter 7

SOLAR Scripts

7.1 The SOLAR Interpreter

At the solar> prompt, you have immediate access to three powerful command systems. First, you have access to all the SOLAR commands described elsewhere in this manual. Second, you have access to all the commands built into Tcl. Third, you have access to all the Unix (or Linux) commands on your system (except for a few which are actually shell commands defined by your shell). Note that you cannot use shell wildcards like * directly because of Tcl limitations, but there are ways to get around this, described in section 7.7 below.

You can also create your own Tcl scripts which can be invoked from SOLAR just as though they were built-in commands. That is what the rest of this chapter is about.

You may find that you rarely have to exit from SOLAR. For example, you can list your current directory just as you would do in your Unix shell:

	solar> ls

The Public Release of SOLAR uses an open source command and script interpreter named Tcl (officially pronounced "tickle") which has been available since the 1980's and is now in use by thousands of varied applications throughout the world, especially in science and engineering.

Tcl, which stands for "Tool Command Language," was developed by John Ousterhout starting in the mid 1980's at UC Berkeley. Sun later adopted the Tcl project, but in 1998 Dr. Ousterhout started his own company, Scriptics, which later changed its name to Ajuba. The definitive Tcl book (which we use) is Tcl and the Tk Toolkit by John Ousterhout himself, though it is now slightly outdated (mostly regarding some very esoteric features you are unlikely to be concerned with), and there are also several other good Tcl books now. The reference documentation for the version of Tcl used by SOLAR (8.0.5) is available online:

http://www.tcl.tk/man/tcl8.0/TclCmd/contents.htm

SOLAR and Tcl command names may be abbreviated to the shortest unambiguous string when entered at the solar> prompt. This is also true for most SOLAR commands in scripts (however, it is not true for Tcl commands; Tcl commands cannot be abbreviated in scripts). Also, in scripts you must use the exec command to execute Unix commands, even though, as shown above, exec is not required at the solar> prompt. (Abbreviation and automatic exec are special convenience features for interactive use.) For example, at the solar> prompt, you could copy a file like this:

	solar> cp solar.out save.out

But in a script, you would have to enter the same command like this:

        exec cp solar.out save.out

(For this trivial example, however, you could also use the Tcl command file copy, which doesn't require exec.)
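
As a minimal sketch, here are the two approaches side by side as they might appear in a script. The file names demo_solar.out and demo_save.out are hypothetical stand-ins, created here only so the fragment can run on its own, and the second copy assumes a Unix cp command is available on your path.

```tcl
# Create a small stand-in file so the example can run anywhere.
# (demo_solar.out and demo_save.out are hypothetical names.)
set f [open demo_solar.out w]
puts $f "example output"
close $f

# Copy with the Tcl built-in; -force overwrites an existing target.
file copy -force demo_solar.out demo_save.out

# The same copy using the Unix command requires exec inside a script.
exec cp demo_solar.out demo_save.out

puts [file exists demo_save.out]   ;# prints 1
```

The file copy form does not depend on an external program, which is one reason to prefer it when a Tcl equivalent exists.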

7.2 Why Write Scripts?

Any SOLAR command can be entered at the SOLAR prompt (as in the Tutorial in Chapter 3). But for serious and systematic usage, once you have become familiar with the SOLAR commands, it is usually more efficient to put SOLAR commands into script files so that the same operations may be reused with updated data sets, with other data, or with variations.

Also, since many SOLAR commands or sequences of commands can run for quite a long time (minutes, hours, days, or even weeks) it may be best to put them in scripts so you don't have to sit around waiting for each command to finish before starting the next. (Note: You can actually enter new SOLAR commands before the previous ones have finished. But there are limits to this, and it is messy if SOLAR is meanwhile writing output to your terminal. It is safest to write scripts for long running sequences of commands.)

Examples of some simple SOLAR scripts are provided in the doc/Example subdirectory of your SOLAR installation. These are short and easy to understand. You can copy the entire Example (including these scripts) to your working directory using the SOLAR example command. The example script files are named makemibd.tcl and doanalysis.tcl.

You can write SOLAR scripts with any text editor (such as vi or emacs or even the CDE Text Editor which may be selected through the Workspace Menu/Applications on Sun systems). You should save scripts with the filename extension .tcl. When SOLAR starts up, it scans for all .tcl files, and installs all the procedures defined in those files as new commands. If you have already started SOLAR, you can use the newtcl command to make SOLAR look for new or modified .tcl files in your working directory. You can also put your SOLAR script files in a lib subdirectory of your home directory; that way they will be found regardless of what working directory you are in.

A SOLAR script file should be organized into one or more procedures, defined using the Tcl proc command (shown in examples below). That is the way to define new commands which can be invoked from SOLAR. Procedures can also be invoked from other procedures, allowing you to build up collections of reusable scripts.

7.2.1 Ways to invoke SOLAR commands and scripts

A typical way to invoke scripts is from the solar> prompt, just as with other commands. You simply specify the procedure name as a command, followed by any mandatory and optional arguments. For a procedure named myproc and two arguments arg1 and arg2, this would look like this:

        solar> myproc arg1 arg2

However, sometimes it is useful or convenient to be able to run SOLAR simply to run one command or script. That way the command can be run in the background or through a job queuing system. To run the above procedure this way, you can give a command like this from your shell prompt (shown as $ below, though for those using the C shell it might be % instead):

        $ solar myproc arg1 arg2

The above syntax also works fine from within a shell script, for those who know how to write shell scripts. However, in some cases, people who write a shell script want to execute several SOLAR commands in a row. You could do that by writing a Tcl script containing all those commands, and then execute that Tcl script as above in the shell script. However, often people find it more convenient or useful to put everything in a single shell script. You can do that using the "<< END" syntax allowed by shell scripts as in the following example:

        solar << END
	model new
	trait mytrait
	covar mycovar
	polygenic
	END

7.3 makemibd.tcl

Take a look at the example script named makemibd.tcl included in the example:

        proc makemibd {} {
        #
        # Purpose: Makes ibd and mibd files for chromosome 9 and 10 in subdirectories
        #
        #
        # Remove old directories (if any) and make new ones
        #
            exec rm -rf gaw10ibd
            exec rm -rf gaw10mibd
            exec mkdir gaw10ibd
            exec mkdir gaw10mibd
            ibddir gaw10ibd
            mibddir gaw10mibd
        #
        # Make ibd files...simple case where all genotypes are known
        #
            load pedigree gaw10.ped
            load marker mrk9
            ibd
            load marker mrk10
            ibd
        #
        # Make mibd files
        #
            load marker mrk9
            load map map9
            mibd 1
            load marker mrk10
            load map map10
            mibd 1
        }

For the most part, this is simply a list of SOLAR commands exactly as you might have entered them at the solar> prompt. The first line:

        proc makemibd {} {

tells Tcl that a procedure named makemibd is being defined. The adjacent pair of curly braces {} indicates that there are no arguments to the procedure (if there were arguments, their names would be listed between the braces). Then the open brace { indicates that several command lines follow. The close brace } at the bottom ends the procedure definition. The lines which begin with pound sign # are comment lines.
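
To illustrate how argument names are listed between the braces, here is a minimal sketch of a procedure that takes arguments. The procedure markerfile is hypothetical (it is not part of SOLAR); it simply builds a name such as mrk9 from a chromosome number. The second argument is given a default value, so it may be omitted when the procedure is called.

```tcl
# Hypothetical procedure with one required argument (chrom) and one
# optional argument (prefix), whose default value is "mrk".
proc markerfile {chrom {prefix mrk}} {
    return "$prefix$chrom"
}

puts [markerfile 9]        ;# prints mrk9
puts [markerfile 10 map]   ;# prints map10
```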

7.4 Automatically writing scripts with toscript

It is often useful to save the commands you have used in a SOLAR session to a script. The toscript command lets you do this. You can either write all of the commands used in the current session (or at least, the most recent 200, since the history buffer is set to 200 by default) to a script file, or select which particular commands you want included. Often it is useful to write a script with toscript and then edit it further with a text editor to add additional refinements and/or corrections. It is often useful to review the previous commands using the history command first. For example:

        solar> example
        solar> load pedigree gaw10.ped
        solar> load phenotypes gaw10.phen
        solar> trait q4
        solar> covar age sex age*sex
        solar> help polygenic

        solar> polygenic -s

        solar> history
            1  example
            2  load pedigree gaw10.ped
            3  load phenotypes gaw10.phen
            4  trait q4
            5  covar age sex age*sex
            6  help polygenic
            7  polygenic -s
            8  history

        solar> toscript startup 2-5 7
        proc startup {} {
            load pedigree gaw10.ped
            load phenotypes gaw10.phen
	    trait q4
	    covar age sex age*sex
	    polygenic -s
	}

As the script is being written to a file, it is also displayed on your terminal. To overwrite an existing script, you must use the -ov option. newtcl is invoked automatically to add the new procedure. Be careful not to use the name of any built-in procedure (which won't work anyway).

7.5 More Basic Tcl Programming

All Tcl commands are similar to Unix commands in that they consist of a command name followed by arguments. For example:

	solar> puts Hello
        Hello

is an example of the puts command, and will, as you might expect, display the word Hello on your terminal. This command is shown with one argument: the word Hello. If the string you wanted to display has spaces in it, you would have to enclose it in double quotes:

	solar> puts "Hello, there."
        Hello, there.

You can assign values to variables in Tcl using the set command. The first argument is the variable name, and the second is the value, which can be a number or a text string. Tcl does not require you to declare the type of any variable.

	solar> set weight 175
	solar> set name Charles

To use the value of variables, you precede their names with the $ operator.

	solar> puts "My name is $name, and my weight is $weight."
	My name is Charles, and my weight is 175.

To evaluate an ordinary arithmetic expression, you must use the expr command. Tcl uses precedence rules (such as multiplication having higher precedence than addition) similar to other programming languages such as C, and also has built-in math functions:

	solar> expr 10 - 6 / 3 + 2 * sqrt(4)
        12.0
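
One detail worth seeing in isolation is that, as in C, division of two integers gives an integer result, which is why 6 / 3 above contributes an integer while sqrt(4) makes the final result a real number. The following is a small sketch of that behavior; the numbers are arbitrary.

```tcl
puts [expr 6 / 4]            ;# both operands are integers: prints 1
puts [expr 6 / 4.0]          ;# one real operand: prints 1.5
puts [expr (10 - 6) / 2]     ;# parentheses override precedence: prints 2
```

If you need a real-valued quotient, write at least one of the operands with a decimal point.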

You can put square brackets around any Tcl command to have it evaluated in the context of another command. Tcl replaces the square bracketed command by the result it returned.

        solar> puts "Oops!  My weight increased to [expr 10 + $weight]."
        Oops!  My weight increased to 185.

You can use this technique to assign the value of an expression to a variable:

	solar> set length 3
	solar> set width 4.5
	solar> set area [expr $length * $width]

Tcl has the features of a modern structured computer language, such as if commands with optional else and elseif clauses, while commands, and for commands similar to those in the C programming language. Note that curly braces must be used to enclose the condition if it is more than one word, and also to allow a command (or part of a command) to extend for more than one line. Curly braces also let you enter multiline commands at the solar> prompt.

	solar> if {$area > 12} {
	         puts "Area is greater than 12"
	       } elseif {$area == 12} {
                 puts "Area is equal to 12"
	       } else {
	         puts "Area is not allowed to be less than 12"
               }
        Area is greater than 12
        solar>
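
The while and for commands mentioned above follow the same brace conventions as if. As a brief sketch, both loops below add up the integers 1 through 5; the variable names are arbitrary.

```tcl
# Sum the integers 1 through 5 with a while loop.
set i 1
set total 0
while {$i <= 5} {
    set total [expr $total + $i]
    incr i
}
puts "while total: $total"   ;# prints while total: 15

# The same sum with a C-style for loop: start, test, next, body.
set total2 0
for {set j 1} {$j <= 5} {incr j} {
    set total2 [expr $total2 + $j]
}
puts "for total: $total2"    ;# prints for total: 15
```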

Tcl also has many built-in string and list operations. Any string in Tcl can also be interpreted as a list of elements separated by spaces. An iteration command, foreach, can be used to perform one or more commands on each element of a list.

        solar> set friends "Ralph Bill Fred"
        solar> lindex $friends 0
        Ralph
        solar> lappend friends Sally
        Ralph Bill Fred Sally
        solar> lsearch $friends Fred
        2
        solar> foreach friend $friends {
                 puts "Hi, $friend."
	       }
        Hi, Ralph.
        Hi, Bill.
        Hi, Fred.
        Hi, Sally.

In most cases, a Tcl list may be used where you might use an array in another programming language for representing a simple aggregate to be indexed by position. Tcl also has an "array" type, but it is a more sophisticated associative array than most people are familiar with. Rather than indexing values by position, the Tcl array associates values with names, their position not being important. But since the names could also be numbers, you can still use a Tcl array in the usual way, with numeric indexes. In many cases, though, the transparency of a list (it is viewable as a text string), and the simplicity of extending it with lappend and iterating over it with the foreach command, make the list the better choice.

Anyway, here is an example of using the Tcl array, in conjunction with the list created above:

        solar> set age(Sally) 44
        solar> set age(Fred) 45
        solar> set age(Ralph) 46
        solar> set age(Bill) 47
	solar> foreach friend $friends {
	         puts "$friend's age is $age($friend)."
	       }
	Ralph's age is 46.
	Bill's age is 47.
	Fred's age is 45.
        Sally's age is 44.
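
If you do not have a separate list of names, the array itself can supply them. The sketch below uses the standard Tcl commands array names and info exists with the same age array; the name Charles is used only as an example of a missing element.

```tcl
set age(Sally) 44
set age(Fred) 45
set age(Ralph) 46
set age(Bill) 47

# array names returns the element names in no particular order,
# so sort them if the order of the output matters.
foreach name [lsort [array names age]] {
    puts "$name is $age($name)."
}

# info exists guards against reading an element that was never set.
if {![info exists age(Charles)]} {
    puts "No age recorded for Charles."
}
```

Reading an array element that does not exist raises an error, so the info exists test is a useful habit when names come from user input or a data file.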

Beyond a few lines, you will probably want to put any significant Tcl programming into procedures which can be called and incorporated into other procedures. Procedures are defined with the proc command, and can look like this:

	proc times {a b} {
	    return [expr $a * $b]
	}

It is recommended, though not necessary, for every proc to have a return statement. If there is nothing particular to return, return an empty string:

		return ""

This prevents your procedure from returning something from the last statement, which might not be what you expect.
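
The difference can be seen in a short sketch. Both procedures below are hypothetical; the first has no return statement, so its caller receives the result of its last command (the set), while the second returns an empty string explicitly.

```tcl
# Without return, the result is that of the last command (here, set).
proc setflag_implicit {} {
    set flag 1
}

# With an explicit empty return, callers get an empty string.
proc setflag_explicit {} {
    set flag 1
    return ""
}

puts "implicit: '[setflag_implicit]'"   ;# prints implicit: '1'
puts "explicit: '[setflag_explicit]'"   ;# prints explicit: ''
```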

Once you have defined a procedure in a file that SOLAR loads, you can run the procedure just like any built-in Tcl command:

	solar> puts "The area is [times $width $length]."
	The area is 13.5.

Procedures can have zero or more arguments. The argument names are associated with values when the procedure is invoked. An argument named args can be used to allow variable number of arguments. All the values are assigned to args as a list:

        solar> proc sumof {args} {
                 set sum 0
                 foreach arg $args {
                   set sum [expr $sum + $arg]
	         }
		 return $sum
               }
        solar> sumof 1 2 3 4 5
        15  

If any command in a script generates an error, the script will terminate, and an error message will be displayed. If you want to intercept such errors instead, you can enclose one or more commands inside a catch command, which returns 0 if there was no error, or 1 if there was an error, but in either case lets the script continue past the end of commands in the catch command itself:

	if { [catch {exec rm temp.out}]} {
		puts "The file did not exist"
	}
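
The catch command also accepts an optional variable name as a second argument, which receives the error message (or the normal result if there was no error). A minimal sketch, using a deliberate division by zero as the failing command:

```tcl
# msg receives the error message when catch returns 1,
# or the result of the command when catch returns 0.
if {[catch {expr 1 / 0} msg]} {
    puts "Caught an error: $msg"
} else {
    puts "Result was: $msg"
}
```

This lets a script report what went wrong and then continue, rather than silently discarding the message.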

On the other hand, if you need to raise an error to terminate a script before completion, you can use the error command. The string you provide will be displayed on the terminal:

        error "This should not have happened."
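
The two commands work together: an error raised inside a procedure terminates the script unless some caller intercepts it with catch. In the sketch below, checkweight is a hypothetical procedure invented for illustration.

```tcl
# Hypothetical helper that rejects a negative value.
proc checkweight {w} {
    if {$w < 0} {
        error "Weight cannot be negative: $w"
    }
    return $w
}

# The caller may intercept the error with catch instead of terminating.
if {[catch {checkweight -5} msg]} {
    puts "Error intercepted: $msg"
}
```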

If you need to continue a Tcl command past one line, you can use the \ operator at the end of the line (which escapes the newline). Blocks of commands can be enclosed within curly braces, as shown in many of the examples above.
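
For instance, the following sketch splits one expr command across two lines; the backslash must be the very last character on the first line.

```tcl
set total [expr 1 + 2 + 3 + \
                4 + 5]
puts $total   ;# prints 15
```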

Of course, there is much, much more to Tcl than can be shown here. Get the book! Meanwhile, you can check out all the commands in the Tcl reference documentation linked in section 7.1.

7.5.1 Special Syntax for certain SOLAR commands

The Tcl interpreter permits every command to define its own syntax. Some SOLAR commands use a syntax which is intended to be more intuitive than that in basic Tcl. For example, consider the parameter command which defines model parameters and/or allows you to set their starting point or boundaries:

		parameter bsex = 0

The parameter command is defined so that if its second argument is an equal sign, it sets the current (starting) value to the following argument. This looks like an assignment statement as in programming languages like C or Fortran, but not Tcl. (Note that it is safest to put spaces before and after the equal sign.) Other SOLAR commands which let you use the equal sign are omega, mu, and constraint. However, for these commands the equal sign represents a fixed equality rather than an immediate (and temporary) assignment.

		constraint E2 + H2r = 1
		omega = pvar*(Phi2*h2r + I*E2)
		mu = Mean

The syntax allowed for each of these commands differs from what is required for Tcl commands such as expr. See the documentation for each command for more details. For now, note that parameters and matrices are listed by name without using the $ for dereferencing, and names are treated without case sensitivity. The omega and mu commands use a special SOLAR expression parser which allows either ** or ^ to indicate exponentiation, any math function defined for C, and special variables whose meaning is context dependent. The constraint command is limited to simple linear constraints because those are the only kind we are able to support.

7.6 The .solar file

Though this is not intended to replace writing ordinary script files (as described above), you can create a file of SOLAR option-setting commands in your working directory named .solar and SOLAR will execute them when starting. For example, you could have a .solar file containing the commands:

	chromosome 1-22
	finemap 1
	interval 5
	option MergeAllPeds 0
	boundary wide start

This is intended only for commands which set up the environment or options for later commands! You should not use it to perform significant work in SOLAR, such as by using the ibd, polygenic, multipoint, or maximize commands. Under some circumstances, major SOLAR commands reload the .solar file and this could cause an infinite recursion. It is not even a good idea to load models in the .solar file.

7.7 Wildcards

When you are using Unix commands at the solar> prompt or in SOLAR scripts, you will find that you can't use shell wildcards such as * directly. SOLAR uses Tcl as its command interpreter, and Tcl does not understand wildcards except in glob commands, which will be described below. Note that the character * has two special meanings in Tcl: inside an expr command it is the multiplication symbol, and inside a glob command it is the wildcard symbol.

7.7.1 Temporarily going back to your shell (interactive use only)

If all you want to do is execute some Unix command with wildcards interactively (not in a script), the easiest thing to do may be to break out of SOLAR temporarily. This works with most common Unix shells. You can simply enter the keystroke CTRL-z (pressing the Control key and the z key at the same time) and you will temporarily break out of SOLAR back to the Unix shell from which you started SOLAR. Then you can use wildcards as you usually do. To return to your SOLAR session, use the shell built-in command fg (which stands for foreground). Note that CTRL-z echoes as ^Z on your terminal, and most shells then print a cryptic Unix message which you can ignore unless you are going to break out recursively. When you use fg to return to SOLAR, the shell will print the name of the program you are returning to (i.e. solar) but you will not get another solar> prompt automatically. Never mind that; you can simply enter a command anyway, or, if it makes you feel better, hit the RETURN key again and you will get another prompt. (From the point of view of SOLAR, you never entered a command at the last solar> prompt, so it has no reason to give you another prompt. The CTRL-z and everything else you entered in the meantime went back to your original shell and SOLAR itself never saw any of it.) Here's what it would look like (for the C shell running with the default prompt on a machine named mendel):

        solar> ^Z[1] + Stopped (SIGTSTP) solar
        mendel% rm *.tmp
	mendel% fg
        solar
        load phenotypes gaw10.phen
        solar>

An alternative way to get into your shell is to launch your shell again from inside SOLAR. This is easy to do if you know the name of the command which launches your shell. If you are using the C shell, the command is csh. If you are using Linux, and you don't know any different, your shell is likely to be bash, which is launched with the command bash. Another possibility is the Korn Shell, which is launched with the command ksh. If you don't know, ask your system administrator. Once you know the command to launch your shell, the rest is easy. You simply enter that command, and then you will be running your shell, from which you can enter any command using wildcards the way you usually do. When you are done, you exit from the shell with the command exit. Then you will be back inside SOLAR with another solar> prompt. This might look something like this:

        solar> csh
        mendel% rm *.tmp
        mendel% exit
        solar>

7.7.2 Using glob (required for tcl scripts)

Using the Tcl command glob is not difficult to understand by itself, but the consequences of using it often require a little thought, and you may also need to understand a few other Tcl commands. The glob command simply returns a list of all the filenames that match the patterns provided as arguments (which may include wildcards such as * to match any string, and ? to match any single character). You could then use that list in a command such as foreach to iterate through that list and do one or more things to each of its elements:

        set tempfiles [glob -nocomplain *.tmp tmp/*]
	foreach tempfile $tempfiles {
	    catch {file delete $tempfile}
	}

The -nocomplain option prevents glob from raising an error if there are no matching files. The foreach command takes each list element in turn and assigns it to the name given as its first argument for each pass through the list of commands. The file delete command is a command built in to Tcl for deleting files. The catch prevents the loop from terminating early if there are any files protected from deletion. (Alternatively, you could have invoked the file delete command with the -force option, which would cause it to delete all files you are able to delete, and therefore generally doesn't raise any errors except in exceptional conditions that you would probably want to know about anyway.) (Note also that the Tcl file command has many variants which allow you to do many other things besides deleting files. Some of the more useful file commands have to do with extracting pathnames and extensions from filenames. The file command works on both files and filenames. See the Tcl file documentation for more details.)
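
As a brief sketch of the filename-handling variants mentioned above, the following uses four standard file subcommands. The pathname is hypothetical, and because these subcommands operate on the name alone, the file does not need to exist.

```tcl
set path results/run1/solar.out   ;# hypothetical pathname

puts [file dirname $path]     ;# prints results/run1
puts [file tail $path]        ;# prints solar.out
puts [file rootname $path]    ;# prints results/run1/solar
puts [file extension $path]   ;# prints .out
```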

The above approach works best if you are going to do more than one thing with each file. It is also useful if you want the most control over what is going on, such as not forcing the deletion of every file, but nevertheless completing the entire list of files and deleting all the ones which are not protected from deletion. If all you want to do is forcibly delete each file, you don't need a loop to do this. But, at the same time, you can't simply pass the list of filenames to either the file delete command or its Unix equivalent, rm. The list of files looks like one big long string, and if you simply try to use it in another command, that command will only see one big name consisting of all the other names put together. For example, if [glob *.tmp] matches files a.tmp and b.tmp, it will return a list which looks like the string:

         "a.tmp b.tmp"

If that string is passed to the file delete command, it will try to find a single file whose 11-character name has a space in the middle (which is possible in Unix). But this is probably not what you want it to do. Besides using the foreach command to divide up the list, it is also possible to use the eval command to operate on your command. Tcl first evaluates all the bracketed commands and dereferences all the variables in your command as usual; eval then joins the results together and evaluates them as a new command, so that each element of the list becomes a separate argument. (This sounds a bit tricky at first, but it is ultimately one of the most useful features of Tcl to be able to do things like this.) The following command might do what we want:

        eval file delete -force [glob -nocomplain *.tmp]

All we had to do here is add the command eval in front of our command as it might have seemed we could have written it. eval will then create a command like the following command and evaluate it:

        file delete -force a.tmp b.tmp

(Note: In this example, the -force argument is used simply because if some file actually is protected from deletion, it will cause the entire command to exit, possibly before deleting all the other files. Using catch will prevent it from exiting the entire script, but it will not prevent the file delete command itself from exiting prematurely.)
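
The effect of eval on a list can be seen without deleting anything. In this sketch, countargs is a hypothetical procedure that reports how many arguments it received, and the list of names is written out by hand rather than produced by glob.

```tcl
# Hypothetical helper that reports how many arguments it received.
proc countargs {args} {
    return [llength $args]
}

set names "a.tmp b.tmp"

puts [countargs $names]        ;# prints 1: the whole list arrives as one argument
puts [eval countargs $names]   ;# prints 2: eval re-parses it into two arguments
```

One consequence of this re-parsing is that a filename which itself contains a space would also be split in two, so the foreach approach is the safer choice if such names are possible.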

Now, if instead of using the file delete we wanted to use the Unix command rm inside a script, we would have to put the exec command in front of rm so that Tcl knows we want to evaluate a Unix command. If we're not sure that any files are going to match the pattern, we had better add a -nocomplain option, and we had also better put the entire command inside a catch because rm will generate an error if it isn't actually given any names to delete (file delete doesn't generate an error in such a case). Also, we need to force the deletion of all files with the -f option of rm.

So, the resulting command looks like this:

        catch {eval exec rm -f [glob -nocomplain *.tmp]}

Once you understand what all the terms do, this isn't really that difficult, but often people want to do these things without thinking much about them, which is why the easiest way to use wildcards interactively is probably to launch a new shell to do it, as described in the previous section. Then you only have to think about all these details when you are writing scripts.

7.8 Useful built-in utility procedures

In the course of developing SOLAR, we have developed certain utility procedures which are helpful in writing SOLAR scripts. (These are in addition to the extensive set of utility commands provided by Tcl.) The most useful of these are documented in the command listing and are summarized below:

tablefile            Read data file in comma delimited or PEDSYS format
solarfile            Read data file as with "tablefile" but applying "field" name mapping
putsout              Write message to terminal and/or file
drand                Return a random floating point number between 0 and 1
chi                  Compute probability for a chi-square value
chinc                Compute probability for a noncentral chi-square value
alnorm               Evaluate the tail of a normal curve
if_parameter_exists  Check if a parameter exists
read_model           Return a parameter value from any saved model
read_output          Read variable statistics from maximization output file
read_arglist         Read hyphenated optional arguments and argument-value pairs
is_nan               Check if a value is NaN (Not a Number)
if_global_exists     Check if a Tcl global variable exists
remove_global        Remove a global variable (so it no longer exists)
catenate             Concatenate strings
string_imatch        Case insensitive string match testing
setappend            Append only new elements to a list (keeping it like a set)
remlist              Remove element from list
stringsub            Simple verbatim string substitution (not regsub)
full_filename        Prepend the maximization output directory name to a filename
clod                 Calculate a LOD score
tclgr                Create xmgr session with pipe connection to SOLAR
stats                Get and/or show statistics for any variable in a file
combinations         Make a list of all combinations (sets) of integers 1..N

7.9 Displaying built-in procedures with showproc

Many of the commands in SOLAR are actually implemented as SOLAR scripts themselves. All the standard SOLAR scripts are defined in the file solar.tcl found in the lib subdirectory of your SOLAR installation.

You can examine the standard scripts using the showproc command. By itself, this will simply display the entire procedure definition of any SOLAR command which is defined by a script. If the script is more than one page long, it will be displayed using the more pager, which shows only a page at a time and lets you advance to the next page by pressing space. For example:

        solar> showproc twopoint

The formatting shown by showproc may not be as pretty as it actually is in the source file because it will concatenate lines which are extended by using backslash. showproc is based on the Tcl command info body which has this feature.
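
The underlying Tcl commands can also be used directly on your own procedures. The sketch below only illustrates info args and info body; it is not a reproduction of how showproc itself is implemented.

```tcl
proc times {a b} {
    return [expr $a * $b]
}

puts [info args times]   ;# prints a b
puts [info body times]   ;# prints the text between the outer braces
```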

You can also write copies of SOLAR procedures to files in your working directory. You can then edit the procedures to fit special requirements. To prevent any confusion with the built-in commands, the name of the newly created procedure is suffixed with .copy, regardless of the name you choose for the output file. If you would rather not use the .copy suffixed name, you can change that when you are editing the copied procedure. (But do not attempt to give the procedure the exact same name as a built-in procedure!)

        solar> showproc twopoint twopoint.tcl
        solar> newtcl
        solar> twopoint.copy

SOLAR is designed so that even if you were to create procedures with the same names as built-in procedures, they would be ignored, and the built-in procedures would be used anyway. Otherwise, you could foul up SOLAR operation unintentionally. Even if you were careful not to create procedures with any documented command names, you might create a procedure with the same name as some undocumented internal procedure.

If you must know how this works, read this paragraph. The scripts in the directory containing the active solar.tcl file are given the highest precedence. So, actually, you could overcome the safeguard by simply copying the solar.tcl file to your working directory; then all the scripts in your working directory will have the highest precedence. But in that case, you might as well edit the entire solar.tcl file in your directory. If you were to copy solar.tcl into your working directory, then make a copy of a built-in procedure using the showproc command, and edit that copy to remove the .copy suffix, you would have two copies of the same script name in the highest priority directory. This ambiguity would be resolved in a somewhat unpredictable way: by which file is found first while traversing the directory. This is not necessarily the alphabetic order in which files are displayed by the ls command. To see which file's version of a script is actually being used, take a look at the tclIndex file created by SOLAR in your working directory.