Documentation for Rolloff exponential fit and jackknife:

expfit.sh is a bash script that fits an exponential distribution to the
output of ROLLOFF and infers the jackknife mean and standard error of the
estimated date. It uses the nls function in R to determine the nonlinear
(weighted) least-squares estimates of the parameters of a nonlinear model.
It also uses the DEoptim package to pick the initial values of all the
parameters (Katharine Mullen, David Ardia, David Gil, Donald Windover,
James Cline (2011). DEoptim: An R Package for Global Optimization by
Differential Evolution. Journal of Statistical Software, 40(6), 1-26.
URL http://www.jstatsoft.org/v40/i06/).

More information on these functions and packages can be found at:
    http://stat.ethz.ch/R-manual/R-patched/library/stats/html/nls.html
    http://cran.r-project.org/web/packages/DEoptim/index.html

To run the program, do the following:

1. Run expfit.sh:
       $BIN/expfit.sh parfile
   where parfile = parameter file used for Rolloff.

2. This will run the rexpfit.r program (see README.REXPFIT for details),
   which fits an exponential distribution to the output of Rolloff using
   non-linear least squares.

The program takes the following parameters, which can be set/changed in
expfit.sh:

    input:       ROLLOFF output file name. The program takes the output
                 of ROLLOFF as its input.
    output:      Filename for the output of expfit.
    output_col:  Column number of the output file that contains the
                 correlation values. Default is 4.
    lval:        Lower (starting) value of genetic distance.
                 Default is 0.5 cM.
    hval:        Higher (end) value of genetic distance.
                 Default is 100 cM.
    affine:      TRUE/FALSE. Default is TRUE.
    plot:        TRUE/FALSE. Whether to plot the output.
    jackknife:   TRUE/FALSE. Whether the jackknife option was used in
                 ROLLOFF.

Output files:

The program creates the following output files:

    output.log:  Contains the summary of the fit.
    fit_output:  Contains the fitted values generated by the model.
    output.pdf:  PDF file which shows the ROLLOFF output and the fitted
                 values.
    output.jin:  Summary of jackknife results.
    output.jout: Jackknife mean and standard error.

where output = output file name used in the expfit function call.
Details of the file names are printed to standard output when the
program is run. See the example shown in:
    $BIN/expfit.sh parfile
    examples/expfit.log

Important points to consider:

1. Errors or warnings related to nls indicate that there was an issue
   with convergence. Try rerunning expfit.sh so that nls can find an
   optimal solution.

2. Please check the output to ensure that the exponential fit visually
   provides a good fit to the data. One can also compute the mean square
   error to check this.

3. Please use an accurate genetic map for the analysis. Inaccuracy in the
   genetic map can affect the results, especially for older dates of
   mixture.

4. Rolloff estimates a date under the assumption of a single wave of
   mixture. If the underlying admixture model in fact involves multiple
   pulses of admixture or continuous gene flow, then the date estimated
   by Rolloff would lie within the time period spanned by the admixture
   events. Details of simulations exploring different models of
   admixture can be found in Moorjani et al. 2011.

5. Typically, we look at the relationship of admixture LD with distance
   at distances >0.5 cM to avoid issues related to background LD.
   However, in cases of founder events after admixture, this cutoff might
   not be appropriate. We find that the dates estimated by Rolloff can be
   confounded by founder events. Hence, if you are studying a founder
   population, please perform appropriate simulations before applying
   Rolloff to real data.

6. While very accurate ancestral populations are not required for running
   Rolloff, the results are hard to interpret if highly divergent
   reference populations are used for the analysis. We therefore
   recommend using populations that are closely related to the ancestral
   populations as the references.

COMMON ERRORS/ FIXES:

1. ERROR:
       ../bin/expfit.sh: line 60: Rscript: command not found

   FIX: This usually means that you need to install R on your computer
   or add it to your path. R can be downloaded from
   http://www.r-project.org/

2. ERROR:
       Installing package(s) into ‘/usr/local/lib/R/site-library’
       (as ‘lib’ is unspecified)
       Warning in install.packages("DEoptim") :
         'lib = "/usr/local/lib/R/site-library"' is not writable
       Error in install.packages("DEoptim") : unable to install packages

   FIX: You might get an install.packages error if you do not have
   permission to download the package to the root. The simplest way to
   fix this is to start R interactively and install the DEoptim package
   using the commands below. If the problem persists, please contact
   your system administrator to get permission to install the package at
   the root, or install it in your own local directory.

       options(repos=structure(c(CRAN="http://cran.cnr.Berkeley.edu")))
       if (!("DEoptim" %in% rownames(installed.packages()))) {
           res <- try(install.packages("DEoptim"))
       }

   Check the installation with:

       library("DEoptim")

   (If you get no errors, please rerun expfit.sh.)

3. ERROR: Warnings/errors related to "nls"
       Warning message:
       In nls(wcorr ~ (C + A * exp(-m * dist/100)), start = par1, control = list(maxiter = 1000, :
         singular gradient

   FIX: This suggests that there was an error related to convergence in
   nls. Rerun expfit.sh so that nls can find a better solution.

------------------------------------------------------------------------------
Questions? email Arti Tandon, atandon@broadinstitute.org
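
The advice above to rerun expfit.sh when nls reports a singular gradient
can be automated with a small wrapper. The following is only a sketch
under stated assumptions: retry_fit is a hypothetical helper that is not
part of the Rolloff distribution, and it assumes that a failed fit shows
up as the text "singular gradient" in the combined output of the command
(as in the warning quoted in item 3 above) or as a non-zero exit status.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Rolloff): run a command up to
# max_tries times and stop at the first run that exits with status 0 and
# does not print the nls "singular gradient" warning.
retry_fit() {
    max_tries=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        # Capture stdout and stderr together so R warnings are included.
        if output=$("$@" 2>&1); then
            status=0
        else
            status=$?
        fi
        printf '%s\n' "$output"
        case "$output" in
            *"singular gradient"*)
                # Assumption: this text marks a fit that did not converge.
                echo "attempt $attempt: singular gradient, retrying" >&2
                ;;
            *)
                if [ "$status" -eq 0 ]; then
                    echo "attempt $attempt: no nls warning found" >&2
                    return 0
                fi
                echo "attempt $attempt: exit status $status, retrying" >&2
                ;;
        esac
        attempt=$((attempt + 1))
    done
    # All attempts either failed or printed the warning.
    return 1
}
```

With the function defined in the current shell, a call such as
"retry_fit 5 $BIN/expfit.sh parfile" would try the fit up to five times.
A clean run by this test is no substitute for inspecting the fit
visually, as recommended under "Important points to consider".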