Parameter Fitting Procedure By Victor Anisimov (victor@outerbanks.umaryland.edu) and Alex MacKerell Jr. (alex@outerbanks.umaryland.edu); December 2007 FITPARAM is a parameter fitting procedure that is primarily designed to fit partial atomic charges and atomic polarziabilities based on the Drude oscillator model to interaction energy data and dipole moments, though it may be applied to fitting of other parameters (see below). It supports optimization of multiple parameters for a series of model compounds sharing common parameter sets. Different weights can be assigned to different target data. Optimized parameters can be restrained to their corresponding initial values by using a parabolic penalty function. FITPARAM performs non-linear least square fitting using the Levenberg-Marquardt algorithm. * Menu: * Introduction:: Overview of functionality * Syntax:: Syntax of commands * Keywords:: Description of keywords * Format:: File format * Example:: Input examples * Limitations:: Known limitations
Overview of functionality The primary purpose of FITPARAM is charge derivation using various types of interaction energy as a target data. In this goal it complements the functionality of the FITCHARGE module which derives charges solely from fitting to electrostatic potentials (see fitcharge.doc). However, FITPARAM is not limited to charge derivation and it can be used to optimize any sort of parameters using any sort of relevant target data. Although implemented in CHARMM the FITPARAM module is in fact a stand-alone procedure. The user is not required to load topology and generate a molecule and if loaded this information will be ignored. The command FITPARAM operates with generic parameters without the awareness whether these are point charges, atomic polarizabilities or dihedral force constants. Thus, any parameter may be optimized using FITPARAM based on basic least square fitting. Correspondingly, calculation of the target property and its gradient has to be performed separately by external routines. Preparing the necessary scripts to accomplish this task becomes the user assignment. Despite this obvious inconvenience, such architectural design is necessary to allow for flexibility in the target data used, which otherwise would not be accessible for the parameter optimization. It also makes possible parameter fitting for a variety of molecules to be performed simultaneously. In order to start parameter optimization FITPARAM needs an initial guess for parameters. It reads the parameter values from the input file, performs a cycle of least square optimization and saves the updated parameters in the output file. Then FITPARAM executes the external job, which performs target property and gradient calculations using the current parameter values and returns the control back to FITPARAM. These iterations are continued until convergence criterion is satisfied. Using FITPARAM to optimize charges requires the user to preserve the total charge of the optimized molecule. This can be accomplished by excluding one charge from the fitting and computing the value of the excluded charge in the external procedure as the difference in total charge of the molecule and the sum of optimized charges. In general it is not important which atomic charge to exclude from the fitting. The general suggestion is to exclude that charge which appears least important in reproducing the value of the target data. Dipole moment information can be provided to FITPARAM in two different ways. A dipole may be presented in scalar form as the total dipole moment or in vector form representing three dipole moment components. These two ways of dipole moment specification are handled independently by FITPARAM. When dealing with multiple molecules each molecule may be represented by its own total dipole moment and dipole components. The total dipole moment value may be defined as the gas-phase experimental value, or QM computed value, or manually defined value the user is targeting as the result of the fitting. Providing manually increased (or HF/6-31G* computed) dipole moments is a standard practice when deriving charges for a non-polarizable model. Providing the dipole moment in vector form has the purpose of minimizing the angle between the model and target dipole moment vectors.
Syntax of commands [SYNTAX FITPARAM - parameter optimization] FITPARAM { [NITEr int] [TOLErance real] [COUNter int] NEXPeriments int NPARameters int GUESs int EXPData int PARM int [EXPWeight int] [RESTraints int] { [ANTOINE] [ [NDIPoles int] [FVALues int REXternal str [MULTiplicity int] ] } }
Description of keywords NITEr - (optional) maximum number of Levenberg-Marquardt (least square) iterations. Default value NITE=100. TOLErance - (optional) relative tolerance on the convergence of the minimized function. Default value TOLE=1.0E-4. COUNter - (optional) is number of iterations under the tolerance it needs to converge. Default value COUN=2. Greater values can be used to assure the convergence. NEXPeriments - (mandatory) number of target data. Care must be taken when handling dipole moments which can be provided in scalar (single value for total dipole) or vector form (three dipole component values in a row). Three components of a vector are considered as one target value for NEXP keyword. Total dipole moment value counts independently as one experimental value. This is also discussed in the examples section. NPARameters - (mandatory) number of parameters to optimize. GUESs - (mandatory) file unit number for initial parameter guess. EXPData - (mandatory) file unit number for target ("experimental") data. PARM - (mandatory) file unit number to save optimized parameter values. EXPWeight - (optional) file unit number for individual weights assigned to each target value. Default value is 1.0 for each target value. RESTraints int - (optional) file unit number for individual restraint factors assigned to each optimized parameter. Default value is 0.0, which means free fitting without restraints. Following keywords are exclusive. ANTOINE - (mandatory if Antoine Constant calculation is expected) switches on Antoine parameter fitting to find analytical dependence in experimental vapor pressure - temperature dependence. This information is useful for calculation of experimental heat of vaporization. or NDIPoles - (mandatory, if dipoles are provided in vector form) number of dipole moments specified in vector form (X Y Z form). Default value is 0. Note, NDIP keyword counts only the number of dipoles provided in vector form (X Y Z components). Correspondingly, NDIP does not count the number of dipoles provided in scalar form (total dipole moment). FVALues - file unit number. This file provides function values and gradients to FITPARAM. The data in this file have to be computed externally using updated parameter values (which are saved in PARM unit by FITPARAM). FITPARAM needs current target function values and gradients to perform next parameter optimization iteration. REXternal - external procedure provided in the form of a string enclosed in double quotes. This string will be executed by CHARMM. This external procedure is in charge of calculating current function values and gradients. MULTiplicity - (optional) file unit number. The data in this file specify th multiplicity of the optimized parameters. Default value of multiplicity for each parameter value is 1. FITPARAM computes the total charge using the information about charge multiplicity. This data makes no influence on the progress of parameter optimization and is implemented for debugging purpose only. For example, three hydrogen atoms in methyl group carrying the same charge value imply a multiplicity of 3. Correct accounting for charge multiplicity will help FITPARAM to provide meaningful information about total charge of the optimized molecule. Non-charge parameters should be assigned multiplicity 0. This is particularly helpful to when FITPARAM optimizes charges simultaneously with non-charge parameters.
File format GUESs unit Includes NPAR number of lines. Format: string (up to 20 characters before real number is encountered), real (recognized by decimal point). Example: "A 17.81671". The string portion of the data will be printed by FITPARAM as the parameter name. One may use the string portion to do some useful work for external job, e.g. "set A 17.81671" would turn the parameter file into a functional CHARMM script file. EXPD unit Includes NEXP number of lines. Format: one real value per string. (Exception is Antoine input file which contains two real numbers in free format. First value is temperature; the second one is vapor pressure.) Total dipole moment is defined by one value. Dipole moment components are specified by three numbers in free format. Example: 1.63 ! dipoleTotal MP2 value -1.5155 0.1868 -0.5583 ! dipoleXYZ Comments can be placed after exclamation mark. This information will be ignored by FITPARAM. PARM unit Includes NPAR number of lines. Format: A20,F16.8 This is an output file which will be created by FITPARAM. FVAL unit Includes NEXP + NEXP * NPAR number of lines. Format: The format is free. This file is created by external job and read by FITPARAM. The external job creating this file has to take care of the following requirements. One real number per string is expected for scalar values. Three numbers are provided for a vector value. First NEXP data are the function values which are computed for NEXP target data using the current parameter values. The order of computed values must be the same as in the EXPD file. Next NPAR * NEXP data are partial derivatives computed in the order of experimental data (first running index) and parameter data (second running index). Example: E1 E2 E3 dE1/dp1 dE2/dp1 dE3/dp1 dE1/dp2 dE2/dp2 dE3/dp2 Where E1, E2, E3 are target (experimental) data; p1 and p2 are optimized parameters. Derivatives of vector properties are provided by three real numbers in a row (see example below). EXPW unit Includes NEXP number of lines. Format: one real number in a row. MULT unit Includes NPAR number of lines. Format: one integer number in a row. REST unit Includes NPAR number of lines. Format: one real number in a row.
Input Examples Two examples given in this section illustrate basic functionality of FITPARAM. First example covers optimization of Antoine function parameters. This is an example where the function value and gradient computations are implemented inside the FITPARAM module so there is no need to call an external procedure. Therefore this is an exception to the standard use of FITPARAM. This example is discussed here because of its simplicity and because it is included in CHARMM test case (see antoine.inp). The Antoine function is a simple parametric analytical function which is used to describe experimental vapor pressure - temperature dependence. It has the following form: lnP = A + [B / (T + C)], where A,B, and C are fitted parameters, P - pressure, T- temperature. Having this function gives the opportunity to compute derivative of pressure over temperature which is necessary to compute the heat of vaporization of the pure liquid. Following is the content of the antoine.inp script: open unit 11 read form name antoine.ini ! initial guess for parameters open unit 12 read form name antoine.exp ! experimental (target data) open unit 13 write form name antoine.prm ! storage for optimized parameters FITPARAM - NITE 50 - ! maximum number of iterations TOLE 0.001 - ! chi^2 convergence threshold COUN 2 - ! number of consecutive successful steps before convergence NEXP 8 - ! number of experimental data NPAR 3 - ! number of parameters to fit (2 or 3 Antoine coefficients) ANTOINE - ! Antoine coefficient fitting GUES 11 - ! input: initial guess for parameters EXPD 12 - ! input: data to fit to PARM 13 ! output: file to store optimized parameters stop The computation starts from opening two input files antoine.ini and antoine.exp, which contain initial guess for parameters and target experimental data, respectively. Following is the content of antoine.ini file: A 17.81671 B 4705.03330 C -60.75000 Three parameters, NEXP=3, will be optimized starting form the above values. The antoine.exp file contains the following data: 393.15 3.649359 398.15 3.877432 403.15 4.076690 408.15 4.264087 413.15 4.461877 418.15 4.651099 423.15 4.825109 428.15 5.018603 Here we have 8 experimental data, NEXP=8, representing temperature - vapor pressure data. Following parameter values are obtained after the execution of the above script: A 17.84653417 B 4706.72855119 C -61.45937115 The second example illustrates how to set up FITPARAM calculation for a charge parameter optimization calculation. The example is based on hydroxyl charge optimization in ethanol targeting interactions with water, which is a standard charge derivation procedure in the additive CHARMM force field. In the example the alkane charges in methyl group are constrained to their standard values. The methyl group is also kept electro-neutral. Two lone pairs (OLP) are assigned to oxygen atom in this example; therefore the oxygen atom is represented by two point charges (qOLP). Note, the central O atom carries zero charge. The charges to be determined are qC, two qHC, two qOLP, and qHO. From these variables one should be excluded. This charge should be assigned manually to maintain a total charge of zero. In the present example we exclude the methylene group hydrogen atom charge: qH = -1/2 * (2*qOLP + qC + qHO). Remaining variables define NPARM=3. The lone pairs are equivalent therefore they contribute as one parameter qOLP. Correspondingly, the initial guess parameter file "parameters.ini" contains the following data: set qOLP -0.23 set qHO 0.36 set qC -0.06 Because the qOLP parameter has the multiplicity of 2 we declare this in the "multiplicity" file: group 2 1 1 Here the parameter multiplicity is preceded with the "group" keyword which indicates the program that an electro-neutral group is being treated. If the methyl group charges were also included in the fitting, then the "multiplicity" file would contain the following data: group 2 1 1 group 3 Here we optimize the charge on methyl hydrogens (which has multiplicity 3), whereas the charge on methyl carbon has to be computed externally to keep the CH3 group electro-neutral. The charges specified in the "parameters.ini" file are taken from the CHARMM22 force field. To restrain the charges to their initial values we specify "restraints" file using the following restraint factors: 0.1 0.1 0.1 The smaller the number the weaker the restraint. To choose a particular value one needs to do some experimenting. Next we define the target data in "expdata" file: -4.89 ! C-O-H angle bisector -2.51 ! along C-O line -4.80 ! lone-pair position -4.36 ! along O-H line 1.8 ! dipoleTotal MP2 = 1.6259 -1.5155 0.1868 -0.5583 ! dipoleXYZ Presented is ethanol - water QM interaction energies and the ethanol dipole. First four values in the expdata file are interaction energies for the corresponding four water orientations. The fifth value is the ethanol dipole moment set to 1.8 Debye. Note, that the MP2 dipole is 1.63. The dipole moment is purposefully increased to provide an illustrative example that we are free to define any target data we want our model to reproduce. While we are experimenting with the magnitude of the total dipole moment we certainly want to minimize the angle between the target and empirical dipole moment vectors. Therefore the sixth line of data is the dipole moment specified in vector form. Overall, we have six target data, NEXP=6, and one dipole vector, NDIP=1. Weighting of data is performed via the "expweight" file that instructs FITPARAM of the level of priority assigned to the target data: 1. 1. 1. 1. 100. 100000. Each line in this file applies to corresponding target data in the "expdata" file. The value 1.0 means a standard weight. The larger the value we specify the stronger FITPARAM will try to match the target data during the parameter optimization. One needs to make some empirical adjustments in order to define the optimal value for the each weighting factor. The FITPARAM script to run the optimization has the following form: open unit 11 read form name parameters.ini open unit 12 read form name expdata open unit 13 write form name parameters open unit 14 write form name fvalues open unit 15 read form name expweight open unit 16 read form name multiplicity open unit 17 read form name restraints FITPARAM - NITE 100 - ! max number of iterations TOLE 0.0001 - ! convergence tolerance COUN 4 - ! steps to test the convergence NEXP 6 - ! number of target data NDIP 1 - ! number of dipole moments among the target data NPAR 3 - ! number of parameters to optimize GUES 11 - ! initial guess for parameters EXPD 12 - ! target data PARM 13 - ! place to store optimized parameters between loops FVAL 14 - ! function values and derivatives REXT "./run.pl ./parameters ./fvalues" - ! script to compute derivatives EXPW 15 - ! weight of individual target data, optional MULT 16 - ! parameter multiplicity, optional REST 17 ! weight of parameter restraint, optional stop After reading initial guesses for parameters specified in the "parameters.ini" file FITPARAM will write updated parameter values in the "parameters" file (see unit PARM 13). The "parameters" file can be streamed into a CHARMM script to be run externally to compute energies and gradients using the current parameter values. This is done with the help of the REXT keyword (which stands for Run EXTernal process), which tells FITPARAM how to invoke the external process to compute the function values and derivatives. In this particular example we invoke the string "./run.pl ./parameters ./fvalues". This says that computation will be managed by perl script "run.pl" which we assume will invoke individual computations. The perl script takes two arguments. The "parameter" file contains current parameter values computed by FITPARAM. The "fvalues" file instructs "run.pl" script to save the function values and gradients in the "fvalues" file, because FITPARAM will be reading it (see FVAL 14 unit) after "run.pl" script finishes its computation. The perl script or other user selected external routine has to be written individually for each optimization job. Due to the nature of individual computations and because of the multitude of possibilities it is difficult to provide one set of rules on how to do this in the best way. Preparing such script the user should take care about writing the "fvalues" file in the right format (see above). The data in this file are rewritten with each optimization cycle. For the example shown above the "fvalues" file had the following intermediate values: -5.34805 -2.72626 -5.07762 -4.62209 1.86600 -1.72300 0.03500 -0.71400 30.85000 27.06500 30.33000 -15.22500 -14.00000 0.89123 -6.56414 -2.47438 5.55000 9.83500 5.61500 -25.45000 -4.00000 0.55868 -4.51683 -1.57192 2.59500 1.12500 2.67500 -1.37500 -3.00000 -0.09208 -0.23845 0.21026 Relating these data to the data in "expdata" file helps to reinforce the understanding of the format of the "fvalues" file.
Known limitations The present implementation does not preserve the total charge. Therefore, the user is responsible to manage this problem in designing the external procedure. Implementing the charge constraint mechanism is in future development plans.