Adaptively Biased Path Optimization (ABPO) Method He Huang Bradley M. Dickson Carol B. Post * Menu: * Description:: Introduction and algorithm * Syntax:: Description of commands and options * Remarks:: Implementation remarks * Examples:: Usage examples * References:: References
Introduction and Algorithm. ABPO is a path optimization method that works in collective variable (CV) space. It combines the optimization algorithm of the finite temperature string (FTS) method and the sampling power of an adaptive biasing potential (ABP) scheme. In a multi-dimensional CV space, it finds transition paths between two meta-stable regions, and provides the potential of mean force (PMF) along the path. When applied to a one-dimensional CV space, it can be used as a pure sampling approach to evaluate the PMF associated with the single CV. For path optimization, ABPO uses the iterative scheme of the FTS method to evolve an initial path to the principal curve associated with the system. In each iteration, the vicinity of the current path is sampled, and the mean positions of the sampling on the hyperplanes perpendicular to the path are calculated. The path is then updated to the curve connecting these mean positions. At convergence, the path recovers the principal curve. Within each iteration, sampling is carried out with an ABP scheme which is both flexible and efficient. Unlike the original FTS method, which performs independent samplings each restricted to a discrete hyperplane or cell perpendicular to the path, ABPO allows dynamics trajectories to diffuse along the path. It only restricts the sampling dynamics within the tube-shaped space around the path by applying a one-sided harmonic tube-wall potential. Moreover, the diffusion in the direction of the path is accelerated by the following adaptive biasing potential: Vb = kb * T * b / (1 - b) * ln[c * h(lambda, t) + 1], (1) where h is a mollified/smoothed histogram counting the samples that fall around each slice of the path tube parameterized by lambda, the arc-length along the path. Vb learns from the sampling history recorded in h to offset the original potential of mean force (PMF) of lambda. The parameter b ranges from 0.0 to 1.0 and controls the bias strength. At convergence, Vb approaches -b times the PMF of lambda. The parameter c controls how tightly Vb and h are coupled. Specifying a finite number for c is equivalent to 'pre-deposit' some visits or time evenly into all bins of the histogram to avoid fast change of Vb at the beginning of the sampling. The sampling is further aided by running multiple dynamics trajectories/replicas in parallel. At the beginning of each iteration, ABPO launches a number of independent adaptively biased dynamics trajectories, which proceed in blocks of timesteps. At the end of each block, the histograms from all trajectories are pooled together to check whether all tube slices have been sampled for at least a certain amount of time. When the sampling threshold is reached, the path is updated to the combined mean positions and the PMF is calculated from the combined histogram. As in FTS, ABPO treats the collective variable space as a skewed space in which distance is measured with respect to the inverse of a pseudo-diffusion tensor D. Such a treatment allows the principal curve to be invariant upon a change of variable within the collective variable space. In the current scheme, we assume D as constant and evaluate it with unbiased dynamics before initiating any path optimization. The following flow chart summarizes the optimization algorithm: 1. Current path = initial path. 2. While not converged, repeat: 2.1. Launch n independent ABP trajectories in the current path tube. 2.2. While sampling threshold not reached, repeat: 2.2.1. Run N steps of biased dynamics with each trajectory. 2.3. Pool samples from all trajectories together, calculate PMF from the combined histogram, update the current path to the mean sampling positions on the tube slices.
Description of Commands and Options As mentioned above, multiple trajectories/replicas may be used to enhance the sampling process. ABPO takes use of the communication layer provided by the ensemble module to realize inter-replica communication. It is therefore implemented as a subcommand of the ENSEmble command. When compiling CHARMM, the ABPO method is built by using the ABPO keyword, e.g. $> ./install.com gnu medium M ABPO The ABPO keyword activates ENSEmble automatically, and therefore requires the M keyword. Before running ABPO, the NENSem keyword of the ENSEemble command needs to be called to specify the number of replicas. Note that NENSem has to be a factor of the total number of processes. Following is the Syntax of the ABPO command: ENSEmble ABPO { SETCv repeat ( cv-spec ) } { DTNS dtns-spec } { OPTI opti-spec } It has three sub-commands, each associated with one of the following three steps involved in a complete optimization process: (1) Define collective variables. (2) Evaluate the D tensor. (3) Optimize the path. These sub-commands and available options are describe below: --------------------------------------------------------------------------- ENSEmble ABPO SETCv repeat ( cv-spec ) cv-spec = { DIST NP int repeat ( real atom-spec x 2 ) } { DSEL atom-selection x 2 } { ASEL atom-selection x 3 } { TSEL atom-selection x 4 } atom-spec = { segid resid iupac } --------------------------------------------------------------------------- This command defines the collective variables (CVs), each specified with a cv-spec statement. In the current version, a CV can be one of the following types: [DIST] specifies a CV as a linear combination of a set of internal distances: CV_dist = sum_i c_i * d_i [NP int] specifies the number of distances that go into the linear combination. Each individual distance is specified by a [real atom-spec x 2] statement, which gives the coefficient c_i as a real number, and specifies the two atoms that define d_i. [DSEL] specifies a CV as the distance between the centers-of-mass of two atom-selections. The distance is represented in the unit of angstroms. [ASEL] specifies a CV as the angle formed by the centers-of-mass of three atom-selections. The angle is represented in radius and ranges from zero to pi. [TSEL] specifies a CV as the torsion angle formed by the centers-of-mass of four atom-selections. The torsion angle is represented in radius and ranges from -pi to pi. --------------------------------------------------------------------------- ENSEmble ABPO DTNS dtns-spec dtns-spec = [DSTEps int] [DFRQ int] --------------------------------------------------------------------------- This command sets up options for evaluating the D tensor. Before invoking DTNS, at least one CV has to be defined. The following table describes the options that apply to DTNS: Keyword Default Purpose DSTEps 1000 Length of dynamics in timesteps for evaluating D. DFRQ 100 Interval in timesteps at which the instantaneous value of D is evaluated. While the DTNS command sets up D evaluation options, it does not trigger any dynamics run. The dynamics run will be triggered by the next DYNAmics command (see dynamc.doc). The NSTE option of the DYNA command will be ignored and the number specified by DSTEps in the DTNS command will be used as the number of timesteps. The IUNC and IUNW options will also be ignored. When the dynamics run finishes, a file named invd.dat will be generated under the current directory, which stores the inverse of the evaluated D tensor. Running the DTNS command generates the following data under the current directory: - [FILE] invd.dat: inverse of the D tensor - [DIR] eval_d: directory that includes: - [FILE] repXXX.dcd: trajectory file for replica XXX - [FILE] repXXX.rst: restart file for replica XXX --------------------------------------------------------------------------- ENSEmble ABPO OPTI opti-sepc opti-spec = [REST] [BCYC int] [ECYC int] [TEMP real] - [MNBL int] [BSTE int] [MINC real] [SMOO real] - [NPNT int] [MOLL real] [RTUB real] [FTUB real] [LOOP] - [BFCT real] [CFCT real] [CVFR int] --------------------------------------------------------------------------- This command sets up options for path optimization. Before invoking OPTI, the CVs have to be defined and the invd.dat file has to be present under the current directory. The following table describes the options that apply to OPTI: Keyword Default Purpose BCYC 1 Index of the starting cycle. ECYC BCYC Index of the ending cycle. REST false When present, restart a previous optimization run. Load restarting information from the path.snap file under the directory for BCYC. TEMP 300.0 Temperature. Should be the same as the dynamics temperature. MNBL 20 Maximal number of dynamics blocks to run in each optimization cycle. BSTE 1000 Number of timesteps to run in each dynamics block. MINC 100 Sampling threshold in timesteps. Update the path and enter the next cycle when the histogram reaches this value everywhere along the path. NPNT 100 Number of points the path is discretized into. MOLL 0.05 Ratio of the standard deviation of the Gaussian mollifier used to smooth the histogram to total path length. RTUB 1.0 Radius of the path tube in (angstrom g^{1/2}). Note that distance is measured with respect to D^{-1}. FTUB 1.0 Force constant of the tube wall potential in (kcal/g/angstrom^2). Again, note that distance is measured with respect to D^{-1}, therefore FTUB should be adjusted according to D^{-1} to achieve desired restraining strength. Rule of thumb is to use a smaller FTUB when the norm of D^{-1} is large, and vise versa. LOOP false When present, treat the path as a loop, i.e., treat the two end points as neighbors. BFCT 0.8 Strength of the adaptive bias, which ranges from 0.0 to 1.0. Same as the parameter b in Eqn (1). The dynamics is unbiased with value 0.0, analogous to metadynamics with value 1.0, and analogous to well-tempered metadynamics with values in between. (see Ref [2]) PRED 1000 Total timesteps that are 'pre-deposited' into the histogram. This number divided by NPNT is the time deposited into each tube-slice at the beginning of each cycle. SMOO 0.05 Ratio of the standard deviation of the Gaussian mollifier used to smooth the path to total path length. CVFR 100 Frequency for outputing the positions in the collective variable space Like the DTNS command, the OPTI command does not trigger any dynamics run. Instead, the dynamics run will be triggered by the next DYNAmics command. The NSTE option of the DYNA command will be ignored and the length of the dynamics will be controlled by options specified with the OPTI command. Running the OPTI command generates the following data under the current directory: - [DIR] cycXXX: directory for cycle XXX, includes: - [FILE] mean.dat: mean sampling positions on all tube slices - [FILE] smean.dat: mean.dat after smoothing, initial path for next cycle - [FILE] pmf.dat: PMF with respect to path arc length - [DIR] repXXX: directory for replica XXX, includes: - [FILE] blockXXX.dcd: trajectory file for dynamics block XXX - [FILE] blockXXX.rst: restart file for dynamics block XXX - [FILE] blockXXX.cv: trajectory in the collective variable space
Implementation Remarks The ABPO module does not supply a standard mechanism to detect convergence of the optimization process. The user may monitor the progress by plotting any meaningful metric of the distance between the current path and each previous path (For example, see Ref [1]). Ideally, such distance curve should go to zero while convergence is approached. In practice with the presence of sampling error, the distance curve approaches a non-zero plateau, the height of which depends on the specific system as well as sampling accuracy. In the current version, the OPTI command can only be called for once in a single CHARMM run. If there is a need to use multiple OPTI commands, for example, when optimization with first a larger then a smaller RTUB is desired, do it in two seperate runs.
Usage Examples Note: to keep things easy, ABPO always assumes the current directory, i.e, the directory where CHARMM is launched as the working directory. Input files (init.dat and invd.dat) will be searched there, and all output files will be directed there. Make sure to 'cd' into the desired working directory before running CHARMM. 1) Optimize a path for the transition of the alanine dipeptide between its configurations C7eq and C7ax with four replicas. init.dat specifies a straight line, and reads: -1.45 1.30 1.22 -1.22 !----------------------------------------------------------------------------------- ! Load parameter, topology, psf and coornidate files ! Setup nonbond interactions set temp 300 scalar fbeta set 5 sele all end ensemble nensem 4 ensemble abpo setcv - tsel sele type clp end sele type nl end sele type ca end sele type crp end - tsel sele type nl end sele type ca end sele type crp end sele type nr end ! Evaluate the D ensemble abpo dtns dsteps 50000 dyna leap lang tbath @temp timestep 0.001 - nsavc 100 nprint 1000 IPRFrq 10000 ! Run path optimization ensemble abpo opti - temp @temp bcyc 1 ecyc 10 npnt 500 moll 0.05 bfct 0.90 pred 500 - rtube 0.2 ftube 50.0 smooth 0.05 - mnblock 40 bstep 10000 minc 50 dyna leap lang tbath @temp timestep 0.001 - nsavc 100 nprint 1000 IPRFrq 10000 2) Restart the calculation of 1) and optimize for ten more cycles !----------------------------------------------------------------------------------- ! Load parameter, topology, psf ! Coornidate will be loaded from the .rst files saved under cyc010/repXXX ! Setup nonbond interactions set temp 300 scalar fbeta set 5 sele all end ensemble nensem 4 ensemble abpo setcv - tsel sele type clp end sele type nl end sele type ca end sele type crp end - tsel sele type nl end sele type ca end sele type crp end sele type nr end ! No need to run dtns again because invd.dat is already generated in the previous run ensemble abpo opti rest bcyc 10 ecyc 20 dyna leap lang tbath @temp timestep 0.001 - nsavc 100 nprint 1000 iprfrq 10000 3) Calculate the pmf of two sodium ions in vacuum as a function of the distance between them. Use ten replicas. init.dat specifies the distance range, and reads: 2.00 10.00 !----------------------------------------------------------------------------------- ! Load parameter, topology, psf and coornidate files ! Setup nonbond interactions set temp 300 scalar fbeta set 5 sele all end ensemble nensem 4 ensemble abpo setcv - tsel sele type clp end sele type nl end sele type ca end sele type crp end - tsel sele type nl end sele type ca end sele type crp end sele type nr end ! For one-dimensional CV space, 'ensemble abpo dtns' may be ommitted by ! manually creating a file named invd.dat that holds a single number 1.0. ensemble abpo opti - temp @temp bcyc 1 ecyc 1 npnt 500 moll 0.05 bfct 0.90 pred 500 - rtube 0.0 ftube 10.0 smooth 0.05 - mnblock 40 bstep 100000 minc 500 dyna leap lang tbath @temp timestep 0.001 - nsavc 100 nprint 1000 iprfrq 10000 Testcase: c38test/abpo.inp
References [1] Dickson, B. M.; Huang, H. & Post, C. B. Unrestrained computation of free energy along a path. J Phys Chem B, 2012, 116, 11046-11055 [2] Dickson, B. M. Approaching a parameter-free metadynamics. Phys Rev E, 2011, 84, 037701