                     Distance Correlation (DCOR)

The distance correlation coefficient is a useful and elegant alternative
to the standard measures of correlation. It is based on non-trivial
theoretical results developed by Székely, Rizzo and Bakirov [1]. The main
result is that a single, simple statistic, DCOR(X,Y), can be used to assess
whether two random vectors X and Y, possibly of different dimensions, are
dependent, linearly or non-linearly, based on an independent and
identically distributed (i.i.d.) sample.

* Menu:

* Commands::      Invoking DCOR
* Background::    An introduction to distance correlation
* References::    Relevant citations
* Example::       Outline of an example input script
                          DCOR Commands

DCOR can be invoked from CORREL, using the keyword DCOR, to calculate the
dependence between two time series, which may be of different dimensions.
The time series may contain any variable. The only requirement is that
both time series have the same length.

    setup trajectory file
    correl maxt ... maxa ... maxs ...
    setup first time series
    setup second time series
    traj trajectory specifications
    dcor time-series1 time-series2
    end

DCOR can also be invoked from CORMAN as part of "COORdinates COVAriance",
using the keyword DCOR (DCOV), to calculate the distance correlation
(covariance) between the positional fluctuations of two selections of
atoms.

    COORdinates COVAriance  traj-spec  2x(atom_selection) [UNIT_for_output int] -
          [RESIdue_average_nsets integer] [MATRix] -
          [ENTRopy [TEMP <real>] [DIAG] [RESI] [SCHL] ] -
          [DCOR] [DCOV]

If DCOR or DCOV is requested together with ENTRopy, the ENTRopy keyword is
ignored. If both DCOR and DCOV are requested, DCOR is ignored.

Parallel:

With CORREL, distance correlation calculations can be parallelized by
submitting multiple serial jobs. With COORdinates COVAriance, distance
correlations between the positional fluctuations of multiple atoms can be
calculated using the parallel version of CHARMM.

Benchmark for 352x352 atoms, 500 steps of dynamics, on a 32-core AMD box
(DCOR, CHARMM c40):

    CPUs    time (sec)    speedup    efficiency
       1      1443.0        1.00        100%
       2       750.0        1.92         96%
       4       366.6        3.94         98%
       8       189.6        7.61         95%
      16        93.9       15.37         96%

For better parallelization of DCOR with COORdinates COVAriance, put the
larger number of atoms in the second selection.

The time required for a DCOR calculation between two time series increases
as N*N, where N is the length of the time series.
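The speedup and efficiency columns of the benchmark follow directly from the
wall-clock times: speedup is the 1-CPU time divided by the n-CPU time, and
efficiency is speedup divided by n. The short Python sketch below reproduces
those columns and illustrates the N*N cost statement. It is only an
illustration of the arithmetic; the names `timings`, `scaling` and
`relative_cost` are introduced here and are not part of CHARMM.

```python
# Wall-clock times in seconds, copied from the benchmark table.
timings = {1: 1443.0, 2: 750.0, 4: 366.6, 8: 189.6, 16: 93.9}


def scaling(times):
    """Return (ncpu, speedup, efficiency) tuples relative to the 1-CPU run."""
    serial = times[1]
    rows = []
    for ncpu in sorted(times):
        speedup = serial / times[ncpu]
        rows.append((ncpu, speedup, speedup / ncpu))
    return rows


def relative_cost(n_new, n_old):
    """Cost ratio for two time-series lengths, assuming cost grows as N*N."""
    return (n_new / n_old) ** 2


if __name__ == "__main__":
    for ncpu, speedup, efficiency in scaling(timings):
        print(f"{ncpu:4d} {speedup:8.2f} {efficiency:8.0%}")
    # Doubling the length of the time series quadruples the estimated cost.
    print(relative_cost(10000, 5000))
```

Because of the quadratic cost, striding a long trajectory before computing
DCOR reduces the run time much faster than it reduces the number of frames.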
                 Introduction to Distance Correlation

Among Pearson's correlation coefficient (PCC), a generalized correlation
coefficient (GCC) and the distance correlation coefficient (DCOR), DCOR is
the most appropriate parameter for finding association in atomic motions,
because it is the least sensitive to angular dependence while still
reflecting variability in covariance. [2]

Calculation of DCOR between two vector series is straightforward to
implement. Let {A} and {B} be two vector series with m entries each, and
let the ith entry in {A} be denoted by A_i. To calculate the distance
covariance between {A} and {B}, the following five steps are needed.

  1) Calculate the m x m matrix a from {A}, where a_ij is the Euclidean
     distance between the ith and jth entries of {A}:

         a_ij = a_ji = |A_i - A_j|

  2) Average the rows of a:

         a_i. = 1/m sum_j(a_ij)

  3) Average the columns of a:

         a_.j = 1/m sum_i(a_ij)

  4) Average all elements of a:

         a_.. = 1/(m*m) sum_ij(a_ij)

  5) Build the m x m matrix alpha from a, where

         alpha_ij = a_ij - a_i. - a_.j + a_..

Then the distance covariance is

    DCOV(A,B) ≡ sqrt( 1/(m*m) sum_ij(alpha_ij * beta_ij) )

where beta_ij is defined in the same way from {B}. The distance
correlation coefficient, DCOR, is defined as

    DCOR = DCOV(A,B) / sqrt( DCOV(A,A) * DCOV(B,B) )

DCOR was found to capture both linear and non-linear correlation between
positional vectors, and it revealed long-distance concerted motions in a
protein that were not revealed by PCC or GCC. [2]
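The five steps and the two definitions above can be sketched in a few lines
of pure Python. This is a minimal illustration of the formulas as written in
this section, not the CHARMM implementation; the function names
(`centered_distances`, `dcov`, `dcor`) and the sample data are assumptions
made for the example. The cost is O(m*m) in time and memory.

```python
import math


def centered_distances(series):
    """Steps 1-5: pairwise Euclidean distances, then double centering."""
    m = len(series)
    # Step 1: a_ij = |A_i - A_j|
    a = [[math.dist(p, q) for q in series] for p in series]
    # Step 2: row averages a_i.
    row = [sum(r) / m for r in a]
    # Step 3: column averages a_.j
    col = [sum(a[i][j] for i in range(m)) / m for j in range(m)]
    # Step 4: average of all elements a_..
    grand = sum(row) / m
    # Step 5: alpha_ij = a_ij - a_i. - a_.j + a_..
    return [[a[i][j] - row[i] - col[j] + grand for j in range(m)]
            for i in range(m)]


def dcov(xs, ys):
    """Distance covariance of two equal-length series of coordinate tuples."""
    if len(xs) != len(ys):
        raise ValueError("both series must have the same length")
    m = len(xs)
    alpha = centered_distances(xs)
    beta = centered_distances(ys)
    total = sum(alpha[i][j] * beta[i][j] for i in range(m) for j in range(m))
    # Clamp tiny negative round-off before taking the square root.
    return math.sqrt(max(total, 0.0) / (m * m))


def dcor(xs, ys):
    """Distance correlation; returns 0.0 if either series is constant."""
    denom = math.sqrt(dcov(xs, xs) * dcov(ys, ys))
    return dcov(xs, ys) / denom if denom > 0.0 else 0.0


if __name__ == "__main__":
    x = [(-2.0,), (-1.0,), (0.0,), (1.0,), (2.0,)]
    linear = [(2.0 * t[0] + 1.0,) for t in x]
    parabola = [(t[0] ** 2,) for t in x]
    print(dcor(x, linear))    # exact linear dependence
    print(dcor(x, parabola))  # non-linear dependence with zero Pearson correlation
```

Each entry is a tuple of coordinates, so the two series may have different
dimensions (for example a 3-D position against a 1-D variable), as long as
they have the same number of entries.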
                            References

[1] G. J. Székely, M. L. Rizzo and N. K. Bakirov, Measuring and testing
    dependence by correlation of distances, The Annals of Statistics
    35.6 (2007): 2769-2794.

[2] Amitava Roy and Carol Beth Post, Detection of Long-Range Concerted
    Motions in Protein by a Distance Covariance, J. Chem. Theory Comput.,
    2012, 8 (9), pp 3009-3014.
                            Input File

1) CORREL input for DCOR between the positional fluctuations of two CA
   atoms:

    open read file unit 31 name traj1_file
    open read file unit 32 name traj2_file

    correl maxt 5000 maxa 4 maxs 6
    enter a atom XYZ sele ires 1 .and. type CA end
    enter b atom XYZ sele ires 20 .and. type CA end
    traj firstu 31 nunit 2
    dcor a b
    end

   Output:

    CORREL> dcor a b
    DCOR> VAR1 = 19.868 VAR2 = 18.500 COVAR = 14.777 CORR = 0.771
    CORREL> end

    VAR1, VAR2 - distance variances
    COVAR      - distance covariance between time series "a" and "b"
    CORR       - distance correlation between time series "a" and "b"

2) COORdinates COVAriance input for DCOR between the positional
   fluctuations of all CA atoms:

    open read file unit 31 name traj1_file
    open read file unit 32 name traj2_file
    open write card unit 41 name matrix_file

    coor cova firstu 31 nunit 2 sele type CA end sele type CA end unit 41 dcor

   Output options are identical to the COOR COVA output options.
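As a quick consistency check on the CORREL output of the first example, the
reported CORR value can be recomputed from the other three numbers. The
sketch below assumes that VAR1 and VAR2 correspond to DCOV(A,A) and DCOV(B,B)
and COVAR to DCOV(A,B) in the definition of DCOR given in the Background
section; the variable names are illustrative only.

```python
import math

# Values copied from the "DCOR>" output line of the first example.
var1, var2, covar = 19.868, 18.500, 14.777

# DCOR = DCOV(A,B) / sqrt(DCOV(A,A) * DCOV(B,B))
corr = covar / math.sqrt(var1 * var2)

print(f"CORR = {corr:.3f}")
```

The recomputed value agrees with the printed CORR to the three decimals
shown in the output.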
CHARMM Documentation / Rick_Venable@nih.gov