<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<head>
  <title>RMSD: Root Mean Square Deviation (bosco)</title>
  <link href="http://files.infogami.com/styles/style.css" type="text/css" rel="stylesheet" />
  <link href="http://bosco.infogami.com/style.css" type="text/css" rel="stylesheet" />
  
</head>

<body>
  <div id="contents">
    <table cellpadding="0" cellspacing="10">
      <td valign="top">
        <div id="sidebar">

          <div class="title">
            <a href='/'>bosco</a>
          </div>

          essays and other writings...

          <p><a href="&#109;&#97;&#105;&#108;&#116;&#111;:&#x61;&#x70;&#x70;&#111;&#115;&#105;&#116;&#101;&#64;&#103;&#x6D;&#97;&#105;&#x6C;&#46;&#99;&#111;&#109;">&#x61;&#x70;&#x70;&#111;&#115;&#105;&#116;&#101;&#64;&#103;&#x6D;&#97;&#105;&#x6C;&#46;&#99;&#111;&#109;</a></p>


        </div>
      </td>
      <td width="100%">
        <div id="main">
          <div id="article">

            &nbsp;&nbsp;&nbsp;
            <img width="35" src="http://static.flickr.com/47/189064390_fd5ff3eb57_o.gif" alt="" />
            <img width="45" src="http://static.flickr.com/47/189064390_fd5ff3eb57_o.gif" alt="" />
            <img width="60" src="http://static.flickr.com/47/189064390_fd5ff3eb57_o.gif" alt="" />
            <img width="45" src="http://static.flickr.com/47/189064390_fd5ff3eb57_o.gif" alt="" />
            <img width="35" src="http://static.flickr.com/47/189064390_fd5ff3eb57_o.gif" alt="" />
            <img width="30" src="http://static.flickr.com/47/189064390_fd5ff3eb57_o.gif" alt="" />

            <h1>RMSD: Root Mean Square Deviation</h1>

            <div class='page'><p>If it's your day job to push proteins <i>in silico</i> then you will one day have to calculate the RMSD of a protein. You've just simulated protein GinormousA for a whole microsecond but you don't even know if GinormousA is stable. You could load up a protein viewer and eyeball the stability of the protein throughout the simulation. But eyeballing is just not going to cut it with your boss. You need to slap a number on it. This is where RMSD comes in.</p><p>A protein conformation is basically a set of 3-dimensional coordinates. This means that a conformation is a set of N vectors x<sub>n</sub>, where each x<sub>n</sub> has three components. As the protein evolves along the trajectory, the conformation changes, resulting in a new protein conformation and a new set of vectors y<sub>n</sub>. We capture the difference between conformations through the two sets of vectors x<sub>n</sub> and y<sub>n</sub>. To make life easier, we shift the centres-of-mass of x<sub>n</sub> and y<sub>n</sub> to the origin of the coordinates (we can always shift them back afterwards). Then the difference can be captured by the square distance:</p><blockquote>  <p>E = &Sigma;<sub>n</sub> |x<sub>n</sub> - y<sub>n</sub>|<sup>2</sup></p></blockquote><p>The root mean-square deviation (RMSD) is then</p><blockquote>  <p>RMSD = &radic; ( E / N )</p></blockquote><p>But hold on, you say: the mean-square measure doesn't capture similarity very well. Even a small rotation of the set of y<sub>n</sub>, which doesn't change the internal arrangement of the y<sub>n</sub>, would distort the RMSD. What we really need is to find the best rotation of y<sub>n</sub> with respect to x<sub>n</sub> <i>before</i> taking the RMSD. To rotate the vectors y<sub>n</sub>, we apply a rotation matrix U to get y'<sub>n</sub> = U y<sub>n</sub>.
We can then express E as a function of the rotation U:</p><blockquote>  <p>E = &Sigma;<sub>n</sub> |x<sub>n</sub> - U y<sub>n</sub>|<sup>2</sup></p></blockquote><p>To find the RMSD, we must first find the rotation U<sub>min</sub> that minimizes E.</p><h2>Matching sets of vectors</h2><p>The classic papers of Wolfgang Kabsch showed how to minimize E to get the RMSD. The algorithm described in those papers has since become part of the staple diet of protein analysis. As the original proof of the Kabsch equations, using six different Lagrange multipliers, is rather tedious, we'll describe a much simpler proof using standard linear algebra.</p><p>First we expand E (which is a function of U):</p><blockquote>  <p>E = &Sigma;<sub>n</sub> ( |x<sub>n</sub>|<sup>2</sup> + |y<sub>n</sub>|<sup>2</sup> )  - 2 &Sigma;<sub>n</sub> x<sub>n</sub> U y<sub>n</sub></p>    <p>E = E<sub>0</sub> - 2 &Sigma;<sub>n</sub> x<sub>n</sub> U y<sub>n</sub></p></blockquote><p>The first part E<sub>0</sub> is invariant with respect to a rotation U. The variation resides in the last term L = &Sigma;<sub>n</sub> x<sub>n</sub> U y<sub>n</sub>. To minimize the RMSD, we must find the matrix U<sub>min</sub> that gives the largest possible value L<sub>max</sub> of L, from which we get:</p><blockquote>  <p>RMSD = &radic; ( E<sub>min</sub> / N ) = &radic; ( (E<sub>0</sub> - 2 L<sub>max</sub>) / N )</p></blockquote><p>The trick is to realize that we can promote the sets of vectors x<sub>n</sub> and y<sub>n</sub> into the {3 x N} matrices X and Y, whose columns are the vectors x<sub>n</sub> and y<sub>n</sub>. Then we can write L as a trace</p><blockquote>  <p>L = Tr( X<sup>t</sup> U Y )</p></blockquote><p>where X<sup>t</sup> is a {N x 3} matrix and Y is a {3 x N} matrix.
Juggling the matrices inside the trace, we get:</p><blockquote>  <p>L = Tr( X<sup>t</sup> U Y )   &nbsp;&nbsp;&nbsp;<i>trace of a {N x N} matrix</i> <br />  &nbsp;&nbsp; = Tr( U Y X<sup>t</sup> ) &nbsp;&nbsp;&nbsp; <i>trace of a {3 x 3} matrix</i> <br />  &nbsp;&nbsp; = Tr( U R )</p></blockquote><p>We don't know what U is, but we can study the other matrix R = Y X<sup>t</sup>, which is the {3 x 3} correlation matrix between X and Y. By invoking the powerful Singular Value Decomposition theorem, we know that we can always write R as:</p><blockquote>  <p>R = V S W<sup>t</sup></p></blockquote><p>and obtain the {3 x 3} matrices V and W (which are orthogonal) and S (which is diagonal with non-negative elements &sigma;<sub>i</sub>). We substitute these matrices into the troublesome L term of the square-distance equation:</p><blockquote>  <p>L = Tr( U V S W<sup>t</sup> ) = Tr( S W<sup>t</sup> U V )</p></blockquote><p>As S is diagonal, if we group the matrices W<sup>t</sup> U V together into the matrix T, we get a simple expression for the trace in terms of the &sigma;<sub>i</sub>:</p><blockquote>  <p>L =  Tr( S T ) = &sigma;<sub>1</sub> T<sub>11</sub> + &sigma;<sub>2</sub> T<sub>22</sub> + &sigma;<sub>3</sub> T<sub>33</sub></p></blockquote><p>Writing L in this form, we can figure out the maximum value of L. The secret sauce is in T. If we look at the definition of T, we can see that <em>T is a product of only orthogonal matrices</em>, namely the matrices U, W<sup>t</sup> and V. T must also be orthogonal.</p><p>Thus, since the rows and columns of an orthogonal matrix are unit vectors, the largest possible value for any element of T is 1, or |T<sub>ij</sub>| &le; 1.
Now L is a sum of terms linear in the diagonal elements of T, so the maximum value of L occurs when every T<sub>ii</sub> = 1:</p><blockquote>  <p>L<sub>max</sub> = Tr( S T<sub>max</sub> ) = &sigma;<sub>1</sub> + &sigma;<sub>2</sub> + &sigma;<sub>3</sub> = Tr ( S )</p></blockquote><p>Plugging this expression for L<sub>max</sub> into the equation for the RMSD, we get</p><blockquote>  <p>RMSD =  &radic; [ ( E<sub>0</sub> - 2 ( &sigma;<sub>1</sub> + &sigma;<sub>2</sub> + &sigma;<sub>3</sub>  ) ) / N ]</p></blockquote><p>From this analysis, we can also get the rotation that gives the optimal superposition of the vectors. When every T<sub>ii</sub> = 1, then T = I, the identity matrix. This immediately gives us an equation for the optimal rotation U<sub>min</sub> in terms of the left and right singular vectors of the matrix R:</p><blockquote>  <p>T<sub>max</sub> = W<sup>t</sup> U<sub>min</sub> V = I <br />  U<sub>min</sub> = W V<sup>t</sup></p></blockquote><h2>How to solve for R</h2><p>In the above section, we showed that the RMSD can be expressed in terms of the singular values &sigma;<sub>i</sub> of R, but we didn't say how to find them. Kabsch, who first derived the solution, showed that to solve for the &sigma;<sub>i</sub>, we should work with the matrix R<sup>t</sup> R, which is a symmetric real matrix, and much easier to solve. Expanding, we get a relation to the S matrix of R:</p><blockquote>  <p>R<sup>t</sup> R = ( W S<sup>t</sup> V<sup>t</sup> )( V S W<sup>t</sup> ) = W S<sup>2</sup> W<sup>t</sup>.</p></blockquote><p>From this expansion, we can see that the singular-value matrix of R<sup>t</sup> R is S<sup>2</sup>. In other words, the singular values &lambda;<sub>i</sub> of R<sup>t</sup> R are related to the singular values &sigma;<sub>i</sub> of R by the relation &lambda;<sub>i</sub> = &sigma;<sub>i</sub><sup>2</sup>.
And also, the right singular vectors (the columns of W) are the same for both R and R<sup>t</sup> R.</p><p>Since R<sup>t</sup> R is a symmetric real matrix, its eigenvalues are identical to its singular values &lambda;<sub>i</sub>. Thus we can use the standard Jacobi transform algorithm to find the eigenvalues of R<sup>t</sup> R, and then substitute &lambda;<sub>i</sub> = &sigma;<sub>i</sub><sup>2</sup> into the RMSD formula:</p><blockquote>  <p>RMSD = &radic; [ ( E<sub>0</sub> - 2 ( &radic;&lambda;<sub>1</sub> + &radic;&lambda;<sub>2</sub> + &radic;&lambda;<sub>3</sub> ) ) / N ]</p></blockquote><h2>The curse of chirality</h2><p>Unfortunately, there's one more ambiguity that we will have to deal with before we get to the Kabsch equation for the RMSD. The ambiguity arises from a degeneracy in the U<sub>min</sub> produced by the SVD theorem. U<sub>min</sub> is guaranteed to be orthogonal, but for our RMSD calculation, we don't just want orthogonal matrices, we want rotation matrices. Orthogonal matrices can have determinants of +1 or -1. We want rotation matrices, which have determinant +1, and not reflected rotation matrices, which have determinant -1. A reflection in U<sub>min</sub> would scramble the internal arrangement of our set of coordinates.</p><p>To restrict the solutions of U<sub>min</sub> = W V<sup>t</sup> to proper rotations, we check the determinants of W and V:</p><blockquote>  <p>&omega; = det( V ) * det( W )</p></blockquote><p>If &omega; = -1 then there is a reflection in U<sub>min</sub>, and we must get rid of it. The easiest way to get rid of the reflection is to flip the smallest principal axis in U<sub>min</sub>.
We can then simply reverse the sign of the smallest singular value &radic;&lambda;<sub>3</sub> in the RMSD, which, finally, gives the Kabsch formula for the RMSD in terms of the &lambda;<sub>i</sub> and &omega;:</p><blockquote>  <p>RMSD =  &radic; [ ( E<sub>0</sub> - 2 ( &radic;&lambda;<sub>1</sub> + &radic;&lambda;<sub>2</sub> + &omega; &radic;&lambda;<sub>3</sub> ) ) / N ]</p></blockquote><h2>Implementations of RMSD in code</h2><p>In C, see <a href="http://dillgroup.ucsf.edu/~bosco/rmsd.c">rmsd.c</a>, <a href="http://dillgroup.ucsf.edu/~bosco/rmsd.h">rmsd.h</a></p><p>In Python, using the very excellent <a href="http://numpy.scipy.org/">numpy library</a>:</p><div style="width: 100%;"><div style="background-color: #EEE; padding: 1em; overflow: auto;"><pre><code>from numpy import dot, transpose, sqrt, shape, sum
from numpy.linalg import det, svd

def calculate_rmsd(conformation1, conformation2,
                   is_align=False, is_center=True):
    """
    Calculates the RMSD between two conformations.

      * conformation1: array of dimensions [N,3]
      * conformation2: array of dimensions [N,3]
      * is_align: True = modify conformation2 to be
                  aligned to conformation1
      * is_center: True = remove average center differences
                   before calculating RMSD
    """
    if not shape(conformation1) == shape(conformation2):
        raise ValueError("calculate_rmsd: conformations not the same size.")
    elif not len(shape(conformation1)) == 2:
        raise ValueError("calculate_rmsd: conformations not the correct rank.")
    vecs1 = conformation1.copy()
    vecs2 = conformation2.copy()
    n_vec, vec_size = shape(vecs1)
    center1 = center2 = 0.0
    if is_center:
        center1 = sum(vecs1, 0) / float(n_vec)
        center2 = sum(vecs2, 0) / float(n_vec)
        vecs1 -= center1
        vecs2 -= center2
    E0 = sum(vecs1 * vecs1) + sum(vecs2 * vecs2)
    correlation_matrix = dot(transpose(vecs2), vecs1)
    V, S, W_trans = svd(correlation_matrix)
    is_reflection = (det(V) * det(W_trans)) &lt; 0.0
    if is_reflection:
        # reverse the sign of the smallest singular value
        S[-1] = -S[-1]
    rmsd_sq = (E0 - 2.0 * sum(S)) / float(n_vec)
    rmsd_sq = max([rmsd_sq, 0.0])
    if is_align:
        if is_reflection:
            # flip the smallest principal axis of the rotation
            V[:, -1] = -V[:, -1]
        optimal_rotation = dot(V, W_trans)
        # superpose the rotated coordinates onto conformation1's centre
        conformation2[:, :] = dot(vecs2, optimal_rotation) + center1
    return sqrt(rmsd_sq)</code></pre></div></div><p class="dateline">    last updated 2 months ago    <a href="/Root_Mean_Square_Deviation">#</a></p></div>
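Before any rotational fitting, the centred square-distance formula from the start of the article can be checked directly. A minimal sketch, using numpy; the helper name `naive_rmsd` is hypothetical and not part of the article's code:

```python
import numpy as np

def naive_rmsd(x, y):
    # x, y: [N, 3] coordinate arrays
    x = x - x.mean(axis=0)          # shift centre-of-mass to the origin
    y = y - y.mean(axis=0)
    e = np.sum((x - y) ** 2)        # square distance E = sum_n |x_n - y_n|^2
    return np.sqrt(e / len(x))      # RMSD = sqrt(E / N)

# identical conformations give an RMSD of zero,
# and centring makes the measure translation-invariant
x = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
print(naive_rmsd(x, x))        # 0.0
print(naive_rmsd(x, x + 5.0))  # 0.0
```

As the article argues, this naive measure is not rotation-invariant, which is why the optimal rotation must be found first.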
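The relation λ<sub>i</sub> = σ<sub>i</sub><sup>2</sup> between the eigenvalues of R<sup>t</sup>R and the singular values of R is easy to verify numerically. A small sketch, using a random 3 x 3 matrix as a stand-in for the correlation matrix (not the article's code):

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.standard_normal((3, 3))       # stand-in 3x3 correlation matrix

sigma = np.linalg.svd(R, compute_uv=False)       # singular values of R (descending)
lam = np.linalg.eigvalsh(np.dot(R.T, R))[::-1]   # eigenvalues of R^t R (descending)

print(np.allclose(lam, sigma ** 2))  # True
```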
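The chirality check ω = det(V) · det(W) can also be demonstrated: for a conformation compared against its own mirror image, ω comes out as -1. A minimal sketch with hypothetical example coordinates (not from the article):

```python
import numpy as np

x = np.array([[1., 0., 0.], [0., 2., 0.], [0., 0., 3.], [1., 1., 1.]])
y = x.copy()
y[:, 2] = -y[:, 2]                    # reflect y through the xy-plane

R = np.dot(y.T, x)                    # 3x3 correlation matrix
V, S, W_trans = np.linalg.svd(R)
omega = np.linalg.det(V) * np.linalg.det(W_trans)
print(round(omega))  # -1, so the smallest singular value must be sign-flipped
```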
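Putting the derivation together, here is a condensed sketch of the whole Kabsch RMSD (the helper name `kabsch_rmsd` is hypothetical; the article's `calculate_rmsd` above is the fuller version). A rigidly rotated copy of a conformation should give an RMSD of essentially zero:

```python
import numpy as np

def kabsch_rmsd(x, y):
    # x, y: [N, 3] coordinate arrays
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    e0 = np.sum(x * x) + np.sum(y * y)            # invariant part E0
    V, S, W_trans = np.linalg.svd(np.dot(y.T, x)) # SVD of the correlation matrix
    omega = np.linalg.det(V) * np.linalg.det(W_trans)
    if omega < 0.0:
        S[-1] = -S[-1]                            # remove the reflection
    msd = max((e0 - 2.0 * np.sum(S)) / len(x), 0.0)
    return np.sqrt(msd)

x = np.array([[1., 0., 0.], [0., 2., 0.], [0., 0., 3.], [1., 1., 1.]])
theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta), 0.],
              [np.sin(theta),  np.cos(theta), 0.],
              [0., 0., 1.]])
y = np.dot(x, U.T)                    # rigidly rotated copy of x
print(round(kabsch_rmsd(x, y), 6))  # 0.0
```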

          </div>
        </div>
      </td>
    </table>
  </div>
</body>

</html>
