<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>RMSD: Root Mean Square Deviation (bosco)</title>
<link href="http://files.infogami.com/styles/style.css" type="text/css" rel="stylesheet" />
<link href="http://bosco.infogami.com/style.css" type="text/css" rel="stylesheet" />
</head>
<body>
<div id="contents">
<table cellpadding="0" cellspacing="10">
<tr>
<td valign="top">
<div id ="sidebar">
<div class="title">
<a href='/'>bosco</a>
</div>
essays and other writings...
<p><a href="mailto:apposite@gmail.com">apposite@gmail.com</a></p>
<p class="login"><a href="/_account/in?path=/Root_Mean_Square_Deviation">log in</a></p>
</div>
</td>
<td width="100%">
<div id="main">
<div id="article">
<img width="35" src="http://static.flickr.com/47/189064390_fd5ff3eb57_o.gif">
<img width="45" src="http://static.flickr.com/47/189064390_fd5ff3eb57_o.gif">
<img width="60" src="http://static.flickr.com/47/189064390_fd5ff3eb57_o.gif">
<img width="45" src="http://static.flickr.com/47/189064390_fd5ff3eb57_o.gif">
<img width="35" src="http://static.flickr.com/47/189064390_fd5ff3eb57_o.gif">
<img width="30" src="http://static.flickr.com/47/189064390_fd5ff3eb57_o.gif">
<h1>RMSD: Root Mean Square Deviation</h1>
<div class='page'><p>If it's your day job to push proteins <i>in silico</i> then one day you will have to calculate the RMSD of a protein. You've just simulated protein GinormousA for a whole microsecond, but you don't even know if GinormousA is stable. You could load up a protein viewer and eyeball the stability of the protein throughout the simulation. But eyeballing is just not going to cut it with your boss. You need to slap a number on it. This is where RMSD comes in.</p><p>A protein conformation is basically a set of 3-dimensional coordinates. This means that a conformation is a set of N vectors x<sub>n</sub>, where each x<sub>n</sub> has three components. As the protein evolves along the trajectory, the conformation changes, resulting in a new protein conformation and a new set of vectors y<sub>n</sub>. We capture the difference between conformations through the two sets of vectors x<sub>n</sub> and y<sub>n</sub>. To make life easier, we shift the centres of mass of x<sub>n</sub> and y<sub>n</sub> to the origin of the coordinates (we can always shift them back afterwards). Then the difference can be captured by the square distance:</p><blockquote> <p>E = Σ<sub>n</sub> |x<sub>n</sub> - y<sub>n</sub>|<sup>2</sup></p></blockquote><p>The root mean-square deviation (RMSD) is then</p><blockquote> <p>RMSD = √( E / N )</p></blockquote><p>But hold on, you say, the mean-square measure doesn't capture the similarity very well. Just a little rotation of the set of y<sub>n</sub>, which doesn't change the internal arrangement of y<sub>n</sub>, would distort the RMSD. What we really need is to find the best rotation of y<sub>n</sub> with respect to x<sub>n</sub> <i>before</i> taking the RMSD. To rotate the vectors y<sub>n</sub>, we apply a rotation matrix U to get y'<sub>n</sub> = U y<sub>n</sub>. We can then express E as a function of the rotation U:</p><blockquote> <p>E = Σ<sub>n</sub> |x<sub>n</sub> - U y<sub>n</sub>|<sup>2</sup></p></blockquote><p>To find the RMSD, we must first find the rotation U<sub>min</sub> that minimizes E.</p><h2>Matching sets of vectors</h2><p>The classic papers of Wolfgang Kabsch showed how to minimize E to get the RMSD. The algorithm described in those papers has since become part of the staple diet of protein analysis. As the original proof of the Kabsch equations, which uses 6 different Lagrange multipliers, is rather tedious, we'll describe a much simpler proof using standard linear algebra.</p><p>First we expand E (which is a function of U):</p><blockquote> <p>E = Σ<sub>n</sub> ( |x<sub>n</sub>|<sup>2</sup> + |y<sub>n</sub>|<sup>2</sup> ) - 2 Σ<sub>n</sub> x<sub>n</sub> · U y<sub>n</sub></p> <p>E = E<sub>0</sub> - 2 Σ<sub>n</sub> x<sub>n</sub> · U y<sub>n</sub></p></blockquote><p>The first part E<sub>0</sub> is invariant with respect to a rotation U. The variation resides in the last term L = Σ<sub>n</sub> x<sub>n</sub> · U y<sub>n</sub>. To minimize the RMSD, we must find the rotation U<sub>min</sub> that gives the largest possible value L<sub>max</sub> of L, from which we get:</p><blockquote> <p>RMSD = √( E<sub>min</sub> / N ) = √( (E<sub>0</sub> - 2 L<sub>max</sub>) / N )</p></blockquote><p>The trick is to realize that we can promote the sets of vectors x<sub>n</sub> and y<sub>n</sub> into the {N x 3} matrices X and Y, whose rows are the vectors x<sub>n</sub> and y<sub>n</sub>. Then we can write L as a trace:</p><blockquote> <p>L = Tr( X U Y<sup>t</sup> )</p></blockquote><p>where X is an {N x 3} matrix and Y<sup>t</sup> is a {3 x N} matrix.</p>
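<p>If the trace identity looks like sleight of hand, here is a quick numerical sanity check — a throwaway numpy sketch with made-up coordinates (the names X, Y, U and the rotation angle are purely illustrative) — showing that the trace form gives the same number as the explicit sum over vectors:</p><div style="background-color: #EEE; padding: 1em; overflow: auto;"><pre><code># Sanity check: Tr( X U Y^t ) equals the sum over n of x_n . (U y_n).
from numpy import array, cos, sin, dot, transpose, trace
from numpy.random import rand

N = 5
X = rand(N, 3)   # rows are the vectors x_n
Y = rand(N, 3)   # rows are the vectors y_n
theta = 0.3
U = array([[cos(theta), -sin(theta), 0.0],
           [sin(theta),  cos(theta), 0.0],
           [0.0,         0.0,        1.0]])   # rotation about the z-axis

L_sum = sum(dot(x, dot(U, y)) for x, y in zip(X, Y))
L_trace = trace(dot(X, dot(U, transpose(Y))))
print(abs(L_sum - L_trace))   # prints something tiny, ~1e-15
</code></pre></div>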
<p>Juggling the matrices inside the trace, we get:</p><blockquote> <p>L = Tr( X U Y<sup>t</sup> ) <i>trace of an {N x N} matrix</i> <br /> = Tr( U Y<sup>t</sup> X ) <i>trace of a {3 x 3} matrix</i> <br /> = Tr( U R )</p></blockquote><p>We don't know what U is, but we can study the other matrix R = Y<sup>t</sup> X, which is the {3 x 3} correlation matrix between X and Y. By invoking the powerful Singular Value Decomposition theorem, we know that we can always write R as:</p><blockquote> <p>R = V S W<sup>t</sup></p></blockquote><p>and obtain the {3 x 3} matrices V and W (which are orthogonal) and S (which is diagonal with non-negative elements σ<sub>i</sub>). We substitute these matrices into the troublesome L term of the square-distance equation:</p><blockquote> <p>L = Tr( U V S W<sup>t</sup> ) = Tr( S W<sup>t</sup> U V )</p></blockquote><p>As S is diagonal, if we group the matrices W<sup>t</sup> U V together into the matrix T = W<sup>t</sup> U V, we get a simple expression for the trace in terms of the σ<sub>i</sub>:</p><blockquote> <p>L = Tr( S T ) = σ<sub>1</sub> T<sub>11</sub> + σ<sub>2</sub> T<sub>22</sub> + σ<sub>3</sub> T<sub>33</sub></p></blockquote><p>Writing L in this form, we can figure out the maximum value of L. The secret sauce is in T. If we look at the definition of T, we can see that <em>T is a product of only orthogonal matrices</em>, namely the matrices U, W<sup>t</sup> and V, so T must also be orthogonal.</p><p>Thus the largest possible value for any element of T is 1, or T<sub>ij</sub> ≤ 1. Now L is a sum of terms linear in the diagonal elements of T, so the maximum value of L occurs when every T<sub>ii</sub> = 1:</p><blockquote> <p>L<sub>max</sub> = Tr( S T<sub>max</sub> ) = σ<sub>1</sub> + σ<sub>2</sub> + σ<sub>3</sub> = Tr( S )</p></blockquote><p>Plugging this expression for L<sub>max</sub> into the equation for the RMSD, we get</p><blockquote> <p>RMSD = √ [ ( E<sub>0</sub> - 2 ( σ<sub>1</sub> + σ<sub>2</sub> + σ<sub>3</sub> ) ) / N ]</p></blockquote><p>From this analysis, we can also get the rotation that gives the optimal superposition of the vectors. When every T<sub>ii</sub> = 1, then T = I, the identity matrix. This immediately gives us an equation for the optimal rotation U<sub>min</sub> in terms of the left and right singular vectors of the matrix R:</p><blockquote> <p>T<sub>max</sub> = W<sup>t</sup> U<sub>min</sub> V = I <br /> U<sub>min</sub> = W V<sup>t</sup></p></blockquote><h2>How to solve for R</h2><p>In the section above, we showed that the RMSD can be expressed in terms of the singular values σ<sub>i</sub> of R, but we didn't say how to find them. Kabsch, who first derived the solution, showed that instead of attacking R directly, we can work with the matrix R<sup>t</sup> R, which is a nice symmetric real matrix and much easier to solve. Expanding, we get a relation to the S matrix of R:</p><blockquote> <p>R<sup>t</sup> R = ( W S V<sup>t</sup> )( V S W<sup>t</sup> ) = W S<sup>2</sup> W<sup>t</sup></p></blockquote><p>From this expansion, we can see that the singular-value term of R<sup>t</sup> R is S<sup>2</sup>. In other words, the singular values λ<sub>i</sub> of R<sup>t</sup> R are related to the singular values σ<sub>i</sub> of R by λ<sub>i</sub> = σ<sub>i</sub><sup>2</sup>. Also, the right singular vectors W are the same for both R and R<sup>t</sup> R.</p><p>Since R<sup>t</sup> R is a symmetric real matrix, its eigenvalues are identical to its singular values λ<sub>i</sub>. Thus we can use the standard Jacobi transform algorithm to find the eigenvalues of R<sup>t</sup> R. We substitute λ<sub>i</sub> = σ<sub>i</sub><sup>2</sup> into the RMSD formula:</p><blockquote> <p>RMSD = √ [ ( E<sub>0</sub> - 2 ( √λ<sub>1</sub> + √λ<sub>2</sub> + √λ<sub>3</sub> ) ) / N ]</p></blockquote>
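<p>To make the recipe concrete, here is a minimal numpy sketch of this eigenvalue route, assuming vecs1 and vecs2 are two {N x 3} coordinate arrays (random ones here, purely for illustration). Note that this simple version ignores the reflection problem discussed in the next section:</p><div style="background-color: #EEE; padding: 1em; overflow: auto;"><pre><code># Minimal sketch: RMSD from the eigenvalues of R^t R, no reflection check.
from numpy import dot, transpose, sqrt
from numpy.linalg import eigh
from numpy.random import rand

vecs1 = rand(10, 3) - 0.5
vecs2 = rand(10, 3) - 0.5
vecs1 -= vecs1.mean(axis=0)   # shift both centres of mass to the origin
vecs2 -= vecs2.mean(axis=0)

E0 = (vecs1 * vecs1).sum() + (vecs2 * vecs2).sum()
R = dot(transpose(vecs2), vecs1)           # the {3 x 3} correlation matrix
lambdas = eigh(dot(transpose(R), R))[0]    # eigenvalues of R^t R
sigmas = sqrt(lambdas.clip(min=0.0))       # singular values of R
rmsd = sqrt(max(E0 - 2.0 * sigmas.sum(), 0.0) / len(vecs1))
print(rmsd)
</code></pre></div>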
<h2>The curse of chirality</h2><p>Unfortunately, there's one more ambiguity that we have to deal with before we get to the Kabsch equation for the RMSD. The ambiguity arises from a degeneracy in the U<sub>min</sub> produced by the SVD theorem. U<sub>min</sub> is guaranteed to be orthogonal, but for our RMSD calculation we don't just want orthogonal matrices, we want rotation matrices. Orthogonal matrices can have determinants of +1 or -1. We want rotation matrices, which have determinants of +1, and not reflected rotation matrices, which have determinants of -1. A reflection in U<sub>min</sub> would scramble the internal arrangement of our set of coordinates.</p><p>To restrict the solutions of U<sub>min</sub> = W V<sup>t</sup> to pure rotations, we check the determinants of V and W:</p><blockquote> <p>ω = det( V ) * det( W )</p></blockquote><p>If ω = -1 then there is a reflection in U<sub>min</sub>, and we must get rid of it. The easiest way to do so is to reflect the smallest principal axis of U<sub>min</sub>. We then simply reverse the sign of the term √λ<sub>3</sub> for the smallest eigenvalue λ<sub>3</sub>, which, finally, gives the Kabsch formula for the RMSD in terms of the λ<sub>i</sub> and ω:</p><blockquote> <p>RMSD = √ [ ( E<sub>0</sub> - 2 ( √λ<sub>1</sub> + √λ<sub>2</sub> + ω √λ<sub>3</sub> ) ) / N ]</p></blockquote><h2>Implementations of RMSD in code</h2><p>In C, see <a href="http://dillgroup.ucsf.edu/~bosco/rmsd.c">rmsd.c</a> and <a href="http://dillgroup.ucsf.edu/~bosco/rmsd.h">rmsd.h</a>.</p><p>In Python, using the very excellent <a href="http://numpy.scipy.org/">numpy library</a>:</p><div style="width: 100%;"> <div style="background-color: #EEE; padding: 1em; overflow: auto;"><pre><code>from numpy import dot, transpose, sqrt, shape, zeros
from numpy.linalg import det, svd

def calculate_rmsd(conformation1, conformation2,
                   is_align=False, is_center=True):
  """
  Calculates the RMSD between two conformations.

  * conformation1: array of dimensions [N,3]
  * conformation2: array of dimensions [N,3]
  * is_align: True = modify conformation2 in place to superpose
              it onto conformation1
  * is_center: True = shift both centres of mass to the origin
               before calculating the RMSD
  """
  if shape(conformation1) != shape(conformation2):
    raise ValueError("calculate_rmsd: conformations not the same size")
  if len(shape(conformation1)) != 2:
    raise ValueError("calculate_rmsd: conformations not the correct rank")

  vecs1 = conformation1.copy()
  vecs2 = conformation2.copy()
  n_vec = shape(vecs1)[0]

  center1 = zeros(3)
  center2 = zeros(3)
  if is_center:
    center1 = vecs1.sum(axis=0) / float(n_vec)
    center2 = vecs2.sum(axis=0) / float(n_vec)
    vecs1 -= center1
    vecs2 -= center2

  E0 = (vecs1 * vecs1).sum() + (vecs2 * vecs2).sum()

  # R = Y^t X, the {3 x 3} correlation matrix
  correlation_matrix = dot(transpose(vecs2), vecs1)
  V, S, W_trans = svd(correlation_matrix)

  is_reflection = (det(V) * det(W_trans)) < 0.0
  if is_reflection:
    # omega = -1: reverse the sign of the smallest singular value
    S[-1] = -S[-1]
  rmsd_sq = (E0 - 2.0 * S.sum()) / float(n_vec)
  rmsd_sq = max(rmsd_sq, 0.0)

  if is_align:
    if is_reflection:
      # flip the singular vector of the smallest singular value
      V[:, -1] = -V[:, -1]
    # vecs2 holds row vectors, so apply the transpose of
    # U_min = W V^t by right-multiplying with V W^t
    optimal_rotation = dot(V, W_trans)
    # translate the rotated coordinates onto conformation1's centre
    conformation2[:, :] = dot(vecs2, optimal_rotation) + center1
  return sqrt(rmsd_sq)
</code></pre></div> </div>
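<p>A quick usage sketch, with a made-up 4-atom "protein" and a rigidly translated copy of it (hypothetical coordinates, just to show the calling convention of calculate_rmsd above):</p><div style="background-color: #EEE; padding: 1em; overflow: auto;"><pre><code>from numpy import array

conf1 = array([[0.0, 0.0, 0.0],
               [1.5, 0.0, 0.0],
               [1.5, 1.5, 0.0],
               [0.0, 1.5, 1.5]])
conf2 = conf1 + array([3.0, -2.0, 0.5])   # same shape, rigidly translated

print(calculate_rmsd(conf1, conf2))                  # ~0.0
print(calculate_rmsd(conf1, conf2, is_align=True))   # also superposes conf2 onto conf1
</code></pre></div>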
<p class="dateline"> last updated 2 months ago <a href="/Root_Mean_Square_Deviation">#</a></p></div>
</div>
</div>
</td>
</tr>
</table>
</div>
</body>
</html>