<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"><!--Converted with LaTeX2HTML 98.1p1 release (March 2nd, 1998)originally by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds* revised and updated by: Marcus Hennecke, Ross Moore, Herb Swan* with significant contributions from: Jens Lippmann, Marek Rouchal, Martin Wilck and others --><HTML><HEAD><TITLE>Loops and Nearest neighbor rules</TITLE><META NAME="description" CONTENT="Loops and Nearest neighbor rules"><META NAME="keywords" CONTENT="FEBS98-html"><META NAME="resource-type" CONTENT="document"><META NAME="distribution" CONTENT="global"><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"><LINK REL="STYLESHEET" HREF="FEBS98-html.css"><LINK REL="next" HREF="node6.html"><LINK REL="previous" HREF="node4.html"><LINK REL="up" HREF="FEBS98-html.html"><LINK REL="next" HREF="node6.html"></HEAD><BODY BGCOLOR=#FFDEAD TEXT=#202020 LINK=#800000 ALINK=#ffff00 VLINK=#353976><!--Navigation Panel--><A NAME="tex2html113" HREF="node6.html"><IMG WIDTH="37" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next" SRC="img/next_motif.gif"></A> <A NAME="tex2html110" HREF="FEBS98-html.html"><IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up" SRC="img/up_motif.gif"></A> <A NAME="tex2html104" HREF="node4.html"><IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous" SRC="img/previous_motif.gif"></A> <A NAME="tex2html112" HREF="node1.html"><IMG WIDTH="65" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="contents" SRC="img/contents_motif.gif"></A> <BR><B> Next:</B> <A NAME="tex2html114" HREF="node6.html">Constrained folding</A><B> Up:</B> <A NAME="tex2html111" HREF="FEBS98-html.html">Algorithms and Thermodynamics for</A><B> Previous:</B> <A NAME="tex2html105" HREF="node4.html">Software platforms and environment</A><BR><BR><!--End of Navigation Panel--><H1><A NAME="SECTION00050000000000000000">Loops and Nearest neighbor rules</A></H1><P>The <I>mfold</I> software uses what are called <I>nearest 
neighbor</I> energy rules. That is, free energies are assigned to loops rather than to base pairs. These have also been called loop dependent energy rules. In an effort to keep this article as self-contained as possible, we are including some well-known definitions that may be found elsewhere [<A HREF="node19.html#SAND8301">21</A>,<A HREF="node19.html#ZUKM8401">22</A>,<A HREF="node19.html#ZUKM8601">23</A>].
<P>A secondary structure, <B>S</B> on an RNA sequence, <!-- MATH: ${\bf R} =r_1,r_2,r_3,\dots,r_n$ --><IMG WIDTH="147" HEIGHT="29" ALIGN="MIDDLE" BORDER="0" SRC="img15.gif" ALT="${\bf R} =r_1,r_2,r_3,\dots,r_n$">, is a set of <I>base pairs</I>. A base pair between nucleotides <I>r</I><SUB><I>i</I></SUB> and <I>r</I><SUB><I>j</I></SUB> (<I>i</I> &lt; <I>j</I>) is denoted by <I>i</I>.<I>j</I>. A few constraints are imposed.
<P><UL>
<LI>Two base pairs, <I>i</I>.<I>j</I> and <I>i</I>'.<I>j</I>' are either identical, or else <IMG WIDTH="40" HEIGHT="31" ALIGN="MIDDLE" BORDER="0" SRC="img16.gif" ALT="$i \neq i'$"> and <IMG WIDTH="44" HEIGHT="31" ALIGN="MIDDLE" BORDER="0" SRC="img17.gif" ALT="$j \neq j'$">. Thus base triples are deliberately excluded from the definition of secondary structure.
<LI>Sharp U-turns are prohibited. A U-turn, called a hairpin loop, must contain at least 3 bases.
<LI>Pseudoknots are prohibited. That is, if <I>i</I>.<I>j</I> and <!-- MATH: $i'.j' \in{\bf S}$ --><IMG WIDTH="59" HEIGHT="31" ALIGN="MIDDLE" BORDER="0" SRC="img18.gif" ALT="$i'.j' \in{\bf S}$">, then, assuming <I>i</I> &lt; <I>i</I>', either <!-- MATH: $i < i' < j' < j$ --><I>i</I> &lt; <I>i</I>' &lt; <I>j</I>' &lt; <I>j</I> or <!-- MATH: $i < j< i' < j'$ --><I>i</I> &lt; <I>j</I> &lt; <I>i</I>' &lt; <I>j</I>'.
</UL>
<P>Pseudoknots [<A HREF="node19.html#PLEC8901">24</A>,<A HREF="node19.html#ABRJ9001">25</A>,<A HREF="node19.html#GUTR9001">26</A>,<A HREF="node19.html#DAME9201">27</A>,<A HREF="node19.html#PLEC9401">28</A>,<A HREF="node19.html#DUZ9601">29</A>] and base triples are not excluded for frivolous reasons. 
When pseudoknots are included, the loop decomposition of a secondary structure breaks down and the energy rules break down. Although we can assign reasonable free energies to the helices in a pseudoknot, and even to possible coaxial stacking between them, it is not possible to estimate the effects of the new kinds of loops that are created. Base triples pose an even greater challenge, because the exact nature of the triple cannot be predicted in advance, and even if it could, we have no data for assigning free energies.
<P>A base <I>r</I><SUB><I>i</I>'</SUB> or a base pair <I>i</I>'.<I>j</I>' is called accessible from a base pair <I>i</I>.<I>j</I> if <!-- MATH: $i < i' ( < j') < j$ --><I>i</I> &lt; <I>i</I>' ( &lt; <I>j</I>') &lt; <I>j</I> and if there is no other base pair, <I>k</I>.<I>l</I>, such that <!-- MATH: $i < k < i' ( < j' ) < l < j$ --><I>i</I> &lt; <I>k</I> &lt; <I>i</I>' ( &lt; <I>j</I>' ) &lt; <I>l</I> &lt; <I>j</I>. The collection of bases and base pairs accessible from a given base pair, <I>i</I>.<I>j</I>, but <B>not</B> including that base pair, is called the loop closed by <I>i</I>.<I>j</I>. We denote it by <B>L</B>(<I>i.j</I>). The collection of bases and base pairs not accessible from any base pair is called the exterior (or external) loop, and will be denoted by <B>L</B><SUB>e</SUB> here. It is worth noting that if we imagine adding a 0<SUP><I>th</I></SUP> and an <!-- MATH: $(n+1)^{st}$ -->(<I>n</I>+1)<SUP><I>st</I></SUP> base to the RNA, and a base pair <I>0.(n+1)</I>, then the exterior loop becomes the loop closed by this imaginary base pair. We call this the <I>universal closing base pair</I> of an RNA structure. If <B>S</B> is a secondary structure, then <B>S'</B> denotes the same secondary structure with the addition of the universal closing base pair. The exterior loop exists only in linear RNA. 
It is treated differently than other loops because we assume as a first approximation that there are no conformational constraints, and therefore no associated entropic costs.
<P>Any secondary structure, <B>S</B>, decomposes an RNA uniquely into loops. We can write this as:
<BR><P></P><DIV ALIGN="CENTER"><!-- MATH: \begin{displaymath}{\bf R} = \bigcup_{i.j \in {\bf S'}} {\bf L}(i.j)\end{displaymath} --><IMG WIDTH="112" HEIGHT="47" SRC="img24.gif" ALT="\begin{displaymath}{\bf R} = \bigcup_{i.j \in {\bf S'}} {\bf L}(i.j)\end{displaymath}"></DIV><BR CLEAR="ALL"><P></P>
Loops may contain 0, 1 or more base pairs. The term <I>k</I>-loop denotes a loop containing <I>k</I>-1 base pairs, making a total of <I>k</I> base pairs by including the closing base pair. We introduce the terms <I>l</I><SUB>s</SUB>(<B>L</B>) and <I>l</I><SUB>d</SUB>(<B>L</B>) to denote the number of single-stranded bases and base pairs in a loop, respectively. The size of a 1-loop or 2-loop is defined as <I>l</I><SUB>s</SUB>(<B>L</B>).
<P>A 1-loop is called a hairpin loop. Polymer theory predicts that the free energy increment, <!-- MATH: $\delta \delta G$ --><IMG WIDTH="31" HEIGHT="15" ALIGN="BOTTOM" BORDER="0" SRC="img27.gif" ALT="$\delta \delta G$">, for such a loop is given by
<BR><P></P><DIV ALIGN="CENTER"><!-- MATH: \begin{equation}\delta \delta G = 1.75 \times RT \times \ln ( l_{s} ),\end{equation} --><TABLE WIDTH="100%" ALIGN="CENTER"><TR VALIGN="MIDDLE"><TD ALIGN="CENTER" NOWRAP><A NAME="DDGLOG"> </A><IMG WIDTH="181" HEIGHT="28" SRC="img28.gif" ALT="\begin{displaymath}\delta \delta G = 1.75 \times RT \times \ln ( l_{s} ),\end{displaymath}"></TD><TD WIDTH=10 ALIGN="RIGHT">(1)</TD></TR></TABLE></DIV><BR CLEAR="ALL"><P></P>
where <I>T</I> is absolute temperature and <I>R</I> is the universal gas constant (1.9872 cal/mol/K). The factor 1.75 would be 2 if the chain were not self-avoiding in space. 
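<P>As a rough numerical illustration of Equation 1, the following Python sketch evaluates the polymer-theory increment for a given loop size. The function name <I>loop_increment</I>, its parameters, and the conversion of the result from cal/mol to kcal/mol are assumptions made for this example; they are not taken from <I>mfold</I>.

```python
import math

R_CAL = 1.9872  # universal gas constant in cal/mol/K, as quoted in the text


def loop_increment(ls, temp_c=37.0, coefficient=1.75):
    """Evaluate Equation 1, coefficient * R * T * ln(ls), in kcal/mol.

    ls          -- number of single-stranded bases in the loop (at least 1)
    temp_c      -- temperature in degrees C, converted to kelvin below
    coefficient -- 1.75 for a self-avoiding chain; the text notes that it
                   would be 2 if the chain were not self-avoiding
    """
    if ls < 1:
        raise ValueError("loop size must be at least 1")
    temp_k = temp_c + 273.15
    # Assumption: divide by 1000 to report kcal/mol instead of cal/mol.
    return coefficient * R_CAL * temp_k * math.log(ls) / 1000.0


if __name__ == "__main__":
    for size in (3, 10, 30):
        print(size, round(loop_increment(size), 2))
```

At 37 degrees the prefactor 1.75 &times; <I>RT</I> is about 1.08 kcal/mol, so the increment grows only logarithmically with loop size.
<P>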
In reality, we use tabulated values for <!-- MATH: $\delta \delta G$ --><IMG WIDTH="31" HEIGHT="15" ALIGN="BOTTOM" BORDER="0" SRC="img27.gif" ALT="$\delta \delta G$"> for <I>l</I><SUB><I>s</I></SUB> from 3 to 30. These values are based on measurements and interpolations of measurements, and are stored in a file named <I>loop.dg</I>, or <I>loop.TC</I>, where <I>TC</I> is a temperature (integral) in <IMG WIDTH="11" HEIGHT="15" ALIGN="BOTTOM" BORDER="0" SRC="img29.gif" ALT="$^{\circ}$">C. We use the latter only when departing from our temperature standard of 37 degrees. Thus <I>loop.dg</I> and <I>loop.</I>37 refer to the same file. The same convention holds for other files defined below. Equation <A HREF="node5.html#DDGLOG">1</A> is used to extrapolate beyond size 30. Thus, for <!-- MATH: $l_{s} > 30$ --><I>l</I><SUB><I>s</I></SUB> &gt; 30,
<BR><P></P><DIV ALIGN="CENTER"><!-- MATH: \begin{equation}\delta \delta G= \delta \delta G_{30} + 1.75 \times RT \times \ln ( l_{s}/30 ).\end{equation} --><TABLE WIDTH="100%" ALIGN="CENTER"><TR VALIGN="MIDDLE"><TD ALIGN="CENTER" NOWRAP><A NAME="DDGLOG2"> </A><IMG WIDTH="266" HEIGHT="28" SRC="img30.gif" ALT="\begin{displaymath}\delta \delta G= \delta \delta G_{30} + 1.75 \times RT \times \ln ( l_{s}/30 ).\end{displaymath}"></TD><TD WIDTH=10 ALIGN="RIGHT">(2)</TD></TR></TABLE></DIV><BR CLEAR="ALL"><P></P>
<P>Figure <A HREF="node5.html#LOOP">1</A> shows the information stored in the <I>loop</I> file.
<BR><DIV ALIGN="CENTER"><A NAME="LOOP"> </A><A NAME="811"> </A><TABLE WIDTH="100%"><CAPTION><STRONG>Figure 1:</STRONG> The <I>loop.dg</I> or <I>loop.TC</I> file contains size-based free energy increments for hairpin, bulge and interior loops up to size 30. Entries with `.' are undefined.</CAPTION><TR><TD ALIGN=CENTER><PRE>DESTABILIZING ENERGIES BY SIZE OF LOOP (INTERPOLATE WHERE NEEDED)
hp3 ave calc no tmm;hp4 ave calc with tmm; ave all bulges
SIZE         INTERNAL          BULGE            HAIRPIN
-------------------------------------------------------
1               .               3.8               .
2               .               2.8               .
3               .               3.2              5.6
4              1.7              3.6              5.5
5              1.8              4.0              5.6
6              2.0              4.4              5.3
7              2.2              4.6              5.8
8              2.3              4.7              5.4
...
30             3.7              6.1              7.7
</PRE></TD></TR></TABLE></DIV><BR>
<P><BR><DIV ALIGN="CENTER"><A NAME="TSTKH"> </A><A NAME="812"> </A><TABLE WIDTH="50%"><CAPTION><STRONG>Figure 2:</STRONG> On the left, a typical 4 × 4 table. The pairs WX and YZ are covalently linked. WZ is assumed to be the closing base pair of a hairpin loop, and XY is the mismatched pair. `X' refers to row, and `Y' to column, in order A, C, G and U. Thus `aGU' is the same as `a34' and is the mismatch free energy for a GU mismatch (X=G and Y=U). On the right is a sample table for W=C and Z=G.</CAPTION><TR><TD ALIGN=LEFT><PRE>         5' --&gt; 3'                  5' --&gt; 3'
            WX                         CX
            ZY                         GY
         3' &lt;-- 5'                  3' &lt;-- 5'
   Y:   A    C    G    U          A     C     G     U
      ------------------       -----------------
X:A | aAA  aAC  aAG  aAU       -1.5  -1.5  -1.4  -1.8
  C | aCA  aCC  aCG  aCU       -1.0  -0.9  -2.9  -0.8
  G | aGA  aGC  aGG  aGU       -2.2  -2.0  -1.6  -1.2
  U | aUA  aUC  aUG  aUU       -1.7  -1.4  -1.9  -2.0
</PRE></TD></TR></TABLE></DIV><BR>
<P>In addition, the effects of <I>terminal mismatched pairs</I> are taken into account for hairpin loops of size greater than 3. For loops of size 4 and greater closed by a base pair <I>i</I>.<I>j</I>, an extra <!-- MATH: $\delta \delta G$ --><IMG WIDTH="31" HEIGHT="15" ALIGN="BOTTOM" BORDER="0" SRC="img27.gif" ALT="$\delta \delta G$"> is applied. This is referred to as the <I>terminal mismatch</I> free energy for hairpin loops. These parameters are stored in a file named <I>tstackh.dg</I> or <I>tstackh.TC</I>, as above. The data are arranged in 4 × 4 tables, each comprising 4 rows and 4 columns. Figure <A HREF="node5.html#TSTKH">2</A> illustrates how the parameters are stored.
<P>Both the <I>loop</I> and <I>tstackh</I> files treat hairpin loops in a generic way, and assume no special structure for the bases in the loop. We know that this is not true in general. For example, the anti-codon loop of tRNA is certainly not unstructured. For certain small hairpin loops, special rules apply. 
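<P>The generic treatment described so far (a tabulated size term for sizes 3 to 30, Equation 2 beyond size 30, and a terminal mismatch term for sizes above 3) can be sketched in Python as follows. This is a minimal illustration and not the <I>mfold</I> implementation: the function and variable names are hypothetical, only the hairpin rows visible in Figure 1 and the single sample table of Figure 2 (W=C, Z=G) are included, the values are taken to be in kcal/mol at 37 degrees, and the two terms are assumed to be simply added.

```python
import math

R_KCAL = 1.9872e-3  # gas constant in kcal/mol/K (1.9872 cal/mol/K in the text)
T_37C = 310.15      # 37 degrees C expressed in kelvin

# HAIRPIN column of Figure 1; rows elided in the figure are absent here too.
HAIRPIN_BY_SIZE = {3: 5.6, 4: 5.5, 5: 5.6, 6: 5.3, 7: 5.8, 8: 5.4, 30: 7.7}

# Sample terminal mismatch table of Figure 2 for closing pair W=C, Z=G.
# Rows are X and columns are Y, both in the order A, C, G, U.
BASES = "ACGU"
TSTACKH_CG = [
    [-1.5, -1.5, -1.4, -1.8],
    [-1.0, -0.9, -2.9, -0.8],
    [-2.2, -2.0, -1.6, -1.2],
    [-1.7, -1.4, -1.9, -2.0],
]


def hairpin_size_term(ls, temp_k=T_37C):
    """Size-based increment: table lookup up to 30, Equation 2 beyond."""
    if ls < 3:
        raise ValueError("a hairpin loop must contain at least 3 bases")
    if ls <= 30:
        if ls not in HAIRPIN_BY_SIZE:
            raise KeyError(f"size {ls} is elided in Figure 1")
        return HAIRPIN_BY_SIZE[ls]
    return HAIRPIN_BY_SIZE[30] + 1.75 * R_KCAL * temp_k * math.log(ls / 30)


def terminal_mismatch_cg(x, y):
    """Mismatch term for mismatched pair XY on a C-G closing pair."""
    return TSTACKH_CG[BASES.index(x)][BASES.index(y)]


def generic_hairpin_energy_cg(ls, x, y):
    """Size term plus, for sizes above 3, the terminal mismatch term."""
    energy = hairpin_size_term(ls)
    if ls > 3:
        energy += terminal_mismatch_cg(x, y)
    return energy


if __name__ == "__main__":
    print(round(generic_hairpin_energy_cg(4, "G", "A"), 2))
    print(round(hairpin_size_term(60), 2))
```

A full implementation would read all sizes from the <I>loop</I> file and one table per closing base pair from the <I>tstackh</I> file instead of hard-coding the excerpts shown in the figures.
<P>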
Hairpin loops of size 3 are called triloops and those of size 4 are called tetraloops. Files of <I>distinguished</I> triloops and tetraloops have been created to store the free energy bonus assigned to those loops. These parameters are stored in files <I>triloop.dg</I> and <I>tloop.dg</I>, respectively (or <I>triloop.TC</I> and <I>tloop.TC</I> for a specific temperature, <I>TC</I>). Some typical entries are given in Figure <A HREF="node5.html#TLOOP">3</A>.
<P><BR><DIV ALIGN="CENTER"><A NAME="TLOOP"> </A><A NAME="813"> </A><TABLE WIDTH="50%"><CAPTION><STRONG>Figure 3:</STRONG> Sample <I>distinguished</I> tetraloops together with the free energy bonuses, in kcal/mole, attached to them. These entries include the closing base pair of the loop. Triloops are not shown since they are not currently in use for RNA folding.</CAPTION><TR><TD><PRE> Seq     Energy
 -------------
 GGGGAC   -3.0
 ...
 CGAAGG   -2.5
 CUACGG   -2.5
 ...
 GUGAAC   -1.5
 UGGAAA   -1.5
</PRE></TD></TR></TABLE></DIV><BR>
<P>Finally, there are some special hairpin loop rules derived from experiments that will be defined explicitly here. A hairpin loop closed by <I>r</I><SUB><I>i</I></SUB> and <I>r</I><SUB><I>j</I></SUB> (<I>i</I> &lt; <I>j</I>) is called a ``GGG'' loop if <!-- MATH: $r_{i-2} = r_{i-1} = r_{i} = G$ --><I>r</I><SUB><I>i</I>-2</SUB> = <I>r</I><SUB><I>i</I>-1</SUB> = <I>r</I><SUB><I>i</I></SUB> = <I>G</I> and <I>r</I><SUB><I>j</I></SUB> = <I>U</I>. Such a loop receives a free energy bonus that is stored in the <I>miscloop.dg</I> or <I>miscloop.TC</I> file, which contains a variety of miscellaneous, or extra, free energy parameters. Another special case is the ``poly-C'' hairpin loop, where all the single-stranded bases are C. If the loop has size 3, it is given a free energy penalty of <I>c</I><SUB>3</SUB>. 
Otherwise, the penalty is <I>c</I><SUB>2</SUB> + <I>c</I><SUB>1</SUB> × <I>l</I><SUB>s</SUB>. The constants <!-- MATH: $c_{1}, c_{2}$ --><I>c</I><SUB>1</SUB>, <I>c</I><SUB>2</SUB> and <I>c</I><SUB>3</SUB> are all stored in the <I>miscloop</I> file.
<P>To summarize, we can write the free energy, <!-- MATH: $\delta \delta G_{H}$ --><IMG WIDTH="43" HEIGHT="29" ALIGN="MIDDLE" BORDER="0" SRC="img35.gif" ALT="$\delta \delta G_{H}$">, of a hairpin loop as: