<!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026"/>
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1"/>
</o:shapelayout></xml><![endif]-->
</head>
<body lang=ZH-CN link="#efce8f" vlink=purple style='tab-interval:21.0pt;
text-justify-trim:punctuation'>
<div class=Section1 style='layout-grid:15.6pt'>
<h1 align=center style='text-align:center'><span style='font-family:隶书;
color:#EFCE8F'>Chapter 3</span><span lang=EN-US style='font-size:36.0pt;mso-ascii-font-family:
隶书;mso-fareast-font-family:隶书;color:#EFCE8F'> </span><span
style='font-family:隶书;color:#EFCE8F'>Sequence Comparison</span><span lang=EN-US><o:p></o:p></span></h1>
<!--mstheme-->
<p align=center style='text-align:center'><span lang=EN-US style='font-size:
36.0pt;font-family:隶书;color:#EFCE8F'><!--[if gte vml 1]><v:shapetype id="_x0000_t75"
coordsize="21600,21600" o:spt="75" o:preferrelative="t" path="m@4@5l@4@11@9@11@9@5xe"
filled="f" stroked="f">
<v:stroke joinstyle="miter"/>
<v:formulas>
<v:f eqn="if lineDrawn pixelLineWidth 0"/>
<v:f eqn="sum @0 1 0"/>
<v:f eqn="sum 0 0 @1"/>
<v:f eqn="prod @2 1 2"/>
<v:f eqn="prod @3 21600 pixelWidth"/>
<v:f eqn="prod @3 21600 pixelHeight"/>
<v:f eqn="sum @0 0 1"/>
<v:f eqn="prod @6 1 2"/>
<v:f eqn="prod @7 21600 pixelWidth"/>
<v:f eqn="sum @8 21600 0"/>
<v:f eqn="prod @7 21600 pixelHeight"/>
<v:f eqn="sum @10 21600 0"/>
</v:formulas>
<v:path o:extrusionok="f" gradientshapeok="t" o:connecttype="rect"/>
<o:lock v:ext="edit" aspectratio="t"/>
</v:shapetype><v:shape id="_x0000_i1025" type="#_x0000_t75" alt="" style='width:495.75pt;
height:18.75pt'>
<v:imagedata src="./第三章.files/image001.jpg" o:href="http://www.lmbe.seu.edu.cn/chenyuan/xsun/bioinfomatics/web/images/mytemp14.jpg"/>
</v:shape><![endif]--><![if !vml]><img width=661 height=25
src="./第三章.files/image002.jpg" border=0 v:shapes="_x0000_i1025"><![endif]></span><span
lang=EN-US><o:p></o:p></span></p>
<p><span lang=EN-US><SELECT NAME="str_sel"
onchange="javascript:window.location=(this.options[this.selectedIndex].value);">
<OPTION SELECTED>========= Select a section ==========
<OPTION VALUE="3.1.htm">3.1 Sequence Similarity
<OPTION VALUE="3.2.htm">3.2 Pairwise Alignment Algorithms
<OPTION VALUE="3.3.htm">3.3 Multiple Sequence Alignment
<OPTION VALUE="3.4.htm">3.4 DNA Fragment Assembly
<OPTION VALUE="3.question.htm">Questions and Exercises
<OPTION VALUE="3.referance.htm">References
</SELECT><o:p></o:p></span></p>
<p class=MsoPlainText style='text-indent:21.25pt;line-height:150%'><span
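style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>Sequence comparison is the most fundamental and most important operation in bioinformatics. Aligning sequences can reveal functional, structural, and evolutionary information contained in biological sequences. The basic task of sequence comparison is to compare biomolecular sequences, detect their similarity, locate the regions they share, and identify the differences between them. In molecular biology, <span
lang=EN-US>similarity between DNA or protein molecules has several aspects. It may be similarity of the nucleotide or amino-acid sequence, similarity of structure, or similarity of function. A general rule is that sequence determines structure and structure determines function. One goal of studying sequence similarity is therefore to infer similar structure or similar function from similar sequences. This approach works in most cases, but there are exceptions: two sequences may share almost no similarity and yet fold into the same three-dimensional shape and carry out the same function. Structural and functional similarity are set aside here, and only sequence similarity is studied. Another goal of studying sequence similarity is to use it to judge whether sequences are homologous and to infer the evolutionary relationships between them. Here a sequence is regarded as a string of basic characters; nucleic-acid and protein sequences alike are special kinds of strings. This chapter concentrates on general-purpose methods of sequence comparison.</span></span><span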
lang=EN-US><o:p></o:p></span></p>
<h2 align=center style='text-align:center'><!--mstheme--><span lang=EN-US
style='font-family:隶书;color:#EFCE8F'>3.1 Sequence Similarity</span><span lang=EN-US><o:p></o:p></span></h2>
<!--mstheme-->
<p class=MsoPlainText style='text-indent:21.25pt;line-height:150%'><span
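style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>Sequence similarity can be expressed as a quantitative value or as a qualitative description. The degree of similarity is a number that reflects how alike two sequences are. Many terms describe the relationship between two sequences, such as identical, similar, homologous, analogous, orthologous, and paralogous.
Two concepts used constantly in sequence comparison are <span lang=EN-US>"homology" and "similarity". They are distinct, but they are easily confused. Two sequences are homologous if they share a common ancestor. In this sense homology has no degrees: two sequences either are homologous or are not. Similarity, by contrast, comes in degrees; for example, two sequences may be 30% or 60% similar. In general, two highly similar sequences are usually homologous. There are exceptions, however. Two sequences can be highly similar without being homologous, because their similarity may have arisen by chance. In evolution this is called "convergence", and such a pair of sequences can be called analogous sequences. Orthologous sequences are homologous sequences from different species. Paralogous sequences, in contrast, come from the same species and arose through sequence duplication during evolution.</span></span><span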
lang=EN-US><o:p></o:p></span></p>
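<p class=MsoPlainText style='text-indent:21.25pt;line-height:150%'><span lang=EN-US style='font-size:10.0pt'>The degree of similarity mentioned above (for example 30% or 60%) can be computed in several ways. Below is a minimal sketch of one simple measure, percent identity. It assumes two sequences of equal length compared position by position, with no gaps. The function name percent_identity is hypothetical and is not taken from this text.</span></p>
<pre>
```python
def percent_identity(s: str, t: str) -> float:
    """Percentage of positions at which two equal-length sequences carry the same character."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    if not s:
        raise ValueError("sequences must be non-empty")
    # Count the positions where both sequences have the identical character.
    matches = sum(1 for a, b in zip(s, t) if a == b)
    return 100.0 * matches / len(s)


if __name__ == "__main__":
    # 6 of the 10 positions agree, so the identity is 60%.
    print(percent_identity("ACGTACGTAC", "ACGTTTTTAA"))
```
</pre>
<p class=MsoPlainText style='text-indent:21.25pt;line-height:150%'><span lang=EN-US style='font-size:10.0pt'>Running the script prints 60.0, because 6 of the 10 positions agree. Measures used in practice usually work on an alignment and may also give credit to similar but non-identical residues, so this is only the simplest case.</span></p>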
<p class=MsoPlainText align=center style='text-align:center;text-indent:21.25pt;
line-height:150%'><span lang=EN-US><!--[if gte vml 1]><v:shape id="_x0000_i1026"
type="#_x0000_t75" alt="" style='width:337.5pt;height:284.25pt'>
<v:imagedata src="./第三章.files/image003.png" o:href="http://www.lmbe.seu.edu.cn/chenyuan/xsun/bioinfomatics/web/images/76.bmp"/>
</v:shape><![endif]--><![if !vml]><img width=450 height=379
src="./第三章.files/image004.jpg" border=0 v:shapes="_x0000_i1026"><![endif]><o:p></o:p></span></p>
<p class=MsoPlainText style='text-indent:21.25pt;line-height:150%'><span
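style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>The basic operation of sequence comparison is to <span
lang=EN-US>align sequences. An alignment of two sequences is a one-to-one correspondence between the characters of the two sequences; that is, an arrangement that places their characters side by side for comparison. An alignment is a qualitative description of sequence similarity. It shows at which positions the two sequences are similar and at which positions they differ. An optimal alignment reveals the greatest degree of similarity between two sequences and points out the essential differences between them.</span></span><span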
lang=EN-US><o:p></o:p></span></p>
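<p class=MsoPlainText style='text-indent:21.25pt;line-height:150%'><span lang=EN-US style='font-size:10.0pt'>The following sketch makes the idea of an alignment concrete. It only displays an alignment that is already given and does not compute one. It assumes that '-' marks a gap and uses an illustrative match line: '|' for identical characters, '.' for a mismatch, and a blank for a gap. The function show_alignment and this marking convention are hypothetical.</span></p>
<pre>
```python
GAP = "-"  # assumed gap symbol for this sketch


def show_alignment(top: str, bottom: str) -> str:
    """Render a given pairwise alignment as three lines: top row, match line, bottom row.

    In the match line, '|' marks identical characters, '.' marks a mismatch,
    and a blank marks a column in which one row has a gap.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    marks = []
    for a, b in zip(top, bottom):
        if a == GAP and b == GAP:
            raise ValueError("a column may not consist of two gaps")
        if a == GAP or b == GAP:
            marks.append(" ")
        elif a == b:
            marks.append("|")
        else:
            marks.append(".")
    return "\n".join([top, "".join(marks), bottom])


if __name__ == "__main__":
    print(show_alignment("GA-CGGATTAG", "GATCGGAATAG"))
```
</pre>
<p class=MsoPlainText style='text-indent:21.25pt;line-height:150%'><span lang=EN-US style='font-size:10.0pt'>For the example rows GA-CGGATTAG and GATCGGAATAG, the match line is "|| ||||.|||". It shows one gap column and one mismatch column; all other columns are identical.</span></p>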
<p><b><span lang=EN-US style='color:#EFCE8F'>3.1.1 Alphabets and Sequences</span></b><span
lang=EN-US><o:p></o:p></span></p>
<p class=MsoPlainText style='text-indent:21.25pt;line-height:150%'><span
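style='font-size:10.0pt'>In biomolecular information processing, biological sequences are abstracted as strings whose characters are drawn from a specific alphabet. An alphabet is a set of symbols or characters, and sequences are made up of elements of an alphabet. Some important alphabets are:</span><span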
lang=EN-US><o:p></o:p></span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:150%'><span lang=EN-US style='font-size:10.0pt'>(1) the four-character DNA alphabet {A, C, G, T};</span><span lang=EN-US><o:p></o:p></span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:150%'><span lang=EN-US style='font-size:10.0pt'>(2) the extended genetic alphabet, or IUPAC codes (<a
href="http://www.lmbe.seu.edu.cn/chenyuan/xsun/bioinfomatics/web/第二章/2.3.htm#table2.3"
target="_blank">see Table 2.3</a>);</span><span lang=EN-US><o:p></o:p></span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:150%'><span lang=EN-US style='font-size:10.0pt'>(3) the one-letter amino-acid codes (<span style='color:blue'><a
href="http://www.lmbe.seu.edu.cn/chenyuan/xsun/bioinfomatics/web/第二章/2.2.htm#table2.1"
target="_blank">see Table 2.1</a></span>);</span><span lang=EN-US><o:p></o:p></span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:150%'><span lang=EN-US style='font-size:10.0pt'>(4) subsets formed from the alphabets above.</span><span lang=EN-US><o:p></o:p></span></p>
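<p class=MsoPlainText style='text-indent:21.25pt;line-height:150%'><span lang=EN-US style='font-size:10.0pt'>The alphabets listed above can be modeled as character sets. Checking whether a sequence is a string over a given alphabet then amounts to checking each of its characters. The sketch below makes several assumptions. It uses upper-case letters only. It uses the 16 standard IUPAC nucleotide codes and the 20 standard amino-acid letters; Tables 2.3 and 2.1 may list further symbols. The names is_over, DNA, IUPAC_NUCLEOTIDE and PROTEIN are hypothetical.</span></p>
<pre>
```python
# Alphabets modeled as sets of upper-case characters (an assumption of this sketch).
DNA = frozenset("ACGT")
# The 16 standard IUPAC nucleotide codes; Table 2.3 may list further symbols.
IUPAC_NUCLEOTIDE = frozenset("ACGTURYSWKMBDHVN")
# The 20 standard one-letter amino-acid codes; Table 2.1 may list further symbols.
PROTEIN = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_over(sequence: str, alphabet: frozenset) -> bool:
    """Return True if every character of `sequence` belongs to `alphabet`.

    The empty string is accepted, since it is a string over any alphabet.
    """
    return all(ch in alphabet for ch in sequence)


if __name__ == "__main__":
    print(is_over("GATTACA", DNA))               # True
    print(is_over("GATNACA", DNA))               # False: N is not in {A, C, G, T}
    print(is_over("GATNACA", IUPAC_NUCLEOTIDE))  # True: N is an IUPAC nucleotide code
    print(DNA.issubset(IUPAC_NUCLEOTIDE))        # True: item (4), a subset
```
</pre>
<p class=MsoPlainText style='text-indent:21.25pt;line-height:150%'><span lang=EN-US style='font-size:10.0pt'>A, C, G and T are also one-letter amino-acid codes. A string such as GATTACA is therefore valid over both the DNA and the protein alphabets, so membership alone cannot tell the two kinds of sequence apart. The check DNA.issubset(IUPAC_NUCLEOTIDE) illustrates item (4): the DNA alphabet is a subset of the IUPAC alphabet.</span></p>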
<p class=MsoPlainText style='text-indent:21.25pt;line-height:150%'><span
style='font-size:10.0pt'>The discussion below does not depend on any particular alphabet. First, we fix some notation:</span><span
lang=EN-US><o:p></o:p></span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt;line-height:150%'><span style='font-size:10.0pt;font-family:
宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"'>①</span><span
style='font-size:10.0pt'> </span><span lang=EN-US style='font-size:10.0pt;
font-family:Symbol;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman";mso-char-type:symbol;mso-symbol-font-family:Symbol'><span
style='mso-char-type:symbol;mso-symbol-font-family:Symbol'>A</span></span><span
lang=EN-US style='font-size:10.0pt'> </span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'>—</span><span style='font-size:10.0pt'> </span><span
style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>the alphabet;</span><span lang=EN-US><o:p></o:p></span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt;line-height:150%'><span style='font-size:10.0pt;font-family:
宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"'>②</span><span
style='font-size:10.0pt'> <span lang=EN-US>A* </span></span><span
style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>—</span><span style='font-size:10.0pt'>
</span><span style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>the set of all finite-length sequences, or strings, formed from characters of the alphabet </span><span lang=EN-US
style='font-size:10.0pt'>A</span><span style='font-size:10.0pt;font-family:
宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"'> (including the empty string);</span><span
lang=EN-US><o:p></o:p></span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt;line-height:150%'><span style='font-size:10.0pt;font-family:
宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"'>③</span><span
lang=EN-US style='font-size:10.0pt'> a</span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'>, </span><span lang=EN-US style='font-size:10.0pt'>b</span><span
style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>, </span><span lang=EN-US
style='font-size:10.0pt'>c </span><span style='font-size:10.0pt;font-family: