
📄 第三章.htm

📁 Descriptions of some classic algorithms
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>两条序列中有很多匹配的字符对,因而在点矩阵中会形成很多点标记。当对比较长的序列进行比较时,这样的点阵图很快会变得非常复杂和模糊。使用滑动窗口代替一次一个位点的比较是解决这个问题的有效方法。假设窗口大小为</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>10</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>,相似度阈值为</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>8</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>。首先,将</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>X</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>轴序列的第</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>1-10</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>个字符与</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>Y</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>轴序列的第</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>1-10</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>个字符进行比较。如果在第一次比较中,这</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>10</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>个字符中有</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>8</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>个或者</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>8</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>个以上相同,那么就在点阵空间(</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>1,1</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>)的位置画上点标记。然后窗口沿</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>X</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>轴向右移动一个字符的位置,比较</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>X</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>轴序列的第</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>2-11</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>个字符与</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>Y</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>轴序列的第</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>1-10</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>个字符。不断重复这个过程,直到</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>X</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>轴上所有长度为</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>10</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>的子串都与</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>Y</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>轴第</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>1-10</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>个字符组成的子串比较过为止。然后,将</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>Y</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>轴的窗口向上移动一个字符的位置,重复以上过程,直到两条序列中所有长度为</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>10</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>的子串都被两两比较过为止。基于滑动窗口的点矩阵方法可以明显地降低点阵图的噪声,并且可以明确地指出两条序列间具有显著相似性的区域。</span><span
lang=EN-US><o:p></o:p></span></p>

<p class=MsoNormal align=center style='mso-margin-top-alt:auto;mso-margin-bottom-alt:
auto;text-align:center;text-indent:21.25pt;line-height:150%'><span lang=EN-US><!--[if gte vml 1]><v:shape
 id="_x0000_i1039" type="#_x0000_t75" alt="" style='width:5in;height:237.75pt'>
 <v:imagedata src="./第三章.files/image029.png" o:href="http://www.lmbe.seu.edu.cn/chenyuan/xsun/bioinfomatics/web/images/89.bmp"/>
</v:shape><![endif]--><![if !vml]><img border=0 width=480 height=317
src="./第三章.files/image030.jpg" v:shapes="_x0000_i1039"><![endif]><O:P></O:P><o:p></o:p></span></p>
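<p class=MsoPlainText style='line-height:150%'><span lang=EN-US
style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>&nbsp;&nbsp;&nbsp; The sliding-window procedure can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name dot_plot, its parameter names, and the choice to return the dots as a set of 1-based (X, Y) positions are assumptions made for this example. A dot is recorded whenever at least threshold of the window characters agree position by position.</span></p>

```python
def dot_plot(x, y, window=10, threshold=8):
    """Return the 1-based (i, j) positions that receive a dot.

    A dot at (i, j) means that the window of `window` characters starting
    at position i of the X-axis sequence and the window starting at
    position j of the Y-axis sequence agree in at least `threshold`
    positions.
    """
    dots = set()
    # Outer loop: the Y-axis window moves up one character at a time.
    for j in range(len(y) - window + 1):
        # Inner loop: the X-axis window slides right one character at a time.
        for i in range(len(x) - window + 1):
            matches = sum(1 for k in range(window) if x[i + k] == y[j + k])
            if matches >= threshold:
                dots.add((i + 1, j + 1))
    return dots


# Exactly 8 of the 10 window positions agree, so (1, 1) gets a dot.
print(dot_plot("AAAAAAAAAA", "AAAAAAAACC"))  # {(1, 1)}
# Only 7 positions agree, which is below the threshold of 8.
print(dot_plot("AAAAAAAAAA", "AAAAAAACCC"))  # set()
```

<p class=MsoPlainText style='line-height:150%'><span lang=EN-US
style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>&nbsp;&nbsp;&nbsp; This direct version recounts every window from scratch, which keeps it close to the description above; it is not tuned for long sequences.</span></p>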

<p class=MsoPlainText style='line-height:150%'><b><span lang=EN-US
style='color:#EFCE8F'>3.1.4 Pairwise Sequence Alignment </span></b><span lang=EN-US><o:p></o:p></span></p>

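<p class=MsoPlainText style='line-height:150%'><span lang=EN-US
style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>&nbsp;&nbsp;&nbsp; Pairwise sequence alignment applies edit operations to two sequences (matching or replacing characters, or inserting and deleting characters) so that the two sequences reach the same length and identical characters are placed in one-to-one correspondence as far as possible. Let the two sequences be s and t; gap symbols are inserted into s or t until both have the same length. </span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;color:blue'>Figure <span
lang=EN-US>3.6</span></span><span style='font-size:10.0pt;mso-bidi-font-size:
10.5pt'> shows two alignments of the sequences <span lang=EN-US>AGCACACA and ACACACTA, together with the corresponding character edit operations.</span></span><span
lang=EN-US><o:p></o:p></span></p>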

<p class=MsoPlainText align=center style='text-align:center;line-height:150%'><span
lang=EN-US><!--[if gte vml 1]><v:shape id="_x0000_i1040" type="#_x0000_t75"
 alt="" style='width:294.75pt;height:237.75pt'>
 <v:imagedata src="./第三章.files/image031.png" o:href="http://www.lmbe.seu.edu.cn/chenyuan/xsun/bioinfomatics/web/images/90.bmp"/>
</v:shape><![endif]--><![if !vml]><img border=0 width=393 height=317
src="./第三章.files/image032.jpg" v:shapes="_x0000_i1040"><![endif]><o:p></o:p></span></p>
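<p class=MsoPlainText style='line-height:150%'><span lang=EN-US
style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>&nbsp;&nbsp;&nbsp; To make the link between an alignment and its edit operations concrete, the sketch below reads an alignment column by column and names the operation each column stands for, taking the view that s is being transformed into t. The function name edit_operations, the operation labels, and the use of "-" as the gap symbol are assumptions made for this example. The alignment used is one possible alignment of AGCACACA and ACACACTA and is not necessarily one of those drawn in the figure.</span></p>

```python
def edit_operations(s_aligned, t_aligned, gap="-"):
    """List the edit operation implied by each column of an alignment.

    Both arguments are the gapped sequences, so they must have equal length.
    Each result entry is (operation, character_in_s, character_in_t).
    """
    if len(s_aligned) != len(t_aligned):
        raise ValueError("aligned sequences must have the same length")
    ops = []
    for a, b in zip(s_aligned, t_aligned):
        if a == gap and b == gap:
            raise ValueError("a column cannot pair two gap symbols")
        if a == gap:
            ops.append(("insert", a, b))    # b is inserted into s
        elif b == gap:
            ops.append(("delete", a, b))    # a is deleted from s
        elif a == b:
            ops.append(("match", a, b))
        else:
            ops.append(("replace", a, b))   # a is replaced by b
    return ops


for op in edit_operations("AGCACAC-A", "A-CACACTA"):
    print(op)
# The columns read: match A, delete G, match C, A, C, A, C, insert T, match A.
```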

<p class=MsoPlainText style='line-height:150%'><span lang=EN-US
style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>&nbsp;&nbsp; Next, a function w is defined over the different types of edit operation; it represents a "cost" or "weight". For any characters a and b in the alphabet A, define:</span><span
lang=EN-US><o:p></o:p></span></p>

<p class=MsoPlainText align=center style='text-align:center;line-height:150%'><span
lang=EN-US><!--[if gte vml 1]><v:shape id="_x0000_i1041" type="#_x0000_t75"
 alt="" style='width:164.25pt;height:54.75pt'>
 <v:imagedata src="./第三章.files/image033.png" o:href="http://www.lmbe.seu.edu.cn/chenyuan/xsun/bioinfomatics/web/images/92.bmp"/>
</v:shape><![endif]--><![if !vml]><img border=0 width=219 height=73
src="./第三章.files/image034.jpg" v:shapes="_x0000_i1041"><![endif]><o:p></o:p></span></p>

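<p class=MsoPlainText style='line-height:150%'><span lang=EN-US
style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>&nbsp;&nbsp;&nbsp; This is a simple cost definition; practical applications call for more elaborate cost models. On the one hand, the cost assigned to each edit operation can be changed: for example, when protein sequences are compared, replacing an amino acid with one of similar physicochemical properties should cost less than replacing it with a completely different one. On the other hand, a score function can be used instead to evaluate the edit operations. A basic score function is given below:</span><span
lang=EN-US><o:p></o:p></span></p>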

<p class=MsoPlainText align=center style='text-align:center;line-height:150%'><span
lang=EN-US><!--[if gte vml 1]><v:shape id="_x0000_i1042" type="#_x0000_t75"
 alt="" style='width:168.75pt;height:60pt'>
 <v:imagedata src="./第三章.files/image035.png" o:href="http://www.lmbe.seu.edu.cn/chenyuan/xsun/bioinfomatics/web/images/91.bmp"/>
</v:shape><![endif]--><![if !vml]><img border=0 width=225 height=80
src="./第三章.files/image036.jpg" v:shapes="_x0000_i1042"><![endif]><o:p></o:p></span></p>

<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt'><span style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>When aligning sequences, either the cost function or the score function may be chosen as the situation requires, that is, <span
style='color:blue'>Eq. (</span></span><span lang=EN-US style='font-size:10.0pt;
color:blue'>3-1</span><span style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman";color:blue'>) or Eq. (</span><span
lang=EN-US style='font-size:10.0pt;color:blue'>3-2</span><span
style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman";color:blue'>)</span><span
style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>.</span><span lang=EN-US><o:p></o:p></span></p>
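<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt'><span lang=EN-US style='font-size:10.0pt'>The effect of choosing a cost function or a score function can be tried out with the sketch below. Equations (3-1) and (3-2) appear only as images on this page, so the numeric values used here are assumptions rather than the text's own definitions: a unit cost w (0 for a match, 1 for a replacement or a gap) and a basic score p (+1 for a match, 0 for a replacement, -1 for a gap). The helper alignment_total and the gap symbol "-" are likewise hypothetical names for this example.</span></p>

```python
GAP = "-"  # assumed gap symbol


def w(a, b):
    """Assumed unit cost: 0 for a match, 1 for a replacement or a gap."""
    return 0 if a == b else 1


def p(a, b):
    """Assumed basic score: +1 match, 0 replacement, -1 for a gap."""
    if a == GAP or b == GAP:
        return -1
    return 1 if a == b else 0


def alignment_total(s_aligned, t_aligned, f):
    """Sum f over the columns of an alignment (f is w or p)."""
    if len(s_aligned) != len(t_aligned):
        raise ValueError("aligned sequences must have the same length")
    return sum(f(a, b) for a, b in zip(s_aligned, t_aligned))


# Two candidate alignments of AGCACACA and ACACACTA.
gapped = ("AGCACAC-A", "A-CACACTA")
ungapped = ("AGCACACA", "ACACACTA")

print(alignment_total(*gapped, w), alignment_total(*gapped, p))      # 2 5
print(alignment_total(*ungapped, w), alignment_total(*ungapped, p))  # 6 2
```

<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt'><span lang=EN-US style='font-size:10.0pt'>Under these assumed values the gapped alignment has both the lower cost and the higher score, so the two functions rank these two alignments the same way. With other values the two rankings need not agree.</span></p>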

<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt'><span style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>Concepts commonly used in sequence alignment are given below:</span><span
lang=EN-US><o:p></o:p></span></p>

<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
margin-left:39.1pt;text-indent:-1.0cm;tab-stops:list 37.1pt'><span
style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>(1) The score (or cost) of an alignment of two sequences s and t equals the total score (or cost) of all the edit operations used to transform s into t;</span><span lang=EN-US><o:p></o:p></span></p>

<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
margin-left:36.95pt;text-indent:-26.25pt;tab-stops:list 37.1pt'><span
style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>(2) The optimal alignment of s and t is the alignment with the highest score (or the lowest cost) among all possible alignments;</span><span
lang=EN-US><o:p></o:p></span></p>

<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
margin-left:36.95pt;text-indent:-26.25pt;tab-stops:list 37.1pt 47.5pt'><span
style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>(3) The true distance between s and t should be the distance obtained when the score function p (or the cost function w) takes its optimal value.</span><span
