style='font-size:10.5pt;mso-bidi-font-size:10.0pt;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman";mso-font-kerning:1.0pt'>, with the distance equal to </span><span
lang=EN-US style='font-size:10.5pt;mso-bidi-font-size:10.0pt;font-family:"Times New Roman";
mso-font-kerning:1.0pt'>0</span><span style='font-size:10.5pt;mso-bidi-font-size:
10.0pt;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman";
mso-font-kerning:1.0pt'> when the two sequences are identical and approaching </span><span lang=EN-US
style='font-size:10.5pt;mso-bidi-font-size:10.0pt;font-family:"Times New Roman";
mso-font-kerning:1.0pt'>1</span><span style='font-size:10.5pt;mso-bidi-font-size:
10.0pt;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman";
mso-font-kerning:1.0pt'> when they differ greatly. Setting </span><span lang=EN-US style='font-size:10.5pt;
mso-bidi-font-size:10.0pt;font-family:"Times New Roman";mso-font-kerning:1.0pt'>S<sub>r</sub>(i,j)=0</span><span
style='font-size:10.5pt;mso-bidi-font-size:10.0pt;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman";mso-font-kerning:1.0pt'> in the formula above reduces it to:</span><span
lang=EN-US><o:p></o:p></span></p>
<p align=center style='text-align:center;line-height:150%'><span lang=EN-US><!--[if gte vml 1]><v:shape
id="_x0000_i1032" type="#_x0000_t75" alt="" style='width:367.5pt;height:36pt'>
<v:imagedata src="./第六章.files/image011.png" o:href="http://www.lmbe.seu.edu.cn/chenyuan/xsun/bioinfomatics/web/images/6-2.bmp"/>
</v:shape><![endif]--><![if !vml]><img width=490 height=48
src="./第六章.files/image012.jpg" border=0 v:shapes="_x0000_i1032"><![endif]><o:p></o:p></span></p>
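<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt;line-height:150%'><span lang=EN-US style='font-size:10.0pt'>The general formula appears only as an image. The following is a minimal sketch that assumes a normalized-score form: d(i,j) = 1 − (S(i,j) − S<sub>r</sub>(i,j)) / (S<sub>iden</sub>(i,j) − S<sub>r</sub>(i,j)). Here S(i,j) is the alignment score of sequences i and j, and S<sub>r</sub>(i,j) is the expected score of two random sequences. S<sub>iden</sub>(i,j) is the score expected for an identical pair, which this sketch assumes is the mean of the two self-alignment scores. The assumed form behaves as described above. It is 0 for identical sequences and approaches 1 as S(i,j) falls toward S<sub>r</sub>(i,j). With S<sub>r</sub>(i,j)=0 it reduces to 1 − S(i,j)/S<sub>iden</sub>(i,j). The function name normalized_distance and its parameters are hypothetical.</span></p>

```python
def normalized_distance(s_ij, s_ii, s_jj, s_rand=0.0):
    """Assumed normalized-score distance (illustrative, not the book's exact formula).

    s_ij:   alignment score of sequences i and j, S(i,j)
    s_ii:   self-alignment score of sequence i
    s_jj:   self-alignment score of sequence j
    s_rand: expected score of random sequences, S_r(i,j); 0 gives the simplified form
    """
    # Assumption: S_iden is the mean of the two self-alignment scores
    s_iden = (s_ii + s_jj) / 2
    if s_iden == s_rand:
        raise ValueError("S_iden must differ from S_r")
    return 1 - (s_ij - s_rand) / (s_iden - s_rand)


# Identical sequences: the score equals S_iden, so the distance is 0
print(normalized_distance(10, 10, 10))           # 0.0
# With S_r = 0 the formula reduces to 1 - S/S_iden
print(normalized_distance(4, 10, 10))            # 0.6
# A score at the random-sequence level gives distance 1
print(normalized_distance(3, 10, 10, s_rand=3))  # 1.0
```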
<p style='line-height:150%'><span style='font-size:10.5pt;mso-bidi-font-size:
10.0pt;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman";
mso-font-kerning:1.0pt'>To handle sequences with low similarity better, the distance formula can be modified further:</span><span lang=EN-US><o:p></o:p></span></p>
<p align=center style='text-align:center;line-height:150%'><span lang=EN-US><!--[if gte vml 1]><v:shape
id="_x0000_i1033" type="#_x0000_t75" alt="" style='width:369pt;height:33.75pt'>
<v:imagedata src="./第六章.files/image013.png" o:href="http://www.lmbe.seu.edu.cn/chenyuan/xsun/bioinfomatics/web/images/6-3.bmp"/>
</v:shape><![endif]--><![if !vml]><img width=492 height=45
src="./第六章.files/image014.jpg" border=0 v:shapes="_x0000_i1033"><![endif]><o:p></o:p></span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
line-height:150%'><span style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>The weighted sum of alignment scores can be computed with a standard scoring matrix. For protein sequences, use </span><span
lang=EN-US style='font-size:10.0pt'>PAM</span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'> matrices, </span><span lang=EN-US style='font-size:10.0pt'>BLOSUM</span><span
style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'> matrices, or similar. For </span><span lang=EN-US
style='font-size:10.0pt'>DNA</span><span style='font-size:10.0pt;font-family:
宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"'> or </span><span
lang=EN-US style='font-size:10.0pt'>RNA</span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'> sequences, use the identity matrix, the nucleotide </span><span style='font-size:10.0pt;mso-bidi-font-size:
10.5pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'>transition</span><span lang=EN-US style='font-size:10.0pt;
mso-bidi-font-size:10.5pt'>-</span><span style='font-size:10.0pt;mso-bidi-font-size:
10.5pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'>transversion matrix</span><span style='font-size:10.0pt;font-family:宋体;
mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"'>, or another matrix with asymmetric substitution frequencies.</span><span
lang=EN-US><o:p></o:p></span></p>
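<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt;line-height:150%'><span lang=EN-US style='font-size:10.0pt'>The following is a minimal sketch of the weighted sum for DNA, computed with a transition-transversion matrix. The score values (match +1, transition −1, transversion −2) are hypothetical placeholders that only show the shape of such a matrix. Real analyses take their values from a published matrix. The sketch scores only gap-free alignments of equal length. It treats U as T so that RNA is scored the same way. All names in it are illustrative.</span></p>

```python
# Hypothetical scores, chosen only to illustrate a transition-transversion matrix
MATCH, TRANSITION, TRANSVERSION = 1, -1, -2
PURINES, PYRIMIDINES = {"A", "G"}, {"C", "T"}
BASES = "ACGT"


def ts_tv_score(x, y):
    """Score one aligned pair of nucleotides."""
    if x == y:
        return MATCH
    pair = {x, y}
    # A/G and C/T changes are transitions; purine/pyrimidine changes are transversions
    if pair.issubset(PURINES) or pair.issubset(PYRIMIDINES):
        return TRANSITION
    return TRANSVERSION


# Full 4x4 scoring matrix stored as a dictionary keyed by base pairs
MATRIX = {(x, y): ts_tv_score(x, y) for x in BASES for y in BASES}


def alignment_score(seq1, seq2, matrix=MATRIX):
    """Weighted sum of matrix scores over a gap-free alignment of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Map U to T so RNA sequences can use the same matrix
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    return sum(matrix[(a, b)] for a, b in zip(s1, s2))


print(alignment_score("ACGT", "ACGT"))  # 4
print(alignment_score("ACGT", "GCGA"))  # -1
```

<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt;line-height:150%'><span lang=EN-US style='font-size:10.0pt'>If TRANSITION and TRANSVERSION are both set to 0, the same table becomes the identity matrix: 1 for a match and 0 otherwise.</span></p>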
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt;line-height:150%'><span style='font-size:10.0pt;font-family:
宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"'>Distances are one kind of data used in phylogenetic analysis. The other kind is discrete character data. Discrete characters are either binary or multistate. A binary character has only </span><span
lang=EN-US style='font-size:10.0pt'>2</span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'> possible states (presence or absence of a feature), usually coded as “</span><span lang=EN-US
style='font-size:10.0pt'>0</span><span style='font-size:10.0pt;font-family:
宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"'>” or “</span><span
lang=EN-US style='font-size:10.0pt'>1</span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'>”. For example, if a position in a </span><span lang=EN-US style='font-size:10.0pt'>DNA</span><span
style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'> sequence is a cleavage site, its character value is </span><span
lang=EN-US style='font-size:10.0pt'>1</span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'>; otherwise it is </span><span lang=EN-US style='font-size:10.0pt'>0</span><span
style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>. A multistate character has more than two possible states. Nucleotide sequence data is an example: each position in a sequence can hold any of </span><span
lang=EN-US style='font-size:10.0pt'>4</span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'> bases (A, T, G, or C). Character data can also be converted into distance data. Once a measure of similarity is defined for every pair of possible states, the conversion is straightforward.</span><span
lang=EN-US><o:p></o:p></span></p>
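<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt;line-height:150%'><span lang=EN-US style='font-size:10.0pt'>A minimal sketch of one simple conversion follows. It assumes that every difference in state is equally dissimilar. The distance is then the proportion of characters in which two taxa differ, the p-distance. Because the function only compares states for equality, it works for binary 0/1 characters and for multistate characters such as aligned bases. The taxon names and character strings are made up for illustration. A different similarity measure between states could be substituted.</span></p>

```python
from itertools import combinations


def p_distance(x, y):
    """Fraction of characters whose states differ between two taxa."""
    if len(x) != len(y):
        raise ValueError("character vectors must have the same length")
    if not x:
        raise ValueError("need at least one character")
    return sum(a != b for a, b in zip(x, y)) / len(x)


def distance_matrix(taxa):
    """Map each unordered pair of taxon names to the p-distance between them."""
    return {
        frozenset((u, v)): p_distance(taxa[u], taxa[v])
        for u, v in combinations(sorted(taxa), 2)
    }


# Hypothetical binary characters (1 = feature present, 0 = absent)
binary = {"t1": "1101", "t2": "1001", "t3": "0011"}
for pair, d in distance_matrix(binary).items():
    print(sorted(pair), d)

# Multistate characters: aligned nucleotides
print(p_distance("ATGC", "ATGA"))  # 0.25
```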
<p style='line-height:150%'><b><span lang=EN-US style='color:#EFCE8F'>6.1.4 The Process of Molecular Phylogenetic Analysis</span></b><span
lang=EN-US><o:p></o:p></span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt;line-height:150%'><span style='font-size:10.0pt;mso-bidi-font-size:
10.5pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'>Molecular </span><span style='font-size:10.0pt;font-family:宋体;
mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"'>phylogenetic</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'> analysis consists of three main steps: (</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>1</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>) analysis of the molecular sequences or character data; (</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>2</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>) construction of the phylogenetic tree; and (</span><span
lang=EN-US style='font-size:10.0pt;mso-bidi-font-size:10.5pt'>3</span><span
style='font-size:10.0pt;mso-bidi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>) testing of the results. The first step analyzes the data to produce the distances or characters on which tree construction is based.</span><span
lang=EN-US><o:p></o:p></span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt;line-height:150%'><span style='font-size:10.0pt;font-family:
宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"'>There are many methods for building phylogenetic trees. By the type of data they process, these methods fall broadly into two classes. The first class is distance-based methods. They build a tree from the evolutionary distances between all species or taxonomic units, following a given criterion and algorithm. The basic idea is to list all possible pairs of sequences, compute the genetic distance of each pair, and pick out the most similar or closely related pairs. The genetic distances are then used to infer evolutionary relationships. Methods in this class include the </span><span
lang=EN-US style='font-size:10.0pt'>unweighted pair group method with
arithmetic mean</span><span style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'> (UPGMA), the </span><span
lang=EN-US style='font-size:10.0pt'>neighbor joining method</span><span
style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>, the </span><span lang=EN-US
style='font-size:10.0pt'>Fitch-Margoliash</span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'> method, and the </span><span lang=EN-US style='font-size:10.0pt'>minimum
evolution</span><span style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'> method. The second class is discrete-character methods, which use data with discrete character states, such as the nucleotide at a particular site in a </span><span
lang=EN-US style='font-size:10.0pt'>DNA</span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'> sequence. These methods focus on how each character (for example, each nucleotide site) has evolved among the taxa or sequences. Methods in this class include the </span><span
lang=EN-US style='font-size:10.0pt'>maximum parsimony method</span><span
style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>, the </span><span lang=EN-US
style='font-size:10.0pt'>maximum likelihood method</span><span
style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>, the </span><span lang=EN-US
style='font-size:10.0pt'>evolutionary parsimony method</span><span
style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>, and </span><span lang=EN-US
style='font-size:10.0pt'>compatibility</span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'> methods. Similarity and distance data can be analyzed only with distance methods. Discrete character data, by contrast, can be converted into distances by suitable methods. Either distance methods or discrete-character methods can therefore be used to reconstruct trees from such data.</span><span
lang=EN-US><o:p></o:p></span></p>
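<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt;line-height:150%'><span lang=EN-US style='font-size:10.0pt'>The following is a minimal UPGMA sketch that illustrates the distance-based class. Each step joins the two closest clusters and places their common ancestor at half their distance. The distance from the merged cluster to every other cluster is then set to the size-weighted average of the old distances. The input format (a dict keyed by frozenset pairs), the function name upgma, and the Newick output string are choices made for this example only. UPGMA produces an ultrametric tree, so it implicitly assumes a constant evolutionary rate across lineages. The other methods listed above use different criteria.</span></p>

```python
from itertools import combinations


def upgma(names, dist):
    """Build a rooted tree with UPGMA and return it as a Newick string.

    names: list of taxon labels.
    dist:  dict mapping frozenset({x, y}) to the distance between taxa x and y.
    """
    # Active clusters: label -> (Newick subtree, number of leaves, node height)
    clusters = {n: (n, 1, 0.0) for n in names}
    d = {frozenset(p): float(dist[frozenset(p)]) for p in combinations(names, 2)}

    while len(clusters) > 1:
        # Closest pair of clusters; ties are broken by label order
        pair = min(d, key=lambda p: (d[p], sorted(p)))
        a, b = sorted(pair)
        tree_a, size_a, h_a = clusters.pop(a)
        tree_b, size_b, h_b = clusters.pop(b)
        # The new internal node sits at half the distance between a and b
        height = d.pop(pair) / 2
        subtree = f"({tree_a}:{height - h_a:g},{tree_b}:{height - h_b:g})"
        new = f"({a},{b})"  # unique label for the merged cluster
        # Size-weighted average distance from the merged cluster to the others
        for c in clusters:
            d_ac = d.pop(frozenset((a, c)))
            d_bc = d.pop(frozenset((b, c)))
            d[frozenset((new, c))] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[new] = (subtree, size_a + size_b, height)

    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"


# Hypothetical ultrametric distances between four taxa
names = ["A", "B", "C", "D"]
raw = {("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 6,
       ("B", "C"): 4, ("B", "D"): 6, ("C", "D"): 6}
dist = {frozenset(k): v for k, v in raw.items()}
print(upgma(names, dist))  # (((A:1,B:1):1,C:2):1,D:3);
```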
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt;line-height:150%'><span style='font-size:10.0pt;font-family:
宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"'>Tree-building methods can also be divided into three classes by the search strategy the algorithm uses. The first class is exhaustive search, which generates every possible tree and then selects the best one according to an optimality criterion. Note, however, that the number of possible trees grows explosively with the number of sequences. For </span><span
lang=EN-US style='font-size:10.0pt'>n</span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'> taxonomic units, the number of possible rooted bifurcating trees (</span><span lang=EN-US
style='font-size:10.0pt'>N<sub>R</sub></span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'>) and the number of unrooted bifurcating trees (</span><span lang=EN-US style='font-size:10.0pt'>N<sub>U</sub></span><span
style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>) are given by:</span><span lang=EN-US><o:p></o:p></span></p>
<p class=MsoNormal align=center style='mso-margin-top-alt:auto;mso-margin-bottom-alt:
auto;text-align:center;text-indent:21.25pt;line-height:150%'><span lang=EN-US><!--[if gte vml 1]><v:shape
id="_x0000_i1034" type="#_x0000_t75" alt="" style='width:386.25pt;height:66pt'>
<v:imagedata src="./第六章.files/image015.png" o:href="http://www.lmbe.seu.edu.cn/chenyuan/xsun/bioinfomatics/web/images/6-4&6-5.bmp"/>
</v:shape><![endif]--><![if !vml]><img width=515 height=88
src="./第六章.files/image016.jpg" border=0 v:shapes="_x0000_i1034"><![endif]><o:p></o:p></span></p>
<p class=MsoNormal align=left style='mso-margin-top-alt:auto;mso-margin-bottom-alt:
auto;text-align:left;line-height:150%'><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'>As </span><span lang=EN-US style='font-size:10.0pt'>n</span><span
style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'> increases, the numbers of possible rooted and unrooted trees grow rapidly. <span
style='color:blue'>Table </span></span><span lang=EN-US style='font-size:10.0pt;
color:blue'>6.1</span><span style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'> lists several values of </span><span
lang=EN-US style='font-size:10.0pt'>n</span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'> with the corresponding numbers of rooted and unrooted trees. Once </span><span lang=EN-US style='font-size:
10.0pt'>n</span><span style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'> reaches </span><span
lang=EN-US style='font-size:10.0pt'>15</span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'> or more, the number of possible trees becomes enormous (more than 2 × 10<sup>14</sup> rooted trees for 15 taxa). Yet only one of these trees represents the true evolutionary relationships among the genes or species under study, and our goal is to find it.</span><span
style='font-size:10.0pt'><o:p>&nbsp;</o:p></span><span lang=EN-US><o:p></o:p></span></p>
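<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt;line-height:150%'><span lang=EN-US style='font-size:10.0pt'>The formulas appear only as an image. The sketch below assumes that they are the standard counts for fully bifurcating trees with n labeled leaves. These counts are N<sub>R</sub> = (2n−3)!/(2<sup>n−2</sup>(n−2)!) and N<sub>U</sub> = (2n−5)!/(2<sup>n−3</sup>(n−3)!). The sketch evaluates them with exact integer arithmetic, since Python integers have no fixed size. The function names are hypothetical.</span></p>

```python
from math import factorial


def n_rooted(n):
    """Number of rooted bifurcating trees for n labeled taxa: (2n-3)!/(2^(n-2)(n-2)!)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def n_unrooted(n):
    """Number of unrooted bifurcating trees for n labeled taxa: (2n-5)!/(2^(n-3)(n-3)!)."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (4, 5, 10, 15):
    print(f"n={n:2d}  rooted={n_rooted(n):,}  unrooted={n_unrooted(n):,}")
```

<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt;line-height:150%'><span lang=EN-US style='font-size:10.0pt'>For n = 15 the sketch gives 213,458,046,676,875 rooted and 7,905,853,580,625 unrooted trees. Note also that N<sub>U</sub>(n) = N<sub>R</sub>(n−1). Rooting an unrooted n-taxon tree on the branch leading to one chosen taxon, and then removing that taxon, gives a rooted tree on the remaining n−1 taxa.</span></p>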