text-indent:21.25pt;line-height:150%'><span style='font-size:10.0pt;font-family:
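宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"'>A binary tree is a special kind of tree in which each node has at most two children. In a weighted tree, the length (or weight) of a branch is generally proportional to the amount of change between taxa, so it serves as a measure of evolutionary time or genetic distance. It is usually assumed that a molecular clock exists, that is, that the rate of evolution is constant.</span><span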
lang=EN-US><o:p></o:p></span></p>
<p class=MsoBodyText style='text-align:justify;text-justify:inter-ideograph;
text-indent:21.25pt'><span style='font-size:10.0pt;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>A phylogenetic tree has the following properties:<O:P></span><span
style='font-size:10.0pt'> </O:P></span><span lang=EN-US><o:p></o:p></span></p>
<p class=MsoBodyText style='margin-left:26.4pt;text-align:justify;text-justify:
inter-ideograph;text-indent:-26.4pt;line-height:150%;tab-stops:list 26.4pt'><span
style='font-size:10.0pt;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'>(</span><span lang=EN-US style='font-size:10.0pt;font-family:
"Times New Roman"'>1</span><span style='font-size:10.0pt;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>) If the tree is rooted, the root represents the taxon that is earliest in evolutionary history and is related to all the other taxa;<O:P></span><span
style='font-size:10.0pt'> </O:P></span><span lang=EN-US><o:p></o:p></span></p>
<p class=MsoBodyText style='margin-left:26.4pt;text-align:justify;text-justify:
inter-ideograph;text-indent:-26.4pt;line-height:150%;tab-stops:list 26.4pt'><span
style='font-size:10.0pt;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'>(</span><span lang=EN-US style='font-size:10.0pt;font-family:
"Times New Roman"'>2</span><span style='font-size:10.0pt;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>) If no taxon can serve as the root, the phylogenetic tree is unrooted;<O:P></span><span
style='font-size:10.0pt'> </O:P></span><span lang=EN-US><o:p></o:p></span></p>
<p class=MsoBodyText style='margin-left:26.4pt;text-align:justify;text-justify:
inter-ideograph;text-indent:-26.4pt;line-height:150%;tab-stops:list 26.4pt'><span
style='font-size:10.0pt;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'>(</span><span lang=EN-US style='font-size:10.0pt;font-family:
"Times New Roman"'>3</span><span style='font-size:10.0pt;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman"'>) The path from the root to any node indicates evolutionary time or evolutionary distance.<O:P></span><span
style='font-size:10.0pt'> </O:P></span><span lang=EN-US><o:p></o:p></span></p>
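<p class=MsoNormal style='text-indent:21.25pt;line-height:150%'><span lang=EN-US style='font-size:10.0pt'>Property (3) can be made concrete with a small sketch. The tree below is hypothetical (the internal node names X and Y and all branch lengths are invented for illustration), and it is stored as a child-to-parent map. Summing branch lengths along the path to the root gives each node's evolutionary distance from the root.</span></p>

```python
# Hypothetical rooted tree: node -> (parent, length of the branch to the parent).
# Names and branch lengths are illustrative assumptions, not data from the text.
PARENT = {
    "root": (None, 0.0),
    "X": ("root", 1.0),
    "Y": ("root", 2.0),
    "A": ("X", 2.0),
    "B": ("X", 2.0),
    "C": ("Y", 1.0),
    "D": ("Y", 1.0),
}


def distance_from_root(node):
    """Sum the branch lengths on the path from `node` up to the root."""
    total = 0.0
    while PARENT[node][0] is not None:
        parent, length = PARENT[node]
        total += length
        node = parent
    return total


if __name__ == "__main__":
    for taxon in "ABCD":
        print(taxon, distance_from_root(taxon))
```

<p class=MsoNormal style='text-indent:21.25pt;line-height:150%'><span lang=EN-US style='font-size:10.0pt'>In this example every leaf lies at the same distance from the root, which is what a constant-rate molecular clock would imply; real data need not behave this way.</span></p>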
<p class=MsoBodyText style='text-align:justify;text-justify:inter-ideograph;
text-indent:21.25pt;line-height:150%'><span style='font-size:10.0pt;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman";color:blue'>Figure </span><span
lang=EN-US style='font-size:10.0pt;color:blue'>6.1(a)</span><span
style='font-size:10.0pt;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'> shows a rooted tree, while </span><span style='font-size:10.0pt;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman";color:blue'>Figure </span><span
lang=EN-US style='font-size:10.0pt;color:blue'>6.1(b)</span><span
style='font-size:10.0pt;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'> shows an unrooted tree; in the figure, </span><span lang=EN-US style='font-size:10.0pt'>A</span><span
style='font-size:10.0pt;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'>, </span><span lang=EN-US style='font-size:10.0pt'>B</span><span
style='font-size:10.0pt;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'>, </span><span lang=EN-US style='font-size:10.0pt'>C</span><span
style='font-size:10.0pt;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'>, and </span><span lang=EN-US style='font-size:10.0pt'>D</span><span
style='font-size:10.0pt;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'> are the taxa under study.<O:P></span><span style='font-size:10.0pt'> </O:P></span><span
lang=EN-US><o:p></o:p></span></p>
<p class=MsoBodyTextIndent style='line-height:150%'><span style='font-size:
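10.0pt;mso-ascii-font-family:"Times New Roman"'>For a given number of taxa, many phylogenetic trees are possible, but only one of them is correct; the goal of the analysis is to find that correct tree.<O:P></span><span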
style='font-family:"Times New Roman"'> </O:P></span><span lang=EN-US><o:p></o:p></span></p>
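<p class=MsoBodyTextIndent style='line-height:150%'><span lang=EN-US style='font-size:10.0pt'>To give a sense of how many candidate trees there are, the sketch below uses the standard counting result for strictly bifurcating trees with n labeled leaves: (2n-5)!! unrooted topologies for n &gt;= 3 and (2n-3)!! rooted topologies for n &gt;= 2. The function names are illustrative, and the restriction to bifurcating trees is an assumption the text does not state explicitly.</span></p>

```python
def double_factorial(k):
    """Return k!! = k * (k - 2) * (k - 4) * ... down to 1 or 2 (1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating topologies for n_taxa labeled leaves."""
    if n_taxa < 3:
        raise ValueError("an unrooted bifurcating tree needs at least 3 taxa")
    return double_factorial(2 * n_taxa - 5)


def count_rooted_trees(n_taxa):
    """Number of rooted bifurcating topologies for n_taxa labeled leaves."""
    if n_taxa < 2:
        raise ValueError("a rooted bifurcating tree needs at least 2 taxa")
    return double_factorial(2 * n_taxa - 3)


if __name__ == "__main__":
    for n in (4, 6, 8, 10):
        print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

<p class=MsoBodyTextIndent style='line-height:150%'><span lang=EN-US style='font-size:10.0pt'>For the four taxa A, B, C, and D this gives 3 unrooted and 15 rooted topologies; for ten taxa the counts already exceed two million and thirty-four million, which is why exhaustive search quickly becomes impractical.</span></p>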
<p align=center style='text-align:center'><span lang=EN-US><!--[if gte vml 1]><v:shape
id="_x0000_i1030" type="#_x0000_t75" alt="" style='width:333pt;height:168pt'>
<v:imagedata src="./第六章.files/image007.png" o:href="http://www.lmbe.seu.edu.cn/chenyuan/xsun/bioinfomatics/web/images/p6.1.bmp"/>
</v:shape><![endif]--><![if !vml]><img width=444 height=224
src="./第六章.files/image008.jpg" border=0 v:shapes="_x0000_i1030"><![endif]><o:p></o:p></span></p>
<p style='line-height:150%'><span lang=EN-US style='font-size:10.5pt;
mso-bidi-font-size:10.0pt;font-family:"Times New Roman";mso-font-kerning:1.0pt'>
</span><span style='font-size:10.0pt;mso-font-kerning:1.0pt'>A phylogenetic tree built from the differences in a single homologous gene is more properly called a </span><span
lang=EN-US style='font-size:10.0pt;font-family:"Times New Roman";mso-font-kerning:
1.0pt'>gene tree</span><span style='font-size:10.0pt;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman";mso-font-kerning:
1.0pt'> than a </span><span lang=EN-US style='font-size:10.0pt;font-family:
"Times New Roman";mso-font-kerning:1.0pt'>species tree</span><span
style='font-size:10.0pt;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman";mso-font-kerning:1.0pt'>, because such a tree represents the evolutionary history of that one gene only, not of the species that carries it. A species tree is generally best obtained by combining the results of analyses of several genes. The difference between gene trees and species trees matters: for example, if a species tree were built using only </span><span
lang=EN-US style='font-size:10.0pt;font-family:"Times New Roman";mso-font-kerning:
1.0pt'>HLA</span><span style='font-size:10.0pt;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman";mso-font-kerning:1.0pt'> alleles, many humans would be grouped with gorillas rather than with other humans.</span><span
lang=EN-US><o:p></o:p></span></p>
<p style='line-height:150%'><b><span lang=EN-US style='color:#EFCE8F'>6.1.3 Distances and Characters</span></b><span
lang=EN-US><o:p></o:p></span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt;line-height:150%'><span style='font-size:10.0pt;font-family:
宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"'>The molecular data used to build phylogenetic trees fall into two classes: (</span><span
lang=EN-US style='font-size:10.0pt'>1</span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'>) </span><span lang=EN-US style='font-size:10.0pt'>distance</span><span
style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'> data, usually described by a distance matrix that records every pairwise difference among the items in the data set; and (</span><span
lang=EN-US style='font-size:10.0pt'>2</span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'>) </span><span lang=EN-US style='font-size:10.0pt'>character</span><span
style='font-size:10.0pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'> data, which describe the features that the molecules possess.</span><span lang=EN-US><o:p></o:p></span></p>
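<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;text-indent:21.25pt;line-height:150%'><span lang=EN-US style='font-size:10.0pt'>As a rough illustration of distance data, the sketch below builds a pairwise distance matrix from sequences that are assumed to be already aligned. It uses the simple proportion of mismatched positions (often called the p-distance) and skips any column containing a gap; this measure, the gap handling, and the toy sequences are choices made here for illustration and are not prescribed by the text.</span></p>

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of compared alignment columns at which two sequences differ.

    Columns where either sequence has a gap are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


def distance_matrix(alignment):
    """Return {(name_i, name_j): distance} for every ordered pair of sequences."""
    names = list(alignment)
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x in names
        for y in names
    }


if __name__ == "__main__":
    # Hypothetical aligned sequences, for illustration only.
    toy_alignment = {
        "A": "ACGTACGT",
        "B": "ACGTACGA",
        "C": "ACGAAC-A",
    }
    matrix = distance_matrix(toy_alignment)
    for (x, y), d in sorted(matrix.items()):
        print(x, y, round(d, 3))
```

<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;text-indent:21.25pt;line-height:150%'><span lang=EN-US style='font-size:10.0pt'>The resulting matrix is symmetric with zeros on the diagonal, which is the form that distance-based tree-building methods generally expect as input.</span></p>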
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt;line-height:150%'><span style='font-size:10.0pt;font-family:
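宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"'>The aim of molecular phylogenetic analysis is to explore the evolutionary relationships among species, and its object is usually a set of homologous sequences taken from a common locus in the genomes of different organisms. Sequence alignment is a basic tool for homology analysis and the foundation of phylogenetic analysis; a progressive multiple sequence alignment method built on pairwise alignments, such as the </span><span
lang=EN-US style='font-size:10.0pt'>ClustalW</span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'> program, is generally used. From the alignment, the differences between sequences can be analyzed and the distances between them computed.</span><span lang=EN-US><o:p></o:p></span></p>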
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt;line-height:150%'><span style='font-size:10.0pt;font-family:
宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"'>Both </span><span
lang=EN-US style='font-size:10.0pt'>DNA</span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'> sequences and protein sequences are composed of characters from a specific alphabet. Computing the distance between sequences presupposes a character substitution model, and the choice of model affects both the multiple sequence alignment and the resulting phylogenetic tree. A concrete analysis therefore requires a reasonable substitution model; see the scoring, cost, and distance models in Chapter </span><span
lang=EN-US style='font-size:10.0pt'>3</span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'>.</span><span lang=EN-US><o:p></o:p></span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.25pt;line-height:150%'><span style='font-size:10.0pt;font-family:
宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"'>Distance (or similarity) measures the relationship between sequences and is a commonly used kind of data for building phylogenetic trees. To compute a distance, the sequences are first aligned and the scores at the aligned positions are then summed. The sequence comparison methods introduced in Chapter </span><span
lang=EN-US style='font-size:10.0pt'>3</span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'> can be applied to compute the distance between sequences directly. If the comparison uses a scoring function or a similarity measure, the similarity (or score) must be converted into a distance. Let </span><span
lang=EN-US style='font-size:10.0pt'>S(i,j)</span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'> be the weighted sum of the scores at the aligned positions of sequence </span><code><span lang=EN-US style='mso-ansi-font-size:
10.5pt;mso-ascii-font-family:"Times New Roman";mso-fareast-font-family:宋体;
mso-hansi-font-family:"Times New Roman"'>i</span></code><span style='font-size:
10.0pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'> and sequence </span><code><span lang=EN-US style='mso-ansi-font-size:
10.5pt;mso-ascii-font-family:"Times New Roman";mso-fareast-font-family:宋体;
mso-hansi-font-family:"Times New Roman"'>j</span></code><code><span
style='mso-ansi-font-size:10.5pt;font-family:宋体;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman"'>; one normalized distance is computed as:<O:P></span></code><code><span
style='mso-ansi-font-size:10.5pt;mso-ascii-font-family:"Times New Roman";
mso-fareast-font-family:宋体;mso-hansi-font-family:"Times New Roman"'> </O:P></span></code><span
lang=EN-US><o:p></o:p></span></p>
<p align=center style='text-align:center;line-height:150%'><span lang=EN-US><!--[if gte vml 1]><v:shape
id="_x0000_i1031" type="#_x0000_t75" alt="" style='width:371.25pt;height:35.25pt'>
<v:imagedata src="./第六章.files/image009.png" o:href="http://www.lmbe.seu.edu.cn/chenyuan/xsun/bioinfomatics/web/images/6-1.bmp"/>
</v:shape><![endif]--><![if !vml]><img width=495 height=47
src="./第六章.files/image010.jpg" border=0 v:shapes="_x0000_i1031"><![endif]><o:p></o:p></span></p>
<p style='line-height:150%'><span style='font-size:10.5pt;mso-bidi-font-size:
10.0pt;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman";
mso-font-kerning:1.0pt'>where </span><span lang=EN-US style='font-size:10.5pt;
mso-bidi-font-size:10.0pt;font-family:"Times New Roman";mso-font-kerning:1.0pt'>S<sub>r</sub>(i,j)</span><span
style='font-size:10.5pt;mso-bidi-font-size:10.0pt;mso-ascii-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman";mso-font-kerning:1.0pt'> is the weighted sum of the alignment scores of sequences </span><code><span
lang=EN-US style='font-size:10.5pt;mso-bidi-font-size:10.0pt;font-family:"Times New Roman";
mso-fareast-font-family:宋体;mso-bidi-font-family:"Courier New";mso-font-kerning:
1.0pt'>i</span></code><code><span style='font-size:10.5pt;mso-bidi-font-size:
10.0pt;mso-ascii-font-family:"Times New Roman";mso-fareast-font-family:宋体;
mso-hansi-font-family:"Times New Roman";mso-font-kerning:1.0pt'> and </span></code><code><span
lang=EN-US style='font-size:10.5pt;mso-bidi-font-size:10.0pt;font-family:"Times New Roman";
mso-fareast-font-family:宋体;mso-bidi-font-family:"Courier New";mso-font-kerning:
1.0pt'>j</span></code><span style='font-size:10.5pt;mso-bidi-font-size:10.0pt;
mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman";
mso-font-kerning:1.0pt'> after they have been randomized</span><code><span style='font-size:10.5pt;
mso-bidi-font-size:10.0pt;mso-ascii-font-family:"Times New Roman";mso-fareast-font-family:
宋体;mso-hansi-font-family:"Times New Roman";mso-font-kerning:1.0pt'>, and </span></code><span
lang=EN-US style='font-size:10.5pt;mso-bidi-font-size:10.0pt;font-family:"Times New Roman";
mso-font-kerning:1.0pt'>S<sub>max</sub>(i,j)</span><span style='font-size:10.5pt;
mso-bidi-font-size:10.0pt;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman";mso-font-kerning:1.0pt'> is the maximum score over all possible alignments of the two sequences (the maximum is attained when the two sequences are identical). The normalized distance between two sequences takes a value between </span><span
lang=EN-US style='font-size:10.5pt;mso-bidi-font-size:10.0pt;font-family:"Times New Roman";
mso-font-kerning:1.0pt'>0</span><span style='font-size:10.5pt;mso-bidi-font-size:
10.0pt;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman";
mso-font-kerning:1.0pt'> and </span><span lang=EN-US style='font-size:10.5pt;
mso-bidi-font-size:10.0pt;font-family:"Times New Roman";mso-font-kerning:1.0pt'>1</span><span