mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"'>, T, C, and G; it is called complex because this book contains all the information that governs human growth and development, and it implicitly encodes the rules of human birth, aging, illness, and death. For a genome sequence, what we care about most is finding the genes in it and the information that regulates their expression. Genes can be identified, and information related to the regulation of gene expression discovered, by recognizing special functional sites in the sequence and by analyzing the compositional features of the sequence.</span></p>
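<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:20.0pt;mso-char-indent-count:2.0;mso-char-indent-size:10.0pt;
line-height:150%'><span lang=EN-US style='font-size:10.0pt'>As a minimal sketch of what "analyzing the compositional features of a sequence" can mean in practice, the Python fragment below counts the four bases and derives the G+C fraction. The text does not name a specific feature; base frequencies and GC content are chosen here only as a simple, commonly used illustration, and the function names and the example sequence are hypothetical.</span></p>
<pre>
```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Return the fraction of each base (A, C, G, T) in a DNA sequence.

    Characters other than A, C, G, T (for example N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(base for base in seq if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T characters")
    return {base: counts[base] / total for base in "ACGT"}


def gc_content(seq: str) -> float:
    """Return the combined fraction of G and C in the sequence."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


if __name__ == "__main__":
    example = "ATGGCGTACGCTTAA"  # made-up sequence, for illustration only
    print(base_composition(example))
    print(round(gc_content(example), 3))
```
</pre>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:20.0pt;mso-char-indent-count:2.0;mso-char-indent-size:10.0pt;
line-height:150%'><span lang=EN-US style='font-size:10.0pt'>Real gene-finding methods combine many such features with the recognition of functional sites; this fragment shows only the simplest kind of composition statistic.</span></p>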
<h2 align=center style='text-align:center'><span lang=EN-US style='font-family:
隶书;color:#EFCE8F'>5.1 The Genetic Language</span></h2>
<h3><span lang=EN-US style='font-size:12.0pt;color:#EFCE8F'>5.1.1 The Mysteries of Genomic DNA</span></h3>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:20.0pt;mso-char-indent-count:2.0;mso-char-indent-size:10.0pt;
line-height:150%'><span lang=EN-US style='font-size:10.0pt'>Life is nature's greatest creation. Over billions of years of evolution, it has developed from simple organic matter into today's highly complex yet ordered biological systems. Proteins are the basic components from which the machinery of life is built: under the control of genetic information, large numbers of proteins with different structures and functions are continually synthesized and assembled into complex organisms. Genetic information is stored in the genome, specifically in nucleic acid sequences written with four characters. With the establishment of the central dogma of molecular biology, it became clear that the main carrier of genetic information is DNA (in a few cases RNA also serves as the carrier) and that the genes controlling an organism's traits are segments of DNA. On the one hand, DNA replicates itself and thereby passes genetic information on as organisms reproduce; on the other hand, genes are transcribed and translated so that genetic information is expressed in the individual, giving offspring traits similar to those of their parents. During gene expression, the genetic information of a gene first passes from DNA to RNA through transcription and then from RNA to protein through translation. Genes control protein synthesis, and there is a well-defined correspondence between the DNA sequence of a gene and the sequence of its protein; this correspondence is the genetic code.</span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:20.0pt;mso-char-indent-count:2.0;mso-char-indent-size:10.0pt;
line-height:150%'><span
lang=EN-US style='font-size:10.0pt'>In 1961, Nirenberg deciphered the first codon in experiments that used synthetic RNA as the messenger, and by 1966 the complete genetic code had been determined. The discovery of the genetic code opened the study of the information of life at the molecular level and started the effort to understand the genetic language. Many scientists hold that genomic DNA is not merely a sequence of a biological molecule but may be a language: one that describes genetic information, controls the traits of an organism, and governs the birth, aging, illness, and death of the individual. To uncover the secrets of this language, scientists began sequencing the genomes of humans and of other model organisms, hoping to read and decode the genetic information so that humans can understand themselves fully at the molecular level. Thanks to rapid advances in biotechnology, the Human Genome Project was completed ahead of schedule in 2003, and we now hold a "heavenly book" of human genetic information that runs to millions of pages. This book is the human genetic blueprint written in the genetic language, and it is the basis for deciphering that language. It is called a heavenly book, that is, an indecipherable one, not only because of the enormous amount of information it holds but, more importantly, because we still understand very little of it and cannot yet read it. The book uses only four characters (the bases A, T, G, and C), has neither paragraphs nor punctuation, and is a one-dimensional sequence of length 3×10<sup>9</sup>.</span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:20.0pt;mso-char-indent-count:2.0;mso-char-indent-size:10.0pt;
line-height:150%'><span lang=EN-US style='font-size:10.0pt'>So far, the part of the book that scientists understand best is the genetic code, that is, the rule by which DNA is translated into protein. The genetic code is also called the triplet code: three consecutive bases of the DNA sequence encode one amino acid of a protein. Proteins in nature are known to be built from 20 different amino acids, so how many consecutive bases are needed to encode one amino acid? Clearly, 1 base is not enough, because 1 base can encode at most 4 amino acids. Can 2 bases do the job? 2 bases yield at most 16 (4<sup>2</sup>) codons and can encode only 16 amino acids, which is still not enough. 3 consecutive bases, however, can form 64 (4<sup>3</sup>) codons, which fully meets the need, so the genetic code is a triplet code.</span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:20.0pt;mso-char-indent-count:2.0;mso-char-indent-size:10.0pt;
line-height:150%'><span lang=EN-US style='font-size:10.0pt'>Because the triplet code has more codons than there are amino acids, one amino acid may be encoded by several codons; synonymous codons generally differ at the third position. For example, </span><span
lang=EN-US style='font-size:10.0pt'>UCU, UCC, UCA, and UCG are all codons for serine. This encoding is therefore fault tolerant to a degree: an error in one base of a codon may leave the translated protein unchanged. If the last base of one of these serine codons changes, the result is still a synonymous codon, and the encoded amino acid does not change. The genetic code is also universal: apart from mitochondrial and other cytoplasmic genes, codons are almost the same throughout the living world, so the genetic language can be said to be universal as well.</span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:20.0pt;mso-char-indent-count:2.0;mso-char-indent-size:10.0pt;
line-height:150%'><span lang=EN-US style='font-size:10.0pt'>Codon usage is not random. If the first and second bases of a codon are A or U, the third position tends to be G or C, and vice versa. Because </span><span lang=EN-US style='font-size:10.0pt'>G</span><span
lang=EN-US style='font-size:10.0pt'> and C pair through three hydrogen bonds, whereas A and U pair through only two, a codon made up entirely of G and C pairs easily with its anticodon but separates with difficulty, and a codon made up entirely of A and U does the opposite. In general, a highly expressed gene must be translated quickly, so its codons need both to pair with their anticodons quickly and to release them quickly. The first and second positions of a codon leave very little room for choice, so this trade-off can be made only at the third position.</span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:20.0pt;mso-char-indent-count:2.0;mso-char-indent-size:10.0pt;
line-height:150%'><span lang=EN-US style='font-size:10.0pt'>Codon usage follows certain statistical regularities. Genes show a preference among synonymous codons, but both the preferred codons and the strength of the preference differ between species. In human coding sequences, for example, G or C occupies the third codon position somewhat more often than A or U (close to 60% versus about 40% in widely used codon usage tables), and the proportion varies considerably from gene to gene.</span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:20.0pt;mso-char-indent-count:2.0;mso-char-indent-size:10.0pt;
line-height:150%'><span lang=EN-US style='font-size:10.0pt'>The position of a base within a codon is also related to the properties of the amino acid that the codon encodes. For example, if the first base of a codon is </span><span
lang=EN-US style='font-size:10.0pt'>U</span><span style='font-size:10.0pt;