"Times New Roman"'>万多个基因中的一部分。那么如何在“天书”中找到其它的基因呢?一种方法是通过分子生物学实验确定基因的位置和序列,另一种方法就是通过信息分析寻找基因。科学家已经发现在基因的前后两端存在一些特殊的信号,基因的蛋白质编码区域与非编码区域在序列的统计特征上有明显的差异,因此,可以用数学方法、人工智能的模式识别方法或神经网络方法识别</span><span
lang=EN-US style='font-size:10.0pt'>DNA</span><span style='font-size:10.0pt;
font-family:宋体;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman"'>序列上与基因相关的信号,区分统计特性,从而识别基因。</span><span lang=EN-US><o:p></o:p></span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:20.0pt;mso-char-indent-count:2.0;mso-char-indent-size:10.0pt;
line-height:150%;mso-char-indent-size:10.5pt'><span lang=EN-US style='font-size:10.0pt'>Although we understand the structure of genes and have deciphered the genetic code, we still know very little relative to the enormous size of the genome. In the human genome, coding regions account for no more than 3% of the sequence; the remaining 97% is non-coding. Comparatively little is known about non-coding sequences, and their meaning and function remain unclear. Nevertheless, non-coding regions are of great importance to the processes of life. They consist mainly of introns, simple repeats, mobile elements and their relics, pseudogenes, and so on. Satellite DNA, mini-satellite DNA, and micro-satellite DNA are typical repeat sequences. Mobile elements include DNA-based transposable elements, autonomous retrotransposons, and non-autonomous retrotransposons. By analyzing the human genome, scientists have found that four major classes of repeat elements cover 43% of it: short interspersed elements (SINEs), long interspersed elements (LINEs), long terminal repeat elements (LTR elements), and DNA transposons. In addition, the genome contains various cis-acting transcriptional regulatory elements, such as promoters, enhancers, and silencers, which are also non-coding sequences.</span><span lang=EN-US><o:p></o:p></span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:20.0pt;mso-char-indent-count:2.0;mso-char-indent-size:10.0pt;
line-height:150%;mso-char-indent-size:10.5pt'><span lang=EN-US style='font-size:10.0pt'>The genome is known to contain regions with relatively high GC content and regions with relatively high AT content, but what makes the GC/AT ratio so uneven across the genome is still an open question. What we do know is that in GC-rich regions of the genome the gene density is higher and the average intron </span><span
class=style11><span lang=EN-US style='font-size:10.0pt'>size</span></span><span
lang=EN-US style='font-size:10.0pt'> is smaller.</span><span lang=EN-US><o:p></o:p></span></p>
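<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:20.0pt;line-height:150%'><span lang=EN-US style='font-size:10.0pt'>One simple way to make GC-rich and AT-rich regions visible is to compute the GC fraction in successive windows along a sequence. The following is a minimal sketch under stated assumptions: the function names <i>gc_content</i> and <i>windowed_gc</i>, the window and step sizes, and the toy sequence are all chosen for illustration and do not come from the text.</span></p>

```python
def gc_content(seq):
    """Fraction of bases in `seq` that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window=4, step=4):
    """Return (start_index, gc_fraction) for each full window along `seq`.

    Windows that would run past the end of the sequence are skipped.
    """
    return [
        (i, gc_content(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    toy = "GGCCAATT"  # hypothetical toy sequence: GC-rich half, AT-rich half
    for start, fraction in windowed_gc(toy, window=4, step=4):
        print(start, fraction)
```

<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:20.0pt;line-height:150%'><span lang=EN-US style='font-size:10.0pt'>On real genomic data the window would be far larger, typically thousands of bases, and the resulting profile could then be compared with annotated gene density.</span></p>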
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:20.0pt;mso-char-indent-count:2.0;mso-char-indent-size:10.0pt;
line-height:150%;mso-char-indent-size:10.5pt'><span lang=EN-US style='font-size:10.0pt'>Although the meaning and role of the 97% of the genome that is non-coding are still unclear, from the standpoint of biological evolution this part of the sequence must have important biological functions. Human beings are a perfect creation of nature, and it is hard to imagine that the human genome contains so much useless material. The prevailing view of non-coding regions is that they are involved in regulating gene expression in four-dimensional space-time, that is, in controlling when and where in the organism each gene is expressed. The regulation of gene expression must follow a strict set of rules, and these rules remain for us to explore and discover. We do understand a small portion of the non-coding regions, for example the regulatory regions involved in gene transcription and translation, such as promoters and enhancers.</span><span
lang=EN-US><o:p></o:p></span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:20.0pt;mso-char-indent-count:2.0;mso-char-indent-size:10.0pt;
line-height:150%;mso-char-indent-size:10.5pt'><span lang=EN-US style='font-size:10.0pt'>Since their discovery in 1977, introns have gradually come to be clearly defined as sequence segments that interrupt a gene, are removed by splicing at the level of the RNA transcript, and do not take part in the expression of that gene at the protein level. How, then, did introns arise? What is the significance of their existence, and what functions do they perform? And why are introns so widely distributed in some eukaryotes? The origin of introns is still unsettled, and two hypotheses have long coexisted. One holds that introns are as old as the genes that contain them and were already present when the first such genes were assembled. Early introns had capabilities such as self-catalysis and self-replication, so they were an indispensable part of the organization and replication of primitive genes and genomes. Present-day prokaryotes and a few lower eukaryotes lost their introns because they need rapid DNA replication to support rapid cell division. On this view, modern introns are evolutionary relics that persist because they can recombine the exons of a genome into new genes; that is, introns give their carriers greater evolutionary potential. The other hypothesis holds that introns were not originally part of genes but were inserted into continuous genes by transposition at some stage of evolution, and that they arose only after more advanced functional genes, or eukaryotes, had appeared. This hypothesis must face one difficulty: how could an intron first be inserted into a continuously coding gene while leaving the gene's function unchanged?</span><span
lang=EN-US><o:p></o:p></span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:21.1pt;mso-char-indent-count:2.0;mso-char-indent-size:10.55pt;
line-height:150%;mso-char-indent-size:10.5pt'><b><span lang=EN-US
style='color:#EFCE8F'>5.1.2 </span></b><b><span style='font-family:宋体;
mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman";
color:#EFCE8F'>Exploring the Genetic Language</span></b><span lang=EN-US><o:p></o:p></span></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;
text-indent:20.0pt;mso-char-indent-count:2.0;mso-char-indent-size:10.0pt;
line-height:150%;mso-char-indent-size:10.5pt'><span lang=EN-US style='font-size:10.0pt'>The genetic language can be studied with the methods of linguistics in order to discover its regularities. Humans have successfully used two kinds of language. One is natural language, which people use to exchange feelings and information and which has grown richer as human civilization has developed. The other is the high-level computer programming language, such as Basic, Fortran, and C, created as electronic information technology developed. The code of both kinds of language can be converted into binary sequences of 0s and 1s. Advanced libraries around the world have already converted large amounts of natural-language text into binary sequences stored in digital libraries, and a compiler can likewise convert a high-level program into binary machine instructions to form an executable program. The code of the genetic language is the DNA sequence itself, a quaternary code made up of the four characters A, T, G, and C, which can of course also be converted into a binary sequence. The three languages can therefore be unified in form. In exploratory experiments, scientists have found that the long-range correlation of 0s and 1s differs between binary sequences. The long-range correlation in the binary sequences of natural language is far lower than in those of high-level programming languages, and the long-range correlation in the coding regions of the genetic language is far lower than in the non-coding regions. To some extent this suggests that non-coding regions act like a "program" and play a controlling role. It also hints that some kind of linguistic regularity may be hidden in the genome sequence. The one-dimensional linear </span><span