<!-- From the PostgreSQL 8.1.4 source code, an open-source database system that runs on Linux. -->
<!--$PostgreSQL: pgsql/doc/src/sgml/arch-dev.sgml,v 2.25 2005/01/05 23:42:02 tgl Exp $-->

<chapter id="overview">
 <title>Overview of PostgreSQL Internals</title>

 <note>
  <title>Author</title>
  <para>
   This chapter originated as part of
   <xref linkend="SIM98">, Stefan Simkovics'
   Master's Thesis prepared at Vienna University of Technology under the direction
   of O.Univ.Prof.Dr. Georg Gottlob and Univ.Ass. Mag. Katrin Seyr.
  </para>
 </note>

 <para>
  This chapter gives an overview of the internal structure of the
  backend of <productname>PostgreSQL</productname>.  After having
  read the following sections you should have an idea of how a query
  is processed. This chapter does not aim to provide a detailed
  description of the internal operation of
  <productname>PostgreSQL</productname>, as such a document would be
  very extensive. Rather, this chapter is intended to help the reader
  understand the general sequence of operations that occur within the
  backend from the point at which a query is received to the point
  at which the results are returned to the client.
 </para>

 <sect1 id="query-path">
  <title>The Path of a Query</title>

  <para>
   Here we give a short overview of the stages a query has to pass
   through in order to obtain a result.
  </para>

  <procedure>
   <step>
    <para>
     A connection from an application program to the <productname>PostgreSQL</productname>
     server has to be established. The application program transmits a
     query to the server and waits to receive the results sent back by the
     server.
    </para>
   </step>

   <step>
    <para>
     The <firstterm>parser stage</firstterm> checks the query
     transmitted by the application
     program for correct syntax and creates
     a <firstterm>query tree</firstterm>.
    </para>
   </step>

   <step>
    <para>
     The <firstterm>rewrite system</firstterm> takes
     the query tree created by the parser stage and looks for
     any <firstterm>rules</firstterm> (stored in the
     <firstterm>system catalogs</firstterm>) to apply to
     the query tree.  It performs the
     transformations given in the <firstterm>rule bodies</firstterm>.
    </para>

    <para>
     One application of the rewrite system is in the realization of
     <firstterm>views</firstterm>.
     Whenever a query against a view
     (i.e., a <firstterm>virtual table</firstterm>) is made,
     the rewrite system rewrites the user's query to
     a query that accesses the <firstterm>base tables</firstterm> given in
     the <firstterm>view definition</firstterm> instead.
    </para>
   </step>

   <step>
    <para>
     The <firstterm>planner/optimizer</firstterm> takes
     the (rewritten) query tree and creates a
     <firstterm>query plan</firstterm> that will be the input to the
     <firstterm>executor</firstterm>.
    </para>

    <para>
     It does so by first creating all possible <firstterm>paths</firstterm>
     leading to the same result. For example, if there is an index on a
     relation to be scanned, there are two paths for the
     scan. One possibility is a simple sequential scan and the other
     possibility is to use the index. Next, the cost for the execution of
     each path is estimated and the cheapest path is chosen.  The cheapest
     path is expanded into a complete plan that the executor can use.
    </para>
   </step>

   <step>
    <para>
     The executor recursively steps through
     the <firstterm>plan tree</firstterm> and
     retrieves rows in the way represented by the plan.
     The executor makes use of the
     <firstterm>storage system</firstterm> while scanning
     relations, performs <firstterm>sorts</firstterm> and <firstterm>joins</firstterm>,
     evaluates <firstterm>qualifications</firstterm> and finally hands back the rows derived.
    </para>
   </step>
  </procedure>

  <para>
   In the following sections we will cover each of the above listed items
   in more detail to give a better understanding of <productname>PostgreSQL</productname>'s internal
   control and data structures.
  </para>
 </sect1>

 <sect1 id="connect-estab">
  <title>How Connections are Established</title>

  <para>
   <productname>PostgreSQL</productname> is implemented using a
   simple <quote>process per user</> client/server model.  In this model
   there is one <firstterm>client process</firstterm> connected to
   exactly one <firstterm>server process</firstterm>.  As we do not
   know ahead of time how many connections will be made, we have to
   use a <firstterm>master process</firstterm> that spawns a new
   server process every time a connection is requested. This master
   process is called <literal>postmaster</literal> and listens on a
   specified TCP/IP port for incoming connections. Whenever a request
   for a connection is detected, the <literal>postmaster</literal>
   process spawns a new server process called
   <literal>postgres</literal>. The server tasks
   (<literal>postgres</literal> processes) communicate with each
   other using <firstterm>semaphores</firstterm> and
   <firstterm>shared memory</firstterm> to ensure data integrity
   during concurrent data access.
  </para>

  <para>
   The client process can be any program that understands the
   <productname>PostgreSQL</productname> protocol described in
   <xref linkend="protocol">.  Many clients are based on the
   C-language library <application>libpq</>, but several independent
   implementations of the protocol exist, such as the Java
   <application>JDBC</> driver.
  </para>

  <para>
   Once a connection is established, the client process can send a query
   to the <firstterm>backend</firstterm> (server). The query is transmitted using plain text,
   i.e., there is no parsing done in the <firstterm>frontend</firstterm> (client). The
   server parses the query, creates an <firstterm>execution plan</firstterm>,
   executes the plan and returns the retrieved rows to the client
   by transmitting them over the established connection.
  </para>
 </sect1>

 <sect1 id="parser-stage">
  <title>The Parser Stage</title>

  <para>
   The <firstterm>parser stage</firstterm> consists of two parts:

   <itemizedlist>
    <listitem>
     <para>
      The <firstterm>parser</firstterm> defined in
      <filename>gram.y</filename> and <filename>scan.l</filename> is
      built using the Unix tools <application>yacc</application>
      and <application>lex</application>.
     </para>
    </listitem>
    <listitem>
     <para>
      The <firstterm>transformation process</firstterm> modifies
      and augments the data structures returned by the parser.
     </para>
    </listitem>
   </itemizedlist>
  </para>

  <sect2>
   <title>Parser</title>

   <para>
    The parser has to check the query string (which arrives as plain
    ASCII text) for valid syntax. If the syntax is correct, a
    <firstterm>parse tree</firstterm> is built up and handed back;
    otherwise an error is returned. The parser and lexer are
    implemented using the well-known Unix tools <application>yacc</>
    and <application>lex</>.
   </para>

   <para>
    The <firstterm>lexer</firstterm> is defined in the file
    <filename>scan.l</filename> and is responsible
    for recognizing <firstterm>identifiers</firstterm>,
    the <firstterm>SQL key words</firstterm>, etc. For
    every key word or identifier that is found, a <firstterm>token</firstterm>
    is generated and handed to the parser.
   </para>

   <para>
    The parser is defined in the file <filename>gram.y</filename> and
    consists of a set of <firstterm>grammar rules</firstterm> and
    <firstterm>actions</firstterm> that are executed whenever a rule
    is fired. The code of the actions (which is actually C code) is
    used to build up the parse tree.
   </para>

   <para>
    The file <filename>scan.l</filename> is transformed to the C
    source file <filename>scan.c</filename> using the program
    <application>lex</application> and <filename>gram.y</filename> is
    transformed to <filename>gram.c</filename> using
    <application>yacc</application>.  After these transformations
    have taken place, a normal C compiler can be used to create the
    parser. Never make any changes to the generated C files as they
    will be overwritten the next time <application>lex</application>
    or <application>yacc</application> is called.

    <note>
     <para>
      The mentioned transformations and compilations are normally done
      automatically using the <firstterm>makefiles</firstterm>
      shipped with the <productname>PostgreSQL</productname>
      source distribution.
     </para>
    </note>
   </para>

   <para>
    A detailed description of <application>yacc</application> or
    the grammar rules given in <filename>gram.y</filename> would be
    beyond the scope of this chapter. There are many books and
    documents dealing with <application>lex</application> and
    <application>yacc</application>. You should be familiar with
    <application>yacc</application> before you start to study the
    grammar given in <filename>gram.y</filename>; otherwise you won't
    understand what happens there.
   </para>
  </sect2>

  <sect2>
   <title>Transformation Process</title>

   <para>
    The parser stage creates a parse tree using only fixed rules about
    the syntactic structure of SQL.
    It does not make any lookups in the
    system catalogs, so it cannot determine the detailed
    semantics of the requested operations.  After the parser completes,
    the <firstterm>transformation process</firstterm> takes the tree handed
    back by the parser as input and does the semantic interpretation needed
    to understand which tables, functions, and operators are referenced by
    the query.  The data structure that is built to represent this
    information is called the <firstterm>query tree</>.
   </para>

   <para>
    The reason for separating raw parsing from semantic analysis is that
    system catalog lookups can only be done within a transaction, and we
    do not wish to start a transaction immediately upon receiving a query
    string.  The raw parsing stage is sufficient to identify the transaction
    control commands (<command>BEGIN</>, <command>ROLLBACK</>, etc.), and
    these can then be correctly executed without any further analysis.
    Once we know that we are dealing with an actual query (such as
    <command>SELECT</> or <command>UPDATE</>), it is okay to
    start a transaction if we're not already in one.  Only then can the
    transformation process be invoked.
   </para>

   <para>
    The query tree created by the transformation process is structurally
    similar to the raw parse tree in most places, but it has many differences
    in detail.  For example, a <structname>FuncCall</> node in the
    parse tree represents something that looks syntactically like a function
