📄 arch-dev.sgml

📁 关系型数据库 Postgresql 6.5.2
💻 SGML
📖 第 1 页 / 共 5 页
字号:
12 3 4 5 下一页
 <chapter id="overview">  <title>Overview of PostgreSQL Internals</title>  <note>   <title>Author</title>   <para>    This chapter originally appeared as a part of    <xref linkend="SIM98" endterm="SIM98">, Stefan Simkovics'    Master's Thesis prepared at Vienna University of Technology under the direction    of O.Univ.Prof.Dr. Georg Gottlob and Univ.Ass. Mag. Katrin Seyr.   </para>  </note>  <para>   This chapter gives an overview of the internal structure of the   backend of <productname>Postgres</productname>.   After having read the following sections you   should have an idea of how a query is processed. Don't expect a   detailed description here (I think such a description dealing with   all data structures and functions used within <productname>Postgres</productname>   would exceed 1000   pages!). This chapter is intended to help understanding the general   control and data flow within the backend from receiving a query to   sending the results.  </para>  <sect1>   <title>The Path of a Query</title>   <para>    Here we give a short overview of the stages a query has to pass in    order to obtain a result.   </para>   <procedure>    <step>     <para>      A connection from an application program to the <productname>Postgres</productname>      server has to be established. The application program transmits a      query to the server and receives the results sent back by the server.     </para>    </step>    <step>     <para>      The <firstterm>parser stage</firstterm> checks the query      transmitted by the application      program (client) for correct syntax and creates      a <firstterm>query tree</firstterm>.     </para>    </step>    <step>     <para>      The <firstterm>rewrite system</firstterm> takes      the query tree created by the parser stage and looks for      any <firstterm>rules</firstterm> (stored in the      <firstterm>system catalogs</firstterm>) to apply to       the <firstterm>querytree</firstterm> and performs the      transformations given in the <firstterm>rule bodies</firstterm>.      One application of the rewrite system is given in the realization of      <firstterm>views</firstterm>.     </para>     <para>      Whenever a query against a view      (i.e. a <firstterm>virtual table</firstterm>) is made,      the rewrite system rewrites the user's query to      a query that accesses the <firstterm>base tables</firstterm> given in      the <firstterm>view definition</firstterm> instead.     </para>    </step>    <step>     <para>      The <firstterm>planner/optimizer</firstterm> takes      the (rewritten) querytree and creates a       <firstterm>queryplan</firstterm> that will be the input to the      <firstterm>executor</firstterm>.     </para>     <para>      It does so by first creating all possible <firstterm>paths</firstterm>      leading to the same result. For example if there is an index on a      relation to be scanned, there are two paths for the      scan. One possibility is a simple sequential scan and the other      possibility is to use the index. Next the cost for the execution of      each plan is estimated and the      cheapest plan is chosen and handed back.     </para>    </step>    <step>     <para>      The executor recursively steps through      the <firstterm>plan tree</firstterm> and      retrieves tuples in the way represented by the plan.      The executor makes use of the      <firstterm>storage system</firstterm> while scanning      relations, performs <firstterm>sorts</firstterm> and <firstterm>joins</firstterm>,      evaluates <firstterm>qualifications</firstterm> and finally hands back the tuples derived.     </para>    </step>   </procedure>   <para>    In the following sections we will cover every of the above listed items    in more detail to give a better understanding on <productname>Postgres</productname>'s internal    control and data structures.   </para>  </sect1>  <sect1>   <title>How Connections are Established</title>   <para>    <productname>Postgres</productname> is implemented using a simple "process per-user"    client/server model.  In this model there is one <firstterm>client process</firstterm>    connected to exactly one <firstterm>server process</firstterm>.    As we don't know <foreignphrase>per se</foreignphrase>     how many connections will be made, we have to use a <firstterm>master process</firstterm>    that spawns a new server process every time a connection is    requested. This master process is called <literal>postmaster</literal> and     listens at a specified TCP/IP port for incoming connections. Whenever    a request for a connection is detected the <literal>postmaster</literal> process    spawns a new server process called <literal>postgres</literal>. The server    tasks (<literal>postgres</literal> processes) communicate with each other using    <firstterm>semaphores</firstterm> and <firstterm>shared memory</firstterm>    to ensure data integrity    throughout concurrent data access. Figure    \ref{connection} illustrates the interaction of the master process     <literal>postmaster</literal> the server process <literal>postgres</literal> and a client    application.    </para>   <para>    The client process can either be the <application>psql</application> frontend (for    interactive SQL queries) or any user application implemented using    the <filename>libpg</filename> library. Note that applications implemented using    <application>ecpg</application>    (the <productname>Postgres</productname> embedded SQL preprocessor for C)    also use this library.   </para>   <para>    Once a connection is established the client process can send a query    to the <firstterm>backend</firstterm> (server). The query is transmitted using plain text,    i.e. there is no parsing done in the <firstterm>frontend</firstterm> (client). The    server parses the query, creates an <firstterm>execution plan</firstterm>,    executes the plan and returns the retrieved tuples to the client    by transmitting them over the established connection.   </para><!--\begin{figure}[ht]\begin{center}\epsfig{figure=figures/connection.ps}\caption{How a connection is established}\label{connection}\end{center}\end{figure}-->  </sect1>  <sect1>   <title>The Parser Stage</title>   <para>    The <firstterm>parser stage</firstterm> consists of two parts:    <itemizedlist>     <listitem>      <para>       The <firstterm>parser</firstterm> defined in       <filename>gram.y</filename> and <filename>scan.l</filename> is       built using the UNIX tools <application>yacc</application>       and <application>lex</application>.      </para>     </listitem>     <listitem>      <para>       The <firstterm>transformation process</firstterm> does       modifications and augmentations to the data structures returned by the parser.      </para>     </listitem>    </itemizedlist>   </para>   <sect2>    <title>Parser</title>    <para>     The parser has to check the query string (which arrives as     plain ASCII text) for valid syntax. If the syntax is correct a     <firstterm>parse tree</firstterm> is built up and handed back otherwise an error is     returned. For the implementation the well known UNIX     tools <application>lex</application> and <application>yacc</application>     are used.    </para>    <para>     The <firstterm>lexer</firstterm> is defined in the file     <filename>scan.l</filename> and is responsible     for recognizing <firstterm>identifiers</firstterm>,     the <firstterm>SQL keywords</firstterm> etc. For     every keyword or identifier that is found, a <firstterm>token</firstterm>     is generated and handed to the parser.    </para>    <para>     The parser is defined in the file <filename>gram.y</filename> and consists of a     set of <firstterm>grammar rules</firstterm> and <firstterm>actions</firstterm>     that are executed     whenever a rule is fired. The code of the actions (which     is actually C-code) is used to build up the parse tree.    </para>    <para>     The file <filename>scan.l</filename> is transformed to     the C-source file <filename>scan.c</filename>     using the program <application>lex</application>     and <filename>gram.y</filename> is transformed to     <filename>gram.c</filename> using <application>yacc</application>.     After these transformations have taken     place a normal C-compiler can be used to create the     parser. Never make any changes to the generated C-files as they will     be overwritten the next time <application>lex</application>     or <application>yacc</application> is called.     <note>      <para>       The mentioned transformations and compilations are normally done       automatically using the <firstterm>makefiles</firstterm>       shipped with the <productname>Postgres</productname>       source distribution.      </para>     </note>    </para>    <para>     A detailed description of <application>yacc</application> or     the grammar rules given in <filename>gram.y</filename> would be     beyond the scope of this paper. There are many books and     documents dealing with <application>lex</application> and     <application>yacc</application>. You should be familiar with     <application>yacc</application> before you start to study the     grammar given in <filename>gram.y</filename> otherwise you won't     understand what happens there.    </para>    <para>     For a better understanding of the data structures used in     <productname>Postgres</productname>     for the processing of a query we use an example to illustrate the     changes made to these data structures in every stage.    </para>    <example id="simple-select">     <title>A Simple Select</title>     <para>      This example contains the following simple query that will be used in      various descriptions and figures throughout the following      sections. The query assumes that the tables given in      <citetitle>The Supplier Database</citetitle>      <!--      XXX The above citetitle should really be an xref,      but that part has not yet been converted from Stefan's original document.      - thomas 1999-02-11      <xref linkend="supplier" endterm="supplier">      -->      have already been defined.      <programlisting>select s.sname, se.pno    from supplier s, sells se    where s.sno > 2 and s.sno = se.sno;      </programlisting>     </para>    </example>    <para>     Figure \ref{parsetree} shows the <firstterm>parse tree</firstterm> built by the     grammar rules and actions given in <filename>gram.y</filename> for the query     given in  <xref linkend="simple-select" endterm="simple-select">     (without the <firstterm>operator tree</firstterm> for     the <firstterm>where clause</firstterm> which is shown in figure \ref{where_clause}     because there was not enough space to show both data structures in one     figure).    </para>    <para>     The top node of the tree is a <literal>SelectStmt</literal> node. For every entry     appearing in the <firstterm>from clause</firstterm> of the SQL query a <literal>RangeVar</literal>     node is created holding the name of the <firstterm>alias</firstterm> and a pointer to a     <literal>RelExpr</literal> node holding the name of the <firstterm>relation</firstterm>. All     <literal>RangeVar</literal> nodes are collected in a list which is attached to the field     <literal>fromClause</literal> of the <literal>SelectStmt</literal> node.    </para>    <para>     For every entry appearing in the <firstterm>select list</firstterm> of the SQL query a     <literal>ResTarget</literal> node is created holding a pointer to an <literal>Attr</literal>     node. The <literal>Attr</literal> node holds the <firstterm>relation name</firstterm> of the entry and     a pointer to a <literal>Value</literal> node holding the name of the     <firstterm>attribute</firstterm>.     All <literal>ResTarget</literal> nodes are collected to a list which is     connected to the field <literal>targetList</literal> of the <literal>SelectStmt</literal> node.    </para>    <para>     Figure \ref{where_clause} shows the operator tree built for the     where clause of the SQL query given in example     <xref linkend="simple-select" endterm="simple-select">     which is attached to the field     <literal>qual</literal> of the <literal>SelectStmt</literal> node. The top node of the     operator tree is an <literal>A_Expr</literal> node representing an <literal>AND</literal>     operation. This node has two successors called <literal>lexpr</literal> and     <literal>rexpr</literal> pointing to two <firstterm>subtrees</firstterm>. The subtree attached to     <literal>lexpr</literal> represents the qualification <literal>s.sno &gt; 2</literal> and the one     attached to <literal>rexpr</literal> represents <literal>s.sno = se.sno</literal>. For every     attribute an <literal>Attr</literal> node is created holding the name of the     relation and a pointer to a <literal>Value</literal> node holding the name of the     attribute. For the constant term appearing in the query a     <literal>Const</literal> node is created holding the value.    </para><!--XXX merge in the figures later... - thomas 1999-01-29\begin{figure}[ht]\begin{center}\epsfig{figure=figures/parsetree.ps}\caption{{\it TargetList} and {\it FromList} for query of example \ref{simple_select}}\label{parsetree}\end{center}\end{figure}\begin{figure}[ht]\begin{center}\epsfig{figure=figures/where_clause.ps}\caption{{\it WhereClause} for query of example \ref{simple_select}}\label{where_clause}\end{center}
12 3 4 5 下一页
💿 文件大小 6496 K
👤 上传用户 huajunun
📂 所属分类数据库系统
🏷️ 相关标签

#Postgresql #数据库
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -