📄 sql.sgml
字号:
<!--$PostgreSQL: pgsql/doc/src/sgml/sql.sgml,v 1.38 2005/08/01 20:31:05 tgl Exp $--> <chapter id="sql-intro"> <title>SQL</title> <abstract> <para> This chapter introduces the mathematical concepts behind relational databases. It is not required reading, so if you bog down or want to get straight to some simple examples feel free to jump ahead to the next chapter and come back when you have more time and patience. This stuff is supposed to be fun! </para> <para> This material originally appeared as a part of Stefan Simkovics' Master's Thesis (<xref linkend="SIM98" endterm="SIM98">). </para> </abstract> <para> <acronym>SQL</acronym> has become the most popular relational query language. The name <quote><acronym>SQL</acronym></quote> is an abbreviation for <firstterm>Structured Query Language</firstterm>. In 1974 Donald Chamberlin and others defined the language SEQUEL (<firstterm>Structured English Query Language</firstterm>) at IBM Research. This language was first implemented in an IBM prototype called SEQUEL-XRM in 1974-75. In 1976-77 a revised version of SEQUEL called SEQUEL/2 was defined and the name was changed to <acronym>SQL</acronym> subsequently. </para> <para> A new prototype called System R was developed by IBM in 1977. System R implemented a large subset of SEQUEL/2 (now <acronym>SQL</acronym>) and a number of changes were made to <acronym>SQL</acronym> during the project. System R was installed in a number of user sites, both internal IBM sites and also some selected customer sites. Thanks to the success and acceptance of System R at those user sites IBM started to develop commercial products that implemented the <acronym>SQL</acronym> language based on the System R technology. </para> <para> Over the next years IBM and also a number of other vendors announced <acronym>SQL</acronym> products such as <productname>SQL/DS</productname> (IBM), <productname>DB2</productname> (IBM), <productname>ORACLE</productname> (Oracle Corp.), <productname>DG/SQL</productname> (Data General Corp.), and <productname>SYBASE</productname> (Sybase Inc.). </para> <para> <acronym>SQL</acronym> is also an official standard now. In 1982 the American National Standards Institute (<acronym>ANSI</acronym>) chartered its Database Committee X3H2 to develop a proposal for a standard relational language. This proposal was ratified in 1986 and consisted essentially of the IBM dialect of <acronym>SQL</acronym>. In 1987 this <acronym>ANSI</acronym> standard was also accepted as an international standard by the International Organization for Standardization (<acronym>ISO</acronym>). This original standard version of <acronym>SQL</acronym> is often referred to, informally, as <quote><abbrev>SQL/86</abbrev></quote>. In 1989 the original standard was extended and this new standard is often, again informally, referred to as <quote><abbrev>SQL/89</abbrev></quote>. Also in 1989, a related standard called <firstterm>Database Language Embedded <acronym>SQL</acronym></firstterm> (<acronym>ESQL</acronym>) was developed. </para> <para> The <acronym>ISO</acronym> and <acronym>ANSI</acronym> committees have been working for many years on the definition of a greatly expanded version of the original standard, referred to informally as <firstterm><acronym>SQL2</acronym></firstterm> or <firstterm><acronym>SQL/92</acronym></firstterm>. This version became a ratified standard - <quote>International Standard ISO/IEC 9075:1992, Database Language <acronym>SQL</acronym></quote> - in late 1992. <acronym>SQL/92</acronym> is the version normally meant when people refer to <quote>the <acronym>SQL</acronym> standard</quote>. A detailed description of <acronym>SQL/92</acronym> is given in <xref linkend="DATE97" endterm="DATE97">. At the time of writing this document a new standard informally referred to as <firstterm><acronym>SQL3</acronym></firstterm> is under development. It is planned to make <acronym>SQL</acronym> a Turing-complete language, i.e. all computable queries (e.g. recursive queries) will be possible. This has now been completed as SQL:2003. </para> <sect1 id="rel-model"> <title>The Relational Data Model</title> <para> As mentioned before, <acronym>SQL</acronym> is a relational language. That means it is based on the <firstterm>relational data model</firstterm> first published by E.F. Codd in 1970. We will give a formal description of the relational model later (in <xref linkend="formal-notion" endterm="formal-notion">) but first we want to have a look at it from a more intuitive point of view. </para> <para> A <firstterm>relational database</firstterm> is a database that is perceived by its users as a <firstterm>collection of tables</firstterm> (and nothing else but tables). A table consists of rows and columns where each row represents a record and each column represents an attribute of the records contained in the table. <xref linkend="supplier-fig" endterm="supplier-fig"> shows an example of a database consisting of three tables: <itemizedlist> <listitem> <para> SUPPLIER is a table storing the number (SNO), the name (SNAME) and the city (CITY) of a supplier. </para> </listitem> <listitem> <para> PART is a table storing the number (PNO) the name (PNAME) and the price (PRICE) of a part. </para> </listitem> <listitem> <para> SELLS stores information about which part (PNO) is sold by which supplier (SNO). It serves in a sense to connect the other two tables together. </para> </listitem> </itemizedlist> <example> <title id="supplier-fig">The Suppliers and Parts Database</title> <programlisting>SUPPLIER: SELLS: SNO | SNAME | CITY SNO | PNO----+---------+-------- -----+----- 1 | Smith | London 1 | 1 2 | Jones | Paris 1 | 2 3 | Adams | Vienna 2 | 4 4 | Blake | Rome 3 | 1 3 | 3 4 | 2PART: 4 | 3 PNO | PNAME | PRICE 4 | 4----+---------+--------- 1 | Screw | 10 2 | Nut | 8 3 | Bolt | 15 4 | Cam | 25 </programlisting> </example> </para> <para> The tables PART and SUPPLIER may be regarded as <firstterm>entities</firstterm> and SELLS may be regarded as a <firstterm>relationship</firstterm> between a particular part and a particular supplier. </para> <para> As we will see later, <acronym>SQL</acronym> operates on tables like the ones just defined but before that we will study the theory of the relational model. </para> </sect1> <sect1 id="relmodel-formal"> <title id="formal-notion">Relational Data Model Formalities</title> <para> The mathematical concept underlying the relational model is the set-theoretic <firstterm>relation</firstterm> which is a subset of the Cartesian product of a list of domains. This set-theoretic relation gives the model its name (do not confuse it with the relationship from the <firstterm>Entity-Relationship model</firstterm>). Formally a domain is simply a set of values. For example the set of integers is a domain. Also the set of character strings of length 20 and the real numbers are examples of domains. </para> <para><!--\begin{definition}The <firstterm>Cartesian product</firstterm> of domains $D_{1}, D_{2},\ldots, D_{k}$ written\mbox{$D_{1} \times D_{2} \times \ldots \times D_{k}$} is the set ofall $k$-tuples $(v_{1},v_{2},\ldots,v_{k})$ such that \mbox{$v_{1} \inD_{1}, v_{2} \in D_{2}, \ldots, v_{k} \in D_{k}$}.\end{definition}--> The <firstterm>Cartesian product</firstterm> of domains <parameter>D<subscript>1</subscript></parameter>, <parameter>D<subscript>2</subscript></parameter>, ... <parameter>D<subscript>k</subscript></parameter>, written <parameter>D<subscript>1</subscript></parameter> × <parameter>D<subscript>2</subscript></parameter> × ... × <parameter>D<subscript>k</subscript></parameter> is the set of all k-tuples <parameter>v<subscript>1</subscript></parameter>, <parameter>v<subscript>2</subscript></parameter>, ... <parameter>v<subscript>k</subscript></parameter>, such that <parameter>v<subscript>1</subscript></parameter> ∈ <parameter>D<subscript>1</subscript></parameter>, <parameter>v<subscript>2</subscript></parameter> ∈ <parameter>D<subscript>2</subscript></parameter>, ... <parameter>v<subscript>k</subscript></parameter> ∈ <parameter>D<subscript>k</subscript></parameter>. </para> <para> For example, when we have<!-- $k=2$, $D_{1}=\{0,1\}$ and$D_{2}=\{a,b,c\}$, then $D_{1} \times D_{2}$ is$\{(0,a),(0,b),(0,c),(1,a),(1,b),(1,c)\}$.--> <parameter>k</parameter>=2, <parameter>D<subscript>1</subscript></parameter>=<literal>{0,1}</literal> and <parameter>D<subscript>2</subscript></parameter>=<literal>{a,b,c}</literal> then <parameter>D<subscript>1</subscript></parameter> × <parameter>D<subscript>2</subscript></parameter> is <literal>{(0,a),(0,b),(0,c),(1,a),(1,b),(1,c)}</literal>. </para> <para><!--\begin{definition}A Relation is any subset of the Cartesian product of one or moredomains: $R \subseteq$ \mbox{$D_{1} \times D_{2} \times \ldots \times D_{k}$}\end{definition}--> A Relation is any subset of the Cartesian product of one or more domains: <parameter>R</parameter> ⊆ <parameter>D<subscript>1</subscript></parameter> × <parameter>D<subscript>2</subscript></parameter> × ... × <parameter>D<subscript>k</subscript></parameter>. </para> <para> For example <literal>{(0,a),(0,b),(1,a)}</literal> is a relation; it is in fact a subset of <parameter>D<subscript>1</subscript></parameter> × <parameter>D<subscript>2</subscript></parameter> mentioned above. </para> <para> The members of a relation are called tuples. Each relation of some Cartesian product <parameter>D<subscript>1</subscript></parameter> × <parameter>D<subscript>2</subscript></parameter> × ... × <parameter>D<subscript>k</subscript></parameter> is said to have arity <literal>k</literal> and is therefore a set of <literal>k</literal>-tuples. </para> <para> A relation can be viewed as a table (as we already did, remember <xref linkend="supplier-fig" endterm="supplier-fig"> where every tuple is represented by a row and every column corresponds to one component of a tuple. Giving names (called attributes) to the columns leads to the definition of a <firstterm>relation scheme</firstterm>. </para> <para><!--\begin{definition}A {\it relation scheme} $R$ is a finite set of attributes\mbox{$\{A_{1},A_{2},\ldots,A_{k}\}$}. There is a domain $D_{i}$ foreach attribute $A_{i}, 1 \le i \le k$ where the values of theattributes are taken from. We often write a relation scheme as\mbox{$R(A_{1},A_{2},\ldots,A_{k})$}.\end{definition}--> A <firstterm>relation scheme</firstterm> <literal>R</literal> is a finite set of attributes <parameter>A<subscript>1</subscript></parameter>, <parameter>A<subscript>2</subscript></parameter>, ... <parameter>A<subscript>k</subscript></parameter>. There is a domain <parameter>D<subscript>i</subscript></parameter>, for each attribute <parameter>A<subscript>i</subscript></parameter>, 1 <= <literal>i</literal> <= <literal>k</literal>, where the values of the attributes are taken from. We often write a relation scheme as <literal>R(<parameter>A<subscript>1</subscript></parameter>, <parameter>A<subscript>2</subscript></parameter>, ... <parameter>A<subscript>k</subscript></parameter>)</literal>. <note> <para> A <firstterm>relation scheme</firstterm> is just a kind of template whereas a <firstterm>relation</firstterm> is an instance of a <firstterm>relation scheme</firstterm>. The relation consists of tuples (and can therefore be viewed as a table); not so the relation scheme. </para> </note> </para> <sect2> <title id="domains">Domains vs. Data Types</title> <para> We often talked about <firstterm>domains</firstterm> in the last section. Recall that a domain is, formally, just a set of values (e.g., the set of integers or the real numbers). In terms of database systems we often talk of <firstterm>data types</firstterm> instead of domains. When we define a table we have to make a decision about which attributes to include. Additionally we have to decide which kind of data is going to be stored as attribute values. For example the values of <classname>SNAME</classname> from the table <classname>SUPPLIER</classname> will be character strings, whereas <classname>SNO</classname> will store integers. We define this by assigning a data type to each attribute. The type of <classname>SNAME</classname> will be <type>VARCHAR(20)</type> (this is the <acronym>SQL</acronym> type for character strings of length <= 20), the type of <classname>SNO</classname> will be <type>INTEGER</type>. With the assignment of a data type we also have selected a domain for an attribute. The domain of <classname>SNAME</classname> is the set of all character strings of length <= 20, the domain of <classname>SNO</classname> is the set of all integer numbers. </para> </sect2> </sect1> <sect1 id="relmodel-oper"> <title id="operations">Operations in the Relational Data Model</title> <para> In the previous section (<xref linkend="formal-notion" endterm="formal-notion">) we defined the mathematical notion of the relational model. Now we know how the data can be stored using a relational data model but we do not know what to do with all these tables to retrieve something from the database yet. For example somebody could ask for the names of all suppliers that sell the part 'Screw'. Therefore two rather different kinds of notations for expressing operations on relations have been defined: <itemizedlist> <listitem> <para> The <firstterm>Relational Algebra</firstterm> which is an algebraic notation, where queries are expressed by applying specialized operators to the relations. </para> </listitem> <listitem> <para> The <firstterm>Relational Calculus</firstterm> which is a logical notation, where queries are expressed by formulating some logical restrictions that the tuples in the answer must satisfy. </para> </listitem> </itemizedlist> </para> <sect2> <title id="rel-alg">Relational Algebra</title> <para> The <firstterm>Relational Algebra</firstterm> was introduced by E. F. Codd in 1972. It consists of a set of operations on relations:
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -