📄 copy.sgml

📁 PostgreSQL 8.3.4 source code, open-source database
    The specified null string is sent by <command>COPY TO</command> without
    adding any backslashes; conversely, <command>COPY FROM</command> matches
    the input against the null string before removing backslashes.  Therefore,
    a null string such as <literal>\N</literal> cannot be confused with
    the actual data value <literal>\N</literal> (which would be represented
    as <literal>\\N</literal>).
   </para>

   <para>
    The following special backslash sequences are recognized by
    <command>COPY FROM</command>:

   <informaltable>
    <tgroup cols="2">
     <thead>
      <row>
       <entry>Sequence</entry>
       <entry>Represents</entry>
      </row>
     </thead>

     <tbody>
      <row>
       <entry><literal>\b</></entry>
       <entry>Backspace (ASCII 8)</entry>
      </row>
      <row>
       <entry><literal>\f</></entry>
       <entry>Form feed (ASCII 12)</entry>
      </row>
      <row>
       <entry><literal>\n</></entry>
       <entry>Newline (ASCII 10)</entry>
      </row>
      <row>
       <entry><literal>\r</></entry>
       <entry>Carriage return (ASCII 13)</entry>
      </row>
      <row>
       <entry><literal>\t</></entry>
       <entry>Tab (ASCII 9)</entry>
      </row>
      <row>
       <entry><literal>\v</></entry>
       <entry>Vertical tab (ASCII 11)</entry>
      </row>
      <row>
       <entry><literal>\</><replaceable>digits</></entry>
       <entry>Backslash followed by one to three octal digits specifies
       the character with that numeric code</entry>
      </row>
      <row>
       <entry><literal>\x</><replaceable>digits</></entry>
       <entry>Backslash <literal>x</> followed by one or two hex digits specifies
       the character with that numeric code</entry>
      </row>
     </tbody>
    </tgroup>
   </informaltable>

    Presently, <command>COPY TO</command> will never emit an octal or
    hex-digits backslash sequence, but it does use the other sequences
    listed above for those control characters.
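    As an illustration only, the decoding rules above could be sketched in
    Python roughly as follows. This is not the server's implementation: the
    function name <literal>decode_copy_text_field</literal> and its parameters
    are hypothetical, numeric codes are mapped directly to code points
    (encoding handling is ignored), and a backslashed character that is not in
    the table is assumed to represent itself.

```python
import re

# Single-letter escapes from the table of backslash sequences.
_SIMPLE = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v"}

# A backslash followed by 1-3 octal digits, by x plus 1-2 hex digits,
# or by any other single character (including a literal newline).
_ESCAPE = re.compile(r"\\(?:([0-7]{1,3})|x([0-9A-Fa-f]{1,2})|(.))", re.DOTALL)


def decode_copy_text_field(raw, null_string="\\N"):
    """Decode one text-format field; return None for the null string.

    Hypothetical helper: a sketch of the documented rules, not server code.
    """
    # The null string is compared before any backslashes are removed.
    if raw == null_string:
        return None

    def replace(match):
        octal, hexa, other = match.groups()
        if octal is not None:
            return chr(int(octal, 8))
        if hexa is not None:
            return chr(int(hexa, 16))
        # Listed letters become control characters; any other
        # backslashed character is assumed to represent itself.
        return _SIMPLE.get(other, other)

    return _ESCAPE.sub(replace, raw)
```

    With this sketch, the raw field <literal>\N</literal> decodes to a null
    value, while <literal>\\N</literal> decodes to the two-character data
    value <literal>\N</literal>.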
   </para>

   <para>
    Any other backslashed character that is not mentioned in the above table
    will be taken to represent itself.  However, beware of adding backslashes
    unnecessarily, since that might accidentally produce a string matching the
    end-of-data marker (<literal>\.</>) or the null string (<literal>\N</> by
    default).  These strings will be recognized before any other backslash
    processing is done.
   </para>

   <para>
    It is strongly recommended that applications generating <command>COPY</command> data convert
    data newlines and carriage returns to the <literal>\n</> and
    <literal>\r</> sequences respectively.  At present it is
    possible to represent a data carriage return by a backslash and carriage
    return, and to represent a data newline by a backslash and newline.
    However, these representations might not be accepted in future releases.
    They are also highly vulnerable to corruption if the <command>COPY</command> file is
    transferred across different machines (for example, from Unix to Windows
    or vice versa).
   </para>

   <para>
    <command>COPY TO</command> will terminate each row with a Unix-style
    newline (<quote><literal>\n</></>).  Servers running on Microsoft Windows instead
    output carriage return/newline (<quote><literal>\r\n</></>), but only for
    <command>COPY</> to a server file; for consistency across platforms,
    <command>COPY TO STDOUT</> always sends <quote><literal>\n</></>
    regardless of server platform.
    <command>COPY FROM</command> can handle lines ending with newlines,
    carriage returns, or carriage return/newlines.  To reduce the risk of
    error due to un-backslashed newlines or carriage returns that were
    meant as data, <command>COPY FROM</command> will complain if the line
    endings in the input are not all alike.
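    An application that generates text-format data might follow these
    recommendations along the lines of the Python sketch below. The function
    name <literal>encode_copy_text_row</literal> and its parameters are
    hypothetical, and the sketch assumes a single-character delimiter that is
    not a backslash, letter, or digit. It escapes backslashes first, writes
    data newlines and carriage returns as <literal>\n</literal> and
    <literal>\r</literal>, and ends every row with a Unix-style newline so
    that all line endings are alike.

```python
def encode_copy_text_row(values, delimiter="\t", null_string="\\N"):
    """Render one row of text-format COPY data (hypothetical helper)."""
    fields = []
    for value in values:
        if value is None:
            # The null string is written as-is, without added backslashes.
            fields.append(null_string)
            continue
        # Escape backslashes first so later escapes are not doubled.
        text = str(value).replace("\\", "\\\\")
        # Data newlines and carriage returns become \n and \r sequences.
        text = text.replace("\n", "\\n").replace("\r", "\\r")
        # Tabs are written as \t; any other delimiter gets a backslash prefix.
        text = text.replace("\t", "\\t")
        if delimiter != "\t":
            text = text.replace(delimiter, "\\" + delimiter)
        fields.append(text)
    # Every row ends with a Unix-style newline, so line endings stay uniform.
    return delimiter.join(fields) + "\n"
```

    For example, <literal>encode_copy_text_row(["AF", None])</literal> yields
    the field <literal>AF</literal>, a tab, the null string
    <literal>\N</literal>, and a terminating newline.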
   </para>
  </refsect2>

  <refsect2>
   <title>CSV Format</title>

   <para>
    This format is used for importing and exporting the Comma
    Separated Value (<literal>CSV</>) file format used by many other
    programs, such as spreadsheets. Instead of the escaping used by
    <productname>PostgreSQL</productname>'s standard text mode, it
    produces and recognizes the common CSV escaping mechanism.
   </para>

   <para>
    The values in each record are separated by the <literal>DELIMITER</>
    character. If the value contains the delimiter character, the
    <literal>QUOTE</> character, the <literal>NULL</> string, a carriage
    return, or line feed character, then the whole value is prefixed and
    suffixed by the <literal>QUOTE</> character, and any occurrence
    within the value of a <literal>QUOTE</> character or the
    <literal>ESCAPE</> character is preceded by the escape character.
    You can also use <literal>FORCE QUOTE</> to force quotes when outputting
    non-<literal>NULL</> values in specific columns.
   </para>

   <para>
    The <literal>CSV</> format has no standard way to distinguish a
    <literal>NULL</> value from an empty string.
    <productname>PostgreSQL</>'s <command>COPY</> handles this by
    quoting. A <literal>NULL</> is output as the <literal>NULL</>
    string and is not quoted, while a data value matching the
    <literal>NULL</> string is quoted. Therefore, using the default
    settings, a <literal>NULL</> is written as an unquoted empty
    string, while an empty string is written with double quotes
    (<literal>""</>). Reading values follows similar rules. You can
    use <literal>FORCE NOT NULL</> to prevent <literal>NULL</> input
    comparisons for specific columns.
   </para>

   <para>
    Because backslash is not a special character in the <literal>CSV</>
    format, <literal>\.</>, the end-of-data marker, could also appear
    as a data value.
    To avoid any misinterpretation, a <literal>\.</>
    data value appearing as a lone entry on a line is automatically
    quoted on output, and on input, if quoted, is not interpreted as the
    end-of-data marker.  If you are loading a file created by another
    application that has a single unquoted column and might have a
    value of <literal>\.</>, you might need to quote that value in the
    input file.
   </para>

   <note>
    <para>
     In <literal>CSV</> mode, all characters are significant. A quoted value
     surrounded by white space, or any characters other than
     <literal>DELIMITER</>, will include those characters. This can cause
     errors if you import data from a system that pads <literal>CSV</>
     lines with white space out to some fixed width. If such a situation
     arises you might need to preprocess the <literal>CSV</> file to remove
     the trailing white space, before importing the data into
     <productname>PostgreSQL</>.
    </para>
   </note>

   <note>
    <para>
     CSV mode will both recognize and produce CSV files with quoted
     values containing embedded carriage returns and line feeds. Thus
     the files are not strictly one line per table row like text-mode
     files.
    </para>
   </note>

   <note>
    <para>
     Many programs produce strange and occasionally perverse CSV files,
     so the file format is more a convention than a standard. Thus you
     might encounter some files that cannot be imported using this
     mechanism, and <command>COPY</> might produce files that other
     programs cannot process.
    </para>
   </note>

  </refsect2>

  <refsect2>
   <title>Binary Format</title>

   <para>
    The file format used for <command>COPY BINARY</command> changed in
    <productname>PostgreSQL</productname> 7.4. The new format consists
    of a file header, zero or more tuples containing the row data, and
    a file trailer. Headers and data are now in network byte order.
   </para>

   <refsect3>
    <title>File Header</title>

    <para>
     The file header consists of 15 bytes of fixed fields, followed
     by a variable-length header extension area.  The fixed fields are:

    <variablelist>
     <varlistentry>
      <term>Signature</term>
      <listitem>
       <para>
11-byte sequence <literal>PGCOPY\n\377\r\n\0</> &mdash; note that the zero byte
is a required part of the signature.  (The signature is designed to allow
easy identification of files that have been munged by a non-8-bit-clean
transfer.  This signature will be changed by end-of-line-translation
filters, dropped zero bytes, dropped high bits, or parity changes.)
       </para>
      </listitem>
     </varlistentry>

     <varlistentry>
      <term>Flags field</term>
      <listitem>
       <para>
32-bit integer bit mask to denote important aspects of the file format. Bits
are numbered from 0 (<acronym>LSB</>) to 31 (<acronym>MSB</>).  Note that
this field is stored in network byte order (most significant byte first),
as are all the integer fields used in the file format.  Bits
16-31 are reserved to denote critical file format issues; a reader
should abort if it finds an unexpected bit set in this range. Bits 0-15
are reserved to signal backwards-compatible format issues; a reader
should simply ignore any unexpected bits set in this range. Currently
only one flag bit is defined, and the rest must be zero:
        <variablelist>
         <varlistentry>
          <term>Bit 16</term>
          <listitem>
           <para>
            if 1, OIDs are included in the data; if 0, not
           </para>
          </listitem>
         </varlistentry>
        </variablelist>
       </para>
      </listitem>
     </varlistentry>

     <varlistentry>
      <term>Header extension area length</term>
      <listitem>
       <para>
32-bit integer, length in bytes of remainder of header, not including self.
Currently, this is zero, and the first tuple follows
immediately.
Future changes to the format might allow additional data
to be present in the header.  A reader should silently skip over any header
extension data it does not know what to do with.
       </para>
      </listitem>
     </varlistentry>
    </variablelist>
    </para>

    <para>
The header extension area is envisioned to contain a sequence of
self-identifying chunks.  The flags field is not intended to tell readers
what is in the extension area.  Specific design of header extension contents
is left for a later release.
    </para>

    <para>
     This design allows for both backwards-compatible header additions (add
     header extension chunks, or set low-order flag bits) and
     non-backwards-compatible changes (set high-order flag bits to signal such
     changes, and add supporting data to the extension area if needed).
    </para>
   </refsect3>

   <refsect3>
    <title>Tuples</title>
    <para>
Each tuple begins with a 16-bit integer count of the number of fields in the
tuple.  (Presently, all tuples in a table will have the same count, but that
might not always be true.)  Then, repeated for each field in the tuple, there
is a 32-bit length word followed by that many bytes of field data.  (The
length word does not include itself, and can be zero.)  As a special case,
-1 indicates a NULL field value.  No value bytes follow in the NULL case.
    </para>

    <para>
There is no alignment padding or any other extra data between fields.
    </para>

    <para>
Presently, all data values in a <command>COPY BINARY</command> file are
assumed to be in binary format (format code one).  It is anticipated that a
future extension might add a header field that allows per-column format codes
to be specified.
    </para>

    <para>
To determine the appropriate binary format for the actual tuple data you
should consult the <productname>PostgreSQL</productname> source, in
particular the <function>*send</> and <function>*recv</> functions for
each column's data type (typically these functions are found in the
<filename>src/backend/utils/adt/</filename> directory of the source
distribution).
    </para>

    <para>
If OIDs are included in the file, the OID field immediately follows the
field-count word.  It is a normal field except that it's not included
in the field-count.  In particular it has a length word &mdash; this will allow
handling of 4-byte vs. 8-byte OIDs without too much pain, and will allow
OIDs to be shown as null if that ever proves desirable.
    </para>
   </refsect3>

   <refsect3>
    <title>File Trailer</title>

    <para>
     The file trailer consists of a 16-bit integer word containing -1.  This
     is easily distinguished from a tuple's field-count word.
    </para>

    <para>
     A reader should report an error if a field-count word is neither -1
     nor the expected number of columns.  This provides an extra
     check against somehow getting out of sync with the data.
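     To make the layout concrete, a reader for this format could be sketched
     in Python roughly as follows. This is a minimal illustration under
     stated assumptions, not the server's code: the names
     <literal>read_copy_binary</literal>, <literal>read_field</literal>, and
     <literal>read_exact</literal> are hypothetical, field values are returned
     as raw bytes without interpreting each type's binary representation, and
     any OID field is read and then discarded.

```python
import struct

SIGNATURE = b"PGCOPY\n\xff\r\n\x00"  # 11 bytes, including the required zero byte


def read_exact(stream, count):
    """Read exactly count bytes or fail (hypothetical helper)."""
    data = stream.read(count)
    if len(data) != count:
        raise ValueError("unexpected end of COPY BINARY data")
    return data


def read_field(stream):
    """Read a 32-bit length word and that many bytes; -1 means NULL."""
    (length,) = struct.unpack("!i", read_exact(stream, 4))
    if length == -1:
        return None
    if -1 > length:
        raise ValueError("invalid field length")
    return read_exact(stream, length)


def read_copy_binary(stream, expected_columns):
    """Return a list of rows, each a list of bytes objects or None."""
    if read_exact(stream, 11) != SIGNATURE:
        raise ValueError("bad signature; file may have been damaged in transfer")
    # All integers are in network byte order ("!" in struct notation).
    flags, extension_length = struct.unpack("!II", read_exact(stream, 8))
    critical = flags >> 16            # bits 16-31; bits 0-15 are ignored
    has_oids = critical % 2 == 1      # bit 16
    if critical >> 1:                 # any other critical bit is unexpected
        raise ValueError("unknown critical flag bits set")
    read_exact(stream, extension_length)  # silently skip the header extension

    rows = []
    while True:
        (count,) = struct.unpack("!h", read_exact(stream, 2))
        if count == -1:               # file trailer
            return rows
        if count != expected_columns:
            raise ValueError("unexpected field count; reader is out of sync")
        if has_oids:
            read_field(stream)        # OID field, not included in the count
        rows.append([read_field(stream) for _ in range(count)])
```

     The sketch reads the signature, the flags field, and the header
     extension length as 19 bytes in total before the first tuple, which is
     an assumption based on the field sizes listed in the File Header section.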
    </para>
   </refsect3>
  </refsect2>
 </refsect1>

 <refsect1>
  <title>Examples</title>

  <para>
   The following example copies a table to the client
   using the vertical bar (<literal>|</literal>) as the field delimiter:
<programlisting>
COPY country TO STDOUT WITH DELIMITER '|';
</programlisting>
  </para>

  <para>
   To copy data from a file into the <literal>country</> table:
<programlisting>
COPY country FROM '/usr1/proj/bray/sql/country_data';
</programlisting>
  </para>

  <para>
   To copy into a file just the countries whose names start with 'A':
<programlisting>
COPY (SELECT * FROM country WHERE country_name LIKE 'A%') TO '/usr1/proj/bray/sql/a_list_countries.copy';
</programlisting>
  </para>

  <para>
   Here is a sample of data suitable for copying into a table from
   <literal>STDIN</literal>:
<programlisting>
AF      AFGHANISTAN
AL      ALBANIA
DZ      ALGERIA
ZM      ZAMBIA
ZW      ZIMBABWE
</programlisting>
   Note that the white space on each line is actually a tab character.
  </para>

  <para>
   The following is the same data, output in binary format.
   The data is shown after filtering through the
   Unix utility <command>od -c</command>. The table has three columns;
   the first has type <type>char(2)</type>, the second has type <type>text</type>,
   and the third has type <type>integer</type>.
   All the rows have a null value
   in the third column.
<programlisting>
0000000   P   G   C   O   P   Y  \n 377  \r  \n  \0  \0  \0  \0  \0  \0
0000020  \0  \0  \0  \0 003  \0  \0  \0 002   A   F  \0  \0  \0 013   A
0000040   F   G   H   A   N   I   S   T   A   N 377 377 377 377  \0 003
0000060  \0  \0  \0 002   A   L  \0  \0  \0 007   A   L   B   A   N   I
0000100   A 377 377 377 377  \0 003  \0  \0  \0 002   D   Z  \0  \0  \0
0000120 007   A   L   G   E   R   I   A 377 377 377 377  \0 003  \0  \0
0000140  \0 002   Z   M  \0  \0  \0 006   Z   A   M   B   I   A 377 377
0000160 377 377  \0 003  \0  \0  \0 002   Z   W  \0  \0  \0  \b   Z   I
0000200   M   B   A   B   W   E 377 377 377 377 377 377
</programlisting>
  </para>
 </refsect1>

 <refsect1>
  <title>Compatibility</title>

  <para>
   There is no <command>COPY</command> statement in the SQL standard.
  </para>

  <para>
   The following syntax was used before <productname>PostgreSQL</>
   version 7.3 and is still supported:
<synopsis>
COPY [ BINARY ] <replaceable class="parameter">tablename</replaceable> [ WITH OIDS ]
    FROM { '<replaceable class="parameter">filename</replaceable>' | STDIN }
    [ [USING] DELIMITERS '<replaceable class="parameter">delimiter</replaceable>' ]
    [ WITH NULL AS '<replaceable class="parameter">null string</replaceable>' ]

COPY [ BINARY ] <replaceable class="parameter">tablename</replaceable> [ WITH OIDS ]
    TO { '<replaceable class="parameter">filename</replaceable>' | STDOUT }
    [ [USING] DELIMITERS '<replaceable class="parameter">delimiter</replaceable>' ]
    [ WITH NULL AS '<replaceable class="parameter">null string</replaceable>' ]
</synopsis>
  </para>
 </refsect1>
</refentry>