<screen>initdb -E EUC_JP</screen> sets the default character set (encoding) to <literal>EUC_JP</literal> (Extended Unix Code for Japanese). You can use <option>--encoding</option> instead of <option>-E</option> if you prefer the long option form. If no <option>-E</> or <option>--encoding</option> option is given, <literal>SQL_ASCII</> is used. </para>
<para> You can create a database with a different character set:<screen>createdb -E EUC_KR korean</screen> This will create a database named <literal>korean</literal> that uses the character set <literal>EUC_KR</literal>. Another way to accomplish this is to use this SQL command:<programlisting>CREATE DATABASE korean WITH ENCODING 'EUC_KR';</programlisting> The encoding for a database is stored in the system catalog <literal>pg_database</literal>. You can see it by using the <option>-l</option> option or the <command>\l</command> command of <command>psql</command>.<screen>$ <userinput>psql -l</userinput>
            List of databases
   Database    |  Owner  |   Encoding
---------------+---------+---------------
 euc_cn        | t-ishii | EUC_CN
 euc_jp        | t-ishii | EUC_JP
 euc_kr        | t-ishii | EUC_KR
 euc_tw        | t-ishii | EUC_TW
 mule_internal | t-ishii | MULE_INTERNAL
 regression    | t-ishii | SQL_ASCII
 template1     | t-ishii | EUC_JP
 test          | t-ishii | EUC_JP
 unicode       | t-ishii | UNICODE
(9 rows)</screen> </para> </sect2>
<sect2> <title>Automatic Character Set Conversion Between Server and Client</title> <para> <productname>PostgreSQL</productname> supports automatic character set conversion between server and client for certain character sets. The conversion information is stored in the <literal>pg_conversion</> system catalog. You can create a new conversion by using the SQL command <command>CREATE CONVERSION</command>. <productname>PostgreSQL</> comes with some predefined conversions. They are listed in <xref linkend="multibyte-translation-table">. 
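The conversions installed on a particular server can also be inspected directly in the catalog. As a rough sketch, assuming that the <literal>pg_conversion</literal> columns <literal>conname</literal>, <literal>conforencoding</literal>, and <literal>contoencoding</literal> and the SQL-callable function <function>pg_encoding_to_char</function> are available in your server version, a query along these lines lists each conversion with its source and destination encoding names (the column aliases are chosen only for illustration):<programlisting>-- List conversions with readable encoding names instead of encoding IDs.
SELECT conname,
       pg_encoding_to_char(conforencoding) AS source_encoding,
       pg_encoding_to_char(contoencoding) AS destination_encoding
FROM pg_conversion
ORDER BY conname;</programlisting>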
</para>
<table id="multibyte-translation-table"> <title>Client/Server Character Set Conversions</title> <tgroup cols="2">
<thead> <row> <entry>Server Character Set</entry> <entry>Available Client Character Sets</entry> </row> </thead>
<tbody>
<row> <entry><literal>SQL_ASCII</literal></entry> <entry><literal>SQL_ASCII</literal>, <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal> </entry> </row>
<row> <entry><literal>EUC_JP</literal></entry> <entry><literal>EUC_JP</literal>, <literal>SJIS</literal>, <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal> </entry> </row>
<row> <entry><literal>EUC_CN</literal></entry> <entry><literal>EUC_CN</literal>, <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal> </entry> </row>
<row> <entry><literal>EUC_KR</literal></entry> <entry><literal>EUC_KR</literal>, <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal> </entry> </row>
<row> <entry><literal>JOHAB</literal></entry> <entry><literal>JOHAB</literal>, <literal>UNICODE</literal> </entry> </row>
<row> <entry><literal>EUC_TW</literal></entry> <entry><literal>EUC_TW</literal>, <literal>BIG5</literal>, <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal> </entry> </row>
<row> <entry><literal>LATIN1</literal></entry> <entry><literal>LATIN1</literal>, <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal> </entry> </row>
<row> <entry><literal>LATIN2</literal></entry> <entry><literal>LATIN2</literal>, <literal>WIN1250</literal>, <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal> </entry> </row>
<row> <entry><literal>LATIN3</literal></entry> <entry><literal>LATIN3</literal>, <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal> </entry> </row>
<row> <entry><literal>LATIN4</literal></entry> <entry><literal>LATIN4</literal>, <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal> </entry> </row>
<row> <entry><literal>LATIN5</literal></entry> <entry><literal>LATIN5</literal>, <literal>UNICODE</literal> </entry> </row>
<row> 
<entry><literal>LATIN6</literal></entry> <entry><literal>LATIN6</literal>, <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal> </entry> </row> <row> <entry><literal>LATIN7</literal></entry> <entry><literal>LATIN7</literal>, <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal> </entry> </row> <row> <entry><literal>LATIN8</literal></entry> <entry><literal>LATIN8</literal>, <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal> </entry> </row> <row> <entry><literal>LATIN9</literal></entry> <entry><literal>LATIN9</literal>, <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal> </entry> </row> <row> <entry><literal>LATIN10</literal></entry> <entry><literal>LATIN10</literal>, <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal> </entry> </row> <row> <entry><literal>ISO_8859_5</literal></entry> <entry><literal>ISO_8859_5</literal>, <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal>, <literal>WIN</literal>, <literal>ALT</literal>, <literal>KOI8</literal> </entry> </row> <row> <entry><literal>ISO_8859_6</literal></entry> <entry><literal>ISO_8859_6</literal>, <literal>UNICODE</literal> </entry> </row> <row> <entry><literal>ISO_8859_7</literal></entry> <entry><literal>ISO_8859_7</literal>, <literal>UNICODE</literal> </entry> </row> <row> <entry><literal>ISO_8859_8</literal></entry> <entry><literal>ISO_8859_8</literal>, <literal>UNICODE</literal> </entry> </row> <row> <entry><literal>UNICODE</literal></entry> <entry> <literal>EUC_JP</literal>, <literal>SJIS</literal>, <literal>EUC_KR</literal>, <literal>UHC</literal>, <literal>JOHAB</literal>, <literal>EUC_CN</literal>, <literal>GBK</literal>, <literal>EUC_TW</literal>, <literal>BIG5</literal>, <literal>LATIN1</literal> to <literal>LATIN10</literal>, <literal>ISO_8859_5</literal>, <literal>ISO_8859_6</literal>, <literal>ISO_8859_7</literal>, <literal>ISO_8859_8</literal>, <literal>WIN</literal>, <literal>ALT</literal>, <literal>KOI8</literal>, <literal>WIN1256</literal>, 
<literal>TCVN</literal>, <literal>WIN874</literal>, <literal>GB18030</literal>, <literal>WIN1250</literal> </entry> </row>
<row> <entry><literal>MULE_INTERNAL</literal></entry> <entry><literal>EUC_JP</literal>, <literal>SJIS</literal>, <literal>EUC_KR</literal>, <literal>EUC_CN</literal>, <literal>EUC_TW</literal>, <literal>BIG5</literal>, <literal>LATIN1</literal> to <literal>LATIN5</literal>, <literal>WIN</literal>, <literal>ALT</literal>, <literal>WIN1250</literal>, <literal>ISO_8859_5</literal>, <literal>KOI8</literal></entry> </row>
<row> <entry><literal>KOI8</literal></entry> <entry><literal>ISO_8859_5</literal>, <literal>WIN</literal>, <literal>ALT</literal>, <literal>KOI8</literal>, <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal> </entry> </row>
<row> <entry><literal>WIN</literal></entry> <entry><literal>ISO_8859_5</literal>, <literal>WIN</literal>, <literal>ALT</literal>, <literal>KOI8</literal>, <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal> </entry> </row>
<row> <entry><literal>ALT</literal></entry> <entry><literal>ISO_8859_5</literal>, <literal>WIN</literal>, <literal>ALT</literal>, <literal>KOI8</literal>, <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal> </entry> </row>
<row> <entry><literal>WIN1256</literal></entry> <entry><literal>WIN1256</literal>, <literal>UNICODE</literal> </entry> </row>
<row> <entry><literal>TCVN</literal></entry> <entry><literal>TCVN</literal>, <literal>UNICODE</literal> </entry> </row>
<row> <entry><literal>WIN874</literal></entry> <entry><literal>WIN874</literal>, <literal>UNICODE</literal> </entry> </row>
</tbody> </tgroup> </table>
<para> To enable the automatic character set conversion, you have to tell <productname>PostgreSQL</productname> the character set (encoding) you would like to use in the client. 
There are several ways to accomplish this: <itemizedlist> <listitem> <para> Using the <command>\encoding</command> command in <application>psql</application>. <command>\encoding</command> allows you to change client encoding on the fly. For example, to change the encoding to <literal>SJIS</literal>, type:<programlisting>\encoding SJIS</programlisting> </para> </listitem> <listitem> <para> Using <application>libpq</> functions. <command>\encoding</command> actually calls <function>PQsetClientEncoding()</function> for its purpose.<synopsis>int PQsetClientEncoding(PGconn *<replaceable>conn</replaceable>, const char *<replaceable>encoding</replaceable>);</synopsis> where <replaceable>conn</replaceable> is a connection to the server, and <replaceable>encoding</replaceable> is the encoding you want to use. If the function successfully sets the encoding, it returns 0, otherwise -1. The current encoding for this connection can be determined by using:<synopsis>int PQclientEncoding(const PGconn *<replaceable>conn</replaceable>);</synopsis> Note that it returns the encoding ID, not a symbolic string such as <literal>EUC_JP</literal>. To convert an encoding ID to an encoding name, you can use:<synopsis>char *pg_encoding_to_char(int <replaceable>encoding_id</replaceable>);</synopsis> </para> </listitem> <listitem> <para> Using <command>SET client_encoding TO</command>. Setting the client encoding can be done with this SQL command:<programlisting>SET CLIENT_ENCODING TO '<replaceable>value</>';</programlisting> Also you can use the more standard SQL syntax <literal>SET NAMES</literal> for this purpose:<programlisting>SET NAMES '<replaceable>value</>';</programlisting> To query the current client encoding:<programlisting>SHOW client_encoding;</programlisting> To return to the default encoding:<programlisting>RESET client_encoding;</programlisting> </para> </listitem> <listitem> <para> Using <envar>PGCLIENTENCODING</envar>. 
If the environment variable <envar>PGCLIENTENCODING</envar> is defined in the client's environment, that client encoding is automatically selected when a connection to the server is made. (This can subsequently be overridden using any of the other methods mentioned above.) </para> </listitem>
<listitem> <para> Using the configuration variable <varname>client_encoding</varname>. If the <varname>client_encoding</> variable in <filename>postgresql.conf</> is set, that client encoding is automatically selected when a connection to the server is made. (This can subsequently be overridden using any of the other methods mentioned above.) </para> </listitem> </itemizedlist> </para>
<para> If the conversion of a particular character is not possible -- suppose you chose <literal>EUC_JP</literal> for the server and <literal>LATIN1</literal> for the client, so that some Japanese characters cannot be converted to <literal>LATIN1</literal> -- the character is replaced by its hexadecimal byte values in parentheses, e.g., <literal>(826C)</literal>. </para> </sect2>
<sect2> <title>Further Reading</title> <para> These are good starting points for learning about the various kinds of encoding systems. <variablelist>
<varlistentry> <term><ulink url="ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf"></ulink></term> <listitem> <para> Detailed explanations of <literal>EUC_JP</literal>, <literal>EUC_CN</literal>, <literal>EUC_KR</literal>, <literal>EUC_TW</literal> appear in section 3.2. </para> </listitem> </varlistentry>
<varlistentry> <term><ulink url="http://www.unicode.org/"></ulink></term> <listitem> <para> The web site of the Unicode Consortium </para> </listitem> </varlistentry>
<varlistentry> <term>RFC 2044</term> <listitem> <para> <acronym>UTF</acronym>-8 is defined here. 
</para> </listitem> </varlistentry> </variablelist> </para> </sect2> </sect1></chapter>
<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->