</row>
<row> <entry><literal>ISO_8859_5</literal></entry> <entry>ISO 8859-5, <acronym>ECMA</> 113</entry> <entry>Latin/Cyrillic</entry> <entry>1</entry> <entry></entry> </row>
<row> <entry><literal>ISO_8859_6</literal></entry> <entry>ISO 8859-6, <acronym>ECMA</> 114</entry> <entry>Latin/Arabic</entry> <entry>1</entry> <entry></entry> </row>
<row> <entry><literal>ISO_8859_7</literal></entry> <entry>ISO 8859-7, <acronym>ECMA</> 118</entry> <entry>Latin/Greek</entry> <entry>1</entry> <entry></entry> </row>
<row> <entry><literal>ISO_8859_8</literal></entry> <entry>ISO 8859-8, <acronym>ECMA</> 121</entry> <entry>Latin/Hebrew</entry> <entry>1</entry> <entry></entry> </row>
<row> <entry><literal>JOHAB</literal></entry> <entry><acronym>JOHAB</></entry> <entry>Korean (Hangul)</entry> <entry>1-3</entry> <entry></entry> </row>
<row> <entry><literal>KOI8</literal></entry> <entry><acronym>KOI</acronym>8-R(U)</entry> <entry>Cyrillic</entry> <entry>1</entry> <entry><literal>KOI8R</></entry> </row>
<row> <entry><literal>LATIN1</literal></entry> <entry>ISO 8859-1, <acronym>ECMA</> 94</entry> <entry>Western European</entry> <entry>1</entry> <entry><literal>ISO88591</></entry> </row>
<row> <entry><literal>LATIN2</literal></entry> <entry>ISO 8859-2, <acronym>ECMA</> 94</entry> <entry>Central European</entry> <entry>1</entry> <entry><literal>ISO88592</></entry> </row>
<row> <entry><literal>LATIN3</literal></entry> <entry>ISO 8859-3, <acronym>ECMA</> 94</entry> <entry>South European</entry> <entry>1</entry> <entry><literal>ISO88593</></entry> </row>
<row> <entry><literal>LATIN4</literal></entry> <entry>ISO 8859-4, <acronym>ECMA</> 94</entry> <entry>North European</entry> <entry>1</entry> <entry><literal>ISO88594</></entry> </row>
<row> <entry><literal>LATIN5</literal></entry> <entry>ISO 8859-9, <acronym>ECMA</> 128</entry> <entry>Turkish</entry> <entry>1</entry> <entry><literal>ISO88599</></entry> </row>
<row> <entry><literal>LATIN6</literal></entry> <entry>ISO 8859-10, <acronym>ECMA</>
144</entry> <entry>Nordic</entry> <entry>1</entry> <entry><literal>ISO885910</></entry> </row>
<row> <entry><literal>LATIN7</literal></entry> <entry>ISO 8859-13</entry> <entry>Baltic</entry> <entry>1</entry> <entry><literal>ISO885913</></entry> </row>
<row> <entry><literal>LATIN8</literal></entry> <entry>ISO 8859-14</entry> <entry>Celtic</entry> <entry>1</entry> <entry><literal>ISO885914</></entry> </row>
<row> <entry><literal>LATIN9</literal></entry> <entry>ISO 8859-15</entry> <entry>LATIN1 with Euro and accents</entry> <entry>1</entry> <entry><literal>ISO885915</></entry> </row>
<row> <entry><literal>LATIN10</literal></entry> <entry>ISO 8859-16, <acronym>ASRO</> SR 14111</entry> <entry>Romanian</entry> <entry>1</entry> <entry><literal>ISO885916</></entry> </row>
<row> <entry><literal>MULE_INTERNAL</literal></entry> <entry>Mule internal code</entry> <entry>Multilingual Emacs</entry> <entry>1-4</entry> <entry></entry> </row>
<row> <entry><literal>SJIS</literal></entry> <entry>Shift JIS</entry> <entry>Japanese</entry> <entry>1-2</entry> <entry><literal>Mskanji</>, <literal>ShiftJIS</>, <literal>WIN932</>, <literal>Windows932</></entry> </row>
<row> <entry><literal>SQL_ASCII</literal></entry> <entry>unspecified (see text)</entry> <entry><emphasis>any</></entry> <entry>1</entry> <entry></entry> </row>
<row> <entry><literal>UHC</literal></entry> <entry>Unified Hangul Code</entry> <entry>Korean</entry> <entry>1-2</entry> <entry><literal>WIN949</>, <literal>Windows949</></entry> </row>
<row> <entry><literal>UTF8</literal></entry> <entry>Unicode, 8-bit</entry> <entry><emphasis>all</></entry> <entry>1-4</entry> <entry><literal>Unicode</></entry> </row>
<row> <entry><literal>WIN866</literal></entry> <entry>Windows CP866</entry> <entry>Cyrillic</entry> <entry>1</entry> <entry><literal>ALT</></entry> </row>
<row> <entry><literal>WIN874</literal></entry> <entry>Windows CP874</entry> <entry>Thai</entry> <entry>1</entry> <entry></entry> </row>
<row> <entry><literal>WIN1250</literal></entry>
<entry>Windows CP1250</entry> <entry>Central European</entry> <entry>1</entry> <entry></entry> </row>
<row> <entry><literal>WIN1251</literal></entry> <entry>Windows CP1251</entry> <entry>Cyrillic</entry> <entry>1</entry> <entry><literal>WIN</></entry> </row>
<row> <entry><literal>WIN1252</literal></entry> <entry>Windows CP1252</entry> <entry>Western European</entry> <entry>1</entry> <entry></entry> </row>
<row> <entry><literal>WIN1256</literal></entry> <entry>Windows CP1256</entry> <entry>Arabic</entry> <entry>1</entry> <entry></entry> </row>
<row> <entry><literal>WIN1258</literal></entry> <entry>Windows CP1258</entry> <entry>Vietnamese</entry> <entry>1</entry> <entry><literal>ABC</>, <literal>TCVN</>, <literal>TCVN5712</>, <literal>VSCII</></entry> </row>
</tbody>
</tgroup>
</table>

<para>
 Not all <acronym>API</>s support all the listed character sets. For example, the
 <productname>PostgreSQL</> JDBC driver does not support
 <literal>MULE_INTERNAL</>, <literal>LATIN6</>,
 <literal>LATIN8</>, and <literal>LATIN10</>.
</para>

<para>
 The <literal>SQL_ASCII</> setting behaves considerably differently
 from the other settings. When the server character set is
 <literal>SQL_ASCII</>, the server interprets byte values 0-127
 according to the ASCII standard, while byte values 128-255 are taken
 as uninterpreted characters. No encoding conversion will be done when
 the setting is <literal>SQL_ASCII</>. Thus, this setting is not so
 much a declaration that a specific encoding is in use, as a declaration
 of ignorance about the encoding. In most cases, if you are
 working with any non-ASCII data, it is unwise to use the
 <literal>SQL_ASCII</> setting, because
 <productname>PostgreSQL</productname> will be unable to help you by
 converting or validating non-ASCII characters.
</para>
</sect2>

<sect2>
<title>Setting the Character Set</title>

<para>
 <command>initdb</> defines the default character set
 for a <productname>PostgreSQL</productname> cluster.
 For example,
<screen>
initdb -E EUC_JP
</screen>
 sets the default character set (encoding) to
 <literal>EUC_JP</literal> (Extended Unix Code for Japanese). You
 can use <option>--encoding</option> instead of
 <option>-E</option> if you prefer longer option strings.
 If no <option>-E</> or <option>--encoding</option> option is
 given, <command>initdb</> attempts to determine the appropriate
 encoding to use based on the specified or default locale.
</para>

<para>
 You can create a database with a different character set:
<screen>
createdb -E EUC_KR korean
</screen>
 This will create a database named <literal>korean</literal> that
 uses the character set <literal>EUC_KR</literal>. Another way to
 accomplish this is to use this SQL command:
<programlisting>
CREATE DATABASE korean WITH ENCODING 'EUC_KR';
</programlisting>
 The encoding for a database is stored in the system catalog
 <literal>pg_database</literal>. You can see this by using the
 <option>-l</option> option or the <command>\l</command> command
 of <command>psql</command>.
<screen>
$ <userinput>psql -l</userinput>
            List of databases
   Database    |  Owner  |   Encoding
---------------+---------+---------------
 euc_cn        | t-ishii | EUC_CN
 euc_jp        | t-ishii | EUC_JP
 euc_kr        | t-ishii | EUC_KR
 euc_tw        | t-ishii | EUC_TW
 mule_internal | t-ishii | MULE_INTERNAL
 postgres      | t-ishii | EUC_JP
 regression    | t-ishii | SQL_ASCII
 template1     | t-ishii | EUC_JP
 test          | t-ishii | EUC_JP
 utf8          | t-ishii | UTF8
(10 rows)
</screen>
</para>

<important>
<para>
 Although you can specify any encoding you want for a database, it is unwise
 to choose an encoding that is not what is expected by the locale you have
 selected. The <literal>LC_COLLATE</literal> and <literal>LC_CTYPE</literal>
 settings imply a particular encoding, and locale-dependent operations
 (such as sorting) are likely to misinterpret data that is in an
 incompatible encoding.
</para>

<para>
 Since these locale settings are frozen by <command>initdb</>, the
 apparent flexibility to use different encodings in different databases
 of a cluster is more theoretical than real. It is likely that these
 mechanisms will be revisited in future versions of
 <productname>PostgreSQL</productname>.
</para>

<para>
 One way to use multiple encodings safely is to set the locale to
 <literal>C</> or <literal>POSIX</> during <command>initdb</>, thus
 disabling any real locale awareness.
</para>
</important>
</sect2>

<sect2>
<title>Automatic Character Set Conversion Between Server and Client</title>

<para>
 <productname>PostgreSQL</productname> supports automatic
 character set conversion between server and client for certain
 character sets. The conversion information is stored in the
 <literal>pg_conversion</> system catalog. You can create a new
 conversion by using the SQL command <command>CREATE CONVERSION</command>.
 <productname>PostgreSQL</> comes with some predefined conversions.
 They are listed in <xref linkend="multibyte-translation-table">.
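 As a minimal sketch, assuming a <command>psql</> session connected to
 any database in the cluster (the column aliases are chosen here purely
 for illustration), the conversions currently registered in
 <literal>pg_conversion</> could be inspected with a query along these
 lines:
<programlisting>
-- List each conversion with readable source and destination encoding names.
-- pg_encoding_to_char() maps the catalog's numeric encoding IDs to names.
SELECT conname,
       pg_encoding_to_char(conforencoding) AS source_encoding,
       pg_encoding_to_char(contoencoding)  AS destination_encoding
FROM pg_conversion
ORDER BY conname;
</programlisting>
 The exact set of rows returned depends on the server version and on any
 conversions added locally.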
</para>

<table id="multibyte-translation-table">
<title>Client/Server Character Set Conversions</title>
<tgroup cols="2">
<thead>
<row> <entry>Server Character Set</entry> <entry>Available Client Character Sets</entry> </row>
</thead>
<tbody>
<row> <entry><literal>BIG5</literal></entry> <entry><emphasis>not supported as a server encoding</emphasis> </entry> </row>
<row> <entry><literal>EUC_CN</literal></entry> <entry><emphasis>EUC_CN</emphasis>, <literal>MULE_INTERNAL</literal>, <literal>UTF8</literal> </entry> </row>
<row> <entry><literal>EUC_JP</literal></entry> <entry><emphasis>EUC_JP</emphasis>, <literal>MULE_INTERNAL</literal>, <literal>SJIS</literal>, <literal>UTF8</literal> </entry> </row>
<row> <entry><literal>EUC_KR</literal></entry> <entry><emphasis>EUC_KR</emphasis>, <literal>MULE_INTERNAL</literal>, <literal>UTF8</literal> </entry> </row>
<row> <entry><literal>EUC_TW</literal></entry> <entry><emphasis>EUC_TW</emphasis>, <literal>BIG5</literal>, <literal>MULE_INTERNAL</literal>, <literal>UTF8</literal> </entry> </row>
<row> <entry><literal>GB18030</literal></entry> <entry><emphasis>not supported as a server encoding</emphasis> </entry> </row>
<row> <entry><literal>GBK</literal></entry> <entry><emphasis>not supported as a server encoding</emphasis> </entry> </row>
<row> <entry><literal>ISO_8859_5</literal></entry> <entry><emphasis>ISO_8859_5</emphasis>, <literal>KOI8</literal>, <literal>MULE_INTERNAL</literal>, <literal>UTF8</literal>, <literal>WIN866</literal>, <literal>WIN1251</literal> </entry> </row>
<row> <entry><literal>ISO_8859_6</literal></entry>