<?xml version="1.0" encoding="ISO-8859-1"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /> <meta http-equiv="Content-Style-Type" content="text/css" /> <title>Character Sets and Encodings</title> <link rel="StyleSheet" href="document.css" type="text/css" media="all" /> <link rel="StyleSheet" href="catalog.css" type="text/css" media="all" /> <link rel="Table of Contents" href="J2EETutorialTOC.html" /> <link rel="Previous" href="WebI18N4.html" /> <link rel="Next" href="WebI18N6.html" /> <link rel="Index" href="J2EETutorialIX.html" /> </head> <body> <table width="550" summary="layout" id="SummaryNotReq1"> <tr> <td align="left" valign="center"> <font size="-1"> <a href="http://java.sun.com/j2ee/1.4/download.html#tutorial" target="_blank">Download</a> <br> <a href="http://java.sun.com/j2ee/1.4/docs/tutorial/information/faq.html" target="_blank">FAQ</a> <br> <a href="http://java.sun.com/j2ee/1.4/docs/tutorial/information/history.html" target="_blank">History</a> </td> <td align="center" valign="center"><a accesskey="p" href="WebI18N4.html"><img id="LongDescNotReq1" src="images/PrevArrow.gif" width="26" height="26" border="0" alt="Prev" /></a><a accesskey="c" href="J2EETutorialFront.html"><img id="LongDescNotReq1" src="images/UpArrow.gif" width="26" height="26" border="0" alt="Home" /></a><a accesskey="n" href="WebI18N6.html"><img id="LongDescNotReq3" src="images/NextArrow.gif" width="26" height="26" border="0" alt="Next" /></a><a accesskey="i" href="J2EETutorialIX.html"></a> </td> <td align="right" valign="center"> <font size="-1"> <a href="http://java.sun.com/j2ee/1.4/docs/api/index.html" target="_blank">API</a> <br> <a href="http://java.sun.com/j2ee/1.4/docs/tutorial/information/search.html" target="_blank">Search</a> <br> <a 
href="http://java.sun.com/j2ee/1.4/docs/tutorial/information/sendusmail.html" target="_blank">Feedback</a></font> </font> </td> </tr> </table> <img src="images/blueline.gif" width="550" height="8" ALIGN="BOTTOM" NATURALSIZEFLAG="3" ALT="Divider"> <blockquote><a name="wp86518"> </a><h2 class="pHeading1">Character Sets and Encodings</h2>
<a name="wp87070"> </a><h3 class="pHeading2">Character Sets</h3>
<a name="wp87972"> </a><p class="pBody">A <em class="cEmphasis">character set</em> is a set of textual and graphic symbols, each of which is mapped to a nonnegative integer. </p>
<a name="wp87986"> </a><p class="pBody">ASCII was one of the first character sets widely used in computing. It contains uppercase and lowercase Latin letters, numerals, punctuation, a set of control codes, and a few miscellaneous symbols. Its main limitation is that it can represent only American English. </p>
<a name="wp87930"> </a><p class="pBody">Unicode defines a standardized, universal character set that can be extended to accommodate additions. When the Java program source file encoding doesn't support Unicode, Unicode characters can be represented as escape sequences, using the notation <code class="cCode">\u</code><code class="cVariable">XXXX</code>, where <code class="cVariable">XXXX</code> is the character's 16-bit representation in hexadecimal. For example, the Spanish version of the Duke's Bookstore message file uses Unicode escape sequences for non-ASCII characters:</p>
<div class="pPreformattedRelative"><pre class="pPreformattedRelative">{"TitleCashier", "Cajero"},
{"TitleBookDescription", "Descripci" + "\u00f3" + "n del Libro"},
{"Visitor", "Es visitante n" + "\u00fa" + "mero "},
{"What", "Qu" + "\u00e9" + " libros leemos"},
{"Talk", " describe como componentes de software de web pueden transformar la manera en que desarrollamos aplicaciones para el web. " +
    "Este libro es obligatorio para cualquier programador de respeto!"},
{"Start", "Empezar a Comprar"},
<a name="wp86524"> </a></pre></div>
<a name="wp86526"> </a><h3 class="pHeading2">Character Encoding</h3>
<a name="wp87950"> </a><p class="pBody">A <em class="cEmphasis">character</em> <em class="cEmphasis">encoding</em> maps a character set to units of a specific width and defines byte serialization and ordering rules. Many character sets have more than one encoding. For example, Java programs can represent Japanese character sets using the EUC-JP or Shift-JIS encodings, among others. Each encoding has its own rules for representing and serializing the character set. </p>
<a name="wp88281"> </a><p class="pBody">The ISO 8859 series defines fifteen character encodings that can represent texts in dozens of languages. Each ISO 8859 character encoding can contain up to 256 characters. ISO 8859-1 (Latin-1) comprises the ASCII character set, characters with diacritics (accents, diaereses, cedillas, circumflexes, and so on), and additional symbols. </p>
<a name="wp86530"> </a><p class="pBody">UTF-8 (Unicode Transformation Format, 8-bit form) is a variable-width character encoding that encodes each Unicode character as one to four bytes. A byte in UTF-8 is equivalent to 7-bit ASCII if its high-order bit is zero; otherwise, the byte is part of a multibyte sequence of two to four bytes that together represent a single character.</p>
<a name="wp87033"> </a><p class="pBody">UTF-8 is compatible with the majority of existing Web content and provides access to the Unicode character set. Current versions of browsers and email clients support UTF-8. In addition, many new Web standards specify UTF-8 as their character encoding. For example, UTF-8 is one of the two encodings that every XML processor is required to support (the other is UTF-16). 
</p>
<a name="wp88317"> </a><p class="pBody">See Appendix <a href="Encodings.html#wp64107">A</a> for more information on character encodings in the Java 2 platform.</p>
<a name="wp88324"> </a><p class="pBody">Web components usually use <code class="cCode">PrintWriter</code> to produce responses; by default, it encodes characters using ISO 8859-1. Servlets may also output binary data with <code class="cCode">OutputStream</code> classes, which perform no encoding. An application whose characters cannot be represented in the default encoding must explicitly set a different encoding.</p>
<a name="wp88328"> </a><p class="pBody">For Web components, three encodings must be considered:</p>
<div class="pSmartList1"><ul class="pSmartList1"><a name="wp87044"> </a><div class="pSmartList1"><li>Request</li></div><a name="wp87048"> </a><div class="pSmartList1"><li>Page (JSP pages)</li></div><a name="wp87049"> </a><div class="pSmartList1"><li>Response</li></div></ul></div>
<a name="wp86652"> </a><h4 class="pHeading3">Request Encoding</h4>
<a name="wp86653"> </a><p class="pBody">The <em class="cEmphasis">request encoding</em> is the character encoding in which parameters in an incoming request are interpreted. Currently, many browsers do not send a request encoding qualifier with the <code class="cCode">Content-Type</code> header. In such cases, a Web container uses the default encoding, ISO 8859-1, to parse request data. </p>
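<p class="pBody">The effect of parsing request data with the wrong encoding can be reproduced with the JDK alone. The following is a minimal sketch, not code from the Duke's Bookstore example: the class name <code class="cCode">RequestDecodingDemo</code> and the method <code class="cCode">roundTrip</code> are hypothetical, and the sketch assumes that a browser submitted the word from the message file above as UTF-8 bytes without declaring the encoding.</p>

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only imitates what happens to parameter bytes.
public class RequestDecodingDemo {

    // Encodes text as the sender would, then decodes it as the receiver would.
    public static String roundTrip(String text, Charset sentAs, Charset parsedAs) {
        byte[] wire = text.getBytes(sentAs);
        return new String(wire, parsedAs);
    }

    public static void main(String[] args) {
        // The Spanish word for "number", written with a Unicode escape.
        String original = "n\u00famero";

        // 7 bytes for 6 characters: the accented letter takes two bytes in UTF-8.
        System.out.println("bytes on the wire: "
                + original.getBytes(StandardCharsets.UTF_8).length);

        // Assumed container behavior: no declared encoding, so ISO 8859-1 is used.
        String wrong = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        // Decoding with the encoding that was actually sent restores the text.
        String right = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        System.out.println("default decoding preserved the text: " + wrong.equals(original));
        System.out.println("UTF-8 decoding preserved the text: " + right.equals(original));
    }
}
```

<p class="pBody">In a servlet, the corresponding remedy is usually a call to <code class="cCode">request.setCharacterEncoding("UTF-8")</code> before any parameter is read, which tells the container how to interpret the request data when the browser does not say.</p>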