Weider, et. al.              Informational                     [Page 19]

RFC 2130              Character Set Workshop Report           April 1997


Appendix A:

A-1: IETF Protocols

   The following list describes how various existing protocols handle
   multiple character set information.

   Email

      SMTP

         See 8.2. ESMTP makes it easy to negotiate the use of alternate
         language and encoding if it is needed.

      Headers

         RFC 1522 forms an adequate framework for supporting text;
         UTF-8 alone is not a possible solution, because the mail
         pathways are assumed to be 7-bit 'forever'. However, RFC 1522
         should be extended to allow language tagging of the free text
         parts of message headers.

      Bodies

         Selection of the character set for Email text bodies is
         reasonably well covered by the charset= parameter on Text/*
         MIME types. Language is defined by the Content-language header
         of RFC 1766. Other information will have to be added using
         body part headers; due to the way MIME differentiates between
         body part headers and message headers, these will all have to
         have names starting with "Content-".

   NetNews

      NNTP

         See 8.2. No strong tradition for negotiation of encoding in
         NNTP exists.

      NetNews Messages

         These should be able to leverage the mechanisms defined for
         Email. One difference is that nearly all NNTP channels are
         8-bit clean; some NNTP newsgroups have a tradition of using
         8-bit charsets in both headers and bodies. Defining a default
         character set on a per-newsgroup basis might be a suitable
         approach.

   RTCP

      The identifiers carried as information about parties are already
      defined to be in UTF-8.

   FTP

      Protocol

         See 8.2. The common use of welcome banners in the login
         response means that there might be strong reason here to allow
         client and server to negotiate a language different from the
         default for greetings and error messages. This should be a
         simple protocol extension.
      Filenames

         Many fileservers now have the capability of using non-ASCII
         characters in filenames, while the "dir" and "get" commands
         are defined in terms of US-ASCII only. One possible solution
         would be to define a "UTF-8" mode for the transfer of
         filenames and directory information; this would need to be a
         negotiated facility, with fallback to US-ASCII if not
         negotiated. The important point here is consistency between
         all implementations; a single charset is better here than the
         ability to handle multiple charsets.

   World Wide Web

      HTTP

         See 8.2. The single-shot style of HTTP makes negotiation more
         complex than it would otherwise be.

      HTML

         Internationalization of HTML [I18N] seems fairly well covered
         in the current "I18N" document. It needs review to see if it
         needs more specific details in order to carry application
         information apart from the language.

      URLs

         URLs are "input identifiers", and powerful arguments should be
         made if they are ever to be anything but US-ASCII.

   IMAP

      IMAP's information objects are MIME Email objects, and therefore
      are able to use that standard's methods. However, IMAP folder
      names are local identifiers; there is strong reason to allow
      non-ASCII characters in these. A UTF-8 negotiation might be the
      most appropriate thing; however, UTF-8 is awkward to use.
      Unfortunately, UTF-7 isn't suitable because it conflicts with
      popular hierarchy delimiters. The most recent IMAP work in
      progress specification describes a modified UTF-7 which avoids
      this problem.

   DNS

      DNS names are the prime example of identifiers that need to stay
      in US-ASCII for global interoperability. However, some DNS
      information, in particular TXT records, may represent information
      (such as names) that is outside the ASCII range. A single
      solution is the best; problems resulting from UTF-8 should be
      investigated.

   WHOIS++

      WHOIS++ version 1 is defined to use ISO 8859-1.
      The next version will use UTF-8. The currently designed changes
      will also allow the specification of individual attributes on
      attribute names; these will make the passing of application
      information about the values (such as language) easier. No
      immediate action seems necessary.

   WHOIS

      This has been a stable protocol for so many years now that it
      seems unwise to suggest that it be modified. Furthermore,
      compatible extensions exist in RWHOIS and WHOIS++; modifications
      should be made to these protocols rather than to the WHOIS
      protocol itself.

   Telnet

      This is a prime example of a protocol where character set support
      is necessary and nonexistent. The current work in progress on
      character set negotiation in Telnet seems adequate to the task;
      the question of passing other application data that might be
      useful is still open.

A-2: Non-IETF protocols

   The IETF does not have any power to change these protocols.
   However, the guidelines developed by the workshop may still be
   useful as input to the further development of the protocols.

   Gopher: Gopher, Gopher+
   Prospero (Archie)
   NFS: Filesystem
   CORBA, Finger, GEDI, IRC, ISO 10160/1, Kerberos, LPR, RSTAT,
   RWhois, SGML, TFTP, X11, X.500, Z39.50
Appendix B: Acronyms

   ASCII     American Standard Code for Information Interchange
   CCS       Coded Character Set
   CEN ENV   European Committee for Standardisation (CEN) European
             pre-standard (ENV)
   CES       Character Encoding Scheme
   CJK       Chinese Japanese Korean
   CORBA     Common Object Request Broker Architecture
   CTE       Content Transfer Encoding
   DNS       Domain Name System
   ESMTP     Extended SMTP
   FTP       File Transfer Protocol
   HTML      Hypertext Markup Language
   I18N      Internationalization (18 characters between the first
             (I) and last (n) characters)
   IAB       Internet Architecture Board
   IANA      Internet Assigned Numbers Authority
   IESG      Internet Engineering Steering Group
   IETF      Internet Engineering Task Force
   IMAP      Internet Message Access Protocol
   IRC       Internet Relay Chat
   IRTF      Internet Research Task Force
   ISI       Information Sciences Institute
   ISO       International Organization for Standardization
   MIME      Multipurpose Internet Mail Extensions
   NFS       Network File System
   NNTP      Network News Transfer Protocol
   POSIX     Portable Operating System Interface
   RFC       Request for Comments (Internet standards documents)
   RPC       Remote Procedure Call
   RSTAT     Remote Statistics
   RTCP      Real-Time Transport Control Protocol
   Rwhois    Referral Whois
   SGML      Standard Generalized Mark-up Language
   SMTP      Simple Mail Transfer Protocol
   TES       Transfer Encoding Syntax
   TFTP      Trivial File Transfer Protocol
   URL       Uniform Resource Locator
   UTF       UCS Transformation Format

Appendix C: Glossary

   Bi-directionality - A property of some text where text written
   right-to-left (Arabic or Hebrew) and text written left-to-right
   (e.g. Latin) are intermixed in one and the same line.

   Character - A single graphic symbol represented by a sequence of one
   or more bytes.

   Character Encoding Scheme - The mapping from a coded character set to
   an encoding which may be more suitable for specific purposes.
   For example, UTF-8 is a character encoding scheme for ISO 10646.

   Character Set - An enumerated group of symbols (e.g., letters,
   numbers or glyphs).

   Coded Character Set - The mapping from a set of integers to the
   characters of a character set.

   Culture - Preferences in the display of text based on cultural norms,
   such as spelling and word choice.

   Language - The words and combinations of words that constitute a
   system of expression and communication among people with a shared
   history or set of traditions.

   Layout - Information needed to display text to the user, similar to
   the presentation layer in the ISO telecommunications model.

   Locale - The attributes of communication, such as language, character
   set and cultural conventions.

   On-the-wire - The data that actually gets put into packets for
   transmission to other computers.

   Transfer Encoding Syntax - The mapping from a coded character set
   which has been encoded in a Character Encoding Scheme to an encoding
   which may be more suitable for transmission using specific protocols.
   For example, Base64 is a transfer encoding syntax.

Appendix D: References

   [*]        Non-ASCII character

   [ASCII]    ANSI X3.4:1986 "Coded Character Sets - 7 Bit American
              National Standard Code for Information Interchange
              (7-bit ASCII)"

   [Base64]   Freed, N., and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part One: Format of Internet Message
              Bodies", RFC 2045, November 1996.

   [CEN]      See http://tobbi.iti.is/TC304/welcome.html for current
              status.

   [HTML]     Berners-Lee, T., and D. Connolly, "Hypertext Markup
              Language - 2.0", RFC 1866, November 1995.

   [HTTP]     Berners-Lee, T., Fielding, R., and H. Nielsen, "Hypertext
              Transfer Protocol -- HTTP/1.0", RFC 1945, May 1996.

   [I18N]     Yergeau, F., et al., "Internationalization of the
              Hypertext Markup Language", RFC 2070, January 1997.

   [IANA]     Reynolds, J., and J.
              Postel, "Assigned Numbers", STD 2, RFC 1700, ISI,
              October 1994.

   [ISO-2022] ISO/IEC 2022:1994, "Information technology -- Character
              Code Structure and Extension Techniques", JTC1/SC2.

   [ISO-7498] ISO/IEC 7498-1:1994, "Information technology - Open
              Systems Interconnection - Basic Reference Model: The
              Basic Model".

   [ISO-8859] Information Processing -- 8-bit Single-Byte Coded Graphic
              Character Sets --
              Part 1: Latin Alphabet no. 1, ISO 8859-1:1987(E).
              Part 2: Latin Alphabet no. 2, ISO 8859-2:1987(E).
              Part 3: Latin Alphabet no. 3, ISO 8859-3:1988(E).
              Part 4: Latin Alphabet no. 4, ISO 8859-4:1988(E).
              Part 5: Latin/Cyrillic Alphabet, ISO 8859-5:1988(E).
              Part 6: Latin/Arabic Alphabet, ISO 8859-6:1987(E).
              Part 7: Latin/Greek Alphabet, ISO 8859-7:1987(E).
              Part 8: Latin/Hebrew Alphabet, ISO 8859-8:1988(E).
              Part 9: Latin Alphabet no. 5, ISO 8859-9:1990(E).
              Part 10: Latin Alphabet no. 6, ISO 8859-10:1992(E).
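
   As an illustration of the layering defined in the Appendix C
   glossary, the following minimal Python sketch walks a string through
   a coded character set (ISO 10646 code points), a character encoding
   scheme (UTF-8), and a transfer encoding syntax (Base64). The function
   names encode_layers and decode_layers are hypothetical and not part
   of this report; only the Python standard library is assumed.

```python
import base64


def encode_layers(text):
    """Return (code points, UTF-8 bytes, Base64 text) for a string.

    Hypothetical helper for illustration only.
    """
    # Coded Character Set: each character maps to an integer.
    code_points = [ord(ch) for ch in text]
    # Character Encoding Scheme: the integers are serialized as UTF-8.
    utf8_bytes = text.encode("utf-8")
    # Transfer Encoding Syntax: the bytes are made 7-bit safe with Base64.
    on_the_wire = base64.b64encode(utf8_bytes).decode("ascii")
    return code_points, utf8_bytes, on_the_wire


def decode_layers(on_the_wire):
    """Undo the layers in reverse order: Base64, then UTF-8."""
    utf8_bytes = base64.b64decode(on_the_wire)
    return utf8_bytes.decode("utf-8")


if __name__ == "__main__":
    # U+00E9 (LATIN SMALL LETTER E WITH ACUTE) lies outside US-ASCII.
    points, encoded, wire = encode_layers("\u00e9")
    print(points)   # [233]
    print(encoded)  # b'\xc3\xa9'
    print(wire)     # w6k=
```

   Each layer is undone independently on receipt, which is why the
   glossary treats the three mappings as separate terms.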