Network Working Group K. Simonsen
Request for Comments: 1345 Rationel Almen Planlaegning
June 1992
Character Mnemonics & Character Sets
Status of the Memo
This memo provides information for the Internet community. It does
not specify an Internet standard. Distribution of this memo is
unlimited.
Summary
This memo lists a selection of characters and their presence in some
coded character sets. To facilitate the coded character set
tabulations an unambiguous mnemonic for each character is used, and a
format for tabulating the coded character sets is defined. The coded
character sets are given names for easy reference. A family of coded
character sets called the mnemonic character sets, and conversion
between these coded character sets without information loss, are
defined.
The character set names are registered with the Internet Assigned
Numbers Authority (IANA). Additional character sets not described in
this memo should be registered with the IANA. This memo may be
updated periodically, or additional specifications may be published,
to reflect other coded character sets.
Please send any comments including comments about the accuracy of the
tables to the author, Keld Simonsen <Keld.Simonsen@dkuug.dk>.
1. INTRODUCTION
With the growing internationalization of the Internet, support for
many coded character sets is required. It is the intention of this
memo to document precisely the mapping between all characters and
their corresponding coded representations in various coded character
sets, and give names to these coded character sets, so they can be
referenced unambiguously in Internet standards.
This memo does not indicate anything about the validity of using
these specifications in any Internet standard, so you should consult
each individual Internet standard to see which coded character sets
and names are allowed there.
Unambiguous character mnemonics are specified, which provide a
practical way of identifying a character, without reference to a
coded character set and its code in this coded character set. The
mnemonics are written in a minimal set of characters, namely the
invariant 83 graphical characters of ISO 646, which is a kind of
greatest common subset to be found between the majority of coded
Simonsen [Page 1]
RFC 1345 Character Mnemonics & Character Sets June 1992
character sets, including ASCII, national variants of the ISO 646
7-bit character set and various EBCDICs. In addition, the numeric
values of the coded representations of all these characters are the
same in all coded character sets compatible with ISO standards. All
of them except two, EXCLAMATION MARK and QUOTATION MARK, have the
same coded representation in all variants of EBCDIC. This minimal
set of characters is called the reference character set in this memo.
The mnemonics can be used in Internet standards for easy and
unambiguous reference, and they can also serve as a fallback
representation in various Internet specifications.
The coded character sets covered include all parts of ISO 8859, ISO
6937-2 and all ISO 646 conforming coded character sets in the ISO
character set registry managed by ECMA according to ISO 2375. Almost
all graphic coded character sets in the ECMA registry (1) are
covered. The graphic coded character sets not included are registry
numbers 31, 38, 39, 53, 59, 68, 71, 72, 129 and 137. In addition
many vendor defined character sets are covered, including PC
codepages (4), (7), (8), many EBCDIC character sets (4), (5), (6) and
HP, DEC and Apple character sets (8), (9), (10), (13), (14). The
East-Asian 16-bit character sets from the ECMA registry are also
included in this memo.
2. CHARACTER MNEMONICS
2.1 General Syntax
The character mnemonics are taken from the ISO committee draft (CD)
of the POSIX.2 standard (3). They are classified into two groups:
1. A group with two-character mnemonics
- primarily intended for alphabetic scripts like Latin, Greek,
Cyrillic, Hebrew and Arabic, and special characters.
2. A group with variable-length mnemonics
- primarily intended for non-alphabetic scripts like Japanese and
Chinese, but also used for some accented letters and special
characters.
In the two-character mnemonics, all invariant graphic characters in
the ISO 646 character codes except "&" are used, i.e. the following
characters:
! " % ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _
a b c d e f g h i j k l m n o p q r s t u v w x y z
The character "_" is not used as the first character.
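The syntactic rules for two-character mnemonics given above can be
sketched as a small check. This is only an illustration: the names
ALLOWED and is_two_char_mnemonic are hypothetical and not part of this
memo, and the check is purely syntactic. It does not establish that a
given mnemonic is actually defined in the table of section 3.

```python
# Characters permitted in two-character mnemonics, copied from the list
# above: the invariant graphic characters of ISO 646 except "&".
ALLOWED = set(
    "!\"%'()*+,-./0123456789:;<=>?"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ_"
    "abcdefghijklmnopqrstuvwxyz"
)


def is_two_char_mnemonic(s: str) -> bool:
    """Return True if s has the syntactic form of a two-character mnemonic.

    Hypothetical helper: it checks only length, alphabet, and the rule
    that "_" is not used as the first character.
    """
    return (
        len(s) == 2
        and all(ch in ALLOWED for ch in s)
        and s[0] != "_"
    )
```

For example, is_two_char_mnemonic("a:") holds, while "_a" (leading
underline) and "a&" (ampersand is excluded) are rejected.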
In the variable-length mnemonics, the character "_" is not used as
the first character. If it is used in a name, its presence is
doubled.
The mnemonics can be used in several different ways for different
purposes. One of these is description of coded character sets, which
is detailed in section 3. Another is for extending a given coded
character set to a mnemonic character set. This is described in
section 4. The restrictions on the use of the characters "&" and "_"
are due to demands of the compositional methods of these techniques.
2.2 ISO Official Long Descriptive Character Name
For each mnemonic, the character for which it stands is indicated in
the following table by a long descriptive name. This name is
identical to the ISO name of the character as given in reference (2).
For a few characters that are not included there, descriptive names
of the same kind are introduced in this memo. The source of each
character is stated in the table after the name and should be
consulted for a reliable identification of the character.
These long descriptive names consist only of the capital Latin
letters of the invariant part of ISO 646, the digits, "-", and SPACE.
Digits are only used in names of ideographic and Hangul characters
and never as the first character.
2.3 The 2-character Mnemonics
The two-character mnemonics include various accented Latin letters,
Greek, Cyrillic, Hebrew, Arabic, Hiragana and Katakana. Also a fair
number of special characters are included. Almost all ISO or ISO
registered 7- and 8-bit graphical coded character sets are covered
with these two-character mnemonics.
The two characters are chosen so the graphical appearance in the
reference set resembles as much as possible (within the possibilities
available) the graphical appearance of the character. The basic
character set of ISO 646 is used as the reference set, as mentioned
above.
The characters in the reference character set are chosen to represent
themselves.
For control characters from ISO 646 the two-character acronyms of ISO
2047 are used as mnemonics. For the other control characters of ISO
6429, two-character mnemonics have been selected based on the
variable-length acronyms used in that standard.
Letters, including Greek, Cyrillic, Arabic and Hebrew, are
represented with the base letter as the first letter, and the second
letter represents an accent or relation to a non-Latin script. Non-
Latin letters are transliterated to Latin letters, following
transliteration standards as closely as possible. This is also done
with the Latin letters such as ETH and THORN, and the
Danish/Norwegian/Swedish letter A WITH RING ABOVE is transliterated
into "aa".
After a letter, the second character signifies the following:
Exclamation mark     !   Grave
Apostrophe           '   Acute accent
Greater-Than sign    >   Circumflex accent
Question Mark        ?   Tilde
Hyphen-Minus         -   Macron
Left parenthesis     (   Breve
Full Stop            .   Dot Above
Colon                :   Diaeresis
Comma                ,   Cedilla
Underline            _   Underline
Solidus              /   Stroke
Quotation mark       "   Double acute accent
Semicolon            ;   Ogonek
Less-Than sign       <   Caron
Zero                 0   Ring above
Two                  2   Hook
Nine                 9   Horn
Equals               =   Cyrillic
Asterisk             *   Greek
Percent sign         %   Greek/Cyrillic special
Plus                 +   smalls: Arabic, capitals: Hebrew
Three                3   some Latin/Greek/Cyrillic letters
Four                 4   Bopomofo
Five                 5   Hiragana
Six                  6   Katakana
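As an illustration, the table above can be held as a lookup from the
second character to its meaning. This is a sketch under assumptions:
the names SECOND_CHAR and describe are hypothetical, and the function
only applies the convention to a mnemonic that starts with an ASCII
letter. Whether a particular combination is actually defined must be
checked against the mnemonic table in section 3.

```python
# Meaning of the second character after a letter, copied from the table above.
SECOND_CHAR = {
    "!": "Grave",
    "'": "Acute accent",
    ">": "Circumflex accent",
    "?": "Tilde",
    "-": "Macron",
    "(": "Breve",
    ".": "Dot Above",
    ":": "Diaeresis",
    ",": "Cedilla",
    "_": "Underline",
    "/": "Stroke",
    '"': "Double acute accent",
    ";": "Ogonek",
    "<": "Caron",
    "0": "Ring above",
    "2": "Hook",
    "9": "Horn",
    "=": "Cyrillic",
    "*": "Greek",
    "%": "Greek/Cyrillic special",
    "+": "smalls: Arabic, capitals: Hebrew",
    "3": "some Latin/Greek/Cyrillic letters",
    "4": "Bopomofo",
    "5": "Hiragana",
    "6": "Katakana",
}


def describe(mnemonic):
    """Split a letter-based mnemonic into (base letter, second-character meaning).

    Returns None when the string is not two characters long, does not
    start with an ASCII letter, or has a second character that is not
    in the table above.
    """
    if len(mnemonic) != 2:
        return None
    base, mark = mnemonic
    if not (base.isascii() and base.isalpha()):
        return None
    meaning = SECOND_CHAR.get(mark)
    if meaning is None:
        return None
    return base, meaning
```

With this sketch, describe("a:") yields ("a", "Diaeresis") and
describe("e'") yields ("e", "Acute accent").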
In designing the mnemonics the following special characters were
reserved: The ampersand is reserved as an intro character, indicating
that the following string is in the mnemonic character set. The
underline character is reserved for the variable-length mnemonics.
This use does not eliminate usage as an accent or language
identifier.
Special characters are encoded with some mnemonic value. These are
not systematic throughout, but most mnemonics start with a related
special character of the reference set.
2.4 The Variable-length Character Mnemonics
The Variable-length Character Mnemonics are primarily meant for the
ideographic characters in larger Asian character sets, but are also
used for accented characters with several accents and some special
characters. To keep the mnemonics as short as possible, which both
saves storage and eases input, a quite short name is preferred. The
Chinese standard GB 2312-1980, the Japanese standards JIS X0208 and
JIS X0212, and the Korean standard KS C 5601 all identify characters
by row and column numbers between 1 and 94. So two positions each for
row and column and a character set identifier of one character would
be almost as short as possible.
The following character set identifiers are defined:
c GB 2312-1980
j JIS X0208-1990
J JIS X0212-1990
k KS C 5601-1987
This system for the representation of ideographic characters and
Hangul characters is not truly mnemonic, but it provides short
representations that are easy to connect to the corresponding
character by means of the code table of an official character set
standard. Alternative methods based on the graphic appearance or the
pronunciation of the characters are thought to be unfeasible.
One prominent character in the reference character set is reserved
for identifying variable-length mnemonics, namely the underline
character "_". This character is intended as a delimiter both at the
front and at the end of the mnemonic. An example of its use would be:
(&=intro):
&_j3210_ &_j4436_&_j6530_
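A minimal sketch of how such mnemonics might be recognized in text
follows. It assumes that the four digits after the character set
identifier are a two-digit row followed by a two-digit column, which is
how the example above is read here. The names CHARSETS, PATTERN and
parse_ideographic are hypothetical and not defined by this memo.

```python
import re

# Character set identifiers defined in section 2.4.
CHARSETS = {
    "c": "GB 2312-1980",
    "j": "JIS X0208-1990",
    "J": "JIS X0212-1990",
    "k": "KS C 5601-1987",
}

# "&" intro, "_" delimiters, one identifier letter, then (assumed)
# two digits for the row and two digits for the column.
PATTERN = re.compile(r"&_([cjJk])(\d{2})(\d{2})_")


def parse_ideographic(text):
    """Return (character set name, row, column) for each mnemonic in text.

    Matches whose row or column falls outside 1..94 are skipped.
    """
    result = []
    for ident, row, col in PATTERN.findall(text):
        row, col = int(row), int(col)
        if 1 <= row <= 94 and 1 <= col <= 94:
            result.append((CHARSETS[ident], row, col))
    return result
```

Applied to the example string above, this sketch yields three JIS
X0208-1990 entries: (32, 10), (44, 36) and (65, 30).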
3. CHARACTER MNEMONIC TABLE
The following table contains the character mnemonic and the encoding
and long descriptive name of ISO 2DIS 10646 (2). Although ISO 10646
is only at the DIS stage at the time of writing and is still under
debate, the long descriptive naming in the DIS is considered to be
stable and the best official ISO reference to character names. The
2-octet encoded value of ISO 2DIS 10646 is also used, but only to
identify the character; it should not be relied on for any other
purpose, as the coded representation may be changed in the final
10646 international standard. Some characters not in ISO 2DIS 10646
are allocated values in the private use zone and given names and
references to a character set where they are used.
The format of the table is:
1st field is the character mnemonic (mostly 2 characters).
2nd field is the ISO 2DIS 10646 code in hexadecimal.
3rd field is the long descriptive name of ISO 2DIS 10646.
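A line in this three-field format could be read as sketched below. The
sketch assumes that fields are separated by whitespace and that the
mnemonic itself contains no whitespace. The name parse_table_line is
hypothetical.

```python
def parse_table_line(line):
    """Split one table line into (mnemonic, code, name).

    The code field is converted from hexadecimal to an integer. The
    name keeps its internal spaces, since only the first two
    whitespace runs are treated as field separators.
    """
    mnemonic, code, name = line.split(None, 2)
    return mnemonic, int(code, 16), name.strip()
```

For instance, the line "SP 0020 SPACE" gives ("SP", 32, "SPACE").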
SP 0020 SPACE
! 0021 EXCLAMATION MARK
" 0022 QUOTATION MARK
Nb 0023 NUMBER SIGN
DO 0024 DOLLAR SIGN
% 0025 PERCENT SIGN
& 0026 AMPERSAND
' 0027 APOSTROPHE
( 0028 LEFT PARENTHESIS
) 0029 RIGHT PARENTHESIS