<HTML><HEAD><TITLE>Characters</TITLE></HEAD><BODY>
<H1><A NAME="Characters">Characters</A></H1><HR>
<P><B><A HREF="#Character Sets">Character Sets</A>
· <A HREF="#Character Sets and Locales">Character Sets and Locales</A>
· <A HREF="#Escape Sequences">Escape Sequences</A>
· <A HREF="#Numeric Escape Sequences">Numeric Escape Sequences</A>
· <A HREF="#Trigraphs">Trigraphs</A>
· <A HREF="#Multibyte Characters">Multibyte Characters</A>
· <A HREF="#Wide-Character Encoding">Wide-Character Encoding</A></B></P><HR>
<P>Characters play a central role in Standard C. You represent
a C program as one or more
<B><A NAME="source file">source files</A></B>.
The translator reads a source file as a
<A NAME="text stream">text stream</A>
consisting of characters that you can read when you
display the stream on a terminal screen or produce hard copy with a
printer. You often manipulate text when a C program executes. The
program might produce a text stream that people can read, or it might
read a text stream entered by someone typing at a keyboard
or from a file modified using a text editor.
This document describes the characters that you
use to write C source files and that you manipulate as streams
when executing C programs.</P>
<H2><A NAME="Character Sets">Character Sets</A></H2>
<P>When you write a program, you express C source files as
<A HREF="lib_file.html#text lines">text lines</A>
containing characters from the
<B><A NAME="source character set">source character set</A></B>.
When a program executes in the
<B><A NAME="target environment">target environment</A></B>,
it uses characters from the
<B><A NAME="target character set">target character set</A></B>.
These character sets are related, but need not have
the same encoding or all the same members.</P>
<P>Every character set contains a distinct code value for each
character in the
<B><A NAME="basic C character set">basic C character set</A></B>.
A character set can also contain additional characters
with other code values.
For example:</P>
<UL>
<LI>The
<B><A NAME="character constant">character constant</A></B>
<CODE>'x'</CODE> becomes the value of
the code for the character corresponding to <CODE>x</CODE> in the target
character set.</LI>
<LI>The
<B><A NAME="string literal">string literal</A></B>
<CODE>"xyz"</CODE> becomes a sequence of
character constants stored in successive bytes of memory, followed
by a byte containing the value zero:<BR>
<CODE>{'x', 'y', 'z', '\0'}</CODE></LI>
</UL>
<P>A string literal is one way to specify a
<B><A NAME="null-terminated string">null-terminated string</A></B>,
an array of zero or more bytes followed by a byte containing the
value zero.</P>
<P><B><A NAME="visible graphic characters">Visible graphic characters</A></B>
in the basic C character set:</P>
<PRE><B>Form</B>         <B>Members</B>
<I>letter</I>       A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
             a b c d e f g h i j k l m n o p q r s t u v w x y z
<I>digit</I>        0 1 2 3 4 5 6 7 8 9
<I>underscore</I>   _
<I>punctuation</I>  ! " # % &amp; ' ( ) * + , - . / : ; &lt; = &gt; ? [ \ ] ^ { | } ~</PRE>
<P><B><A NAME="additional graphic characters">Additional
graphic characters</A></B> in the basic C character set:</P>
<PRE><B>Character</B>  <B>Meaning</B>
<A NAME="space"><I>space</I></A>      <I>leave blank space</I>
<A NAME="BEL"><I>BEL</I></A>        <I>signal an alert (BELl)</I>
<A NAME="BS"><I>BS</I></A>         <I>go back one position (BackSpace)</I>
<A NAME="FF"><I>FF</I></A>         <I>go to top of page (Form Feed)</I>
<A NAME="NL"><I>NL</I></A>         <I>go to start of next line (NewLine)</I>
<A NAME="CR"><I>CR</I></A>         <I>go to start of this line (Carriage Return)</I>
<A NAME="HT"><I>HT</I></A>         <I>go to next Horizontal Tab stop</I>
<A NAME="VT"><I>VT</I></A>         <I>go to next Vertical Tab stop</I></PRE>
<P>The code value zero is reserved for the
<B><A NAME="null character">null character</A></B>,
which is always in the target character set.
Code values for the basic
C character set are positive when stored in an object of type <I>char</I>.
Code values for the digits are contiguous, with increasing value.
For example, <CODE>'0' + 5</CODE> equals <CODE>'5'</CODE>.
Code values for any
two letters are <I>not</I> necessarily contiguous.</P>
<H3><A NAME="Character Sets and Locales">Character Sets and Locales</A></H3>
<P>An implementation can support multiple
<A HREF="locale.html">locales</A>, each
with a different character set. A locale summarizes conventions particular
to a given culture, such as how to format dates or how to sort names.
To change locales and, therefore, target character sets while the
program is running, use the function
<A HREF="locale.html#setlocale"><CODE>setlocale</CODE></A>.
The translator encodes character constants and
string literals for the
<A HREF="locale.html#C locale"><CODE>"C"</CODE></A> locale,
which is the locale in effect at program startup.</P>
<H2><A NAME="Escape Sequences">Escape Sequences</A></H2>
<P>Within character constants and string literals, you can write
a variety of <B>escape sequences</B>. Each escape sequence determines
the code value for a single character. You use escape sequences
to represent character codes:</P>
<UL>
<LI>you cannot otherwise write (such as <CODE>\n</CODE>)</LI>
<LI>that can be difficult to read properly (such as <CODE>\t</CODE>)</LI>
<LI>that might change value in different target character sets (such
as <CODE>\a</CODE>)</LI>
<LI>that must not change in value among different target environments
(such as <CODE>\0</CODE>)</LI>
</UL>
<P>An escape sequence takes the form shown in the diagram.</P>
<P><IMG SRC="escape.gif"></P>
<P><B><A NAME="mnemonic escape sequences">Mnemonic escape sequences</A></B>
help you remember the characters they represent:</P>
<PRE><B>Character</B>  <B>Escape Sequence</B>
"          \"
'          \'
?          \?
\          \\
<I>BEL</I>        \a
<I>BS</I>         \b
<I>FF</I>         \f
<I>NL</I>         \n
<I>CR</I>         \r
<I>HT</I>         \t
<I>VT</I>         \v</PRE>
<H3><A NAME="Numeric Escape Sequences">Numeric Escape Sequences</A></H3>
<P>You can also write <B>numeric escape sequences</B> using either
octal or hexadecimal digits. An
<B><A NAME="octal escape sequence">octal escape sequence</A></B>
takes one of the forms:</P>
<PRE>    \<I>d</I> <B>or</B> \<I>dd</I> <B>or</B> \<I>ddd</I></PRE>
<P>The escape sequence yields a code value that is the numeric
value of the 1-, 2-, or 3-digit octal number following the backslash
(<CODE>\</CODE>). Each <CODE><I>d</I></CODE> can be
any digit in the range <CODE>0-7</CODE>.</P>
<P>A
<B><A NAME="hexadecimal escape sequence">hexadecimal escape sequence</A></B>
takes one of the forms:</P>
<PRE>    \x<I>h</I> <B>or</B> \x<I>hh</I> <B>or</B> ...</PRE>
<P>The escape sequence yields a code value that is the numeric
value of the arbitrary-length hexadecimal number following the backslash
and <CODE>x</CODE> (<CODE>\x</CODE>). Each <CODE><I>h</I></CODE> can be any
decimal digit <CODE>0-9</CODE>, or
any of the letters <CODE>a-f</CODE> or <CODE>A-F</CODE>.
The letters represent
the digit values 10-15, where either <CODE>a</CODE> or <CODE>A</CODE> has
the value 10.</P>
<P>A numeric escape sequence terminates with the first character
that does not fit the digit pattern.
Here are some examples:</P>
<UL>
<LI>You can write the
<A HREF="#null character">null character</A>
as <CODE>'\0'</CODE>.</LI>
<LI>You can write a newline character (<CODE><I>NL</I></CODE>)
within a string literal by writing:<BR>
<CODE>"hi\n" <B>which becomes the array</B><BR>
 {'h', 'i', '\n', 0}</CODE></LI>
<LI>You can write a string literal that begins with a specific numeric
value:<BR>
<CODE>"\3abc" <B>which becomes the array</B><BR>
 {3, 'a', 'b', 'c', 0}</CODE></LI>
<LI>You can write a string literal that contains the hexadecimal
escape sequence <CODE>\xF</CODE> followed by
the digit <CODE>3</CODE> by writing
two string literals:<BR>
<CODE>"\xF" "3" <B>which becomes the array</B><BR>
 {0xF, '3', 0}</CODE></LI>
</UL>
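<P>The escape-sequence examples above, together with the earlier statement
that digit codes are contiguous, can be checked with a short C sketch.
The array and function names used here (<CODE>newline_example</CODE>,
<CODE>escape_examples_match</CODE>, <CODE>digits_are_contiguous</CODE>,
<CODE>check_examples</CODE>, and so on) are hypothetical, chosen only for
illustration. The sketch assumes a C99 or later translator.</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* String literals written with escape sequences, as in the list above.
   All names in this file are illustrative, not part of any library. */
const char newline_example[] = "hi\n";     /* {'h', 'i', '\n', 0}   */
const char octal_example[]   = "\3abc";    /* {3, 'a', 'b', 'c', 0} */
const char hex_example[]     = "\xF" "3";  /* {0xF, '3', 0}         */

/* The same arrays spelled out element by element. */
static const char newline_expected[] = {'h', 'i', '\n', 0};
static const char octal_expected[]   = {3, 'a', 'b', 'c', 0};
static const char hex_expected[]     = {0xF, '3', 0};

/* Returns 1 if both arrays have the same size and the same bytes. */
static int same_bytes(const char *a, size_t na, const char *b, size_t nb)
{
    return na == nb && memcmp(a, b, na) == 0;
}

/* Returns 1 if every escape-sequence literal matches its expanded form. */
int escape_examples_match(void)
{
    return same_bytes(newline_example, sizeof newline_example,
                      newline_expected, sizeof newline_expected)
        && same_bytes(octal_example, sizeof octal_example,
                      octal_expected, sizeof octal_expected)
        && same_bytes(hex_example, sizeof hex_example,
                      hex_expected, sizeof hex_expected);
}

/* Returns 1 if the digits '0' through '9' have contiguous, increasing
   code values, so that '0' + 5 equals '5'. */
int digits_are_contiguous(void)
{
    const char digits[] = "0123456789";
    for (int i = 0; i < 10; ++i) {
        if (digits[i] != '0' + i)
            return 0;
    }
    return 1;
}

/* Single entry point: fails an assertion if any check does not hold. */
void check_examples(void)
{
    assert(escape_examples_match());
    assert(digits_are_contiguous());
    assert(sizeof "xyz" == 4);  /* three characters plus the null byte */
}
```

<P>Calling <CODE>check_examples()</CODE> from <CODE>main</CODE> should
return silently on a conforming implementation. Writing
<CODE>"\xF" "3"</CODE> as two adjacent literals matters because
<CODE>"\xF3"</CODE> would be read as a single hexadecimal escape
sequence.</P>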