to a wide character array.  The @code{wcstombs} function does the
reverse.  These functions are declared in the header file
@file{stdlib.h}.

In most programs, these functions are the only ones you need for
conversion between wide strings and multibyte character strings.  But
they have limitations.  If your data is not null-terminated or is not
all in core at once, you probably need to use the low-level conversion
functions to convert one character at a time.  @xref{Converting One Char}.

@comment stdlib.h
@comment ANSI
@deftypefun size_t mbstowcs (wchar_t *@var{wstring}, const char *@var{string}, size_t @var{size})
The @code{mbstowcs} (``multibyte string to wide character string'')
function converts the null-terminated string of multibyte characters
@var{string} to an array of wide character codes, storing not more than
@var{size} wide characters into the array beginning at @var{wstring}.
The terminating null character counts towards the size, so if @var{size}
is less than the actual number of wide characters resulting from
@var{string}, no terminating null character is stored.

The conversion of characters from @var{string} begins in the initial
shift state.

If an invalid multibyte character sequence is found, this function
returns a value of @code{-1}.  Otherwise, it returns the number of wide
characters stored in the array @var{wstring}.
This number does not
include the terminating null character, which is present if the number
is less than @var{size}.

Here is an example showing how to convert a string of multibyte
characters, allocating enough space for the result.

@smallexample
wchar_t *
mbstowcs_alloc (const char *string)
@{
  size_t size = strlen (string) + 1;
  wchar_t *buf = xmalloc (size * sizeof (wchar_t));

  size = mbstowcs (buf, string, size);
  if (size == (size_t) -1)
    @{
      /* @r{Invalid multibyte sequence: release the buffer.} */
      free (buf);
      return NULL;
    @}
  buf = xrealloc (buf, (size + 1) * sizeof (wchar_t));
  return buf;
@}
@end smallexample

@end deftypefun

@comment stdlib.h
@comment ANSI
@deftypefun size_t wcstombs (char *@var{string}, const wchar_t *@var{wstring}, size_t @var{size})
The @code{wcstombs} (``wide character string to multibyte string'')
function converts the null-terminated wide character array @var{wstring}
into a string containing multibyte characters, storing not more than
@var{size} bytes starting at @var{string}, followed by a terminating
null character if there is room.  The conversion of characters begins in
the initial shift state.

The terminating null character counts towards the size, so if @var{size}
is less than or equal to the number of bytes needed to represent
@var{wstring}, no terminating null character is stored.

If a code that does not correspond to a valid multibyte character is
found, this function returns a value of @code{-1}.  Otherwise, the
return value is the number of bytes stored in the array @var{string}.
This number does not include the terminating null character, which is
present if the number is less than @var{size}.
@end deftypefun

@node Length of Char, Converting One Char, Wide String Conversion, Extended Characters
@section Multibyte Character Length
@cindex multibyte character, length of
@cindex length of multibyte character

This section describes how to scan a string containing multibyte
characters, one character at a time.  The difficulty in doing this
is to know how many bytes each character contains.
Your program can use @code{mblen} to find this out.

@comment stdlib.h
@comment ANSI
@deftypefun int mblen (const char *@var{string}, size_t @var{size})
The @code{mblen} function with a non-null @var{string} argument returns
the number of bytes that make up the multibyte character beginning at
@var{string}, never examining more than @var{size} bytes.  (The idea is
to supply for @var{size} the number of bytes of data you have in hand.)

The return value of @code{mblen} distinguishes three possibilities: the
first @var{size} bytes at @var{string} start with a valid multibyte
character, they start with an invalid byte sequence or just part of a
character, or @var{string} points to an empty string (a null character).

For a valid multibyte character, @code{mblen} returns the number of
bytes in that character (always at least @code{1}, and never more than
@var{size}).  For an invalid byte sequence, @code{mblen} returns
@code{-1}.  For an empty string, it returns @code{0}.

If the multibyte character code uses shift characters, then @code{mblen}
maintains and updates a shift state as it scans.  If you call
@code{mblen} with a null pointer for @var{string}, that initializes the
shift state to its standard initial value.  The call also returns a
nonzero value if the multibyte character code in use actually has a
shift state.  @xref{Shift State}.

@pindex stdlib.h
The function @code{mblen} is declared in @file{stdlib.h}.
@end deftypefun

@node Converting One Char, Example of Conversion, Length of Char, Extended Characters
@section Conversion of Extended Characters One by One
@cindex extended characters, converting
@cindex converting extended characters

@pindex stdlib.h
You can convert multibyte characters one at a time to wide characters
with the @code{mbtowc} function.  The @code{wctomb} function does the
reverse.
These functions are declared in @file{stdlib.h}.

@comment stdlib.h
@comment ANSI
@deftypefun int mbtowc (wchar_t *@var{result}, const char *@var{string}, size_t @var{size})
The @code{mbtowc} (``multibyte to wide character'') function, when called
with non-null @var{string}, converts the first multibyte character
beginning at @var{string} to its corresponding wide character code.  It
stores the result in @code{*@var{result}}.

@code{mbtowc} never examines more than @var{size} bytes.  (The idea is
to supply for @var{size} the number of bytes of data you have in hand.)

@code{mbtowc} with non-null @var{string} distinguishes three
possibilities: the first @var{size} bytes at @var{string} start with a
valid multibyte character, they start with an invalid byte sequence or
just part of a character, or @var{string} points to an empty string (a
null character).

For a valid multibyte character, @code{mbtowc} converts it to a wide
character and stores that in @code{*@var{result}}, and returns the
number of bytes in that character (always at least @code{1}, and never
more than @var{size}).

For an invalid byte sequence, @code{mbtowc} returns @code{-1}.  For an
empty string, it returns @code{0}, also storing @code{0} in
@code{*@var{result}}.

If the multibyte character code uses shift characters, then
@code{mbtowc} maintains and updates a shift state as it scans.  If you
call @code{mbtowc} with a null pointer for @var{string}, that
initializes the shift state to its standard initial value.  The call
also returns a nonzero value if the multibyte character code in use
actually has a shift state.  @xref{Shift State}.
@end deftypefun

@comment stdlib.h
@comment ANSI
@deftypefun int wctomb (char *@var{string}, wchar_t @var{wchar})
The @code{wctomb} (``wide character to multibyte'') function converts
the wide character code @var{wchar} to its corresponding multibyte
character sequence, and stores the result in bytes starting at
@var{string}.
At most @code{MB_CUR_MAX} bytes are stored.

@code{wctomb} with non-null @var{string} distinguishes three
possibilities for @var{wchar}: a valid wide character code (one that can
be translated to a multibyte character), an invalid code, and @code{0}.

Given a valid code, @code{wctomb} converts it to a multibyte character,
storing the bytes starting at @var{string}.  Then it returns the number
of bytes in that character (always at least @code{1}, and never more
than @code{MB_CUR_MAX}).

If @var{wchar} is an invalid wide character code, @code{wctomb} returns
@code{-1}.  If @var{wchar} is @code{0}, it returns @code{0}, also
storing @code{0} in @code{*@var{string}}.

If the multibyte character code uses shift characters, then
@code{wctomb} maintains and updates a shift state as it scans.  If you
call @code{wctomb} with a null pointer for @var{string}, that
initializes the shift state to its standard initial value.  The call
also returns a nonzero value if the multibyte character code in use
actually has a shift state.  @xref{Shift State}.

Calling this function with a @var{wchar} argument of zero when
@var{string} is not null has the side effect of reinitializing the
stored shift state @emph{as well as} storing the multibyte character
@code{0} and returning @code{0}.
@end deftypefun

@node Example of Conversion, Shift State, Converting One Char, Extended Characters
@section Character-by-Character Conversion Example

Here is an example that reads multibyte character text from descriptor
@code{input} and writes the corresponding wide characters to descriptor
@code{output}.
We need to convert characters one by one for this
example because @code{mbstowcs} is unable to continue past a null
character, and cannot cope with an apparently invalid partial character
by reading more input.

@smallexample
int
file_mbstowcs (int input, int output)
@{
  char buffer[BUFSIZ + MB_LEN_MAX];
  int filled = 0;
  int eof = 0;

  while (!eof)
    @{
      int nread;
      int nwrite;
      char *inp = buffer;
      /* @r{Leave room for bytes carried over from the previous pass.} */
      wchar_t outbuf[BUFSIZ + MB_LEN_MAX];
      wchar_t *outp = outbuf;

      /* @r{Fill up the buffer from the input file.} */
      nread = read (input, buffer + filled, BUFSIZ);
      if (nread < 0)
        @{
          perror ("read");
          return 0;
        @}
      /* @r{If we reach end of file, make a note to read no more.} */
      if (nread == 0)
        eof = 1;

      /* @r{@code{filled} is now the number of bytes in @code{buffer}.} */
      filled += nread;

      /* @r{Convert those bytes to wide characters, as many as we can.} */
      while (1)
        @{
          int thislen = mbtowc (outp, inp, filled);
          /* @r{Stop converting at invalid character;}
             @r{this can mean we have read just the first part}
             @r{of a valid character.} */
          if (thislen == -1)
            break;
          /* @r{Treat null character like any other,}
             @r{but also reset shift state.} */
          if (thislen == 0)
            @{
              thislen = 1;
              mbtowc (NULL, NULL, 0);
            @}
          /* @r{Advance past this character.} */
          inp += thislen;
          filled -= thislen;
          outp++;
        @}

      /* @r{Write the wide characters we just made.} */
      nwrite = write (output, outbuf,
                      (outp - outbuf) * sizeof (wchar_t));
      if (nwrite < 0)
        @{
          perror ("write");
          return 0;
        @}

      /* @r{See if we have a @emph{real} invalid character.} */
      if ((eof && filled > 0) || filled >= MB_CUR_MAX)
        @{
          error (0, 0, "invalid multibyte character");
          return 0;
        @}

      /* @r{If any characters must be carried forward,}
         @r{put them at the beginning of @code{buffer}.} */
      if (filled > 0)
        memmove (buffer, inp, filled);
    @}

  return 1;
@}
@end smallexample

@node Shift State, , Example of Conversion, Extended Characters
@section Multibyte Codes Using Shift Sequences

In some multibyte character codes, the @emph{meaning} of any particular
byte sequence is not fixed; it depends on what other sequences
have come
earlier in the same string.  Typically there are just a few sequences
that can change the meaning of other sequences; these few are called
@dfn{shift sequences} and we say that they set the @dfn{shift state} for
other sequences that follow.

To illustrate shift state and shift sequences, suppose we decide that
the sequence @code{0200} (just one byte) enters Japanese mode, in which
pairs of bytes in the range from @code{0240} to @code{0377} are single
characters, while @code{0201} enters Latin-1 mode, in which single bytes
in the range from @code{0240} to @code{0377} are characters, and
interpreted according to the ISO Latin-1 character set.  This is a
multibyte code which has two alternative shift states (``Japanese mode''
and ``Latin-1 mode''), and two shift sequences that specify particular
shift states.

When the multibyte character code in use has shift states, then
@code{mblen}, @code{mbtowc} and @code{wctomb} must maintain and update
the current shift state as they scan the string.  To make this work
properly, you must follow these rules:

@itemize @bullet
@item
Before starting to scan a string, call the function with a null pointer
for the multibyte character address; for example, @code{mblen (NULL, 0)}.
This initializes the shift state to its standard initial value.

@item
Scan the string one character at a time, in order.
Do not ``back up''
and rescan characters already scanned, and do not intersperse the
processing of different strings.
@end itemize

Here is an example of using @code{mblen} following these rules:

@smallexample
void
scan_string (char *s)
@{
  int length = strlen (s);

  /* @r{Initialize shift state.} */
  mblen (NULL, 0);

  while (1)
    @{
      int thischar = mblen (s, length);
      /* @r{Deal with end of string and invalid characters.} */
      if (thischar == 0)
        break;
      if (thischar == -1)
        @{
          error (0, 0, "invalid multibyte character");
          break;
        @}
      /* @r{Advance past this character.} */
      s += thischar;
      length -= thischar;
    @}
@}
@end smallexample

The functions @code{mblen}, @code{mbtowc} and @code{wctomb} are not
reentrant when using a multibyte code that uses a shift state.  However,
no other library functions call these functions, so you don't have to
worry that the shift state will be changed mysteriously.
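
The same rules apply when converting in the other direction with
@code{wctomb}.  The following is only a minimal sketch: the function
name @code{wcs_to_mbs_by_char}, its parameters and its return convention
are hypothetical and not part of the library.  It initializes the shift
state, converts the wide characters strictly in order, and finishes one
string before doing anything else.  It stores a plain null byte at the
end, so it assumes the output may be left in whatever shift state the
last character produced.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the library: convert the
   null-terminated wide string WS to multibyte characters in OUT,
   which has room for OUTSIZE bytes.  Returns the number of bytes
   stored, not counting the terminating null, or -1 if WS contains
   an invalid code or OUT is too small.  */
long
wcs_to_mbs_by_char (char *out, size_t outsize, const wchar_t *ws)
{
  char tmp[MB_LEN_MAX];
  size_t used = 0;

  /* Initialize shift state before scanning.  */
  wctomb (NULL, 0);

  while (*ws != L'\0')
    {
      /* Convert one wide character, in order, into scratch space.  */
      int n = wctomb (tmp, *ws);
      if (n == -1)
        return -1;
      /* Keep one byte in reserve for the terminating null.  */
      if (used + (size_t) n >= outsize)
        return -1;
      memcpy (out + used, tmp, (size_t) n);
      used += (size_t) n;
      ws++;
    }

  if (used >= outsize)
    return -1;
  out[used] = '\0';

  /* Leave the stored shift state in its initial value.  */
  wctomb (NULL, 0);
  return (long) used;
}
```

Converting into the scratch array @code{tmp} first means the caller's
buffer is never written past its end, because a single call can store up
to @code{MB_CUR_MAX} bytes.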