to a wide character array.  The @code{wcstombs} function does the
reverse.  These functions are declared in the header file
@file{stdlib.h}.

In most programs, these functions are the only ones you need for
conversion between wide strings and multibyte character strings.  But
they have limitations.  If your data is not null-terminated or is not
all in memory at once, you probably need to use the low-level conversion
functions to convert one character at a time.  @xref{Converting One
Char}.

@comment stdlib.h
@comment ANSI
@deftypefun size_t mbstowcs (wchar_t *@var{wstring}, const char *@var{string}, size_t @var{size})
The @code{mbstowcs} (``multibyte string to wide character string'')
function converts the null-terminated string of multibyte characters
@var{string} to an array of wide character codes, storing not more than
@var{size} wide characters into the array beginning at @var{wstring}.
The terminating null character counts towards the size, so if @var{size}
is less than the actual number of wide characters resulting from
@var{string}, no terminating null character is stored.

The conversion of characters from @var{string} begins in the initial
shift state.

If an invalid multibyte character sequence is found, this function
returns a value of @code{(size_t) -1}.  Otherwise, it returns the number
of wide characters stored in the array @var{wstring}.  This number does
not include the terminating null character, which is present if the
number is less than @var{size}.

Here is an example showing how to convert a string of multibyte
characters, allocating enough space for the result.

@smallexample
wchar_t *
mbstowcs_alloc (const char *string)
@{
  size_t size = strlen (string) + 1;
  wchar_t *buf = xmalloc (size * sizeof (wchar_t));

  size = mbstowcs (buf, string, size);
  if (size == (size_t) -1)
    @{
      /* @r{Do not leak the buffer when the conversion fails.} */
      free (buf);
      return NULL;
    @}
  buf = xrealloc (buf, (size + 1) * sizeof (wchar_t));
  return buf;
@}
@end smallexample

@end deftypefun

@comment stdlib.h
@comment ANSI
@deftypefun size_t wcstombs (char *@var{string}, const wchar_t *@var{wstring}, size_t @var{size})
The @code{wcstombs} (``wide character string to multibyte string'')
function converts the null-terminated wide character array @var{wstring}
into a string containing multibyte characters, storing not more than
@var{size} bytes starting at @var{string}, followed by a terminating
null character if there is room.  The conversion of characters begins in
the initial shift state.

The terminating null character counts towards the size, so if @var{size}
is less than or equal to the number of bytes needed in @var{wstring}, no
terminating null character is stored.

If a code that does not correspond to a valid multibyte character is
found, this function returns a value of @code{(size_t) -1}.  Otherwise,
the return value is the number of bytes stored in the array
@var{string}.  This number does not include the terminating null
character, which is present if the number is less than @var{size}.
@end deftypefun

@node Length of Char, Converting One Char, Wide String Conversion, Extended Characters
@section Multibyte Character Length
@cindex multibyte character, length of
@cindex length of multibyte character

This section describes how to scan a string containing multibyte
characters, one character at a time.  The difficulty in doing this
is to know how many bytes each character contains.
Your program can use @code{mblen} to find this out.

@comment stdlib.h
@comment ANSI
@deftypefun int mblen (const char *@var{string}, size_t @var{size})
The @code{mblen} function with a non-null @var{string} argument returns
the number of bytes that make up the multibyte character beginning at
@var{string}, never examining more than @var{size} bytes.  (The idea is
to supply for @var{size} the number of bytes of data you have in hand.)

The return value of @code{mblen} distinguishes three possibilities: the
first @var{size} bytes at @var{string} start with a valid multibyte
character, they start with an invalid byte sequence or just part of a
character, or @var{string} points to an empty string (a null character).

For a valid multibyte character, @code{mblen} returns the number of
bytes in that character (always at least @code{1}, and never more than
@var{size}).  For an invalid byte sequence, @code{mblen} returns
@code{-1}.  For an empty string, it returns @code{0}.

If the multibyte character code uses shift characters, then @code{mblen}
maintains and updates a shift state as it scans.  If you call
@code{mblen} with a null pointer for @var{string}, that initializes the
shift state to its standard initial value.  It also returns nonzero if
the multibyte character code in use actually has a shift state.
@xref{Shift State}.

@pindex stdlib.h
The function @code{mblen} is declared in @file{stdlib.h}.
@end deftypefun

@node Converting One Char, Example of Conversion, Length of Char, Extended Characters
@section Conversion of Extended Characters One by One
@cindex extended characters, converting
@cindex converting extended characters

@pindex stdlib.h
You can convert multibyte characters one at a time to wide characters
with the @code{mbtowc} function.  The @code{wctomb} function does the
reverse.
These functions are declared in @file{stdlib.h}.

@comment stdlib.h
@comment ANSI
@deftypefun int mbtowc (wchar_t *@var{result}, const char *@var{string}, size_t @var{size})
The @code{mbtowc} (``multibyte to wide character'') function when called
with non-null @var{string} converts the first multibyte character
beginning at @var{string} to its corresponding wide character code.  It
stores the result in @code{*@var{result}}.

@code{mbtowc} never examines more than @var{size} bytes.  (The idea is
to supply for @var{size} the number of bytes of data you have in hand.)

@code{mbtowc} with non-null @var{string} distinguishes three
possibilities: the first @var{size} bytes at @var{string} start with a
valid multibyte character, they start with an invalid byte sequence or
just part of a character, or @var{string} points to an empty string (a
null character).

For a valid multibyte character, @code{mbtowc} converts it to a wide
character and stores that in @code{*@var{result}}, and returns the
number of bytes in that character (always at least @code{1}, and never
more than @var{size}).

For an invalid byte sequence, @code{mbtowc} returns @code{-1}.  For an
empty string, it returns @code{0}, also storing @code{0} in
@code{*@var{result}}.

If the multibyte character code uses shift characters, then
@code{mbtowc} maintains and updates a shift state as it scans.  If you
call @code{mbtowc} with a null pointer for @var{string}, that
initializes the shift state to its standard initial value.  It also
returns nonzero if the multibyte character code in use actually has a
shift state.  @xref{Shift State}.
@end deftypefun

@comment stdlib.h
@comment ANSI
@deftypefun int wctomb (char *@var{string}, wchar_t @var{wchar})
The @code{wctomb} (``wide character to multibyte'') function converts
the wide character code @var{wchar} to its corresponding multibyte
character sequence, and stores the result in bytes starting at
@var{string}.  At most @code{MB_CUR_MAX} bytes are stored.

@code{wctomb} with non-null @var{string} distinguishes three
possibilities for @var{wchar}: a valid wide character code (one that can
be translated to a multibyte character), an invalid code, and @code{0}.

Given a valid code, @code{wctomb} converts it to a multibyte character,
storing the bytes starting at @var{string}.  Then it returns the number
of bytes in that character (always at least @code{1}, and never more
than @code{MB_CUR_MAX}).

If @var{wchar} is an invalid wide character code, @code{wctomb} returns
@code{-1}.  If @var{wchar} is @code{0}, it stores a null byte at
@var{string}, preceded by any shift sequence needed to restore the
initial shift state, and returns the number of bytes stored (at least
@code{1}).

If the multibyte character code uses shift characters, then
@code{wctomb} maintains and updates a shift state as it scans.  If you
call @code{wctomb} with a null pointer for @var{string}, that
initializes the shift state to its standard initial value.  It also
returns nonzero if the multibyte character code in use actually has a
shift state.  @xref{Shift State}.

Calling this function with a @var{wchar} argument of zero when
@var{string} is not null has the side-effect of reinitializing the
stored shift state @emph{as well as} storing the multibyte character
@code{0} and returning the number of bytes stored.
@end deftypefun

@node Example of Conversion, Shift State, Converting One Char, Extended Characters
@section Character-by-Character Conversion Example

Here is an example that reads multibyte character text from descriptor
@code{input} and writes the corresponding wide characters to descriptor
@code{output}.
We need to convert characters one by one for this
example because @code{mbstowcs} is unable to continue past a null
character, and cannot cope with an apparently invalid partial character
by reading more input.

@smallexample
int
file_mbstowcs (int input, int output)
@{
  char buffer[BUFSIZ + MB_LEN_MAX];
  int filled = 0;
  int eof = 0;

  while (!eof)
    @{
      int nread;
      int nwrite;
      char *inp = buffer;
      /* @r{Each byte in @code{buffer} may yield one wide character.} */
      wchar_t outbuf[BUFSIZ + MB_LEN_MAX];
      wchar_t *outp = outbuf;

      /* @r{Fill up the buffer from the input file.}  */
      nread = read (input, buffer + filled, BUFSIZ);
      if (nread < 0)
        @{
          perror ("read");
          return 0;
        @}
      /* @r{If we reach end of file, make a note to read no more.} */
      if (nread == 0)
        eof = 1;

      /* @r{@code{filled} is now the number of bytes in @code{buffer}.} */
      filled += nread;

      /* @r{Convert those bytes to wide characters, as many as we can.} */
      while (1)
        @{
          int thislen = mbtowc (outp, inp, filled);
          /* @r{Stop converting at invalid character;}
             @r{this can mean we have read just the first part}
             @r{of a valid character.}  */
          if (thislen == -1)
            break;
          /* @r{Treat null character like any other,}
             @r{but also reset shift state.} */
          if (thislen == 0)
            @{
              thislen = 1;
              mbtowc (NULL, NULL, 0);
            @}
          /* @r{Advance past this character.} */
          inp += thislen;
          filled -= thislen;
          outp++;
        @}

      /* @r{Write the wide characters we just made.}  */
      nwrite = write (output, outbuf,
                      (outp - outbuf) * sizeof (wchar_t));
      if (nwrite < 0)
        @{
          perror ("write");
          return 0;
        @}

      /* @r{See if we have a @emph{real} invalid character.} */
      if ((eof && filled > 0) || filled >= MB_CUR_MAX)
        @{
          error ("invalid multibyte character");
          return 0;
        @}

      /* @r{If any characters must be carried forward,}
         @r{put them at the beginning of @code{buffer}.} */
      if (filled > 0)
        memmove (buffer, inp, filled);
    @}

  return 1;
@}
@end smallexample

@node Shift State,  , Example of Conversion, Extended Characters
@section Multibyte Codes Using Shift Sequences

In some multibyte character codes, the @emph{meaning} of any particular
byte sequence is not fixed; it depends on what other sequences have come
earlier in the same string.  Typically there are just a few sequences
that can change the meaning of other sequences; these few are called
@dfn{shift sequences} and we say that they set the @dfn{shift state} for
other sequences that follow.

To illustrate shift state and shift sequences, suppose we decide that
the sequence @code{0200} (just one byte) enters Japanese mode, in which
pairs of bytes in the range from @code{0240} to @code{0377} are single
characters, while @code{0201} enters Latin-1 mode, in which single bytes
in the range from @code{0240} to @code{0377} are characters, and
interpreted according to the ISO Latin-1 character set.
This is a
multibyte code which has two alternative shift states (``Japanese mode''
and ``Latin-1 mode''), and two shift sequences that specify particular
shift states.

When the multibyte character code in use has shift states, then
@code{mblen}, @code{mbtowc} and @code{wctomb} must maintain and update
the current shift state as they scan the string.  To make this work
properly, you must follow these rules:

@itemize @bullet
@item
Before starting to scan a string, call the function with a null pointer
for the multibyte character address; for example, @code{mblen (NULL,
0)}.  This initializes the shift state to its standard initial value.

@item
Scan the string one character at a time, in order.  Do not ``back up''
and rescan characters already scanned, and do not intersperse the
processing of different strings.
@end itemize

Here is an example of using @code{mblen} following these rules:

@smallexample
void
scan_string (char *s)
@{
  int length = strlen (s);

  /* @r{Initialize shift state.} */
  mblen (NULL, 0);

  while (1)
    @{
      int thischar = mblen (s, length);
      /* @r{Deal with end of string and invalid characters.} */
      if (thischar == 0)
        break;
      if (thischar == -1)
        @{
          error ("invalid multibyte character");
          break;
        @}
      /* @r{Advance past this character.} */
      s += thischar;
      length -= thischar;
    @}
@}
@end smallexample

The functions @code{mblen}, @code{mbtowc} and @code{wctomb} are not
reentrant when using a multibyte code that uses a shift state.  However,
no other library functions call these functions, so you don't have to
worry that the shift state will be changed mysteriously.
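
The hypothetical two-mode code described earlier in this section can be
made concrete with a small hand-written scanner.  What follows is only a
sketch under stated assumptions, not part of the library: the names
@code{toy_mode} and @code{toy_count_chars} are invented for
illustration, the initial shift state is assumed to be Latin-1 mode, and
bytes whose meaning the description leaves open are assumed to be
single-byte characters.  It shows why a scanner must carry state from
one character to the next: the same two bytes count as one character or
as two, depending on the most recent shift sequence.

```c
#include <stddef.h>

/* Shift states of the hypothetical code (names are illustrative).  */
enum toy_mode { TOY_LATIN1, TOY_JAPANESE };

/* Count the characters in the first LEN bytes of S.
   Assumption: scanning starts in Latin-1 mode.
   Returns -1 if a two-byte character is cut short or malformed.  */
long
toy_count_chars (const unsigned char *s, size_t len)
{
  enum toy_mode mode = TOY_LATIN1;
  long count = 0;
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = s[i];

      if (c == 0200)
        {
          /* Shift sequence: enter Japanese mode.  Not a character.  */
          mode = TOY_JAPANESE;
          i++;
        }
      else if (c == 0201)
        {
          /* Shift sequence: enter Latin-1 mode.  Not a character.  */
          mode = TOY_LATIN1;
          i++;
        }
      else if (mode == TOY_JAPANESE && c >= 0240)
        {
          /* A pair of bytes in 0240..0377 forms one character.  */
          if (i + 1 >= len || s[i + 1] < 0240)
            return -1;
          i += 2;
          count++;
        }
      else
        {
          /* Assumption: any other byte is a one-byte character.  */
          i++;
          count++;
        }
    }

  return count;
}
```

For instance, the bytes @code{0241 0242} count as two characters when
scanned in Latin-1 mode, but as a single character when they are
preceded by the shift sequence @code{0200}.  Restarting the scan in the
middle of such a string would lose the mode, which is the reason for the
rule against backing up.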