In most programs, these functions are the only ones you need for
conversion between wide strings and multibyte character strings.  But
they have limitations.  If your data is not null-terminated or is not
all in memory at once, you probably need to use the low-level conversion
functions to convert one character at a time.  See section <A HREF="library_6.html#SEC73" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_6.html#SEC73">Conversion of Extended Characters One by One</A>.
<P>
<A NAME="IDX345"></A>
<U>Function:</U> size_t <B>mbstowcs</B> <I>(wchar_t *<VAR>wstring</VAR>, const char *<VAR>string</VAR>, size_t <VAR>size</VAR>)</I><P>
The <CODE>mbstowcs</CODE> ("multibyte string to wide character string")
function converts the null-terminated string of multibyte characters
<VAR>string</VAR> to an array of wide character codes, storing not more than
<VAR>size</VAR> wide characters into the array beginning at <VAR>wstring</VAR>.
The terminating null character counts towards the size, so if <VAR>size</VAR>
is less than the actual number of wide characters resulting from
<VAR>string</VAR>, no terminating null character is stored.
<P>
The conversion of characters from <VAR>string</VAR> begins in the initial
shift state.
<P>
If an invalid multibyte character sequence is found, this function
returns a value of <CODE>-1</CODE> (converted to <CODE>size_t</CODE>).  Otherwise, it returns the number of wide
characters stored in the array <VAR>wstring</VAR>.
This number does not
include the terminating null character, which is present if the number
is less than <VAR>size</VAR>.
<P>
Here is an example showing how to convert a string of multibyte
characters, allocating enough space for the result.
<P>
<PRE>
wchar_t *
mbstowcs_alloc (const char *string)
{
  /* One wide character per byte, plus the terminator, is always enough.  */
  size_t size = strlen (string) + 1;
  wchar_t *buffer = (wchar_t *) xmalloc (size * sizeof (wchar_t));

  size = mbstowcs (buffer, string, size);
  if (size == (size_t) -1)
    {
      /* Invalid multibyte sequence: release the buffer and report failure.  */
      free (buffer);
      return NULL;
    }
  /* Shrink the buffer to the converted length plus the terminator.  */
  return (wchar_t *) xrealloc (buffer, (size + 1) * sizeof (wchar_t));
}
</PRE>
<P>
<A NAME="IDX346"></A>
<U>Function:</U> size_t <B>wcstombs</B> <I>(char *<VAR>string</VAR>, const wchar_t *<VAR>wstring</VAR>, size_t <VAR>size</VAR>)</I><P>
The <CODE>wcstombs</CODE> ("wide character string to multibyte string")
function converts the null-terminated wide character array <VAR>wstring</VAR>
into a string containing multibyte characters, storing not more than
<VAR>size</VAR> bytes starting at <VAR>string</VAR>, followed by a terminating
null character if there is room.  The conversion of characters begins in
the initial shift state.
<P>
The terminating null character counts towards the size, so if <VAR>size</VAR>
is less than or equal to the number of bytes needed for <VAR>wstring</VAR>, no
terminating null character is stored.
<P>
If a code that does not correspond to a valid multibyte character is
found, this function returns a value of <CODE>-1</CODE>.  Otherwise, the
return value is the number of bytes stored in the array <VAR>string</VAR>.
This number does not include the terminating null character, which is
present if the number is less than <VAR>size</VAR>.
<P>
<A NAME="IDX347"></A>
<A NAME="IDX348"></A>
<H2><A NAME="SEC72" HREF="library_toc.html#SEC72" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC72">Multibyte Character Length</A></H2>
<P>
This section describes how to scan a string containing multibyte
characters, one character at a time.
The difficulty in doing this
is to know how many bytes each character contains.  Your program can
use <CODE>mblen</CODE> to find this out.
<P>
<A NAME="IDX349"></A>
<U>Function:</U> int <B>mblen</B> <I>(const char *<VAR>string</VAR>, size_t <VAR>size</VAR>)</I><P>
The <CODE>mblen</CODE> function with non-null <VAR>string</VAR> returns the number
of bytes that make up the multibyte character beginning at <VAR>string</VAR>,
never examining more than <VAR>size</VAR> bytes.  (The idea is to supply
for <VAR>size</VAR> the number of bytes of data you have in hand.)
<P>
The return value of <CODE>mblen</CODE> distinguishes three possibilities: the
first <VAR>size</VAR> bytes at <VAR>string</VAR> start with a valid multibyte
character, they start with an invalid byte sequence or just part of a
character, or <VAR>string</VAR> points to an empty string (a null character).
<P>
For a valid multibyte character, <CODE>mblen</CODE> returns the number of
bytes in that character (always at least <CODE>1</CODE>, and never more than
<VAR>size</VAR>).  For an invalid byte sequence, <CODE>mblen</CODE> returns
<CODE>-1</CODE>.  For an empty string, it returns <CODE>0</CODE>.
<P>
If the multibyte character code uses shift characters, then <CODE>mblen</CODE>
maintains and updates a shift state as it scans.  If you call
<CODE>mblen</CODE> with a null pointer for <VAR>string</VAR>, that initializes the
shift state to its standard initial value.
It also returns nonzero if
the multibyte character code in use actually has a shift state.
See section <A HREF="library_6.html#SEC75" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_6.html#SEC75">Multibyte Codes Using Shift Sequences</A>.
<A NAME="IDX350"></A>
<P>
The function <CODE>mblen</CODE> is declared in <TT>`stdlib.h'</TT>.
<P>
<A NAME="IDX351"></A>
<A NAME="IDX352"></A>
<H2><A NAME="SEC73" HREF="library_toc.html#SEC73" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC73">Conversion of Extended Characters One by One</A></H2>
<A NAME="IDX353"></A>
<P>
You can convert multibyte characters one at a time to wide characters
with the <CODE>mbtowc</CODE> function.  The <CODE>wctomb</CODE> function does the
reverse.  These functions are declared in <TT>`stdlib.h'</TT>.
<P>
<A NAME="IDX354"></A>
<U>Function:</U> int <B>mbtowc</B> <I>(wchar_t *<VAR>result</VAR>, const char *<VAR>string</VAR>, size_t <VAR>size</VAR>)</I><P>
The <CODE>mbtowc</CODE> ("multibyte to wide character") function when called
with non-null <VAR>string</VAR> converts the first multibyte character
beginning at <VAR>string</VAR> to its corresponding wide character code.  It
stores the result in <CODE>*<VAR>result</VAR></CODE>.
<P>
<CODE>mbtowc</CODE> never examines more than <VAR>size</VAR> bytes.
(The idea is
to supply for <VAR>size</VAR> the number of bytes of data you have in hand.)
<P>
<CODE>mbtowc</CODE> with non-null <VAR>string</VAR> distinguishes three
possibilities: the first <VAR>size</VAR> bytes at <VAR>string</VAR> start with
a valid multibyte character, they start with an invalid byte sequence or
just part of a character, or <VAR>string</VAR> points to an empty string (a
null character).
<P>
For a valid multibyte character, <CODE>mbtowc</CODE> converts it to a wide
character and stores that in <CODE>*<VAR>result</VAR></CODE>, and returns the
number of bytes in that character (always at least <CODE>1</CODE>, and never
more than <VAR>size</VAR>).
<P>
For an invalid byte sequence, <CODE>mbtowc</CODE> returns <CODE>-1</CODE>.  For an
empty string, it returns <CODE>0</CODE>, also storing <CODE>0</CODE> in
<CODE>*<VAR>result</VAR></CODE>.
<P>
If the multibyte character code uses shift characters, then
<CODE>mbtowc</CODE> maintains and updates a shift state as it scans.  If you
call <CODE>mbtowc</CODE> with a null pointer for <VAR>string</VAR>, that
initializes the shift state to its standard initial value.  It also
returns nonzero if the multibyte character code in use actually has a
shift state.  See section <A HREF="library_6.html#SEC75" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_6.html#SEC75">Multibyte Codes Using Shift Sequences</A>.
<P>
<A NAME="IDX355"></A>
<U>Function:</U> int <B>wctomb</B> <I>(char *<VAR>string</VAR>, wchar_t <VAR>wchar</VAR>)</I><P>
The <CODE>wctomb</CODE> ("wide character to multibyte") function converts
the wide character code <VAR>wchar</VAR> to its corresponding multibyte
character sequence, and stores the result in bytes starting at
<VAR>string</VAR>.
At most <CODE>MB_CUR_MAX</CODE> bytes are stored.
<P>
<CODE>wctomb</CODE> with non-null <VAR>string</VAR> distinguishes three
possibilities for <VAR>wchar</VAR>: a valid wide character code (one that can
be translated to a multibyte character), an invalid code, and <CODE>0</CODE>.
<P>
Given a valid code, <CODE>wctomb</CODE> converts it to a multibyte character,
storing the bytes starting at <VAR>string</VAR>.  Then it returns the number
of bytes in that character (always at least <CODE>1</CODE>, and never more
than <CODE>MB_CUR_MAX</CODE>).
<P>
If <VAR>wchar</VAR> is an invalid wide character code, <CODE>wctomb</CODE> returns
<CODE>-1</CODE>.  If <VAR>wchar</VAR> is <CODE>0</CODE>, it returns <CODE>0</CODE>, also
storing <CODE>0</CODE> in <CODE>*<VAR>string</VAR></CODE>.
<P>
If the multibyte character code uses shift characters, then
<CODE>wctomb</CODE> maintains and updates a shift state as it scans.  If you
call <CODE>wctomb</CODE> with a null pointer for <VAR>string</VAR>, that
initializes the shift state to its standard initial value.  It also
returns nonzero if the multibyte character code in use actually has a
shift state.  See section <A HREF="library_6.html#SEC75" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_6.html#SEC75">Multibyte Codes Using Shift Sequences</A>.
<P>
Calling this function with a <VAR>wchar</VAR> argument of zero when
<VAR>string</VAR> is not null has the side effect of reinitializing the
stored shift state <EM>as well as</EM> storing the multibyte character
<CODE>0</CODE> and returning <CODE>0</CODE>.
<P>
<H2><A NAME="SEC74" HREF="library_toc.html#SEC74" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC74">Example of Character-by-Character Conversion</A></H2>
<P>
Here is an example that reads multibyte character text from descriptor
<CODE>input</CODE> and writes the corresponding wide characters to descriptor
<CODE>output</CODE>.
We need to convert characters one by one for this
example because <CODE>mbstowcs</CODE> is unable to continue past a null
character, and cannot cope with an apparently invalid partial character
by reading more input.
<P>
<PRE>
int
file_mbstowcs (int input, int output)
{
  char buffer[BUFSIZ + MB_LEN_MAX];
  int filled = 0;
  int eof = 0;

  while (!eof)
    {
      int nread;
      int nwrite;
      char *inp = buffer;
      /* Sized like <CODE>buffer</CODE>: each input byte yields at most one wide character. */
      wchar_t outbuf[BUFSIZ + MB_LEN_MAX];
      wchar_t *outp = outbuf;

      /* Fill up the buffer from the input file.  */
      nread = read (input, buffer + filled, BUFSIZ);
      if (nread &#60; 0)
        {
          perror ("read");
          return 0;
        }
      /* If we reach end of file, make a note to read no more. */
      if (nread == 0)
        eof = 1;

      /* <CODE>filled</CODE> is now the number of bytes in <CODE>buffer</CODE>. */
      filled += nread;

      /* Convert those bytes to wide characters--as many as we can. */
      while (filled &#62; 0)
        {
          int thislen = mbtowc (outp, inp, filled);
          /* Stop converting at invalid character;
             this can mean we have read just the first part
             of a valid character.  */
          if (thislen == -1)
            break;
          /* Treat null character like any other,
             but also reset shift state. */
          if (thislen == 0) {
            thislen = 1;
            mbtowc (NULL, NULL, 0);
          }
          /* Advance past this character. */
          inp += thislen;
          filled -= thislen;
          outp++;
        }

      /* Write the wide characters we just made.  */
      nwrite = write (output, outbuf,
                      (outp - outbuf) * sizeof (wchar_t));
      if (nwrite &#60; 0)
        {
          perror ("write");
          return 0;
        }

      /* See if we have a <EM>real</EM> invalid character. */
      if ((eof &#38;&#38; filled &#62; 0) || filled &#62;= MB_CUR_MAX)
        {
          error ("invalid multibyte character");
          return 0;
        }

      /* If any bytes must be carried forward, move them from
         <CODE>inp</CODE> to the beginning of <CODE>buffer</CODE>. */
      if (filled &#62; 0)
        memmove (buffer, inp, filled);
    }

  return 1;
}
</PRE>
<P>
<H2><A NAME="SEC75" HREF="library_toc.html#SEC75" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC75">Multibyte Codes Using Shift Sequences</A></H2>
<P>
In some multibyte character codes, the <EM>meaning</EM> of any particular
byte sequence is not fixed; it depends on what other sequences have come
earlier in the same string.  Typically there are just a few sequences
that can change the meaning of other sequences; these few are called
<DFN>shift sequences</DFN> and we say that they set the <DFN>shift state</DFN> for
other sequences that follow.
<P>
To illustrate shift state and shift sequences, suppose we decide that
the sequence <CODE>0200</CODE> (just one byte) enters Japanese mode, in which
pairs of bytes in the range from <CODE>0240</CODE> to <CODE>0377</CODE> are single
characters, while <CODE>0201</CODE> enters Latin-1 mode, in which single bytes
in the range from <CODE>0240</CODE> to <CODE>0377</CODE> are characters, and
interpreted according to the ISO Latin-1 character set.  This is a
multibyte code which has two alternative shift states ("Japanese mode"
and "Latin-1 mode"), and two shift sequences that specify particular
shift states.
<P>
When the multibyte character code in use has shift states, then
<CODE>mblen</CODE>, <CODE>mbtowc</CODE> and <CODE>wctomb</CODE> must maintain and update
the current shift state as they scan the string.  To make this work
properly, you must follow these rules:
<P>
<UL>
<LI>
Before starting to scan a string, call the function with a null pointer
for the multibyte character address; for example, <CODE>mblen (NULL,
0)</CODE>.
This initializes the shift state to its standard initial value.
<P>
<LI>
Scan the string one character at a time, in order.  Do not "back up"
and rescan characters already scanned, and do not intersperse the
processing of different strings.
</UL>
<P>
Here is an example of using <CODE>mblen</CODE> following these rules:
<P>
<PRE>
void
scan_string (char *s)
{
  int length = strlen (s);

  /* Initialize shift state. */
  mblen (NULL, 0);

  while (1)
    {
      int thischar = mblen (s, length);
      /* Deal with end of string and invalid characters. */
      if (thischar == 0)
        break;
      if (thischar == -1)
        {
          error ("invalid multibyte character");
          break;
        }
      /* Advance past this character. */
      s += thischar;
      length -= thischar;
    }
}
</PRE>
<P>
The functions <CODE>mblen</CODE>, <CODE>mbtowc</CODE> and <CODE>wctomb</CODE> are not
reentrant when using a multibyte code that uses a shift state.  However,
no other library functions call these functions, so you don't have to
worry that the shift state will be changed mysteriously.
<P>
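<P>
The two scanning rules from the section on shift sequences can also be
packaged as a small counting helper.  The following is only a sketch under
stated assumptions: the name <CODE>count_mb_chars</CODE> and its return
convention (<CODE>-1</CODE> for an invalid or incomplete sequence) are
hypothetical and not part of the library; only <CODE>mblen</CODE> and
<CODE>strlen</CODE> are real library calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a library function: count the multibyte
   characters in the null-terminated string S under the current locale.
   Returns -1 if an invalid or incomplete sequence is found.  */
long
count_mb_chars (const char *s)
{
  size_t remaining;
  long count = 0;

  assert (s != NULL);
  remaining = strlen (s);

  /* Rule 1: reset the shift state before scanning a new string.  */
  (void) mblen (NULL, 0);

  /* Rule 2: scan strictly forward, one character at a time.  */
  while (remaining > 0)
    {
      int len = mblen (s, remaining);

      if (len < 0)
        return -1;              /* invalid or truncated character */
      if (len == 0)
        break;                  /* null character: end of string */

      s += len;
      remaining -= (size_t) len;
      count++;
    }
  return count;
}
```

In the default "C" locale every ASCII byte is one character, so
<CODE>count_mb_chars ("hello")</CODE> yields <CODE>5</CODE>.  Because the
helper relies on the shift state kept by <CODE>mblen</CODE>, it should not
be interleaved with another scan of a different string.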