In most programs, these functions are the only ones you need for
conversion between wide strings and multibyte character strings. But
they have limitations. If your data is not null-terminated or is not
all in memory at once, you probably need to use the low-level conversion
functions to convert one character at a time. See section <A HREF="library_6.html#SEC73" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_6.html#SEC73">Conversion of Extended Characters One by One</A>.
<P>
<A NAME="IDX345"></A>
<U>Function:</U> size_t <B>mbstowcs</B> <I>(wchar_t *<VAR>wstring</VAR>, const char *<VAR>string</VAR>, size_t <VAR>size</VAR>)</I><P>
The <CODE>mbstowcs</CODE> ("multibyte string to wide character string")
function converts the null-terminated string of multibyte characters
<VAR>string</VAR> to an array of wide character codes, storing not more than
<VAR>size</VAR> wide characters into the array beginning at <VAR>wstring</VAR>.
The terminating null character counts towards the size, so if <VAR>size</VAR>
is less than the actual number of wide characters resulting from
<VAR>string</VAR>, no terminating null character is stored.
<P>
The conversion of characters from <VAR>string</VAR> begins in the initial
shift state.
<P>
If an invalid multibyte character sequence is found, this function
returns a value of <CODE>-1</CODE> (converted to <CODE>size_t</CODE>, the
function's return type). Otherwise, it returns the number of wide
characters stored in the array <VAR>wstring</VAR>.
This number does not
include the terminating null character, which is present if the number
is less than <VAR>size</VAR>.
<P>
Here is an example showing how to convert a string of multibyte
characters, allocating enough space for the result.
<P>
<PRE>
wchar_t *
mbstowcs_alloc (char *string)
{
  size_t size = strlen (string) + 1;
  wchar_t *buffer = (wchar_t *) xmalloc (size * sizeof (wchar_t));

  /* mbstowcs reports an invalid sequence as (size_t) -1. */
  size = mbstowcs (buffer, string, size);
  if (size == (size_t) -1)
    {
      free (buffer);
      return NULL;
    }
  /* Shrink the buffer to the converted length plus the null. */
  return (wchar_t *) xrealloc (buffer, (size + 1) * sizeof (wchar_t));
}
</PRE>
<P>
<A NAME="IDX346"></A>
<U>Function:</U> size_t <B>wcstombs</B> <I>(char *<VAR>string</VAR>, const wchar_t *<VAR>wstring</VAR>, size_t <VAR>size</VAR>)</I><P>
The <CODE>wcstombs</CODE> ("wide character string to multibyte string")
function converts the null-terminated wide character array <VAR>wstring</VAR>
into a string containing multibyte characters, storing not more than
<VAR>size</VAR> bytes starting at <VAR>string</VAR>, followed by a terminating
null character if there is room. The conversion of characters begins in
the initial shift state.
<P>
The terminating null character counts towards the size, so if <VAR>size</VAR>
is less than or equal to the number of bytes needed for <VAR>wstring</VAR>, no
terminating null character is stored.
<P>
If a code that does not correspond to a valid multibyte character is
found, this function returns a value of <CODE>-1</CODE> (converted to
<CODE>size_t</CODE>). Otherwise, the
return value is the number of bytes stored in the array <VAR>string</VAR>.
This number does not include the terminating null character, which is
present if the number is less than <VAR>size</VAR>.
<P>
<A NAME="IDX347"></A>
<A NAME="IDX348"></A>
<H2><A NAME="SEC72" HREF="library_toc.html#SEC72" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC72">Multibyte Character Length</A></H2>
<P>
This section describes how to scan a string containing multibyte
characters, one character at a time. The difficulty in doing this
is knowing how many bytes each character contains.
Your program can use <CODE>mblen</CODE> to find this out.
<P>
<A NAME="IDX349"></A>
<U>Function:</U> int <B>mblen</B> <I>(const char *<VAR>string</VAR>, size_t <VAR>size</VAR>)</I><P>
The <CODE>mblen</CODE> function with non-null <VAR>string</VAR> returns the number
of bytes that make up the multibyte character beginning at <VAR>string</VAR>,
never examining more than <VAR>size</VAR> bytes. (The idea is to supply
for <VAR>size</VAR> the number of bytes of data you have in hand.)
<P>
The return value of <CODE>mblen</CODE> distinguishes three possibilities: the
first <VAR>size</VAR> bytes at <VAR>string</VAR> start with a valid multibyte
character, they start with an invalid byte sequence or just part of a
character, or <VAR>string</VAR> points to an empty string (a null character).
<P>
For a valid multibyte character, <CODE>mblen</CODE> returns the number of
bytes in that character (always at least <CODE>1</CODE>, and never more than
<VAR>size</VAR>). For an invalid byte sequence, <CODE>mblen</CODE> returns
<CODE>-1</CODE>. For an empty string, it returns <CODE>0</CODE>.
<P>
If the multibyte character code uses shift characters, then <CODE>mblen</CODE>
maintains and updates a shift state as it scans. If you call
<CODE>mblen</CODE> with a null pointer for <VAR>string</VAR>, that initializes the
shift state to its standard initial value.
It also returns nonzero if
the multibyte character code in use actually has a shift state.
See section <A HREF="library_6.html#SEC75" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_6.html#SEC75">Multibyte Codes Using Shift Sequences</A>.
<A NAME="IDX350"></A>
<P>
The function <CODE>mblen</CODE> is declared in <TT>`stdlib.h'</TT>.
<P>
<A NAME="IDX351"></A>
<A NAME="IDX352"></A>
<H2><A NAME="SEC73" HREF="library_toc.html#SEC73" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC73">Conversion of Extended Characters One by One</A></H2>
<A NAME="IDX353"></A>
<P>
You can convert multibyte characters one at a time to wide characters
with the <CODE>mbtowc</CODE> function. The <CODE>wctomb</CODE> function does the
reverse. These functions are declared in <TT>`stdlib.h'</TT>.
<P>
<A NAME="IDX354"></A>
<U>Function:</U> int <B>mbtowc</B> <I>(wchar_t *<VAR>result</VAR>, const char *<VAR>string</VAR>, size_t <VAR>size</VAR>)</I><P>
The <CODE>mbtowc</CODE> ("multibyte to wide character") function when called
with non-null <VAR>string</VAR> converts the first multibyte character
beginning at <VAR>string</VAR> to its corresponding wide character code. It
stores the result in <CODE>*<VAR>result</VAR></CODE>.
<P>
<CODE>mbtowc</CODE> never examines more than <VAR>size</VAR> bytes.
(The idea is
to supply for <VAR>size</VAR> the number of bytes of data you have in hand.)
<P>
<CODE>mbtowc</CODE> with non-null <VAR>string</VAR> distinguishes three
possibilities: the first <VAR>size</VAR> bytes at <VAR>string</VAR> start with
a valid multibyte character, they start with an invalid byte sequence or
just part of a character, or <VAR>string</VAR> points to an empty string (a
null character).
<P>
For a valid multibyte character, <CODE>mbtowc</CODE> converts it to a wide
character and stores that in <CODE>*<VAR>result</VAR></CODE>, and returns the
number of bytes in that character (always at least <CODE>1</CODE>, and never
more than <VAR>size</VAR>).
<P>
For an invalid byte sequence, <CODE>mbtowc</CODE> returns <CODE>-1</CODE>. For an
empty string, it returns <CODE>0</CODE>, also storing <CODE>0</CODE> in
<CODE>*<VAR>result</VAR></CODE>.
<P>
If the multibyte character code uses shift characters, then
<CODE>mbtowc</CODE> maintains and updates a shift state as it scans. If you
call <CODE>mbtowc</CODE> with a null pointer for <VAR>string</VAR>, that
initializes the shift state to its standard initial value. It also
returns nonzero if the multibyte character code in use actually has a
shift state. See section <A HREF="library_6.html#SEC75" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_6.html#SEC75">Multibyte Codes Using Shift Sequences</A>.
<P>
<A NAME="IDX355"></A>
<U>Function:</U> int <B>wctomb</B> <I>(char *<VAR>string</VAR>, wchar_t <VAR>wchar</VAR>)</I><P>
The <CODE>wctomb</CODE> ("wide character to multibyte") function converts
the wide character code <VAR>wchar</VAR> to its corresponding multibyte
character sequence, and stores the result in bytes starting at
<VAR>string</VAR>.
At most <CODE>MB_CUR_MAX</CODE> bytes are stored.
<P>
<CODE>wctomb</CODE> with non-null <VAR>string</VAR> distinguishes three
possibilities for <VAR>wchar</VAR>: a valid wide character code (one that can
be translated to a multibyte character), an invalid code, and <CODE>0</CODE>.
<P>
Given a valid code, <CODE>wctomb</CODE> converts it to a multibyte character,
storing the bytes starting at <VAR>string</VAR>. Then it returns the number
of bytes in that character (always at least <CODE>1</CODE>, and never more
than <CODE>MB_CUR_MAX</CODE>).
<P>
If <VAR>wchar</VAR> is an invalid wide character code, <CODE>wctomb</CODE> returns
<CODE>-1</CODE>. If <VAR>wchar</VAR> is <CODE>0</CODE>, it returns <CODE>0</CODE>, also
storing <CODE>0</CODE> in <CODE>*<VAR>string</VAR></CODE>.
<P>
If the multibyte character code uses shift characters, then
<CODE>wctomb</CODE> maintains and updates a shift state as it scans. If you
call <CODE>wctomb</CODE> with a null pointer for <VAR>string</VAR>, that
initializes the shift state to its standard initial value. It also
returns nonzero if the multibyte character code in use actually has a
shift state. See section <A HREF="library_6.html#SEC75" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_6.html#SEC75">Multibyte Codes Using Shift Sequences</A>.
<P>
Calling this function with a <VAR>wchar</VAR> argument of zero when
<VAR>string</VAR> is not null has the side effect of reinitializing the
stored shift state <EM>as well as</EM> storing the multibyte character
<CODE>0</CODE> and returning <CODE>0</CODE>.
<P>
<H2><A NAME="SEC74" HREF="library_toc.html#SEC74" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC74">Example of Character-by-Character Conversion</A></H2>
<P>
Here is an example that reads multibyte character text from descriptor
<CODE>input</CODE> and writes the corresponding wide characters to descriptor
<CODE>output</CODE>.
We need to convert characters one by one for this
example because <CODE>mbstowcs</CODE> is unable to continue past a null
character, and cannot cope with an apparently invalid partial character
by reading more input.
<P>
<PRE>
int
file_mbstowcs (int input, int output)
{
  char buffer[BUFSIZ + MB_LEN_MAX];
  int filled = 0;
  int eof = 0;

  while (!eof)
    {
      int nread;
      int nwrite;
      char *inp = buffer;
      /* Room for one wide character per byte that <CODE>buffer</CODE> can hold. */
      wchar_t outbuf[BUFSIZ + MB_LEN_MAX];
      wchar_t *outp = outbuf;

      /* Fill up the buffer from the input file. */
      nread = read (input, buffer + filled, BUFSIZ);
      if (nread < 0)
        {
          perror ("read");
          return 0;
        }
      /* If we reach end of file, make a note to read no more. */
      if (nread == 0)
        eof = 1;

      /* <CODE>filled</CODE> is now the number of bytes in <CODE>buffer</CODE>. */
      filled += nread;

      /* Convert those bytes to wide characters--as many as we can. */
      while (filled > 0)
        {
          int thislen = mbtowc (outp, inp, filled);
          /* Stop converting at invalid character;
             this can mean we have read just the first part
             of a valid character. */
          if (thislen == -1)
            break;
          /* Treat null character like any other,
             but also reset shift state. */
          if (thislen == 0)
            {
              thislen = 1;
              mbtowc (NULL, NULL, 0);
            }
          /* Advance past this character. */
          inp += thislen;
          filled -= thislen;
          outp++;
        }

      /* Write the wide characters we just made. */
      nwrite = write (output, outbuf, (outp - outbuf) * sizeof (wchar_t));
      if (nwrite < 0)
        {
          perror ("write");
          return 0;
        }

      /* See if we have a <EM>real</EM> invalid character. */
      if ((eof && filled > 0) || filled >= MB_CUR_MAX)
        {
          fprintf (stderr, "invalid multibyte character\n");
          return 0;
        }

      /* If any characters must be carried forward,
         put them at the beginning of <CODE>buffer</CODE>.
         The regions may overlap, so use memmove. */
      if (filled > 0)
        memmove (buffer, inp, filled);
    }

  return 1;
}
</PRE>
<P>
<H2><A NAME="SEC75" HREF="library_toc.html#SEC75" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_toc.html#SEC75">Multibyte Codes Using Shift Sequences</A></H2>
<P>
In some multibyte character codes, the <EM>meaning</EM> of any particular
byte sequence is not fixed; it depends on what other sequences have come
earlier in the same string. Typically there are just a few sequences
that can change the meaning of other sequences; these few are called
<DFN>shift sequences</DFN> and we say that they set the <DFN>shift state</DFN> for
other sequences that follow.
<P>
To illustrate shift state and shift sequences, suppose we decide that
the sequence <CODE>0200</CODE> (just one byte) enters Japanese mode, in which
pairs of bytes in the range from <CODE>0240</CODE> to <CODE>0377</CODE> are single
characters, while <CODE>0201</CODE> enters Latin-1 mode, in which single bytes
in the range from <CODE>0240</CODE> to <CODE>0377</CODE> are characters,
interpreted according to the ISO Latin-1 character set. This is a
multibyte code which has two alternative shift states ("Japanese mode"
and "Latin-1 mode"), and two shift sequences that specify particular
shift states.
<P>
When the multibyte character code in use has shift states, then
<CODE>mblen</CODE>, <CODE>mbtowc</CODE> and <CODE>wctomb</CODE> must maintain and update
the current shift state as they scan the string. To make this work
properly, you must follow these rules:
<P>
<UL>
<LI>
Before starting to scan a string, call the function with a null pointer
for the multibyte character address--for example, <CODE>mblen (NULL,
0)</CODE>. This initializes the shift state to its standard initial value.
<P>
<LI>
Scan the string one character at a time, in order.
Do not "back up"
and rescan characters already scanned, and do not intersperse the
processing of different strings.
</UL>
<P>
Here is an example of using <CODE>mblen</CODE> following these rules:
<P>
<PRE>
void
scan_string (char *s)
{
  int length = strlen (s);

  /* Initialize shift state. */
  mblen (NULL, 0);

  while (1)
    {
      int thischar = mblen (s, length);
      /* Deal with end of string and invalid characters. */
      if (thischar == 0)
        break;
      if (thischar == -1)
        {
          fprintf (stderr, "invalid multibyte character\n");
          break;
        }
      /* Advance past this character. */
      s += thischar;
      length -= thischar;
    }
}
</PRE>
<P>
The functions <CODE>mblen</CODE>, <CODE>mbtowc</CODE> and <CODE>wctomb</CODE> are not
reentrant when using a multibyte code that uses a shift state. However,
no other library functions call these functions, so you don't have to
worry that the shift state will be changed mysteriously.
<P>
Go to the <A HREF="library_5.html" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_5.html">previous</A>, <A HREF="library_7.html" tppabs="http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_7.html">next</A> section.<P>
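<P>
As a further illustration of the two shift-state rules given above, here is a minimal sketch that applies them to <CODE>wctomb</CODE>. The helper name <CODE>wide_to_multibyte</CODE>, its parameters and its return convention are assumptions made for this example; they are not part of the library. The sketch also assumes that no closing shift sequence needs to be written at the end of the string, so it is only complete for codes without a shift state.
<P>

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <stddef.h>   /* size_t, wchar_t */
#include <stdlib.h>   /* wctomb */
#include <string.h>   /* memcpy */

/* Hypothetical helper (not a library function): convert the
   null-terminated wide string WSTRING to multibyte characters in OUT,
   which has room for OUTSIZE bytes.  Return the number of bytes stored,
   not counting the terminating null, or -1 if a wide character has no
   multibyte form or the result does not fit.
   Assumption: no closing shift sequence is written, so this is only
   complete for codes without a shift state.  */
long
wide_to_multibyte (char *out, size_t outsize, const wchar_t *wstring)
{
  size_t used = 0;

  /* Before scanning, reset the shift state.  */
  wctomb (NULL, 0);

  /* Convert one character at a time, in order, never backing up.  */
  for (; *wstring != 0; wstring++)
    {
      char tmp[MB_LEN_MAX];
      int len = wctomb (tmp, *wstring);

      if (len < 0)
        return -1;                      /* invalid wide character code */
      if (used + (size_t) len + 1 > outsize)
        return -1;                      /* no room for it plus the null */
      memcpy (out + used, tmp, (size_t) len);
      used += (size_t) len;
    }

  if (used >= outsize)
    return -1;                          /* no room even for the null */
  out[used] = '\0';
  return (long) used;
}
```

<P>
In the default "C" locale, calling <CODE>wide_to_multibyte (buf, sizeof buf, L"abc")</CODE> with a 16-byte <CODE>buf</CODE> stores <CODE>"abc"</CODE> and returns <CODE>3</CODE>. Each character is converted into a temporary array of <CODE>MB_LEN_MAX</CODE> bytes first, so that <CODE>wctomb</CODE> can never write past the end of the caller's buffer.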