📄 ucnv.h
 * the next empty chunk of target in case of a
 * <TT>U_BUFFER_OVERFLOW_ERROR</TT>, and updating the source pointers
 * with the next chunk of source when a successful error status is
 * returned, until there are no more chunks of source data.
 * @param converter the Unicode converter
 * @param target I/O parameter. Input : Points to the beginning of the buffer to copy
 *  codepage characters to. Output : points to after the last codepage character copied
 *  to <TT>target</TT>.
 * @param targetLimit the pointer just after the end of the <TT>target</TT> buffer
 * @param source I/O parameter, pointer to pointer to the source Unicode character buffer.
 * @param sourceLimit the pointer just after the end of the source buffer
 * @param offsets if NULL is passed, nothing will happen to it, otherwise it needs to have the same number
 * of allocated cells as <TT>target</TT>. Will fill in offsets from target to source pointer,
 * e.g., if <TT>offsets[3]</TT> is equal to 6, it means that <TT>target[3]</TT> was a result of transcoding <TT>source[6]</TT>.
 * For output data carried across calls, and other data without a specific source character
 * (such as from escape sequences or callbacks) -1 will be placed for offsets.
 * @param flush set to <TT>TRUE</TT> if the current source buffer is the last available
 * chunk of the source, <TT>FALSE</TT> otherwise. Note that if a failing status is returned,
 * this function may have to be called multiple times with flush set to <TT>TRUE</TT> until
 * the source buffer is consumed.
 * @param err the error status. <TT>U_ILLEGAL_ARGUMENT_ERROR</TT> will be set if the
 * converter is <TT>NULL</TT>.
 * <code>U_BUFFER_OVERFLOW_ERROR</code> will be set if the target is full and there is
 * still data to be written to the target.
 * @see ucnv_fromUChars
 * @see ucnv_convert
 * @see ucnv_getMinCharSize
 * @see ucnv_setToUCallBack
 * @stable ICU 2.0
 */
U_STABLE void U_EXPORT2
ucnv_fromUnicode (UConverter * converter,
                  char **target,
                  const char *targetLimit,
                  const UChar ** source,
                  const UChar * sourceLimit,
                  int32_t* offsets,
                  UBool flush,
                  UErrorCode * err);

/**
 * Converts a buffer of codepage bytes into an array of Unicode UChars.
 * This function is optimized for converting a continuous
 * stream of data in buffer-sized chunks, where the entire source and
 * target do not fit in available buffers.
 *
 * The source pointer is an in/out parameter. It starts out pointing where the
 * conversion is to begin, and ends up pointing after the last byte of source consumed.
 *
 * Target similarly starts out pointing at the first available UChar in the output
 * buffer, and ends up pointing after the last UChar written to the output.
 * It does NOT necessarily keep UChar sequences together.
 *
 * The converter always attempts to consume the entire source buffer, unless
 * (1.) the target buffer is full, or (2.) a failing error is returned from the
 * current callback function. When a successful error status has been
 * returned, it means that all of the source buffer has been
 * consumed. At that point, the caller should reset the source and
 * sourceLimit pointers to point to the next chunk.
 *
 * At the end of the stream (flush==TRUE), the input is completely consumed
 * when *source==sourceLimit and no error code is set.
 * The converter object is then automatically reset by this function.
 * (This means that a converter need not be reset explicitly between data
 * streams if it finishes the previous stream without errors.)
 *
 * This is a <I>stateful</I> conversion. Additionally, even when all source data has
 * been consumed, some data may be in the converter's internal state.
 * Call this function repeatedly, updating the target pointers with
 * the next empty chunk of target in case of a
 * <TT>U_BUFFER_OVERFLOW_ERROR</TT>, and updating the source pointers
 * with the next chunk of source when a successful error status is
 * returned, until there are no more chunks of source data.
 * @param converter the Unicode converter
 * @param target I/O parameter. Input : Points to the beginning of the buffer to copy
 *  UChars into. Output : points to after the last UChar copied.
 * @param targetLimit the pointer just after the end of the <TT>target</TT> buffer
 * @param source I/O parameter, pointer to pointer to the source codepage buffer.
 * @param sourceLimit the pointer to the byte after the end of the source buffer
 * @param offsets if NULL is passed, nothing will happen to it, otherwise it needs to have the same number
 * of allocated cells as <TT>target</TT>. Will fill in offsets from target to source pointer,
 * e.g., if <TT>offsets[3]</TT> is equal to 6, it means that <TT>target[3]</TT> was a result of transcoding <TT>source[6]</TT>.
 * For output data carried across calls, and other data without a specific source character
 * (such as from escape sequences or callbacks) -1 will be placed for offsets.
 * @param flush set to <TT>TRUE</TT> if the current source buffer is the last available
 * chunk of the source, <TT>FALSE</TT> otherwise. Note that if a failing status is returned,
 * this function may have to be called multiple times with flush set to <TT>TRUE</TT> until
 * the source buffer is consumed.
 * @param err the error status. <TT>U_ILLEGAL_ARGUMENT_ERROR</TT> will be set if the
 * converter is <TT>NULL</TT>.
 * <code>U_BUFFER_OVERFLOW_ERROR</code> will be set if the target is full and there is
 * still data to be written to the target.
 * @see ucnv_fromUChars
 * @see ucnv_convert
 * @see ucnv_getMinCharSize
 * @see ucnv_setFromUCallBack
 * @see ucnv_getNextUChar
 * @stable ICU 2.0
 */
U_STABLE void U_EXPORT2
ucnv_toUnicode(UConverter *converter,
               UChar **target,
               const UChar *targetLimit,
               const char **source,
               const char *sourceLimit,
               int32_t *offsets,
               UBool flush,
               UErrorCode *err);

/**
 * Convert the Unicode string into a codepage string using an existing UConverter.
 * The output string is NUL-terminated if possible.
 *
 * This function is a more convenient but less powerful version of ucnv_fromUnicode().
 * It is only useful for whole strings, not for streaming conversion.
 *
 * The maximum output buffer capacity required (barring output from callbacks) will be
 * UCNV_GET_MAX_BYTES_FOR_STRING(srcLength, ucnv_getMaxCharSize(cnv)).
 *
 * @param cnv the converter object to be used (ucnv_resetFromUnicode() will be called)
 * @param src the input Unicode string
 * @param srcLength the input string length, or -1 if NUL-terminated
 * @param dest destination string buffer, can be NULL if destCapacity==0
 * @param destCapacity the number of chars available at dest
 * @param pErrorCode normal ICU error code;
 *                  common error codes that may be set by this function include
 *                  U_BUFFER_OVERFLOW_ERROR, U_STRING_NOT_TERMINATED_WARNING,
 *                  U_ILLEGAL_ARGUMENT_ERROR, and conversion errors
 * @return the length of the output string, not counting the terminating NUL;
 *         if the length is greater than destCapacity, then the string will not fit
 *         and a buffer of the indicated length would need to be passed in
 * @see ucnv_fromUnicode
 * @see ucnv_convert
 * @see UCNV_GET_MAX_BYTES_FOR_STRING
 * @stable ICU 2.0
 */
U_STABLE int32_t U_EXPORT2
ucnv_fromUChars(UConverter *cnv,
                char *dest, int32_t destCapacity,
                const UChar *src, int32_t srcLength,
                UErrorCode *pErrorCode);

/**
 * Convert the codepage string into a Unicode string using an existing UConverter.
 * The output string is NUL-terminated if possible.
 *
 * This function is a more convenient but less powerful version of ucnv_toUnicode().
 * It is only useful for whole strings, not for streaming conversion.
 *
 * The maximum output buffer capacity required (barring output from callbacks) will be
 * 2*srcLength (each char may be converted into a surrogate pair).
 *
 * @param cnv the converter object to be used (ucnv_resetToUnicode() will be called)
 * @param src the input codepage string
 * @param srcLength the input string length, or -1 if NUL-terminated
 * @param dest destination string buffer, can be NULL if destCapacity==0
 * @param destCapacity the number of UChars available at dest
 * @param pErrorCode normal ICU error code;
 *                  common error codes that may be set by this function include
 *                  U_BUFFER_OVERFLOW_ERROR, U_STRING_NOT_TERMINATED_WARNING,
 *                  U_ILLEGAL_ARGUMENT_ERROR, and conversion errors
 * @return the length of the output string, not counting the terminating NUL;
 *         if the length is greater than destCapacity, then the string will not fit
 *         and a buffer of the indicated length would need to be passed in
 * @see ucnv_toUnicode
 * @see ucnv_convert
 * @stable ICU 2.0
 */
U_STABLE int32_t U_EXPORT2
ucnv_toUChars(UConverter *cnv,
              UChar *dest, int32_t destCapacity,
              const char *src, int32_t srcLength,
              UErrorCode *pErrorCode);

/**
 * Convert a codepage buffer into Unicode one character at a time.
 * The input is completely consumed when the U_INDEX_OUTOFBOUNDS_ERROR is set.
 *
 * Advantage compared to ucnv_toUnicode() or ucnv_toUChars():
 * - Faster for small amounts of data, for most converters, e.g.,
 *   US-ASCII, ISO-8859-1, UTF-8/16/32, and most "normal" charsets.
 *   (For complex converters, e.g., SCSU, UTF-7 and ISO 2022 variants,
 *   it uses ucnv_toUnicode() internally.)
 * - Convenient.
 *
 * Limitations compared to ucnv_toUnicode():
 * - Always assumes flush=TRUE.
 *   This makes ucnv_getNextUChar() unsuitable for "streaming" conversion,
 *   that is, for where the input is supplied in multiple buffers,
 *   because ucnv_getNextUChar() will assume the end of the input at the end
 *   of the first buffer.
 * - Does not provide offset output.
 *
 * It is possible to "mix" ucnv_getNextUChar() and ucnv_toUnicode() because
 * ucnv_getNextUChar() uses the current state of the converter
 * (unlike ucnv_toUChars() which always resets first).
 * However, if ucnv_getNextUChar() is called after ucnv_toUnicode()
 * stopped in the middle of a character sequence (with flush=FALSE),
 * then ucnv_getNextUChar() will always use the slower ucnv_toUnicode()
 * internally until the next character boundary.
 * (This is new in ICU 2.6. In earlier releases, ucnv_getNextUChar() had to
 * start at a character boundary.)
 *
 * Instead of using ucnv_getNextUChar(), it is recommended
 * to convert using ucnv_toUnicode() or ucnv_toUChars()
 * and then iterate over the text using U16_NEXT() or a UCharIterator (uiter.h)
 * or a C++ CharacterIterator or similar.
 * This allows streaming conversion and offset output, for example.
 *
 * <p>Handling of surrogate pairs and supplementary-plane code points:<br>
 * There are two different kinds of codepages that provide mappings for surrogate characters:
 * <ul>
 *   <li>Codepages like UTF-8, UTF-32, and GB 18030 provide direct representations for Unicode
 *       code points U+10000-U+10ffff as well as for single surrogates U+d800-U+dfff.
 *       Each valid sequence will result in exactly one returned code point.
 *       If a sequence results in a single surrogate, then that will be returned
 *       by itself, even if a neighboring sequence encodes the matching surrogate.</li>
 *   <li>Codepages like SCSU and LMBCS (and UTF-16) provide direct representations only for BMP code points
 *       including surrogates. Code points in supplementary planes are represented with
 *       two sequences, each encoding a surrogate.
 *       For these codepages, matching pairs of surrogates will be combined into single
 *       code points for returning from this function.
 *       (Note that SCSU is actually a mix of these codepage types.)</li>
 * </ul></p>
 *
 * @param converter an open UConverter
 * @param source the address of a pointer to the codepage buffer, will be
 *  updated to point after the bytes consumed in the conversion call.
 * @param sourceLimit points to the end of the input buffer
 * @param err fills in error status (see ucnv_toUnicode)
 * <code>U_INDEX_OUTOFBOUNDS_ERROR</code> will be set if the input
 * is empty or does not convert to any output (e.g.: pure state-change
 * codes SI/SO, escape sequences for ISO 2022,
 * or if the callback did not output anything, ...).
 * This function will not set a <code>U_BUFFER_OVERFLOW_ERROR</code> because
 *  the "buffer" is the return code. However, there might be subsequent output
 *  stored in the converter object
 * that will be returned in following calls to this function.
 * @return a UChar32 resulting from the partial conversion of source
 * @see ucnv_toUnicode
 * @see ucnv_toUChars
 * @see ucnv_convert
 * @stable ICU 2.0
 */
U_STABLE UChar32 U_EXPORT2
ucnv_getNextUChar(UConverter * converter,
                  const char **source,
                  const char * sourceLimit,
                  UErrorCode * err);

/**
 * Convert from one external charset to another using two existing UConverters.
 * Internally, two conversions - ucnv_toUnicode() and ucnv_fromUnicode() -
 * are used, "pivoting" through 16-bit Unicode.
 *
 * There is a similar function, ucnv_convert(),
 * which has the following limitations:
 * - it takes charset names, not converter objects, so that
 *   - two converters are opened for each call
 *   - only single-string conversion is possible, not streaming operation
 * - it does not provide enough information to find out,
 *   in case of failure, whether the toUnicode or
 *   the fromUnicode conversion failed
 *
 * By contrast, ucnv_convertEx()
 * - takes UConverter parameters instead of charset names
 * - fully exposes the pivot buffer for complete error handling
 *
 * ucnv_convertEx() also provides further convenience:
 * - an option to reset the converters at the beginning
 *   (if reset==TRUE, see parameters;
 *    also sets *pivotTarget=*pivotSource=pivotStart)
 * - allow NUL-terminated input
 *   (only a single NUL byte, will not work for charsets with multi-byte NULs)
 *   (if sourceLimit==NULL, see parameters)
 * - terminate with a NUL on output
 *   (only a single NUL byte, not useful for charsets with multi-byte NULs),
 *   or set U_STRING_NOT_TERMINATED_WARNING if the output exactly fills
 *   the target buffer
 * - the pivot buffer can be provided internally;
 *   in this case, the caller will not be able to get details about where an
 *   error occurred
 *   (if pivotStart==NULL, see below)
 *
 * The function returns when one of the following is true:
 * - the entire source text has been converted successfully to the target buffer
 * - a target buffer overflow occurred (U_BUFFER_OVERFLOW_ERROR)
 * - a conversion error occurred
 *   (other U_FAILURE(), see description of pErrorCode)
 *
 * Limitation compared to the direct use of
 * ucnv_fromUnicode() and ucnv_toUnicode():
 * ucnv_convertEx() does not provide offset information.
 *
 * Limitation compared to ucnv_fromUChars() and ucnv_toUChars():
 * ucnv_convertEx() does not support preflighting directly.
 *
 * Sample code for converting a single string from
 * one external charset to UTF-8, ignoring the location of errors:
 *
 * \code
 * int32_t
 * myToUTF8(UConverter *cnv,
 *          const char *s, int32_t length,
 *          char *u8, int32_t capacity,
 *          UErrorCode *pErrorCode) {
 *     UConverter *utf8Cnv;
 *     char *target;
 *
 *     if(U_FAILURE(*pErrorCode)) {
 *         return 0;
 *     }
 *