ucnv.h

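From "Source code of WebKit, the open-source browser for Linux; many commercial browsers on the market are ported from Web" · C header file · 1,561 lines in total · page 1 of 5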
 * the next empty chunk of target in case of a
 * <TT>U_BUFFER_OVERFLOW_ERROR</TT>, and updating the source pointers
 * with the next chunk of source when a successful error status is
 * returned, until there are no more chunks of source data.
 * @param converter the Unicode converter
 * @param target I/O parameter. Input : Points to the beginning of the buffer to copy
 *  codepage characters to. Output : points to after the last codepage character copied
 *  to <TT>target</TT>.
 * @param targetLimit the pointer just after last of the <TT>target</TT> buffer
 * @param source I/O parameter, pointer to pointer to the source Unicode character buffer.
 * @param sourceLimit the pointer just after the last of the source buffer
 * @param offsets if NULL is passed, nothing will happen to it, otherwise it needs to have the same number
 * of allocated cells as <TT>target</TT>. Will fill in offsets from target to source pointer
 * e.g: <TT>offsets[3]</TT> is equal to 6, it means that the <TT>target[3]</TT> was a result of transcoding <TT>source[6]</TT>
 * For output data carried across calls, and other data without a specific source character
 * (such as from escape sequences or callbacks)  -1 will be placed for offsets.
 * @param flush set to <TT>TRUE</TT> if the current source buffer is the last available
 * chunk of the source, <TT>FALSE</TT> otherwise. Note that if a failing status is returned,
 * this function may have to be called multiple times with flush set to <TT>TRUE</TT> until
 * the source buffer is consumed.
 * @param err the error status.  <TT>U_ILLEGAL_ARGUMENT_ERROR</TT> will be set if the
 * converter is <TT>NULL</TT>.
 * <code>U_BUFFER_OVERFLOW_ERROR</code> will be set if the target is full and there is
 * still data to be written to the target.
 * @see ucnv_fromUChars
 * @see ucnv_convert
 * @see ucnv_getMinCharSize
 * @see ucnv_setToUCallBack
 * @stable ICU 2.0
 */
U_STABLE void U_EXPORT2
ucnv_fromUnicode (UConverter * converter,
                  char **target,
                  const char *targetLimit,
                  const UChar ** source,
                  const UChar * sourceLimit,
                  int32_t* offsets,
                  UBool flush,
                  UErrorCode * err);

/**
 * Converts a buffer of codepage bytes into an array of unicode UChars
 * characters. This function is optimized for converting a continuous
 * stream of data in buffer-sized chunks, where the entire source and
 * target do not fit in available buffers.
 *
 * The source pointer is an in/out parameter. It starts out pointing where the
 * conversion is to begin, and ends up pointing after the last byte of source consumed.
 *
 * Target similarly starts out pointing at the first available UChar in the output
 * buffer, and ends up pointing after the last UChar written to the output.
 * It does NOT necessarily keep UChar sequences together.
 *
 * The converter always attempts to consume the entire source buffer, unless
 * (1.) the target buffer is full, or (2.) a failing error is returned from the
 * current callback function.  When a successful error status has been
 * returned, it means that all of the source buffer has been
 * consumed. At that point, the caller should reset the source and
 * sourceLimit pointers to point to the next chunk.
 *
 * At the end of the stream (flush==TRUE), the input is completely consumed
 * when *source==sourceLimit and no error code is set.
 * The converter object is then automatically reset by this function.
 * (This means that a converter need not be reset explicitly between data
 * streams if it finishes the previous stream without errors.)
 *
 * This is a <I>stateful</I> conversion. Additionally, even when all source data has
 * been consumed, some data may be in the converter's internal state.
 * Call this function repeatedly, updating the target pointers with
 * the next empty chunk of target in case of a
 * <TT>U_BUFFER_OVERFLOW_ERROR</TT>, and updating the source pointers
 * with the next chunk of source when a successful error status is
 * returned, until there are no more chunks of source data.
 * @param converter the Unicode converter
 * @param target I/O parameter. Input : Points to the beginning of the buffer to copy
 *  UChars into. Output : points to after the last UChar copied.
 * @param targetLimit the pointer just after the end of the <TT>target</TT> buffer
 * @param source I/O parameter, pointer to pointer to the source codepage buffer.
 * @param sourceLimit the pointer to the byte after the end of the source buffer
 * @param offsets if NULL is passed, nothing will happen to it, otherwise it needs to have the same number
 * of allocated cells as <TT>target</TT>. Will fill in offsets from target to source pointer
 * e.g: <TT>offsets[3]</TT> is equal to 6, it means that the <TT>target[3]</TT> was a result of transcoding <TT>source[6]</TT>
 * For output data carried across calls, and other data without a specific source character
 * (such as from escape sequences or callbacks)  -1 will be placed for offsets.
 * @param flush set to <TT>TRUE</TT> if the current source buffer is the last available
 * chunk of the source, <TT>FALSE</TT> otherwise. Note that if a failing status is returned,
 * this function may have to be called multiple times with flush set to <TT>TRUE</TT> until
 * the source buffer is consumed.
 * @param err the error status.  <TT>U_ILLEGAL_ARGUMENT_ERROR</TT> will be set if the
 * converter is <TT>NULL</TT>.
 * <code>U_BUFFER_OVERFLOW_ERROR</code> will be set if the target is full and there is
 * still data to be written to the target.
 * @see ucnv_fromUChars
 * @see ucnv_convert
 * @see ucnv_getMinCharSize
 * @see ucnv_setFromUCallBack
 * @see ucnv_getNextUChar
 * @stable ICU 2.0
 */
U_STABLE void U_EXPORT2
ucnv_toUnicode(UConverter *converter,
               UChar **target,
               const UChar *targetLimit,
               const char **source,
               const char *sourceLimit,
               int32_t *offsets,
               UBool flush,
               UErrorCode *err);

/**
 * Convert the Unicode string into a codepage string using an existing UConverter.
 * The output string is NUL-terminated if possible.
 *
 * This function is a more convenient but less powerful version of ucnv_fromUnicode().
 * It is only useful for whole strings, not for streaming conversion.
 *
 * The maximum output buffer capacity required (barring output from callbacks) will be
 * UCNV_GET_MAX_BYTES_FOR_STRING(srcLength, ucnv_getMaxCharSize(cnv)).
 *
 * @param cnv the converter object to be used (ucnv_resetFromUnicode() will be called)
 * @param src the input Unicode string
 * @param srcLength the input string length, or -1 if NUL-terminated
 * @param dest destination string buffer, can be NULL if destCapacity==0
 * @param destCapacity the number of chars available at dest
 * @param pErrorCode normal ICU error code;
 *                  common error codes that may be set by this function include
 *                  U_BUFFER_OVERFLOW_ERROR, U_STRING_NOT_TERMINATED_WARNING,
 *                  U_ILLEGAL_ARGUMENT_ERROR, and conversion errors
 * @return the length of the output string, not counting the terminating NUL;
 *         if the length is greater than destCapacity, then the string will not fit
 *         and a buffer of the indicated length would need to be passed in
 * @see ucnv_fromUnicode
 * @see ucnv_convert
 * @see UCNV_GET_MAX_BYTES_FOR_STRING
 * @stable ICU 2.0
 */
U_STABLE int32_t U_EXPORT2
ucnv_fromUChars(UConverter *cnv,
                char *dest, int32_t destCapacity,
                const UChar *src, int32_t srcLength,
                UErrorCode *pErrorCode);

/**
 * Convert the codepage string into a Unicode string using an existing UConverter.
 * The output string is NUL-terminated if possible.
 *
 * This function is a more convenient but less powerful version of ucnv_toUnicode().
 * It is only useful for whole strings, not for streaming conversion.
 *
 * The maximum output buffer capacity required (barring output from callbacks) will be
 * 2*srcLength (each char may be converted into a surrogate pair).
 *
 * @param cnv the converter object to be used (ucnv_resetToUnicode() will be called)
 * @param src the input codepage string
 * @param srcLength the input string length, or -1 if NUL-terminated
 * @param dest destination string buffer, can be NULL if destCapacity==0
 * @param destCapacity the number of UChars available at dest
 * @param pErrorCode normal ICU error code;
 *                  common error codes that may be set by this function include
 *                  U_BUFFER_OVERFLOW_ERROR, U_STRING_NOT_TERMINATED_WARNING,
 *                  U_ILLEGAL_ARGUMENT_ERROR, and conversion errors
 * @return the length of the output string, not counting the terminating NUL;
 *         if the length is greater than destCapacity, then the string will not fit
 *         and a buffer of the indicated length would need to be passed in
 * @see ucnv_toUnicode
 * @see ucnv_convert
 * @stable ICU 2.0
 */
U_STABLE int32_t U_EXPORT2
ucnv_toUChars(UConverter *cnv,
              UChar *dest, int32_t destCapacity,
              const char *src, int32_t srcLength,
              UErrorCode *pErrorCode);

/**
 * Convert a codepage buffer into Unicode one character at a time.
 * The input is completely consumed when the U_INDEX_OUTOFBOUNDS_ERROR is set.
 *
 * Advantage compared to ucnv_toUnicode() or ucnv_toUChars():
 * - Faster for small amounts of data, for most converters, e.g.,
 *   US-ASCII, ISO-8859-1, UTF-8/16/32, and most "normal" charsets.
 *   (For complex converters, e.g., SCSU, UTF-7 and ISO 2022 variants,
 *    it uses ucnv_toUnicode() internally.)
 * - Convenient.
 *
 * Limitations compared to ucnv_toUnicode():
 * - Always assumes flush=TRUE.
 *   This makes ucnv_getNextUChar() unsuitable for "streaming" conversion,
 *   that is, for where the input is supplied in multiple buffers,
 *   because ucnv_getNextUChar() will assume the end of the input at the end
 *   of the first buffer.
 * - Does not provide offset output.
 *
 * It is possible to "mix" ucnv_getNextUChar() and ucnv_toUnicode() because
 * ucnv_getNextUChar() uses the current state of the converter
 * (unlike ucnv_toUChars() which always resets first).
 * However, if ucnv_getNextUChar() is called after ucnv_toUnicode()
 * stopped in the middle of a character sequence (with flush=FALSE),
 * then ucnv_getNextUChar() will always use the slower ucnv_toUnicode()
 * internally until the next character boundary.
 * (This is new in ICU 2.6. In earlier releases, ucnv_getNextUChar() had to
 * start at a character boundary.)
 *
 * Instead of using ucnv_getNextUChar(), it is recommended
 * to convert using ucnv_toUnicode() or ucnv_toUChars()
 * and then iterate over the text using U16_NEXT() or a UCharIterator (uiter.h)
 * or a C++ CharacterIterator or similar.
 * This allows streaming conversion and offset output, for example.
 *
 * <p>Handling of surrogate pairs and supplementary-plane code points:<br>
 * There are two different kinds of codepages that provide mappings for surrogate characters:
 * <ul>
 *   <li>Codepages like UTF-8, UTF-32, and GB 18030 provide direct representations for Unicode
 *       code points U+10000-U+10ffff as well as for single surrogates U+d800-U+dfff.
 *       Each valid sequence will result in exactly one returned code point.
 *       If a sequence results in a single surrogate, then that will be returned
 *       by itself, even if a neighboring sequence encodes the matching surrogate.</li>
 *   <li>Codepages like SCSU and LMBCS (and UTF-16) provide direct representations only for BMP code points
 *       including surrogates. Code points in supplementary planes are represented with
 *       two sequences, each encoding a surrogate.
 *       For these codepages, matching pairs of surrogates will be combined into single
 *       code points for returning from this function.
 *       (Note that SCSU is actually a mix of these codepage types.)</li>
 * </ul></p>
 *
 * @param converter an open UConverter
 * @param source the address of a pointer to the codepage buffer, will be
 *  updated to point after the bytes consumed in the conversion call.
 * @param sourceLimit points to the end of the input buffer
 * @param err fills in error status (see ucnv_toUnicode)
 * <code>U_INDEX_OUTOFBOUNDS_ERROR</code> will be set if the input
 * is empty or does not convert to any output (e.g.: pure state-change
 * codes SI/SO, escape sequences for ISO 2022,
 * or if the callback did not output anything, ...).
 * This function will not set a <code>U_BUFFER_OVERFLOW_ERROR</code> because
 *  the "buffer" is the return code. However, there might be subsequent output
 *  stored in the converter object
 * that will be returned in following calls to this function.
 * @return a UChar32 resulting from the partial conversion of source
 * @see ucnv_toUnicode
 * @see ucnv_toUChars
 * @see ucnv_convert
 * @stable ICU 2.0
 */
U_STABLE UChar32 U_EXPORT2
ucnv_getNextUChar(UConverter * converter,
                  const char **source,
                  const char * sourceLimit,
                  UErrorCode * err);

/**
 * Convert from one external charset to another using two existing UConverters.
 * Internally, two conversions - ucnv_toUnicode() and ucnv_fromUnicode() -
 * are used, "pivoting" through 16-bit Unicode.
 *
 * There is a similar function, ucnv_convert(),
 * which has the following limitations:
 * - it takes charset names, not converter objects, so that
 *   - two converters are opened for each call
 *   - only single-string conversion is possible, not streaming operation
 * - it does not provide enough information to find out,
 *   in case of failure, whether the toUnicode or
 *   the fromUnicode conversion failed
 *
 * By contrast, ucnv_convertEx()
 * - takes UConverter parameters instead of charset names
 * - fully exposes the pivot buffer for complete error handling
 *
 * ucnv_convertEx() also provides further convenience:
 * - an option to reset the converters at the beginning
 *   (if reset==TRUE, see parameters;
 *    also sets *pivotTarget=*pivotSource=pivotStart)
 * - allow NUL-terminated input
 *   (only a single NUL byte, will not work for charsets with multi-byte NULs)
 *   (if sourceLimit==NULL, see parameters)
 * - terminate with a NUL on output
 *   (only a single NUL byte, not useful for charsets with multi-byte NULs),
 *   or set U_STRING_NOT_TERMINATED_WARNING if the output exactly fills
 *   the target buffer
 * - the pivot buffer can be provided internally;
 *   in this case, the caller will not be able to get details about where an
 *   error occurred
 *   (if pivotStart==NULL, see below)
 *
 * The function returns when one of the following is true:
 * - the entire source text has been converted successfully to the target buffer
 * - a target buffer overflow occurred (U_BUFFER_OVERFLOW_ERROR)
 * - a conversion error occurred
 *   (other U_FAILURE(), see description of pErrorCode)
 *
 * Limitation compared to the direct use of
 * ucnv_fromUnicode() and ucnv_toUnicode():
 * ucnv_convertEx() does not provide offset information.
 *
 * Limitation compared to ucnv_fromUChars() and ucnv_toUChars():
 * ucnv_convertEx() does not support preflighting directly.
 *
 * Sample code for converting a single string from
 * one external charset to UTF-8, ignoring the location of errors:
 *
 * \code
 * int32_t
 * myToUTF8(UConverter *cnv,
 *          const char *s, int32_t length,
 *          char *u8, int32_t capacity,
 *          UErrorCode *pErrorCode) {
 *     UConverter *utf8Cnv;
 *     char *target;
 *
 *     if(U_FAILURE(*pErrorCode)) {
 *         return 0;
 *     }
 *
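
Note on this page of the listing: the ucnv_getNextUChar() comment above says that for UTF-16-style codepages, matching pairs of surrogates are combined into a single code point, while a single surrogate is returned by itself. The sketch below illustrates that rule on an array of 16-bit units in plain C. It is not ICU code and does not call ICU; the helper names (is_lead, is_trail, combine_surrogates, next_code_point) are hypothetical and chosen only for this illustration, and uint16_t/uint32_t stand in for ICU's UChar and UChar32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; none of these are ICU functions. */

/* Lead (high) surrogates occupy U+D800..U+DBFF. */
static int is_lead(uint16_t u)  { return (u & 0xFC00u) == 0xD800u; }

/* Trail (low) surrogates occupy U+DC00..U+DFFF. */
static int is_trail(uint16_t u) { return (u & 0xFC00u) == 0xDC00u; }

/* Combine a matching lead/trail pair into one supplementary-plane code point. */
static uint32_t combine_surrogates(uint16_t lead, uint16_t trail) {
    assert(is_lead(lead) && is_trail(trail));
    return 0x10000u
         + ((uint32_t)(lead - 0xD800u) << 10)
         + (uint32_t)(trail - 0xDC00u);
}

/*
 * Read one code point from units[*index] and advance *index past it.
 * A lead surrogate followed by a trail surrogate yields one combined code
 * point; any other unit, including an unmatched surrogate, is returned as is.
 */
static uint32_t next_code_point(const uint16_t *units, size_t length,
                                size_t *index) {
    assert(*index < length);
    uint16_t first = units[(*index)++];
    if (is_lead(first) && *index < length && is_trail(units[*index])) {
        return combine_surrogates(first, units[(*index)++]);
    }
    return first;
}
```

Calling next_code_point() in a loop until the index reaches the length walks the buffer one code point at a time, which is roughly the iteration the header recommends doing with U16_NEXT() after a bulk conversion. The sketch models only the second kind of codepage described above; it does not model the UTF-8-style case, where a single surrogate is returned by itself even if a neighboring sequence encodes its match.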