📄 unorm.h

📁 Source code of WebKit, the open-source browser for Linux; many commercial browsers on the market are ports of WebKit
 *
 * @see unorm_quickCheck
 * @stable ICU 2.2
 */
U_STABLE UBool U_EXPORT2
unorm_isNormalized(const UChar *src, int32_t srcLength,
                   UNormalizationMode mode,
                   UErrorCode *pErrorCode);

/**
 * Test if a string is in a given normalization form; same as unorm_isNormalized but
 * takes an extra options parameter like most normalization functions.
 *
 * @param src        String that is to be tested if it is in a normalization format.
 * @param srcLength  Length of source to test, or -1 if NUL-terminated.
 * @param mode       Which normalization form to test for.
 * @param options    The normalization options, ORed together (0 for no options).
 * @param pErrorCode ICU error code in/out parameter.
 *                   Must fulfill U_SUCCESS before the function call.
 * @return Boolean value indicating whether the source string is in the
 *         "mode/options" normalization form.
 *
 * @see unorm_quickCheck
 * @see unorm_isNormalized
 * @stable ICU 2.6
 */
U_STABLE UBool U_EXPORT2
unorm_isNormalizedWithOptions(const UChar *src, int32_t srcLength,
                              UNormalizationMode mode, int32_t options,
                              UErrorCode *pErrorCode);

/**
 * Iterative normalization forward.
 * This function (together with unorm_previous) is somewhat
 * similar to the C++ Normalizer class (see its non-static functions).
 *
 * Iterative normalization is useful when only a small portion of a longer
 * string/text needs to be processed.
 *
 * For example, the likelihood may be high that processing the first 10% of some
 * text will be sufficient to find certain data.
 * Another example: When one wants to concatenate two normalized strings and get a
 * normalized result, it is much more efficient to normalize just a small part of
 * the result around the concatenation place instead of re-normalizing everything.
 *
 * The input text is an instance of the C character iteration API UCharIterator.
 * It may wrap around a simple string, a CharacterIterator, a Replaceable, or any
 * other kind of text object.
 *
 * If a buffer overflow occurs, then the caller needs to reset the iterator to the
 * old index and call the function again with a larger buffer - if the caller cares
 * for the actual output.
 * Regardless of the output buffer, the iterator will always be moved to the next
 * normalization boundary.
 *
 * This function (like unorm_previous) serves two purposes:
 *
 * 1) To find the next boundary so that the normalization of the part of the text
 * from the current position to that boundary does not affect and is not affected
 * by the part of the text beyond that boundary.
 *
 * 2) To normalize the text up to the boundary.
 *
 * The second step is optional, per the doNormalize parameter.
 * It is omitted for operations like string concatenation, where the two adjacent
 * string ends need to be normalized together.
 * In such a case, the output buffer will just contain a copy of the text up to the
 * boundary.
 *
 * pNeededToNormalize is an output-only parameter. Its output value is only defined
 * if normalization was requested (doNormalize) and successful (especially, no
 * buffer overflow).
 * It is useful for operations like a normalizing transliterator, where one would
 * not want to replace a piece of text if it is not modified.
 *
 * If doNormalize==TRUE and pNeededToNormalize!=NULL then *pNeeded... is set TRUE
 * if the normalization was necessary.
 *
 * If doNormalize==FALSE then *pNeededToNormalize will be set to FALSE.
 *
 * If the buffer overflows, then *pNeededToNormalize will be undefined;
 * essentially, whenever U_FAILURE is true (like in buffer overflows), this result
 * will be undefined.
 *
 * @param src The input text in the form of a C character iterator.
 * @param dest The output buffer; can be NULL if destCapacity==0 for pure preflighting.
 * @param destCapacity The number of UChars that fit into dest.
 * @param mode The normalization mode.
 * @param options The normalization options, ORed together (0 for no options).
 * @param doNormalize Indicates if the source text up to the next boundary
 *                    is to be normalized (TRUE) or just copied (FALSE).
 * @param pNeededToNormalize Output flag indicating if the normalization resulted in
 *                           different text from the input.
 *                           Not defined if an error occurs including buffer overflow.
 *                           Always FALSE if !doNormalize.
 * @param pErrorCode ICU error code in/out parameter.
 *                   Must fulfill U_SUCCESS before the function call.
 * @return Length of output (number of UChars) when successful or buffer overflow.
 *
 * @see unorm_previous
 * @see unorm_normalize
 *
 * @stable ICU 2.1
 */
U_STABLE int32_t U_EXPORT2
unorm_next(UCharIterator *src,
           UChar *dest, int32_t destCapacity,
           UNormalizationMode mode, int32_t options,
           UBool doNormalize, UBool *pNeededToNormalize,
           UErrorCode *pErrorCode);

/**
 * Iterative normalization backward.
 * This function (together with unorm_next) is somewhat
 * similar to the C++ Normalizer class (see its non-static functions).
 * For all details see unorm_next.
 *
 * @param src The input text in the form of a C character iterator.
 * @param dest The output buffer; can be NULL if destCapacity==0 for pure preflighting.
 * @param destCapacity The number of UChars that fit into dest.
 * @param mode The normalization mode.
 * @param options The normalization options, ORed together (0 for no options).
 * @param doNormalize Indicates if the source text up to the next boundary
 *                    is to be normalized (TRUE) or just copied (FALSE).
 * @param pNeededToNormalize Output flag indicating if the normalization resulted in
 *                           different text from the input.
 *                           Not defined if an error occurs including buffer overflow.
 *                           Always FALSE if !doNormalize.
 * @param pErrorCode ICU error code in/out parameter.
 *                   Must fulfill U_SUCCESS before the function call.
 * @return Length of output (number of UChars) when successful or buffer overflow.
 *
 * @see unorm_next
 * @see unorm_normalize
 *
 * @stable ICU 2.1
 */
U_STABLE int32_t U_EXPORT2
unorm_previous(UCharIterator *src,
               UChar *dest, int32_t destCapacity,
               UNormalizationMode mode, int32_t options,
               UBool doNormalize, UBool *pNeededToNormalize,
               UErrorCode *pErrorCode);

/**
 * Concatenate normalized strings, making sure that the result is normalized as well.
 *
 * If both the left and the right strings are in
 * the normalization form according to "mode/options",
 * then the result will be
 *
 * \code
 *     dest=normalize(left+right, mode, options)
 * \endcode
 *
 * With the input strings already being normalized,
 * this function will use unorm_next() and unorm_previous()
 * to find the adjacent end pieces of the input strings.
 * Only the concatenation of these end pieces will be normalized and
 * then concatenated with the remaining parts of the input strings.
 *
 * It is allowed to have dest==left to avoid copying the entire left string.
 *
 * @param left Left source string, may be same as dest.
 * @param leftLength Length of left source string, or -1 if NUL-terminated.
 * @param right Right source string.
 * @param rightLength Length of right source string, or -1 if NUL-terminated.
 * @param dest The output buffer; can be NULL if destCapacity==0 for pure preflighting.
 * @param destCapacity The number of UChars that fit into dest.
 * @param mode The normalization mode.
 * @param options The normalization options, ORed together (0 for no options).
 * @param pErrorCode ICU error code in/out parameter.
 *                   Must fulfill U_SUCCESS before the function call.
 * @return Length of output (number of UChars) when successful or buffer overflow.
 *
 * @see unorm_normalize
 * @see unorm_next
 * @see unorm_previous
 *
 * @stable ICU 2.1
 */
U_STABLE int32_t U_EXPORT2
unorm_concatenate(const UChar *left, int32_t leftLength,
                  const UChar *right, int32_t rightLength,
                  UChar *dest, int32_t destCapacity,
                  UNormalizationMode mode, int32_t options,
                  UErrorCode *pErrorCode);

/**
 * Option bit for unorm_compare:
 * Both input strings are assumed to fulfill FCD conditions.
 * @stable ICU 2.2
 */
#define UNORM_INPUT_IS_FCD          0x20000

/**
 * Option bit for unorm_compare:
 * Perform case-insensitive comparison.
 * @stable ICU 2.2
 */
#define U_COMPARE_IGNORE_CASE       0x10000

#ifndef U_COMPARE_CODE_POINT_ORDER
/* see also unistr.h and ustring.h */
/**
 * Option bit for u_strCaseCompare, u_strcasecmp, unorm_compare, etc:
 * Compare strings in code point order instead of code unit order.
 * @stable ICU 2.2
 */
#define U_COMPARE_CODE_POINT_ORDER  0x8000
#endif

/**
 * Compare two strings for canonical equivalence.
 * Further options include case-insensitive comparison and
 * code point order (as opposed to code unit order).
 *
 * Canonical equivalence between two strings is defined as their normalized
 * forms (NFD or NFC) being identical.
 * This function compares strings incrementally instead of normalizing
 * (and optionally case-folding) both strings entirely,
 * improving performance significantly.
 *
 * Bulk normalization is only necessary if the strings do not fulfill the FCD
 * conditions. Only in this case, and only if the strings are relatively long,
 * is memory allocated temporarily.
 * For FCD strings and short non-FCD strings there is no memory allocation.
 *
 * Semantically, this is equivalent to
 *   strcmp[CodePointOrder](NFD(foldCase(NFD(s1))), NFD(foldCase(NFD(s2))))
 * where code point order and foldCase are all optional.
 *
 * UAX 21 2.5 Caseless Matching specifies that for a canonical caseless match
 * the case folding must be performed first, then the normalization.
 *
 * @param s1 First source string.
 * @param length1 Length of first source string, or -1 if NUL-terminated.
 *
 * @param s2 Second source string.
 * @param length2 Length of second source string, or -1 if NUL-terminated.
 *
 * @param options A bit set of options:
 *   - U_FOLD_CASE_DEFAULT or 0 is used for default options:
 *     Case-sensitive comparison in code unit order, and the input strings
 *     are quick-checked for FCD.
 *
 *   - UNORM_INPUT_IS_FCD
 *     Set if the caller knows that both s1 and s2 fulfill the FCD conditions.
 *     If not set, the function will quickCheck for FCD
 *     and normalize if necessary.
 *
 *   - U_COMPARE_CODE_POINT_ORDER
 *     Set to choose code point order instead of code unit order
 *     (see u_strCompare for details).
 *
 *   - U_COMPARE_IGNORE_CASE
 *     Set to compare strings case-insensitively using case folding,
 *     instead of case-sensitively.
 *     If set, then the following case folding options are used.
 *
 *   - Options as used with case-insensitive comparisons, currently:
 *
 *   - U_FOLD_CASE_EXCLUDE_SPECIAL_I
 *    (see u_strCaseCompare for details)
 *
 *   - regular normalization options shifted left by UNORM_COMPARE_NORM_OPTIONS_SHIFT
 *
 * @param pErrorCode ICU error code in/out parameter.
 *                   Must fulfill U_SUCCESS before the function call.
 * @return <0 or 0 or >0 as usual for string comparisons
 *
 * @see unorm_normalize
 * @see UNORM_FCD
 * @see u_strCompare
 * @see u_strCaseCompare
 *
 * @stable ICU 2.2
 */
U_STABLE int32_t U_EXPORT2
unorm_compare(const UChar *s1, int32_t length1,
              const UChar *s2, int32_t length2,
              uint32_t options,
              UErrorCode *pErrorCode);

#endif /* #if !UCONFIG_NO_NORMALIZATION */

#endif
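
The U_COMPARE_CODE_POINT_ORDER option documented above switches a comparison from code unit order to code point order. The two orders differ only when one UTF-16 string holds a supplementary character (stored as a surrogate pair, 0xD800..0xDFFF) and the other holds a BMP character in the range U+E000..U+FFFF. The sketch below illustrates that difference in plain C without calling ICU. The names compare_code_unit_order, compare_code_point_order and next_code_point are hypothetical helpers written for this illustration, not ICU functions, and treating an unpaired surrogate as its own value is an assumption of this sketch. No normalization or case folding is performed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not ICU): compare UTF-16 strings unit by unit. */
int compare_code_unit_order(const uint16_t *a, size_t alen,
                            const uint16_t *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    for (size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) {
            return a[i] < b[i] ? -1 : 1;
        }
    }
    /* Shared prefix: the shorter string sorts first. */
    return (alen > blen) - (alen < blen);
}

/* Hypothetical helper (not ICU): decode the code point at *i and advance *i.
 * Assumption: an unpaired surrogate is returned as its own value. */
uint32_t next_code_point(const uint16_t *s, size_t len, size_t *i)
{
    uint32_t c = s[(*i)++];
    if (c >= 0xD800 && c <= 0xDBFF && *i < len) {
        uint32_t trail = s[*i];
        if (trail >= 0xDC00 && trail <= 0xDFFF) {
            ++*i;
            c = 0x10000 + ((c - 0xD800) << 10) + (trail - 0xDC00);
        }
    }
    return c;
}

/* Hypothetical helper (not ICU): compare UTF-16 strings by decoded code point. */
int compare_code_point_order(const uint16_t *a, size_t alen,
                             const uint16_t *b, size_t blen)
{
    size_t i = 0, j = 0;
    while (i < alen && j < blen) {
        uint32_t ca = next_code_point(a, alen, &i);
        uint32_t cb = next_code_point(b, blen, &j);
        if (ca != cb) {
            return ca < cb ? -1 : 1;
        }
    }
    /* Shared prefix: the string with text left over sorts last. */
    return (i < alen) - (j < blen);
}
```

For instance, with a = { 0xFF61 } (U+FF61) and b = { 0xD800, 0xDC00 } (U+10000), compare_code_unit_order places b first because 0xD800 < 0xFF61, while compare_code_point_order places a first because U+FF61 < U+10000.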