📄 uchar.h

📁 Source code of WebKit, the open-source browser for Linux; many commercial browsers on the market are ported from WebKit
/*
**********************************************************************
*   Copyright (C) 1997-2004, International Business Machines
*   Corporation and others.  All Rights Reserved.
**********************************************************************
*
* File UCHAR.H
*
* Modification History:
*
*   Date        Name        Description
*   04/02/97    aliu        Creation.
*   03/29/99    helena      Updated for C APIs.
*   4/15/99     Madhu       Updated for C Implementation and Javadoc
*   5/20/99     Madhu       Added the function u_getVersion()
*   8/19/1999   srl         Upgraded scripts to Unicode 3.0
*   8/27/1999   schererm    UCharDirection constants: U_...
*   11/11/1999  weiv        added u_isalnum(), cleaned comments
*   01/11/2000  helena      Renamed u_getVersion to u_getUnicodeVersion().
******************************************************************************
*/

#ifndef UCHAR_H
#define UCHAR_H

#include "unicode/utypes.h"

U_CDECL_BEGIN

/*==========================================================================*/
/* Unicode version number                                                   */
/*==========================================================================*/
/**
 * Unicode version number, default for the current ICU version.
 * The actual Unicode Character Database (UCD) data is stored in uprops.dat
 * and may be generated from UCD files from a different Unicode version.
 * Call u_getUnicodeVersion to get the actual Unicode version of the data.
 *
 * @see u_getUnicodeVersion
 * @stable ICU 2.0
 */
#define U_UNICODE_VERSION "4.0.1"

/**
 * \file
 * \brief C API: Unicode Properties
 *
 * This C API provides low-level access to the Unicode Character Database.
 * In addition to raw property values, some convenience functions calculate
 * derived properties, for example for Java-style programming.
 *
 * Unicode assigns each code point (not just assigned character) values for
 * many properties.
 * Most of them are simple boolean flags, or constants from a small enumerated list.
 * For some properties, values are strings or other relatively more complex types.
 *
 * For more information see
 * "About the Unicode Character Database" (http://www.unicode.org/ucd/)
 * and the ICU User Guide chapter on Properties (http://oss.software.ibm.com/icu/userguide/properties.html).
 *
 * Many functions are designed to match java.lang.Character functions.
 * See the individual function documentation,
 * and see the JDK 1.4.1 java.lang.Character documentation
 * at http://java.sun.com/j2se/1.4.1/docs/api/java/lang/Character.html
 *
 * There are also functions that provide easy migration from C/POSIX functions
 * like isblank(). Their use is generally discouraged because the C/POSIX
 * standards do not define their semantics beyond the ASCII range, which means
 * that different implementations exhibit very different behavior.
 * Instead, Unicode properties should be used directly.
 *
 * There are also only a few, broad C/POSIX character classes, and they tend
 * to be used for conflicting purposes. For example, the "isalpha()" class
 * is sometimes used to determine word boundaries, while a more sophisticated
 * approach would at least distinguish initial letters from continuation
 * characters (the latter including combining marks).
 * (In ICU, BreakIterator is the most sophisticated API for word boundaries.)
 * Another example: There is no "istitle()" class for titlecase characters.
 *
 * A summary of the behavior of some C/POSIX character classification implementations
 * for Unicode is available at http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/posix_classes.html
 *
 * <strong>Important</strong>:
 * The behavior of the ICU C/POSIX-style character classification
 * functions is subject to change according to discussion of the above summary.
 *
 * Note: There are several ICU whitespace functions.
 * Comparison:
 * - u_isUWhiteSpace=UCHAR_WHITE_SPACE: Unicode White_Space property;
 *       most of general categories "Z" (separators) + most whitespace ISO controls
 *       (including no-break spaces, but excluding IS1..IS4 and ZWSP)
 * - u_isWhitespace: Java isWhitespace; Z + whitespace ISO controls but excluding no-break spaces
 * - u_isJavaSpaceChar: Java isSpaceChar; just Z (including no-break spaces)
 * - u_isspace: Z + whitespace ISO controls (including no-break spaces)
 * - u_isblank: "horizontal spaces" = TAB + Zs - ZWSP
 */

/**
 * Constants.
 */

/** The lowest Unicode code point value. Code points are non-negative. @stable ICU 2.0 */
#define UCHAR_MIN_VALUE 0

/**
 * The highest Unicode code point value (scalar value) according to
 * The Unicode Standard. This is a 21-bit value (20.1 bits, rounded up).
 * For a single character, UChar32 is a simple type that can hold any code point value.
 *
 * @see UChar32
 * @stable ICU 2.0
 */
#define UCHAR_MAX_VALUE 0x10ffff

/**
 * Get a single-bit bit set (a flag) from a bit number 0..31.
 * @stable ICU 2.1
 */
#define U_MASK(x) ((uint32_t)1<<(x))

/*
 * !! Note: Several comments in this file are machine-read by the
 * genpname tool.  These comments describe the correspondence between
 * icu enum constants and UCD entities.  Do not delete them.  Update
 * these comments as needed.
 *
 * Any comment of the form "/ *[name]* /" (spaces added) is such
 * a comment.
 *
 * The U_JG_* and U_GC_*_MASK constants are matched by their symbolic
 * name, which must match PropertyValueAliases.txt.
 */

/**
 * Selection constants for Unicode properties.
 * These constants are used in functions like u_hasBinaryProperty to select
 * one of the Unicode properties.
 *
 * The properties APIs are intended to reflect Unicode properties as defined
 * in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR).
 * For details about the properties see http://www.unicode.org/ucd/ .
 * For names of Unicode properties see the UCD file PropertyAliases.txt.
 *
 * Important: If ICU is built with UCD files from Unicode versions below, e.g., 3.2,
 * then properties marked with "new in Unicode 3.2" are not or not fully available.
 * Check u_getUnicodeVersion to be sure.
 *
 * @see u_hasBinaryProperty
 * @see u_getIntPropertyValue
 * @see u_getUnicodeVersion
 * @stable ICU 2.1
 */
typedef enum UProperty {
    /*  See note !!.  Comments of the form "Binary property Dash",
        "Enumerated property Script", "Double property Numeric_Value",
        and "String property Age" are read by genpname. */

    /*  Note: Place UCHAR_ALPHABETIC before UCHAR_BINARY_START so that
    debuggers display UCHAR_ALPHABETIC as the symbolic name for 0,
    rather than UCHAR_BINARY_START.  Likewise for other *_START
    identifiers. */

    /** Binary property Alphabetic. Same as u_isUAlphabetic, different from u_isalpha.
        Lu+Ll+Lt+Lm+Lo+Nl+Other_Alphabetic @stable ICU 2.1 */
    UCHAR_ALPHABETIC=0,
    /** First constant for binary Unicode properties. @stable ICU 2.1 */
    UCHAR_BINARY_START=UCHAR_ALPHABETIC,
    /** Binary property ASCII_Hex_Digit. 0-9 A-F a-f @stable ICU 2.1 */
    UCHAR_ASCII_HEX_DIGIT,
    /** Binary property Bidi_Control.
        Format controls which have specific functions
        in the Bidi Algorithm. @stable ICU 2.1 */
    UCHAR_BIDI_CONTROL,
    /** Binary property Bidi_Mirrored.
        Characters that may change display in RTL text.
        Same as u_isMirrored.
        See Bidi Algorithm, UTR 9. @stable ICU 2.1 */
    UCHAR_BIDI_MIRRORED,
    /** Binary property Dash. Variations of dashes. @stable ICU 2.1 */
    UCHAR_DASH,
    /** Binary property Default_Ignorable_Code_Point (new in Unicode 3.2).
        Ignorable in most processing.
        <2060..206F, FFF0..FFFB, E0000..E0FFF>+Other_Default_Ignorable_Code_Point+(Cf+Cc+Cs-White_Space) @stable ICU 2.1 */
    UCHAR_DEFAULT_IGNORABLE_CODE_POINT,
    /** Binary property Deprecated (new in Unicode 3.2).
        The usage of deprecated characters is strongly discouraged.
        @stable ICU 2.1 */
    UCHAR_DEPRECATED,
    /** Binary property Diacritic. Characters that linguistically modify
        the meaning of another character to which they apply. @stable ICU 2.1 */
    UCHAR_DIACRITIC,
    /** Binary property Extender.
        Extend the value or shape of a preceding alphabetic character,
        e.g., length and iteration marks. @stable ICU 2.1 */
    UCHAR_EXTENDER,
    /** Binary property Full_Composition_Exclusion.
        CompositionExclusions.txt+Singleton Decompositions+
        Non-Starter Decompositions. @stable ICU 2.1 */
    UCHAR_FULL_COMPOSITION_EXCLUSION,
    /** Binary property Grapheme_Base (new in Unicode 3.2).
        For programmatic determination of grapheme cluster boundaries.
        [0..10FFFF]-Cc-Cf-Cs-Co-Cn-Zl-Zp-Grapheme_Link-Grapheme_Extend-CGJ @stable ICU 2.1 */
    UCHAR_GRAPHEME_BASE,
    /** Binary property Grapheme_Extend (new in Unicode 3.2).
        For programmatic determination of grapheme cluster boundaries.
        Me+Mn+Mc+Other_Grapheme_Extend-Grapheme_Link-CGJ @stable ICU 2.1 */
    UCHAR_GRAPHEME_EXTEND,
    /** Binary property Grapheme_Link (new in Unicode 3.2).
        For programmatic determination of grapheme cluster boundaries. @stable ICU 2.1 */
    UCHAR_GRAPHEME_LINK,
    /** Binary property Hex_Digit.
        Characters commonly used for hexadecimal numbers. @stable ICU 2.1 */
    UCHAR_HEX_DIGIT,
    /** Binary property Hyphen. Dashes used to mark connections
        between pieces of words, plus the Katakana middle dot. @stable ICU 2.1 */
    UCHAR_HYPHEN,
    /** Binary property ID_Continue.
        Characters that can continue an identifier.
        DerivedCoreProperties.txt also says "NOTE: Cf characters should be filtered out."
        ID_Start+Mn+Mc+Nd+Pc @stable ICU 2.1 */
    UCHAR_ID_CONTINUE,
    /** Binary property ID_Start.
        Characters that can start an identifier.
        Lu+Ll+Lt+Lm+Lo+Nl @stable ICU 2.1 */
    UCHAR_ID_START,
    /** Binary property Ideographic.
        CJKV ideographs.
        @stable ICU 2.1 */
    UCHAR_IDEOGRAPHIC,
    /** Binary property IDS_Binary_Operator (new in Unicode 3.2).
        For programmatic determination of
        Ideographic Description Sequences. @stable ICU 2.1 */
    UCHAR_IDS_BINARY_OPERATOR,
    /** Binary property IDS_Trinary_Operator (new in Unicode 3.2).
        For programmatic determination of
        Ideographic Description Sequences. @stable ICU 2.1 */
    UCHAR_IDS_TRINARY_OPERATOR,
    /** Binary property Join_Control.
        Format controls for cursive joining and ligation. @stable ICU 2.1 */
    UCHAR_JOIN_CONTROL,
    /** Binary property Logical_Order_Exception (new in Unicode 3.2).
        Characters that do not use logical order and
        require special handling in most processing. @stable ICU 2.1 */
    UCHAR_LOGICAL_ORDER_EXCEPTION,
    /** Binary property Lowercase. Same as u_isULowercase, different from u_islower.
        Ll+Other_Lowercase @stable ICU 2.1 */
    UCHAR_LOWERCASE,
    /** Binary property Math. Sm+Other_Math @stable ICU 2.1 */
    UCHAR_MATH,
    /** Binary property Noncharacter_Code_Point.
        Code points that are explicitly defined as illegal
        for the encoding of characters. @stable ICU 2.1 */
    UCHAR_NONCHARACTER_CODE_POINT,
    /** Binary property Quotation_Mark. @stable ICU 2.1 */
    UCHAR_QUOTATION_MARK,
    /** Binary property Radical (new in Unicode 3.2).
        For programmatic determination of
        Ideographic Description Sequences. @stable ICU 2.1 */
    UCHAR_RADICAL,
    /** Binary property Soft_Dotted (new in Unicode 3.2).
        Characters with a "soft dot", like i or j.
        An accent placed on these characters causes
        the dot to disappear. @stable ICU 2.1 */
    UCHAR_SOFT_DOTTED,
    /** Binary property Terminal_Punctuation.
        Punctuation characters that generally mark
        the end of textual units. @stable ICU 2.1 */
    UCHAR_TERMINAL_PUNCTUATION,
    /** Binary property Unified_Ideograph (new in Unicode 3.2).
        For programmatic determination of
        Ideographic Description Sequences. @stable ICU 2.1 */
    UCHAR_UNIFIED_IDEOGRAPH,
    /** Binary property Uppercase. Same as u_isUUppercase, different from u_isupper.
        Lu+Other_Uppercase @stable ICU 2.1 */
    UCHAR_UPPERCASE,
    /** Binary property White_Space.
        Same as u_isUWhiteSpace, different from u_isspace and u_isWhitespace.
        Space characters+TAB+CR+LF-ZWSP-ZWNBSP @stable ICU 2.1 */
    UCHAR_WHITE_SPACE,
    /** Binary property XID_Continue.
        ID_Continue modified to allow closure under
        normalization forms NFKC and NFKD. @stable ICU 2.1 */
    UCHAR_XID_CONTINUE,
    /** Binary property XID_Start. ID_Start modified to allow
        closure under normalization forms NFKC and NFKD. @stable ICU 2.1 */
    UCHAR_XID_START,
    /** Binary property Case_Sensitive. Either the source of a case
        mapping or _in_ the target of a case mapping. Not the same as
        the general category Cased_Letter. @stable ICU 2.6 */
    UCHAR_CASE_SENSITIVE,
    /** Binary property STerm (new in Unicode 4.0.1).
        Sentence Terminal. Used in UAX #29: Text Boundaries
        (http://www.unicode.org/reports/tr29/)
        @draft ICU 3.0 */
    UCHAR_S_TERM,
    /** Binary property Variation_Selector (new in Unicode 4.0.1).
        Indicates all those characters that qualify as Variation Selectors.
        For details on the behavior of these characters,
        see StandardizedVariants.html and 15.6 Variation Selectors.
        @draft ICU 3.0 */
    UCHAR_VARIATION_SELECTOR,
    /** Binary property NFD_Inert.
        ICU-specific property for characters that are inert under NFD,
        i.e., they do not interact with adjacent characters.
        Used for example in normalizing transforms in incremental mode
        to find the boundary of safely normalizable text despite possible
        text additions.

        There is one such property per normalization form.
        These properties are computed as follows - an inert character is:
        a) unassigned, or ALL of the following:
        b) of combining class 0.
        c) not decomposed by this normalization form.
        AND if NFC or NFKC,
        d) can never compose with a previous character.
        e) can never compose with a following character.
        f) can never change if another character is added.
           Example: a-breve might satisfy all but f, but if you
           add an ogonek it changes to a-ogonek + breve

        See also com.ibm.text.UCD.NFSkippable in the ICU4J repository,
        and icu/source/common/unormimp.h .
        @draft ICU 3.0 */
    UCHAR_NFD_INERT,
    /** Binary property NFKD_Inert.
        ICU-specific property for characters that are inert under NFKD,
        i.e., they do not interact with adjacent characters.
        Used for example in normalizing transforms in incremental mode
        to find the boundary of safely normalizable text despite possible
        text additions.
        @see UCHAR_NFD_INERT
        @draft ICU 3.0 */
    UCHAR_NFKD_INERT,
    /** Binary property NFC_Inert.
        ICU-specific property for characters that are inert under NFC,
        i.e., they do not interact with adjacent characters.
        Used for example in normalizing transforms in incremental mode
        to find the boundary of safely normalizable text despite possible
        text additions.
        @see UCHAR_NFD_INERT
        @draft ICU 3.0 */
    UCHAR_NFC_INERT,
    /** Binary property NFKC_Inert.
        ICU-specific property for characters that are inert under NFKC,
        i.e., they do not interact with adjacent characters.
        Used for example in normalizing transforms in incremental mode
        to find the boundary of safely normalizable text despite possible
        text additions.
        @see UCHAR_NFD_INERT
        @draft ICU 3.0 */
    UCHAR_NFKC_INERT,
    /** Binary Property Segment_Starter.
        ICU-specific property for characters that are starters in terms of
        Unicode normalization and combining character sequences.
        They have ccc=0 and do not occur in non-initial position of the
        canonical decomposition of any character
        (like " in NFD(a-umlaut) and a Jamo T in an NFD(Hangul LVT)).
        ICU uses this property for segmenting a string for generating a set of
        canonically equivalent strings, e.g. for canonical closure while
        processing collation tailoring rules.
        @draft ICU 3.0 */
    UCHAR_SEGMENT_STARTER,
    /** One more than the last constant for binary Unicode properties. @stable ICU 2.1 */
    UCHAR_BINARY_LIMIT,

    /** Enumerated property Bidi_Class.
        Same as u_charDirection, returns UCharDirection values. @stable ICU 2.2 */
    UCHAR_BIDI_CLASS=0x1000,
    /** First constant for enumerated/integer Unicode properties. @stable ICU 2.2 */
    UCHAR_INT_START=UCHAR_BIDI_CLASS,
    /** Enumerated property Block.
        Same as ublock_getCode, returns UBlockCode values. @stable ICU 2.2 */
    UCHAR_BLOCK,
    /** Enumerated property Canonical_Combining_Class.
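
Usage sketch (not part of uchar.h): the listing on this page stops partway through the UProperty enum, so the following is only a minimal, standalone illustration of the three macros defined earlier, UCHAR_MIN_VALUE, UCHAR_MAX_VALUE and U_MASK. The macros are copied here so the sketch builds without the ICU headers. The helper names is_code_point and has_bit are hypothetical and do not exist in ICU, and int32_t stands in for UChar32 on the assumption that UChar32 is a signed 32-bit type.

```c
#include <stdint.h>

/* Copied from the listing so this sketch compiles without ICU. */
#define UCHAR_MIN_VALUE 0
#define UCHAR_MAX_VALUE 0x10ffff
#define U_MASK(x) ((uint32_t)1<<(x))

/* Hypothetical helper: nonzero if c lies in the Unicode code point
   range UCHAR_MIN_VALUE..UCHAR_MAX_VALUE. */
static int is_code_point(int32_t c) {
    return c >= UCHAR_MIN_VALUE && c <= UCHAR_MAX_VALUE;
}

/* Hypothetical helper: nonzero if bit number `bit` (0..31) is set in `set`.
   Each bit number maps to a single-bit flag via U_MASK. */
static int has_bit(uint32_t set, int bit) {
    return (set & U_MASK(bit)) != 0;
}

/* Example flag set that combines bit numbers 0 and 4. */
static const uint32_t example_set = U_MASK(0) | U_MASK(4);
```

With these definitions, is_code_point(0x10ffff) holds while is_code_point(0x110000) and is_code_point(-1) do not, and has_bit(example_set, 4) holds while has_bit(example_set, 1) does not. Combining single-bit flags with a bitwise OR is presumably how the U_GC_*_MASK constants mentioned in the genpname note are meant to be used, but those constants are not shown on this page.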