gregex.sgml

来自「This GLib version 2.16.1. GLib is the lo」· SGML 代码 · 共 632 行 · 第 1/2 页
SGML
632 行
<!-- ##### SECTION Title ##### -->Perl-compatible regular expressions<!-- ##### SECTION Short_Description ##### -->matches strings against regular expressions<!-- ##### SECTION Long_Description ##### --><para>The <function>g_regex_*()</function> functions implement regularexpression pattern matching using syntax and semantics similar toPerl regular expression.</para><para>Some functions accept a <parameter>start_position</parameter> argument,setting it differs from just passing over a shortened string and setting#G_REGEX_MATCH_NOTBOL in the case of a pattern that begins with any kindof lookbehind assertion.For example, consider the pattern "\Biss\B" which finds occurrences of "iss"in the middle of words. ("\B" matches only if the current position in thesubject is not a word boundary.) When applied to the string "Mississipi"from the fourth byte, namely "issipi", it does not match, because "\B" isalways false at the start of the subject, which is deemed to be a wordboundary. However, if the entire string is passed , but with<parameter>start_position</parameter> set to 4, it finds the secondoccurrence of "iss" because it is able to look behind the starting pointto discover that it is preceded by a letter.</para><para>Note that, unless you set the #G_REGEX_RAW flag, all the strings passedto these functions must be encoded in UTF-8. The lengths and the positionsinside the strings are in bytes and not in characters, so, for instance,"\xc3\xa0" (i.e. "&agrave;") is two bytes long but it is treated as a singlecharacter. If you set #G_REGEX_RAW the strings can be non-valid UTF-8strings and a byte is treated as a character, so "\xc3\xa0" is two bytesand two characters long.</para><para>When matching a pattern, "\n" matches only against a "\n" character in thestring, and "\r" matches only a "\r" character. To match any newline sequenceuse "\R". This particular group matches either the two-character sequenceCR + LF ("\r\n"), or one of the single characters LF (linefeed, U+000A, "\n"), VT(vertical tab, U+000B, "\v"), FF (formfeed, U+000C, "\f"), CR (carriage return,U+000D, "\r"), NEL (next line, U+0085), LS (line separator, U+2028), or PS(paragraph separator, U+2029).</para><para>The behaviour of the dot, circumflex, and dollar metacharacters are affected bynewline characters, the default is to recognize any newline character (the samecharacters recognized by "\R"). This can be changed with #G_REGEX_NEWLINE_CR,#G_REGEX_NEWLINE_LF and #G_REGEX_NEWLINE_CRLF compile options,and with #G_REGEX_MATCH_NEWLINE_ANY, #G_REGEX_MATCH_NEWLINE_CR,#G_REGEX_MATCH_NEWLINE_LF and #G_REGEX_MATCH_NEWLINE_CRLF match options.These settings are also relevant when compiling a pattern if#G_REGEX_EXTENDED is set, and an unescaped "#" outside a character class isencountered. This indicates a comment that lasts until after the nextnewline.</para><para>Creating and manipulating the same #GRegex structure from differentthreads is not a problem as #GRegex does not modify its internalstate between creation and destruction, on the other hand #GMatchInfo isnot threadsafe.</para><para>The regular expressions low level functionalities are obtained throughthe excellent <ulink url="http://www.pcre.org/">PCRE</ulink> librarywritten by Philip Hazel.</para><!-- ##### SECTION See_Also ##### --><para></para><!-- ##### SECTION Stability_Level ##### --><!-- ##### ENUM GRegexError ##### --><para>Error codes returned by regular expressions functions.</para>@G_REGEX_ERROR_COMPILE: Compilation of the regular expression failed.@G_REGEX_ERROR_OPTIMIZE: Optimization of the regular expression failed.@G_REGEX_ERROR_REPLACE: Replacement failed due to an ill-formed replacement string.@G_REGEX_ERROR_MATCH: The match process failed.@G_REGEX_ERROR_INTERNAL: Internal error of the regular expression engine. Since 2.16@G_REGEX_ERROR_STRAY_BACKSLASH: "\\" at end of pattern. Since 2.16@G_REGEX_ERROR_MISSING_CONTROL_CHAR: "\\c" at end of pattern. Since 2.16@G_REGEX_ERROR_UNRECOGNIZED_ESCAPE: Unrecognized character follows "\\". Since 2.16@G_REGEX_ERROR_QUANTIFIERS_OUT_OF_ORDER: Numbers out of order in "{}" quantifier. Since 2.16@G_REGEX_ERROR_QUANTIFIER_TOO_BIG: Number too big in "{}" quantifier. Since 2.16@G_REGEX_ERROR_UNTERMINATED_CHARACTER_CLASS: Missing terminating "]" for character class. Since 2.16@G_REGEX_ERROR_INVALID_ESCAPE_IN_CHARACTER_CLASS: Invalid escape sequence in character class. Since 2.16@G_REGEX_ERROR_RANGE_OUT_OF_ORDER: Range out of order in character class. Since 2.16@G_REGEX_ERROR_NOTHING_TO_REPEAT: Nothing to repeat. Since 2.16@G_REGEX_ERROR_UNRECOGNIZED_CHARACTER: Unrecognized character after "(?", "(?&lt;" or "(?P". Since 2.16@G_REGEX_ERROR_POSIX_NAMED_CLASS_OUTSIDE_CLASS: POSIX named classes are supported only within a class. Since 2.16@G_REGEX_ERROR_UNMATCHED_PARENTHESIS: Missing terminating ")" or ")" without opening "(". Since 2.16@G_REGEX_ERROR_INEXISTENT_SUBPATTERN_REFERENCE: Reference to non-existent subpattern. Since 2.16@G_REGEX_ERROR_UNTERMINATED_COMMENT: Missing terminating ")" after comment. Since 2.16@G_REGEX_ERROR_EXPRESSION_TOO_LARGE: Regular expression too large. Since 2.16@G_REGEX_ERROR_MEMORY_ERROR: Failed to get memory. Since 2.16@G_REGEX_ERROR_VARIABLE_LENGTH_LOOKBEHIND: Lookbehind assertion is not fixed length. Since 2.16@G_REGEX_ERROR_MALFORMED_CONDITION: Malformed number or name after "(?(". Since 2.16@G_REGEX_ERROR_TOO_MANY_CONDITIONAL_BRANCHES: Conditional group contains more than two branches. Since 2.16@G_REGEX_ERROR_ASSERTION_EXPECTED: Assertion expected after "(?(". Since 2.16@G_REGEX_ERROR_UNKNOWN_POSIX_CLASS_NAME: Unknown POSIX class name. Since 2.16@G_REGEX_ERROR_POSIX_COLLATING_ELEMENTS_NOT_SUPPORTED: POSIX collating elements are not supported. Since 2.16@G_REGEX_ERROR_HEX_CODE_TOO_LARGE: Character value in "\\x{...}" sequence is too large. Since 2.16@G_REGEX_ERROR_INVALID_CONDITION: Invalid condition "(?(0)". Since 2.16@G_REGEX_ERROR_SINGLE_BYTE_MATCH_IN_LOOKBEHIND: \\C not allowed in lookbehind assertion. Since 2.16@G_REGEX_ERROR_INFINITE_LOOP: Recursive call could loop indefinitely. Since 2.16@G_REGEX_ERROR_MISSING_SUBPATTERN_NAME_TERMINATOR: Missing terminator in subpattern name. Since 2.16@G_REGEX_ERROR_DUPLICATE_SUBPATTERN_NAME: Two named subpatterns have the same name. Since 2.16@G_REGEX_ERROR_MALFORMED_PROPERTY: Malformed "\\P" or "\\p" sequence. Since 2.16@G_REGEX_ERROR_UNKNOWN_PROPERTY: Unknown property name after "\\P" or "\\p". Since 2.16@G_REGEX_ERROR_SUBPATTERN_NAME_TOO_LONG: Subpattern name is too long (maximum 32 characters). Since 2.16@G_REGEX_ERROR_TOO_MANY_SUBPATTERNS: Too many named subpatterns (maximum 10,000). Since 2.16@G_REGEX_ERROR_INVALID_OCTAL_VALUE: Octal value is greater than "\\377". Since 2.16@G_REGEX_ERROR_TOO_MANY_BRANCHES_IN_DEFINE: "DEFINE" group contains more than one branch. Since 2.16@G_REGEX_ERROR_DEFINE_REPETION: Repeating a "DEFINE" group is not allowed. Since 2.16@G_REGEX_ERROR_INCONSISTENT_NEWLINE_OPTIONS: Inconsistent newline options. Since 2.16@G_REGEX_ERROR_MISSING_BACK_REFERENCE: "\\g" is not followed by a braced name or anoptionally braced non-zero number. Since 2.16@Since: 2.14<!-- ##### MACRO G_REGEX_ERROR ##### --><para>Error domain for regular expressions. Errors in this domain will be from the #GRegexError enumeration. See #GError for information on error domains.</para>@Since: 2.14<!-- ##### ENUM GRegexCompileFlags ##### --><para>Flags specifying compile-time options.</para>@G_REGEX_CASELESS: Letters in the pattern match both upper and lower caseletters. It be changed within a pattern by a "(?i)" option setting.@G_REGEX_MULTILINE: By default, GRegex treats the strings as consistingof a single line of characters (even if it actually contains newlines).The "start of line" metacharacter ("^") matches only at the start of thestring, while the "end of line" metacharacter ("$") matches only at theend of the string, or before a terminating newline (unless#G_REGEX_DOLLAR_ENDONLY is set). When #G_REGEX_MULTILINE is set,the "start of line" and "end of line" constructs match immediately followingor immediately before any newline in the string, respectively, as wellas at the very start and end. This can be changed within a pattern by a"(?m)" option setting.@G_REGEX_DOTALL: A dot metacharater (".") in the pattern matches allcharacters, including newlines. Without it, newlines are excluded. Thisoption can be changed within a pattern by a ("?s") option setting.@G_REGEX_EXTENDED: Whitespace data characters in the pattern aretotally ignored except when escaped or inside a character class.Whitespace does not include the VT character (code 11). In addition,characters between an unescaped "#" outside a character class andthe next newline character, inclusive, are also ignored. This can bechanged within a pattern by a "(?x)" option setting.@G_REGEX_ANCHORED: The pattern is forced to be "anchored", that is,it is constrained to match only at the first matching point in the stringthat is being searched. This effect can also be achieved by appropriateconstructs in the pattern itself such as the "^" metacharater.@G_REGEX_DOLLAR_ENDONLY: A dollar metacharacter ("$") in the patternmatches only at the end of the string. Without this option, a dollar alsomatches immediately before the final character if it is a newline (butnot before any other newlines). This option is ignored if#G_REGEX_MULTILINE is set.@G_REGEX_UNGREEDY: Inverts the "greediness" of thequantifiers so that they are not greedy by default, but become greedyif followed by "?". It can also be set by a "(?U)" option setting withinthe pattern.@G_REGEX_RAW: Usually strings must be valid UTF-8 strings, using thisflag they are considered as a raw sequence of bytes.@G_REGEX_NO_AUTO_CAPTURE: Disables the use of numbered capturingparentheses in the pattern. Any opening parenthesis that is not followedby "?" behaves as if it were followed by "?:" but named parentheses canstill be used for capturing (and they acquire numbers in the usual way).@G_REGEX_OPTIMIZE: Optimize the regular expression. If the pattern willbe used many times, then it may be worth the effort to optimize it toimprove the speed of matches.@G_REGEX_DUPNAMES: Names used to identify capturing subpatterns need notbe unique. This can be helpful for certain types of pattern when it is knownthat only one instance of the named subpattern can ever be matched.@G_REGEX_NEWLINE_CR: Usually any newline character is recognized, if thisoption is set, the only recognized newline character is '\r'.@G_REGEX_NEWLINE_LF: Usually any newline character is recognized, if thisoption is set, the only recognized newline character is '\n'.@G_REGEX_NEWLINE_CRLF: Usually any newline character is recognized, if thisoption is set, the only recognized newline character sequence is '\r\n'.@Since: 2.14<!-- ##### ENUM GRegexMatchFlags ##### --><para>Flags specifying match-time options.</para>@G_REGEX_MATCH_ANCHORED: The pattern is forced to be "anchored", that is,it is constrained to match only at the first matching point in the stringthat is being searched. This effect can also be achieved by appropriateconstructs in the pattern itself such as the "^" metacharater.@G_REGEX_MATCH_NOTBOL: Specifies that first character of the string isnot the beginning of a line, so the circumflex metacharacter should notmatch before it. Setting this without G_REGEX_MULTILINE (at compile time)causes circumflex never to match. This option affects only the behaviour ofthe circumflex metacharacter, it does not affect "\A".@G_REGEX_MATCH_NOTEOL: Specifies that the end of the subject string isnot the end of a line, so the dollar metacharacter should not match it nor(except in multiline mode) a newline immediately before it. Setting thiswithout G_REGEX_MULTILINE (at compile time) causes dollar never to match.This option affects only the behaviour of the dollar metacharacter, it doesnot affect "\Z" or "\z".@G_REGEX_MATCH_NOTEMPTY: An empty string is not considered to be a validmatch if this option is set. If there are alternatives in the pattern, theyare tried. If all the alternatives match the empty string, the entire matchfails. For example, if the pattern "a?b?" is applied to a string not beginningwith "a" or "b", it matches the empty string at the start of the string.With this flag set, this match is not valid, so GRegex searches furtherinto the string for occurrences of "a" or "b".@G_REGEX_MATCH_PARTIAL: Turns on the partial matching feature, for moredocumentation on partial matching see g_regex_is_partial_match().@G_REGEX_MATCH_NEWLINE_CR: Overrides the newline definition set when creatinga new #GRegex, setting the '\r' character as line terminator.@G_REGEX_MATCH_NEWLINE_LF: Overrides the newline definition set when creatinga new #GRegex, setting the '\n' character as line terminator.@G_REGEX_MATCH_NEWLINE_CRLF: Overrides the newline definition set when creatinga new #GRegex, setting the '\r\n' characters as line terminator.@G_REGEX_MATCH_NEWLINE_ANY: Overrides the newline definition set when creatinga new #GRegex, any newline character or character sequence is recognized.@Since: 2.14<!-- ##### STRUCT GRegex ##### --><para>A GRegex is the "compiled" form of a regular expression pattern. Thisstructure is opaque and its fields cannot be accessed directly.</para>@Since: 2.14<!-- ##### USER_FUNCTION GRegexEvalCallback ##### --><para>Specifies the type of the function passed to g_regex_replace_eval().It is called for each occurance of the pattern in the string passedto g_regex_replace_eval(), and it should append the replacement to@result.</para>@match_info: the #GMatchInfo generated by the match. Use g_match_info_get_regex() and g_match_info_get_string() if you need the #GRegex or the matched string.@result: a #GString containing the new string@user_data: user data passed to g_regex_replace_eval()@Returns: %FALSE to continue the replacement process, %TRUE to stop it@Since: 2.14<!-- ##### FUNCTION g_regex_new ##### --><para></para>@pattern: @compile_options: @match_options: @error: @Returns: <!-- ##### FUNCTION g_regex_ref ##### --><para></para>@regex: @Returns: <!-- ##### FUNCTION g_regex_unref ##### --><para></para>@regex: <!-- ##### FUNCTION g_regex_get_pattern ##### --><para></para>@regex: @Returns: <!-- ##### FUNCTION g_regex_get_max_backref ##### --><para></para>@regex: @Returns: <!-- ##### FUNCTION g_regex_get_capture_count ##### --><para></para>@regex: @Returns: <!-- ##### FUNCTION g_regex_get_string_number ##### --><para>
gregex.sgml - 源码说明

本页面展示了「This GLib version 2.16.1. GLib is the low-level core library that forms the basis for projects such」中的 gregex.sgml 源码文件，采用 SGML 编程语言编写，共 632 行代码。您可以在线阅读完整代码内容，也可以返回资源详情页下载完整源码包进行本地学习和开发。
虫虫下载站收录了大量与GLib相关的技术资源，包括源代码、技术文档、电路图等，是电子工程师和嵌入式开发者的专业学习平台。
⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?