[/
  Copyright 2006-2007 John Maddock.
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]

[section:perl_syntax Perl Regular Expression Syntax]

[h3 Synopsis]

The Perl regular expression syntax is based on that used by the
programming language Perl. Perl regular expressions are the
default behavior in Boost.Regex, or you can pass the flag `perl` to the
[basic_regex] constructor, for example:

   // e1 is a case sensitive Perl regular expression:
   // since Perl is the default option there's no need to explicitly specify the syntax used here:
   boost::regex e1(my_expression);
   // e2 is a case insensitive Perl regular expression:
   boost::regex e2(my_expression, boost::regex::perl|boost::regex::icase);

[h3 Perl Regular Expression Syntax]

In Perl regular expressions, all characters match themselves except for the
following special characters:

[pre .\[{()\\\*+?|^$]

[h4 Wildcard]

The single character '.' when used outside of a character set will match
any single character except:

* The NULL character when the [link boost_regex.ref.match_flag_type flag
   `match_not_dot_null`] is passed to the matching algorithms.
* The newline character when the [link boost_regex.ref.match_flag_type flag
   `match_not_dot_newline`] is passed to the matching algorithms.

[h4 Anchors]

A '^' character shall match the start of a line.

A '$' character shall match the end of a line.

[h4 Marked sub-expressions]

A section beginning `(` and ending `)` acts as a marked sub-expression.
Whatever matched the sub-expression is split out in a separate field by
the matching algorithms. Marked sub-expressions can also be repeated, or
referred to by a back-reference.

[h4 Non-marking grouping]

A marked sub-expression is useful to lexically group part of a regular
expression, but has the side-effect of splitting out an extra field in
the result.
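
The extra field produced by a marked sub-expression can be seen in a minimal sketch. To keep it compilable without Boost, it assumes `std::regex` as a stand-in behind the alias `rx`; the Boost.Regex types and algorithms used here have the same names, so including `<boost/regex.hpp>` and aliasing `boost` instead should behave the same way for these patterns. The helpers `match_fields` and `check_marked_groups` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Assumption: std:: stands in for boost:: so the sketch builds anywhere.
// With Boost installed, include <boost/regex.hpp> and write: namespace rx = boost;
namespace rx = std;

// Hypothetical helper: returns every field of a full match.
// Field 0 is the whole match, field n is what marked sub-expression n matched.
std::vector<std::string> match_fields(const std::string& pattern,
                                      const std::string& text)
{
    std::vector<std::string> fields;
    rx::smatch what;
    if (rx::regex_match(text, what, rx::regex(pattern))) {
        for (std::size_t i = 0; i < what.size(); ++i)
            fields.push_back(what[i].str());
    }
    return fields;  // empty when the text does not match
}

// Hypothetical self-check for the behaviour described in the text.
void check_marked_groups()
{
    // No parentheses: only the whole match is reported.
    assert(match_fields("abc", "abc").size() == 1);

    // One marked sub-expression adds one field.
    const std::vector<std::string> fields = match_fields("a(b)c", "abc");
    assert(fields.size() == 2);
    assert(fields[0] == "abc");
    assert(fields[1] == "b");

    // A repeated marked sub-expression still yields a single extra field,
    // holding the text matched by its last repetition.
    const std::vector<std::string> repeated = match_fields("(ab)+", "ababab");
    assert(repeated.size() == 2);
    assert(repeated[1] == "ab");
}
```

Calling `check_marked_groups()` from a test or from `main` exercises all three cases.
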
As an alternative you can lexically group part of a regular expression
without generating a marked sub-expression by using `(?:` and `)`, for
example `(?:ab)+` will repeat `ab` without splitting out any separate
sub-expressions.

[h4 Repeats]

Any atom (a single character, a marked sub-expression, or a character class)
can be repeated with the `*`, `+`, `?`, and `{}` operators.

The `*` operator will match the preceding atom zero or more times,
for example the expression `a*b` will match any of the following:

   b
   ab
   aaaaaaaab

The `+` operator will match the preceding atom one or more times, for example
the expression `a+b` will match any of the following:

   ab
   aaaaaaaab

But will not match:

   b

The `?` operator will match the preceding atom zero or one times, for
example the expression `ca?b` will match any of the following:

   cb
   cab

But will not match:

   caab

An atom can also be repeated with a bounded repeat:

`a{n}` Matches 'a' repeated exactly n times.

`a{n,}` Matches 'a' repeated n or more times.

`a{n,m}` Matches 'a' repeated between n and m times inclusive.

For example:

[pre ^a{2,3}$]

Will match either of:

   aa
   aaa

But neither of:

   a
   aaaa

It is an error to use a repeat operator if the preceding construct cannot
be repeated, for example:

   a(*)

Will raise an error, as there is nothing for the `*` operator to be applied to.

[h4 Non greedy repeats]

The normal repeat operators are "greedy", that is to say they will consume as
much input as possible.
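
The repeat examples above, and the greedy behaviour of the normal operators, can be checked with a short sketch. As an assumption it uses `std::regex` behind the alias `rx` so that it compiles without Boost; swapping in `<boost/regex.hpp>` and `namespace rx = boost;` should give the same results for these patterns. The helpers `matches`, `first_match` and `check_repeats` are hypothetical names.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Assumption: std:: stands in for boost:: so the sketch builds anywhere.
// With Boost installed, include <boost/regex.hpp> and write: namespace rx = boost;
namespace rx = std;

// Hypothetical helper: true when the whole of text matches pattern.
bool matches(const std::string& pattern, const std::string& text)
{
    return rx::regex_match(text, rx::regex(pattern));
}

// Hypothetical helper: the first substring of text that pattern matches,
// or an empty string when there is no match.
std::string first_match(const std::string& pattern, const std::string& text)
{
    rx::smatch what;
    if (rx::regex_search(text, what, rx::regex(pattern)))
        return what[0].str();
    return std::string();
}

// Hypothetical self-check mirroring the examples in the text.
void check_repeats()
{
    // '*': zero or more.
    assert(matches("a*b", "b"));
    assert(matches("a*b", "ab"));
    assert(matches("a*b", "aaaaaaaab"));

    // '+': one or more.
    assert(matches("a+b", "ab"));
    assert(!matches("a+b", "b"));

    // '?': zero or one.
    assert(matches("ca?b", "cb"));
    assert(matches("ca?b", "cab"));
    assert(!matches("ca?b", "caab"));

    // Bounded repeat: between 2 and 3 times inclusive.
    assert(matches("^a{2,3}$", "aa"));
    assert(matches("^a{2,3}$", "aaa"));
    assert(!matches("^a{2,3}$", "a"));
    assert(!matches("^a{2,3}$", "aaaa"));

    // Greedy: each repeat takes as much input as it can.
    assert(first_match("a+", "aaaa") == "aaaa");
    assert(first_match("a{2,3}", "aaaa") == "aaa");
    assert(first_match("<.*>", "<a><b>") == "<a><b>");
}
```

In the last assertion the greedy `.*` runs on to the final `>`, so the match covers both tags rather than stopping at the first one.
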
There are non-greedy versions available that will consume as little input
as possible while still producing a match.

`*?` Matches the previous atom zero or more times, while consuming as little
input as possible.

`+?` Matches the previous atom one or more times, while consuming as
little input as possible.

`??` Matches the previous atom zero or one times, while consuming
as little input as possible.

`{n,}?` Matches the previous atom n or more times, while consuming as
little input as possible.

`{n,m}?` Matches the previous atom between n and m times, while
consuming as little input as possible.

[h4 Back references]

An escape character followed by a digit /n/, where /n/ is in the range 1-9,
matches the same string that was matched by sub-expression /n/. For example
the expression:

[pre ^(a\*).\*\\1$]

Will match the string:

   aaabbaaa

But not the string:

   aaabba

[h4 Alternation]

The `|` operator will match either of its arguments, so for example:
`abc|def` will match either "abc" or "def".

Parentheses can be used to group alternations, for example: `ab(d|ef)`
will match either of "abd" or "abef".

Empty alternatives are not allowed (these are almost always a mistake), but
if you really want an empty alternative use `(?:)` as a placeholder, for example:

`|abc` is not a valid expression, but

`(?:)|abc` is and is equivalent, also the expression:

`(?:abc)??` has exactly the same effect.

[h4 Character sets]

A character set is a bracket-expression starting with `[` and ending with `]`.
It defines a set of characters, and matches any single character that is a
member of that set.

A bracket expression may contain any combination of the following:

[h5 Single characters]

For example `[abc]` will match any of the characters 'a', 'b', or 'c'.

[h5 Character ranges]

For example `[a-c]` will match any single character in the range 'a' to 'c'.
By default, for Perl regular expressions, a character x is within the
range y to z if the code point of the character lies within the code points of
the endpoints of the range. Alternatively, if you set the
[link boost_regex.ref.syntax_option_type.syntax_option_type_perl `collate` flag]
when constructing the regular expression, then ranges are locale sensitive.

[h5 Negation]

If the bracket-expression begins with the ^ character, then it matches the
complement of the characters it contains, for example `[^a-c]` matches
any character that is not in the range `a-c`.

[h5 Character classes]

An expression of the form `[[:name:]]` matches the named character class
"name", for example `[[:lower:]]` matches any lower case character.
See [link boost_regex.syntax.character_classes character class names].

[h5 Collating Elements]

An expression of the form `[[.col.]]` matches the collating element /col/.
A collating element is any single character, or any sequence of characters
that collates as a single unit. Collating elements may also be used
as the end point of a range, for example: `[[.ae.]-c]` matches the
character sequence "ae", plus any single character in the range "ae"-c,
assuming that "ae" is treated as a single collating element in the current locale.

As an extension, a collating element may also be specified via its
[link boost_regex.syntax.collating_names symbolic name], for example:

   [[.NUL.]]

matches a `\0` character.

[h5 Equivalence classes]

An expression of the form `[[=col=]]` matches any character or collating element
whose primary sort key is the same as that for collating element /col/. As with
collating elements, the name /col/ may be a
[link boost_regex.syntax.collating_names symbolic name]. A primary
sort key is one that ignores case, accentation, or locale-specific tailorings;
so for example `[[=a=]]` matches any of the characters:
a, '''À''', '''Á''', '''Â''', '''Ã''', '''Ä''', '''Å''', A, '''à''', '''á''',
'''â''', '''ã''', '''ä''' and '''å'''.
Unfortunately implementation of this is reliant on the platform's collation
and localisation support; this feature can not be relied upon to work portably
across all platforms, or even all locales on one platform.

[h5 Escaped Characters]

All the escape sequences that match a single character, or a single character
class are permitted within a character class definition. For example
`[\[\]]` would match either of `[` or `]` while `[\W\d]` would match any character
that is either a "digit", /or/ is /not/ a "word" character.

[h5 Combinations]

All of the above can be combined in one character set declaration, for example:
`[[:digit:]a-c[.NUL.]]`.

[h4 Escapes]

Any special character preceded by an escape shall match itself.

The following escape sequences are all synonyms for single characters: