📄 preg.txt

An opening square bracket introduces a character class, terminated by a closing square bracket. A closing square bracket on its own is not special. If a closing square bracket is required as a member of the class, it should be the first data character in the class (after an initial circumflex, if present) or escaped with a backslash.

A character class matches a single character in the subject. A matched character must be in the set of characters defined by the class, unless the first character in the class definition is a circumflex, in which case the subject character must not be in the set defined by the class. If a circumflex is actually required as a member of the class, ensure it is not the first character, or escape it with a backslash.

For example, the character class [aeiou] matches any lower case vowel, while [^aeiou] matches any character that is not a lower case vowel. Note that a circumflex is just a convenient notation for specifying the characters which are in the class by enumerating those that are not. It is not an assertion: it still consumes a character from the subject string, and fails if the current pointer is at the end of the string.

When caseless matching is set, any letters in a class represent both their upper case and lower case versions, so for example, a caseless [aeiou] matches "A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a caseful version would. PCRE does not support the concept of case for characters with values greater than 255. A class such as [^a] will always match a newline.

The minus (hyphen) character can be used to specify a range of characters in a character class. For example, [d-m] matches any letter between d and m, inclusive. If a minus character is required in a class, it must be escaped with a backslash or appear in a position where it cannot be interpreted as indicating a range, typically as the first or last character in the class.
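The class rules above can be checked with a short sketch. It uses Python's standard `re` module as a stand-in for PCRE, which is an assumption of this example: the two engines are separate implementations, although they agree on the class syntax exercised here. The `matches` helper is hypothetical and exists only for illustration.

```python
import re


def matches(pattern: str, text: str, flags: int = 0) -> bool:
    """Return True if the whole of `text` matches `pattern`.

    Hypothetical helper; Python's re is used here in place of PCRE.
    """
    return re.fullmatch(pattern, text, flags) is not None


# Plain and negated classes.
print(matches(r"[aeiou]", "e"))    # a lower case vowel
print(matches(r"[^aeiou]", "x"))   # any character that is not one
print(matches(r"[^aeiou]", ""))    # a negated class still needs a character
print(matches(r"[^a]", "\n"))      # a newline is just another character

# Caseless matching folds the letters listed in the class.
print(matches(r"[aeiou]", "A", re.IGNORECASE))
print(matches(r"[^aeiou]", "A", re.IGNORECASE))

# A range, and a hyphen placed last so that it cannot form a range.
print(matches(r"[d-m]", "k"))
print(matches(r"[d-m-]", "-"))

# A closing bracket as the first data character is a member of the class.
print(matches(r"[]x]", "]"))
print(matches(r"[^]x]", "]"))
```

Using a whole-string match keeps each check to exactly one subject character, which mirrors the statement that a class matches a single character.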
It is not possible to have the literal character "]" as the end character of a range. A pattern such as [W-]46] is interpreted as a class of two characters ("W" and "-") followed by a literal string "46]", so it would match "W46]" or "-46]". However, if the "]" is escaped with a backslash it is interpreted as the end of range, so [W-\]46] is interpreted as a single class containing a range followed by two separate characters. The octal or hexadecimal representation of "]" can also be used to end a range.

The character types \d, \D, \s, \S, \w, and \W may also appear in a character class, and add the characters that they match to the class. For example, [\dABCDEF] matches any hexadecimal digit. A circumflex can conveniently be used with the upper case character types to specify a more restricted set of characters than the matching lower case type. For example, the class [^\W_] matches any letter or digit, but not underscore.

All non-alphameric characters other than \, -, ^ (at the start) and the terminating ] are non-special in character classes, but it does no harm if they are escaped.

VERTICAL BAR

Vertical bar characters are used to separate alternative patterns. For example, the pattern

    gilbert|sullivan

matches either "gilbert" or "sullivan". Any number of alternatives may appear, and an empty alternative is permitted (matching the empty string). The matching process tries each alternative in turn, from left to right, and the first one that succeeds is used. If the alternatives are within a subpattern (defined below), "succeeds" means matching the rest of the main pattern as well as the alternative in the subpattern.

INTERNAL OPTION SETTING

The settings of the PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and PCRE_EXTENDED options can be changed from within the pattern by a sequence of Perl option letters enclosed between "(?" and ")".
The option letters are

    i  for PCRE_CASELESS
    m  for PCRE_MULTILINE
    s  for PCRE_DOTALL
    x  for PCRE_EXTENDED

For example, (?im) sets caseless, multiline matching. It is also possible to unset these options by preceding the letter with a hyphen, and a combined setting and unsetting such as (?im-sx), which sets PCRE_CASELESS and PCRE_MULTILINE while unsetting PCRE_DOTALL and PCRE_EXTENDED, is also permitted. If a letter appears both before and after the hyphen, the option is unset.

When an option change occurs at top level (that is, not inside subpattern parentheses), the change applies to the remainder of the pattern that follows. If the change is placed right at the start of a pattern, PCRE extracts it into the global options (and it will therefore show up in data extracted by the pcre_fullinfo() function).

An option change within a subpattern affects only that part of the current pattern that follows it, so

    (a(?i)b)c

matches abc and aBc and no other strings (assuming PCRE_CASELESS is not used). By this means, options can be made to have different settings in different parts of the pattern. Any changes made in one alternative do carry on into subsequent branches within the same subpattern. For example,

    (a(?i)b|c)

matches "ab", "aB", "c", and "C", even though when matching "C" the first branch is abandoned before the option setting. This is because the effects of option settings happen at compile time. There would be some very weird behaviour otherwise.

The PCRE-specific options PCRE_UNGREEDY and PCRE_EXTRA can be changed in the same way as the Perl-compatible options by using the characters U and X respectively. The (?X) flag setting is special in that it must always occur earlier in the pattern than any of the additional features it turns on, even when it is at top level. It is best put at the start.

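As a rough illustration of option letters, the sketch below again uses Python's `re` module rather than PCRE, and the two differ here. Recent Python versions reject an option group such as (?i) that is not at the very start of the pattern, so the scoped form `(?i:...)` is swapped in for the (a(?i)b)c example. The variable names are hypothetical.

```python
import re

# An option group at the very start applies to the whole pattern, much
# like passing the corresponding flags to the compile call.
inline = re.compile(r"(?im)^abc$")
flagged = re.compile(r"^abc$", re.IGNORECASE | re.MULTILINE)
same_flags = inline.flags == flagged.flags

# Caseless and multiline together: "ABC" on its own line is found.
found = inline.search("first line\nABC\nlast line")

# Scoped stand-in for (a(?i)b)c: only the "b" is matched caselessly.
# Unlike the original, this version has no capturing parentheses.
scoped = re.compile(r"a(?i:b)c")
candidates = ["abc", "aBc", "Abc", "abC", "ABC"]
accepted = [s for s in candidates if scoped.fullmatch(s)]

print(same_flags)
print(found.group() if found else None)
print(accepted)
```

Comparing the `flags` attribute of the two compiled patterns plays a similar role to inspecting the global options of a compiled PCRE pattern: a leading option group ends up recorded as pattern-wide flags.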
SUBPATTERNS

Subpatterns are delimited by parentheses (round brackets), which can be nested. Marking part of a pattern as a subpattern does two things:

1. It localizes a set of alternatives. For example, the pattern

    cat(aract|erpillar|)

matches one of the words "cat", "cataract", or "caterpillar". Without the parentheses, it would match "cataract", "erpillar" or the empty string.

2. It sets up the subpattern as a capturing subpattern (as defined above). When the whole pattern matches, that portion of the subject string that matched the subpattern is passed back to the caller via the ovector argument of pcre_exec(). Opening parentheses are counted from left to right (starting from 1) to obtain the numbers of the capturing subpatterns.

For example, if the string "the red king" is matched against the pattern

    the ((red|white) (king|queen))

the captured substrings are "red king", "red", and "king", and are numbered 1, 2, and 3, respectively.

The fact that plain parentheses fulfil two functions is not always helpful. There are often times when a grouping subpattern is required without a capturing requirement. If an opening parenthesis is followed by a question mark and a colon, the subpattern does not do any capturing, and is not counted when computing the number of any subsequent capturing subpatterns. For example, if the string "the white queen" is matched against the pattern

    the ((?:red|white) (king|queen))

the captured substrings are "white queen" and "queen", and are numbered 1 and 2. The maximum number of capturing subpatterns is 65535, and the maximum depth of nesting of all subpatterns, both capturing and non-capturing, is 200.

As a convenient shorthand, if any option settings are required at the start of a non-capturing subpattern, the option letters may appear between the "?" and the ":".
Thus the two patterns

    (?i:saturday|sunday)
    (?:(?i)saturday|sunday)

match exactly the same set of strings. Because alternative branches are tried from left to right, and options are not reset until the end of the subpattern is reached, an option setting in one branch does affect subsequent branches, so the above patterns match "SUNDAY" as well as "Saturday".

REPETITION

Repetition is specified by quantifiers, which can follow any of the following items:

    a literal data character
    the . meta-character
    the \C escape sequence
    escapes such as \d that match single characters
    a character class
    a back reference (see next section)
    a parenthesized subpattern (unless it is an assertion)

The general repetition quantifier specifies a minimum and maximum number of permitted matches, by giving the two numbers in curly brackets (braces), separated by a comma. The numbers must be less than 65536, and the first must be less than or equal to the second. For example:

    z{2,4}

matches "zz", "zzz", or "zzzz". A closing brace on its own is not a special character. If the second number is omitted, but the comma is present, there is no upper limit; if the second number and the comma are both omitted, the quantifier specifies an exact number of required matches. Thus

    [aeiou]{3,}

matches at least 3 successive vowels, but may match many more, while

    \d{8}

matches exactly 8 digits. An opening curly bracket that appears in a position where a quantifier is not allowed, or one that does not match the syntax of a quantifier, is taken as a literal character. For example, {,6} is not a quantifier, but a literal string of four characters.

The quantifier {0} is permitted, causing the expression to behave as if the previous item and the quantifier were not present.

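The capture numbering and the brace quantifiers described above can be tried out with the following sketch. It assumes Python's `re` module as a substitute for PCRE. The {,6} case is left out on purpose, because Python treats {,6} as a quantifier while the text above says PCRE takes it literally. The `captures` and `whole` helpers are hypothetical names.

```python
import re


def captures(pattern: str, text: str):
    """Return the captured substrings in group-number order, or None."""
    m = re.fullmatch(pattern, text)
    return m.groups() if m else None


def whole(pattern: str, text: str) -> bool:
    """Return True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


# Capturing groups are numbered by their opening parenthesis.
print(captures(r"the ((red|white) (king|queen))", "the red king"))
# (?: ... ) groups without capturing and is skipped in the numbering.
print(captures(r"the ((?:red|white) (king|queen))", "the white queen"))

# Parentheses localize the alternatives, including the empty one.
words = ["cat", "cataract", "caterpillar", "erpillar"]
print([w for w in words if whole(r"cat(aract|erpillar|)", w)])

# Option letters between "?" and ":" apply to every branch.
print(whole(r"(?i:saturday|sunday)", "SUNDAY"))

# General quantifiers: bounded, open-ended, and exact.
print([n for n in range(1, 6) if whole(r"z{2,4}", "z" * n)])
print(whole(r"[aeiou]{3,}", "aeiou"))
print(whole(r"\d{8}", "20240131"))
```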
For convenience (and historical compatibility) the three most common quantifiers have single-character abbreviations:

    *    is equivalent to {0,}
    +    is equivalent to {1,}
    ?    is equivalent to {0,1}

It is possible to construct infinite loops by following a subpattern that can match no characters with a quantifier that has no upper limit, for example:

    (a?)*

Earlier versions of Perl and PCRE used to give an error at compile time for such patterns. However, because there are cases where this can be useful, such patterns are now accepted, but if any repetition of the subpattern does in fact match no characters, the loop is forcibly broken.

By default, the quantifiers are "greedy", that is, they match as much as possible (up to the maximum number of permitted times), without causing the rest of the pattern to fail. The classic example of where this gives problems is in trying to match comments in C programs. These appear between the sequences /* and */ and within the sequence, individual * and / characters may appear. An attempt to match C comments by applying the pattern

    /\*.*\*/

to the string

    /* first comment */  not comment  /* second comment */

fails, because it matches the entire string owing to the greediness of the .* item.

However, if a quantifier is followed by a question mark, it ceases to be greedy, and instead matches the minimum number of times possible, so the pattern

    /\*.*?\*/

does the right thing with the C comments. The meaning of the various quantifiers is not otherwise changed, just the preferred number of matches. Do not confuse this use of question mark with its use as a quantifier in its own right. Because it has two uses, it can sometimes appear doubled, as in

    \d??\d

which matches one digit by preference, but can match two if that is the only way the rest of the pattern matches.

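The difference between greedy and minimizing quantifiers can be seen in a short sketch. It is written against Python's `re` module, assumed here to behave like PCRE for these particular patterns. The variable names are hypothetical.

```python
import re

text = "/* first comment */  not comment  /* second comment */"

# Greedy: .* runs to the last "*/", so one match spans the whole string.
greedy = re.findall(r"/\*.*\*/", text)

# Minimizing: .*? stops at the first "*/" it can, giving two comments.
lazy = re.findall(r"/\*.*?\*/", text)

# A doubled question mark: an optional digit that prefers to be absent.
prefers_one = re.match(r"\d??\d", "12").group()
# When the whole subject must be consumed, both digits are taken.
forced_two = re.fullmatch(r"\d??\d", "12").group()

print(greedy)
print(lazy)
print(prefers_one, forced_two)
```

Note that the minimizing pattern does not know about C syntax. It simply prefers the shortest run between the delimiters, which is enough for this subject string.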
If the PCRE_UNGREEDY option is set (an option which is not available in Perl), the quantifiers are not greedy by default, but individual ones can be made greedy by following them with a question mark. In other words, it inverts the default behaviour.