dreg.txt

  but not underscore.

  All non-alphameric characters other than \, -, ^ (at the start) and
  the terminating ] are non-special in character classes, but it does no
  harm if they are escaped.

VERTICAL BAR

  Vertical bar characters are used to separate alternative patterns. For
  example, the pattern

    gilbert|sullivan

  matches either "gilbert" or "sullivan". Any number of alternatives may
  appear, and an empty alternative is permitted (matching the empty
  string). The matching process tries each alternative in turn, from
  left to right, and the first one that succeeds is used. If the
  alternatives are within a subpattern (defined below), "succeeds" means
  matching the rest of the main pattern as well as the alternative in
  the subpattern.

INTERNAL OPTION SETTING

  The settings of the PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and
  PCRE_EXTENDED options can be changed from within the pattern by a
  sequence of Perl option letters enclosed between "(?" and ")". The
  option letters are

      i  for PCRE_CASELESS
      m  for PCRE_MULTILINE
      s  for PCRE_DOTALL
      x  for PCRE_EXTENDED

  For example, (?im) sets caseless, multiline matching. It is also
  possible to unset these options by preceding the letter with a hyphen,
  and a combined setting and unsetting such as (?im-sx), which sets
  PCRE_CASELESS and PCRE_MULTILINE while unsetting PCRE_DOTALL and
  PCRE_EXTENDED, is also permitted. If a letter appears both before and
  after the hyphen, the option is unset.

  When an option change occurs at top level (that is, not inside
  subpattern parentheses), the change applies to the remainder of the
  pattern that follows. If the change is placed right at the start of a
  pattern, PCRE extracts it into the global options (and it will
  therefore show up in data extracted by the pcre_fullinfo() function).
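
The alternation and option-letter rules above can be tried out in a few lines. This is only a sketch using Python's re module, which is a different engine from PCRE; it is used here on the assumption that it treats left-to-right alternation and a leading (?im) the same way. The variable names are illustrative, and pat.flags is Python's rough analogue of the global options, not pcre_fullinfo() itself.

```python
import re

# Alternatives are tried left to right; the first one that lets the
# overall match succeed is used, even if a later one would match more.
first_alt = re.match(r"a|ab", "ab").group()
found = re.search(r"gilbert|sullivan", "arthur sullivan").group()

# An empty alternative is permitted and matches the empty string.
empty_ok = re.fullmatch(r"gilbert|", "") is not None

# (?im) at the very start sets caseless, multiline matching.
pat = re.compile(r"(?im)^sullivan$")
hit = pat.search("Gilbert\nSULLIVAN\n").group()

# Python folds leading inline options into the compiled pattern's flags.
flags_folded = bool(pat.flags & re.IGNORECASE) and bool(pat.flags & re.MULTILINE)

print(first_alt, found, empty_ok, hit, flags_folded)
```

Note that the pattern a|ab stops at "a" on the subject "ab": the engine does not look for the longest alternative, only the first that succeeds.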
  An option change within a subpattern affects only that part of the
  current pattern that follows it, so

    (a(?i)b)c

  matches abc and aBc and no other strings (assuming PCRE_CASELESS is
  not used). By this means, options can be made to have different
  settings in different parts of the pattern. Any changes made in one
  alternative do carry on into subsequent branches within the same
  subpattern. For example,

    (a(?i)b|c)

  matches "ab", "aB", "c", and "C", even though when matching "C" the
  first branch is abandoned before the option setting. This is because
  the effects of option settings happen at compile time. There would be
  some very weird behaviour otherwise.

  The PCRE-specific options PCRE_UNGREEDY and PCRE_EXTRA can be changed
  in the same way as the Perl-compatible options by using the characters
  U and X respectively. The (?X) flag setting is special in that it must
  always occur earlier in the pattern than any of the additional
  features it turns on, even when it is at top level. It is best put at
  the start.

SUBPATTERNS

  Subpatterns are delimited by parentheses (round brackets), which can
  be nested. Marking part of a pattern as a subpattern does two things:

  1. It localizes a set of alternatives. For example, the pattern

    cat(aract|erpillar|)

  matches one of the words "cat", "cataract", or "caterpillar". Without
  the parentheses, it would match "cataract", "erpillar" or the empty
  string.

  2. It sets up the subpattern as a capturing subpattern (as defined
  above). When the whole pattern matches, that portion of the subject
  string that matched the subpattern is passed back to the caller via
  the ovector argument of pcre_exec(). Opening parentheses are counted
  from left to right (starting from 1) to obtain the numbers of the
  capturing subpatterns.
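
The effect of parentheses on a set of alternatives, and of an option that applies to only part of a pattern, can be sketched as follows. Python's re module stands in for PCRE here. It rejects a bare (?i) in the middle of a pattern, so the scoped form (?i:b) is used as the closest Python spelling of (a(?i)b)c; that substitution is an assumption of this sketch, not something the PCRE text prescribes.

```python
import re

words = ["cat", "cataract", "caterpillar", "erpillar"]

# With parentheses the alternatives are local to the part after "cat".
grouped = [w for w in words if re.fullmatch(r"cat(aract|erpillar|)", w)]

# Without them the alternatives are "cataract", "erpillar", or empty.
ungrouped = [w for w in words if re.fullmatch(r"cataract|erpillar|", w)]

# Caseless matching limited to the "b"; "a" and "c" stay case-sensitive.
scoped = re.compile(r"(a(?i:b))c")
scoped_hits = [s for s in ["abc", "aBc", "Abc", "abC"] if scoped.fullmatch(s)]

print(grouped)
print(ungrouped)
print(scoped_hits)
```

Only "abc" and "aBc" pass the last check, which agrees with the description of (a(?i)b)c above.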
  For example, if the string "the red king" is matched against the
  pattern

    the ((red|white) (king|queen))

  the captured substrings are "red king", "red", and "king", and are
  numbered 1, 2, and 3, respectively.

  The fact that plain parentheses fulfil two functions is not always
  helpful. There are often times when a grouping subpattern is required
  without a capturing requirement. If an opening parenthesis is followed
  by a question mark and a colon, the subpattern does not do any
  capturing, and is not counted when computing the number of any
  subsequent capturing subpatterns. For example, if the string "the
  white queen" is matched against the pattern

    the ((?:red|white) (king|queen))

  the captured substrings are "white queen" and "queen", and are
  numbered 1 and 2. The maximum number of capturing subpatterns is
  65535, and the maximum depth of nesting of all subpatterns, both
  capturing and non-capturing, is 200.

  As a convenient shorthand, if any option settings are required at the
  start of a non-capturing subpattern, the option letters may appear
  between the "?" and the ":". Thus the two patterns

      (?i:saturday|sunday)
      (?:(?i)saturday|sunday)

  match exactly the same set of strings. Because alternative branches
  are tried from left to right, and options are not reset until the end
  of the subpattern is reached, an option setting in one branch does
  affect subsequent branches, so the above patterns match "SUNDAY" as
  well as "Saturday".

REPETITION

  Repetition is specified by quantifiers, which can follow any of the
  following items:

      a literal data character
      the . meta-character
      the \C escape sequence
      escapes such as \d that match single characters
      a character class
      a back reference (see next section)
      a parenthesized subpattern (unless it is an assertion)

  The general repetition quantifier specifies a minimum and maximum
  number of permitted matches, by giving the two numbers in curly
  brackets (braces), separated by a comma. The numbers must be less than
  65536, and the first must be less than or equal to the second. For
  example:

    z{2,4}

  matches "zz", "zzz", or "zzzz". A closing brace on its own is not a
  special character. If the second number is omitted, but the comma is
  present, there is no upper limit; if the second number and the comma
  are both omitted, the quantifier specifies an exact number of required
  matches. Thus

    [aeiou]{3,}

  matches at least 3 successive vowels, but may match many more, while

    \d{8}

  matches exactly 8 digits. An opening curly bracket that appears in a
  position where a quantifier is not allowed, or one that does not match
  the syntax of a quantifier, is taken as a literal character. For
  example, {,6} is not a quantifier, but a literal string of four
  characters.

  The quantifier {0} is permitted, causing the expression to behave as
  if the previous item and the quantifier were not present.

  For convenience (and historical compatibility) the three most common
  quantifiers have single-character abbreviations:

      *    is equivalent to {0,}
      +    is equivalent to {1,}
      ?    is equivalent to {0,1}

  It is possible to construct infinite loops by following a subpattern
  that can match no characters with a quantifier that has no upper
  limit, for example:

    (a?)*

  Earlier versions of Perl and PCRE used to give an error at compile
  time for such patterns. However, because there are cases where this
  can be useful, such patterns are now accepted, but if any repetition
  of the subpattern does in fact match no characters, the loop is
  forcibly broken.

  By default, the quantifiers are "greedy", that is, they match as much
  as possible (up to the maximum number of permitted times), without
  causing the rest of the pattern to fail. The classic example of where
  this gives problems is in trying to match comments in C programs.
  These appear between the sequences /* and */ and within the sequence,
  individual * and / characters may appear. An attempt to match C
  comments by applying the pattern

    /\*.*\*/

  to the string

    /* first comment */  not comment  /* second comment */

  fails, because it matches the entire string owing to the greediness of
  the .* item.

  However, if a quantifier is followed by a question mark, it ceases to
  be greedy, and instead matches the minimum number of times possible,
  so the pattern

    /\*.*?\*/

  does the right thing with the C comments. The meaning of the various
  quantifiers is not otherwise changed, just the preferred number of
  matches. Do not confuse this use of question mark with its use as a
  quantifier in its own right. Because it has two uses, it can sometimes
  appear doubled, as in

    \d??\d

  which matches one digit by preference, but can match two if that is
  the only way the rest of the pattern matches.

  If the PCRE_UNGREEDY option is set (an option which is not available
  in Perl), the quantifiers are not greedy by default, but individual
  ones can be made greedy by following them with a question mark. In
  other words, it inverts the default behaviour.

  When a parenthesized subpattern is quantified with a minimum repeat
  count that is greater than 1 or with a limited maximum, more store is
  required for the compiled pattern, in proportion to the size of the
  minimum or maximum.
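
The capture numbering and the greedy versus non-greedy behaviour described above can be checked with a short script. As before, this is a sketch in Python's re module rather than PCRE itself, limited to syntax that both are assumed to share; the {,6} case is deliberately left out because Python treats it differently from the text.

```python
import re

# Opening parentheses are counted left to right, starting from 1.
m = re.search(r"the ((red|white) (king|queen))", "the red king")
numbered = m.groups()

# (?: ... ) groups without capturing and is not counted.
m2 = re.search(r"the ((?:red|white) (king|queen))", "the white queen")
numbered_nc = m2.groups()

subject = "/* first comment */  not comment  /* second comment */"

# Greedy .* runs to the last */ in the subject.
greedy = re.search(r"/\*.*\*/", subject).group()

# Non-greedy .*? stops at the first */ after each /*.
lazy = re.findall(r"/\*.*?\*/", subject)

# \d??\d prefers one digit but takes two when the rest requires it.
one_digit = re.match(r"\d??\d", "12").group()
two_digits = re.fullmatch(r"\d??\d", "12").group()

print(numbered, numbered_nc)
print(greedy == subject, lazy)
print(one_digit, two_digits)
```

The greedy pattern returns the whole subject, while the non-greedy one returns the two comments separately.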
  If a pattern starts with .* or .{0,} and the PCRE_DOTALL option
  (equivalent to Perl's /s) is set, thus allowing the . to match
  newlines, the pattern is implicitly anchored, because whatever follows
  will be tried against every character position in the subject string,
  so there is no point in retrying the overall match at any position
  after the first. PCRE normally treats such a pattern as though it were
  preceded by \A.

  In cases where it is known that the subject string contains no
  newlines, it is worth setting PCRE_DOTALL in order to obtain this
  optimization, or alternatively using ^ to indicate anchoring
  explicitly.

  However, there is one situation where the optimization cannot be used.
  When .* is inside capturing parentheses that are the subject of a
  backreference elsewhere in the pattern, a match at the start may fail,
  and a later one succeed. Consider, for example:

    (.*)abc\1

  If the subject is "xyz123abc123" the match point is the fourth
  character. For this reason, such a pattern is not implicitly anchored.

  When a capturing subpattern is repeated, the value captured is the
  substring that matched the final iteration. For example, after

    (tweedle[dume]{3}\s*)+

  has matched "tweedledum tweedledee" the value of the captured
  substring is "tweedledee". However, if there are nested capturing
  subpatterns, the corresponding captured values may have been set in
  previous iterations. For example, after

    /(a|(b))+/

PCRE PERFORMANCE

  Certain items that may appear in regular expression patterns are more
  efficient than others. It is more efficient to use a character class
  like [aeiou] than a set of alternatives such as (a|e|i|o|u). In
  general, the simplest construction that provides the required
  behaviour is usually the most efficient. Jeffrey Friedl's book
  contains a lot of discussion about optimizing regular expressions for
  efficient performance.
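
Two claims from the end of the REPETITION section are easy to verify by experiment: that (.*)abc\1 first matches at the fourth character of "xyz123abc123", and that a repeated capturing subpattern keeps the substring from its final iteration. The sketch below uses Python's re module as a stand-in for PCRE, on the assumption that backreferences and repeated captures behave alike for these two patterns.

```python
import re

# The match cannot start at offset 0, 1, or 2, because what follows
# "abc" would then have to repeat "xyz123", "yz123", or "z123".
m = re.search(r"(.*)abc\1", "xyz123abc123")
match_offset = m.start()      # zero-based, so 3 means the fourth character
captured = m.group(1)

# A repeated capturing group reports only its last iteration.
t = re.fullmatch(r"(tweedle[dume]{3}\s*)+", "tweedledum tweedledee")
last_iteration = t.group(1)

print(match_offset, captured, last_iteration)
```

Because the successful match starts after the beginning of the subject, anchoring this pattern implicitly would make it fail, which is why the optimization is skipped.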
  When a pattern begins with .* not in parentheses, or in parentheses
  that are not the subject of a backreference, and the PCRE_DOTALL
  option is set, the pattern is implicitly anchored by PCRE, since it
  can match only at the start of a subject string. However, if
  PCRE_DOTALL is not set, PCRE cannot make this optimization, because
  the . meta-character does not then match a newline, and if the subject
  string contains newlines, the pattern may match from the character
  immediately following one of them instead of from the very start. For
  example, the pattern

    .*second

  matches the subject "first\nand second" (where \n stands for a newline
  character), with the match starting at the seventh character. In order
  to do this, PCRE has to retry the match starting after every newline
  in the subject.

  If you are using such a pattern with subject strings that do not
  contain newlines, the best performance is obtained by setting
  PCRE_DOTALL, or starting the pattern with ^.* to indicate explicit
  anchoring. That saves PCRE from having to scan along the subject
  looking for a newline to restart at.

  Beware of patterns that contain nested indefinite repeats. These can
  take a long time to run when applied to a string that does not match.
  Consider the pattern fragment

    (a+)*

  This can match "aaaa" in 16 different ways, and this number increases
  very rapidly as the string gets longer. (The * repeat can match 0, 1,
  2, 3, or 4 times, and for each of those cases other than 0 or 4, the +
  repeats can match different numbers of times.) When the remainder of
  the pattern is such that the entire match is going to fail, PCRE has
  in principle to try every possible variation, and this can take an
  extremely long time. An optimization catches some of the simpler
  cases such as

    (a+)*b

  where a literal character follows. Before embarking on the standard
  matching procedure, PCRE checks that there is a "b" later in the
  subject string, and if there is not, it fails the match immediately.
  However, when there is no following literal this optimization cannot
  be used. You can see the difference by comparing the behaviour of

    (a+)*\d
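
Two points from this section can be explored numerically: where .*second starts matching with and without dot-matches-newline, and how quickly the number of ways (a+)* can match grows. The sketch below uses Python's re module (re.DOTALL plays the role of PCRE_DOTALL) and a hypothetical helper, count_ways, that counts by brute force. It counts each way of splitting a prefix of the subject into non-empty runs of "a", one run per iteration of the * repeat, and that counting convention is an assumption of the sketch.

```python
import re
from functools import lru_cache

subject = "first\nand second"

# Without DOTALL the dot cannot cross the newline, so the match
# starts just after it (offset 6, the seventh character).
start_plain = re.search(r".*second", subject).start()

# With DOTALL the match starts at the very beginning.
start_dotall = re.search(r".*second", subject, re.DOTALL).start()


@lru_cache(maxsize=None)
def count_ways(n: int) -> int:
    """Count the ways (a+)* can consume a prefix of n 'a' characters."""
    total = 1  # the * repeat stops here
    for run in range(1, n + 1):
        # one more iteration of the * repeat, with a+ taking `run` characters
        total += count_ways(n - run)
    return total


growth = [count_ways(n) for n in range(1, 9)]

print(start_plain, start_dotall)
print(growth)
```

Under this counting the total doubles with every extra character, giving 16 for "aaaa". The doubling is what makes a failing match so expensive when the engine has to try every variation.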