📄 perlretut.pod

One might initially guess that Perl would find the C<at> in C<cat> and
stop there, but that wouldn't give the longest possible string to the
first quantifier C<.*>.  Instead, the first quantifier C<.*> grabs as
much of the string as possible while still having the regexp match.  In
this example, that means matching the C<at> sequence against the final
C<at> in the string.  The other important principle illustrated here is
that when there are two or more elements in a regexp, the I<leftmost>
quantifier, if there is one, gets to grab as much of the string as
possible, leaving the rest of the regexp to fight over scraps.  Thus in
our example, the first quantifier C<.*> grabs most of the string, while
the second quantifier C<.*> gets the empty string.  Quantifiers that
grab as much of the string as possible are called I<maximal match> or
I<greedy> quantifiers.

When a regexp can match a string in several different ways, we can use
the principles above to predict which way the regexp will match:

=over 4

=item *

Principle 0: Taken as a whole, any regexp will be matched at the
earliest possible position in the string.

=item *

Principle 1: In an alternation C<a|b|c...>, the leftmost alternative
that allows a match for the whole regexp will be the one used.

=item *

Principle 2: The maximal matching quantifiers C<?>, C<*>, C<+> and
C<{n,m}> will in general match as much of the string as possible while
still allowing the whole regexp to match.

=item *

Principle 3: If there are two or more elements in a regexp, the
leftmost greedy quantifier, if any, will match as much of the string
as possible while still allowing the whole regexp to match.  The next
leftmost greedy quantifier, if any, will try to match as much of the
string remaining available to it as possible, while still allowing the
whole regexp to match.
And so on, until all the regexp elements are
satisfied.

=back

As we have seen above, Principle 0 overrides the others -- the regexp
will be matched as early as possible, with the other principles
determining how the regexp matches at that earliest character
position.

Here is an example of these principles in action:

    $x = "The programming republic of Perl";
    $x =~ /^(.+)(e|r)(.*)$/;  # matches,
                              # $1 = 'The programming republic of Pe'
                              # $2 = 'r'
                              # $3 = 'l'

This regexp matches at the earliest string position, C<'T'>.  One
might think that C<e>, being leftmost in the alternation, would be
matched, but C<r> produces the longest string in the first quantifier.

    $x =~ /(m{1,2})(.*)$/;  # matches,
                            # $1 = 'mm'
                            # $2 = 'ing republic of Perl'

Here, the earliest possible match is at the first C<'m'> in
C<programming>.  C<m{1,2}> is the first quantifier, so it gets to match
a maximal C<mm>.

    $x =~ /.*(m{1,2})(.*)$/;  # matches,
                              # $1 = 'm'
                              # $2 = 'ing republic of Perl'

Here, the regexp matches at the start of the string.  The first
quantifier C<.*> grabs as much as possible, leaving just a single
C<'m'> for the second quantifier C<m{1,2}>.

    $x =~ /(.?)(m{1,2})(.*)$/;  # matches,
                                # $1 = 'a'
                                # $2 = 'mm'
                                # $3 = 'ing republic of Perl'

Here, C<.?> eats its maximal one character at the earliest possible
position in the string, C<'a'> in C<programming>, leaving C<m{1,2}>
the opportunity to match both C<m>'s.  Finally,

    "aXXXb" =~ /(X*)/; # matches with $1 = ''

because it can match zero copies of C<'X'> at the beginning of the
string.  If you definitely want to match at least one C<'X'>, use
C<X+>, not C<X*>.

Sometimes greed is not good.
At times, we would like quantifiers to
match a I<minimal> piece of string, rather than a maximal piece.  For
this purpose, Larry Wall created the I<minimal match> or
I<non-greedy> quantifiers C<??>, C<*?>, C<+?>, and C<{}?>.  These are
the usual quantifiers with a C<?> appended to them.  They have the
following meanings:

=over 4

=item *

C<a??> means: match 'a' 0 or 1 times.  Try 0 first, then 1.

=item *

C<a*?> means: match 'a' 0 or more times, i.e., any number of times,
but as few times as possible.

=item *

C<a+?> means: match 'a' 1 or more times, i.e., at least once, but
as few times as possible.

=item *

C<a{n,m}?> means: match at least C<n> times, not more than C<m>
times, as few times as possible.

=item *

C<a{n,}?> means: match at least C<n> times, but as few times as
possible.

=item *

C<a{n}?> means: match exactly C<n> times.  Because we match exactly
C<n> times, C<a{n}?> is equivalent to C<a{n}> and is just there for
notational consistency.

=back

Let's look at the example above, but with minimal quantifiers:

    $x = "The programming republic of Perl";
    $x =~ /^(.+?)(e|r)(.*)$/; # matches,
                              # $1 = 'Th'
                              # $2 = 'e'
                              # $3 = ' programming republic of Perl'

The minimal string that will allow both the start of the string C<^>
and the alternation to match is C<Th>, with the alternation C<e|r>
matching C<e>.  The second quantifier C<.*> is free to gobble up the
rest of the string.

    $x =~ /(m{1,2}?)(.*?)$/;  # matches,
                              # $1 = 'm'
                              # $2 = 'ming republic of Perl'

The first string position that this regexp can match is at the first
C<'m'> in C<programming>.  At this position, the minimal C<m{1,2}?>
matches just one C<'m'>.  Although the second quantifier C<.*?> would
prefer to match no characters, it is constrained by the end-of-string
anchor C<$> to match the rest of the string.

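The greedy and minimal forms of the first example can be compared side
by side with a short script.  This is only a sketch: the helper
C<captures> is a hypothetical name introduced here for illustration, not
something defined by Perl or by this tutorial, and it assumes the pattern
contains at least one capture group.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for this sketch): match $string
# against $regexp and return the captured groups, or an empty list
# if the match fails.  Assumes $regexp has at least one capture group.
sub captures {
    my ($string, $regexp) = @_;
    my @groups = $string =~ $regexp;
    return @groups;
}

my $x = "The programming republic of Perl";

# Greedy: the leftmost quantifier .+ takes as much as it can.
my @greedy  = captures($x, qr/^(.+)(e|r)(.*)$/);

# Minimal: the leftmost quantifier .+? takes as little as it can.
my @minimal = captures($x, qr/^(.+?)(e|r)(.*)$/);

print "greedy:  [", join("] [", @greedy),  "]\n";
print "minimal: [", join("] [", @minimal), "]\n";
```

The greedy line shows C<The programming republic of Pe>, C<r> and C<l>;
the minimal line shows C<Th>, C<e> and C< programming republic of Perl>,
the same values given in the comments of the two examples.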

    $x =~ /(.*?)(m{1,2}?)(.*)$/;  # matches,
                                  # $1 = 'The progra'
                                  # $2 = 'm'
                                  # $3 = 'ming republic of Perl'

In this regexp, you might expect the first minimal quantifier C<.*?>
to match the empty string, because it is not constrained by a C<^>
anchor to match the beginning of the word.  Principle 0 applies here,
however.  Because it is possible for the whole regexp to match at the
start of the string, it I<will> match at the start of the string.  Thus
the first quantifier has to match everything up to the first C<m>.  The
second minimal quantifier matches just one C<m> and the third
quantifier matches the rest of the string.

    $x =~ /(.??)(m{1,2})(.*)$/;  # matches,
                                 # $1 = 'a'
                                 # $2 = 'mm'
                                 # $3 = 'ing republic of Perl'

Just as in the previous regexp, the first quantifier C<.??> can match
earliest at position C<'a'>, so it does.  The second quantifier is
greedy, so it matches C<mm>, and the third matches the rest of the
string.

We can modify principle 3 above to take into account non-greedy
quantifiers:

=over 4

=item *

Principle 3: If there are two or more elements in a regexp, the
leftmost greedy (non-greedy) quantifier, if any, will match as much
(little) of the string as possible while still allowing the whole
regexp to match.  The next leftmost greedy (non-greedy) quantifier, if
any, will try to match as much (little) of the string remaining
available to it as possible, while still allowing the whole regexp to
match.  And so on, until all the regexp elements are satisfied.

=back

Just like alternation, quantifiers are also susceptible to
backtracking.
Here is a step-by-step analysis of the example

    $x = "the cat in the hat";
    $x =~ /^(.*)(at)(.*)$/; # matches,
                            # $1 = 'the cat in the h'
                            # $2 = 'at'
                            # $3 = ''   (0 matches)

=over 4

=item 0

Start with the first letter in the string 't'.

=item 1

The first quantifier '.*' starts out by matching the whole
string 'the cat in the hat'.

=item 2

'a' in the regexp element 'at' doesn't match the end of the
string.  Backtrack one character.

=item 3

'a' in the regexp element 'at' still doesn't match the last
letter of the string 't', so backtrack one more character.

=item 4

Now we can match the 'a' and the 't'.

=item 5

Move on to the third element '.*'.  Since we are at the end of
the string and '.*' can match 0 times, assign it the empty string.

=item 6

We are done!

=back

Most of the time, all this moving forward and backtracking happens
quickly and searching is fast.  There are some pathological regexps,
however, whose execution time grows exponentially with the size of the
string.  A typical structure that blows up in your face is of the form

    /(a|b+)*/;

The problem is the nested indeterminate quantifiers.  There are many
different ways of partitioning a string of length n between the C<+>
and C<*>: one repetition with C<b+> of length n, two repetitions with
the first C<b+> length k and the second with length n-k, m repetitions
whose bits add up to length n, etc.  In fact the number of ways to
partition a string grows exponentially with its length.  A
regexp may get lucky and match early in the process, but if there is
no match, Perl will try I<every> possibility before giving up.  So be
careful with nested C<*>'s, C<{n,m}>'s, and C<+>'s.  The book
I<Mastering Regular Expressions> by Jeffrey Friedl gives a wonderful
discussion of this and other efficiency issues.

=head2 Possessive quantifiers

Backtracking during the relentless search for a match may be a waste
of time, particularly when the match is bound to fail.
Consider
the simple pattern

    /^\w+\s+\w+$/; # a word, spaces, a word

Whenever this is applied to a string which doesn't quite meet the
pattern's expectations such as S<C<"abc  ">> or S<C<"abc  def ">>,
the regex engine will backtrack, approximately once for each character
in the string.  But we know that there is no way around taking I<all>
of the initial word characters to match the first repetition, that I<all>
spaces must be eaten by the middle part, and the same goes for the second
word.

With the introduction of the I<possessive quantifiers> in Perl 5.10, we
have a way of instructing the regex engine not to backtrack: append a
C<+> to the usual quantifiers.  This makes them greedy as
well as stingy; once they succeed they won't give anything back to permit
another solution.  They have the following meanings:

=over 4

=item *

C<a{n,m}+> means: match at least C<n> times, not more than C<m> times,
as many times as possible, and don't give anything up.  C<a?+> is short
for C<a{0,1}+>.

=item *

C<a{n,}+> means: match at least C<n> times, but as many times as possible,
and don't give anything up.  C<a*+> is short for C<a{0,}+> and C<a++> is
short for C<a{1,}+>.

=item *

C<a{n}+> means: match exactly C<n> times.  It is just there for
notational consistency.

=back

These possessive quantifiers represent a special case of a more general
concept, the I<independent subexpression>, see below.

As an example where a possessive quantifier is suitable we consider
matching a quoted string, as it appears in several programming languages.
The backslash is used as an escape character that indicates that the
next character is to be taken literally, as another character for the
string.  Therefore, after the opening quote, we expect a (possibly
empty) sequence of alternatives: either some character except an
unescaped quote or backslash or an escaped character.


    /"(?:[^"\\]++|\\.)*+"/;

=head2 Building a regexp

At this point, we have all the basic regexp concepts covered, so let's
give a more involved example of a regular expression.  We will build a
regexp that matches numbers.

The first task in building a regexp is to decide what we want to match
and what we want to exclude.  In our case, we want to match both
integers and floating point numbers and we want to reject any string
that isn't a number.

The next task is to break the problem down into smaller problems that
are easily converted into a regexp.

The simplest case is integers.  These consist of a sequence of digits,
with an optional sign in front.  The digits we can represent with
