
📄 perlretut.pod

    $x = "The programming republic of Perl";
    $x =~ /^(.+?)(e|r)(.*)$/; # matches,
                              # $1 = 'Th'
                              # $2 = 'e'
                              # $3 = ' programming republic of Perl'

The minimal string that will allow both the start of the string C<^>
and the alternation to match is C<Th>, with the alternation C<e|r>
matching C<e>.  The second quantifier C<.*> is free to gobble up the
rest of the string.

    $x =~ /(m{1,2}?)(.*?)$/;  # matches,
                              # $1 = 'm'
                              # $2 = 'ming republic of Perl'

The first string position that this regexp can match is at the first
C<'m'> in C<programming>.  At this position, the minimal C<m{1,2}?>
matches just one C<'m'>.  Although the second quantifier C<.*?> would
prefer to match no characters, it is constrained by the end-of-string
anchor C<$> to match the rest of the string.

    $x =~ /(.*?)(m{1,2}?)(.*)$/;  # matches,
                                  # $1 = 'The progra'
                                  # $2 = 'm'
                                  # $3 = 'ming republic of Perl'

In this regexp, you might expect the first minimal quantifier C<.*?>
to match the empty string, because it is not constrained by a C<^>
anchor to match the beginning of the string.  Principle 0 applies here,
however.  Because it is possible for the whole regexp to match at the
start of the string, it I<will> match at the start of the string.  Thus
the first quantifier has to match everything up to the first C<m>.  The
second minimal quantifier matches just one C<m> and the third
quantifier matches the rest of the string.

    $x =~ /(.??)(m{1,2})(.*)$/;  # matches,
                                 # $1 = 'a'
                                 # $2 = 'mm'
                                 # $3 = 'ing republic of Perl'

Just as in the previous regexp, the first quantifier C<.??> can match
earliest at position C<'a'>, so it does.
The second quantifier is
greedy, so it matches C<mm>, and the third matches the rest of the
string.

We can modify principle 3 above to take into account non-greedy
quantifiers:

=over 4

=item *

Principle 3: If there are two or more elements in a regexp, the
leftmost greedy (non-greedy) quantifier, if any, will match as much
(little) of the string as possible while still allowing the whole
regexp to match.  The next leftmost greedy (non-greedy) quantifier, if
any, will try to match as much (little) of the string remaining
available to it as possible, while still allowing the whole regexp to
match.  And so on, until all the regexp elements are satisfied.

=back

Just like alternation, quantifiers are also susceptible to
backtracking.  Here is a step-by-step analysis of the example

    $x = "the cat in the hat";
    $x =~ /^(.*)(at)(.*)$/; # matches,
                            # $1 = 'the cat in the h'
                            # $2 = 'at'
                            # $3 = ''   (0 characters match)

=over 4

=item 0

Start with the first letter in the string 't'.

=item 1

The first quantifier '.*' starts out by matching the whole
string 'the cat in the hat'.

=item 2

'a' in the regexp element 'at' doesn't match the end of the
string.  Backtrack one character.

=item 3

'a' in the regexp element 'at' still doesn't match the last
letter of the string 't', so backtrack one more character.

=item 4

Now we can match the 'a' and the 't'.

=item 5

Move on to the third element '.*'.  Since we are at the end of
the string and '.*' can match 0 times, assign it the empty string.

=item 6

We are done!

=back

Most of the time, all this moving forward and backtracking happens
quickly and searching is fast.  There are some pathological regexps,
however, whose execution time grows exponentially with the size of the
string.  A typical structure that blows up in your face is of the form

    /(a|b+)*/;

The problem is the nested indeterminate quantifiers.
There are many
different ways of partitioning a string of length n between the C<+>
and C<*>: one repetition with C<b+> of length n, two repetitions with
the first C<b+> of length k and the second of length n-k, m repetitions
whose bits add up to length n, etc.  In fact there are an exponential
number of ways to partition a string as a function of its length.  A
regexp may get lucky and match early in the process, but if there is
no match, perl will try I<every> possibility before giving up.  So be
careful with nested C<*>'s, C<{n,m}>'s, and C<+>'s.  The book
I<Mastering Regular Expressions> by Jeffrey Friedl gives a wonderful
discussion of this and other efficiency issues.

=head2 Building a regexp

At this point, we have all the basic regexp concepts covered, so let's
give a more involved example of a regular expression.  We will build a
regexp that matches numbers.

The first task in building a regexp is to decide what we want to match
and what we want to exclude.  In our case, we want to match both
integers and floating point numbers and we want to reject any string
that isn't a number.

The next task is to break the problem down into smaller problems that
are easily converted into a regexp.

The simplest case is integers.  These consist of a sequence of digits,
with an optional sign in front.  The digits we can represent with
C<\d+> and the sign can be matched with C<[+-]>.  Thus the integer
regexp is

    /[+-]?\d+/;  # matches integers

A floating point number potentially has a sign, an integral part, a
decimal point, a fractional part, and an exponent.  One or more of these
parts is optional, so we need to check out the different
possibilities.  Floating point numbers which are in proper form include
123., 0.345, .34, -1e6, and 25.4E-72.  As with integers, the sign out
front is completely optional and can be matched by C<[+-]?>.  We can
see that if there is no exponent, floating point numbers must have a
decimal point, otherwise they are integers.
We might be tempted to
model these with C<\d*\.\d*>, but this would also match just a single
decimal point, which is not a number.  So the three cases of floating
point number sans exponent are

   /[+-]?\d+\./;  # 1., 321., etc.
   /[+-]?\.\d+/;  # .1, .234, etc.
   /[+-]?\d+\.\d+/;  # 1.0, 30.56, etc.

These can be combined into a single regexp with a three-way alternation:

   /[+-]?(\d+\.\d+|\d+\.|\.\d+)/;  # floating point, no exponent

In this alternation, it is important to put C<'\d+\.\d+'> before
C<'\d+\.'>.  If C<'\d+\.'> were first, the regexp would happily match that
and ignore the fractional part of the number.

Now consider floating point numbers with exponents.  The key
observation here is that I<both> integers and numbers with decimal
points are allowed in front of an exponent.  Then exponents, like the
overall sign, are independent of whether we are matching numbers with
or without decimal points, and can be 'decoupled' from the
mantissa.  The overall form of the regexp now becomes clear:

    /^(optional sign)(integer | f.p. mantissa)(optional exponent)$/;

The exponent is an C<e> or C<E>, followed by an integer.  So the
exponent regexp is

   /[eE][+-]?\d+/;  # exponent

Putting all the parts together, we get a regexp that matches numbers:

   /^[+-]?(\d+\.\d+|\d+\.|\.\d+|\d+)([eE][+-]?\d+)?$/;  # Ta da!

Long regexps like this may impress your friends, but can be hard to
decipher.  In complex situations like this, the C<//x> modifier for a
match is invaluable.  It allows one to put nearly arbitrary whitespace
and comments into a regexp without affecting its meaning.  Using it,
we can rewrite our 'extended' regexp in the more pleasing form

   /^
      [+-]?         # first, match an optional sign
      (             # then match integers or f.p. mantissas:
          \d+\.\d+  # mantissa of the form a.b
         |\d+\.     # mantissa of the form a.
         |\.\d+     # mantissa of the form .b
         |\d+       # integer of the form a
      )
      ([eE][+-]?\d+)?
                       # finally, optionally match an exponent
   $/x;

If whitespace is mostly irrelevant, how does one include space
characters in an extended regexp?  The answer is to backslash it
S<C<'\ '> > or put it in a character class S<C<[ ]> >.  The same thing
goes for pound signs: use C<\#> or C<[#]>.  For instance, Perl allows
a space between the sign and the mantissa/integer, and we could add
this to our regexp as follows:

   /^
      [+-]?\ *      # first, match an optional sign *and space*
      (             # then match integers or f.p. mantissas:
          \d+\.\d+  # mantissa of the form a.b
         |\d+\.     # mantissa of the form a.
         |\.\d+     # mantissa of the form .b
         |\d+       # integer of the form a
      )
      ([eE][+-]?\d+)?  # finally, optionally match an exponent
   $/x;

In this form, it is easier to see a way to simplify the
alternation.  Alternatives 1, 2, and 4 all start with C<\d+>, so it
could be factored out:

   /^
      [+-]?\ *      # first, match an optional sign
      (             # then match integers or f.p. mantissas:
          \d+       # start out with a ...
          (
              \.\d* # mantissa of the form a.b or a.
          )?        # ? takes care of integers of the form a
         |\.\d+     # mantissa of the form .b
      )
      ([eE][+-]?\d+)?  # finally, optionally match an exponent
   $/x;

or written in the compact form,

    /^[+-]?\ *(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?$/;

This is our final regexp.  To recap, we built a regexp by

=over 4

=item *

specifying the task in detail,

=item *

breaking down the problem into smaller parts,

=item *

translating the small parts into regexps,

=item *

combining the regexps,

=item *

and optimizing the final combined regexp.

=back

These are also the typical steps involved in writing a computer
program.
This makes perfect sense, because regular expressions are
essentially programs written in a little computer language that specifies
patterns.

=head2 Using regular expressions in Perl

The last topic of Part 1 briefly covers how regexps are used in Perl
programs.  Where do they fit into Perl syntax?

We have already introduced the matching operator in its default
C</regexp/> and arbitrary delimiter C<m!regexp!> forms.  We have used
the binding operator C<=~> and its negation C<!~> to test for string
matches.  Associated with the matching operator, we have discussed the
single line C<//s>, multi-line C<//m>, case-insensitive C<//i> and
extended C<//x> modifiers.

There are a few more things you might want to know about matching
operators.  First, we pointed out earlier that variables in regexps are
substituted before the regexp is evaluated:

    $pattern = 'Seuss';
    while (<>) {
        print if /$pattern/;
    }

This will print any lines containing the word C<Seuss>.  It is not as
efficient as it could be, however, because perl has to re-evaluate
C<$pattern> each time through the loop.  If C<$pattern> won't be
changing over the lifetime of the script, we can add the C<//o>
modifier, which directs perl to only perform variable substitutions
once:

    #!/usr/bin/perl
    #    Improved simple_grep
    $regexp = shift;
    while (<>) {
        print if /$regexp/o;  # a good deal faster
    }

If you change the variable after the first substitution happens, perl
will ignore the change.  If you don't want any substitutions at all, use the
special delimiter C<m''>:

    $pattern = 'Seuss';
    while (<>) {
        print if m'$pattern';  # matches '$pattern', not 'Seuss'
    }

C<m''> acts like single quotes on a regexp; all other C<m> delimiters
act like double quotes.  If the regexp evaluates to the empty string,
the regexp in the I<last successful match> is used instead.
So we have

    "dog" =~ /d/;  # 'd' matches
    "dogbert" =~ //;  # this matches the 'd' regexp used before

The final two modifiers C<//g> and C<//c> concern multiple matches.
The modifier C<//g> stands for global matching and allows the
matching operator to match within a string as many times as possible.
In scalar context, successive invocations against a string will have
C<//g> jump from match to match, keeping track of position in the
string as it goes along.  You can get or set the position with the
C<pos()> function.

The use of C<//g> is shown in the following example.  Suppose we have
a string that consists of words separated by spaces.  If we know how
