📄 perlretut.pod

The special character C<'-'> acts as a range operator within character
classes, so that a contiguous set of characters can be written as a
range.  With ranges, the unwieldy C<[0123456789]> and C<[abc...xyz]>
become the svelte C<[0-9]> and C<[a-z]>.  Some examples are

    /item[0-9]/;  # matches 'item0' or ... or 'item9'
    /[0-9bx-z]aa/;  # matches '0aa', ..., '9aa',
                    # 'baa', 'xaa', 'yaa', or 'zaa'
    /[0-9a-fA-F]/;  # matches a hexadecimal digit
    /[0-9a-zA-Z_]/; # matches a "word" character,
                    # like those in a perl variable name

If C<'-'> is the first or last character in a character class, it is
treated as an ordinary character; C<[-ab]>, C<[ab-]> and C<[a\-b]> are
all equivalent.

The special character C<^> in the first position of a character class
denotes a B<negated character class>, which matches any character but
those in the brackets.  Both C<[...]> and C<[^...]> must match a
character, or the match fails.  Then

    /[^a]at/;  # doesn't match 'aat' or 'at', but matches
               # all other 'bat', 'cat', '0at', '%at', etc.
    /[^0-9]/;  # matches a non-numeric character
    /[a^]at/;  # matches 'aat' or '^at'; here '^' is ordinary

Now, even C<[0-9]> can be a bother to write multiple times, so in the
interest of saving keystrokes and making regexps more readable, Perl
has several abbreviations for common character classes:

=over 4

=item *

\d is a digit and represents [0-9]

=item *

\s is a whitespace character and represents [\ \t\r\n\f]

=item *

\w is a word character (alphanumeric or _) and represents [0-9a-zA-Z_]

=item *

\D is a negated \d; it represents any character but a digit [^0-9]

=item *

\S is a negated \s; it represents any non-whitespace character [^\s]

=item *

\W is a negated \w; it represents any non-word character [^\w]

=item *

The period '.' matches any character but "\n"

=back

The C<\d\s\w\D\S\W> abbreviations can be used both inside and outside
of character classes.
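The class rules described so far can be checked with a short script.  This
is only a minimal sketch: the helper C<class_matches> and the test strings
are hypothetical and were made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the tutorial): 1 if $str matches $re, else 0.
sub class_matches {
    my ($str, $re) = @_;
    return $str =~ $re ? 1 : 0;
}

# Each row: test string, regexp, short note.  The strings are made up.
my @rows = (
    [ 'item7', qr/item[0-9]/,   'range inside a class'                     ],
    [ 'itemX', qr/item[0-9]/,   'X is outside the range 0-9'               ],
    [ '-',     qr/[-ab]/,       'a leading - is an ordinary character'     ],
    [ 'at',    qr/[^a]at/,      'a negated class still needs a character'  ],
    [ 'bat',   qr/[^a]at/,      'b is any character but a'                 ],
    [ '4 p',   qr/\d[\d\s]\D/,  'abbreviations inside and outside a class' ],
);

for my $row (@rows) {
    my ($str, $re, $note) = @$row;
    printf "%-8s %-4s %s\n", "'$str'", class_matches($str, $re) ? 'yes' : 'no', $note;
}
```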
Here are some of these abbreviations in use:

    /\d\d:\d\d:\d\d/; # matches a hh:mm:ss time format
    /[\d\s]/;         # matches any digit or whitespace character
    /\w\W\w/;         # matches a word char, followed by a
                      # non-word char, followed by a word char
    /..rt/;           # matches any two chars, followed by 'rt'
    /end\./;          # matches 'end.'
    /end[.]/;         # same thing, matches 'end.'

Because a period is a metacharacter, it needs to be escaped to match
as an ordinary period. Because, for example, C<\d> and C<\w> are sets
of characters, it is incorrect to think of C<[^\d\w]> as C<[\D\W]>; in
fact C<[^\d\w]> is the same as C<[^\w]>, which is the same as
C<[\W]>. Think De Morgan's laws.

An anchor useful in basic regexps is the S<B<word anchor> >C<\b>.
This matches a boundary between a word character and a non-word
character C<\w\W> or C<\W\w>:

    $x = "Housecat catenates house and cat";
    $x =~ /cat/;    # matches cat in 'Housecat'
    $x =~ /\bcat/;  # matches cat in 'catenates'
    $x =~ /cat\b/;  # matches cat in 'Housecat'
    $x =~ /\bcat\b/;  # matches 'cat' at end of string

Note in the last example, the end of the string is considered a word
boundary.

You might wonder why C<'.'> matches everything but C<"\n"> - why not
every character? The reason is that often one is matching against
lines and would like to ignore the newline characters.  For instance,
while the string C<"\n"> represents one line, we would like to think
of it as empty.  Then

    ""   =~ /^$/;    # matches
    "\n" =~ /^$/;    # matches, "\n" is ignored

    ""   =~ /./;      # doesn't match; it needs a char
    ""   =~ /^.$/;    # doesn't match; it needs a char
    "\n" =~ /^.$/;    # doesn't match; it needs a char other than "\n"
    "a"  =~ /^.$/;    # matches
    "a\n"  =~ /^.$/;  # matches, ignores the "\n"

This behavior is convenient, because we usually want to ignore
newlines when we count and match characters in a line.  Sometimes,
however, we want to keep track of newlines.
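The default treatment of a trailing newline can be tried out with a short
script.  This is a minimal sketch; the helper C<is_one_char_line> is
hypothetical and simply wraps the C</^.$/> regexp from the examples above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the tutorial): 1 if $line consists of exactly
# one character other than "\n", optionally followed by a final newline.
sub is_one_char_line {
    my ($line) = @_;
    return $line =~ /^.$/ ? 1 : 0;
}

# Labels use single quotes so that the \n stays visible when printed.
my @cases = (
    [ '""',     ""     ],
    [ '"\n"',   "\n"   ],
    [ '"a"',    "a"    ],
    [ '"a\n"',  "a\n"  ],
    [ '"ab\n"', "ab\n" ],
);

for my $case (@cases) {
    my ($label, $line) = @$case;
    printf "%-7s %s\n", $label, is_one_char_line($line) ? 'matches' : "doesn't match";
}
```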
We might even want C<^>
and C<$> to anchor at the beginning and end of lines within the
string, rather than just the beginning and end of the string.  Perl
allows us to choose between ignoring and paying attention to newlines
by using the C<//s> and C<//m> modifiers.  C<//s> and C<//m> stand for
single line and multi-line and they determine whether a string is to
be treated as one continuous string, or as a set of lines.  The two
modifiers affect two aspects of how the regexp is interpreted: 1) how
the C<'.'> character class is defined, and 2) where the anchors C<^>
and C<$> are able to match.  Here are the four possible combinations:

=over 4

=item *

no modifiers (//): Default behavior.  C<'.'> matches any character
except C<"\n">.  C<^> matches only at the beginning of the string and
C<$> matches only at the end or before a newline at the end.

=item *

s modifier (//s): Treat string as a single long line.  C<'.'> matches
any character, even C<"\n">.  C<^> matches only at the beginning of
the string and C<$> matches only at the end or before a newline at the
end.

=item *

m modifier (//m): Treat string as a set of multiple lines.  C<'.'>
matches any character except C<"\n">.  C<^> and C<$> are able to match
at the start or end of I<any> line within the string.

=item *

both s and m modifiers (//sm): Treat string as a single long line, but
detect multiple lines.  C<'.'> matches any character, even
C<"\n">.  C<^> and C<$>, however, are able to match at the start or end
of I<any> line within the string.

=back

Here are examples of C<//s> and C<//m> in action:

    $x = "There once was a girl\nWho programmed in Perl\n";

    $x =~ /^Who/;   # doesn't match, "Who" not at start of string
    $x =~ /^Who/s;  # doesn't match, "Who" not at start of string
    $x =~ /^Who/m;  # matches, "Who" at start of second line
    $x =~ /^Who/sm; # matches, "Who" at start of second line

    $x =~ /girl.Who/;   # doesn't match, "." doesn't match "\n"
    $x =~ /girl.Who/s;  # matches, "." matches "\n"
    $x =~ /girl.Who/m;  # doesn't match, "." doesn't match "\n"
    $x =~ /girl.Who/sm; # matches, "." matches "\n"

Most of the time, the default behavior is what is wanted, but C<//s> and
C<//m> are occasionally very useful.  If C<//m> is being used, the start
of the string can still be matched with C<\A> and the end of string
can still be matched with the anchors C<\Z> (matches both the end and
the newline before, like C<$>), and C<\z> (matches only the end):

    $x =~ /^Who/m;   # matches, "Who" at start of second line
    $x =~ /\AWho/m;  # doesn't match, "Who" is not at start of string

    $x =~ /girl$/m;  # matches, "girl" at end of first line
    $x =~ /girl\Z/m; # doesn't match, "girl" is not at end of string

    $x =~ /Perl\Z/m; # matches, "Perl" is at newline before end
    $x =~ /Perl\z/m; # doesn't match, "Perl" is not at end of string

We now know how to create choices among classes of characters in a
regexp.  What about choices among words or character strings? Such
choices are described in the next section.

=head2 Matching this or that

Sometimes we would like our regexp to be able to match different
possible words or character strings.  This is accomplished by using
the B<alternation> metacharacter C<|>.  To match C<dog> or C<cat>, we
form the regexp C<dog|cat>.  As before, perl will try to match the
regexp at the earliest possible point in the string.  At each
character position, perl will first try to match the first
alternative, C<dog>.  If C<dog> doesn't match, perl will then try the
next alternative, C<cat>.  If C<cat> doesn't match either, then the
match fails and perl moves to the next position in the string.  Some
examples:

    "cats and dogs" =~ /cat|dog|bird/;  # matches "cat"
    "cats and dogs" =~ /dog|cat|bird/;  # matches "cat"

Even though C<dog> is the first alternative in the second regexp,
C<cat> is able to match earlier in the string.

    "cats"          =~ /c|ca|cat|cats/; # matches "c"
    "cats"          =~ /cats|cat|ca|c/; # matches "cats"

Here, all the alternatives match at the first string position, so the
first alternative is the one that matches.  If some of the
alternatives are truncations of the others, put the longest ones first
to give them a chance to match.

    "cab" =~ /a|b|c/; # matches "c"
                      # /a|b|c/ == /[abc]/

The last example points out that character classes are like
alternations of characters.  At a given character position, the first
alternative that allows the regexp match to succeed will be the one
that matches.

=head2 Grouping things and hierarchical matching

Alternation allows a regexp to choose among alternatives, but by
itself it is unsatisfying.  The reason is that each alternative is a whole
regexp, but sometimes we want alternatives for just part of a
regexp.  For instance, suppose we want to search for housecats or
housekeepers.  The regexp C<housecat|housekeeper> fits the bill, but is
inefficient because we had to type C<house> twice.  It would be nice to
have parts of the regexp be constant, like C<house>, and some
parts have alternatives, like C<cat|keeper>.

The B<grouping> metacharacters C<()> solve this problem.  Grouping
allows parts of a regexp to be treated as a single unit.  Parts of a
regexp are grouped by enclosing them in parentheses.  Thus we could solve
the C<housecat|housekeeper> problem by forming the regexp as
C<house(cat|keeper)>.  The regexp C<house(cat|keeper)> means match
C<house> followed by either C<cat> or C<keeper>.  Some more examples
are

    /(a|b)b/;    # matches 'ab' or 'bb'
    /(ac|b)b/;   # matches 'acb' or 'bb'
    /(^a|b)c/;   # matches 'ac' at start of string or 'bc' anywhere
    /(a|[bc])d/; # matches 'ad', 'bd', or 'cd'

    /house(cat|)/;  # matches either 'housecat' or 'house'
    /house(cat(s|)|)/;  # matches either 'housecats' or 'housecat' or
                        # 'house'.  Note groups can be nested.

    /(19|20|)\d\d/;  # match years 19xx, 20xx, or the Y2K problem, xx
    "20" =~ /(19|20|)\d\d/;  # matches the null alternative '()\d\d',
                             # because '20\d\d' can't match

Alternations behave the same way in groups as out of them: at a given
string position, the leftmost alternative that allows the regexp to
match is taken.  So in the last example at the first string position,
C<"20"> matches the second alternative, but there is nothing left over
to match the next two digits C<\d\d>.  So perl moves on to the next
alternative, which is the null alternative and that works, since
C<"20"> is two digits.

The process of trying one alternative, seeing if it matches, and
moving on to the next alternative if it doesn't, is called
B<backtracking>.  The term 'backtracking' comes from the idea that
matching a regexp is like a walk in the woods.  Successfully matching
a regexp is like arriving at a destination.  There are many possible
trailheads, one for each string position, and each one is tried in
order, left to right.  From each trailhead there may be many paths,
some of which get you there, and some of which are dead ends.  When you
walk along a trail and hit a dead end, you have to backtrack along the
trail to an earlier point to try another trail.  If you hit your
destination, you stop immediately and forget about trying all the
other trails.  You are persistent, and only if you have tried all the
trails from all the trailheads and not arrived at your destination, do
you declare failure.  To be concrete, here is a step-by-step analysis
of what perl does when it tries to match the regexp

    "abcde" =~ /(abd|abc)(df|d|de)/;

=over 4

=item 0

Start with the first letter in the string 'a'.

=item 1

Try the first alternative in the first group 'abd'.

=item 2

Match 'a' followed by 'b'. So far so good.

=item 3

'd' in the regexp doesn't match 'c' in the string - a dead
end.  So backtrack two characters and pick the second alternative in
the first group 'abc'.

=item 4

Match 'a' followed by 'b' followed by 'c'.  We are on a roll
and have satisfied the first group. Set $1 to 'abc'.

=item 5

Move on to the second group and pick the first alternative
'df'.

=item 6

Match the 'd'.

=item 7

'f' in the regexp doesn't match 'e' in the string, so a dead
end.  Backtrack one character and pick the second alternative in the
second group 'd'.

=item 8

'd' matches. The second grouping is satisfied, so set $2 to
'd'.

=item 9

We are at the end of the regexp, so we are done! We have
matched 'abcd' out of the string "abcde".

=back

There are a couple of things to note about this analysis.  First, the
third alternative in the second group 'de' also allows a match, but we
stopped before we got to it - at a given character position, leftmost
wins.  Second, we were able to get a match at the first character
position of the string 'a'.  If there were no matches at the first
position, perl would move to the second character position 'b' and
attempt the match all over again.  Only when all possible paths at all
