=head1 NAME

perlretut - Perl regular expressions tutorial

=head1 DESCRIPTION

This page provides a basic tutorial on understanding, creating and
using regular expressions in Perl.  It serves as a complement to the
reference page on regular expressions L<perlre>.  Regular expressions
are an integral part of the C<m//>, C<s///>, C<qr//> and C<split>
operators and so this tutorial also overlaps with
L<perlop/"Regexp Quote-Like Operators"> and L<perlfunc/split>.

Perl is widely renowned for excellence in text processing, and regular
expressions are one of the big factors behind this fame.  Perl regular
expressions display an efficiency and flexibility unknown in most
other computer languages.  Mastering even the basics of regular
expressions will allow you to manipulate text with surprising ease.

What is a regular expression?  A regular expression is simply a string
that describes a pattern.  Patterns are in common use these days;
examples are the patterns typed into a search engine to find web pages
and the patterns used to list files in a directory, e.g., C<ls *.txt>
or C<dir *.*>.  In Perl, the patterns described by regular expressions
are used to search strings, extract desired parts of strings, and to
do search and replace operations.

Regular expressions have the undeserved reputation of being abstract
and difficult to understand.  Regular expressions are constructed using
simple concepts like conditionals and loops and are no more difficult
to understand than the corresponding C<if> conditionals and C<while>
loops in the Perl language itself.  In fact, the main challenge in
learning regular expressions is just getting used to the terse
notation used to express these concepts.

This tutorial flattens the learning curve by discussing regular
expression concepts, along with their notation, one at a time and with
many examples.  The first part of the tutorial will progress from the
simplest word searches to the basic regular expression concepts.
If you master the first part, you will have all the tools needed to
meet about 98% of your needs.  The second part of the tutorial is for
those comfortable with the basics and hungry for more power tools.  It
discusses the more advanced regular expression operators and
introduces the latest cutting-edge innovations in 5.6.0.

A note: to save time, 'regular expression' is often abbreviated as
regexp or regex.  Regexp is a more natural abbreviation than regex, but
is harder to pronounce.  The Perl pod documentation is evenly split on
regexp vs regex; in Perl, there is more than one way to abbreviate it.
We'll use regexp in this tutorial.

=head1 Part 1: The basics

=head2 Simple word matching

The simplest regexp is simply a word, or more generally, a string of
characters.  A regexp consisting of a word matches any string that
contains that word:

    "Hello World" =~ /World/;  # matches

What is this perl statement all about? C<"Hello World"> is a simple
double-quoted string.  C<World> is the regular expression and the
C<//> enclosing C</World/> tells perl to search a string for a match.
The operator C<=~> associates the string with the regexp match and
produces a true value if the regexp matched, or false if the regexp
did not match.  In our case, C<World> matches the second word in
C<"Hello World">, so the expression is true.  Expressions like this
are useful in conditionals:

    if ("Hello World" =~ /World/) {
        print "It matches\n";
    }
    else {
        print "It doesn't match\n";
    }

There are useful variations on this theme.
The sense of the match can
be reversed by using the C<!~> operator:

    if ("Hello World" !~ /World/) {
        print "It doesn't match\n";
    }
    else {
        print "It matches\n";
    }

The literal string in the regexp can be replaced by a variable:

    $greeting = "World";
    if ("Hello World" =~ /$greeting/) {
        print "It matches\n";
    }
    else {
        print "It doesn't match\n";
    }

If you're matching against the special default variable C<$_>, the
C<$_ =~> part can be omitted:

    $_ = "Hello World";
    if (/World/) {
        print "It matches\n";
    }
    else {
        print "It doesn't match\n";
    }

And finally, the C<//> default delimiters for a match can be changed
to arbitrary delimiters by putting an C<'m'> out front:

    "Hello World" =~ m!World!;   # matches, delimited by '!'
    "Hello World" =~ m{World};   # matches, note the matching '{}'
    "/usr/bin/perl" =~ m"/perl"; # matches after '/usr/bin',
                                 # '/' becomes an ordinary char

C</World/>, C<m!World!>, and C<m{World}> all represent the
same thing.  When, e.g., C<""> is used as a delimiter, the forward
slash C<'/'> becomes an ordinary character and can be used in a regexp
without trouble.

Let's consider how different regexps would match C<"Hello World">:

    "Hello World" =~ /world/;  # doesn't match
    "Hello World" =~ /o W/;    # matches
    "Hello World" =~ /oW/;     # doesn't match
    "Hello World" =~ /World /; # doesn't match

The first regexp C<world> doesn't match because regexps are
case-sensitive.  The second regexp matches because the substring
S<C<'o W'> > occurs in the string S<C<"Hello World"> >.  The space
character ' ' is treated like any other character in a regexp and is
needed to match in this case.  The lack of a space character is the
reason the third regexp C<'oW'> doesn't match.  The fourth regexp
C<'World '> doesn't match because there is a space at the end of the
regexp, but not at the end of the string.
The lesson here is that
regexps must match a part of the string I<exactly> in order for the
statement to be true.

If a regexp matches in more than one place in the string, perl will
always match at the earliest possible point in the string:

    "Hello World" =~ /o/;       # matches 'o' in 'Hello'
    "That hat is red" =~ /hat/; # matches 'hat' in 'That'

With respect to character matching, there are a few more points you
need to know about.  First of all, not all characters can be used 'as
is' in a match.  Some characters, called B<metacharacters>, are reserved
for use in regexp notation.  The metacharacters are

    {}[]()^$.|*+?\

The significance of each of these will be explained
in the rest of the tutorial, but for now, it is important only to know
that a metacharacter can be matched by putting a backslash before it:

    "2+2=4" =~ /2+2/;    # doesn't match, + is a metacharacter
    "2+2=4" =~ /2\+2/;   # matches, \+ is treated like an ordinary +
    "The interval is [0,1)." =~ /[0,1)./     # is a syntax error!
    "The interval is [0,1)." =~ /\[0,1\)\./  # matches
    "/usr/bin/perl" =~ /\/usr\/bin\/perl/;   # matches

In the last regexp, the forward slash C<'/'> is also backslashed,
because it is used to delimit the regexp.  This can lead to LTS
(leaning toothpick syndrome), however, and it is often more readable
to change delimiters.

The backslash character C<'\'> is a metacharacter itself and needs to
be backslashed:

    'C:\WIN32' =~ /C:\\WIN/;   # matches

In addition to the metacharacters, there are some ASCII characters
which don't have printable character equivalents and are instead
represented by B<escape sequences>.  Common examples are C<\t> for a
tab, C<\n> for a newline, C<\r> for a carriage return and C<\a> for a
bell.  If your string is better thought of as a sequence of arbitrary
bytes, the octal escape sequence, e.g., C<\033>, or hexadecimal escape
sequence, e.g., C<\x1B> may be a more natural representation for your
bytes.
Here are some examples of escapes:

    "1000\t2000" =~ m(0\t2)   # matches
    "1000\n2000" =~ /0\n20/   # matches
    "1000\t2000" =~ /\000\t2/ # doesn't match, "0" ne "\000"
    "cat"        =~ /\143\x61\x74/ # matches, but a weird way to spell cat

If you've been around Perl a while, all this talk of escape sequences
may seem familiar.  Similar escape sequences are used in double-quoted
strings and in fact the regexps in Perl are mostly treated as
double-quoted strings.  This means that variables can be used in
regexps as well.  Just like double-quoted strings, the values of the
variables in the regexp will be substituted in before the regexp is
evaluated for matching purposes.  So we have:

    $foo = 'house';
    'housecat' =~ /$foo/;      # matches
    'cathouse' =~ /cat$foo/;   # matches
    'housecat' =~ /${foo}cat/; # matches

So far, so good.  With the knowledge above you can already perform
searches with just about any literal string regexp you can dream up.
Here is a I<very simple> emulation of the Unix grep program:

    % cat > simple_grep
    #!/usr/bin/perl
    $regexp = shift;
    while (<>) {
        print if /$regexp/;
    }
    ^D

    % chmod +x simple_grep

    % simple_grep abba /usr/dict/words
    Babbage
    cabbage
    cabbages
    sabbath
    Sabbathize
    Sabbathizes
    sabbatical
    scabbard
    scabbards

This program is easy to understand.  C<#!/usr/bin/perl> is the standard
way to invoke a perl program from the shell.
S<C<$regexp = shift;> > saves the first command line argument as the
regexp to be used, leaving the rest of the command line arguments to
be treated as files.  S<C<< while (<>) >> > loops over all the lines in
all the files.  For each line, S<C<print if /$regexp/;> > prints the
line if the regexp matches the line.  In this line, both C<print> and
C</$regexp/> use the default variable C<$_> implicitly.

With all of the regexps above, if the regexp matched anywhere in the
string, it was considered a match.
Sometimes, however, we'd like to
specify I<where> in the string the regexp should try to match.  To do
this, we would use the B<anchor> metacharacters C<^> and C<$>.  The
anchor C<^> means match at the beginning of the string and the anchor
C<$> means match at the end of the string, or before a newline at the
end of the string.  Here is how they are used:

    "housekeeper" =~ /keeper/;    # matches
    "housekeeper" =~ /^keeper/;   # doesn't match
    "housekeeper" =~ /keeper$/;   # matches
    "housekeeper\n" =~ /keeper$/; # matches

The second regexp doesn't match because C<^> constrains C<keeper> to
match only at the beginning of the string, but C<"housekeeper"> has
keeper starting in the middle.  The third regexp does match, since the
C<$> constrains C<keeper> to match only at the end of the string.

When both C<^> and C<$> are used at the same time, the regexp has to
match both the beginning and the end of the string, i.e., the regexp
matches the whole string.  Consider

    "keeper" =~ /^keep$/;      # doesn't match
    "keeper" =~ /^keeper$/;    # matches
    ""       =~ /^$/;          # ^$ matches an empty string

The first regexp doesn't match because the string has more to it than
C<keep>.  Since the second regexp is exactly the string, it
matches.  Using both C<^> and C<$> in a regexp forces the complete
string to match, so it gives you complete control over which strings
match and which don't.  Suppose you are looking for a fellow named
bert, off in a string by himself:

    "dogbert" =~ /bert/;   # matches, but not what you want

    "dilbert" =~ /^bert/;  # doesn't match, but ..
    "bertram" =~ /^bert/;  # matches, so still not good enough

    "bertram" =~ /^bert$/; # doesn't match, good
    "dilbert" =~ /^bert$/; # doesn't match, good
    "bert"    =~ /^bert$/; # matches, perfect

Of course, in the case of a literal string, one could just as easily
use the string equivalence S<C<$string eq 'bert'> > and it would be
more efficient.
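
The comparison between an anchored regexp and C<eq> can be made concrete
with a small script.  This is only a sketch for illustration: the helper
names C<is_bert_regexp> and C<is_bert_eq> and the list of test strings
are assumptions, not part of the original tutorial.

```perl
use strict;
use warnings;

# Hypothetical helpers, named here only for illustration.
sub is_bert_regexp { my ($s) = @_; return $s =~ /^bert$/ ? 1 : 0; }
sub is_bert_eq     { my ($s) = @_; return $s eq 'bert'   ? 1 : 0; }

# Each case pairs a printable label with the actual string to test.
my @cases = (
    [ 'bert',    "bert"    ],
    [ 'bert\n',  "bert\n"  ],   # string with a trailing newline
    [ 'bertram', "bertram" ],
    [ 'dilbert', "dilbert" ],
);

for my $case (@cases) {
    my ($label, $s) = @$case;
    printf "%-8s regexp=%d eq=%d\n",
        $label, is_bert_regexp($s), is_bert_eq($s);
}
```

The two tests agree on every string except the one ending in a newline:
because C<$> may also match before a newline at the end of the string,
C</^bert$/> accepts C<"bert\n"> while C<eq> does not.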
The C<^...$> regexp really becomes useful when we
add in the more powerful regexp tools below.

=head2 Using character classes

Although one can already do quite a lot with the literal string
regexps above, we've only scratched the surface of regular expression
technology.  In this and subsequent sections we will introduce regexp
concepts (and associated metacharacter notations) that will allow a
regexp to not just represent a single character sequence, but a I<whole
class> of them.

One such concept is that of a B<character class>.  A character class
allows a set of possible characters, rather than just a single
character, to match at a particular point in a regexp.  Character
classes are denoted by brackets C<[...]>, with the set of characters
to be possibly matched inside.  Here are some examples:

    /cat/;       # matches 'cat'
    /[bcr]at/;   # matches 'bat', 'cat', or 'rat'
    /item[0123456789]/;  # matches 'item0' or ... or 'item9'
    "abc" =~ /[cab]/;    # matches 'a'

In the last statement, even though C<'c'> is the first character in
the class, C<'a'> matches because the first character position in the
string is the earliest point at which the regexp can match.

    /[yY][eE][sS]/;      # match 'yes' in a case-insensitive way
                         # 'yes', 'Yes', 'YES', etc.

This regexp displays a common task: perform a case-insensitive
match.  Perl provides a way of avoiding all those brackets by simply
appending an C<'i'> to the end of the match.  Then C</[yY][eE][sS]/;>
can be rewritten as C</yes/i;>.  The C<'i'> stands for
case-insensitive and is an example of a B<modifier> of the matching
operation.  We will meet other modifiers later in the tutorial.

We saw in the section above that there were ordinary characters, which
represented themselves, and special characters, which needed a
backslash C<\> to represent themselves.
The same is true in a
character class, but the sets of ordinary and special characters
inside a character class are different from those outside a character
class.  The special characters for a character class are C<-]\^$>.  C<]>
is special because it denotes the end of a character class.  C<$> is
special because it denotes a scalar variable.  C<\> is special because
it is used in escape sequences, just like above.  Here is how the
special characters C<]$\> are handled:

    /[\]c]def/; # matches ']def' or 'cdef'
    $x = 'bcr';
    /[$x]at/;   # matches 'bat', 'cat', or 'rat'
    /[\$x]at/;  # matches '$at' or 'xat'
    /[\\$x]at/; # matches '\at', 'bat', 'cat', or 'rat'

The last two are a little tricky.  In C<[\$x]>, the backslash protects
the dollar sign, so the character class has two members C<$> and C<x>.
In C<[\\$x]>, the backslash is protected, so C<$x> is treated as a
variable and substituted in double-quote fashion.
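
The four character class examples above can be checked with a short
script.  This is a minimal sketch rather than part of the original
tutorial: the helper C<matching>, the word list, and the added C<^> and
C<$> anchors (used so that each word must match as a whole) are
assumptions introduced only for illustration.

```perl
use strict;
use warnings;

my $x = 'bcr';

# Hypothetical helper: return the candidates that match the regexp.
sub matching {
    my ($re, @candidates) = @_;
    return grep { $_ =~ $re } @candidates;
}

# Single quotes keep the '$' and '\' in these words literal.
my @words = (']def', 'cdef', 'bat', 'cat', 'rat', '$at', 'xat', '\at');

my @r1 = matching(qr/^[\]c]def$/, @words);  # class members: ']' and 'c'
my @r2 = matching(qr/^[$x]at$/,   @words);  # $x interpolated: 'b', 'c', 'r'
my @r3 = matching(qr/^[\$x]at$/,  @words);  # literal '$' and 'x'
my @r4 = matching(qr/^[\\$x]at$/, @words);  # '\' plus 'b', 'c', 'r'

print "[\\]c]def  : @r1\n";   # ]def cdef
print "[\$x]at    : @r2\n";   # bat cat rat
print "[\\\$x]at   : @r3\n";  # $at xat
print "[\\\\\$x]at  : @r4\n"; # bat cat rat \at
```

Writing the patterns with C<qr//> keeps each regexp next to the comment
describing its class members; the interpolation of C<$x> happens exactly
as it would inside C<//>.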
