📄 perlfaq6.1
\&         )*
\&       \*(Aq                   ## End of \*(Aq ... \*(Aq string
\&
\&     |                     ## OR
\&
\&       .                   ## Any other char
\&       [^/"\*(Aq\e\e]*         ## Chars which don\*(Aqt start a comment, string or escape
\&     )
\&   }{defined $2 ? $2 : ""}gxse;
.Ve
.PP
A slight modification also removes \*(C+ comments (as long as they are not
spread over multiple lines using a continuation character):
.PP
.Vb 1
\&    s#/\e*[^*]*\e*+([^/*][^*]*\e*+)*/|//[^\en]*|("(\e\e.|[^"\e\e])*"|\*(Aq(\e\e.|[^\*(Aq\e\e])*\*(Aq|.[^/"\*(Aq\e\e]*)#defined $2 ? $2 : ""#gse;
.Ve
.Sh "Can I use Perl regular expressions to match balanced text?"
.IX Xref "regex, matching balanced text regexp, matching balanced text regular expression, matching balanced text"
.IX Subsection "Can I use Perl regular expressions to match balanced text?"
Historically, Perl regular expressions were not capable of matching
balanced text.  In more recent versions of perl, including 5.6.1,
experimental features have been added that make it possible to do this.
Look at the documentation for the (??{ }) construct in recent perlre manual
pages to see an example of matching balanced parentheses.  Be sure to take
special notice of the warnings present in the manual before making use
of this feature.
.PP
\&\s-1CPAN\s0 contains many modules that can be useful for matching text
depending on the context.  Damian Conway provides some useful
patterns in Regexp::Common.  The module Text::Balanced provides a
general solution to this problem.
.PP
One of the common applications of balanced text matching is working
with \s-1XML\s0 and \s-1HTML\s0.  There are many modules available that support
these needs.  Two examples are HTML::Parser and XML::Parser.
There are many others.
.PP
An elaborate subroutine (for 7\-bit \s-1ASCII\s0 only) to pull out balanced
and possibly nested single chars, like \f(CW\*(C`\`\*(C'\fR and \f(CW\*(C`\*(Aq\*(C'\fR, \f(CW\*(C`{\*(C'\fR and \f(CW\*(C`}\*(C'\fR,
or \f(CW\*(C`(\*(C'\fR and \f(CW\*(C`)\*(C'\fR can be found in
http://www.cpan.org/authors/id/TOMC/scripts/pull_quotes.gz .
.PP
The C::Scan module from \s-1CPAN\s0 also contains such subs for internal use,
but they are undocumented.
.Sh "What does it mean that regexes are greedy? How can I get around it?"
.IX Xref "greedy greediness"
.IX Subsection "What does it mean that regexes are greedy? How can I get around it?"
Most people mean that greedy regexes match as much as they can.
Technically speaking, it's actually the quantifiers (\f(CW\*(C`?\*(C'\fR, \f(CW\*(C`*\*(C'\fR, \f(CW\*(C`+\*(C'\fR,
\&\f(CW\*(C`{}\*(C'\fR) that are greedy rather than the whole pattern; Perl prefers local
greed and immediate gratification to overall greed.  To get non-greedy
versions of the same quantifiers, use (\f(CW\*(C`??\*(C'\fR, \f(CW\*(C`*?\*(C'\fR, \f(CW\*(C`+?\*(C'\fR, \f(CW\*(C`{}?\*(C'\fR).
.PP
An example:
.PP
.Vb 3
\&    $s1 = $s2 = "I am very very cold";
\&    $s1 =~ s/ve.*y //;      # I am cold
\&    $s2 =~ s/ve.*?y //;     # I am very cold
.Ve
.PP
Notice how the second substitution stopped matching as soon as it
encountered \*(L"y \*(R".
The \f(CW\*(C`*?\*(C'\fR quantifier effectively tells the regular
expression engine to find a match as quickly as possible and pass
control on to whatever is next in line, like you would if you were
playing hot potato.
.Sh "How do I process each word on each line?"
.IX Xref "word"
.IX Subsection "How do I process each word on each line?"
Use the split function:
.PP
.Vb 5
\&    while (<>) {
\&        foreach $word ( split ) {
\&            # do something with $word here
\&        }
\&    }
.Ve
.PP
Note that this isn't really a word in the English sense; it's just
chunks of consecutive non-whitespace characters.
.PP
To work with only alphanumeric sequences (including underscores), you
might consider
.PP
.Vb 5
\&    while (<>) {
\&        foreach $word (m/(\ew+)/g) {
\&            # do something with $word here
\&        }
\&    }
.Ve
.Sh "How can I print out a word-frequency or line-frequency summary?"
.IX Subsection "How can I print out a word-frequency or line-frequency summary?"
To do this, you have to parse out each word in the input stream.  We'll
pretend that by word you mean a chunk of alphabetics, hyphens, or
apostrophes, rather than the non-whitespace chunk idea of a word given
in the previous question:
.PP
.Vb 5
\&    while (<>) {
\&        while ( /(\eb[^\eW_\ed][\ew\*(Aq\-]+\eb)/g ) {   # misses "\`sheep\*(Aq"
\&            $seen{$1}++;
\&        }
\&    }
\&
\&    while ( ($word, $count) = each %seen ) {
\&        print "$count $word\en";
\&    }
.Ve
.PP
If you wanted to do the same thing for lines, you wouldn't need a
regular expression:
.PP
.Vb 3
\&    while (<>) {
\&        $seen{$_}++;
\&    }
\&
\&    while ( ($line, $count) = each %seen ) {
\&        print "$count $line";
\&    }
.Ve
.PP
If you want these output in a sorted order, see perlfaq4: \*(L"How do I
sort a hash (optionally by value instead of key)?\*(R".
.Sh "How can I do approximate matching?"
.IX Xref "match, approximate matching, approximate"
.IX Subsection "How can I do approximate matching?"
See the module String::Approx available from \s-1CPAN\s0.
.Sh "How do I efficiently match many regular expressions at once?"
.IX Xref "regex, efficiency regexp, efficiency regular expression, efficiency"
.IX Subsection "How do I efficiently match many regular expressions at once?"
(contributed by brian d foy)
.PP
Avoid asking Perl to compile a regular expression every time
you want to match it.  In this example, perl must recompile
the regular expression for every iteration of the \fIforeach()\fR
loop since it has no way to know what \f(CW$pattern\fR will be.
.PP
.Vb 1
\&    @patterns = qw( foo bar baz );
\&
\&    LINE: while( <DATA> )
\&        {
\&        foreach $pattern ( @patterns )
\&            {
\&            if( /\eb$pattern\eb/i )
\&                {
\&                print;
\&                next LINE;
\&                }
\&            }
\&        }
.Ve
.PP
The qr// operator showed up in perl 5.005.  It compiles a
regular expression, but doesn't apply it.  When you use the
pre-compiled version of the regex, perl does less work.  In
this example, I inserted a \fImap()\fR to turn each pattern into
its pre-compiled form.  The rest of the script is the same,
but faster.
.PP
.Vb 1
\&    @patterns = map { qr/\eb$_\eb/i } qw( foo bar baz );
\&
\&    LINE: while( <> )
\&        {
\&        foreach $pattern ( @patterns )
\&            {
\&            if( /$pattern/ )
\&                {
\&                print;
\&                next LINE;
\&                }
\&            }
\&        }
.Ve
.PP
In some cases, you may be able to make several patterns into
a single regular expression.  Beware of situations that require
backtracking though.
.PP
.Vb 1
\&    $regex = join \*(Aq|\*(Aq, qw( foo bar baz );
\&
\&    LINE: while( <> )
\&        {
\&        print if /\eb(?:$regex)\eb/i;
\&        }
.Ve
.PP
For more details on regular expression efficiency, see Mastering
Regular Expressions by Jeffrey Friedl.  He explains how regular
expression engines work and why some patterns are surprisingly
inefficient.  Once you understand how perl applies regular
expressions, you can tune them for individual situations.
.ie n .Sh "Why don't word-boundary searches with ""\eb"" work for me?"
.el .Sh "Why don't word-boundary searches with \f(CW\eb\fP work for me?"
.IX Xref "\eb"
.IX Subsection "Why don't word-boundary searches with b work for me?"
(contributed by brian d foy)
.PP
Ensure that you know what \eb really does: it's the boundary between a
word character, \ew, and something that isn't a word character.
That thing that isn't a word character might be \eW, but it can also be the
start or end of the string.
.PP
It's not (not!) the boundary between whitespace and non-whitespace,
and it's not the stuff between words we use to create sentences.
.PP
In regex speak, a word boundary (\eb) is a \*(L"zero width assertion\*(R",
meaning that it doesn't represent a character in the string, but a
condition at a certain position.
.PP
For the regular expression, /\ebPerl\eb/, there has to be a word
boundary before the \*(L"P\*(R" and after the \*(L"l\*(R".  As long as something other
than a word character precedes the \*(L"P\*(R" and follows the \*(L"l\*(R", the
pattern will match.  These strings match /\ebPerl\eb/.
.PP
.Vb 4
\&    "Perl"    # no word char before P or after l
\&    "Perl "   # same as previous (space is not a word char)
\&    "\*(AqPerl\*(Aq"  # the \*(Aq char is not a word char
\&    "Perl\*(Aqs"  # no word char before P, non\-word char after "l"
.Ve
.PP
These strings do not match /\ebPerl\eb/.
.PP
.Vb 2
\&    "Perl_"   # _ is a word char!
\&    "Perler"  # no word char before P, but one after l
.Ve
.PP
You don't have to use \eb to match words though.  You can look for
non-word characters surrounded by word characters.  These strings
match the pattern /\eb'\eb/.
.PP
.Vb 2
\&    "don\*(Aqt"   # the \*(Aq char is surrounded by "n" and "t"
\&    "qep\*(Aqa\*(Aq"  # the \*(Aq char is surrounded by "p" and "a"
.Ve
.PP
These strings do not match /\eb'\eb/.
.PP
.Vb 1
\&    "foo\*(Aq"    # there is no word char after non\-word \*(Aq
.Ve
.PP
You can also use the complement of \eb, \eB, to specify that there
should not be a word boundary.
.PP
In the pattern /\eBam\eB/, there must be a word character before the \*(L"a\*(R"
and after the \*(L"m\*(R".
These strings match /\eBam\eB/:
.PP
.Vb 2
\&    "llama"   # "am" surrounded by word chars
\&    "Samuel"  # same
.Ve
.PP
These strings do not match /\eBam\eB/:
.PP
.Vb 2
\&    "Sam"      # no word boundary before "a", but one after "m"
\&    "I am Sam" # "am" surrounded by non\-word chars
.Ve
.Sh "Why does using $&, $`, or $' slow my program down?"
.IX Xref "$MATCH $& $POSTMATCH $' $PREMATCH $`"
.IX Subsection "Why does using $&, $`, or $' slow my program down?"
(contributed by Anno Siegel)
.PP
Once Perl sees that you need one of these variables anywhere in the
program, it provides them on each and every pattern match.  That means
that on every pattern match the entire string will be copied, part of it
to $`, part to $&, and part to $'.  Thus the penalty is most severe with
long strings and patterns that match often.  Avoid $&, $', and $` if you
can, but if you can't, once you've used them at all, use them at will
because you've already paid the price.  Remember that some algorithms
really appreciate them.  As of the 5.005 release, the $& variable is no
longer \*(L"expensive\*(R" the way the other two are.
.PP
Since Perl 5.6.1 the special variables @\- and @+ can functionally replace
$`, $& and $'.  These arrays contain the offsets of the beginning and end
of each match (see perlvar for the full story), so they give you
essentially the same information, but without the risk of excessive
string copying.
.ie n .Sh "What good is ""\eG"" in a regular expression?"
.el .Sh "What good is \f(CW\eG\fP in a regular expression?"
.IX Xref "\eG"
.IX Subsection "What good is G in a regular expression?"
You use the \f(CW\*(C`\eG\*(C'\fR anchor to start the next match on the same
string where the last match left off.  The regular
expression engine cannot skip over any characters to find
the next match with this anchor, so \f(CW\*(C`\eG\*(C'\fR is similar to the
beginning of string anchor, \f(CW\*(C`^\*(C'\fR.  The \f(CW\*(C`\eG\*(C'\fR anchor is typically
used with the \f(CW\*(C`g\*(C'\fR flag.
It uses the value of \f(CW\*(C`pos()\*(C'\fR
as the position to start the next match.  As the match
operator makes successive matches, it updates \f(CW\*(C`pos()\*(C'\fR with the
position of the next character past the last match (or the
first character of the next match, depending on how you like
to look at it).  Each string has its own \f(CW\*(C`pos()\*(C'\fR value.
.PP
Suppose you want to match all consecutive pairs of digits
in a string like \*(L"1122a44\*(R" and stop matching when you
encounter non-digits.  You want to match \f(CW11\fR and \f(CW22\fR but
the letter \f(CW\*(C`a\*(C'\fR shows up between \f(CW22\fR and \f(CW44\fR and you want
to stop at \f(CW\*(C`a\*(C'\fR.  Simply matching pairs of digits skips over
the \f(CW\*(C`a\*(C'\fR and still matches \f(CW44\fR.
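.PP
The difference can be sketched with a short script.  This is a minimal
illustration, not part of the original answer: the variable names
\f(CW$string\fR, \f(CW@unanchored\fR and \f(CW@anchored\fR are made up for the example.
It runs the same pair-of-digits pattern with and without the \f(CW\*(C`\eG\*(C'\fR
anchor, in list context with the \f(CW\*(C`g\*(C'\fR flag.

```perl
use strict;
use warnings;

my $string = "1122a44";

# Without \G the engine is free to skip over the "a" and keep looking,
# so it still finds 44.
my @unanchored = $string =~ m/(\d\d)/g;

# With \G each match must begin exactly where the previous one ended,
# so matching stops as soon as the "a" is reached.
my @anchored = $string =~ m/\G(\d\d)/g;

print "unanchored: @unanchored\n";   # unanchored: 11 22 44
print "anchored:   @anchored\n";     # anchored:   11 22
```

The anchored form returns only the leading run of pairs because the
engine is not allowed to skip the \f(CW\*(C`a\*(C'\fR to reach \f(CW44\fR.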