perlre.pod
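From "Implementation code for the DDNS protocol module of the network part of a video surveillance system; corrections are welcome." · POD code · 1,737 lines in total · page 1 of 5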
=item [3]

A Perl extension, see above.

=back

For example use C<[:upper:]> to match all the uppercase characters.
Note that the C<[]> are part of the C<[::]> construct, not part of the
whole character class.  For example:

    [01[:alpha:]%]

matches zero, one, any alphabetic character, and the percent sign.

The following equivalences to Unicode \p{} constructs and equivalent
backslash character classes (if available), will hold:
X<character class> X<\p> X<\p{}>

    [[:...:]]   \p{...}        backslash

    alpha       IsAlpha
    alnum       IsAlnum
    ascii       IsASCII
    blank
    cntrl       IsCntrl
    digit       IsDigit        \d
    graph       IsGraph
    lower       IsLower
    print       IsPrint
    punct       IsPunct
    space       IsSpace
                IsSpacePerl    \s
    upper       IsUpper
    word        IsWord
    xdigit      IsXDigit

For example C<[[:lower:]]> and C<\p{IsLower}> are equivalent.

If the C<utf8> pragma is not used but the C<locale> pragma is, the
classes correlate with the usual isalpha(3) interface (except for
"word" and "blank").

The other named classes are:

=over 4

=item cntrl
X<cntrl>

Any control character.  Usually characters that don't produce output as
such but instead control the terminal somehow: for example newline and
backspace are control characters.  All characters with ord() less than
32 are usually classified as control characters (assuming ASCII,
the ISO Latin character sets, and Unicode), as is the character with
the ord() value of 127 (C<DEL>).

=item graph
X<graph>

Any alphanumeric or punctuation (special) character.

=item print
X<print>

Any alphanumeric or punctuation (special) character or the space character.

=item punct
X<punct>

Any punctuation (special) character.

=item xdigit
X<xdigit>

Any hexadecimal digit.  Though this may feel silly ([0-9A-Fa-f] would
work just fine) it is included for completeness.

=back

You can negate the [::] character classes by prefixing the class name
with a '^'. This is a Perl extension.
For example:
X<character class, negation>

    POSIX         traditional  Unicode

    [[:^digit:]]    \D         \P{IsDigit}
    [[:^space:]]    \S         \P{IsSpace}
    [[:^word:]]     \W         \P{IsWord}

Perl respects the POSIX standard in that POSIX character classes are
only supported within a character class.  The POSIX character classes
[.cc.] and [=cc=] are recognized but B<not> supported and trying to
use them will cause an error.

=head3 Assertions

Perl defines the following zero-width assertions:
X<zero-width assertion> X<assertion> X<regex, zero-width assertion>
X<regexp, zero-width assertion>
X<regular expression, zero-width assertion>
X<\b> X<\B> X<\A> X<\Z> X<\z> X<\G>

    \b  Match a word boundary
    \B  Match except at a word boundary
    \A  Match only at beginning of string
    \Z  Match only at end of string, or before newline at the end
    \z  Match only at end of string
    \G  Match only at pos() (e.g. at the end-of-match position
        of prior m//g)

A word boundary (C<\b>) is a spot between two characters
that has a C<\w> on one side of it and a C<\W> on the other side
of it (in either order), counting the imaginary characters off the
beginning and end of the string as matching a C<\W>.  (Within
character classes C<\b> represents backspace rather than a word
boundary, just as it normally does in any double-quoted string.)
The C<\A> and C<\Z> are just like "^" and "$", except that they
won't match multiple times when the C</m> modifier is used, while
"^" and "$" will match at every internal line boundary.  To match
the actual end of the string and not ignore an optional trailing
newline, use C<\z>.
X<\b> X<\A> X<\Z> X<\z> X</m>

The C<\G> assertion can be used to chain global matches (using
C<m//g>), as described in L<perlop/"Regexp Quote-Like Operators">.
It is also useful when writing C<lex>-like scanners, when you have
several patterns that you want to match against consecutive substrings
of your string; see the previous reference.  The actual location
where C<\G> will match can also be influenced by using C<pos()> as
an lvalue: see L<perlfunc/pos>.
Note that the rule for zero-length
matches is modified somewhat, in that contents to the left of C<\G> are
not counted when determining the length of the match. Thus the following
will not match forever:
X<\G>

    $str = 'ABC';
    pos($str) = 1;
    while ($str =~ /.\G/g) {
        print $&;
    }

It will print 'A' and then terminate, as it considers the match to
be zero-width, and thus will not match at the same position twice in a
row.

It is worth noting that C<\G> improperly used can result in an infinite
loop. Take care when using patterns that include C<\G> in an alternation.

=head3 Capture buffers

The bracketing construct C<( ... )> creates capture buffers.  To refer
to the current contents of a buffer later on, within the same pattern,
use \1 for the first, \2 for the second, and so on.
Outside the match use "$" instead of "\".  (The
\<digit> notation works in certain circumstances outside
the match.  See the warning below about \1 vs $1 for details.)
Referring back to another part of the match is called a
I<backreference>.
X<regex, capture buffer> X<regexp, capture buffer>
X<regular expression, capture buffer> X<backreference>

There is no limit to the number of captured substrings that you may
use.  However Perl also uses \10, \11, etc. as aliases for \010,
\011, etc.  (Recall that 0 means octal, so \011 is the character at
number 9 in your coded character set; which would be the 10th character,
a horizontal tab under ASCII.)  Perl resolves this
ambiguity by interpreting \10 as a backreference only if at least 10
left parentheses have opened before it.  Likewise \11 is a
backreference only if at least 11 left parentheses have opened
before it.  And so on.  \1 through \9 are always interpreted as
backreferences.

X<\g{1}> X<\g{-1}> X<\g{name}> X<relative backreference> X<named backreference>
In order to provide a safer and easier way to construct patterns using
backreferences, Perl provides the C<\g{N}> notation (starting with perl
5.10.0).
The curly brackets are optional; however, omitting them is less
safe as the meaning of the pattern can be changed by text (such as digits)
following it. When N is a positive integer the C<\g{N}> notation is
exactly equivalent to using normal backreferences. When N is a negative
integer then it is a relative backreference referring to the Nth previous
capturing group. When the bracket form is used and N is not an integer, it
is treated as a reference to a named buffer.

Thus C<\g{-1}> refers to the last buffer, C<\g{-2}> refers to the
buffer before that. For example:

        /
         (Y)            # buffer 1
         (              # buffer 2
            (X)         # buffer 3
            \g{-1}      # backref to buffer 3
            \g{-3}      # backref to buffer 1
         )
        /x

This would match the same as C</(Y) ( (X) \3 \1 )/x>.

Additionally, as of Perl 5.10.0 you may use named capture buffers and named
backreferences. The notation is C<< (?<name>...) >> to declare and C<< \k<name> >>
to reference. You may also use apostrophes instead of angle brackets to delimit the
name; and you may use the bracketed C<< \g{name} >> backreference syntax.
It's possible to refer to a named capture buffer by absolute and relative number as well.
Outside the pattern, a named capture buffer is available via the C<%+> hash.
When different buffers within the same pattern have the same name, C<$+{name}>
and C<< \k<name> >> refer to the leftmost defined group. (Thus it's possible
to do things with named capture buffers that would otherwise require C<(??{})>
code to accomplish.)
X<named capture buffer> X<regular expression, named capture buffer>
X<%+> X<$+{name}> X<< \k<name> >>

Examples:

    s/^([^ ]*) *([^ ]*)/$2 $1/;     # swap first two words

    /(.)\1/                         # find first doubled char
         and print "'$1' is the first doubled character\n";

    /(?<char>.)\k<char>/            # ... a different way
         and print "'$+{char}' is the first doubled character\n";

    /(?'char'.)\1/                  # ... mix and match
         and print "'$1' is the first doubled character\n";

    if (/Time: (..):(..):(..)/) {   # parse out values
        $hours = $1;
        $minutes = $2;
        $seconds = $3;
    }

Several special variables also refer back to portions of the previous
match.  C<$+> returns whatever the last bracket match matched.
C<$&> returns the entire matched string.  (At one point C<$0> did
also, but now it returns the name of the program.)  C<$`> returns
everything before the matched string.  C<$'> returns everything
after the matched string. And C<$^N> contains whatever was matched by
the most-recently closed group (submatch). C<$^N> can be used in
extended patterns (see below), for example to assign a submatch to a
variable.
X<$+> X<$^N> X<$&> X<$`> X<$'>

The numbered match variables ($1, $2, $3, etc.) and the related punctuation
set (C<$+>, C<$&>, C<$`>, C<$'>, and C<$^N>) are all dynamically scoped
until the end of the enclosing block or until the next successful
match, whichever comes first.  (See L<perlsyn/"Compound Statements">.)
X<$+> X<$^N> X<$&> X<$`> X<$'>
X<$1> X<$2> X<$3> X<$4> X<$5> X<$6> X<$7> X<$8> X<$9>

B<NOTE>: Failed matches in Perl do not reset the match variables,
which makes it easier to write code that tests for a series of more
specific cases and remembers the best match.

B<WARNING>: Once Perl sees that you need one of C<$&>, C<$`>, or
C<$'> anywhere in the program, it has to provide them for every
pattern match.  This may substantially slow your program.  Perl
uses the same mechanism to produce $1, $2, etc, so you also pay a
price for each pattern that contains capturing parentheses.  (To
avoid this cost while retaining the grouping behaviour, use the
extended regular expression C<(?: ... )> instead.)  But if you never
use C<$&>, C<$`> or C<$'>, then patterns I<without> capturing
parentheses will not be penalized.  So avoid C<$&>, C<$'>, and C<$`>
if you can, but if you can't (and some algorithms really appreciate
them), once you've used them once, use them at will, because you've
already paid the price.
As of 5.005, C<$&> is not so costly as the
other two.
X<$&> X<$`> X<$'>

As a workaround for this problem, Perl 5.10.0 introduces C<${^PREMATCH}>,
C<${^MATCH}> and C<${^POSTMATCH}>, which are equivalent to C<$`>, C<$&>
and C<$'>, B<except> that they are only guaranteed to be defined after a
successful match that was executed with the C</p> (preserve) modifier.
The use of these variables incurs no global performance penalty, unlike
their punctuation char equivalents, however at the trade-off that you
have to tell perl when you want to use them.
X</p> X<p modifier>

Backslashed metacharacters in Perl are alphanumeric, such as C<\b>,
C<\w>, C<\n>.  Unlike some other regular expression languages, there
are no backslashed symbols that aren't alphanumeric.  So anything
that looks like \\, \(, \), \<, \>, \{, or \} is always
interpreted as a literal character, not a metacharacter.  This was
once used in a common idiom to disable or quote the special meanings
of regular expression metacharacters in a string that you want to
use for a pattern. Simply quote all non-"word" characters:

    $pattern =~ s/(\W)/\\$1/g;

(If C<use locale> is set, then this depends on the current locale.)
Today it is more common to use the quotemeta() function or the C<\Q>
metaquoting escape sequence to disable all metacharacters' special
meanings like this:

    /$unquoted\Q$quoted\E$unquoted/

Beware that if you put literal backslashes (those not inside
interpolated variables) between C<\Q> and C<\E>, double-quotish
backslash interpolation may lead to confusing results.  If you
I<need> to use literal backslashes within C<\Q...\E>,
consult L<perlop/"Gory details of parsing quoted constructs">.

=head2 Extended Patterns

Perl also defines a consistent extension syntax for features not
found in standard tools like B<awk> and B<lex>.  The syntax is a
pair of parentheses with a question mark as the first thing within
the parentheses.  The character after the question mark indicates
the extension.

The stability of these extensions varies widely.
Some have been
part of the core language for many years.  Others are experimental
and may change without warning or be completely removed.  Check
the documentation on an individual feature to verify its current
status.

A question mark was chosen for this and for the minimal-matching
construct because 1) question marks are rare in older regular
expressions, and 2) whenever you see one, you should stop and
"question" exactly what is going on.  That's psychology...

=over 10

=item C<(?#text)>
X<(?#)>

A comment.  The text is ignored.  If the C</x> modifier enables
whitespace formatting, a simple C<#> will suffice.  Note that Perl closes
the comment as soon as it sees a C<)>, so there is no way to put a literal
C<)> in the comment.

=item C<(?pimsx-imsx)>
X<(?)>

One or more embedded pattern-match modifiers, to be turned on (or
turned off, if preceded by C<->) for the remainder of the pattern or
the remainder of the enclosing pattern group (if any).  This is
particularly useful for dynamic patterns, such as those read in from a
configuration file, taken from an argument, or specified in a table
somewhere.  Consider the case where some patterns want to be case
sensitive and some do not:  The case insensitive ones merely need to
include C<(?i)> at the front of the pattern.  For example:

    $pattern = "foobar";
    if ( /$pattern/i ) { }

    # more flexible:

    $pattern = "(?i)foobar";
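
As a rough sketch of the table-driven case described above, the
following assumes a hypothetical list C<@patterns> (standing in for
patterns read from a configuration file) and a hypothetical helper
C<matching_patterns>; neither name comes from Perl itself.  Only the
entry that carries C<(?i)> matches regardless of case:

    use strict;
    use warnings;

    # Hypothetical pattern table, e.g. as read from a configuration
    # file.  Only the second entry asks for case-insensitive matching.
    my @patterns = ( 'foobar', '(?i)foobar' );

    # Return the patterns from the table that match the given string.
    sub matching_patterns {
        my ($text) = @_;
        return grep { $text =~ /$_/ } @patterns;
    }

    # Prints "(?i)foobar": the plain pattern is case sensitive.
    print join( ', ', matching_patterns('FooBar') ), "\n";

    # Prints "foobar, (?i)foobar": both patterns match.
    print join( ', ', matching_patterns('foobar') ), "\n";

Because the modifier travels inside the pattern string, the matching
code does not need to know which table entries are case insensitive.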