=head1 NAME

perlre - Perl regular expressions

=head1 DESCRIPTION

This page describes the syntax of regular expressions in Perl.  For a
description of how to I<use> regular expressions in matching
operations, plus various examples of the same, see discussions
of C<m//>, C<s///>, C<qr//> and C<??> in L<perlop/"Regexp Quote-Like Operators">.

Matching operations can have various modifiers.  Modifiers
that relate to the interpretation of the regular expression inside
are listed below.  Modifiers that alter the way a regular expression
is used by Perl are detailed in L<perlop/"Regexp Quote-Like Operators"> and
L<perlop/"Gory details of parsing quoted constructs">.

=over 4

=item i

Do case-insensitive pattern matching.

If C<use locale> is in effect, the case map is taken from the current
locale.  See L<perllocale>.

=item m

Treat string as multiple lines.  That is, change "^" and "$" from matching
the start or end of the string to matching the start or end of any
line anywhere within the string.

=item s

Treat string as single line.  That is, change "." to match any character
whatsoever, even a newline, which normally it would not match.

The C</s> and C</m> modifiers both override the C<$*> setting.  That
is, no matter what C<$*> contains, C</s> without C</m> will force
"^" to match only at the beginning of the string and "$" to match
only at the end (or just before a newline at the end) of the string.
Together, as C</ms>, they let the "." match any character whatsoever,
while still allowing "^" and "$" to match, respectively, just after
and just before newlines within the string.

=item x

Extend your pattern's legibility by permitting whitespace and comments.

=back

These are usually written as "the C</x> modifier", even though the delimiter
in question might not really be a slash.  Any of these
modifiers may also be embedded within the regular expression itself using
the C<(?...)> construct.  See below.

The C</x> modifier itself needs a little more explanation.
It tells the regular expression parser to ignore whitespace that is neither
backslashed nor within a character class.  You can use this to break up
your regular expression into (slightly) more readable parts.  The C<#>
character is also treated as a metacharacter introducing a comment,
just as in ordinary Perl code.  This also means that if you want real
whitespace or C<#> characters in the pattern (outside a character
class, where they are unaffected by C</x>), you'll have to either
escape them or encode them using octal or hex escapes.  Taken together,
these features go a long way towards making Perl's regular expressions
more readable.  Note that you have to be careful not to include the
pattern delimiter in the comment: perl has no way of knowing you did
not intend to close the pattern early.  See the C-comment deletion code
in L<perlop>.

=head2 Regular Expressions

The patterns used in Perl pattern matching derive from those supplied in
the Version 8 regex routines.  (The routines are derived
(distantly) from Henry Spencer's freely redistributable reimplementation
of the V8 routines.)  See L<Version 8 Regular Expressions> for
details.

In particular the following metacharacters have their standard I<egrep>-ish
meanings:

    \       Quote the next metacharacter
    ^       Match the beginning of the line
    .       Match any character (except newline)
    $       Match the end of the line (or before newline at the end)
    |       Alternation
    ()      Grouping
    []      Character class

By default, the "^" character is guaranteed to match only the
beginning of the string, the "$" character only the end (or before the
newline at the end), and Perl does certain optimizations with the
assumption that the string contains only one line.  Embedded newlines
will not be matched by "^" or "$".  You may, however, wish to treat a
string as a multi-line buffer, such that the "^" will match after any
newline within the string, and "$" will match before any newline.
At the cost of a little more overhead, you can do this by using the C</m>
modifier on the pattern match operator.  (Older programs did this by setting
C<$*>, but this practice is now deprecated.)

To simplify multi-line substitutions, the "." character never matches a
newline unless you use the C</s> modifier, which in effect tells Perl to pretend
the string is a single line, even if it isn't.  The C</s> modifier also
overrides the setting of C<$*>, in case you have some (badly behaved) older
code that sets it in another module.

The following standard quantifiers are recognized:

    *       Match 0 or more times
    +       Match 1 or more times
    ?       Match 1 or 0 times
    {n}     Match exactly n times
    {n,}    Match at least n times
    {n,m}   Match at least n but not more than m times

(If a curly bracket occurs in any other context, it is treated
as a regular character.)  The "*" quantifier is equivalent to C<{0,}>, the "+"
quantifier to C<{1,}>, and the "?" quantifier to C<{0,1}>.  n and m are limited
to integral values less than a preset limit defined when perl is built.
This is usually 32766 on the most common platforms.  The actual limit can
be seen in the error message generated by code such as this:

    $_ **= $_ , / {$_} / for 2 .. 42;

By default, a quantified subpattern is "greedy", that is, it will match as
many times as possible (given a particular starting location) while still
allowing the rest of the pattern to match.  If you want it to match the
minimum number of times possible, follow the quantifier with a "?".  Note
that the meanings don't change, just the "greediness":

    *?      Match 0 or more times
    +?      Match 1 or more times
    ??      Match 0 or 1 time
    {n}?    Match exactly n times
    {n,}?   Match at least n times
    {n,m}?  Match at least n but not more than m times

Because patterns are processed as double quoted strings, the following
also work:

    \t          tab                   (HT, TAB)
    \n          newline               (LF, NL)
    \r          return                (CR)
    \f          form feed             (FF)
    \a          alarm (bell)          (BEL)
    \e          escape (think troff)  (ESC)
    \033        octal char (think of a PDP-11)
    \x1B        hex char
    \x{263a}    wide hex char         (Unicode SMILEY)
    \c[         control char
    \N{name}    named char
    \l          lowercase next char (think vi)
    \u          uppercase next char (think vi)
    \L          lowercase till \E (think vi)
    \U          uppercase till \E (think vi)
    \E          end case modification (think vi)
    \Q          quote (disable) pattern metacharacters till \E

If C<use locale> is in effect, the case map used by C<\l>, C<\L>, C<\u>
and C<\U> is taken from the current locale.  See L<perllocale>.  For
documentation of C<\N{name}>, see L<charnames>.

You cannot include a literal C<$> or C<@> within a C<\Q> sequence.
An unescaped C<$> or C<@> interpolates the corresponding variable,
while escaping will cause the literal string C<\$> to be matched.
You'll need to write something like C<m/\Quser\E\@\Qhost/>.

In addition, Perl defines the following:

    \w      Match a "word" character (alphanumeric plus "_")
    \W      Match a non-"word" character
    \s      Match a whitespace character
    \S      Match a non-whitespace character
    \d      Match a digit character
    \D      Match a non-digit character
    \pP     Match P, named property.  Use \p{Prop} for longer names.
    \PP     Match non-P
    \X      Match eXtended Unicode "combining character sequence",
            equivalent to (?:\PM\pM*)
    \C      Match a single C char (octet) even under utf8.

A C<\w> matches a single alphanumeric character or C<_>, not a whole word.
Use C<\w+> to match a string of Perl-identifier characters (which isn't
the same as matching an English word).  If C<use locale> is in effect, the
list of alphabetic characters generated by C<\w> is taken from the
current locale.  See L<perllocale>.
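
The short program below is a sketch, not part of the original text, that
exercises several of the constructs described so far: the C</i>, C</m>,
C</s> and C</x> modifiers, greedy versus non-greedy quantifiers, C<\Q>,
and C<\w> and C<\d>.  The helper C<first_match> and all variable names are
hypothetical, invented only for this illustration, and it assumes a perl
recent enough to support C<qr//>.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): return $1 from the first
# match of $re against $str, or undef if there is no match.
sub first_match {
    my ($str, $re) = @_;
    return $str =~ $re ? $1 : undef;
}

my $html = '<b>bold</b> and <i>italic</i>';

# Greedy: .* runs to the end, then backtracks only as far as the LAST '>'.
my $greedy = first_match($html, qr/<(.*)>/);

# Non-greedy: .*? stops at the FIRST '>' that lets the match succeed.
my $lazy = first_match($html, qr/<(.*?)>/);

# /i: case-insensitive matching.
my $ci = ('Perl' =~ /^perl$/i) ? 1 : 0;

# /m: "^" also matches just after each embedded newline.
my @line_starts = ("foo\nbar\nbaz" =~ /^(\w)/mg);

# /s: "." also matches a newline.
my $dot_plain = ("a\nb" =~ /a.b/)  ? 1 : 0;
my $dot_s     = ("a\nb" =~ /a.b/s) ? 1 : 0;

# /x: whitespace and # comments inside the pattern are ignored.
# (The comments must not contain the pattern delimiter.)
my ($year, $month) = ('2001-07' =~ /
    (\d{4})    # four-digit year
    -          # literal hyphen
    (\d{2})    # two-digit month
/x);

# \Q...\E: metacharacters coming from the interpolated string match literally.
my $pattern_text = 'a.b';
my $quoted   = ('axb' =~ /^\Q$pattern_text\E$/) ? 1 : 0;  # "." is literal
my $unquoted = ('axb' =~ /^$pattern_text$/)     ? 1 : 0;  # "." is a wildcard

# \w+ matches runs of identifier characters, not English words.
my @idents = ('my_var = other2 + 3' =~ /(\w+)/g);

print "greedy: $greedy\nlazy:   $lazy\n";
```

Comparing C<$greedy> with C<$lazy> shows the point made above: the trailing
C<?> changes only how much C<.*> consumes, not what it is allowed to match.
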
You may use C<\w>, C<\W>, C<\s>, C<\S>,
C<\d>, and C<\D> within character classes, but they cannot serve as
the endpoints of a range; in that position the "-" is understood literally.
See L<utf8> for details about C<\pP>, C<\PP>, and C<\X>.

The POSIX character class syntax

    [:class:]

is also available.  The available classes and their backslash
equivalents (if available) are as follows:

    alpha
    alnum
    ascii
    blank               [1]
    cntrl
    digit       \d
    graph
    lower
    print
    punct
    space       \s      [2]
    upper
    word        \w      [3]
    xdigit

[1] A GNU extension equivalent to C<[ \t]>, "all horizontal whitespace".

[2] Not I<exactly equivalent> to C<\s>, since C<[[:space:]]> also
includes the (very rare) "vertical tabulator", "\ck", chr(11).

[3] A Perl extension.

For example use C<[:upper:]> to match all the uppercase characters.
Note that the C<[]> are part of the C<[::]> construct, not part of the
whole character class.  For example:

    [01[:alpha:]%]

matches zero, one, any alphabetic character, and the percentage sign.

If the C<utf8> pragma is used, the following equivalences to Unicode
\p{} constructs and equivalent backslash character classes (if available)
will hold:

    alpha       IsAlpha
    alnum       IsAlnum
    ascii       IsASCII
    blank       IsSpace
    cntrl       IsCntrl
    digit       IsDigit        \d
    graph       IsGraph
    lower       IsLower
    print       IsPrint
    punct       IsPunct
    space       IsSpace
                IsSpacePerl    \s
    upper       IsUpper
    word        IsWord
    xdigit      IsXDigit

For example C<[:lower:]> and C<\p{IsLower}> are equivalent.

If the C<utf8> pragma is not used but the C<locale> pragma is, the
classes correlate with the usual isalpha(3) interface (except for
"word" and "blank").

The classes whose names may not be obvious are:

=over 4

=item cntrl

Any control character.  Usually characters that don't produce output as
such but instead control the terminal somehow: for example newline and
backspace are control characters.  All characters with ord() less than
32 are most often classified as control characters (assuming ASCII,
the ISO Latin character sets, and Unicode).

=item graph

Any alphanumeric or punctuation (special) character.

=item print

Any alphanumeric or punctuation (special) character or space.

=item punct

Any punctuation (special) character.

=item xdigit

Any hexadecimal digit.  Though this may feel silly ([0-9A-Fa-f] would
work just fine) it is included for completeness.

=back

You can negate the [::] character classes by prefixing the class name
with a '^'.  This is a Perl extension.  For example:

    POSIX         trad. Perl  utf8 Perl
    [:^digit:]    \D          \P{IsDigit}
    [:^space:]    \S          \P{IsSpace}
    [:^word:]     \W          \P{IsWord}

The POSIX character classes [.cc.] and [=cc=] are recognized but
B<not> supported, and trying to use them will cause an error.

Perl defines the following zero-width assertions:

    \b  Match a word boundary
    \B  Match a non-(word boundary)
    \A  Match only at beginning of string
    \Z  Match only at end of string, or before newline at the end
    \z  Match only at end of string
    \G  Match only at pos() (e.g. at the end-of-match position
        of prior m//g)

A word boundary (C<\b>) is a spot between two characters
that has a C<\w> on one side of it and a C<\W> on the other side
of it (in either order), counting the imaginary characters off the
beginning and end of the string as matching a C<\W>.  (Within
character classes C<\b> represents backspace rather than a word
boundary, just as it normally does in any double-quoted string.)

The C<\A> and C<\Z> are just like "^" and "$", except that they
won't match multiple times when the C</m> modifier is used, while
"^" and "$" will match at every internal line boundary.
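
As a similarly hedged sketch, the program below tries the POSIX bracket
classes and the zero-width assertions described above on fixed strings.
The variable names are made up for the example, and it assumes a perl
that accepts the negated C<[:^digit:]> form.

```perl
use strict;
use warnings;

# All variable names below are hypothetical, for illustration only.

# A POSIX class goes inside a bracketed class: [[:alpha:]], not [:alpha:].
my @alpha_runs = ('abc123def' =~ /([[:alpha:]]+)/g);

# POSIX classes can be mixed with other members, as in [01[:alpha:]%].
my $mixed = join '', ('0x9%a!' =~ /([01[:alpha:]%])/g);

# [:^digit:] is the Perl-specific negated form, equivalent to \D.
my $non_digits = join '', ('a1b2c3' =~ /([[:^digit:]])/g);

# \b: a boundary between \w and \W (string edges count as \W),
# so the "cat" inside "concat" is not at the start of a word.
my @words = ('cat catalog concat' =~ /\b(cat\w*)/g);

my $text = "one\ntwo\n";

# Under /m, "^" matches at the start and after each internal newline ...
my @caret_m = ($text =~ /^(\w+)/mg);

# ... but \A still matches only at the very beginning of the string.
my @anchor_a = ($text =~ /\A(\w+)/mg);

# \Z may match just before a final newline; \z only at the true end.
my $before_nl = ($text =~ /two\Z/) ? 1 : 0;
my $true_end  = ($text =~ /two\z/) ? 1 : 0;

print "@caret_m | @anchor_a | $before_nl $true_end\n";
```

The last two matches mirror the assertion table: C<\Z> tolerates the
trailing newline in C<$text>, while C<\z> does not.
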
To match the actual end of the string and not ignore an optional trailing
newline, use C<\z>.

The C<\G> assertion can be used to chain global matches (using
C<m//g>), as described in L<perlop/"Regexp Quote-Like Operators">.
It is also useful when writing C<lex>-like scanners, when you have
several patterns that you want to match against consecutive substrings
of your string; see the previous reference.  The actual location