escaped with a backslash, although this is sometimes not needed, in which
case the backslash may be omitted.

The sequence C<\b> is special inside a bracketed character class. While
outside the character class C<\b> is an assertion indicating a point
that does not have either two word characters or two non-word characters
on either side, inside a bracketed character class, C<\b> matches a
backspace character.

A C<[> is not special inside a character class, unless it's the start
of a POSIX character class (see below). It normally does not need escaping.

A C<]> is either the end of a POSIX character class (see below), or it
signals the end of the bracketed character class. Normally it needs
escaping if you want to include a C<]> in the set of characters.
However, if the C<]> is the I<first> (or the second if the first
character is a caret) character of a bracketed character class, it
does not denote the end of the class (as you cannot have an empty class)
and is considered part of the set of characters that can be matched without
escaping.

Examples:

 "+"   =~ /[+?*]/     #  Match, "+" in a character class is not special.
 "\cH" =~ /[\b]/      #  Match, \b inside a character class
                      #  is equivalent to a backspace.
 "]"   =~ /[][]/      #  Match, as the character class contains
                      #  both [ and ].
 "[]"  =~ /[[]]/      #  Match, the pattern contains a character class
                      #  containing just [, and the character class is
                      #  followed by a ].

=head3 Character Ranges

It is not uncommon to want to match a range of characters. Luckily, instead
of listing all the characters in the range, one may use the hyphen (C<->).
If inside a bracketed character class you have two characters separated
by a hyphen, it's treated as if all the characters between the two are in
the class.
For instance, C<[0-9]> matches any ASCII digit, and C<[a-m]>
matches any lowercase letter from the first half of the ASCII alphabet.

Note that the two characters on either side of the hyphen are not
necessarily both letters or both digits. Any character is possible,
although not advisable.  C<['-?]> contains a range of characters, but
most people will not know which characters that will be. Furthermore,
such ranges may lead to portability problems if the code has to run on
a platform that uses a different character set, such as EBCDIC.

If a hyphen in a character class cannot be part of a range, for instance
because it is the first or the last character of the character class,
or if it immediately follows a range, the hyphen isn't special, and will be
considered a character that may be matched. You have to escape the hyphen
with a backslash if you want to have a hyphen in your set of characters to
be matched, and its position in the class is such that it can be considered
part of a range.

Examples:

 [a-z]       #  Matches a character that is a lower case ASCII letter.
 [a-fz]      #  Matches any letter between 'a' and 'f' (inclusive) or the
             #  letter 'z'.
 [-z]        #  Matches either a hyphen ('-') or the letter 'z'.
 [a-f-m]     #  Matches any letter between 'a' and 'f' (inclusive), the
             #  hyphen ('-'), or the letter 'm'.
 ['-?]       #  Matches any of the characters  '()*+,-./0123456789:;<=>?
             #  (But not on an EBCDIC platform).

=head3 Negation

It is also possible to instead list the characters you do not want to
match. You can do so by using a caret (C<^>) as the first character in the
character class. For instance, C<[^a-z]> matches a character that is not a
lowercase ASCII letter.

This syntax makes the caret a special character inside a bracketed character
class, but only if it is the first character of the class.
So if you want
to have the caret as one of the characters you want to match, you either
have to escape the caret, or not list it first.

Examples:

 "e"  =~  /[^aeiou]/   #  No match, the 'e' is listed.
 "x"  =~  /[^aeiou]/   #  Match, as 'x' isn't a lowercase vowel.
 "^"  =~  /[^^]/       #  No match, matches anything that isn't a caret.
 "^"  =~  /[x^]/       #  Match, caret is not special here.

=head3 Backslash Sequences

You can put a backslash sequence character class inside a bracketed character
class, and it will act just as if you put all the characters matched by
the backslash sequence inside the character class. For instance,
C<[a-f\d]> will match any digit, or any of the lowercase letters between
'a' and 'f' inclusive.

Examples:

 /[\p{Thai}\d]/     # Matches a character that is either a Thai
                    # character, or a digit.
 /[^\p{Arabic}()]/  # Matches a character that is neither an Arabic
                    # character, nor a parenthesis.

Backslash sequence character classes cannot form one of the endpoints
of a range.

=head3 Posix Character Classes

POSIX character classes have the form C<[:class:]>, where I<class> is the
name, and C<[:> and C<:]> are the delimiters. POSIX character classes appear
I<inside> bracketed character classes, and are a convenient and descriptive
way of listing a group of characters. Be careful about the syntax,

 # Correct:
 $string =~ /[[:alpha:]]/

 # Incorrect (will warn):
 $string =~ /[:alpha:]/

The latter pattern would be a character class consisting of a colon,
and the letters C<a>, C<l>, C<p> and C<h>.

Perl recognizes the following POSIX character classes:

 alpha  Any alphabetical character.
 alnum  Any alphanumerical character.
 ascii  Any ASCII character.
 blank  A GNU extension, equal to a space or a horizontal tab (C<\t>).
 cntrl  Any control character.
 digit  Any digit, equivalent to C<\d>.
 graph  Any printable character, excluding a space.
 lower  Any lowercase character.
 print  Any printable character, including a space.
 punct  Any punctuation character.
 space  Any white space character. C<\s> plus the vertical tab (C<\cK>).
 upper  Any uppercase character.
 word   Any "word" character, equivalent to C<\w>.
 xdigit Any hexadecimal digit, '0' - '9', 'a' - 'f', 'A' - 'F'.

The exact set of characters matched depends on whether the source string
is internally in UTF-8 format or not. See L</Locale, Unicode and UTF-8>.

Most POSIX character classes have C<\p> counterparts. The difference
is that the C<\p> classes will always match according to the Unicode
properties, regardless of whether the string is in UTF-8 format or not.

The following table shows the relation between POSIX character classes
and the Unicode properties:

 [[:...:]]   \p{...}      backslash

 alpha       IsAlpha
 alnum       IsAlnum
 ascii       IsASCII
 blank
 cntrl       IsCntrl
 digit       IsDigit      \d
 graph       IsGraph
 lower       IsLower
 print       IsPrint
 punct       IsPunct
 space       IsSpace
             IsSpacePerl  \s
 upper       IsUpper
 word        IsWord       \w
 xdigit      IsXDigit

Some character classes may have a non-obvious name:

=over 4

=item cntrl

Any control character. Usually, control characters don't produce output
as such, but instead control the terminal somehow: for example newline
and backspace are control characters. All characters with C<ord()> less
than 32 are usually classified as control characters (in ASCII, the ISO
Latin character sets, and Unicode), as is the character with C<ord()> value
127 (C<DEL>).

=item graph

Any character that is I<graphical>, that is, visible. This class consists
of all the alphanumerical characters and all punctuation characters.

=item print

All printable characters, which is the set of all the graphical characters
plus the space.

=item punct

Any punctuation (special) character.

=back

=head4 Negation

A Perl extension to the POSIX character class is the ability to
negate it.
This is done by prefixing the class name with a caret (C<^>).
Some examples:

 POSIX         Unicode       Backslash
 [[:^digit:]]  \P{IsDigit}   \D
 [[:^space:]]  \P{IsSpace}   \S
 [[:^word:]]   \P{IsWord}    \W

=head4 [= =] and [. .]

Perl will recognize the POSIX character classes C<[=class=]> and
C<[.class.]>, but does not (yet?) support these constructs. Use of
such a construct will lead to an error.

=head4 Examples

 /[[:digit:]]/            # Matches a character that is a digit.
 /[01[:lower:]]/          # Matches a character that is either a
                          # lowercase letter, or '0' or '1'.
 /[[:digit:][:^xdigit:]]/ # Matches a character that can be anything,
                          # but the letters 'a' to 'f' in either case.
                          # This is because the character class contains
                          # all digits, and anything that isn't a
                          # hex digit, resulting in a class containing
                          # all characters, but the letters 'a' to 'f'
                          # and 'A' to 'F'.

=head2 Locale, Unicode and UTF-8

Some of the character classes have a somewhat different behaviour depending
on the internal encoding of the source string, and the locale that is
in effect.

C<\w>, C<\d>, C<\s> and the POSIX character classes (and their negations,
including C<\W>, C<\D>, C<\S>) suffer from this behaviour.

The rule is that if the source string is in UTF-8 format, the character
classes match according to the Unicode properties. If the source string
isn't, then the character classes match according to whatever locale is
in effect. If there is no locale, they match the ASCII defaults
(52 letters, 10 digits and underscore for C<\w>, 0 to 9 for C<\d>, etc).

This usually means that if you are matching against characters whose C<ord()>
values are between 128 and 255 inclusive, your character class may match
or not depending on the current locale, and whether the source string is
in UTF-8 format.
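
The dependence on the internal format can be seen in a minimal sketch,
assuming no C<use locale> and no C<unicode_strings> feature is in effect.
The helper C<classify> is a hypothetical name introduced only for this
illustration. It tests the character C<"\xDF"> against C<[[:alpha:]]> and
C<\p{IsAlpha}> before and after C<utf8::upgrade>, which changes only the
internal format of the string and not its characters.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): report whether the first
# character of $str matches a POSIX class and a Unicode property.
sub classify {
    my ($str) = @_;
    return {
        posix   => ($str =~ /^[[:alpha:]]/ ? 1 : 0),
        unicode => ($str =~ /^\p{IsAlpha}/ ? 1 : 0),
    };
}

my $str    = "\xDF";          # ord() is 223; string is not in UTF-8 format.
my $before = classify($str);

utf8::upgrade($str);          # Same character, now stored in UTF-8 format.
my $after  = classify($str);

printf "native: posix=%d unicode=%d\n", $before->{posix}, $before->{unicode};
printf "utf8:   posix=%d unicode=%d\n", $after->{posix},  $after->{unicode};
```

Under those assumptions the POSIX class should match only after the
upgrade, while the C<\p> property should match both times.
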
The string will be in UTF-8 format if it contains
characters whose C<ord()> value exceeds 255. But a string may be in UTF-8
format without it having such characters.

For portability reasons, it may be better to not use C<\w>, C<\d>, C<\s>
or the POSIX character classes, and use the Unicode properties instead.

=head4 Examples

 $str =  "\xDF";      # $str is not in UTF-8 format.
 $str =~ /^\w/;       # No match, as $str isn't in UTF-8 format.
 $str .= "\x{0e0b}";  # Now $str is in UTF-8 format.
 $str =~ /^\w/;       # Match! $str is now in UTF-8 format.
 chop $str;
 $str =~ /^\w/;       # Still a match! $str remains in UTF-8 format.

=cut