</PRE>
</BLOCKQUOTE>
<P>
This program displays the following:
<BLOCKQUOTE>
<PRE>
String has root.
$scalar = The tree has many leaves
</PRE>
</BLOCKQUOTE>
<P>
The left operand of the binding operator is the string to be searched,
modified, or transformed; the right operand is the regular expression
operator to be evaluated. The complementary binding operator is
valid only when used with the matching regular expression operator.
If you use it with the substitution or translation operator, you
get the following message if you're using the <TT>-w</TT>
command-line option to run Perl:
<BLOCKQUOTE>
<PRE>
Useless use of not in void context at test.pl line 4.
</PRE>
</BLOCKQUOTE>
<P>
You can see that the <TT>!~</TT> is
the opposite of <TT>=~</TT> by replacing
the <TT>=~</TT> in the previous example:
<BLOCKQUOTE>
<PRE>
$scalar = "The root has many leaves";
print("String has root.\n") if $scalar !~ m/root/;
$scalar =~ s/root/tree/;
$scalar =~ tr/h/H/;
print("\$scalar = $scalar\n");
</PRE>
</BLOCKQUOTE>
<P>
This program displays the following:
<BLOCKQUOTE>
<PRE>
$scalar = The tree has many leaves
</PRE>
</BLOCKQUOTE>
<P>
The first print line does not get executed because the complementary
binding operator returns false.
<H2><A NAME="HowtoCreatePatterns"><FONT SIZE=5 COLOR=#FF0000>
How to Create Patterns</FONT></A></H2>
<P>
So far in this chapter, you've read about the different operators
used with regular expressions, and you've seen how to match simple
sequences of characters. Now we'll look at the wide array of meta-characters
that are used to harness the full power of regular expressions.
<I>Meta-characters</I> are characters that have an additional
meaning above and beyond their literal meaning. For example, the
period character can have two meanings in a pattern. First, it
can be used to match a period character in the searched string; this
is its <I>literal meaning</I>. And second, it can be used to match
<I>any</I> character in the searched string except for the newline
character; this is its <I>meta-meaning</I>.
<P>
When creating patterns, the meta-meaning is always the default.
If you really intend to match the literal character, you need
to prefix the meta-character with a backslash. You might recall
that the backslash is used to create an escape sequence.
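<P>
The difference between the two meanings of the period can be seen in a minimal sketch. The sample strings are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample strings, chosen only for illustration.
my @strings = ("3.14", "3x14", "314");

# Unescaped period: the meta-meaning, any single character except newline.
my @meta = grep { m/3.14/ } @strings;

# Escaped period: the literal meaning, a period and nothing else.
my @literal = grep { m/3\.14/ } @strings;

print "meta-meaning:    @meta\n";      # 3.14 3x14
print "literal meaning: @literal\n";   # 3.14
```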
<P>
For more information about escape sequences, see <A HREF="ch2.htm" tppabs="http://cheminf.nankai.edu.cn/~eb~/Perl%205%20By%20Example/ch2.htm" >Chapter 2</A>, "Example:
Double Quoted Strings."
<P>
Patterns can have many different components. These components
all combine to provide you with the power to match any type of
string. The following list of components will give you a good
idea of the variety of ways that patterns can be created. The
section "Pattern Examples" later in this chapter shows
many examples of these rules in action.
<BLOCKQUOTE>
<B>Variable Interpolation:</B> Any variable is interpolated, and
the resulting pattern is then evaluated as a regular expression.
Remember that only one level of interpolation is done. This means
that if the value of the variable includes, for example, <TT>$scalar</TT>
as a string value, then <TT>$scalar</TT>
will not be interpolated. In addition, back-quotes do not interpolate
within double-quotes, and single-quotes do not stop interpolation
of variables when used within double-quotes.
</BLOCKQUOTE>
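<P>
As a rough illustration of single-level interpolation, the sketch below uses two hypothetical variables, <TT>$word</TT> and <TT>$nested</TT>, which are not part of the chapter's examples. The second test fails because the text <TT>$word</TT> stored in <TT>$nested</TT> is never expanded into <TT>root</TT>.

```perl
use strict;
use warnings;

my $scalar = "The root has many leaves";
my $word   = "root";          # hypothetical variable holding the search text

# $word is interpolated first, so the pattern evaluated is /root/.
my $found = ($scalar =~ m/$word/) ? 1 : 0;

# One level only: the text '$word' stored in $nested is not expanded again
# into "root", so this pattern does not find "root" in $scalar.
my $nested = '$word';
my $second_level = ($scalar =~ m/$nested/) ? 1 : 0;

# \Q...\E (see Table 10.5) quotes the interpolated text so it matches literally.
my $quoted = ('the text $word' =~ m/\Q$nested\E/) ? 1 : 0;

print "found=$found second_level=$second_level quoted=$quoted\n";
```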
<BLOCKQUOTE>
<B>Self-Matching Characters:</B> Any character will match itself
unless it is a meta-character or one of <TT>$</TT>
and <TT>@</TT>.
The meta-characters are listed in Table 10.5, and the other two characters
begin variable names, which Perl interpolates into the pattern. (The ampersand,
<TT>&amp;</TT>, begins function calls elsewhere in Perl but matches itself in a pattern.) You can use
the backslash character to force Perl to match the literal meaning
of any character. For example, <TT>m/a/</TT>
will return true if the letter <TT>a</TT>
is in the <TT>$_</TT> variable. And
<TT>m/\$/</TT> will return true if
the character <TT>$</TT> is in the
<TT>$_</TT> variable.<BR>
</BLOCKQUOTE>
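<P>
A small sketch of self-matching and escaped characters, using the default <TT>$_</TT> variable. The three sample strings are made up for this illustration.

```perl
use strict;
use warnings;

my @hits;

# Each string is placed in $_ in turn; m// with no binding operator tests $_.
for ('a cat', 'cost: $5', 'user@example') {
    push @hits, "a"  if m/a/;    # an ordinary character matches itself
    push @hits, "\$" if m/\$/;   # a literal dollar sign needs a backslash
    push @hits, "\@" if m/\@/;   # so does a literal at sign
}

print join(" ", @hits), "\n";    # a $ a @
```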
<P>
<CENTER><B>Table 10.5 Regular Expression Meta-Characters,
Meta-Brackets, and Meta-Sequences</B></CENTER>
<p>
<CENTER>
<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD WIDTH=133><CENTER><I>Meta-Character</I></CENTER></TD>
<TD WIDTH=457><I>Description</I></TD></TR>
<TR><TD WIDTH=133><CENTER>^</CENTER></TD><TD WIDTH=457>This meta-character, the caret, will match the beginning of a string or, if the <TT>/m</TT> option is used, the beginning of a line. It is one of two pattern anchors; the other anchor is the
<TT>$</TT>.
</TD></TR>
<TR><TD WIDTH=133><CENTER>.</CENTER></TD><TD WIDTH=457>This meta-character will match any character except for the newline unless the <TT>/s</TT> option is specified. If the <TT>/s</TT> option is specified, then the newline also will be matched.
</TD></TR>
<TR><TD WIDTH=133><CENTER>$</CENTER></TD><TD WIDTH=457>This meta-character will match the end of a string or, if the <TT>/m</TT> option is used, the end of a line. It is one of two pattern anchors; the other anchor is the <TT>^</TT>.
</TD></TR>
<TR><TD WIDTH=133><CENTER>|</CENTER></TD><TD WIDTH=457>This meta-character, called <I>alternation</I>, lets you specify two values that can cause the match to succeed. For instance, <TT>m/a|b/</TT> means that the <TT>$_</TT> variable must contain the
<TT>"a"</TT> or <TT>"b"</TT> character for the match to succeed.
</TD></TR>
<TR><TD WIDTH=133><CENTER>*</CENTER></TD><TD WIDTH=457>This meta-character indicates that the "thing" immediately to the left should be matched 0 or more times in order to be evaluated as true.
</TD></TR>
<TR><TD WIDTH=133><CENTER>+</CENTER></TD><TD WIDTH=457>This meta-character indicates that the "thing" immediately to the left should be matched 1 or more times in order to be evaluated as true.
</TD></TR>
<TR><TD WIDTH=133><CENTER>?</CENTER></TD><TD WIDTH=457>This meta-character indicates that the "thing" immediately to the left should be matched 0 or 1 times in order to be evaluated as true. When used in conjunction with the <TT>+</TT>,
<TT>*</TT>, <TT>?</TT>, or <TT>{n,m}</TT> meta-characters and brackets, it means that the regular expression should be non-greedy and match the smallest possible string.
</TD></TR>
<TR><TD WIDTH=133><CENTER><I>Meta-Brackets</I></CENTER></TD><TD WIDTH=457><I>Description</I>
</TD></TR>
<TR><TD WIDTH=133><CENTER>()</CENTER></TD><TD WIDTH=457>The parentheses let you affect the order of pattern evaluation and act as a form of pattern memory. See the section "Pattern Memory" later in this chapter for more information.
</TD></TR>
<TR><TD WIDTH=133><CENTER>(?...)</CENTER></TD><TD WIDTH=457>If a question mark immediately follows the left parenthesis, it indicates that an extended mode component is being specified. See the section, "Example: Extension Syntax," later in this
chapter for more information.
</TD></TR>
<TR><TD WIDTH=133><CENTER>{n,m}</CENTER></TD><TD WIDTH=457>The curly braces specify how many times the "thing" immediately to the left should be matched. <TT>{n}</TT> means that it should be matched exactly n times. <TT>{n,}</TT> means it must
be matched at least n times. <TT>{n,m}</TT> means that it must be matched at least n times and not more than m times.
</TD></TR>
<TR><TD WIDTH=133><CENTER>[]</CENTER></TD><TD WIDTH=457>The square brackets let you create a character class. For instance, <TT>m/[abc]/</TT> will evaluate to true if any of <TT>"a"</TT>, <TT>"b"</TT>, or <TT>"c"</TT> is
contained in <TT>$_</TT>. The square brackets are a more readable alternative to the alternation meta-character.
</TD></TR>
<TR><TD WIDTH=133><CENTER><I>Meta-Sequences</I></CENTER></TD>
<TD WIDTH=457><I>Description</I></TD></TR>
<TR><TD WIDTH=133><CENTER>\</CENTER></TD><TD WIDTH=457>This meta-character "escapes" the following character. This means that any special meaning normally attached to that character is ignored. For instance, if you need to include a dollar sign
in a pattern, you must use <TT>\$</TT> to avoid Perl's variable interpolation. Use <TT>\\</TT> to specify the backslash character in your pattern.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\0nnn</CENTER></TD><TD WIDTH=457>Any Octal byte.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\a</CENTER></TD><TD WIDTH=457>Alarm.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\A</CENTER></TD><TD WIDTH=457>This meta-sequence represents the beginning of the string. Its meaning is not affected by the <TT>/m</TT> option.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\b</CENTER></TD><TD WIDTH=457>This meta-sequence represents the backspace character inside a character class; otherwise, it represents a <I>word boundary</I>. A word boundary is the spot between word (<TT>\w</TT>) and
non-word (<TT>\W</TT>) characters. Perl thinks that the <TT>\W</TT> meta-sequence matches the imaginary characters off the ends of the string.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\B</CENTER></TD><TD WIDTH=457>Match a non-word boundary.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\cn</CENTER></TD><TD WIDTH=457>Any control character.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\d</CENTER></TD><TD WIDTH=457>Match a single digit character.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\D</CENTER></TD><TD WIDTH=457>Match a single non-digit character.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\e</CENTER></TD><TD WIDTH=457>Escape.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\E</CENTER></TD><TD WIDTH=457>Terminate the <TT>\L</TT> or <TT>\U</TT> sequence.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\f</CENTER></TD><TD WIDTH=457>Form Feed.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\G</CENTER></TD><TD WIDTH=457>Match only where the previous <TT>m//g</TT> left off.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\l</CENTER></TD><TD WIDTH=457>Change the next character to lowercase.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\L</CENTER></TD><TD WIDTH=457>Change the following characters to lowercase until a <TT>\E</TT> sequence is encountered.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\n</CENTER></TD><TD WIDTH=457>Newline.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\Q</CENTER></TD><TD WIDTH=457>Quote Regular Expression meta-characters literally until the <TT>\E</TT> sequence is encountered.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\r</CENTER></TD><TD WIDTH=457>Carriage Return.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\s</CENTER></TD><TD WIDTH=457>Match a single whitespace character.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\S</CENTER></TD><TD WIDTH=457>Match a single non-whitespace character.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\t</CENTER></TD><TD WIDTH=457>Tab.</TD>
</TR>
<TR><TD WIDTH=133><CENTER>\u</CENTER></TD><TD WIDTH=457>Change the next character to uppercase.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\U</CENTER></TD><TD WIDTH=457>Change the following characters to uppercase until a <TT>\E</TT> sequence is encountered.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\v</CENTER></TD><TD WIDTH=457>Vertical Tab.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\w</CENTER></TD><TD WIDTH=457>Match a single word character. Word characters are the alphanumeric and underscore characters.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\W</CENTER></TD><TD WIDTH=457>Match a single non-word character.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\xnn</CENTER></TD><TD WIDTH=457>Any Hexadecimal byte.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\Z</CENTER></TD><TD WIDTH=457>This meta-sequence represents the end of the string (or the spot just before a newline that ends the string). Its meaning is not affected by the <TT>/m</TT> option.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\$</CENTER></TD><TD WIDTH=457>Dollar Sign.
</TD></TR>
<TR><TD WIDTH=133><CENTER>\@</CENTER></TD><TD WIDTH=457>At Sign.
</TD></TR>
</TABLE>
</CENTER>
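<P>
Two rows of Table 10.5 are easier to see in action: the non-greedy use of <TT>?</TT> and the effect of the <TT>/s</TT> option on the period. The following is a minimal sketch with made-up sample text, not an example from the chapter.

```perl
use strict;
use warnings;

my $text = "<b>bold</b> and <b>more</b>";   # hypothetical sample text

# Greedy: .* grabs as much as it can before the last </b>.
my ($greedy) = $text =~ m/<b>(.*)<\/b>/;

# Non-greedy: .*? stops at the first </b>.
my ($lazy) = $text =~ m/<b>(.*?)<\/b>/;

# The period does not match a newline unless /s is given.
my $plain  = ("a\nb" =~ m/a.b/)  ? 1 : 0;
my $with_s = ("a\nb" =~ m/a.b/s) ? 1 : 0;

print "greedy: $greedy\n";                  # bold</b> and <b>more
print "lazy:   $lazy\n";                    # bold
print "plain=$plain with_s=$with_s\n";      # plain=0 with_s=1
```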
<P>
<BLOCKQUOTE>
<B>Character Sequences:</B> A sequence of characters will match
the identical sequence in the searched string. The characters
need to be in the same order in both the pattern and the searched
string for the match to be true. For example, <TT>m/abc/;</TT>
will match <TT>"abc"</TT>
but not <TT>"cab"</TT> or
<TT>"bca"</TT>. If any character
in the sequence is a meta-character, you need to use the backslash
to match its literal value.
</BLOCKQUOTE>
<BLOCKQUOTE>
<B>Alternation:</B> The <I>alternation </I>meta-character (<TT>|</TT>)
will let you match more than one possible string. For example,
<TT>m/a|b/;</TT> will match if either
the <TT>"a"</TT> character
or the <TT>"b"</TT> character
is in the searched string. You can use sequences of more than one
character with alternation. For example, <TT>m/dog|cat/;</TT>
will match if either of the strings <TT>"dog"</TT>
or <TT>"cat"</TT> is in
the searched string.<BR>
</BLOCKQUOTE>
<p>
<CENTER>
<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD><B>Tip</B> </TD></TR>
<TR><TD>
<BLOCKQUOTE>
Some programmers like to enclose the alternation sequence inside parentheses to help indicate where the sequence begins and ends.</BLOCKQUOTE>
<BLOCKQUOTE>
<TT>m/(dog|cat)/;</TT>
</BLOCKQUOTE>
<BLOCKQUOTE>
However, this will affect something called <I>pattern memory</I>, which you'll be learning about in the section, "Example: Pattern Memory," later in the chapter.
</BLOCKQUOTE>
</TD></TR>
</TABLE>
</CENTER>
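<P>
The sketch below illustrates alternation with and without parentheses. The sample strings are hypothetical. The last two tests show one reason grouping matters: without parentheses, an anchor applies only to the alternative it touches.

```perl
use strict;
use warnings;

# Either alternative may appear anywhere in the searched string.
my @pets = grep { m/dog|cat/ } ("hotdog stand", "catalog", "bird");

# Parentheses group the alternatives and also remember which one matched.
my $which = ("I have a cat" =~ m/(dog|cat)/) ? $1 : "";

# Ungrouped: "starts with dog" OR "ends with cat".
my $ungrouped = ("dogfish" =~ m/^dog|cat$/) ? 1 : 0;

# Grouped: the whole string must be "dog" or "cat".
my $grouped = ("dogfish" =~ m/^(dog|cat)$/) ? 1 : 0;

print join(", ", @pets), "\n";                      # hotdog stand, catalog
print "which=$which\n";                             # which=cat
print "ungrouped=$ungrouped grouped=$grouped\n";    # ungrouped=1 grouped=0
```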
<P>
<BLOCKQUOTE>
<B>Character Classes:</B> The square brackets are used to create
character classes. A <I>character class</I> is used to match a
specific type of character. For example, you can match any decimal
digit using <TT>m/[0123456789]/;</TT>.
This will match a single character in the range of zero to nine.
You can find more information about character classes in the section,
"Example: Character Classes," later in this chapter.
</BLOCKQUOTE>
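<P>
A short sketch of a character class at work. The test strings are made up, and the range form <TT>[0-9]</TT> is shown only as an equivalent shorthand for the digit list above.

```perl
use strict;
use warnings;

# Keep only the strings that contain at least one decimal digit.
my @with_digit = grep { m/[0123456789]/ } ("abc", "a1c", "42");

# A range inside the brackets names the same set of characters.
my @with_range = grep { m/[0-9]/ } ("abc", "a1c", "42");

print "list:  @with_digit\n";   # a1c 42
print "range: @with_range\n";   # a1c 42
```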
<BLOCKQUOTE>
<B>Symbolic Character Classes:</B> There are several character
classes that are used so frequently that they have a symbolic
representation. The period meta-character stands for a special
character class that matches all characters except for the newline.
The rest are <TT>\d</TT>, <TT>\D</TT>,
<TT>\s</TT>, <TT>\S</TT>,
<TT>\w</TT>, and <TT>\W</TT>.
These are mentioned in Table 10.5 earlier and are discussed in
the section, "Example: Character Classes," later in
this chapter.
</BLOCKQUOTE>
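<P>
The symbolic classes can be tried out with a sketch like the following. The sample line is hypothetical, and the <TT>/g</TT> option is used here only to collect every match in a list.

```perl
use strict;
use warnings;

my $line = "order 66, aisle_7";   # hypothetical sample line

# \d matches one digit at a time.
my @digits = $line =~ m/\d/g;

# \w+ matches runs of word characters (letters, digits, underscore).
my @words = $line =~ m/\w+/g;

# \s+ matches runs of whitespace, which makes it a handy field separator.
my @fields = split /\s+/, $line;

print join("|", @digits), "\n";   # 6|6|7
print join("|", @words),  "\n";   # order|66|aisle_7
print join("|", @fields), "\n";   # order|66,|aisle_7
```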
<BLOCKQUOTE>
<B>Anchors:</B> The caret (<TT>^</TT>)
and the dollar sign meta-characters are used to anchor a pattern
to the beginning and the end of the searched string. The caret
is always the first character in the pattern when used as an anchor.
For example, <TT>m/^one/;</TT> will
only match if the searched string starts with a sequence of characters,
<TT>one</TT>. The dollar sign is always
the last character in the pattern when used as an anchor. For
example, <TT>m/(last|end)$/;</TT>
will match only if the searched string ends with either the character
sequence <TT>last</TT> or the character
sequence <TT>end</TT>. The <TT>\A</TT>
and <TT>\Z</TT> meta-sequences also
are used as pattern anchors for the beginning and end of strings.
</BLOCKQUOTE>
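<P>
Here is a minimal sketch of the anchors, including how the <TT>/m</TT> option changes <TT>^</TT> but not <TT>\A</TT>. The strings are invented for the illustration.

```perl
use strict;
use warnings;

my $starts    = ("one two" =~ m/^one/)        ? 1 : 0;   # 1
my $not_start = ("two one" =~ m/^one/)        ? 1 : 0;   # 0
my $ends      = ("the end" =~ m/(last|end)$/) ? 1 : 0;   # 1

# A two-line string: "one" begins the second line, not the string.
my $text   = "first\none more";
my $no_m   = ($text =~ m/^one/)   ? 1 : 0;   # 0: ^ means start of string
my $with_m = ($text =~ m/^one/m)  ? 1 : 0;   # 1: ^ also means start of line
my $with_A = ($text =~ m/\Aone/m) ? 1 : 0;   # 0: \A ignores /m

print "$starts $not_start $ends $no_m $with_m $with_A\n";   # 1 0 1 0 1 0
```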
<BLOCKQUOTE>
<B>Quantifiers:</B> There are several meta-characters that are
devoted to controlling how many characters are matched. For example,
<TT>m/a{5}/;</TT> means that five
<TT>a</TT> characters must be found
before a true result can be returned. The <TT>*</TT>,
<TT>+</TT>, and <TT>?</TT>
meta-characters and the curly braces are all used as quantifiers.
See the section, "Example: Quantifiers," later in this
chapter for more information.
</BLOCKQUOTE>
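<P>
The quantifiers can be compared side by side in a sketch such as this one. The helper hash <TT>%result</TT> and the test strings are assumptions made for the illustration.

```perl
use strict;
use warnings;

my %result = (
    'aaaaa =~ a{5}'    => ("aaaaa" =~ m/a{5}/)     ? 1 : 0,   # five a's present
    'aaaa =~ a{5}'     => ("aaaa"  =~ m/a{5}/)     ? 1 : 0,   # only four
    'ac =~ ab*c'       => ("ac"    =~ m/ab*c/)     ? 1 : 0,   # * allows zero b's
    'ac =~ ab+c'       => ("ac"    =~ m/ab+c/)     ? 1 : 0,   # + needs one b
    'ac =~ ab?c'       => ("ac"    =~ m/ab?c/)     ? 1 : 0,   # ? allows zero b's
    'abbc =~ ab?c'     => ("abbc"  =~ m/ab?c/)     ? 1 : 0,   # ? allows at most one
    'abbc =~ ab{2,3}c' => ("abbc"  =~ m/ab{2,3}c/) ? 1 : 0,   # two or three b's
);

print "$_: $result{$_}\n" for sort keys %result;
```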
<BLOCKQUOTE>
<B>Pattern Memory:</B> Parentheses are used to store matched values
into buffers for later recall. I like to think of this as a form
of pattern memory. Some programmers call them back-references.
After you use <TT>m/(fish|fowl)/;</TT>
to match a string and a match is found, the variable <TT>$1</TT>
will hold either <TT>fish</TT> or
<TT>fowl</TT> depending on which sequence
was matched. See the section, "Example: Pattern Memory,"
later in this chapter for more information.
</BLOCKQUOTE>
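<P>
A brief sketch of pattern memory follows. The strings and the variable names <TT>$choice</TT>, <TT>$key</TT>, and <TT>$value</TT> are hypothetical.

```perl
use strict;
use warnings;

my $menu = "grilled fish with rice";   # hypothetical sample text

# After a successful match, $1 holds what the first parentheses matched.
my $choice = ($menu =~ m/(fish|fowl)/) ? $1 : "";

# In list context, a match returns the contents of every pair of parentheses.
my ($key, $value) = "color=red" =~ m/(\w+)=(\w+)/;

print "choice=$choice\n";          # choice=fish
print "key=$key value=$value\n";   # key=color value=red
```

Testing the match before reading <TT>$1</TT> matters because a failed match leaves <TT>$1</TT> holding whatever an earlier successful match stored there.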
<BLOCKQUOTE>
<B>Word Boundaries:</B> The <TT>\b</TT>
meta-sequence will match the spot between a space and the first
character of a word or between the last character of a word and
the space. The <TT>\b</TT> will match
at the beginning or end of a string if there are no leading or
trailing spaces. For example, <TT>m/\bfoo/;</TT>
will match <TT>foo</TT> even without
spaces surrounding the word. It also will match <TT>$foo</TT>
because the dollar sign is not considered a word character. The
statement <TT>m/foo\b/;</TT> will
match <TT>foo</TT> but not <TT>foobar</TT>,
and the statement <TT>m/\bwiz/;</TT>
will match <TT>wizard</TT> but not
<TT>geewiz</TT>. See the section,
"Example: Character Classes," later in this chapter
for more information about word boundaries.
</BLOCKQUOTE>
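<P>
The word-boundary cases described above can be checked with a sketch like this one. Single quotes are used around <TT>$foo</TT> so that Perl treats it as text and not as a variable.

```perl
use strict;
use warnings;

my $bare     = ("foo"    =~ m/\bfoo/) ? 1 : 0;   # 1: boundary at start of string
my $dollar   = ('$foo'   =~ m/\bfoo/) ? 1 : 0;   # 1: "$" is not a word character
my $foobar   = ("foobar" =~ m/foo\b/) ? 1 : 0;   # 0: "b" follows, so no boundary
my $wizard   = ("wizard" =~ m/\bwiz/) ? 1 : 0;   # 1: "wiz" starts the word
my $geewiz   = ("geewiz" =~ m/\bwiz/) ? 1 : 0;   # 0: "wiz" is inside the word

print "$bare $dollar $foobar $wizard $geewiz\n";   # 1 1 0 1 0
```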