</PRE>

</BLOCKQUOTE>

<P>

This program displays the following:

<BLOCKQUOTE>

<PRE>

String has root.

$scalar = The tree has many leaves

</PRE>

</BLOCKQUOTE>

<P>

The left operand of the binding operator is the string to be searched,

modified, or transformed; the right operand is the regular expression

operator to be evaluated. The complementary binding operator, <TT>!~</TT>,

is meaningful mainly with the match operator. If you use it with the

substitution or translation operator, Perl still performs the operation

and merely negates the result; when that result is discarded, you

get the following warning if you're using the <TT>-w</TT>

command-line option to run Perl:

<BLOCKQUOTE>

<PRE>

Useless use of not in void context at test.pl line 4.

</PRE>

</BLOCKQUOTE>
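<P>

The following sketch is not part of the original example. It assumes Perl 5 with <TT>strict</TT> and <TT>warnings</TT> enabled, and the variable name <TT>$not_replaced</TT> is hypothetical. It shows that combining <TT>!~</TT> with <TT>s///</TT> still changes the string and only negates the returned value; because the value is stored, the statement is no longer in void context and no warning is issued.

```perl
use strict;
use warnings;

# Hypothetical sample string, reused from the example above.
my $scalar = "The root has many leaves";

# !~ binds like =~ but negates the result. With s/// the substitution
# still happens; s/// returns 1 (one replacement), which !~ turns into false.
# Assigning the result means it is not used in void context,
# so -w has nothing to warn about.
my $not_replaced = ($scalar !~ s/root/tree/);

print "\$scalar = $scalar\n";
print "negated result is ", ($not_replaced ? "true" : "false"), "\n";
```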

<P>

You can see that <TT>!~</TT> is

the opposite of <TT>=~</TT> by using it in place

of <TT>=~</TT> in the match from the previous example:

<BLOCKQUOTE>

<PRE>

$scalar = &quot;The root has many leaves&quot;;

print(&quot;String has root.\n&quot;) if $scalar !~ m/root/;

$scalar =~ s/root/tree/;

$scalar =~ tr/h/H/;

print(&quot;\$scalar = $scalar\n&quot;);

</PRE>

</BLOCKQUOTE>

<P>

This program displays the following:

<BLOCKQUOTE>

<PRE>

$scalar = The tree has many leaves

</PRE>

</BLOCKQUOTE>

<P>

The first <TT>print</TT> statement is not executed: <TT>$scalar</TT> does contain

<TT>root</TT>, so the complementary binding operator returns false.

<H2><A NAME="HowtoCreatePatterns"><FONT SIZE=5 COLOR=#FF0000>

How to Create Patterns</FONT></A></H2>

<P>

So far in this chapter, you've read about the different operators

used with regular expressions, and you've seen how to match simple

sequences of characters. Now we'll look at the wide array of meta-characters

that are used to harness the full power of regular expressions.

<I>Meta-characters</I> are characters that have an additional

meaning above and beyond their literal meaning. For example, the

period character can have two meanings in a pattern. First, it

can match a period character in the searched string; this

is its <I>literal meaning</I>. Second, it can match

<I>any</I> character in the searched string except for the newline

character; this is its <I>meta-meaning</I>.

<P>

When creating patterns, the meta-meaning is always the default.

If you want to match the literal character, you need

to prefix the meta-character with a backslash. You might recall

that the backslash is also used to create escape sequences.

<P>

For more information about escape sequences, see <A HREF="ch2.htm" tppabs="http://cheminf.nankai.edu.cn/~eb~/Perl%205%20By%20Example/ch2.htm" >Chapter 2</A>, &quot;Example:

Double Quoted Strings.&quot;
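<P>

As a minimal sketch of the two meanings of the period (the sample strings and hash names below are hypothetical), an unescaped period matches any character except a newline, while <TT>\.</TT> matches only a literal period:

```perl
use strict;
use warnings;

# Hypothetical sample strings for illustration.
my @strings = ("a.c", "abc", "a\nc");
my (%meta, %literal);

for my $s (@strings) {
    # Unescaped period: matches any character except a newline.
    $meta{$s}    = ($s =~ m/a.c/)  ? 1 : 0;
    # Escaped period: matches only a literal "." character.
    $literal{$s} = ($s =~ m/a\.c/) ? 1 : 0;

    (my $shown = $s) =~ s/\n/\\n/;    # make the newline visible when printed
    print "$shown: meta=$meta{$s} literal=$literal{$s}\n";
}
```

Only <TT>"a.c"</TT> matches both patterns; <TT>"abc"</TT> matches only the meta-meaning, and <TT>"a\nc"</TT> matches neither because the period does not match a newline.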

<P>

Patterns can have many different components. These components

all combine to provide you with the power to match any type of

string. The following list of components will give you a good

idea of the variety of ways that patterns can be created. The

section &quot;Pattern Examples&quot; later in this chapter shows

many examples of these rules in action.

<BLOCKQUOTE>

<B>Variable Interpolation:</B> Any variable is interpolated, and

the essentially new pattern then is evaluated as a regular expression.

Remember that only one level of interpolation is done. This means

that if the value of the variable includes, for example, <TT>$scalar</TT>

as a string value, then <TT>$scalar</TT>

will not be interpolated. In addition, back-quotes do not interpolate

within double-quotes, and single-quotes do not stop interpolation

of variables when used within double-quotes.

</BLOCKQUOTE>
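<P>

Here is a small sketch of interpolation in a pattern; the variables are hypothetical. The value of <TT>$pattern</TT> is compiled as a regular expression, so its periods are meta-characters unless <TT>\Q</TT>...<TT>\E</TT> (see Table 10.5) quotes them. The last two matches illustrate the single level of interpolation: the text <TT>$word</TT> stored in a variable is not expanded again.

```perl
use strict;
use warnings;

my $pattern = "r..t";   # hypothetical value that contains meta-characters

# The variable is interpolated first and the result is compiled as a
# regular expression, so each period acts as a meta-character.
my $meta_hit = ("rout" =~ m/$pattern/) ? 1 : 0;

# \Q...\E quotes the interpolated value, so the periods match literally.
my $quoted_hit = ("rout" =~ m/\Q$pattern\E/) ? 1 : 0;

# Only one level of interpolation: $word exists, but the text '$word'
# held in $indirect is not replaced by "root" a second time.
my $word        = "root";
my $indirect    = '$word';
my $literal_hit = ('$word' =~ m/\Q$indirect\E/) ? 1 : 0;
my $root_hit    = ("root"  =~ m/\Q$indirect\E/) ? 1 : 0;

print "meta=$meta_hit quoted=$quoted_hit literal=$literal_hit root=$root_hit\n";
```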

<BLOCKQUOTE>

<B>Self-Matching Characters:</B> Any character will match itself

unless it is a meta-character or either <TT>$</TT>

or <TT>@</TT>.

The meta-characters are listed in Table 10.5; the other two characters

begin variable names, which Perl interpolates into the pattern. You can use

the backslash character to force Perl to match the literal meaning

of any character. For example, <TT>m/a/</TT>

will return true if the letter <TT>a</TT>

is in the <TT>$_</TT> variable. And

<TT>m/\$/</TT> will return true if

the character <TT>$</TT> is in the

<TT>$_</TT> variable.<BR>

</BLOCKQUOTE>
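<P>

A minimal sketch of both examples, using hypothetical sample strings: <TT>m/a/</TT> and <TT>m/\$/</TT> are written without a binding operator, so they search <TT>$_</TT>, which the loop sets to each string in turn.

```perl
use strict;
use warnings;

my @with_dollar;
# Hypothetical sample strings; the loop aliases each one to $_.
for ("costs 5 dollars", 'costs $5', "banana") {
    # With no binding operator, m// searches $_.
    my $has_a      = m/a/  ? "yes" : "no";   # "a" simply matches itself
    my $has_dollar = m/\$/ ? "yes" : "no";   # \$ matches a literal dollar sign
    push @with_dollar, $_ if m/\$/;
    print "$_: a=$has_a, dollar=$has_dollar\n";
}
```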

<P>

<CENTER><B>Table 10.5&nbsp;&nbsp;Regular Expression Meta-Characters,

Meta-Brackets, and Meta-Sequences</B></CENTER>

<p>

<CENTER>

<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>

<TR><TD WIDTH=133><CENTER><I>Meta-Character</I></CENTER></TD>

<TD WIDTH=457><I>Description</I></TD></TR>

<TR><TD WIDTH=133><CENTER>^</CENTER></TD><TD WIDTH=457>This meta-character, the caret, matches the beginning of a string or, if the <TT>/m</TT> option is used, the beginning of a line. It is one of two pattern anchors; the other anchor is the 
<TT>$</TT>.

</TD></TR>

<TR><TD WIDTH=133><CENTER>.</CENTER></TD><TD WIDTH=457>This meta-character matches any character except for the newline unless the <TT>/s</TT> option is specified. If the <TT>/s</TT> option is specified, then the newline also is matched.

</TD></TR>

<TR><TD WIDTH=133><CENTER>$</CENTER></TD><TD WIDTH=457>This meta-character matches the end of a string (or the position just before a newline at the end of the string) or, if the <TT>/m</TT> option is used, the end of a line. It is one of two pattern anchors; the other anchor is the <TT>^</TT>.

</TD></TR>

<TR><TD WIDTH=133><CENTER>|</CENTER></TD><TD WIDTH=457>This meta-character, called <I>alternation</I>, lets you specify two values that can cause the match to succeed. For instance, <TT>m/a|b/</TT> means that the <TT>$_</TT> variable must contain the 
<TT>&quot;a&quot;</TT> or <TT>&quot;b&quot;</TT> character for the match to succeed.

</TD></TR>

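<TR><TD WIDTH=133><CENTER>*</CENTER></TD><TD WIDTH=457>This meta-character indicates that the &quot;thing&quot; immediately to the left should be matched 0 or more times in order to be evaluated as true.

</TD></TR>

<TR><TD WIDTH=133><CENTER>+</CENTER></TD><TD WIDTH=457>This meta-character indicates that the &quot;thing&quot; immediately to the left should be matched 1 or more times in order to be evaluated as true.

</TD></TR>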

<TR><TD WIDTH=133><CENTER>?</CENTER></TD><TD WIDTH=457>This meta-character indicates that the &quot;thing&quot; immediately to the left should be matched 0 or 1 times in order to be evaluated as true. When it immediately follows the <TT>+</TT>, 
<TT>*</TT>, <TT>?</TT>, or <TT>{n,m}</TT> quantifiers, it makes the quantifier non-greedy, so that it matches the smallest possible string.

</TD></TR>

<TR><TD WIDTH=133><CENTER><I>Meta-Brackets</I></CENTER></TD><TD WIDTH=457><I>Description</I>

</TD></TR>

<TR><TD WIDTH=133><CENTER>()</CENTER></TD><TD WIDTH=457>The parentheses let you affect the order of pattern evaluation and act as a form of pattern memory. See the section &quot;Pattern Memory&quot; later in this chapter for more information.

</TD></TR>

<TR><TD WIDTH=133><CENTER>(?...)</CENTER></TD><TD WIDTH=457>If a question mark immediately follows the left parenthesis, it indicates that an extended mode component is being specified. See the section, &quot;Example: Extension Syntax,&quot; later in this 
chapter for more information.

</TD></TR>

<TR><TD WIDTH=133><CENTER>{n,m}</CENTER></TD><TD WIDTH=457>The curly braces specify how many times the &quot;thing&quot; immediately to the left should be matched. <TT>{n}</TT> means that it should be matched exactly n times. <TT>{n,}</TT> means it must 
be matched at least n times. <TT>{n,m}</TT> (written without spaces) means that it must be matched at least n times and not more than m times.

</TD></TR>

<TR><TD WIDTH=133><CENTER>[]</CENTER></TD><TD WIDTH=457>The square brackets let you create a character class. For instance, <TT>m/[abc]/</TT> will evaluate to true if any of <TT>&quot;a&quot;</TT>, <TT>&quot;b&quot;</TT>, or <TT>&quot;c&quot;</TT> is 
contained in <TT>$_</TT>. For single characters, the square brackets are a more readable alternative to the alternation meta-character.

</TD></TR>

<TR><TD WIDTH=133><CENTER><I>Meta-Sequences</I></CENTER></TD>

<TD WIDTH=457><I>Description</I></TD></TR>

<TR><TD WIDTH=133><CENTER>\</CENTER></TD><TD WIDTH=457>This meta-character &quot;escapes&quot; the following character. This means that any special meaning normally attached to that character is ignored. For instance, if you need to include a dollar sign 
in a pattern, you must use <TT>\$</TT> to avoid Perl's variable interpolation. Use <TT>\\</TT> to specify the backslash character in your pattern.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\0nnn</CENTER></TD><TD WIDTH=457>Any Octal byte.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\a</CENTER></TD><TD WIDTH=457>Alarm.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\A</CENTER></TD><TD WIDTH=457>This meta-sequence represents the beginning of the string. Its meaning is not affected by the <TT>/m</TT> option.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\b</CENTER></TD><TD WIDTH=457>This meta-sequence represents the backspace character inside a character class; otherwise, it represents a <I>word boundary</I>. A word boundary is the spot between word (<TT>\w</TT>) and 
non-word (<TT>\W</TT>) characters. Perl treats the imaginary characters off both ends of the string as matching <TT>\W</TT>.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\B</CENTER></TD><TD WIDTH=457>Match a non-word boundary.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\cn</CENTER></TD><TD WIDTH=457>Any control character.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\d</CENTER></TD><TD WIDTH=457>Match a single digit character.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\D</CENTER></TD><TD WIDTH=457>Match a single non-digit character.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\e</CENTER></TD><TD WIDTH=457>Escape.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\E</CENTER></TD><TD WIDTH=457>Terminate a <TT>\L</TT>, <TT>\U</TT>, or <TT>\Q</TT> sequence.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\f</CENTER></TD><TD WIDTH=457>Form Feed.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\G</CENTER></TD><TD WIDTH=457>Match only where the previous <TT>m//g</TT> left off.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\l</CENTER></TD><TD WIDTH=457>Change the next character to lowercase.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\L</CENTER></TD><TD WIDTH=457>Change the following characters to lowercase until a <TT>\E</TT> sequence is encountered.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\n</CENTER></TD><TD WIDTH=457>Newline.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\Q</CENTER></TD><TD WIDTH=457>Quote regular expression meta-characters literally until the <TT>\E</TT> sequence is encountered.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\r</CENTER></TD><TD WIDTH=457>Carriage Return.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\s</CENTER></TD><TD WIDTH=457>Match a single whitespace character.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\S</CENTER></TD><TD WIDTH=457>Match a single non-whitespace character.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\t</CENTER></TD><TD WIDTH=457>Tab.</TD>

</TR>

<TR><TD WIDTH=133><CENTER>\u</CENTER></TD><TD WIDTH=457>Change the next character to uppercase.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\U</CENTER></TD><TD WIDTH=457>Change the following characters to uppercase until a <TT>\E</TT> sequence is encountered.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\v</CENTER></TD><TD WIDTH=457>Match a single vertical whitespace character, such as a newline or vertical tab (Perl 5.10 and later).

</TD></TR>

<TR><TD WIDTH=133><CENTER>\w</CENTER></TD><TD WIDTH=457>Match a single word character. Word characters are the alphanumeric and underscore characters.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\W</CENTER></TD><TD WIDTH=457>Match a single non-word character.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\xnn</CENTER></TD><TD WIDTH=457>Any Hexadecimal byte.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\Z</CENTER></TD><TD WIDTH=457>This meta-sequence represents the end of the string, or the position just before a newline at the end of the string. Its meaning is not affected by the <TT>/m</TT> option.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\$</CENTER></TD><TD WIDTH=457>Dollar Sign.

</TD></TR>

<TR><TD WIDTH=133><CENTER>\@</CENTER></TD><TD WIDTH=457>At sign.

</TD></TR>

</TABLE>

</CENTER>
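<P>

To see several entries from Table 10.5 working together, here is a short sketch; the sample line and variable names are hypothetical. The <TT>() =</TT> assignment in the last match is a common Perl idiom for counting the matches returned by <TT>m//g</TT> in list context.

```perl
use strict;
use warnings;

# Hypothetical sample line, ending in a newline.
my $line = "Order 66 shipped 2 boxes\n";

# \d+ : one or more digits; the parentheses save the match.
my ($first_number) = $line =~ m/(\d+)/;

# \b...\b : "ship" only as a whole word, so "shipped" does not match.
my $whole_ship = ($line =~ m/\bship\b/) ? 1 : 0;

# \A and \Z anchor to the start and end; \Z also allows a final newline.
my $anchored = ($line =~ m/\AOrder.*boxes\Z/) ? 1 : 0;

# {n,m} : two or three word characters, greedy versus non-greedy (?).
my ($greedy) = "abcdef" =~ m/(\w{2,3})/;
my ($lazy)   = "abcdef" =~ m/(\w{2,3}?)/;

# \s : count whitespace characters (four spaces plus the newline).
my $spaces = () = $line =~ m/\s/g;

print "$first_number $whole_ship $anchored $greedy $lazy $spaces\n";
```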

<P>

<BLOCKQUOTE>

<B>Character Sequences:</B> A sequence of characters will match

the identical sequence in the searched string. The characters

need to be in the same order in both the pattern and the searched

string for the match to be true. For example, <TT>m/abc/;</TT>

will match <TT>&quot;abc&quot;</TT>

but not <TT>&quot;cab&quot;</TT> or

<TT>&quot;bca&quot;</TT>. If any character

in the sequence is a meta-character, you need to use the backslash

to match its literal value.

</BLOCKQUOTE>
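<P>

A minimal sketch of the ordering rule, with hypothetical sample strings. The last match escapes the <TT>+</TT> meta-character so that it matches a literal plus sign.

```perl
use strict;
use warnings;

my %result;
# Hypothetical sample strings.
for my $s ("abc", "cab", "bca", "xxabcxx") {
    # The three characters must appear together and in this order.
    $result{$s} = ($s =~ m/abc/) ? 1 : 0;
}

# "+" is a meta-character, so it is escaped to match a literal plus sign.
my $plus = ("1+1=2" =~ m/1\+1/) ? 1 : 0;

print join(", ", map { "$_=$result{$_}" } sort keys %result), "; plus=$plus\n";
```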

<BLOCKQUOTE>

<B>Alternation:</B> The <I>alternation </I>meta-character (<TT>|</TT>)

will let you match more than one possible string. For example,

<TT>m/a|b/;</TT> will match if either

the <TT>&quot;a&quot;</TT> character

or the <TT>&quot;b&quot;</TT> character

is in the searched string. You can use sequences of more than one

character with alternation. For example, <TT>m/dog|cat/;</TT>

will match if either of the strings <TT>&quot;dog&quot;</TT>

or <TT>&quot;cat&quot;</TT> is in

the searched string.<BR>

</BLOCKQUOTE>
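<P>

Here is a small sketch of alternation with multi-character sequences, using hypothetical strings. Note that <TT>"hotdog stand"</TT> matches because <TT>dog</TT> may appear anywhere in the searched string.

```perl
use strict;
use warnings;

# Hypothetical sample strings.
my @pets = ("hotdog stand", "a cat nap", "bird seed");
my @matched;

for (@pets) {
    # Succeeds if either alternative appears anywhere in $_.
    push @matched, $_ if m/dog|cat/;
}

print join("\n", @matched), "\n";
```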

<p>

<CENTER>

<TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>

<TR><TD><B>Tip</B> </TD></TR>

<TR><TD>

<BLOCKQUOTE>

Some programmers like to enclose the alternation sequence inside parentheses to help indicate where the sequence begins and ends.</BLOCKQUOTE>

<BLOCKQUOTE>

<TT>m/(dog|cat)/;</TT>

</BLOCKQUOTE>

<BLOCKQUOTE>

However, this will affect something called <I>pattern memory</I>, which you'll be learning about in the section, &quot;Example: Pattern Memory,&quot; later in the chapter.

</BLOCKQUOTE>



</TD></TR>

</TABLE>

</CENTER>

<P>

<BLOCKQUOTE>

<B>Character Classes:</B> The square brackets are used to create

character classes. A <I>character class</I> is used to match a

specific type of character. For example, you can match any decimal

digit using <TT>m/[0123456789]/;</TT>.

This will match a single character in the range of zero to nine.

You can find more information about character classes in the section,

&quot;Example: Character Classes,&quot; later in this chapter.

</BLOCKQUOTE>
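<P>

A minimal sketch of the digit class described above; the input strings are hypothetical. Because a character class matches exactly one character, the capture holds only the first digit found.

```perl
use strict;
use warnings;

# Hypothetical sample strings.
my @inputs = ("room 101", "no digits here");

# [0123456789] matches any single decimal digit.
my @with_digit = grep { m/[0123456789]/ } @inputs;

# A class matches exactly one character, so only one digit is captured.
my ($first) = "room 101" =~ m/([0123456789])/;

print "lines with a digit: @with_digit\n";
print "first digit: $first\n";
```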

<BLOCKQUOTE>

<B>Symbolic Character Classes:</B> There are several character

classes that are used so frequently that they have a symbolic

representation. The period meta-character stands for a special

character class that matches all characters except for the newline.

The rest are <TT>\d</TT>, <TT>\D</TT>,

<TT>\s</TT>, <TT>\S</TT>,

<TT>\w</TT>, and <TT>\W</TT>.

These are mentioned in Table 10.5 earlier and are discussed in

the section, &quot;Example: Character Classes,&quot; later in

this chapter.

</BLOCKQUOTE>
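<P>

The sketch below counts how many characters of a hypothetical 12-character string each symbolic class matches. Each class and its complement together cover every character, while the period skips the trailing newline.

```perl
use strict;
use warnings;

# Hypothetical sample text: 12 characters including the final newline.
my $text = "id_42 = x7;\n";

my %class = (
    '\d' => qr/\d/, '\D' => qr/\D/,
    '\s' => qr/\s/, '\S' => qr/\S/,
    '\w' => qr/\w/, '\W' => qr/\W/,
    '.'  => qr/./,
);

my %count;
for my $name (sort keys %class) {
    my $re = $class{$name};
    # Assigning the list of matches to () and then to a scalar yields the count.
    $count{$name} = () = $text =~ m/$re/g;
    print "$name matches $count{$name} character(s)\n";
}
```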

<BLOCKQUOTE>

<B>Anchors:</B> The caret (<TT>^</TT>)

and the dollar sign meta-characters are used to anchor a pattern

to the beginning and the end of the searched string. The caret

is always the first character in the pattern when used as an anchor.

For example, <TT>m/^one/;</TT> will

only match if the searched string starts with the sequence of characters

<TT>one</TT>. The dollar sign is always

the last character in the pattern when used as an anchor. For

example, <TT>m/(last|end)$/;</TT>

will match only if the searched string ends with either the character

sequence <TT>last</TT> or the character

sequence <TT>end</TT>. The <TT>\A</TT>

and <TT>\Z</TT> meta-sequences also

are used as pattern anchors for the beginning and end of strings.

</BLOCKQUOTE>
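<P>

A short sketch of both anchor examples, using hypothetical strings. It also shows that <TT>$</TT>, like <TT>\Z</TT>, matches just before a newline at the very end of the string.

```perl
use strict;
use warnings;

# Hypothetical sample strings; note the trailing newline on "the end\n".
my @lines = ("one more time", "someone", "the last", "the end\n", "endless");
my (@starts, @ends);

for my $s (@lines) {
    push @starts, $s if $s =~ m/^one/;          # must begin with "one"
    push @ends,   $s if $s =~ m/(last|end)$/;   # must end with "last" or "end"
}

# Like $, \Z also matches just before a newline at the very end.
my $z_hit = ("the end\n" =~ m/end\Z/) ? 1 : 0;

print scalar(@starts), " string(s) start with one; ",
      scalar(@ends), " end with last or end; z_hit=$z_hit\n";
```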

<BLOCKQUOTE>

<B>Quantifiers:</B> Several meta-characters control how many

times the item immediately to their left must be matched. For example,

<TT>m/a{5}/;</TT> means that five

<TT>a</TT> characters must be found

before a true result can be returned. The <TT>*</TT>,

<TT>+</TT>, and <TT>?</TT>

meta-characters and the curly braces are all used as quantifiers.

See the section, &quot;Example: Quantifiers,&quot; later in this

chapter for more information.

</BLOCKQUOTE>
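<P>

As a minimal sketch, the following tries each quantifier against one hypothetical string; the hash keys are just labels for the patterns. The final capture shows that <TT>+</TT> is greedy and takes every <TT>a</TT> it can.

```perl
use strict;
use warnings;

# Hypothetical sample string: "b", five "a" characters, then "d".
my $s = "baaaaad";

my %hit;
$hit{'a{5}'} = ($s =~ m/a{5}/) ? 1 : 0;   # five a's in a row
$hit{'a{6}'} = ($s =~ m/a{6}/) ? 1 : 0;   # six in a row: not present
$hit{'bx*a'} = ($s =~ m/bx*a/) ? 1 : 0;   # * allows zero x characters
$hit{'bx+a'} = ($s =~ m/bx+a/) ? 1 : 0;   # + needs at least one x
$hit{'ba?d'} = ($s =~ m/ba?d/) ? 1 : 0;   # ? allows zero or one a between b and d

my ($run) = $s =~ m/(a+)/;                # + is greedy: takes every a it can

for my $p (sort keys %hit) {
    print "m/$p/: $hit{$p}\n";
}
print "longest run of a: $run\n";
```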

<BLOCKQUOTE>

<B>Pattern Memory:</B> Parentheses are used to store matched values

into buffers for later recall. I like to think of this as a form

of pattern memory. Some programmers call them back-references.

After you use <TT>m/(fish|fowl)/;</TT>

to match a string and a match is found, the variable <TT>$1</TT>

will hold either <TT>fish</TT> or

<TT>fowl</TT> depending on which sequence

was matched. See the section, &quot;Example: Pattern Memory,&quot;

later in this chapter for more information.

</BLOCKQUOTE>
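<P>

A minimal sketch of the <TT>(fish|fowl)</TT> example; the sample strings are hypothetical. After each successful match, <TT>$1</TT> holds whichever alternative was found.

```perl
use strict;
use warnings;

my @remembered;
# Hypothetical sample strings; the loop aliases each one to $_.
for ("fried fowl", "fish and chips", "beef stew") {
    # The parentheses store whichever alternative matched in $1.
    if (m/(fish|fowl)/) {
        push @remembered, $1;
    }
}
print "@remembered\n";    # prints: fowl fish
```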

<BLOCKQUOTE>

<B>Word Boundaries:</B> The <TT>\b</TT>

meta-sequence matches the spot between a word character and a non-word

character, such as the spot between a space and the first character of a word

or between the last character of a word and the following space. The <TT>\b</TT> also matches

at the beginning or end of a string if there are no leading or

trailing non-word characters. For example, <TT>m/\bfoo/;</TT>

will match <TT>foo</TT> even without

spaces surrounding the word. It also will match <TT>$foo</TT>

because the dollar sign is not considered a word character. The

statement <TT>m/foo\b/;</TT> will

match <TT>foo</TT> but not <TT>foobar</TT>,

and the statement <TT>m/\bwiz/;</TT>

will match <TT>wizard</TT> but not

<TT>geewiz</TT>. See the section,

&quot;Example: Character Classes,&quot; later in this chapter

for more information about word boundaries.

</BLOCKQUOTE>
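<P>

The following sketch runs each word-boundary example from the paragraph above; the test-case table is a hypothetical structure for illustration. Each pattern is stored as a string and interpolated into <TT>m//</TT>.

```perl
use strict;
use warnings;

# Hypothetical test cases: each pairs a string with a pattern.
my @cases = (
    [ 'foo',    '\bfoo' ],   # boundaries at the start and end of the string
    [ '$foo',   '\bfoo' ],   # "$" is not a word character
    [ 'foo',    'foo\b' ],
    [ 'foobar', 'foo\b' ],   # "b" follows "foo", so there is no boundary
    [ 'wizard', '\bwiz' ],
    [ 'geewiz', '\bwiz' ],   # "e" precedes "wiz", so there is no boundary
);

my @results;
for my $case (@cases) {
    my ($string, $pattern) = @$case;
    push @results, ($string =~ m/$pattern/) ? 1 : 0;
    printf "%-7s m/%s/ %s\n", $string, $pattern,
        $results[-1] ? "matches" : "does not match";
}
```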
