</PRE></BLOCKQUOTE><P>will display<BLOCKQUOTE><PRE>AAA</PRE></BLOCKQUOTE>
<P>You can use as many buffers as you need. Each time you add a set of parentheses, another buffer is used. If you want to find all the words in the string, you need to use the /g match option. To find all the words, you can use a loop statement that loops until the match operator returns false.
<BLOCKQUOTE><PRE>
$_ = "AAA BBB ccC";
while (m/(\w+)/g) {
    print("$1\n");
}
</PRE></BLOCKQUOTE>
<P>The program will display
<BLOCKQUOTE><PRE>
AAA
BBB
ccC
</PRE></BLOCKQUOTE>
<P>If looping through the matches is not the right approach for your needs, perhaps you need to create an array consisting of the matches.
<BLOCKQUOTE><PRE>
$_ = "AAA BBB ccC";
@matches = m/(\w+)/g;
print("@matches\n");
</PRE></BLOCKQUOTE>
<P>The program will display
<BLOCKQUOTE><PRE>
AAA BBB ccC
</PRE></BLOCKQUOTE>
<P>Perl also has a few special variables to help you know what matched and what did not. These variables occasionally will save you from having to add parentheses to find information.
<BR><p><CENTER><TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD WIDTH=61><BLOCKQUOTE><CENTER><TT><B><FONT FACE="Courier">$+</FONT></B></TT></CENTER></BLOCKQUOTE></TD><TD WIDTH=529><BLOCKQUOTE>This variable is assigned the text matched by the last set of parentheses that matched.</BLOCKQUOTE></TD></TR>
<TR><TD WIDTH=61><BLOCKQUOTE><CENTER><TT><B><FONT FACE="Courier">$&</FONT></B></TT></CENTER></BLOCKQUOTE></TD><TD WIDTH=529><BLOCKQUOTE>This variable is assigned the value of the entire matched string. 
If the match is not successful, then <TT>$&</TT> retains its value from the last successful match.</BLOCKQUOTE></TD></TR>
<TR><TD WIDTH=61><BLOCKQUOTE><CENTER><TT><B><FONT FACE="Courier">$`</FONT></B></TT></CENTER></BLOCKQUOTE></TD><TD WIDTH=529><BLOCKQUOTE>This variable is assigned everything in the searched string that is before the matched string.</BLOCKQUOTE></TD></TR>
<TR><TD WIDTH=61><BLOCKQUOTE><CENTER><TT><B><FONT FACE="Courier">$'</FONT></B></TT></CENTER></BLOCKQUOTE></TD><TD WIDTH=529><BLOCKQUOTE>This variable is assigned everything in the searched string that is after the matched string.</BLOCKQUOTE></TD></TR>
</TABLE></CENTER><P><p><CENTER><TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%><TR><TD><B>Tip</B></TD></TR><TR><TD><BLOCKQUOTE>If you need to save the value of the matched strings stored in the pattern memory, make sure to assign them to other variables. Pattern memory is local to the enclosing block and lasts only until another successful match is done.</BLOCKQUOTE></TD></TR></TABLE></CENTER><P>
<H3><A NAME="ExamplePatternPrecedeNCe">Example: Pattern Precedence</A></H3>
<P>Pattern components have an order of precedence just as operators do. If you see the following pattern:
<BLOCKQUOTE><PRE>
m/a|b+/
</PRE></BLOCKQUOTE>
<P>it's hard to tell if the pattern should be
<BLOCKQUOTE><PRE>
m/(a|b)+/  # match one or more characters, each of which
           # is either the "a" character or the "b"
           # character.
</PRE></BLOCKQUOTE>
<P>or
<BLOCKQUOTE><PRE>
m/a|(b+)/  # match either the "a" character or the "b" character
           # repeated one or more times.
</PRE></BLOCKQUOTE>
<P>The order of precedence shown in Table 10.7 is designed to solve problems like this. By looking at the table, you can see that quantifiers have a higher precedence than alternation. 
Therefore, the second interpretation is correct.<BR><P><CENTER><B>Table 10.7 The Pattern Component Order of Precedence</B></CENTER><p><CENTER><TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD WIDTH=145><CENTER><I>Precedence Level</I></CENTER></TD><TD WIDTH=204><I>Component</I></TD></TR>
<TR><TD WIDTH=145><CENTER>1</CENTER></TD><TD WIDTH=204>Parentheses</TD></TR>
<TR><TD WIDTH=145><CENTER>2</CENTER></TD><TD WIDTH=204>Quantifiers</TD></TR>
<TR><TD WIDTH=145><CENTER>3</CENTER></TD><TD WIDTH=204>Sequences and Anchors</TD></TR>
<TR><TD WIDTH=145><CENTER>4</CENTER></TD><TD WIDTH=204>Alternation</TD></TR>
</TABLE></CENTER><P><p><CENTER><TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%><TR><TD><B>Tip</B></TD></TR><TR><TD><BLOCKQUOTE>You can use parentheses to affect the order in which components are evaluated because they have the highest precedence. However, unless you use the <TT>(?:...)</TT> extension syntax, you will be affecting the pattern memory.</BLOCKQUOTE></TD></TR></TABLE></CENTER><P>
<H3><A NAME="ExampleExtensionSyntax">Example: Extension Syntax</A></H3>
<P>The regular expression extensions are a way to significantly add to the power of patterns without adding a lot of meta-characters to the proliferation that already exists. By using the basic <TT>(?...)</TT> notation, the regular expression capabilities can be greatly extended.
<P>At this time, Perl recognizes five extensions. These vary widely in functionality, from adding comments to setting options. Table 10.8 lists the extensions and gives a short description of each.
<BR><P><CENTER><B>Table 10.8 Five Extension Components</B></CENTER><p><CENTER><TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%>
<TR><TD WIDTH=127><CENTER><I>Extension</I></CENTER></TD><TD WIDTH=463><I>Description</I></TD></TR>
<TR><TD WIDTH=127><CENTER>(?# TEXT)</CENTER></TD><TD WIDTH=463>This extension lets you add comments to your regular expression. 
The TEXT value is ignored.</TD></TR>
<TR><TD WIDTH=127><CENTER>(?:...)</CENTER></TD><TD WIDTH=463>This extension lets you add parentheses to your regular expression without causing a pattern memory position to be used.</TD></TR>
<TR><TD WIDTH=127><CENTER>(?=...)</CENTER></TD><TD WIDTH=463>This extension lets you match values without including them in the <TT>$&</TT> variable.</TD></TR>
<TR><TD WIDTH=127><CENTER>(?!...)</CENTER></TD><TD WIDTH=463>This extension lets you specify what should not follow your pattern. For instance, <TT>/blue(?!bird)/</TT> means that <TT>"bluebox"</TT> and <TT>"bluesy"</TT> will be matched but not <TT>"bluebird"</TT>.</TD></TR>
<TR><TD WIDTH=127><CENTER>(?sxi)</CENTER></TD><TD WIDTH=463>This extension lets you specify an embedded option in the pattern rather than adding it after the last delimiter. This is useful if you are storing patterns in variables and using variable interpolation to do the matching.</TD></TR>
</TABLE></CENTER><P>
<P>By far the most useful feature of extended mode, in my opinion, is the ability to add comments directly inside your patterns. For example, would you rather see a pattern that looks like this:
<BLOCKQUOTE><PRE>
# Match a string with two words. $1 will be the
# first word. $2 will be the second word.
m/^\s*(\w+)\W+(\w+)\s*$/;
</PRE></BLOCKQUOTE>
<P>or one that looks like this:
<BLOCKQUOTE><PRE>
m/
    (?# This pattern will match any string with two)
    (?# and only two words in it. The matched words)
    (?# are placed in pattern memory, first word)
    (?# first, if the match is successful.)
    ^      (?# Anchor this match to the beginning)
           (?# of the string)
    \s*    (?# skip over any whitespace characters)
           (?# use the * because there may be none)
    (\w+)  (?# Match the first word, we know it's)
           (?# the first word because of the anchor)
           (?# above. Place the matched word into)
           (?# pattern memory.)
    \W+    (?# Match at least one non-word)
           (?# character, there may be more than one)
    (\w+)  (?# Match another word, put into pattern)
           (?# memory also.)
    \s*    (?# skip over any whitespace characters)
           (?# use the * because there may be none)
    $      (?# Anchor this match to the end of the)
           (?# string. Because both ^ and $ anchors)
           (?# are present, the entire string will)
           (?# need to match the pattern. A)
           (?# sub-string that fits the pattern will)
           (?# not match.)
/x;
</PRE></BLOCKQUOTE>
<P>Of course, the commented pattern is much longer, but it takes essentially the same amount of time to execute. In addition, it will be much easier to maintain the commented pattern because each component is explained. When you know what each component is doing in relation to the rest of the pattern, it becomes easy to modify its behavior when the need arises.
<P>Extensions also let you change the order of evaluation without affecting pattern memory. For example,
<BLOCKQUOTE><PRE>
m/(?:a|b)+/;
</PRE></BLOCKQUOTE>
<P>will match a run of one or more characters, each of which is either the a character or the b character. The pattern memory will not be affected.
<P>At times, you might like to include a pattern component in your pattern without including it in the <TT>$&</TT> variable that holds the matched string. The technical term for this is a <I>zero-width positive look-ahead assertion</I>. You can use this to ensure that the string following the matched component is correct without affecting the matched value. For example, if you have some data that looks like this:
<BLOCKQUOTE><PRE>
David  Veterinarian 56
Jackie Orthopedist  34
Karen  Veterinarian 28
</PRE></BLOCKQUOTE>
<P>and you want to find all veterinarians and store the value of the first column, you can use a look-ahead assertion. This will do both tasks in one step. For example:
<BLOCKQUOTE><PRE>
while (<>) {
    push(@array, $&) if m/^\w+(?=\s+Vet)/;
}
print("@array\n");
</PRE></BLOCKQUOTE>
<P>This program will display:
<BLOCKQUOTE><PRE>
David Karen
</PRE></BLOCKQUOTE>
<P>Let's look at the pattern with comments added using the extended mode. 
In this case, it doesn't make sense to add comments directly to the pattern because the pattern is part of the <TT>if</TT> statement modifier. Adding comments in that location would make the comments hard to format. So let's use a different tactic.
<BLOCKQUOTE><PRE>
$pattern = '^\w+     (?# Match the first word in the string)
            (?=\s+   (?# Use a look-ahead assertion to match)
                     (?# one or more whitespace characters)
               Vet)  (?# In addition to the whitespace, make)
                     (?# sure that the next column starts)
                     (?# with the character sequence "Vet")
           ';
while (<>) {
    push(@array, $&) if m/$pattern/x;
}
print("@array\n");
</PRE></BLOCKQUOTE>
<P>Here we used a variable to hold the pattern and then used variable interpolation in the pattern with the match operator. You might want to pick a more descr
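<P>The following is a minimal, self-contained sketch of the same tactic, not a replacement for the program above. It makes several assumptions that are not part of the original example: the records are held in a hypothetical <TT>@records</TT> array instead of being read from a file, the pattern is stored in a hypothetical variable called <TT>$vet_name_pattern</TT>, the pattern is compiled with the <TT>qr//</TT> operator (which requires a version of Perl that provides it), and the name is captured in pattern memory rather than read from the whole-match variable.
<BLOCKQUOTE><PRE>
use strict;
use warnings;

# Hypothetical sample records standing in for the input file.
my @records = (
    "David Veterinarian 56",
    "Jackie Orthopedist 34",
    "Karen Veterinarian 28",
);

# Assumption: qr// is available. It compiles the pattern once, and
# the x option allows whitespace and comments inside the pattern.
my $vet_name_pattern = qr/
    ^(\w+)        (?# capture the first word, the name column)
    (?=\s+Vet)    (?# look ahead: next column must start with Vet)
/x;

# Collect the first column of every record that passes the look-ahead.
my @names;
foreach my $record (@records) {
    push(@names, $1) if $record =~ $vet_name_pattern;
}
print("@names\n");
</PRE></BLOCKQUOTE>
<P>With the three sample records, this sketch prints <TT>David Karen</TT>. Because the look-ahead is zero-width, the text it checks is not part of the match; only the first word is captured.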