BBB&quot;</TT>, then <TT>m/\w+/;</TT> would match the <TT>&quot;AAA&quot;</TT> in the string. If <TT>$_</TT> were blank, full of white space, or full of other non-word characters, the match would fail and return a false value.<P>The preceding pattern lets you determine whether <TT>$_</TT> contains a word, but it does not tell you what the word is. To find that out, you need to enclose the matching components inside parentheses. For example:<BLOCKQUOTE><PRE>m/(\w+)/;</PRE></BLOCKQUOTE><P>By doing this, you force Perl to store the matched string in the <TT>$1</TT> variable. The <TT>$1</TT> variable can be thought of as pattern memory.<P>This introduction to pattern components describes most of the details you need to know in order to create your own patterns, or regular expressions. However, some of the components deserve a bit more study. The next few sections look at character classes, quantifiers, pattern memory, pattern precedence, and the extension syntax. The rest of the chapter is then devoted to specific examples of when to use the different components.<H3><A NAME="ExampleCharacterClasses">Example: Character Classes</A></H3><P>A character class defines a type of character. The character class <TT>[0123456789]</TT> defines the class of decimal digits, and <TT>[0-9a-f]</TT> defines the class of lowercase hexadecimal digits. Notice that you can use a dash to define a range of consecutive characters. A character class matches any single character from the set it defines; you don't know in advance which of those characters will be matched. This capability to match non-specific characters is what meta-characters are all about.<P>You can use variable interpolation inside a character class, but you must be careful when doing so. For example,<BLOCKQUOTE><PRE>$_ = &quot;AAABBBccC&quot;;
$charList = &quot;ADE&quot;;
print &quot;matched&quot; if m/[$charList]/;</PRE></BLOCKQUOTE><P>will display<BLOCKQUOTE><PRE>matched</PRE></BLOCKQUOTE><P>This is because the variable interpolation results in the character class <TT>[ADE]</TT>, and <TT>$_</TT> contains an <TT>A</TT>. 
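<P>The following sketch is an illustration added here, not code from the original text. It shows one way to interpolate a list safely when the list might contain characters that are special inside a class, such as a leading caret, a dash, a closing bracket, or a backslash. You wrap the variable in <TT>\Q</TT>...<TT>\E</TT>, which quotes every non-word character so that each one is taken literally. The helper name <TT>contains_any_of</TT> is hypothetical.
<BLOCKQUOTE><PRE>
```perl
use strict;
use warnings;

# Hypothetical helper (not from the original text): returns 1 if $text
# contains any single character listed in $chars, else 0. \Q...\E applies
# quotemeta to the interpolated list, so ^, -, ] and \ are taken literally.
sub contains_any_of {
    my ($text, $chars) = @_;
    return 0 if $chars eq '';    # "[]" would not compile as a class
    return $text =~ m/[\Q$chars\E]/ ? 1 : 0;
}

my $charList = "^A";    # a leading caret would complement the class

# Unquoted: the pattern becomes [^A], so "AAA" has nothing to match.
print "unquoted: ", ("AAA" =~ m/[$charList]/ ? "matched" : "no match"), "\n";

# Quoted: the pattern becomes [\^A], so the A in "AAA" matches.
print "quoted:   ", (contains_any_of("AAA", $charList) ? "matched" : "no match"), "\n";
```
</PRE></BLOCKQUOTE>
<P>With <TT>$charList</TT> set to <TT>&quot;^A&quot;</TT>, the unquoted pattern becomes <TT>[^A]</TT> and finds no match in <TT>&quot;AAA&quot;</TT>. The quoted pattern becomes <TT>[\^A]</TT> and does match.<P>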
If you use the variable as one endpoint of a character range, you need to ensure that you don't mix letters and digits. For example,<BLOCKQUOTE><PRE>$_ = &quot;AAABBBccC&quot;;
$charList = &quot;ADE&quot;;
print &quot;matched&quot; if m/[$charList-9]/;</PRE></BLOCKQUOTE><P>will result in the following error message when executed (the exact wording varies between Perl versions):<BLOCKQUOTE><PRE>/[ADE-9]/: invalid [] range in regexp at test.pl line 4.</PRE></BLOCKQUOTE><P>The interpolated class is <TT>[ADE-9]</TT>, and the range <TT>E-9</TT> runs backward: a range must go from a lower character code to a higher one, and <TT>E</TT> comes after <TT>9</TT>.<P>At times, it's necessary to match any character except those in a given list. You do this by complementing the character class with a caret placed immediately after the opening bracket. For example,<BLOCKQUOTE><PRE>$_ = &quot;AAABBBCCC&quot;;
print &quot;matched&quot; if m/[^ABC]/;</PRE></BLOCKQUOTE><P>will display nothing. This match returns true only if the searched string contains a character other than <TT>A</TT>, <TT>B</TT>, or <TT>C</TT>. Matching is case-sensitive, so with the string <TT>&quot;AAABBBccC&quot;</TT> used in the other examples, this pattern would match the lowercase <TT>c</TT> characters. If you complement a list with just the letter <TT>A</TT>,<BLOCKQUOTE><PRE>$_ = &quot;AAABBBccC&quot;;
print &quot;matched&quot; if m/[^A]/;</PRE></BLOCKQUOTE><P>then the string <TT>&quot;matched&quot;</TT> will be displayed, because the string contains characters other than <TT>A</TT>, such as <TT>B</TT> and <TT>C</TT>.<P>Perl has shortcuts for some frequently used character classes. Here is a list of what I call symbolic character classes:<BR><p><CENTER><TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%><TR><TD WIDTH=67><CENTER><TT><B><FONT FACE="Courier">\w</FONT></B></TT></CENTER></TD><TD WIDTH=523>This symbol matches any alphanumeric character or the underscore character. For ASCII text, it is equivalent to the character class <TT>[a-zA-Z0-9_]</TT>.</TD></TR><TR><TD WIDTH=67><CENTER><TT><B><FONT FACE="Courier">\W</FONT></B></TT></CENTER></TD><TD WIDTH=523>This symbol matches every character that the <TT>\w</TT> symbol does not. In other words, it is the complement of <TT>\w</TT>. 
It is equivalent to <TT>[^a-zA-Z0-9_]</TT>.</TD></TR><TR><TD WIDTH=67><CENTER><TT><B><FONT FACE="Courier">\s</FONT></B></TT></CENTER></TD><TD WIDTH=523>This symbol matches any whitespace character: space, tab, newline, carriage return, or form feed. It is roughly equivalent to <TT>[ \t\n\r\f]</TT>.</TD></TR><TR><TD WIDTH=67><CENTER><TT><B><FONT FACE="Courier">\S</FONT></B></TT></CENTER></TD><TD WIDTH=523>This symbol matches any non-whitespace character. It is roughly equivalent to <TT>[^ \t\n\r\f]</TT>.</TD></TR><TR><TD WIDTH=67><CENTER><TT><B><FONT FACE="Courier">\d</FONT></B></TT></CENTER></TD><TD WIDTH=523>This symbol matches any digit. It is equivalent to <TT>[0-9]</TT>.</TD></TR><TR><TD WIDTH=67><CENTER><TT><B><FONT FACE="Courier">\D</FONT></B></TT></CENTER></TD><TD WIDTH=523>This symbol matches any non-digit character. It is equivalent to <TT>[^0-9]</TT>.</TD></TR></TABLE></CENTER><P>You can use these symbols inside other character classes, but not as endpoints of a range. For example, you can do the following:<BLOCKQUOTE><PRE>$_ = &quot;\tAAA&quot;;
print &quot;matched&quot; if m/[\d\s]/;</PRE></BLOCKQUOTE><P>which will display<BLOCKQUOTE><PRE>matched</PRE></BLOCKQUOTE><P>because the value of <TT>$_</TT> includes the tab character, which <TT>\s</TT> matches.<BR><p><CENTER><TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%><TR><TD><B>Tip</B></TD></TR><TR><TD><BLOCKQUOTE>Most meta-characters, such as <TT>.</TT>, <TT>*</TT>, <TT>+</TT>, <TT>?</TT>, and parentheses, lose their special meaning inside the square brackets of a character class and are treated literally. The main exceptions are the backslash, a caret at the start of the class, a dash between two characters, and the closing bracket. This may be a little confusing at first. In fact, I have a tendency to forget this when evaluating patterns.</BLOCKQUOTE></TD></TR></TABLE></CENTER><P><p><CENTER><TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%><TR><TD><B>Note</B></TD></TR><TR><TD><BLOCKQUOTE>I think that most of the confusion regarding regular expressions lies in the fact that each character of a pattern might have several possible meanings. The caret could be an anchor, it could be a literal caret, or it could be used to complement a character class. 
Therefore, it is vital that you decide which context any given pattern character or symbol is in before assigning a meaning to it.</BLOCKQUOTE></TD></TR></TABLE></CENTER><P><H3><A NAME="ExampleQuantifiers">Example: Quantifiers</A></H3><P>Perl provides several quantifiers that let you specify how many times a given component must be present for the match to succeed. They are especially useful when you don't know in advance how many characters need to be matched. Table 10.6 lists the different quantifiers that can be used.<BR><P><CENTER><B>Table 10.6&nbsp;&nbsp;The Six Types of Quantifiers</B></CENTER><p><CENTER><TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%><TR><TD WIDTH=91><CENTER><I>Quantifier</I></CENTER></TD><TD WIDTH=492><I>Description</I></TD></TR><TR><TD WIDTH=91><CENTER>*</CENTER></TD><TD WIDTH=492>The component must be present zero or more times.</TD></TR><TR><TD WIDTH=91><CENTER>+</CENTER></TD><TD WIDTH=492>The component must be present one or more times.</TD></TR><TR><TD WIDTH=91><CENTER>?</CENTER></TD><TD WIDTH=492>The component must be present zero times or once.</TD></TR><TR><TD WIDTH=91><CENTER>{n}</CENTER></TD><TD WIDTH=492>The component must be present exactly n times.</TD></TR><TR><TD WIDTH=91><CENTER>{n,}</CENTER></TD><TD WIDTH=492>The component must be present at least n times.</TD></TR><TR><TD WIDTH=91><CENTER>{n,m}</CENTER></TD><TD WIDTH=492>The component must be present at least n times and no more than m times.</TD></TR></TABLE></CENTER><P>If you need to match a word whose length is unknown, you need to use the <TT>+</TT> quantifier. You can't use <TT>*</TT> because a zero-length word makes no sense. So the match statement might look like this:<BLOCKQUOTE><PRE>m/\w+/;</PRE></BLOCKQUOTE><P>This pattern will match <TT>&quot;QQQ&quot;</TT> and <TT>&quot;AAAAA&quot;</TT> but not <TT>&quot;&quot;</TT>. Because the pattern is not anchored, it will also match <TT>&quot; BBB&quot;</TT>: the leading space is skipped and <TT>BBB</TT> is matched. If the word must come first in the string, anchor the pattern with a caret, as in <TT>m/^\w+/</TT>, which does not match <TT>&quot; BBB&quot;</TT>. To allow leading white space, which may or may not be present, use the asterisk (<TT>*</TT>) quantifier in conjunction with the <TT>\s</TT> symbolic character class after the anchor:<BLOCKQUOTE><PRE>m/^\s*\w+/;</PRE></BLOCKQUOTE><p><CENTER><TABLE BORDERCOLOR=#000000 BORDER=1 WIDTH=80%><TR><TD><B>Tip</B></TD></TR><TR><TD><BLOCKQUOTE>Be careful when using the <TT>*</TT> quantifier because it can match an empty string, which might not be your intention. The pattern <TT>/b*/</TT> will match any string, even one without any <TT>b</TT> characters.</BLOCKQUOTE></TD></TR></TABLE></CENTER><P>At times, you may need to match an exact number of components. The following match statement will be true only if at least five words are present in the <TT>$_</TT> variable:<BLOCKQUOTE><PRE>$_ = &quot;AA AB AC AD AE&quot;;
m/(\w+\s+){4}\w+/;</PRE></BLOCKQUOTE><P>In this example, the parenthesized group matches at least one word character followed by at least one white space character, and the <TT>{4}</TT> quantifier requires that combination to appear four times in a row. The final <TT>\w+</TT> matches the fifth word, which has no white space after it. Writing <TT>m/(\w+\s+){5}/</TT> instead would fail on this string, because <TT>AE</TT> is not followed by white space.<P>The <TT>*</TT> and <TT>+</TT> quantifiers are greedy: they match as many characters as possible. This may not always be the behavior that you need. You can create non-greedy components by following the quantifier with a <TT>?</TT>.<P>To look at the <TT>*</TT> and <TT>+</TT> quantifiers more closely, consider the following file specification:<BLOCKQUOTE><PRE>$_ = '/user/Jackie/temp/names.dat';</PRE></BLOCKQUOTE><P>The regular expression <TT>.*</TT> will match the entire file specification. This can be seen in the following small program:<BLOCKQUOTE><PRE>$_ = '/user/Jackie/temp/names.dat';
m/.*/;
print $&amp;;</PRE></BLOCKQUOTE><P>This program displays<BLOCKQUOTE><PRE>/user/Jackie/temp/names.dat</PRE></BLOCKQUOTE><P>You can see that the <TT>*</TT> quantifier is greedy: it matched the whole string. If you add the <TT>?</TT> 
modifier to make the <TT>.*</TT> component non-greedy, what do you think the program would display?<BLOCKQUOTE><PRE>$_ = '/user/Jackie/temp/names.dat';
m/.*?/;
print $&amp;;</PRE></BLOCKQUOTE><P>This program displays nothing, because a non-greedy <TT>*</TT> matches as few characters as possible, and the fewest it can match is zero. If we change the <TT>*?</TT> to <TT>+?</TT>, then the program will display<BLOCKQUOTE><PRE>/</PRE></BLOCKQUOTE><P>because a non-greedy <TT>+</TT> must still match at least one character.<P>Next, let's look at the concept of pattern memory, which lets you keep bits of the matched string around after the match is complete.<H3><A NAME="ExamplePatternMemory">Example: Pattern Memory</A></H3><P>Matching arbitrary numbers of characters is fine, but without the capability to find out what was matched, patterns would not be very useful. Perl lets you enclose pattern components inside parentheses in order to store the string that matched those components in pattern memory. You also might hear <I>pattern memory</I> referred to as <I>pattern buffers</I>. This memory persists after the match statement has finished executing (until the next successful match), so that you can assign the matched values to other variables.<P>You saw a simple example of this earlier, right after the component descriptions. That example looked for the first word in a string and stored it in the first buffer, <TT>$1</TT>. The following small program<BLOCKQUOTE><PRE>$_ = &quot;AAA BBB ccC&quot;;
m/(\w+)/;
print(&quot;$1\n&quot;);
