<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta http-equiv="Content-Style-Type" content="text/css">
<meta http-equiv="Content-Language" content="en-us">
<title>Using Regular Expressions</title>
</head>
<BODY BGCOLOR="#FFFFFF">
<FONT size=-1>
<p class="text">Normally, when you search for a sub-string in a string, the match must be exact. If you search for the sub-string "<span class="mono">abc</span>", the string being searched must contain exactly these letters, in the same order, for a match to be found.</p>
<p class="text">This kind of search can be extended to a case-insensitive search, in which the sub-string "<span class="mono">abc</span>" also finds strings such as "<span class="mono">Abc</span>" and "<span class="mono">ABC</span>". Case is ignored, but the sequence of letters must still be exactly the same. Sometimes a case-insensitive search is still not enough. For example, to find any numeric digit with a plain sub-string search, we would have to search for each of the ten digits separately. This is where regular expressions help.</p>
<p class="text"><span class="bold">Regular expressions</span> are text patterns used for string matching. They are strings that mix plain text with special characters that indicate what kind of matching to do. Here is a very brief tutorial on regular expressions before we move on to the code that handles them.</p>
<h2>Regular Expressions Syntax</h2>
<p class="text"><span class="bold">Literals</span><br>
A literal is a character that matches itself. All characters are literals except: ".", "|", "*", "?", "+", "(", ")", "{", "}", "[", "]", "^", "$" and "\". These special characters become literals when preceded by a "\".</p>
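<p class="text">The effect of escaping can be sketched as follows. Python's <span class="mono">re</span> module is used here purely as an assumed stand-in for the engine this document describes; the patterns and test strings are illustrative.</p>
<pre>
```python
import re

# Unescaped, "+" is a repeat operator: "2+2" means one or more "2", then "2".
repeat_reading = re.fullmatch(r"2+2", "222") is not None
repeat_vs_plus_sign = re.fullmatch(r"2+2", "2+2") is not None

# Preceded by a backslash, "+" is a literal plus sign.
literal_reading = re.fullmatch(r"2\+2", "2+2") is not None

# re.escape() adds the backslashes for every special character in a text.
escaped_pattern = re.escape("2+2")
escaped_reading = re.fullmatch(escaped_pattern, "2+2") is not None

print(repeat_reading, repeat_vs_plus_sign, literal_reading, escaped_reading)
# prints: True False True True
```
</pre>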
<p class="text"><span class="bold">Wildcard</span><br>
The dot character "." matches any single character.</p>
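<p class="text">A minimal sketch of the wildcard, assuming Python's <span class="mono">re</span> module in place of the engine described here. One Python-specific detail: its dot does not match a newline unless the <span class="mono">re.DOTALL</span> flag is given, which may not hold for other engines.</p>
<pre>
```python
import re

# The dot stands for exactly one character, whatever that character is.
letter = re.fullmatch(r"a.c", "abc") is not None     # "b" fills the dot
digit = re.fullmatch(r"a.c", "a7c") is not None      # "7" fills the dot
nothing = re.fullmatch(r"a.c", "ac") is not None     # no character for the dot
too_many = re.fullmatch(r"a.c", "abbc") is not None  # one dot, two characters

print(letter, digit, nothing, too_many)
# prints: True True False False
```
</pre>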
<p class="text"><span class="bold">Repeats</span><br>
A repeat is an expression that is repeated an arbitrary number of times.<br>
An expression followed by <span class="bold">*</span> can be repeated any number of times including zero.<br>
An expression followed by <span class="bold">+</span> can be repeated any number of times, but at least once.<br>
An expression followed by <span class="bold">?</span> may be repeated zero or one times only.<br>
When it is necessary to specify the minimum and maximum number of repeats explicitly, the bounds operator <span class="bold">{}</span> may be used,
thus <span class="mono">"a{2}"</span> is the letter "a" repeated exactly twice,
<span class="mono">"a{2,4}"</span> represents the letter "a" repeated between 2 and 4 times,
and <span class="mono">"a{2,}"</span> represents the letter "a" repeated at least twice with no upper limit. Note that there must be no white-space inside the {}, and there is no upper limit on the values of the lower and upper bounds.</p>
<pre>
Examples:
"ba*" will match all of "b", "ba", "baaa" etc.
"ba+" will match "ba" or "baaaa" for example but not "b".
"ba?" will match "b" or "ba".
"ba{2,4}" will match "baa", "baaa" and "baaaa".
</pre>
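<p class="text">The repeat examples above can be checked with a short script. This is only a sketch: it assumes Python's <span class="mono">re</span> module, and <span class="mono">matches</span> is a hypothetical helper that tests whether a whole string matches a pattern.</p>
<pre>
```python
import re

def matches(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# "*" allows zero or more repeats of "a".
star = [matches(r"ba*", s) for s in ("b", "ba", "baaa")]
# "+" requires at least one "a".
plus = [matches(r"ba+", s) for s in ("b", "ba", "baaaa")]
# "?" allows zero or one "a".
optional = [matches(r"ba?", s) for s in ("b", "ba", "baa")]
# "{2,4}" requires between 2 and 4 repeats of "a".
bounded = [matches(r"ba{2,4}", s) for s in ("ba", "baa", "baaaa", "baaaaa")]

print(star)      # [True, True, True]
print(plus)      # [False, True, True]
print(optional)  # [True, True, False]
print(bounded)   # [False, True, True, False]
```
</pre>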
<p class="text"><span class="bold">Parentheses</span><br>
Parentheses <span class="bold">()</span> are used to group items together into a sub-expression. For example, the expression <span class="mono">"(ab)*"</span> would match all of the string "ababab".</p>
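<p class="text">To see what grouping changes, compare a repeat applied to a group with the same repeat applied without parentheses. The sketch assumes Python's <span class="mono">re</span> module; the test strings are illustrative.</p>
<pre>
```python
import re

# With parentheses, "*" repeats the whole group "ab".
grouped = re.fullmatch(r"(ab)*", "ababab") is not None
# Zero repeats are allowed, so the empty string matches as well.
grouped_empty = re.fullmatch(r"(ab)*", "") is not None

# Without parentheses, "*" repeats only the "b".
ungrouped = re.fullmatch(r"ab*", "ababab") is not None
ungrouped_bs = re.fullmatch(r"ab*", "abbb") is not None

print(grouped, grouped_empty, ungrouped, ungrouped_bs)
# prints: True True False True
```
</pre>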
<p class="text"><span class="bold">Alternatives</span><br>
Alternatives occur when the expression can match either one sub-expression or another. Alternatives are separated by <span class="bold">"|"</span>. Each alternative extends over the largest possible sub-expression on its side of the "|"; this is the opposite of the repetition operators, which apply only to the smallest preceding sub-expression.</p>
<pre>
Examples:
"a(b|c)" could match "ab" or "ac".
"abc|def" could match "abc" or "def".
</pre>
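<p class="text">The scope of "|" can be illustrated as below, again assuming Python's <span class="mono">re</span> module as a stand-in. The helper <span class="mono">matches</span> and the pattern <span class="mono">ab(c|d)ef</span> are hypothetical additions for the comparison.</p>
<pre>
```python
import re

def matches(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# Inside parentheses, the choice is limited to "b" or "c".
inner = [matches(r"a(b|c)", s) for s in ("ab", "ac", "a", "abc")]

# Without parentheses, "|" splits the whole pattern: "abc" or "def".
whole = [matches(r"abc|def", s) for s in ("abc", "def", "abdef")]

# Parentheses are needed to choose between just "c" and "d".
narrowed = [matches(r"ab(c|d)ef", s) for s in ("abcef", "abdef", "abc")]

print(inner)     # [True, True, False, False]
print(whole)     # [True, True, False]
print(narrowed)  # [True, True, False]
```
</pre>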
<p class="text"><span class="bold">Sets</span><br>
A set matches any single character that is a member of the set. Sets are delimited by <span class="bold">"["</span> and <span class="bold">"]"</span> and can contain literals, character ranges, and character classes. A set declaration that starts with <span class="bold">"^"</span> matches the complement of the elements that follow, that is, any single character that is not listed.</p>
<pre>
Examples:
Character literals:
"[abc]" will match either of "a", "b", or "c".
"[^abc]" will match any character other than "a", "b", or "c".
Character ranges:
"[a-z]" will match any character in the range "a" to "z".
"[^A-Z]" will match any character other than those in the range "A" to "Z".
</pre>
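<p class="text">A short sketch of sets and their complements, assuming Python's <span class="mono">re</span> module. <span class="mono">re.findall</span> returns every non-overlapping match, which makes it easy to see which characters a set accepts; the sample strings are illustrative.</p>
<pre>
```python
import re

# A set of literals matches any one of the listed characters.
vowels = re.findall(r"[aeiou]", "regular")
# A leading "^" inverts the set.
non_vowels = re.findall(r"[^aeiou]", "regular")

# A range covers every character between its two end points.
lower = re.findall(r"[a-z]", "Ab1cD")
not_upper = re.findall(r"[^A-Z]", "Ab1cD")

print(vowels)      # ['e', 'u', 'a']
print(non_vowels)  # ['r', 'g', 'l', 'r']
print(lower)       # ['b', 'c']
print(not_upper)   # ['b', '1', 'c']
```
</pre>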
<p class="text"><span class="bold">Character classes</span><br>
A character class is a special sequence that stands for a commonly used type of character. The available classes are:</p>
<table border="1" cellpadding="3">
<tr valign="top">
<th>Class</th>
<th>Description</th>
<th>Equivalent</th>
</tr>
<tr valign="top">
<td align="center"><span class="bold">\w</span></td>
<td>Any word character - all alphanumeric characters plus the underscore.</td>
<td>[a-zA-Z0-9_]</td>
</tr>
<tr valign="top">
<td align="center"><span class="bold">\s</span></td>
<td>Any whitespace character (spaces and tabs).</td>
<td>[ \t]</td>
</tr>
<tr valign="top">
<td align="center"><span class="bold">\d</span></td>
<td>Any digit.</td>
<td>[0-9]</td>
</tr>
<tr valign="top">
<td align="center"><span class="bold">\l</span></td>
<td>Any lower case character.</td>
<td>[a-z]</td>
</tr>
<tr valign="top">
<td align="center"><span class="bold">\u</span></td>
<td>Any upper case character.</td>
<td>[A-Z]</td>
</tr>
</table>
<p class="text"><br>The uppercase version of each class negates it. For example, <span class="bold">\S</span> matches any non-whitespace character.</p>
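<p class="text">The classes can be tried out as sketched below, assuming Python's <span class="mono">re</span> module. Two differences from the table are worth noting: Python's <span class="mono">\s</span> also matches newlines and other whitespace, and the <span class="mono">\l</span> and <span class="mono">\u</span> classes are not available in Python, so the equivalent sets are used instead. The <span class="mono">re.ASCII</span> flag restricts <span class="mono">\w</span> and <span class="mono">\d</span> to the ASCII ranges listed in the table. The sample text is illustrative.</p>
<pre>
```python
import re

text = "Room 42_b\tfloor"

digits = re.findall(r"\d", text, re.ASCII)   # single digits
words = re.findall(r"\w+", text, re.ASCII)   # runs of word characters
spaces = re.findall(r"\s", text)             # the space and the tab
non_space_runs = re.findall(r"\S+", text)    # uppercase class means NOT
lower_runs = re.findall(r"[a-z]+", text)     # stand-in for the \l class

print(digits)          # ['4', '2']
print(words)           # ['Room', '42_b', 'floor']
print(spaces)          # [' ', '\t']
print(non_space_runs)  # ['Room', '42_b', 'floor']
print(lower_runs)      # ['oom', 'b', 'floor']
```
</pre>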
<h2>Summary of Regular Expressions Syntax Elements</h2>
<p class="text">The following table summarizes the syntax elements used in regular expressions.</p>
<table border="1" cellpadding="3">
<tr valign="top">
<th>Character</th>
<th>Description</th>
</tr>
<tr valign="top">
<td align="center">^</td>
<td>Beginning of the string. The expression "<span class="mono">^A</span>" will match an "<span class="mono">A</span>" only at the beginning of the string.</td>
</tr>
<tr valign="top">
<td align="center">^</td>
<td>A caret (^) immediately after a left bracket ([) has a different meaning: it excludes the characters listed in the brackets from the match. The expression "<span class="mono">[^0-9]</span>" matches any character that is not a digit.</td>
</tr>
<tr valign="top">
<td align="center">$</td>
<td>The dollar sign ($) will match the end of the string. The expression "<span class="mono">abc$</span>" will match the sub-string "<span class="mono">abc</span>" only if it is at the end of the string.</td>
</tr>
<tr valign="top">
<td align="center">|</td>
<td>The alternation character (|) allows the expression on either side of it to match the target string. The expression "<span class="mono">a|b</span>" will match "<span class="mono">a</span>" as well as "<span class="mono">b</span>".</td>
</tr>
<tr valign="top">
<td align="center">.</td>
<td>The dot (.) will match any single character.</td>
</tr>
<tr valign="top">
<td align="center">*</td>
<td>The asterisk (*) indicates that the expression to its left (a single character, a set, or a parenthesized group) may match 0 or more times.</td>
</tr>
<tr valign="top">
<td align="center">+</td>
<td>The plus (+) is similar to the asterisk, but the expression to its left must match at least once.</td>
</tr>
<tr valign="top">
<td align="center">?</td>
<td>The question mark (?) matches the expression to its left 0 or 1 times.</td>
</tr>
<tr valign="top">
<td align="center">()</td>
<td>Parentheses group part of a pattern into a sub-expression, which affects the order of pattern evaluation. Each group also serves as a tagged expression that can be used when replacing the matched sub-string with another expression.</td>
</tr>
<tr valign="top">
<td align="center">[]</td>
<td>Brackets ([ and ]) enclosing a set of characters indicate that any one of the enclosed characters may match the target character.</td>
</tr>
<tr valign="top">
<td align="center">{N}</td>
<td>Repeats expression exactly N times.</td>
</tr>
<tr valign="top">
<td align="center">{N,M}</td>
<td>Repeats expression between N and M times.</td>
</tr>
<tr valign="top">
<td align="center">{N,}</td>
<td>Repeats expression N or more times.</td>
</tr>
</table>
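<p class="text">The anchors and tagged expressions from the table can be sketched as follows, assuming Python's <span class="mono">re</span> module. The replacement syntax is engine-specific: Python refers to tagged expressions as <span class="mono">\1</span>, <span class="mono">\2</span> and so on, and the engine this document describes may use a different notation. The sample strings are illustrative.</p>
<pre>
```python
import re

# "^" ties the match to the beginning of the string.
at_start = re.search(r"^A", "Apple") is not None
not_at_start = re.search(r"^A", "bAnana") is not None

# "$" ties the match to the end of the string.
at_end = re.search(r"abc$", "xyzabc") is not None
not_at_end = re.search(r"abc$", "abcxyz") is not None

# Each parenthesized group is tagged in order and can be reused
# in the replacement text; here the two numbers are swapped.
swapped = re.sub(r"(\d+)-(\d+)", r"\2-\1", "10-20")

print(at_start, not_at_start, at_end, not_at_end)
# prints: True False True False
print(swapped)
# prints: 20-10
```
</pre>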
</font></body>
</html>