Enter input string to search: 5
No match found.

Enter your regex: [0-4[6-8]]
Enter input string to search: 6
I found the text "6" starting at index 0 and ending at index 1.

Enter your regex: [0-4[6-8]]
Enter input string to search: 8
I found the text "8" starting at index 0 and ending at index 1.

Enter your regex: [0-4[6-8]]
Enter input string to search: 9
No match found.</pre>

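<div id="h4"><a name="reg3_1_4"></a>3.1.4　Intersections<span class="returnContents"><a href="#contents">Back to contents</a></span></div>
　　To create a single character class that matches only the characters common to all of its nested classes, use <code>&&</code>, as in <code>[0-9&&[345]]</code>. This particular <em>intersection</em> creates a single character class that matches only the digits common to both character classes: 3, 4, and 5.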

<pre id="console">Enter your regex: [0-9&&[345]]
Enter input string to search: 3
I found the text "3" starting at index 0 and ending at index 1.

Enter your regex: [0-9&&[345]]
Enter input string to search: 4
I found the text "4" starting at index 0 and ending at index 1.

Enter your regex: [0-9&&[345]]
Enter input string to search: 5
I found the text "5" starting at index 0 and ending at index 1.

Enter your regex: [0-9&&[345]]
Enter input string to search: 2
No match found.

Enter your regex: [0-9&&[345]]
Enter input string to search: 6
No match found.</pre>

　　The following example shows the intersection of two ranges:

<pre id="console">Enter your regex: [2-8&&[4-6]]
Enter input string to search: 3
No match found.

Enter your regex: [2-8&&[4-6]]
Enter input string to search: 4
I found the text "4" starting at index 0 and ending at index 1.

Enter your regex: [2-8&&[4-6]]
Enter input string to search: 5
I found the text "5" starting at index 0 and ending at index 1.

Enter your regex: [2-8&&[4-6]]
Enter input string to search: 6
I found the text "6" starting at index 0 and ending at index 1.

Enter your regex: [2-8&&[4-6]]
Enter input string to search: 7
No match found.</pre>
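
　　The two intersection sessions above can also be checked in code. The following is a minimal sketch, not part of the original test harness: the class name <code>IntersectionDemo</code> and the helper <code>matchesWhole</code> are hypothetical names chosen for illustration, and <code>Pattern.matches</code> is used in place of the harness's search loop, so it reports whether the entire input matches.<br/>
<pre name="java" id="java">import java.util.regex.Pattern;

// Hypothetical demo class; not part of the tutorial's test harness.
class IntersectionDemo {

    // Returns true when the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[] inputs = {"2", "3", "4", "5", "6", "7"};
        for (String input : inputs) {
            System.out.println(input
                + "  [0-9&&[345]]: " + matchesWhole("[0-9&&[345]]", input)
                + "  [2-8&&[4-6]]: " + matchesWhole("[2-8&&[4-6]]", input));
        }
    }
}</pre>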

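<div id="h4"><a name="reg3_1_5"></a>3.1.5　Subtraction<span class="returnContents"><a href="#contents">Back to contents</a></span></div>
　　Finally, you can use <em>subtraction</em> to negate one or more nested character classes, for example <code>[0-9&&[^345]]</code>. This creates a single character class that matches every digit from 0 to 9 except 3, 4, and 5.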

<pre id="console">Enter your regex: [0-9&&[^345]]
Enter input string to search: 2
I found the text "2" starting at index 0 and ending at index 1.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 3
No match found.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 4
No match found.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 5
No match found.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 6
I found the text "6" starting at index 0 and ending at index 1.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 9
I found the text "9" starting at index 0 and ending at index 1.</pre>
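
　　The same subtraction can be applied to a longer input in one pass. The following is a minimal sketch under stated assumptions: the class name <code>SubtractionDemo</code> and the helper <code>keepMatches</code> are hypothetical, and the input string <code>0123456789</code> is chosen only for illustration.<br/>
<pre name="java" id="java">import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the tutorial's test harness.
class SubtractionDemo {

    // Concatenates every substring of the input that the regex finds.
    static String keepMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        StringBuilder kept = new StringBuilder();
        while (matcher.find()) {
            kept.append(matcher.group());
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        // The digits 3, 4, and 5 are subtracted from the range 0-9.
        System.out.println(keepMatches("[0-9&&[^345]]", "0123456789")); // prints 0126789
    }
}</pre>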

　　That covers how character classes are built. Before moving on to the next section, try to recall the <a href="#fig1">character class table</a>.

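<div id="h2"><a name="reg4"></a>4　Predefined Character Classes<span class="returnContents"><a href="#contents">Back to contents</a></span></div>
　　The Pattern API contains a number of useful <em>predefined character classes</em>, which offer convenient shorthand for commonly used regular expressions.<br/>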
<table border="0" cellpadding="0" cellspacing="0" class="regTab" align="center">
  <caption>Predefined Character Classes</caption>
  <tr>
    <td class="regCenter"><code>.</code></td>
    <td>Any character (may or may not match line terminators)</td>
  </tr>
  <tr>
    <td class="regCenter"><code>\d</code></td>
    <td>A digit: <code>[0-9]</code></td>
  </tr>
  <tr>
    <td class="regCenter"><code>\D</code></td>
    <td>A non-digit: <code>[^0-9]</code></td>
  </tr>
  <tr>
    <td class="regCenter"><code>\s</code></td>
    <td>A whitespace character: <code>[ \t\n\x0B\f\r]</code></td>
  </tr>
  <tr>
    <td class="regCenter"><code>\S</code></td>
    <td>A non-whitespace character: <code>[^\s]</code></td>
  </tr>
  <tr>
    <td class="regCenter"><code>\w</code></td>
    <td>A word character: <code>[a-zA-Z_0-9]</code></td>
  </tr>
  <tr>
    <td class="regCenter"><code>\W</code></td>
    <td>A non-word character: <code>[^\w]</code></td>
  </tr>
</table>
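　　In the table above, each construct in the left-hand column is shorthand for the character class in the right-hand column. For example, <code>\d</code> means a digit (0 to 9), and <code>\w</code> means a word character (any lowercase letter, any uppercase letter, the underscore character, or any digit). Use the predefined classes whenever possible: they make your code easier to read and eliminate errors introduced by malformed character classes.<br/>
　　Constructs beginning with a backslash (<code>\</code>) are called <em>escaped constructs</em>. Escaped constructs were previewed in the <a href="#reg2">String Literals</a> section, which mentioned the use of the backslash and of <code>\Q</code> and <code>\E</code> for quotation. If you use an escaped construct within a string literal, you must precede the backslash with another backslash for the string to compile. For example:<br/>
<pre name="java" id="java">private final String REGEX = "\\d";        // a single digit</pre>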

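　　In this example <code>\d</code> is the regular expression; the extra backslash is required for the code to compile. The test harness, however, reads expressions directly from the console, so the extra backslash is unnecessary there.<br/>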
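
　　The effect of the doubled backslash can be verified directly. The following is a minimal sketch, assuming a hypothetical class named <code>EscapeDemo</code>: it shows that the string literal <code>"\\d"</code> holds only two characters, which is exactly the expression <code>\d</code> that would be typed at the console.<br/>
<pre name="java" id="java">import java.util.regex.Pattern;

// Hypothetical demo class; not part of the tutorial's test harness.
class EscapeDemo {

    // Two backslashes in the source become one backslash in the string.
    static final String REGEX = "\\d";

    public static void main(String[] args) {
        System.out.println(REGEX);                        // the two characters \ and d
        System.out.println(REGEX.length());               // prints 2
        System.out.println(Pattern.matches(REGEX, "7"));  // prints true
        System.out.println(Pattern.matches(REGEX, "a"));  // prints false
    }
}</pre>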

　　The following examples demonstrate the use of predefined character classes:<br/>

<pre id="console">Enter your regex: .
Enter input string to search: @
I found the text "@" starting at index 0 and ending at index 1.

Enter your regex: .
Enter input string to search: 1
I found the text "1" starting at index 0 and ending at index 1.

Enter your regex: .
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: \d
Enter input string to search: 1
I found the text "1" starting at index 0 and ending at index 1.

Enter your regex: \d
Enter input string to search: a
No match found.

Enter your regex: \D
Enter input string to search: 1
No match found.

Enter your regex: \D
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: \s
Enter input string to search:  
I found the text " " starting at index 0 and ending at index 1.

Enter your regex: \s
Enter input string to search: a
No match found.

Enter your regex: \S
Enter input string to search:  
No match found.

Enter your regex: \S
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: \w
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: \w
Enter input string to search: !
No match found.

Enter your regex: \W
Enter input string to search: a
No match found.

Enter your regex: \W
Enter input string to search: !
I found the text "!" starting at index 0 and ending at index 1.</pre>

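　　In the first three examples, the regular expression is simply <code>.</code> (the "dot" metacharacter), which means "any character". The match therefore succeeds in all three cases (an arbitrarily chosen "@" character, a digit, and a letter). Each of the remaining examples uses a single regular expression construct from the predefined character classes table. You can use that table to work out the logic behind each match:<br/>
　　<code>\d</code> matches a digit<br/>
　　<code>\s</code> matches a whitespace character<br/>
　　<code>\w</code> matches a word character<br/>
　　The corresponding capital letter means the opposite:<br/>
　　<code>\D</code> matches a non-digit<br/>
　　<code>\S</code> matches a non-whitespace character<br/>
　　<code>\W</code> matches a non-word character<br/>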
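
　　The console sessions above can be approximated in code. The following is a minimal sketch, not the tutorial's actual test harness: the class name <code>PredefinedClassDemo</code> and the helper <code>describe</code> are hypothetical, and the helper reports only the first match. Note that the backslashes are doubled because the expressions are written as Java string literals.<br/>
<pre name="java" id="java">import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not the tutorial's test harness.
class PredefinedClassDemo {

    // Builds a harness-style message for the first match only.
    static String describe(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (matcher.find()) {
            return "I found the text \"" + matcher.group()
                + "\" starting at index " + matcher.start()
                + " and ending at index " + matcher.end() + ".";
        }
        return "No match found.";
    }

    public static void main(String[] args) {
        // Each predefined class is tried against a digit, a letter,
        // a space, and a punctuation character.
        String[] regexes = {".", "\\d", "\\D", "\\s", "\\S", "\\w", "\\W"};
        String[] inputs = {"1", "a", " ", "!"};
        for (String regex : regexes) {
            for (String input : inputs) {
                System.out.println(regex + " on \"" + input + "\": "
                    + describe(regex, input));
            }
        }
    }
}</pre>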

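<div id="h2"><a name="reg5"></a>5　Quantifiers<span class="returnContents"><a href="#contents">Back to contents</a></span></div>
　　This section looks at greedy, reluctant, and possessive quantifiers, which specify how many times an expression <code>X</code> is to be matched.<br/>
　　<em>Quantifiers</em> let you specify the number of occurrences to match. For convenience, the three kinds of quantifier described in the Pattern API specification (greedy, reluctant, and possessive) are presented below. At first glance the quantifiers <code>X?</code>, <code>X??</code>, and <code>X?+</code> appear to do exactly the same thing, since each of them matches X once or not at all. There are subtle differences between them, which are explained near the end of this section.<br/>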
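
　　As a quick preview of those differences, the following is a minimal sketch, assuming a hypothetical class <code>OptionalQuantifierDemo</code> with a helper <code>firstMatch</code>, and the one-character input <code>a</code> chosen for illustration. It only reports the first match for each expression and is not a full account of the three behaviours.<br/>
<pre name="java" id="java">import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the tutorial's test harness.
class OptionalQuantifierDemo {

    // Returns the text of the first match, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // Greedy and possessive take the "a"; reluctant first tries zero
        // occurrences and succeeds with an empty match at index 0.
        System.out.println("[" + firstMatch("a?", "a") + "]");   // prints [a]
        System.out.println("[" + firstMatch("a??", "a") + "]");  // prints []
        System.out.println("[" + firstMatch("a?+", "a") + "]");  // prints [a]

        // Greedy gives the "a" back so the trailing a can match;
        // possessive never gives it back, so the overall match fails.
        System.out.println(firstMatch("a?a", "a"));   // prints a
        System.out.println(firstMatch("a?+a", "a"));  // prints null
    }
}</pre>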

<table border="0" cellpadding="0" cellspacing="0" class="regTab" align="center">
  <thead>
    <tr>
      <td colspan="3">Quantifiers</td>
      <td rowspan="2">Meaning</td>
    </tr>
    <tr>
      <td>Greedy</td>
      <td>Reluctant</td>