// pattern.h
#ifndef __PATTERN_H__
#define __PATTERN_H__
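#ifdef _MSC_VER
// C4786: identifier truncated to 255 characters in the debug information
// (emitted by older MSVC versions for long STL template names).
#pragma warning(disable:4786)
#endif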
#include <vector>
#include <string>
#include <map>
class Matcher;
class NFANode;
class NFAQuantifierNode;
/**
The Pattern class is very similar in functionality to Java's
java.util.regex.Pattern class. A Pattern represents an immutable regular
expression object. Rather than having a single object hold both the regular
expression and the matching state, the two are split into separate objects:
the {@link Matcher Matcher} class represents the matching object.
The Pattern class works primarily with "compiled" patterns. A typical use of
a regular expression looks like this:
<pre>
Pattern * p = Pattern::compile("a*b");
Matcher * m = p->createMatcher("aaaaaab");
if (m->matches()) ...
</pre>
If you only need to use a pattern once, it is often fine to use Pattern's
static methods instead. For example:
<pre>
if (Pattern::matches("a*b", "aaaab")) { ... }
</pre>
This class does not currently support Unicode; Unicode support is planned for
a future update.
This class is only partially immutable: it is safe to call createMatcher
concurrently from different threads, but the other member functions (e.g.
split) should not be called concurrently on the same <code>Pattern</code>.
<table border="0" cellpadding="1" cellspacing="0">
<tr align="left" bgcolor="#CCCCFF">
<td>
<b>Construct</b>
</td>
<td>
<b>Matches</b>
</td>
</tr>
<tr>
<td colspan="2">
</td>
</tr>
<tr>
<td colspan="2">
<b>Characters</b>
</td>
</tr>
<tr>
<td>
<code><i>x</i></code>
</td>
<td>
The character <code><i>x</i></code>
</td>
</tr>
<tr>
<td>
<code>\\</code>
</td>
<td>
The character <code>\</code>
</td>
</tr>
<tr>
<td>
<code>\0<i>nn</i></code>
</td>
<td>
The character with octal ASCII value <code><i>nn</i></code>
</td>
</tr>
<tr>
<td>
<code>\0<i>nnn</i></code>
</td>
<td>
The character with octal ASCII value <code><i>nnn</i></code>
</td>
</tr>
<tr>
<td>
<code>\x<i>hh</i></code>
</td>
<td>
The character with hexadecimal ASCII value <code><i>hh</i></code>
</td>
</tr>
<tr>
<td>
<code>\t</code>
</td>
<td>
A tab character
</td>
</tr>
<tr>
<td>
<code>\r</code>
</td>
<td>
A carriage return character
</td>
</tr>
<tr>
<td>
<code>\n</code>
</td>
<td>
A new-line character
</td>
</tr>
<tr>
<td colspan="2">
</td>
</tr>
<tr>
<td colspan="2">
<b>Character Classes</b>
</td>
</tr>
<tr>
<td>
<code>[abc]</code>
</td>
<td>
Either <code>a</code>, <code>b</code>, or <code>c</code>
</td>
</tr>
<tr>
<td>
<code>[^abc]</code>
</td>
<td>
Any character but <code>a</code>, <code>b</code>, or <code>c</code>
</td>
</tr>
<tr>
<td>
<code>[a-zA-Z]</code>
</td>
<td>
Any character from <code>a</code> through <code>z</code>, or from
<code>A</code> through <code>Z</code>
</td>
</tr>
<tr>
<td>
<code>[^a-zA-Z]</code>
</td>
<td>
Any character except those from <code>a</code> through
<code>z</code> or from <code>A</code> through <code>Z</code>
</td>
</tr>
<tr>
<td>
<code>[a\-z]</code>
</td>
<td>
Either <code>a</code>, <code>-</code>, or <code>z</code>
</td>
</tr>
<tr>
<td>
<code>[a-z[A-Z]]</code>
</td>
<td>
Same as <code>[a-zA-Z]</code>
</td>
</tr>
<tr>
<td>
<code>[a-z&&[g-i]]</code>
</td>
<td>
Any character in the intersection of <code>a-z</code> and
<code>g-i</code>
</td>
</tr>
<tr>
<td>
<code>[a-z&&[^g-i]]</code>
</td>
<td>
Any character in <code>a-z</code> and not in <code>g-i</code>
</td>
</tr>
<tr>
<td colspan="2">
</td>
</tr>
<tr>
<td colspan="2">
<b>Predefined character classes</b>
</td>
</tr>
<tr>
<td>
<code><b>.</b></code>
</td>
<td>
Any character. Multiline matching must be compiled into the pattern for
<code><b>.</b></code> to match a <code>\r</code> or a <code>\n</code>.
Even with multiline matching enabled, <code><b>.</b></code> matches only a
single <code>\r</code> or <code>\n</code>, never the two-character sequence
<code>\r\n</code>.
</td>
</tr>
<tr>
<td>
<code>\d</code>
</td>
<td>
<code>[0-9]</code>
</td>
</tr>
<tr>
<td>
<code>\D</code>
</td>
<td>
<code>[^\d]</code>
</td>
</tr>
<tr>
<td>
<code>\s</code>
</td>
<td>
<code>[ \t\r\n\x0B]</code>
</td>
</tr>
<tr>
<td>
<code>\S</code>
</td>
<td>
<code>[^\s]</code>
</td>
</tr>
<tr>
<td>
<code>\w</code>
</td>
<td>
<code>[a-zA-Z0-9_]</code>
</td>
</tr>
<tr>
<td>
<code>\W</code>
</td>
<td>
<code>[^\w]</code>
</td>
</tr>
<tr>
<td colspan="2">
</td>
</tr>
<tr>
<td colspan="2">
<b>POSIX character classes</b>
</td>
</tr>
<tr>
<td>
<code>\p{Lower}</code>
</td>
<td>
<code>[a-z]</code>
</td>
</tr>
<tr>
<td>
<code>\p{Upper}</code>
</td>
<td>
<code>[A-Z]</code>
</td>
</tr>
<tr>
<td>
<code>\p{ASCII}</code>
</td>
<td>
<code>[\x00-\x7F]</code>
</td>
</tr>
<tr>
<td>
<code>\p{Alpha}</code>
</td>
<td>
<code>[a-zA-Z]</code>
</td>
</tr>
<tr>
<td>
<code>\p{Digit}</code>
</td>
<td>
<code>[0-9]</code>
</td>
</tr>
<tr>
<td>
<code>\p{Alnum}</code>
</td>
<td>
<code>[\w&&[^_]]</code>
</td>
</tr>
<tr>
<td>
<code>\p{Punct}</code>
</td>
<td>
<code>[!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~]</code>
</td>
</tr>
<tr>
<td>
<code>\p{XDigit}</code>
</td>
<td>
<code>[a-fA-F0-9]</code>
</td>
</tr>
<tr>
<td colspan="2">
</td>
</tr>
<tr>
<td colspan="2">
<b>Boundary Matches</b>
</td>
</tr>
<tr>
<td>
<code>^</code>
</td>
<td>
The beginning of a line. Also matches the beginning of input.
</td>
</tr>
<tr>
<td>
<code>$</code>
</td>
<td>
The end of a line. Also matches the end of input.
</td>
</tr>
<tr>
<td>
<code>\b</code>
</td>
<td>
A word boundary
</td>
</tr>
<tr>
<td>
<code>\B</code>
</td>
<td>
A non-word boundary
</td>
</tr>
<tr>
<td>
<code>\A</code>
</td>
<td>
The beginning of input
</td>
</tr>
<tr>
<td>
<code>\G</code>
</td>
<td>
The end of the previous match. Ensures that a "next" match will only
happen if it begins with the character immediately following the end of
the "current" match.
</td>
</tr>
<tr>
<td>
<code>\Z</code>
</td>
<td>
The end of input. Will also match if there is a single trailing
<code>\r\n</code>, a single trailing <code>\r</code>, or a single