#ifndef __WCPATTERN_H__
#define __WCPATTERN_H__
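#ifdef _MSC_VER
// C4786: identifier truncated to 255 characters in the debug information,
// emitted by older MSVC releases for long STL template names.
#pragma warning(disable:4786)
#endif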
#include <vector>
#include <string>
#include <map>
class WCMatcher;
class NFAUNode;
class NFAQuantifierUNode;
/**
This pattern class is very similar in functionality to Java's
java.util.regex.Pattern class. The pattern class represents an immutable
regular expression object. Rather than having a single object hold both the
regular expression and the matching state, the two are split into separate
objects: the {@link WCMatcher WCMatcher} class represents the matching
object.
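The split described above can be pictured with a deliberately small sketch.
The classes <code>LiteralPattern</code> and <code>LiteralMatcher</code> are
hypothetical and are not part of this library, and the "pattern" is a plain
substring rather than a regular expression. What the sketch shows is the
division of state: the pattern never changes after construction, while each
matcher keeps its own input and search position, so several matchers can
share one pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a compiled pattern. It only stores a literal
// needle, but like WCPattern it never changes after construction.
class LiteralPattern {
public:
    explicit LiteralPattern(std::wstring needle) : needle_(std::move(needle)) {}
    const std::wstring& needle() const { return needle_; }

private:
    std::wstring needle_;
};

// Hypothetical stand-in for a matcher. All per-search state (the input text,
// the next search position, the last match) lives here, not in the pattern.
class LiteralMatcher {
public:
    LiteralMatcher(const LiteralPattern& pattern, std::wstring text)
        : pattern_(pattern), text_(std::move(text)) {}

    // Searches for the next occurrence at or after the current position.
    bool findNext() {
        const std::wstring& needle = pattern_.needle();
        const std::size_t at = text_.find(needle, pos_);
        if (at == std::wstring::npos) {
            matched_ = false;
            return false;
        }
        start_ = at;
        // Advance past the match; step by one for an empty needle so the
        // search always makes progress.
        pos_ = at + (needle.empty() ? 1 : needle.size());
        matched_ = true;
        return true;
    }

    // Start index of the most recent successful match.
    std::size_t start() const {
        assert(matched_ && "start() called without a successful findNext()");
        return start_;
    }

private:
    const LiteralPattern& pattern_;
    std::wstring text_;
    std::size_t pos_ = 0;
    std::size_t start_ = 0;
    bool matched_ = false;
};
```

Because each matcher holds a reference to its pattern, the pattern must
outlive every matcher created from it.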
The WCPattern class works primarily off of "compiled" patterns. A typical
instantiation of a regular expression looks like:
<pre>
WCPattern * p = WCPattern::compile(L"a*b");
WCMatcher * m = p->createWCMatcher(L"aaaaaab");
if (m->matches()) ...
</pre>
However, if you do not need to use a pattern more than once, it is often
fine to use WCPattern's static methods instead. For example:
<pre>
if (WCPattern::matches(L"a*b", L"aaaab")) { ... }
</pre>
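A static method of this kind amounts to compiling and matching in a single
call. The sketch below uses the C++ standard library's <code>std::wregex</code>
purely as a stand-in for <code>WCPattern</code>, and the helper names
<code>matchesOnce</code> and <code>countFullMatches</code> are hypothetical. It
contrasts a one-shot helper, which recompiles the expression on every call,
with compiling once and reusing the compiled object across many inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical one-shot helper in the spirit of a static convenience method:
// the expression is compiled anew on every call.
bool matchesOnce(const std::wstring& regex, const std::wstring& text) {
    return std::regex_match(text, std::wregex(regex));
}

// Hypothetical helper for repeated use: compile once, then reuse the
// compiled object for every input.
std::size_t countFullMatches(const std::wstring& regex,
                             const std::vector<std::wstring>& inputs) {
    const std::wregex compiled(regex);
    std::size_t count = 0;
    for (const std::wstring& s : inputs) {
        if (std::regex_match(s, compiled)) {
            ++count;
        }
    }
    return count;
}
```

The one-shot form is convenient for a single check; when the same expression
is applied repeatedly, compiling once avoids repeating that work for every
input.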
This class does not currently support Unicode; Unicode support is planned for
a future update.
This class is partially immutable. It is completely safe to call createWCMatcher
concurrently in different threads, but the other functions (e.g. split) should
not be called concurrently on the same <code>WCPattern</code>.
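The threading rule above can be illustrated with the C++ standard library as a
stand-in. In the sketch below, a <code>const std::wregex</code> plays the role
of a shared, immutable <code>WCPattern</code>, and each thread keeps its own
<code>std::wsmatch</code>, much as each thread would create its own
<code>WCMatcher</code>. The function name <code>matchInParallel</code> is
hypothetical, and the sketch does not exercise this library's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: matches every input against one shared, read-only
// expression, using one thread per input.
std::vector<bool> matchInParallel(const std::wregex& shared,
                                  const std::vector<std::wstring>& inputs) {
    // std::vector<bool> packs bits, so concurrent writes to different
    // elements could race; store one char per result instead.
    std::vector<char> results(inputs.size(), 0);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        workers.emplace_back([&shared, &inputs, &results, i] {
            std::wsmatch m;  // per-thread match state, never shared
            results[i] = std::regex_match(inputs[i], m, shared) ? 1 : 0;
        });
    }
    for (std::thread& t : workers) {
        t.join();
    }
    return std::vector<bool>(results.begin(), results.end());
}
```

Only read-only operations touch the shared expression here, and each thread
writes to a distinct element of the results. Functions documented as unsafe
for concurrent use, such as <code>split</code>, would still need external
synchronization (for example, a mutex) if called from several threads on the
same <code>WCPattern</code>.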
<table border="0" cellpadding="1" cellspacing="0">
<tr align="left" bgcolor="#CCCCFF">
<td>
<b>Construct</b>
</td>
<td>
<b>Matches</b>
</td>
</tr>
<tr>
<td colspan="2">
</td>
</tr>
<tr>
<td colspan="2">
<b>Characters</b>
</td>
</tr>
<tr>
<td>
<code><i>x</i></code>
</td>
<td>
The character <code><i>x</i></code>
</td>
</tr>
<tr>
<td>
<code>\\</code>
</td>
<td>
The character <code>\</code>
</td>
</tr>
<tr>
<td>
<code>\0<i>nn</i></code>
</td>
<td>
The character with octal ASCII value <code><i>nn</i></code>
</td>
</tr>
<tr>
<td>
<code>\0<i>nnn</i></code>
</td>
<td>
The character with octal ASCII value <code><i>nnn</i></code>
</td>
</tr>
<tr>
<td>
<code>\x<i>hh</i></code>
</td>
<td>
The character with hexadecimal ASCII value <code><i>hh</i></code>
</td>
</tr>
<tr>
<td>
<code>\t</code>
</td>
<td>
A tab character
</td>
</tr>
<tr>
<td>
<code>\r</code>
</td>
<td>
A carriage return character
</td>
</tr>
<tr>
<td>
<code>\n</code>
</td>
<td>
A new-line character
</td>
</tr>
<tr>
<td colspan="2">
</td>
</tr>
<tr>
<td colspan="2">
<b>Character Classes</b>
</td>
</tr>
<tr>
<td>
<code>[abc]</code>
</td>
<td>
Either <code>a</code>, <code>b</code>, or <code>c</code>
</td>
</tr>
<tr>
<td>
<code>[^abc]</code>
</td>
<td>
Any character but <code>a</code>, <code>b</code>, or <code>c</code>
</td>
</tr>
<tr>
<td>
<code>[a-zA-Z]</code>
</td>
<td>
Any character ranging from <code>a</code> through <code>z</code>, or
<code>A</code> through <code>Z</code>
</td>
</tr>
<tr>
<td>
<code>[^a-zA-Z]</code>
</td>
<td>
Any character except those ranging from <code>a</code> through
<code>z</code>, or <code>A</code> through <code>Z</code>
</td>
</tr>
<tr>
<td>
<code>[a\-z]</code>
</td>
<td>
Either <code>a</code>, <code>-</code>, or <code>z</code>
</td>
</tr>
<tr>
<td>
<code>[a-z[A-Z]]</code>
</td>
<td>
Same as <code>[a-zA-Z]</code>
</td>
</tr>
<tr>
<td>
<code>[a-z&&[g-i]]</code>
</td>
<td>
Any character in the intersection of <code>a-z</code> and
<code>g-i</code>
</td>
</tr>
<tr>
<td>
<code>[a-z&&[^g-i]]</code>
</td>
<td>
Any character in <code>a-z</code> and not in <code>g-i</code>
</td>
</tr>
<tr>
<td colspan="2">
</td>
</tr>
<tr>
<td colspan="2">
<b>Predefined Character Classes</b>
</td>
</tr>
<tr>
<td>
<code><b>.</b></code>
</td>
<td>
Any character. Multiline matching must be compiled into the pattern for
<code><b>.</b></code> to match a <code>\r</code> or a <code>\n</code>.
Even with multiline matching enabled, <code><b>.</b></code> does not match
the two-character sequence <code>\r\n</code> as a unit; it matches only a
single <code>\r</code> or <code>\n</code>.
</td>
</tr>
<tr>
<td>
<code>\d</code>
</td>
<td>
<code>[0-9]</code>
</td>
</tr>
<tr>
<td>
<code>\D</code>
</td>
<td>
<code>[^\d]</code>
</td>
</tr>
<tr>
<td>
<code>\s</code>
</td>
<td>
<code>[ \t\r\n\x0B]</code>
</td>
</tr>
<tr>
<td>
<code>\S</code>
</td>
<td>
<code>[^\s]</code>
</td>
</tr>
<tr>
<td>
<code>\w</code>
</td>
<td>
<code>[a-zA-Z0-9_]</code>
</td>
</tr>
<tr>
<td>
<code>\W</code>
</td>
<td>
<code>[^\w]</code>
</td>
</tr>
<tr>
<td colspan="2">
</td>
</tr>
<tr>
<td colspan="2">
<b>POSIX Character Classes</b>
</td>
</tr>
<tr>
<td>
<code>\p{Lower}</code>
</td>
<td>
<code>[a-z]</code>
</td>
</tr>
<tr>
<td>
<code>\p{Upper}</code>
</td>
<td>
<code>[A-Z]</code>
</td>
</tr>
<tr>
<td>
<code>\p{ASCII}</code>
</td>
<td>
<code>[\x00-\x7F]</code>
</td>
</tr>
<tr>
<td>
<code>\p{Alpha}</code>
</td>
<td>
<code>[a-zA-Z]</code>
</td>
</tr>
<tr>
<td>
<code>\p{Digit}</code>
</td>
<td>
<code>[0-9]</code>
</td>
</tr>
<tr>
<td>
<code>\p{Alnum}</code>
</td>
<td>
<code>[\w&&[^_]]</code>
</td>
</tr>
<tr>
<td>
<code>\p{Punct}</code>
</td>
<td>
<code>[!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~]</code>
</td>
</tr>
<tr>
<td>
<code>\p{XDigit}</code>
</td>
<td>
<code>[a-fA-F0-9]</code>
</td>
</tr>
<tr>
<td colspan="2">
</td>
</tr>
<tr>
<td colspan="2">
<b>Boundary Matches</b>
</td>
</tr>
<tr>
<td>
<code>^</code>
</td>
<td>
The beginning of a line. Also matches the beginning of input.
</td>
</tr>
<tr>
<td>
<code>$</code>
</td>
<td>
The end of a line. Also matches the end of input.
</td>
</tr>
<tr>
<td>
<code>\b</code>
</td>
<td>
A word boundary
</td>
</tr>
<tr>
<td>
<code>\B</code>
</td>
<td>
A non-word boundary
</td>
</tr>
<tr>
<td>
<code>\A</code>
</td>
<td>
The beginning of input
</td>
</tr>
<tr>
<td>
<code>\G</code>
</td>
<td>
The end of the previous match. Ensures that a "next" match will only
happen if it begins with the character immediately following the end of
the "current" match.
</td>
</tr>
<tr>
<td>
<code>\Z</code>