📄 pattern.h

📁 C++正则表达式解析
💻 H
📖 第 1 页 / 共 4 页
字号:
12 3 4 下一页
#ifndef __PATTERN_H__
#define __PATTERN_H__

#ifdef _WIN32
  #pragma warning(disable:4786)
#endif

#include <vector>
#include <string>
#include <map>

class Matcher;
class NFANode;
class NFAQuantifierNode;

/**
  This pattern class is very similar in functionality to Java's
  java.util.regex.Pattern class. The pattern class represents an immutable
  regular expression object. Instead of having a single object contain both the
  regular expression object and the matching object, instead the two objects are
  split apart. The {@link Matcher Matcher} class represents the maching
  object.

  The Pattern class works primarily off of "compiled" patterns. A typical
  instantiation of a regular expression looks like:

  <pre>
  Pattern * p = Pattern::compile("a*b");
  Matcher * m = p->createMatcher("aaaaaab");
  if (m->matches()) ...
  </pre>

  However, if you do not need to use a pattern more than once, it is often times
  okay to use the Pattern's static methods insteads. An example looks like this:

  <pre>
  if (Pattern::matches("a*b", "aaaab")) { ... }
  </pre>

  This class does not currently support unicode. The unicode update for this
  class is coming soon.

  This class is partially immutable. It is completely safe to call createMatcher
  concurrently in different threads, but the other functions (e.g. split) should
  not be called concurrently on the same <code>Pattern</code>.

  <table border="0" cellpadding="1" cellspacing="0">
    <tr align="left" bgcolor="#CCCCFF">
      <td>
        <b>Construct</b>
      </td>
      <td>
        <b>Matches</b>
      </th>
    </tr>
    <tr>
      <td colspan="2">
      &nbsp;
      </td>
    </tr>
    <tr>
      <td colspan="2">
        <b>Characters</b>
      </td>
    </tr>
    <tr>
      <td>
        <code><i>x</i></code>
      </td>
      <td>
        The character <code><i>x</i></code>
      </td>
    </tr>
    <tr>
      <td>
        <code>\\</code>
      </td>
      <td>
        The character <code>\</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>\0<i>nn</i></code>
      </td>
      <td>
        The character with octal ASCII value <code><i>nn</i></code>
      </td>
    </tr>
    <tr>
      <td>
        <code>\0<i>nnn</i></code>
      </td>
      <td>
        The character with octal ASCII value <code><i>nnn</i></code>
      </td>
    </tr>
    <tr>
      <td>
        <code>\x<i>hh</i></code>
      </td>
      <td>
        The character with hexadecimal ASCII value <code><i>hh</i></code>
      </td>
    </tr>
    <tr>
      <td>
        <code>\t</code>
      </td>
      <td>
        A tab character
      </td>
    </tr>
    <tr>
      <td>
        <code>\r</code>
      </td>
      <td>
        A carriage return character
      </td>
    </tr>
    <tr>
      <td>
        <code>\n</code>
      </td>
      <td>
        A new-line character
      </td>
    </tr>
    <tr>
      <td colspan="2">
        &nbsp;
      </td>
    </tr>
    <tr>
      <td>
        <b>Character Classes</b>
      </td>
    </tr>
    <tr>
      <td>
        <code>[abc]</code>
      </td>
      <td>
        Either <code>a</code>, <code>b</code>, or <code>c</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>[^abc]</code>
      </td>
      <td>
        Any character but <code>a</code>, <code>b</code>, or <code>c</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>[a-zA-Z]</code>
      </td>
      <td>
        Any character ranging from <code>a</code> thru <code>z</code>, or
        <code>A</code> thru <code>Z</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>[^a-zA-Z]</code>
      </td>
      <td>
        Any character except those ranging from <code>a</code> thru
        <code>z</code>, or <code>A</code> thru <code>Z</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>[a\-z]</code>
      </td>
      <td>
        Either <code>a</code>, <code>-</code>, or <code>z</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>[a-z[A-Z]]</code>
      </td>
      <td>
        Same as <code>[a-zA-Z]</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>[a-z&&[g-i]]</code>
      </td>
      <td>
        Any character in the intersection of <code>a-z</code> and
        <code>g-i</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>[a-z&&[^g-i]]</code>
      </td>
      <td>
        Any character in <code>a-z</code> and not in <code>g-i</code>
      </td>
    </tr>
    <tr>
      <td colspan="2">
        &nbsp;
      </td>
    </tr>
    <tr>
      <td colspan="2">
        <b>Prefefined character classes</b>
      </td>
    </tr>
    <tr>
      <td>
        <code><b>.</b></code>
      </td>
      <td>
        Any character. Multiline matching must be compiled into the pattern for
        <code><b>.</b></code> to match a <code>\r</code> or a <code>\n</code>.
        Even if multiline matching is enabled, <code><b>.</b></code> will not
        match a <code>\r\n</code>, only a <code>\r</code> or a <code>\n</code>.
      </td>
    </tr>
    <tr>
      <td>
        <code>\d</code>
      </td>
      <td>
        <code>[0-9]</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>\D</code>
      </td>
      <td>
        <code>[^\d]</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>\s</code>
      </td>
      <td>
        <code>[&nbsp;\t\r\n\x0B]</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>\S</code>
      </td>
      <td>
        <code>[^\s]</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>\w</code>
      </td>
      <td>
        <code>[a-zA-Z0-9_]</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>\W</code>
      </td>
      <td>
        <code>[^\w]</code>
      </td>
    </tr>
    <tr>
      <td colspan="2">
        &nbsp;
      </td>
    </tr>
    <tr>
      <td colspan="2">
        <b>POSIX character classes
      </td>
    </tr>
    <tr>
      <td>
        <code>\p{Lower}</code>
      </td>
      <td>
        <code>[a-z]</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>\p{Upper}</code>
      </td>
      <td>
        <code>[A-Z]</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>\p{ASCII}</code>
      </td>
      <td>
        <code>[\x00-\x7F]</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>\p{Alpha}</code>
      </td>
      <td>
        <code>[a-zA-Z]</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>\p{Digit}</code>
      </td>
      <td>
        <code>[0-9]</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>\p{Alnum}</code>
      </td>
      <td>
        <code>[\w&&[^_]]</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>\p{Punct}</code>
      </td>
      <td>
        <code>[!"#$%&'()*+,-./:;&lt;=&gt;?@[\]^_`{|}~]</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>\p{XDigit}</code>
      </td>
      <td>
        <code>[a-fA-F0-9]</code>
      </td>
    </tr>
    <tr>
      <td colspan="2">
        &nbsp;
      </td>
    </tr>
    <tr>
      <td colspan="2">
        <b>Boundary Matches</b>
      </td>
    </tr>
    <tr>
      <td>
        <code>^</code>
      </td>
      <td>
        The beginning of a line. Also matches the beginning of input.
      </td>
    </tr>
    <tr>
      <td>
        <code>$</code>
      </td>
      <td>
        The end of a line. Also matches the end of input.
      </td>
    </tr>
    <tr>
      <td>
        <code>\b</code>
      </td>
      <td>
        A word boundary
      </td>
    </tr>
    <tr>
      <td>
        <code>\B</code>
      </td>
      <td>
        A non word boundary
      </td>
    </tr>
    <tr>
      <td>
        <code>\A</code>
      </td>
      <td>
        The beginning of input
      </td>
    </tr>
    <tr>
      <td>
        <code>\G</code>
      </td>
      <td>
        The end of the previous match. Ensures that a "next" match will only
        happen if it begins with the character immediately following the end of
        the "current" match.
      </td>
    </tr>
    <tr>
      <td>
        <code>\Z</code>
      </td>
      <td>
        The end of input. Will also match if there is a single trailing
        <code>\r\n</code>, a single trailing <code>\r</code>, or a single
12 3 4 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -