📄 regularexpression.java

 *       <dt class="REGEX"><KBD>(?:</kbd><VAR>X</VAR><kbd>)</KBD>
 *       <dd>Grouping. "<KBD>foo+</KBD>" matches "<KBD>foo</KBD>" or "<KBD>foooo</KBD>".
 *       If you want it to match "<KBD>foofoo</KBD>" or "<KBD>foofoofoo</KBD>",
 *       you have to write "<KBD>(?:foo)+</KBD>".
 *
 *       <dt class="REGEX"><KBD>(</kbd><VAR>X</VAR><kbd>)</KBD>
 *       <dd>Grouping with capturing.
 * It makes a group, and applications can find out
 * where in the target text a group matched by calling methods of a <code>Match</code> instance
 * after <code><a href="#matches(java.lang.String, com.sun.org.apache.xerces.internal.utils.regex.Match)">matches(String,Match)</a></code>.
 * The 0th group is the whole of this regular expression.
 * The <VAR>N</VAR>th group is the inside of the <VAR>N</VAR>th left parenthesis.
 *
 *   <p>For instance, a regular expression is
 *   "<FONT color=blue><KBD> *([^&lt;:]*) +&lt;([^&gt;]*)&gt; *</KBD></FONT>"
 *   and target text is
 *   "<FONT color=red><KBD>From: TAMURA Kent &lt;kent@trl.ibm.co.jp&gt;</KBD></FONT>":
 *   <ul>
 *     <li><code>Match.getCapturedText(0)</code>:
 *     "<FONT color=red><KBD> TAMURA Kent &lt;kent@trl.ibm.co.jp&gt;</KBD></FONT>"
 *     <li><code>Match.getCapturedText(1)</code>: "<FONT color=red><KBD>TAMURA Kent</KBD></FONT>"
 *     <li><code>Match.getCapturedText(2)</code>: "<FONT color=red><KBD>kent@trl.ibm.co.jp</KBD></FONT>"
 *   </ul>
 *
 *       <dt class="REGEX"><kbd>\1 \2 \3 \4 \5 \6 \7 \8 \9</kbd>
 *       <dd>
 *
 *       <dt class="REGEX"><kbd>(?></kbd><var>X</var><kbd>)</kbd>
 *       <dd>Independent expression group. ................
 *
 *       <dt class="REGEX"><kbd>(?</kbd><var>options</var><kbd>:</kbd><var>X</var><kbd>)</kbd>
 *       <dt class="REGEX"><kbd>(?</kbd><var>options</var><kbd>-</kbd><var>options2</var><kbd>:</kbd><var>X</var><kbd>)</kbd>
 *       <dd>............................
 *       <dd>The <var>options</var> or the <var>options2</var> consist of 'i' 'm' 's' 'w'.
 *           Note that they cannot contain 'u'.
 *
 *       <dt class="REGEX"><kbd>(?</kbd><var>options</var><kbd>)</kbd>
 *       <dt class="REGEX"><kbd>(?</kbd><var>options</var><kbd>-</kbd><var>options2</var><kbd>)</kbd>
 *       <dd>......
 *       <dd>These expressions must be at the beginning of a group.
 *     </dl>
 *   </li>
 *
 *   <li>Anchor
 *     <dl>
 *       <dt class="REGEX"><kbd>\A</kbd>
 *       <dd>Matches the beginning of the text.
 *
 *       <dt class="REGEX"><kbd>\Z</kbd>
 *       <dd>Matches the end of the text, or before an EOL character at the end of the text,
 *           or CARRIAGE RETURN + LINE FEED at the end of the text.
 *
 *       <dt class="REGEX"><kbd>\z</kbd>
 *       <dd>Matches the end of the text.
 *
 *       <dt class="REGEX"><kbd>^</kbd>
 *       <dd>Matches the beginning of the text.  It is equivalent to <span class="REGEX"><Kbd>\A</kbd></span>.
 *       <dd>When <a href="#M_OPTION">a "m" option</a> is set,
 *           it matches the beginning of the text, or after one of the EOL characters (
 *           LINE FEED (U+000A), CARRIAGE RETURN (U+000D), LINE SEPARATOR (U+2028),
 *           PARAGRAPH SEPARATOR (U+2029).)
 *
 *       <dt class="REGEX"><kbd>$</kbd>
 *       <dd>Matches the end of the text, or before an EOL character at the end of the text,
 *           or CARRIAGE RETURN + LINE FEED at the end of the text.
 *       <dd>When <a href="#M_OPTION">a "m" option</a> is set,
 *           it matches the end of the text, or before an EOL character.
 *
 *       <dt class="REGEX"><kbd>\b</kbd>
 *       <dd>Matches a word boundary.
 *           (See <a href="#W_OPTION">a "w" option</a>)
 *
 *       <dt class="REGEX"><kbd>\B</kbd>
 *       <dd>Matches a non-word boundary.
 *           (See <a href="#W_OPTION">a "w" option</a>)
 *
 *       <dt class="REGEX"><kbd>\&lt;</kbd>
 *       <dd>Matches the beginning of a word.
 *           (See <a href="#W_OPTION">a "w" option</a>)
 *
 *       <dt class="REGEX"><kbd>\&gt;</kbd>
 *       <dd>Matches the end of a word.
 *           (See <a href="#W_OPTION">a "w" option</a>)
 *     </dl>
 *   </li>
 *   <li>Lookahead and lookbehind
 *     <dl>
 *       <dt class="REGEX"><kbd>(?=</kbd><var>X</var><kbd>)</kbd>
 *       <dd>Lookahead.
 *
 *       <dt class="REGEX"><kbd>(?!</kbd><var>X</var><kbd>)</kbd>
 *       <dd>Negative lookahead.
 *
 *       <dt class="REGEX"><kbd>(?&lt;=</kbd><var>X</var><kbd>)</kbd>
 *       <dd>Lookbehind.
 *       <dd>(Note for text capturing......)
 *
 *       <dt class="REGEX"><kbd>(?&lt;!</kbd><var>X</var><kbd>)</kbd>
 *       <dd>Negative lookbehind.
 *     </dl>
 *   </li>
 *
 *   <li>Misc.
 *     <dl>
 *       <dt class="REGEX"><kbd>(?(</Kbd><var>condition</var><Kbd>)</kbd><var>yes-pattern</var><kbd>|</kbd><var>no-pattern</var><kbd>)</kbd>,
 *       <dt class="REGEX"><kbd>(?(</kbd><var>condition</var><kbd>)</kbd><var>yes-pattern</var><kbd>)</kbd>
 *       <dd>......
 *       <dt class="REGEX"><kbd>(?#</kbd><var>comment</var><kbd>)</kbd>
 *       <dd>Comment.  A comment string consists of characters except '<kbd>)</kbd>'.
 *           You cannot write comments in character classes or before quantifiers.
 *     </dl>
 *   </li>
 * </ul>
 *
 *
 * <hr width="50%">
 * <h3>BNF for the regular expression</h3>
 * <pre>
 * regex ::= ('(?' options ')')? term ('|' term)*
 * term ::= factor+
 * factor ::= anchors | atom (('*' | '+' | '?' | minmax ) '?'? )?
 *            | '(?#' [^)]* ')'
 * minmax ::= '{' ([0-9]+ | [0-9]+ ',' | ',' [0-9]+ | [0-9]+ ',' [0-9]+) '}'
 * atom ::= char | '.' | char-class | '(' regex ')' | '(?:' regex ')' | '\' [0-9]
 *          | '\w' | '\W' | '\d' | '\D' | '\s' | '\S' | category-block | '\X'
 *          | '(?>' regex ')' | '(?' options ':' regex ')'
 *          | '(?' ('(' [0-9] ')' | '(' anchors ')' | looks) term ('|' term)? ')'
 * options ::= [imsw]* ('-' [imsw]+)?
 * anchors ::= '^' | '$' | '\A' | '\Z' | '\z' | '\b' | '\B' | '\&lt;' | '\>'
 * looks ::= '(?=' regex ')'  | '(?!' regex ')'
 *           | '(?&lt;=' regex ')' | '(?&lt;!' regex ')'
 * char ::= '\\' | '\' [efnrtv] | '\c' [@-_] | code-point | character-1
 * category-block ::= '\' [pP] category-symbol-1
 *                    | ('\p{' | '\P{') (category-symbol | block-name
 *                                       | other-properties) '}'
 * category-symbol-1 ::= 'L' | 'M' | 'N' | 'Z' | 'C' | 'P' | 'S'
 * category-symbol ::= category-symbol-1 | 'Lu' | 'Ll' | 'Lt' | 'Lm' | 'Lo'
 *                     | 'Mn' | 'Me' | 'Mc' | 'Nd' | 'Nl' | 'No'
 *                     | 'Zs' | 'Zl' | 'Zp' | 'Cc' | 'Cf' | 'Cn' | 'Co' | 'Cs'
 *                     | 'Pd' | 'Ps' | 'Pe' | 'Pc' | 'Po'
 *                     | 'Sm' | 'Sc' | 'Sk' | 'So'
 * block-name ::= (See above)
 * other-properties ::= 'ALL' | 'ASSIGNED' | 'UNASSIGNED'
 * character-1 ::= (any character except meta-characters)
 *
 * char-class ::= '[' ranges ']'
 *                | '(?[' ranges ']' ([-+&] '[' ranges ']')? ')'
 * ranges ::= '^'? (range <a href="#COMMA_OPTION">','?</a>)+
 * range ::= '\d' | '\w' | '\s' | '\D' | '\W' | '\S' | category-block
 *           | range-char | range-char '-' range-char
 * range-char ::= '\[' | '\]' | '\\' | '\' [,-efnrtv] | code-point | character-2
 * code-point ::= '\x' hex-char hex-char
 *                | '\x{' hex-char+ '}'
 * <!--               | '\u005c u' hex-char hex-char hex-char hex-char
 * -->               | '\v' hex-char hex-char hex-char hex-char hex-char hex-char
 * hex-char ::= [0-9a-fA-F]
 * character-2 ::= (any character except \[]-,)
 * </pre>
 *
 * <hr width="50%">
 * <h3>TODO</h3>
 * <ul>
 *   <li><a href="http://www.unicode.org/unicode/reports/tr18/">Unicode Regular Expression Guidelines</a>
 *     <ul>
 *       <li>2.4 Canonical Equivalents
 *       <li>Level 3
 *     </ul>
 *   <li>Parsing performance
 * </ul>
 *
 * <hr width="50%">
 *
 * @xerces.internal
 *
 * @author TAMURA Kent &lt;kent@trl.ibm.co.jp&gt;
 * @version $Id: RegularExpression.java,v 1.2.6.1 2005/09/06 11:46:34 neerajbj Exp $
 */
public class RegularExpression implements java.io.Serializable {

    private static final long serialVersionUID = 3905241217112815923L;

    static final boolean DEBUG = false;

    /**
     * Compiles a token tree into an operation flow.
     */
    private synchronized void compile(Token tok) {
        if (this.operations != null)
            return;
        this.numberOfClosures = 0;
        this.operations = this.compile(tok, null, false);
    }

    /**
     * Converts a token to an operation.
     */
    private Op compile(Token tok, Op next, boolean reverse) {
        Op ret;
        switch (tok.type) {
        case Token.DOT:
            ret = Op.createDot();
            ret.next = next;
            break;

        case Token.CHAR:
            ret = Op.createChar(tok.getChar());
            ret.next = next;
            break;

        case Token.ANCHOR:
            ret = Op.createAnchor(tok.getChar());
            ret.next = next;
            break;

        case Token.RANGE:
        case Token.NRANGE:
            ret = Op.createRange(tok);
            ret.next = next;
            break;

        case Token.CONCAT:
            ret = next;
            if (!reverse) {
                for (int i = tok.size()-1;  i >= 0;  i --) {
                    ret = compile(tok.getChild(i), ret, false);
                }
            } else {
                for (int i = 0;  i < tok.size();  i ++) {
                    ret = compile(tok.getChild(i), ret, true);
                }
            }
            break;

        case Token.UNION:
            Op.UnionOp uni = Op.createUnion(tok.size());
            for (int i = 0;  i < tok.size();  i ++) {
                uni.addElement(compile(tok.getChild(i), next, reverse));
            }
            ret = uni;                          // ret.next is null.
            break;

        case Token.CLOSURE:
        case Token.NONGREEDYCLOSURE:
            Token child = tok.getChild(0);
            int min = tok.getMin();
            int max = tok.getMax();
            if (min >= 0 && min == max) { // {n}
                ret = next;
                for (int i = 0; i < min;  i ++) {
                    ret = compile(child, ret, reverse);
                }
                break;
            }
            if (min > 0 && max > 0)
                max -= min;
            if (max > 0) {
                // X{2,6} -> XX(X(X(XX?)?)?)?
                ret = next;
                for (int i = 0;  i < max;  i ++) {
                    Op.ChildOp q = Op.createQuestion(tok.type == Token.NONGREEDYCLOSURE);
                    q.next = next;
                    q.setChild(compile(child, ret, reverse));
                    ret = q;
                }
            } else {
                Op.ChildOp op;
                if (tok.type == Token.NONGREEDYCLOSURE) {
                    op = Op.createNonGreedyClosure();
                } else {                        // Token.CLOSURE
                    if (child.getMinLength() == 0)
                        op = Op.createClosure(this.numberOfClosures++);
                    else
                        op = Op.createClosure(-1);
                }
                op.next = next;
                op.setChild(compile(child, op, reverse));
                ret = op;
            }
            if (min > 0) {
                for (int i = 0;  i < min;  i ++) {
                    ret = compile(child, ret, reverse);
                }
            }
            break;

        case Token.EMPTY:
            ret = next;
            break;

        case Token.STRING:
            ret = Op.createString(tok.getString());
            ret.next = next;
            break;

        case Token.BACKREFERENCE:
            ret = Op.createBackReference(tok.getReferenceNumber());
            ret.next = next;
            break;

        case Token.PAREN:
            if (tok.getParenNumber() == 0) {
                ret = compile(tok.getChild(0), next, reverse);
            } else if (reverse) {
                next = Op.createCapture(tok.getParenNumber(), next);
                next = compile(tok.getChild(0), next, reverse);
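
The CLOSURE branch in this excerpt rewrites a bounded quantifier as mandatory copies followed by nested optional operations; its source comment gives X{2,6} -> XX(X(X(XX?)?)?)?. The following is a minimal sketch of that rewrite at the pattern-string level, not the Xerces implementation. It assumes java.util.regex in place of the internal Op and Token classes, and the class name BoundedQuantifierSketch and the method expand are hypothetical names introduced only for illustration. It covers finite bounds only; the unbounded case takes the closure branch in the original.

```java
import java.util.regex.Pattern;

public class BoundedQuantifierSketch {

    /**
     * Hypothetical helper: rewrites atom{min,max} as min mandatory copies
     * followed by (max - min) nested optional groups.
     * Non-capturing groups are used so no capture numbering is introduced.
     */
    static String expand(String atom, int min, int max) {
        if (min < 0 || max < min) {
            throw new IllegalArgumentException("need 0 <= min <= max");
        }
        // Build the optional tail from the innermost group outwards,
        // mirroring the loop that wraps each copy in a "question" operation.
        String tail = "";
        for (int i = 0; i < max - min; i++) {
            tail = "(?:" + atom + tail + ")?";
        }
        // Prepend the mandatory copies.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < min; i++) {
            sb.append(atom);
        }
        return sb.append(tail).toString();
    }

    public static void main(String[] args) {
        String expanded = expand("X", 2, 6);
        System.out.println(expanded); // XX(?:X(?:X(?:X(?:X)?)?)?)?

        // Compare the expansion with the native quantifier on short inputs.
        StringBuilder input = new StringBuilder();
        for (int n = 0; n <= 8; n++) {
            boolean viaExpansion = Pattern.matches(expanded, input);
            boolean viaQuantifier = Pattern.matches("X{2,6}", input);
            System.out.println(n + " X's: " + viaExpansion + " / " + viaQuantifier);
            input.append('X');
        }
    }
}
```

Nesting the optional groups, rather than writing X?X?X?X? side by side, means that once one optional copy fails to match, the remaining ones are skipped together instead of each being tried in turn.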