📄 pattern.java

 * literals that represent regular expressions to protect them from
 * interpretation by the Java bytecode compiler.  The string literal
 * <tt>"&#92;b"</tt>, for example, matches a single backspace character when
 * interpreted as a regular expression, while <tt>"&#92;&#92;b"</tt> matches a
 * word boundary.  The string literal <tt>"&#92;(hello&#92;)"</tt> is illegal
 * and leads to a compile-time error; in order to match the string
 * <tt>(hello)</tt> the string literal <tt>"&#92;&#92;(hello&#92;&#92;)"</tt>
 * must be used.
 *
 * <a name="cc">
 * <h4> Character Classes </h4>
 *
 *    <p> Character classes may appear within other character classes, and
 *    may be composed by the union operator (implicit) and the intersection
 *    operator (<tt>&amp;&amp;</tt>).
 *    The union operator denotes a class that contains every character that is
 *    in at least one of its operand classes.  The intersection operator
 *    denotes a class that contains every character that is in both of its
 *    operand classes.
 *
 *    <p> The precedence of character-class operators is as follows, from
 *    highest to lowest:
 *
 *    <blockquote><table border="0" cellpadding="1" cellspacing="0"
 *                 summary="Precedence of character class operators.">
 *      <tr><th>1&nbsp;&nbsp;&nbsp;&nbsp;</th>
 *        <td>Literal escape&nbsp;&nbsp;&nbsp;&nbsp;</td>
 *        <td><tt>\x</tt></td></tr>
 *      <tr><th>2&nbsp;&nbsp;&nbsp;&nbsp;</th>
 *        <td>Grouping</td>
 *        <td><tt>[...]</tt></td></tr>
 *      <tr><th>3&nbsp;&nbsp;&nbsp;&nbsp;</th>
 *        <td>Range</td>
 *        <td><tt>a-z</tt></td></tr>
 *      <tr><th>4&nbsp;&nbsp;&nbsp;&nbsp;</th>
 *        <td>Union</td>
 *        <td><tt>[a-e][i-u]</tt></td></tr>
 *      <tr><th>5&nbsp;&nbsp;&nbsp;&nbsp;</th>
 *        <td>Intersection</td>
 *        <td><tt>[a-z&amp;&amp;[aeiou]]</tt></td></tr>
 *    </table></blockquote>
 *
 *    <p> Note that a different set of metacharacters is in effect inside
 *    a character class than outside a character class.
 *    For instance, the
 *    regular expression <tt>.</tt> loses its special meaning inside a
 *    character class, while the expression <tt>-</tt> becomes a
 *    range-forming metacharacter.
 *
 * <a name="lt">
 * <h4> Line terminators </h4>
 *
 * <p> A <i>line terminator</i> is a one- or two-character sequence that marks
 * the end of a line of the input character sequence.  The following are
 * recognized as line terminators:
 *
 * <ul>
 *
 *   <li> A newline (line feed) character&nbsp;(<tt>'\n'</tt>),
 *
 *   <li> A carriage-return character followed immediately by a newline
 *   character&nbsp;(<tt>"\r\n"</tt>),
 *
 *   <li> A standalone carriage-return character&nbsp;(<tt>'\r'</tt>),
 *
 *   <li> A next-line character&nbsp;(<tt>'&#92;u0085'</tt>),
 *
 *   <li> A line-separator character&nbsp;(<tt>'&#92;u2028'</tt>), or
 *
 *   <li> A paragraph-separator character&nbsp;(<tt>'&#92;u2029'</tt>).
 *
 * </ul>
 * <p>If {@link #UNIX_LINES} mode is activated, then the only line terminators
 * recognized are newline characters.
 *
 * <p> The regular expression <tt>.</tt> matches any character except a line
 * terminator unless the {@link #DOTALL} flag is specified.
 *
 * <p> By default, the regular expressions <tt>^</tt> and <tt>$</tt> ignore
 * line terminators and only match at the beginning and the end, respectively,
 * of the entire input sequence. If {@link #MULTILINE} mode is activated then
 * <tt>^</tt> matches at the beginning of input and after any line terminator
 * except at the end of input. When in {@link #MULTILINE} mode <tt>$</tt>
 * matches just before a line terminator or the end of the input sequence.
 *
 * <a name="cg">
 * <h4> Groups and capturing </h4>
 *
 * <p> Capturing groups are numbered by counting their opening parentheses from
 * left to right.
 * In the expression <tt>((A)(B(C)))</tt>, for example, there
 * are four such groups: </p>
 *
 * <blockquote><table cellpadding=1 cellspacing=0 summary="Capturing group numberings">
 * <tr><th>1&nbsp;&nbsp;&nbsp;&nbsp;</th>
 *     <td><tt>((A)(B(C)))</tt></td></tr>
 * <tr><th>2&nbsp;&nbsp;&nbsp;&nbsp;</th>
 *     <td><tt>(A)</tt></td></tr>
 * <tr><th>3&nbsp;&nbsp;&nbsp;&nbsp;</th>
 *     <td><tt>(B(C))</tt></td></tr>
 * <tr><th>4&nbsp;&nbsp;&nbsp;&nbsp;</th>
 *     <td><tt>(C)</tt></td></tr>
 * </table></blockquote>
 *
 * <p> Group zero always stands for the entire expression.
 *
 * <p> Capturing groups are so named because, during a match, each subsequence
 * of the input sequence that matches such a group is saved.  The captured
 * subsequence may be used later in the expression, via a back reference, and
 * may also be retrieved from the matcher once the match operation is complete.
 *
 * <p> The captured input associated with a group is always the subsequence
 * that the group most recently matched.  If a group is evaluated a second time
 * because of quantification then its previously-captured value, if any, will
 * be retained if the second evaluation fails.  Matching the string
 * <tt>"aba"</tt> against the expression <tt>(a(b)?)+</tt>, for example, leaves
 * group two set to <tt>"b"</tt>.  All captured input is discarded at the
 * beginning of each match.
 *
 * <p> Groups beginning with <tt>(?</tt> are pure, <i>non-capturing</i> groups
 * that do not capture text and do not count towards the group total.
 *
 *
 * <h4> Unicode support </h4>
 *
 * <p> This class follows <a
 * href="http://www.unicode.org/unicode/reports/tr18/"><i>Unicode Technical
 * Report #18: Unicode Regular Expression Guidelines</i></a>, implementing its
 * second level of support though with a slightly different concrete syntax.
 *
 * <p> Unicode escape sequences such as <tt>&#92;u2014</tt> in Java source code
 * are processed as described in <a
 * href="http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#100850">\u00A73.3</a>
 * of the Java Language Specification.  Such escape sequences are also
 * implemented directly by the regular-expression parser so that Unicode
 * escapes can be used in expressions that are read from files or from the
 * keyboard.  Thus the strings <tt>"&#92;u2014"</tt> and <tt>"\\u2014"</tt>,
 * while not equal, compile into the same pattern, which matches the character
 * with hexadecimal value <tt>0x2014</tt>.
 *
 * <a name="ubc"> <p>Unicode blocks and categories are written with the
 * <tt>\p</tt> and <tt>\P</tt> constructs as in
 * Perl. <tt>\p{</tt><i>prop</i><tt>}</tt> matches if the input has the
 * property <i>prop</i>, while <tt>\P{</tt><i>prop</i><tt>}</tt> does not match if
 * the input has that property.  Blocks are specified with the prefix
 * <tt>In</tt>, as in <tt>InMongolian</tt>.  Categories may be specified with
 * the optional prefix <tt>Is</tt>: Both <tt>\p{L}</tt> and <tt>\p{IsL}</tt>
 * denote the category of Unicode letters.  Blocks and categories can be used
 * both inside and outside of a character class.
 *
 * <p> The supported blocks and categories are those of <a
 * href="http://www.unicode.org/unicode/standard/standard.html"><i>The Unicode
 * Standard, Version&nbsp;3.0</i></a>.  The block names are those defined in
 * Chapter&nbsp;14 and in the file <a
 * href="http://www.unicode.org/Public/3.0-Update/Blocks-3.txt">Blocks-3.txt
 * </a> of the <a
 * href="http://www.unicode.org/Public/3.0-Update/UnicodeCharacterDatabase-3.0.0.html">Unicode
 * Character Database</a> except that the spaces are removed; <tt>"Basic
 * Latin"</tt>, for example, becomes <tt>"BasicLatin"</tt>.  The category names
 * are those defined in table 4-5 of the Standard (p.&nbsp;88), both normative
 * and informative.
 *
 *
 * <h4> Comparison to Perl 5 </h4>
 *
 * <p> Perl constructs not supported by this class: </p>
 *
 * <ul>
 *
 *    <li><p> The conditional constructs <tt>(?{</tt><i>X</i><tt>})</tt> and
 *    <tt>(?(</tt><i>condition</i><tt>)</tt><i>X</i><tt>|</tt><i>Y</i><tt>)</tt>,
 *    </p></li>
 *
 *    <li><p> The embedded code constructs <tt>(?{</tt><i>code</i><tt>})</tt>
 *    and <tt>(??{</tt><i>code</i><tt>})</tt>,</p></li>
 *
 *    <li><p> The embedded comment syntax <tt>(?#comment)</tt>, and </p></li>
 *
 *    <li><p> The preprocessing operations <tt>\l</tt>, <tt>&#92;u</tt>,
 *    <tt>\L</tt>, and <tt>\U</tt>.  </p></li>
 *
 * </ul>
 *
 * <p> Constructs supported by this class but not by Perl: </p>
 *
 * <ul>
 *
 *    <li><p> Possessive quantifiers, which greedily match as much as they can
 *    and do not back off, even when doing so would allow the overall match to
 *    succeed.  </p></li>
 *
 *    <li><p> Character-class union and intersection as described
 *    <a href="#cc">above</a>.</p></li>
 *
 * </ul>
 *
 * <p> Notable differences from Perl: </p>
 *
 * <ul>
 *
 *    <li><p> In Perl, <tt>\1</tt> through <tt>\9</tt> are always interpreted
 *    as back references; a backslash-escaped number greater than <tt>9</tt> is
 *    treated as a back reference if at least that many subexpressions exist,
 *    otherwise it is interpreted, if possible, as an octal escape.  In this
 *    class octal escapes must always begin with a zero. In this class,
 *    <tt>\1</tt> through <tt>\9</tt> are always interpreted as back
 *    references, and a larger number is accepted as a back reference if at
 *    least that many subexpressions exist at that point in the regular
 *    expression, otherwise the parser will drop digits until the number is
 *    smaller than or equal to the existing number of groups or it is one digit.
 *    </p></li>
 *
 *    <li><p> Perl uses the <tt>g</tt> flag to request a match that resumes
 *    where the last match left off.
 *    This functionality is provided implicitly
 *    by the {@link Matcher} class: Repeated invocations of the {@link
 *    Matcher#find find} method will resume where the last match left off,
 *    unless the matcher is reset.  </p></li>
 *
 *    <li><p> In Perl, embedded flags at the top level of an expression affect
 *    the whole expression.  In this class, embedded flags always take effect
 *    at the point at which they appear, whether they are at the top level or
 *    within a group; in the latter case, flags are restored at the end of the
 *    group just as in Perl.  </p></li>
 *
 *    <li><p> Perl is forgiving about malformed matching constructs, as in the
 *    expression <tt>*a</tt>, as well as dangling brackets, as in the
 *    expression <tt>abc]</tt>, and treats them as literals.  This
 *    class also accepts dangling brackets but is strict about dangling
 *    metacharacters like +, ? and *, and will throw a
 *    {@link PatternSyntaxException} if it encounters them. </p></li>
 *
 * </ul>
 *
 *
 * <p> For a more precise description of the behavior of regular expression
 * constructs, please see <a href="http://www.oreilly.com/catalog/regex2/">
 * <i>Mastering Regular Expressions, 2nd Edition</i>, Jeffrey E. F. Friedl,
 * O'Reilly and Associates, 2002.</a>
 * </p>
 *
 * @see java.lang.String#split(String, int)
 * @see java.lang.String#split(String)
 *
 * @author      Mike McCloskey
 * @author      Mark Reinhold
 * @author      JSR-51 Expert Group
 * @version     1.97, 04/01/13
 * @since       1.4
 * @spec        JSR-51
 */

public final class Pattern
    implements java.io.Serializable
{

    /**
     * Regular expression modifier values.  Instead of being passed as
     * arguments, they can also be passed as inline modifiers.
     * For example, the following statements have the same effect.
     * <pre>
     * Pattern p1 = Pattern.compile("abc", Pattern.CASE_INSENSITIVE|Pattern.MULTILINE);
     * Pattern p2 = Pattern.compile("(?im)abc", 0);
     * </pre>
     *
     * The flags are duplicated so that the familiar Perl match flag
     * names are available.
     */

    /**
     * Enables Unix lines mode.
     *
     * <p> In this mode, only the <tt>'\n'</tt> line terminator is recognized
     * in the behavior of <tt>.</tt>, <tt>^</tt>, and <tt>$</tt>.
     *
     * <p> Unix lines mode can also be enabled via the embedded flag
     * expression&nbsp;<tt>(?d)</tt>.
     */
    public static final int UNIX_LINES = 0x01;

    /**
     * Enables case-insensitive matching.
     *
     * <p> By default, case-insensitive matching assumes that only characters
     * in the US-ASCII charset are being matched.  Unicode-aware
     * case-insensitive matching can be enabled by specifying the {@link
     * #UNICODE_CASE} flag in conjunction with this flag.
     *
     * <p> Case-insensitive matching can also be enabled via the embedded flag
     * expression&nbsp;<tt>(?i)</tt>.
     *
     * <p> Specifying this flag may impose a slight performance penalty.  </p>
     */
    public static final int CASE_INSENSITIVE = 0x02;

    /**
     * Permits whitespace and comments in pattern.
     *
     * <p> In this mode, whitespace is ignored, and embedded comments starting
     * with <tt>#</tt> are ignored until the end of a line.
     *
     * <p> Comments mode can also be enabled via the embedded flag
     * expression&nbsp;<tt>(?x)</tt>.
     */
    public static final int COMMENTS = 0x04;

    /**
     * Enables multiline mode.
     *
     * <p> In multiline mode the expressions <tt>^</tt> and <tt>$</tt> match
     * just after or just before, respectively, a line terminator or the end of
     * the input sequence.  By default these expressions only match at the
     * beginning and the end of the entire input sequence.
     *
     * <p> Multiline mode can also be enabled via the embedded flag
     * expression&nbsp;<tt>(?m)</tt>.
     * </p>
     */
    public static final int MULTILINE = 0x08;

    /**
     * Enables dotall mode.
     *
     * <p> In dotall mode, the expression <tt>.</tt> matches any character,
     * including a line terminator.  By default this expression does not match
     * line terminators.
     *
     * <p> Dotall mode can also be enabled via the embedded flag
     * expression&nbsp;<tt>(?s)</tt>.  (The <tt>s</tt> is a mnemonic for
     * "single-line" mode, which is what this is called in Perl.)  </p>
     */
    public static final int DOTALL = 0x20;
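
The flag constants documented above can be exercised with a small standalone program. The following is only an illustrative sketch, not part of Pattern.java: the class name PatternFlagsSketch, the helper countMatches, and the sample strings are assumptions made up for this example. It compares each flag constant with its embedded flag expression, (?d), (?i), (?x), (?m) and (?s), as described in the comments above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of java.util.regex.
final class PatternFlagsSketch {

    // Hypothetical helper: counts successive find() hits of p in input.
    static int countMatches(Pattern p, CharSequence input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "one\nTWO\nthree";

        // Without flags, ^ and $ only match at the ends of the whole input,
        // and matching is case-sensitive.
        Pattern plain = Pattern.compile("^two$");

        // The same expression with flags passed as an argument...
        Pattern viaArgument = Pattern.compile(
                "^two$", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

        // ...and with the equivalent embedded flag expression.
        Pattern viaEmbedded = Pattern.compile("(?im)^two$");

        System.out.println(countMatches(plain, text));
        System.out.println(countMatches(viaArgument, text));
        System.out.println(countMatches(viaEmbedded, text));

        // DOTALL lets '.' match a line terminator.
        System.out.println(Pattern.compile("one.TWO").matcher(text).find());
        System.out.println(
                Pattern.compile("one.TWO", Pattern.DOTALL).matcher(text).find());
        System.out.println(Pattern.compile("(?s)one.TWO").matcher(text).find());

        // UNIX_LINES: only '\n' counts as a line terminator, so '.' may
        // match a standalone '\r'.
        System.out.println(Pattern.compile("a.").matcher("a\rb").find());
        System.out.println(
                Pattern.compile("a.", Pattern.UNIX_LINES).matcher("a\rb").find());

        // COMMENTS: whitespace and '#' comments in the pattern are ignored.
        System.out.println(
                Pattern.compile("a b # trailing comment", Pattern.COMMENTS)
                        .matcher("ab").matches());
    }
}
```

Passing flags as an argument keeps the expression text unchanged, while the embedded form travels with the expression itself, which may be convenient when patterns are read from configuration or user input.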