<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html><head><meta http-equiv="Content-Type" content="text/html; charset=US-ASCII"><title>Regular expression syntax</title><meta name="generator" content="DocBook XSL Stylesheets V1.73.2"><link rel="start" href="index.html" title="GLib Reference Manual"><link rel="up" href="glib.html" title="GLib Overview"><link rel="prev" href="glib-changes.html" title="Changes to GLib"><link rel="next" href="glib-resources.html" title="Mailing lists and bug reports"><meta name="generator" content="GTK-Doc V1.9 (XML mode)"><link rel="stylesheet" href="style.css" type="text/css"><link rel="chapter" href="glib.html" title="GLib Overview"><link rel="chapter" href="glib-fundamentals.html" title="GLib Fundamentals"><link rel="chapter" href="glib-core.html" title="GLib Core Application Support"><link rel="chapter" href="glib-utilities.html" title="GLib Utilities"><link rel="chapter" href="glib-data-types.html" title="GLib Data Types"><link rel="chapter" href="tools.html" title="GLib Tools"><link rel="index" href="ix01.html" title="Index"><link rel="index" href="ix02.html" title="Index of deprecated symbols"><link rel="index" href="ix03.html" title="Index of new symbols in 2.2"><link rel="index" href="ix04.html" title="Index of new symbols in 2.4"><link rel="index" href="ix05.html" title="Index of new symbols in 2.6"><link rel="index" href="ix06.html" title="Index of new symbols in 2.8"><link rel="index" href="ix07.html" title="Index of new symbols in 2.10"><link rel="index" href="ix08.html" title="Index of new symbols in 2.12"><link rel="index" href="ix09.html" title="Index of new symbols in 2.14"><link rel="index" href="ix10.html" title="Index of new symbols in 2.16"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><table class="navigation" id="top" width="100%" summary="Navigation header" cellpadding="2" cellspacing="2"><tr valign="middle"><td><a accesskey="p" 
href="glib-changes.html"><img src="left.png" width="24" height="24" border="0" alt="Prev"></a></td><td><a accesskey="u" href="glib.html"><img src="up.png" width="24" height="24" border="0" alt="Up"></a></td><td><a accesskey="h" href="index.html"><img src="home.png" width="24" height="24" border="0" alt="Home"></a></td><th width="100%" align="center">GLib Reference Manual</th><td><a accesskey="n" href="glib-resources.html"><img src="right.png" width="24" height="24" border="0" alt="Next"></a></td></tr></table><div class="refentry" lang="en"><a name="glib-regex-syntax"></a><div class="titlepage"></div><div class="refnamediv"><table width="100%"><tr><td valign="top"><h2><span class="refentrytitle">Regular expression syntax</span></h2><p>Regular expression syntax &#8212; Syntax and semantics of the regular expressions supported by GRegex</p></td><td valign="top" align="right"></td></tr></table></div><div class="refsect1" lang="en"><a name="id2811996"></a><h2>GRegex regular expression details</h2><p>A regular expression is a pattern that is matched against a string from left to right. Most characters stand for themselves in a pattern, and match the corresponding characters in the string. As a trivial example, the pattern</p><pre class="programlisting">The quick brown fox</pre><p>matches a portion of a string that is identical to itself. When caseless matching is specified (the <code class="varname">G_REGEX_CASELESS</code> flag), letters are matched independently of case.</p><p>The power of regular expressions comes from the ability to include alternatives and repetitions in the pattern. These are encoded in the pattern by the use of metacharacters, which do not stand for themselves but instead are interpreted in some special way.</p><p>There are two different sets of metacharacters: those that are recognized anywhere in the pattern except within square brackets, and those that are recognized in square brackets.
Outside square brackets, the metacharacters are as follows:</p><div class="table"><a name="id2813080"></a><p class="title"><b>Table&#160;1.&#160;Metacharacters outside square brackets</b></p><div class="table-contents"><table summary="Metacharacters outside square brackets" border="1"><colgroup><col align="center"><col></colgroup><thead><tr><th align="center">Character</th><th>Meaning</th></tr></thead><tbody><tr><td align="center">\</td><td>general escape character with several uses</td></tr><tr><td align="center">^</td><td>assert start of string (or line, in multiline mode)</td></tr><tr><td align="center">$</td><td>assert end of string (or line, in multiline mode)</td></tr><tr><td align="center">.</td><td>match any character except newline (by default)</td></tr><tr><td align="center">[</td><td>start character class definition</td></tr><tr><td align="center">|</td><td>start of alternative branch</td></tr><tr><td align="center">(</td><td>start subpattern</td></tr><tr><td align="center">)</td><td>end subpattern</td></tr><tr><td align="center">?</td><td>extends the meaning of (, or 0/1 quantifier, or quantifier minimizer</td></tr><tr><td align="center">*</td><td>0 or more quantifier</td></tr><tr><td align="center">+</td><td>1 or more quantifier, also "possessive quantifier"</td></tr><tr><td align="center">{</td><td>start min/max quantifier</td></tr></tbody></table></div></div><br class="table-break"><p>Part of a pattern that is in square brackets is called a "character class".
In a character class, the only metacharacters are:</p><div class="table"><a name="id2813248"></a><p class="title"><b>Table&#160;2.&#160;Metacharacters inside square brackets</b></p><div class="table-contents"><table summary="Metacharacters inside square brackets" border="1"><colgroup><col align="center"><col></colgroup><thead><tr><th align="center">Character</th><th>Meaning</th></tr></thead><tbody><tr><td align="center">\</td><td>general escape character</td></tr><tr><td align="center">^</td><td>negate the class, but only if it is the first character</td></tr><tr><td align="center">-</td><td>indicates character range</td></tr><tr><td align="center">[</td><td>POSIX character class (only if followed by POSIX syntax)</td></tr><tr><td align="center">]</td><td>terminates the character class</td></tr></tbody></table></div></div><br class="table-break"></div><div class="refsect1" lang="en"><a name="id2813340"></a><h2>Backslash</h2><p>The backslash character has several uses. Firstly, if it is followed by a non-alphanumeric character, it takes away any special meaning that character may have. This use of backslash as an escape character applies both inside and outside character classes.</p><p>For example, if you want to match a * character, you write \* in the pattern. This escaping action applies whether or not the following character would otherwise be interpreted as a metacharacter, so it is always safe to precede a non-alphanumeric with backslash to specify that it stands for itself.
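</p>
<p>As a minimal sketch of this rule, the plain C function below turns a string into a pattern that matches it literally by putting a backslash in front of every ASCII non-alphanumeric character. The name <code class="function">escape_non_alnum()</code> is hypothetical and not part of GLib. The sketch assumes a NUL-terminated input and deliberately leaves bytes of 0x80 and above (for example UTF-8 sequences) untouched, so that a backslash never splits a multibyte character. GLib itself provides <code class="function">g_regex_escape_string()</code> for escaping literal text.</p>

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of GLib): returns a newly allocated copy
 * of `literal` in which every ASCII non-alphanumeric character is
 * preceded by a backslash, so that it stands for itself in a pattern.
 * Alphanumerics and bytes >= 0x80 are copied unchanged.
 * The caller frees the result; returns NULL if allocation fails. */
char *escape_non_alnum(const char *literal)
{
    assert(literal != NULL);

    size_t len = strlen(literal);
    /* Worst case: every byte gains a backslash, plus the terminator. */
    char *out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (const char *s = literal; *s != '\0'; s++) {
        unsigned char c = (unsigned char) *s;
        if (c < 0x80 && !isalnum(c))
            *p++ = '\\';          /* escape ASCII punctuation, space, etc. */
        *p++ = (char) c;
    }
    *p = '\0';
    return out;
}
```

<p>For example, <code class="literal">escape_non_alnum("1+1=2")</code> returns <code class="literal">1\+1\=2</code>. Escaping every punctuation character, rather than only the metacharacters listed above, is redundant for characters such as = but harmless, which is exactly what the rule above guarantees.</p>
<p>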
In particular, if you want to match a backslash, you write \\.</p><p>If a pattern is compiled with the <code class="varname">G_REGEX_EXTENDED</code> option, whitespace in the pattern (other than in a character class) and characters between a # outside a character class and the next newline are ignored. An escaping backslash can be used to include a whitespace or # character as part of the pattern.</p><p>If you want to remove the special meaning from a sequence of characters, you can do so by putting them between \Q and \E. The \Q...\E sequence is recognized both inside and outside character classes.</p><div class="refsect2" lang="en"><a name="id2813381"></a><h3>Non-printing characters</h3><p>A second use of backslash provides a way of encoding non-printing characters in patterns in a visible manner. There is no restriction on the appearance of non-printing characters, apart from the binary zero that terminates a pattern, but when a pattern is being prepared by text editing, it is usually easier to use one of the following escape sequences than the binary character it represents:</p><div class="table"><a name="id2813396"></a><p class="title"><b>Table&#160;3.&#160;Non-printing characters</b></p><div class="table-contents"><table summary="Non-printing characters" border="1"><colgroup><col align="center"><col></colgroup><thead><tr><th align="center">Escape</th><th>Meaning</th></tr></thead><tbody><tr><td align="center">\a</td><td>alarm, that is, the BEL character (hex 07)</td></tr><tr><td align="center">\cx</td><td>"control-x", where x is any character</td></tr><tr><td align="center">\e</td><td>escape (hex 1B)</td></tr><tr><td align="center">\f</td><td>formfeed (hex 0C)</td></tr><tr><td align="center">\n</td><td>newline (hex 0A)</td></tr><tr><td align="center">\r</td><td>carriage return (hex 0D)</td></tr><tr><td align="center">\t</td><td>tab (hex 09)</td></tr><tr><td align="center">\ddd</td><td>character with octal code ddd, or back reference</td></tr><tr><td 
align="center">\xhh</td><td>character with hex code hh</td></tr><tr><td align="center">\x{hhh..}</td><td>character with hex code hhh..</td></tr></tbody></table></div></div><br class="table-break"><p>The precise effect of \cx is as follows: if x is a lower case letter, it is converted to upper case. Then bit 6 of the character (hex 40) is inverted. Thus \cz becomes hex 1A, but \c{ becomes hex 3B, while \c; becomes hex 7B.</p><p>After \x, from zero to two hexadecimal digits are read (letters can be in upper or lower case). Any number of hexadecimal digits may appear between \x{ and }, but the value of the character code must be less than 2<sup>31</sup> (that is, the maximum hexadecimal value is 7FFFFFFF). If characters other than hexadecimal digits appear between \x{ and }, or if there is no terminating }, this form of escape is not recognized. Instead, the initial \x will be interpreted as a basic hexadecimal escape, with no following digits, giving a character whose value is zero.</p><p>Characters whose value is less than 256 can be defined by either of the two syntaxes for \x. There is no difference in the way they are handled. For example, \xdc is exactly the same as \x{dc}.</p><p>After \0, up to two further octal digits are read. If there are fewer than two digits, just those that are present are used. Thus the sequence \0\x\07 specifies two binary zeros followed by a BEL character (code value 7). Make sure you supply two digits after the initial zero if the pattern character that follows is itself an octal digit.</p><p>The handling of a backslash followed by a digit other than 0 is complicated. Outside a character class, GRegex reads it and any following digits as a decimal number. If the number is less than 10, or if there have been at least that many previous capturing left parentheses in the expression, the entire sequence is taken as a back reference.
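</p>
<p>The decision just described, together with the octal fallback described below, can be sketched in plain C. This is a hypothetical helper written for illustration, not the parser GRegex actually uses. It assumes that <code class="literal">digits</code> points just past the backslash at a digit from 1 to 9, that the caller knows how many capturing left parentheses precede the escape, and it ignores the \0 form, which is handled separately.</p>

```c
#include <assert.h>
#include <ctype.h>

/* Outcome of interpreting a backslash followed by a digit 1-9. */
typedef struct {
    int  is_backref; /* 1: back reference, 0: data character */
    long value;      /* back-reference number, or character code */
    int  consumed;   /* number of digits that belong to the escape */
} DigitEscape;

/* Hypothetical helper (not part of GLib/GRegex).
 * digits:          text just after the backslash; must start with 1-9
 * captures_before: capturing left parentheses seen before the escape
 * in_class:        nonzero when the escape is inside a character class */
DigitEscape classify_digit_escape(const char *digits, int captures_before,
                                  int in_class)
{
    DigitEscape r = {0, 0, 0};

    assert(digits[0] >= '1' && digits[0] <= '9');

    if (!in_class) {
        /* Read all following digits as one decimal number. The value is
         * capped so long digit runs cannot overflow; a capped value is
         * still larger than any realistic capture count. */
        long n = 0;
        int len = 0;
        while (isdigit((unsigned char) digits[len])) {
            if (n < 1000000)
                n = n * 10 + (digits[len] - '0');
            len++;
        }
        if (n < 10 || n <= captures_before) {
            r.is_backref = 1;
            r.value = n;
            r.consumed = len;
            return r;
        }
    }

    /* Otherwise re-read at most three octal digits as a character code.
     * Any digits after them (including 8 and 9) stand for themselves;
     * if none are octal, consumed stays 0 and the code is zero. */
    while (r.consumed < 3 && digits[r.consumed] >= '0'
           && digits[r.consumed] <= '7') {
        r.value = r.value * 8 + (digits[r.consumed] - '0');
        r.consumed++;
    }
    return r;
}
```

<p>For \113 with fewer than 113 preceding capturing subpatterns, the sketch reports a data character with code 75 (octal 113) and consumes three digits. A consumed count of 0, as for \81, means the escape yields a binary zero and the digits that follow are read as literal characters.</p>
<p>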
A description of how back references work is given later, following the discussion of parenthesized subpatterns.</p><p>Inside a character class, or if the decimal number is greater than 9 and there have not been that many capturing subpatterns, GRegex re-reads up to three octal digits following the backslash, and uses them to generate a data character. Any subsequent digits stand for themselves. For example:</p><div class="table"><a name="id2813590"></a><p class="title"><b>Table&#160;4.&#160;Backslash followed by digits</b></p><div class="table-contents"><table summary="Backslash followed by digits" border="1"><colgroup><col align="center"><col></colgroup><thead><tr><th align="center">Escape</th><th>Meaning</th></tr></thead><tbody><tr><td align="center">\040</td><td>is another way of writing a space</td></tr><tr><td align="center">\40</td><td>is the same, provided there are fewer than 40 previous capturing subpatterns</td></tr><tr><td align="center">\7</td><td>is always a back reference</td></tr><tr><td align="center">\11</td><td>might be a back reference, or another way of writing a tab</td></tr><tr><td align="center">\011</td><td>is always a tab</td></tr><tr><td align="center">\0113</td><td>is a tab followed by the character "3"</td></tr><tr><td align="center">\113</td><td>might be a back reference, otherwise the character with octal code 113</td></tr><tr><td align="center">\377</td><td>might be a back reference, otherwise the byte consisting entirely of 1 bits</td></tr><tr><td align="center">\81</td><td>is either a back reference, or a binary zero followed by the two characters "8" and "1"</td></tr></tbody></table></div></div><br class="table-break"><p>Note that octal values of 100 or greater must not be introduced by a leading zero, because no more than three octal digits are ever read.</p><p>All the sequences that define a single character can be used both inside