and outside character classes. In addition, inside a character class, the sequence \b is interpreted as the backspace character (hex 08), and the sequences \R and \X are interpreted as the characters "R" and "X", respectively. Outside a character class, these sequences have different meanings (see below).</p></div><hr><div class="refsect2" lang="en"><a name="id2813740"></a><h3>Absolute and relative back references</h3><p>The sequence \g followed by a positive or negative number, optionally enclosed in braces, is an absolute or relative back reference. Back references are discussed later, following the discussion of parenthesized subpatterns.</p></div><hr><div class="refsect2" lang="en"><a name="id2813755"></a><h3>Generic character types</h3><p>Another use of backslash is for specifying generic character types. The following are always recognized:</p><div class="table"><a name="id2813765"></a><p class="title"><b>Table 5. Generic characters</b></p><div class="table-contents"><table summary="Generic characters" border="1"><colgroup><col align="center"><col></colgroup><thead><tr><th align="center">Escape</th><th>Meaning</th></tr></thead><tbody><tr><td align="center">\d</td><td>any decimal digit</td></tr><tr><td align="center">\D</td><td>any character that is not a decimal digit</td></tr><tr><td align="center">\s</td><td>any whitespace character</td></tr><tr><td align="center">\S</td><td>any character that is not a whitespace character</td></tr><tr><td align="center">\w</td><td>any "word" character</td></tr><tr><td align="center">\W</td><td>any "non-word" character</td></tr></tbody></table></div></div><br class="table-break"><p>Each pair of escape sequences partitions the complete set of characters into two disjoint sets. Any given character matches one, and only one, of each pair.</p><p>These character type sequences can appear both inside and outside character classes.
They each match one character of the appropriate type. If the current matching point is at the end of the passed string, all of them fail, since there is no character to match.</p><p>For compatibility with Perl, \s does not match the VT character (code 11). This makes it different from the POSIX "space" class. The \s characters are HT (9), LF (10), FF (12), CR (13), and space (32).</p><p>A "word" character is an underscore or any character less than 256 that is a letter or digit.</p><p>Characters with values greater than 128 never match \d, \s, or \w, and always match \D, \S, and \W.</p></div><hr><div class="refsect2" lang="en"><a name="id2813896"></a><h3>Newline sequences</h3><p>Outside a character class, the escape sequence \R matches any Unicode newline sequence. This particular group matches either the two-character sequence CR followed by LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab, U+000B), FF (formfeed, U+000C), CR (carriage return, U+000D), NEL (next line, U+0085), LS (line separator, U+2028), or PS (paragraph separator, U+2029). The two-character sequence is treated as a single unit that cannot be split. Inside a character class, \R matches the letter "R".</p></div><hr><div class="refsect2" lang="en"><a name="id2813914"></a><h3>Unicode character properties</h3><p>To support generic character types there are three additional escape sequences:</p><div class="table"><a name="id2813925"></a><p class="title"><b>Table 6. 
Generic character types</b></p><div class="table-contents"><table summary="Generic character types" border="1"><colgroup><col align="center"><col></colgroup><thead><tr><th align="center">Escape</th><th>Meaning</th></tr></thead><tbody><tr><td align="center">\p{xx}</td><td>a character with the xx property</td></tr><tr><td align="center">\P{xx}</td><td>a character without the xx property</td></tr><tr><td align="center">\X</td><td>an extended Unicode sequence</td></tr></tbody></table></div></div><br class="table-break"><p>The property names represented by xx above are limited to the Unicode script names, the general category properties, and "Any", which matches any character (including newline). Other properties such as "InMusicalSymbols" are not currently supported. Note that \P{Any} does not match any characters, so always causes a match failure.</p><p>Sets of Unicode characters are defined as belonging to certain scripts. A character from one of these sets can be matched using a script name. For example, \p{Greek} or \P{Han}.</p><p>Those that are not part of an identified script are lumped together as "Common". 
The current list of scripts is:</p><div class="itemizedlist"><ul type="disc"><li><p>Arabic</p></li><li><p>Armenian</p></li><li><p>Balinese</p></li><li><p>Bengali</p></li><li><p>Bopomofo</p></li><li><p>Braille</p></li><li><p>Buginese</p></li><li><p>Buhid</p></li><li><p>Canadian_Aboriginal</p></li><li><p>Cherokee</p></li><li><p>Common</p></li><li><p>Coptic</p></li><li><p>Cuneiform</p></li><li><p>Cypriot</p></li><li><p>Cyrillic</p></li><li><p>Deseret</p></li><li><p>Devanagari</p></li><li><p>Ethiopic</p></li><li><p>Georgian</p></li><li><p>Glagolitic</p></li><li><p>Gothic</p></li><li><p>Greek</p></li><li><p>Gujarati</p></li><li><p>Gurmukhi</p></li><li><p>Han</p></li><li><p>Hangul</p></li><li><p>Hanunoo</p></li><li><p>Hebrew</p></li><li><p>Hiragana</p></li><li><p>Inherited</p></li><li><p>Kannada</p></li><li><p>Katakana</p></li><li><p>Kharoshthi</p></li><li><p>Khmer</p></li><li><p>Lao</p></li><li><p>Latin</p></li><li><p>Limbu</p></li><li><p>Linear_B</p></li><li><p>Malayalam</p></li><li><p>Mongolian</p></li><li><p>Myanmar</p></li><li><p>New_Tai_Lue</p></li><li><p>Nko</p></li><li><p>Ogham</p></li><li><p>Old_Italic</p></li><li><p>Old_Persian</p></li><li><p>Oriya</p></li><li><p>Osmanya</p></li><li><p>Phags_Pa</p></li><li><p>Phoenician</p></li><li><p>Runic</p></li><li><p>Shavian</p></li><li><p>Sinhala</p></li><li><p>Syloti_Nagri</p></li><li><p>Syriac</p></li><li><p>Tagalog</p></li><li><p>Tagbanwa</p></li><li><p>Tai_Le</p></li><li><p>Tamil</p></li><li><p>Telugu</p></li><li><p>Thaana</p></li><li><p>Thai</p></li><li><p>Tibetan</p></li><li><p>Tifinagh</p></li><li><p>Ugaritic</p></li><li><p>Yi</p></li></ul></div><p>Each character has exactly one general category property, specified by a two-letter abbreviation. For compatibility with Perl, negation can be specified by including a circumflex between the opening brace and the property name. 
For example, \p{^Lu} is the same as \P{Lu}.</p><p>If only one letter is specified with \p or \P, it includes all the general category properties that start with that letter. In this case, in the absence of negation, the curly brackets in the escape sequence are optional; these two examples have the same effect:</p><pre class="programlisting">\p{L}
\pL</pre><p>The following general category property codes are supported:</p><div class="table"><a name="id2814326"></a><p class="title"><b>Table 7. Property codes</b></p><div class="table-contents"><table summary="Property codes" border="1"><colgroup><col align="center"><col></colgroup><thead><tr><th align="center">Code</th><th>Meaning</th></tr></thead><tbody><tr><td align="center">C</td><td>Other</td></tr><tr><td align="center">Cc</td><td>Control</td></tr><tr><td align="center">Cf</td><td>Format</td></tr><tr><td align="center">Cn</td><td>Unassigned</td></tr><tr><td align="center">Co</td><td>Private use</td></tr><tr><td align="center">Cs</td><td>Surrogate</td></tr><tr><td align="center">L</td><td>Letter</td></tr><tr><td align="center">Ll</td><td>Lower case letter</td></tr><tr><td align="center">Lm</td><td>Modifier letter</td></tr><tr><td align="center">Lo</td><td>Other letter</td></tr><tr><td align="center">Lt</td><td>Title case letter</td></tr><tr><td align="center">Lu</td><td>Upper case letter</td></tr><tr><td align="center">M</td><td>Mark</td></tr><tr><td align="center">Mc</td><td>Spacing mark</td></tr><tr><td align="center">Me</td><td>Enclosing mark</td></tr><tr><td align="center">Mn</td><td>Non-spacing mark</td></tr><tr><td align="center">N</td><td>Number</td></tr><tr><td align="center">Nd</td><td>Decimal number</td></tr><tr><td align="center">Nl</td><td>Letter number</td></tr><tr><td align="center">No</td><td>Other number</td></tr><tr><td align="center">P</td><td>Punctuation</td></tr><tr><td align="center">Pc</td><td>Connector punctuation</td></tr><tr><td align="center">Pd</td><td>Dash punctuation</td></tr><tr><td 
align="center">Pe</td><td>Close punctuation</td></tr><tr><td align="center">Pf</td><td>Final punctuation</td></tr><tr><td align="center">Pi</td><td>Initial punctuation</td></tr><tr><td align="center">Po</td><td>Other punctuation</td></tr><tr><td align="center">Ps</td><td>Open punctuation</td></tr><tr><td align="center">S</td><td>Symbol</td></tr><tr><td align="center">Sc</td><td>Currency symbol</td></tr><tr><td align="center">Sk</td><td>Modifier symbol</td></tr><tr><td align="center">Sm</td><td>Mathematical symbol</td></tr><tr><td align="center">So</td><td>Other symbol</td></tr>