<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html><head>
<!-- Copyright 1997 The Open Group, All Rights Reserved -->
<title>Regular Expressions</title>
</head><body bgcolor=white>
<center><font size=2>The Single UNIX &reg; Specification, Version 2<br>Copyright &copy; 1997 The Open Group</font></center>
<hr size=2 noshade>
<blockquote>
<center><h2><a name = "tag_007">&nbsp;</a>Regular Expressions</h2></center>
<xref type="1" name="re"></xref>
<dl><dt><b>Note:</b>
<dd>Two versions of regular expressions are supported in this specification set:
<ul>
<p><li>the historical <b>Simple Regular Expressions</b>, which provide backward compatibility, but which may be withdrawn from a future issue of this specification set
<p><li>the improved internationalised version that complies with the ISO/IEC 9945-2:1993 standard.
<p></ul>
<p>The first (historical) version is described as part of the <i><a href="../xsh/regexp.html">regexp()</a></i> function in the <b>XSH</b> specification. The second (improved) version is described in this chapter.
</dl>
<i>Regular Expressions</i> (REs) provide a mechanism to select specific strings from a set of character strings.
<p>Regular expressions are a context-independent syntax that can represent a wide variety of character sets and character set orderings, where these character sets are interpreted according to the current locale. While many regular expressions can be interpreted differently depending on the current locale, many features, such as character class expressions, provide for contextual invariance across locales.
<p>The Basic Regular Expression (BRE) notation and construction rules in <xref href=bre><a href="#tag_007_003">Basic Regular Expressions</a></xref> apply to most utilities supporting regular expressions. Some utilities, instead, support the Extended Regular Expressions (ERE) described in <xref href=ere><a href="#tag_007_004">Extended Regular Expressions</a></xref>; any exceptions for both cases are noted in the descriptions of the specific utilities using
regular expressions. Both BREs and EREs are supported by the Regular Expression Matching interface in the <b>XSH</b> specification under <i><a href="../xsh/regcomp.html">regcomp()</a></i>, <i><a href="../xsh/regexec.html">regexec()</a></i> and related functions.
<h3><a name = "tag_007_001">&nbsp;</a>Regular Expression Definitions</h3>
<xref type="2" name="redefs"></xref>
For the purposes of this section, the following definitions apply:
<p><h4><a name = "tag_007_001_001">&nbsp;</a>entire regular expression</h4>
The concatenated set of one or more BREs or EREs that make up the pattern specified for string selection.
<p><h4><a name = "tag_007_001_002">&nbsp;</a>matched</h4>
A sequence of zero or more characters is said to be matched by a BRE or ERE when the characters in the sequence correspond to a sequence of characters defined by the pattern.
<p>Matching is based on the bit pattern used for encoding the character, not on the graphic representation of the character. This means that if a character set contains two or more encodings for a graphic symbol, or if the strings searched contain text encoded in more than one codeset, no attempt is made to search for any other representation of the encoded symbol. If that is required, the user can specify equivalence classes containing all variations of the desired graphic symbol.
<p>The search for a matching sequence starts at the beginning of a string and stops when the first sequence matching the expression is found, where <i>first</i> is defined to mean &quot;begins earliest in the string&quot;. If the pattern permits a variable number of matching characters and thus there is more than one such sequence starting at that point, the longest such sequence will be matched. For example: the BRE bb* matches the second to fourth characters of abbbc, and the ERE (wee|week)(knights|night) matches all ten characters of weeknights.
<p>Consistent with the whole match being the longest of the leftmost matches, each subpattern, from left to right, matches the longest possible
string. For this purpose, a null string is considered to be longer than no match at all. For example, matching the BRE \(.*\).* against abcdef, the subexpression (\1) is abcdef, and matching the BRE \(a*\)* against bc, the subexpression (\1) is the null string.
<p>It is possible to determine what strings correspond to subexpressions by recursively applying the leftmost longest rule to each subexpression, but only with the proviso that the overall match is leftmost longest. For example, matching \(ac*\)c*d[ac]*\1 against acdacaaa matches acdacaaa (with \1=a); simply matching the longest match for \(ac*\) would yield \1=ac, but the overall match would be smaller (acdac). Conceptually, the implementation must examine every possible match and among those that yield the leftmost longest total matches, pick the one that does the longest match for the leftmost subexpression and so on. Note that this means that matching by subexpressions is context-dependent: a subexpression within a larger RE may match a different string from the one it would match as an independent RE, and two instances of the same subexpression within the same larger RE may match different lengths even in similar sequences of characters. For example, in the ERE (a.*b)(a.*b), the two identical subexpressions would match four and six characters, respectively, of accbaccccb.
<p>When a multi-character collating element in a bracket expression (see <xref href=rebrack><a href="#tag_007_003_005">RE Bracket Expression</a></xref>) is involved, the longest sequence will be measured in characters consumed from the string to be matched; that is, the collating element counts not as one element, but as the number of characters it matches.
<p><h4><a name = "tag_007_001_003">&nbsp;</a>BRE (ERE) matching a single character</h4>
A BRE or ERE that matches either a single character or a single collating element.
<p>Only a BRE or ERE of this type that includes a bracket expression (see <xref href=rebrack><a href="#tag_007_003_005">RE Bracket Expression</a></xref>) can match a
collating element.
<p>The definition of <i>single character</i> has been expanded to include also collating elements consisting of two or more characters; this expansion is applicable only when a bracket expression is included in the BRE or ERE. An example of such a collating element may be the Dutch ij, which collates as a y. In some encodings, a ligature &quot;i with j&quot; exists as a character and would represent a single-character collating element. In another encoding, no such ligature exists, and the two-character sequence ij is defined as a multi-character collating element. Outside brackets, the ij is treated as a two-character RE and matches the same characters in a string. Historically, a bracket expression only matched a single character. If, however, the bracket expression defines, for example, a range that includes ij, then this particular bracket expression will also match a sequence of the two characters i and j in the string.
<p><h4><a name = "tag_007_001_004">&nbsp;</a>BRE (ERE) matching multiple characters</h4>
A BRE or ERE that matches a concatenation of single characters or collating elements.
<p>Such a BRE or ERE is made up from a BRE (ERE) matching a single character and BRE (ERE) special characters.
<p><h4><a name = "tag_007_001_005">&nbsp;</a>invalid</h4>
This section uses the term <i>invalid</i> for certain constructs or conditions. Invalid REs will cause the utility or function using the RE to generate an error condition. When <i>invalid</i> is not used, violations of the specified syntax or semantics for REs produce undefined results: this may entail an error, enabling an extended syntax for that RE, or using the construct in error as literal characters to be matched. For example, the BRE construct \{1,2,3\} does not comply with the grammar. A portable application cannot rely on it producing an error nor matching the literal characters \{1,2,3\}.
<h3><a name = "tag_007_002">&nbsp;</a>Regular Expression General Requirements</h3>
<xref type="2" name="regen"></xref>
The requirements in
this section apply to both basic and extended regular expressions.
<p>The use of regular expressions is generally associated with text processing. REs (BREs and EREs) operate on text strings; that is, zero or more characters followed by an end-of-string delimiter (typically NUL). Some utilities employing regular expressions limit the processing to lines; that is, zero or more characters followed by a newline character. In the regular expression processing described in this specification, the newline character is regarded as an ordinary character and both a period and a non-matching list can match one. The <b>XCU</b> specification specifies within the individual descriptions of those standard utilities employing regular expressions whether they permit matching of newline characters; if not stated otherwise, the use of literal newline characters or any escape sequence equivalent produces undefined results. Those utilities (like <i><a href="../xcu/grep.html">grep</a></i>) that do not allow newline characters to match are responsible for eliminating any newline character from strings before matching against the RE. The <i><a href="../xsh/regcomp.html">regcomp()</a></i> function in the <b>XSH</b> specification, however, can provide support for such processing without violating the rules of this section.
<p>The interfaces specified in this specification set do not permit the inclusion of a NUL character in an RE or in the string to be matched. If during the operation of a standard utility a NUL is included in the text designated to be matched, that NUL may designate the end of the text string for the purposes of matching.
<p>When a standard utility or function that uses regular expressions specifies that pattern matching will be performed without regard to the case (upper- or lower-) of either data or patterns, then when each character in the string is matched against the pattern, not only the character, but also its case counterpart (if any), will be matched. This definition of case-insensitive processing is
intended to allow matching of multi-character collating elements as well as characters. For instance, as each character in the string is matched using both its cases, the RE [[.Ch.]] when matched against the string char, is in reality matched against ch, Ch, cH and CH.
<p>The implementation will support any regular expression that does not exceed 256 bytes in length.
<h3><a name = "tag_007_003">&nbsp;</a>Basic Regular Expressions</h3>
<xref type="2" name="bre"></xref>
<h4><a name = "tag_007_003_001">&nbsp;</a>BREs Matching a Single Character or Collating Element</h4>
A BRE ordinary character, a special character preceded by a backslash or a period matches a single character. A bracket expression matches a single character or a single collating element.
<h4><a name = "tag_007_003_002">&nbsp;</a>BRE Ordinary Characters</h4>
An ordinary character is a BRE that matches itself: any character in the supported character set, except for the BRE special characters listed in <xref href=brespec><a href="#tag_007_003_003">BRE Special Characters</a></xref>.
<p>The interpretation of an ordinary character preceded by a backslash (\) is undefined, except for:
<ol>
<p><li>the characters ), (, { and }
<p><li>the digits 1 to 9 inclusive (see <xref href=bremult><a href="#tag_007_003_006">BREs Matching Multiple Characters</a></xref>)
<p><li>a character inside a bracket expression.
<p></ol>
<h4><a name = "tag_007_003_003">&nbsp;</a>BRE Special Characters</h4>
<xref type="3" name="brespec"></xref>
A <i>BRE special character</i> has special properties in certain contexts. Outside those contexts, or when preceded by a backslash, such a character will be a BRE that matches the special character itself. The BRE special characters and the contexts in which they have their special meaning are:
<dl compact>
<dt>.&nbsp;[&nbsp;\
<dd>The period, left-bracket and backslash are special except when used in a bracket expression (see <xref href=rebrack><a href="#tag_007_003_005">RE Bracket Expression</a></xref>). An expression containing a [ that
is not preceded by a backslash and is not part of a bracket expression produces undefined results.
<dt>*
<dd>The asterisk is special except when used:
<ul>
<li>in a bracket expression
<li>as the first character of an entire BRE (after an initial ^, if any)
<li>as the first character of a subexpression (after an initial ^, if any); see <xref href=bremult><a href="#tag_007_003_006">BREs Matching Multiple Characters</a></xref>.
</ul>
<dt>^
<dd>The circumflex is special when used:
<ul>
<li>as an anchor (see <xref href=breanc><a href="#tag_007_003_008">BRE Expression Anchoring</a></xref>)