'\"
'\" Copyright (c) 1994 The Regents of the University of California.
'\" Copyright (c) 1994-1996 Sun Microsystems, Inc.
'\" Copyright (c) 1998-1999 Scriptics Corporation
'\"
'\" See the file "license.terms" for information on usage and redistribution
'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
'\" 
'\" RCS: @(#) $Id: RegExp.3,v 1.13 2002/11/13 22:11:40 vincentdarley Exp $
'\" 
.so man.macros
.TH Tcl_RegExpMatch 3 8.1 Tcl "Tcl Library Procedures"
.BS
.SH NAME
Tcl_RegExpMatch, Tcl_RegExpCompile, Tcl_RegExpExec, Tcl_RegExpRange, Tcl_GetRegExpFromObj, Tcl_RegExpMatchObj, Tcl_RegExpExecObj, Tcl_RegExpGetInfo \- Pattern matching with regular expressions
.SH SYNOPSIS
.nf
\fB#include <tcl.h>\fR
.sp
int
\fBTcl_RegExpMatchObj\fR(\fIinterp\fR, \fIstrObj\fR, \fIpatObj\fR)
.sp
int
\fBTcl_RegExpMatch\fR(\fIinterp\fR, \fIstring\fR, \fIpattern\fR)
.sp
Tcl_RegExp
\fBTcl_RegExpCompile\fR(\fIinterp\fR, \fIpattern\fR)
.sp
int
\fBTcl_RegExpExec\fR(\fIinterp\fR, \fIregexp\fR, \fIstring\fR, \fIstart\fR)
.sp
void
\fBTcl_RegExpRange\fR(\fIregexp\fR, \fIindex\fR, \fIstartPtr\fR, \fIendPtr\fR)
.VS 8.1
.sp
Tcl_RegExp
\fBTcl_GetRegExpFromObj\fR(\fIinterp\fR, \fIpatObj\fR, \fIcflags\fR)
.sp
int
\fBTcl_RegExpExecObj\fR(\fIinterp\fR, \fIregexp\fR, \fIobjPtr\fR, \fIoffset\fR, \fInmatches\fR, \fIeflags\fR)
.sp
void
\fBTcl_RegExpGetInfo\fR(\fIregexp\fR, \fIinfoPtr\fR)
.VE 8.1
.SH ARGUMENTS
.AS Tcl_Interp *interp
.AP Tcl_Interp *interp in
Tcl interpreter to use for error reporting. The interpreter may be
NULL if no error reporting is desired.
.VS 8.1
.AP Tcl_Obj *strObj in/out
Refers to the object from which to get the string to search. The
internal representation of the object may be converted to a form that
can be efficiently searched.
.AP Tcl_Obj *patObj in/out
Refers to the object from which to get a regular expression.
The compiled regular expression is cached in the object.
.VE 8.1
.AP char *string in
String to check for a match with a regular expression.
.AP "CONST char" *pattern in
String in the form of a regular expression pattern.
.AP Tcl_RegExp regexp in
Compiled regular expression. Must have been returned previously
by \fBTcl_GetRegExpFromObj\fR or \fBTcl_RegExpCompile\fR.
.AP char *start in
If \fIstring\fR is just a portion of some other string, this argument
identifies the beginning of the larger string.
If it isn't the same as \fIstring\fR, then no \fB^\fR matches
will be allowed.
.AP int index in
Specifies which range is desired: 0 means the range of the entire
match, 1 or greater means the range that matched a parenthesized
subexpression.
.VS 8.4
.AP "CONST char" **startPtr out
The address of the first character in the range is stored here, or
NULL if there is no such range.
.AP "CONST char" **endPtr out
The address of the character just after the last one in the range
is stored here, or NULL if there is no such range.
.VE 8.4
.VS 8.1
.AP int cflags in
OR-ed combination of compilation flags. See below for more information.
.AP Tcl_Obj *objPtr in/out
An object which contains the string to check for a match with a
regular expression.
.AP int offset in
The character offset into the string where matching should begin.
The value of the offset has no impact on \fB^\fR matches. This
behavior is controlled by \fIeflags\fR.
.AP int nmatches in
The number of matching subexpressions that should be remembered for
later use. If this value is 0, then no subexpression match
information will be computed. If the value is \-1, then
all of the matching subexpressions will be remembered.
Any other
value will be taken as the maximum number of subexpressions to
remember.
.AP int eflags in
OR-ed combination of the values TCL_REG_NOTBOL and TCL_REG_NOTEOL.
See below for more information.
.AP Tcl_RegExpInfo *infoPtr out
The address of the location where information about a previous match
should be stored by \fBTcl_RegExpGetInfo\fR.
.VE 8.1
.BE
.SH DESCRIPTION
.PP
\fBTcl_RegExpMatch\fR determines whether its \fIstring\fR argument
matches \fIpattern\fR, where \fIpattern\fR is interpreted
as a regular expression using the rules in the \fBre_syntax\fR
reference page.
If there is a match then \fBTcl_RegExpMatch\fR returns 1.
If there is no match then \fBTcl_RegExpMatch\fR returns 0.
If an error occurs in the matching process (e.g. \fIpattern\fR
is not a valid regular expression) then \fBTcl_RegExpMatch\fR
returns \-1 and leaves an error message in the interpreter result.
.VS 8.1.2
\fBTcl_RegExpMatchObj\fR is similar to \fBTcl_RegExpMatch\fR except it
operates on the Tcl objects \fIstrObj\fR and \fIpatObj\fR instead of
UTF strings.
\fBTcl_RegExpMatchObj\fR is generally more efficient than
\fBTcl_RegExpMatch\fR, so it is the preferred interface.
.VE 8.1.2
.PP
\fBTcl_RegExpCompile\fR, \fBTcl_RegExpExec\fR, and \fBTcl_RegExpRange\fR
provide lower-level access to the regular expression pattern matcher.
\fBTcl_RegExpCompile\fR compiles a regular expression string into
the internal form used for efficient pattern matching.
The return value is a token for this compiled form, which can be
used in subsequent calls to \fBTcl_RegExpExec\fR or \fBTcl_RegExpRange\fR.
If an error occurs while compiling the regular expression then
\fBTcl_RegExpCompile\fR returns NULL and leaves an error message
in the interpreter result.
Note: the return value from \fBTcl_RegExpCompile\fR is only valid
up to the next call to \fBTcl_RegExpCompile\fR; it is not safe to
retain these values for long periods of time.
.PP
\fBTcl_RegExpExec\fR executes the regular expression pattern matcher.
It returns 1 if \fIstring\fR contains a range of characters that
match \fIregexp\fR, 0 if no match is found, and
\-1 if an error occurs.
In the case of an error, \fBTcl_RegExpExec\fR leaves an error
message in the interpreter result.
When searching a string for multiple matches of a pattern,
it is important to distinguish between the start of the original
string and the start of the current search.
For example, when searching for the second occurrence of a
match, the \fIstring\fR argument might point to the character
just after the first match; however, it is important for the
pattern matcher to know that this is not the start of the entire string,
so that it doesn't allow \fB^\fR atoms in the pattern to match.
The \fIstart\fR argument provides this information by pointing
to the start of the overall string containing \fIstring\fR.
\fIStart\fR will be less than or equal to \fIstring\fR; if it
is less than \fIstring\fR then no \fB^\fR matches will be allowed.
.PP
\fBTcl_RegExpRange\fR may be invoked after \fBTcl_RegExpExec\fR
returns; it provides detailed information about what
ranges of
the string matched what parts of the pattern.
\fBTcl_RegExpRange\fR returns a pair of pointers in \fI*startPtr\fR
and \fI*endPtr\fR that identify a range of characters in
the source string for the most recent call to \fBTcl_RegExpExec\fR.
\fIIndex\fR indicates which of several ranges is desired:
if \fIindex\fR is 0, information is returned about the overall range
of characters that matched the entire pattern; otherwise,
information is returned about the range of characters that matched the
\fIindex\fR'th parenthesized subexpression within the pattern.
If there is no range corresponding to \fIindex\fR then NULL
is stored in \fI*startPtr\fR and \fI*endPtr\fR.
.PP
.VS 8.1
\fBTcl_GetRegExpFromObj\fR, \fBTcl_RegExpExecObj\fR, and
\fBTcl_RegExpGetInfo\fR are object interfaces that provide the most
direct control of Henry Spencer's regular expression library. Callers
that need to set compilation and execution options directly should use
these interfaces rather than calling the internal regexp functions.
These interfaces handle the details of UTF to Unicode translation and
improve performance by caching in the pattern and string objects.
.PP
\fBTcl_GetRegExpFromObj\fR attempts to return a compiled regular
expression from \fIpatObj\fR. If the object does not already
contain a compiled regular expression, it will attempt to create one
from the string in the object and assign it to the internal
representation of \fIpatObj\fR. The return value, of type
\fBTcl_RegExp\fR, is a token for this compiled form, which can be
used in subsequent calls to \fBTcl_RegExpExecObj\fR or
\fBTcl_RegExpGetInfo\fR. If an error occurs while compiling the
regular expression then \fBTcl_GetRegExpFromObj\fR returns NULL and
leaves an error message in the interpreter result. The regular
expression token can be used as long as the internal representation
of \fIpatObj\fR refers to the compiled form.
The \fIcflags\fR argument is a bitwise OR of
zero or more of the following flags that control the compilation of
\fIpatObj\fR:
.RS 2
.TP
\fBTCL_REG_ADVANCED\fR
Compile advanced regular expressions (`AREs'). This mode corresponds to
the normal regular expression syntax accepted by the Tcl regexp and
regsub commands.
.TP
\fBTCL_REG_EXTENDED\fR
Compile extended regular expressions (`EREs'). This mode corresponds
to the regular expression syntax recognized by Tcl 8.0 and earlier
versions.
.TP
\fBTCL_REG_BASIC\fR
Compile basic regular expressions (`BREs'). This mode corresponds
to the regular expression syntax recognized by common Unix utilities
like \fBsed\fR and \fBgrep\fR. This is the default if no flags are
specified.
.TP
\fBTCL_REG_EXPANDED\fR
Compile the regular expression (basic, extended, or advanced) using an
expanded syntax that allows comments and whitespace. This mode causes
non-backslashed non-bracket-expression whitespace and
#-to-end-of-line comments to be ignored.
.TP
\fBTCL_REG_QUOTE\fR
Compile a literal string, with all characters treated as ordinary characters.
.TP
\fBTCL_REG_NOCASE\fR
Compile for matching that ignores upper/lower case distinctions.
.TP
\fBTCL_REG_NEWLINE\fR
Compile for newline-sensitive matching. By default, newline is a
completely ordinary character with no special meaning in either
regular expressions or strings. With this flag, `[^' bracket
expressions and `.' never match newline, `^' matches an empty string
after any newline in addition to its normal function, and `$' matches
an empty string before any newline in addition to its normal function.
\fBTCL_REG_NEWLINE\fR is the bitwise OR of \fBTCL_REG_NLSTOP\fR and
\fBTCL_REG_NLANCH\fR.
.TP
\fBTCL_REG_NLSTOP\fR
Compile for partial newline-sensitive matching,
with the behavior of
`[^' bracket expressions and `.' affected,
but not the behavior of `^' and `$'. In this mode, `[^' bracket
expressions and `.'
never match newline.
.TP
\fBTCL_REG_NLANCH\fR
Compile for inverse partial newline-sensitive matching,
with the behavior of
`^' and `$' (the ``anchors'') affected, but not the behavior of
`[^' bracket expressions and `.'. In this mode `^' matches an empty string
after any newline in addition to its normal function, and `$' matches
an empty string before any newline in addition to its normal function.
.TP
\fBTCL_REG_NOSUB\fR
Compile for matching that reports only success or failure,
not what was matched. This reduces compile overhead and may improve
performance. Subsequent calls to \fBTcl_RegExpGetInfo\fR or
\fBTcl_RegExpRange\fR will not report any match information.
.TP
\fBTCL_REG_CANMATCH\fR
Compile for matching that reports the potential to complete a partial
match given more text (see below).
.RE
.PP
Only one of
\fBTCL_REG_EXTENDED\fR,
\fBTCL_REG_ADVANCED\fR,
\fBTCL_REG_BASIC\fR, and
\fBTCL_REG_QUOTE\fR may be specified.
.PP
\fBTcl_RegExpExecObj\fR executes the regular expression pattern
matcher. It returns 1 if \fIobjPtr\fR contains a range of characters
that match \fIregexp\fR, 0 if no match is found, and \-1 if an error
occurs. In the case of an error, \fBTcl_RegExpExecObj\fR leaves an
error message in the interpreter result. The \fInmatches\fR value
indicates to the matcher how many subexpressions are of interest. If
\fInmatches\fR is 0, then no subexpression match information is
recorded, which may allow the matcher to make various optimizations.
If the value is \-1, then all of the subexpressions in the pattern are
remembered. If the value is a positive integer, then only that number
of subexpressions will be remembered. Matching begins at the
specified Unicode character index given by \fIoffset\fR. Unlike
\fBTcl_RegExpExec\fR, the behavior of anchors is not affected by the
offset value.
Instead the behavior of the anchors is explicitly
controlled by the \fIeflags\fR argument, which is a bitwise OR of
zero or more of the following flags:
.RS 2
.TP
\fBTCL_REG_NOTBOL\fR
The starting character will not be treated as the beginning of a
line or the beginning of the string, so `^' will not match there.
Note that this flag has no effect on how `\fB\eA\fR' matches.
.TP
\fBTCL_REG_NOTEOL\fR
The last character in the string will not be treated as the end of a
line or the end of the string, so `$' will not match there.
Note that this flag has no effect on how `\fB\eZ\fR' matches.
.RE
.PP
\fBTcl_RegExpGetInfo\fR retrieves information about the last match
performed with a given regular expression \fIregexp\fR. The
\fIinfoPtr\fR argument points to a structure that is
defined as follows:
.PP
.CS
typedef struct Tcl_RegExpInfo {
    int \fInsubs\fR;
    Tcl_RegExpIndices *\fImatches\fR;
    long \fIextendStart\fR;
} Tcl_RegExpInfo;
.CE
.PP
The \fInsubs\fR field contains a count of the number of parenthesized
subexpressions within the regular expression. If the \fBTCL_REG_NOSUB\fR
flag was used, then this value will be zero. The \fImatches\fR field
points to an array of \fInsubs\fR+1 values that indicate the bounds of
each subexpression matched. The first element in the array refers to the
range matched by the entire regular expression, and subsequent elements
refer to the parenthesized subexpressions in the order that they
appear in the pattern. Each element is a structure that is defined as
follows:
.PP
.CS
typedef struct Tcl_RegExpIndices {
    long \fIstart\fR;
    long \fIend\fR;
} Tcl_RegExpIndices;
.CE
.PP
The \fIstart\fR and \fIend\fR values are Unicode character indices
relative to the offset location within \fIobjPtr\fR where matching began.
The \fIstart\fR index identifies the first character of the matched
subexpression. The \fIend\fR index identifies the first character
after the matched subexpression. If the subexpression matched the
empty string, then \fIstart\fR and \fIend\fR will be equal.
If the
subexpression did not participate in the match, then \fIstart\fR and
\fIend\fR will be set to \-1.
.PP
The \fIextendStart\fR field in \fBTcl_RegExpInfo\fR is only set if the
\fBTCL_REG_CANMATCH\fR flag was used. It indicates the first
character in the string where a match could occur. If a match was
found, this will be the same as the beginning of the current match.
If no match was found, then it indicates the earliest point at which a
match might occur if additional text is appended to the string. If
no match is possible even with further text, this field will be set to \-1.
.VE 8.1
.SH "SEE ALSO"
re_syntax(n)
.SH KEYWORDS
match, pattern, regular expression, string, subexpression, Tcl_RegExpIndices, Tcl_RegExpInfo
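.SH EXAMPLE
.PP
The sketch below shows one way the object interfaces described above might
be combined to find every match of a pattern in a string, similar to what
\fBregexp \-all\fR does. It assumes Tcl 8.5 or 8.6 headers. The helper
functions \fBcountMatches\fR, \fBcountMatchesStr\fR, and \fBmatchesStr\fR are
hypothetical names used only for illustration. The loop does three things:
.IP \(bu 3
It passes \-1 as \fInmatches\fR so that the range of the whole match is
recorded.
.IP \(bu 3
It adds the search \fIoffset\fR to the indices reported by
\fBTcl_RegExpGetInfo\fR, turning them into positions in the whole string.
.IP \(bu 3
It sets \fBTCL_REG_NOTBOL\fR after the first search, so that \fB^\fR can
only match at the real start of the string.
.CS
#include <stddef.h>
#include <tcl.h>

/*
 * Hypothetical helpers for illustration only (not part of the Tcl library).
 * Assumes Tcl 8.5/8.6 headers, where the Tcl_RegExpIndices fields are long.
 *
 * countMatches counts the non-overlapping matches of patObj in textObj and
 * stores the character range [*firstStartPtr, *firstEndPtr) of the first
 * match (both -1 if there is none). It returns the count, or -1 with an
 * error message left in interp if the pattern cannot be compiled or run.
 */
static int
countMatches(Tcl_Interp *interp, Tcl_Obj *textObj, Tcl_Obj *patObj,
        long *firstStartPtr, long *firstEndPtr)
{
    Tcl_RegExp re;
    Tcl_RegExpInfo info;
    int length, offset = 0, count = 0, eflags = 0, result;
    long start, end;

    *firstStartPtr = *firstEndPtr = -1;

    /* Compile as an ARE; the compiled form is cached in patObj. */
    re = Tcl_GetRegExpFromObj(interp, patObj, TCL_REG_ADVANCED);
    if (re == NULL) {
        return -1;
    }
    length = Tcl_GetCharLength(textObj);

    for (;;) {
        /* nmatches = -1 records every range, including the whole match. */
        result = Tcl_RegExpExecObj(interp, re, textObj, offset, -1, eflags);
        if (result <= 0) {
            return (result < 0) ? -1 : count;
        }
        Tcl_RegExpGetInfo(re, &info);

        /* The indices are relative to offset, where this search began. */
        start = offset + info.matches[0].start;
        end = offset + info.matches[0].end;
        if (count == 0) {
            *firstStartPtr = start;
            *firstEndPtr = end;
        }
        count++;

        /* Resume after the match, stepping past an empty match. */
        offset = (int) ((end > start) ? end : end + 1);
        if (offset >= length) {
            return count;
        }

        /* Later searches do not start at the beginning of the string. */
        eflags |= TCL_REG_NOTBOL;
    }
}

/* Wrapper that creates the temporary objects and manages their refcounts. */
static int
countMatchesStr(Tcl_Interp *interp, const char *text, const char *pattern,
        long *firstStartPtr, long *firstEndPtr)
{
    Tcl_Obj *textObj = Tcl_NewStringObj(text, -1);
    Tcl_Obj *patObj = Tcl_NewStringObj(pattern, -1);
    int count;

    Tcl_IncrRefCount(textObj);
    Tcl_IncrRefCount(patObj);
    count = countMatches(interp, textObj, patObj, firstStartPtr, firstEndPtr);
    Tcl_DecrRefCount(patObj);
    Tcl_DecrRefCount(textObj);
    return count;
}

/* Boolean test via Tcl_RegExpMatchObj: 1 match, 0 no match, -1 error. */
static int
matchesStr(Tcl_Interp *interp, const char *text, const char *pattern)
{
    Tcl_Obj *textObj = Tcl_NewStringObj(text, -1);
    Tcl_Obj *patObj = Tcl_NewStringObj(pattern, -1);
    int result;

    Tcl_IncrRefCount(textObj);
    Tcl_IncrRefCount(patObj);
    result = Tcl_RegExpMatchObj(interp, textObj, patObj);
    Tcl_DecrRefCount(patObj);
    Tcl_DecrRefCount(textObj);
    return result;
}
.CE
.PP
For example, with an interpreter created by \fBTcl_CreateInterp\fR, counting
the matches of \fB[0-9]+\fR in \fBa1b22c333\fR this way should return 3. The
first match covers character 1 up to, but not including, character 2.
Counting \fB^[a-z]\fR in \fBabc\fR should return 1 rather than 3, because
\fBTCL_REG_NOTBOL\fR stops \fB^\fR from matching at the start of each later
search.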