\&    {
\&        PERL_UNUSED_ARG(rx);
\&        return newSVpvs("re::engine::Example");
\&    }
.Ve
.PP
Any method calls on an object created with \f(CW\*(C`qr//\*(C'\fR will be dispatched to the
package as a normal object.
.PP
.Vb 3
\&    use re::engine::Example;
\&    my $re = qr//;
\&    $re\->meth; # dispatched to re::engine::Example::meth()
.Ve
.PP
To retrieve the \f(CW\*(C`REGEXP\*(C'\fR object from the scalar in an \s-1XS\s0 function use
the \f(CW\*(C`SvRX\*(C'\fR macro (see \*(L"\s-1REGEXP\s0 Functions\*(R" in perlapi).
.PP
.Vb 3
\&    void meth(SV * rv)
\&    PPCODE:
\&        REGEXP * re = SvRX(rv);
.Ve
.Sh "dupe"
.IX Subsection "dupe"
.Vb 1
\&    void* dupe(pTHX_ REGEXP * const rx, CLONE_PARAMS *param);
.Ve
.PP
On threaded builds a regexp may need to be duplicated so that the pattern
can be used by multiple threads. This routine is expected to handle the
duplication of any private data pointed to by the \f(CW\*(C`pprivate\*(C'\fR member of
the regexp structure. It will be called with the preconstructed new
regexp structure as an argument; the \f(CW\*(C`pprivate\*(C'\fR member will point at
the \fBold\fR private structure, and it is this routine's responsibility to
construct a copy and return a pointer to it (which perl will then use to
overwrite the field as passed to this routine).
.PP
This allows the engine to dupe its private data but also, if necessary,
modify the final structure if it really must.
.PP
On unthreaded builds this field doesn't exist.
.SH "The REGEXP structure"
.IX Header "The REGEXP structure"
The \s-1REGEXP\s0 struct is defined in \fIregexp.h\fR. All regex engines must be able to
correctly build such a structure in their \*(L"comp\*(R" routine.
.PP
The \s-1REGEXP\s0 structure contains all the data that perl needs to be aware of
to properly work with the regular expression.
It includes data about
optimisations that perl can use to determine if the regex engine should
really be used, and various other control info that is needed to properly
execute patterns in various contexts, such as whether the pattern is anchored in
some way, what flags were used during the compile, or whether the
program contains special constructs that perl needs to be aware of.
.PP
In addition it contains two fields that are intended for the private
use of the regex engine that compiled the pattern. These are the
\&\f(CW\*(C`intflags\*(C'\fR and \f(CW\*(C`pprivate\*(C'\fR members. \f(CW\*(C`pprivate\*(C'\fR is a void pointer to
an arbitrary structure whose use and management is the responsibility
of the compiling engine. perl will never modify either of these
values.
.PP
.Vb 3
\&    typedef struct regexp {
\&        /* what engine created this regexp? */
\&        const struct regexp_engine* engine;
\&
\&        /* what re is this a lightweight copy of? */
\&        struct regexp* mother_re;
\&
\&        /* Information about the match that the perl core uses to manage things */
\&        U32 extflags;   /* Flags used both externally and internally */
\&        I32 minlen;     /* minimum possible length of string to match */
\&        I32 minlenret;  /* minimum possible length of $& */
\&        U32 gofs;       /* chars left of pos that we search from */
\&
\&        /* substring data about strings that must appear
\&           in the final match, used for optimisations */
\&        struct reg_substr_data *substrs;
\&
\&        U32 nparens;  /* number of capture buffers */
\&
\&        /* private engine specific data */
\&        U32 intflags;   /* Engine Specific Internal flags */
\&        void *pprivate; /* Data private to the regex engine which
\&                           created this object. */
\&
\&        /* Data about the last/current match.
\&           These are modified during matching */
\&        U32 lastparen;           /* last open paren matched */
\&        U32 lastcloseparen;      /* last close paren matched */
\&        regexp_paren_pair *swap; /* Swap copy of *offs */
\&        regexp_paren_pair *offs; /* Array of offsets for (@\-) and (@+) */
\&
\&        char *subbeg;  /* saved or original string so \edigit works forever. */
\&        SV_SAVED_COPY  /* If non\-NULL, SV which is COW from original */
\&        I32 sublen;    /* Length of string pointed by subbeg */
\&
\&        /* Information about the match that isn\*(Aqt often used */
\&        I32 prelen;           /* length of precomp */
\&        const char *precomp;  /* pre\-compilation regular expression */
\&
\&        char *wrapped;  /* wrapped version of the pattern */
\&        I32 wraplen;    /* length of wrapped */
\&
\&        I32 seen_evals;   /* number of eval groups in the pattern \- for security checks */
\&        HV *paren_names;  /* Optional hash of paren names */
\&
\&        /* Refcount of this regexp */
\&        I32 refcnt;             /* Refcount of this regexp */
\&    } regexp;
.Ve
.PP
The fields are discussed in more detail below:
.ie n .Sh """engine"""
.el .Sh "\f(CWengine\fP"
.IX Subsection "engine"
This field points at a regexp_engine structure which contains pointers
to the subroutines that are to be used for performing a match. It
is the compiling routine's responsibility to populate this field before
returning the regexp object.
.PP
Internally this is set to \f(CW\*(C`NULL\*(C'\fR unless a custom engine is specified in
\&\f(CW$^H{regcomp}\fR; perl's own set of callbacks can be accessed in the struct
pointed to by \f(CW\*(C`RE_ENGINE_PTR\*(C'\fR.
.ie n .Sh """mother_re"""
.el .Sh "\f(CWmother_re\fP"
.IX Subsection "mother_re"
\&\s-1TODO\s0, see <http://www.mail\-archive.com/perl5\-changes@perl.org/msg17328.html>
.ie n .Sh """extflags"""
.el .Sh "\f(CWextflags\fP"
.IX Subsection "extflags"
This will be used by perl to see what flags the regexp was compiled
with. It will normally be set to the value of the flags parameter by
the comp callback.
See the comp documentation for
valid flags.
.ie n .Sh """minlen""\fP \f(CW""minlenret"""
.el .Sh "\f(CWminlen\fP \f(CWminlenret\fP"
.IX Subsection "minlen minlenret"
The minimum string length required for the pattern to match. This is used to
prune the search space by not bothering to match any closer to the end of a
string than would allow a match. For instance there is no point in even
starting the regex engine if the minlen is 10 but the string is only 5
characters long. There is no way that the pattern can match.
.PP
\&\f(CW\*(C`minlenret\*(C'\fR is the minimum length of the string that would be found
in $& after a match.
.PP
The difference between \f(CW\*(C`minlen\*(C'\fR and \f(CW\*(C`minlenret\*(C'\fR can be seen in the
following pattern:
.PP
.Vb 1
\&    /ns(?=\ed)/
.Ve
.PP
where the \f(CW\*(C`minlen\*(C'\fR would be 3 but \f(CW\*(C`minlenret\*(C'\fR would only be 2 as the \ed is
required to match but is not actually included in the matched content. This
distinction is particularly important as the substitution logic uses the
\&\f(CW\*(C`minlenret\*(C'\fR to tell whether it can do in-place substitution which can result in
considerable speedup.
.ie n .Sh """gofs"""
.el .Sh "\f(CWgofs\fP"
.IX Subsection "gofs"
Left offset from \fIpos()\fR to start match at.
.ie n .Sh """substrs"""
.el .Sh "\f(CWsubstrs\fP"
.IX Subsection "substrs"
Substring data about strings that must appear in the final match.
This is currently only used internally by perl's engine but might be
used in the future for all engines for optimisations.
.ie n .Sh """nparens""\fP, \f(CW""lastparen""\fP, and \f(CW""lastcloseparen"""
.el .Sh "\f(CWnparens\fP, \f(CWlastparen\fP, and \f(CWlastcloseparen\fP"
.IX Subsection "nparens, lastparen, and lastcloseparen"
These fields are used to keep track of how many paren groups could be matched
in the pattern, which was the last open paren to be entered, and which was
the last close paren to be entered.
.ie n .Sh """intflags"""
.el .Sh "\f(CWintflags\fP"
.IX Subsection "intflags"
The engine's private copy of the flags the pattern was compiled with. Usually
this is the same as \f(CW\*(C`extflags\*(C'\fR unless the engine chose to modify one of them.
.ie n .Sh """pprivate"""
.el .Sh "\f(CWpprivate\fP"
.IX Subsection "pprivate"
A void* pointing to an engine-defined data structure. The perl engine uses the
\&\f(CW\*(C`regexp_internal\*(C'\fR structure (see \*(L"Base Structures\*(R" in perlreguts) but a custom
engine should use something else.
.ie n .Sh """swap"""
.el .Sh "\f(CWswap\fP"
.IX Subsection "swap"
\&\s-1TODO:\s0 document
.ie n .Sh """offs"""
.el .Sh "\f(CWoffs\fP"
.IX Subsection "offs"
A \f(CW\*(C`regexp_paren_pair\*(C'\fR structure which defines offsets into the string being
matched which correspond to the \f(CW$&\fR and \f(CW$1\fR, \f(CW$2\fR etc. captures. The
\&\f(CW\*(C`regexp_paren_pair\*(C'\fR struct is defined as follows:
.PP
.Vb 4
\&    typedef struct regexp_paren_pair {
\&        I32 start;
\&        I32 end;
\&    } regexp_paren_pair;
.Ve
.PP
If \f(CW\*(C`\->offs[num].start\*(C'\fR or \f(CW\*(C`\->offs[num].end\*(C'\fR is \f(CW\*(C`\-1\*(C'\fR then that
capture buffer did not match.
\&\f(CW\*(C`\->offs[0].start/end\*(C'\fR represents \f(CW$&\fR (or
\&\f(CW\*(C`${^MATCH}\*(C'\fR under \f(CW\*(C`//p\*(C'\fR) and \f(CW\*(C`\->offs[paren].end\*(C'\fR matches \f(CW$$paren\fR where
\&\f(CW\*(C`$paren >= 1\*(C'\fR.
.ie n .Sh """precomp""\fP \f(CW""prelen"""
.el .Sh "\f(CWprecomp\fP \f(CWprelen\fP"
.IX Subsection "precomp prelen"
Used for optimisations. \f(CW\*(C`precomp\*(C'\fR holds a copy of the pattern that
was compiled and \f(CW\*(C`prelen\*(C'\fR its length. When a new pattern is to be
compiled (such as inside a loop) the internal \f(CW\*(C`regcomp\*(C'\fR operator
checks whether the last compiled \f(CW\*(C`REGEXP\*(C'\fR's \f(CW\*(C`precomp\*(C'\fR and \f(CW\*(C`prelen\*(C'\fR
are equivalent to the new one, and if so uses the old pattern instead
of compiling a new one.
.PP
The relevant snippet from \f(CW\*(C`Perl_pp_regcomp\*(C'\fR:
.PP
.Vb 3
\&    if (!re || !re\->precomp || re\->prelen != (I32)len ||
\&        memNE(re\->precomp, t, len))
\&    /* Compile a new pattern */
.Ve
.ie n .Sh """paren_names"""
.el .Sh "\f(CWparen_names\fP"
.IX Subsection "paren_names"
This is a hash used internally to track named capture buffers and their
offsets. The keys are the names of the buffers; the values are dualvars,
with the \s-1IV\s0 slot holding the number of buffers with the given name and the
pv being an embedded array of I32. The values may also be contained
independently in the data array in cases where named backreferences are
used.
.ie n .Sh """substrs"""
.el .Sh "\f(CWsubstrs\fP"
.IX Subsection "substrs"
Holds information on the longest string that must occur at a fixed
offset from the start of the pattern, and the longest string that must
occur at a floating offset from the start of the pattern.
Used to do
Fast-Boyer-Moore searches on the string to find out if it's worth using
the regex engine at all, and if so where in the string to search.
.ie n .Sh """subbeg""\fP \f(CW""sublen""\fP \f(CW""saved_copy"""
.el .Sh "\f(CWsubbeg\fP \f(CWsublen\fP \f(CWsaved_copy\fP"
.IX Subsection "subbeg sublen saved_copy"
Used during the execution phase for managing search and replace patterns.
.ie n .Sh """wrapped""\fP \f(CW""wraplen"""
.el .Sh "\f(CWwrapped\fP \f(CWwraplen\fP"
.IX Subsection "wrapped wraplen"
Stores the string \f(CW\*(C`qr//\*(C'\fR stringifies to. The perl engine for example
stores \f(CW\*(C`(?\-xism:eek)\*(C'\fR in the case of \f(CW\*(C`qr/eek/\*(C'\fR.
.PP
When using a custom engine that doesn't support the \f(CW\*(C`(?:)\*(C'\fR construct
for inline modifiers, it's probably best to have \f(CW\*(C`qr//\*(C'\fR stringify to
the supplied pattern. Note that this will create undesired patterns in
cases such as:
.PP
.Vb 3
\&    my $x = qr/a|b/;  # "a|b"
\&    my $y = qr/c/i;   # "c"
\&    my $z = qr/$x$y/; # "a|bc"
.Ve
.PP
There's no solution for this problem other than making the custom
engine understand a construct like \f(CW\*(C`(?:)\*(C'\fR.
.ie n .Sh """seen_evals"""
.el .Sh "\f(CWseen_evals\fP"
.IX Subsection "seen_evals"
This stores the number of eval groups in the pattern. This is used for security
purposes when embedding compiled regexes into larger patterns with \f(CW\*(C`qr//\*(C'\fR.
.ie n .Sh """refcnt"""
.el .Sh "\f(CWrefcnt\fP"
.IX Subsection "refcnt"
The number of times the structure is referenced. When this falls to 0 the
regexp is automatically freed by a call to pregfree.
This should be set to 1 in
each engine's \*(L"comp\*(R" routine.
.SH "HISTORY"
.IX Header "HISTORY"
Originally part of perlreguts.
.SH "AUTHORS"
.IX Header "AUTHORS"
Originally written by Yves Orton, expanded by \*(AEvar Arnfjo\*:r\*(d\-
Bjarmason.
.SH "LICENSE"
.IX Header "LICENSE"
Copyright 2006 Yves Orton and 2007 \*(AEvar Arnfjo\*:r\*(d\- Bjarmason.
.PP
This program is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.