perlreapi.pod
=head1 NAME

perlreapi - perl regular expression plugin interface

=head1 DESCRIPTION

As of Perl 5.9.5 there is a new interface for plugging in and using
regular expression engines other than the default one.

Each engine is supposed to provide access to a constant structure of the
following format:

    typedef struct regexp_engine {
        REGEXP* (*comp) (pTHX_ const SV * const pattern, const U32 flags);
        I32     (*exec) (pTHX_ REGEXP * const rx, char* stringarg, char* strend,
                         char* strbeg, I32 minend, SV* screamer,
                         void* data, U32 flags);
        char*   (*intuit) (pTHX_ REGEXP * const rx, SV *sv, char *strpos,
                           char *strend, U32 flags,
                           struct re_scream_pos_data_s *data);
        SV*     (*checkstr) (pTHX_ REGEXP * const rx);
        void    (*free) (pTHX_ REGEXP * const rx);
        void    (*numbered_buff_FETCH) (pTHX_ REGEXP * const rx, const I32 paren,
                                        SV * const sv);
        void    (*numbered_buff_STORE) (pTHX_ REGEXP * const rx, const I32 paren,
                                        SV const * const value);
        I32     (*numbered_buff_LENGTH) (pTHX_ REGEXP * const rx, const SV * const sv,
                                         const I32 paren);
        SV*     (*named_buff) (pTHX_ REGEXP * const rx, SV * const key,
                               SV * const value, U32 flags);
        SV*     (*named_buff_iter) (pTHX_ REGEXP * const rx, const SV * const lastkey,
                                    const U32 flags);
        SV*     (*qr_package)(pTHX_ REGEXP * const rx);
    #ifdef USE_ITHREADS
        void*   (*dupe) (pTHX_ REGEXP * const rx, CLONE_PARAMS *param);
    #endif
    } regexp_engine;

When a regexp is compiled, its C<engine> field is then set to point at
the appropriate structure, so that when it needs to be used Perl can find
the right routines to do so.

In order to install a new regexp handler, C<$^H{regcomp}> is set
to an integer which (when cast appropriately) resolves to one of these
structures. When compiling, the C<comp> method is executed, and the
resulting regexp structure's engine field is expected to point back at
the same structure.

The pTHX_ symbol in the definition is a macro used by perl under threading
to provide an extra argument to the routine holding a pointer back to
the interpreter that is executing the regexp.
So under threading all routines get an extra argument.

=head1 Callbacks

=head2 comp

    REGEXP* comp(pTHX_ const SV * const pattern, const U32 flags);

Compile the pattern stored in C<pattern> using the given C<flags> and
return a pointer to a prepared C<REGEXP> structure that can perform
the match. See L</The REGEXP structure> below for an explanation of
the individual fields in the REGEXP struct.

The C<pattern> parameter is the scalar that was used as the
pattern. Previous versions of perl would pass two C<char*> indicating
the start and end of the stringified pattern; the following snippet can
be used to get the old parameters:

    STRLEN plen;
    char*  exp = SvPV(pattern, plen);
    char* xend = exp + plen;

Since any scalar can be passed as a pattern it's possible to implement
an engine that does something with an array (C<< "ook" =~ [ qw/ eek
hlagh / ] >>) or with the non-stringified form of a compiled regular
expression (C<< "ook" =~ qr/eek/ >>). perl's own engine will always
stringify everything using the snippet above but that doesn't mean
other engines have to.

The C<flags> parameter is a bitfield which indicates which of the
C<msixp> flags the regex was compiled with. It also contains
additional info such as whether C<use locale> is in effect.

The C<eogc> flags are stripped out before being passed to the comp
routine. The regex engine does not need to know whether any of these
are set as those flags should only affect what perl does with the
pattern and its match variables, not how it gets compiled and
executed.

By the time the comp callback is called, some of these flags have
already had effect (noted below where applicable). However most of
their effect occurs after the comp callback has run in routines that
read the C<< rx->extflags >> field which it populates.

In general the flags should be preserved in C<< rx->extflags >> after
compilation, although the regex engine might want to add or delete
some of them to invoke or disable some special behavior in perl.
The flags along with any special behavior they cause are documented below:

The pattern modifiers:

=over 4

=item C</m> - RXf_PMf_MULTILINE

If this is in C<< rx->extflags >> it will be passed to
C<Perl_fbm_instr> by C<pp_split> which will treat the subject string
as a multi-line string.

=item C</s> - RXf_PMf_SINGLELINE

=item C</i> - RXf_PMf_FOLD

=item C</x> - RXf_PMf_EXTENDED

If present on a regex C<#> comments will be handled differently by the
tokenizer in some cases.

TODO: Document those cases.

=item C</p> - RXf_PMf_KEEPCOPY

=back

Additional flags:

=over 4

=item RXf_PMf_LOCALE

Set if C<use locale> is in effect. If present in C<< rx->extflags >>
C<split> will use the locale dependent definition of whitespace
when RXf_SKIPWHITE or RXf_WHITE is in effect. Under ASCII whitespace
is defined as per L<isSPACE|perlapi/ISSPACE>, and by the internal
macros C<is_utf8_space> under UTF-8 and C<isSPACE_LC> under C<use
locale>.

=item RXf_UTF8

Set if the pattern is L<SvUTF8()|perlapi/SvUTF8>, set by Perl_pmruntime.

A regex engine may want to set or disable this flag during
compilation. The perl engine for instance may upgrade non-UTF-8
strings to UTF-8 if the pattern includes constructs such as C<\x{...}>
that can only match Unicode values.

=item RXf_SPLIT

If C<split> is invoked as C<split ' '> or with no arguments (which
really means C<split(' ', $_)>, see L<split|perlfunc/split>), perl will
set this flag. The regex engine can then check for it and set the
SKIPWHITE and WHITE extflags. To do this the perl engine does:

    if (flags & RXf_SPLIT && r->prelen == 1 && r->precomp[0] == ' ')
        r->extflags |= (RXf_SKIPWHITE|RXf_WHITE);

=back

These flags can be set during compilation to enable optimizations in
the C<split> operator.

=over 4

=item RXf_SKIPWHITE

If the flag is present in C<< rx->extflags >> C<split> will delete
whitespace from the start of the subject string before it's operated
on.
What is considered whitespace depends on whether the subject is a
UTF-8 string and whether the C<RXf_PMf_LOCALE> flag is set.

If RXf_WHITE is set in addition to this flag C<split> will behave like
C<split " "> under the perl engine.

=item RXf_START_ONLY

Tells the split operator to split the target string on newlines
(C<\n>) without invoking the regex engine.

Perl's engine sets this if the pattern is C</^/> (C<plen == 1 && *exp
== '^'>), even under C</^/s>, see L<split|perlfunc/split>. Of course a
different regex engine might want to use the same optimizations
with a different syntax.

=item RXf_WHITE

Tells the split operator to split the target string on whitespace
without invoking the regex engine. The definition of whitespace varies
depending on whether the target string is a UTF-8 string and on
whether RXf_PMf_LOCALE is set.

Perl's engine sets this flag if the pattern is C<\s+>.

=item RXf_NULL

Tells the split operator to split the target string on
characters. The definition of character varies depending on whether
the target string is a UTF-8 string.

Perl's engine sets this flag on empty patterns; this optimization
makes C<split //> much faster than it would otherwise be. It's even
faster than C<unpack>.

=back

=head2 exec

    I32 exec(pTHX_ REGEXP * const rx,
             char *stringarg, char* strend, char* strbeg,
             I32 minend, SV* screamer,
             void* data, U32 flags);

Execute a regexp.

=head2 intuit

    char* intuit(pTHX_ REGEXP * const rx,
                  SV *sv, char *strpos, char *strend,
                  const U32 flags, struct re_scream_pos_data_s *data);

Find the start position where a regex match should be attempted,
or possibly whether the regex engine should not be run because the
pattern can't match. This is called as appropriate by the core
depending on the values of the extflags member of the regexp
structure.

=head2 checkstr

    SV* checkstr(pTHX_ REGEXP * const rx);

Return a SV containing a string that must appear in the pattern.
Used by C<split> for optimising matches.

=head2 free

    void free(pTHX_ REGEXP * const rx);

Called by perl when it is freeing a regexp pattern so that the engine
can release any resources pointed to by the C<pprivate> member of the
regexp structure. This is only responsible for freeing private data;
perl will handle releasing anything else contained in the regexp structure.

=head2 Numbered capture callbacks

Called to get/set the value of C<$`>, C<$'>, C<$&> and their named
equivalents, C<${^PREMATCH}>, C<${^POSTMATCH}> and C<${^MATCH}>, as well
as the numbered capture buffers (C<$1>, C<$2>, ...).

The C<paren> parameter will be C<-2> for C<$`>, C<-1> for C<$'>, C<0>
for C<$&>, C<1> for C<$1> and so forth.

The names have been chosen by analogy with L<Tie::Scalar> method
names with an additional B<LENGTH> callback for efficiency. However
named capture variables are currently not tied internally but
implemented via magic.

=head3 numbered_buff_FETCH

    void numbered_buff_FETCH(pTHX_ REGEXP * const rx, const I32 paren,
                             SV * const sv);

Fetch a specified numbered capture. C<sv> should be set to the scalar
to return. The scalar is passed as an argument rather than being
returned from the function because perl already has a scalar to store
the value when the callback is called; creating another one would be
redundant. The scalar can be set with C<sv_setsv>, C<sv_setpvn> and
friends, see L<perlapi>.

This callback is where perl untaints its own capture variables under
taint mode (see L<perlsec>). See the C<Perl_reg_numbered_buff_fetch>
function in F<regcomp.c> for how to untaint capture variables if
that's something you'd like your engine to do as well.

=head3 numbered_buff_STORE

    void    (*numbered_buff_STORE) (pTHX_ REGEXP * const rx, const I32 paren,
                                    SV const * const value);

Set the value of a numbered capture variable. C<value> is the scalar
that is to be used as the new value.
It's up to the engine to make
sure this is used as the new value (or reject it).

Example:

    if ("ook" =~ /(o*)/) {
        # `paren' will be `1' and `value' will be `ee'
        $1 =~ tr/o/e/;
    }

Perl's own engine will croak on any attempt to modify the capture
variables. To do this in another engine use the following callback
(copied from C<Perl_reg_numbered_buff_store>):

    void
    Example_reg_numbered_buff_store(pTHX_ REGEXP * const rx, const I32 paren,
                                    SV const * const value)
    {
        PERL_UNUSED_ARG(rx);
        PERL_UNUSED_ARG(paren);
        PERL_UNUSED_ARG(value);

        if (!PL_localizing)
            Perl_croak(aTHX_ PL_no_modify);
    }

Actually perl will not I<always> croak in a statement that looks
like it would modify a numbered capture variable. This is because the
STORE callback will not be called if perl can determine that it
doesn't have to modify the value. This is exactly how tied variables
behave in the same situation:

    package CaptureVar;
    use base 'Tie::Scalar';

    sub TIESCALAR { bless [] }
    sub FETCH { undef }
    sub STORE { die "This doesn't get called" }

    package main;

    tie my $sv => "CaptureVar";
    $sv =~ y/a/b/;

Because C<$sv> is C<undef> when the C<y///> operator is applied to it
the transliteration won't actually execute and the program won't
C<die>. This is different to how 5.8 and earlier versions behaved
since the capture variables were READONLY variables then; now they'll
just die when assigned to in the default engine.

=head3 numbered_buff_LENGTH

    I32 numbered_buff_LENGTH (pTHX_ REGEXP * const rx, const SV * const sv,
                              const I32 paren);

Get the C<length> of a capture variable.
There's a special callback
for this so that perl doesn't have to do a FETCH and run C<length> on
the result. Since the length is (in perl's case) known from the offsets
stored in C<< rx->offs >>, this is much more efficient:

    I32 s1  = rx->offs[paren].start;
    I32 s2  = rx->offs[paren].end;
    I32 len = s2 - s1;

This is a little bit more complex in the case of UTF-8, see what
C<Perl_reg_numbered_buff_length> does with
L<is_utf8_string_loclen|perlapi/is_utf8_string_loclen>.

=head2 Named capture callbacks

Called to get/set the value of C<%+> and C<%-> as well as by some
utility functions in L<re>.
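
Returning briefly to the numbered capture callbacks: the following is a
minimal sketch of the offset arithmetic a C<numbered_buff_LENGTH>
implementation might perform, combining the C<paren> conventions
(C<-2>, C<-1>, C<0>, C<1>, ...) with the C<start>/C<end> offsets
described above. It is not perl's implementation. The types
C<sketch_offs> and C<sketch_regexp>, the function
C<sketch_buff_length>, and the C<nparens> and C<sublen> members are
hypothetical stand-ins so that the example compiles without perl's
headers, and it counts bytes only, ignoring the UTF-8 case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for perl's types; NOT the real definitions. */
typedef struct {
    long start;  /* byte offset where the capture begins, -1 if unset */
    long end;    /* byte offset just past the capture, -1 if unset */
} sketch_offs;

typedef struct {
    const sketch_offs *offs;  /* offs[0] describes the whole match ($&) */
    long nparens;             /* highest capture group number (assumed member) */
    long sublen;              /* byte length of the subject string (assumed member) */
} sketch_regexp;

/* Values of `paren` for the special variables, as listed in this document. */
enum { SKETCH_PREMATCH = -2, SKETCH_POSTMATCH = -1, SKETCH_FULLMATCH = 0 };

/* Return the byte length of the capture selected by `paren`, or 0 if it
 * is unset or out of range. No scalar is fetched; only offsets are used. */
long sketch_buff_length(const sketch_regexp *rx, long paren)
{
    assert(rx != NULL && rx->offs != NULL);
    const sketch_offs *whole = &rx->offs[SKETCH_FULLMATCH];

    if (paren == SKETCH_PREMATCH)   /* $`: everything before the match */
        return whole->start >= 0 ? whole->start : 0;

    if (paren == SKETCH_POSTMATCH)  /* $': everything after the match */
        return whole->end >= 0 ? rx->sublen - whole->end : 0;

    if (paren < 0 || paren > rx->nparens)
        return 0;                   /* no such capture group */

    const sketch_offs *o = &rx->offs[paren];
    if (o->start < 0 || o->end < 0)
        return 0;                   /* group did not participate in the match */

    return o->end - o->start;       /* same arithmetic as s2 - s1 above */
}
```

For a subject of 8 bytes whose match spans offsets 2 to 5, this sketch
reports a length of 2 for C<$`>, 3 for C<$&> and 3 for C<$'>. A real
engine would additionally have to convert byte lengths to character
lengths for UTF-8 subjects.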