📄 perlreguts.1

\&\fB\s-1NOTE\s0\fR: As the railroad metaphor suggests, this is \fBnot\fR a tree
structure:  the tail of the branch connects to the thing following the
set of \f(CW\*(C`BRANCH\*(C'\fRes.  It is like a single line of railway track that
splits as it goes into a station or railway yard and rejoins as it comes
out the other side.
.PP
\fIRegops\fR
.IX Subsection "Regops"
.PP
The base structure of a regop is defined in \fIregexp.h\fR as follows:
.PP
.Vb 5
\&    struct regnode {
\&        U8  flags;    /* Various purposes, sometimes overridden */
\&        U8  type;     /* Opcode value as specified by regnodes.h */
\&        U16 next_off; /* Offset in size regnode */
\&    };
.Ve
.PP
Other larger \f(CW\*(C`regnode\*(C'\fR\-like structures are defined in \fIregcomp.h\fR. They
are almost like subclasses in that they have the same fields as
\&\f(CW\*(C`regnode\*(C'\fR, with possibly additional fields following in
the structure, and in some cases the specific meaning (and name)
of some of the base fields is overridden. The following is a more
complete description.
.ie n .IP """regnode_1""" 4
.el .IP "\f(CWregnode_1\fR" 4
.IX Item "regnode_1"
.PD 0
.ie n .IP """regnode_2""" 4
.el .IP "\f(CWregnode_2\fR" 4
.IX Item "regnode_2"
.PD
\&\f(CW\*(C`regnode_1\*(C'\fR structures have the same header, followed by a single
four-byte argument; \f(CW\*(C`regnode_2\*(C'\fR structures contain two two-byte
arguments instead:
.Sp
.Vb 2
\&    regnode_1                U32 arg1;
\&    regnode_2                U16 arg1;  U16 arg2;
.Ve
.ie n .IP """regnode_string""" 4
.el .IP "\f(CWregnode_string\fR" 4
.IX Item "regnode_string"
\&\f(CW\*(C`regnode_string\*(C'\fR structures, used for literal strings, follow the header
with a one-byte length and then the string data.
Strings are padded on
the end with zero bytes so that the total length of the node is a
multiple of four bytes:
.Sp
.Vb 2
\&    regnode_string           char string[1];
\&                             U8 str_len; /* overrides flags */
.Ve
.ie n .IP """regnode_charclass""" 4
.el .IP "\f(CWregnode_charclass\fR" 4
.IX Item "regnode_charclass"
Character classes are represented by \f(CW\*(C`regnode_charclass\*(C'\fR structures,
which have a four-byte argument and then a 32\-byte (256\-bit) bitmap
indicating which characters are included in the class.
.Sp
.Vb 2
\&    regnode_charclass        U32 arg1;
\&                             char bitmap[ANYOF_BITMAP_SIZE];
.Ve
.ie n .IP """regnode_charclass_class""" 4
.el .IP "\f(CWregnode_charclass_class\fR" 4
.IX Item "regnode_charclass_class"
There is also a larger form of a char class structure used to represent
\&\s-1POSIX\s0 char classes called \f(CW\*(C`regnode_charclass_class\*(C'\fR which has an
additional 4\-byte (32\-bit) bitmap indicating which \s-1POSIX\s0 char classes
have been included.
.Sp
.Vb 3
\&    regnode_charclass_class  U32 arg1;
\&                             char bitmap[ANYOF_BITMAP_SIZE];
\&                             char classflags[ANYOF_CLASSBITMAP_SIZE];
.Ve
.PP
\&\fIregnodes.h\fR defines an array called \f(CW\*(C`regarglen[]\*(C'\fR which gives the size
of each opcode in units of \f(CW\*(C`size regnode\*(C'\fR (4\-byte). A macro is used
to calculate the size of an \f(CW\*(C`EXACT\*(C'\fR node based on its \f(CW\*(C`str_len\*(C'\fR field.
.PP
The regops are defined in \fIregnodes.h\fR which is generated from
\&\fIregcomp.sym\fR by \fIregcomp.pl\fR. Currently the maximum possible number
of distinct regops is restricted to 256, with about a quarter already
used.
.PP
A set of macros makes accessing the fields
easier and more consistent.
These include \f(CW\*(C`OP()\*(C'\fR, which is used to determine
the type of a \f(CW\*(C`regnode\*(C'\fR\-like structure; \f(CW\*(C`NEXT_OFF()\*(C'\fR, which is the offset to
the next node (more on this later); \f(CW\*(C`ARG()\*(C'\fR, \f(CW\*(C`ARG1()\*(C'\fR, \f(CW\*(C`ARG2()\*(C'\fR, \f(CW\*(C`ARG_SET()\*(C'\fR,
and equivalents for reading and setting the arguments; and \f(CW\*(C`STR_LEN()\*(C'\fR,
\&\f(CW\*(C`STRING()\*(C'\fR and \f(CW\*(C`OPERAND()\*(C'\fR for manipulating strings and regop\-bearing
types.
.PP
\fIWhat regop is next?\fR
.IX Subsection "What regop is next?"
.PP
There are three distinct concepts of \*(L"next\*(R" in the regex engine, and
it is important to keep them clear.
.IP "\(bu" 4
There is the \*(L"next regnode\*(R" from a given regnode, a value which is
rarely useful except that sometimes it matches up in terms of value
with one of the others, and that sometimes the code assumes this to
always be so.
.IP "\(bu" 4
There is the \*(L"next regop\*(R" from a given regop/regnode. This is the
regop physically located after the current one, as determined by
the size of the current regop. This is often useful, for example when
dumping the structure, where we traverse in this order. Sometimes the code
assumes that the \*(L"next regnode\*(R" is the same as the \*(L"next regop\*(R", or in
other words assumes that the size of a given regop type is always going
to be one regnode large.
.IP "\(bu" 4
There is the \*(L"regnext\*(R" from a given regop. This is the regop which
is reached by jumping forward by the value of \f(CW\*(C`NEXT_OFF()\*(C'\fR,
or in a few cases for longer jumps by the \f(CW\*(C`arg1\*(C'\fR field of the \f(CW\*(C`regnode_1\*(C'\fR
structure.
The subroutine \f(CW\*(C`regnext()\*(C'\fR handles this transparently.
This is the logical successor of the node, which in some cases, like
that of the \f(CW\*(C`BRANCH\*(C'\fR regop, has special meaning.
.SH "Process Overview"
.IX Header "Process Overview"
Broadly speaking, performing a match of a string against a pattern
involves the following steps:
.IP "A. Compilation" 5
.IX Item "A. Compilation"
.RS 5
.PD 0
.IP "1. Parsing for size" 5
.IX Item "1. Parsing for size"
.IP "2. Parsing for construction" 5
.IX Item "2. Parsing for construction"
.IP "3. Peep-hole optimisation and analysis" 5
.IX Item "3. Peep-hole optimisation and analysis"
.RE
.RS 5
.RE
.IP "B. Execution" 5
.IX Item "B. Execution"
.RS 5
.IP "4. Start position and no-match optimisations" 5
.IX Item "4. Start position and no-match optimisations"
.IP "5. Program execution" 5
.IX Item "5. Program execution"
.RE
.RS 5
.RE
.PD
.PP
Where these steps occur in the actual execution of a perl program is
determined by whether the pattern involves interpolating any string
variables. If interpolation occurs, then compilation happens at run time. If it
does not, then compilation is performed at compile time. (The \f(CW\*(C`/o\*(C'\fR modifier changes this,
as does \f(CW\*(C`qr//\*(C'\fR to a certain extent.)
The engine doesn't really care that
much.
.Sh "Compilation"
.IX Subsection "Compilation"
This code resides primarily in \fIregcomp.c\fR, along with the header files
\&\fIregcomp.h\fR, \fIregexp.h\fR and \fIregnodes.h\fR.
.PP
Compilation starts with \f(CW\*(C`pregcomp()\*(C'\fR, which is mostly an initialisation
wrapper which farms work out to two other routines for the heavy lifting: the
first is \f(CW\*(C`reg()\*(C'\fR, which is the start point for parsing; the second,
\&\f(CW\*(C`study_chunk()\*(C'\fR, is responsible for optimisation.
.PP
Initialisation in \f(CW\*(C`pregcomp()\*(C'\fR mostly involves the creation and data-filling
of a special structure, \f(CW\*(C`RExC_state_t\*(C'\fR (defined in \fIregcomp.c\fR).
Almost all internally-used routines in \fIregcomp.h\fR take a pointer to one
of these structures as their first argument, with the name \f(CW\*(C`pRExC_state\*(C'\fR.
This structure is used to store the compilation state and contains many
fields. Likewise there are many macros which operate on this
variable: anything that looks like \f(CW\*(C`RExC_xxxx\*(C'\fR is a macro that operates on
this pointer/structure.
.PP
\fIParsing for size\fR
.IX Subsection "Parsing for size"
.PP
In this pass the input pattern is parsed in order to calculate how much
space is needed for each regop we would need to emit. The size is also
used to determine whether long jumps will be required in the program.
.PP
This stage is controlled by the macro \f(CW\*(C`SIZE_ONLY\*(C'\fR being set.
.PP
The parse proceeds pretty much exactly as it does during the
construction phase, except that most routines are short-circuited to
change the size field \f(CW\*(C`RExC_size\*(C'\fR and not do anything else.
.PP
\fIParsing for construction\fR
.IX Subsection "Parsing for construction"
.PP
Once the size of the program has been determined, the pattern is parsed
again, but this time for real.
Now \f(CW\*(C`SIZE_ONLY\*(C'\fR will be false, and the
actual construction can occur.
.PP
\&\f(CW\*(C`reg()\*(C'\fR is the start of the parse process. It is responsible for
parsing an arbitrary chunk of pattern up to either the end of the
string, or the first closing parenthesis it encounters in the pattern.
This means it can be used to parse the top-level regex, or any section
inside of a grouping parenthesis. It also handles the \*(L"special parens\*(R"
that perl's regexes have. For instance when parsing \f(CW\*(C`/x(?:foo)y/\*(C'\fR \f(CW\*(C`reg()\*(C'\fR
will at one point be called to parse from the \*(L"?\*(R" symbol up to and
including the \*(L")\*(R".
.PP
Additionally, \f(CW\*(C`reg()\*(C'\fR is responsible for parsing the one or more
branches from the pattern, and for \*(L"finishing them off\*(R" by correctly
setting their next pointers. In order to do the parsing, it repeatedly
calls out to \f(CW\*(C`regbranch()\*(C'\fR, which is responsible for handling up to the
first \f(CW\*(C`|\*(C'\fR symbol it sees.
.PP
\&\f(CW\*(C`regbranch()\*(C'\fR in turn calls \f(CW\*(C`regpiece()\*(C'\fR which
handles \*(L"things\*(R" followed by a quantifier. In order to parse the
\&\*(L"things\*(R", \f(CW\*(C`regatom()\*(C'\fR is called. This is the lowest level routine, which
parses out constant strings, character classes, and the
various special symbols like \f(CW\*(C`$\*(C'\fR. If \f(CW\*(C`regatom()\*(C'\fR encounters a \*(L"(\*(R"
character it in turn calls \f(CW\*(C`reg()\*(C'\fR.
.PP
The routine \f(CW\*(C`regtail()\*(C'\fR is called by both \f(CW\*(C`reg()\*(C'\fR and \f(CW\*(C`regbranch()\*(C'\fR
in order to \*(L"set the tail pointer\*(R" correctly. During execution, when
we get to the end of a branch, we need to go to the node following the
grouping parens. When parsing, however, we don't know where the end will
be until we get there, so when we do we must go back and update the
offsets as appropriate.
\&\f(CW\*(C`regtail\*(C'\fR is used to make this easier.
.PP
A subtlety of the parsing process means that a regex like \f(CW\*(C`/foo/\*(C'\fR is
originally parsed into an alternation with a single branch. It is only
afterwards that the optimiser converts single branch alternations into the
simpler form.
.PP
\fIParse Call Graph and a Grammar\fR
.IX Subsection "Parse Call Graph and a Grammar"
.PP
The call graph looks like this:
.PP
.Vb 10
\&    reg()                        # parse a top level regex, or inside of parens
\&        regbranch()              # parse a single branch of an alternation
\&            regpiece()           # parse a pattern followed by a quantifier
\&                regatom()        # parse a simple pattern
\&                    regclass()   #   used to handle a class
\&                    reg()        #   used to handle a parenthesised subpattern
\&                    ....
\&            ...
\&            regtail()            # finish off the branch
\&        ...
\&        regtail()                # finish off the branch sequence. Tie each
\&                                 # branch\*(Aqs tail to the tail of the sequence
\&                                 # (NEW) In Debug mode this is
\&                                 # regtail_study().
.Ve
.PP
A grammar form might be something like this:
.PP
.Vb 11
\&    atom  : constant | class
\&    quant : \*(Aq*\*(Aq | \*(Aq+\*(Aq | \*(Aq?\*(Aq | \*(Aq{min,max}\*(Aq
\&    _branch: piece
\&           | piece _branch
\&           | nothing
\&    branch: _branch
