
📄 perlreguts.1

\&          | _branch \*(Aq|\*(Aq branch
\&    group : \*(Aq(\*(Aq branch \*(Aq)\*(Aq
\&    _piece: atom | group
\&    piece : _piece
\&          | _piece quant
.Ve
.PP
\fIDebug Output\fR
.IX Subsection "Debug Output"
.PP
In the 5.9.x development version of perl you can \f(CW\*(C`use re Debug => \*(AqPARSE\*(Aq\*(C'\fR
to see some trace information about the parse process. We will start with some
simple patterns and build up to more complex patterns.
.PP
So when we parse \f(CW\*(C`/foo/\*(C'\fR we see something like the following table. The
left shows what is being parsed, and the number indicates where the next regop
would go. The stuff on the right is the trace output of the graph. The
names are chosen to be short to make it less dense on the screen. 'tsdy'
is a special form of \f(CW\*(C`regtail()\*(C'\fR which does some extra analysis.
.PP
.Vb 6
\& >foo<             1    reg
\&                          brnc
\&                            piec
\&                              atom
\& ><                4      tsdy~ EXACT <foo> (EXACT) (1)
\&                              ~ attach to END (3) offset to 2
.Ve
.PP
The resulting program then looks like:
.PP
.Vb 2
\&   1: EXACT <foo>(3)
\&   3: END(0)
.Ve
.PP
As you can see, even though we parsed out a branch and a piece, it was ultimately
only an atom. The final program shows us how things work. We have an \f(CW\*(C`EXACT\*(C'\fR regop,
followed by an \f(CW\*(C`END\*(C'\fR regop. The number in parens indicates where the \f(CW\*(C`regnext\*(C'\fR of
the node goes. The \f(CW\*(C`regnext\*(C'\fR of an \f(CW\*(C`END\*(C'\fR regop is unused, as \f(CW\*(C`END\*(C'\fR regops mean
we have successfully matched. The number on the left indicates the position of
the regop in the regnode array.
.PP
Now let's try a harder pattern. We will add a quantifier, so now we have the pattern
\&\f(CW\*(C`/foo+/\*(C'\fR.
We will see that \f(CW\*(C`regbranch()\*(C'\fR calls \f(CW\*(C`regpiece()\*(C'\fR twice.
.PP
.Vb 10
\& >foo+<            1    reg
\&                          brnc
\&                            piec
\&                              atom
\& >o+<              3        piec
\&                              atom
\& ><                6        tail~ EXACT <fo> (1)
\&                   7      tsdy~ EXACT <fo> (EXACT) (1)
\&                              ~ PLUS (END) (3)
\&                              ~ attach to END (6) offset to 3
.Ve
.PP
And we end up with the program:
.PP
.Vb 4
\&   1: EXACT <fo>(3)
\&   3: PLUS(6)
\&   4:   EXACT <o>(0)
\&   6: END(0)
.Ve
.PP
Now we have a special case. The \f(CW\*(C`EXACT\*(C'\fR regop has a \f(CW\*(C`regnext\*(C'\fR of 0. This is
because if it matches it should try to match itself again. The \f(CW\*(C`PLUS\*(C'\fR regop
handles the actual failure of the \f(CW\*(C`EXACT\*(C'\fR regop and acts appropriately (going
to regnode 6 if the \f(CW\*(C`EXACT\*(C'\fR matched at least once, or failing if it didn't).
.PP
Now for something much more complex: \f(CW\*(C`/x(?:foo*|b[a][rR])(foo|bar)$/\*(C'\fR
.PP
.Vb 10
\& >x(?:foo*|b...    1    reg
\&                          brnc
\&                            piec
\&                              atom
\& >(?:foo*|b[...    3        piec
\&                              atom
\& >?:foo*|b[a...                 reg
\& >foo*|b[a][...                   brnc
\&                                    piec
\&                                      atom
\& >o*|b[a][rR...    5                piec
\&                                      atom
\& >|b[a][rR])...    8                tail~ EXACT <fo> (3)
\& >b[a][rR])(...    9              brnc
\&                  10                piec
\&                                      atom
\& >[a][rR])(f...   12                piec
\&                                      atom
\& >a][rR])(fo...                         clas
\& >[rR])(foo|...   14                tail~ EXACT <b> (10)
\&                                    piec
\&                                      atom
\& >rR])(foo|b...                         clas
\& >)(foo|bar)...   25                tail~ EXACT <a> (12)
\&                                  tail~ BRANCH (3)
\&                  26              tsdy~ BRANCH (END) (9)
\&                                      ~ attach to TAIL (25) offset to 16
\&                                  tsdy~ EXACT <fo> (EXACT) (4)
\&                                      ~ STAR (END) (6)
\&                                      ~ attach to TAIL (25) offset to 19
\&                                  tsdy~ EXACT <b> (EXACT) (10)
\&                                      ~ EXACT <a> (EXACT) (12)
\&                                      ~ ANYOF[Rr] (END) (14)
\&                                      ~ attach to TAIL (25) offset to 11
\& >(foo|bar)$<               tail~ EXACT <x> (1)
\&                            piec
\&                              atom
\& >foo|bar)$<                    reg
\&                  28              brnc
\&                                    piec
\&                                      atom
\& >|bar)$<         31              tail~ OPEN1 (26)
\& >bar)$<                          brnc
\&                  32                piec
\&                                      atom
\& >)$<             34              tail~ BRANCH (28)
\&                  36              tsdy~ BRANCH (END) (31)
\&                                      ~ attach to CLOSE1 (34) offset to 3
\&                                  tsdy~ EXACT <foo> (EXACT) (29)
\&                                      ~ attach to CLOSE1 (34) offset to 5
\&                                  tsdy~ EXACT <bar> (EXACT) (32)
\&                                      ~ attach to CLOSE1 (34) offset to 2
\& >$<                        tail~ BRANCH (3)
\&                                ~ BRANCH (9)
\&                                ~ TAIL (25)
\&                            piec
\&                              atom
\& ><               37        tail~ OPEN1 (26)
\&                                ~ BRANCH (28)
\&                                ~ BRANCH (31)
\&                                ~ CLOSE1 (34)
\&                  38      tsdy~ EXACT <x> (EXACT) (1)
\&                              ~ BRANCH (END) (3)
\&                              ~ BRANCH (END) (9)
\&                              ~ TAIL (END) (25)
\&                              ~ OPEN1 (END) (26)
\&                              ~ BRANCH (END) (28)
\&                              ~ BRANCH (END) (31)
\&                              ~ CLOSE1 (END) (34)
\&                              ~ EOL (END) (36)
\&                              ~ attach to END (37) offset to 1
.Ve
.PP
Resulting in the program
.PP
.Vb 10
\&   1: EXACT <x>(3)
\&   3: BRANCH(9)
\&   4:   EXACT <fo>(6)
\&   6:   STAR(26)
\&   7:     EXACT <o>(0)
\&   9: BRANCH(25)
\&  10:   EXACT <ba>(14)
\&  12:   OPTIMIZED (2 nodes)
\&  14:   ANYOF[Rr](26)
\&  25: TAIL(26)
\&  26: OPEN1(28)
\&  28:   TRIE\-EXACT(34)
\&        [StS:1 Wds:2 Cs:6 Uq:5 #Sts:7 Mn:3 Mx:3 Stcls:bf]
\&          <foo>
\&          <bar>
\&  30:   OPTIMIZED (4 nodes)
\&  34: CLOSE1(36)
\&  36: EOL(37)
\&  37: END(0)
.Ve
.PP
Here we can see a much more complex program, with various optimisations in
play. At regnode 10 we see an example where a character class with only
one character in it was turned into an \f(CW\*(C`EXACT\*(C'\fR node. We can also see where
an entire alternation was turned into a \f(CW\*(C`TRIE\-EXACT\*(C'\fR node. As a consequence,
some of the regnodes have been marked as optimised away. We can see that
the \f(CW\*(C`$\*(C'\fR symbol has been converted into an \f(CW\*(C`EOL\*(C'\fR regop, a special piece of
code that looks for \f(CW\*(C`\en\*(C'\fR or the end of the string.
.PP
The next pointer for \f(CW\*(C`BRANCH\*(C'\fRes is interesting in that it points at where
execution should go if the branch fails.
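.PP
The role of the BRANCH next pointer can be sketched in C. This is only an illustration under stated assumptions: the struct toy_node, the toy_op values and the function next_alternative() are hypothetical names invented here, the node stores an absolute index rather than the offset a real regnode uses, and none of it is taken from the perl source.

```c
#include <assert.h>
#include <stddef.h>

/* Toy opcodes, invented for this sketch; not perl's real regop values. */
enum toy_op { TOY_EXACT, TOY_BRANCH, TOY_TAIL, TOY_END };

/* Hypothetical, simplified node: next is an absolute index, 0 if unused. */
struct toy_node {
    enum toy_op op;
    int next;
};

/*
 * Called after the alternative that starts at prog[branch] has failed.
 * Returns the index of the next BRANCH to try, or -1 when the regnext is
 * not a BRANCH, which means the whole set of alternatives has failed.
 */
static int next_alternative(const struct toy_node *prog, size_t len, int branch)
{
    assert(prog != NULL);
    assert(branch >= 0 && (size_t)branch < len);
    assert(prog[branch].op == TOY_BRANCH);

    int next = prog[branch].next;
    if (next <= 0 || (size_t)next >= len)
        return -1;
    return prog[next].op == TOY_BRANCH ? next : -1;
}
```

With the program shown above, node 3 is BRANCH(9) and node 9 is BRANCH(25), so a failure in the first alternative leads to node 9, while a failure in the second leads to the TAIL at node 25 and therefore to failure of the whole group.
.PP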
When executing, if the engine
tries to traverse from a branch to a \f(CW\*(C`regnext\*(C'\fR that isn't a branch then
the engine will know that the entire set of branches has failed.
.PP
\fIPeep-hole Optimisation and Analysis\fR
.IX Subsection "Peep-hole Optimisation and Analysis"
.PP
The regular expression engine can be a weighty tool to wield. On long
strings and complex patterns it can end up having to do a lot of work
to find a match, and even more to decide that no match is possible.
Consider a situation like the following pattern.
.PP
.Vb 1
\&   \*(Aqababababababababababab\*(Aq =~ /(a|b)*z/
.Ve
.PP
The \f(CW\*(C`(a|b)*\*(C'\fR part can match at every char in the string, and then fail
every time because there is no \f(CW\*(C`z\*(C'\fR in the string. So obviously we can
avoid using the regex engine unless there is a \f(CW\*(C`z\*(C'\fR in the string.
Likewise in a pattern like:
.PP
.Vb 1
\&   /foo(\ew+)bar/
.Ve
.PP
In this case we know that the string must contain a \f(CW\*(C`foo\*(C'\fR which must be
followed by \f(CW\*(C`bar\*(C'\fR. We can use Fast Boyer-Moore matching as implemented
in \f(CW\*(C`fbm_instr()\*(C'\fR to find the location of these strings. If they don't exist
then we don't need to resort to the much more expensive regex engine.
Even better, if they do exist then we can use their positions to
reduce the search space that the regex engine needs to cover to determine
if the entire pattern matches.
.PP
There are various aspects of the pattern that can be used to facilitate
optimisations along these lines:
.IP "\(bu" 5
anchored fixed strings
.IP "\(bu" 5
floating fixed strings
.IP "\(bu" 5
minimum and maximum length requirements
.IP "\(bu" 5
start class
.IP "\(bu" 5
Beginning/End of line positions
.PP
Another form of optimisation that can occur is the post-parse \*(L"peep-hole\*(R"
optimisation, where inefficient constructs are replaced by more efficient
constructs.
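.PP
As an aside, the fixed-string checks described earlier in this subsection can be sketched in C. This is a minimal sketch, not perl's implementation: the function names are hypothetical, and the standard strchr() and strstr() stand in for the Boyer-Moore search that fbm_instr() performs. Each function only tests a necessary condition, so a result of 1 means the full engine still has to run.

```c
#include <assert.h>
#include <string.h>

/* /(a|b)*z/ cannot match unless the subject contains a 'z'. */
static int may_match_ab_star_z(const char *subject)
{
    assert(subject != NULL);
    return strchr(subject, 'z') != NULL;
}

/*
 * /foo(\w+)bar/ cannot match unless "foo" occurs and "bar" starts at
 * least one character after the end of that "foo".
 * Returns 0 when no match is possible, 1 when the engine must decide.
 */
static int may_match_foo_w_bar(const char *subject)
{
    assert(subject != NULL);

    const char *foo = strstr(subject, "foo");
    if (foo == NULL)
        return 0;
    /* \w+ consumes at least one character, so skip "foo" plus one. */
    if (foo[3] == '\0')
        return 0;
    return strstr(foo + 4, "bar") != NULL;
}
```

Searching from the first "foo" is enough for a rejection test, because any later "foo" followed by "bar" would also place that "bar" after the first one.
.PP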
The \f(CW\*(C`TAIL\*(C'\fR regops which are used during parsing to mark the end
of branches and the end of groups are examples of this. These regops are used
as place-holders during construction and \*(L"always match\*(R" so they can be
\&\*(L"optimised away\*(R" by making the things that point to the \f(CW\*(C`TAIL\*(C'\fR point to the
thing that \f(CW\*(C`TAIL\*(C'\fR points to, thus \*(L"skipping\*(R" the node.
.PP
Another optimisation that can occur is that of "\f(CW\*(C`EXACT\*(C'\fR merging" which is
where two consecutive \f(CW\*(C`EXACT\*(C'\fR nodes are merged into a single
regop. An even more aggressive form of this is that a branch
sequence of the form \f(CW\*(C`EXACT BRANCH ... EXACT\*(C'\fR can be converted into a
\&\f(CW\*(C`TRIE\-EXACT\*(C'\fR regop.
.PP
All of this occurs in the routine \f(CW\*(C`study_chunk()\*(C'\fR which uses a special
structure \f(CW\*(C`scan_data_t\*(C'\fR to store the analysis that it has performed, and
does the \*(L"peep-hole\*(R" optimisations as it goes.
.PP
The code involved in \f(CW\*(C`study_chunk()\*(C'\fR is extremely cryptic. Be careful. :\-)
.Sh "Execution"
.IX Subsection "Execution"
Execution of a regex generally involves two phases, the first being
finding the start point in the string where we should match from,
and the second being running the regop interpreter.
.PP
If we can tell that there is no valid start point then we don't bother running
the interpreter at all. Likewise, if we know from the analysis phase that we
cannot detect a short-cut to the start position, we go straight to the
interpreter.
.PP
The two entry points are \f(CW\*(C`re_intuit_start()\*(C'\fR and \f(CW\*(C`pregexec()\*(C'\fR. These routines
