.\" Automatically generated by Pod::Man 2.16 (Pod::Simple 3.05)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sh \" Subsection heading
.br
.if t .Sp
.ne 5
.PP
\fB\\$1\fR
.PP
..
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.Sh), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.ie \nF \{\
.    de IX
.    tm Index:\\$1\t\\n%\t"\\$2"
..
.    nr % 0
.    rr F
.\}
.el \{\
.    de IX
..
.\}
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear. Run. Save yourself. No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "PERLREGUTS 1"
.TH PERLREGUTS 1 "2007-12-18" "perl v5.10.0" "Perl Programmers Reference Guide"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
perlreguts \- Description of the Perl regular expression engine.
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This document is an attempt to shine some light on the guts of the regex
engine and how it works. The regex engine represents a significant chunk
of the perl codebase, but is relatively poorly understood. This document
is a meagre attempt at addressing this situation.
It is derived from the
author's experience, comments in the source code, other papers on the
regex engine, feedback on the perl5\-porters mailing list, and no doubt other
places as well.
.PP
\&\fB\s-1NOTICE\s0!\fR It should be clearly understood that the behavior and
structures discussed in this document represent the state of the engine as the
author understood it at the time of writing. It is \fB\s-1NOT\s0\fR an \s-1API\s0
definition; it is purely an internals guide for those who want to hack
the regex engine, or understand how the regex engine works. Readers of
this document are expected to understand perl's regex syntax and its
usage in detail. If you want to learn about the basics of Perl's
regular expressions, see perlre. And if you want to replace the
regex engine with your own, see perlreapi.
.SH "OVERVIEW"
.IX Header "OVERVIEW"
.Sh "A quick note on terms"
.IX Subsection "A quick note on terms"
There is some debate as to whether to say \*(L"regexp\*(R" or \*(L"regex\*(R". In this
document we will use the term \*(L"regex\*(R" unless there is a special reason
not to, in which case we will explain why.
.PP
When speaking about regexes we need to distinguish between their source
code form and their internal form. In this document we will use the term
\&\*(L"pattern\*(R" when we speak of their textual, source code form, and the term
\&\*(L"program\*(R" when we speak of their internal representation. These
correspond to the terms \fIS\-regex\fR and \fIB\-regex\fR that Mark Jason
Dominus employs in his paper on \*(L"Rx\*(R" ([1] in \*(L"\s-1REFERENCES\s0\*(R").
.Sh "What is a regular expression engine?"
.IX Subsection "What is a regular expression engine?"
A regular expression engine is a program that takes a set of constraints
specified in a mini-language, and then applies those constraints to a
target string, and determines whether or not the string satisfies the
constraints.
See perlre for a full definition of the language.
.PP
In less grandiose terms, the first part of the job is to turn a pattern into
something the computer can efficiently use to find the matching point in
the string, and the second part is performing the search itself.
.PP
To do this we need to produce a program by parsing the text. We then
need to execute the program to find the point in the string that
matches. And we need to do the whole thing efficiently.
.Sh "Structure of a Regexp Program"
.IX Subsection "Structure of a Regexp Program"
\fIHigh Level\fR
.IX Subsection "High Level"
.PP
Although it is a bit confusing and some people object to the terminology, it
is worth taking a look at a comment that has
been in \fIregexp.h\fR for years:
.PP
\&\fIThis is essentially a linear encoding of a nondeterministic
finite-state machine (aka syntax charts or \*(L"railroad normal form\*(R" in
parsing technology).\fR
.PP
The term \*(L"railroad normal form\*(R" is a bit esoteric, with \*(L"syntax
diagram/charts\*(R", or \*(L"railroad diagram/charts\*(R" being more common terms.
Nevertheless it provides a useful mental image of a regex program: each
node can be thought of as a unit of track, with a single entry and in
most cases a single exit point (there are pieces of track that fork, but
statistically not many), and the whole forms a layout with a
single entry and single exit point. The matching process can be thought
of as a car that moves along the track, with the particular route through
the system being determined by the character read at each possible
connector point.
A car can fall off the track at any point but it may
only proceed as long as it matches the track.
.PP
Thus the pattern \f(CW\*(C`/foo(?:\ew+|\ed+|\es+)bar/\*(C'\fR can be thought of as the
following chart:
.PP
.Vb 10
\&                      [start]
\&                         |
\&                       <foo>
\&                         |
\&                   +\-\-\-\-\-+\-\-\-\-\-+
\&                   |     |     |
\&                 <\ew+> <\ed+> <\es+>
\&                   |     |     |
\&                   +\-\-\-\-\-+\-\-\-\-\-+
\&                         |
\&                       <bar>
\&                         |
\&                       [end]
.Ve
.PP
The truth of the matter is that perl's regular expressions these days are
much more complex than this kind of structure, but visualising it this way
can help when trying to get your bearings, and it matches the
current implementation pretty closely.
.PP
To be more precise, we will say that a regex program is an encoding
of a graph. Each node in the graph corresponds to part of
the original regex pattern, such as a literal string or a branch,
and has a pointer to the nodes representing the next component
to be matched. Since \*(L"node\*(R" and \*(L"opcode\*(R" already have other meanings in the
perl source, we will call the nodes in a regex program \*(L"regops\*(R".
.PP
The program is represented by an array of \f(CW\*(C`regnode\*(C'\fR structures, one or
more of which represent a single regop of the program. Struct
\&\f(CW\*(C`regnode\*(C'\fR is the smallest struct needed, and has a field structure which is
shared with all the other larger structures.
.PP
The \*(L"next\*(R" pointers of all regops except \f(CW\*(C`BRANCH\*(C'\fR implement concatenation;
a \*(L"next\*(R" pointer with a \f(CW\*(C`BRANCH\*(C'\fR on both ends of it is connecting two
alternatives. [Here we have one of the subtle syntax dependencies: an
individual \f(CW\*(C`BRANCH\*(C'\fR (as opposed to a collection of them) is never
concatenated with anything because of operator precedence.]
.PP
The operand of some types of regop is a literal string; for others,
it is a regop leading into a sub-program. In particular, the operand
of a \f(CW\*(C`BRANCH\*(C'\fR node is the first regop of the branch.
.PP