📄 perlhack.pod
source, and we'll do that later on.

You might also want to look at Gisle Aas's illustrated perlguts -
there's no guarantee that this will be absolutely up-to-date with the
latest documentation in the Perl core, but the fundamentals will be
right. ( http://gisle.aas.no/perl/illguts/ )

=item L<perlxstut> and L<perlxs>

A working knowledge of XSUB programming is incredibly useful for core
hacking; XSUBs use techniques drawn from the PP code, the portion of the
guts that actually executes a Perl program. It's a lot gentler to learn
those techniques from simple examples and explanation than from the core
itself.

=item L<perlapi>

The documentation for the Perl API explains what some of the internal
functions do, as well as the many macros used in the source.

=item F<Porting/pumpkin.pod>

This is a collection of words of wisdom for a Perl porter; some of it is
only useful to the pumpkin holder, but most of it applies to anyone
wanting to go about Perl development.

=item The perl5-porters FAQ

This should be available from http://dev.perl.org/perl5/docs/p5p-faq.html .
It contains hints on reading perl5-porters, information on how
perl5-porters works and how Perl development in general works.

=back

=head2 Finding Your Way Around

Perl maintenance can be split into a number of areas, and certain people
(pumpkins) will have responsibility for each area. These areas sometimes
correspond to files or directories in the source kit. Among the areas are:

=over 3

=item Core modules

Modules shipped as part of the Perl core live in the F<lib/> and F<ext/>
subdirectories: F<lib/> is for the pure-Perl modules, and F<ext/>
contains the core XS modules.

=item Tests

There are tests for nearly all the modules, built-ins and major bits
of functionality.  Test files all have a .t suffix.  Module tests live
in the F<lib/> and F<ext/> directories next to the module being
tested.  Others live in F<t/>.
See L<Writing a test>

=item Documentation

Documentation maintenance includes looking after everything in the
F<pod/> directory, (as well as contributing new documentation) and
the documentation to the modules in core.

=item Configure

The configure process is the way we make Perl portable across the
myriad of operating systems it supports. Responsibility for the
configure, build and installation process, as well as the overall
portability of the core code rests with the configure pumpkin - others
help out with individual operating systems.

The files involved are the operating system directories, (F<win32/>,
F<os2/>, F<vms/> and so on) the shell scripts which generate F<config.h>
and F<Makefile>, as well as the metaconfig files which generate
F<Configure>. (metaconfig isn't included in the core distribution.)

=item Interpreter

And of course, there's the core of the Perl interpreter itself. Let's
have a look at that in a little more detail.

=back

Before we leave looking at the layout, though, don't forget that
F<MANIFEST> contains not only the file names in the Perl distribution,
but short descriptions of what's in them, too. For an overview of the
important files, try this:

    perl -lne 'print if /^[^\/]+\.[ch]\s+/' MANIFEST

=head2 Elements of the interpreter

The work of the interpreter has two main stages: compiling the code
into the internal representation, or bytecode, and then executing it.
L<perlguts/Compiled code> explains exactly how the compilation stage
happens.

Here is a short breakdown of perl's operation:

=over 3

=item Startup

The action begins in F<perlmain.c>.
(or F<miniperlmain.c> for miniperl)
This is very high-level code, enough to fit on a single screen, and it
resembles the code found in L<perlembed>; most of the real action takes
place in F<perl.c>.

First, F<perlmain.c> allocates some memory and constructs a Perl
interpreter:

    1 PERL_SYS_INIT3(&argc,&argv,&env);
    2
    3 if (!PL_do_undump) {
    4     my_perl = perl_alloc();
    5     if (!my_perl)
    6         exit(1);
    7     perl_construct(my_perl);
    8     PL_perl_destruct_level = 0;
    9 }

Line 1 is a macro, and its definition is dependent on your operating
system. Line 3 references C<PL_do_undump>, a global variable - all
global variables in Perl start with C<PL_>. This tells you whether the
current running program was created with the C<-u> flag to perl and then
F<undump>, which means it's going to be false in any sane context.

Line 4 calls a function in F<perl.c> to allocate memory for a Perl
interpreter. It's quite a simple function, and the guts of it looks like
this:

    my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter));

Here you see an example of Perl's system abstraction, which we'll see
later: C<PerlMem_malloc> is either your system's C<malloc>, or Perl's
own C<malloc> as defined in F<malloc.c> if you selected that option at
configure time.

Next, in line 7, we construct the interpreter; this sets up all the
special variables that Perl needs, the stacks, and so on.

Now we pass Perl the command line options, and tell it to go:

    exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL);
    if (!exitstatus) {
        exitstatus = perl_run(my_perl);
    }

C<perl_parse> is actually a wrapper around C<S_parse_body>, as defined
in F<perl.c>, which processes the command line options, sets up any
statically linked XS modules, opens the program and calls C<yyparse> to
parse it.

=item Parsing

The aim of this stage is to take the Perl source, and turn it into an op
tree. We'll see what one of those looks like later.
Strictly speaking, there are three things going on here.

C<yyparse>, the parser, lives in F<perly.c>, although you're better off
reading the original YACC input in F<perly.y>. (Yes, Virginia, there
B<is> a YACC grammar for Perl!) The job of the parser is to take your
code and "understand" it, splitting it into sentences, deciding which
operands go with which operators and so on.

The parser is nobly assisted by the lexer, which chunks up your input
into tokens, and decides what type of thing each token is: a variable
name, an operator, a bareword, a subroutine, a core function, and so on.
The main point of entry to the lexer is C<yylex>, and that and its
associated routines can be found in F<toke.c>. Perl isn't much like
other computer languages; it's highly context sensitive at times, and it
can be tricky to work out what sort of token something is, or where a
token ends. As such, there's a lot of interplay between the tokeniser and
the parser, which can get pretty frightening if you're not used to it.

As the parser understands a Perl program, it builds up a tree of
operations for the interpreter to perform during execution. The routines
which construct and link together the various operations are to be found
in F<op.c>, and will be examined later.

=item Optimization

Now the parsing stage is complete, and the finished tree represents
the operations that the Perl interpreter needs to perform to execute our
program. Next, Perl does a dry run over the tree looking for
optimisations: constant expressions such as C<3 + 4> will be computed
now, and the optimizer will also see if any multiple operations can be
replaced with a single one. For instance, to fetch the variable C<$foo>,
instead of grabbing the glob C<*foo> and looking at the scalar
component, the optimizer fiddles the op tree to use a function which
directly looks up the scalar in question.
The main optimizer is C<peep>
in F<op.c>, and many ops have their own optimizing functions.

=item Running

Now we're finally ready to go: we have compiled Perl byte code, and all
that's left to do is run it. The actual execution is done by the
C<runops_standard> function in F<run.c>; more specifically, it's done by
these three innocent looking lines:

    while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
        PERL_ASYNC_CHECK();
    }

You may be more comfortable with the Perl version of that:

    PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}};

Well, maybe not. Anyway, each op contains a function pointer, which
stipulates the function which will actually carry out the operation.
This function will return the next op in the sequence - this allows for
things like C<if> which choose the next op dynamically at run time.
The C<PERL_ASYNC_CHECK> makes sure that things like signals interrupt
execution if required.

The actual functions called are known as PP code, and they're spread
between four files: F<pp_hot.c> contains the "hot" code, which is most
often used and highly optimized, F<pp_sys.c> contains all the
system-specific functions, F<pp_ctl.c> contains the functions which
implement control structures (C<if>, C<while> and the like) and F<pp.c>
contains everything else. These are, if you like, the C code for Perl's
built-in functions and operators.

Note that each C<pp_> function is expected to return a pointer to the next
op. Calls to perl subs (and eval blocks) are handled within the same
runops loop, and do not consume extra space on the C stack. For example,
C<pp_entersub> and C<pp_entertry> just push a C<CxSUB> or C<CxEVAL> block
struct onto the context stack which contains the address of the op
following the sub call or eval. They then return the first op of that sub
or eval block, and so execution continues with that sub or block.
Later, a
C<pp_leavesub> or C<pp_leavetry> op pops the C<CxSUB> or C<CxEVAL>,
retrieves the return op from it, and returns it.

=item Exception handling

Perl's exception handling (i.e. C<die> etc.) is built on top of the low-level
C<setjmp()>/C<longjmp()> C-library functions. These basically provide a
way to capture the current PC and SP registers and later restore them; i.e.
a C<longjmp()> continues at the point in code where a previous C<setjmp()>
was done, with anything further up on the C stack being lost. This is why
code should always save values using C<SAVE_FOO> rather than in auto
variables.

The perl core wraps C<setjmp()> etc in the macros C<JMPENV_PUSH> and
C<JMPENV_JUMP>. The basic rule of perl exceptions is that C<exit>, and
C<die> (in the absence of C<eval>) perform a C<JMPENV_JUMP(2)>, while
C<die> within C<eval> does a C<JMPENV_JUMP(3)>.

Entry points to perl, such as C<perl_parse()>, C<perl_run()> and
C<call_sv(cv, G_EVAL)>, each do a C<JMPENV_PUSH>, then enter a runops
loop or whatever, and handle possible exception returns. For a 2 return,
final cleanup is performed, such as popping stacks and calling C<CHECK> or
C<END> blocks. Amongst other things, this is how scope cleanup still
occurs during an C<exit>.

If a C<die> can find a C<CxEVAL> block on the context stack, then the
stack is popped to that level and the return op in that block is assigned
to C<PL_restartop>; then a C<JMPENV_JUMP(3)> is performed. This normally
passes control back to the guard. In the case of C<perl_run> and
C<call_sv>, a non-null C<PL_restartop> triggers re-entry to the runops
loop. This is the normal way that C<die> or C<croak> is handled within an
C<eval>.

Sometimes ops are executed within an inner runops loop, such as tie, sort
or overload code. In this case, something like

    sub FETCH { eval { die } }

would cause a longjmp right back to the guard in C<perl_run>, popping both
runops loops, which is clearly incorrect.
One way to avoid this is for the
tie code to do a C<JMPENV_PUSH> before executing C<FETCH> in the inner
runops loop, but for efficiency reasons, perl in fact just sets a flag,
using C<CATCH_SET(TRUE)>. The C<pp_require>, C<pp_entereval> and
C<pp_entertry> ops check this flag, and if true, they call C<docatch>,
which does a C<JMPENV_PUSH> and starts a new runops level to execute the
code, rather than doing it on the current loop.

As a further optimisation, on exit from the eval block in the C<FETCH>,
execution of the code following the block is still carried on in the inner
loop. When an exception is raised, C<docatch> compares the C<JMPENV>
level of the C<CxEVAL> with C<PL_top_env> and if they differ, just
re-throws the exception. In this way any inner loops get popped.

Here's an example.

    1: eval { tie @a, 'A' };
    2: sub A::TIEARRAY {
    3:     eval { die };
    4:     die;
    5: }

To run this code, C<perl_run> is called, which does a C<JMPENV_PUSH> then
enters a runops loop. This loop executes the eval and tie ops on line 1,
with the eval pushing a C<CxEVAL> onto the context stack.

The C<pp_tie> does a C<CATCH_SET(TRUE)>, then starts a second runops loop
to execute the body of C<TIEARRAY>. When it executes the entertry op on
line 3, C<CATCH_GET> is true, so C<pp_entertry> calls C<docatch> which
does a C<JMPENV_PUSH> and starts a third runops loop, which then executes
the die op. At this point the C call stack looks like this:

    Perl_pp_die
    Perl_runops      # third loop
    S_docatch_body
    S_docatch
    Perl_pp_entertry
    Perl_runops      # second loop
    S_call_body
    Perl_call_sv
    Perl_pp_tie
    Perl_runops      # first loop
    S_run_body
    perl_run
    main

and the context and data stacks, as shown by C<-Dstv>, look like:

    STACK 0: MAIN
      CX 0: BLOCK  =>
      CX 1: EVAL   => AV()  PV("A"\0)
      retop=leave
    STACK 1: MAGIC
      CX 0: SUB    =>
      retop=(null)
      CX 1: EVAL   => *
    retop=nextstate

The die pops the first C<CxEVAL> off the context stack, sets
C<PL_restartop> from it, does a C<JMPENV_JUMP(3)>, and control returns to