sub getcwd ;
B<Note:> I<the discussion below assumes a 32-bit architecture. In
newer versions of perl the memory usage of the constructs discussed
here is much improved, but the story told below is a real-life
story. This story is very terse, and assumes more than cursory
knowledge of Perl internals.>
Here is the itemized list of Perl allocations performed during parsing
of this file:
  !!! "after" at test.pl line 3.
     Id  subtot   4   8  12  16  20  24  28  32  36  40  48  56  64  72  80 80+
    0 02  13752   .   .   .   . 294   .   .   .   .   .   .   .   .   .   .   4
    0 54   5545   .   .   8 124  16   .   .   .   1   1   .   .   .   .   .   3
    5 05     32   .   .   .   .   .   .   .   1   .   .   .   .   .   .   .   .
    6 02   7152   .   .   .   .   .   .   .   .   .   . 149   .   .   .   .   .
    7 02   3600   .   .   .   .   . 150   .   .   .   .   .   .   .   .   .   .
    7 03     64   .  -1   .   1   .   .   2   .   .   .   .   .   .   .   .   .
    7 04   7056   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   7
    7 17  38404   .   .   .   .   .   .   .   1   .   . 442 149   .   . 147   .
    9 03   2078  17 249  32   .   .   .   .   2   .   .   .   .   .   .   .   .
To see this list insert two C<warn('!...')> statements around the call:
  warn('!');
  do 'lib/auto/POSIX/autosplit.ix';
  warn('!!! "after"');
and run it with the B<-DL> option. The first warn() will print memory
allocation info before the parsing of the file, and will memorize the
statistics at this point (we ignore what it prints). The second warn()
will print increments relative to these memorized statistics. This is
the printout shown above.
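As a reading aid, each row can be decoded by pairing its counts with the
size columns of the header. The sketch below uses a hypothetical helper
(C<parse_row()> is not part of perl) and assumes that each column counts
allocations of up to that many bytes, which is inferred from the header
rather than stated by the printout. For the C<7 02> row it reports 150
allocations in the 24-byte column, and 150 * 24 = 3600 agrees with the
subtotal.

```perl
use strict;
use warnings;

# Column headers copied from the printout; "80+" is the overflow column.
my @sizes = qw(4 8 12 16 20 24 28 32 36 40 48 56 64 72 80 80+);

# parse_row() is a hypothetical helper, not part of perl.
sub parse_row {
    my ($line) = @_;
    my @f      = split ' ', $line;
    my $id     = join '', splice(@f, 0, 2);   # "7 02" becomes "702"
    my $subtot = shift @f;
    my %count;
    for my $i (0 .. $#sizes) {
        next if $f[$i] eq '.';                # '.' marks an empty column
        $count{ $sizes[$i] } = $f[$i];
    }
    return { id => $id, subtot => $subtot, count => \%count };
}

my $row = parse_row('7 02 3600 . . . . . 150 . . . . . . . . . .');
for my $size (grep { exists $row->{count}{$_} } @sizes) {
    printf "Id %s: %d allocations in the %s-byte column\n",
        $row->{id}, $row->{count}{$size}, $size;
}
```

Rows whose allocations vary in size, such as C<9 03>, have a subtotal
smaller than the sum of count times column size, which is consistent with
the columns being upper bounds.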
Different I<Id>s on the left correspond to different subsystems of
the perl interpreter; they are just the first argument given to the perl
memory allocation API New(). To find what C<9 03> means, C<grep> the perl
source for C<903>. You will see that it is F<util.c>, function
savepvn(). This function is used to store a copy of an existing chunk of
memory. Using a C debugger, one can see that it is called either
directly from gv_init(), or via sv_magic(), and gv_init() is called
from gv_fetchpv(), which is called from newSUB().
B<Note:> to reach this place in the debugger and skip all the calls to
savepvn during the compilation of the main script, set a C breakpoint
in Perl_warn(), C<continue> until this point is reached, and I<then> set
a breakpoint in Perl_savepvn(). Note that you may need to skip a
handful of Perl_savepvn() calls which do not correspond to mass production
of CVs (there are more C<903> allocations than 146 similar lines of
F<lib/auto/POSIX/autosplit.ix>). Note also that C<Perl_> prefixes are
added by macroization code in perl header files to avoid conflicts
with external libraries.
Anyway, we see that C<903> ids correspond to creation of globs, twice
per glob: once for the glob name, and once for the glob stringification
magic.
Here are explanations for other I<Id>s above:
=over
=item C<717>
is for creation of bigger C<XPV*> structures. In the above case it
creates 3 C<AV> per subroutine, one for a list of lexical variable
names, one for a scratchpad (which contains lexical variables and
C<targets>), and one for the array of scratchpads needed for
recursion.
It also creates a C<GV> and a C<CV> per subroutine (all called from
start_subparse()).
=item C<002>
creates the C array corresponding to the C<AV> of scratchpads, and the
scratchpad itself (the first fake entry of this scratchpad is created
even though the subroutine itself is not defined yet).
It also creates C arrays to keep data for the stash (this is one HV,
but it grows, thus there are 4 big allocations: the big chunks are not
freed, but are kept as additional arenas for C<SV> allocations).
=item C<054>
creates a C<HEK> for the name of the glob for the subroutine (this
name is a key in a I<stash>).
Big allocations with this I<Id> correspond to allocations of new
arenas to keep C<HE>.
=item C<602>
creates a C<GP> for the glob for the subroutine.
=item C<702>
creates the C<MAGIC> for the glob for the subroutine.
=item C<704>
creates I<arenas> which keep SVs.
=back
=head2 B<-DL> details
If Perl is run with the B<-DL> option, then warn()s which start with `!'
behave specially. They print a list of I<categories> of memory
allocations, and statistics of allocations of different sizes for
these categories.
If the warn() string starts with
=over
=item C<!!!>
print changed categories only, print the differences in counts of allocations;
=item C<!!>
print grown categories only; print the absolute values of counts, and totals;
=item C<!>
print nonempty categories, print the absolute values of counts and totals.
=back
=head2 Limitations of B<-DL> statistic
If an extension or an external library does not use the Perl API to
allocate memory, these allocations are not counted.
=head1 Debugging regular expressions
There are two ways to enable debugging output for regular expressions.
If your perl is compiled with C<-DDEBUGGING>, you may use the
B<-Dr> flag on the command line.
Otherwise, one can C<use re 'debug'>, which has effects both at
compile time, and at run time (and is I<not> lexically scoped).
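A minimal sketch of the second route, using the pattern from the examples
in the following sections. The target string is made up for illustration;
the debugging output is written to STDERR, and its exact layout varies
between perl versions.

```perl
use strict;
use warnings;

# Turn on regex debugging; the output is written to STDERR.
use re 'debug';

# Hypothetical target string, chosen so that the pattern matches.
my $target  = 'abcdefghik';
my $matched = $target =~ /[bc]d(ef*g)+h[ij]k$/ ? 1 : 0;

print $matched ? "matched\n" : "no match\n";
```

Redirecting STDERR to a file (for example C<2E<gt>re.log> in a shell) keeps
the dump separate from the program's normal output.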
=head2 Compile-time output
The debugging output for the compile time looks like this:
compiling RE `[bc]d(ef*g)+h[ij]k$'
size 43 first at 1
1: ANYOF(11)
11: EXACT <d>(13)
13: CURLYX {1,32767}(27)
15: OPEN1(17)
17: EXACT <e>(19)
19: STAR(22)
20: EXACT <f>(0)
22: EXACT <g>(24)
24: CLOSE1(26)
26: WHILEM(0)
27: NOTHING(28)
28: EXACT <h>(30)
30: ANYOF(40)
40: EXACT <k>(42)
42: EOL(43)
43: END(0)
anchored `de' at 1 floating `gh' at 3..2147483647 (checking floating)
stclass `ANYOF' minlen 7
The first line shows the pre-compiled form of the regexp, and the
second shows the size of the compiled form (in arbitrary units,
usually 4-byte words) and the label I<id> of the first node which
does a match.
The last line (split into two lines in the above) contains the optimizer
info. In the example shown, the optimizer found that the match
should contain a substring C<de> at the offset 1, and substring C<gh>
at some offset between 3 and infinity. Moreover, when checking for
these substrings (to abandon impossible matches quickly) it will check
for the substring C<gh> before checking for the substring C<de>. The
optimizer may also use the knowledge that the match starts (at the
C<first> I<id>) with a character class, and the match cannot be
shorter than 7 chars.
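The effect of these facts can be imitated in Perl itself. The sketch below
is only an illustration of the idea, not the optimizer's actual C code:
C<may_match()> is a hypothetical pre-filter that applies C<minlen>, the
floating substring, and the anchored substring as necessary conditions,
and leaves the final decision to the RE engine.

```perl
use strict;
use warnings;

# may_match() is a hypothetical pre-filter, not perl's actual C code.
# It applies the facts reported for /[bc]d(ef*g)+h[ij]k$/:
#   minlen 7, floating `gh' at 3.., anchored `de' at 1.
sub may_match {
    my ($str) = @_;
    return 0 if length($str) < 7;            # shorter than minlen
    return 0 if index($str, 'gh', 3) < 0;    # floating substring, checked first
    return 0 if index($str, 'de', 1) < 0;    # `de' sits 1 char into the match
    return 1;                                # the real engine must still decide
}

for my $str ('xyz', 'abcdefg__gh__', 'abcdefghik') {
    my $verdict = !may_match($str)              ? 'rejected without the engine'
                : $str =~ /[bc]d(ef*g)+h[ij]k$/ ? 'matches'
                :                                 'passes the checks, engine fails';
    print "$str: $verdict\n";
}
```

The checks are deliberately weaker than the real ones (they ignore the
relative position of the two substrings), so passing them never proves a
match.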
The fields of interest which may appear in the last line are
=over
=item C<anchored> I<STRING> C<at> I<POS>
=item C<floating> I<STRING> C<at> I<POS1..POS2>
see above;
=item C<matching floating/anchored>
which substring to check first;
=item C<minlen>
the minimal length of the match;
=item C<stclass> I<TYPE>
The type of the first matching node.
=item C<noscan>
which advises to not scan for the found substrings;
=item C<isall>
which says that the optimizer info is in fact all that the regular
expression contains (thus one does not need to enter the RE engine at
all);
=item C<GPOS>
if the pattern contains C<\G>;
=item C<plus>
if the pattern starts with a repeated char (as in C<x+y>);
=item C<implicit>
if the pattern starts with C<.*>;
=item C<with eval>
if the pattern contains eval-groups (see L<perlre/(?{ code })>);
=item C<anchored(TYPE)>
if the pattern may
match only at a handful of places (with C<TYPE> being
C<BOL>, C<MBOL>, or C<GPOS>, see the table below).
=back
If a substring is known to match at end-of-line only, it may be
followed by C<$>, as in C<floating `k'$>.
The optimizer-specific info is used to avoid entering the (slow) RE
engine on strings which will definitely not match. If the C<isall> flag
is set, a call to the RE engine may be avoided even when the optimizer
has found an appropriate place for the match.
The rest of the output contains the list of I<nodes> of the compiled
form of the RE. Each line has the format
C< >I<id>: I<TYPE> I<OPTIONAL-INFO> (I<next-id>)
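When post-processing a dump, this line format is easy to take apart. The
following sketch uses a hypothetical helper, C<parse_node()>, written
against the sample dump above; dumps from other perl versions may carry
extra fields that it does not handle.

```perl
use strict;
use warnings;

# parse_node() is a hypothetical helper for lines such as
# "13: CURLYX {1,32767}(27)"; it returns undef for other lines.
sub parse_node {
    my ($line) = @_;
    my ($id, $type, $info, $next) =
        $line =~ /^\s*(\d+):\s*(\w+)\s*(.*?)\s*\((\d+)\)\s*$/
        or return;
    return { id => $id, type => $type, info => $info, next => $next };
}

for my $line ('1: ANYOF(11)', '13: CURLYX {1,32767}(27)', '43: END(0)') {
    my $n = parse_node($line) or next;
    printf "%-6s id=%d next=%d info=%s\n",
        $n->{type}, $n->{id}, $n->{next},
        length($n->{info}) ? $n->{info} : '(none)';
}
```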
=head2 Types of nodes
Here is the list of possible types with short descriptions:
# TYPE arg-description [num-args] [longjump-len] DESCRIPTION
# Exit points
END no End of program.
SUCCEED no Return from a subroutine, basically.
# Anchors:
BOL no Match "" at beginning of line.
MBOL no Same, assuming multiline.
SBOL no Same, assuming singleline.
EOS no Match "" at end of string.
EOL no Match "" at end of line.
MEOL no Same, assuming multiline.
SEOL no Same, assuming singleline.
BOUND no Match "" at any word boundary
BOUNDL no Match "" at any word boundary
NBOUND no Match "" at any word non-boundary
NBOUNDL no Match "" at any word non-boundary
GPOS no Matches where last m//g left off.
# [Special] alternatives
ANY no Match any one character (except newline).
SANY no Match any one character.
ANYOF sv Match character in (or not in) this class.
ALNUM no Match any alphanumeric character
ALNUML no Match any alphanumeric char in locale
NALNUM no Match any non-alphanumeric character
NALNUML no Match any non-alphanumeric char in locale
SPACE no Match any whitespace character
SPACEL no Match any whitespace char in locale
NSPACE no Match any non-whitespace character
NSPACEL no Match any non-whitespace char in locale
DIGIT no Match any numeric character
NDIGIT no Match any non-numeric character
# BRANCH The set of branches constituting a single choice are hooked
# together with their "next" pointers, since precedence prevents
# anything being concatenated to any individual branch. The
# "next" pointer of the last BRANCH in a choice points to the
# thing following the whole choice. This is also where the
# final "next" pointer of each individual branch points; each
# branch starts with the operand node of a BRANCH node.
#
BRANCH node Match this alternative, or the next...
# BACK Normal "next" pointers all implicitly point forward; BACK
# exists to make loop structures possible.
# not used
BACK no Match "", "next" ptr points backward.
# Literals
EXACT sv Match this string (preceded by length).
EXACTF sv Match this string, folded (prec. by length).
EXACTFL sv Match this string, folded in locale (w/len).
# Do nothing
NOTHING no Match empty string.
# A variant of above which delimits a group, thus stops optimizations
TAIL no Match empty string. Can jump here from outside.
# STAR,PLUS '?', and complex '*' and '+', are implemented as circular
# BRANCH structures using BACK. Simple cases (one character
# per match) are implemented with STAR and PLUS for speed
# and to minimize recursive plunges.
#
STAR node Match this (simple) thing 0 or more times.
PLUS node Match this (simple) thing 1 or more times.
CURLY sv 2 Match this simple thing {n,m} times.
CURLYN no 2 Match next-after-this simple thing
# {n,m} times, set parenths.
CURLYM no 2 Match this medium-complex thing {n,m} times.
CURLYX sv 2 Match this complex thing {n,m} times.
# This terminator creates a loop structure for CURLYX
WHILEM no Do curly processing and see if rest matches.
# OPEN,CLOSE,GROUPP ...are numbered at compile time.
OPEN num 1 Mark this point in input as start of #n.
CLOSE num 1 Analogous to OPEN.
REF num 1 Match some already matched string
REFF num 1 Match already matched string, folded
REFFL num 1 Match already matched string, folded in loc.
# grouping assertions
IFMATCH off 1 2 Succeeds if the following matches.
UNLESSM off 1 2 Fails if the following matches.
SUSPEND off 1 1 "Independent" sub-RE.
IFTHEN off 1 1 Switch, should be preceded by switcher.
GROUPP num 1 Whether the group matched.
# Support for long RE
LONGJMP off 1 1 Jump far away.
BRANCHJ off 1 1 BRANCH with long offset.
# The heavy worker
EVAL evl 1 Execute some Perl code.
# Modifiers
MINMOD no Next operator is not greedy.
LOGICAL no Next opcode should set the flag only.
# This is not used yet
RENUM off 1 1 Group with independently numbered parens.
# This is not really a node, but an optimized away piece of a "long" node.
# To simplify debugging output, we mark it as if it were a node
OPTIMIZED off Placeholder for dump.
=head2 Run-time output
First of all, when doing a match, one may get no run-time output even
if debugging is enabled. This means that the RE engine was never
entered, and all of the job was done by the optimizer.
If the RE engine was entered, the output may look like this:
Matching `[bc]d(ef*g)+h[ij]k$' against `abcdefg__gh__'
Setting an EVAL scope, savestack=3
2 <ab> <cdefg__gh_> | 1: ANYOF
3 <abc> <defg__gh_> | 11: EXACT <d>
4 <abcd> <efg__gh_> | 13: CURLYX {1,32767}
4 <abcd> <efg__gh_> | 26: WHILEM
0 out of 1..32767 cc=effff31c
4 <abcd> <efg__gh_> | 15: OPEN1
4 <abcd> <efg__gh_> | 17: EXACT <e>
5 <abcde> <fg__gh_> | 19: STAR
EXACT <f> can match 1 times out of 32767...
Setting an EVAL scope, savestack=3
6 <bcdef> <g__gh__> | 22: EXACT <g>
7 <bcdefg> <__gh__> | 24: CLOSE1
7 <bcdefg> <__gh__> | 26: WHILEM
1 out of 1..32767 cc=effff31c
Setting an EVAL scope, savestack=12
7 <bcdefg> <__gh__> | 15: OPEN1
7 <bcdefg> <__gh__> | 17: EXACT <e>
restoring \1 to 4(4)..7
failed, try continuation...
7 <bcdefg> <__gh__> | 27: NOTHING
7 <bcdefg> <__gh__> | 28: EXACT <h>
failed...
failed...
The most significant information in the output is about the particular I<node>
of the compiled RE which is currently being tested against the target string.
The format of these lines is
C< >I<STRING-OFFSET> <I<PRE-STRING>> <I<POST-STRING>> |I<ID>: I<TYPE>
The I<TYPE> info is indented with respect to the backtracking level.
Other incidental information appears interspersed within.
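The node lines of a trace can be separated from the incidental ones with a
single pattern. The sketch below uses a hypothetical helper,
C<parse_trace()>, written against the sample output above; it assumes that
the quoted string fragments do not themselves contain a C<E<gt>> followed
by whitespace and C<E<lt>>.

```perl
use strict;
use warnings;

# parse_trace() is a hypothetical helper for lines such as
# " 4 <abcd> <efg__gh_> | 13: CURLYX {1,32767}".
# It returns undef for incidental lines like "failed...".
sub parse_trace {
    my ($line) = @_;
    my ($offset, $pre, $post, $id, $type) =
        $line =~ /^\s*(\d+)\s+<(.*?)>\s+<(.*?)>\s*\|\s*(\d+):\s*(\w+)/
        or return;
    return { offset => $offset, pre => $pre, post => $post,
             id => $id, type => $type };
}

my $t = parse_trace(' 4 <abcd> <efg__gh_> | 13: CURLYX {1,32767}');
print "at offset $t->{offset}, node $t->{id} ($t->{type}) sees '$t->{post}' next\n";
```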
=cut