<html><head><title>Efficiency (Programming Perl)</title><!-- STYLESHEET --><link rel="stylesheet" type="text/css" href="../style/style1.css"><!-- METADATA --><!--Dublin Core Metadata--><meta name="DC.Creator" content=""><meta name="DC.Date" content=""><meta name="DC.Format" content="text/xml" scheme="MIME"><meta name="DC.Generator" content="XSLT stylesheet, xt by James Clark"><meta name="DC.Identifier" content=""><meta name="DC.Language" content="en-US"><meta name="DC.Publisher" content="O'Reilly & Associates, Inc."><meta name="DC.Source" content="" scheme="ISBN"><meta name="DC.Subject.Keyword" content=""><meta name="DC.Title" content="Efficiency"><meta name="DC.Type" content="Text.Monograph"></head><body><!-- START OF BODY --><!-- TOP BANNER --><img src="gifs/smbanner.gif" usemap="#banner-map" border="0" alt="Book Home"><map name="banner-map"><AREA SHAPE="RECT" COORDS="0,0,466,71" HREF="index.htm" ALT="Programming Perl"><AREA SHAPE="RECT" COORDS="467,0,514,18" HREF="jobjects/fsearch.htm" ALT="Search this book"></map><!-- TOP NAV BAR --><div class="navbar"><table width="515" border="0"><tr><td align="left" valign="top" width="172"><a href="ch24_01.htm"><img src="../gifs/txtpreva.gif" alt="Previous" border="0"></a></td><td align="center" valign="top" width="171"><a href="ch24_01.htm">Chapter 24: Common Practices</a></td><td align="right" valign="top" width="172"><a href="ch24_03.htm"><img src="../gifs/txtnexta.gif" alt="Next" border="0"></a></td></tr></table></div><hr width="515" align="left"><!-- SECTION BODY --><h2 class="sect1">24.2. Efficiency</h2><p><a name="INDEX-4192"></a><a name="INDEX-4193"></a><a name="INDEX-4194"></a><a name="INDEX-4195"></a><a name="INDEX-4196"></a>While most of the work of programming may be simply getting your program working properly, you may find yourself wanting more bang for the buck out of your Perl program.
Perl's rich set of operators, data types, and control constructs is not necessarily intuitive when it comes to speed and space optimization. Many trade-offs were made during Perl's design, and such decisions are buried in the guts of the code. In general, the shorter and simpler your code is, the faster it runs, but there are exceptions. This section attempts to help you make it work just a wee bit better.</p><p>If you want it to work a lot better, you can play with the Perl compiler backend described in <a href="ch18_01.htm">Chapter 18, "Compiling"</a>, or rewrite your inner loop as a C extension as illustrated in <a href="ch21_01.htm">Chapter 21, "Internals and Externals"</a>.</p><p><a name="INDEX-4197"></a>Note that optimizing for time may sometimes cost you in space or programmer efficiency (indicated by conflicting hints below). Them's the breaks. If programming was easy, they wouldn't need something as complicated as a human being to do it, now would they?</p><h3 class="sect2">24.2.1. Time Efficiency</h3><ul><li><p><a name="INDEX-4198"></a><a name="INDEX-4199"></a>Use hashes instead of linear searches. For example, instead of searching through <tt class="literal">@keywords</tt> to see if <tt class="literal">$_</tt> is a keyword, construct a hash with:<blockquote><pre class="programlisting">my %keywords;
for (@keywords) {
    $keywords{$_}++;
}</pre></blockquote>Then you can quickly tell if <tt class="literal">$_</tt> contains a keyword by testing <tt class="literal">$keywords{$_}</tt> for a nonzero value.</p></li><li><p><a name="INDEX-4200"></a><a name="INDEX-4201"></a>Avoid subscripting when a <tt class="literal">foreach</tt> or list operator will do. Not only is subscripting an extra operation, but if your subscript variable happens to be in floating point because you did arithmetic, an extra conversion from floating point back to integer is necessary. There's often a better way to do it.
Consider using <tt class="literal">foreach</tt>, <tt class="literal">shift</tt>, and <tt class="literal">splice</tt> operations. Consider saying <tt class="literal">use integer</tt>.</p></li><li><p><a name="INDEX-4202"></a>Avoid <tt class="literal">goto</tt>. It scans outward from your current location for the indicated label.</p></li><li><p><a name="INDEX-4203"></a><a name="INDEX-4204"></a>Avoid <tt class="literal">printf</tt> when <tt class="literal">print</tt> will do.</p></li><li><p><a name="INDEX-4205"></a><a name="INDEX-4206"></a><a name="INDEX-4207"></a>Avoid <tt class="literal">$&</tt> and its two buddies, <tt class="literal">$`</tt> and <tt class="literal">$'</tt>. Any occurrence in your program causes all matches to save the searched string for possible future reference. (However, once you've blown it, it doesn't hurt to have more of them.)</p></li><li><p><a name="INDEX-4208"></a><a name="INDEX-4209"></a>Avoid using <tt class="literal">eval</tt> on a string. An <tt class="literal">eval</tt> of a string (although not of a <em class="replaceable">BLOCK</em>) forces recompilation every time through. The Perl parser is pretty fast for a parser, but that's not saying much. Nowadays there's almost always a better way to do what you want anyway. In particular, any code that uses <tt class="literal">eval</tt> merely to construct variable names is obsolete, since you can now do the same directly using symbolic references:<blockquote><pre class="programlisting">no strict 'refs';
my $name = "variable";
$$name = 7;    # Sets the package variable $variable to 7</pre></blockquote></p></li><li><p><a name="INDEX-4210"></a><a name="INDEX-4211"></a>Avoid <tt class="literal">eval</tt> <em class="replaceable">STRING</em> inside a loop. Put the loop into the <tt class="literal">eval</tt> instead, to avoid redundant recompilations of the code. See the <tt class="literal">study</tt> operator in <a href="ch29_01.htm">Chapter 29, "Functions"</a> for an example of this.</p></li><li><p>Avoid run-time-compiled patterns.
Use the <tt class="literal">/</tt><em class="replaceable">pattern</em><tt class="literal">/o</tt> (once only) pattern modifier to avoid pattern recompilation when the pattern doesn't change over the life of the process. For patterns that change occasionally, you can use the fact that a null pattern refers back to the previous pattern, like this:<blockquote><pre class="programlisting">"foundstring" =~ /$currentpattern/;    # Dummy match (must succeed).
while (&lt;&gt;) {
    print if //;
}</pre></blockquote>Alternatively, you can precompile your regular expression using the <tt class="literal">qr</tt> quote construct. You can also use <tt class="literal">eval</tt> to recompile a subroutine that does the match (if you only recompile occasionally). That works even better if you compile a bunch of matches into a single subroutine, thus amortizing the subroutine call overhead.</p></li><li><p>Short-circuit alternation is often faster than the corresponding regex. So:<blockquote><pre class="programlisting">print if /one-hump/ || /two/;</pre></blockquote>is likely to be faster than:<blockquote><pre class="programlisting">print if /one-hump|two/;</pre></blockquote>at least for certain values of <tt class="literal">one-hump</tt> and <tt class="literal">two</tt>. This is because the optimizer likes to hoist certain simple matching operations up into higher parts of the syntax tree and do very fast matching with a Boyer-Moore algorithm. A complicated pattern tends to defeat this.</p></li><li><p><a name="INDEX-4212"></a>Reject common cases early with <tt class="literal">next if</tt>. As with simple regular expressions, the optimizer likes this. And it just makes sense to avoid unnecessary work.
You can typically discard comment lines and blank lines even before you do a <tt class="literal">split</tt> or <tt class="literal">chop</tt>:<blockquote><pre class="programlisting">while (&lt;&gt;) {
    next if /^#/;    # skip comment lines
    next if /^$/;    # skip blank lines
    chop;
    @piggies = split(/,/);
    ...
}</pre></blockquote></p></li><li><p><a name="INDEX-4213"></a>Avoid regular expressions with many quantifiers or with big <tt class="literal">{</tt><em class="replaceable">MIN</em><tt class="literal">,</tt><em class="replaceable">MAX</em><tt class="literal">}</tt> numbers on parenthesized expressions. Such patterns can result in exponentially slow backtracking behavior unless the quantified subpatterns match on their first "pass". You can also use the <tt class="literal">(?>...)</tt> construct to force a subpattern to either match completely or fail without backtracking.</p></li><li><p><a name="INDEX-4214"></a>Try to maximize the length of any nonoptional literal strings in regular expressions. This is counterintuitive, but longer patterns often match faster than shorter patterns. That's because the optimizer looks for constant strings and hands them off to a Boyer-Moore search, which benefits from longer strings. Compile your pattern with Perl's <span class="option">-Dr</span> debugging switch to see what Dr. Perl thinks the longest literal string is.</p></li><li><p><a name="INDEX-4215"></a>Avoid expensive subroutine calls in tight loops. There is overhead associated with calling subroutines, especially when you pass lengthy parameter lists or return lengthy values. In order of increasing desperation, try passing values by reference, passing values as dynamically scoped globals, inlining the subroutine, or rewriting the whole loop in C. (Better than any of those solutions is to define the subroutine out of existence by using a smarter algorithm.)</p></li><li><p><a name="INDEX-4216"></a>Avoid <tt class="literal">getc</tt> for anything but single-character terminal I/O. In fact, don't use it for that either.
Use <tt class="literal">sysread</tt>.</p></li><li><p>Avoid frequent <tt class="literal">substr</tt>s on long strings, especially if the string contains UTF-8. It's okay to use <tt class="literal">substr</tt> at the front of a string, and for some tasks you can keep the <tt class="literal">substr</tt> at the front by "chewing up" the string as you go with a four-argument <tt class="literal">substr</tt>, replacing the part you grabbed with <tt class="literal">""</tt>:<blockquote><pre class="programlisting">while (length $buffer) {    # test length, since a leftover "0" is false
    process(substr($buffer, 0, 10, ""));    # remove and handle the next 10 chars
}</pre></blockquote></p></li><li><p><a name="INDEX-4217"></a>Use <tt class="literal">pack</tt> and <tt class="literal">unpack</tt> instead of multiple <tt class="literal">substr</tt> invocations.</p></li><li><p>Use <tt class="literal">substr</tt> as an lvalue rather than concatenating substrings. For example, to replace the fourth through seventh characters of <tt class="literal">$foo</tt> with the contents of the variable <tt class="literal">$bar</tt>, don't do this:<blockquote><pre class="programlisting">$foo = substr($foo,0,3) . $bar . substr($foo,7);</pre></blockquote>Instead, simply identify the part of the string to be replaced and assign into it, as in:<blockquote><pre class="programlisting">substr($foo, 3, 4) = $bar;</pre></blockquote>But be aware that if <tt class="literal">$foo</tt> is a huge string and <tt class="literal">$bar</tt> isn't exactly the length of the "hole", this can do a lot of copying too. Perl tries to minimize that by copying from either the front or the back, but there's only so much it can do if the <tt class="literal">substr</tt> is in the middle.</p></li><li><p>Use <tt class="literal">s///</tt> rather than concatenating substrings. This is especially true if you can replace one constant with another of the same size.
This results in an in-place substitution.</p></li><li><p>Use statement modifiers and equivalent <tt class="literal">and</tt> and <tt class="literal">or</tt> operators instead of full-blown conditionals. Statement modifiers (like <tt class="literal">$ring = 0 unless $engaged</tt>) and logical operators avoid the overhead of entering and leaving a block. They can often be more readable too.</p></li><li><p>Use <tt class="literal">$foo = $a || $b || $c</tt>. This is much faster (and shorter to say) than:<blockquote><pre class="programlisting">if ($a) {
    $foo = $a;
}
elsif ($b) {
    $foo = $b;
}
elsif ($c) {
    $foo = $c;
}</pre></blockquote>Similarly, set default values with:<blockquote><pre class="programlisting">$pi ||= 3;</pre></blockquote></p></li><li><p>Group together any tests that want the same initial string. When testing a string for various prefixes in anything resembling a switch structure, put together all the <tt class="literal">/^a/</tt> patterns, all the <tt class="literal">/^b/</tt> patterns, and so on.</p></li><li><p><a name="INDEX-4218"></a><a name="INDEX-4219"></a>Don't test things you know won't match. Use <tt class="literal">last</tt> or <tt class="literal">elsif</tt> to avoid falling through to the next case in your switch statement.</p></li><li><p>Use special operators like <tt class="literal">study</tt>, logical string operations, <tt class="literal">pack 'u'</tt>, and <tt class="literal">unpack '%'</tt> formats.</p></li><li><p>Beware of the tail wagging the dog. Misstatements resembling <tt class="literal">(&lt;STDIN&gt;)[0]</tt> can cause Perl much unnecessary work: the list context makes Perl read all remaining input just to keep the first line. In accordance with Unix philosophy, Perl gives you enough rope to hang yourself.</p></li><li><p>Factor operations out of loops. The Perl optimizer does not attempt to remove invariant code from loops. It expects you to exercise some sense.</p></li><li><p>Strings can be faster than arrays.</p></li><li><p>Arrays can be faster than strings.
It all depends on whether you're going to reuse the strings or arrays and which operations you're going to perform. Heavy modification of each element implies that arrays will be better, and occasional modification of some elements implies that strings will be better. But you just have to try it and see.</p></li><li><p><tt class="literal">my</tt> variables are faster than <tt class="literal">local</tt> variables.</p></li><li><p>Sorting on a manufactured key array may be faster than using a fancy sort subroutine. A given array value will usually be compared multiple times, so if the sort subroutine has to do much recalculation, it's better to factor out that calculation to a separate pass before the actual sort.</p></li><li><p>If you're deleting characters, <tt class="literal">tr/abc//d</tt> is faster than <tt class="literal">s/[abc]//g</tt>.</p></li><li><p><a name="INDEX-4220"></a><a name="INDEX-4221"></a><a name="INDEX-4222"></a><a name="INDEX-4223"></a><tt class="literal">print</tt> with a comma separator may be faster than concatenating strings. For example:<blockquote><pre class="programlisting">print $fullname{$name} . " has a new home directory " . $home{$name} . "\n";</pre></blockquote>has to glue together the two hash values and the two fixed strings before passing them to the low-level print routines, whereas:<blockquote><pre class="programlisting">print $fullname{$name}, " has a new home directory ", $home{$name}, "\n";</pre></blockquote>doesn't. On the other hand, depending on the values and the architecture, the concatenation may be faster. Try it.</p></li><li><p>Prefer <tt class="literal">join("", ...)</tt> to a series of concatenated strings. Multiple concatenations may cause strings to be copied back and forth multiple times. The <tt class="literal">join</tt> operator avoids this.</p></li><li><p><a name="INDEX-4224"></a><tt class="literal">split</tt> on a fixed string is generally faster than <tt class="literal">split</tt> on a pattern.
That is, use <tt class="literal">split(/ /, ...)</tt> rather than <tt class="literal">split(/ +/, ...)</tt> if you know there will only be one space. However, the patterns <tt class="literal">/\s+/</tt>, <tt class="literal">/^/</tt>, and <tt class="literal">/ /</tt> are specially optimized, as is the special <tt class="literal">split</tt> on whitespace.</p></li><li><p><a name="INDEX-4225"></a><a name="INDEX-4226"></a>Pre-extending an array or string can save some time. As strings and arrays grow, Perl extends them by allocating a new copy with some room for growth and copying in the old value. Pre-extending a string with the <tt class="literal">x</tt> operator or an array by setting <tt class="literal">$#array</tt> can prevent this occasional overhead and reduce memory fragmentation.</p></li><li><p><a name="INDEX-4227"></a>Don't <tt class="literal">undef</tt> long strings and arrays if they'll be reused for the same purpose. This helps prevent reallocation when the string or array must be re-extended.</p></li><li><p>Prefer <tt class="literal">"\0" x 8192</tt> over <tt class="literal">unpack("x8192",())</tt>.</p></li><li><p><tt class="literal">system("mkdir ...")</tt> may be faster on multiple directories if the <em class="emphasis">mkdir</em> syscall isn't available.</p></li><li><p><a name="INDEX-4228"></a>Avoid using <tt class="literal">eof</tt> if return values will already indicate it.</p></li><li><p><a name="INDEX-4229"></a><a name="INDEX-4230"></a>Cache entries from files (like <em class="emphasis">passwd</em> and