ch05_17.htm

来自「By Tom Christiansen and Nathan Torkingto」· HTM 代码 · 共 1,002 行 · 第 1/2 页
HTM
1,002 行
><I>   </I></CODE></B></CODE><CODECLASS="literal">|</CODE><CODECLASS="userinput"><B><CODECLASS="replaceable"><I>      19 fix</I></CODE></B></CODE><CODECLASS="userinput"><B><CODECLASS="replaceable"><I>   </I></CODE></B></CODE><CODECLASS="literal">|</CODE><CODECLASS="userinput"><B><CODECLASS="replaceable"><I>       6 .</I></CODE></B></CODE></PRE><PCLASS="para">The arguments you give <EMCLASS="emphasis">dutree</EM> are passed through to <EMCLASS="emphasis">du</EM>. That way you could call <EMCLASS="emphasis">dutree</EM> in any of these ways, or maybe more if your <EMCLASS="emphasis">du</EM> supports other options.</P><PRECLASS="programlisting">% dutree% dutree /usr% dutree -a % dutree -a /bin</PRE><PCLASS="para">The <CODECLASS="literal">%Dirsize</CODE> hash maintains the mapping of names to sizes. For example, <CODECLASS="literal">$Dirsize{&quot;pcb&quot;}</CODE> contains 2412 in this sample run. We'll use that hash both for output and for sorting each directory's subdirectories by size.</P><PCLASS="para"><CODECLASS="literal">%Kids</CODE> is more interesting. For any given path <CODECLASS="literal">$path</CODE>, <CODECLASS="literal">$Kids{$path}</CODE> contains a (reference to an) array of names of subdirectories of this one. The <CODECLASS="literal">&quot;pcb&quot;</CODE> entry contains a reference to an anonymous array containing <CODECLASS="literal">&quot;fix&quot;</CODE>, <CODECLASS="literal">&quot;rev&quot;</CODE>, and <CODECLASS="literal">&quot;pending&quot;</CODE>. The <CODECLASS="literal">&quot;rev&quot;</CODE> entry contains <CODECLASS="literal">&quot;maybe&quot;</CODE> and <CODECLASS="literal">&quot;web&quot;</CODE>. The <CODECLASS="literal">&quot;maybe&quot;</CODE> entry contains <CODECLASS="literal">&quot;yes&quot;</CODE> and <CODECLASS="literal">&quot;not&quot;</CODE>, which do not have their own entries because they are end nodes in the tree.</P><PCLASS="para">The <CODECLASS="literal">output</CODE><ACLASS="indexterm"NAME="ch05-idx-1000006538-0"></A> function is passed the start of the tree &nbsp;-  the last line read in from the output of <EMCLASS="emphasis">du</EM>. First it prints that directory and its size. Then the function sorts the directory's children (if any) so that those with the most disk usage float to the top. Finally, <CODECLASS="literal">output</CODE> calls itself, recursing on each child in order. The extra arguments are used in formatting.</P><PCLASS="para">This program is inherently recursive because the filesystem is recursive. However, its data structure is not recursive; at least, not the way a circular linked list is. Each value is an array of further keys to process. The recursion resides in the processing, not in the storage.</P><DIVCLASS="example"><H4CLASS="example"><ACLASS="title"NAME="ch05-24951">Example 5.3: dutree</A></H4><PRECLASS="programlisting">#!/usr/bin/perl -w# dutree - print sorted indented rendition of du outputuse strict;my %Dirsize;my %Kids;getdots(my $topdir = input());output($topdir);# run du, read in input, save sizes and kids# return last directory (file?) readsub input {     my($size, $name, $parent);    @ARGV = (&quot;du @ARGV |&quot;);         # prep the arguments    while (&lt;&gt;) {                    # magic open is our friend<ACLASS="indexterm"NAME="ch05-idx-1000006647-0"></A>        ($size, $name) = split;        $Dirsize{$name} = $size;        ($parent = $name) =~ s#/[^/]+$##;   # dirname        push @{ $Kids{$parent} }, $name unless eof;    }     return $name;}# figure out how much is taken up in each directory# that isn't stored in subdirectories.  add a new# fake kid called &quot;.&quot; containing that much.sub getdots {    my $root = $_[0];    my($size, $cursize);    $size = $cursize = $Dirsize{$root};    if ($Kids{$root}) {        for my $kid (@{ $Kids{$root} }) {             $cursize -= $Dirsize{$kid};            getdots($kid);        }    }     if ($size != $cursize) {        my $dot = &quot;$root/.&quot;;        $Dirsize{$dot} = $cursize;        push @{ $Kids{$root} }, $dot;    } } # recursively output everything,# passing padding and number width in as well# on recursive callssub output {    my($root, $prefix, $width) = (shift, shift || '', shift || 0);    my $path;    ($path = $root) =~ s#.*/##;     # basename    my $size = $Dirsize{$root};    my $line = sprintf(&quot;%${width}d %s&quot;, $size, $path);    print $prefix, $line, &quot;\n&quot;;    for ($prefix .= $line) {        # build up more output        s/\d /| /;        s/[^|]/ /g;    }    if ($Kids{$root}) {             # not a bachelor node        my @Kids = @{ $Kids{$root} };        @Kids = sort { $Dirsize{$b} &lt;=&gt; $Dirsize{$a} } @Kids;        $Dirsize{$Kids[0]} =~ /(\d+)/;        my $width = length $1;        for my $kid (@Kids) { output($kid, $prefix, $width) }    }} </PRE></DIV><PCLASS="para">Before Perl supported hashes of arrays directly, Herculean efforts were required to emulate these higher order constructs. Some folks used repeated calls to <CODECLASS="literal">split</CODE> and <CODECLASS="literal">join</CODE>, but these were exceedingly slow.</P><PCLASS="para"><ACLASS="xref"HREF="ch05_17.htm#ch05-29853"TITLE="dutree-orig">Example 5.4</A> is a version of <EMCLASS="emphasis">dutree</EM> from those days of Perl arcana. Because we didn't have proper array references, we had to usurp the Perl symbol table itself. This program created variables on the fly with bizarre names. Can you find which hash this program is using?</P><PCLASS="para">The <CODECLASS="literal">@{&quot;pcb&quot;}</CODE> array contains <CODECLASS="literal">&quot;pcb/fix&quot;</CODE>, <CODECLASS="literal">&quot;pcb/rev&quot;</CODE>, and <CODECLASS="literal">&quot;pcb/pending&quot;</CODE>. The <CODECLASS="literal">@{&quot;pcb/rev&quot;}</CODE> array contains <CODECLASS="literal">&quot;pcb/rev/maybe&quot;</CODE> and <CODECLASS="literal">&quot;pcb/rev/web&quot;</CODE>. The <CODECLASS="literal">@{&quot;pcb/rev/maybe&quot;}</CODE> array contains <CODECLASS="literal">&quot;pcb/rev/yes&quot;</CODE> and <CODECLASS="literal">&quot;pcb/rev/not&quot;</CODE>.</P><PCLASS="para">When you assign something like <CODECLASS="literal">&quot;pcb/fix&quot;</CODE> to <CODECLASS="literal">*kid</CODE>, it promotes the string on the right-hand side to a typeglob. This makes <CODECLASS="literal">@kid</CODE> an alias for <CODECLASS="literal">@{&quot;pcb/fix&quot;}</CODE>&nbsp;- among other things. It would also alias <CODECLASS="literal">&amp;kid</CODE> to <CODECLASS="literal">&amp;{&quot;pcb/fix&quot;}</CODE>, and so on.</P><PCLASS="para">If that isn't interesting enough, consider how the <CODECLASS="literal">local</CODE> is using dynamic scoping of global variables to avoid passing in extra arguments. Check out what's happening with the <CODECLASS="literal">$width</CODE> variable in the <CODECLASS="literal">output</CODE> routine.</P><DIVCLASS="example"><H4CLASS="example"><ACLASS="title"NAME="ch05-29853">Example 5.4: dutree-orig</A></H4><PRECLASS="programlisting">#!/usr/bin/perl# <ACLASS="indexterm"NAME="ch05-idx-1000006913-0"></A>dutree_orig: the old version pre-perl5 (early 90s)@lines = `du @ARGV`;chop(@lines);&amp;input($top = pop @lines);&amp;output($top);exit;sub input {    local($root, *kid, $him) = @_[0,0];    while (@lines &amp;&amp; &amp;childof($root, $lines[$#lines])) {        &amp;input($him = pop(@lines));        push(@kid, $him);    }     if (@kid) {        local($mysize) = ($root =~ /^(\d+)/);        for (@kid) { $mysize -= (/^(\d+)/)[0]; }         push(@kid, &quot;$mysize .&quot;) if $size != $mysize;    }     @kid = &amp;sizesort(*kid);} sub output {    local($root, *kid, $prefix) = @_[0,0,1];    local($size, $path) = split(' ', $root);    $path =~ s!.*/!!;    $line = sprintf(&quot;%${width}d %s&quot;, $size, $path);    print $prefix, $line, &quot;\n&quot;;    $prefix .= $line;    $prefix =~ s/\d /| /;    $prefix =~ s/[^|]/ /g;    local($width) = $kid[0] =~ /(\d+)/ &amp;&amp; length(&quot;$1&quot;);    for (@kid) { &amp;output($_, $prefix); };} sub sizesort {    local(*list, @index) = shift;    sub bynum { $index[$b] &lt;=&gt; $index[$a]; }    for (@list) { push(@index, /(\d+)/); }     @list[sort bynum 0..$#list];} sub childof {    local(@pair) = @_;    for (@pair) { s/^\d+\s+//g; s/$/\//; }              index($pair[1], $pair[0]) &gt;= 0;}</PRE></DIV><PCLASS="para">The answer to the question posed above, "Which hash is the old <EMCLASS="emphasis">dutree</EM> using?" is <CODECLASS="literal">%main::</CODE>, that is, the Perl symbol table itself. Needless to say, this program will never run under <CODECLASS="literal">use</CODE> <CODECLASS="literal">strict</CODE>. We're happy to report that the updated version runs three times as fast as the old one. That's because the old one keeps looking up variables in the symbol table, and the new one doesn't have to. It's also because we avoid all that slow splitting of the space used and the directory name. But we thought we'd show you the old version because it is instructive<EMCLASS="emphasis"></EM><ACLASS="indexterm"NAME="ch05-idx-1000007008-0"></A><ACLASS="indexterm"NAME="ch05-idx-1000007008-1"></A><ACLASS="indexterm"NAME="ch05-idx-1000007008-2"></A><ACLASS="indexterm"NAME="ch05-idx-1000007008-3"></A><ACLASS="indexterm"NAME="ch05-idx-1000007008-4"></A><ACLASS="indexterm"NAME="ch05-idx-1000007008-5"></A> too. <ACLASS="indexterm"NAME="ch05-idx-1000007009-0"></A></P></DIV><DIVCLASS="htmlnav"><P></P><HRALIGN="LEFT"WIDTH="684"TITLE="footer"><TABLEWIDTH="684"BORDER="0"CELLSPACING="0"CELLPADDING="0"><TR><TDALIGN="LEFT"VALIGN="TOP"WIDTH="228"><ACLASS="sect1"HREF="ch05_16.htm"TITLE="5.15. Representing Relationships Between Data"><IMGSRC="../gifs/txtpreva.gif"ALT="Previous: 5.15. Representing Relationships Between Data"BORDER="0"></A></TD><TDALIGN="CENTER"VALIGN="TOP"WIDTH="228"><ACLASS="book"HREF="index.htm"TITLE="Perl Cookbook"><IMGSRC="../gifs/txthome.gif"ALT="Perl Cookbook"BORDER="0"></A></TD><TDALIGN="RIGHT"VALIGN="TOP"WIDTH="228"><ACLASS="chapter"HREF="ch06_01.htm"TITLE="6. Pattern Matching"><IMGSRC="../gifs/txtnexta.gif"ALT="Next: 6. Pattern Matching"BORDER="0"></A></TD></TR><TR><TDALIGN="LEFT"VALIGN="TOP"WIDTH="228">5.15. Representing Relationships Between Data</TD><TDALIGN="CENTER"VALIGN="TOP"WIDTH="228"><ACLASS="index"HREF="index/index.htm"TITLE="Book Index"><IMGSRC="../gifs/index.gif"ALT="Book Index"BORDER="0"></A></TD><TDALIGN="RIGHT"VALIGN="TOP"WIDTH="228">6. Pattern Matching</TD></TR></TABLE><HRALIGN="LEFT"WIDTH="684"TITLE="footer"><FONTSIZE="-1"></DIV<!-- LIBRARY NAV BAR --> <img src="../gifs/smnavbar.gif" usemap="#library-map" border="0" alt="Library Navigation Links"><p> <a href="copyrght.htm">Copyright &copy; 2002</a> O'Reilly &amp; Associates. All rights reserved.</font> </p> <map name="library-map"> <area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map> </BODY></HTML>
ch05_17.htm - 源码说明

本页面展示了「By Tom Christiansen and Nathan Torkington ISBN 1-56592-243-3 First Edition, published August 1998」中的 ch05_17.htm 源码文件，采用 HTM 编程语言编写，共 1,002 行代码。您可以在线阅读完整代码内容，也可以返回资源详情页下载完整源码包进行本地学习和开发。
虫虫下载站收录了大量与Christiansen相关的技术资源，包括源代码、技术文档、电路图等，是电子工程师和嵌入式开发者的专业学习平台。
⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?