📄 ch08_07.htm
字号:
<HTML><HEAD><TITLE>Recipe 8.6. Picking a Random Line from a File (Perl Cookbook)</TITLE><METANAME="DC.title"CONTENT="Perl Cookbook"><METANAME="DC.creator"CONTENT="Tom Christiansen & Nathan Torkington"><METANAME="DC.publisher"CONTENT="O'Reilly & Associates, Inc."><METANAME="DC.date"CONTENT="1999-07-02T01:38:42Z"><METANAME="DC.type"CONTENT="Text.Monograph"><METANAME="DC.format"CONTENT="text/html"SCHEME="MIME"><METANAME="DC.source"CONTENT="1-56592-243-3"SCHEME="ISBN"><METANAME="DC.language"CONTENT="en-US"><METANAME="generator"CONTENT="Jade 1.1/O'Reilly DocBook 3.0 to HTML 4.0"><LINKREV="made"HREF="mailto:online-books@oreilly.com"TITLE="Online Books Comments"><LINKREL="up"HREF="ch08_01.htm"TITLE="8. File Contents"><LINKREL="prev"HREF="ch08_06.htm"TITLE="8.5. Trailing a Growing File"><LINKREL="next"HREF="ch08_08.htm"TITLE="8.7. Randomizing All Lines"></HEAD><BODYBGCOLOR="#FFFFFF"><img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Perl Cookbook"><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map><div class="navbar"><p><TABLEWIDTH="684"BORDER="0"CELLSPACING="0"CELLPADDING="0"><TR><TDALIGN="LEFT"VALIGN="TOP"WIDTH="228"><ACLASS="sect1"HREF="ch08_06.htm"TITLE="8.5. Trailing a Growing File"><IMGSRC="../gifs/txtpreva.gif"ALT="Previous: 8.5. Trailing a Growing File"BORDER="0"></A></TD><TDALIGN="CENTER"VALIGN="TOP"WIDTH="228"><B><FONTFACE="ARIEL,HELVETICA,HELV,SANSERIF"SIZE="-1"><ACLASS="chapter"REL="up"HREF="ch08_01.htm"TITLE="8. File Contents"></A></FONT></B></TD><TDALIGN="RIGHT"VALIGN="TOP"WIDTH="228"><ACLASS="sect1"HREF="ch08_08.htm"TITLE="8.7. Randomizing All Lines"><IMGSRC="../gifs/txtnexta.gif"ALT="Next: 8.7. Randomizing All Lines"BORDER="0"></A></TD></TR></TABLE></DIV><DIVCLASS="sect1"><H2CLASS="sect1"><ACLASS="title"NAME="ch08-chap08_picking_0">8.6. Picking a Random Line from a File</A></H2><DIVCLASS="sect2"><H3CLASS="sect2"><ACLASS="title"NAME="ch08-pgfId-600">Problem<ACLASS="indexterm"NAME="ch08-idx-1000004656-0"></A><ACLASS="indexterm"NAME="ch08-idx-1000004656-1"></A><ACLASS="indexterm"NAME="ch08-idx-1000004656-2"></A><ACLASS="indexterm"NAME="ch08-idx-1000004656-3"></A></A></H3><PCLASS="para">You want to return a random line from a file.</P></DIV><DIVCLASS="sect2"><H3CLASS="sect2"><ACLASS="title"NAME="ch08-pgfId-606">Solution</A></H3><PCLASS="para">Use <CODECLASS="literal">rand</CODE> and <CODECLASS="literal">$.</CODE> (the current line number) to decide which line to print:</P><PRECLASS="programlisting">srand;rand($.) < 1 && ($line = $_) while <>;# $line is the random line</PRE></DIV><DIVCLASS="sect2"><H3CLASS="sect2"><ACLASS="title"NAME="ch08-pgfId-618">Discussion</A></H3><PCLASS="para">This is a beautiful example of a solution that may not be obvious. We read every line in the file but don't have to store them all in memory. This is great for large files. Each line has a 1 in N (where N is the number of lines read so far) chance of being selected.</P><PCLASS="para">Here's a replacement for <EMCLASS="emphasis">fortune</EM> using this algorithm:</P><PRECLASS="programlisting">$/ = "%%\n";$data = '/usr/share/games/fortunes';srand;rand($.) < 1 && ($adage = $_) while <>;print $adage;</PRE><PCLASS="para">If you know line offsets (for instance, you've created an index) and the number of lines, you can randomly select a line and jump to its offset in the file, but you usually don't have such an index.</P><PCLASS="para">Here's a more rigorous explanation of how the algorithm works. The function call <CODECLASS="literal">rand</CODE> <CODECLASS="literal">($.)</CODE> picks a random number between 0 and the current line number. Therefore, you have a one in N chance, that is, 1/N, of keeping the Nth line. Therefore you've a 100% chance of keeping the first line, a 50% chance of keeping the second, a 33% chance of keeping the third, and so on. The question is whether this is fair for all N, where N is any positive integer.</P><PCLASS="para">First, some concrete examples, then abstract ones.</P><PCLASS="para">Obviously, a file with one line (N=1) is fair: you always keep the first line because 1/1 = 100%, making it fair for files of 1 line. For a file with two lines, N=2. You always keep the first line; then when reaching the second line, you have a 50% chance of keeping it. Thus, both lines have an equal chance of being selected, which shows that N=2 is fair. For a file with three lines, N=3. You have a one-third chance, 33%, of keeping that third line. That leaves a two-thirds chance of retaining one of the first two out of the three lines. But we've already shown that for those first two lines there's a 50-50 chance of selecting either one. 50 percent of two-thirds is one-third. Thus, you have a one-third chance of selecting each of the three lines of the file.</P><PCLASS="para">In the general case, a file of N+1 lines will choose the last line 1/(N+1) times and one of the previous N lines N/(N+1) times. Dividing N/(N+1) by N leaves us with 1/(N+1) for each the first N lines in our N+1 line file, and also 1/(N+1) for line number N+1. The algorithm is therefore fair for all N, where N is a positive integer.</P><PCLASS="para">We've managed to choose fairly a random line from a file with speed directly proportional to the size of the file, but using no more memory than it takes to hold the longest line, even in the worst case.<ACLASS="indexterm"NAME="ch08-idx-1000004658-0"></A><ACLASS="indexterm"NAME="ch08-idx-1000004658-1"></A><ACLASS="indexterm"NAME="ch08-idx-1000004658-2"></A><ACLASS="indexterm"NAME="ch08-idx-1000004658-3"></A></P></DIV><DIVCLASS="sect2"><H3CLASS="sect2"><ACLASS="title"NAME="ch08-pgfId-648">See Also</A></H3><PCLASS="para">The <CODECLASS="literal">$.</CODE> entry in <ICLASS="filename">perlvar </I>(1) and in the "Special Variables" section of <ACLASS="olink"HREF="../prog/ch02_01.htm">Chapter 2</A> of <ACLASS="citetitle"HREF="../prog/index.htm"TITLE="Programming Perl"><CITECLASS="citetitle">Programming Perl</CITE></A>; <ACLASS="xref"HREF="ch02_08.htm"TITLE="Generating Random Numbers">Recipe 2.7</A>; <ACLASS="xref"HREF="ch02_09.htm"TITLE="Generating Different Random Numbers">Recipe 2.8</A></P></DIV></DIV><DIVCLASS="htmlnav"><P></P><HRALIGN="LEFT"WIDTH="684"TITLE="footer"><TABLEWIDTH="684"BORDER="0"CELLSPACING="0"CELLPADDING="0"><TR><TDALIGN="LEFT"VALIGN="TOP"WIDTH="228"><ACLASS="sect1"HREF="ch08_06.htm"TITLE="8.5. Trailing a Growing File"><IMGSRC="../gifs/txtpreva.gif"ALT="Previous: 8.5. Trailing a Growing File"BORDER="0"></A></TD><TDALIGN="CENTER"VALIGN="TOP"WIDTH="228"><ACLASS="book"HREF="index.htm"TITLE="Perl Cookbook"><IMGSRC="../gifs/txthome.gif"ALT="Perl Cookbook"BORDER="0"></A></TD><TDALIGN="RIGHT"VALIGN="TOP"WIDTH="228"><ACLASS="sect1"HREF="ch08_08.htm"TITLE="8.7. Randomizing All Lines"><IMGSRC="../gifs/txtnexta.gif"ALT="Next: 8.7. Randomizing All Lines"BORDER="0"></A></TD></TR><TR><TDALIGN="LEFT"VALIGN="TOP"WIDTH="228">8.5. Trailing a Growing File</TD><TDALIGN="CENTER"VALIGN="TOP"WIDTH="228"><ACLASS="index"HREF="index/index.htm"TITLE="Book Index"><IMGSRC="../gifs/index.gif"ALT="Book Index"BORDER="0"></A></TD><TDALIGN="RIGHT"VALIGN="TOP"WIDTH="228">8.7. Randomizing All Lines</TD></TR></TABLE><HRALIGN="LEFT"WIDTH="684"TITLE="footer"><FONTSIZE="-1"></DIV<!-- LIBRARY NAV BAR --> <img src="../gifs/smnavbar.gif" usemap="#library-map" border="0" alt="Library Navigation Links"><p> <a href="copyrght.htm">Copyright © 2002</a> O'Reilly & Associates. All rights reserved.</font> </p> <map name="library-map"> <area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map> </BODY></HTML>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -