ent.html

来自「这是一个经典的伪随机数生成程序,特别方便和实用」· HTML 代码 · 共 339 行
HTML
339 行
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"><html version="-//W3C//DTD HTML 3.2 Final//EN"><head><title>Pseudorandom Number Sequence Test Program</title><meta name="author" content="John Walker"><meta name="description" content="ENT: A Pseudorandom Number Sequence Test Program"><meta name="keywords" content="pseudorandom, random, test, entropy"></head><body bgcolor="#FFFFFF"><center><h1><img src="entitle.gif" width=276 height=113 alt="ENT"></h1><h2>A Pseudorandom Number Sequence Test Program</h2></center><hr><p>This page describes a program, <code>ent</code>, which applies varioustests to sequences of bytes stored in files and reports the results ofthose tests.  The program is useful for those evaluating pseudorandomnumber generators for encryption and statistical samplingapplications, compression algorithms, and other applications where theinformation density of a file is of interest.<p><h3>NAME</h3>     <b>ent</b> - pseudorandom number sequence test<h3>SYNOPSIS</h3>     <b>ent</b> [ <b>-bcftu</b> ] [ <i>infile</i> ]<h3>DESCRIPTION</h3><b>ent</b> performs a variety of tests on the stream ofbytes in <i>infile</i> (or standard input if no <i>infile</i>is specified) and produces output as follows on the standardoutput stream:<p><pre>    Entropy = 7.980627 bits per character.    Optimum compression would reduce the size    of this 51768 character file by 0 percent.     Chi square distribution for 51768 samples is 1542.26, and randomly    would exceed this value 0.01 percent of the times.      Arithmetic mean value of data bytes is 125.93 (127.5 = random).    Monte Carlo value for Pi is 3.169834647 (error 0.90 percent).    Serial correlation coefficient is 0.004249 (totally uncorrelated = 0.0).</pre><p>The values calculated are as follows:<dl><dt><b>Entropy</b><dd>The information density of the contents of the file,                expressed as a number of bits per character.  The                results above, which resulted from processing an image                file compressed with JPEG, indicate that the file is                extremely dense in information--essentially random.                Hence, compression of the file is unlikely to reduce                its size.  By contrast, the C source code of the                program has entropy of about 4.9 bits per character,                indicating that optimal compression of the file would                reduce its size by 38%.  [Hamming, pp.&nbsp;104-108]<p><dt><b>Chi-square Test</b><dd>The chi-square test is the most commonly used test for                the randomness of data, and is extremely sensitive to                errors in pseudorandom sequence generators.  The                chi-square distribution is calculated for the stream                of bytes in the file and expressed as an absolute                number and a percentage which indicates how frequently                a truly random sequence would exceed the value                calculated.  We interpret the percentage as the degree                to which the sequence tested is suspected of being                non-random.  If the percentage is greater than 99% or                less than 1%, the sequence is almost certainly not                random.  If the percentage is between 99% and 95% or                between 1% and 5%, the sequence is suspect.                Percentages between 90% and 95% and 5% and 10%                indicate the sequence is "almost suspect".  Note that                our JPEG file, while very dense in information, is far                from random as revealed by the chi-square test.<p>                Applying this test to the output of various                pseudorandom sequence generators is interesting.  The                low-order 8 bits returned by the standard Unix                <code>rand()</code> function, for example, yields:<p><blockquote>                  Chi square distribution for 500000 samples is 0.01,                  and randomly would exceed this value 99.99 percent                  of the times.</blockquote><p>                While an improved generator [Park &amp; Miller] reports:<p><blockquote>                  Chi square distribution for 500000 samples is                  212.53, and randomly would exceed this value 95.00                  percent of the times.</blockquote><p>                Thus, the standard Unix generator (or at least the                low-order bytes it returns) is unacceptably                non-random, while the improved generator is much                better but still sufficiently non-random to cause                concern for demanding applications.  Contrast both of                these software generators with the chi-square result                of a genuine random sequence created by <a href="/hotbits/">timing                radioactive decay events</a>.<p><blockquote>                  Chi square distribution for 32768 samples is 237.05,                  and randomly would exceed this value 75.00 percent                  of the times.</blockquote><p>                See [Knuth, pp.&nbsp;35-40] for more information on the                chi-square test.  An interactive                <a href="/rpkp/experiments/analysis/chiCalc.html">chi-square                calculator</a> is available at this site.<p><dt><b>Arithmetic Mean</b><dd>This is simply the result of summing the all the bytes                (bits if the <b>-b</b> option is specified) in the file                and dividing by the file length.  If the data are                close to random, this should be about 127.5 (0.5                for <b>-b</b> option output).  If the                mean departs from this value, the values are                consistently high or low.<p><dt><b>Monte Carlo Value for Pi</b><dd>Each successive sequence of six bytes is used as 24 bit X and Y                co-ordinates within a square.  If the distance of the                randomly-generated point is less than the radius of a                circle inscribed within the square, the six-byte                sequence is considered a "hit".  The percentage of                hits can be used to calculate the value of Pi.  For                very large streams (this approximation converges very                slowly), the value will approach the correct value of                Pi if the sequence is close to random.  A 32768 byte                file created by radioactive decay yielded:<p><blockquote>                  Monte Carlo value for Pi is 3.139648438 (error 0.06                  percent).</blockquote><p><dt><b>Serial Correlation Coefficient</b><dd>This quantity measures the extent to which each byte in the file                depends upon the previous byte.  For random sequences,                this value (which can be positive or negative) will,                of course, be close to zero.  A non-random byte stream                such as a C program will yield a serial correlation                coefficient on the order of 0.5.  Wildly predictable                data such as uncompressed bitmaps will exhibit serial                correlation coefficients approaching 1.  See [Knuth,                pp.&nbsp;64-65] for more details.</dl><p><h3>OPTIONS</h3><dl compact><dt><b>-b</b>  <dd>The input is treated as a stream of bits rather                than of 8-bit bytes.  Statistics reported reflect                the properties of the bitstream.<p><dt><b>-c</b>  <dd>Print a table of the number of occurrences of               each possible byte (or bit, if the <b>-b</b> option               is also specified) value, and the fraction of the               overall file made up by that value.  Printable               characters in the ISO 8859-1 Latin1 character set               are shown along with their decimal byte values.               In non-terse output mode, values with zero               occurrences are not printed.<p><dt><b>-f</b>  <dd>Fold upper case letters to lower case before               computing statistics.  Folding is done based on the               ISO 8859-1 Latin1 character set, with accented               letters correctly processed.<p><dt><b>-t</b>  <dd>Terse mode: output is written in Comma               Separated Value (CSV) format, suitable for               loading into a spreadsheet and easily read               by any programming language.  See               <a href="#Terse">Terse Mode Output Format</a>               below for additional details.<p><dt><b>-u</b>  <dd>Print how-to-call information.</dl><h3>FILES</h3>    If no <i>infile</i> is specified, <b>ent</b> obtains its input    from standard input.  Output is always written to standard    output.<h3><a name="Terse">TERSE MODE OUTPUT FORMAT</a></h3>Terse mode is selected by specifying the <b>-t</b> optionon the command line.  Terse mode output is written inComma Separated Value (CSV) format, which can be directlyloaded into most spreadsheet programs and is easily readby any programming language.  Each record in the CSVfile begins with a record type field, which identifiesthe content of the following fields.  If the <b>-c</b>option is not specified, the terse mode output willconsist of two records, as follows:<pre>0,File-bytes,Entropy,Chi-square,Mean,Monte-Carlo-Pi,Serial-Correlation1,<em>file_length</em>,<em>entropy</em>,<em>chi_square</em>,<em>mean</em>,<em>Pi_value</em>,<em>correlation</em></pre>where the italicised values in the type 1 record are thenumerical values for the quantities named in the type 0column title record.  If the <b>-b</b> option is specified, the secondfield of the type 0 record will be "<tt>File-bits</tt>", andthe <em>file_length</em> field in type 1 record will be givenin bits instead of bytes.  If the <b>-c</b> option is specified,additional records are appended to the terse mode output whichcontain the character counts:<pre>2,Value,Occurrences,Fraction3,<em>v</em>,<em>count</em>,<em>fraction</em>. . .</pre>If the <b>-b</b> option is specified, only two type 3 records willappear for the two bit values <em>v</em>=0 and <em>v</em>=1.Otherwise, 256 type 3 records are included, one for eachpossible byte value.  The second field of a type 3 recordindicates how many bytes (or bits) of value <em>v</em>appear in the input, and <em>fraction</em> gives the decimalfraction of the file which has value <em>v</em> (which isequal to the <em>count</em> value of this record divided bythe <em>file_length</em> field in the type 1 record).<h3>BUGS</h3>    Note that the "optimal compression" shown for the file is    computed from the byte- or bit-stream entropy and thus    reflects compressibility based on a reading frame of    the chosen width (8-bit bytes or individual bits if the    <b>-b</b> option is specified).  Algorithms which use    a larger reading frame, such as the Lempel-Ziv [Lempel&nbsp;&amp;&nbsp;Ziv]    algorithm, may achieve greater compression if the file    contains repeated sequences of multiple bytes.<h2><a href="random.zip"><img src="/images/icons/file.gif" alt="" align=middle width=40 height=40></a> <a href="random.zip">Download random.zip</a> (Zipped archive)</h2>    The program is provided as <a href="random.zip">random.zip</a>,    a <a href="http://www.pkware.com/">Zipped</a> archive    containing an ready-to-run MS-DOS executable    program, <code>ent.exe</code> (compiled using Microsoft Visual    C++ 1.52, creating    a 16-bit MS-DOS file which does not require Windows to    execute), and in source code form along with a    <code>Makefile</code> to build the program under Unix.<h3>SEE ALSO</h3><dl compact><dt>&nbsp;    <dd><cite><a href="/rpkp/experiments/statistics.html">Introduction              to Probability and Statistics</a></cite><p><dt>[Hamming] <dd>Hamming, Richard W.              <cite> Coding and Information Theory</cite>.              Englewood Cliffs NJ: Prentice-Hall, 1980.<p><dt>[Knuth]   <dd>Knuth, Donald E.              <cite>              <a href="http://www.amazon.com/exec/obidos/ASIN/0201896842/fourmilabwwwfour" target="Amazon_Fourmilab">              The Art of Computer Programming,              Volume 2 / Seminumerical Algorithms</a></cite>.              Reading MA: Addison-Wesley, 1969.              ISBN 0-201-89684-2.<p><dt>[Lempel&nbsp;&amp;&nbsp;Ziv]                <dd>Ziv J. and A. Lempel.                "A Universal Algorithm for Sequential Data Compression".                <cite>IEEE Transactions on Information Theory</cite>                <b>23</b>, 3, pp.&nbsp;337-343.<p><dt>[Park &amp; Miller] <dd>Park, Stephen K. and Keith W. Miller.                "Random Number Generators: Good Ones Are Hard to Find".                <cite>Communications of the ACM</cite>, October 1988, p.&nbsp;1192.</dl><h3>COPYING</h3><blockquote>     This software is in the public domain.  Permission to use, copy,     modify, and distribute this software and its documentation for     any purpose and without fee is hereby granted, without any     conditions or restrictions.  This software is provided "as is"     without express or implied warranty.</blockquote><hr><address><a href="/">by John Walker</a><br>October 20th, 1998</address></body></html>
ent.html - 源码说明

本页面展示了「这是一个经典的伪随机数生成程序,特别方便和实用」中的 ent.html 源码文件，采用 HTML 编程语言编写，共 339 行代码。您可以在线阅读完整代码内容，也可以返回资源详情页下载完整源码包进行本地学习和开发。
虫虫下载站收录了大量与伪随机相关的技术资源，包括源代码、技术文档、电路图等，是电子工程师和嵌入式开发者的专业学习平台。
⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?