
<!-- Learning Perl, by Randal L. Schwartz and Tom Phoenix. ISBN 0-596-00132-0. Third Edition, published July 2001. -->
<html><head><title>Hashes (Learning Perl, 3rd Edition)</title><link rel="stylesheet" type="text/css" href="../style/style1.css" />

<meta name="DC.Creator" content="Randal L. Schwartz and Tom Phoenix" /><meta name="DC.Format" content="text/xml" scheme="MIME" /><meta name="DC.Language" content="en-US" /><meta name="DC.Publisher" content="O'Reilly &amp; Associates, Inc." /><meta name="DC.Source" scheme="ISBN" content="0596001320L" /><meta name="DC.Subject.Keyword" content="stuff" /><meta name="DC.Title" content="Learning Perl, 3rd Edition" /><meta name="DC.Type" content="Text.Monograph" />

</head><body bgcolor="#ffffff">

<img alt="Book Home" border="0" src="gifs/smbanner.gif" usemap="#banner-map" /><map name="banner-map"><area shape="rect" coords="1,-2,616,66" href="index.htm" alt="Learning Perl, 3rd Edition" /><area shape="rect" coords="629,-11,726,25" href="jobjects/fsearch.htm" alt="Search this book" /></map>

<div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch04_12.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"></a></td><td align="right" valign="top" width="228"><a href="ch05_02.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr></table></div>




<h1 class="chapter">Chapter 5. Hashes</h1>
<div class="htmltoc"><h4 class="tochead">Contents:</h4>
  <p> <a href="#lperl3-CHP-5-SECT-1">What Is a Hash?</a><br />
<a href="ch05_02.htm">Hash Element Access</a><br />
<a href="ch05_03.htm">Hash Functions</a><br />
<a href="ch05_04.htm">Typical Use of a Hash</a><br />
<a href="ch05_05.htm">Exercises</a><br /></p></div>

<p><a name="INDEX-371" /></a>In this
chapter, we will see one of Perl's features that makes Perl one
of the world's truly great programming
languages -- <em class="emphasis">hashes.</em><a href="#FOOTNOTE-121">[121]</a> Although hashes
are a powerful and useful feature, you may have used other powerful
languages for years without ever hearing of hashes. But you'll
use hashes in nearly every Perl program you'll write from now
on; they're that important.
</p><blockquote class="footnote"> <a name="FOOTNOTE-121" /></a><p>[121]In the
olden days, we called these "<a name="INDEX-372" /></a>associative
arrays." But around 1995 the Perl community decided that this was
too many letters to type and too many syllables to say, so we changed
the name to "hashes." </p> </blockquote>

<div class="sect1"><a name="lperl3-CHP-5-SECT-1" /></a>
<h2 class="sect1">5.1. What Is a Hash?</h2>

<p>A hash is a data structure, not unlike an array in that it can hold
any number of values, which you can retrieve at will. But instead of
indexing the values by <em class="emphasis">number</em>, as we did with
arrays, we'll look up the values by <em class="emphasis">name</em>.
That is, the <em class="firstterm">indices</em> (here, we'll call
them <em class="firstterm">keys</em><a name="INDEX-373" /></a>
<a name="INDEX-374" /></a>)
aren't numbers, but instead they are arbitrary unique strings
(see <a href="ch05_01.htm#lperl3-CHP-5-FIG-1">Figure 5-1</a>).
</p>

<a name="lperl3-CHP-5-FIG-1" /></a><div class="figure"><img height="220" alt="Figure 5-1" src="figs/lrnp_0501.gif" width="238" /></div><h4 class="objtitle">Figure 5-1. Hash keys and values</h4>

<p>The keys are <em class="emphasis">strings</em>, first of all, so instead
of getting element number <tt class="literal">3</tt> from an array,
we'll be accessing the hash element named
<tt class="literal">wilma</tt>.
</p>

<p>These keys are arbitrary strings -- you can use any string
expression for a hash key. And they are unique strings -- just as
there's only one array element numbered <tt class="literal">3</tt>,
there's only one hash element named <tt class="literal">wilma</tt>.
</p>
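<p>To make the uniqueness concrete, here is a minimal sketch. The hash name <tt class="literal">%family_name</tt> and its data are our own illustration, and the element syntax it uses is covered properly in the next section. Storing under a key that already exists replaces the old value rather than adding a second element.</p>

```perl
use strict;
use warnings;

# Hypothetical data: family names filed under given names.
my %family_name;
$family_name{"wilma"} = "flintstone";
$family_name{"wilma"} = "slate";   # same key again: replaces the old value

# There is still exactly one element named "wilma".
my $count = keys %family_name;     # keys in scalar context gives the number of keys
print "$count key, value is $family_name{'wilma'}\n";   # 1 key, value is slate
```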

<p>Another way to think of a hash is that it's like a barrel of
data, where each piece of data has a tag attached. You can reach into
the barrel and pull out any tag and see what piece of data is
attached. But there's no "first" item in the
barrel; it's just a jumble. In an array, we'd start with
element <tt class="literal">0</tt>, then element <tt class="literal">1</tt>, then
element <tt class="literal">2</tt>, and so on. But in a hash, there's
no fixed order, no first element. It's just a collection of
<a name="INDEX-375" /></a>key-value
pairs.
</p>
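<p>Because the barrel is a jumble, a program that needs a predictable order has to impose one. The following sketch, with made-up data, assumes that alphabetical order is what we want and sorts the keys before using them.</p>

```perl
use strict;
use warnings;

# Hypothetical data for illustration.
my %family_name = (
    fred   => "flintstone",
    barney => "rubble",
    wilma  => "flintstone",
);

# keys returns the keys in no particular order, so sort when order matters.
my @names = sort keys %family_name;
print "@names\n";   # barney fred wilma
```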

<p>The keys and values are both arbitrary scalars, but the keys are
always converted to strings. So, if you used the numeric expression
<tt class="literal">50/20</tt> as the key,<a href="#FOOTNOTE-122">[122]</a> it would be turned into the
three-character string <tt class="literal">"2.5"</tt>, which is one of the
keys shown in the diagram above.
</p><blockquote class="footnote"> <a name="FOOTNOTE-122" /></a><p>[122]That's a
numeric expression, not the five-character string
<tt class="literal">"50/20"</tt>. If we used that five-character string as
a hash key, it would stay the same five-character string, of
course.</p> </blockquote>
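<p>The conversion can be sketched as follows. The hash name <tt class="literal">%hash</tt> and the stored values are placeholders of our own. The numeric expression and the quoted string end up as two different keys.</p>

```perl
use strict;
use warnings;

my %hash;
$hash{50/20}   = "from a numeric expression";   # key becomes the string "2.5"
$hash{"50/20"} = "from a string";               # a different, five-character key

my @keys = sort keys %hash;
print "@keys\n";                  # 2.5 50/20
print length($keys[0]), "\n";     # 3
```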

<p>As usual, Perl's no-unnecessary-limits philosophy applies: a
hash may be of any size, from an empty hash with zero key-value
pairs, up to whatever fills up your memory.
</p>

<p>Some implementations of hashes (such as in the original
<i class="command">awk</i> language, from which Larry borrowed the idea)
slow down as the hashes get larger and larger. This is not the case
in Perl -- it has a good, efficient, scalable algorithm.<a href="#FOOTNOTE-123">[123]</a>
So, if a hash has only three key-value pairs, it's very quick
to "reach into the barrel" and pull out any one of those.
If the hash has three <em class="emphasis">million</em> key-value pairs,
it should be just about as quick to pull out any one of those. A big
hash is nothing to fear.
</p><blockquote class="footnote">
<a name="FOOTNOTE-123" /></a><p>[123]Technically, Perl rebuilds the hash table as needed for larger
hashes. In fact, the term "hashes" comes from the fact
that a hash table is used for implementing them.</p> </blockquote>

<p>It's worth mentioning again that the keys are always unique,
although the values may be duplicated. The values of a hash may be
all numbers, all strings, <tt class="literal">undef</tt> values, or a
mixture.<a href="#FOOTNOTE-124">[124]</a> But the keys are all arbitrary, unique strings.
</p><blockquote class="footnote"> <a name="FOOTNOTE-124" /></a><p>[124]Or, in fact, any scalar values, including
scalar types other than the ones we'll see in this book.</p>
</blockquote>

<a name="lperl3-CHP-5-SECT-1.1" /></a><div class="sect2">
<h3 class="sect2">5.1.1. Why Use a Hash?</h3>

<p>When you first hear about
<a name="INDEX-376" /></a>hashes, especially if you've lived
a long and productive life as a programmer using other languages that
don't have hashes, you may wonder why anyone would want one of
these strange beasts. Well, the general idea is that you'll
have one set of data "related to" another set of data.
For example, here are some hashes you might find in typical
applications of Perl:
</p>

<dl>
<dt><i>Given name, family name</i></dt>
<dd>
<a name="INDEX-377" /></a>The given name (first name) is the key,
and the family name is the value. This requires unique given names,
of course; if there were two people named <tt class="literal">randal</tt>,
this wouldn't work. With this hash, you can look up
anyone's given name, and find the corresponding family name. If
you use the key <tt class="literal">tom</tt>, you get the value
<tt class="literal">phoenix</tt>.
<p></p>
</dd>

</dl>

<dl>
<dt><i>Host name, IP address</i></dt>
<dd>
You may know that each computer on the Internet has both a host name
(like www.stonehenge.com) and an
<a name="INDEX-378" /></a>IP
address number (like 123.45.67.89). That's because machines
like working with the numbers, but we humans have an easier time
remembering the names. The host names are unique strings, so they can
be used to make this hash. With this hash, you could look up a host
name and find the corresponding IP address number.
<p></p>
</dd>

</dl>

<dl>
<dt><i>IP address, host name</i></dt>
<dd>
Or you could go in the opposite direction. We generally think of the
IP address as a number, but it can also be a unique string, so
it's suitable for use as a hash key. In this hash, we can look
up the IP address number to determine the corresponding host name.
Note that this is <em class="emphasis">not</em> the same hash as the
previous example: hashes are a one-way street, running from key to
value; there's no direct way to look up a value in a hash and find the
corresponding key! So these two are a <em class="emphasis">pair</em> of
hashes, one for storing IP addresses, one for host names. It's
easy enough to create one of these given the other, though, as
we'll see below.
<p></p>
</dd>

</dl>
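<p>As a rough sketch of building one of these hashes from the other, we can unwind the hash into a list and flip it with <tt class="literal">reverse</tt>. The names <tt class="literal">%ip_address</tt> and <tt class="literal">%host_name</tt> are our own, and the approach assumes that every IP address appears only once among the values. If two hosts shared an address, only one of them would survive the inversion.</p>

```perl
use strict;
use warnings;

# Hypothetical data, using the sample host and address from the text.
my %ip_address = (
    "www.stonehenge.com" => "123.45.67.89",
);

# reverse flips the key-value list, so each value becomes a key.
my %host_name = reverse %ip_address;

print $host_name{"123.45.67.89"}, "\n";   # www.stonehenge.com
```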

<dl>
<dt><i>Word, count of number of times that word appears<a href="#FOOTNOTE-125">[125]</a></i></dt>
<dd>
<a name="INDEX-379" /></a>The
idea here is that you want to know how often each word appears in a
given document. Perhaps you're building an index to a number of
documents, so that when a user searches for <tt class="literal">fred</tt>,
you'll know that a certain document mentions
<tt class="literal">fred</tt> five times, another mentions
<tt class="literal">fred</tt> seven times, and yet another doesn't
mention <tt class="literal">fred</tt> at all -- so you'll know
which documents the user is likely to want. As the index-making
program reads through a given document, each time it sees a mention
of <tt class="literal">fred</tt>, it adds one to the value filed under the
key of <tt class="literal">fred</tt>. That is, if we had seen
<tt class="literal">fred</tt> twice already in this document, the value
would be <tt class="literal">2</tt>, but now we'll increment it to
<tt class="literal">3</tt>. If we had never seen <tt class="literal">fred</tt>
before, we'd change the value from <tt class="literal">undef</tt>
(the implicit, default value) to <tt class="literal">1</tt>.
<p></p>
</dd>

</dl>
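<p>The counting idea can be sketched in a few lines. The word list here is invented, and a real indexing program would read its words from a document instead. Note that the first time a word is seen, its value starts out as <tt class="literal">undef</tt>, which counts as zero when we add one to it.</p>

```perl
use strict;
use warnings;

# Hypothetical input: the words of one small document.
my @words = qw(fred barney fred wilma fred);

my %count;
foreach my $word (@words) {
    $count{$word} += 1;   # a missing entry starts as undef, treated as 0
}

foreach my $word (sort keys %count) {
    print "$word: $count{$word}\n";
}
# barney: 1
# fred: 3
# wilma: 1
```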

<dl>
<dt><i>Username, number of disk blocks they are using [wasting] </i></dt>
<dd>
System administrators like this one: the usernames on a given system
are all unique strings, so they can be used as keys in a hash to look
up information about that user.
<p></p>
</dd>

</dl>

<dl>
<dt><i>Driver's license number, name</i></dt>
<dd>
There may be many, many people named John Smith, but we hope that
each one has a different driver's license number. That number
makes for a unique key, and the person's name is the value.
<p></p>
</dd>

</dl>

<p>So, yet another way to think of a hash is as a
<em class="emphasis">very</em> simple
<a name="INDEX-380" /></a>database, in which just one piece of data
may be filed under each key. In fact, if your task description
includes phrases like "finding duplicates,"
"unique," "cross-reference," or "lookup
table," it's likely that a hash will be useful in the
implementation.
</p>
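<p>For instance, a "finding duplicates" task might be sketched like this. The list and the names <tt class="literal">%seen</tt>, <tt class="literal">@unique</tt>, and <tt class="literal">@duplicates</tt> are our own choices for illustration. The hash simply records which items have already turned up.</p>

```perl
use strict;
use warnings;

# Hypothetical input list with some repeats.
my @names = qw(fred barney fred wilma barney);

my %seen;         # key: a name we have already met
my @unique;       # each name once, in order of first appearance
my @duplicates;   # every repeated occurrence
foreach my $name (@names) {
    if ($seen{$name}) {
        push @duplicates, $name;
    } else {
        push @unique, $name;
    }
    $seen{$name} = 1;
}

print "unique: @unique\n";           # unique: fred barney wilma
print "duplicates: @duplicates\n";   # duplicates: fred barney
```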

</div>
</div>










<hr width="684" align="left" />
<div class="navbar"><table width="684" border="0"><tr><td align="left" valign="top" width="228"><a href="ch04_12.htm"><img alt="Previous" border="0" src="../gifs/txtpreva.gif" /></a></td><td align="center" valign="top" width="228"><a href="index.htm"><img alt="Home" border="0" src="../gifs/txthome.gif" /></a></td><td align="right" valign="top" width="228"><a href="ch05_02.htm"><img alt="Next" border="0" src="../gifs/txtnexta.gif" /></a></td></tr><tr><td align="left" valign="top" width="228">4.12. Exercises</td><td align="center" valign="top" width="228"><a href="index/index.htm"><img alt="Book Index" border="0" src="../gifs/index.gif" /></a></td><td align="right" valign="top" width="228">5.2. Hash Element Access</td></tr></table></div>
<hr width="684" align="left" />

<img alt="Library Navigation Links" border="0" src="../gifs/navbar.gif" usemap="#library-map" />
<p><font size="-1"><a href="copyrght.htm">Copyright &copy; 2002</a> O'Reilly &amp; Associates. All rights reserved.</font></p>

<map name="library-map"><area shape="rect" coords="1,0,85,94" href="../index.htm"><area shape="rect" coords="86,1,178,103" href="../lwp/index.htm"><area shape="rect" coords="180,0,265,103" href="../lperl/index.htm"><area shape="rect" coords="267,0,353,105" href="../perlnut/index.htm"><area shape="rect" coords="354,1,446,115" href="../prog/index.htm"><area shape="rect" coords="448,0,526,132" href="../tk/index.htm"><area shape="rect" coords="528,1,615,119" href="../cookbook/index.htm"><area shape="rect" coords="617,0,690,135" href="../pxml/index.htm"></map>

</body></html>