📄 chap12.htm

📁 美国麻省理工的算法导论,英文版,看后程序员宏观的能力培养有好处
💻 HTM
📖 第 1 页 / 共 5 页
字号:
12 3 4 5 下一页
<HTML><HEAD>

<TITLE>Intro to Algorithms: CHAPTER 12: HASH TABLES</TITLE></HEAD><BODY BGCOLOR="#FFFFFF">


<a href="chap13.htm"><img align=right src="../../images/next.gif" alt="Next Chapter" border=0></A>
<a href="toc.htm"><img align=right src="../../images/toc.gif" alt="Return to Table of Contents" border=0></A>
<a href="chap11.htm"><img align=right src="../../images/prev.gif" alt="Previous Chapter" border=0></A>


<h1><a name="07c0_1428">CHAPTER 12: HASH TABLES<a name="07c0_1428"></h1><P>
<a name="07c0_1426"><a name="07c0_1427">Many applications require a dynamic set that supports only the dictionary operations <FONT FACE="Courier New" SIZE=2>INSERT</FONT>, <FONT FACE="Courier New" SIZE=2>SEARCH</FONT>, and <FONT FACE="Courier New" SIZE=2>DELETE</FONT>. For example, a compiler for a computer language maintains a symbol table, in which the keys of elements are arbitrary character strings that correspond to identifiers in the language. A hash table is an effective data structure for implementing dictionaries. Although searching for an element in a hash table can take as long as searching for an element in a linked list--<img src="../images/bound.gif">(<I>n</I>) time in the worst case--in practice, hashing performs extremely well. Under reasonable assumptions, the expected time to search for an element in a hash table is <I>O</I>(1).<P>
A hash table is a generalization of the simpler notion of an ordinary array. Directly addressing into an ordinary array makes effective use of our ability to examine an arbitrary position in an array in <I>O</I> (1) time. Section 12.1 discusses direct addressing in more detail. Direct addressing is applicable when we can afford to allocate an array that has one position for every possible key.<P>
When the number of keys actually stored is small relative to the total number of possible keys, hash tables become an effective alternative to directly addressing an array, since a hash table typically uses an array of size proportional to the number of keys actually stored. Instead of using the key as an array index directly, the array index is <I>computed</I> from the key. Section 12.2 presents the main ideas, and Section 12.3 describes how array indices can be computed from keys using hash functions. Several variations on the basic theme are presented and analyzed; the "bottom line" is that hashing is an extremely effective and practical technique: the basic dictionary operations require only <I>O</I> (1) time on the average.<P>





<h1><a name="07c2_1433">12.1 Direct-address tables<a name="07c2_1433"></h1><P>
<a name="07c2_1428"><a name="07c2_1429"><a name="07c2_142a"><a name="07c2_142b">Direct addressing is a simple technique that works well when the universe <I>U</I> of keys is reasonably small. Suppose that an application needs a dynamic set in which each element has a key drawn from the universe <I>U</I> = {0,1, . . . , m - 1}, where <I>m</I> is not too large. We shall assume that no two elements have the same key.<P>
<img src="220_a.gif"><P>
<h4><a name="07c2_1434">Figure 12.1 Implementing a dynamic set by a direct-address table T. Each key in the universe U = {0,1, . . . , 9} corresponds to an index in the table. The set K = {2, 3, 5, 8} of actual keys determines the slots in the table that contain pointers to elements. The other slots, heavily shaded, contain <FONT FACE="Courier New" SIZE=2>NIL<FONT FACE="Times New Roman" SIZE=2>.<a name="07c2_1434"></FONT></FONT></sub></sup></h4><P>
<a name="07c2_142c"><a name="07c2_142d"><a name="07c2_142e"><a name="07c2_142f">To represent the dynamic set, we use an array, or <I><B>direct-address table</I></B>, <I>T</I> [0 . . <I>m</I> - 1], in which each position, or <I><B>slot</I></B>, corresponds to a key in the universe <I>U</I>. Figure 12.1 illustrates the approach; slot <I>k</I> points to an element in the set with key <I>k</I>. If the set contains no element with key <I>k</I>, then <I>T</I>[<I>k</I>] = <FONT FACE="Courier New" SIZE=2>NIL.</FONT><P>
The dictionary operations are trivial to implement.<P>
<pre><a name="07c2_1430">DIRECT-ADDRESS-SEARCH(<I>T,k</I>)</sub></sup></pre><P>
<pre><B>return</B> <I>T</I>[<I>k</I>]</sub></sup></pre><P>
<pre></sub></sup></pre><P>
<pre><a name="07c2_1431">DIRECT-ADDRESS-INSERT(<I>T,x</I>)</sub></sup></pre><P>
<pre><I>T</I>[<I>key</I>[<I>x</I>]] <img src="../images/arrlt12.gif"> <I>x</I></sub></sup></pre><P>
<pre></sub></sup></pre><P>
<pre><a name="07c2_1432">DIRECT-ADDRESS-DELETE(<I>T,x</I>)</sub></sup></pre><P>
<pre><I>T</I>[<I>key</I>[<I>x</I>]] <img src="../images/arrlt12.gif"> NIL</sub></sup></pre><P>
Each of these operations is fast: only <I>O</I>(1) time is required.<P>
For some applications, the elements in the dynamic set can be stored in the direct-address table itself. That is, rather than storing an element<FONT FACE="CG Times (W1)" SIZE=2>'</FONT>s key and satellite data in an object external to the direct-address table, with a pointer from a slot in the table to the object, we can store the object in the slot itself, thus saving space. Moreover, it is often unnecessary to store the key field of the object, since if we have the index of an object in the table, we have its key. If keys are not stored, however, we must have some way to tell if the slot is empty.<P>





<h2><a name="07c3_1435">Exercises<a name="07c3_1435"></h2><P>
<a name="07c3_1436">12.1-1<a name="07c3_1436"><P>
Consider a dynamic set <I>S</I> that is represented by a direct-address table <I>T </I>of length <I>m</I>. Describe a procedure that finds the maximum element of <I>S</I>. What is the worst-case performance of your procedure?<P>
<a name="07c3_1437">12.1-2<a name="07c3_1437"><P>
<a name="07c3_1433"><a name="07c3_1434">A <I><B>bit vector</I></B> is simply an array of bits (0<FONT FACE="CG Times (W1)" SIZE=2>'</FONT>s and 1<FONT FACE="CG Times (W1)" SIZE=2>'</FONT>s). A bit vector of length <I>m</I> takes much less space than an array of <I>m </I>pointers. Describe how to use a bit vector to represent a dynamic set of distinct elements with no satellite data. Dictionary operations should run in <I>O</I>(1) time.<P>
<a name="07c3_1438">12.1-3<a name="07c3_1438"><P>
Suggest how to implement a direct-address table in which the keys of stored elements do not need to be distinct and the elements can have satellite data. All three dictionary operations (<FONT FACE="Courier New" SIZE=2>INSERT</FONT>, <FONT FACE="Courier New" SIZE=2>DELETE</FONT>, and <FONT FACE="Courier New" SIZE=2>SEARCH</FONT>) should run in <I>O</I>(1) time. (Don't forget that <FONT FACE="Courier New" SIZE=2>DELETE</FONT> takes as an argument a pointer to an object to be deleted, not a key.)<P>
<a name="07c3_1439">12.1-4<a name="07c3_1439"><P>
We wish to implement a dictionary by using direct addressing on a <I>huge </I>array. At the start, the array entries may contain garbage, and initializing the entire array is impractical because of its size. Describe a scheme for implementing a direct-address dictionary on a huge array. Each stored object should use <I>O</I>(1) space; the operations <FONT FACE="Courier New" SIZE=2>SEARCH</FONT>, <FONT FACE="Courier New" SIZE=2>INSERT</FONT>, and <FONT FACE="Courier New" SIZE=2>DELETE</FONT> should take <I>O</I>(1) time each; and the initialization of the data structure should take <I>O</I>(1) time. (<I>Hint:</I> Use an additional stack, whose size is the number of keys actually stored in the dictionary, to help determine whether a given entry in the huge array is valid or not.)<P>
<P>


<P>







<h1><a name="07c4_143b">12.2 Hash tables<a name="07c4_143b"></h1><P>
<a name="07c4_1435"><a name="07c4_1436"><a name="07c4_1437">The difficulty with direct addressing is obvious: if the universe <I>U</I> is large, storing a table <I>T</I> of size |<I>U</I>| may be impractical, or even impossible, given the memory available on a typical computer. Furthermore, the set <I>K</I> of keys <I>actually stored</I> may be so small relative to <I>U</I> that most of the space allocated for <I>T</I> would be wasted.<P>
When the set <I>K</I> of keys stored in a dictionary is much smaller than the universe <I>U</I> of all possible keys, a hash table requires much less storage than a direct-address table. Specifically, the storage requirements can be reduced to <img src="../images/bound.gif">(|<I>K</I>|<I></I>), even though searching for an element in the hash table still requires only <I>O</I>(1) time. (The only catch is that this bound is for the <I>average time</I>, whereas for direct addressing it holds for the <I>worst-case time</I>.)<P>
<img src="222_a.gif"><P>
<h4><a name="07c4_143c">Figure 12.2 Using a hash function h to map keys to hash-table slots. Keys k<SUB>2</SUB><FONT FACE="Times New Roman" SIZE=2> and k<SUB>5</SUB><FONT FACE="Times New Roman" SIZE=2> map to the same slot, so they collide.<a name="07c4_143c"></FONT></FONT></sub></sup></h4><P>
<a name="07c4_1438">With direct addressing, an element with key <I>k</I> is stored in slot <I>k</I>. With hashing, this element is stored in slot <I>h</I>(<I>k</I>); that is, a <I><B>hash function</I></B> <I>h</I> is used to compute the slot from the key <I>k</I>. Here <I>h</I> maps the universe <I>U</I> of keys into the slots of a <I><B>hash table</I></B> <I>T</I>[0 . . <I>m</I> - 1]:<P>
<pre><I>h: U</I> <img src="../images/arrow12.gif">{0,1, . . . , <I>m </I>- 1} .</sub></sup></pre><P>
<a name="07c4_1439">We say that an element with key <I>k</I> <I><B>hashes</I></B> to slot <I>h(k)</I>; we also say that <I>h(k)</I> is the <I><B>hash value</I></B> of key <I>k</I>. Figure 12.2 illustrates the basic idea. The point of the hash function is to reduce the range of array indices that need to be handled. Instead of |<I>U</I>| values, we need to handle only <I>m</I> values. Storage requirements are correspondingly reduced.<P>
<a name="07c4_143a">The fly in the ointment of this beautiful idea is that two keys may hash to the same slot--a <I><B>collision</I>.</B> Fortunately, there are effective techniques for resolving the conflict created by collisions.<P>
Of course, the ideal solution would be to avoid collisions altogether. We might try to achieve this goal by choosing a suitable hash function <I>h</I>. One idea is to make <I>h</I> appear to be &quot;random,&quot; thus avoiding collisions or at least minimizing their number. The very term &quot;to hash,&quot; evoking images of random mixing and chopping, captures the spirit of this approach. (Of course, a hash function <I>h</I> must be deterministic in that a given input <I>k</I> should always produce the same output <I>h(k)</I>.) Since |<I>U</I>| &gt; <I>m</I>, however, there must be two keys that have the same hash value; avoiding collisions altogether is therefore impossible. Thus, while a well-designed, &quot;random&quot;- looking hash function can minimize the number of collisions, we still need a method for resolving the collisions that do occur.<P>
The remainder of this section presents the simplest collision resolution technique, called chaining. Section 12.4 introduces an alternative method for resolving collisions, called open addressing.<P>
<img src="223_a.gif"><P>
<h4><a name="07c4_143d">Figure 12.3 Collision resolution by chaining. Each hash-table slot T[j] contains a linked list of all the keys whose hash value is j. For example, h(k<SUB>1</SUB><FONT FACE="Times New Roman" SIZE=2>) = h(k<SUB>4</SUB><FONT FACE="Times New Roman" SIZE=2>) and h(k<SUB>5</SUB><FONT FACE="Times New Roman" SIZE=2>) = h(k<SUB>2</SUB><FONT FACE="Times New Roman" SIZE=2>) = h(k<SUB>7</SUB><FONT FACE="Times New Roman" SIZE=2>).<a name="07c4_143d"></FONT></FONT></FONT></FONT></FONT></sub></sup></h4><P>





<h2>Collision resolution by chaining</h2><P>
<a name="07c5_143b"><a name="07c5_143c"><a name="07c5_143d">In <I><B>chaining</I></B>, we put all the elements that hash to the same slot in a linked list, as shown in Figure 12.3. Slot <I>j</I> contains a pointer to the head of the list of all stored elements that hash to <I>j</I>; if there are no such elements, slot <I>j</I> contains <FONT FACE="Courier New" SIZE=2>NIL</FONT>.<P>
The dictionary operations on a hash table <I>T</I> are easy to implement when collisions are resolved by chaining.<P>
<pre><a name="07c5_143e"><a name="07c5_143f">CHAINED-HASH-INSERT(<I>T,x</I>)</sub></sup></pre><P>
<pre>insert <I>x</I> at the head of list <I>T</I>[<I>h</I>(<I>key</I>[<I>x</I>])]</sub></sup></pre><P>
<pre></sub></sup></pre><P>
<pre><a name="07c5_1440"><a name="07c5_1441">CHAINED-HASH-SEARCH(<I>T,k</I>)</sub></sup></pre><P>
<pre>search for an element with key <I>k</I> in list <I>T</I>[<I>h</I>(<I>k</I>)]</sub></sup></pre><P>
<pre></sub></sup></pre><P>
<pre><a name="07c5_1442"><a name="07c5_1443">CHAINED-HASH-DELETE(<I>T,x</I>)</sub></sup></pre><P>
12 3 4 5 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -