📄 chap12.htm

📁 程序设计的艺术,此书是英文版的,经久不衰,非常的经典
💻 HTM
📖 第 1 页 / 共 5 页
字号:
Show that if we restrict each component <I>a<SUB>i</I></SUB> of <I>a</I> in equation (12.3) to be nonzero, then the set <img src="232_b.gif"> as defined in equation (12.4) is not universal. (<I>Hint:</I> Consider the keys <I>x</I> = 0 and <I>y</I> = 1.)<P>
<P>


<P>







<h1><a name="07cf_1460">12.4 Open addressing<a name="07cf_1460"></h1><P>
<a name="07cf_1456"><a name="07cf_1457"><a name="07cf_1458">In <I><B>open addressing</I></B>, all elements are stored in the hash table itself. That is, each table entry contains either an element of the dynamic set or <FONT FACE="Courier New" SIZE=2>NIL</FONT>. When searching for an element, we systematically examine table slots until the desired element is found or it is clear that the element is not in the table. There are no lists and no elements stored outside the table, as there are in chaining. Thus, in open addressing, the hash table can "fill up" so that no further insertions can be made; the load factor <img src="../images/alpha12.gif"> can never exceed 1.<P>
Of course, we could store the linked lists for chaining inside the hash table, in the otherwise unused hash-table slots (see Exercise 12.2-5), but the advantage of open addressing is that it avoids pointers altogether. Instead of following pointers, we <I>compute</I> the sequence of slots to be examined. The extra memory freed by not storing pointers provides the hash table with a larger number of slots for the same amount of memory, potentially yielding fewer collisions and faster retrieval.<P>
<a name="07cf_1459"><a name="07cf_145a"><a name="07cf_145b">To perform insertion using open addressing, we successively examine, or <I><B>probe</I></B>, the hash table until we find an empty slot in which to put the key. Instead of being fixed in the order 0, 1, . . . , <I>m</I> - 1 (which requires <img src="../images/bound.gif">(<I>n</I>) search time), the sequence of positions probed <I>depends upon the key being inserted.</I> To determine which slots to probe, we extend the hash function to include the probe number (starting from 0) as a second input. Thus, the hash function becomes<P>
<pre><I>h</I>:<I>U </I>X {0, 1, . . . , <I>m</I> -1} <img src="../images/arrow12.gif"> {0, 1, . . . , <I>m</I> -1} .</sub></sup></pre><P>
With open addressing, we require that for every key <I>k</I>, the <I><B>probe sequence</I></B><P>
<pre><img src="../images/lftwdchv.gif"><I>h</I>(<I>k</I>, 0), <I>h</I>(<I>k</I>, 1), . . . , <I>h</I>(<I>k</I>, <I>m</I> - 1)<img src="../images/wdrtchv.gif"></sub></sup></pre><P>
be a permutation of <img src="../images/lftwdchv.gif">0, 1, . . . , <I>m</I> - 1<img src="../images/wdrtchv.gif">, so that every hash-table position is eventually considered as a slot for a new key as the table fills up. In the following pseudocode, we assume that the elements in the hash table <I>T</I> are keys with no satellite information; the key <I>k</I> is identical to the element containing key <I>k</I>. Each slot contains either a key or <FONT FACE="Courier New" SIZE=2>NIL</FONT> (if the slot is empty).<P>
<pre><a name="07cf_145c">HASH-INSERT(<I>T</I>,<I>k</I>)</sub></sup></pre><P>
<pre>1<I>  i</I> <img src="../images/arrlt12.gif"> 0</sub></sup></pre><P>
<pre>2<B>   repeat</B> <I>j</I> <img src="../images/arrlt12.gif"> <I>h</I>(<I>k</I>,<I>i</I>)</sub></sup></pre><P>
<pre>3<B>          if</B> <I>T</I>[<I>j</I>] = NIL</sub></sup></pre><P>
<pre>4<B>             then</B> <I>T</I>[<I>j</I>] <img src="../images/arrlt12.gif"> <I>k</I></sub></sup></pre><P>
<pre>5                  <B>return</B> <I>j</I></sub></sup></pre><P>
<pre>6             <B>else</B> <I>i</I> <img src="../images/arrlt12.gif"> <I>i</I> + 1</sub></sup></pre><P>
<pre>7<B>  until</B> <I>i</I> = <I>m</I></sub></sup></pre><P>
<pre>8<B> error</B> "hash table overflow"</sub></sup></pre><P>
The algorithm for searching for key <I>k</I> probes the same sequence of slots that the insertion algorithm examined when key <I>k</I> was inserted. Therefore, the search can terminate (unsuccessfully) when it finds an empty slot, since <I>k</I> would have been inserted there and not later in its probe sequence. (Note that this argument assumes that keys are not deleted from the hash table.) The procedure <FONT FACE="Courier New" SIZE=2>HASH</FONT>-<FONT FACE="Courier New" SIZE=2>SEARCH</FONT> takes as input a hash table <I>T</I> and a key <I>k</I>, returning <I>j</I> if slot <I>j</I> is found to contain key <I>k</I>, or <FONT FACE="Times New Roman" SIZE=1>NIL</FONT> if key <I>k</I> is not present in table <I>T</I>.<P>
<pre><a name="07cf_145d">HASH-SEARCH(<I>T</I>, <I>k</I>)</sub></sup></pre><P>
<pre>1 <I> i</I> <img src="../images/arrlt12.gif"> 0</sub></sup></pre><P>
<pre>2   <B>repeat</B> <I>j</I> <img src="../images/arrlt12.gif"> <I>h</I>(<I>k</I>, <I>i</I>)</sub></sup></pre><P>
<pre>3          <B>if </B><I>T</I>[<I>j</I>]= <I>j</I></sub></sup></pre><P>
<pre>4             <B>then</B> <B>return</B> <I>j</I></sub></sup></pre><P>
<pre>5           <I>i</I> <img src="../images/arrlt12.gif"> <I>i</I> + 1</sub></sup></pre><P>
<pre>6<B>    until</B> <I>T</I>[<I>j</I>] = NIL or <I>i</I> = <I>m</I></sub></sup></pre><P>
<pre>7<B>  return</B> NIL</sub></sup></pre><P>
<a name="07cf_145e">Deletion from an open-address hash table is difficult. When we delete a key from slot <I>i</I>, we cannot simply mark that slot as empty by storing <FONT FACE="Courier New" SIZE=2>NIL</FONT> in it. Doing so might make it impossible to retrieve any key <I>k</I> during whose insertion we had probed slot <I>i</I> and found it occupied. One solution is to mark the slot by storing in it the special value <FONT FACE="Courier New" SIZE=2>DELETED</FONT> instead of <FONT FACE="Courier New" SIZE=2>NIL</FONT>. We would then modify the procedure <FONT FACE="Courier New" SIZE=2>HASH</FONT>-<FONT FACE="Courier New" SIZE=2>SEARCH</FONT> so that it keeps on looking when it sees the value <FONT FACE="Courier New" SIZE=2>DELETED</FONT>, while <FONT FACE="Courier New" SIZE=2>HASH</FONT>-<FONT FACE="Courier New" SIZE=2>INSERT</FONT> would treat such a slot as if it were empty so that a new key can be inserted. When we do this, though, the search times are no longer dependent on the load factor <img src="../images/alpha12.gif">, and for this reason chaining is more commonly selected as a collision resolution technique when keys must be deleted.<P>
<a name="07cf_145f">In our analysis, we make the assumption of <I><B>uniform hashing:</I></B> we assume that each key considered is equally likely to have any of the <I>m</I>! permutations of {0, 1, . . . , <I>m</I> - 1} as its probe sequence. Uniform hashing generalizes the notion of simple uniform hashing defined earlier to the situation in which the hash function produces not just a single number, but a whole probe sequence. True uniform hashing is difficult to implement, however, and in practice suitable approximations (such as double hashing, defined below) are used.<P>
Three techniques are commonly used to compute the probe sequences required for open addressing: linear probing, quadratic probing, and double hashing. These techniques all guarantee that <img src="../images/lftwdchv.gif"><I>h</I>(<I>k</I>, 1), <I>h</I>(<I>k</I>, 2), . . . , <I>h</I>(<I>k</I>, <I>m</I>)<img src="../images/wdrtchv.gif"> is a permutation of <img src="../images/lftwdchv.gif">0, 1, . . . , <I>m</I> - 1<img src="../images/wdrtchv.gif"> for each key <I>k</I>. None of these techniques fulfills the assumption of uniform hashing, however, since none of them is capable of generating more than <I>m</I><SUP>2</SUP> different probe sequences (instead of the <I>m</I>! that uniform hashing requires). Double hashing has the greatest number of probe sequences and, as one might expect, seems to give the best results.<P>





<h3>Linear probing</h3><P>
<a name="07d0_1460"><a name="07d0_1461">Given an ordinary hash function <I>h</I>': <I>U</I> <img src="../images/arrow12.gif"> {0, 1, . . . , <I>m</I> - 1}, the method of <I><B>linear probing</I></B> uses the hash function<P>
<pre><I>h</I>(<I>k</I>,<I>i</I>) = (<I>h</I>'(<I>k</I>) + <I>i</I>) mod <I>m</I></sub></sup></pre><P>
for <I>i</I> = 0,1,...,<I>m</I> - 1. Given key <I>k</I>, the first slot probed is <I>T</I>[<I>h</I>'(<I>k</I>)]. We next probe slot <I>T</I>[<I>h</I>'(<I>k</I>) + 1], and so on up to slot <I>T</I>[<I>m</I> - 1]. Then we wrap around to slots <I>T</I>[0], <I>T</I>[1], . . . , until we finally probe slot <I>T</I>[<I>h</I>'(<I>k</I>) - 1]. Since the initial probe position determines the entire probe sequence, only <I>m</I> distinct probe sequences are used with linear probing.<P>
<a name="07d0_1462"><a name="07d0_1463">Linear probing is easy to implement, but it suffers from a problem known as <I><B>primary clustering</I></B>. Long runs of occupied slots build up, increasing the average search time. For example, if we have <I>n</I> = <I>m</I>/2 keys in the table, where every even-indexed slot is occupied and every odd-indexed slot is empty, then the average unsuccessful search takes 1.5 probes. If the first <I>n</I> = <I>m</I>/2 locations are the ones occupied, however, the average number of probes increases to about <I>n</I>/4 = <I>m</I>/8. Clusters are likely to arise, since if an empty slot is preceded by <I>i</I> full slots, then the probability that the empty slot is the next one filled is (<I>i</I> + 1)/<I>m</I>, compared with a probability of 1/<I>m </I>if the preceding slot was empty. Thus, runs of occupied slots tend to get longer, and linear probing is not a very good approximation to uniform hashing.<P>
<P>







<h3>Quadratic probing</h3><P>
<a name="07d1_1464"><a name="07d1_1465"><a name="07d1_1466"><I><B>Quadratic probing</I></B> uses a hash function of the form<P>
<pre><I>h</I>(<I>k</I>,<I>i</I>) = (<I>h</I>'(<I>k</I>) + <I>c</I><SUB>1</SUB><I>i</I> + <I>c</I><SUB>2</SUB><I>i</I><SUP>2</SUP>) mod <I>m</I>,</sub></sup></pre><P>
<h4><a name="07d1_1467">(12.5)<a name="07d1_1467"></sub></sup></h4><P>
where (as in linear probing) <I>h</I>' is an auxiliary hash function, <I>c</I><SUB>1</SUB> and <I>c</I><SUB>2</SUB> <img src="../images/noteq.gif"> 0 are auxiliary constants, and <I>i</I> = 0, 1, . . . , <I>m</I> - 1. The initial position probed is <I>T</I>[<I>h</I>'(<I>k</I>)]; later positions probed are offset by amounts that depend in a quadratic manner on the probe number <I>i</I>. This method works much better than linear probing, but to make full use of the hash table, the values of <I>c</I><SUB>1</SUB>, <I>c</I><SUB>2</SUB>, and <I>m</I> are constrained. Problem 12-4 shows one way to select these parameters. Also, if two keys have the same initial probe position, then their probe sequences are the same, since <I>h</I>(<I>k</I><SUB>1</SUB>, 0) = <I>h</I>(<I>k</I><SUB>2</SUB>, 0) implies <I>h</I>(<I>k</I><SUB>1</SUB>, <I>i</I>) = <I>h</I>(<I>k</I><SUB>2</SUB>, <I>i</I>). This leads to a milder form of clustering, called <I><B>secondary clustering</I></B><I>.</I> As in linear probing, the initial probe determines the entire sequence, so only <I>m</I> distinct probe sequences are used.<P>
<P>







<h3>Double hashing</h3><P>
<a name="07d2_1467"><a name="07d2_1468">Double hashing is one of the best methods available for open addressing because the permutations produced have many of the characteristics of randomly chosen permutations. <I><B>Double hashing</I></B> uses a hash function of the form<P>
<pre><I>h</I>(<I>k</I>, <I>i</I>) = (<I>h</I><SUB>1</SUB>(<I>k</I>) + <I>ih</I><SUB>2</SUB>(<I>k</I>)) mod <I>m</I>,</sub></sup></pre><P>
where <I>h</I><SUB>1</SUB> and <I>h</I><SUB>2</SUB> are auxiliary hash functions. The initial position probed is <I>T</I>[<I>h</I><SUB>1</SUB> (<I>k</I>)]; successive probe positions are offset from previous positions by the amount <I>h</I><SUB>2</SUB>(<I>k</I>), modulo <I>m</I>. Thus, unlike the case of linear or quadratic probing, the probe sequence here depends in two ways upon the key <I>k</I>, since the initial probe position, the offset, or both, may vary. Figure 12.5 gives an example of insertion by double hashing.<P>
<img src="236_a.gif"><P>
<h4><a name="07d2_1469">Figure 12.5 Insertion by double hashing. Here we have a hash table of size 13 with h<SUB>1</SUB><FONT FACE="Times New Roman" SIZE=2>(k)</FONT> = k mod 13 and h<SUB>2</SUB><FONT FACE="Times New Roman" SIZE=2>(k)</FONT> = 1 + (k mod 11). Since 14 <img src="../images/equiv10.gif"> 1 mod 13 and 14 <img src="../images/equiv10.gif"> 3 mod 11, the key 14 will be inserted into empty slot 9, after slots 1 and 5 have been examined and found to be already occupied.<a name="07d2_1469"></sub></sup></h4><P>
The value <I>h</I><SUB>2</SUB>(<I>k</I>) must be relatively prime to the hash-table size <I>m</I> for the entire hash table to be searched. Otherwise, if <I>m</I> and <I>h</I><SUB>2</SUB>(<I>k</I>) have greatest common divisor <I>d</I> &gt; 1 for some key <I>k</I>, then a search for key <I>k</I> would examine only (1/<I>d</I>)th of the hash table. (See Chapter 33.) A convenient way to ensure this condition is to let <I>m</I> be a power of 2 and to design <I>h</I><SUB>2</SUB> so that it always produces an odd number. Another way is to let <I>m</I> be prime and to design <I>h</I><SUB>2</SUB> so that it always returns a positive integer less than <I>m</I>. For example, we could choose <I>m</I> prime and let<P>
<pre><I>h</I><SUB>1</SUB>(<I>k</I>) = <I>k</I> mod <I>m</I> ,</sub></sup></pre><P>
<pre><I>h</I><SUB>2</SUB>(<I>k</I>) = 1 + (<I>k</I> mod <I>m</I>'),</sub></sup></pre><P>
where <I>m</I>' is chosen to be slightly less than <I>m</I> (say, <I>m</I> - 1 or <I>m</I> - 2). For example, if <I>k</I> = 123456 and <I>m</I> = 701, we have <I>h</I><SUB>1</SUB>(<I>k</I>) = 80 and <I>h</I><SUB>2</SUB>(<I>k</I>) = 257, so the first probe is to position 80, and then every 257th slot (modulo <I>m</I>) is examined until the key is found or every slot is examined.<P>
Double hashing represents an improvement over linear or quadratic probing in that <img src="../images/bound.gif">(<I>m</I><SUP>2</SUP>) probe sequences are used, rather than <img src="../images/bound.gif">(<I>m</I>), since each possible (<I>h</I><SUB>1</SUB> (<I>k</I>), <I>h</I><SUB>2</SUB>(<I>k</I>)) pair yields a distinct probe sequence, and as we vary the key, the initial probe position <I>h</I><SUB>1</SUB>(<I>k</I>) and the offset <I>h</I><SUB>2</SUB>(<I>k</I>) may vary independently. As a result, the performance of double hashing appears to be very close to the performance of the "ideal" scheme of uniform hashing.<P>
<P>
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -