📄 chap12.htm
字号:
<h3>Analysis of open-address hashing</h3><P>
<a name="07d3_1469">Our analysis of open addressing, like our analysis of chaining, is expressed in terms of the load factor <img src="../images/alpha12.gif"> of the hash table, as <I>n</I> and <I>m</I> go to infinity. Recall that if <I>n</I> elements are stored in a table with <I>m</I> slots, the average number of elements per slot is <img src="../images/alpha12.gif"><I> = </I>n/m<I>. Of course, with open addressing, we have at most one element per slot, and thus </I>n<I> <img src="../images/lteq12.gif"> </I>m<I>, which implies <img src="../images/alpha12.gif"></I> <img src="../images/lteq12.gif"> 1.<P>
We assume that uniform hashing is used. In this idealized scheme, the probe sequence <img src="../images/lftwdchv.gif"><I>h</I>(<I>k, </I>0), <I>h</I>(k, 1), . . . , <I>h(k, m</I> - 1)<img src="../images/wdrtchv.gif"> for each key <I>k</I> is equally likely to be any permutation on <img src="../images/lftwdchv.gif">0, 1, . . . , <I>m</I> - 1<img src="../images/wdrtchv.gif">. That is, each possible probe sequence is equally likely to be used as the probe sequence for an insertion or a search. Of course, a given key has a unique fixed probe sequence associated with it; what is meant here is that, considering the probability distribution on the space of keys and the operation of the hash function on the keys, each possible probe sequence is equally likely.<P>
We now analyze the expected number of probes for hashing with open addressing under the assumption of uniform hashing, beginning with an analysis of the number of probes made in an unsuccessful search.<P>
<a name="07d3_146a">Theorem 12.5<a name="07d3_146a"><P>
Given an open-address hash table with load factor <img src="../images/alpha12.gif"><I> = </I>n/m<I> < 1, the expected number of probes in an unsuccessful search is at most 1/(1 - <img src="../images/alpha12.gif"></I>), assuming uniform hashing.<P>
<I><B>Proof </I></B>In an unsuccessful search, every probe but the last accesses an occupied slot that does not contain the desired key, and the last slot probed is empty. Let us define<P>
<I>p<SUB>i</I></SUB> = Pr {exactly <I>i</I> probes access occupied slots}<P>
for <I>i</I> = 0, 1, 2, . . . . For <I>i</I> > <I>n</I>, we have <I>p<SUB>i</I></SUB> = 0, since we can find at most <I>n </I>slots already occupied. Thus, the expected number of probes is<P>
<img src="237_a.gif"><P>
<h4><a name="07d3_146b">(12.6)<a name="07d3_146b"></sub></sup></h4><P>
To evaluate equation (12.6), we define<P>
<I>q<SUB>i</I></SUB> = Pr {at least <I>i</I> probes access occupied slots}<P>
for <I>i</I> = 0, 1, 2, . . . . We can then use identity (6.28):<P>
<img src="237_b.gif"><P>
What is the value of <I>q<SUB>i</I></SUB> for <I>i</I> <img src="../images/gteq.gif"> 1? The probability that the first probe accesses an occupied slot is <I>n/m</I>; thus,<P>
<img src="238_a.gif"><P>
With uniform hashing, a second probe, if necessary, is to one of the remaining <I>m</I> - 1 unprobed slots, <I>n</I> - 1 of which are occupied. We make a second probe only if the first probe accesses an occupied slot; thus,<P>
<img src="238_b.gif"><P>
In general, the <I>i</I>th probe is made only if the first <I>i </I>- 1 probes access occupied slots, and the slot probed is equally likely to be any of the remaining <I>m</I> - <I>i</I> + 1 slots, <I>n</I> - <I>i</I> + 1 of which are occupied. Thus,<P>
<img src="238_c.gif"><P>
for <I>i</I> = 1, 2, . . . , <I>n</I>, since (<I>n</I> - <I>j</I>) / (<I>m</I> - <I>j</I>) <img src="../images/lteq12.gif"> <I>n </I>/ <I>m</I> if <I>n</I> <img src="../images/lteq12.gif"> <I>m</I> and <I>j</I> <img src="../images/gteq.gif"> 0. After <I>n</I> probes, all <I>n</I> occupied slots have been seen and will not be probed again, and thus <I>q<SUB>i</I></SUB> = 0 for <I>i</I> <img src="../images/gteq.gif"> <I>n</I>.<P>
We are now ready to evaluate equation (12.6). Given the assumption that <img src="../images/alpha12.gif"><I></I> < 1, the average number of probes in an unsuccessful search is<P>
<img src="238_d.gif"><P>
<h4><a name="07d3_146c">(12.7)<a name="07d3_146c"></sub></sup></h4><P>
Equation (12.7) has an intuitive interpretation: one probe is always made, with probability approximately <img src="../images/alpha12.gif"> a second probe is needed, with probability approximately <img src="../images/alpha12.gif"><I><SUP></I>2</SUP> a third probe is needed, and so on. <P>
If <img src="../images/alpha12.gif"><I></I> is a constant, Theorem 12.5 predicts that an unsuccessful search runs in <I>O</I>(1) time. For example, if the hash table is half full, the average number of probes in an unsuccessful search is 1/(1 - .5) = 2. If it is 90 percent full, the average number of probes is 1/(1 - .9) = 10.<P>
Theorem 12.5 gives us the performance of the <FONT FACE="Courier New" SIZE=2>HASH</FONT>-<FONT FACE="Courier New" SIZE=2>INSERT</FONT> procedure almost immediately.<P>
<a name="07d3_146d">Corollary 12.6<a name="07d3_146d"><P>
Inserting an element into an open-address hash table with load factor <img src="../images/alpha12.gif"><I></I> requires at most 1/(1 - <img src="../images/alpha12.gif"><I></I>) probes on average, assuming uniform hashing.<P>
<I><B>Proof</I></B> An element is inserted only if there is room in the table, and thus <img src="../images/alpha12.gif"><I></I> < 1. Inserting a key requires an unsuccessful search followed by placement of the key in the first empty slot found. Thus, the expected number of probes is 1/(1 - <img src="../images/alpha12.gif"><I></I>). <P>
Computing the expected number of probes for a successful search requires a little more work.<P>
<a name="07d3_146e">Theorem 12.7<a name="07d3_146e"><P>
Given an open-address hash table with load factor <img src="../images/alpha12.gif"><I></I> < 1, the expected number of probes in a successful search is at most<P>
<img src="239_a.gif"><P>
assuming uniform hashing and assuming that each key in the table is equally likely to be searched for.<P>
<I><B>Proof</I></B> A search for a key <I>k</I> follows the same probe sequence as was followed when the element with key <I>k</I> was inserted. By Corollary 12.6, if <I>k</I> was the (<I>i</I> + 1)st key inserted into the hash table, the expected number of probes made in a search for <I>k</I> is at most 1 / (1 - <I>i/m</I>) = <I>m/</I>(<I>m</I> - <I>i</I>). Averaging over all <I>n</I> keys in the hash table gives us the average number of probes in a successful search:<P>
<img src="239_b.gif"><P>
where <I>H<SUB>i</I></SUB> = <img src="239_c.gif"> is the <I>i</I>th harmonic number (as defined in equation (3.5)). Using the bounds ln <I>i</I> <img src="../images/lteq12.gif"> <I>H<SUB>i</I></SUB> <img src="../images/lteq12.gif"> ln <I>i</I> + 1 from equations (3.11)and (3.12), we obtain<P>
<img src="239_d.gif"><P>
for a bound on the expected number of probes in a successful search. <P>
If the hash table is half full, the expected number of probes is less than 3.387. If the hash table is 90 percent full, the expected number of probes is less than 3.670.<P>
<P>
<h2><a name="07d4_1471">Exercises<a name="07d4_1471"></h2><P>
<a name="07d4_1472">12.4-1<a name="07d4_1472"><P>
<a name="07d4_146a">Consider inserting the keys 10, 22, 31, 4, 15, 28, 17, 88, 59 into a hash table of length <I>m</I> = 11 using open addressing with the primary hash function <I>h</I>'(<I>k</I>) = <I>k</I> mod <I>m</I>. Illustrate the result of inserting these keys using linear probing, using quadratic probing with <I>c</I><SUB>1</SUB> = 1 and <I>c</I><SUB>2</SUB> = 3, and using double hashing with <I>h</I><SUB>2</SUB>(<I>k</I>) = 1 + (<I>k</I> mod (<I>m</I> - 1)).<P>
<a name="07d4_1473">12.4-2<a name="07d4_1473"><P>
<a name="07d4_146b"><a name="07d4_146c"><a name="07d4_146d">Write pseudocode for <FONT FACE="Courier New" SIZE=2>HASH</FONT>-<FONT FACE="Courier New" SIZE=2>DELETE</FONT> as outlined in the text, and modify <FONT FACE="Courier New" SIZE=2>HASH</FONT>-<FONT FACE="Courier New" SIZE=2>INSERT</FONT> and <FONT FACE="Courier New" SIZE=2>HASH</FONT>-<FONT FACE="Courier New" SIZE=2>SEARCH</FONT> to incorporate the special value <FONT FACE="Courier New" SIZE=2>DELETED</FONT>.<P>
<a name="07d4_1474">12.4-3<a name="07d4_1474"><P>
<a name="07d4_146e">Suppose that we use double hashing to resolve collisions; that is, we use the hash function <I>h</I>(<I>k</I>, <I>i</I>) = (<I>h</I><SUB>1</SUB>(<I>k</I>) + <I>ih</I><SUB>2</SUB>(<I>k</I>)) mod <I>m</I>. Show that the probe sequence <img src="../images/lftwdchv.gif"><I>h</I>(<I>k</I>, 0), <I>h</I>(<I>k</I>, 1), . . . , <I>h</I>(<I>k</I>, <I>m</I> - 1)<img src="../images/wdrtchv.gif"> is a permutation of the slot sequence <img src="../images/lftwdchv.gif">0, 1, . . . , <I>m </I>- 1<img src="../images/wdrtchv.gif"> if and only if <I>h</I><SUB>2</SUB>(<I>k</I>) is relatively prime to <I>m</I>. (<I>Hint</I>: See Chapter 33.)<P>
<a name="07d4_1475">12.4-4<a name="07d4_1475"><P>
Consider an open-address hash table with uniform hashing and a load factor <img src="../images/alpha12.gif"><I></I> = 1/2. What is the expected number of probes in an unsuccessful search? What is the expected number of probes in a successful search? Repeat these calculations for the load factors 3/4 and 7/8.<P>
<a name="07d4_1476">12.4-5<a name="07d4_1476"><P>
Suppose that we insert <I>n</I> keys into a hash table of size <I>m</I> using open addressing and uniform hashing. Let <I>p</I>(<I>n</I>, <I>m</I>) be the probability that no collisions occur. Show that <I>p</I>(<I>n</I>, <I>m</I>) <img src="../images/lteq12.gif"> <I>e</I><SUP>-<I>n</I>(<I>n </I>- 1)/2<I>m</I></SUP>. (<I>Hint</I>: See equation (2.7).) Argue that when <I>n</I> exceeds <img src="240_a.gif">, the probability of avoiding collisions goes rapidly to zero.<P>
<a name="07d4_1477">12.4-6<a name="07d4_1477"><P>
<a name="07d4_146f"><a name="07d4_1470">The bound on the harmonic series can be improved to<P>
<img src="240_b.gif"><P>
<h4><a name="07d4_1478">(12.8)<a name="07d4_1478"></sub></sup></h4><P>
where <img src="../images/gamma14.gif"><I></I> = 0.5772156649 . . . is known as <I><B>Euler's constant</I></B> and <img src="../images/memof12.gif"> satisfies 0 < <img src="../images/memof12.gif"> < 1. (See Knuth [121] for a derivation.) How does this improved approximation for the harmonic series affect the statement and proof of Theorem 12.7?<P>
<a name="07d4_1479">12.4-7<a name="07d4_1479"><P>
Consider an open-address hash table with a load factor <img src="../images/alpha12.gif"><I></I>. Find the nonzero value <img src="../images/alpha12.gif"> for which the expected number of probes in an unsuccessful search equals twice the expected number of probes in a successful search. Use the estimate (1/<img src="../images/alpha12.gif"><I></I>) ln(1/(1 - <img src="../images/alpha12.gif"><I></I>)) for the number of probes required for a successful search.<P>
<P>
<P>
<h1><a name="07d5_147e">Problems<a name="07d5_147e"></h1><P>
<a name="07d5_147f">12-1 Longest-probe bound for hashing<a name="07d5_147f"><P>
<a name="07d5_1471"><a name="07d5_1472">A hash table of size <I>m</I> is used to store <I>n</I> items, with <I>n</I> <img src="../images/lteq12.gif"> <I>m</I>/2. Open addressing is used for collision resolution.<P>
<I><B>a.</I></B> Assuming uniform hashing, show that for <I>i</I> = 1, 2, . . . , <I>n</I>, the probability that the <I>i</I>th insertion requires strictly more than <I>k</I
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -