s_has.htm
来自「Data Structure Ebook」· HTM 代码 · 共 243 行
HTM
243 行
<html>
<head>
<title>Hash Tables</title>
</head>
<body bgcolor="#ffffff">
<p align=right>
<a href="s_man.htm" tppabs="http://www.ee.uwa.edu.au/~plsd210/ds/niemann/s_man.htm" target="_top"><img src="c_man.gif" tppabs="http://www.ee.uwa.edu.au/~plsd210/ds/niemann/c_man.gif" width=74 height=19 border=0></a>
</p>
<h1>Hash Tables</h1>
Hash tables are a simple and effective method to implement dictionaries.
Average time to
search for an element is <b><i>O</i></b>(1),
while worst-case time is <b><i>O</i></b>(<i>n</i>).
<a href="javascript:if(confirm('http://www.amazon.com/exec/obidos/ISBN=0262031418/none01A/ \n\nThis file was not retrieved by Teleport Pro, because it is addressed on a domain or path outside the boundaries set for its Starting Address. \n\nDo you want to open it from the server?'))window.location='http://www.amazon.com/exec/obidos/ISBN=0262031418/none01A/'" tppabs="http://www.amazon.com/exec/obidos/ISBN=0262031418/none01A/" target="_top">Cormen [1990]</a> and
<a href="javascript:if(confirm('http://www.amazon.com/exec/obidos/ISBN=0201896850/none01A/ \n\nThis file was not retrieved by Teleport Pro, because it is addressed on a domain or path outside the boundaries set for its Starting Address. \n\nDo you want to open it from the server?'))window.location='http://www.amazon.com/exec/obidos/ISBN=0201896850/none01A/'" tppabs="http://www.amazon.com/exec/obidos/ISBN=0201896850/none01A/" target="_top">Knuth [1998]</a>
both contain excellent discussions on
hashing.
<h2>Theory</h2>
A hash table is simply an array that is addressed via a hash
function. For example, in
Figure 3-1,
<b>HashTable</b> is an array with
8 elements. Each element is a pointer to a linked list of
numeric data. The hash function for this example simply divides
the data key by 8, and uses the remainder as an index into the
table. This yields a number from 0 to 7. Since the range of
indices for <b>HashTable</b> is 0 to 7, we are guaranteed that the index
is valid.
<p>
<center>
<h3><a name="Fig3-1">
<img src="s_fig31.gif" tppabs="http://www.ee.uwa.edu.au/~plsd210/ds/niemann/s_fig31.gif" vspace=5 width=451 height=226><br>
Figure 3-1: A Hash Table</a>
</h3>
</center>
<p>
To insert a new item in the table, we hash the key to
determine which list the item goes on, and then insert the item
at the beginning of the list. For example, to insert 11, we
divide 11 by 8 giving a remainder of 3. Thus, 11 goes on the
list starting at <b>HashTable[3]</b>. To find a number, we hash the
number and chain down the correct list to see if it is in the
table. To delete a number, we find the number and remove the
node from the linked list.
<p>
Entries in the hash table are dynamically allocated and entered
on a linked list associated with each hash table entry.
This technique is known as <i>chaining</i>. An alternative method,
where all entries are stored in the hash table itself, is known as
direct or open addressing and may be found in the references.
<p>
If the hash function is uniform, or equally distributes the
data keys among the hash table indices, then hashing effectively
subdivides the list to be searched. Worst-case behavior occurs
when all keys hash to the same index. Then we simply have a
single linked list that must be sequentially searched.
Consequently, it is important to choose a good hash function.
Several methods may be used to hash key values. To illustrate
the techniques, I will assume <i>unsigned char</i> is 8-bits, <i>unsigned
short int</i> is 16-bits and <i>unsigned long int</i> is 32-bits.
<ul>
<li> <i>Division method</i> (<i>tablesize</i> = <i>prime</i>). This technique was used
in the preceeding example. A <b>HashValue</b>, from 0 to
(<b>HashTableSize - 1</b>), is computed by dividing the key value
by the size of the
hash table and taking the remainder. For example:
<blockquote><b><pre>
typedef int HashIndexType;
HashIndexType Hash(int Key) {
return Key % HashTableSize;
}
</pre></b></blockquote>
Selecting an appropriate <b>HashTableSize</b> is important to the
success of this method. For example, a <b>HashTableSize</b> of two
would yield even hash values for even <b>Keys</b>, and odd hash values
for odd <b>Keys</b>. This is an undesirable property, as all keys would
hash to the same value if they happened to be even. If
<b>HashTableSize</b> is a power of two, then the hash function simply
selects a subset of the <b>Key</b> bits as the table index. To obtain a
more random scattering, <b>HashTableSize</b> should be a prime number
not too close to a power of two.
<p>
<li> <i>Multiplication method</i> (<i>tablesize</i> = 2<sup><i>n</i></sup>). The multiplication
method may be used for a <b>HashTableSize</b> that is a power of 2. The
<b>Key</b> is multiplied by a constant, and then the necessary bits are
extracted to index into the table. Knuth
recommends using the fractional part of the product of the key and the
golden ratio, or (<i>sqrt</i>(5) - 1)/2.
For example, assuming a word size of 8 bits, the golden ratio is
multiplied by 2<sup>8</sup> to obtain 158. The product of the 8-bit
key and 158 results in a 16-bit integer. For a table size of
2<sup>5</sup> the 5 most significant bits of the least significant
word are extracted for the hash value.
The following definitions
may be used for the multiplication method:
<blockquote><b><pre>
/* 8-bit index */
typedef unsigned char HashIndexType;
static const HashIndexType K = 158;
/* 16-bit index */
typedef unsigned short int HashIndexType;
static const HashIndexType K = 40503;
/* 32-bit index */
typedef unsigned long int HashIndexType;
static const HashIndexType K = 2654435769;
/* w=bitwidth(HashIndexType), size of table=2**m */
static const int S = w - m;
HashIndexType HashValue = (HashIndexType)(K * Key) >> S;
</pre></b></blockquote>
For example, if <b>HashTableSize</b> is 1024 (2<sup>10</sup>), then a
16-bit index is sufficient and <b>S</b> would be assigned a
value of 16 - 10 = 6. Thus, we have:
<blockquote><b><pre>
typedef unsigned short int HashIndexType;
HashIndexType Hash(int Key) {
static const HashIndexType K = 40503;
static const int S = 6;
return (HashIndexType)(K * Key) >> S;
}
</pre></b></blockquote>
<li> <i>Variable string addition method</i> (<i>tablesize</i> = 256). To hash a
variable-length string, each character is added, modulo 256, to a
total. A <b>HashValue</b>, range 0-255, is computed.
<blockquote><b><pre>
typedef unsigned char HashIndexType;
HashIndexType Hash(char *str) {
HashIndexType h = 0;
while (*str) h += *str++;
return h;
}
</pre></b></blockquote>
<li> <i>Variable string exclusive-or method</i> (<i>tablesize</i> = 256). This
method is similar to the addition method, but successfully
distinguishes similar words and anagrams. To obtain a hash value
in the range 0-255, all bytes in the string are exclusive-or'd
together. However, in the process of doing each exclusive-or, a
random component is introduced.
<blockquote><b><pre>
typedef unsigned char HashIndexType;
unsigned char Rand8[256];
HashIndexType Hash(char *str) {
unsigned char h = 0;
while (*str) h = Rand8[h ^ *str++];
return h;
}
</pre></b></blockquote>
<b>Rand8</b> is a table of 256 8-bit unique random numbers.
The exact ordering is not critical. The exclusive-or
method has its basis in cryptography, and is quite effective
(<a href="javascript:if(confirm('http://www.acm.org/pubs/citations/journals/cacm/1990-33-6/p677-pearson/ \n\nThis file was not retrieved by Teleport Pro, because it is addressed on a domain or path outside the boundaries set for its Starting Address. \n\nDo you want to open it from the server?'))window.location='http://www.acm.org/pubs/citations/journals/cacm/1990-33-6/p677-pearson/'" tppabs="http://www.acm.org/pubs/citations/journals/cacm/1990-33-6/p677-pearson/" target="_top">Pearson [1990]</a>).
<p>
<li> <i>Variable string exclusive-or method</i> (<i>tablesize</i> <= 65536). If we
hash the string twice, we may derive a hash value for an
arbitrary table size up to 65536. The second time the string is
hashed, one is added to the first character. Then the two 8-bit
hash values are concatenated together to form a 16-bit hash
value.
</ul>
<blockquote><b><pre>
typedef unsigned short int HashIndexType;
unsigned char Rand8[256];
HashIndexType Hash(char *str) {
HashIndexType h;
unsigned char h1, h2;
if (*str == 0) return 0;
h1 = *str; h2 = *str + 1;
str++;
while (*str) {
h1 = Rand8[h1 ^ *str];
h2 = Rand8[h2 ^ *str];
str++;
}
/* h is in range 0..65535 */
h = ((HashIndexType)h1 << 8)|(HashIndexType)h2;
/* use division method to scale */
return h % HashTableSize
}
</pre></b></blockquote>
Assuming <i>n</i> data items, the hash table size should be large
enough to accommodate a reasonable number of entries. As seen in Table 3-1,
a small table size substantially increases the average
time to find a key. A hash table may be viewed as a collection
of linked lists. As the table becomes larger, the number of
lists increases, and the average number of nodes on each list
decreases. If the table size is 1, then the table is really a
single linked list of length <i>n</i>. Assuming a perfect hash
function, a table size of 2 has two lists of length <i>n</i>/2. If the
table size is 100, then we have 100 lists of length <i>n</i>/100. This
considerably reduces the length of the list to be searched.
There is
considerable leeway in the choice of table
size.
<center>
<p><a name="Tbl3-1"></a>
<table border cellspacing=0 cellpadding=3>
<caption align=bottom><h3>Table 3-1: HashTableSize vs. Average Search Time (µs), 4096 entries</h3></caption>
<tr><th>size</th><th>time</th><th></th><th>size</th><th>time</th></tr>
<tr align=right><td> 1</td><td> 869</td><td></td><td> 128</td><td> 9</td>
<tr align=right><td> 2</td><td> 432</td><td></td><td> 256</td><td> 6</td>
<tr align=right><td> 4</td><td> 214</td><td></td><td> 512</td><td> 4</td>
<tr align=right><td> 8</td><td> 106</td><td></td><td> 1024</td><td> 4</td>
<tr align=right><td> 16</td><td> 54</td><td></td><td> 2048</td><td> 3</td>
<tr align=right><td> 32</td><td> 28</td><td></td><td> 4096</td><td> 3</td>
<tr align=right><td> 64</td><td> 15</td><td></td><td> 8192</td><td> 3</td>
</table>
</center>
<p>
<h2>Implementation</h2>
An
<a href="javascript:if(confirm('http://www.ee.uwa.edu.au/~plsd210/ds/niemann/s_has.txt \n\nThis file was not retrieved by Teleport Pro, because it is linked too far away from its Starting Address. If you increase the in-domain depth setting for the Starting Address, this file will be queued for retrieval. \n\nDo you want to open it from the server?'))window.location='http://www.ee.uwa.edu.au/~plsd210/ds/niemann/s_has.txt'" tppabs="http://www.ee.uwa.edu.au/~plsd210/ds/niemann/s_has.txt">ANSI-C implementation</a>
of a hash table is included.
<b>Typedef T</b> and comparison operator <b>compEQ</b> should
be altered to reflect the data stored in the table.
The <b>hashTableSize</b> must be determined and the <b>hashTable</b> allocated.
The division method was used in the <b>hash</b> function.
Function <b>insertNode</b>
allocates a new node and inserts it in the table. Function
<b>deleteNode</b>
deletes and frees a node from the table.
Function <b>findNode</b> searches the
table for a particular value.
</body>
</html>
⌨️ 快捷键说明
复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?