首页 › 资源下载 › Linux/Unix编程 › spines-ns › 源码查看
stdhash.c

来自「spines-ns」· C语言代码 · 共 832 行 · 第 1/2 页
832 行
/* Copyright (c) 2000, The Johns Hopkins University * All rights reserved. * * The contents of this file are subject to a license (the ``License'') * that is the exact equivalent of the BSD license as of July 23, 1999.  * You may not use this file except in compliance with the License. The * specific language governing the rights and limitations of the License * can be found in the file ``STDUTIL_LICENSE'' found in this  * distribution. * * Software distributed under the License is distributed on an AS IS  * basis, WITHOUT WARRANTY OF ANY KIND, either express or implied.  * * The Original Software is: *     The Stdutil Library *  * Contributors: *     Creator - John Lane Schultz (jschultz@cnds.jhu.edu) *     The Center for Networking and Distributed Systems *         (CNDS - http://www.cnds.jhu.edu) */ #include <stdlib.h>#include <string.h>#ifdef USE_DMALLOC#include <dmalloc.h>#endif#include "stdutil/stdutil.h"#include "stdutil/stderror.h"#include "stdutil/stdhash.h"/* if all the necessary types exist then you can use stdhash */#ifdef STDRAND_EXISTS/* stdhash is a table based implementation of a dictionary data   structure. In particular, it is an open-address double hashing   hashtable (ref: Sedgewick: Algorithms in C, 3rd Ed.) that maps   non-unique keys to values. Values can be zero length which allows   this dictionary to function also as a hashset of keys.   Both keys and values are contained by value inside the    dictionary, just like all of the other data structures in the    stdutil library.   The user must specify a key equality relation and a key hashcode   relation for his keys. Several such functions are defined by   stdkvp.[ch] for commonly used key types.    _The user must be very careful when crafting such a pair of fcns_:    (1) equals(k1, k2) MUST be reflexive, symmetric, and transitive   (2) hashcode(k) MUST be invariant (always the same for a particular key)   (3) if equals(k1, k2) then hashcode(k1) == hashcode(k2) MUST hold   ** Note that condition 3 is fairly stringent!!   Also the user should consider the following:    (4) valid values for the key hashcode function are in the range:        [0, 2^(sizeof(size_t) * 8 - 1)). The most significant bit can legally       be set, but it will be ignored -- possibly causing more hashcode       collisions than expected.    (5) if !equals(k1, k2) then the probability that hashcode(k1) ==       hashcode(k2) should not be significantly more than        1 / 2^(sizeof(size_t) * 8 - 1) for best performance.   At the core of this hashtable implementation are two hash fcns:   The first hash fcn maps a key's hashcode to the range [0, n) (where   n is the capacity of the hashtable). This mapping is used to find   where in the table a search for that key should begin.   The second hash fcn returns an invariant offset (based on the   hashcode) for the key that is non-zero, less than and relatively   prime to n.  This offset is used to jump through the hashtable when   searching for a particular key.   Example: searching for a key in the table...  First, the hashcode   is computed for the key. Based on that, the first hash fcn returns   an index in the table, ind, at which to begin searching for the   key. If ind contains no key-val pair then the key does not exist in   the table and our search is done.  If ind contains a pair and it's   key matches the search key then our search is done. If ind contains   a pair but it's key doesn't match the search key (both its hashcode   and equals return a match) then I examine index (ind += offset)   (mod n).  I continue this process until I find a key match or I   find an empty index. Because offset is relatively prime to n, I can   check every index in the table in this manner in n iterations (the   sequence of indexes generated by this process for a particular key   is called a probe sequence). However, by keeping the table sparsely   populated the search will always terminate, and in practice the   search will on average terminate after a constant number of steps.   The first hash fcn (see implementation below: hash) is a kick ass   implementation of the strongly 2-universal family of hash fcns that    maps integers from the range [0, m) to the range [0, n):            h(i)[a, b] = ((a * i + b) (mod p)) (mod n)   Where n < m <= p <= 2m (usually n << m), p is prime, and a and b    are uniformly chosen random parameters of the function (0 < a < p,    0 <= b < p).   However, rather than using a large prime p, I use 2^(sizeof(size_t)   * 8). This causes problems if either a or i is a multiple of 2. To   get around these problems I simply force a and i to be odd (see   next paragraph for how). Because of this, a * i is relatively prime   to my "p" and I get all of the nice properties of doing   multiplication modulo a prime. However, because I did the math mod   2^(sizeof(size_t) * 8) the mod p operation is done implicitly by   the standard definition of multiplication and addition of size_t   types in C. Furthermore, I implement the second mod operation (mod   n) by a bitwise AND operation. I do this by _making the capacity   (n) of the table a power of 2_ and then (x % n) == (x & (n-1))! One   catch though: because a * i is always odd, b always determines what   the lowest bit of y = a * i + b is.  If b is odd then y is always   even, if b is even then y is always odd. So, the lowest bit doesn't   contribute anything and I need to remove it. That is why in the   hash() fcn below I right shift the result y. So the two expensive   modulo operations above are replaced by a single bitwise AND   operation!   I force a to be odd by uniformly picking a number in the range [0,   2^(sizeof(size_t) * 8)) and then OR'ing it with 0x1. I force i in   the range [0, 2^(sizeof(size_t) * 8 - 1)) to be odd by multiplying   it by 2 and then OR'ing it with 0x1. This operation maps i to odd   integers in the range [1, 2^(sizeof(size_t) * 8)). I often refer to   these numbers as expanded hashcodes or exphcodes. I choose b   uniformly from the range [0, 2^(sizeof(size_t) * 8)), but the   lowest bit does not contribute anything (see explanation in above   paragraph).   The second hash fcn is simply the lowest lg(n) bits of the expanded   hashcode for a key. This value is always odd making it relatively   prime to n (table capacity), _which is a power of 2_. This allows   us to iteratively search through the table. This hash function is   simply a bitwise AND operation between the expanded hashcode and   the.   Another important thing to note about this hashtable is that   because it uses an open-addressing scheme, key removals are more   complex than in other schemes. Instead of actually deleting the   node and saying it is empty, I leave the node in its space but I   set the cached, expanded hashcode to an impossible value (0) to   indicate that it was removed, but searchs should not terminate   here. This is because some insert probe sequence might have   depended on the removed node when it was inserted and so I can't   simply delete the node. But, if a user tries to insert into the   table and such an "inactive" node is found it can be used by the   new pair being inserted. Leaving these nodes in the table causes   still another problem. If the nodes still exist in the table, then   the load factor thresholds must be modified to take this into   account when deciding whether or not to grow, shrink, or rehash the   table. If they didn't, the table could be filled with "inactive"   nodes and searches might never terminate. In this implementation I   track the number of nodes in the table by the num_nodes variable. I   make decisions on whether to resize or rehash the table based on:   size, num_nodes, low_thresh and high_thresh.   Some more points: I cache expanded hashcodes for keys in   stdhash_nodes. Obviously a key should be constant, meaning that it   shouldn't change once it is inserted in to an associative data   structure. Anyway, I depend on that fact and property 3 (above)   strongly when I am searching for keys.    When rehashing occurs I don't free and reallocate active nodes. I   allocate a new table and free those inactive nodes (exphcode == 0)   that are acting only as place holders (see prior paragraph). Then,   all of the active nodes are inserted into the new table.  */# ifdef STD_CONSTRUCT_CHECKS#  define IS_HASH_INITED(hash) ((hash)->init_val == STDHASH_INITED)#  define INIT_HASH(hash)      ((hash)->init_val = STDHASH_INITED)#  define UNINIT_HASH(hash)    ((hash)->init_val = ~STDHASH_INITED)#  define IS_IT_INITED(it)     ((it)->it_init_val == STDHASH_IT_INITED)#  define INIT_IT(it)          ((it)->it_init_val = STDHASH_IT_INITED)# else#  define IS_HASH_INITED(hash)#  define INIT_HASH(hash)#  define UNINIT_HASH(hash)#  define IS_IT_INITED(it)#  define INIT_IT(it)# endif/* macros for table positions (stdhash_node**s) */# define POS_EMPTY(node_pos)    ((*(node_pos)) == 0)# define POS_INACTIVE(node_pos) ((*(node_pos))->exphcode == 0)# define IS_VALID_IT(it) \((it)->node_pos >= (it)->hash->begin && (it)->node_pos < (it)->hash->table_end && \ !POS_EMPTY((it)->node_pos) && !POS_INACTIVE((it)->node_pos))# define IS_LEGAL_IT(it) (IS_VALID_IT(it) || (it)->node_pos == (it)->hash->table_end)/* macros for table nodes (stdhash_node*s) */# define NKEY(node) ((node)->kv.key)# define NVAL(node) ((node)->kv.value)inline static int get_cap_and_threshs(size_t size_request, size_t *new_cap, 				      size_t *high_thresh, size_t *low_thresh) {  size_t tmp_new_cap = stdgood_pow2_cap(size_request <<= 1);  /* <<= !!! */  if (tmp_new_cap <= size_request && size_request != 0)    return STD_MEM_FAILURE;  *new_cap     = tmp_new_cap;  *high_thresh = tmp_new_cap >> 1; /* let table get 50% full before realloc'ing again */  *low_thresh  = tmp_new_cap >> 3; /* if table drops below 12.5% full shrink */  return STD_SUCCESS;}/* allocs and sets the key-val pair for a node */inline static stdhash_node *init_node(stdhash *h, stdhash_node *node, size_t exphcode, 				      const void *key, const void *val) {  if (!(node->kv.key = (void*) malloc(h->ksize)))    return 0;  if (h->vsize != 0) {    if (!(node->kv.value = (void*) malloc(h->vsize))) {      free(node->kv.key);      return 0;    }  } else    node->kv.value = 0;  node->exphcode = exphcode;  memcpy(node->kv.key, key, h->ksize);  memcpy(node->kv.value, val, h->vsize);    return node;}/* allocs a node and inits it to contain the passed key-val pair */inline static stdhash_node *make_node(stdhash *h, size_t exphcode, const void *key, const void *val) {  stdhash_node *node;  if (!(node = (stdhash_node*) malloc(sizeof(stdhash_node))))    return 0;  if (!init_node(h, node, exphcode, key, val)) {    free(node);    return 0;  }  return node;}/* frees a node entirely */inline static void free_node(stdhash_node *node) {  free(node->kv.key);  if (node->kv.value != 0)    free(node->kv.value);  free(node);}/* see comments at top of file: expands hcode of key into an exphcode */inline static size_t exp_hcode(size_t hcode) {  return (hcode << 1) | 0x1;}/* see comments at top of file: computes both hashfcns for a given exphcode and a stdhash */inline static void hash(size_t exphcode, size_t a, size_t b, size_t new_cap_min1,			size_t *hash_val, size_t *offset) {  *hash_val = (((a * exphcode + b) >> 1) & new_cap_min1); /* [0, n) */  *offset   = (exphcode & new_cap_min1);                  /* (0, n) and always odd */}/* pick new values for a and b -- used when I rehash the table *//* When a is extremely small our second hash fcn is more likely to   produce the same offset for two different hashcodes that have the   same hash value. In fact, if a == 1 then all keys that have the   same hash value will have the same offset. So here I discard very   small values of a. Therefore, a is actually an odd number in the   range [13, 2^(sizeof(size_t) * 8)). */inline static void update_params(stdhash *h) {  do {    h->a = stdrand(h->ai);  } while (h->a < 12);  h->a |= 0x1;            /* a needs to be odd */  h->b  = stdrand(h->bi); /* lowest bit of b contributes nothing */}/* get next with no safe checking */inline static stdhash_node **next(const stdhash *h, stdhash_node **curr) {  for (++curr; curr != h->table_end && (POS_EMPTY(curr) || POS_INACTIVE(curr)); ++curr);  return curr;}/* get prev with no safe checking */inline static stdhash_node **prev(const stdhash *h, stdhash_node **curr) {  for (--curr; POS_EMPTY(curr) || POS_INACTIVE(curr); --curr);  return curr;}/* Find the position of the first active node in the table and return it */inline static stdhash_node **find_begin(const stdhash *h) {  return next(h, h->table - 1);}/* This function is called when it has been determined that the active   pairs in h need to be put into a different table. It also    determines what the new capacity and thresholds should be.  */inline static int rehash(stdhash *h, size_t request_size) {  stdhash_node **curr = h->table, **end = h->table_end;  stdhash_node **search, **t = 0, **t_end;  size_t new_cap, new_cap_min1, new_high, new_low, hash_val, offset;  STD_SAFE_CHECK(request_size >= h->size); /* don't allow "lossy" rehashes */  /* get a good capacity for this table -- aim for capacity to be around 4 times size */  if (get_cap_and_threshs(request_size, &new_cap, &new_high, &new_low) != 0 ||      (new_cap != 0 && !(t = (stdhash_node**) calloc(new_cap, sizeof(stdhash_node*)))))    return STD_MEM_FAILURE;  t_end = t + new_cap;  new_cap_min1 = new_cap - 1;  update_params(h);                /* pick new a and b parameters */  for (; curr != end; ++curr) {    if (POS_EMPTY(curr))           /* ignore empty positions */      continue;    else if (POS_INACTIVE(curr))   /* free inactive nodes */      free_node(*curr);                else {                         /* insert active node into t */      hash((*curr)->exphcode, h->a, h->b, new_cap_min1, &hash_val, &offset);      search = t + hash_val;      while (!POS_EMPTY(search))   /* simple search: find an empty spot */	if ((search += offset) >= t_end)	  search = t + (search - t_end);      *search = *curr;             /* insert active node */    }  }  if (h->table)     free(h->table);  h->table       = t;  h->table_end   = t_end;  h->begin       = find_begin(h);  /* find first active element or end */  h->cap_min1    = new_cap_min1;  h->high_thresh = new_high;  h->low_thresh  = new_low;  h->num_nodes   = h->size;        /* no inactive nodes right now, only active */  return STD_SUCCESS;}inline static size_t sizeof_stdkvp(const stdhash_it *it) {  STD_CONSTRUCT_CHECK(IS_IT_INITED(it));  return stdhash_kvp_size(it->hash);}/******************* stdhash_it interface **************************************/inline void *stdhash_it_key(const stdhash_it *it) {  STD_CONSTRUCT_CHECK(IS_IT_INITED(it) && IS_HASH_INITED(it->hash));  STD_BOUNDS_CHECK(IS_VALID_IT(it));  return NKEY(*it->node_pos);}inline void *stdhash_it_val(const stdhash_it *it) {  STD_CONSTRUCT_CHECK(IS_IT_INITED(it) && IS_HASH_INITED(it->hash));  STD_BOUNDS_CHECK(IS_VALID_IT(it));  return NVAL(*it->node_pos);}inline stdkvp *stdhash_it_kvp(const stdhash_it *it) {  STD_CONSTRUCT_CHECK(IS_IT_INITED(it) && IS_HASH_INITED(it->hash));  STD_BOUNDS_CHECK(IS_VALID_IT(it));  return &(*it->node_pos)->kv;}inline stdbool stdhash_it_equals(const stdhash_it *it1, const stdhash_it *it2) {  STD_CONSTRUCT_CHECK(IS_IT_INITED(it1) && IS_IT_INITED(it2));  STD_CONSTRUCT_CHECK(IS_HASH_INITED(it1->hash) && IS_HASH_INITED(it2->hash));  STD_BOUNDS_CHECK(it1->hash == it2->hash && IS_LEGAL_IT(it1) && IS_LEGAL_IT(it2));  return it1->node_pos == it2->node_pos;}inline stdbool stdhash_it_is_begin(const stdhash_it *it) {  STD_CONSTRUCT_CHECK(IS_IT_INITED(it) && IS_HASH_INITED(it->hash));  return it->node_pos == it->hash->begin;}inline stdbool stdhash_it_is_end(const stdhash_it *it) {  STD_CONSTRUCT_CHECK(IS_IT_INITED(it) && IS_HASH_INITED(it->hash));  return it->node_pos == it->hash->table_end;}inline stdhash_it *stdhash_it_seek_begin(stdhash_it *it) {  STD_CONSTRUCT_CHECK(IS_IT_INITED(it) && IS_HASH_INITED(it->hash));  it->node_pos = it->hash->begin;  return it;}inline stdhash_it *stdhash_it_seek_end(stdhash_it *it) {  STD_CONSTRUCT_CHECK(IS_IT_INITED(it) && IS_HASH_INITED(it->hash));  it->node_pos = it->hash->table_end;  return it;}inline stdhash_it *stdhash_it_next(stdhash_it *it) {  STD_CONSTRUCT_CHECK(IS_IT_INITED(it) && IS_HASH_INITED(it->hash));  STD_BOUNDS_CHECK(IS_VALID_IT(it));  it->node_pos = next(it->hash, it->node_pos);  return it;}inline stdhash_it *stdhash_it_advance(stdhash_it *it, size_t num_advance) {  STD_CONSTRUCT_CHECK(IS_IT_INITED(it) && IS_HASH_INITED(it->hash));  STD_BOUNDS_CHECK(IS_LEGAL_IT(it));  while (num_advance--) {    STD_BOUNDS_CHECK(it->node_pos != it->hash->table_end);    it->node_pos = next(it->hash, it->node_pos);
stdhash.c - 源码说明

本页面展示了「spines-ns」中的 stdhash.c 源码文件，采用 C语言编程语言编写，共 832 行代码。您可以在线阅读完整代码内容，也可以返回资源详情页下载完整源码包进行本地学习和开发。
虫虫下载站收录了大量与spines-ns相关的技术资源，包括源代码、技术文档、电路图等，是电子工程师和嵌入式开发者的专业学习平台。
⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?