📄 hash.c

📁 wget (command line browser) source code
💻 C
📖 第 1 页 / 共 2 页
字号:
12 下一页
/* Hash tables.   Copyright (C) 2000, 2001 Free Software Foundation, Inc.This file is part of GNU Wget.GNU Wget is free software; you can redistribute it and/or modifyit under the terms of the GNU General Public License as published bythe Free Software Foundation; either version 2 of the License, or (atyour option) any later version.GNU Wget is distributed in the hope that it will be useful,but WITHOUT ANY WARRANTY; without even the implied warranty ofMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See theGNU General Public License for more details.You should have received a copy of the GNU General Public Licensealong with Wget; if not, write to the Free SoftwareFoundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.In addition, as a special exception, the Free Software Foundationgives permission to link the code of its release of Wget with theOpenSSL project's "OpenSSL" library (or with modified versions of itthat use the same license as the "OpenSSL" library), and distributethe linked executables.  You must obey the GNU General Public Licensein all respects for all of the code used other than "OpenSSL".  If youmodify this file, you may extend this exception to your version of thefile, but you are not obligated to do so.  If you do not wish to doso, delete this exception statement from your version.  */#ifdef HAVE_CONFIG_H# include <config.h>#endif#ifdef HAVE_STRING_H# include <string.h>#else# include <strings.h>#endif /* HAVE_STRING_H */#include <stdlib.h>#include <assert.h>#include "wget.h"#include "utils.h"#include "hash.h"#ifdef STANDALONE# undef xmalloc# undef xrealloc# undef xfree# define xmalloc malloc# define xrealloc realloc# define xfree free# undef TOLOWER# define TOLOWER(x) ('A' <= (x) && (x) <= 'Z' ? (x) - 32 : (x))#endif/* INTERFACE:   Hash tables are an implementation technique used to implement   mapping between objects.  Assuming a good hashing function is used,   they provide near-constant-time access and storing of information.   Duplicate keys are not allowed.   This file defines the following entry points: hash_table_new   creates a hash table, and hash_table_destroy deletes it.   hash_table_put establishes a mapping between a key and a value.   hash_table_get retrieves the value that corresponds to a key.   hash_table_contains queries whether a key is stored in a table at   all.  hash_table_remove removes a mapping that corresponds to a   key.  hash_table_map allows you to map through all the entries in a   hash table.  hash_table_clear clears all the entries from the hash   table.   The number of mappings in a table is not limited, except by the   amount of memory.  As you add new elements to a table, it regrows   as necessary.  If you have an idea about how many elements you will   store, you can provide a hint to hash_table_new().   The hashing and equality functions depend on the type of key and   are normally provided by the user.  For the special (and frequent)   case of using string keys, you can use the pre-canned   make_string_hash_table(), which provides an efficient string   hashing function, and a string equality wrapper around strcmp().   When specifying your hash and test functions, make sure the   following holds true:   - The test function returns non-zero for keys that are considered     "equal", zero otherwise.   - The hash function returns a number that represents the     "distinctness" of the object.  In more precise terms, it means     that for any two objects that test "equal" under the test     function, the hash function MUST produce the same result.     This does not mean that each distinct object must produce a     distinct value, only that non-distinct objects must produce the     same values!  For instance, a hash function that returns 0 for     any given object is a perfectly valid (albeit extremely bad) hash     function.  A hash function that hashes a string by adding up all     its characters is another example of a valid (but quite bad) hash     function.     The above stated rule is quite easy to enforce.  For example, if     your testing function compares strings case-insensitively, all     your function needs to do is lower-case the string characters     before calculating a hash.  That way you have easily guaranteed     that case differences will not result in a different hash.   - If you care about performance, choose a hash function with as     good "spreading" as possible.  A good hash function will react to     even a small change in its input with a completely different     resulting hash.  Finally, don't make the hash function itself     overly slow, because you'll be incurring a non-negligible     overhead to reads and writes to the hash table.   Note that neither keys nor values are copied when inserted into the   hash table, so they must exist for the lifetime of the table.  This   means that e.g. the use of static strings is OK, but objects with a   shorter life-time need to be copied (with strdup() or the like in   the case of strings) before being inserted.  *//* IMPLEMENTATION:   All the hash mappings (key-value pairs of pointers) are stored in a   contiguous array.  The position of each mapping is determined by   the hash value of its key and the size of the table: location :=   hash(key) % size.  If two different keys end up on the same   position (hash collision), the one that came second is placed at   the next empty position following the occupied place.  This   collision resolution technique is called "linear probing".   There are more advanced collision resolution mechanisms (quadratic   probing, double hashing), but we don't use them because they incur   more non-sequential access to the array, which results in worse   cache behavior.  Linear probing works well as long as the   fullness/size ratio is kept below 75%.  We make sure to regrow or   rehash the hash table whenever this threshold is exceeded.   Collisions make deletion tricky because finding collisions again   relies on new empty spots not being created.  That's why   hash_table_remove is careful to rehash the mappings that follow the   deleted one.  *//* When hash table's fullness exceeds this threshold, the hash table   is resized.  */#define HASH_FULLNESS_THRESHOLD 0.75/* The hash table size is multiplied by this factor with each resize.   This guarantees infrequent resizes.  */#define HASH_RESIZE_FACTOR 2struct mapping {  void *key;  void *value;};struct hash_table {  unsigned long (*hash_function) PARAMS ((const void *));  int (*test_function) PARAMS ((const void *, const void *));  int size;			/* size of the array */  int count;			/* number of non-empty, non-deleted                                   fields. */  int resize_threshold;		/* after size exceeds this number of				   entries, resize the table.  */  int prime_offset;		/* the offset of the current prime in				   the prime table. */  struct mapping *mappings;	/* the array of mapping pairs. */};/* We use NULL key to mark a mapping as empty.  It is consequently   illegal to store NULL keys.  */#define NON_EMPTY(mp) (mp->key != NULL)/* "Next" mapping is the mapping after MP, but wrapping back to   MAPPINGS when MP would reach MAPPINGS+SIZE.  */#define NEXT_MAPPING(mp, mappings, size) (mp != mappings + (size - 1)	\					  ? mp + 1 : mappings)/* Loop over non-empty mappings starting at MP. */#define LOOP_NON_EMPTY(mp, mappings, size)				\  for (; NON_EMPTY (mp); mp = NEXT_MAPPING (mp, mappings, size))/* #### We might want to multiply with the "golden ratio" here to get   better randomness for keys that do not result from a good hash   function.  This is currently not a problem in Wget because we only   use the string hash tables.  */#define HASH_POSITION(ht, key) (ht->hash_function (key) % ht->size)/* Find a prime near, but greather than or equal to SIZE.  Of course,   the primes are not calculated, but looked up from a table.  The   table does not contain all primes in range, just a selection useful   for this purpose.   PRIME_OFFSET is a minor optimization: if specified, it starts the   search for the prime number beginning with the specific offset in   the prime number table.  The final offset is stored in the same   variable.  */static intprime_size (int size, int *prime_offset){  static const unsigned long primes [] = {    13, 19, 29, 41, 59, 79, 107, 149, 197, 263, 347, 457, 599, 787, 1031,    1361, 1777, 2333, 3037, 3967, 5167, 6719, 8737, 11369, 14783,    19219, 24989, 32491, 42257, 54941, 71429, 92861, 120721, 156941,    204047, 265271, 344857, 448321, 582821, 757693, 985003, 1280519,    1664681, 2164111, 2813353, 3657361, 4754591, 6180989, 8035301,    10445899, 13579681, 17653589, 22949669, 29834603, 38784989,    50420551, 65546729, 85210757, 110774011, 144006217, 187208107,    243370577, 316381771, 411296309, 534685237, 695090819, 903618083,    1174703521, 1527114613, 1985248999,    (unsigned long)0x99d43ea5, (unsigned long)0xc7fa5177  };  int i = *prime_offset;  for (; i < countof (primes); i++)    if (primes[i] >= size)      {	/* Set the offset to the next prime.  That is safe because,	   next time we are called, it will be with a larger SIZE,	   which means we could never return the same prime anyway.	   (If that is not the case, the caller can simply reset	   *prime_offset.)  */	*prime_offset = i + 1;	return primes[i];      }  abort ();  return 0;}/* Create a hash table with hash function HASH_FUNCTION and test   function TEST_FUNCTION.  The table is empty (its count is 0), but   pre-allocated to store at least ITEMS items.   ITEMS is the number of items that the table can accept without   needing to resize.  It is useful when creating a table that is to   be immediately filled with a known number of items.  In that case,   the regrows are a waste of time, and specifying ITEMS correctly   will avoid them altogether.   Note that hash tables grow dynamically regardless of ITEMS.  The   only use of ITEMS is to preallocate the table and avoid unnecessary   dynamic regrows.  Don't bother making ITEMS prime because it's not   used as size unchanged.  To start with a small table that grows as   needed, simply specify zero ITEMS.   If HASH_FUNCTION is not provided, identity table is assumed,   i.e. key pointers are compared as keys.  If you want strings with   equal contents to hash the same, use make_string_hash_table.  */struct hash_table *hash_table_new (int items,		unsigned long (*hash_function) (const void *),		int (*test_function) (const void *, const void *)){  int size;  struct hash_table *ht    = (struct hash_table *)xmalloc (sizeof (struct hash_table));  ht->hash_function = hash_function ? hash_function : ptrhash;  ht->test_function = test_function ? test_function : ptrcmp;  ht->prime_offset = 0;  /* Calculate the size that ensures that the table will store at     least ITEMS keys without the need to resize.  */  size = 1 + items / HASH_FULLNESS_THRESHOLD;  size = prime_size (size, &ht->prime_offset);  ht->size = size;  ht->resize_threshold = size * HASH_FULLNESS_THRESHOLD;  /*assert (ht->resize_threshold >= items);*/  ht->mappings = xmalloc (ht->size * sizeof (struct mapping));  memset (ht->mappings, '\0', ht->size * sizeof (struct mapping));  ht->count = 0;  return ht;}/* Free the data associated with hash table HT. */voidhash_table_destroy (struct hash_table *ht){  xfree (ht->mappings);  xfree (ht);}/* The heart of most functions in this file -- find the mapping whose   KEY is equal to key, using linear probing.  Returns the mapping   that matches KEY, or the first empty mapping if none matches.  */static inline struct mapping *find_mapping (const struct hash_table *ht, const void *key){  struct mapping *mappings = ht->mappings;  int size = ht->size;  struct mapping *mp = mappings + HASH_POSITION (ht, key);  int (*equals) PARAMS ((const void *, const void *)) = ht->test_function;  LOOP_NON_EMPTY (mp, mappings, size)    if (equals (key, mp->key))      break;  return mp;}/* Get the value that corresponds to the key KEY in the hash table HT.   If no value is found, return NULL.  Note that NULL is a legal value   for value; if you are storing NULLs in your hash table, you can use   hash_table_contains to be sure that a (possibly NULL) value exists   in the table.  Or, you can use hash_table_get_pair instead of this   function.  */void *hash_table_get (const struct hash_table *ht, const void *key){  struct mapping *mp = find_mapping (ht, key);  if (NON_EMPTY (mp))    return mp->value;  else    return NULL;}/* Like hash_table_get, but writes out the pointers to both key and   value.  Returns non-zero on success.  */inthash_table_get_pair (const struct hash_table *ht, const void *lookup_key,		     void *orig_key, void *value){  struct mapping *mp = find_mapping (ht, lookup_key);  if (NON_EMPTY (mp))    {      if (orig_key)	*(void **)orig_key = mp->key;      if (value)	*(void **)value = mp->value;      return 1;    }  else    return 0;}/* Return 1 if HT contains KEY, 0 otherwise. */inthash_table_contains (const struct hash_table *ht, const void *key){  struct mapping *mp = find_mapping (ht, key);  return NON_EMPTY (mp);}
12 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -