📄 userguide.txt

📁 uthash 是一个C语言的哈希表
💻 TXT
📖 第 1 页 / 共 3 页
字号:
上一页 1 23
|===============================================================================Which hash function is best?~~~~~~~~~~~~~~~~~~~~~~~~~~~~You can easily determine the best hash function for your key domain. To do so,you'll need to run your program once in a data-collection pass, and then runthe collected data through an included analysis utility.First you must build the analysis utility. From the top-level directory,    cd tests/    makeWe'll use `test14.c` to demonstrate the data-collection and analysis steps(here using `sh` syntax to redirect file descriptor 3 to a file):.Using keystats--------------------------------------------------------------------------------% cc -DHASH_EMIT_KEYS=3 -I../src -o test14 test14.c% ./test14 3>test14.keys% ./keystats test14.keysfcn  ideal%     #items   #buckets  dup%  fl   add_usec  find_usec  del-all usec---  ------ ---------- ---------- -----  -- ---------- ----------  ------------FNV   90.3%       1219        512    0%  ok        244        136            44SAX   88.7%       1219        512    0%  ok        201        145            46OAT   87.2%       1219        256    0%  ok        166        214            40JEN   86.7%       1219        256    0%  ok        266        221            40BER   86.2%       1219        256    0%  ok        171        155            45--------------------------------------------------------------------------------[NOTE]The value 3 in `-DHASH_EMIT_KEYS=3` is a file descriptor. Any file descriptorthat your program doesn't use for its own purposes can be used instead of 3. The data-collection mode enabled by `-DHASH_EMIT_KEYS=x` should not be used inproduction code. Usually, you should just pick the first hash function that is listed. Here, thisis `FNV`.  This is the function that provides the most even distribution foryour keys. If several have the same `ideal%`, then choose the fastest oneaccording to the `find_usec` column.keystats column reference~~~~~~~~~~~~~~~~~~~~~~~~~fcn::    symbolic name of hash functionideal%::    The percentage of items in the hash table which can be looked up within an    ideal number of steps. (Further explained below).#items::    the number of keys that were read in from the emitted key file#buckets::    the number of buckets in the hash after all the keys were addeddup%::    the percent of duplicate keys encountered in the emitted key file.    Duplicates keys are filtered out to maintain key uniqueness. (Duplicates    are normal.  For example, if the application adds an item to a hash,    deletes it, then re-adds it, the key is written twice to the emitted file.)  flags::    this is either `ok`, or `nx` (noexpand) if the expansion inhibited flag is    set, described in Appendix B.  It is not recommended to use a hash function    that has the `noexpand` flag set.add_usec::    the clock time in microseconds required to add all the keys to a hashfind_usec::    the clock time in microseconds required to look up every key in the hashdel-all usec::    the clock time in microseconds required to delete every item in the hashideal%~~~~~~.What is ideal%?*****************************************************************************The 'n' items in a hash are distributed into 'k' buckets. Ideally each bucketwould contain an equal share '(n/k)' of the items. In other words, the maximumlinear position of any item in a bucket chain would be 'n/k' if every bucket isequally used. If some buckets are overused and others are underused, theoverused buckets will contain items whose linear position surpasses 'n/k'.Such items are considered non-ideal. As you might guess, `ideal%` is the percentage of ideal items in the hash. Theseitems have favorable linear positions in their bucket chains.  As `ideal%`approaches 100%, the hash table approaches constant-time lookup performance.*****************************************************************************[[Appendix_B]]Appendix B: Expansion internals-------------------------------Internally this hash manages the number of buckets, with the goal of havingenough buckets so that each one contains only a small number of items.  .Why does the number of buckets matter?********************************************************************************When looking up an item by its key, this hash scans linearly through the itemsin the appropriate bucket. In order for the linear scan to run in constanttime, the number of items in each bucket must be bounded. This is accomplishedby increasing the number of buckets as needed.********************************************************************************Normal expansion~~~~~~~~~~~~~~~~This hash attempts to keep fewer than 10 items in each bucket. When an item isadded that would cause a bucket to exceed this number, the number of buckets inthe hash is doubled and the items are redistributed into the new buckets. In anideal world, each bucket will then contain half as many items as it did before.Bucket expansion occurs automatically and invisibly as needed. There isno need for the application to know when it occurs. Per-bucket expansion threshold^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^Normally all buckets share the same threshold (10 items) at which point bucketexpansion is triggered. During the process of bucket expansion, uthash canadjust this expansion-trigger threshold on a per-bucket basis if it sees thatcertain buckets are over-utilized. When this threshold is adjusted, it goes from 10 to a multiple of 10 (for thatparticular bucket).  The multiple is based on how many times greater the actualchain length is than the ideal length. It is a practical measure to reduceexcess bucket expansion in the case where a hash function over-utilizes a fewbuckets but has good overall distribution. However, if the overall distributiongets too bad, uthash changes tactics.Inhibited expansion~~~~~~~~~~~~~~~~~~~You usually don't need to know or worry about this, particularly if you usedthe `keystats` utility during development to select a good hash for your keys.A hash function may yield an uneven distribution of items across the buckets.In moderation this is not a problem. Normal bucket expansion takes place asthe chain lengths grow. But when significant imbalance occurs (because the hashfunction is not well suited to the key domain), bucket expansion may beineffective at reducing the chain lengths. Imagine a very bad hash function which always puts every item in bucket 0. Nomatter how many times the number of buckets is doubled, the chain length ofbucket 0 stays the same. In a situation like this, the best behavior is to stop expanding, and accept O(n) lookup performance. This is what uthashdoes. It degrades gracefully if the hash function is ill-suited to the keys.If two consecutive bucket expansions yield `ideal%` values below 50%, uthashinhibits expansion for that hash table.  Once set, the 'bucket expansioninhibited' flag remains in effect as long as the hash has items in it.  Inhibited expansion may cause `HASH_FIND` to exhibit worse than constant-timeperformance. Appendix C: Hooks-----------------You don't need to use these hooks- they are only here if you want to modifythe behavior of uthash.  Hooks can be used to change how uthash allocates memory, and to run code in response to certain internal events. malloc/free~~~~~~~~~~~By default this hash implementation uses `malloc` and `free` to manage memory.If your application uses its own custom allocator, this hash can use them too..Specifying alternate memory management functions----------------------------------------------------------------------------#include "uthash.h"/* undefine the defaults */#undef uthash_bkt_malloc#undef uthash_bkt_free#undef uthash_tbl_malloc#undef uthash_tbl_free/* re-define, specifying alternate functions */#define uthash_bkt_malloc(sz) my_malloc(sz)  /* for UT_hash_bucket */#define uthash_bkt_free(ptr) my_free(ptr)    #define uthash_tbl_malloc(sz) my_malloc(sz)  /* for UT_hash_table  */#define uthash_tbl_free(ptr) my_free(ptr)    ...----------------------------------------------------------------------------Why are there two pairs of malloc/free functions?^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^One deals with `UT_hash_bucket` structures, the other with `UT_hash_table`structures.  While the two structures don't 'need' to have their own allocationand free functions (indeed, the default is just to use `malloc` and `free` forboth), they exist separately for each structure for convenient integration withpool or "slab" type allocators. This type of allocator provides a separate poolfor each structure.Out of memory~~~~~~~~~~~~~If memory allocation fails (i.e., the malloc function returned `NULL`), thedefault behavior is to terminate the process by calling `exit(-1)`. This canbe modified by re-defining the `uthash_fatal` macro.     #undef uthash_fatal    #define uthash_fatal(msg) my_fatal_function(msg);The fatal function should terminate the process; uthash does not support"returning a failure" if memory cannot be allocated.Internal events~~~~~~~~~~~~~~~There is no need for the application to set these hooks or take action inresponse to these events. They are mainly for diagnostic purposes.These two hooks are "notification" hooks which get executed if uthash isexpanding buckets, or setting the 'bucket expansion inhibited' flag. Normallyboth of these hooks are undefined and thus compile away to nothing. Expansion notification^^^^^^^^^^^^^^^^^^^^^^There is a hook for the bucket expansion event.  .Bucket expansion hook----------------------------------------------------------------------------#include "uthash.h"#undef uthash_expand_fyi #define uthash_expand_fyi(tbl) printf("expanded to %d buckets\n", tbl->num_buckets)...----------------------------------------------------------------------------Expansion-inhibited notification^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^This hook can be defined to code to execute in the event that uthash decides toset the 'bucket expansion inhibited' flag. .Bucket expansion inhibited hook----------------------------------------------------------------------------#include "uthash.h"#undef uthash_noexpand_fyi#define uthash_noexpand_fyi printf("warning: bucket expansion inhibited\n");...----------------------------------------------------------------------------Appendix D: Debug mode----------------------If a program that uses this hash is compiled with `-DHASH_DEBUG=1`, a specialinternal consistency-checking mode is activated.  In this mode, the integrityof the whole hash is checked following every add or delete operation.  This isfor debugging the uthash software only, not for use in production code. In the `tests/` directory, running `make debug` will run all the tests inthis mode.In this mode, any internal errors in the hash data structure will cause amessage to be printed to `stderr` and the program to exit.The `UT_hash_handle` data structure includes `next`, `prev`, `hh_next` and`hh_prev` fields.  The former two fields determine the "application" ordering(that is, insertion order-- the order the items were added).  The latter twofields determine the "bucket chain" order.  These link the `UT_hash_handles`together in a doubly-linked list that is a bucket chain.Checks performed in `-DHASH_DEBUG=1` mode:- the hash is walked in its entirety twice: once in 'bucket' order and a  second time in 'application' order- the total number of items encountered in both walks is checked against the  stored number- during the walk in 'bucket' order, each item's `hh_prev` pointer is compared  for equality with the last visited item- during the walk in 'application' order, each item's `prev` pointer is compared  for equality with the last visited item.Macro debugging:********************************************************************************Sometimes it's difficult to interpret a compiler warning when all it hasis a warning and a line number containing a macro call. In the case of uthash,the macro can expand to dozens of lines. In this case, it is helpful to expandthe macros to get a more precise idea of the line number where the error occurs.  gcc -E -I../src test1.c > /tmp/a.c  egrep -v '^#' /tmp/a.c > /tmp/b.c  indent /tmp/b.c  gcc -o /tmp/b /tmp/b.cThe last line compiles the original program (test1.c) with all macros expanded.So any compiler error or warning will have a line number that can be used topinpoint the offending line precisely within the expanded macro call.********************************************************************************Appendix E: Thread safety-------------------------You can use uthash in a threaded program. But you must do the locking. Use aread-write lock to protect against concurrent writes. It is ok to haveconcurrent readers (since uthash 1.5). For example using pthreads you can create an rwlock like this:  pthread_rwlock_t lock;  if (pthread_rwlock_init(&lock,NULL) != 0) fatal("can't create rwlock");Then, readers must acquire the read lock before doing any `HASH_FIND` calls orbefore iterating over the hash elements:  if (pthread_rwlock_rdlock(&lock) != 0) fatal("can't get rdlock");  HASH_FIND_INT(elts, &i, e);  pthread_rwlock_unlock(&lock);  Writers must acquire the exclusive write lock before doing any update. Add,delete, and sort are all updates that must be locked.  if (pthread_rwlock_wrlock(&lock) != 0) fatal("can't get wrlock");  HASH_DEL(elts, e);  pthread_rwlock_unlock(&lock);If you prefer, you can use a mutex instead of a read-write lock, but this willreduce reader concurrency to a single thread at a time. An example program using uthash with a read-write lock is included in`tests/threads/test1.c`.[[Appendix_F]]Appendix F: Macro reference---------------------------Convenience macros~~~~~~~~~~~~~~~~~~The convenience macros do the same thing as the generalized macros, butrequire fewer arguments.  In order to use the convenience macros, 1. the structure's `UT_hash_handle` field must be named `hh`, and2. for add or find, the key field must be of type `int` or `char[]` .Convenience macros[width="90%",cols="10m,30m",grid="none",options="header"]|===============================================================================|macro         | arguments|HASH_ADD_INT  | (head, keyfield_name, item_ptr)|HASH_FIND_INT | (head, key_ptr, item_ptr)|HASH_ADD_STR  | (head, keyfield_name, item_ptr)|HASH_FIND_STR | (head, key_ptr, item_ptr)|HASH_DEL      | (head, item_ptr)|HASH_SORT     | (head, cmp)|HASH_COUNT    | (head)|===============================================================================General macros~~~~~~~~~~~~~~These macros add, find, delete and sort the items in a hash.  You need to use the general macros if your `UT_hash_handle` is named something otherthan `hh`, or if your key's data type isn't `int` or `char[]`..General macros[width="90%",cols="10m,30m",grid="none",options="header"]|===============================================================================|macro          | arguments|HASH_ADD       | (hh_name, head, keyfield_name, key_len, item_ptr)|HASH_ADD_KEYPTR| (hh_name, head, key_ptr, key_len, item_ptr)|HASH_FIND      | (hh_name, head, key_ptr, key_len, item_ptr)|HASH_DELETE    | (hh_name, head, item_ptr)|HASH_SRT       | (hh_name, head, cmp)|HASH_CNT       | (hh_name, head)|===============================================================================[NOTE]`HASH_ADD_KEYPTR` is used when the structure contains a pointer to thekey, rather than the key itself. Argument descriptions^^^^^^^^^^^^^^^^^^^^^hh_name::    name of the `UT_hash_handle` field in the structure. Conventionally called    `hh`.head::    the structure pointer variable which acts as the "head" of the hash. So    named because it initially points to the first item that is added to the hash.keyfield_name::    the name of the key field in the structure. (In the case of a multi-field    key, this is the first field of the key). If you're new to macros, it    might seem strange to pass the name of a field as a parameter. See    <<validc,note>>.key_len::    the length of the key field in bytes. E.g. for an integer key, this is    `sizeof(int)`, while for a string key it's `strlen(key)`. (For a    multi-field key, see the notes in this guide on calculating key length).key_ptr::    for `HASH_FIND`, this is a pointer to the key to look up in the hash    (since it's a pointer, you can't directly pass a literal value here). For    `HASH_ADD_KEYPTR`, this is the address of the key of the item being added.item_ptr::    pointer to the structure being added, deleted or looked up. This is an    input parameter for HASH_ADD and HASH_DELETE macros, and an output    parameter for HASH_FIND.cmp::    pointer to comparison function which accepts two arguments (pointers to    items to compare) and returns an int specifying whether the first item    should sort before, equal to, or after the second item (like `strcmp`).// vim: set tw=80 wm=2 syntax=asciidoc:
上一页 1 23
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -