        compilers, even though a large, sparse array must be created.
        Here's the general idea:

          a. Generate the wordlist as a contiguous block of keywords,
             just as before when using a switch statement.  This
             wordlist *must* be sorted by hash value.

          b. Generate the lookup array, which is an array of signed
             {chars,shorts,ints} (whichever allows full coverage of
             the wordlist dimensions).  If the value v, where
             v = lookup[hash(str,len)], is >= 0 and < TOTAL_KEYWORDS,
             then we simply use this result as a direct access into
             the wordlist array to snag the keyword for comparison.

          c. Otherwise, if v is < -TOTAL_KEYWORDS or > TOTAL_KEYWORDS
             this is an indication that we'll need to search through
             some number of duplicate hash values.  Using a hash
             linking scheme we'd then index into a different part of
             the hash table that provides the starting index and total
             length of the duplicate entries to find via linear search!

Sun Nov 12 13:48:10 1989  Doug Schmidt  (schmidt at zola.ics.uci.edu)

        * Simplified Key_List::output_min_max considerably by
          recognizing that since the keyword list was already sorted
          by hash value finding the min and max values is trivial!

        * Improved the debugging diagnostics considerably in classes
          Key_List, Hash_Table, and Gen_Perf.

        * Modified the `-s' option so that a negative argument is now
          interpreted to mean `allow the maximum associated value to
          be about x times *smaller* than the number of input keys.'
          This should help prevent massive explosion of generated hash
          table size for large keysets.

Sat Nov 11 11:31:13 1989  Doug Schmidt  (schmidt at glacier.ics.uci.edu)

        * Added a field in class Key_List that counts the total number
          of duplicate keywords, both static and dynamic.

        * Added a new Bool_Array member function that deletes the
          dynamic memory allocated to Bool_Array::storage_array.  This
          space may be needed for subsequent options, so it made sense
          to free it as soon as possible...
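
The lookup-array scheme outlined in steps a to c above can be sketched roughly as follows. This is a toy illustration, not gperf's generated code: the keyword set, the `hash` function, the `Dup_Run` side table, the `-1` empty-slot value, and the exact encoding of the duplicate marker are all assumptions made for the example, and only the negative-marker case from step c is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical toy data; none of these names or values come from gperf itself.
constexpr int TOTAL_KEYWORDS = 4;
constexpr int MAX_HASH_VALUE = 10;

// Step a: the keywords as one contiguous block, sorted by hash value.
static const char *const wordlist[TOTAL_KEYWORDS] = {
  "do",   // hash 5
  "for",  // hash 8
  "if",   // hash 10
  "in",   // hash 10 (collides with "if")
};

// Assumed side table: one {start index, entry count} pair per duplicate run.
struct Dup_Run { int start; int count; };
static const Dup_Run dup_runs[] = { {2, 2} };

// Step b: sparse signed lookup array indexed by hash value.
//   0 .. TOTAL_KEYWORDS-1 : direct index into wordlist
//   -1                    : empty slot (assumed convention)
//   < -TOTAL_KEYWORDS     : duplicate run k, stored as -(TOTAL_KEYWORDS + 1) - k
static const signed char lookup[MAX_HASH_VALUE + 1] = {
  -1, -1, -1, -1, -1, 0, -1, -1, 1, -1, -5
};

// Toy hash: keyword length plus the alphabet position of the first character.
static int hash(const char *str, std::size_t len)
{
  return static_cast<int>(len) + (static_cast<unsigned char>(str[0]) - 'a');
}

// True if the candidate keyword s equals the len-byte string str.
static bool matches(const char *s, const char *str, std::size_t len)
{
  return std::strlen(s) == len && std::memcmp(s, str, len) == 0;
}

// Returns the matching wordlist entry, or nullptr if str is not a keyword.
const char *in_word_set(const char *str, std::size_t len)
{
  const int key = hash(str, len);
  if (key < 0 || key > MAX_HASH_VALUE)
    return nullptr;

  const int v = lookup[key];
  if (v >= 0 && v < TOTAL_KEYWORDS)
    // Unique hash value: a single comparison settles it.
    return matches(wordlist[v], str, len) ? wordlist[v] : nullptr;

  if (v < -TOTAL_KEYWORDS) {
    // Step c: decode the run number, then search that run linearly.
    const int k = -v - TOTAL_KEYWORDS - 1;
    assert(k < static_cast<int>(sizeof dup_runs / sizeof dup_runs[0]));
    const Dup_Run &run = dup_runs[k];
    for (int i = run.start; i < run.start + run.count; ++i)
      if (matches(wordlist[i], str, len))
        return wordlist[i];
  }
  return nullptr;  // empty slot, or no keyword matched
}
```

With this table, `in_word_set("for", 3)` takes the direct path, while `in_word_set("in", 2)` lands on the duplicate marker and walks the two-entry run. Because the wordlist is sorted by hash value, each run of colliding keywords is contiguous, so a start index and a count are enough to describe it.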

        * Renamed file/class Alpha_Vectors to Vectors, to avoid problems
          with 14 character length filenames on SYSV.  Also changed file
          adapredefined.gperf to adadefs.gperf in the ./tests directory.

        * Modified class Options by changing the member function
          Options::total_positions to Options::get_charset_size and
          Options::set_charset_size.  These two routines now either
          return the total charset size *or* the length of the largest
          keyword if the user specifies the -k'*' (ALLCHARS) option.
          This change cleans up client code.

        * Merged all the cperf changes into gperf.

        * Made sure to explicitly initialize perfect.fewest_collisions
          to 0.

        * Cleaned up some loose ends noticed by Nels Olson.

          1. Removed `if (collisions <= perfect.fewest_collisions)'
             from Gen_Perf::affects_prev since it was superfluous.
          2. Removed the fields best_char_value and best_asso_value
             from Gen_Perf.  They were also unnecessary.
          3. Fixed a braino in the Bool_Array::bool_array_reset
             function.  Since iteration numbers can never be zero the
             `if (bool_array.iteration_number++ == 0)' must be
             `if (++bool_array.iteration_number == 0)'.
          4. Modified Std_Err::report_error so that it correctly
             handles "%%".

        * It is important to note that -D no longer enables -S.  There
          is a good reason for this change, which will become manifest
          in the next release... (suspense!).

        * Made some subtle changes to Key_List::print_switch so that it
          finally seems to work correctly.  Needs more stress testing,
          however...

        * Made a major change to the Key_List::print_switch function.
          The user can now specify the number of switch statements to
          generate via an argument to the -S option, i.e., -S1 means
          `generate 1 switch statement with all keywords in it,' -S2
          means generate 2 switch statements with 1/2 the elements in
          each one, etc.  Hopefully this will fix the problem with C
          compilers not being able to generate code for giant switch
          statements (but don't hold your breath!)

        * Changed Key_List::length function to
          Key_List::keyword_list_length.

        * Added a feature to main.c that prints out the starting
          wall-clock time before the program begins and prints out the
          ending wall-clock time when the program is finished.

        * Added the GATHER_STATISTICS code in hash-table.c so we can
          keep track of how well double hashing is doing.  Eventually,
          GATHER_STATISTICS will be added so that all instrumentation
          code can be conditionally compiled in.

        * Fixed a stupid bug in Key_List::print_switch routine.  This
          was necessary to make sure the generated switch statement
          worked correctly when *both* `natural,' i.e., static links
          and dynamic links, i.e., unresolved duplicates, hash to the
          same value.

        * Modified Bool_Array::~Bool_Array destructor so that it now
          frees the bool_array.storage_array when it is no longer
          needed.  Since this array is generally very large it makes
          sense to return the memory to the freelist when it is no
          longer in use.

        * Changed the interface to constructor Hash_Table::Hash_Table.
          This constructor is now passed a pointer to a power-of-two
          sized buffer that serves as storage for the hash table.
          Although this weakens information hiding a little bit it
          greatly reduces dynamic memory fragmentation, since we can
          now obtain the memory via a call to alloca, rather than
          malloc.  This change modified Key_List::read_keys calling
          interface.

        * Since alloca is now being used more aggressively a
          conditional compilation section was added in main.c.  Taken
          from GNU GCC, this code gets rid of any avoidable limit on
          stack size so that alloca does not fail.  It is only used if
          the -DRLIMIT_STACK symbol is defined when gperf is compiled.

        * Added warnings in option.c so that users would be informed
          that -r supersedes -i on the command-line.

        * Rewrote Gen_Perf::affects_prev.  First, the code structure
          was cleaned up considerably (removing the need for a dreaded
          goto!).
          Secondly, a major change occurred so that
          Gen_Perf::affects_prev returns FALSE (success) when
          fewest_hits gets down to whatever it was after inserting the
          previous key (instead of waiting for it to reach 0).  In
          other words, it stops trying if it can resolve the new
          collisions added by a key, even if there are still other
          old, unresolved collisions.  This modification was suggested
          by Nels Olson and seems to *greatly* increase the speed of
          gperf for large keyfiles.  Thanks Nels!

        * In a similar vein, inside the Gen_Perf::change routine the
          variable `perfect.fewest_collisions' is no longer initialized
          with the length of the keyword list.  Instead it starts out
          at 0 and is incremented by 1 every time change () is called.
          The rationale for this behavior is that there are times when
          a collision causes the number of duplicates (collisions) to
          increase by a large amount when it would presumably just
          have gone up by 1 if none of the asso_values were changed.
          That is, at the beginning of change(), you could initialize
          fewest_hits to 1+(previous value of fewest_hits) instead of
          to the number of keys.  Thanks again, Nels.

        * Replaced alloca with new in the Gen_Perf::change function.
          This should eliminate some overhead at the expense of a
          little extra memory that is never reclaimed.

        * Renamed Gen_Perf::merge_sets to
          Gen_Perf::compute_disjoint_union to reflect the change in
          behavior.

        * Added the -e option so users can supply a string containing
          the characters used to separate keywords from their
          attributes.  The default is ",\n".

        * Removed the char *uniq_set field from LIST_NODE and modified
          uses of uniq_set in perfect.c and keylist.c.  Due to changes
          to Gen_Perf::compute_disjoint_sets this field was no longer
          necessary, and its removal makes the program smaller and
          potentially faster.

        * Added lots of changes/fixes suggested by Nels Olson
          (umls.UUCP!olson@mis.ucsf.edu).  In particular:

          1. Changed Bool_Array so that it would dynamically create an
             array of unsigned shorts rather than ints if the LO_CAL
             symbol was defined during program compilation.  This cuts
             the amount of dynamic memory usage in half, which is
             important for large keyfile input.
          2. Added some additional debugging statements that print
             extra info to stderr when the -d option is enabled.
          3. Fixed a really stupid bug in Key_List::print_switch.  A
             right paren was placed at the wrong location, which broke
             strlen ().
          4. Fixed a subtle problem with printing case values when
             keylinks appear.  The logic failed to account for the
             fact that there can be keylinks *and* regular node info
             also!
          5. Changed the behavior of Key_List::read_keys so that it
             would honor -D unequivocally, i.e., it doesn't try to
             turn off dup handling if the user requests it, even if
             there are no immediate links in the keyfile input.
          6. Modified the -j option so that -j 0 means `try random
             values when searching for a way to resolve collisions.'
          7. Added a field `num_done' to the Gen_Perf struct.  This is
             used to report information collected when trying to
             resolve hash collisions.
          8. Modified the merge_sets algorithm to perform a disjoint
             union of two multisets.  This ensures that subsequent
             processing in Gen_Perf::affects_prev doesn't waste time
             trying to change an associated value that is shared
             between two conflicting keywords.
          9. Modified Gen_Perf::affects_prev so that it doesn't try
             random jump values unless the -j 0 option is enabled.
          10. Fixed a silly bug in Gen_Perf::change.  This problem
              caused gperf to seg fault when the -k* option was given
              and the keyfile had long keywords.

Sun Oct 29 00:18:55 1989  Doug Schmidt  (schmidt at siam.ics.uci.edu)

        * Modified class-specific new operations for Read_Line and
          List_Node so they don't fail if SIZE is larger than twice
          the previous buffer size.
          Note we double buffer size every time the previous buffer
          runs out, as a heuristic to reduce future calls to malloc.

Sun Oct 22 13:49:43 1989  Doug Schmidt  (schmidt at glacier.ics.uci.edu)

        * Updated gperf version number to 2.0.  Sent to Doug Lea for
          incorporation into the long-awaited `official' libg++ 1.36
          release!

        * Thanks to Nels Olson a silly bug in Gen_Perf::change () was
          fixed.  This problem caused gperf to seg fault when the -k*
          option was given and the keyfile had long keywords.

        * Modified Key_List::print_hash_function so that it outputs
          max_hash_value + 1 (rather than just max_hash_value) for any
          associated value entries that don't correspond to keyword
          charset characters.  This should speed up rejection of
          non-keyword strings a little in some cases.

Sat Oct 21 19:28:36 1989  Doug Schmidt  (schmidt at crimee.ics.uci.edu)

        * Fixed Key_List::print_hash_function so that it no longer
          outputs things like `return 0 + ...'  Although this probably
          gets optimized away by even the worst C compilers there
          isn't any point tempting fate... ;-)

        * Fixed class List_Node's constructor so that it wouldn't a
          priori refuse to consider trying to hash keys whose length
          is less than the smallest user-specified key position.  It
          turns out this is not a problem unless the user also
          specifies the -n (NOLENGTH) option, in which case such keys
          most likely don't have a prayer of being hashed correctly!

        * Changed the name of the generated lookup table from
          `Hash_Table' to `asso_value' to be consistent with the gperf
          paper.

Tue Oct 17 14:19:48 1989  Doug Schmidt  (schmidt at glacier.ics.uci.edu)

        * Added a flag GATHER_STATISTICS in the Makefile.  If defined
          during compilation this turns on certain collection
          facilities that track the performance of gperf during its
          execution.  In particular, I want to see how many collisions
          occur for the double hashing Hash_Table.

        * Added a safety check so that we don't screw up if the total
          number of `resets' of the Bool_Array exceeds MAX_INT.
          Since this number is around 2^31 it is unlikely that this
          would ever occur for most input, but why take the risk?

        * Changed the behavior for the -a (ANSI) option so that the
          generated prototypes use int rather than size_t for the LEN
          parameter.  It was too ugly having to #include <stddef.h>
          all over the place...

Mon Oct 16 11:00:35 1989  Doug Schmidt  (schmidt at crimee.ics.uci.edu)

        * Continued to work on the gperf paper for the USENIX C++
          conference.  At some point this will be merged back into the
          gperf documentation...

Sat Oct 14 20:29:43 1989  Doug Schmidt  (schmidt at siam.ics.uci.edu)

        * Added a majorly neat hack to Bool_Array, suggested by rfg.
          The basic idea was to throw away the Ullman array technique.
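
The entry above does not spell out what replaced the Ullman array. Other entries in this log mention Bool_Array iteration numbers that can never be zero, a guard against the reset count overflowing, and an optional array of unsigned shorts, all of which would be consistent with a generation-stamp array. The following is a minimal sketch under that assumption; the class and member names are hypothetical and this is not the actual Bool_Array code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a generation-stamped boolean array.
// A slot counts as "set" only if it holds the current generation number,
// so clearing the whole array is normally just a counter increment.
class Stamped_Bool_Array {
public:
  explicit Stamped_Bool_Array(std::size_t size)
    : storage_(size, 0), generation_(1) {}

  // Marks slot i. Returns true if it was already marked since the last reset.
  bool find_or_mark(std::size_t i)
  {
    assert(i < storage_.size());
    if (storage_[i] == generation_)
      return true;
    storage_[i] = generation_;
    return false;
  }

  // Clears every mark, usually in constant time.
  void reset()
  {
    // Pre-increment matters: 0 means "never marked", so the counter must
    // be tested after it moves, and must never rest at 0.
    if (++generation_ == 0) {
      // The counter wrapped around. Old stamps could now alias new ones,
      // so wipe the storage for real and restart at generation 1.
      std::fill(storage_.begin(), storage_.end(), 0);
      generation_ = 1;
    }
  }

private:
  std::vector<unsigned short> storage_;  // one stamp per slot
  unsigned short generation_;            // current generation, never 0
};
```

The trade-off in this sketch is one full clear every 65535 resets in exchange for constant-time resets the rest of the time; a wider stamp type makes the full clear rarer at the cost of more memory per slot.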