📄 gperf.texinfo
字号:
The positions are separated by commas, @emph{e.g.}, @samp{-k 9,4,13,14};ranges may be used, @emph{e.g.}, @samp{-k 2-7}; and positions may occurin any order. Furthermore, the meta-character '*' causes the generatedhash function to consider @strong{all} character positions in each key,whereas '$' instructs the hash function to use the ``final character''of a key (this is the only way to use a character position greater than126, incidentally).For instance, the option @samp{-k 1,2,4,6-10,'$'} generates a hashfunction that considers positions 1,2,4,6,7,8,9,10, plus the lastcharacter in each key (which may differ for each key, obviously). Keyswith length less than the indicated key positions work properly, sinceselected key positions exceeding the key length are simply notreferenced in the hash function.@item -K @var{key name}By default, the program assumes the structure component identifier forthe keyword is ``name.'' This option allows an arbitrary choice ofidentifier for this component, although it still must occur as the firstfield in your supplied @code{struct}.@item -lCompare key lengths before trying a string comparison. This might cutdown on the number of string comparisons made during the lookup, sincekeys with different lengths are never compared via @code{strcmp}.However, using @samp{-l} might greatly increase the size of thegenerated C code if the lookup table range is large (which implies thatthe switch option @samp{-S} is not enabled), since the length tablecontains as many elements as there are entries in the lookup table.@item -L @var{generated language name}Instructs @code{gperf} to generate code in the language specified by theoption's argument. Languages handled are currently C++ and C. Thedefault is C.@item -nInstructs the generator not to include the length of a keyword whencomputing its hash value. This may save a few assembly instructions inthe generated lookup table.@item -N @var{lookup function name}Allows you to specify the name for the generated lookup function.Default name is `in_word_set.' This option permits completely automaticgeneration of perfect hash functions, especially when multiple generatedhash functions are used in the same application.Note that if -LC++ is enabled this option provides the name of thegenerated C++ class (since the name of the generated lookup memberfunction is then @code{operator ()}).@item -oReorders the keywords by sorting the keywords so that frequentlyoccuring key position set components appear first. A second reorderingpass follows so that keys with ``already determined values'' are placedtowards the front of the keylist. This may decrease the time requiredto generate a perfect hash function for many keyword sets, and alsoproduce more minimal perfect hash functions. The reason for this isthat the reordering helps prune the search time by handling inevitablecollisions early in the search process. On the other hand, if thenumber of keywords is @emph{very} large using @samp{-o} may@emph{increase} @code{gperf}'s execution time, since collisions will beginearlier and continue throughout the remainder of keyword processing.See Cichelli's paper from the January 1980 Communications of the ACM fordetails.@item -pChanges the return value of the generated function @code{in_word_set}from boolean (@emph{i.e.}, 0 or 1), to either type ``pointer touser-defined struct,'' (if the @samp{-t} option is enabled), or simplyto @code{char *}, if @samp{-t} is not enabled. This option is mostuseful when the @samp{-t} option (allowing user-defined structs) isused. For example, it is possible to automatically generate the GNU Creserved word lookup routine with the options @samp{-p} and @samp{-t}.@item -rUtilizes randomness to initialize the associated values table. Thisfrequently generates solutions faster than using deterministicinitialization (which starts all associated values at 0). Furthermore,using the randomization option generally increases the size of thetable. If @code{gperf} has difficultly with a certain keyword set try using@samp{-r} or @samp{-D}.@item -s @var{size-multiple}Affects the size of the generated hash table. The numeric argument forthis option indicates ``how many times larger or smaller'' the maximumassociated value range should be, in relationship to the number of keys.If the @var{size-multiple} is negative the maximum associated value iscalculated by @emph{dividing} it into the total number of keys. Forexample, a value of 3 means ``allow the maximum associated value to beabout 3 times larger than the number of input keys.'' Conversely, a value of -3 means ``allow the maximum associated value tobe about 3 times smaller than the number of input keys.'' Negativevalues are useful for limiting the overall size of the generated hashtable, though this usually increases the number of duplicate hashvalues.If `generate switch' option @samp{-S} is @emph{not} enabled, the maximumassociated value influences the static array table size, and a largertable should decrease the time required for an unsuccessful search, atthe expense of extra table space.The default value is 1, thus the default maximum associated value aboutthe same size as the number of keys (for efficiency, the maximumassociated value is always rounded up to a power of 2). The actualtable size may vary somewhat, since this technique is essentially aheuristic. In particular, setting this value too high slows down@code{gperf}'s runtime, since it must search through a much larger rangeof values. Judicious use of the @samp{-f} option helps alleviate thisoverhead, however.@item -S @var{total switch statements}Causes the generated C code to use a @code{switch} statement scheme,rather than an array lookup table. This can lead to a reduction in bothtime and space requirements for some keyfiles. The argument to thisoption determines how many @code{switch} statements are generated. Avalue of 1 generates 1 @code{switch} containing all the elements, avalue of 2 generates 2 tables with 1/2 the elements in each@code{switch}, etc. This is useful since many C compilers cannotcorrectly generate code for large @code{switch} statements. This optionwas inspired in part by Keith Bostic's original C program.@item -tAllows you to include a @code{struct} type declaration for generatedcode. Any text before a pair of consecutive %% is consider part of thetype declaration. Key words and additional fields may follow this, onegroup of fields per line. A set of examples for generating perfect hashtables and functions for Ada, C, and G++, Pascal, and Modula 2 and 3reserved words are distributed with this release.@item -TPrevents the transfer of the type declaration to the output file. Usethis option if the type is already defined elsewhere.@item -vPrints out the current version number.@item -Z @var{class name}Allow user to specify name of generated C++ class. Default name is@code{Perfect_Hash}.@end table@end itemize@node Bugs, Projects, Options, Top@chapter Known Bugs and Limitations with @code{gperf}The following are some limitations with the current release of@code{gperf}:@itemize @bullet@itemThe @code{gperf} utility is tuned to execute quickly, and works quicklyfor small to medium size data sets (around 1000 keywords). It isextremely useful for maintaining perfect hash functions for compilerkeyword sets. Several recent enhancements now enable @code{gperf} towork efficiently on much larger keyword sets (over 15,000 keywords).When processing large keyword sets it helps greatly to have over 8 megsof RAM.However, since @code{gperf} does not backtrack no guaranteed solutionoccurs on every run. On the other hand, it is usually easy to obtain asolution by varying the option parameters. In particular, try the@samp{-r} option, and also try changing the default arguments to the@samp{-s} and @samp{-j} options. To @emph{guarantee} a solution, usethe @samp{-D} and @samp{-S} options, although the final results are notlikely to be a @emph{perfect} hash function anymore! Finally, use the@samp{-f} option if you want @code{gperf} to generate the perfect hashfunction @emph{fast}, with less emphasis on making it minimal.@item The size of the generate static keyword array can get @emph{extremely}large if the input keyword file is large or if the keywords are quitesimilar. This tends to slow down the compilation of the generated Ccode, and @emph{greatly} inflates the object code size. If thissituation occurs, consider using the @samp{-S} option to reduce datasize, potentially increasing keyword recognition time a negligibleamount. Since many C compilers cannot correctly generated code forlarge switch statements it is important to qualify the @var{-S} optionwith an appropriate numerical argument that controls the number ofswitch statements generated.@item The maximum number of key positions selected for a given key has anarbitrary limit of 126. This restriction should be removed, and ifanyone considers this a problem write me and let me know so I can removethe constraint.@itemThe C++ source code only compiles correctly with GNU G++, version 1.36(and hopefully later versions). Porting to AT&T cfront would betedious, but possible (and desirable). There is also a K&R C versionavailable now. This should compile without change on most BSD systems,but may require a bit of work to run on SYSV, since @code{gperf} uses@var{alloca} in several places. Send mail to schmidt at ics.uci.edu forinformation.@end itemize@node Projects, Implementation, Bugs, Top@chapter Things Still Left to DoIt should be ``relatively'' easy to replace the current perfect hashfunction algorithm with a more exhaustive approach; the perfect hashmodule is essential independent from other program modules. Additionalworthwhile improvements include:@itemize @bullet@item Make the algorithm more robust. At present, the program halts with anerror diagnostic if it can't find a direct solution and the @samp{-D}option is not enabled. A more comprehensive, albeit computationallyexpensive, approach would employ backtracking or enable alternativeoptions and retry. It's not clear how helpful this would be, ingeneral, since most search sets are rather small in practice.@item Another useful extension involves modifying the program to generate``minimal'' perfect hash functions (under certain circumstances, thecurrent version can be rather extravagant in the generated table size).Again, this is mostly of theoretical interest, since a sparse tableoften produces faster lookups, and use of the @samp{-S} @code{switch}option can minimize the data size, at the expense of slightly longerlookups (note that the gcc compiler generally produces good code for@code{switch} statements, reducing the need for more complex schemes).@itemIn addition to improving the algorithm, it would also be useful togenerate a C++ class or Ada package as the code output, in addition tothe current C routines.@end itemize@node Implementation, Bibliography, Projects, Top@chapter Implementation Details of GNU @code{gperf}A paper describing the high-level description of the data structures andalgorithms used to implement @code{gperf} will soon be available. Thispaper is useful not only from a maintenance and enhancement perspective,but also because they demonstrate several clever and useful programmingtechniques, @emph{e.g.}, `Iteration Number' boolean arrays, doublehashing, a ``safe'' and efficient method for reading arbitrarily longinput from a file, and a provably optimal algorithm for simultaneouslydetermining both the minimum and maximum elements in a list.@page@node Bibliography, , Implementation, Top@chapter Bibliography[1] Chang, C.C.: @i{A Scheme for Constructing Ordered Minimal PerfectHashing Functions} Information Sciences 39(1986), 187-195. [2] Cichelli, Richard J. @i{Author's Response to ``On Cichelli's Minimal Perfect HashFunctions Method''} Communications of the ACM, 23, 12(December 1980), 729. [3] Cichelli, Richard J. @i{Minimal Perfect Hash Functions Made Simple}Communications of the ACM, 23, 1(January 1980), 17-19. [4] Cook, C. R. and Oldehoeft, R.R. @i{A Letter Oriented MinimalPerfect Hashing Function} SIGPLAN Notices, 17, 9(September 1982), 18-27.[5] Cormack, G. V. and Horspool, R. N. S. and Kaiserwerth, M.@i{Practical Perfect Hashing} Computer Journal, 28, 1(January 1985), 54-58. [6] Jaeschke, G. @i{Reciprocal Hashing: A Method for Generating MinimalPerfect Hashing Functions} Communications of the ACM, 24, 12(December1981), 829-833.[7] Jaeschke, G. and Osterburg, G. @i{On Cichelli's Minimal PerfectHash Functions Method} Communications of the ACM, 23, 12(December 1980),728-729.[8] Sager, Thomas J. @i{A Polynomial Time Generator for Minimal PerfectHash Functions} Communications of the ACM, 28, 5(December 1985), 523-532[9] Schmidt, Douglas C. @i{GPERF: A Perfect Hash Function Generator}Second USENIX C++ Conference Proceedings, April 1990.[10] Sebesta, R.W. and Taylor, M.A. @i{Minimal Perfect Hash Functionsfor Reserved Word Lists} SIGPLAN Notices, 20, 12(September 1985), 47-53.[11] Sprugnoli, R. @i{Perfect Hashing Functions: A Single ProbeRetrieving Method for Static Sets} Communications of the ACM, 2011(November 1977), 841-850.[12] Stallman, Richard M. @i{Using and Porting GNU CC} Free Software Foundation,1988.[13] Stroustrup, Bjarne @i{The C++ Programming Language.} Addison-Wesley, 1986.[14] Tiemann, Michael D. @i{User's Guide to GNU C++} Free SoftwareFoundation, 1989.@contents@bye
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -