📄 readme
字号:
sdbm - Substitute DBM or Berkeley ndbm for Every UN*X[1] Made Simple Ozan (oz) Yigit The Guild of PD Software Toolmakers Toronto - Canada oz@nexus.yorku.caImplementation is the sincerest form of flattery. - L. PeterDeutschA The Clone of the ndbm library The sources accompanying this notice - sdbm - consti-tute the first public release (Dec. 1990) of a completeclone of the Berkeley UN*X ndbm library. The sdbm library ismeant to clone the proven functionality of ndbm as closelyas possible, including a few improvements. It is practical,easy to understand, and compatible. The sdbm library is notderived from any licensed, proprietary or copyrightedsoftware. The sdbm implementation is based on a 1978 algorithm[Lar78] by P.-A. (Paul) Larson known as ``Dynamic Hashing''.In the course of searching for a substitute for ndbm, I pro-totyped three different external-hashing algorithms [Lar78,Fag79, Lit80] and ultimately chose Larson's algorithm as abasis of the sdbm implementation. The Bell Labs dbm (andtherefore ndbm) is based on an algorithm invented by KenThompson, [Tho90, Tor87] and predates Larson's work. The sdbm programming interface is totally compatiblewith ndbm and includes a slight improvement in database ini-tialization. It is also expected to be binary-compatibleunder most UN*X versions that support the ndbm library. The sdbm implementation shares the shortcomings of thendbm library, as a side effect of various simplifications tothe original Larson algorithm. It does produce holes in thepage file as it writes pages past the end of file. (Larson'spaper include a clever solution to this problem that is aresult of using the hash value directly as a block address.)On the other hand, extensive tests seem to indicate thatsdbm creates fewer holes in general, and the resulting page-files are smaller. The sdbm implementation is also fasterthan ndbm in database creation. Unlike the ndbm, the sdbm_________________________ [1] UN*X is not a trademark of any (dis)organization. - 2 -store operation will not ``wander away'' trying to split itsdata pages to insert a datum that cannot (due to elaborateworst-case situations) be inserted. (It will fail after apre-defined number of attempts.)Important Compatibility Warning The sdbm and ndbm libraries cannot share databases: onecannot read the (dir/pag) database created by the other.This is due to the differences between the ndbm and sdbmalgorithms[2], and the hash functions used. It is easy toconvert between the dbm/ndbm databases and sdbm by ignoringthe index completely: see dbd, dbu etc.Notice of Intellectual PropertyThe entire sdbm library package, as authored by me, Ozan S.Yigit, is hereby placed in the public domain. As such, theauthor is not responsible for the consequences of use ofthis software, no matter how awful, even if they arise fromdefects in it. There is no expressed or implied warranty forthe sdbm library. Since the sdbm library package is in the public domain,this original release or any additional public-domainreleases of the modified original cannot possibly (by defin-ition) be withheld from you. Also by definition, You (singu-lar) have all the rights to this code (including the rightto sell without permission, the right to hoard[3] and theright to do other icky things as you see fit) but thoserights are also granted to everyone else. Please note that all previous distributions of thissoftware contained a copyright (which is now dropped) toprotect its origins and its current public domain statusagainst any possible claims and/or challenges.Acknowledgments Many people have been very helpful and supportive. Apartial list would necessarily include Rayan Zacherissen(who contributed the man page, and also hacked a MMAP_________________________ [2] Torek's discussion [Tor87] indicates thatdbm/ndbm implementations use the hash value to traversethe radix trie differently than sdbm and as a result,the page indexes are generated in different order. Formore information, send e-mail to the author. [3] You cannot really hoard something that is avail-able to the public at large, but try if it makes youfeel any better. - 3 -version of sdbm), Arnold Robbins, Chris Lewis, Bill David-sen, Henry Spencer, Geoff Collyer, Rich Salz (who got mestarted in the first place), Johannes Ruschein (who did theminix port) and David Tilbrook. I thank you all.Distribution Manifest and NotesThis distribution of sdbm includes (at least) the following: CHANGES change log README this file. biblio a small bibliography on external hashing dba.c a crude (n/s)dbm page file analyzer dbd.c a crude (n/s)dbm page file dumper (for conversion) dbe.1 man page for dbe.c dbe.c Janick's database editor dbm.c a dbm library emulation wrapper for ndbm/sdbm dbm.h header file for the above dbu.c a crude db management utility hash.c hashing function makefile guess. pair.c page-level routines (posted earlier) pair.h header file for the above readme.ms troff source for the README file sdbm.3 man page sdbm.c the real thing sdbm.h header file for the above tune.h place for tuning & portability thingies util.c miscellaneous dbu is a simple database manipulation program[4] thattries to look like Bell Labs' cbt utility. It is currentlyincomplete in functionality. I use dbu to test out the rou-tines: it takes (from stdin) tab separated key/value pairsfor commands like build or insert or takes keys for commandslike delete or look. dbu <build|creat|look|insert|cat|delete> dbmfile dba is a crude analyzer of dbm/sdbm/ndbm page files. Itscans the entire page file, reporting page level statistics,and totals at the end. dbd is a crude dump program for dbm/ndbm/sdbm data-bases. It ignores the bitmap, and dumps the data pages insequence. It can be used to create input for the dbu util-ity. Note that dbd will skip any NULLs in the key and datafields, thus is unsuitable to convert some peculiar_________________________ [4] The dbd, dba, dbu utilities are quick hacks andare not fit for production use. They were developedlate one night, just to test out sdbm, and convert somedatabases. - 4 -databases that insist in including the terminating null. I have also included a copy of the dbe (ndbm DataBaseEditor) by Janick Bergeron [janick@bnr.ca] for your pleas-ure. You may find it more useful than the little dbu util-ity. dbm.[ch] is a dbm library emulation on top of ndbm (andhence suitable for sdbm). Written by Robert Elz. The sdbm library has been around in beta test for quitea long time, and from whatever little feedback I received(maybe no news is good news), I believe it has been func-tioning without any significant problems. I would, ofcourse, appreciate all fixes and/or improvements. Portabil-ity enhancements would especially be useful.Implementation Issues Hash functions: The algorithm behind sdbm implementa-tion needs a good bit-scrambling hash function to be effec-tive. I ran into a set of constants for a simple hash func-tion that seem to help sdbm perform better than ndbm forvarious inputs: /* * polynomial conversion ignoring overflows * 65599 nice. 65587 even better. */ long dbm_hash(char *str, int len) { register unsigned long n = 0; while (len--) n = n * 65599 + *str++; return n; } There may be better hash functions for the purposes ofdynamic hashing. Try your favorite, and check the pagefile.If it contains too many pages with too many holes, (in rela-tion to this one for example) or if sdbm simply stops work-ing (fails after SPLTMAX attempts to split) when you feedyour NEWS history file to it, you probably do not have agood hashing function. If you do better (for differenttypes of input), I would like to know about the function youuse. Block sizes: It seems (from various tests on a fewmachines) that a page file block size PBLKSIZ of 1024 is byfar the best for performance, but this also happens to limitthe size of a key/value pair. Depending on your needs, youmay wish to increase the page size, and also adjust PAIRMAX(the maximum size of a key/value pair allowed: should always - 5 -be at least three words smaller than PBLKSIZ.) accordingly.The system-wide version of the library should probably beconfigured with 1024 (distribution default), as this appearsto be sufficient for most common uses of sdbm.Portability This package has been tested in many different UN*Xeseven including minix, and appears to be reasonably portable.This does not mean it will port easily to non-UN*X systems.Notes and Miscellaneous The sdbm is not a very complicated package, at leastnot after you familiarize yourself with the literature onexternal hashing. There are other interesting algorithms inexistence that ensure (approximately) single-read access toa data value associated with any key. These are directory-less schemes such as linear hashing [Lit80] (+ Larson varia-tions), spiral storage [Mar79] or directory schemes such asextensible hashing [Fag79] by Fagin et al. I do hope thesesources provide a reasonable playground for experimentationwith other algorithms. See the June 1988 issue of ACM Com-puting Surveys [Enb88] for an excellent overview of thefield.References[Lar78] P.-A. Larson, ``Dynamic Hashing'', BIT, vol. 18, pp. 184-201, 1978.[Tho90] Ken Thompson, private communication, Nov. 1990[Lit80] W. Litwin, `` Linear Hashing: A new tool for file and table addressing'', Proceedings of the 6th Conference on Very Large Dabatases (Montreal), pp. 212-223, Very Large Database Foundation, Saratoga, Calif., 1980.[Fag79] R. Fagin, J. Nievergelt, N. Pippinger, and H. R. Strong, ``Extendible Hashing - A Fast Access Method for Dynamic Files'', ACM Trans. Database Syst., vol. 4, no.3, pp. 315-344, Sept. 1979.[Wal84] Rich Wales, ``Discussion of "dbm" data base system'', USENET newsgroup unix.wizards, Jan. 1984.[Tor87] Chris Torek, ``Re: dbm.a and ndbm.a archives'', - 6 - USENET newsgroup comp.unix, 1987.[Mar79] G. N. Martin, ``Spiral Storage: Incrementally Augment- able Hash Addressed Storage'', Technical Report #27, University of Varwick, Coventry, U.K., 1979.[Enb88] R. J. Enbody and H. C. Du, ``Dynamic Hashing Schemes'',ACM Computing Surveys, vol. 20, no. 2, pp. 85-113, June 1988.
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -