⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 atom.html.svn-base

📁 纯C数据结构
💻 SVN-BASE
📖 第 1 页 / 共 2 页
字号:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd"><html><head><meta NAME="GENERATOR" CONTENT="Microsoft FrontPage 5.0"><link REV="made" HREF="http://drh.home.dyndns.org"><link REL="STYLESHEET" HREF="cii.css"><title>Chapter 3: Atoms</title></head><body><p CLASS="Colophon">This page is an HTML rendition of Chapter 3 of <a href="http://www.cs.princeton.edu/software/cii/"><cite>C Interfaces and Implementations</cite></a>and thus only approximates the appearance of the printed chapter. This page uses cascadingstyle sheets (CSS) and thus must be viewed using a CSS-aware browser, such as NetscapeNavigator 4 or Microsoft Internet Explorer 3 or better. The <a href="http://ciibook.webhop.net/pdf/atom.pdf">Acrobat PDFrendition of this chapter</a> is more faithful to the printed version.</p><table style="text-align: center;" cellpadding="0" cellspacing="0">  <tr>    <td align="center" style="font-size: 500%; font-style: bold">3</td>  </tr>  <tr>    <td align="center"><img SRC="../images/dot_black.gif" WIDTH="50" HEIGHT="4" ALT="======"><br>    <img SRC="../images/dot_clear.gif" WIDTH="1" HEIGHT="2" ALT=" "><br>    <img SRC="../images/dot_black.gif" WIDTH="50" HEIGHT="2" ALT=" "></td>  </tr>  <tr>    <td align="center"><img SRC="../images/dot_clear.gif" WIDTH="1" HEIGHT="12" ALT=" "></td>  </tr>  <tr>    <td align="center" style="font-size: 360%; font-style: bold; font-variant: small-caps">Atoms</td>  </tr></table><p CLASS="FirstBody">An <em CLASS="Emphasis">atom</em> is a pointer to a unique, immutablesequence of zero or more arbitrary bytes. Most atoms are pointers to null-terminatedstrings, but a pointer to any sequence of bytes can be an atom. There is only a singleoccurrence of any atom, which is why it's called an atom. Two atoms are identical if theypoint to the same location. Comparing two byte sequences for equality by simply comparingpointers is one of the advantages of atoms. Another advantage is that using atoms savesspace because there's only one occurrence of each sequence.</p><p>Atoms are often used as keys in data structures that are indexed by sequences ofarbitrary bytes instead of by integers. The tables and sets described in Chapters 8 and 9are examples.</p><h2 CLASS="Section">3.1 Interface</h2><p CLASS="Noindent">The <em CLASS="Code">Atom</em> interface is simple:</p><pre CLASS="Chunk"><em CLASS="Fragment">&lt;atom.h&gt;=</em>#ifndef ATOM_INCLUDED#define ATOM_INCLUDEDextern int Atom_length(const char *str);extern const char *Atom_new   (const char *str, int len);extern const char *Atom_string(const char *str);extern const char *Atom_int   (long n);#endif</pre><p CLASS="Noindent"><em CLASS="Code">Atom_new</em> accepts a pointer to a sequence ofbytes and the number of bytes in that sequence. It adds a copy of the sequence to thetable of atoms, if necessary, and returns the atom, which is a pointer to the copy of thesequence in the atom table. <em CLASS="Code">Atom_new</em> never returns the null pointer.Once an atom is created, it exists for the duration of the client's execution. An atom isalways terminated with a null character, which <em CLASS="Code">Atom_new</em> adds whennecessary.</p><p><em CLASS="Code">Atom_string</em> is similar to <em CLASS="Code">Atom_new</em> ; itcaters to the common use of character strings as atoms. It accepts a null-terminatedstring, adds a copy of that string to the atom table, if necessary, and returns the atom. <em CLASS="Code">Atom_int</em> returns the atom for the string representation of the longinteger <em CLASS="Code">n</em>&#151;another common usage. Finally, <em CLASS="Code">Atom_length</em>returns the length of its atom argument.</p><p>It is a checked runtime error to pass a null pointer to any function in this interface,to pass a negative <em CLASS="Code">len</em> to <em CLASS="Code">Atom_new</em>, or to passa pointer that is not an atom to <em CLASS="Code">Atom_length</em>. It is an <em CLASS="Emphasis">unchecked</em> runtime error to modify the bytes pointed to by an atom. <em CLASS="Code">Atom_length</em> can take time to execute proportional to the number ofatoms. <em CLASS="Code">Atom_new</em>, <em CLASS="Code">Atom_string</em>, and <em CLASS="Code">Atom_int</em> can each raise the exception <em CLASS="Code">Mem_Failed</em>.</p><h2 CLASS="Section">3.2 Implementation</h2><p CLASS="Noindent">The implementation of <em CLASS="Code">Atom</em> maintains the atomtable. <em CLASS="Code">Atom_new</em>, <em CLASS="Code">Atom_string</em>, and <em CLASS="Code">Atom_int</em> search the atom table and possibly add new elements to it, and <em CLASS="Code">Atom_length</em> just searches it.</p><pre CLASS="Chunk"><em CLASS="Fragment">&lt;atom.c&gt;=</em><em CLASS="Fragment">&lt;includes 34&gt;</em><em CLASS="Fragment">&lt;macros 37&gt;</em><em CLASS="Fragment">&lt;data 36&gt;</em><em CLASS="Fragment">&lt;functions 35&gt;</em></pre><pre CLASS="Chunk"><em CLASS="Fragment">&lt;includes 34&gt;=</em>#include &quot;atom.h&quot;</pre><p><em CLASS="Code">Atom_string</em> and <em CLASS="Code">Atom_int</em> can be implementedwithout knowing the representation details of the atom table. <em CLASS="Code">Atom_string</em>,for example, just calls <em CLASS="Code">Atom_new</em>:</p><pre CLASS="Chunk"><em CLASS="Fragment">&lt;functions 35&gt;=</em>const char *Atom_string(const char *str) {	assert(str);	return Atom_new(str, strlen(str));}</pre><pre CLASS="Chunk"><em CLASS="Fragment">&lt;includes 34&gt;+=</em>#include &lt;string.h&gt;#include &quot;assert.h&quot;</pre><p><em CLASS="Code">Atom_int</em> first converts its argument to a string, then calls <em CLASS="Code">Atom_new</em>:</p><pre CLASS="Chunk"><em CLASS="Fragment">&lt;functions 35&gt;+=</em>const char *Atom_int(long n) {	char str[43];	char *s = str + sizeof str;	unsigned long m;	if (n == LONG_MIN)		m = LONG_MAX + 1UL;	else if (n &lt; 0)		m = -n;	else		m = n;	do		*--s = m%10 + '0';	while ((m /= 10) &gt; 0);	if (n &lt; 0)		*--s = '-';	return Atom_new(s, (str + sizeof str) - s);}</pre><pre CLASS="Chunk"><em CLASS="Fragment">&lt;includes 34&gt;+=</em>#include &lt;limits.h&gt;</pre><p><em CLASS="Code">Atom_int</em> must cope with the asymmetrical range oftwo's-complement numbers and with the ambiguities of C's division and modulus operators.Unsigned division and modulus <em CLASS="Emphasis">are</em> well defined, so <em CLASS="Code">Atom_int</em> can avoid the ambiguities of the signed operators by usingunsigned arithmetic.</p><p>The absolute value of the most negative signed long integer cannot be represented,because there is one more negative number than positive number in two's-complementsystems. <em CLASS="Code">Atom_new</em> thus starts by testing for this single anomalybefore assigning the absolute value of its argument to the unsigned long integer <em CLASS="Code">m</em>. The value of <em CLASS="Code">LONG_MAX</em> resides in the standardheader <em CLASS="Code">limits.h</em>.</p><p>The loop forms the decimal string representation of <em CLASS="Code">m</em> from rightto left; it computes the rightmost digit, divides <em CLASS="Code">m</em> by 10, andcontinues until <em CLASS="Code">m</em> is zero. As each digit is computed, it's stored at<em CLASS="Code">--s</em>, which marches <em CLASS="Code">s</em> backward in <em CLASS="Code">str</em> . If <em CLASS="Code">n</em> is negative, a minus sign is stored atthe beginning of the string.</p><p>When the conversion is done, <em CLASS="Code">s</em> points to the desired string, andthis string has <em CLASS="Code">&amp;str[43]-s</em> characters. <em CLASS="Code">str</em>has 43 characters, which is enough to hold the decimal representation of any integer onany conceivable machine. Suppose, for example, that longs are 128 bits. The stringrepresentation of any 128-bit signed integer in octal&#151;base 8&#151;fits in128/3&nbsp;+&nbsp;1 = 43 characters. The decimal representation can take no more digitsthan the octal representation, so 43 characters are enough.</p><p>The 43 in the definition of <em CLASS="Code">str</em> is an example of a &quot;magicnumber,&quot; and it's usually better style to define a symbolic name for such values toensure that the same value is used everywhere. Here, however, the value appears only once,and <em CLASS="Code">sizeof</em> is used whenever the value is used. Defining a symbolicname might make the code easier to read, but it will also make the code longer and clutterthe name space. In this book, a symbolic name is defined only when the value appears morethan once, or when it is part of an interface. The length of the hash table <em CLASS="Code">buckets</em> below&#151;2,048&#151;is another example of this convention.</p><p>A hash table is the obvious data structure for the atom table. The hash table is anarray of pointers to lists of entries, each of which holds one atom:</p><pre CLASS="Chunk"><em CLASS="Fragment">&lt;data 36&gt;=</em>static struct atom {	struct atom *link;	int len;	char *str;} *buckets[2048];</pre><p CLASS="Noindent">The linked list emanating from <em CLASS="Code">buckets[i]</em> holdsthose atoms that hash to <em CLASS="Code">i</em>. An entry's <em CLASS="Code">link</em>field points to the next entry on the list, the <em CLASS="Code">len</em> field holds thelength of the sequence, and the <em CLASS="Code">str</em> fields points to the sequenceitself. For example, on a little endian computer with 32-bit words and 8-bit characters, <em CLASS="Code">Atom_string(&quot;an atom&quot;)</em> allocates the <em CLASS="Code">struct</em><em CLASS="Code">atom</em> shown in Figure 3.1, where the underscore character (<em CLASS="Code">_</em>) denotes a space. Each entry is just large enough to hold itssequence. Figure 3.2 shows the overall structure of the hash table.</p><table style="text-align: center;">  <tr>    <td align="center"><img SRC="atom-6.gif" ALT="Little endian layout" WIDTH="161" HEIGHT="108"></td>  </tr>  <tr>    <td align="center"><strong>Figure 3.1</strong> Little endian layout of a <em CLASS="Code">struct    atom</em> for <em CLASS="Code">&quot; an atom&quot;</em></td>  </tr></table><table style="text-align: center;">  <tr>    <td align="center"><img SRC="atom-7.gif" ALT="Hash table structure" WIDTH="310" HEIGHT="340"></td>  </tr>  <tr>    <td align="center"><strong>Figure 3.2</strong> Hash table structure</td>  </tr></table><p><em CLASS="Code">Atom_new</em> computes a hash number for the sequence given by <em CLASS="Code">str[0</em>..<em CLASS="Code">len-1]</em> (or the empty sequence, if <em CLASS="Code">len</em> is zero), reduces this hash number modulo the number of elements in <em CLASS="Code">buckets</em>, and searches the list pointed to by that element of <em CLASS="Code">buckets</em>. If it finds that <em CLASS="Code">str[0</em>..<em CLASS="Code">len-1]</em>is already in the table, it simply returns the atom:</p><pre CLASS="Chunk"><em CLASS="Fragment">&lt;functions 35&gt;+=</em>const char *Atom_new(const char *str, int len) {	unsigned long h;	int i;	struct atom *p;	assert(str);	assert(len &gt;= 0);	<em CLASS="Fragment">&lt;</em><em CLASS="Code">h = </em><em CLASS="Fragment">hash </em><em CLASS="Code">str[0</em>..<em CLASS="Code">len-1]</em><em CLASS="Fragment"> 39&gt;</em>	h &amp;= NELEMS(buckets)-1;	for (p = buckets[h]; p; p = p-&gt;link)		if (len == p-&gt;len) {			for (i = 0; i &lt; len &amp;&amp; p-&gt;str[i] == str[i]; )				i++;			if (i == len)				return p-&gt;str;		}

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -