📄 chapter6.html
字号:
struct key keytab[NKEYS];</pre>Since the structure <tt>keytab</tt> contains a constant set of names, it iseasiest to make it an external variable and initialize it once and for all whenit is defined. The structure initialization is analogous to earlier ones - thedefinition is followed by a list of initializers enclosed in braces:<pre> struct key { char *word; int count; } keytab[] = { "auto", 0, "break", 0, "case", 0, "char", 0, "const", 0, "continue", 0, "default", 0, /* ... */ "unsigned", 0, "void", 0, "volatile", 0, "while", 0 };</pre>The initializers are listed in pairs corresponding to the structure members.It would be more precise to enclose the initializers for each "row" orstructure in braces, as in<pre> { "auto", 0 }, { "break", 0 }, { "case", 0 }, ...</pre>but inner braces are not necessary when the initializers are simple variablesor character strings, and when all are present. As usual, the number of entriesin the array <tt>keytab</tt> will be computed if the initializers are presentand the <tt>[]</tt> is left empty.<p>The keyword counting program begins with the definition of <tt>keytab</tt>.The main routine reads the input by repeatedly calling a function<tt>getword</tt> that fetches one word at a time. Each word is looked up in<tt>keytab</tt> with a version of the binary search function that we wrote in<a href="chapter3.html">Chapter 3</a>. The list of keywords must be sorted inincreasing order in the table.<pre> #include <stdio.h> #include <ctype.h> #include <string.h> #define MAXWORD 100 int getword(char *, int); int binsearch(char *, struct key *, int); /* count C keywords */ main() { int n; char word[MAXWORD]; while (getword(word, MAXWORD) != EOF) if (isalpha(word[0])) if ((n = binsearch(word, keytab, NKEYS)) >= 0) keytab[n].count++; for (n = 0; n < NKEYS; n++) if (keytab[n].count > 0) printf("%4d %s\n", keytab[n].count, keytab[n].word); return 0; } /* binsearch: find word in tab[0]...tab[n-1] */ int binsearch(char *word, struct key tab[], int n) { int cond; int low, high, mid; low = 0; high = n - 1; while (low <= high) { mid = (low+high) / 2; if ((cond = strcmp(word, tab[mid].word)) < 0) high = mid - 1; else if (cond > 0) low = mid + 1; else return mid; } return -1; }</pre>We will show the function <tt>getword</tt> in a moment; for now it suffices tosay that each call to <tt>getword</tt> finds a word, which is copied into thearray named as its first argument.<p>The quantity <tt>NKEYS</tt> is the number of keywords in <tt>keytab</tt>. Althoughwe could count this by hand, it's a lot easier and safer to do it by machine,especially if the list is subject to change. One possibility would be toterminate the list of initializers with a null pointer, then loop along<tt>keytab</tt> until the end is found.<p>But this is more than is needed, since the size of the array is completelydetermined at compile time. The size of the array is the size of one entrytimes the number of entries, so the number of entries is just<p> <em>size of</em> <tt>keytab / </tt><em>size of</em> <tt>struct key</tt><p>C provides a compile-time unary operator called <tt>sizeof</tt> that can beused to compute the size of any object. The expressions<pre> sizeof <em>object</em></pre>and<pre> sizeof (<em>type name</em>)</pre>yield an integer equal to the size of the specified object or type in bytes.(Strictly, <tt>sizeof</tt> produces an unsigned integer value whose type,<tt>size_t</tt>, is defined in the header <tt><stddef.h></tt>.) An object can be avariable or array or structure. A type name can be the name of a basic typelike <tt>int</tt> or <tt>double</tt>, or a derived type like a structure or apointer.<p>In our case, the number of keywords is the size of the array divided by thesize of one element. This computation is used in a <tt>#define</tt> statementto set the value of <tt>NKEYS</tt>:<pre> #define NKEYS (sizeof keytab / sizeof(struct key))</pre>Another way to write this is to divide the array size by the size of a specificelement:<pre> #define NKEYS (sizeof keytab / sizeof(keytab[0]))</pre>This has the advantage that it does not need to be changed if the type changes.<p>A <tt>sizeof</tt> can not be used in a <tt>#if</tt> line, because the preprocessordoes not parse type names. But the expression in the <tt>#define</tt> is notevaluated by the preprocessor, so the code here is legal.<p>Now for the function <tt>getword</tt>. We have written a more general<tt>getword</tt> than is necessary for this program, but it is not complicated.<tt>getword</tt> fetches the next ``word'' from the input, where a word is eithera string of letters and digits beginning with a letter, or a single non-whitespace character. The function value is the first character of the word, or<tt>EOF</tt> for end of file, or the character itself if it is not alphabetic.<pre> /* getword: get next word or character from input */ int getword(char *word, int lim) { int c, getch(void); void ungetch(int); char *w = word; while (isspace(c = getch())) ; if (c != EOF) *w++ = c; if (!isalpha(c)) { *w = '\0'; return c; } for ( ; --lim > 0; w++) if (!isalnum(*w = getch())) { ungetch(*w); break; } *w = '\0'; return word[0]; }</pre><tt>getword</tt> uses the <tt>getch</tt> and <tt>ungetch</tt> that we wrotein <a href="chapter4.html">Chapter 4</a>. When the collection of analphanumeric token stops, <tt>getword</tt> has gone one character too far.The call to <tt>ungetch</tt> pushes that character back on the input for thenext call. <tt>getword</tt> also uses <tt>isspace</tt> to skip whitespace,<tt>isalpha</tt> to identify letters, and <tt>isalnum</tt> to identifyletters and digits; all are from the standard header<tt><ctype.h></tt>.<p><strong>Exercise 6-1.</strong> Our version of <tt>getword</tt> does notproperly handle underscores, string constants, comments, or preprocessorcontrol lines. Write a better version.<h2><a name="s6.4">6.4 Pointers to Structures</a></h2>To illustrate some of the considerations involved with pointers to and arraysof structures, let us write the keyword-counting program again, this timeusing pointers instead of array indices.<p>The external declaration of <tt>keytab</tt> need not change, but <tt>main</tt>and <tt>binsearch</tt> do need modification.<pre> #include <stdio.h> #include <ctype.h> #include <string.h> #define MAXWORD 100 int getword(char *, int); struct key *binsearch(char *, struct key *, int); /* count C keywords; pointer version */ main() { char word[MAXWORD]; struct key *p; while (getword(word, MAXWORD) != EOF) if (isalpha(word[0])) if ((p=binsearch(word, keytab, NKEYS)) != NULL) p->count++; for (p = keytab; p < keytab + NKEYS; p++) if (p->count > 0) printf("%4d %s\n", p->count, p->word); return 0; } /* binsearch: find word in tab[0]...tab[n-1] */ struct key *binsearch(char *word, struck key *tab, int n) { int cond; struct key *low = &tab[0]; struct key *high = &tab[n]; struct key *mid; while (low < high) { mid = low + (high-low) / 2; if ((cond = strcmp(word, mid->word)) < 0) high = mid; else if (cond > 0) low = mid + 1; else return mid; } return NULL; }</pre>There are several things worthy of note here. First, the declaration of<tt>binsearch</tt> must indicate that it returns a pointer to <tt>struct key</tt>instead of an integer; this is declared both in the function prototype andin <tt>binsearch</tt>. If <tt>binsearch</tt> finds the word, it returns a pointerto it; if it fails, it returns <tt>NULL</tt>.<p>Second, the elements of <tt>keytab</tt> are now accessed by pointers. Thisrequires significant changes in <tt>binsearch</tt>.<p>The initializers for <tt>low</tt> and <tt>high</tt> are now pointers to thebeginning and just past the end of the table.<p>The computation of the middle element can no longer be simply<pre> mid = (low+high) / 2 /* WRONG */</pre>because the addition of pointers is illegal. Subtraction is legal, however,so <tt>high-low</tt> is the number of elements, and thus<pre> mid = low + (high-low) / 2</pre>sets <tt>mid</tt> to the element halfway between <tt>low</tt> and <tt>high</tt>.<p>The most important change is to adjust the algorithm to make sure that itdoes not generate an illegal pointer or attempt to access an element outsidethe array. The problem is that <tt>&tab[-1]</tt> and <tt>&tab[n]</tt>are both outside the limits of the array <tt>tab</tt>. The former is strictlyillegal, and it is illegal to dereference the latter. The language definitiondoes guarantee, however, that pointer arithmetic that involves the firstelement beyond the end of an array (that is, <tt>&tab[n]</tt>) will workcorrectly.<p>In <tt>main</tt> we wrote<pre> for (p = keytab; p < keytab + NKEYS; p++)</pre>If <tt>p</tt> is a pointer to a structure, arithmetic on <tt>p</tt> takes intoaccount the size of the structure, so <tt>p++</tt> increments <tt>p</tt> by thecorrect amount to get the next element of the array of structures, and thetest stops the loop at the right time.<p>Don't assume, however, that the size of a structure is the sum of the sizesof its members. Because of alignment requirements for different objects,there may be unnamed ``holes'' in a structure. Thus, for instance, if a<tt>char</tt> is one byte and an <tt>int</tt> four bytes, the structure<pre> struct { char c; int i; };</pre>might well require eight bytes, not five. The <tt>sizeof</tt> operator returnsthe proper value.<p>Finally, an aside on program format: when a function returns a complicatedtype like a structure pointer, as in<pre> struct key *binsearch(char *word, struct key *tab, int n)</pre>the function name can be hard to see, and to find with a text editor.Accordingly an alternate style is sometimes used:<pre> struct key * binsearch(char *word, struct key *tab, int n)</pre>This is a matter of personal taste; pick the form you like and hold to it.<h2><a name="s6.5">6.5 Self-referential Structures</a></h2>Suppose we want to handle the more general problem of counting theoccurrences of <em>all</em> the words in some input. Since the list of wordsisn't known in advance, we can't conveniently sort it and use a binarysearch. Yet we can't do a linear search for each word as it arrives, to seeif it's already been seen; the program would take too long. (More precisely,
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -