📄 1cleanup.html
字号:
<HTML>
<HEAD>
<TITLE>The Cleanup</TITLE>
<meta name="description" content="Code cleanup">
<meta name="keywords" content="programming, design, implementation, class">
<link rel="stylesheet" href="../rs.css">
</HEAD>
<BODY background="../images/margin.gif" bgcolor="#ffffe0">
<!-- Main Table -->
<table cellpadding="6">
<tr>
<td width="78">
<td>
<h3>Code Review 1: The Cleanup</h3>
</td></tr>
<tr>
<td class=margin valign=top>
<a href="source/calc0.zip">
<img src="../images/brace.gif" width=16 height=16 border=1 alt="Download!"><br>source</a>
</td>
<td><P>As much as we would like a program to follow the design, there is always an element of ad libbing when filling out the details. We create auxiliary classes and functions, add methods and data members to existing classes. This process reminds me of the growth of a yeast culture. As each cell keeps growing, it starts to bloat and bud. Eventually new buds separate and start a life of their own.
<P>In software, the growing and (especially) the bloating happens quite naturally. It抯 the budding and the separation that requires conscious effort on the part of the programmer. Code reviews help this process greatly. We抣l see examples of overgrown files, methods and classes throughout our code reviews. Our response will be to create new files, new methods and, especially, new classes.
<P>We抣l be always on the lookout for potential code reuse.
<P>When reviewing code, we will also figure out which parts require more thorough explanation and add appropriate comments.
<h3>Improving Code Grouping</h3>
<P>The obvious place to start the code review is the file <span class="file">parse.cpp</span>. It contains the <var>main</var> procedure. And that抯 the first jarring feature. <var>Main</var> does not belong in the implementation file of the <var>Parser</var> class. I put it there because its purpose was mainly to test the <var>Parser</var>. Now it抯 time to separate it away from the <var>Parser</var> and create a new file <span class="file">main.cpp</span>.
<P>Look also at the file <span class="file">symtab.h</span>. I definitely see some potential for code reuse there. Such classes as <var>Link</var> and <var>List</var> are very general and can be used in many applications. This particular list stores direct <I>values</I> rather than pointers to objects. So let抯 create a separate header file for it and call it <span class="file">vlist.h</span>. The implementation will go to another file <span class="file">vlist.cpp</span>.
<P>Notice that we haven抰 tried to make any changes to the classes <var>Link</var> and <var>List</var> to make them more general. This is a useful rule of thumb:
<!-- Definition -->
<p>
<table border=4 cellpadding=10><tr>
<td bgcolor="#ffffff" class=defTable>
Don抰 try to generalize until there's a need for it. </td></tr>
</table>
<!-- End Definition -->
<p>Over-generalization is the bane of code reuse.
<P>Finally, I split the files <span class="file">symtab.h</span> and <span class="file">symtab.cpp</span> into three groups of files corresponding to the symbol table, the function table and the hash table.
<P class=caption>Table: The correspondence between files and classes, functions and global objects.</p>
<TABLE CELLSPACING=0 BORDER=1 CELLPADDING=7>
<TR><td>
<B><I><span class="file">main.cpp</span></i></b></TD>
<td>
<var>main ()</var></TD>
</TR>
<TR><td>
<B><I><span class="file">parser.h parser.cpp</span></i></b></TD>
<td>
<var>Parser</var></TD>
</TR>
<TR><td>
<B><I><span class="file">scan.h scan.cpp</span></i></b></TD>
<td>
<var>EToken, Scanner</var></TD>
</TR>
<TR><td>
<B><I><span class="file">funtab.h funtab.cpp</span></i></b></TD>
<td>
<var>FunctionEntry, funArr, FunctionTable</var></TD>
</TR>
<TR><td>
<B><I><span class="file">symtab.h symtab.cpp</span></i></b></TD>
<td>
<var>SymbolTable</var></TD>
</TR>
<TR><td>
<B><I><span class="file">vlist.h vlist.cpp</span></i></b></TD>
<td>
<var>Link, List</var></TD>
</TR>
<TR><td>
<B><I><span class="file">htab.h htab.cpp</span></i></b></TD>
<td>
<var>HTable</var></TD>
</TR>
<TR><td>
<B><I><span class="file">store.h store.cpp</span></i></b></TD>
<td>
<var>Store</var></TD>
</TR>
<TR><td>
<B><I><span class="file">tree.h tree.cpp</span></i></b></TD>
<td>
<var>Node</var> <i>and its children</i></TD>
</TR>
</TABLE>
<h3>Decoupling the Output</h3>
<P>Let抯 look again at the main loop of our calculator.
<!--code --><table width="100%" cellspacing=10> <tr> <td class=codetable>
<pre>
cout << "> "; // prompt
cin.getline (buf, maxBuf);
Scanner scanner (buf);
Parser parser (scanner, store, funTab, symTab);
status = parser.Eval ();
</pre></table><!--EndCode-->
<P>The first two lines take care of user input. The next three lines do the parsing and evaluation. Aren抰 we missing something? Shouldn抰 the calculator display the result? Well, it does, inside <var>Eval</var>. Sending the result of the calculation to the output is a side effect of evaluation.
<p>What it means, however, is that somebody will have to look inside the implementation of <var>Eval</var> in order to understand what the calculator is doing. Okay, that sounds bad--but isn't it true in general that you'd have to look inside the implementation of <var>Eval</var> in order to know what it's doing? Otherwise, how would you know that it first parses and then evaluates the user input, which has been passed to the parser in the form of a scanner.
<p>Well, that's a good point! Indeed, why don抰 we make it more explicit and rewrite the last line of the main loop.
<!--Code--><table width="100%" cellspacing=10> <tr> <td class=codetable>
<pre>
cout << "> "; // prompt
cin.getline (buf, maxBuf);
Scanner scanner (buf);
Parser parser (scanner, store, funTab, symTab);
status = parser.Parse ();
double result = parser.Calculate ();
cout << result << endl;
</pre>
</table><!--End Code-->
<P>This way, every statement does just one well defined thing. I can read this code without having to know any of the implementation details of any of the objects. Here抯 how I read it:
<OL>
<LI>Prompt the user,
<LI>Get a line of input,
<LI>Create a scanner to encapsulate the line of input,
<LI>Create a parser to process this input,
<LI>Let the parser parse it,
<LI>Let the parser calculate the result,
<LI>Display the result.
</OL>
<P>I find this solution esthetically more pleasing, but esthetics alone isn't usually enough to convince somebody to rewrite their code. I had to learn to translate my esthetics into practical arguments. Therefore I will argue that this version is <i>better</i> than the original one because:
<UL>
<LI>It separates three distinct and well defined actions: parsing, calculation and display. This decoupling will translate into easier maintenance.
<LI>Input and output are now performed at the same level, rather than input being done at the top level and output interspersed with the lower level code.
<LI>Input and output are part of user interface which usually evolves independently of the rest of the program. So it's always a good idea to have it concentrated in one place. I抦 already anticipating the possible switch from command line interface to a simple Windows interface.</UL>
<P>The changes to <var>Parser</var> to support this new split are pretty obvious. I got rid of the old <var>Parse</var> method and pasted its code directly in the only place from which it was called (all it did was: <var>_pTree = Expr()</var>. Then I renamed the <var>Eval</var> to <var>Parse</var> and moved the calculation to the new method <var>Calculate</var>. By the way, I also fixed a loophole in error checking: the input line should not contain anything following the expression. Before that, a line like 1 + 1 = 1 would have been accepted without complaint.
<!--Code--><table width="100%" cellspacing=10> <tr>
<td class=codetable>
<pre>
Status Parser::Parse ()
{
// Everything is an expression
_pTree = Expr ();
if (!_scanner.IsDone ())
_status = stError;
return _status;
}
double Parser::Calculate () const
{
assert (_status == stOk);
assert (_pTree != 0);
return _pTree->Calc ();
}
bool Scanner::IsDone () const
{
return _buf [_iLook] == '\0';
}</pre>
</table><!--End Code-->
<P>My students convinced me that <var>Calculate</var> should not be called, if the parsing resulted in an error. As you can see, that certainly simplifies the code of <var>Parse</var> and <var>Calculate</var>. Of course, now we have to check for errors in main, but that抯 okay; since we have the status codes available there, why not do something useful with them.
<p>By the way, as long as we抮e at it, we can also check the scanner for an empty line and not bother creating a parser in such a case. Incidentally, that will allow us to use an empty line as a sign from the user to quit the calculator. Things are just nicely falling in place.
<!--Code--><table width="100%" cellspacing=10> <tr>
<td class=codetable>
<pre>
cout << "\nEnter empty line to quit\n";
// Process a line of input at a time
do
{
cout << "> "; // prompt
cin.getline (buf, maxBuf); // read a line
Scanner scanner (buf); // create a scanner
if (!scanner.IsEmpty ())
{
// Create a parser
Parser parser (scanner, store, funTab, symTab);
status = parser.Parse ();
if (status == stOk)
{
double result = parser.Calculate ();
cout << result << endl;
}
else
{
cout << "Syntax error.\n";
}
}
else
{
break;
}
} while (status != stQuit);
</pre>
</table><!-- End Code-->
<P>Notice what we have just done: We have reorganized the top level of our program with very little effort. In traditional programming this is a big no-no. Changing the top level is synonymous with rewriting the whole program. We must have done something right, if we were able to pull such a trick. Of course, this is a small program and, besides, <i>I</i> wrote it, so it was easy for <i>me</i> to change it. I have some good news: I have tried these methods in a project a hundred times bigger than this and it worked. Better than that--I would let a newcomer to the project go ahead and make a top level change and propagate it all the way to the lowest level. And frankly, if I weren抰 able to do just that, I would have been stuck forever with some early design mistakes. Just like the one in the calculator which we have just fixed.
<P>There are several other output operations throughout the program. They are mostly dealing with error reporting. Since we don抰 have yet a coherent plan for error reporting and error propagation, we抣l just do one cosmetic change. Instead of directing error messages to the standard output <var>cout</var>, we抣l direct them to the standard error <var>cerr</var>, like this:</P>
<!-- Code --><table width="100%" cellspacing = 10><tr>
<td class =codetable>
<pre>
cerr << "Error: division by zero\n";
</pre>
</table><!--End Code -->
<P>The only practical difference is that, if you redirect standard output of your program to a file, the error messages will still go to your console. Not that we care much, but when we port this program to Windows, we would like to be able to quickly search the source for <var>stderr</var> and, for instance, convert these printouts to message boxes.
<h3>Fighting Defensive Programming</h3>
<P>"Defensive programming" sounds good, doesn抰 it? A program written by a defensive programmer should be safer and more robust, right?
<P>Wrong! Defensive programming is a dangerous form of engineering malpractice (in a show of self-abasement I抣l shortly expose examples of such practices in my own code). Defensive programming promotes writing code that is supposed to work even in the face of programmers
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -