<td width=10> </td>
<td bgcolor="#cccccc" class=sidebar>
Bug Alert! The standard library that comes with VC++ 6.0 has a bug that makes <var>getline</var> expect two newlines, rather than one, as the line terminator. A fix for this and many other bugs is available on the Internet at <a href="http://www.dinkumware.com/">www.dinkumware.com</a>.
</td></table>
<!-- End Sidebar -->
<p>Let's stop here for a moment. We have just removed the last built-in limitation from our program. There is no longer any restriction on the length of the input line, or on the length of an identifier. The file <span class="file">params.h</span> is gone!
<p>How did that happen? Well, we started using dynamic data structures. Is our program more complex because of that? No, it's not! In fact it's simpler now. Is there any reason to introduce limitations into programs? Hardly!
<p>Now let's follow the fate of the input string from main and into the scanner. It makes little sense to store a naked c-string in the scanner. So let's rewrite it to use a standard string instead.
<!-- Code -->
<table width="100%" cellspacing=10><tr>
<td class=codeTable>
<pre>class Scanner
{
public:
Scanner (std::string const & buf);
bool IsDone () const { return _iLook == std::string::<font color="red">npos</font>; }
bool IsEmpty () const { return _buf.<font color="red">length</font> () == 0; }
EToken Token () const { return _token; }
EToken Accept ();
std::string GetSymbolName ();
double Number ();
private:
void EatWhite ();
typedef std::string::<font color="red">size_type</font> size_type;
<font color="red">std::string</font> const & _buf;
size_type _iLook;
EToken _token;
double _number;
size_type _iSymbol;
size_type _lenSymbol;
static char _whiteChars [];
};
</pre>
</td></tr>
</table>
<!-- End Code -->
<p>The natural way of marking the fact that we are beyond the end of the string is to use a special index value, <var>string::npos</var>. This is the value returned by string's various "find" methods when the search reaches the end of the string without finding a match. For instance, we can use one such method, <var>find_first_not_of</var>, to skip whitespace in our buffer.
<!-- Code -->
<table width="100%" cellspacing=10><tr>
<td class=codeTable>
<pre>char Scanner::_whiteChars [] = " \t\n\r";
void Scanner::EatWhite ()
{
_iLook = _buf.find_first_not_of (_whiteChars, _iLook);
}</pre>
</td></tr>
</table>
<!-- End Code -->
<p>The method <var>find_first_not_of</var> takes a null-terminated array of characters to be skipped (in our case the array contains a space, a tab, a newline and a carriage return) and an optional starting index, which defaults to zero. It returns the index of the first occurrence of a character that is not in the skip list. If no such character is found, it returns <var>string::npos</var>.
<p>By the way, the value <var>string::npos</var> is guaranteed to be greater than any valid index, as long as you are comparing the same integral types. That's why we made sure we use the same type, <var>size_type</var>, for our index as the one used internally by the string itself.
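<p>To see the two outcomes side by side, here is a minimal sketch of the same call wrapped in a free function. The name <var>SkipWhite</var> is made up for this illustration; it is not part of the calculator.

```cpp
#include <string>

// Hypothetical helper (not part of the calculator): returns the index of the
// first non-whitespace character at or after `start`, or std::string::npos
// when only whitespace (or nothing at all) remains.
std::string::size_type SkipWhite (std::string const & buf,
                                  std::string::size_type start)
{
    static char const whiteChars [] = " \t\n\r";
    return buf.find_first_not_of (whiteChars, start);
}
```

<p>Called on a buffer that starts with two spaces and a tab, it returns 3. Called on a buffer with nothing but whitespace left, it returns <var>string::npos</var>, which is exactly the condition <var>IsDone</var> tests for.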
<p><var>GetSymbolName</var> returns a substring of the buffer.
<!-- Code -->
<table width="100%" cellspacing=10><tr>
<td class=codeTable>
<pre>std::string Scanner::GetSymbolName ()
{
return _buf.substr (_iSymbol, _lenSymbol);
}</pre>
</td></tr>
</table>
<!-- End Code -->
<p>An alternative would be to use the string constructor that takes the source string, the starting offset and the length.
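<p>The two spellings are interchangeable. The sketch below puts them next to each other as free functions. The function names are invented for this comparison, and the parameters stand in for the scanner's <var>_buf</var>, <var>_iSymbol</var> and <var>_lenSymbol</var>.

```cpp
#include <string>

// Hypothetical free-function versions of GetSymbolName, for comparison only.

// Version 1: ask the buffer for a substring.
std::string SymbolBySubstr (std::string const & buf,
                            std::string::size_type iSymbol,
                            std::string::size_type lenSymbol)
{
    return buf.substr (iSymbol, lenSymbol);
}

// Version 2: construct a new string from (source, offset, length).
std::string SymbolByCtor (std::string const & buf,
                          std::string::size_type iSymbol,
                          std::string::size_type lenSymbol)
{
    return std::string (buf, iSymbol, lenSymbol);
}
```

<p>Both versions copy the characters into a new string, and both throw <var>std::out_of_range</var> if the starting offset is beyond the end of the buffer.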
<p>The rest of the implementation of the scanner works with almost no change. We only have to make sure that, at the end of <var>Accept</var>, we set the position to <var>string::npos</var> after the buffer has been consumed.
<!-- Code -->
<table width="100%" cellspacing=10><tr>
<td class=codeTable>
<pre>if (_iLook == _buf.length ())
_iLook = std::string::npos;</pre>
</td></tr>
</table>
<!-- End Code -->
<p>In particular, code like this, although not the most elegant, will work with strings as well as it used to work with straight character arrays:
<!-- Code -->
<table width="100%" cellspacing=10><tr>
<td class=codeTable>
<pre>char * p;
_number = strtod (&_buf [_iLook], &p);
_iLook = p - &_buf [0];
</pre>
</td></tr>
</table>
<!-- End Code -->
<p>This is not the preferred way of writing a program if you've decided to use the standard library from the outset of the project. But I wanted to show you that it's quite possible to start the conversion in a legacy program and not have to go all the way at once.
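<p>If you want to keep <var>strtod</var> during such a gradual conversion, one slightly more defensive variation is to go through <var>c_str ()</var>, which returns a pointer to a null-terminated array, exactly what <var>strtod</var> expects. The following is only a sketch: <var>ParseNumber</var> is a hypothetical helper, and it assumes that <var>iLook</var> is not greater than the length of the buffer.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of the calculator): parses a floating-point
// number starting at index iLook and advances iLook past the characters
// that strtod consumed. Assumes iLook <= buf.length ().
double ParseNumber (std::string const & buf, std::string::size_type & iLook)
{
    char const * start = buf.c_str () + iLook; // null-terminated view
    char * end = 0;
    double number = std::strtod (start, &end);
    iLook += end - start; // stays put if nothing could be parsed
    return number;
}
```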
<h3>Exploring Streams</h3>
</td></tr>
<tr>
<td class=margin valign=top>
<a href="calc7.zip" tppabs="http://www.relisoft.com/book/tech/source/calc7.zip">
<img src="brace.gif" tppabs="http://www.relisoft.com/book/images/brace.gif" width=16 height=16 border=1 alt="Download!"><br>source</a>
</td>
<td>
<p>So what is the preferred way, if we really wanted to use the standard library to its full advantage? I guess we wouldn't bother reading the line from the standard input into a string. We would just pass the stream directly to the scanner. After all, we don't need random access to the input line--we are parsing it more or less one character at a time. Except in rare cases, we don't have to go back in the string to re-parse it (there are some grammars that require it--ours doesn't). And when we do, it's only by one character.
<p>Let's start from the top. We can create the scanner, passing it the standard input stream as an argument. By the way, the type of <var>cin</var> is <var>std::istream</var>.
<!-- Code -->
<table width="100%" cellspacing=10><tr>
<td class=codeTable>
<pre>cerr << "> "; // prompt
Scanner scanner (cin);</pre>
</td></tr>
</table>
<!-- End Code -->
<p>Here's the new definition of the Scanner class.
<!-- Code -->
<table width="100%" cellspacing=10><tr>
<td class=codeTable>
<pre>class Scanner
{
public:
explicit Scanner (<font color="red">std::istream</font> & in);
bool IsDone () const { return _token == tEnd; }
bool IsEmpty () const { return _isEmpty; }
EToken Token () const { return _token; }
void Accept ();
std::string GetSymbolName ();
double Number ();
private:
void ReadChar ();
<font color="red">std::istream</font> & _in;
int _look; // lookahead character
bool _isEmpty;
EToken _token;
double _number;
std::string _symbol;
};</pre>
</td></tr>
</table>
<!-- End Code -->
<p>I did a little reorganizing here. I'm keeping a lookahead character in <var>_look</var>. I also decided to have a Boolean flag <var>_isEmpty</var>, to keep around the information that the stream was empty when the scanner was constructed (I can no longer look back at the beginning of input, once <var>Accept</var> has been called). I changed the test for <var>IsDone</var> to simply compare the current token with <var>tEnd</var>. Finally, I needed a string to keep the name of the last symbolic variable read from the input.
<p>Here's the constructor of the <var>Scanner</var>:
<!-- Code -->
<table width="100%" cellspacing=10><tr>
<td class=codeTable>
<pre>Scanner::Scanner (std::istream & in)
: _in (in)
{
Accept ();
_isEmpty = (Token () == tEnd);
}</pre>
</td></tr>
</table>
<!-- End Code -->
<p>The <var>Accept</var> method needs a little rewriting. Where we used to call <var>EatWhite ()</var>, we now call <var>ReadChar ()</var>. It skips whitespace as before, but it also initializes the lookahead character to the first non-white character. Since the lookahead has been consumed from the input stream, we don't have to do any incrementing after we've recognized it in <var>Accept</var>.
<!-- Code -->
<table width="100%" cellspacing=10><tr>
<td class=codeTable>
<pre>void Scanner::Accept ()
{
ReadChar ();
switch (_look)
{
case '+':
_token = tPlus;
// no incrementing
break;
...
}
}</pre>
</td></tr>
</table>
<!-- End Code -->
<p>This is the implementation of <var>ReadChar</var>:
<!-- Code -->
<table width="100%" cellspacing=10><tr>
<td class=codeTable>
<pre>void Scanner::ReadChar ()
{
_look = _in.<font color="red">get</font> ();
while (_look == ' ' || _look == '\t')
_look = _in.<font color="red">get</font> ();
}
</pre>
</td></tr>
</table>
<!-- End Code -->
<p>I had to rethink the handling of the end of input. Before, when we used <var>getline</var> to read input, we actually never had to deal with a newline. By definition, <var>getline</var> eats the newline and terminates the string appropriately (i.e., the c-string version appends a null, the <var>std::string</var> version updates the internal length). The <var>get</var> method, on the other hand, reads every character as is, including the newline. So I let the scanner recognize a newline as the end of input.
<!-- Code -->
<table width="100%" cellspacing=10><tr>
<td class=codeTable>
<pre> case '\n': // end of input
case '\r':
case EOF: // end of file
_token = tEnd;
break;
</pre>
</td></tr>
</table>
<!-- End Code -->
<p>Incidentally, I did some thinking ahead and decided to let the scanner recognize the end of file. The special <var>EOF</var> value is returned by the <var>get ()</var> method when it encounters the end of file. This value is not even a character (that's why <var>get</var> is defined to return an <var>int</var>, rather than <var>char</var>).
<p>How can a standard input stream encounter an end of file? There's actually more than one way it may happen. First, you may enter it from the keyboard--in DOS it's the combination Ctrl-Z. Second, the program might be called from the command line with redirected input. You may create a text file, say <span class="file">calc.txt</span>, filled with commands for the calculator and then call it like this:
<!-- Code -->
<table width="100%" cellspacing=10><tr>
<td class=codeTable>
<pre>calc < calc.txt</pre>
</td></tr>
</table>
<!-- End Code -->
<p>The operating system will plug the contents of this file into the program's standard input and execute it. You'll see the results of calculations flashing on your standard output. That is, unless you redirect it too, like this:
<!-- Code -->
<table width="100%" cellspacing=10><tr>
<td class=codeTable>
<pre>calc < calc.txt > results.txt</pre>
</td></tr>
</table>
<!-- End Code -->
Then you'll see only the standard error (including the prompts) flashing before your eyes, and the file <span class="file">results.txt</span> will be filled with the results of your calculations.
<p>Let's continue with our rewrite of the scanner. Here's what we do when we recognize a number:
<!-- Code -->
<table width="100%" cellspacing=10><tr>
<td class=codeTable>
<pre> case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
case '.':
{
_token = tNumber;
_in.<font color="red">putback</font> (_look);
_in >> _number; // read the whole number
break;
}</pre>
</td></tr>
</table>
<!-- End Code -->
<p>Reading a floating-point number from the standard input is easy. The only complication arises from the fact that we've already read the first character of the number--our lookahead. So before we read the whole number, we have to put our lookahead back into the stream. Don't worry, this is a simple operation. After all, the input stream is buffered. When you call <var>get</var>, the character is simply read from a buffer (unless the buffer is empty--in that case the system replenishes it by actually reading the input). Ungetting a character just means putting it back into that buffer. Input streams are implemented in such a way that it's always possible to put back one character.
<p>When reading an identifier, we do a slight variation of the same trick.
<!-- Code -->
<table width="100%" cellspacing=10><tr>
<td class=codeTable>
<pre> default:
if (isalpha (_look) || _look == '_')
{
_token = tIdent;
_symbol.erase (); // erase string contents
do {
_symbol += _look;
_look = _in.get ();
} while (isalnum (_look) || _look == '_');
_in.<font color="red">putback</font> (_look);
}
else
_token = tError;
break;</pre>
</td></tr>
</table>
<!-- End Code -->
<p>We don't have to <var>putback</var> a lookahead at the beginning of reading an identifier. Instead, we have to <var>putback</var> the last character, the one that is not part of the identifier, so that the next call to <var>ReadChar ()</var> can see it.
<p>Haven't we lost some generality by switching from a string to a stream in our implementation of the scanner? After all, you can always convert a stream to a string (e.g., using <var>getline ()</var>). Is the opposite possible? Not to worry! Converting a string into a stream is just as easy. The appropriate class is called <var>istringstream</var> and is defined in the header <span class="file">&lt;sstream&gt;</span>. Since <var>istringstream</var> inherits from <var>istream</var>, our scanner won't notice the difference. For instance, we can do this:
<!-- Code -->
<table width="100%" cellspacing=10><tr>
<td class=codeTable>
<pre>std::istringstream in ("sin (2 * pi / 3)");
Scanner scanner (in);</pre>
</td></tr>
</table>
<!-- End Code -->
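<p>The same substitution works for any code written against <var>std::istream</var>. As a minimal sketch, take a consumer that, like the scanner, knows nothing about where its characters come from. Both function names below are invented for this illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical consumer: counts whitespace-separated words in any input
// stream. It could be handed std::cin just as well as a string stream.
int CountWords (std::istream & in)
{
    int count = 0;
    std::string word;
    while (in >> word)
        ++count;
    return count;
}

// Wraps a string in an istringstream and passes it on as an istream.
int CountWordsInString (std::string const & text)
{
    std::istringstream in (text);
    return CountWords (in);
}
```

<p>This is also a convenient way to test a stream-based scanner: feed it canned strings instead of typing at the prompt.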
<p class=summary>We have just skimmed the surface of the standard library and we've already found a lot of useful stuff. It really pays to study it, rather than implement your own solutions from scratch.
</td>
</tr>
</table>
<!-- End Main Table -->
</body>
</html>