<HTML>
<HEAD>
  <TITLE>Chapter 15</TITLE>
  <LINK REL="STYLESHEET" HREF="downey.css" tppabs="http://rocky.wellesley.edu/downey/ost/thinkCS/c++_html/downey.css">
</HEAD>
<BODY>
<H2>Chapter 15</H2>
<H1>File Input/Output and <TT>apmatrix</TT>es</H1>
<P>In this chapter we will develop a program that reads and writes files, parses input, and demonstrates the <TT>apmatrix</TT> class.  We will also implement a data structure called <TT>Set</TT> that expands automatically as you add elements.</P>
<P>Aside from demonstrating all these features, the real purpose of the program is to generate a two-dimensional table of the distances between cities in the United States.  The output is a table that looks like this:</P>
<PRE>
Atlanta 0
Chicago 700     0
Boston  1100    1000    0
Dallas  800     900     1750    0
Denver  1450    1000    2000    800     0
Detroit 750     300     800     1150    1300    0
Orlando 400     1150    1300    1100    1900    1200    0
Phoenix 1850    1750    2650    1000    800     2000    2100    0
Seattle 2650    2000    3000    2150    1350    2300    3100    1450    0
        Atlanta Chicago Boston  Dallas  Denver  Detroit Orlando Phoenix Seattle
</PRE>
<P>The diagonal elements are all zero because that is the distance from a city to itself. Also, because the distance from A to B is the same as the distance from B to A, there is no need to print the top half of the matrix.</P>
<BR><BR>
<H3>15.1 Streams</H3>
<P>To get input from a file or send output to a file, you have to create an <TT>ifstream</TT> object (for input files) or an <TT>ofstream</TT> object (for output files). These objects are defined in the header file <TT>fstream.h</TT>, which you have to include.</P>
<P>A <B>stream</B> is an abstract object that represents the flow of data from a source like the keyboard or a file to a destination like the screen or a file.</P>
<P>We have already worked with two streams: <TT>cin</TT>, which has type <TT>istream</TT>, and <TT>cout</TT>, which has type <TT>ostream</TT>. 
<TT>cin</TT> represents the flow of data from the keyboard to the program. Each time the program uses the <TT>&gt;&gt;</TT> operator or the <TT>getline</TT> function, it removes a piece of data from the input stream.</P>
<P>Similarly, when the program uses the <TT>&lt;&lt;</TT> operator on an <TT>ostream</TT>, it adds a datum to the outgoing stream.</P>
<BR><BR>
<H3>15.2 File input</H3>
<P>To get data from a file, we have to create a stream that flows from the file into the program. We can do that using the <TT>ifstream</TT> constructor.</P>
<PRE>
  ifstream infile ("file-name");
</PRE>
<P>The argument for this constructor is a string that contains the name of the file you want to open. The result is an object named <TT>infile</TT> that supports all the same operations as <TT>cin</TT>, including <TT>&gt;&gt;</TT> and <TT>getline</TT>.</P>
<PRE>
  int x;
  apstring line;

  infile &gt;&gt; x;               // get a single integer and store in x
  getline (infile, line);    // get a whole line and store in line
</PRE>
<P>If we know ahead of time how much data is in a file, it is straightforward to write a loop that reads the entire file and then stops. More often, though, we want to read the entire file, but don't know how big it is.</P>
<P>There are member functions for <TT>ifstream</TT>s that check the status of the input stream; they are called <TT>good</TT>, <TT>eof</TT>, <TT>fail</TT> and <TT>bad</TT>. We will use <TT>good</TT> to make sure the file was opened successfully and <TT>eof</TT> to detect the "end of file."</P>
<P>Whenever you get data from an input stream, you don't know whether the attempt succeeded until you check. If the return value from <TT>eof</TT> is <TT>true</TT> then we have reached the end of the file and we know that the last attempt failed. 
Here is a program that reads lines from a file and displays them on the screen:</P>
<PRE>
  apstring fileName = ...;
  ifstream infile (fileName.c_str());
  apstring line;             // holds each line as it is read

  if (infile.good() == false) {
    cout &lt;&lt; "Unable to open the file named " &lt;&lt; fileName;
    exit (1);
  }

  while (true) {
    getline (infile, line);
    if (infile.eof()) break;
    cout &lt;&lt; line &lt;&lt; endl;
  }
</PRE>
<P>The function <TT>c_str</TT> converts an <TT>apstring</TT> to a native C string. Because the <TT>ifstream</TT> constructor expects a C string as an argument, we have to convert the <TT>apstring</TT>.</P>
<P>Immediately after opening the file, we invoke the <TT>good</TT> function. The return value is <TT>false</TT> if the system could not open the file, most likely because it does not exist, or you do not have permission to read it.</P>
<P>The statement <TT>while(true)</TT> is an idiom for an infinite loop. Usually there will be a <TT>break</TT> statement somewhere in the loop so that the program does not really run forever (although some programs do). In this case, the <TT>break</TT> statement allows us to exit the loop as soon as we detect the end of file.</P>
<P>It is important to exit the loop between the input statement and the output statement, so that when <TT>getline</TT> fails at the end of the file, we do not output the invalid data in <TT>line</TT>.</P>
<BR><BR>
<H3>15.3 File output</H3>
<P>Sending output to a file is similar.  For example, we could modify the previous program to copy lines from one file to another.</P>
<PRE>
  ifstream infile ("input-file");
  ofstream outfile ("output-file");

  if (infile.good() == false || outfile.good() == false) {
    cout &lt;&lt; "Unable to open one of the files."
         &lt;&lt; endl;
    exit (1);
  }

  while (true) {
    getline (infile, line);
    if (infile.eof()) break;
    outfile &lt;&lt; line &lt;&lt; endl;
  }
</PRE>
<BR><BR>
<H3>15.4 Parsing input</H3>
<P>In Section 1.4 I defined "parsing" as the process of analyzing the structure of a sentence in a natural language or a statement in a formal language. For example, the compiler has to parse your program before it can translate it into machine language.</P>
<P>In addition, when you read input from a file or from the keyboard you often have to parse it in order to extract the information you want and detect errors.</P>
<P>For example, I have a file called <TT>distances</TT> that contains information about the distances between major cities in the United States. I got this information from a randomly-chosen web page</P>
<TT>&nbsp;&nbsp;&nbsp;&nbsp;<A HREF="http://www.jaring.my/usiskl/usa/distance.html">http://www.jaring.my/usiskl/usa/distance.html</A></TT>
<P>so it may be wildly inaccurate, but that doesn't matter. The format of the file looks like this:</P>
<PRE>
"Atlanta"       "Chicago"       700
"Atlanta"       "Boston"        1,100
"Atlanta"       "Chicago"       700
"Atlanta"       "Dallas"        800
"Atlanta"       "Denver"        1,450
"Atlanta"       "Detroit"       750
"Atlanta"       "Orlando"       400
</PRE>
<P>Each line of the file contains the names of two cities in quotation marks and the distance between them in miles. The quotation marks are useful because they make it easy to deal with names that have more than one word, like "San Francisco."</P>
<P>By searching for the quotation marks in a line of input, we can find the beginning and end of each city name. 
Searching for special characters like quotation marks can be a little awkward, though, because the quotation mark is a special character in C++, used to identify string values.</P>
<P>If we want to find the first appearance of a quotation mark, we have to write something like:</P>
<PRE>
  int index = line.find ('\"');
</PRE>
<P>The argument here looks like a mess, but it represents a single character, a double quotation mark. The outermost single-quotes indicate that this is a character value, as usual. The backslash (<TT>\</TT>) indicates that we want to treat the next character literally.  The sequence <TT>\"</TT> represents a quotation mark; the sequence <TT>\'</TT> represents a single-quote. Interestingly, the sequence <TT>\\</TT> represents a single backslash. The first backslash indicates that we should take the second backslash seriously.</P>
<P>Parsing input lines consists of finding the beginning and end of each city name and using the <TT>substr</TT> function to extract the cities and distance. <TT>substr</TT> is an <TT>apstring</TT> member function; it takes two arguments, the starting index of the substring and the length.</P>
<PRE>
void processLine (const apstring&amp; line)
{
  // the character we are looking for is a quotation mark
  char quote = '\"';

  // store the indices of the quotation marks in a vector
  apvector&lt;int&gt; quoteIndex (4);

  // find the first quotation mark using the built-in find
  quoteIndex[0] = line.find (quote);

  // find the other quotation marks using the find from Chapter 7
  for (int i=1; i&lt;4; i++) {
    quoteIndex[i] = find (line, quote, quoteIndex[i-1]+1);
  }

  // break the line up into substrings
  int len1 = quoteIndex[1] - quoteIndex[0] - 1;
  apstring city1 = line.substr (quoteIndex[0]+1, len1);
  int len2 = quoteIndex[3] - quoteIndex[2] - 1;
  apstring city2 = line.substr (quoteIndex[2]+1, len2);
  // everything after the fourth quotation mark is the distance
  int len3 = line.length() - quoteIndex[3] - 1;
  apstring distString = line.substr (quoteIndex[3]+1, len3);

  // output the extracted information
  cout &lt;&lt; city1 &lt;&lt; "\t" &lt;&lt; city2 &lt;&lt; "\t" &lt;&lt;
       distString &lt;&lt; endl;
}
</PRE>
<P>Of course, just displaying the extracted information is not exactly what we want, but it is a good starting place.</P>
<BR><BR>
<H3>15.5 Parsing numbers</H3>
<P>The next task is to convert the numbers in the file from strings to integers. When people write large numbers, they often use commas to group the digits, as in 1,750. Most of the time when computers write large numbers, they don't include commas, and the built-in functions for reading numbers usually can't handle them. That makes the conversion a little more difficult, but it also provides an opportunity to write a comma-stripping function, so that's ok. Once we get rid of the commas, we can use the library function <TT>atoi</TT> to convert to integer. <TT>atoi</TT> is defined in the header file <TT>stdlib.h</TT>.</P>
<P>To get rid of the commas, one option is to traverse the string and check whether each character is a digit. If so, we add it to the result string. At the end of the loop, the result string contains all the digits from the original string, in order.</P>
<PRE>
// isdigit is declared in ctype.h; atoi is declared in stdlib.h
int convertToInt (const apstring&amp; s)
{
  apstring digitString = "";

  for (int i=0; i&lt;s.length(); i++) {
    if (isdigit (s[i])) {
      digitString += s[i];
    }
  }
  return atoi (digitString.c_str());
}
</PRE>
<P>The variable <TT>digitString</TT> is an example of an <B>accumulator</B>. It is similar to the counter we saw earlier, except that instead of getting incremented, it accumulates one new character at a time, using string concatenation.</P>
<P>The expression</P>
<PRE>
      digitString += s[i];
</PRE>
<P>is equivalent to</P>
<PRE>
      digitString = digitString + s[i];
</PRE>
<P>Both statements add a single character onto the end of the existing string.</P>
<P>Since <TT>atoi</TT> takes a C string as a parameter, we have to convert <TT>digitString</TT> to a C string before passing it as an argument.</P>
<BR><BR>
<H3>15.6 The <TT>Set</TT> data structure</H3>
<P>A data structure is a container for 
grouping a collection of data into a single object. We have seen some examples already, including <TT>apstring</TT>s, which are collections of characters, and <TT>apvector</TT>s, which are collections of any type.</P>
<P>An ordered set is a collection of items with two defining properties:</P>
<DL>
  <DT>Ordering:</DT>
  <DD>The elements of the set have indices associated with them. We can use these indices to identify elements of the set.</DD>
  <DT>Uniqueness:</DT>
  <DD>No element appears in the set more than once. If you try to add an element to a set, and it already exists, there is no effect.</DD>
</DL>
<P>In addition, our implementation of an ordered set will have the following property:</P>
<DL>
  <DT>Arbitrary size:</DT>
  <DD>As we add elements to the set, it expands to make room for new elements.</DD>
</DL>
<P>Both <TT>apstring</TT>s and <TT>apvector</TT>s have an ordering; every element has an index we can use to identify it.  But none of the data structures we have seen so far have the properties of uniqueness or arbitrary size.</P>
<P>To achieve uniqueness, we have to write an <TT>add</TT> function that searches the set to see if the new element already exists. To make the set expand as elements are added, we can take advantage of the <TT>resize</TT> function on <TT>apvector</TT>s.</P>
<P>Here is the beginning of a class definition for a <TT>Set</TT>.</P>
<PRE>
class Set {
private:
  apvector&lt;apstring&gt; elements;
  int numElements;

public:
  Set (int n);

  int getNumElements () const;
  apstring getElement (int i) const;
  int find (const apstring&amp; s) const;
  int add (const apstring&amp; s);
};

Set::Set (int n)
{
  apvector&lt;apstring&gt; temp (n);
  elements = temp;
  numElements = 0;
}
</PRE>
<P>The instance variables are an <TT>apvector</TT> of strings and an integer that keeps track of how many elements there are in the set. Keep in mind that the number of elements in the set, <TT>numElements</TT>, is not the same thing as the size of the <TT>apvector</TT>. 
Usually it will be smaller.</P>
<P>The <TT>Set</TT> constructor takes a single parameter, which is the initial size of the <TT>apvector</TT>. The initial number of elements is always zero.</P>
<P><TT>getNumElements</TT> and <TT>getElement</TT> are accessor functions for the instance variables, which are private. <TT>numElements</TT> is a read-only variable, so we provide a <TT>get</TT> function but not a <TT>set</TT> function.</P>
<PRE>
int Set::getNumElements () const
{