            throw "unexpected end of file";
        _stream.read (reinterpret_cast&lt;char *&gt; (&amp;d), sizeof (double));
        if (_stream.bad())
            throw "file read failed";
        return d;
    }
    std::string GetString ()
    {
        long len = GetLong ();
        std::string str;
        str.resize (len);
        _stream.read (&amp;str [0], len);
        if (_stream.bad())
            throw "file read failed";
        return str;
    }
    bool GetBool ()
    {
        long b = GetLong ();
        if (_stream.bad())
            throw "file read failed";
        if (b == TruePattern)
            return true;
        else if (b == FalsePattern)
            return false;
        else
            throw "data corruption";
    }
private:
    std::ifstream _stream;
};</pre>
    </td></tr>
</table>
<!-- End Code -->

<p>Several interesting things happen here. First of all: What are these strange flags that we pass to <var>ifstream::open ()</var>? The first one, <var>ios_base::in</var>, means that we are opening the file for input. The second one, <var>ios_base::binary</var>, tells the operating system that we don't want any carriage return-linefeed translations.
<!-- Sidebar -->
<table width=100% border=0 cellpadding=5><tr>
<td width=10>
<td bgcolor="#cccccc" class=sidebar>
What is this carriage return-linefeed nonsense? It's one of the biggest blunders of the DOS file system, which was unfortunately inherited by all flavors of Windows. The creators of DOS decided that the system should convert the single character '\n' into the pair '\r', '\n'. The reasoning was that, when you print a file, the printer interprets carriage return, '\r', as the command to go back to the beginning of the current line, and line feed, '\n', as the command to move down to the next line (not necessarily to its beginning). So, to go to the beginning of the next line, a printer requires two characters. Nowadays, when we use laser printers that understand PostScript and print WYSIWYG documents, this whole idea seems rather odd. Even more so if you consider that an older operating system, Unix, found a way of dealing with this problem without involving low level file system services.
<p>Anyway, if all you want is to store bytes of data in a file, you have to remember to open it in the "binary" mode, otherwise you might get unexpected results. By the way, the default is text mode, which does the unfortunate character translation. There is no separate flag for it: text mode is simply what you get when you don't specify <var>ios_base::binary</var>.
</table>
<!-- End Sidebar -->
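<p>To see the effect of the binary flag, here is a minimal sketch that writes a few bytes containing '\n' and '\r' to a scratch file and reads them back. The function names and the file name <var>binary_mode_demo.tmp</var> are made up for this illustration; they are not part of the calculator.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Writes the bytes to a scratch file in binary mode, reads them back,
// and reports whether they came back unchanged.
bool BinaryRoundTripPreservesBytes (std::string const & bytes)
{
    char const * name = "binary_mode_demo.tmp"; // hypothetical scratch file
    std::streamsize const size = static_cast<std::streamsize> (bytes.size ());
    {
        std::ofstream out (name, std::ios_base::out | std::ios_base::binary);
        if (!out.is_open ())
            throw "couldn't open file";
        out.write (bytes.data (), size);
        if (out.bad ())
            throw "file write failed";
    } // leaving the scope flushes and closes the file

    std::string readBack (bytes.size (), '\0');
    std::ifstream in (name, std::ios_base::in | std::ios_base::binary);
    if (!in.is_open ())
        throw "couldn't open file";
    in.read (&readBack [0], size);
    bool const complete = in.gcount () == size;
    in.close ();
    std::remove (name);
    return complete && readBack == bytes;
}

void BinaryModeDemo ()
{
    // Line feeds and carriage returns come back exactly as written.
    assert (BinaryRoundTripPreservesBytes ("one\ntwo\r\nthree\n"));
}
```

<p>Without the binary flag the outcome depends on the platform, which is exactly why a serializer should not rely on the default.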

<p>Another interesting point is that the method <var>ifstream::read</var> reads data into a character buffer--it expects <var>char *</var> as its first argument. When we want to read a long, we can't just pass the address of a long to it--there is no implicit conversion from <var>long *</var> to <var>char *</var>. This is one of those cases when we <i>have to</i> force the compiler to trust us. We want to split the long into its constituent bytes (we're ignoring here the big endian/little endian problem). A reasonably clean way to do it is to use <var>reinterpret_cast</var>. We are essentially telling the compiler to "reinterpret" a chunk of memory occupied by the long as a series of chars. We can tell how many chars a long contains by applying the operator <var>sizeof</var> to it.
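<p>The byte-splitting trick can be tried without touching the disk. The sketch below is an illustration only: it uses a <var>std::stringstream</var> in place of the file, and the function names are invented. The casts are the same ones used for reading and writing a long.

```cpp
#include <cassert>
#include <sstream>

// Writes the raw bytes of a long into an in-memory stream and reads them
// back into another long, using reinterpret_cast in both directions.
long RoundTripLong (long value)
{
    std::stringstream stream (std::ios_base::in | std::ios_base::out
                              | std::ios_base::binary);
    // View the long as sizeof (long) consecutive chars and write them out.
    stream.write (reinterpret_cast<char const *> (&value), sizeof (long));
    if (stream.bad ())
        throw "write failed";

    long result = 0;
    // View the destination long as a char buffer and fill it byte by byte.
    stream.read (reinterpret_cast<char *> (&result), sizeof (long));
    if (stream.gcount () != static_cast<std::streamsize> (sizeof (long)))
        throw "unexpected end of data";
    return result;
}

void RoundTripDemo ()
{
    assert (RoundTripLong (123456789L) == 123456789L);
    assert (RoundTripLong (-1L) == -1L);
}
```

<p>The round trip works because the same machine writes and reads the bytes. Moving the data to a machine with a different byte order or a different size of long is the problem we are ignoring here.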

<p>This is a good place to explain the various types of casts. You use
<ul>
<li>const_cast--to remove the const attribute
<li>static_cast--to convert related types
<li>reinterpret_cast--to convert unrelated types
</ul>
<p>(There is also a dynamic_cast, which we won't discuss here.)
<p>Here's an example of const_cast:
<!-- Code --><table width=100% cellspacing=10><tr>    <td class=codetable>
<pre>char buf [] = "No modify!";
char const * str = buf;
char * tmp = const_cast&lt;char *&gt; (str);
tmp [0] = 'D'; // legal only because buf itself is not const</pre>
</td></tr></table><!-- End Code -->

<p>To understand static_cast, think of it as the inverse of implicit conversion. Whenever type T can be implicitly converted to type U (in other words, T is-a U), you can use static_cast to perform the conversion the other way. For instance, a char can be implicitly converted to an int:
<!-- Code --><table width=100% cellspacing=10><tr>    <td class=codetable>
<pre>char c = '\n';
int i = c; // implicit conversion</pre>
</td></tr></table><!-- End Code -->
Therefore, when you need to convert an int into a char, use static_cast:
<!-- Code --><table width=100% cellspacing=10><tr>    <td class=codetable>
<pre>int i = 0x0d;
char c = static_cast&lt;char&gt; (i);</pre>
</td></tr></table><!-- End Code -->
<p>Or, if you have two classes, <var>Base</var> and <var>Derived: public Base</var>, you can implicitly convert a pointer to <var>Derived</var> to a pointer to <var>Base</var> (<var>Derived</var> is-a <var>Base</var>). Therefore, you can use static_cast to go the other way, provided you know that the object pointed to really is a <var>Derived</var>:
<!-- Code --><table width=100% cellspacing=10><tr>    <td class=codetable>
<pre>Base * bp = new Derived; // implicit conversion
Derived * dp = static_cast&lt;Derived *&gt; (bp);</pre>
</td></tr></table><!-- End Code -->

<p>You should realize that casts are dangerous and should be used very judiciously. Try to avoid casting at all costs. Serialization and deserialization are special in this respect, since they require low level manipulation of types.

<p>Finally, notice the strange way we store Boolean values. A Boolean value really requires only one bit for its storage. But, since we don't want to split bytes (or even longs, for that matter), we'll use some redundancy here. We could, in principle, store the value <var>true</var> as one and <var>false</var> as zero. However, it will cost us the same to write a zero as to write an arbitrary value. The difference is that zeros are much more common in files than, say, <i>0xbad1bad2</i>. So when I read back the value <i>0xbad1bad2</i> and I expect a Boolean, I feel reassured that I'm reading sensible data and not some random garbage. This is only one of the ways of using redundancy for consistency checking.
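<p>Here is a small sketch of how such a pattern check rejects garbage. The values of the two patterns are assumptions made for this example (only <i>0xbad1bad2</i> is mentioned in the text), and the sketch uses <var>unsigned long</var> so that the values fit even where a long has 32 bits. <var>DecodeBool</var> and <var>IsRejected</var> are invented helper names.

```cpp
#include <cassert>

// Assumed values: the actual definitions of the patterns are not shown here.
unsigned long const TruePattern = 0xbad1bad2;
unsigned long const FalsePattern = 0xbad2bad1;

// Decodes a stored pattern; anything else is treated as corruption.
bool DecodeBool (unsigned long pattern)
{
    if (pattern == TruePattern)
        return true;
    else if (pattern == FalsePattern)
        return false;
    else
        throw "data corruption";
}

// Returns true when the pattern is rejected as neither true nor false.
bool IsRejected (unsigned long pattern)
{
    try
    {
        DecodeBool (pattern);
    }
    catch (char const *)
    {
        return true;
    }
    return false;
}

void BoolPatternDemo ()
{
    assert (DecodeBool (TruePattern));
    assert (!DecodeBool (FalsePattern));
    assert (IsRejected (0)); // a stray zero is caught
    assert (IsRejected (1)); // and so is a stray one
}
```

<p>Had we stored plain zero and one, both of the stray values above would have been accepted as valid Booleans.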

<p>The output serializing stream is the mirror image of <var>DeSerializer</var>.

<!-- Code --><table width=100% cellspacing=10><tr>    <td class=codetable>
<pre>class Serializer
{
public:
    Serializer (std::string const &amp; nameFile)
        : _stream (nameFile.c_str (), ios_base::out | ios_base::binary)
    {
        if (!_stream.is_open ())
            throw "couldn't open file";
    }
    void PutLong (long l)
    {
        _stream.write (reinterpret_cast&lt;char *&gt; (&amp;l), sizeof (long));
        if (_stream.bad())
            throw "file write failed";
    }
    void PutDouble (double d)
    {
        _stream.write (reinterpret_cast&lt;char *&gt; (&amp;d), sizeof (double));
        if (_stream.bad())
            throw "file write failed";
    }
    void PutString (std::string const &amp; str)
    {
        int len = str.length ();
        PutLong (len);
        _stream.write (str.data (), len);
        if (_stream.bad())
            throw "file write failed";
    }
    void PutBool (bool b)
    {
        long l = b? TruePattern: FalsePattern;
        PutLong (l);
        if (_stream.bad ())
            throw "file write failed";
    }
private:
    std::ofstream _stream;
};</pre>
</td></tr></table><!-- End Code -->

<!-- Sidebar -->
<table width=100% border=0 cellpadding=5><tr>
<td width=10>
<td bgcolor="#cccccc" class=sidebar>
There is a shortcut notation combining assignment with a conditional. The following code:
<!-- Code --><table width=100% cellspacing=10><tr>	<td class=codetable>
<pre>long l = b? TruePattern: FalsePattern;</pre>
</td></tr></table><!-- End Code -->
is equivalent to:

<!-- Code --><table width=100% cellspacing=10><tr>	<td class=codetable>
<pre>long l;
if (b)
    l = TruePattern;
else
    l = FalsePattern;</pre>
</td></tr></table><!-- End Code -->
<p>The ternary (meaning, three-argument) operator <var>A? B: C</var> first evaluates A. If A is true, it evaluates and returns B, otherwise it evaluates and returns C. A piece of trivia: unlike in C, in C++ the ternary operator returns an l-value when both B and C are l-values of the same type, so in that case it can be used on the left-hand side of an assignment. Not that I would recommend this style!
<p>There is an even more obscure operator in C++, the comma sequencing operator. The expression <var>A, B</var> first evaluates A, then evaluates and returns B. The evaluation of A is therefore a side-effect of the whole operation. Most often the comma operator is used to combine two expressions where one is expected, like in this double loop:
<!-- Code --><table width=100% cellspacing=10><tr>	<td class=codetable>
<pre>for (int i = 0, j = 0; i &lt; maxI &amp;&amp; j &lt; maxJ; ++i, ++j)</pre>
</td></tr></table><!-- End Code -->
By the way, the first comma separates the declarations (complete with initialization) of two variables of the same type. It's the second comma, between ++i and ++j, that is the sequencing operator.
</table>
<!-- End Sidebar -->
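<p>Both pieces of trivia can be checked with a few lines of code. This is only an illustration with invented function names, not something the calculator needs.

```cpp
#include <cassert>

// The ternary operator as an l-value: both branches are l-values of the
// same type (int), so the selected variable can be assigned to.
int AssignThroughTernary (bool pickFirst)
{
    int first = 1;
    int second = 2;
    (pickFirst ? first : second) = 10;
    return first + second;
}

// The comma operator in the increment part of a loop: i and j advance
// together until either one reaches its limit.
int CountDiagonalSteps (int maxI, int maxJ)
{
    int steps = 0;
    for (int i = 0, j = 0; i < maxI && j < maxJ; ++i, ++j)
        ++steps;
    return steps;
}

void OperatorDemo ()
{
    assert (AssignThroughTernary (true) == 12);  // first became 10
    assert (AssignThroughTernary (false) == 11); // second became 10
    assert (CountDiagonalSteps (3, 5) == 3);
}
```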


<p>Notice how protective we are when reading from or writing to a file. That's because our program doesn't have full control of the disk. A write can fail because we run out of disk space. This can happen at any time, because we are not the only client of the file system--there are other applications and system services that keep allocating (and presumably freeing) disk space. Reading is worse, because we're not even sure what to expect in the file. Not only may a read fail because of a hardware problem (unreadable disk sector), but we must be prepared for all kinds of sabotage. Other applications could have gotten hold of our precious file and truncated, edited or written all over it. We can't even be sure that the file we are trying to parse has been created by our program. The user could have mistakenly or maliciously passed to our program the name of some executable, a spreadsheet or autoexec.bat.
<p>We already have the first line of defense against such cases of mistaken identity or downright corruption--the version number. The first four bytes we read from the file must match our current version number or we refuse to load it. The error message we display in such a case is a bit misleading. A much better solution would be to spare a few additional bytes and stamp all our files with a magic number. Many people use their initials for the magic number in the hope that one day they'll be able to say to their children or grandchildren, "You see these bytes at the beginning of each file of this type? These are your mom's (dad's, gramma's, grampa's) initials." Provided the application or the system survives that long and is not widely considered an example of bad software engineering.
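<p>A magic number check could look more or less like the sketch below. Everything in it is hypothetical: the four magic bytes, the version value, the error messages and the function names were chosen for the example, and a string stream stands in for the file. The point is that a foreign file and an outdated file now produce different error messages.

```cpp
#include <cassert>
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical values, chosen for illustration only.
char const Magic [4] = { 'C', 'A', 'L', 'C' };
long const Version = 1;

void WriteHeader (std::ostream & out)
{
    out.write (Magic, sizeof (Magic));
    out.write (reinterpret_cast<char const *> (&Version), sizeof (long));
}

// Throws a different message for each kind of mismatch.
void CheckHeader (std::istream & in)
{
    char magic [sizeof (Magic)];
    in.read (magic, sizeof (magic));
    if (in.gcount () != static_cast<std::streamsize> (sizeof (magic))
        || std::memcmp (magic, Magic, sizeof (Magic)) != 0)
        throw "not a calculator file";
    long version = 0;
    in.read (reinterpret_cast<char *> (&version), sizeof (long));
    if (in.gcount () != static_cast<std::streamsize> (sizeof (long)))
        throw "unexpected end of file";
    if (version != Version)
        throw "version mismatch";
}

// Returns the error message, or an empty string if the header is fine.
std::string HeaderError (std::string const & bytes)
{
    std::istringstream in (bytes, std::ios_base::in | std::ios_base::binary);
    try
    {
        CheckHeader (in);
    }
    catch (char const * msg)
    {
        return msg;
    }
    return "";
}

void HeaderDemo ()
{
    std::ostringstream good;
    WriteHeader (good);
    assert (HeaderError (good.str ()).empty ());
    assert (HeaderError ("MZ, some executable") == "not a calculator file");
}
```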

<h3>In-Memory (De-) Serialization</h3>

<p>Serialization of data structures is not necessarily related to their storage in files. Sometimes you just want to store some data structure in a chunk of memory, especially if you want to pass it to another application. Programs can talk to each other and pass data through shared memory or other channels (Windows clipboard comes to mind). You might also want to send data in packets across the network. These are all situations in which you can't simply pass pointers embedded in your data. You have to change the format of data.
<p>The serialization procedure is the same, whether the output goes to a file or to memory. In fact, if your data structure is serializable (it has the <var>Serialize</var> and <var>DeSerialize</var> methods), all you might need to do in order to serialize it to memory is to change the implementation of <var>Serializer</var> and <var>DeSerializer</var>. Even better, you might make these classes abstract--make the methods <var>PutLong</var>, <var>PutDouble</var>, <var>PutBool</var> and <var>PutString</var> pure virtual--and provide two different implementations, one writing to a file and one writing to memory. You can do the same with the deserializer.
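<p>As a rough sketch of that idea, here is an abstract <var>Serializer</var> together with one possible in-memory implementation. The class <var>MemorySerializer</var>, the function <var>SaveSample</var> and the pattern values are assumptions made for this example; the file-based implementation would override the same four methods.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

long const TruePattern = 0x1bad1bad;  // assumed value, for illustration
long const FalsePattern = 0x2bad2bad; // assumed value, for illustration

// Abstract interface: every Put method is pure virtual.
class Serializer
{
public:
    virtual ~Serializer () {}
    virtual void PutLong (long l) = 0;
    virtual void PutDouble (double d) = 0;
    virtual void PutString (std::string const & str) = 0;
    virtual void PutBool (bool b) = 0;
};

// Hypothetical implementation that appends to a byte buffer.
class MemorySerializer: public Serializer
{
public:
    void PutLong (long l) { Append (&l, sizeof (long)); }
    void PutDouble (double d) { Append (&d, sizeof (double)); }
    void PutString (std::string const & str)
    {
        PutLong (static_cast<long> (str.length ()));
        Append (str.data (), str.length ());
    }
    void PutBool (bool b) { PutLong (b? TruePattern: FalsePattern); }
    std::vector<char> const & GetBuffer () const { return _buf; }
private:
    void Append (void const * data, std::size_t size)
    {
        char const * bytes = static_cast<char const *> (data);
        _buf.insert (_buf.end (), bytes, bytes + size);
    }
    std::vector<char> _buf;
};

// Client code sees only the abstract interface.
void SaveSample (Serializer & out)
{
    out.PutLong (42);
    out.PutDouble (3.5);
    out.PutString ("abc");
    out.PutBool (true);
}

void MemorySerializerDemo ()
{
    MemorySerializer mem;
    SaveSample (mem);
    assert (mem.GetBuffer ().size ()
            == 3 * sizeof (long) + sizeof (double) + 3);
}
```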
<p>There is one big difference between a file and a chunk of memory--the file grows as you write to it, a chunk of memory has fixed size. You have two choices--you can either grow your memory buffer as needed, or you can calculate the required amount of memory up front and pre-allocate the whole buffer. As it turns out, calculating the size of a serializable data structure is surprisingly easy. All you need is yet another implementation of the <var>Serializer</var> interface called the counting serializer. The counting serializer doesn't write anything, it just adds up the sizes of various data types it is asked to write.

<!-- Code --><table width=100% cellspacing=10><tr>    <td class=codetable>
<pre>class CountingSerializer: public Serializer
{
public:
    CountingSerializer ()
        : _size (0) {}
    int GetSize () const { return _size; }
    void PutLong (long l)
    {
        _size += sizeof (long);
    }
    void PutDouble (double d)
    {
        _size += sizeof (double);
    }
    void PutString (std::string const &amp; str)
    {
        _size += sizeof (long); // count
        _size += str.length ();
    }
    void PutBool (bool b)
    {
        _size += sizeof (long);
    }
private:
    int _size;
};</pre>
</td></tr></table><!-- End Code -->
<p>For instance, if you wanted to calculate the size of the file or memory buffer required for the serialization of a calculator, you'd call its <var>Serialize</var> method with a counting serializer.

<!-- Code --><table width=100% cellspacing=10><tr>    <td class=codetable>
<pre>CountingSerializer counter;
_calc.Serialize (counter);
int size = counter.GetSize ();</pre>
</td></tr></table><!-- End Code -->

<p>Remember that, in order for this to work, all methods of <var>Serializer</var> <i>must</i> be virtual.
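<p>Putting the pieces together, the two-pass scheme might be sketched as follows. To keep the example short, the interface is cut down to <var>PutLong</var> and <var>PutString</var>, and the classes <var>FixedMemorySerializer</var> and <var>Record</var> (a stand-in for the calculator) are invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Cut-down abstract interface: all Put methods are virtual.
class Serializer
{
public:
    virtual ~Serializer () {}
    virtual void PutLong (long l) = 0;
    virtual void PutString (std::string const & str) = 0;
};

class CountingSerializer: public Serializer
{
public:
    CountingSerializer (): _size (0) {}
    std::size_t GetSize () const { return _size; }
    void PutLong (long) { _size += sizeof (long); }
    void PutString (std::string const & str)
    {
        _size += sizeof (long); // count
        _size += str.length ();
    }
private:
    std::size_t _size;
};

// Hypothetical serializer writing into a buffer of pre-computed size.
class FixedMemorySerializer: public Serializer
{
public:
    explicit FixedMemorySerializer (std::size_t size)
        : _buf (size), _pos (0) {}
    void PutLong (long l) { Write (&l, sizeof (long)); }
    void PutString (std::string const & str)
    {
        PutLong (static_cast<long> (str.length ()));
        Write (str.data (), str.length ());
    }
    std::size_t GetWritten () const { return _pos; }
private:
    void Write (void const * data, std::size_t size)
    {
        if (size > _buf.size () - _pos)
            throw "buffer overflow";
        if (size != 0)
            std::memcpy (&_buf [_pos], data, size);
        _pos += size;
    }
    std::vector<char> _buf;
    std::size_t _pos;
};

// Stand-in for a serializable object such as the calculator.
class Record
{
public:
    Record (long id, std::string const & name): _id (id), _name (name) {}
    void Serialize (Serializer & out) const
    {
        out.PutLong (_id);
        out.PutString (_name);
    }
private:
    long _id;
    std::string _name;
};

// First pass counts, second pass writes into the pre-allocated buffer.
std::size_t SerializeToMemory (Record const & rec)
{
    CountingSerializer counter;
    rec.Serialize (counter);
    FixedMemorySerializer mem (counter.GetSize ());
    rec.Serialize (mem);
    assert (mem.GetWritten () == counter.GetSize ());
    return mem.GetWritten ();
}
```

<p>Because <var>Record::Serialize</var> takes a reference to the abstract <var>Serializer</var>, the same code runs in both passes. If the methods were not virtual, the calls would not reach the derived classes.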

<h3>Multiple Inheritance</h3>

<p>In order to make a class serializable, you have to add to it two methods, <var>Serialize</var> and <var>DeSerialize</var>, and implement them. It makes sense, then, to create a separate abstract class--a pure interface--to abstract this behavior.
<!-- Code -->
<table width="100%" cellspacing=10><tr>
    <td class=codeTable>
<pre>class Serializable
{
public:
    virtual void Serialize (Serializer &amp; out) const = 0;
    virtual void DeSerialize (DeSerializer &amp; in) = 0;
};</pre>
    </td></tr>
</table>
<!-- End Code -->
<p>All classes that are serializable should inherit from the <var>Serializable</var> interface.

<!-- Code -->
<table width="100%" cellspacing=10><tr>
    <td class=codeTable>
<pre>class Calculator: public Serializable
class SymbolTable: public Serializable
class Store: public Serializable</pre>
    </td></tr>
</table>
<!-- End Code -->

<p>What's the advantage of doing that? After all, even when you inherit from <var>Serializable</var>, you still have to add the declaration of the two methods to your class and you have to provide their implementation. Suppose that a new programmer joins your group and he (or she) has to add a new class to the project. One day he sends you email asking, "How do I make this class serializable?" If this functionality is abstracted into a class, your answer could simply be, "Derive your class from Serializable." That's it! No further explanation is necessary.

<p>There is however a catch. What if your class is already derived from some other class? Now it will have to inherit from that class <i>and</i> from Serializable. This is exactly the case in which multiple inheritance can be put to work. In C++ a class may have more than one base class. The syntax for multiple inheritance is pretty straightforward:
<!-- Code --><table width=100% cellspacing=10><tr>    <td class=codetable>
<pre>class MultiDerived: public Base1, public Base2</pre>
</td></tr></table><!-- End Code -->
<p>Suppose, for instance, that you were not satisfied with treating <var>std::string</var> as a simple type, known to the <var>Serializer</var>. Instead, you'd like to create a separate type, a serializable string. Here's how you could do it, using multiple inheritance:

<!-- Code --><table width=100% cellspacing=10><tr>    <td class=codetable>
<pre>using std::string;

class SerialString: public string, public Serializable
{
public:
    SerialString (std::string const &amp; str): string (str) {}
    void Serialize (Serializer &amp; out) const;
    void DeSerialize (DeSerializer &amp; in);
};</pre>
</td></tr></table><!-- End Code -->
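<p>The two methods of <var>SerialString</var> could be implemented along these lines. To keep the sketch self-contained, it uses drastically simplified in-memory stand-ins for <var>Serializer</var> and <var>DeSerializer</var> that handle only strings; these stand-ins and the <var>RoundTrip</var> function are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in: remembers the strings it was asked to write.
class Serializer
{
public:
    void PutString (std::string const & str) { _strings.push_back (str); }
    std::vector<std::string> const & GetStrings () const { return _strings; }
private:
    std::vector<std::string> _strings;
};

// Simplified stand-in: hands the stored strings back in order.
class DeSerializer
{
public:
    explicit DeSerializer (std::vector<std::string> const & strings)
        : _strings (strings), _cur (0) {}
    std::string GetString ()
    {
        if (_cur == _strings.size ())
            throw "unexpected end of data";
        return _strings [_cur++];
    }
private:
    std::vector<std::string> _strings;
    std::size_t _cur;
};

class Serializable
{
public:
    virtual ~Serializable () {}
    virtual void Serialize (Serializer & out) const = 0;
    virtual void DeSerialize (DeSerializer & in) = 0;
};

class SerialString: public std::string, public Serializable
{
public:
    SerialString (std::string const & str): std::string (str) {}
    void Serialize (Serializer & out) const
    {
        out.PutString (*this); // a SerialString is-a std::string
    }
    void DeSerialize (DeSerializer & in)
    {
        assign (in.GetString ()); // inherited from std::string
    }
};

// Saves one string through the Serializable interface, loads it into another.
std::string RoundTrip (std::string const & text)
{
    SerialString source (text);
    Serializer out;
    Serializable const & asInterface = source;
    asInterface.Serialize (out);

    SerialString target ("");
    DeSerializer in (out.GetStrings ());
    target.DeSerialize (in);
    return target;
}

void SerialStringDemo ()
{
    assert (RoundTrip ("hello") == "hello");
}
```

<p>Notice that the same object is used once as a <var>Serializable</var> and once as a <var>std::string</var>. That is what inheriting from both base classes buys us.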

<p>Multiple inheritance is particularly useful when deriving from abstract classes. This kind of inheritance deals with interface rather than implementation. In fact, this is exactly the restriction on multiple inheritance that's built into Java. In Java you can inherit only from one full-blown class, but you can add to it multiple inheritance from any number of interfaces (the equivalent of C++ abstract classes). In most cases this is indeed a very reasonable restriction. 

<p>Next: <a href="8trans.html">transactions</a>.

    </td>
    </tr>
</table>
<!-- End Main Table -->
</body>
</html>