<tt> ~Date();</tt>
<tt> //a non-virtual member function</tt>
<tt> bool isLeap() const;</tt>
<tt> bool operator == (const Date& other);</tt>
<tt>};</tt></pre>
<p>The Standard guarantees that within every instance of class <tt>Date</tt>, data members are laid out in the order of their declarations (static data members are stored outside the object and are therefore ignored). There is no requirement that members occupy contiguous memory; the compiler can insert additional padding bytes (more on this in Chapter 11, "Memory Management") between data members to ensure proper alignment. However, this is also the practice in C, so you can safely assume that a <tt>Date</tt> object has a memory layout that is identical to that of the following C struct:</p>
<pre><tt>/*** filename POD_Date.h***/</tt>
<tt>struct POD_Date</tt>
<tt>/* the following struct has memory layout that is identical</tt>
<tt>to a Date object */</tt>
<tt>{</tt>
<tt> int day;</tt>
<tt> int month;</tt>
<tt> int year;</tt>
<tt>};</tt>
<tt>/*** POD_Date.h***/</tt></pre>
<p>Consequently, a <tt>Date</tt> object can be passed to C code and treated as if it were an instance of <tt>POD_Date</tt>. That the memory layout in C and C++ is identical in this case might seem surprising; class <tt>Date</tt> defines member functions in addition to data members, yet there is no trace of these member functions in the object's memory layout. Where are these member functions stored? C++ compiles nonstatic member functions much as it compiles ordinary functions, and it stores them outside the object. They are no different from global functions, except that they take an implicit <tt>this</tt> argument, which ensures that they are called on an object and that they can access its data members. An invocation of a member function is transformed into a function call, whereby the compiler inserts an additional argument that holds the address of the object.
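</p>
<p>The rewriting described above can be imitated by hand. The following is only a sketch under stated assumptions: the <tt>Date</tt> shown here is a simplified stand-in with public data members, the leap-year rule is filled in for illustration, and the free function <tt>Date_isLeap</tt> and its parameter name <tt>self</tt> are hypothetical names that do not come from any real compiler. It suggests how a nonstatic member function and an ordinary function that receives the object's address explicitly can do the same work:</p>

```cpp
// Simplified stand-in for class Date (assumption: public data, no constructors).
struct Date
{
    int day;
    int month;
    int year;
    bool isLeap() const; // nonstatic member: receives a hidden 'this'
};

// Member version: 'year' is shorthand for 'this->year'.
bool Date::isLeap() const
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

// Hand-written equivalent (hypothetical name): an ordinary function that
// takes the address of the object as an explicit first argument.
bool Date_isLeap(const Date* const self)
{
    return (self->year % 4 == 0 && self->year % 100 != 0)
        || self->year % 400 == 0;
}
```

<p>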
Consider the following example:</p>
<pre><tt>void func()</tt>
<tt>{</tt>
<tt> Date d;</tt>
<tt> bool leap = d.isLeap(); //1</tt>
<tt>}</tt></pre>
<p>The invocation of the member function <tt>isLeap()</tt> in (1) is transformed by a C++ compiler into something such as</p>
<pre><tt>_x_isLeap?Date@KPK_Date@(&d); //pseudo C++ code</tt></pre>
<p>What was that again? Parse it carefully. The parentheses contain the <tt>this</tt> argument, which is inserted by the compiler in every nonstatic member function call. As you already know, function names are mangled. <tt>_x_isLeap?Date@KPK_Date@</tt> is a hypothetical mangled name of the member function <tt>bool Date::isLeap() const;</tt>. In this hypothetical C++ compiler, every mangled name begins with an underscore to minimize the potential for conflicts with user-given names. Next, the <tt>x</tt> indicates a function, as opposed to a data variable. <tt>isLeap</tt> is the user-given name of the function. The <tt>?</tt> is a delimiter that precedes the name of the class. The <tt>@</tt> that follows the class name indicates the parameter list, which begins with a <tt>KPK</tt> and <tt>Date</tt> to indicate a <tt>const</tt> pointer to a <tt>const</tt> <tt>Date</tt> (the <tt>this</tt> argument of a <tt>const</tt> member function is a <tt>const</tt> pointer to a <tt>const</tt> object). Finally, a closing <tt>@</tt> indicates the end of the parameter list. <tt>_x_isLeap?Date@KPK_Date@</tt> is, therefore, the underlying name of the member function <tt>bool Date::isLeap() const;</tt>. Other compilers are likely to use different name mangling schemes, but the details are quite similar to the example presented here. You must be thinking: "This is very similar to the way procedural programming manipulates data." It is.
The crucial difference is that the compiler, rather than the human programmer, takes care of these low-level details.</p>
<h3> <a name="Heading32">The C++ Object Model is Efficient</a></h3>
<p>The object model of C++ is the underlying mechanism that supports object-oriented concepts such as constructors and destructors, encapsulation, inheritance, and polymorphism. The underlying representation of class member functions has several advantages. It is very efficient in terms of execution speed and memory usage because an object does not store pointers to its member functions. In addition, the invocation of a nonvirtual member function does not involve additional lookup and pointer dereferencing. A third advantage is backward compatibility with C; an object of type <tt>Date</tt> can be passed to C code safely because the binary representation of such an object complies with the binary representation of a corresponding C struct. Other object-oriented languages use a radically different object model, which might not be compatible with either C or C++. Most of them use <i>reference semantics</i>. In a reference-based object model, an object is represented as a reference (a pointer or a handle) that refers to a memory block in which data members and pointers to functions are stored. There are some advantages to reference semantics; for example, reference counting and garbage collection are easier to implement in such languages, and indeed such languages usually provide automatic reference counting and garbage collection. However, garbage collection also incurs additional runtime overhead, and a reference-based model breaks backward compatibility with C. The C++ object model, on the other hand, enables C++ compilers to be written in C, and (as you read in Chapter 6, "Exception Handling") early C++ compilers were essentially C++-to-C translators.
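</p>
<p>Both the size claim and the C-compatibility claim can be checked mechanically. The sketch below assumes a C++11 or later compiler (for <tt>static_assert</tt> and <tt>&lt;type_traits&gt;</tt>); its <tt>Date</tt> is a stand-in whose member function bodies are assumed for illustration, and <tt>dateSize</tt> is a hypothetical helper that is not part of the original class:</p>

```cpp
#include <cstddef>
#include <type_traits>

// The C view of the data.
struct POD_Date
{
    int day;
    int month;
    int year;
};

// Stand-in for class Date; the function bodies are assumptions.
class Date
{
public:
    Date() : day(1), month(1), year(1970) {}
    ~Date() {}
    bool isLeap() const
    {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }
    bool operator==(const Date& other) const
    {
        return day == other.day && month == other.month && year == other.year;
    }
private:
    int day;
    int month;
    int year;
};

// Member functions, constructors, and destructors add nothing to the object.
static_assert(std::is_standard_layout<Date>::value,
              "Date is expected to be a standard-layout class");
static_assert(sizeof(Date) == sizeof(POD_Date),
              "Date is expected to be the same size as POD_Date");

// Hypothetical helper: reports the object size at run time.
std::size_t dateSize()
{
    return sizeof(Date);
}
```

<p>On mainstream compilers both assertions are expected to hold, and <tt>dateSize()</tt> is expected to return the same value as <tt>sizeof(POD_Date)</tt>; a compiler that rejected either assertion would be signalling that the object cannot be handed to C code in the way described here.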
</p>
<h3> <a name="Heading33">Memory Layout of Derived Objects</a></h3>
<p>The Standard does not specify the memory layout of base class subobjects in a derived class. In practice, however, all C++ compilers use the same convention: The base class subobject appears first (in left-to-right order in the event of multiple inheritance), and data members of the derived class follow. C code can access derived objects, as long as the derived class abides by the same restrictions that were specified previously. For example, consider a nonpolymorphic class that inherits from <tt>Date</tt> and has additional data members:</p>
<pre><tt>class DateTime: public Date</tt>
<tt>{</tt>
<tt>public: //additional members</tt>
<tt>long time;</tt>
<tt>bool PM; //display time in AM or PM?</tt>
<tt>DateTime();</tt>
<tt>~DateTime();</tt>
<tt>long getTime() const;</tt>
<tt>};</tt></pre>
<p>The two additional data members of <tt>DateTime</tt> are appended after the three members of the base class <tt>Date</tt>, so the memory layout of a <tt>DateTime</tt> object is equivalent to the following C struct:</p>
<pre><tt>/*** filename POD_Date.h***/</tt>
<tt>struct POD_DateTime</tt>
<tt>{</tt>
<tt> int day;</tt>
<tt> int month;</tt>
<tt> int year;</tt>
<tt> long time;</tt>
<tt> bool PM;</tt>
<tt>};</tt>
<tt>/*** POD_Date.h***/</tt></pre>
<p>In a similar vein, the nonpolymorphic member functions of <tt>DateTime</tt> have no effect on the size or memory layout of the object.</p>
<p>The compatible memory layout of nonpolymorphic C++ objects and C structs has many useful applications. For example, it enables relational databases to retrieve and insert objects into a database table. Data Manipulation Languages, such as SQL, that do not support object semantics, can still treat a "live" object as a raw chunk of memory. In fact, several commercial databases rely on this compatibility to provide an object-oriented interface with an underlying relational data model.
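</p>
<p>Treating an object as a raw chunk of memory can be sketched with <tt>memcpy</tt>. The example below is an illustration rather than a database interface: it uses a plain <tt>POD_DateTime</tt> struct with the members listed earlier, and <tt>toBytes</tt> and <tt>fromBytes</tt> are hypothetical helper names. It assumes that the bytes are written and read by programs that agree on type sizes, padding, and byte order:</p>

```cpp
#include <cstring>

struct POD_DateTime
{
    int day;
    int month;
    int year;
    long time;
    bool PM;
};

// Copy the object's bytes into a caller-supplied buffer that holds
// at least sizeof(POD_DateTime) bytes.
void toBytes(const POD_DateTime& src, unsigned char* buffer)
{
    std::memcpy(buffer, &src, sizeof src);
}

// Rebuild an object from bytes that toBytes() produced earlier.
POD_DateTime fromBytes(const unsigned char* buffer)
{
    POD_DateTime result;
    std::memcpy(&result, buffer, sizeof result);
    return result;
}
```

<p>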
Another application is the capability to transmit objects as a stream of bytes from one machine to another.</p>
<h3> <a name="Heading34">Support for Virtual Member Functions</a></h3>
<p>What happens when an object becomes polymorphic? In this case, backward compatibility with C is trickier. As was noted previously, the compiler is allowed to insert additional data members into a class in addition to user-declared data members. These members can be padding bytes that ensure proper alignment. In the case of virtual functions, an additional member is inserted into the class: a pointer to the virtual table, or <tt>_vptr</tt>. The <tt>_vptr</tt> holds the address of a static table of function pointers (as well as the runtime type information of a polymorphic class; see Chapter 7, "Runtime Type Identification"). The exact position of the <tt>_vptr</tt> is implementation-dependent. Traditionally, it was placed after the class's user-declared data members. However, some compilers have moved it to the beginning of the class for performance reasons. Theoretically, the <tt>_vptr</tt> can be located anywhere inside the class -- even among user-declared members.</p>
<p>A virtual member function, like a nonvirtual member function, is an ordinary function. When a derived class overrides it, however, multiple distinct versions of the function exist. It is not always possible to determine at compile time which of these functions needs to be invoked.
For example:</p>
<pre><tt>#include &lt;iostream&gt;</tt>
<tt>using namespace std;</tt>
<tt>class PolyDate</tt>
<tt>{</tt>
<tt>public:</tt>
<tt>//PolyDate has the same members as Date but it's polymorphic</tt>
<tt>virtual void name() const { cout<<"PolyDate"<<endl;}</tt>
<tt>};</tt>
<tt>class PolyDateTime: public PolyDate</tt>
<tt>{</tt>
<tt>public:</tt>
<tt>// the same members as DateTime but it's polymorphic</tt>
<tt>void name() const { cout<<"PolyDateTime"<<endl;} //override PolyDate::name()</tt>
<tt>};</tt></pre>
<p>When these classes are compiled, the hypothetical compiler generates two underlying functions that correspond to <tt>PolyDate::name()</tt> and <tt>PolyDateTime::name()</tt>:</p>
<pre><tt> // mangled name of void PolyDate::name() const</tt>
<tt>_x_name?PolyDate@KPK_PolyDate@</tt>
<tt> // mangled name of void PolyDateTime::name() const;</tt>
<tt>_x_name?PolyDateTime@KPK_PolyDateTime@</tt></pre>
<p>So far, there's nothing unusual about this. You already know that a member function is an ordinary function that takes an implicit <tt>this</tt> argument. Because you have defined two versions of the same virtual function, you also expect to find two corresponding functions, each of which has a distinct mangled name. However, unlike nonvirtual functions, the compiler cannot always transform an invocation of a virtual member function into a direct function call. For example:</p>
<pre><tt>void func(const PolyDate* pd)</tt>
<tt>{</tt>
<tt> pd->name();</tt>
<tt>}</tt></pre>
<p><tt>func()</tt> can be located in a separate source file, which might have been compiled before class <tt>PolyDateTime</tt> was defined. Therefore, the choice of which <tt>name()</tt> to invoke has to be deferred until runtime. The compiler transforms the function call into something such as</p>
<pre><tt>(* pd->_vptr[2]) (pd);</tt></pre>
<p>Analyze it: the member <tt>_vptr</tt> points to the internally-generated virtual table.
The first member of the virtual table is usually reserved for the address of the destructor, and the second might store the address of the class's <tt>type_info</tt>. Any other user-defined virtual member functions are located in higher positions. In this example, the address of <tt>name()</tt> is stored at the third position in the virtual table (in practice, the name of the <tt>_vptr</tt> is also mangled). Thus, the expression <tt>pd->_vptr[2]</tt> returns the address of the function <tt>name()</tt> associated with the current object. <tt>pd</tt>, in the second occurrence, represents the <tt>this</tt> argument.</p>
<p>Clearly, defining a corresponding C struct is more precarious in this case and requires intimate acquaintance with the compiler's preferred position of the <tt>_vptr</tt> as well as with its size. There is another hazard here: The value of the <tt>_vptr</tt> is transient. It is an address that is meaningful only within the address space of the process that executes the program, so it might differ from one run to the next. Consequently, when an entire polymorphic object is stored in a file and retrieved later, the retrieved data cannot be used as a valid object. For all these reasons, accessing polymorphic objects from C code is dangerous and generally needs to be avoided.</p>
<h3> <a name="Heading35">Virtual Inheritance</a></h3>
<p>C code cannot safely access objects that have a virtual base class either. The reason is that a virtual base is usually represented in the form of a pointer to a shared instance of the virtual subobject. Here again, the position of this pointer among user-defined data members is implementation-dependent. Likewise, the pointer holds a transient value, which can change from one execution of the program to another.</p>
<h3> <a name="Heading36">Different Access Specifiers</a></h3>
<p>The fourth restriction on the legality of accessing C++ objects from C code states that all the data members of the class are declared without an intervening access specifier.
This means, theoretically, that the memory layout of a class that looks similar to the following</p>
<pre><tt>class AnotherDate</tt>
<tt>{</tt>
<tt>private:</tt>
<tt> int day;</tt>
<tt>private:</tt>
<tt> int month;</tt>
<tt>private:</tt>
<tt> int year;</tt>
<tt>public:</tt>
<tt> //constructor and destructor</tt>
<tt> AnotherDate(); //current date</tt>
<tt> ~AnotherDate();</tt>
<tt> //a non-virtual member function</tt>
<tt> bool isLeap() const;</tt>
<tt> bool operator == (const Date& other);</tt>
<tt>};</tt></pre>
<p>might differ from that of a class that has the same data members declared in the same order, albeit without any intervening access specifiers. In other words, for class <tt>AnotherDate</tt>, an implementation is allowed to place the member <tt>month</tt> before the member <tt>day</tt>, <tt>year</tt> before <tt>month</tt>, or whatever. Of course, this nullifies any compatibility with C code. However, in practice, all current C++ compilers ignore the access specifiers and store the data members in the order of declaration. So C code that accesses a class object that has multiple access specifiers might work -- but there is no guarantee that the compatibility will remain in the future.</p>
<h2> <a name="Heading37">Conclusions</a></h2>
<p>The creators of C++ have attempted to preserve, as closely as possible, backward compatibility with C. Indeed, almost without exception, every C program is also a valid C++ program. Still, there are some subtle differences within the seemingly common denominator of the two languages. Most of them, as you might have noted, derive from the improved type-safety of C++: for example, the obligatory declaration of a function prior to its usage, the need to use an explicit cast of <tt>void</tt> pointers to the target pointer type, the deprecation of implicit <tt>int</tt> declarations, and the enforcement of a null terminator in a string literal.
Other discrepancies between the two languages derive from the different rules of type definition.</p>
<p>C code can be called directly from C++ code. Calling C++ code from C is also possible under certain conditions, but it requires additional adjustments regarding the linkage type, and it is confined exclusively to global functions. C++ objects can be accessed from C code, as you have seen, but here again, there are stringent constraints to which you must adhere.</p>
<CENTER><HR><P>© <A HREF="/publishers/que/series/professional/0789720221/copy.htm">Copyright 1999</A>, Macmillan Computer Publishing. All rights reserved.</P></CENTER></BODY></HTML>