overview.txt
From the source code of the open-source compiler Open Watcom 1.6.0
Just what the heck's going on in the merger...
Entities:
    Id numbers
    Declarations
    Definitions
    Files
    Usages
    Deltas
    Scopes
    Strings
    Types
    Guards/Macros
    Template stuff
Other issues:
    Avoiding duplication
    Pre-compiled headers
    Caching
    Container types
    Re-indexing
    Memory usage
    Compiler-generated names
    How to use the compiler
    Other stuff to do
Things to do:
Files to look at:
brmtypes.h -- declarations shared by both the compiler end and
the Optima++ end of the merge process.
\\groupdir\cproj\brinfo\h\brmtypes.h
pcheader.h -- a PLUSPLUS header file which is unfortunately
needed by the merger DLL.
\\groupdir\cproj\plusplus\h\pcheader.h
ppopsdef.h -- a PLUSPLUS header file which is not included by
the DLL, but on which the information in the ppops
module of the DLL is based.
\\groupdir\cproj\plusplus\h\ppopsdef.h
Entities
========
Id numbers
----------
Many kinds of cross references are used in the compiler-generated
browse files, and since these files live on disk, where pointers
cannot be used, a system of id numbers is used instead.
Every scope, declaration, type and string is assigned a 32-bit id
number when the browse file is generated. There are four typedefs
in brmtypes.h to reflect this:
typedef uint_32 BRIStringID;
typedef uint_32 BRISymbolID;
typedef uint_32 BRITypeID;
typedef uint_32 BRIScopeID;
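As a rough illustration of why ids stand in for pointers, the sketch below stores a declaration that names its string and type by id, and resolves the name through a table built at load time. The four typedefs mirror the ones above; DeclRecord, StringTable and their members are hypothetical names invented for this example, and uint_32 is assumed to be a plain 32-bit unsigned integer.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Assumption: uint_32 is a plain 32-bit unsigned integer.
typedef std::uint32_t uint_32;
typedef uint_32 BRIStringID;
typedef uint_32 BRISymbolID;
typedef uint_32 BRITypeID;
typedef uint_32 BRIScopeID;

// Hypothetical on-disk shape of a declaration: it refers to its name and
// type by id, never by pointer, so it means the same thing in any process.
struct DeclRecord {
    BRISymbolID id;
    BRIStringID name;
    BRITypeID   type;
};

// Hypothetical in-memory table filled in while a browse file is read.
class StringTable {
public:
    void add(BRIStringID id, const std::string &text) { table_[id] = text; }

    bool contains(BRIStringID id) const { return table_.count(id) != 0; }

    // Resolves an id to its text; the id must already have been added.
    const std::string &get(BRIStringID id) const {
        auto it = table_.find(id);
        assert(it != table_.end());
        return it->second;
    }

private:
    std::unordered_map<BRIStringID, std::string> table_;
};
```

A reader would call add() for each string it encounters and get() whenever a later record mentions that id.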
Declarations
------------
Anything we would call a "symbol" in a C++ program is represented by
a Declaration in a browse file. This includes classes, functions,
typedefs, and variables. The BRI_SymbolAttributes enumeration in
brmtypes.h describes other kinds of symbols, such as labels, but they're
just there for future expansion.
In a browse file, Declarations come with a BRI_SymbolAttributes value,
a name and a type. The names are supposed to be unmangled; however,
the compiler will occasionally generate symbols with pre-mangled names
(I've seen this happen with "virtual function thunks", whatever they are).
Symbols which are not compiler generated will still have the name the
user gave them. See the section on compiler-generated names.
The BRI_SymbolAttributes value specifies the nature of the symbol
(function, variable, class, etc.) and can also specify the access (public,
protected, or private). Currently the access part is neither implemented
nor used.
After merging, a Declaration will also include a reference to the
enclosing scope. This information is reconstructed by the merger.
Definitions
-----------
A Definition gives a file name and line and column numbers for a
specific symbol. After merging multiple browse files, there will
often be more than one Definition given for a particular Declaration.
This happens most often with functions: modules which just use a
function will think it's defined in the header file, while the module
which defines it will think it's defined in the source file.
Files
-----
A File record in a browse file just contains the filename. A FileEnd
record in a browse file contains no data. Together the two types of
records describe a hierarchy which is supposed to be as close to the
#include hierarchy of the program as possible.
File information is important to the interpretation of Usage records,
and to the interpretation of Guard information (see below).
See "Avoiding Duplication" on what happens during merging if the same
source file is used in multiple browse files.
Usages
------
A Usage is a reference to a Declaration or occasionally a Type. Calling
a function is considered a reference, as is inheriting from a class.
Every Usage has a location. In an effort to save some space, these
locations are stored in a slightly strange fashion.
To determine the source file that a reference is in, you must examine
where the Usage record is in relation to the hierarchy of File and FileEnd
records. To determine the line and column number of a reference, examine
all of the Usage and Delta (see below) records up to and including this
one which originate from the same source file. Add up all of the
delta_line and delta_col fields from these records. The totals will
be the line and column location of this reference. With this scheme,
line deltas are stored as int_16's, while column deltas are stored
as int_8's.
The scheme is borrowed from Ivan, who noticed that references would tend
to clump at various locations in a source file.
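The decoding rule above can be sketched as follows. This is only one reading of the description: the Record layout, the RecordKind values and resolveUsages are made up for the example, and it assumes that running totals are kept per file name and start at zero.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

enum RecordKind { REC_FILE, REC_FILE_END, REC_USAGE, REC_DELTA };

// Hypothetical flattened record; real browse records carry more fields.
struct Record {
    RecordKind   kind;
    std::string  filename;    // meaningful for REC_FILE only
    std::int16_t delta_line;  // meaningful for REC_USAGE and REC_DELTA
    std::int8_t  delta_col;   // meaningful for REC_USAGE and REC_DELTA
};

struct Location {
    std::string file;
    long        line;
    long        col;
};

// Walks the records in order and returns the absolute location of every Usage.
std::vector<Location> resolveUsages(const std::vector<Record> &records) {
    struct Totals { long line = 0; long col = 0; };
    std::vector<std::string> open_files;    // File/FileEnd nesting
    std::map<std::string, Totals> totals;   // running sums per source file
    std::vector<Location> result;

    for (const Record &rec : records) {
        switch (rec.kind) {
        case REC_FILE:
            open_files.push_back(rec.filename);
            break;
        case REC_FILE_END:
            assert(!open_files.empty());
            open_files.pop_back();
            break;
        case REC_USAGE:
        case REC_DELTA: {
            assert(!open_files.empty());
            // Both record kinds move the position within the innermost file.
            Totals &t = totals[open_files.back()];
            t.line += rec.delta_line;
            t.col  += rec.delta_col;
            if (rec.kind == REC_USAGE) {
                result.push_back({open_files.back(), t.line, t.col});
            }
            break;
        }
        }
    }
    return result;
}
```

Because the totals are keyed by file, a Usage that follows a nested File/FileEnd pair continues from the last position recorded in the including file.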
After merging, the Usage records are grouped by file and stored in sorted
order by location. Also, the enclosing scope of the Usage is determined
at merge time.
Deltas
------
A Delta just contains a line and column number change. Deltas are used
when the next Usage is located too far away from the previous Usage for the
line and column number deltas to be stored in a single int_16 and int_8.
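On the writing side, a large jump might be split as in the sketch below: Delta records absorb as much of the change as one record can hold, and the final Usage carries the remainder. Step and encodeJump are hypothetical names, and the greedy split is an assumption; the notes do not say how the compiler actually divides the change.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a Delta or Usage record.
struct Step {
    bool         is_usage;    // false: Delta record, true: the final Usage
    std::int16_t delta_line;
    std::int8_t  delta_col;
};

// Clamps a pending change to the range one record can carry.
static long clampTo(long value, long lo, long hi) {
    return std::max(lo, std::min(value, hi));
}

// Emits zero or more Delta steps followed by exactly one Usage step whose
// deltas sum to (line_change, col_change).
std::vector<Step> encodeJump(long line_change, long col_change) {
    std::vector<Step> out;
    for (;;) {
        long dl = clampTo(line_change, INT16_MIN, INT16_MAX);
        long dc = clampTo(col_change, INT8_MIN, INT8_MAX);
        line_change -= dl;
        col_change  -= dc;
        bool last = (line_change == 0 && col_change == 0);
        out.push_back({last,
                       static_cast<std::int16_t>(dl),
                       static_cast<std::int8_t>(dc)});
        if (last) {
            return out;
        }
    }
}
```

A change that already fits in an int_16 and an int_8 produces a single Usage step and no Deltas.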
Scopes
------
Scopes are the most complicated part of this whole process.
There are seven types of scopes which can appear in a browse file: File,
Class, Function, Block, TemplateDecl, TemplateInst, and TemplateParm.
The first four have obvious functions. The last three are generated by
the compiler to contain template declarations and the code created when
a template is instantiated.
Scopes are hierarchical, just like Files (there is a ScopeEnd record).
The top of the scope tree should always be a file scope, and there should
be only one file scope declared in any browse file. Declarations and
Usages have enclosing scopes which are determined at merge time by the
relative order of the Scope, ScopeEnd, Declaration and Usage records.
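Working out the enclosing scope from record order amounts to keeping a stack of open scopes. The sketch below is a simplified illustration, not the merger's code: Record, ScopeInfo and buildScopeTree are hypothetical, only Declarations are handled, and id 0 is assumed never to name a real scope so that it can mean "no parent".

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

typedef std::uint32_t BRIScopeID;
typedef std::uint32_t BRISymbolID;

enum RecordKind { REC_SCOPE, REC_SCOPE_END, REC_DECLARATION };

// Hypothetical flattened record.
struct Record {
    RecordKind    kind;
    std::uint32_t id;   // scope id for REC_SCOPE, symbol id for REC_DECLARATION
};

struct ScopeInfo {
    BRIScopeID               parent;   // 0 means no parent (assumption)
    std::vector<BRISymbolID> symbols;  // declarations directly inside the scope
};

// Rebuilds the scope tree and each scope's symbol list from record order.
std::map<BRIScopeID, ScopeInfo> buildScopeTree(const std::vector<Record> &records) {
    std::map<BRIScopeID, ScopeInfo> scopes;
    std::vector<BRIScopeID> open;   // innermost open scope is at the back

    for (const Record &rec : records) {
        switch (rec.kind) {
        case REC_SCOPE:
            scopes[rec.id].parent = open.empty() ? 0 : open.back();
            open.push_back(rec.id);
            break;
        case REC_SCOPE_END:
            assert(!open.empty());
            open.pop_back();
            break;
        case REC_DECLARATION:
            assert(!open.empty());
            scopes[open.back()].symbols.push_back(rec.id);
            break;
        }
    }
    return scopes;
}
```

Usage records could be attached to open.back() in the same way.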
After merging, lots of information is added to the Scope records.
The tree structure is reconstructed more explicitly. Also, each
Scope record receives a list of all symbols and class types which
were defined within the scope.
Scopes are also involved with eliminating duplication of browse data.
Simply put, you should never have two copies of the same function scope.
More on this in "Avoiding Duplication", below.
Strings
-------
This part is easy. All strings appear in a String record. Pre-merge,
the strings are stored willy-nilly in the browse file (but always appear
before they are used). Post-merge, the strings are all stored in a common
buffer with a hash-table to index them. Strings are always referred to
by their BRIStringID. Strings are '\0'-terminated ASCII-text (so all
functions in the DLL which manipulate WStrings use "GetAnsiText" and
"SetAnsiText" rather than just "GetText" and "SetText") and case-sensitive.
File names, when stored in a browse file, are always full path names in
lower case.
To save space, no string appears twice in a browse file. If the same
string appears in two browse files, one will be discarded during a merge.
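The post-merge arrangement (one common buffer plus a hash table, with duplicates discarded) might look roughly like this. StringPool is a hypothetical class; using the buffer offset as the id is an assumption made to keep the sketch short, and the index keeps its own copy of each string, which a memory-conscious implementation would avoid.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

typedef std::uint32_t BRIStringID;

// Hypothetical pool: all text lives in one '\0'-separated buffer.
class StringPool {
public:
    // Returns the id of `text`, appending it to the buffer only if it is new.
    // Lookup is case-sensitive.
    BRIStringID intern(const std::string &text) {
        auto it = index_.find(text);
        if (it != index_.end()) {
            return it->second;          // duplicate: reuse the existing copy
        }
        BRIStringID id = static_cast<BRIStringID>(buffer_.size());
        buffer_.insert(buffer_.end(), text.begin(), text.end());
        buffer_.push_back('\0');
        index_.emplace(text, id);
        return id;
    }

    // `id` must be a value previously returned by intern().
    const char *text(BRIStringID id) const { return &buffer_[id]; }

    std::size_t bytesUsed() const { return buffer_.size(); }

private:
    std::vector<char> buffer_;
    std::unordered_map<std::string, BRIStringID> index_;
};
```

When a second browse file is merged in, each of its strings would go through intern(), and the merger would record the mapping from the file's own id to the returned one.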
Types
-----
Types are complicated simply because the C++ type system is complicated.
In browse files, types are represented by records which together make
up a directed acyclic graph. For example, the type record for (int *)
would contain the index of a type record for (int). Class types contain
the index of the name of the class and the index of the corresponding
class Declaration. I won't go into the structure of Type records here
except to say that it involves lots of nice big unions.
Type records always appear in browse files before they are used. A type
may appear twice in a browse file if the compiler has two identical types
in its records; however, identical types are combined during a merge.
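Because a type record always follows the records it refers to, combining identical types can be sketched as a table lookup keyed on the record's contents. Everything here is hypothetical: the two-kind TypeRecord is far simpler than the real union-based records, and TypeTable assumes the caller has already translated any referenced type id to its merged value.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

typedef std::uint32_t BRITypeID;

enum TypeKind { TYPE_BASE, TYPE_POINTER };

// Hypothetical, heavily simplified type record.
struct TypeRecord {
    TypeKind      kind;
    std::uint32_t detail;  // TYPE_BASE: a made-up base-type code
                           // TYPE_POINTER: merged BRITypeID of the pointee
};

class TypeTable {
public:
    // Returns the id of an identical existing record, or stores a new one.
    // Ids handed out here start at 1.
    BRITypeID add(const TypeRecord &rec) {
        std::pair<TypeKind, std::uint32_t> key(rec.kind, rec.detail);
        auto it = index_.find(key);
        if (it != index_.end()) {
            return it->second;
        }
        records_.push_back(rec);
        BRITypeID id = static_cast<BRITypeID>(records_.size());
        index_[key] = id;
        return id;
    }

    std::size_t size() const { return records_.size(); }

private:
    std::vector<TypeRecord> records_;
    std::map<std::pair<TypeKind, std::uint32_t>, BRITypeID> index_;
};
```

Adding (int) and then (int *) twice leaves two records in the table, since the second pair resolves to the ids of the first.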
Guards/Macros
-------------
A Guard record in a browse file can represent a dependency,
a macro declaration, or a macro usage. (In other words, Guard records
are where preprocessor stuff gets put.)
A dependency is specifically a preprocessor dependency on the state of
a particular macro. For example, a statement of the form
"#ifndef __FILE_HPP_INCLUDED" would result in a dependency being
placed in the corresponding browse files. Dependencies are used
by the merge process to cut down on the amount of duplicate browse
information that must be examined. See "Avoiding duplication", below.
Macro declarations and macro usages are included in case browse
information for them is desired. Macros are not symbols in any real
sense (for example, they ignore scopes and can be defined multiple
times), but some useful information can be gleaned from them.
Template stuff
--------------
Finally, the Template and TemplateEnd records. These are the only
records which correspond to nothing in the compiler, but are there
solely for the convenience of the compiler. As such, they take
a little explaining.
Imagine the following two files:
template.hpp:
    template<class T> class Foo {
        // ...
    };
instance.cpp:
    #include "template.hpp"
    // ...
    Foo<int> my_foo;
When the compiler is processing "instance.cpp" and gets to the last line,
it suddenly has to invisibly generate code to implement the class "Foo<int>".
That code is all defined in "template.hpp", but the compiler tacks it onto
the end of "instance.cpp" in a hidden "TemplateInst" scope. This causes
problems with browse information; suddenly the source of the browse
information jumps from "instance.cpp" back to "template.hpp", even though