CDBFile v. 1.0, a C++ package for handling dBASE III files
Herve GOURMELON (herve.gourmelon@enssat.fr)
1. Introduction
Last term, I had to begin a reengineering project for the ENSSAT, the
French graduate school I am working at. The core of the project was a
piece of software that drives automatic doors with magnetic cards, which
are referenced in data files. I had long thought that the dBASE III
file standard was no longer in use since dBASE IV and dBASE
V had appeared. It turned out that the former software that was running
those doors worked very well with those DBF files, and that their
structure was quite simple. I could have chosen to use raw text files for
that application, which would have made it a great deal easier to program.
But there were quite a few companies and administrations in France that
used the same system, and I thought that it would be easier for them to
migrate to new software if they did not have to modify the structure of
their own data files.
Browsing the Net for some documents or some code, I discovered
that hardly anything had been released under the GPL in that field.
(Well, I discovered later that a dBASE file viewer also existed, written
in C for UNIX/Linux.) The only thing I found was Mark W. Schumann's
"ffld.c", but I needed more than that, and I wanted
those tools to be object-oriented. So I decided to distribute the few tools
I would build under the GPL (please read the COPYING text file
included in the package). They are far from being comprehensive, but I
hope that their modularity will enable other developers to expand them
and/or customize them as they like...
2. Database file structure
The structure of a dBASE III database file is composed of a header
and data records. The layout is given below.
dBASE III DATABASE FILE HEADER:
+---------+-------------------+---------------------------------+
|  BYTE   |     CONTENTS      |          MEANING                |
+---------+-------------------+---------------------------------+
|  0      |  1 byte           | dBASE III version number        |
|         |                   |  (03H without a .DBT file)      |
|         |                   |  (83H with a .DBT file)         |
+---------+-------------------+---------------------------------+
|  1-3    |  3 bytes          | date of last update             |
|         |                   |  (YY MM DD) in binary format    |
+---------+-------------------+---------------------------------+
|  4-7    |  32 bit number    | number of records in data file  |
+---------+-------------------+---------------------------------+
|  8-9    |  16 bit number    | length of header structure      |
+---------+-------------------+---------------------------------+
|  10-11  |  16 bit number    | length of the record            |
+---------+-------------------+---------------------------------+
|  12-31  |  20 bytes         | reserved bytes (version 1.00)   |
+---------+-------------------+---------------------------------+
|  32-n   |  32 bytes each    | field descriptor array          |
|         |                   |  (see below)                    | --+
+---------+-------------------+---------------------------------+   |
|  n+1    |  1 byte           | 0DH as the field terminator     |   |
+---------+-------------------+---------------------------------+   |
                                                                    |
                                                                    |
A FIELD DESCRIPTOR:      <------------------------------------------+
+---------+-------------------+---------------------------------+
|  BYTE   |     CONTENTS      |          MEANING                |
+---------+-------------------+---------------------------------+
|  0-10   |  11 bytes         | field name in ASCII zero-filled |
+---------+-------------------+---------------------------------+
|  11     |  1 byte           | field type in ASCII             |
|         |                   |  (C N L D or M)                 |
+---------+-------------------+---------------------------------+
|  12-15  |  32 bit number    | field data address              |
|         |                   |  (address is set in memory)     |
+---------+-------------------+---------------------------------+
|  16     |  1 byte           | field length in binary          |
+---------+-------------------+---------------------------------+
|  17     |  1 byte           | field decimal count in binary   |
+---------+-------------------+---------------------------------+
|  18-31  |  14 bytes         | reserved bytes (version 1.00)   |
+---------+-------------------+---------------------------------+
The data records are laid out as follows:
1. Data records are preceded by one byte that is a space (20H) if the
   record is not deleted and an asterisk (2AH) if it is deleted.
2. Data fields are packed into records with no field separators or
   record terminators.
3. Data types are stored in ASCII format as follows:
   DATA TYPE      DATA RECORD STORAGE
   ---------      --------------------------------------------
   Character      (ASCII characters)
   Numeric        - . 0 1 2 3 4 5 6 7 8 9
   Logical        ? Y y N n T t F f  (? when not initialized)
   Memo           (10 digits representing a .DBT block number)
   Date           (8 digits in YYYYMMDD format, such as
                  19840704 for July 4, 1984)
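To make the layout above more concrete, here is a minimal sketch that decodes
the fixed part of the header from a buffer that has already been read from
disk. The names DbfHeader, ReadLE16, ReadLE32, ParseDbfHeader, FieldCount and
IsDeleted are hypothetical helpers written for this illustration; they are not
part of CDBFile. The sketch assumes that the 16 and 32 bit numbers are stored
least-significant byte first, and that the header length counts the 32 fixed
bytes, the field descriptors and the 0DH terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: the fixed part of a dBASE III header (bytes 0-11).
struct DbfHeader {
    std::uint8_t  version;       // byte 0: 03H, or 83H with a .DBT file
    std::uint8_t  year;          // byte 1: YY, stored as a binary value
    std::uint8_t  month;         // byte 2: MM
    std::uint8_t  day;           // byte 3: DD
    std::uint32_t recordCount;   // bytes 4-7
    std::uint16_t headerLength;  // bytes 8-9
    std::uint16_t recordLength;  // bytes 10-11
};

// Read little-endian numbers byte by byte, so that the code does not
// depend on the byte order of the machine it runs on (assumption: the
// file stores the least-significant byte first).
inline std::uint16_t ReadLE16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

inline std::uint32_t ReadLE32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Decode the fixed header; returns false if the buffer holds fewer than
// the 32 bytes that precede the field descriptor array.
inline bool ParseDbfHeader(const unsigned char* buf, std::size_t size,
                           DbfHeader& out) {
    if (buf == nullptr || size < 32) return false;
    out.version      = buf[0];
    out.year         = buf[1];
    out.month        = buf[2];
    out.day          = buf[3];
    out.recordCount  = ReadLE32(buf + 4);
    out.headerLength = ReadLE16(buf + 8);
    out.recordLength = ReadLE16(buf + 10);
    return true;
}

// Number of 32-byte field descriptors, assuming the header is made of
// 32 fixed bytes + 32 bytes per descriptor + 1 terminator byte (0DH).
inline unsigned FieldCount(const DbfHeader& h) {
    if (h.headerLength < 33) return 0;
    return (h.headerLength - 33u) / 32u;
}

// A record is flagged as deleted when its first byte is an asterisk (2AH).
inline bool IsDeleted(const char* record) {
    assert(record != nullptr);
    return record[0] == '*';
}
```

With a header length of 97, for instance, FieldCount() yields 2 descriptors
(32 + 2 * 32 + 1 = 97).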
3. Description of the object
3.1. Specification
For the project I was working on, I had to process various dBASE files
that were very different in their structures. The easiest way to solve that
problem was to create a set of tools that would be completely generic,
smart enough to process various files with various kinds of data. I thought
that there could be a specific object for the file, which would contain at
least the whole description of the header.
The next problem was: how would I store the records? The first idea that
springs to mind is:
"Okay, let's create a generic structure containing a dynamic list
of fields, each of which will contain the data. Every field in the list
will have a type identifier, a buffer containing the data, and a pointer to
the next field. Every record will be a list of those fields, and the records
will also be stored in a dynamic list".
This might be an elegant solution, but it is very heavy: suppose you
have 256 fields in each record, each containing only one byte of
data... With all the pointers involved in the dynamic structures, the data
could take up to ten times as much space in memory as on the disk!
In order to save memory, I decided to keep a simple structure: every
record is stored as a string of raw ASCII read directly from the disk, into
a dynamic list of records. The description of the fields (with their types
and lengths) is stored in a separate list. Values are read from and written
to the records using functions which format the data according to the field
types.
This solution consumes less memory and is easier to implement. There is
a price to pay, though, for the genericity of the tools: the functions that
access the data (reading from / writing to a record) pass or receive the
value through a void pointer, so the programmer has to know exactly what
the type of that value is.
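The trade-off described above can be sketched as follows. This is only an
illustration under stated assumptions, not the code of CDBFile: FieldInfo and
ReadField are hypothetical names, the offset is assumed to be counted from the
start of the raw record (so the first field starts at 1, right after the
deletion flag), and numeric values are returned as double.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical, simplified stand-in for a field descriptor.
struct FieldInfo {
    char           type;    // 'C', 'N', 'L' or 'D'
    unsigned char  length;  // width of the field within the record
    unsigned short offset;  // start of the field within the raw record
};

// Copy the value of one field out of a raw record into *out.
// The caller must pass a pointer to the right type, since nothing can be
// checked through a void pointer:
//   'C' and 'D' -> std::string*,   'N' -> double*,   'L' -> bool*
// Returns false for a field type that this sketch does not handle.
inline bool ReadField(const char* record, const FieldInfo& f, void* out) {
    assert(record != nullptr && out != nullptr);
    const std::string raw(record + f.offset, f.length);
    switch (f.type) {
    case 'C':   // character data: returned as stored, padding included
    case 'D':   // date: returned as the 8 digits YYYYMMDD
        *static_cast<std::string*>(out) = raw;
        return true;
    case 'N':   // numeric: ASCII digits, converted with strtod
        *static_cast<double*>(out) = std::strtod(raw.c_str(), nullptr);
        return true;
    case 'L': { // logical: Y, y, T or t mean true, anything else false
        const char c = raw.empty() ? '?' : raw[0];
        *static_cast<bool*>(out) =
            (c == 'Y' || c == 'y' || c == 'T' || c == 't');
        return true;
    }
    default:
        return false;
    }
}
```

Passing a pointer to the wrong type (say, a bool* for a numeric field) compiles
without any warning and corrupts memory at run time, which is exactly the
price mentioned above.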
3.2. Structure
The CDBFile object contains two dynamic lists: one list of field
descriptors (pointers to CField objects) and one list of records (pointers
to Record structures). The field descriptors are organized in a ring: the
Next pointer of the last CField points to the first CField in the list.
The ring is single-linked. I chose to implement it that way because it is
easier to maintain than a double-linked dynamic list, and quite efficient,
given the small number of fields that we have in general.
The records are stored in a double-linked list. I tried to use a
single-linked list first, but it turned out that I needed a second link in
order to apply the quick sort algorithm to the list. Pity.
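To show why a backward link helps here, the following sketch applies one
common in-place quick sort to a double-linked list. It is not necessarily the
way SortAllRecords() is written: Node, Partition and QuickSort are
hypothetical names, the list is assumed to end with null pointers on both
sides, and the sketch swaps the keys rather than relinking the nodes. The
prev pointer is needed to recurse on the part that ends just before the
pivot.

```cpp
#include <cassert>
#include <utility>

// Hypothetical node: a key to sort on plus the two links.
struct Node {
    unsigned long key;
    Node* next;
    Node* prev;
};

// Partition the sub-list [head, tail] around the key of tail.
// Afterwards every key <= pivot sits before the returned node and every
// larger key after it.
inline Node* Partition(Node* head, Node* tail) {
    assert(head != nullptr && tail != nullptr);
    const unsigned long pivot = tail->key;
    Node* i = head->prev;  // last node of the "<= pivot" zone, if any
    for (Node* j = head; j != tail; j = j->next) {
        if (j->key <= pivot) {
            i = (i == nullptr) ? head : i->next;  // grow the zone by one
            std::swap(i->key, j->key);
        }
    }
    i = (i == nullptr) ? head : i->next;  // final place of the pivot
    std::swap(i->key, tail->key);
    return i;
}

// Sort the sub-list [head, tail] in ascending key order.
inline void QuickSort(Node* head, Node* tail) {
    // Stop when the range is empty or holds a single node.
    if (tail == nullptr || head == tail || head == tail->next) return;
    Node* p = Partition(head, tail);
    QuickSort(head, p->prev);  // the backward link is used here
    QuickSort(p->next, tail);
}
```

With a single-linked list, finding the node before the pivot would require
walking the list again from the head at every step.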
Another thing that worries me about the quality of the tools is that I
did not declare Record as an object, but only as a structure (I
thought I didn't need to, given the little data contained in that structure).
Real C++ Programmers won't like it at all... Maybe
in a later version I (or someone else) will fix that.
Note: in the next subsections, I will describe the various
components of the CDBFile object and their behavior. That
description will only be a quick overview of what every member function does,
so if you really need more details, have a look at the code in
"cdbfile.cpp", which is thoroughly commented and should help you understand
my somewhat foggy and twisted ideas.
3.3. The CField object
It is described in the header file as follows:
class CField
{
private:
    char Name[11];              // field name in ASCII zero-filled
    char Type;                  // field type in ASCII
    unsigned char Length;       // field length in binary
    unsigned char DecCount;     // field decimal count in binary
    unsigned char FieldNumber;  // field number within the record
    unsigned short Offset;      // field offset within the character string
    CField* Next;               // next field in the list
    ...
};
You can see that all the data that can be found in the field descriptors is
stored in those CField objects (apart from the field data address, which is
useless unless you are using the first versions of Ashton-Tate's dBASE).
An additional value has been appended: the Offset member, which indicates
where the field begins within the raw character string. The Next pointer
points to the next CField object in the ring. Every member variable can be
read or written using the associated public member functions. Two special
functions (in fact, two overloaded versions of the same member function) are
provided to access the right CField object in the ring, using either its
name or its number to identify it. These are recursive functions which take
a CField pointer (Start) in order to locate a CField object within the ring
(see the code for more details). So here are the headers for those member
functions:
class CField
{
private:
    ...
public:
    CField();                   // default constructor
    CField(char* NName, char NType, unsigned char NLength,
           unsigned char NDecCount, unsigned char FieldNum);
                                // another constructor
    ~CField();                  // destructor
    char* GetName() { return Name; }
    void SetName(char* NewName) { strcpy(Name, NewName); }
    // ...and so on for all the member variables
    CField* GetField(char* FieldName, CField* Start=NULL);
    CField* GetField(unsigned short Number, CField* Start=NULL);
};
That's all for the CField object, which you won't have to use directly,
unless you want to add some more functions to CDBFile.
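A recursive lookup in a ring needs some way to know when it has gone all the
way round. The sketch below shows one way to do that with a start pointer that
defaults to null, in the spirit of the GetField() headers above; it is an
illustration only, and FieldNode and FindField are hypothetical names rather
than the actual CField code.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, stripped-down node: only what the lookup needs.
struct FieldNode {
    char       name[11];  // field name, zero-filled
    FieldNode* next;      // next node; the last node points to the first
};

// Search the ring for a field by name, beginning at 'current'.
// 'start' remembers where the search began, so that the recursion stops
// after one full turn of the ring instead of going round forever.
// Returns a null pointer when no field has that name.
inline FieldNode* FindField(FieldNode* current, const char* name,
                            FieldNode* start = nullptr) {
    assert(name != nullptr);
    if (current == nullptr) return nullptr;  // empty ring
    if (current == start) return nullptr;    // back where we began
    if (std::strncmp(current->name, name, sizeof current->name) == 0)
        return current;
    // On the first call start is null, so the current node becomes it.
    return FindField(current->next, name, start != nullptr ? start : current);
}
```

The search may begin at any node of the ring: since the last node points back
to the first, every field is visited once before the function gives up.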
3.4. The CDBFile object
Now let's get down to some serious work. This is the main part of the
toolbox, and only a few member functions of CDBFile are available to the
user (i.e., the programmer who doesn't want to handle raw DBF files
with bare hands!). This object manipulates four major entities:
* The CField ring: see the description of CField above.
* The Record list: A double-linked list of structures containing the
data in a raw ASCII format, straight from the file. This is the core
of the object, and most of the member functions will manipulate that
list.
* The contents of those records: A character array containing raw
  data. Since we know the offset, the length and the type of each field,
  we can convert that data into numbers or character strings, or even
  booleans; we can also do the reverse operation in order to write or
  modify a value within a record.
* The DBF file: it will be accessed for reading or writing records.
  It is absolutely necessary to open such a file in order to
  initialize the field descriptor ring, since the toolbox does not
  yet provide (shame! shame!) enough functions to build a CDBFile
  object from scratch.
3.4.1. Handling the record list
Here is the structure of the records, as declared in the source:
struct rec
{
    char* Contents;              // raw character string
    BOOL ModifFlag;              // TRUE if the record has been modified
    unsigned long RecordNumber;  // position of the record within the file
    struct rec *Next;            // points to the next record in the list
    struct rec *Previous;        // points to the previous record in the list
};
typedef struct rec Record;
This is the minimum structure that we need to process any kind of DBF
records, provided that we know the field descriptor array. Let's have a
look at the private variables and function headers below:
class CDBFile
{
private:
    Record* RecordList;  // head of the list of records
    Record* CurrentRec;  // current record pointed to in the list
    ...
    Record* ReadRecord(unsigned long RecNum);
    Record* CreateNewRecord();
    Record* GetRecord(unsigned long RecordNum);
    void Append(Record* Rec, Record* Tail=NULL);
    void DeleteRecord(Record *Rec);
    void SortAllRecords(Record *Head, Record *Tail, CField* Criter1);
};
CDBFile contains two pointers to records: the former,
RecordList, is the head of the double-linked list of records,
whereas the latter, CurrentRec, points to the record that is
being handled by the user. CurrentRec is mainly used by the public
functions that let the user access the records
indirectly (for reading or writing, creating or deleting).
As for the private functions, they do the following:
* ReadRecord() reads the RecNum-th record in
  the DBF file, and returns a pointer to a newly created record
  containing the accessed data.
* CreateNewRecord() returns a pointer to a newly created record.
* GetRecord() returns a pointer to the record whose number matches
  RecordNum.
* Append() inserts Rec in the list after Tail,
  if Tail is not NULL. Otherwise, it inserts
  Rec in the list according to its record number.
* DeleteRecord() simply deletes the record and its contents.
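The insertion rule given for Append() can be sketched as follows. This is a
minimal illustration under stated assumptions, not the member function itself:
Rec and InsertRecord are hypothetical names, the list is passed and returned
through its head instead of being kept in RecordList, and "according to its
record number" is taken to mean ascending order.

```cpp
#include <cassert>

// Hypothetical node: only the record number and the two links.
struct Rec {
    unsigned long number;
    Rec* next;
    Rec* prev;
};

// Insert 'rec' into the list that starts at 'head' and return the head of
// the resulting list. If 'tail' is not null, rec is linked right after it;
// otherwise the list is walked so that numbers stay in ascending order.
inline Rec* InsertRecord(Rec* head, Rec* rec, Rec* tail = nullptr) {
    assert(rec != nullptr);
    if (head == nullptr) {  // empty list: rec becomes the only node
        rec->next = nullptr;
        rec->prev = nullptr;
        return rec;
    }
    if (tail == nullptr) {
        if (rec->number < head->number) {  // rec becomes the new head
            rec->prev = nullptr;
            rec->next = head;
            head->prev = rec;
            return rec;
        }
        // Find the last node whose number does not exceed rec's.
        tail = head;
        while (tail->next != nullptr && tail->next->number <= rec->number)
            tail = tail->next;
    }
    // Link rec between tail and the node that used to follow it.
    rec->prev = tail;
    rec->next = tail->next;
    if (tail->next != nullptr) tail->next->prev = rec;
    tail->next = rec;
    return head;
}
```

Passing the tail explicitly avoids walking the list, which matters when
records are read from the file in order and simply added at the end.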