From: =?gb2312?B?08kgV2luZG93cyBJbnRlcm5ldCBFeHBsb3JlciA3ILGjtOY=?=
Subject: metakit-fileformat - Metakit Database System
Date: Fri, 29 Feb 2008 00:27:29 +0800
MIME-Version: 1.0
Content-Type: text/html;
	charset="gb2312"
Content-Transfer-Encoding: quoted-printable
Content-Location: http://www.equi4.com/metakit/metakit-ff.html
X-MimeOLE: Produced By Microsoft MimeOLE V6.0.6000.16545

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<! -- -*- tcl doctools -*-=0A=
   --><HTML><HEAD><TITLE>metakit-fileformat - Metakit Database =
System</TITLE>
<META http-equiv=3DContent-Type content=3D"text/html; =
charset=3Dgb2312"><! -- Generated from file 'metakit-ff.man' by =
tcllib/doctools with format 'html'=0A=
   --><! -- Copyright (c) 1996-2003 Jean Claude Wippler =
&lt;jcw@equi4.com&gt;   -- Copyright (c) 2003 Andreas Kupries =
&lt;andreas_kupries@users.sourceforge.net&gt;=0A=
   --><! -- CVS: $Id$ metakit-fileformat.n=0A=
   -->
<META content=3D"MSHTML 6.00.6000.16609" name=3DGENERATOR></HEAD>
<BODY>
<H1>metakit-fileformat(n) 1.0 "Metakit Database System"</H1><A =
name=3Dname>
<H2>NAME</H2>
<P>metakit-fileformat - Metakit File Format <! -- Copyright JCW for =
metakit, Copyright AK for this document=0A=
   --><! -- __________________________________________=0A=
   --><A=20
name=3Ddescription>
<H2>DESCRIPTION</H2>This document specifies the file format used by the =
metakit=20
database library for persistent storage of its databases. The same =
format is=20
also used for serialization and subsequent transfer of a database over =
some=20
communication system, like pipes and sockets.=20
<P>How the metakit library uses files in the specified format is outside =
of the=20
scope of this document, although in some sections, hints for specific =
uses might=20
be given.=20
<P>To ensure an unambiguous use of all terms inside of this document and =
when=20
discussing its contents a glossary was created. See section <A=20
href=3D"http://www.equi4.com/metakit/metakit-ff.html#glossary">GLOSSARY</=
A> at the=20
end.=20
<P>It was decided to specify the format in the form of a grammar in =
Extended=20
Backus-Naur Form (EBNF) augmented by free-form text to capture the=20
context-sensitive parts of the language. It is assumed that the reader =
of this=20
document is familiar with EBNF. <! -- =
__________________________________________=0A=
   --><A name=3Dbackground>
<H2>BACKGROUND</H2>The background for this specification is =
<EM>metakit</EM> (<A=20
href=3D"http://www.equi4.com/metakit">http://www.equi4.com/metakit</A>), =
a=20
flexible database system developed by Jean-Claude Wippler (<A=20
href=3D"mailto:jcw@equi4.com">mailto:jcw@equi4.com</A>).=20
<P>In contrast to most other systems which handle their data row-wise it =
manages=20
the data in a column-oriented way, i.e. all data for a single column is =
handled=20
together. This characteristic is reflected in the file format too.=20
<P>What would be called tables in other relational database systems are =
known as=20
<EM>views</EM> in Metakit. Views consist of <EM>columns</EM>, which =
store=20
specific pieces of data in <EM>cells</EM>. The cells at the same =
row-index in=20
all columns of a view are called a <EM>row</EM>.=20
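<P>As an illustration only, the following Python sketch models a view as a
set of named columns and assembles a row from the cells found at the same
row-index. The class name View and its methods are hypothetical and are not
part of the metakit library.

```python
# Hypothetical model of a column-oriented view; not the Metakit API.
class View:
    def __init__(self, column_names):
        # One list of cells per column; all columns have equal length.
        self.columns = {name: [] for name in column_names}

    def append(self, **cells):
        # Adding a row touches every column exactly once.
        if set(cells) != set(self.columns):
            raise ValueError("a row needs exactly one cell per column")
        for name, value in cells.items():
            self.columns[name].append(value)

    def row(self, index):
        # A row is the set of cells at the same row-index in all columns.
        return {name: cells[index] for name, cells in self.columns.items()}

    def __len__(self):
        return len(next(iter(self.columns.values()), []))


people = View(["name", "age"])
people.append(name="Ada", age=36)
people.append(name="Linus", age=28)
```

<P>Here people.columns["age"] holds all ages together, while people.row(1)
gathers one cell from each column.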
<P>Another difference is metakit's ability to define <EM>subview</EM> =
columns,=20
which are columns where the data in each cell is a complete view in its =
own=20
right, although they are sharing the same structural definition. <! --   =
 -- 	Note: I am told that metakit allows heterogeneous subview columns,  =
 -- 	where each cell can have at least one of several possible   -- 	=
structures, but currently lack information on those details. I   -- 	=
especially do not have the information on how this ability is   -- 	=
reflected in the file format. ... Another possibility is that   -- 	each =
cell stores not only the view itself, but also its   -- 	definition.   =
-- =0A=
   --><! -- __________________________________________=0A=
   --><A=20
name=3Dtypedefinitions>
<H2>TYPE DEFINITIONS</H2>Metakit supports the following six types for =
its=20
columns. The key in the list below is the character used by metakit as =
indicator=20
for that type. See section <A=20
href=3D"http://www.equi4.com/metakit/metakit-ff.html#structuredefinition"=
>STRUCTURE=20
DEFINITION</A> for the place in which these characters are used.=20
<DL>
  <DT><STRONG>S</STRONG>
  <DD>All entries in a column of this type contain strings. <BR><BR>
  <DT><STRONG>I</STRONG>
  <DD>All entries in a column of this type contain integer numbers =
requiring at=20
  most 32 bits of storage space. All entries will use the same number of =
bits to=20
  store their data. <BR><BR>
  <DT><STRONG>F</STRONG>
  <DD>All entries in a column of this type contain single precision =
floating=20
  point numbers, each taking up 4 bytes (32 bits) of space. <BR><BR>
  <DT><STRONG>D</STRONG>
  <DD>All entries in a column of this type contain double precision =
floating=20
  point numbers, each taking up 8 bytes (64 bits) of space. <BR><BR>
  <DT><STRONG>B</STRONG>
  <DD>All entries in a column of this type contain arbitrary binary data =
of=20
  arbitrary length. There is no bit-packing, the data is measured in =
bytes.=20
  <BR><BR>
  <DT><STRONG>L</STRONG>
  <DD>All entries in a column of this type contain large integer =
numbers, each=20
  taking up 8 bytes (64 bits) of space. </DD></DL><! -- =
__________________________________________=0A=
   --><A=20
name=3Dcolumnmapping>
<H2>COLUMN MAPPING</H2>When metakit stores column data into a file or=20
serialization it places them into one or more <EM>itemvectors</EM>, the =
physical=20
containers for the data. How many itemvectors are required is dependent =
on the=20
type of the column, and on the data contained in it.=20
<P>This section describes only the basic mapping required to create the =
table of=20
contents (See <STRONG>TableOfContents</STRONG>), and none of the=20
<EM>secondary</EM> itemvectors indirectly reachable through the =
<EM>primary</EM>=20
itemvectors of a column listed in the table of contents.=20
<DL>
  <DT><STRONG>I</STRONG>, <STRONG>L</STRONG>, <STRONG>F</STRONG>,=20
  <STRONG>D</STRONG>
  <DD>A single primary itemvector is used to store all column data. =
<BR><BR>
  <DT><STRONG>S</STRONG>, <STRONG>B</STRONG>
  <DD>Depending on the size of the string/binary data stored in the =
entries of a=20
  column either two or three primary itemvectors are used to store the =
column=20
  data. In addition secondary itemvectors may be reached through these, =
holding=20
  the actual string/binary data. </DD></DL>The exact contents of each =
itemvector=20
are described in the upcoming grammar. See <STRONG>IVecData</STRONG> and =
its=20
variants. <! -- __________________________________________=0A=
   --><A=20
name=3Dvariablesizeddata>
<H2>VARIABLE SIZED DATA</H2>One of the consequences of using a =
column-wise=20
representation for views is that for any insertion, deletion, or change =
of a row=20
the system has to relocate and copy all itemvectors for all the columns =
in the=20
view. This is not so big a problem for data of a fixed size, like for =
the types=20
<STRONG>I</STRONG>, <STRONG>F</STRONG>, <STRONG>D</STRONG>, and=20
<STRONG>L</STRONG>. For them this operation is only invoked when =
inserting or=20
deleting a row. Changing the value of a cell invokes only the relocation =
and=20
copying of the itemvectors for one column, and they tend to be =
relatively small.=20

<P>This situation changes when data of varying and arbitrary length is =
involved,=20
be it strings or just binary data (types <STRONG>S</STRONG> and=20
<STRONG>B</STRONG>). For them the simple method of storing all the data =
in one=20
itemvector and the sizes of the items in a second scales badly as even =
minuscule=20
changes cause the copying of large amounts of data.=20
<P>To evade this trap the file format uses a slightly more complex =
method.=20
Instead of only two itemvectors it employs three. The first two are the =
same=20
ones as for the simple method, with a small change. While the first =
itemvector=20
contains the sizes for all items, the second itemvector contains only =
the data=20
for the items with a size &gt; 0. Items whose size is recorded as zero =
are not=20
stored in the second itemvector, but are <EM>indirect</EM>ly reachable =
through=20
the third itemvector, a catalog. Each item of the third itemvector =
records=20
the location of another itemvector in the file, together with =
information=20
determining the row in the column to which the item belongs. In other =
words, it=20
records how to interleave the items reachable through the catalog with =
the=20
items in the first two itemvectors to reconstruct their proper order at =
the=20
logical level of the column.=20
<P>With the above structure in place, any writer of a database is free =
to=20
decide where to place the variable sized data of a cell when writing to =
the=20
file: either directly into the second itemvector, or into a block of its =
own=20
with the location of that block recorded in the catalog vector. By=20
storing smaller data directly and larger data indirectly the performance =
impact=20
of the large data is reduced considerably, because now only the =
itemvector=20
containing the catalog has to be copied for changes, whereas the large =
data=20
blocks often can be left in the location initially given to them.=20
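<P>The direct and indirect placement described above can be sketched in
Python as follows. This is a simplified in-memory model under stated
assumptions, not the on-disk encoding: the cut-off INLINE_LIMIT, the function
names, and the catalog layout (row index, size, block number) are all
hypothetical, and a list of blocks stands in for locations in the file.

```python
# Simplified in-memory model; names and the threshold are assumptions.
INLINE_LIMIT = 16  # hypothetical cut-off between direct and indirect storage


def split_column(items, inline_limit=INLINE_LIMIT):
    """Split items (bytes objects) into sizes, inline data, catalog, blocks."""
    sizes, data, catalog, blocks = [], bytearray(), [], []
    for row, item in enumerate(items):
        if len(item) <= inline_limit:
            sizes.append(len(item))      # empty items are recorded as size 0
            data += item
        else:
            sizes.append(0)              # size 0 plus a catalog entry
            catalog.append((row, len(item), len(blocks)))
            blocks.append(bytes(item))   # stands in for a block in the file
    return sizes, bytes(data), catalog, blocks


def merge_column(sizes, data, catalog, blocks):
    """Interleave inline and indirect items back into row order."""
    indirect = {row: blocks[loc][:size] for row, size, loc in catalog}
    items, pos = [], 0
    for row, size in enumerate(sizes):
        if row in indirect:
            items.append(indirect[row])
        else:
            items.append(data[pos:pos + size])
            pos += size
    return items


original = [b"ab", b"", b"x" * 40, b"cd", b"", b"", b"ef"]
sizes, data, catalog, blocks = split_column(original)
```

<P>Replacing the large item in row 2 would only require a new block and an
updated catalog entry in this model; the inline data and the sizes stay
untouched.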
<P>The relevant symbols of the grammar are =
<STRONG>IVecCatalogData</STRONG> and=20
<STRONG>VariableMapping</STRONG>. See section <A=20
href=3D"http://www.equi4.com/metakit/metakit-ff.html#formatgrammar">FORMA=
T=20
GRAMMAR</A> for their definition.=20
<P>
<P>
<TABLE>
  <TBODY>
  <TR>
    <TD bgColor=3Dblack>&nbsp;</TD>
    <TD><PRE class=3Dsample>  For the example let us assume that we have =
items 0 and 3 and 6 all
  having small amounts of data, items 1 and 4 are empty, and 2 is a
  larger memo item. Then the situation would be:

  Column 0, the data: concatenated contents of items 0, 3, and 6.

  Column 1, the sizes: sizes for entries 0, 3, and 6, the rest zeroes.

  Column 2, the memos:
    2 as byte-packed integer, meaning skip 2 rows
    the size of the data in row 2, as byte-packed integer
    the pointer to the data in row 2, as byte-packed integer
</PRE></TD></TR></TBODY></TABLE></P><! -- =
__________________________________________=0A=
   --><A=20
name=3Dlexicalunits>
<H2>LEXICAL UNITS</H2>The lexical unit of the grammar used here is the=20
fundamental piece making up a metakit file or serialization. This unit =
is the=20
<EM>byte</EM>, containing 8 <EM>bits</EM>. <! -- =
__________________________________________=0A=
   --><A name=3Dformatgrammar>
<H2>FORMAT GRAMMAR</H2>The grammar is written in a bottom up format. =
This means=20
that the more basic elements are specified first, and the specification =
of the=20
complete database is the last element.=20
<DL>
  <DT><STRONG>Word</STRONG>
  <DD>::=3D byte byte <BR><BR>A 16-bit word consists of two bytes. The =
endianness=20
  of words is variable. When reading, the metakit library determines the =
actual=20
  endianness from the marker in the <STRONG>Header</STRONG>. When writing =
the=20
  metakit library uses the native endianness of the host on which it is =
running.=20
  <BR><BR>
  <DT><STRONG>Long</STRONG>
  <DD>::=3D byte byte byte byte <BR><BR>A 32-bit long word consists of =
four bytes=20
  (or two words). The endianness of long words is variable. When reading, =
the=20
  metakit library determines the actual endianness from the marker in the =

  <STRONG>Header</STRONG>. When writing the metakit library uses the =
native=20
  endianness of the host on which it is running. <BR><BR>
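<P>A minimal sketch of decoding a Word and a Long in either byte order,
using Python's standard struct module. The function names and the big_endian
flag are hypothetical; how the Header marker encodes the byte order is not
shown here, so the caller is assumed to have determined it already. The
values are treated as unsigned, which is also an assumption.

```python
import struct


def read_word(buf, offset, big_endian):
    """Decode a 16-bit Word (two bytes) starting at offset."""
    fmt = ">H" if big_endian else "<H"
    return struct.unpack_from(fmt, buf, offset)[0]


def read_long(buf, offset, big_endian):
    """Decode a 32-bit Long (four bytes) starting at offset."""
    fmt = ">I" if big_endian else "<I"
    return struct.unpack_from(fmt, buf, offset)[0]


# The same four bytes yield different values depending on the byte order.
raw = bytes([0x12, 0x34, 0x56, 0x78])
as_big = read_long(raw, 0, big_endian=True)
as_little = read_long(raw, 0, big_endian=False)
```

<P>A writer following the rule above could use the format prefix for native
byte order instead of choosing one explicitly.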
  <DT><STRONG>bpInt</STRONG>
  <DD>::=3D [ bpiSignByte ] { bpiDataByte } bpiStopByte <BR><BR>The name =
is a=20
  shortcut for byte-packed integer. It is a notation for storing =
arbitrarily=20
  large integer numbers in a very compact way. Note that any number is =
always=20
  stored in the most compact way possible. This means that leading zeros =
are=20
  always stripped down as much as possible. In other words, no instance =
of=20
  <STRONG>bpInt</STRONG> will contain a <STRONG>bpiDataByte</STRONG> of =
value=20
