Although such "whole file" usage of the datacomputer would be motivated
primarily by economic advantages of scale, data sharing at the file
level could also be a concern.  For example, the source files of common
network software might reside at the datacomputer. These files have



Winter, Hill & Greiff                                          [Page 11]

RFC 610           Further Datalanguage Design Concepts     December 1973


little or no structure, but their common use dictates that they be
available in a common, always accessible place.  Such use exploits the
economics of the datacomputer more than anything else, since most of
these services are available on any file system.

This mode of use is mentioned here because it may account for a large
percentage of datalanguage requests.  It requires only capabilities
which would be present in datalanguage in any case; the only special
requirement is to make sure it is easy and simple to accomplish these
tasks.


2.4.4 Use of Datacomputer for File Archiving

This is another economics-oriented application.  The basic idea is to
store on the datacomputer everything that you intend to read rarely, if
ever.  This could include backup files, audit trails, and the like.

An interesting idea related to archiving is incremental archiving. A
typical practice, with regard to backing up data stored online in a
time-sharing system, is to write out all the pages which have changed
since the last dump.  It is then possible to recover by restoring the
last full dump, and then restoring all incremental dumps up to the
version desired.  This scheme offers a lower cost for dumping
and storage, and a higher cost for recovery; it is appropriate when the
probability of needing a recovery is low.  Datalanguage, then, should be
designed to permit convenient incremental archiving.
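A minimal sketch of the incremental dump-and-recover cycle just described, written in Python.  It assumes a hypothetical model in which a file system is simply a mapping from page number to page contents; page deletions and the datalanguage requests themselves are not modeled, and the function names are illustrative only.

```python
from typing import Dict, List

# Hypothetical model: page number -> page contents.
Pages = Dict[int, bytes]


def incremental_dump(current: Pages, previous: Pages) -> Pages:
    """Return only the pages that are new or changed since the previous dump."""
    return {n: data for n, data in current.items() if previous.get(n) != data}


def recover(full_dump: Pages, incrementals: List[Pages]) -> Pages:
    """Restore the last full dump, then apply each incremental dump in order."""
    state = dict(full_dump)  # copy, so the archived full dump is not modified
    for dump in incrementals:
        state.update(dump)  # a later dump overwrites earlier versions of a page
    return state


full = {0: b"aaaa", 1: b"bbbb"}                 # last full dump
day1 = {0: b"aaaa", 1: b"BBBB"}
day2 = {0: b"AAAA", 1: b"BBBB", 2: b"cccc"}
inc1 = incremental_dump(day1, full)   # {1: b"BBBB"}
inc2 = incremental_dump(day2, day1)   # {0: b"AAAA", 2: b"cccc"}
print(recover(full, [inc1, inc2]) == day2)
```

Each dump writes only the changed pages, but recovering a given version must replay every incremental dump up to it, which is the cost trade-off noted above.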

As in the case of the previous application (file system), archiving is
important as a design consideration because of its expected frequency
and economics, not because it necessarily requires any extra generality
at the language level. It may dictate that specialized mechanisms for
archiving be built into the system.


2.5 Data Sharing

Controlled sharing of data is a central concern of the project. Three
major sub-problems in data sharing are: (1) concurrent use, (2)
independent concepts of the same database, and (3) varying
representations of the same database.

Concurrent use of a resource by multiple independent processes is
commonly implemented for data on the file level in systems in which
files are regarded as disjoint, unrelated objects.  It is sometimes
implemented on the page level.

Considerable work on this problem has already been done within the
datacomputer project.  When this work is complete, it will have some
impact on the language design; by and large, however, we do not consider
this aspect of concurrent use to be a language problem.

Other aspects of the concurrent use problem, however, may require more
conscious participation by the user.  They relate to the semantics of
collections of data objects, when such collections span the boundaries
of files known to the internal operating system.  Here the question of
what constitutes an update conflict is more complex.  Related questions
arise in backup and recovery. If two files are related, then perhaps it
is meaningless to recover an earlier state of one without recovering the
corresponding state of the other.  These problems are yet to be
investigated.

Another problem in data sharing is that not all users of a database
should have the same concept of that database.  Examples: (1) for
privacy reasons, some users should be aware of only part of the database
(e.g., scientists doing statistical studies on medical files do not need
access to name and address), (2) for program-data independence, payroll
programs should access only data of concern in writing paychecks, even
though skill inventories may be stored in the same database, (3) for
global control of efficiency, simplicity in application programming, and
program-data independence, each application program should "see" a data
organization that is best for its job.

To further analyze example (3), consider a database which contains
information about students, teachers, subjects and also indicates which
students have which teachers for which subjects.  Depending on the
problem to be solved, an application program may have a strong
requirement for one of the following organizations:
(1) entries of the form (student,teacher,subject) with no concern about
    redundancy.  In this organization an object of any of the three
    types may occur many times.
(2) entries of the form
             (student,       (teacher,subject),
                             (teacher,subject),
                             .
                             .
                             .
                             (teacher,subject))
(3) entries of the form
             (teacher,       subject,(student...student),
                             subject,(student...student),
                             subject,(student...student))
and other organizations are certainly possible.
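To make the three organizations concrete, the following Python sketch keeps the data in organization (1) and derives organizations (2) and (3) from it on request.  The sample names and the functions `by_student` and `by_teacher` are hypothetical; the point is only that one stored form can serve several application views.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Organization (1): flat (student, teacher, subject) entries; redundancy allowed.
Entry = Tuple[str, str, str]


def by_student(entries: List[Entry]) -> Dict[str, List[Tuple[str, str]]]:
    """Organization (2): each student followed by a list of (teacher, subject) pairs."""
    out: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for student, teacher, subject in entries:
        out[student].append((teacher, subject))
    return dict(out)


def by_teacher(entries: List[Entry]) -> Dict[str, Dict[str, List[str]]]:
    """Organization (3): each teacher, his subjects, and the students per subject."""
    out: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for student, teacher, subject in entries:
        out[teacher][subject].append(student)
    return {teacher: dict(subjects) for teacher, subjects in out.items()}


stored = [
    ("Ann", "Smith", "Math"),
    ("Ann", "Jones", "History"),
    ("Bob", "Smith", "Math"),
    ("Bob", "Smith", "Physics"),
]
print(by_student(stored))
print(by_teacher(stored))
```

Producing organization (2) or (3) this way costs a pass over the stored entries; whether the application program or the datacomputer pays that cost is the policy question taken up in the following paragraphs.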

One approach to this problem is to choose an organization for stored
data, and then have application programs write requests which organize
output in the form they want.  The application programmer applies his
ingenuity in stating the request so that the process of reorganization
is combined with the process of retrieval, and the result is relatively
efficient.  There are important, practical situations in which this
approach is adequate; in fact there are situations in which it is
desirable. In particular, if efficiency or cost is an overriding
consideration, it may be necessary for every application programmer to
be aware of all the data access and organization factors.  This may be
the case for a massive file, in which each retrieval must be tuned to
the access strategy and organization; any other mode of operation would
result in unacceptable costs or response times.

However, dependence between application programs and data organization
or access strategy is not a good policy in general. In a widely-shared
database, it can mean enormous cost in the event of database
reorganization, changes to access software, or even changes in the
storage medium.  Such a change may require reprogramming in hundreds of
application programs distributed throughout the network.

As a result, we see a need for a language which supports a spectrum of
operating modes, including: (1) application program is completely
independent of storage structure, access technique, and reorganization
strategy, (2) application program parametrically controls these, (3)
application program entirely controls them. For a widely-shared
database, mode (1) would be the preferred policy, except when (a) the
application programmer could do a better job than the system in making
decisions, and (b) the need for this increment of efficiency outweighed
the benefits of program-data independence.

In evaluating this question for a particular application, it is
important to realize the role of global efficiency analysis.  When there
are many users of a database, in some sense the best mode of operation
is that which minimizes the total cost of processing all requests and
the total cost of storing the data.  When applications come and go, as
real-world needs change, then the advantages of centralized control are
more likely to outweigh the advantages of optimization for a particular
application program.

The third major sub-problem arises in connection with item level
representations.  Because of the environment in which it executes, each
application program has a preferred set of formatting concepts, length
indicators, padding and alignment conventions, word sizes, character
representations, and so on.  Once again it is better policy for the
application program to be concerned only with the representations it
wants and not with the stored data representation.  However, there will
be cases in which efficiency for a given request overrides all other
factors.

At this level of representation, there is at least one additional
consideration: potential loss of information when conversion takes
place.  Whoever initiates a type conversion (and this will sometimes be
the datacomputer and sometimes the application program) must also be
responsible for seeing that the intent of the request is preserved.
Since the datacomputer must always be responsible for the consistency
and the meaning of a shared database, there are some conflicts to be
resolved here.
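One way to hold the initiator of a conversion responsible for preserving intent is to have the conversion refuse to proceed when information would be lost.  The Python sketch below assumes, for illustration only, that stored values are 64-bit floats and that a requesting program wants 32-bit big-endian IEEE values; the function name `to_float32` is hypothetical.

```python
import math
import struct


def to_float32(value: float) -> bytes:
    """Convert a 64-bit float to a 4-byte big-endian IEEE single.

    Raises ValueError instead of silently losing precision or range.
    """
    try:
        packed = struct.pack(">f", value)
    except OverflowError:
        raise ValueError(f"{value!r} is out of range for a 32-bit float") from None
    back = struct.unpack(">f", packed)[0]
    # NaN never compares equal to itself, so treat NaN -> NaN as lossless.
    if back != value and not (math.isnan(back) and math.isnan(value)):
        raise ValueError(f"converting {value!r} to a 32-bit float would lose precision")
    return packed


print(to_float32(0.5).hex())  # 3f000000
try:
    to_float32(0.1)  # 0.1 has no exact 32-bit representation
except ValueError as err:
    print(err)
```

A real system might instead round and report the loss; the sketch only shows where the check belongs, at whichever party initiates the conversion.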

To summarize, it seems that the result of wide sharing of databases is
that a larger system must be considered in choosing a data management
policy for a particular database.  This larger system, in the case of
the datacomputer, consists of a network of geographically distributed
application programs, a centralized database, and a centralized data
management system.  The requirement for datalanguage is to provide
flexibility in the management of this larger system.  In particular, it
must be possible to control when and where conversions, data
reorganizations, and access strategy decisions are made.


2.6 Need for High Level Communication

All of the above considerations point to the need for high level
communication between the datacomputer and its users.  The complex and
distinct nature of datacomputer hardware makes it imperative that
requests be put to the datacomputer in a form that lets it make the
major decisions regarding the access strategies to be used.  At the same
time, the large amounts of data stored and the demand of some users for
extremely high transmission bandwidths make it necessary to provide for
user control of some storage and transmission schemes.  The fact that
databases will be used by applications which desire different views of
the same data, and which operate under different constraints, means
that the datacomputer must be capable of mapping one user's request onto
another user's data.  Interprocess use of the datacomputer means that
data sharing must be completely controllable to avoid the need for human
intervention.  Extensive facilities for ensuring data integrity and
controlling access must be provided.


2.6.1 Data Description

Basic to all these needs is the requirement that the data stored at the
datacomputer be completely described in both functional and physical
parameters.  A high level description of the data is especially
important to provide the sharing and control of data.  The datacomputer
must be able to map between different hardware and different
applications. In its most trivial form this means being able to convert
between floating point number representations on different machines.  On
the other extreme it means being able to provide matrix data for the
ILLIAC IV as well as being able to provide answers to queries from a
natural language program, both addressed to the same weather database.
Data descriptions must provide the ability to specify the bit level
representations and the logical properties and relationships of data.
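A data description that carries both the bit-level representation and the logical properties of each item can be sketched as follows.  The record layout (a weather observation with a 4-character station code, a signed 16-bit temperature in tenths of a degree, and an unsigned 16-bit pressure in tenths of a hectopascal), the value ranges, and the names `Field`, `decode`, and `OBSERVATION` are all hypothetical; Python's `struct` format codes merely stand in for a bit-level description language.

```python
import struct
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass(frozen=True)
class Field:
    name: str                                         # logical name seen by applications
    fmt: str                                          # struct code: physical representation
    valid: Optional[Callable[[object], bool]] = None  # logical property of the value


def decode(description: Sequence[Field], raw: bytes) -> Dict[str, object]:
    """Interpret raw bytes according to the description, checking each value."""
    layout = ">" + "".join(f.fmt for f in description)  # ">" = big-endian, no padding
    values = struct.unpack(layout, raw)
    record: Dict[str, object] = {}
    for field, value in zip(description, values):
        if field.valid is not None and not field.valid(value):
            raise ValueError(f"{field.name}={value!r} violates the description")
        record[field.name] = value
    return record


OBSERVATION = (
    Field("station", "4s"),
    Field("temp_tenths_c", "h", lambda v: -900 <= v <= 600),
    Field("pressure_tenths_hpa", "H", lambda v: 8000 <= v <= 11000),
)

raw = struct.pack(">4shH", b"KSFO", 153, 10132)
print(decode(OBSERVATION, raw))
```

Because the physical layout and the logical constraints live in one description, the same bytes can be re-described for a different machine or application without changing the stored data.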


2.6.2 Data Integrity and Access Control

In the environment we have been describing, the problems of maintaining
data integrity and controlling use of data assume extreme importance.
Shared use of datacomputer files depends on the ability of the
datacomputer to guarantee that the restrictions on data access are
strictly enforced.  Since different users will have different
descriptions, the access control mechanism must be associated with the
descriptions themselves.  One can control access to data by controlling
access to its various descriptions.  A user can be constrained to access
a given database only through one specific description which limits the
data he can access.  In a system where the updaters of a database may be
unknown to each other, and possibly have different views of the data,
only the datacomputer can assure data integrity.  For this reason, all
restrictions on possible values of data objects, and on possible or
necessary relationships between objects must be stated in the data
description.
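The two mechanisms in this section, access limited by the description a user is given and integrity constraints held with the data description rather than in any one program, can be sketched together in Python.  The medical-record fields echo the privacy example in section 2.5; the class `View`, the `CONSTRAINTS` table, and the field names are hypothetical.

```python
from typing import Callable, Dict, Iterable

Record = Dict[str, object]

# Integrity constraints belong to the stored data's description, so they are
# enforced for every updater, whatever view that updater holds.
CONSTRAINTS: Dict[str, Callable[[object], bool]] = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 150,
}


class View:
    """A user's description of the database: the only fields he may read or update."""

    def __init__(self, fields: Iterable[str]) -> None:
        self.fields = frozenset(fields)

    def read(self, record: Record) -> Record:
        return {k: v for k, v in record.items() if k in self.fields}

    def update(self, record: Record, changes: Record) -> None:
        hidden = set(changes) - self.fields
        if hidden:
            raise PermissionError(f"not in this description: {sorted(hidden)}")
        for name, value in changes.items():  # check everything before changing anything
            check = CONSTRAINTS.get(name)
            if check is not None and not check(value):
                raise ValueError(f"{name}={value!r} violates a stored constraint")
        record.update(changes)


patient = {"name": "J. Doe", "address": "12 Elm St.", "age": 42, "diagnosis": "flu"}
statistics = View(["age", "diagnosis"])  # no access to name and address
print(statistics.read(patient))
```

The statistics user never learns that name and address exist, yet an update through that view is still checked against the same constraints as an update through any other view.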


2.6.3 Optimization

The decisions regarding data access strategy must ordinarily be made at
the datacomputer, where knowledge of the physical considerations is
available.  These decisions cannot be made intelligently unless the
requests for data access are made at a high level.

For example, compare the following two situations: (1) a request calls
for output of _all_ weather observations made in California exhibiting
certain wind and pressure conditions, (2) a series of requests is sent,
each one retrieving California weather observations; when a request
finds an observation with the required wind and pressure conditions, it
transmits this observation to a remote system.  Both sessions achieve
the same result: the transmission of a certain set of observations to a
remote site for processing.  In the first session, however, the
datacomputer receives, at the outset, a description of the data that is
needed; in the second, it processes a series of requests, each one of
which is a surprise.

In the first case, a smart datacomputer has the option of retrieving all
of the needed data in one access to the mass storage device.  It can
then buffer this data on disk until the user is ready to accept it.  In
the second case, the datacomputer lacks the information it needs to make
such an optimization.

The language should permit and encourage users to provide the
information needed to do optimization.  The cost of not doing it is much
higher with mass storage devices and large files than it is in
conventional systems.
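The difference between the two situations in the weather example can be made concrete with a toy cost model in Python.  It assumes, purely for illustration, that every request reaching the hypothetical `MassStore` costs one access to the mass storage device; real costs would depend on the device, on buffering, and on the access strategy chosen.  The field names and thresholds are also hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Observation = Dict[str, object]
Predicate = Callable[[Observation], bool]


class MassStore:
    """Toy mass store that counts trips to the device."""

    def __init__(self, observations: List[Observation]) -> None:
        self._obs = list(observations)
        self.accesses = 0

    def scan(self, predicate: Predicate) -> List[Observation]:
        """One access: read the file once and return every matching observation."""
        self.accesses += 1
        return [o for o in self._obs if predicate(o)]

    def next_match(self, predicate: Predicate, start: int) -> Optional[Tuple[int, Observation]]:
        """One access: return the next match at or after position start, if any."""
        self.accesses += 1
        for i in range(start, len(self._obs)):
            if predicate(self._obs[i]):
                return i, self._obs[i]
        return None


def in_california(o: Observation) -> bool:
    return o["state"] == "CA"


def stormy(o: Observation) -> bool:
    return o["wind"] >= 25 and o["pressure"] < 1000


def one_request(store: MassStore) -> List[Observation]:
    # Situation (1): the whole need is stated at the outset.
    return store.scan(lambda o: in_california(o) and stormy(o))


def request_series(store: MassStore) -> List[Observation]:
    # Situation (2): each request fetches the next California observation;
    # the datacomputer never sees the wind and pressure conditions.
    sent: List[Observation] = []
    pos = 0
    while (hit := store.next_match(in_california, pos)) is not None:
        pos, obs = hit
        if stormy(obs):
            sent.append(obs)
        pos += 1
    return sent


data = [
    {"state": "CA", "wind": 30, "pressure": 990},
    {"state": "NV", "wind": 35, "pressure": 980},
    {"state": "CA", "wind": 5, "pressure": 1015},
    {"state": "CA", "wind": 40, "pressure": 985},
]
a, b = MassStore(data), MassStore(data)
print(one_request(a) == request_series(b), a.accesses, b.accesses)
```

Both sessions transmit the same observations, but in this model the series of requests costs one access per California observation plus a final miss, while the single high level request costs one access regardless of file size.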


2.7 Application Oriented Concerns

In the above sections we have described a number of features which the
datacomputer system must provide.  In this section we focus on what is