the system will ultimately provide other services (such as accounting
for use, monitoring performance) these are really auxiliary and common
to all service facilities.
This section presents global considerations for the design of
datalanguage, based on our observations about the problem and the
environment in which it is to be solved. The central problem is data
management, and the datacomputer shares the same goals as many currently
available data management systems. Several aspects of the datacomputer
create a unique set of problems to be solved.
2.2 Hardware Considerations
2.2.1 Separate Box
The datacomputer is a complete data management utility in a separate,
closed box. That is, the hardware, the data and the data management
software are segregated from any general-purpose processing facilities.
There is a separate installation dedicated to data management.
Datalanguage is the only means users have for communicating with the
datacomputer and the sole activity of the datacomputer is to process
datalanguage requests.
Dedicating hardware provides an obvious advantage: one can specialize it
for data management. The processor(s) can be modified to have data
management "instructions"; common low-level software functions can be
built into the hardware.
Winter, Hill & Greiff [Page 6]
RFC 610 Further Datalanguage Design Concepts December 1973
A less obvious, but possibly more significant, advantage is gained from
the separateness itself. The system can be more easily protected. A
fully-developed datacomputer on which there is only maintenance activity
can provide a very carefully controlled environment. First, it can be
made as physically secure as required. Second, it needs to execute only
system software developed at CCA; all user programs are in a high-level
language (datalanguage) which is effectively interpreted by the system.
Hence, only datacomputer system software processes the data, and the
system is not very vulnerable to capture by a hostile program. Thus,
since there is the potential to develop data privacy and integrity
services that are not available on general-purpose systems, one can
expect less difficulty in developing privacy controls (including
physical ones) for the datacomputer than for the systems it serves.
2.2.2 Mass Storage Hardware
The datacomputer will store most of its data on mass storage devices,
which have distinctive access characteristics. Two examples of such
hardware are Precision Instruments' Unicon 690 and Ampex Corporation's
TBM system. They are quite different from disks, and differ
significantly from one another.
However, almost all users will be ignorant of the characteristics of
these devices; many will not even know that the data they use is at the
datacomputer. Finally, as the development of the system progresses,
data may be invisibly shunted from one datacomputer to another, and as a
result be stored in a physical format quite different from that
originally used.
In such an environment, it is clear that requests for data should be
stated in logical, not physical terms.
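As an illustration only, the following Python sketch shows one way a
request stated in logical terms can stay valid while the data moves
between devices. Every name in it (LogicalRequest, Device, Datacomputer,
migrate, and the device and file names) is hypothetical and is not part
of datalanguage or of the datacomputer design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass(frozen=True)
class LogicalRequest:
    """What a user states: a file name and a condition on its records."""
    file_name: str
    predicate: Callable[[Record], bool]


class Device:
    """Hypothetical storage device; its layout is invisible to requests."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._files: Dict[str, List[Record]] = {}

    def write(self, file_name: str, records: List[Record]) -> None:
        self._files[file_name] = list(records)

    def read(self, file_name: str) -> List[Record]:
        return list(self._files[file_name])

    def delete(self, file_name: str) -> None:
        del self._files[file_name]


class Datacomputer:
    """Maps logical file names to whichever device currently holds them."""

    def __init__(self) -> None:
        self._location: Dict[str, Device] = {}

    def store(self, file_name: str, records: List[Record], device: Device) -> None:
        device.write(file_name, records)
        self._location[file_name] = device

    def migrate(self, file_name: str, target: Device) -> None:
        source = self._location[file_name]
        if source is target:
            return  # nothing to move
        target.write(file_name, source.read(file_name))
        source.delete(file_name)
        self._location[file_name] = target

    def execute(self, request: LogicalRequest) -> List[Record]:
        records = self._location[request.file_name].read(request.file_name)
        return [r for r in records if request.predicate(r)]


# The same request is issued before and after the file is moved.
dc = Datacomputer()
disk, mass_store = Device("disk"), Device("mass-store")
dc.store("stations", [{"id": 1, "temp": 12}, {"id": 2, "temp": 31}], disk)
request = LogicalRequest("stations", lambda r: r["temp"] > 20)
before = dc.execute(request)
dc.migrate("stations", mass_store)
after = dc.execute(request)
```

Because the request names only the file and a condition on its records,
the migration changes nothing from the user's point of view.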
2.3 Network Environment
The network environment provides additional requirements for
datacomputer design.
2.3.1 Remote Use
Since the datacomputer is to be accessed remotely, the requirement for
effective data selection techniques and good mechanisms for the
expression of selection criteria is amplified. This is because of the
narrow path through which network users communicate with the
datacomputer. Presently, a typical process-to-process transfer rate
over the Arpanet is 30 kilobits per second. While this can be increased
through optimization of software and protocols, and through additional
expenditure for hardware and communications lines, it seems safe to
assume that it will not soon approach local transfer rates (measured in
megabits per second).
A typical request calls for either transfer of part of a file to a
remote site, or for selective update to a file already stored at the
datacomputer. In both of these situations, good mechanisms for
specifying the parts of the data to be transmitted or changed will
reduce the amount of data ordinarily transferred. This is extremely
important because, with the low per-bit cost of storing data at the
datacomputer, transmission costs will be a significant part of the total
cost of datacomputer usage.
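A rough calculation makes the point concrete. The Python sketch below
uses the 30 kilobit per second rate cited above and the database size of
one hundred billion bits mentioned in Section 2.4.1. The selectivity of
0.01% is an assumed figure chosen only for illustration, and protocol
overhead is ignored.

```python
# Back-of-the-envelope transfer times over the network path.
NETWORK_RATE_BPS = 30_000          # typical process-to-process rate cited in the text
DATABASE_BITS = 100_000_000_000    # "over one hundred billion bits" (Section 2.4.1)
SECONDS_PER_DAY = 86_400


def transfer_seconds(bits: float, rate_bps: float = NETWORK_RATE_BPS) -> float:
    """Time to move `bits` at a constant rate, ignoring protocol overhead."""
    return bits / rate_bps


def selected_bits(total_bits: float, selectivity: float) -> float:
    """Bits shipped when only a fraction `selectivity` of the file qualifies."""
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be between 0 and 1")
    return total_bits * selectivity


whole_file_days = transfer_seconds(DATABASE_BITS) / SECONDS_PER_DAY
# Assumed request that selects 0.01% of the data (illustrative figure only).
subset_seconds = transfer_seconds(selected_bits(DATABASE_BITS, 0.0001))

if __name__ == "__main__":
    print(f"whole file: {whole_file_days:.1f} days")
    print(f"selected subset: {subset_seconds:.0f} seconds")
```

Under these assumptions, shipping the whole file would occupy the path
for more than a month, while the selected subset would take a few
minutes.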
2.3.2 Interprocess Use of the Datacomputer System
Effective use of the network requires that groups of processes, remote
from one another, be capable of cooperating to accomplish a given task
or provide a given service. For example, to solve a given problem which
involves array manipulation, data retrieval, interaction with a user at
a terminal, and the generalized services of a language like PL/I, it may
be most economical to have four cooperating processes. One of these
could execute at the ILLIAC IV, one at the datacomputer, one at MULTICS,
and one at a TIP. While there is overhead in setting up these four
processes and in having them communicate, each is doing its job on a
system specialized for that job. In many cases, the result of using the
specialized system is a gain of several orders of magnitude in economy
or efficiency (for example, online storage at the datacomputer has a
capital cost two orders of magnitude lower than online storage on
conventional systems). As a result, there is considerable incentive to
consider solutions involving cooperating processes on specialized
systems.
To summarize: the datacomputer must be prepared to function as a
component of small networks of specialized processes, in order that it
can be used effectively in a network in which there are many specialized
nodes.
2.3.3 Common Network Data Handling
A large network can support enough data management hardware to construct
more than one datacomputer. While this hardware can be combined into
one even larger datacomputer, there are advantages to configuring it as
two (or possibly more) systems. Each system should be large enough to
obtain economies of scale in data storage and to support the data
management software. Important data bases can be duplicated, with a
copy at each datacomputer; if one datacomputer fails, or is cut off by
network failure, the data is still available. Even if duplicating the
file is not warranted, the description can be kept at the different
datacomputers so that applications which need to store data constantly
can be guaranteed that at least one datacomputer is available to receive
input.
These kinds of failure protection involve cooperation between a pair of
datacomputers; in some sense, they require that the two datacomputers
function as a single system. Given a system of datacomputers (which one
can think of as a small network of datacomputers), it is obviously
possible to experiment with providing additional services on the
datacomputer-network level. For example, all requests could be
addressed simply to the datacomputer-network; the datacomputer-network
could then determine where each referenced file was stored (i.e., which
datacomputer), and how best to satisfy the request.
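The following Python sketch suggests how a request addressed to the
datacomputer-network might be directed to a site that holds the file.
The catalog structure, the site names, and the policy of taking the
first reachable copy are all assumptions made for illustration. The
text does not specify how "best" is to be determined.

```python
from typing import Dict, List, Set


class NoCopyAvailable(Exception):
    """Raised when no reachable datacomputer holds the requested file."""


def route_request(file_name: str,
                  catalog: Dict[str, List[str]],
                  reachable: Set[str]) -> str:
    """Return the first reachable datacomputer that holds `file_name`.

    `catalog` maps each file name to the sites holding a copy, in
    preference order. "First reachable" is the simplest possible policy
    and stands in for whatever optimization a real system would apply.
    """
    for site in catalog.get(file_name, []):
        if site in reachable:
            return site
    raise NoCopyAvailable(file_name)


# Hypothetical configuration: one duplicated file, one single-copy file.
catalog = {"weather": ["DC-1", "DC-2"], "census": ["DC-1"]}

# DC-1 has failed or is cut off; the duplicated file is still served.
chosen = route_request("weather", catalog, reachable={"DC-2"})
```

With DC-1 unreachable, a request for the duplicated file is directed to
DC-2, while a request for the single-copy file raises NoCopyAvailable.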
Here, two kinds of cooperation in the network environment have been
mentioned: cooperation among processes to solve a given problem, and
cooperation among datacomputers to provide global optimizations in the
network-level data handling problem. These are only two examples,
especially interesting because they can be implemented in the near term.
In the network, much more general kinds of cooperation are possible, if
a little farther in the future. For example, eventually, one might want
the datacomputer(s) to be part of a network-wide data management system,
in which data, directories, services, and hardware were generally
distributed about the network. The entire system could function as a
whole under the right circumstances. Most requests would use the data
and services of only a few nodes. Within this network-wide system,
there would be more than one data management system, but all systems
would be interfaced through a common language. Because the
datacomputers represent the largest data management resource in the
network, they would certainly play an important role in any network-wide
system. The language of the datacomputer (datalanguage) is certainly a
convenient choice for the common language of such a system.
Thus a final, albeit futuristic, requirement imposed by the network on
the design of the datacomputer system is that it be a suitable major
component for network-wide data management systems. If feasible, one
would like datalanguage to be a suitable candidate for the common
language of a network-wide group of cooperating data management systems.
2.4 Different Modes of Datacomputer Usage
Within this network environment, the datacomputer will play several
roles. In this section four such roles are described. Each of them
imposes constraints on the design of datalanguage. We can analyze them
in terms of four overlapping advantages which the datacomputer provides:
1. Generalized data management services
2. Large file handling
3. Shared access
4. Economic volume storage
Of course, the primary reason for using the datacomputer will be the
data management services which it provides. However, for some
applications size will be the dominating factor in that the datacomputer
will provide for online access to files which are so large that
previously only offline storage and processing were possible. The
ability to share data between different network sites with widely
different hardware is another feature provided only by the datacomputer.
Economies of scale make the datacomputer a viable substitute for tapes
in such applications as operating system backup.
Naturally, a combination of the above factors will be at work in most
datacomputer applications. The following subsections describe some
possible modes of interaction with the datacomputer.
2.4.1 Support of Large Shared Databases
This is the most significant application of the datacomputer, in nearly
every sense.
Projects are already underway which will put databases of over one
hundred billion bits online on the Arpanet datacomputer. Among these
is a database which will ultimately include 10 years of weather
observations from 5000 weather stations located all over the world. As
online databases, these are unprecedented in size. They will be of
international interest and be shared by users operating on a wide
variety of hardware and in a wide variety of languages.
Because these databases are online in an international network, and
because they are expected to be of considerable interest to researchers
in the related fields, it seems obvious that there will be extremely
broad patterns of use. A strong requirement, then, is a flexible and
general approach to handling them. This requirement of providing
different users of a database with different views of the data is an
overriding concern of the datalanguage design effort. It is discussed
separately in Section 2.5.
2.4.2 Extensions of Local Data Management Systems
We imagine local data handling systems (data management systems,
applications-oriented packages, text-handling systems, etc.) wanting to
take advantage of the datacomputer. They may do so because of the
economics of storage, because of the data management services, or
because they want to take advantage of data already stored at the
datacomputer. In any case, such systems have some distinctive
properties as datacomputer users: (1) most would use local data as well
as datacomputer data, and (2) many would be concerned with the
translation of local requests into datalanguage.
For example, a system which does simple data retrieval and statistical
analysis for non-programming social scientists might want to use a
census database stored at the datacomputer. Such a system may perform a
range of data retrieval functions, and may need sophisticated
interaction with the datacomputer. Its usage patterns would make quite
a contrast with those of a single application program whose sole use of
the datacomputer involves printing a specific report based on a single
known file.
This social-science system would also use some local databases, which it
keeps at its own site because they are small and more efficiently
accessed locally. One would like it to be convenient to think of data
the same way, whether it is stored locally or at the datacomputer.
Certainly at the lower levels of the local software, there will have to
be differences in interfacing; it would be nice, however, if local
concepts and operations could easily be translated into datalanguage.
2.4.3 File Level Use of the Datacomputer
In this mode of use, other computer systems take advantage of the online
storage capacity of the datacomputer. To these systems, datacomputer
storage represents a new class of storage: cheaper and safer than tape,
nearly as accessible as local disk. Perhaps they even automatically
move files between local online storage and the datacomputer, giving
users the impression that everything is stored locally online.
The distinctive feature of this mode of use is that the operations are
on whole files.
A system operating in this mode uses only the ability to store,
retrieve, append, rename, list directories, and the like. An
obvious way to make such file-level handling easily available to the
network community is to make use of the File Transfer Protocol (see
Network Information Center document #17759 -- File Transfer Protocol)
already in use for host-to-host file transfer.
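To make the whole-file character of this mode concrete, the following
Python sketch models the operations listed above with an in-memory
stand-in. The class and method names are hypothetical. It is neither an
implementation of the File Transfer Protocol nor a description of the
datacomputer's interface.

```python
from typing import Dict, List


class FileLevelStore:
    """In-memory stand-in for whole-file operations; no record-level access."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def store(self, name: str, contents: bytes) -> None:
        """Create or replace an entire file."""
        self._files[name] = bytes(contents)

    def retrieve(self, name: str) -> bytes:
        """Return an entire file; raises KeyError if it does not exist."""
        return self._files[name]

    def append(self, name: str, contents: bytes) -> None:
        """Add data to the end of a file, creating it if necessary."""
        self._files[name] = self._files.get(name, b"") + bytes(contents)

    def rename(self, old: str, new: str) -> None:
        """Rename a file, refusing to overwrite an existing name."""
        if new in self._files:
            raise FileExistsError(new)
        self._files[new] = self._files.pop(old)

    def list_directory(self) -> List[str]:
        """Return all file names in sorted order."""
        return sorted(self._files)


# Example session using only whole-file operations.
remote = FileLevelStore()
remote.store("backup.0", b"system image")
remote.append("backup.0", b" + journal")
remote.rename("backup.0", "backup.1973-12")
listing = remote.list_directory()
```

Nothing in this interface selects or updates part of a file, which is
what distinguishes this mode from the others described in this section.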