datalanguage, based on our observations about the problem and the
environment in which it is to be solved. The central problem is data
management, and the datacomputer shares the same goals as many currently
available data management systems. Several aspects of the datacomputer
create a unique set of problems to be solved.

2.2 Hardware Considerations

2.2.1 Separate Box

The datacomputer is a complete data management utility in a separate,
closed box. That is, the hardware, the data and the data management
software are segregated from any general-purpose processing facilities.
There is a separate installation dedicated to data management.
Datalanguage is the only means users have for communicating with the
datacomputer and the sole activity of the datacomputer is to process
datalanguage requests.

Dedicating hardware provides an obvious advantage: one can specialize it
for data management. The processor(s) can be modified to have data
management "instructions"; common low-level software functions can be
built into the hardware.

Winter, Hill & Greiff [Page 6]
RFC 610 Further Datalanguage Design Concepts December 1973

A less obvious, but possibly more significant, advantage is gained from
the separateness itself. The system can be more easily protected. A
fully-developed datacomputer on which there is only maintenance activity
can provide a very carefully controlled environment. First, it can be
made as physically secure as required. Second, it needs to execute only
system software developed at CCA; all user programs are in a high-level
language (datalanguage) which is effectively interpreted by the system.
Hence, only datacomputer system software processes the data, and the
system is not very vulnerable to capture by a hostile program.
Thus, since there is the potential to develop data privacy and
integrity services that are not available on general-purpose systems,
one can expect less difficulty in developing privacy controls (including
physical ones) for the datacomputer than for the systems it serves.

2.2.2 Mass Storage Hardware

The datacomputer will store most of its data on mass storage devices,
which have distinctive access characteristics. Two examples of such
hardware are Precision Instruments' Unicon 690 and Ampex Corporation's
TBM system. They are quite different from disks, and differ
significantly from one another.

However, almost all users will be ignorant of the characteristics of
these devices; many will not even know that the data they use is at the
datacomputer. Finally, as the development of the system progresses,
data may be invisibly shunted from one datacomputer to another, and as a
result be stored in a physical format quite different from that
originally used.

In such an environment, it is clear that requests for data should be
stated in logical, not physical terms.

2.3 Network Environment

The network environment provides additional requirements for
datacomputer design.

2.3.1 Remote Use

Since the datacomputer is to be accessed remotely, the requirement for
effective data selection techniques and good mechanisms for the
expression of selection criteria is amplified. This is because of the
narrow path through which network users communicate with the
datacomputer. Presently, a typical process-to-process transfer rate
over the Arpanet is 30 kilobits per second.
While this can be increased through optimization of software and
protocols, and through additional expenditure for hardware and
communications lines, it seems safe to assume that it will not soon
approach local transfer rates (measured in the megabits per second).

A typical request calls for either transfer of part of a file to a
remote site, or for selective update to a file already stored at the
datacomputer. In both of these situations, good mechanisms for
specifying the parts of the data to be transmitted or changed will
reduce the amount of data ordinarily transferred. This is extremely
important because with the low per bit cost of storing data at the
datacomputer, transmission costs will be a significant part of the total
cost of datacomputer usage.

2.3.2 Interprocess Use of the Datacomputer System

Effective use of the network requires that groups of processes, remote
from one another, be capable of cooperating to accomplish a given task
or provide a given service. For example, to solve a given problem which
involves array manipulation, data retrieval, interaction with a user at
a terminal, and the generalized services of a language like PL/I, it may
be most economical to have four cooperating processes. One of these
could execute at the ILLIAC IV, one at the datacomputer, one at MULTICS,
and one at a TIP. While there is overhead in setting up these four
processes and in having them communicate, each is doing its job on a
system specialized for that job. In many cases, the result of using the
specialized system is a gain of several orders of magnitude in economy
or efficiency (for example, online storage at the datacomputer has a
capital cost two orders of magnitude lower than online costs on
conventional systems).
As a result, there is considerable incentive to consider solutions
involving cooperating processes on specialized systems.

To summarize: the datacomputer must be prepared to function as a
component of small networks of specialized processes, in order that it
can be used effectively in a network in which there are many specialized
nodes.

2.3.3 Common Network Data Handling

A large network can support enough data management hardware to construct
more than one datacomputer. While this hardware can be combined into
one even larger datacomputer, there are advantages to configuring it as
two (or possibly more) systems. Each system should be large enough to
obtain economies of scale in data storage and to support the data
management software. Important data bases can be duplicated, with a
copy at each datacomputer; if one datacomputer fails, or is cut off by
network failure, the data is still available. Even if duplicating the
file is not warranted, the description can be kept at the different
datacomputers so that applications which need to store data constantly
can be guaranteed that at least one datacomputer is available to receive
input.

These kinds of failure protection involve cooperation between a pair of
datacomputers; in some sense, they require that the two datacomputers
function as a single system. Given a system of datacomputers (which one
can think of as a small network of datacomputers), it is obviously
possible to experiment with providing additional services on the
datacomputer-network level.
For example, all requests could be addressed simply to the
datacomputer-network; the datacomputer-network could then determine
where each referenced file was stored (i.e., which datacomputer), and
how best to satisfy the request.

Here, two kinds of cooperation in the network environment have been
mentioned: cooperation among processes to solve a given problem, and
cooperation among datacomputers to provide global optimizations in the
network-level data handling problem. These are only two examples,
especially interesting because they can be implemented in the near term.

In the network, much more general kinds of cooperation are possible, if
a little farther in the future. For example, eventually, one might want
the datacomputer(s) to be part of a network-wide data management system,
in which data, directories, services, and hardware were generally
distributed about the network. The entire system could function as a
whole under the right circumstances. Most requests would use the data
and services of only a few nodes. Within this network-wide system,
there would be more than one data management system, but all systems
would be interfaced through a common language. Because the
datacomputers represent the largest data management resource in the
network, they would certainly play an important role in any network-wide
system. The language of the datacomputer (datalanguage) is certainly a
convenient choice for the common language of such a system.

Thus a final, albeit futuristic, requirement imposed by the network on
the design of the datacomputer system, is that it be a suitable major
component for network-wide data management systems. If feasible, one
would like datalanguage to be a suitable candidate for the common
language of a network-wide group of cooperating data management systems.

2.4 Different Modes of Datacomputer Usage

Within this network environment, the datacomputer will play several
roles. In this section four such roles are described.
Each of them imposes constraints on the design of datalanguage. We can
analyze them in terms of four overlapping advantages which the
datacomputer provides:

   1. Generalized data management services

   2. Large file handling

   3. Shared access

   4. Economic volume storage

Of course, the primary reason for using the datacomputer will be the
data management services which it provides. However, for some
applications size will be the dominating factor in that the datacomputer
will provide for online access to files which are so large that
previously only offline storage and processing were possible. The
ability to share data between different network sites with widely
different hardware is another feature provided only by the datacomputer.
Economies of scale make the datacomputer a viable substitute for tapes
in such applications as operating system backup.

Naturally, a combination of the above factors will be at work in most
datacomputer applications. The following subsections describe some
possible modes of interaction with the datacomputer.

2.4.1 Support of Large Shared Databases

This is the most significant application of the datacomputer, in nearly
every sense.

Projects are already underway which will put databases of over one
hundred billion bits online on the Arpanet datacomputer. Among these
is a database which will ultimately include 10 years of weather
observations from 5000 weather stations located all over the world. As
online databases, these are unprecedented in size. They will be of
international interest and be shared by users operating on a wide
variety of hardware and in a wide variety of languages.

Because these databases are online in an international network, and
because they are expected to be of considerable interest to researchers
in the related fields, it seems obvious that there will be extremely
broad patterns of use.
A strong requirement, then, is a flexible and general approach to
handling them. This requirement of providing different users of a
database with different views of the data is an overriding concern of
the datalanguage design effort. It is discussed separately in Section
2.5.

2.4.2 Extensions of Local Data Management Systems

We imagine local data handling systems (data management systems,
applications-oriented packages, text-handling systems, etc.) wanting to
take advantage of the datacomputer. They may do so because of the
economics of storage, because of the data management services, or
because they want to take advantage of data already stored at the
datacomputer. In any case, such systems have some distinctive
properties as datacomputer users: (1) most would use local data as well
as datacomputer data, (2) many would be concerned with the translation
of local requests into datalanguage.

For example, a system which does simple data retrieval and statistical
analysis for non-programming social scientists might want to use a
census database stored at the datacomputer. Such a system may perform a
range of data retrieval functions, and may need sophisticated
interaction with the datacomputer. Its usage patterns would make quite
a contrast with those of a single application program whose sole use of
the datacomputer involves printing a specific report based on a single
known file.

This social-science system would also use some local databases, which it
keeps at its own site because they are small and more efficiently
accessed locally.
One would like it to be convenient to think of data the same way,
whether it is stored locally or at the datacomputer. Certainly at the
lower levels of the local software, there will have to be differences
in interfacing; it would be nice, however, if local concepts and
operations could easily be translated into datalanguage.

2.4.3 File Level Use of the Datacomputer

In this mode of use, other computer systems take advantage of the online
storage capacity of the datacomputer. To these systems, datacomputer
storage represents a new class of storage: cheaper and safer than tape,
nearly as accessible as local disk. Perhaps they even automatically
move files between local online storage and the datacomputer, giving
users the impression that everything is stored locally online.

The distinctive feature of this mode of use is that the operations are
on whole files.

A system operating in this mode uses only the ability to store,
retrieve, append, rename, do directory listings and the like. An
obvious way to make such file level handling easily available to the
network community is to make use of the File Transfer Protocol (see
Network Information Center document #17759 -- File Transfer Protocol)
already in use for host to host file transfer.

Although such "whole file" usage of the datacomputer would be motivated
primarily by economic advantages of scale, data sharing at the file
level could also be a concern. For example, the source files of common
network software might reside at the datacomputer. These files have