<proceedings><paper><title>Atomicity versus Anonymity: Distributed Transactions for Electronic Commerce.</title><author><AuthorName>J. D. Tygar</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><year>1998</year><conference>International Conference on Very Large Data Bases</conference><citation><name>Untraceable Electronic Cash.</name><name>Transaction Processing: Concepts and Techniques.</name><name>Cryptographic Postage Indicia.</name><name>Verifiable Secret Sharing and Multiparty Protocols with Honest Majority (Extended Abstract).</name><name>A Method for Obtaining Digital Signatures and Public-Key Cryptosystems.</name><name>Atomicity in Electronic Commerce.</name><name>Review - Atomicity versus Anonymity: Distributed Transactions for Electronic Commerce.</name></citation><abstract>Electronic commerce challenges our notions of distributed transactions in several ways. I discuss how distributed transactions can apply to electronic transactions, with special emphasis on the role of atomicity. I discuss the application of these ideas to two systems I have helped design and build: NetBill (a system for highly atomic micro-transactions) and Cryptographic Postage Indicia (a system for generating postage on laser printers attached to PCs or other devices). I discuss the difficulties in integrating atomic, anonymous payment systems and some issues in supporting anonymous auctions.
Finally, I conclude with a set of open questions.</abstract></paper><paper><title>Technology and the Future of Commerce and Finance (Abstract).</title><author><AuthorName>David Elliot Shaw</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><year>1998</year><conference>International Conference on Very Large Data Bases</conference><citation></citation><abstract>Over the coming years, an increasingly ubiquitous and increasingly capacious Internet will introduce new opportunities for the creation of tightly integrated databases distributed across multiple institutions. These new capabilities, along with certain techniques arising from the emerging field of computational finance, could ultimately transform a substantial portion of the world's commercial and financial activity in fundamental ways. This talk will focus on some of the most significant changes such technologies may induce in the structure of the world financial system and the mechanisms of global commerce. Consideration will be given to such topics as algorithmic trading and portfolio optimization; electronic markets, automated market-making, and the historical inevitability of computational disintermediation; and the future of electronic commerce, including the potential use of shared knowledge bases incorporating standardized representations of enormous numbers of products and services available from multiple sources.</abstract></paper><paper><title>Determining Text Databases to Search in the Internet.</title><author><AuthorName>Weiyi Meng</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>King-Lup Liu</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Clement T. 
Yu</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Xiaodong Wang</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Yuhsi Chang</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Naphtali Rishe</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><year>1998</year><conference>International Conference on Very Large Data Bases</conference><citation><name>Searching Distributed Collections with Inference Networks.</name><name>Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies.</name><name>Merging Ranks from Heterogeneous Internet Sources.</name><name>ALIWEB - Archie-like Indexing in the WEB.</name><name>A Clustered Search Algorithm Incorporating Arbitrary Term Dependencies.</name><name>Introduction to Modern Information Retrieval.
McGraw-Hill Book Company 1984, ISBN 0-07-054484-0</name><name>Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer.
Addison-Wesley 1989, ISBN 0-201-12227-8</name><name>Learning Collection Fusion Strategies.</name><name>SIFT - a Tool for Wide-Area Information Dissemination.</name><name>On the Estimation of the Number of Desired Records with Respect to a Given Query.</name><name>Server Ranking for Distributed Text Retrieval Systems on the Internet.</name></citation><abstract>Text data in the Internet can be partitioned into many databases naturally. Efficient retrieval of desired data can be achieved if we can accurately predict the usefulness of each database, because with such information, we only need to retrieve potentially useful documents from useful databases. In this paper, we propose two new methods for estimating the usefulness of text databases. For a given query, the usefulness of a text database in this paper is defined to be the number of documents in the database that are sufficiently similar to the query. Such a usefulness measure enables naive users to make informed decisions about which databases to search. We also consider the collection fusion problem. Because local databases may employ similarity functions that are different from that used by the global database, the threshold used by a local database to determine whether a document is potentially useful may be different from that used by the global database.
We provide techniques that determine the best threshold for a given local database.</abstract></paper><paper><title>Proximity Search in Databases.</title><author><AuthorName>Roy Goldman</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Narayanan Shivakumar</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Suresh Venkatasubramanian</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Hector Garcia-Molina</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><year>1998</year><conference>International Conference on Very Large Data Bases</conference><citation><name>The Design and Analysis of Computer Algorithms.
Addison-Wesley 1974, ISBN 0-201-00029-6</name><name>A Linear-Time Algorithm for Finding Tree-Decompositions of Small Treewidth.</name><name>Shortest Path Queries in Digraphs of Small Treewidth.</name><name>DTL's DataSpot: Database Exploration Using Plain Language.</name><name>Integrating SQL Databases with Content-Specific Search Engines.</name><name>A Performance Study of Transitive Closure Algorithms.</name><name>Combining Fuzzy Information from Multiple Systems.</name><name>Faster shortest-path algorithms for planar graphs.</name><name>Applications of a Planar Separator Theorem.</name><name>Lore: A Database Management System for Semistructured Data.</name><name>Object Exchange Across Heterogeneous Information Sources.</name><name>Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer.
Addison-Wesley 1989, ISBN 0-201-12227-8</name><name>Principles of Database and Knowledge-Base Systems, Volume II.
Computer Science Press 1989, ISBN 0-7167-8162-X</name><name>Searching Large Lexicons for Partially Specified Terms using Compressed Inverted Files.</name></citation><abstract>An information retrieval (IR) engine can rank documents based on textual proximity of keywords within each document. In this paper we apply this notion to search across an entire database for objects that are "near" other relevant objects. Proximity search enables simple "focusing" queries based on general relationships among objects, helpful for interactive query sessions. We view the database as a graph, with data in vertices (objects) and relationships indicated by edges. Proximity is defined based on shortest paths between objects. We have implemented a prototype search engine that uses this model to enable keyword searches over databases, and we have found it very effective for quickly finding relevant information. Computing the distance between objects in a graph stored on disk can be very expensive. Hence, we show how to build compact indexes that allow us to quickly find the distance between objects at search time. Experiments show that our algorithms are efficient and scale well.</abstract></paper><paper><title>Incremental Maintenance for Materialized Views over Semistructured Data.</title><author><AuthorName>Serge Abiteboul</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Jason McHugh</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Michael Rys</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Vasilis Vassalos</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Janet L. 
Wiener</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><year>1998</year><conference>International Conference on Very Large Data Bases</conference><citation><name>Querying Semi-Structured Data.</name><name>Objects and Views.</name><name>The Lorel Query Language for Semistructured Data.</name><name>A Logical Query Language for Hypertext Systems.</name><name>A View Mechanism for Object-Oriented Databases.</name><name>On Modeling Cost Functions for Object-Oriented Databases.</name><name>Efficiently Updating Materialized Views.</name><name>Semistructured Data.</name><name>Adding Structure to Unstructured Data.</name><name>Programming Constructs for Unstructured Data.</name><name>The Object Database Standard: ODMG-93 (Release 1.1).</name><name>From Structured Documents to Novel Query Facilities.</name><name>Evaluating Queries with Generalized Path Expressions.</name><name>Algorithms for Deferred View Maintenance.</name><name>Optimizing Regular Path Expressions Using Graph Schemas.</name><name>A Cost Model for Clustered Object-Oriented Databases.</name><name>Incremental Updates for Materialized OQL Views.</name><name>DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases.</name><name>Incremental Maintenance of Views with Duplicates.</name><name>Maintenance of Materialized Views: Problems, Techniques, and Applications.</name><name>Maintaining Views Incrementally.</name><name>A Performance Analysis of View Materialization Strategies.</name><name>Implementing Incremental View Maintenance in Nested Data Models.</name><name>W3QS: A Query System for the World-Wide Web.</name><name>A Snapshot Differential Refresh Algorithm.</name><name>Lore: A Database Management System for Semistructured Data.</name><name>Querying the World Wide Web.</name><name>Representative Objects: Concise Representations of Semistructured, Hierarchial Data.</name><name>MedMaker: A Mediation System Based on Declarative 
Specifications.</name><name>Object Exchange Across Heterogeneous Information Sources.</name><name>The ADMS Project: View R Us.</name><name>Multiview: A Methodology for Supporting Multiple Views in Object-Oriented Databases.</name><name>Intra-Transaction Parallelism in the Mapping of an Object Model to a Relational Multi-Processor System.</name><name>Updatable Views in Object-Oriented Databases.</name><name>Virtual Schemas and Bases.</name><name>Query Decomposition and View Maintenance for Query Languages for Unstructured Data.</name><name>A First Course in Database Systems.</name><name>Graph Structured Views and Their Incremental Maintenance.</name></citation><abstract>Semistructured data is not strictly typed like relational or object-oriented data and may be irregular or incomplete.
It often arises in practice, e.g., when heterogeneous data sources are integrated or data is taken from the World Wide Web.
Views over semistructured data can be used to filter the data and to restructure (or provide structure to) it.
To achieve fast query response time, these views are often materialized.
This paper proposes an incremental maintenance algorithm for materialized views over semistructured data.
We use the graph-based data model OEM and the query language Lorel, developed at Stanford, as the framework for our work.
Our algorithm produces a set of queries that compute the updates to the view based upon an update of the source.
We develop an analytic cost model and compare the cost of executing our incremental maintenance algorithm to that of recomputing the view.
We show that for nearly all types of database updates, it is more efficient to apply our incremental maintenance algorithm to the view than to recompute the view from the database, even when there are thousands of updates.</abstract></paper><paper><title>Performance Measurements of Tertiary Storage Devices.</title><author><AuthorName>Theodore Johnson</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Ethan L. Miller</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><year>1998</year><conference>International Conference on Very Large Data Bases</conference><citation><name>BigSur: A System For the Management of Earth Science Data.</name><name>Efficient organization and access of multi-dimensional datasets on tertiary storage systems.</name><name>Principles of Optimally Placing Data in Tertiary Storage Libraries.</name><name>Optimal Placement of High-Probability Randomly Retrieved Blocks on CLV Optical Discs.</name><name>Analysis of Striping Techniques in Robotic Storage Libraries.</name><name>The Five-Minute Rule Ten Years Later, and Other Computer Storage Rules of Thumb.</name><name>On the Modeling and Performance Characteristics of a Serpentine Tape Drive.</name><name>Random I/O Scheduling in Online Tertiary Storage Systems.</name><name>Hierarchical Storage Management for Relational Databases.</name><name>Coarse Indices for a Tape-Based Data Warehouse.</name><name>Using Tertiary Storage in Video-on-Demand Servers.</name><name>Architecture and Design of Storage and Data Management for the NASA Earth Observing System Data and Information System (EOSDIS).</name><name>Vertical Data Migration in Large Near-Line Document Archives Based on Markov-Chain Predictions.</name><name>Managing and Service a Multiterabyte Data Set at the Fermilab D0 Experiment.</name><name>Disk-Tape Joins: Synchronizing Disk and Tape Access.</name><name>Efficient Buffering for Concurrent Disk 
and Tape I/O.</name><name>Relational Joins for Data on Tertiary Storage.</name><name>Database Compression.</name><name>An Introduction to Disk Drive Modeling.</name><name>Reordering Query Execution in Tertiary Memory Databases.</name><name>Query Processing in Tertiary Memory Databases.</name><name>On-Demand Data Elevation in Hierarchical Multimedia Storage Servers.</name><name>Algorithmic Studies in Mass Storage Systems.
Computer Science Press 1983</name><name>Query Pre-Execution and Batching in Paradise: A Two-Pronged Approach to the Efficient Processing of Queries on Tape-Resident Raster Images.</name></citation><abstract>In spite of the rapid decrease in magnetic disk prices, tertiary storage (i.e., removable media in a robotic storage library) is becoming increasingly popular. The fact that so much data can be stored encourages applications that use ever more massive data sets. Application drivers include multimedia databases, data warehouses, scientific databases, and digital libraries and archives. The database research community has responded with investigations into systems integration, performance modeling, and performance optimization.
Tertiary storage systems present special challenges because of their unusual performance characteristics. Access latencies can range into minutes even on unloaded systems, but transfer rates can be very high. Tertiary storage is implemented with a wide array of technologies, each with its own performance quirks. However, little detailed performance information about tertiary storage devices has been published. In this paper we present detailed measurements of several tape drives and robotic storage libraries. The tape drives we measure include the DLT 4000, DLT 7000, Ampex 310, IBM 3590, 4mm DAT, and the Sony DTF drive. This mixture of equipment includes high and low performance drives, serpentine and helical scan drives, and cartridge and cassette tapes. The detailed measurements of different aspects of tertiary storage system performance provide an understanding of the issues related to integrating tape-based tertiary storage with a DBMS.</abstract></paper><paper><title>Active Storage for Large-Scale Data Mining and Multimedia.</title><author><AuthorName>Erik Riedel</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Garth A. 
Gibson</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Christos Faloutsos</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><year>1998</year><conference>International Conference on Very Large Data Bases</conference><citation><name>Fast Algorithms for Mining Association Rules in Large Databases.</name><name>Parallel Mining of Association Rules.</name><name>High-Performance Sorting on Networks of Workstations.</name><name>QBISM: Extending a DBMS to Support 3D Medical Images.</name><name>The X-tree: An Index Structure for High-Dimensional Data.</name><name>A Cost Model For Nearest Neighbor Search in High-Dimensional Data Space.</name><name>Extensibility, Safety and Performance in the SPIN Operating System.</name><name>Disk Shadowing.</name><name>Database Machines: An Idea Whose Time Passed? A Critique of the Future of Database Machines.</name><name>The TickerTAIP Parallel RAID Architecture.</name><name>A Performance Evaluation of Data Base Machine Architectures (Invited Paper).</name><name>Multiprocessor Hash-Based Join Algorithms.</name><name>Parallel Sorting on a Shared-Nothing Architecture using Probabilistic Splitting.</name><name>Parallel Database Systems: The Future of High Performance Database Systems.</name><name>RAID-II: A High-Bandwidth Network File Server.</name><name>Efficient and Effective Querying by Image Content.</name><name>Query by Image and Video Content: The QBIC System.</name><name>File Server Scaling with Network-Attached Secure Disks.</name><name>The Java&#8482; Language Specification.</name><name>Application of Hash to Data Base Machine and Its Architecture.</name><name>Disk-directed I/O for MIMD Multiprocessors.</name><name>Petal: Distributed Virtual Disks.</name><name>Multi-Disk Management Algorithms.</name><name>Safe Kernel Extensions Without Run-Time Checking.</name><name>A Case for Redundant Arrays of Inexpensive Disks 
(RAID).</name><name>Informed Prefetching and Caching.</name><name>The Structure and Performance of Interpreters.</name><name>CASSM: A Cellular System for Very Large Data Bases.</name><name>Intelligent Access to Digital Video: Informedia Project.</name><name>Efficient Software-Based Fault Isolation.</name><name>The HP AutoRAID Hierarchical Storage System.</name><name>A General Approach to d-Dimensional Geometric Queries (Extended Abstract).</name></citation><abstract>The increasing performance and decreasing cost of processors and memory are causing system intelligence to move into peripherals from the CPU.
Storage system designers are using this trend toward "excess" compute power to perform more complex processing and optimizations inside storage devices.
To date, such optimizations have been at relatively low levels of the storage protocol.
At the same time, trends in storage density, mechanics, and electronics are eliminating the bottleneck in moving data off the media and putting pressure on interconnects and host processors to move data more efficiently.
We propose a system called Active Disks that takes advantage of processing power on individual disk drives to run application-level code.
Moving portions of an application's processing to execute directly at disk drives can dramatically reduce data traffic and take advantage of the storage parallelism already present in large systems today.
We discuss several types of applications that would benefit from this capability with a focus on the areas of database, data mining, and multimedia.
We develop an analytical model of the speedups possible for scan-intensive applications in an Active Disk system.
We also experiment with a prototype Active Disk system using relatively low-powered processors in comparison to a database server system with a single, fast processor.