📄 sigmod_2002_elementary.txt
字号:
<proceedings><paper><title>Archiving scientific data</title><author><AuthorName>Peter Buneman</AuthorName><institute><InstituteName>University of Edinburgh and University of Pennsylvania</InstituteName><country></country></institute></author><author><AuthorName>Sanjeev Khanna</AuthorName><institute><InstituteName>University of Pennsylvania</InstituteName><country></country></institute></author><author><AuthorName>Keishi Tajima</AuthorName><institute><InstituteName>Japan Advanced Institute of Science and Technology</InstituteName><country></country></institute></author><author><AuthorName>Wang-Chiew Tan</AuthorName><institute><InstituteName>University of Pennsylvania</InstituteName><country></country></institute></author><year>2002</year><conference>International Conference on Management of Data</conference><citation><name>A. Bairoch and R. Apweiler. The SWISS-PROT protein sequence database and its supplement TrEMBL. Nucleic Acids Research, 28:45-48, 2000.</name><name>P. Buneman, S. Khanna, K. Tajima, and W. Tan. Archiving Scientific Data. Technical report, University of Pennsylvania, 2002.</name><name>The WWW Virtual Library of Cell Biology. http://vlib.org/Science/Cell_Biology/databases.shtml.</name><name>Concurrent Versions System. Unix man pages - cvs.</name><name>E. Myers. An O(ND) difference algorithm and its variations. Algorithmica, 1(2):251-266, 1986.</name><name>G. Cobena and S. Abiteboul and A. Marian. Detecting Changes in XML Documents. In Int'l Conf. on Data Engineering, 2001.</name><name>XML TreeDiff. http://www.alphaworks.ibm.com/formula/xmltreediff.</name><name>J. Clark and S. DeRose. XML Path Language (XPath). W3C Working Draft, November 1999. http://www.w3.org/TR/xpath.</name><name>James R. Driscoll , Neil Sarnak , Daniel D. Sleator , Robert E. Tarjan, Making data structures persistent, Journal of Computer and System Sciences, v.38 n.1, p.86-124, February 1989</name><name>K. Zhang , D. Shasha, Simple fast algorithms for the editing distance between trees and related problems, SIAM Journal on Computing, v.18 n.6, p.1245-1262, Dec. 1989</name><name>Dennis Shasha , Kaizhong Zhang, Fast algorithms for the unit cost editing distance between trees, Journal of Algorithms, v.11 n.4, p.581-621, Dec. 1990</name><name>Hartmut Liefke , Dan Suciu, XMill: an efficient compressor for XML data, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.153-164, May 15-18, 2000, Dallas, Texas, United States</name><name>Am&#233;lie Marian , Serge Abiteboul , Gregory Cobena , Laurent Mignet, Change-Centric Management of Versions in an XML Warehouse, Proceedings of the 27th International Conference on Very Large Data Bases, p.581-590, September 11-14, 2001</name><name>Online Mendelian Inheritance in Man, OMIM (TM), 2000. http://www.ncbi.nlm.nih.gov/omim/.</name><name>Peter Buneman , Susan Davidson , Wenfei Fan , Carmem Hara , Wang-Chiew Tan, Keys for XML, Proceedings of the 10th international conference on World Wide Web, p.201-210, May 01-05, 2001, Hong Kong, Hong Kong</name><name>The NIST Reference on Constants, Units, and Uncertainty. http://physics.nist.gov/cuu/Constants/links.html.</name><name>Raghu Ramakrishnan , Johannes Gehrke, Database Management Systems, McGraw-Hill Higher Education, 2000</name><name>Shu-Yao Chien , Vassilis J. Tsotras , Carlo Zaniolo, Efficient Management of Multiversion Documents by Object Referencing, Proceedings of the 27th International Conference on Very Large Data Bases, p.291-300, September 11-14, 2001</name><name>Sudarshan S. Chawathe , Anand Rajaraman , Hector Garcia-Molina , Jennifer Widom, Change detection in hierarchically structured information, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.493-504, June 04-06, 1996, Montreal, Quebec, Canada</name><name>Sudarshan S. Chawathe , Hector Garcia-Molina, Meaningful change detection in structured data, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.26-37, May 11-15, 1997, Tucson, Arizona, United States</name><name>Source Code Control System. Unix man pages - sccs.</name><name>A. R. Schmidt , Florian Waas , Martin L. Kersten , D. Florescu , I. Manolescu , M. J. Carey , R. Busse, The XML benchmark project, CWI (Centre for Mathematics and Computer Science), Amsterdam, The Netherlands, 2001</name><name>K. Tufte and D. Maier. Aggregation and Accumulation of XML Data. IEEE Data Engineering Bulletin, 24(2):34-39, 2001.</name><name>W. Miller and E. Myers. A file comparison program. Software-Practice and Experience, 15(11):1025-1040, 1985.</name><name>W3C. Extensible Markup Language (XML) 1.0, Feb 1998. http://www.w3.org/TR/REC-xml.</name><name>W3C. Namespaces in XML, January 1999. http://www.w3.org/TR/REC-xml-names.</name><name>W3C. XML Schema Part 0: Primer, May 2000. http://www.w3.org/TR/xmlschema-0/.</name><name>W3C. XQuery 1.0: An XML Query Language, June 2001. http://www.w3.org/TR/xquery/.</name></citation><abstract>We present an archiving technique for hierarchical data with key structure. Our approach is based on the notion of timestamps whereby an element appearing in multiple versions of the database is stored only once along with a compact description of versions in which it appears. The basic idea of timestamping was discovered by Driscoll et. al. in the context of persistent data structures where one wishes to track the sequences of changes made to a data structure. We extend this idea to develop an archiving tool for XML data that is capable of providing meaningful change descriptions and can also efficiently support a variety of basic functions concerning the evolution of data such as retrieval of any specific version from the archive and querying the temporal history of any element. This is in contrast to diff-based approaches where such operations may require undoing a large number of changes or significant reasoning with the deltas. Surprisingly, our archiving technique does not incur any significant space overhead when contrasted with other approaches. Our experimental results support this and also show that the compacted archive file interacts well with other compression techniques. Finally, another useful property of our approach is that the resulting archive is also in XML and hence can directly leverage existing XML tools.</abstract></paper><paper><title>Efficient integration and aggregation of historical information</title><author><AuthorName>Mirek Riedewald</AuthorName><institute><InstituteName>University of California, Santa Barbara, CA</InstituteName><country></country></institute></author><author><AuthorName>Divyakant Agrawal</AuthorName><institute><InstituteName>University of California, Santa Barbara, CA</InstituteName><country></country></institute></author><author><AuthorName>Amr El Abbadi</AuthorName><institute><InstituteName>University of California, Santa Barbara, CA</InstituteName><country></country></institute></author><year>2002</year><conference>International Conference on Management of Data</conference><citation><name>Bruno Becker , Stephan Gschwind , Thomas Ohler , Bernhard Seeger , Peter Widmayer, An asymptotically optimal multiversion B-tree, The VLDB Journal &mdash; The International Journal on Very Large Data Bases, v.5 n.4, p.264-275, December 1996</name><name>Stefan Berchtold , Christian B&#246;hm , Hans-Peter Kriegel, Improving the Query Performance of High-Dimensional Index Structures by Bulk-Load Operations, Proceedings of the 6th International Conference on Extending Database Technology: Advances in Database Technology, p.216-230, March 23-27, 1998</name><name>Gerth St&#248;lting Brodal, Partially persistent data structures of bounded degree with constant update time, Nordic Journal of Computing, v.3 n.3, p.238-255, Fall 1996</name><name>Chee Yong Chan , Yannis E. Ioannidis, Hierarchical Prefix Cubes for Range-Sum Queries, Proceedings of the 25th International Conference on Very Large Data Bases, p.675-686, September 07-10, 1999</name><name>Surajit Chaudhuri , Umeshwar Dayal, An overview of data warehousing and OLAP technology, ACM SIGMOD Record, v.26 n.1, p.65-74, March 1997</name><name>James R. Driscoll , Neil Sarnak , Daniel D. Sleator , Robert E. Tarjan, Making data structures persistent, Journal of Computer and System Sciences, v.38 n.1, p.86-124, February 1989</name><name>C. S. Jensen et al. Temporal Databases - Research and Practice, volume 1399 of LNCS, chapter The Consensus Glossary of Temporal Database Concepts, pages 367-405. Springer Verlag, 1998.</name><name>Steven Geffner , Divyakant Agrawal , Amr El Abbadi, The Dynamic Data Cube, Proceedings of the 7th International Conference on Extending Database Technology: Advances in Database Technology, p.237-253, March 27-31, 2000</name><name>Bruce C. Huang, Parallel Algorithms for Computing Temporal Aggregates, Proceedings of the 15th International Conference on Data Engineering, p.418, March 23-26, 1999</name><name>Jim Gray , Surajit Chaudhuri , Adam Bosworth , Andrew Layman , Don Reichart , Murali Venkatrao , Frank Pellow , Hamid Pirahesh, Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals, Data Mining and Knowledge Discovery, v.1 n.1, p.29-53, 1997</name><name>C. J. Hahn, S. G. Warren, and J. London. Edited synoptic cloud reports from ships and land stations over the globe, 1982-1991, 1996. Data available at http://cdiac.esd.ornl.gov/ftp/ndp026b.</name><name>Ching-Tien Ho , Rakesh Agrawal , Nimrod Megiddo , Ramakrishnan Srikant, Range queries in OLAP data cubes, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.73-88, May 11-15, 1997, Tucson, Arizona, United States</name><name>W. H. Inmon. What is a data warehouse? White Paper, 2000. Available at http://www.billinmon.com/cif/edw/edw_content.html.</name><name>Nick Kline , Richard Thomas Snodgrass, Computing Temporal Aggregates, Proceedings of the Eleventh International Conference on Data Engineering, p.222-231, March 06-10, 1995</name><name>Anil Kumar , Vassilis J. Tsotras , Christos Faloutsos, Designing Access Methods for Bitemporal Databases, IEEE Transactions on Knowledge and Data Engineering, v.10 n.1, p.1-20, January 1998</name><name>Iosif Lazaridis , Sharad Mehrotra, Progressive approximate aggregate queries with a multi-resolution tree structure, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.401-412, May 21-24, 2001, Santa Barbara, California, United States</name><name>Ines Fernando Vega Lopez, Scalable Algorithms for Large Temporal Aggregation, Proceedings of the 16th International Conference on Data Engineering, p.145, February 28-March 03, 2000</name><name>Melissa E. O'Neill , F. Warren Burton, A new method for functional arrays, Journal of Functional Programming, v.7 n.5, p.487-513, September 1997</name><name>Mirek Riedewald , Divyakant Agrawal , Amr El Abbadi, pCube: Update-Efficient Online Aggregation with Progressive Feedback and Error Bounds, Proceedings of the 12th International Conference on Scientific and Statistical Database Management (SSDBM'00), p.95, July 26-28, 2000</name><name>Mirek Riedewald , Divyakant Agrawal , Amr El Abbadi, Flexible Data Cubes for Online Aggregation, Proceedings of the 8th International Conference on Database Theory, p.159-173, January 04-06, 2001</name><name>M. Riedewald, D. Agrawal, and A. El Abbadi. Efficient integration and aggregation of historical information. Technical Report 2002-07, University of California, Santa Barbara, 2002.</name><name>Yufei Tao , Dimitris Papadias, MV3R-Tree: A Spatio-Temporal Access Method for Timestamp and Interval Queries, Proceedings of the 27th International Conference on Very Large Data Bases, p.431-440, September 11-14, 2001</name><name>Donhui Zhang , Alexander Markowetz , Vassilis Tsotras , Dimitrios Gunopulos , Bernhard Seeger, Efficient computation of temporal aggregates with range predicates, Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.237-245, May 2001, Santa Barbara, California, United States</name><name>D. Zhang, V. J. Tsotras, and D. Gunopulos. Efficient aggregation over objects with extent. In Proc. Int. Conf. on Extending Database Technology (EDBT), 2002. To appear.</name></citation><abstract>Data warehouses support the analysis of historical data. This often involves aggregation over a period of time. Furthermore, data is typically incorporated in the warehouse in the increasing order of a time attribute, e.g., date of sale or time of a temperature measurement. In this paper we propose a framework to take advantage of this append only nature of updates due to a time attribute. The framework allows us to integrate large amounts of new data into the warehouse and generate historical summaries efficiently. Query and update costs are virtually independent from the extent of the data set in the time dimension, making our framework an attractive aggregation approach for append-only data streams. A specific instantiation of the general approach is developed for MOLAP data cubes, involving a new data structure for append-only arrays with pre-aggregated values. Our framework is applicable to point data and data with extent, e.g., hyper-rectangles.</abstract></paper><paper><title>An adaptive peer-to-peer network for distributed caching of OLAP results</title><author><AuthorName>Panos Kalnis</AuthorName><institute><InstituteName>Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong</InstituteName><country></country></institute></author><author><AuthorName>Wee Siong Ng</AuthorName><institute><InstituteName>National University of Singapore</InstituteName><country></country></institute></author><author><AuthorName>Beng Chin Ooi</AuthorName><institute><InstituteName>National University of Singapore</InstituteName><country></country></institute></author><author><AuthorName>Dimitris Papadias</AuthorName><institute><InstituteName>Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong</InstituteName><country></country></institute></author><author><AuthorName>Kian-Lee Tan</AuthorName><institute><InstituteName>National University of Singapore</InstituteName><country></country></institute></author><year>2002</year><conference>International Conference on Management of Data</conference><citation><name>J. Albrecht , W. Lehner, On-Line Analytical Processing in Distributed Data Warehouses, Proceedings of the 1998 International Symposium on Database Engineering & Applications, p.78, July 08-10, 1998</name><name>P. Cao, J. Zhang, and P. B. Beach. Active cache: Caching dynamic contents on the web. In Middleware Conference, 1998.</name><name>Shaul Dar , Michael J. Franklin , Bj&#246;rn &#222;&#243;r J&#243;nsson , Divesh Srivastava , Michael Tan, Semantic Data Caching and Replacement, Proceedings of the 22th International Conference on Very Large Data Bases, p.330-341, September 03-06, 1996</name><name>Prasad Deshpande , Jeffrey F. Naughton, Aggregate Aware Caching for Multi-Dimensional Queries, Proceedings of the 7th International Conference on Extending Database Technology: Advances in Database Technology, p.167-182, March 27-31, 2000</name><name>Prasad M. Deshpande , Karthikeyan Ramasamy , Amit Shukla , Jeffrey F. Naughton, Caching multidimensional queries using chunks, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.259-270, June 01-04, 1998, Seattle, Washington, United States</name><name>Hector Garcia-Molina , Wilburt J. Labio , Janet L. Wiener , Yue Zhuge, Distributed and parallel computing issues in data warehousing (abstract), Proceedings of the seventeenth annual ACM symposium on Principles of distributed computing, p.7, June 28-July 02, 1998, Puerto Vallarta, Mexico</name><name>Gnutella. http://gnutella.wego.com.</name><name>S. Gribble, A. Halevy, Z. Ives, M. Rodrig, and D. Suciu. What can databases do for peer-to-peer? In WebDB Workshop, 2001.</name><name>Venky Harinarayan , Anand Rajaraman , Jeffrey D. Ullman, Implementing data cubes efficiently, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.205-216, June 04-06, 1996, Montreal, Quebec, Canada</name><name>Richard Hull , Gang Zhou, A framework for supporting data integration using the materialized and virtual approaches, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.481-492, June 04-06, 1996, Montreal, Quebec, Canada</name><name>Icq. http://www.icq.com.</name><name>Panos Kalnis , Dimitris Papadias, Proxy-server architectures for OLAP, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.367-378, May 21-24, 2001, Santa Barbara, California, United States</name><name>Arthur M. Keller , Julie Basu, A predicate-based caching scheme for client-server database architectures, The VLDB Journal &mdash; The International Journal on Very Large Data Bases, v.5 n.1, p.035-047, January 1996</name><name>Donald Kossmann, The state of the art in distributed query processing, ACM Computing Surveys (CSUR), v.32 n.4, p.422-469, Dec. 2000</name><name>Yannis Kotidis , Nick Roussopoulos, DynaMat: a dynamic view management system for data warehouses, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.371-382, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States</name><name>Thanasis Loukopoulos , Panos Kalnis , Ishfaq Ahmad , Dimitris Papadias, Active Caching of On-Line-Analytical-Processing Queries in WWW Proxies, Proceedings of the 2001 International Conference on Parallel Processing, p.419-426, September 03-07, 2001</name><name>Napster. http://www.napster.com.</name><name>W. S. Ng, B. C. Ooi, and K. L. Tan. Bestpeer: A self configurable peer-to-peer system (poster). In ICDE, 2002.</name><name>Olap council apb-1 olap benchmark r-ii. http://www.olapcouncil.org.</name><name>M. Tamer &#214;zsu , Patrick Valduriez, Principles of distributed database systems (2nd ed.), Prentice-Hall, Inc., Upper Saddle River, NJ, 1999</name><name>Peter Scheuermann , Junho Shim , Radek Vingralek, WATCHMAN: A Data Warehouse Intelligent Cache Manager, Proceedings of the 22th International Conference on Very Large Data Bases, p.51-62, September 03-06, 1996</name><name>Seti@home. http://setiathome.ssl.berkely.edu.</name><name>Amit Shukla , Prasad Deshpande , Jeffrey F. Naughton, Materialized View Selection for Multidimensional Datasets, Proceedings of the 24rd International Conference on Very Large Data Bases, p.488-499, August 24-27, 1998</name><name>Amit Shukla , Prasad Deshpande , Jeffrey F. Naughton, Materialized View Selection for Multi-Cube Data Models, Proceedings of the 7th International Conference on Extending Database Technology: Advances in Database Technology, p.269-284, March 27-31, 2000</name><name>Michael Stonebraker , Paul M. Aoki , Witold Litwin , Avi Pfeffer , Adam Sah , Jeff Sidell , Carl Staelin , Andrew Yu, Mariposa: a wide-area distributed database system, The VLDB Journal &mdash; The International Journal on Very Large Data Bases, v.5 n.1, p.048-063, January 1996</name><name>Beverly Yang , Hector Garcia-Molina, Comparing Hybrid Peer-to-Peer Systems, Proceedings of the 27th International Conference on Very Large Data Bases, p.561-570, September 11-14, 2001</name><name>Neal Young, On-line caching as cache size varies, Proceedings of the second annual ACM-SIAM symposium on Discrete algorithms, p.241-250, January 28-30, 1991, San Francisco, California, United States</name><name>Yihong Zhao , Prasad M. Deshpande , Jeffrey F. Naughton, An array-based algorithm for simultaneous multidimensional aggregates, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.159-170, May 11-15, 1997, Tucson, Arizona, United States</name></citation><abstract>Peer-to-Peer (P2P) systems are becoming increasingly popular as they enable users to exchange digital information by participating in complex networks. Such systems are inexpensive, easy to use, highly scalable and do not require central administration. Despite their advantages, however, limited work has been done on employing database systems on top of P2P networks.Here we propose the PeerOLAP architecture for supporting On-Line Analytical Processing queries. A large number low-end clients, each containing a cache with the most useful results, are connected through an arbitrary P2P network. If a query cannot be answered locally (i.e. by using the cache contents of the computer where it is issued), it is propagated through the network until a peer that has cached the answer is found. An answer may also be constructed by partial results from many peers. Thus PeerOLAP acts as a large distributed cache, which amplifies the benefits of traditional client-side caching. The system is fully distributed and can reconfigure itself on-the-fly in order to decrease the query cost for the observed workload. This paper describes the core components of PeerOLAP and presents our results both from simulation and a prototype installation running on geographically remote peers.</abstract></paper><paper><title>Rate-based query optimization for streaming information sources</title><author><AuthorName>Stratis D. Viglas</AuthorName><institute><InstituteName>University of Wisconsin-Madison, Madison, WI</InstituteName><country></country></institute></author><author><AuthorName>Jeffrey F. Naughton</AuthorName><institute><InstituteName>University of Wisconsin-Madison, Madison, WI</InstituteName><country></country></institute></author><year>2002</year><conference>International Conference on Management of Data</conference><citation><name>Arasu, B. Babcock, S. Babu, J. McAlister and J. Widom, Characterizing Memory Requirements for Queries over Continuous Data Streams, Stanford Techinical Report, November 2001, http://dbpubs.stanford.edu/pub/2001-49.</name><name>Ron Avnur , Joseph M. Hellerstein, Eddies: continuously adaptive query processing, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.261-272, May 15-18, 2000, Dallas, Texas, United States</name><name>Dimitri Bertsekas , Robert Gallager, Data networks, Prentice-Hall, Inc., Upper Saddle River, NJ, 1987</name><name>Shivnath Babu , Jennifer Widom, Continuous queries over data streams, ACM SIGMOD Record, v.30 n.3, September 2001</name><name>Jianjun Chen , David J. DeWitt , Feng Tian , Yuan Wang, NiagaraCQ: a scalable continuous query system for Internet databases, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.379-390, May 15-18, 2000, Dallas, Texas, United States</name><name>Mayur Datar , Aristides Gionis , Piotr Indyk , Rajeev Motwani, Maintaining stream statistics over sliding windows: (extended abstract), Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, p.635-644, January 06-08, 2002, San Francisco, California</name><name>Minos N. Garofalakis , Yannis E. Ioannidis, Parallel Query Scheduling and Optimization with Time- and Space-Shared Resources, Proceedings of the 23rd International Conference on Very Large Data Bases, p.296-305, August 25-29, 1997</name><name>Joseph M. Hellerstein, Optimization techniques for queries with expensive methods, ACM Transactions on Database Systems (TODS), v.23 n.2, p.113-157, June 1998</name><name>Wei Hong , Michael Stonebraker, Optimization of parallel query execution plans in XPRS, Distributed and Parallel Databases, v.1 n.1, p.9-32, Jan. 1993</name><name>Zachary G. Ives , Daniela Florescu , Marc Friedman , Alon Levy , Daniel S. Weld, An adaptive query execution system for data integration, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.299-310, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States</name><name>Z. G. Ives, A. Y. Levy and D. S. Weld. Efficient Evaluation of Regular Path Expressions on Streaming XML Data, University of Washington, Technical Report UW-CSE-2000-05-02.</name><name>Navin Kabra , David J. DeWitt, Efficient mid-query re-optimization of sub-optimal query execution plans, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.106-117, June 01-04, 1998, Seattle, Washington, United States</name><name>Chiang Lee , Chih-Horng Ke , J.-B. Chang , Yaw-Huei Chen, Minimization of Resource Consumption for Multidatabase Query Optimization, Proceedings of the 3rd IFCIS International Conference on Cooperative Information Systems, p.241-250, August 20-22, 1998</name><name>J. Naughton, D. J. DeWitt, D. Maier et al. The Niagara Internet Query System, IEEE Data Engineering Bulletin 24 (2): 27-33 (2001), http://www.cs.wisc.edu/niagara.</name><name>Kenneth W. Ng , Zhenghao Wang , Richard R. Muntz , Silvia Nittel, Dynamic Query Re-Optimization, Proceedings of the 11th International Conference on Scientific on Scientific and Statistical Database Management, p.264, July 28-30, 1999</name><name>P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts</name><name>Michael Stonebraker , Paul M. Aoki , Witold Litwin , Avi Pfeffer , Adam Sah , Jeff Sidell , Carl Staelin , Andrew Yu, Mariposa: a wide-area distributed database system, The VLDB Journal &mdash; The International Journal on Very Large Data Bases, v.5 n.1, p.048-063, January 1996</name><name>G. Schumacher. GEI's Experience with Britton-Lee's IDM, IWDM, 1983, pp. 233-241.</name><name>Leonard D. Shapiro, Join processing in database systems with large main memories, ACM Transactions on Database Systems (TODS), v.11 n.3, p.239-264, Sept. 1986</name><name>Jayavel Shanmugasundaram , Eugene J. Shekita , Rimon Barr , Michael J. Carey , Bruce G. Lindsay , Hamid Pirahesh , Berthold Reinwald, Efficiently Publishing Relational Data as XML Documents, Proceedings of the 26th International Conference on Very Large Data Bases, p.65-76, September 10-14, 2000</name><name>J. Shanmugasundaram, K. Tufte, D. J. DeWitt, J. F. Naughton and D. Maier. Architecting a Network Query Engine for Producing Partial Results, WebDB 2000.</name><name>T. Urhan and M. J. Franklin. Xjoin: A Reactively-Scheduled Pipelined Join Operator, IEEE Data Engineering Bulletin, June 2000, (23) 2:27-33.</name><name>Tolga Urhan , Michael J. Franklin , Laurent Amsaleg, Cost-based query scrambling for initial delays, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.130-141, June 01-04, 1998, Seattle, Washington, United States</name><name>A. N. Wilschut and P. M. G. Apers. Pipelining in Query Execution, Conference on Databases, Parallel Architectures and their Applications, Miami, 1991.</name><name>Chun Zhang , Jeffrey Naughton , David DeWitt , Qiong Luo , Guy Lohman, On supporting containment queries in relational database management systems, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.425-436, May 21-24, 2001, Santa Barbara, California, United States</name></citation><abstract>Relational query optimizers have traditionally relied upon table cardinalities when estimating the cost of the query plans they consider. While this approach has been and continues to be successful, the advent of the Internet and the need to execute queries over streaming sources requires a different approach, since for streaming inputs the cardinality may not be known or may not even be knowable (as is the case for an unbounded stream.) In view of this, we propose shifting from a cardinality-based approach to a rate-based approach, and give an optimization framework that aims at maximizing the output rate of query evaluation plans. This approach can be applied to cases where the cardinality-based approach cannot be used. It may also be useful for cases where cardinalities are known, because by focusing on rates we are able not only to optimize the time at which the last result tuple appears, but also to optimize for the number of answers computed at any specified time after the query evaluation commences. We present a preliminary validation of our rate-based optimization framework on a prototype XML query engine, though it is generic enough to be used in other database contexts. The results show that rate-based optimization is feasible and can indeed yield correct decisions.</abstract></paper><paper><title>Continuously adaptive continuous queries over streams</title><author><AuthorName>Samuel Madden</AuthorName><institute><InstituteName>UC Berkeley</InstituteName><country></country></institute></author><author><AuthorName>Mehul Shah</AuthorName><institute><InstituteName>UC Berkeley</InstituteName><country></country></institute></author><author><AuthorName>Joseph M. Hellerstein</AuthorName><institute><InstituteName>UC Berkeley</InstituteName><country></country></institute></author><author><AuthorName>Vijayshankar Raman</AuthorName><institute><InstituteName>IBM Almaden Research Center</InstituteName><country></country></institute></author><year>2002</year><conference>International Conference on Management of Data</conference><citation><name>Ron Avnur , Joseph M. Hellerstein, Eddies: continuously adaptive query processing, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.261-272, May 15-18, 2000, Dallas, Texas, United States</name><name>J. Chen, D. DeWitt, and J. Naughton. Design and evaluation of alternative selection placement strategies in optimizing continuous queries. In ICDE, San Jose, CA, February 2002.</name><name>Jianjun Chen , David J. DeWitt , Feng Tian , Yuan Wang, NiagaraCQ: a scalable continuous query system for Internet databases, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.379-390, May 15-18, 2000, Dallas, Texas, United States</name><name>David J. DeWitt , Jeffrey F. Naughton , Donovan A. Schneider, An Evaluation of Non-Equijoin Algorithms, Proceedings of the 17th International Conference on Very Large Data Bases, p.443-452, September 03-06, 1991</name><name>C. Forgy. Rete: A fast algorithm for the many patterns/many objects match problem. Artificial Intelligence, 19(1):17-37, 1982.</name><name>E. Hanson, N. A. Fayoumi, C. Carnes, M. Kandil, H. Liu, M. Lu, J. Park, and A. Vernon. TriggerMan: An Asynchronous Trigger Processor as an Extension to an Object-Relational DBMS. Technical Report 97-024, University of Florida, December 1997.</name><name>Wendi Rabiner Heinzelman , Joanna Kulik , Hari Balakrishnan, Adaptive protocols for information dissemination in wireless sensor networks, Proceedings of the 5th annual ACM/IEEE international conference on Mobile computing and networking, p.174-185, August 15-19, 1999, Seattle, Washington, United States</name><name>J. M. Hellerstein, M. J. Franklin, S. Chandrasekaran, A. Deshpande, K. Hildrum, S. Madden, V. Raman, and M. Shah. Adaptive query processing: Technology in evolution. IEEE Data Engineering Bulletin, 23(2):7-18, 2000.</name><name>Jason Hill , Robert Szewczyk , Alec Woo , Seth Hollar , David Culler , Kristofer Pister, System architecture directions for networked sensors, Proceedings of the ninth international conference on Architectural support for programming languages and operating systems, p.93-104, November 2000, Cambridge, Massachusetts, United States</name><name>Zachary G. Ives , Daniela Florescu , Marc Friedman , Alon Levy , Daniel S. Weld, An adaptive query execution system for data integration, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.299-310, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States</name><name>J. M. Kahn , R. H. Katz , K. S. J. Pister, Next century challenges: mobile networking for &ldquo;Smart Dust&rdquo;, Proceedings of the 5th annual ACM/IEEE international conference on Mobile computing and networking, p.271-278, August 15-19, 1999, Seattle, Washington, United States</name><name>N. Lanham. The telegraph screen scraper, 2000. http://db.cs.berkeley.edu/ nickl/tess.</name><name>Ling Liu , Calton Pu , Wei Tang, Continual Queries for Internet Scale Event-Driven Information Delivery, IEEE Transactions on Knowledge and Data Engineering, v.11 n.4, p.610-628, July 1999</name><name>S. Madden and M. Franklin. Fjording the stream: An architecture for queries over streaming sensor data. San Jose, CA, February 2002. ICDE.</name><name>D. P. Miranker. Treat: A better match algorithm for ai production system matching. In Proceedings of AAAI, pages 42-47, 1987.</name><name>Hoshi Mistry , Prasan Roy , S. Sudarshan , Krithi Ramamritham, Materialized view selection and maintenance using multi-query optimization, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.307-318, May 21-24, 2001, Santa Barbara, California, United States</name><name>Philippe Bonnet , Johannes Gehrke , Praveen Seshadri, Towards Sensor Database Systems, Proceedings of the Second International Conference on Mobile Data Management, p.3-14, January 08-10, 2001</name><name>Vijayshankar Raman , Joseph M. Hellerstein, Interactive query processing, 2001</name><name>Prasan Roy , S. Seshadri , S. Sudarshan , Siddhesh Bhobe, Efficient and extensible algorithms for multi query optimization, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.249-260, May 15-18, 2000, Dallas, Texas, United States</name><name>P. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, and T. Price. Access path selection in a relational database management system. pages 23-34, Boston, MA, 1979.</name><name>T. Sellis. Multiple query optimization. ACM Transactions on Database Systems, 1986.</name><name>Praveen Seshadri , Miron Livny , Raghu Ramakrishnan, The Design and Implementation of a Sequence Database System, Proceedings of the 22th International Conference on Very Large Data Bases, p.99-110, September 03-06, 1996</name><name>Mehul A. Shah , Michael J. Franklin , Samuel Madden , Joseph M. Hellerstein, Java support for data-intensive systems: experiences building the telegraph dataflow system, ACM SIGMOD Record, v.30 n.4, December 2001</name><name>M. Sullivan and A. Heybey. Tribeca: A system for managing large databases of network traffic. In Proceedings of the USENIX Annual Technical Conference, New Orleans, LA, June 1998.</name><name>Douglas Terry , David Goldberg , David Nichols , Brian Oki, Continuous queries over append-only databases, Proceedings of the 1992 ACM SIGMOD international conference on Management of data, p.321-330, June 02-05, 1992, San Diego, California, United States</name><name>T. Urhan and M. Franklin. XJoin: A reactively-scheduled pipelined join operator. IEEE Data Engineering Bulletin, pages 27-33, 2000 2000.</name><name>Tolga Urhan , Michael J. Franklin , Laurent Amsaleg, Cost-based query scrambling for initial delays, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.130-141, June 01-04, 1998, Seattle, Washington, United States</name><name>Annita N. Wilschut , Peter M. G. Apers, Dataflow query execution in a parallel main-memory environment, Proceedings of the first international conference on Parallel and distributed information systems, p.68-77, December 1991, Miami, Florida, United States</name></citation><abstract>We present a continuously adaptive, continuous query (CACQ) implementation based on the eddy query processing framework. We show that our design provides significant performance benefits over existing approaches to evaluating continuous queries, not only because of its adaptivity, but also because of the aggressive cross-query sharing of work and space that it enables. By breaking the abstraction of shared relational algebra expressions, our Telegraph CACQ implementation is able to share physical operators --- both selections and join state --- at a very fine grain. We augment these features with a grouped-filter index to simultaneously evaluate multiple selection predicates. We include measurements of the performance of our core system, along with a comparison to existing continuous query approaches.</abstract></paper><paper><title>Processing complex aggregate queries over data streams</title><author><AuthorName>Alin Dobra</AuthorName><institute><InstituteName>Cornell University</InstituteName><country></country></institute></author><author><AuthorName>Minos Garofalakis</AuthorName><institute><InstituteName>Bell Labs, Lucent</InstituteName><country></country></institute></author><author><AuthorName>Johannes Gehrke</AuthorName><institute><InstituteName>Cornell University</InstituteName><country></country></institute></author><author><AuthorName>Rajeev Rastogi</AuthorName><institute><InstituteName>Bell Labs, Lucent</InstituteName><country></country></institute></author><year>2002</year><conference>International Conference on Management of Data</conference><citation><name>Swarup Acharya , Phillip B. Gibbons , Viswanath Poosala , Sridhar Ramaswamy, Join synopses for approximate query answering, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.275-286, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States</name><name>Noga Alon , Phillip B. Gibbons , Yossi Matias , Mario Szegedy, Tracking join and self-join sizes in limited storage, Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.10-20, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States</name><name>Noga Alon , Yossi Matias , Mario Szegedy, The space complexity of approximating the frequency moments, Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, p.20-29, May 22-24, 1996, Philadelphia, Pennsylvania, United States</name><name>Shivnath Babu , Jennifer Widom, Continuous queries over data streams, ACM SIGMOD Record, v.30 n.3, September 2001</name><name>L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. "Classification and Regression Trees". Chapman & Hall, 1984.</name><name>Kaushik Chakrabarti , Minos N. Garofalakis , Rajeev Rastogi , Kyuseok Shim, Approximate Query Processing Using Wavelets, Proceedings of the 26th International Conference on Very Large Data Bases, p.111-122, September 10-14, 2000</name><name>Surajit Chaudhuri , Umeshwar Dayal, An overview of data warehousing and OLAP technology, ACM SIGMOD Record, v.26 n.1, p.65-74, March 1997</name><name>Mayur Datar , Aristides Gionis , Piotr Indyk , Rajeev Motwani, Maintaining stream statistics over sliding windows: (extended abstract), Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, p.635-644, January 06-08, 2002, San Francisco, California</name><name>A. Dobra, M. Garofalakis, J. Gehrke, and R. Rastogi. "Processing Complex Aggregate Queries over Data Streams". Bell Labs Tech. Memorandum, March 2002.</name><name>Pedro Domingos , Geoff Hulten, Mining high-speed data streams, Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, p.71-80, August 20-23, 2000, Boston, Massachusetts, United States</name><name>J. Feigenbaum , S. Kannan , M. Strauss , M. Viswanathan, An Approximate L1-Difference Algorithm for Massive Data Streams, Proceedings of the 40th Annual Symposium on Foundations of Computer Science, p.501, October 17-18, 1999</name><name>Minos N. Garofalakis , Phillip B. Gibbon, Approximate Query Processing: Taming the TeraBytes, Proceedings of the 27th International Conference on Very Large Data Bases, p.725, September 11-14, 2001</name><name>Johannes Gehrke , Flip Korn , Divesh Srivastava, On computing correlated aggregates over continual data streams, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.13-24, May 21-24, 2001, Santa Barbara, California, United States</name><name>Phillip B. Gibbons , Yossi Matias , Viswanath Poosala, Fast Incremental Maintenance of Approximate Histograms, Proceedings of the 23rd International Conference on Very Large Data Bases, p.466-475, August 25-29, 1997</name><name>Anna C. Gilbert , Yannis Kotidis , S. Muthukrishnan , Martin Strauss, Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries, Proceedings of the 27th International Conference on Very Large Data Bases, p.79-88, September 11-14, 2001</name><name>Michael Greenwald , Sanjeev Khanna, Space-efficient online computation of quantile summaries, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.58-66, May 21-24, 2001, Santa Barbara, California, United States</name><name>Sudipto Guha , Nick Koudas , Kyuseok Shim, Data-streams and histograms, Proceedings of the thirty-third annual ACM symposium on Theory of computing, p.471-475, July 2001, Hersonissos, Greece</name><name>S. Guha , N. Mishra , R. Motwani , L. O'Callaghan, Clustering data streams, Proceedings of the 41st Annual Symposium on Foundations of Computer Science, p.359, November 12-14, 2000</name><name>Peter J. Haas , Joseph M. Hellerstein, Ripple joins for online aggregation, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.287-298, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States</name><name>Yannis E. Ioannidis , Viswanath Poosala, Histogram-Based Approximation of Set-Valued Query-Answers, Proceedings of the 25th International Conference on Very Large Data Bases, p.174-185, September 07-10, 1999</name><name>Gurmeet Singh Manku , Sridhar Rajagopalan , Bruce G. Lindsay, Random sampling techniques for space efficient online computation of order statistics of large datasets, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.251-262, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States</name><name>Yossi Matias , Jeffrey Scott Vitter , Min Wang, Dynamic Maintenance of Wavelet-Based Histograms, Proceedings of the 26th International Conference on Very Large Data Bases, p.101-110, September 10-14, 2000</name><name>Jeffrey S. Vitter, Random sampling with a reservoir, ACM Transactions on Mathematical Software (TOMS), v.11 n.1, p.37-57, March 1985</name><name>Jeffrey Scott Vitter , Min Wang, Approximate computation of multidimensional aggregates of sparse data using wavelets, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.193-204, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States</name></citation><abstract>Recent years have witnessed an increasing interest in designing algorithms for querying and analyzing streaming data (i.e., data that is seen only once in a fixed order) with only limited memory. Providing (perhaps approximate) answers to queries over such continuous data streams is a crucial requirement for many application environments; examples include large telecom and IP network installations where performance data from different parts of the network needs to be continuously collected and analyzed.In this paper, we consider the problem of approximately answering general aggregate SQL queries over continuous data streams with limited memory. Our method relies on randomizing techniques that compute small "sketch" summaries of the streams that can then be used to provide approximate answers to aggregate queries with provable guarantees on the approximation error. We also demonstrate how existing statistical information on the base data (e.g., histograms) can be used in the proposed framework to improve the quality of the approximation provided by our algorithms. The key idea is to intelligently partition the domain of the underlying attribute(s) and, thus, decompose the sketching problem in a way that provably tightens our guarantees. Results of our experimental study with real-life as well as synthetic data streams indicate that sketches provide significantly more accurate answers compared to histograms for aggregate queries. This is especially true when our domain partitioning methods are employed to further boast the accuracy of the final estimates.</abstract></paper><paper><title>Best-effort cache synchronization with source cooperation</title><author><AuthorName>Chris Olston</AuthorName><institute><InstituteName>Stanford University</InstituteName><country></country></institute></author><author><AuthorName>Jennifer Widom</AuthorName><institute><InstituteName>Stanford University</InstituteName><country></country></institute></author><year>2002</year><conference>International Conference on Management of Data</conference><citation><name>B. Adelberg , H. Garcia-Molina , B. Kao, Applying update streams in a soft real-time database system, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.245-256, May 22-25, 1995, San Jose, California, United States</name><name>Brad Adelberg , Ben Kao , Hector Garcia-Molina, Database Support for Efficiently Maintaining Derived Data, Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology, p.223-240, March 25-29, 1996</name><name>Sergey Brin , Lawrence Page, The anatomy of a large-scale hypertextual Web search engine, Proceedings of the seventh international conference on World Wide Web 7, p.107-117, April 1998, Brisbane, Australia</name><name>Gerth St&#248;lting Brodal , Jesper Larsson Tr&#228;ff , Christos D. Zaroliagis, A parallel priority queue with constant time operations, Journal of Parallel and Distributed Computing, v.49 n.1, p.4-21, Feb. 1998</name><name>M. Cherniack, M. J. Franklin, and S. Zdonik. Expressing user profiles for data recharging. IEEE Personal Communications: Special Issue on Pervasive Computing, 8(4):32-38, Aug. 2001.</name><name>J. Cho and H. Garcia-Molina. Estimating frequency of change. Technical report, Stanford University Computer Science Department, 2000. http://dbpubs.stanford.edu/pub/2000-4.</name><name>Junghoo Cho , Hector Garcia-Molina, Synchronizing a database to improve freshness, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.117-128, May 15-18, 2000, Dallas, Texas, United States</name><name>E. Cohen and H. Kaplan. Refreshment policies for web content caches. In Proceedings of the Twentieth Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM 2001), Anchorage, Alaska, Apr. 2001.</name><name>Pavan Deolasee , Amol Katkar , Ankur Panchbudhe , Krithi Ramamritham , Prashant Shenoy, Adaptive push-pull: disseminating dynamic web data, Proceedings of the 10th international conference on World Wide Web, p.265-274, May 01-05, 2001, Hong Kong, Hong Kong</name><name>Lyman Do , Prabhu Ram , Pamela Drew, The need for distributed asynchronous transactions, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.534-535, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States</name><name>T. Dorcey. CU-SeeMe desktop videoconferencing software. Connexions, 9(3), Mar. 1995.</name><name>D. Estrin, L. Girod, G. Pottie, and M. Srivastava. Instrumenting the world with wireless sensor networks. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2001), Salt Lake City, Utah, May 2001.</name><name>Avigdor Gal , Jonathan Eckstein, Managing periodically updated data in relational databases: a stochastic modeling approach, Journal of the ACM (JACM), v.48 n.6, p.1141-1183, November 2001</name><name>Richard A. Golding , Darrell D. E. Long, MODELING REPLICA DIVERGENCE IN A WEAK-CONSISTENCY PROTOCOL FOR GLOBAL-SCALE DISTRIBUTED DATA BASES, University of California at Santa Cruz, Santa Cruz, CA, 1993</name><name>J. M. Kahn , R. H. Katz , K. S. J. Pister, Next century challenges: mobile networking for &ldquo;Smart Dust&rdquo;, Proceedings of the 5th annual ACM/IEEE international conference on Mobile computing and networking, p.271-278, August 15-19, 1999, Seattle, Washington, United States</name><name>Alexandros Labrinidis , Nick Roussopoulos, Update Propagation Strategies for Improving the Quality of Data on the Web, Proceedings of the 27th International Conference on Very Large Data Bases, p.391-400, September 11-14, 2001</name><name>M. J. McPhaden. Tropical Atmosphere Ocean Project, Pacific Marine Environmental Laboratory, 2001. http://www.pmel.noaa.gov/tao/.</name><name>Chris Olston , Boon Thau Loo , Jennifer Widom, Adaptive precision setting for cached approximate values, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.355-366, May 21-24, 2001, Santa Barbara, California, United States</name><name>Chris Olston , Jennifer Widom, Offering a Precision-Performance Tradeoff for Aggregation Queries over Replicated Data, Proceedings of the 26th International Conference on Very Large Data Bases, p.144-155, September 10-14, 2000</name><name>C. Olston and J. Widom. Best-effort cache synchronization with source cooperation. Technical report, Stanford University Computer Science Department, 2001. http://dbpubs.stanford.edu/pub/2001-43.</name><name>G. J. Pottie , W. J. Kaiser, Wireless integrated network sensors, Communications of the ACM, v.43 n.5, p.51-58, May 2000</name><name>Calton Pu , Avraham Leff, Replica control in distributed systems: as asynchronous approach, Proceedings of the 1991 ACM SIGMOD international conference on Management of data, p.377-386, May 29-31, 1991, Denver, Colorado, United States</name><name>Krithi Ramamritham, Real-time databases, Distributed and Parallel Databases, v.1 n.2, p.199-226, April 1993</name><name>G. Salton and C. S. Yang. On the specification of term values in automatic indexing. Journal of Documentation, 29:351-372, 1973.</name><name>Peter Sanders, Randomized priority queues for fast parallel access, Journal of Parallel and Distributed Computing, v.49 n.1, p.86-97, Feb. 1998</name><name>J. Stewart. Calculus: Early Transcendentals, Second Edition. Brooks/Cole, 1991.</name><name>Maintaining Mutual Consistency for Cached Web Objects, Proceedings of the The 21st International Conference on Distributed Computing Systems, p.371, April 16-19, 2001</name><name>H. Yu and A. Vahdat. Design and evaluation of a continuous consistency model for replicated services. In Proceedings of the Fourth Symposium on Operating Systems Design and Implementation, San Diego, California, Oct. 2000.</name></citation><abstract>In environments where exact synchronization between source data objects and cached copies is not achievable due to bandwidth or other resource constraints, stale (out-of-date) copies are permitted. It is desirable to minimize the overall divergence between source objects and cached copies by selectively refreshing modified objects. We call the online process of selecting which objects to refresh in order to minimize divergence best-effort synchronization. In most approaches to best-effort synchronization, the cache coordinates the process and selects objects to refresh. In this paper, we propose a best-effort synchronization scheduling policy that exploits cooperation between data sources and the cache. We also propose an implementation of our policy that incurs low communication overhead even in environments with very large numbers of sources. Our algorithm is adaptive to wide fluctuations in available resources and data update rates. Through experimental simulation over synthetic and real-world data, we demonstrate the effectiveness of our algorithm, and we quantify the significant decrease in divergence achievable with source cooperation.</abstract></paper><paper><title>Efficient evaluation of queries in a mediator for WebSources</title><author><AuthorName>Vladimir Zadorozhny</AuthorName><institute><InstituteName>University of Pittsburgh, Pittsburgh, PA</InstituteName><country></country></institute></author><author><AuthorName>Louiqa Raschid</AuthorName><institute><InstituteName>University of Maryland, College Park, MD</InstituteName><country></country></institute></author><author><AuthorName>Maria Esther Vidal</AuthorName><institute><InstituteName>Simon Bolivar University, Caracas, Venezuela</InstituteName><country></country></institute></author><author><AuthorName>Tolga Urhan</AuthorName><institute><InstituteName>BEA Systems, Inc., San Jose, CA</InstituteName><country></country></institute></author><author><AuthorName>Laura Bright</AuthorName><institute><InstituteName>University of Maryland, College Park, MD</InstituteName><country></country></institute></author><year>2002</year><conference>International Conference on Management of Data</conference><citation><name>S. Adali , K. S. Candan , Y. Papakonstantinou , V. S. Subrahmanian, Query caching and optimization in distributed mediator systems, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.137-146, June 04-06, 1996, Montreal, Quebec, Canada</name><name>Laurent Amsaleg , Michael J. Franklin , Anthony Tomasic, Dynamic Query Operator Scheduling for Wide-Area Remote Access, Distributed and Parallel Databases, v.6 n.3, p.217-246, July 1998</name><name>Ron Avnur , Joseph M. Hellerstein, Eddies: continuously adaptive query processing, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.261-272, May 15-18, 2000, Dallas, Texas, United States</name><name>L. Bright, J.-R. Gruser, L. Raschid, and M. E. Vidal. A wrapper generation toolkit to specify and construct wrappers for webaccesible data sources (websources). Journal of Computer Systems Special Issue on Semantics in the WWW, 14(2):83-98, 1999.</name><name>L. Bright and L. Raschid. Cost modeling of wrappers for web accesible data sources (websources). http://www.umiacs.umd.edu/labs/CLIP/DARPA/ww97.html. (under review), 1998.</name><name>Surajit Chaudhuri , Kyuseok Shim, Query Optimization in the Presence of Foreign Functions, Proceedings of the 19th International Conference on Very Large Data Bases, p.529-542, August 24-27, 1993</name><name>Jianjun Chen , David J. DeWitt , Feng Tian , Yuan Wang, NiagaraCQ: a scalable continuous query system for Internet databases, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.379-390, May 15-18, 2000, Dallas, Texas, United States</name><name>Weimin Du , Ravi Krishnamurthy , Ming-Chien Shan, Query Optimization in a Heterogeneous DBMS, Proceedings of the 18th International Conference on Very Large Data Bases, p.277-291, August 23-27, 1992</name><name>B. Eckman, A. Kosky, and L. Laroco. Extending traditional query-based integration approaches for functional characterization of post-genomic data. BioInformatics, 17(7):587-601, 2001.</name><name>Barbara A. Eckman , Zo&#233; Lacroix , Louiqa Raschid, Optimized Seamless Integration of Biomolecular Data, Proceedings of the 2nd IEEE International Symposium on Bioinformatics and Bioengineering, p.23, March 04-06, 2001</name><name>Daniela Florescu , Alon Levy , Ioana Manolescu , Dan Suciu, Query optimization in the presence of limited access patterns, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.311-322, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States</name><name>Capability-Sensitive Query Processing on Internet Sources, Proceedings of the 15th International Conference on Data Engineering, p.50, March 23-26, 1999</name><name>Georges Gardarin , Sofiane Gannouni , B&#233;atrice Finance , Peter Fankhausera , Wolfgang Klas , Dominique Pastrea , R&#233;gis Legoff , Antonis Ramfos, IRO-DB: a distributed system federating object and relational databases, Object-oriented multidatabase systems: a solution for advanced applications, Prentice Hall International (UK) Ltd., Hertfordshire, UK, 1995</name><name>Roy Goldman , Jennifer Widom, WSQ/DSQ: a practical approach for combined querying of databases and the Web, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.285-296, May 15-18, 2000, Dallas, Texas, United States</name><name>Jean-Robert Gruser , Louiqa Raschid , Vladimir Zadorozhny , Tao Zhan, Learning response time for WebSources using query feedback and application in query optimization, The VLDB Journal &mdash; The International Journal on Very Large Data Bases, v.9 n.1, p.18-37, March 2000</name><name>Laura M. Haas , Donald Kossmann , Edward L. Wimmers , Jun Yang, Optimizing Queries Across Diverse Data Sources, Proceedings of the 23rd International Conference on Very Large Data Bases, p.276-285, August 25-29, 1997</name><name>L. M. Haas , P. M. Schwarz , P. Kodali , E. Kotlar , J. E. Rice , W. C. Swope, DiscoveryLink: a system for integrated access to life sciences data sources, IBM Systems Journal, v.40 n.2, p.489-511, February 2001</name><name>Peter J. Haas , Joseph M. Hellerstein, Ripple joins for online aggregation, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.287-298, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States</name><name>J. Hellerstein et al. Adaptive query processing: Technology in evolution. IEEE Data Engineering Bulletin, 2000.</name><name>Y. E. Ioannidis , Younkyung Kang, Randomized algorithms for optimizing large join queries, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.312-321, May 23-26, 1990, Atlantic City, New Jersey, United States</name><name>Zachary G. Ives , Daniela Florescu , Marc Friedman , Alon Levy , Daniel S. Weld, An adaptive query execution system for data integration, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.299-310, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States</name><name>Z. Ives, A. Levy, D. Weld, D. Florescu, and M. Friedman. Adaptive query processing for internet applications. IEEE Data Engineering Bulletin, 23(2):19-26, 2000.</name><name>Alon Y. Levy , Anand Rajaraman , Joann J. Ordille, Querying Heterogeneous Information Sources Using Source Descriptions, Proceedings of the 22th International Conference on Very Large Data Bases, p.251-262, September 03-06, 1996</name><name>C. Li and E. Chang. Query planning with limited source capabilities. Proceedings of ICDE, 2000.</name><name>Chen Li , Edward Y. Chang, On Answering Queries in the Presence of Limited Access Patterns, Proceedings of the 8th International Conference on Database Theory, p.219-233, January 04-06, 2001</name><name>ACM Digital Library. http://www.acm.org/dl/Search.html.</name><name>Mary Tork Roth , Fatma Ozcan , Laura M. Haas, Cost Models DO Matter: Providing Cost Information for Diverse Data Sources in a Federated System, Proceedings of the 25th International Conference on Very Large Data Bases, p.599-610, September 07-10, 1999</name><name>Hubert Naacke , Georges Gardarin , Anthony Tomasic, Leveraging Mediator Cost Models with Heterogeneous Data Sources, Proceedings of the Fourteenth International Conference on Data Engineering, p.351-360, February 23-27, 1998</name><name>Bureau of Labor Statistics. http://stats.bls.gov.</name><name>Yannis Papakonstantinou , Ashish Gupta , Laura Haas, Capabilities-based query rewriting in mediator systems, Proceedings of the fourth international conference on on Parallel and distributed information systems, p.170-183, December 18-20, 1996, Miami Beach, Florida, United States</name><name>Praveen Seshadri , Miron Livny , Raghu Ramakrishnan, The Case for Enhanced Abstract Data Types, Proceedings of the 23rd International Conference on Very Large Data Bases, p.66-75, August 25-29, 1997</name><name>P. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, and T. Price. Access path selection in a relational database management system. 1979.</name><name>Scaling heterogeneous databases and the design of Disco, Proceedings of the 16th International Conference on Distributed Computing Systems (ICDCS '96), p.449, May 27-30, 1996</name><name>Jeffrey D. Ullman, Principles of database and knowledge-base systems, Vol. I, Computer Science Press, Inc., New York, NY, 1988</name><name>Jeffrey D. Ullman, Principles of database and knowledge-base systems, Vol. I, Computer Science Press, Inc., New York, NY, 1988</name><name>T. Urhan and M. Franklin. Xjoin: A reactively-scheduled pipelined join operator. IEEE Data Engineering Bulletin, 23(2):27-33, 2000.</name><name>Tolga Urhan , Michael J. Franklin, Dynamic Pipeline Scheduling for Improving Interactive Query Performance, Proceedings of the 27th International Conference on Very Large Data Bases, p.501-510, September 11-14, 2001</name><name>Tolga Urhan , Michael J. Franklin , Laurent Amsaleg, Cost-based query scrambling for initial delays, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.130-141, June 01-04, 1998, Seattle, Washington, United States</name><name>Vasilis Vassalos , Yannis Papakonstantinou, Describing and Using Query Capabilities of Heterogeneous Sources, Proceedings of the 23rd International Conference on Very Large Data Bases, p.256-265, August 25-29, 1997</name><name>V. Vassalos and Y. Papakonstantinou. Using knowledge of redundancy for query optimization in mediators. Proceedings of the AAAI Symposium on AI and Data Integration, 1998.</name><name>M. E. Vidal. A Mediator for Scaling up to Multiple WebSources. PhD thesis, University Simon Bolivar, 2000.</name><name>M. E. Vidal, L. Raschid, and V. Zadorozhny. Decision support model for pre-plans. In preparation, 2001.</name><name>Ramana Yerneni , Chen Li , Jeffrey D. Ullman , Hector Garcia-Molina, Optimizing Large Join Queries in Mediation Systems, Proceeding of the 7th International Conference on Database Theory, p.348-364, January 10-12, 1999</name><name>Vladimir Zadorozhny , Louiqa Raschid , Tao Zhan , Laura Bright, Validating an Access Cost Model for Wide Area Applications, Proceedings of the 9th International Conference on Cooperative Information Systems, p.371-385, September 05-07, 2001</name></citation><abstract>We consider an architecture of mediators and wrappers for Internet accessible WebSources of limited query capability. Each call to a source is a WebSource Implementation (WSI) and it is associated with both a capability and (a possibly dynamic) cost. The multiplicity of WSIs with varying costs and capabilities increases the complexity of a traditional optimizer that must assign WSIs for each remote relation in the query while generating an (optimal) plan. We present a two-phase Web Query Optimizer (WQO). In a pre-optimization phase, the WQO selects one or more WSIs for a pre-plan; a pre-plan represents a space of query evaluation plans (plans) based on this choice of WSIs. The WQO uses cost-based heuristics to evaluate the choice of WSI assignment in the pre-plan and to choose a good pre-plan. The WQO uses the pre-plan to drive the extended relational optimizer to obtain the best plan for a pre-plan. A prototype of the WQO has been developed. We compare the effectiveness of the WQO, i.e., its ability to efficiently search a large space of plans and obtain a low cost plan, in comparison to a traditional optimizer. We also validate the cost-based heuristics by experimental evaluation of queries in the noisy Internet environment.</abstract></paper><paper><title>Proxy-based acceleration of dynamically generated content on the world wide web: an approach and implementation</title><author><AuthorName>Anindya Datta</AuthorName><institute><InstituteName>Georgia Institute of Technology, Atlanta, GA</InstituteName><country></country></institute></author><author><AuthorName>Kaushik Dutta</AuthorName><institute><InstituteName>Georgia Institute of Technology, Atlanta, GA</InstituteName><country></country></institute></author><author><AuthorName>Helen Thomas</AuthorName><institute><InstituteName>Carnegie Mellon University, Pittsburgh, PA</InstituteName><country></country></institute></author><author><AuthorName>Debra VanderMeer</AuthorName><institute><InstituteName>Georgia Institute of Technology, Atlanta, GA</InstituteName><country></country></institute></author><author><AuthorName>Suresha</AuthorName><institute><InstituteName>Database Systems Lab, Indian Institute of Science, Bangalore, India</InstituteName><country></country></institute></author><author><AuthorName>Krithi Ramamritham</AuthorName><institute><InstituteName>Indian Institute of Technology-Bombay, Powai, Mumbai, India</InstituteName><country></country></institute></author><year>2002</year><conference>International Conference on Management of Data</conference><citation><name>Digital Island (a Cable & Wireless Company). Digital island 2 way web services. http://www.digitalisland.net.</name><name>Virg&#237;lio Almeida , Azer Bestavros , Mark Crovella , Adriana de Oliveira, Characterizing reference locality in the WWW, Proceedings of the fourth international conference on on Parallel and distributed information systems, p.92-107, December 18-20, 1996, Miami Beach, Florida, United States</name><name>Network Appliance. http://www.netapp.com.</name><name>CacheFlow. Accelerating e-commerce with cacheflow internet caching appliances (a cacheflow white paper, October 1999.</name><name>K. Sel&#231;uk Candan , Wen-Syan Li , Qiong Luo , Wang-Pin Hsiung , Divyakant Agrawal, Enabling dynamic content caching for database-driven web sites, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.532-54
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -