<proceedings><paper><title>Mining frequent patterns without candidate generation</title><author><AuthorName>Jiawei Han</AuthorName><institute><InstituteName>School of Computing Science, Simon Fraser University</InstituteName><country></country></institute></author><author><AuthorName>Jian Pei</AuthorName><institute><InstituteName>School of Computing Science, Simon Fraser University</InstituteName><country></country></institute></author><author><AuthorName>Yiwen Yin</AuthorName><institute><InstituteName>School of Computing Science, Simon Fraser University</InstituteName><country></country></institute></author><year>2000</year><conference>International Conference on Management of Data</conference><citation><name>R. Agarwal, C. Aggarwal, and V. V. V. Prasad. Depth-first generation of large itemsets for association rules. IBM Tech. Report RC21538, July 1999.</name><name>R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of frequent itemsets. In J, Parallel and Distributed Computing, 2000.</name><name>Rakesh Agrawal , Ramakrishnan Srikant, Fast Algorithms for Mining Association Rules in Large Databases, Proceedings of the 20th International Conference on Very Large Data Bases, p.487-499, September 12-15, 1994</name><name>Rakesh Agrawal , Ramakrishnan Srikant, Mining Sequential Patterns, Proceedings of the Eleventh International Conference on Data Engineering, p.3-14, March 06-10, 1995</name><name>Roberto J. 
Bayardo, Jr., Efficiently mining long patterns from databases, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.85-93, June 01-04, 1998, Seattle, Washington, United States</name><name>Sergey Brin , Rajeev Motwani , Craig Silverstein, Beyond market baskets: generalizing association rules to correlations, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.265-276, May 11-15, 1997, Tucson, Arizona, United States</name><name>Guozhu Dong , Jinyan Li, Efficient mining of emerging patterns: discovering trends and differences, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.43-52, August 15-18, 1999, San Diego, California, United States</name><name>G. Grahne, L. Lakshmanan, and X. Wang. Efficient mining of constrained correlated sets. In ICDE'00.</name><name>Efficient Mining of Partial Periodic Patterns in Time Series Database, Proceedings of the 15th International Conference on Data Engineering, p.106, March 23-26, 1999</name><name>J. Han, J. Pei, and Y. Yin. Mining partial periodicity using frequent pattern trees. In GS Tech, Rep, 99-10, Simon Fraser University, July 1999.</name><name>M. Kamber, J. Han, and J. Y. Chiang. Metaruleguided mining of multi-dimensional association rules using data cubes. In KDD'97, pp. 207-210.</name><name>Mika Klemettinen , Heikki Mannila , Pirjo Ronkainen , Hannu Toivonen , A. Inkeri Verkamo, Finding interesting rules from large sets of discovered association rules, Proceedings of the third international conference on Information and knowledge management, p.401-407, November 29-December 02, 1994, Gaithersburg, Maryland, United States</name><name>Brian Lent , Arun N. Swami , Jennifer Widom, Clustering Association Rules, Proceedings of the Thirteenth International Conference on Data Engineering, p.220-231, April 07-11, 1997</name><name>Heikki Mannila , Hannu Toivonen , A. 
Inkeri Verkamo, Discovery of Frequent Episodes in Event Sequences, Data Mining and Knowledge Discovery, v.1 n.3, p.259-289, 1997</name><name>Raymond T. Ng , Laks V. S. Lakshmanan , Jiawei Han , Alex Pang, Exploratory mining and pruning optimizations of constrained associations rules, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.13-24, June 01-04, 1998, Seattle, Washington, United States</name><name>Jong Soo Park , Ming-Syan Chen , Philip S. Yu, An effective hash-based algorithm for mining association rules, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.175-186, May 22-25, 1995, San Jose, California, United States</name><name>Sunita Sarawagi , Shiby Thomas , Rakesh Agrawal, Integrating association rule mining with relational database systems: alternatives and implications, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.343-354, June 01-04, 1998, Seattle, Washington, United States</name><name>Ashoka Savasere , Edward Omiecinski , Shamkant B. Navathe, An Efficient Algorithm for Mining Association Rules in Large Databases, Proceedings of the 21th International Conference on Very Large Data Bases, p.432-444, September 11-15, 1995</name><name>Craig Silverstein , Sergey Brin , Rajeev Motwani , Jeffrey D. Ullman, Scalable Techniques for Mining Causal Structures, Proceedings of the 24rd International Conference on Very Large Data Bases, p.594-605, August 24-27, 1998</name><name>R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. In KDD'97, pp. 67-73.</name></citation><abstract>Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. 
However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns.</abstract></paper><paper><title>Data mining on an OLTP system (nearly) for free</title><author><AuthorName>Erik Riedel</AuthorName><institute><InstituteName>Hewlett-Packard Laboratories, Palo Alto, California</InstituteName><country></country></institute></author><author><AuthorName>Christos Faloutsos</AuthorName><institute><InstituteName>School of Computer Science, Carnegie Mellon University, Pittsburgh, PA</InstituteName><country></country></institute></author><author><AuthorName>Gregory R. Ganger</AuthorName><institute><InstituteName>School of Computer Science, Carnegie Mellon University, Pittsburgh, PA</InstituteName><country></country></institute></author><author><AuthorName>David F. Nagle</AuthorName><institute><InstituteName>School of Computer Science, Carnegie Mellon University, Pittsburgh, PA</InstituteName><country></country></institute></author><year>2000</year><conference>International Conference on Management of Data</conference><citation><name>Acharya, A., Uysal, M. and Saltz, J. "Active Disks" ASPLOS, October 1998.</name><name>Rakesh Agrawal , John C. Shafer, Parallel Mining of Association Rules, IEEE Transactions on Knowledge and Data Engineering, v.8 n.6, p.962-969, December 1996</name><name>Brown, K., Carey, M., DeWitt, D., Mehta, M. and Naughton, J. "Resource Allocation and Scheduling for Mixed Database Workloads" Technical Report, University of Wisconsin, 1992.</name><name>Kurt P. Brown , Michael J. Carey , Miron Livny, Managing Memory to Meet Multiclass Workload Response Time Goals, Proceedings of the 19th International Conference on Very Large Data Bases, p.328-341, August 24-27, 1993</name><name>Surajit Chaudhuri , Umeshwar Dayal, An overview of data warehousing and OLAP technology, ACM SIGMOD Record, v.26 n.1, p.65-74, March 1997</name><name>Cirrus Logic, Inc. 
"New Open-Processor Platform Enables Cost-Effective, System-on-a-chip Solutions for Hard Disk Drives" www.cirrus.com/3ci, June 1998.</name><name>Denning, P.J. "Effects of Scheduling on File Memory Operations" AFIPS Spring Joint Computer Conference, April 1967.</name><name>Fayyad, U. "Taming the Giants and the Monsters: Mining Large Databases for Nuggets of Knowledge" Database Programming and Design, March 1998.</name><name>Ganger, G.R., Worthington, B.L. and Patt, Y.N. "The DiskSim Simulation Environment Version 1.0 Reference Manual" Technical Report, University of Michigan, February 1998.</name><name>Gray, J. "What Happens When Processing, Storage, and Bandwidth are Free and Infinite?" IOPADS Keynote, November 1997.</name><name>Sudipto Guha , Rajeev Rastogi , Kyuseok Shim, CURE: an efficient clustering algorithm for large databases, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.73-84, June 01-04, 1998, Seattle, Washington, United States</name><name>Hewlett-Packard Company "HP to Deliver Enterprise-Class Storage Area Network Management Solution" News Release, October 1998.</name><name>IBM Corporation and International Data Group "Survey says Storage Area Networks may unclog future roadblocks to e-Business" News Release, December 1999.</name><name>Kimberly Keeton , David A. Patterson , Joseph M. Hellerstein, A case for intelligent disks (IDISKs), ACM SIGMOD Record, v.27 n.3, p.42-52, Sept. 1, 1998</name><name>Flip Korn , Alexandros Labrinidis , Yannis Kotidis , Christos Faloutsos, Ratio Rules: A New Paradigm for Fast, Quantifiable Data Mining, Proceedings of the 24rd International Conference on Very Large Data Bases, p.582-593, August 24-27, 1998</name><name>Paulin, J. "Performance Evaluation of Concurrent OLTP and DSS Workloads in a Single Database System" Master's Thesis, Carleton University, November 1997.</name><name>Erik Riedel , Garth A. 
Gibson , Christos Faloutsos, Active Storage for Large-Scale Data Mining and Multimedia, Proceedings of the 24rd International Conference on Very Large Data Bases, p.62-73, August 24-27, 1998</name><name>Chris Ruemmler , John Wilkes, An introduction to disk drive modeling, Computer, v.27 n.3, p.17-28, March 1994</name><name>Seagate Technology, Inc. "Storage Networking: The Evolution of Information Management" White Paper, November 1998.</name><name>Siemens Microelectronics, Inc. "Siemens Announces Availability of TriCore-1 For New Embedded System Designs" News Release, March 1998.</name><name>Veritas Software Corporation "Veritas Software and Other Industry Leaders Demonstrate SAN Solutions" News Release, May 1999.</name><name>Jennifer Widom, Research problems in data warehousing, Proceedings of the fourth international conference on Information and knowledge management, p.25-30, November 29-December 02, 1995, Baltimore, Maryland, United States</name><name>Bruce L. Worthington , Gregory R. Ganger , Yale N. Patt, Scheduling algorithms for modern disk drives, Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems, p.241-251, May 16-20, 1994, Nashville, Tennessee, United States</name><name>Bruce L. Worthington , Gregory R. Ganger , Yale N. Patt , John Wilkes, On-line extraction of SCSI disk drive parameters, Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems, p.146-156, May 15-19, 1995, Ottawa, Ontario, Canada</name><name>Tian Zhang , Raghu Ramakrishnan , Miron Livny, BIRCH: A New Data Clustering Algorithm and Its Applications, Data Mining and Knowledge Discovery, v.1 n.2, p.141-182, 1997</name></citation><abstract>This paper proposes a scheme for scheduling disk requests that takes advantage of the ability of high-level functions to operate directly at individual disk drives. 
We show that such a scheme makes it possible to support a Data Mining workload on an OLTP system almost for free: there is only a small impact on the throughput and response time of the existing workload. Specifically, we show that an OLTP system has the disk resources to consistently provide one third of its sequential bandwidth to a background Data Mining task with close to zero impact on OLTP throughput and response time at high transaction loads. At low transaction loads, we show much lower impact than observed in previous work. This means that a production OLTP system can be used for Data Mining tasks without the expense of a second dedicated system. Our scheme takes advantage of close interaction with the on-disk scheduler by reading blocks for the Data Mining workload as the disk head &ldquo;passes over&rdquo; them while satisfying demand blocks from the OLTP request stream. We show that this scheme provides a consistent level of throughput for the background workload even at very high foreground loads. Such a scheme is of most benefit in combination with an Active Disk environment that allows the background Data Mining application to also take advantage of the processing power and memory available directly on the disk drives.</abstract></paper><paper><title>Turbo-charging vertical mining of large databases</title><author><AuthorName>Pradeep Shenoy</AuthorName><institute><InstituteName>Lucent Bell Labs, 600 Mountain Avenue, Murray Hill, NJ, Computer Science and Engg., Indian Institute of Technology, Mumbai 400076, INDIA</InstituteName><country></country></institute></author><author><AuthorName>Jayant R. Haritsa</AuthorName><institute><InstituteName>Database Systems Lab, SERC, Indian Institute of Science, Bangalore 560012, INDIA, Lucent Bell Labs, 600 Mountain Avenue, Murray Hill, NJ</InstituteName><country></country></institute></author><author><AuthorName>S. 
Sudarshan</AuthorName><institute><InstituteName>Computer Science and Engg., Indian Institute of Technology, Mumbai 400076, INDIA</InstituteName><country></country></institute></author><author><AuthorName>Gaurav Bhalotia</AuthorName><institute><InstituteName>Computer Science and Engg., Indian Institute of Technology, Mumbai 400076, INDIA</InstituteName><country></country></institute></author><author><AuthorName>Mayank Bawa</AuthorName><institute><InstituteName>Computer Science and Engg., Indian Institute of Technology, Mumbai 400076, INDIA</InstituteName><country></country></institute></author><author><AuthorName>Devavrat Shah</AuthorName><institute><InstituteName>Computer Science and Engg., Indian Institute of Technology, Mumbai 400076, INDIA</InstituteName><country></country></institute></author><year>2000</year><conference>International Conference on Management of Data</conference><citation><name>Rakesh Agrawal , Tomasz Imieli&#324;ski , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States</name><name>Rakesh Agrawal , Ramakrishnan Srikant, Fast Algorithms for Mining Association Rules in Large Databases, Proceedings of the 20th International Conference on Very Large Data Bases, p.487-499, September 12-15, 1994</name><name>B. Dunkel and N. Soparkar. Data organization and access for efficient data mining. In Proc. of 15th Intl. Conf. on Data Engineering (ICDE), 1999.</name><name>G. Gardarin, P. Pucheral, and F. Wu. Bitmap based algorithms for mining association rules. Technical report 1998-18, University of Versailles, 1998. (http://www.prism.uvsq.fr/rapports/1998/ document_1998_18.ps.gz)</name><name>S.W. Golomb. Run-length encoding. IEEE Trans. on Information Theory, 12(3), July 1966.</name><name>M. Holsheimer, M. Kersten, H. Mannila, and H. Toivonen. A perspective on databases and data mining. 
In Proc. of 1st Intl. Conf. on Knowledge Discovery and Data Mining (KDD), August 1995.</name><name>Ashoka Savasere , Edward Omiecinski , Shamkant B. Navathe, An Efficient Algorithm for Mining Association Rules in Large Databases, Proceedings of the 21st International Conference on Very Large Data Bases, p.432-444, September 11-15, 1995</name><name>P. Shenoy, J. Haritsa, S. Sudarshan, M. Bawa, G. Bhalotia, and D. Shah. Turbo-charging vertical mining of large databases. Technical Report TR-2000-02, DSL, Indian Institute of Science, 2000. (http://dsl.serc.iisc.ernet.in/pub/TR/TR-2000-02.ps)</name><name>Show-Jane Yen , Arbee L. P. Chen, An efficient approach to discovering knowledge from large databases, Proceedings of the fourth international conference on Parallel and distributed information systems, p.8-18, December 18-20, 1996, Miami Beach, Florida, United States</name><name>Mohammed Javeed Zaki , Wei Li, Scalable data mining for rules, 1998</name><name>M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. In Proc. of 3rd Intl. Conf. on Knowledge Discovery and Data Mining (KDD), August 1997.</name></citation><abstract>In a vertical representation of a market-basket database, each item is associated with a column of values representing the transactions in which it is present. The association-rule mining algorithms that have been recently proposed for this representation show performance improvements over their classical horizontal counterparts, but are either efficient only for certain database sizes, or assume particular characteristics of the database contents, or are applicable only to specific kinds of database schemas. We present here a new vertical mining algorithm called VIPER, which is general-purpose, making no special requirements of the underlying database. 
VIPER stores data in compressed bit-vectors called &ldquo;snakes&rdquo; and integrates a number of novel optimizations for efficient snake generation, intersection, counting and storage. We analyze the performance of VIPER for a range of synthetic database workloads. Our experimental results indicate significant performance gains, especially for large databases, over previously proposed vertical and horizontal mining algorithms. In fact, there are even workload regions where VIPER outperforms an optimal, but practically infeasible, horizontal mining algorithm.</abstract></paper><paper><title>High speed on-line backup when using logical log operations</title><author><AuthorName>David B. Lomet</AuthorName><institute><InstituteName>Microsoft Research, One Microsoft Way, Redmond, WA</InstituteName><country></country></institute></author><year>2000</year><conference>International Conference on Management of Data</conference><citation><name>Philip A. Bernstein , Vassos Hadzilacos , Nathan Goodman, Concurrency control and recovery in database systems, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1987</name><name>Gray, J. Notes on Data Base Operating Systems. IBM Tech Report RJ2188 (Feb. 1978), IBM Corp., San Jose, CA</name><name>Jim Gray , Paul McJones , Mike Blasgen , Bruce Lindsay , Raymond Lorie , Tom Price , Franco Putzolu , Irving Traiger, The Recovery Manager of the System R Database Manager, ACM Computing Surveys (CSUR), v.13 n.2, p.223-242, June 1981</name><name>Jim Gray , Andreas Reuter, Transaction Processing: Concepts and Techniques, Morgan Kaufmann Publishers Inc., San Francisco, CA, 1992</name><name>Theo Haerder , Andreas Reuter, Principles of transaction-oriented database recovery, ACM Computing Surveys (CSUR), v.15 n.4, p.287-317, December 1983</name><name>Richard P. King , Nagui Halim , Hector Garcia-Molina , Christos A. 
Polyzois, Management of a remote backup copy for disaster recovery, ACM Transactions on Database Systems (TODS), v.16 n.2, p.338-368, June 1991</name><name>Vijay Kumar , Meichun Hsu, Recovery mechanisms in database systems, Prentice-Hall, Inc., Upper Saddle River, NJ, 1998</name><name>David B. Lomet, Persistent Applications Using Generalized Redo Recovery, Proceedings of the Fourteenth International Conference on Data Engineering, p.154-163, February 23-27, 1998</name><name>David B. Lomet , Betty Salzberg, Exploiting A History Database for Backup, Proceedings of the 19th International Conference on Very Large Data Bases, p.380-390, August 24-27, 1993</name><name>David B. Lomet , Mark R. Tuttle, Redo Recovery after System Crashes, Proceedings of the 21th International Conference on Very Large Data Bases, p.457-468, September 11-15, 1995</name><name>David Lomet , Mark Tuttle, Logical logging to extend recovery to new domains, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.73-84, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States</name><name>C. Mohan , Don Haderle , Bruce Lindsay , Hamid Pirahesh , Peter Schwarz, ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging, ACM Transactions on Database Systems (TODS), v.17 n.1, p.94-162, March 1992</name><name>C. Mohan , Inderpal Narang, An efficient and flexible method for archiving a data base, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.139-146, May 25-28, 1993, Washington, D.C., United States</name></citation><abstract>Media recovery protects a database from failures of the stable medium by maintaining an extra copy of the database, called the backup, and a media recovery log. When a failure occurs, the database is &ldquo;restored&rdquo; from the backup, and the media recovery log is used to roll forward the database to the desired time, usually the current time. 
Backup must be both fast and &ldquo;on-line&rdquo;, i.e. concurrent with on-going update activity. Conventional online backup sequentially copies from the stable database, almost independent of the database cache manager, but requires page-oriented log operations. But results of logical operations must be flushed to a stable database (a backup is a stable database) in a constrained order to guarantee recovery. This order is not naturally achieved for the backup by a cache manager concerned only with crash recovery. We describe a &ldquo;full speed&rdquo; backup, only loosely coupled to the cache manager, and hence similar to current online backups, but effective for general logical log operations. This requires additional logging of cached objects to guarantee media recoverability. We then show how logging can be greatly reduced when log operations have a constrained form which nonetheless provides very useful additional logging efficiency for database systems.</abstract></paper><paper><title>Efficient resumption of interrupted warehouse loads</title><author><AuthorName>Wilburt Juan Labio</AuthorName><institute><InstituteName>Gigabeat, Inc. Palo Alto CA</InstituteName><country></country></institute></author><author><AuthorName>Janet L. Wiener</AuthorName><institute><InstituteName>Compaq SRC, Palo Alto, CA</InstituteName><country></country></institute></author><author><AuthorName>Hector Garcia-Molina</AuthorName><institute><InstituteName>Stanford University</InstituteName><country></country></institute></author><author><AuthorName>Vlad Gorelik</AuthorName><institute><InstituteName>Sagent Technologies</InstituteName><country></country></institute></author><year>2000</year><conference>International Conference on Management of Data</conference><citation><name>Philip A. 
Bernstein , Meichun Hsu , Bruce Mann, Implementing recoverable requests using queues, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.112-122, May 23-26, 1990, Atlantic City, New Jersey, United States</name><name>Philip Bernstein , Eric Newcomer, Principles of transaction processing: for the systems professional, Morgan Kaufmann Publishers Inc., San Francisco, CA, 1996</name><name>F. Carino. High-performance, parallel warehouse servers and large-scale applications, Oct. 1997. Talk about Teradata given in Stanford Database Seminar.</name><name>TPC Committee. Transaction Processing Council. Available at: http://www.tpc.org/.</name><name>Jim Gray , Andreas Reuter, Transaction Processing: Concepts and Techniques, Morgan Kaufmann Publishers Inc., San Francisco, CA, 1992</name><name>Informatica. Powermart 4.0 overview. Available at: http://www.informatica.com/pm_tech_over.html.</name><name>W. J. Labio, J. L. Wiener, H. Garcia-Molina, and V. Gorelik. Resumption algorithms. Technical report, Stanford University, 1998. Available at http://wwwdb. stanford.edu/pub/papers/resume.ps.</name><name>C. Mohan , Inderpal Narang, Algorithms for creating indexes for very large tables without quiescing updates, Proceedings of the 1992 ACM SIGMOD international conference on Management of data, p.361-370, June 02-05, 1992, San Diego, California, United States</name><name>R. Reinsch and M. Zimowski. Method for Restarting a Long- Running, Fault-Tolerant Operation in a Transaction-Oriented Data Base System Without Burdening the System Log. U.S. Patent 4,868,744, IBM, 1989.</name><name>Sagent Technologies. Personal correspondence with customers.</name><name>Janet L. Wiener , Jeffrey F. 
Naughton, OODB Bulk Loading Revisited: The Partitioned-List Approach, Proceedings of the 21th International Conference on Very Large Data Bases, p.30-41, September 11-15, 1995</name><name>Andrew Witkowski , Felipe Cari&#241;o , Pekka Kostamaa, NCR 3700 - The Next-Generation Industrial Database Computer, Proceedings of the 19th International Conference on Very Large Data Bases, p.230-243, August 24-27, 1993</name></citation><abstract>Data warehouses collect large quantities of data from distributed sources into a single repository. A typical load to create or maintain a warehouse processes GBs of data, takes hours or even days to execute, and involves many complex and user-defined transformations of the data (e.g., find duplicates, resolve data inconsistencies, and add unique keys). If the load fails, a possible approach is to &ldquo;redo&rdquo; the entire load. A better approach is to resume the incomplete load from where it was interrupted. Unfortunately, traditional algorithms for resuming the load either impose unacceptable overhead during normal operation, or rely on the specifics of transformations. We develop a resumption algorithm called DR that imposes no overhead and relies only on the high-level properties of the transformations. We show that DR can lead to a ten-fold reduction in resumption time by performing experiments using commercial software.</abstract></paper><paper><title>On-line reorganization in object databases</title><author><AuthorName>Mohana K. Lakhamraju</AuthorName><institute><InstituteName>University of California, Berkeley CA</InstituteName><country></country></institute></author><author><AuthorName>Rajeev Rastogi</AuthorName><institute><InstituteName>Bell Labs, Murray Hill, NJ</InstituteName><country></country></institute></author><author><AuthorName>S. Seshadri</AuthorName><institute><InstituteName>Bell Labs, Murray Hill, NJ</InstituteName><country></country></institute></author><author><AuthorName>S. 
Sudarshan</AuthorName><institute><InstituteName>Indian Institute of Technology, Bombay, India</InstituteName><country></country></institute></author><year>2000</year><conference>International Conference on Management of Data</conference><citation><name>Laurent Amsaleg , Michael J. Franklin , Olivier Gruber, Efficient Incremental Garbage Collection for Client-Server Object Database Systems, Proceedings of the 21th International Conference on Very Large Data Bases, p.42-53, September 11-15, 1995</name><name>Kiran J. Achyutuni , Edward Omiecinski , Shamkant B. Navathe, Two techniques for on-line index modification in shared nothing parallel databases, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.125-136, June 04-06, 1996, Montreal, Quebec, Canada</name><name>Srinivas Ashwin , Prasan Roy , S. Seshadri , Abraham Silberschatz , S. Sudarshan, Garbage Collection in Object Oriented Databases Using Transactional Cyclic Reference Counting, Proceedings of the 23rd International Conference on Very Large Data Bases, p.366-375, August 25-29, 1997</name><name>Jay Banerjee , Won Kim , Hyoung-Joo Kim , Henry F. Korth, Semantics and implementation of schema evolution in object-oriented databases, Proceedings of the 1987 ACM SIGMOD international conference on Management of data, p.311-322, May 27-29, 1987, San Francisco, California, United States</name><name>Philip Bohannon , Daniel Lieuwen , Rajeev Rastogi , Avi Silberschatz , S. Seshadri , S. Sudarshan, The Architecture of the Dal&iacute; Main-Memory Storage Manager, Multimedia Tools and Applications, v.4 n.2, p.115-151, March 1997</name><name>Jonathan E. Cook , Alexander L. Wolf , Benjamin G. Zorn, Partition selection policies in object database garbage collection, Proceedings of the 1994 ACM SIGMOD international conference on Management of data, p.371-382, May 24-27, 1994, Minneapolis, Minnesota, United States</name><name>B. Salzberg (Special Issue Editor). 
Special issue on online reorganization. IEEE Data Engineering Bulletin, 19(2), June 1996.</name><name>Andr&#233; Eickler , Carsten Andreas Gerlhof , Donald Kossmann, A Performance Evaluation of OID Mapping Techniques, Proceedings of the 21th International Conference on Very Large Data Bases, p.18-29, September 11-15, 1995</name><name>H. V. Jagadish , Daniel F. Lieuwen , Rajeev Rastogi , Abraham Silberschatz , S. Sudarshan, Dal&iacute;: A High Performance Main Memory Storage Manager, Proceedings of the 20th International Conference on Very Large Data Bases, p.48-59, September 12-15, 1994</name><name>Elliot K. Kolodner , William E. Weihl, Atomic incremental garbage collection and recovery for a large stable heap, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.177-186, May 25-28, 1993, Washington, D.C., United States</name><name>M.K. Lakhamraju, R. Rastogi, S. Seshadri, and S. Sudarshan. On-line reorganization of objects. In Technical Report, Bell-labs, February 1999.</name><name>C. Mohan , Inderpal Narang, Algorithms for creating indexes for very large tables without quiescing updates, Proceedings of the 1992 ACM SIGMOD international conference on Management of data, p.361-370, June 02-05, 1992, San Diego, California, United States</name><name>Scott Nettles , James O'Toole , David Pierce, Replication-Based Incremental Copying Collection, Proceedings of the International Workshop on Memory Management, p.357-364, September 17-19, 1992</name><name>E. Omiecinski. Concurrent file reorganization: Clustering, conversion and maintenance. IEEE Data Engineering Bulletin, 19(2), 1996.</name><name>V. Srinivasan , Michael J. Carey, Compensation-based on-line query processing, Proceedings of the 1992 ACM SIGMOD international conference on Management of data, p.331-340, June 02-05, 1992, San Diego, California, United States</name><name>V. Srinivasan , Michael J. 
Carey, Performance of On-Line Index Construction Algorithms, Proceedings of the 3rd International Conference on Extending Database Technology: Advances in Database Technology, p.293-309, March 23-27, 1992</name><name>Betty Salzberg , Allyn Dimock, Principles of Transaction-Based On-Line Reorganization, Proceedings of the 18th International Conference on Very Large Data Bases, p.511-520, August 23-27, 1992</name><name>Manolis M. Tsangaris , Jeffrey F. Naughton, A stochastic approach for clustering in object bases, Proceedings of the 1991 ACM SIGMOD international conference on Management of data, p.12-21, May 29-31, 1991, Denver, Colorado, United States</name><name>William J. McIver, Jr. , Roger King, Self-adaptive, on-line reclustering of complex object data, Proceedings of the 1994 ACM SIGMOD international conference on Management of data, p.407-418, May 24-27, 1994, Minneapolis, Minnesota, United States</name><name>Voon-Fee Yong , Jeffrey F. Naughton , Jie-Bing Yu, Storage Reclamation and Reorganization in Client-Server Persistent Object Stores, Proceedings of the Tenth International Conference on Data Engineering, p.120-131, February 14-18, 1994</name><name>Chendong Zou , Betty Salzberg, On-line reorganization of sparsely-populated B+-trees, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.115-124, June 04-06, 1996, Montreal, Quebec, Canada</name><name>C. Zou and B. Salzberg. Towards efficient online database reorganization. IEEE Data Engineering Bulletin, 19(2):33-40, June 1996.</name><name>Chendong Zou , Betty Salzberg, Safely and Efficiently Updating References During On-line Reorganization, Proceedings of the 24th International Conference on Very Large Data Bases, p.512-522, August 24-27, 1998</name></citation><abstract>Reorganization of objects in an object database is an important component of several operations such as compaction, clustering, and schema evolution. 
The high availability requirements (24 &times; 7 operation) of certain application domains require reorganization to be performed on-line with minimal interference to concurrently executing transactions.</abstract></paper><paper><title>Finding generalized projected clusters in high dimensional spaces</title><author><AuthorName>Charu C. Aggarwal</AuthorName><institute><InstituteName>IBM T.J. Watson Research Center, Yorktown Heights, NY</InstituteName><country></country></institute></author><author><AuthorName>Philip S. Yu</AuthorName><institute><InstituteName>IBM T.J. Watson Research Center, Yorktown Heights, NY</InstituteName><country></country></institute></author><year>2000</year><conference>International Conference on Management of Data</conference><citation><name>Charu C. Aggarwal , Joel L. Wolf , Philip S. Yu , Cecilia Procopiuc , Jong Soo Park, Fast algorithms for projected clustering, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.61-72, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States</name><name>Rakesh Agrawal , Johannes Gehrke , Dimitrios Gunopulos , Prabhakar Raghavan, Automatic subspace clustering of high dimensional data for data mining applications, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.94-105, June 01-04, 1998, Seattle, Washington, United States</name><name>Kevin S. Beyer , Jonathan Goldstein , Raghu Ramakrishnan , Uri Shaft, When Is ''Nearest Neighbor'' Meaningful?, Proceedings of the 7th International Conference on Database Theory, p.217-235, January 10-12, 1999</name><name>Chun-Hung Cheng , Ada Waichee Fu , Yi Zhang, Entropy-based subspace clustering for mining numerical data, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.84-93, August 15-18, 1999, San Diego, California, United States</name><name>M. Ester et al. 
A Density Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD Conference, 1996.</name><name>Christos Faloutsos , King-Ip Lin, FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.163-174, May 22-25, 1995, San Jose, California, United States</name><name>Aristides Gionis , Piotr Indyk , Rajeev Motwani, Similarity Search in High Dimensions via Hashing, Proceedings of the 25th International Conference on Very Large Data Bases, p.518-529, September 07-10, 1999</name><name>Sudipto Guha , Rajeev Rastogi , Kyuseok Shim, CURE: an efficient clustering algorithm for large databases, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.73-84, June 01-04, 1998, Seattle, Washington, United States</name><name>Alexander Hinneburg , Daniel A. Keim, Optimal Grid-Clustering: Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering, Proceedings of the 25th International Conference on Very Large Data Bases, p.506-517, September 07-10, 1999</name><name>Piotr Indyk , Rajeev Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, Proceedings of the thirtieth annual ACM symposium on Theory of computing, p.604-613, May 24-26, 1998, Dallas, Texas, United States</name><name>Anil K. Jain , Richard C. Dubes, Algorithms for clustering data, Prentice-Hall, Inc., Upper Saddle River, NJ, 1988</name><name>I. T. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, 1986.</name><name>Jon M. Kleinberg, Two algorithms for nearest-neighbor search in high dimensions, Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, p.599-608, May 04-06, 1997, El Paso, Texas, United States</name><name>R. Kohavi, D. Sommerfield. Feature Subset Selection Using the Wrapper Method: Overfitting and Dynamic Search Space Topology. 
KDD, 1995.</name><name>Raymond T. Ng , Jiawei Han, Efficient and Effective Clustering Methods for Spatial Data Mining, Proceedings of the 20th International Conference on Very Large Data Bases, p.144-155, September 12-15, 1994</name><name>K. V. Ravi Kanth , Divyakant Agrawal , Ambuj Singh, Dimensionality reduction for similarity searching in dynamic databases, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.166-176, June 01-04, 1998, Seattle, Washington, United States</name><name>Xiaowei Xu , Martin Ester , Hans-Peter Kriegel , J&#246;rg Sander, A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases, Proceedings of the Fourteenth International Conference on Data Engineering, p.324-331, February 23-27, 1998</name><name>Tian Zhang , Raghu Ramakrishnan , Miron Livny, BIRCH: an efficient data clustering method for very large databases, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.103-114, June 04-06, 1996, Montreal, Quebec, Canada</name></citation><abstract>High dimensional data has always been a challenge for clustering algorithms because of the inherent sparsity of the points. Recent research results indicate that in high dimensional data, even the concept of proximity or clustering may not be meaningful. We discuss very general techniques for projected clustering which are able to construct clusters in arbitrarily aligned subspaces of lower dimensionality. The subspaces are specific to the clusters themselves. This definition is substantially more general and realistic than currently available techniques which limit the method to only projections from the original set of attributes. The generalized projected clustering technique may also be viewed as a way of trying to redefine clustering for high dimensional applications by searching for hidden subspaces with clusters which are created by inter-attribute correlations. 
We provide a new concept of using extended cluster feature vectors in order to make the algorithm scalable for very large databases. The running time and space requirements of the algorithm are adjustable, and are likely to trade off with better accuracy.</abstract></paper><paper><title>Density biased sampling: an improved method for data mining and clustering</title><author><AuthorName>Christopher R. Palmer</AuthorName><institute><InstituteName>Computer Science Department, Carnegie Mellon University, Pittsburgh, PA</InstituteName><country></country></institute></author><author><AuthorName>Christos Faloutsos</AuthorName><institute><InstituteName>Computer Science Department, Carnegie Mellon University, Pittsburgh, PA</InstituteName><country></country></institute></author><year>2000</year><conference>International Conference on Management of Data</conference><citation><name>Alfred V. Aho , Ravi Sethi , Jeffrey D. Ullman, Compilers: principles, techniques, and tools, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1986</name><name>P. S. Bradley, Usama Fayyad, and Cory Reina. Scaling clustering algorithms to large databases. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), pages 9-15, New York City, New York, August 1998. AAAI Press.</name><name>P. S. Bradley, Usama Fayyad, and Cory Reina. Scaling EM (expectation maximization) clustering to large databases. Technical Report MSR-TR-98-35, Microsoft Research, Redmond, WA, November, 1998.</name><name>Chris Buckley, Mandar Mitra, Janet Walz, and Claire Cardie. Using clustering and superconcepts within SMART: TREC 6. In Sixth Text REtrieval Conference (TREC-6), Gaithersburg, Maryland, November 1997. National Institute of Standards and Technology (NIST), United States Department of Commerce.</name><name>Christos Faloutsos , H. V. 
Jagadish, On B-Tree Indices for Skewed Distributions, Proceedings of the 18th International Conference on Very Large Data Bases, p.363-374, August 23-27, 1992</name><name>Christos Faloutsos , Yossi Matias , Abraham Silberschatz, Modeling Skewed Distribution Using Multifractals and the `80-20' Law, Proceedings of the 22nd International Conference on Very Large Data Bases, p.307-317, September 03-06, 1996</name><name>Usama M. Fayyad, Cory A. Reina, and Paul S. Bradley. Initialization of iterative refinement clustering algorithms. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), pages 194-198, New York City, New York, August 1998. AAAI Press.</name><name>Phillip B. Gibbons , Yossi Matias, New sampling-based summary statistics for improving approximate query answers, ACM SIGMOD Record, v.27 n.2, p.331-342, June 1998</name><name>Sudipto Guha , Rajeev Rastogi , Kyuseok Shim, CURE: an efficient clustering algorithm for large databases, ACM SIGMOD Record, v.27 n.2, p.73-84, June 1998</name><name>Peter J. Haas , Jeffrey F. Naughton , S. Seshadri , Lynne Stokes, Sampling-Based Estimation of the Number of Distinct Values of an Attribute, Proceedings of the 21st International Conference on Very Large Data Bases, p.311-322, September 11-15, 1995</name><name>John A. Hartigan, Clustering Algorithms, John Wiley &amp; Sons, Inc., New York, NY, 1975</name><name>Joseph M. Hellerstein , Peter J. Haas , Helen J. Wang, Online aggregation, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.171-182, May 11-15, 1997, Tucson, Arizona, United States</name><name>Bernardo A. Huberman, Peter L. T. Pirolli, James E. Pitkow, and Rajan M. Lukose. Strong regularities in world wide web surfing. Science, 280(5360):95-97, April 3 1998.</name><name>Najmeh Joze-Hkajavi and Kenneth Salem. Two-phase clustering of large datasets. 
Technical Report CS-98-27, Department of Computer Science, University of Waterloo, November 1998.</name><name>Jeremy Kepner, Xiaohui Fan, Neta Bahcall, James Gunn, Robert Lupton, and Guohong Xu. An automated cluster finder: the adaptive matched filter. The Astrophysical Journal, 517, 1999.</name><name>Thomas M. Mitchell, Machine Learning, McGraw-Hill Higher Education, 1997</name><name>Frank Olken , Doron Rotem , Ping Xu, Random sampling from hash files, ACM SIGMOD Record, v.19 n.2, p.375-386, Jun. 1990</name><name>Edie Rasmussen, Clustering algorithms, Information retrieval: data structures and algorithms, Prentice-Hall, Inc., Upper Saddle River, NJ, 1992</name><name>Brian D. Ripley. Spatial Statistics. John Wiley &amp; Sons, 1981.</name><name>Manfred Schroeder. Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise. W.H. Freeman and Company, New York, 1991.</name><name>Jeffrey S. Vitter, Random sampling with a reservoir, ACM Transactions on Mathematical Software (TOMS), v.11 n.1, p.37-57, March 1985</name><name>Oren Zamir , Oren Etzioni, Web document clustering: a feasibility demonstration, Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, p.46-54, August 24-28, 1998, Melbourne, Australia</name><name>Tian Zhang , Raghu Ramakrishnan , Miron Livny, BIRCH: an efficient data clustering method for very large databases, ACM SIGMOD Record, v.25 n.2, p.103-114, June 1996</name><name>G.K. Zipf. Human Behavior and Principle of Least Effort: An Introduction to Human Ecology. Addison Wesley, Cambridge, Massachusetts, 1949.</name></citation><abstract>Data mining in large data sets often requires a sampling or summarization step to form an in-core representation of the data that can be processed more efficiently. Uniform random sampling is frequently used in practice and also frequently criticized because it will miss small clusters. 
Many natural phenomena are known to follow Zipf's distribution and the inability of uniform sampling to find small clusters is of practical concern. Density Biased Sampling is proposed to probabilistically under-sample dense regions and over-sample light regions. A weighted sample is used to preserve the densities of the original data. Density biased sampling naturally includes uniform sampling as a special case. A memory efficient algorithm is proposed that approximates density biased sampling using only a single scan of the data. We empirically evaluate density biased sampling using synthetic data sets that exhibit varying cluster size distributions finding up to a factor of six improvement over uniform sampling.</abstract></paper><paper><title>LOF: identifying density-based local outliers</title><author><AuthorName>Markus M. Breunig</AuthorName><institute><InstituteName>Institute for Computer Science, University of Munich, Oettingenstr. 67, D-80538 Munich, Germany</InstituteName><country></country></institute></author><author><AuthorName>Hans-Peter Kriegel</AuthorName><institute><InstituteName>Institute for Computer Science, University of Munich, Oettingenstr. 67, D-80538 Munich, Germany</InstituteName><country></country></institute></author><author><AuthorName>Raymond T. Ng</AuthorName><institute><InstituteName>Department of Computer Science, University of British Columbia, Vancouver, BC V6T 1Z4 Canada</InstituteName><country></country></institute></author><author><AuthorName>J&#246;rg Sander</AuthorName><institute><InstituteName>Institute for Computer Science, University of Munich, Oettingenstr. 67, D-80538 Munich, Germany</InstituteName><country></country></institute></author><year>2000</year><conference>International Conference on Management of Data</conference><citation><name>Arning, A., Agrawal R., Raghavan P.: "A Linear Method for Deviation Detection in Large Databases", Proc. 2nd Int. Conf. 
on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, 1996, p. 164-169.</name><name>Mihael Ankerst , Markus M. Breunig , Hans-Peter Kriegel , J&#246;rg Sander, OPTICS: ordering points to identify the clustering structure, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.49-60, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States</name><name>Rakesh Agrawal , Johannes Gehrke , Dimitrios Gunopulos , Prabhakar Raghavan, Automatic subspace clustering of high dimensional data for data mining applications, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.94-105, June 01-04, 1998, Seattle, Washington, United States</name><name>Stefan Berchtold , Daniel A. Keim , Hans-Peter Kriegel, The X-tree: An Index Structure for High-Dimensional Data, Proceedings of the 22nd International Conference on Very Large Data Bases, p.28-39, September 03-06, 1996</name><name>Barnett V., Lewis T.: "Outliers in statistical data", John Wiley, 1994.</name><name>DuMouchel W., Schonlau M.: "A Fast Computer Intrusion Detection Algorithm based on Hypothesis Testing of Command Transition Probabilities", Proc. 4th Int. Conf. on Knowledge Discovery and Data Mining, New York, NY, AAAI Press, 1998, pp. 189-193.</name><name>Ester M., Kriegel H.-P., Sander J., Xu X.: "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise", Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, 1996, pp. 226-231.</name><name>Tom Fawcett , Foster Provost, Adaptive Fraud Detection, Data Mining and Knowledge Discovery, v.1 n.3, p.291-316, 1997</name><name>Fayyad U., Piatetsky-Shapiro G., Smyth P.: "Knowledge Discovery and Data Mining: Towards a Unifying Framework", Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, Portland, OR, 1996, pp. 
82-88.</name><name>Hawkins, D.: "Identification of Outliers", Chapman and Hall, London, 1980.</name><name>Hinneburg A., Keim D.A.: "An Efficient Approach to Clustering in Large Multimedia Databases with Noise", Proc. 4th Int. Conf. on Knowledge Discovery and Data Mining, New York City, NY, 1998, pp. 58-65.</name><name>Johnson T., Kwok I., Ng R.: "Fast Computation of 2-Dimensional Depth Contours", Proc. 4th Int. Conf. on Knowledge Discovery and Data Mining, New York, NY, AAAI Press, 1998, pp. 224-228.</name><name>Edwin M. Knorr , Raymond T. Ng, Algorithms for Mining Distance-Based Outliers in Large Datasets, Proceedings of the 24th International Conference on Very Large Data Bases, p.392-403, August 24-27, 1998</name><name>Edwin M. Knorr , Raymond T. Ng, Finding Intensional Knowledge of Distance-Based Outliers, Proceedings of the 25th International Conference on Very Large Data Bases, p.211-222, September 07-10, 1999</name><name>Raymond T. Ng , Jiawei Han, Efficient and Effective Clustering Methods for Spatial Data Mining, Proceedings of the 20th International Conference on Very Large Data Bases, p.144-155, September 12-15, 1994</name><name>Franco P. Preparata , Michael I. Shamos, Computational geometry: an introduction, Springer-Verlag New York, Inc., New York, NY, 1985</name><name>Ramaswamy S., Rastogi R., Kyuseok S.: "Efficient Algorithms for Mining Outliers from Large Data Sets", Proc. ACM SIGMOD Int. Conf. on Management of Data, 2000.</name><name>Ida Ruts , Peter J. Rousseeuw, Computing depth contours of bivariate point clouds, Computational Statistics &amp; Data Analysis, v.23 n.1, p.153-168, Nov. 15, 1996</name><name>Gholamhosein Sheikholeslami , Surojit Chatterjee , Aidong Zhang, WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases, Proceedings of the 24th International Conference on Very Large Data Bases, p.428-439, August 24-27, 1998</name><name>Tukey J. 
W.: "Exploratory Data Analysis", Addison-Wesley, 1977.</name><name>Roger Weber , Hans-J&#246;rg Schek , Stephen Blott, A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces, Proceedings of the 24rd International Conference on Very Large Data Bases, p.194-205, August 24-27, 1998</name><name>Wei Wang , Jiong Yang , Richard R. Muntz, STING: A Statistical Information Grid Approach to Spatial Data Mining, Proceedings of the 23rd International Conference on Very Large Data Bases, p.186-195, August 25-29, 1997</name><name>Tian Zhang , Raghu Ramakrishnan , Miron Livny, BIRCH: an efficient data clustering method for very large databases, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.103-114, June 04-06, 1996, Montreal, Quebec, Canada</name></citation><abstract>For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances or the outliers, can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a binary property. In this paper, we contend that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier. This degree is called the local outlier factor (LOF) of an object. It is local in that the degree depends on how isolated the object is with respect to the surrounding neighborhood. We give a detailed formal analysis showing that LOF enjoys many desirable properties. Using real-world datasets, we demonstrate that LOF can be used to find outliers which appear to be meaningful, but can otherwise not be identified with existing approaches. 
Finally, a careful performance evaluation of our algorithm confirms that our approach of finding local outliers can be practical.</abstract></paper><paper><title>Answering complex SQL queries using automatic summary tables</title><author><AuthorName>Markos Zaharioudakis</AuthorName><institute><InstituteName>IBM Almaden Research Center, San Jose, CA</InstituteName><country></country></institute></author><author><AuthorName>Roberta Cochrane</AuthorName><institute><InstituteName>IBM Almaden Research Center, San Jose, CA</InstituteName><country></country></institute></author><author><AuthorName>George Lapis</AuthorName><institute><InstituteName>IBM Almaden Research Center, San Jose, CA</InstituteName><country></country></institute></author><author><AuthorName>Hamid Pirahesh</AuthorName><institute><InstituteName>IBM Almaden Research Center, San Jose, CA</InstituteName><country></country></institute></author><author><AuthorName>Monica Urata</AuthorName><institute><InstituteName>IBM Almaden Research Center, San Jose, CA</InstituteName><country></country></institute></author><year>2000</year><conference>International Conference on Management of Data</conference><citation><name>Randall G. Bello , Karl Dias , Alan Downing , James J. Feenan, Jr. , William D. 
Norcott , Harry Sun , Andrew Witkowski , Mohamed Ziauddin, Materialized Views in Oracle, Proceedings of the 24rd International Conference on Very Large Data Bases, p.659-664, August 24-27, 1998</name><name>Surajit Chaudhuri , Ravi Krishnamurthy , Spyros Potamianos , Kyuseok Shim, Optimizing Queries with Materialized Views, Proceedings of the Eleventh International Conference on Data Engineering, p.190-200, March 06-10, 1995</name><name>Sara Cohen , Werner Nutt , Alexander Serebrenik, Rewriting aggregate queries using views, Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.155-166, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States</name><name>Jim Gray , Adam Bosworth , Andrew Layman , Hamid Pirahesh, Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total, Proceedings of the Twelfth International Conference on Data Engineering, p.152-159, February 26-March 01, 1996</name><name>St&#233;phane Grumbach , Maurizio Rafanelli , Leonardo Tininini, Querying aggregate data, Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.174-184, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States</name><name>Ashish Gupta , Venky Harinarayan , Dallan Quass, Aggregate-Query Processing in Data Warehousing Environments, Proceedings of the 21th International Conference on Very Large Data Bases, p.358-369, September 11-15, 1995</name><name>Venky Harinarayan , Anand Rajaraman , Jeffrey D. Ullman, Implementing data cubes efficiently, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.205-216, June 04-06, 1996, Montreal, Quebec, Canada</name><name>A. Y. Levy, A. O. Mendelzon, Y. Sagiv, D. Srivastava, "Answering queries using views", Proc. of the ACM-PODS Conf., San Jose, CA, 1995.</name><name>J. 
Melton (ed.), "Final Committee Draft- Database Language SQL- Part 2: Foundation (SQL/Foundation)", H2-98-519/DBL FRA-017, 1998.</name><name>Inderpal Singh Mumick , Dallan Quass , Barinderpal Singh Mumick, Maintenance of data cubes and summary tables in a warehouse, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.100-111, May 11-15, 1997, Tucson, Arizona, United States</name><name>Werner Nutt , Yehoshus Sagiv , Sara Shurin, Deciding equivalences among aggregate queries, Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.214-223, June 01-04, 1998, Seattle, Washington, United States</name><name>Divesh Srivastava , Shaul Dar , H. V. Jagadish , Alon Y. Levy, Answering Queries with Aggregation Using Views, Proceedings of the 22th International Conference on Very Large Data Bases, p.318-329, September 03-06, 1996</name><name>Zaharioudakis, R. Cochrane, G. Lapis, H. Pirahesh, M. Urata, "Answering Complex SQL Queries Using Automated Summary Tables", available upon request from the authors.</name></citation><abstract>We investigate the problem of using materialized views to answer SQL queries. We focus on modern decision-support queries, which involve joins, arithmetic operations and other (possibly user-defined) functions, aggregation (often along multiple dimensions), and nested subqueries. Given the complexity of such queries, the vast amounts of data upon which they operate, and the requirement for interactive response times, the use of materialized views (MVs) of similar complexity is often mandatory for acceptable performance. We present a novel algorithm that is able to rewrite a user query so that it will access one or more of the available MVs instead of the base tables. The algorithm extends prior work by addressing the new sources of complexity mentioned above, that is, complex expressions, multidimensional aggregation, and nested subqueries. 
It does so by relying on a graphical representation of queries and a bottom-up, pair-wise matching of nodes from the query and MV graphs. This approach offers great modularity and extensibility, allowing for the rewriting of a large class of queries.</abstract></paper><paper><title>Synchronizing a database to improve freshness</title><author><AuthorName>Junghoo Cho</AuthorName><institute><InstituteName>Stanford University</InstituteName><country></country></institute></author><author><AuthorName>Hector Garcia-Molina</AuthorName><institute><InstituteName>Stanford University</InstituteName><country></country></institute></author><year>2000</year><conference>International Conference on Management of Data</conference><citation><name>Google Inc. http://www.google.com.</name><name>Soumen Chakrabarti , Martin van den Berg , Byron Dom, Focused crawling: a new approach to topic-specific Web resource discovery, Proceeding of the eighth international conference on World Wide Web, p.1623-1640, May 1999, Toronto, Canada</name><name>J. Cho and H. Garcia-Molina. Synchronizing a database to improve freshness. Technical report, Stanford University, 1999. http://www-db. stanford.edu/~cho/papers/cho-synch.ps.</name><name>J. Cho and H. Garcia-Molina. Estimating frequency of change. Technical report, Stanford University, 2000.</name><name>Junghoo Cho , Hector Garcia-Molina , Lawrence Page, Efficient crawling through URL ordering, Computer Networks and ISDN Systems, v.30 n.1-7, p.161-172, April 1, 1998</name><name>E. Coffman, Jr., Z. Liu, and R. R. Weber. Optimal robot scheduling for web search engines. Technical report, INRIA, 1997.</name><name>J. Hammer, H. Garcia-Molina, J. Widom, W. J. Labio, and Y. Zhuge. The Stanford data warehousing project. IEEE Data Engineering Bulletin, June 1995.</name><name>Venky Harinarayan , Anand Rajaraman , Jeffrey D. 
Ullman, Implementing data cubes efficiently, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.205-216, June 04-06, 1996, Montreal, Quebec, Canada</name><name>S. Lawrence and C. L. Giles. Accessibility of information on the web. Nature, 400:107-109, 1999.</name><name>Sergey Brin , Lawrence Page, The anatomy of a large-scale hypertextual Web search engine, Computer Networks and ISDN Systems, v.30 n.1-7, p.107-117, April 1, 1998</name><name>H. M. Taylor and S. Karlin. An Introduction To Stochastic Modeling. Academic Press, 3rd edition, 1998.</name><name>G. B. Thomas, Jr. Calculus and analytic geometry. Addison-Wesley, 4th edition, 1969.</name><name>Yue Zhuge , H&#233;ctor Garc&#237;a-Molina , Joachim Hammer , Jennifer Widom, View maintenance in a warehousing environment, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.316-327, May 22-25, 1995, San Jose, California, United States</name></citation><abstract>In this paper we study how to refresh a local copy of an autonomous data source to maintain the copy up-to-date. As the size of the data grows, it becomes more difficult to maintain the copy &ldquo;fresh,&rdquo; making it crucial to synchronize the copy effectively. We define two freshness metrics, change models of the underlying data, and synchronization policies. We analytically study how effective the various policies are. We also experimentally verify our analysis, based on data collected from 270 web sites for more than 4 months, and we show that our new policy improves the &ldquo;freshness&rdquo; very significantly compared to current policies in use.</abstract></paper><paper><title>How to roll a join: asynchronous incremental view maintenance</title><author><AuthorName>Kenneth Salem</AuthorName><institute><InstituteName>Dept. 
of Computer Science, University of Waterloo</InstituteName><country></country></institute></author><author><AuthorName>Kevin Beyer</AuthorName><institute><InstituteName>Computer Sciences Dept., University of Wisconsin</InstituteName><country></country></institute></author><author><AuthorName>Bruce Lindsay</AuthorName><institute><InstituteName>IBM Almaden Research Center</InstituteName><country></country></institute></author><author><AuthorName>Roberta Cochrane</AuthorName><institute><InstituteName>IBM Almaden Research Center</InstituteName><country></country></institute></author><year>2000</year><conference>International Conference on Management of Data</conference><citation><name>D. Agrawal , A. El Abbadi , A. Singh , T. Yurek, Efficient view maintenance at data warehouses, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.417-427, May 11-15, 1997, Tucson, Arizona, United States</name><name>Randall G. Bello , Karl Dias , Alan Downing , James J. Feenan, Jr. , William D. Norcott , Harry Sun , Andrew Witkowski , Mohamed Ziauddin, Materialized Views in Oracle, Proceedings of the 24th International Conference on Very Large Data Bases, p.659-664, August 24-27, 1998</name><name>Jose A. Blakeley , Per-Ake Larson , Frank Wm Tompa, Efficiently updating materialized views, Proceedings of the 1986 ACM SIGMOD international conference on Management of data, p.61-71, May 28-30, 1986, Washington, D.C., United States</name><name>Latha S. Colby , Timothy Griffin , Leonid Libkin , Inderpal Singh Mumick , Howard Trickey, Algorithms for deferred view maintenance, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.469-480, June 04-06, 1996, Montreal, Quebec, Canada</name><name>Ashish Gupta , H. V. 
Jagadish , Inderpal Singh Mumick, Data Integration using Self-Maintainable Views, Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology, p.140-144, March 25-29, 1996</name><name>Ashish Gupta and Inderpal Singh Mumick. Maintenance of materialized views: Problems, techniques, and applications. Bulletin of the IEEE Technical Committee on Data Engineering, 18(2):3-19, 1995.</name><name>Ashish Gupta , Inderpal Singh Mumick , V. S. Subrahmanian, Maintaining views incrementally, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.157-166, May 25-28, 1993, Washington, D.C., United States</name><name>Inderpal Singh Mumick , Dallan Quass , Barinderpal Singh Mumick, Maintenance of data cubes and summary tables in a warehouse, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.100-111, May 11-15, 1997, Tucson, Arizona, United States</name><name>Dallan Quass , Ashish Gupta , Inderpal Singh Mumick , Jennifer Widom, Making views self-maintainable for data warehousing, Proceedings of the fourth international conference on on Parallel and distributed information systems, p.158-169, December 18-20, 1996, Miami Beach, Florida, United States</name><name>Dallan Quass , Jennifer Widom, On-line warehouse view maintenance, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.393-404, May 11-15, 1997, Tucson, Arizona, United States</name><name>K. Salem, K. Beyer, B. Lindsay, and R. Cochrane. How to roll a join: Asynchronous incremental view maintenance. Technical Report CS-2000-6, Dept. 
of Computer Science, University of Waterloo, February 2000.</name><name>Yue Zhuge , H&#233;ctor Garc&#237;a-Molina , Joachim Hammer , Jennifer Widom, View maintenance in a warehousing environment, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.316-327, May 22-25, 1995, San Jose, California, United States</name><name>Yue Zhuge , Hector Garcia-Molina , Janet L. Wiener, The Strobe algorithms for multi-source warehouse consistency, Proceedings of the fourth international conference on on Parallel and distributed information systems, p.146-157, December 18-20, 1996, Miami Beach, Florida, United States</name></citation><abstract>Incremental refresh of a materialized join view is often less expensive than a full, non-incremental refresh. However, it is still a potentially costly atomic operation. This paper presents an algorithm that performs incremental view maintenance as a series of small, asynchronous steps. The size of each step can be controlled to limit contention between the refresh process and concurrent operations that access the materialized view or the underlying relations. 
The algorithm supports point-in-time refresh, which allows a materialized view to be refreshed to any time between the last refresh and the present.</abstract></paper><paper><title>On wrapping query languages and efficient XML integration</title><author><AuthorName>Vassilis Christophides</AuthorName><institute><InstituteName>Institute of Computer Science, FORTH, P.O. Box 1385, Heraklion, Greece</InstituteName><country></country></institute></author><author><AuthorName>Sophie Cluet</AuthorName><institute><InstituteName>INRIA Rocquencourt, BP 105, 78153 Le Chesnay Cedex, France</InstituteName><country></country></institute></author><author><AuthorName>J&#233;r&#244;me Sim&#233;on</AuthorName><institute><InstituteName>Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ</InstituteName><country></country></institute></author><year>2000</year><conference>International Conference on Management of Data</conference><citation><name>S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. L. Wiener. The Lorel query language for semistructured data. Inte