📄 sigmod_1998_elementary.txt
字号:
In this paper, we propose the new parallel algorithms for mining association rules with classification hierarchy on a shared-nothing parallel machine to improve its performance. Our algorithms partition the candidate itemsets over the processors, which exploits the aggregate memory of the system effectively. If the candidate itemsets are partitioned without considering classification hierarchy, both the items and its all the ancestor items have to be transmitted, that causes prohibitively large amount of communications. Our method minimizes interprocessor communication by considering the hierarchy. Moreover, in our algorithm, the available memory space is fully utilized by identifying the frequently occurring candidate itemsets and copying them over all the processors, through which frequent itemsets can be processed locally without any communication. Thus it can effectively reduce the load skew among the processors. Several experiments are done by changing the granule of copying itemsets, from the whole tree, to the small group of the frequent itemsets along the hierarchy. The coarser the grain, the easier the control but it is rather difficult to achieve the sufficient load balance. The finer the grain, the more complicated the control is required but it can balance the load quite well.
We implemented proposed algorithms on IBM SP-2. Performance evaluations show that our algorithms are effective for handling skew and attain sufficient speedup ratio.</abstract></paper><paper><title>Reusing invariants: a new strategy for correlated queries</title><author><AuthorName>Jun Rao</AuthorName><institute><InstituteName>Department of Computer Science, Columbia University</InstituteName><country></country></institute></author><author><AuthorName>Kenneth A. Ross</AuthorName><institute><InstituteName>Department of Computer Science, Columbia University</InstituteName><country></country></institute></author><year>1998</year><conference>International Conference on Management of Data</conference><citation><name>Randy Bello. Invariant subplans in dataflow. Sybase IQ Internal Engineering Document~ 1996.</name><name>Hong-Tai Chou and David J. Dewitt. An evaluation of buffer management strategies for relational database systems. In Proceedings of the 11th VLDB Conference, pages 127-141, 1985.</name><name>Damianos Chatziantoniou, Optimization of complex aggregate queries in relational databases, Columbia University, New York, NY, 1998</name><name>Damianos Chatziantoniou , Kenneth A. Ross, Groupwise Processing of Relational Queries, Proceedings of the 23rd International Conference on Very Large Data Bases, p.476-485, August 25-29, 1997</name><name>Umeshwar Dayal, Of Nests and Trees: A Unified Approach to Processing Queries That Contain Nested Subqueries, Aggregates, and Quantifiers, Proceedings of the 13th International Conference on Very Large Data Bases, p.197-208, September 01-04, 1987</name><name>G. Graefe , A. Linville , L. D. Shapiro, Sort vs. Hash Revisited, IEEE Transactions on Knowledge and Data Engineering, v.6 n.6, p.934-944, December 1994</name><name>Richard A. Ganski , Harry K. T. Wong, Optimization of nested SQL queries revisited, Proceedings of the 1987 ACM SIGMOD international conference on Management of data, p.23-33, May 27-29, 1987, San Francisco, California, United States</name><name>Joseph M. Hellerstein, Practical predicate placement, Proceedings of the 1994 ACM SIGMOD international conference on Management of data, p.325-335, May 24-27, 1994, Minneapolis, Minnesota, United States</name><name>Joseph M. Hellerstein , Peter J. Haas , Helen J. Wang, Online aggregation, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.171-182, May 11-15, 1997, Tucson, Arizona, United States</name><name>Joseph M. Hellerstein , Jeffrey F. Naughton, Query execution techniques for caching expensive methods, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.423-434, June 04-06, 1996, Montreal, Quebec, Canada</name><name>Roberto J. Bayardo, Jr. , Daniel P. Miranker, Processing queries for first-few answers, Proceedings of the fifth international conference on Information and knowledge management, p.45-52, November 12-16, 1996, Rockville, Maryland, United States</name><name>Won Kim, On optimizing an SQL-like nested query, ACM Transactions on Database Systems (TODS), v.7 n.3, p.443-469, Sept. 1982</name><name>Dan Leary. Dataflow operators feature specification. Sybase IQ Internal Engineering Document, 1996.</name><name>Donald Michie. "memo" functions and machine learning. Nature, 218:19-22, 1968.</name><name>Kiyoshi Ono , Guy M. Lohman, Measuring the complexity of join enumeration in query optimization, Proceedings of the sixteenth international conference on Very large databases, p.314-325, September 1990, Brisbane, Australia</name><name>Patrick O'Neil , Dallan Quass, Improved query performance with variant indexes, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.38-49, May 11-15, 1997, Tucson, Arizona, United States</name><name>Glenn Paulley, 1997. personal communication.</name><name>Hamid Pirahesh , Joseph M. Hellerstein , Waqar Hasan, Extensible/rule based query rewrite optimization in Starburst, Proceedings of the 1992 ACM SIGMOD international conference on Management of data, p.39-48, June 02-05, 1992, San Diego, California, United States</name><name>P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts</name><name>Timos K. Sellis, Multiple-query optimization, ACM Transactions on Database Systems (TODS), v.13 n.1, p.23-52, March 1988</name><name>T. Sellis , S. Ghosh, On the Multiple-Query Optimization Problem, IEEE Transactions on Knowledge and Data Engineering, v.2 n.2, p.262-266, June 1990</name><name>Praveen Seshadri , Hamid Pirahesh , T. Y. Cliff Leung, Complex Query Decorrelation, Proceedings of the Twelfth International Conference on Data Engineering, p.450-458, February 26-March 01, 1996</name><name>David Simmen , Eugene Shekita , Timothy Malkemus, Fundamental techniques for order optimization, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.57-67, June 04-06, 1996, Montreal, Quebec, Canada</name><name>Sybase Corporation. Adaptive Server Enterprise 11.5, 1997.</name><name>Sybase Corporation. Sybase IQ 11.2.1, 1997.</name><name>Tpc-d benchmark standard specification (revision 1.0). May 1995.</name><name>Weipeng P. Yan , Per-&#197;ke Larson, Eager Aggregation and Lazy Aggregation, Proceedings of the 21th International Conference on Very Large Data Bases, p.345-357, September 11-15, 1995</name></citation><abstract>Correlated queries are very common and important in decision support systems. Traditional nested iteration evaluation methods for such queries can be very time consuming. When they apply, query rewriting techniques have been shown to be much more efficient. But query rewriting is not always possible. When query rewriting does not apply, can we do something better than the traditional nested iteration methods? In this paper, we propose a new invariant technique to evaluate correlated queries efficiently. The basic idea is to recognize the part of the subquery that is not related to the outer references and cache the result of that part after its first execution. Later, we can reuse the result and combine it with the result of the rest of the subquery that is changing for each iteration. Our technique applies to arbitrary correlated subqueries.
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -