Example-Based Approach, Journal of Intelligent Information Systems, v.8 n.2, p.133-153, March/April 1997</name><name>J. J. Rocchio. Relevance feedback in information retrieval. In G. Salton, editor, The SMART Information Retrieval System, pages 313-323. Prentice Hall, Englewood Cliffs, NJ, 1971.</name><name>Gerard Salton , Christopher Buckley, Term-weighting approaches in automatic text retrieval, Information Processing and Management: an International Journal, v.24 n.5, p.513-523, 1988</name><name>G. Salton , M. J. McGill, The SMART and SIRE experimental retrieval systems, Readings in information retrieval, Morgan Kaufmann Publishers Inc., San Francisco, CA, 1997</name><name>Hinrich Sch&#252;tze , David A. Hull , Jan O. Pedersen, A comparison of classifiers and document representations for the routing problem, Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval, p.229-237, July 09-13, 1995, Seattle, Washington, United States</name><name>Atsushi Sugiura , Oren Etzioni, Query routing for Web search engines: architectures and experiments, Proceedings of the 9th international World Wide Web conference on Computer networks: the international journal of computer and telecommunications networking, p.417-429, June 2000, Amsterdam, The Netherlands</name><name>C. J. 
Van Rijsbergen, Information Retrieval, Butterworth-Heinemann, Newton, MA, 1979</name><name>Wenxian Wang , Weiyi Meng , Clement Yu, Concept Hierarchy Based Text Database Categorization in a Metasearch Engine Environment, Proceedings of the First International Conference on Web Information Systems Engineering (WISE'00)-Volume 1, p.283, June 19-20, 2000</name><name>Jinxi Xu , Jamie Callan, Effective retrieval with distributed collections, Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, p.112-120, August 24-28, 1998, Melbourne, Australia</name><name>Yiming Yang , Xin Liu, A re-examination of text categorization methods, Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, p.42-49, August 15-19, 1999, Berkeley, California, United States</name><name>G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949.</name></citation><abstract>The contents of many valuable web-accessible databases are only accessible through search interfaces and are hence invisible to traditional web “crawlers.” Recent studies have estimated the size of this “hidden web” to be 500 billion pages, while the size of the “crawlable” web is only an estimated two billion pages. Recently, commercial web sites have started to manually organize web-accessible databases into Yahoo!-like hierarchical classification schemes. In this paper, we introduce a method for automating this classification process by using a small number of query probes. To classify a database, our algorithm does not retrieve or inspect any documents or pages from the database, but rather just exploits the number of matches that each query probe generates at the database in question. 
We have conducted an extensive experimental evaluation of our technique over collections of real documents, including over one hundred web-accessible databases. Our experiments show that our system has low overhead and achieves high classification accuracy across a variety of databases.</abstract></paper><paper><title>Data bubbles: quality preserving performance boosting for hierarchical clustering</title><author><AuthorName>Markus M. Breunig</AuthorName><institute><InstituteName>Institute for Computer Science, University of Munich, Oettingenstr. 67, D-80538 Munich, Germany</InstituteName><country></country></institute></author><author><AuthorName>Hans-Peter Kriegel</AuthorName><institute><InstituteName>Institute for Computer Science, University of Munich, Oettingenstr. 67, D-80538 Munich, Germany</InstituteName><country></country></institute></author><author><AuthorName>Peer Kr&#246;ger</AuthorName><institute><InstituteName>Institute for Computer Science, University of Munich, Oettingenstr. 67, D-80538 Munich, Germany</InstituteName><country></country></institute></author><author><AuthorName>J&#246;rg Sander</AuthorName><institute><InstituteName>Department of Computer Science, University of British Columbia, Vancouver, BC V6T 1Z4 Canada</InstituteName><country></country></institute></author><year>2001</year><conference>International Conference on Management of Data</conference><citation><name>Mihael Ankerst , Markus M. Breunig , Hans-Peter Kriegel , J&#246;rg Sander, OPTICS: ordering points to identify the clustering structure, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.49-60, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States</name><name>Bradley P. S., Fayyad U., Reina C.: "Scaling Clustering Algorithms to Large Databases", Proc. 4th Int. Conf. on Knowledge Discovery and Data Mining, New York, NY, AAAI Press, 1998, pp. 9-15.</name><name>Markus M. 
Breunig , Hans-Peter Kriegel , J&#246;rg Sander, Fast Hierarchical Clustering Based on Compressed Data and OPTICS, Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, p.232-242, September 13-16, 2000</name><name>William DuMouchel , Chris Volinsky , Theodore Johnson , Corinna Cortes , Daryl Pregibon, Squashing flat files flatter, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.6-15, August 15-18, 1999, San Diego, California, United States</name><name>Ester M., Kriegel H.-P., Sander J., Xu X.: "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise", Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, 1996, pp. 226-231.</name><name>Anil K. Jain , Richard C. Dubes, Algorithms for clustering data, Prentice-Hall, Inc., Upper Saddle River, NJ, 1988</name><name>Kaufman L., Rousseeuw P. J.: "Finding Groups in Data: An Introduction to Cluster Analysis", John Wiley & Sons, 1990.</name><name>MacQueen J.: "Some Methods for Classification and Analysis of Multivariate Observations", Proc. 5th Berkeley Symp. Math. Statist. Prob., 1967, Vol. 1, pp. 281-297.</name><name>Sibson R.: "SLINK: an optimally efficient algorithm for the single-link cluster method", The Computer Journal Vol. 16, No. 1, 1973, pp. 30-34.</name><name>Tian Zhang , Raghu Ramakrishnan , Miron Livny, BIRCH: an efficient data clustering method for very large databases, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.103-114, June 04-06, 1996, Montreal, Quebec, Canada</name></citation><abstract>In this paper, we investigate how to scale hierarchical clustering methods (such as OPTICS) to extremely large databases by utilizing data compression methods (such as BIRCH or random sampling). 
We propose a three-step procedure: 1) compress the data into suitable representative objects; 2) apply the hierarchical clustering algorithm only to these objects; 3) recover the clustering structure for the whole data set, based on the result for the compressed data. The key issue in this approach is to design compressed data items such that a hierarchical clustering algorithm can be applied to them and that they also contain enough information to infer the clustering structure of the original data set in the third step. This is crucial because the results of hierarchical clustering algorithms, when applied naively to a random sample or to the clustering features (CFs) generated by BIRCH, deteriorate rapidly for higher compression rates. This is due to three key problems, which we identify. To solve these problems, we propose an efficient post-processing step and the concept of a Data Bubble as a special kind of compressed data item. Applying OPTICS to these Data Bubbles allows us to recover a very accurate approximation of the clustering structure of a large data set even for very high compression rates. A comprehensive performance and quality evaluation shows that we trade only very little quality of the clustering result for a great increase in performance.</abstract></paper><paper><title>Mining needle in a haystack: classifying rare classes via two-phase rule induction</title><author><AuthorName>Mahesh V. Joshi</AuthorName><institute><InstituteName>Department of Computer Science, IBM T. J. Watson Research Center and University of Minnesota, Minneapolis</InstituteName><country></country></institute></author><author><AuthorName>Ramesh C. Agarwal</AuthorName><institute><InstituteName>IBM T. J. Watson Research Center, P.O. 
Box 704, Yorktown Heights, NY</InstituteName><country></country></institute></author><author><AuthorName>Vipin Kumar</AuthorName><institute><InstituteName>Department of Computer Science, University of Minnesota, Minneapolis, MN</InstituteName><country></country></institute></author><year>2001</year><conference>International Conference on Management of Data</conference><citation></citation><abstract>Learning models to classify rarely occurring target classes is an important problem with applications in network intrusion detection, fraud detection, or deviation detection in general. In this paper, we analyze our previously proposed two-phase rule induction method in the context of learning complete and precise signatures of rare classes. The key feature of our method is that it separately conquers the objectives of achieving high recall and high precision for the given target class. The first phase of the method aims for high recall by inducing rules with high support and a reasonable level of accuracy. The second phase then tries to improve the precision by learning rules to remove false positives in the collection of the records covered by the first-phase rules. Existing sequential covering techniques try to achieve high precision for each individual disjunct learned. In this paper, we claim that such an approach is inadequate for rare classes because of two problems: splintered false positives and error-prone small disjuncts. Motivated by the strengths of our two-phase design, we design various synthetic data models to identify and analyze the situations in which two state-of-the-art methods, RIPPER and C4.5 rules, either fail to learn a model or learn a very poor model. In all these situations, our two-phase approach learns a model with significantly better recall and precision levels. We also present a comparison of the three methods on a challenging real-life network intrusion detection dataset. 
Our method is significantly better than, or comparable to, the best competitor in achieving a better balance between recall and precision.</abstract></paper><paper><title>Efficient evaluation of XML middle-ware queries</title><author><AuthorName>Mary Fernandez</AuthorName><institute><InstituteName>AT&T Labs, Research</InstituteName><country></country></institute></author><author><AuthorName>Atsuyuki Morishima</AuthorName><institute><InstituteName>University of Tsukuba</InstituteName><country></country></institute></author><author><AuthorName>Dan Suciu</AuthorName><institute><InstituteName>University of Washington</InstituteName><country></country></institute></author><year>2001</year><conference>International Conference on Management of Data</conference><citation><name>Serge Abiteboul , Richard Hull , Victor Vianu, Foundations of Databases: The Logical Level, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1995</name><name>Catriel Beeri , Philip A. Bernstein, Computational problems related to the design of normal form relational schemas, ACM Transactions on Database Systems (TODS), v.4 n.1, p.30-59, March 1979</name><name>Michael J. Carey , Jerry Kiernan , Jayavel Shanmugasundaram , Eugene J. Shekita , Subbu N. Subramanian, XPERANTO: Middleware for Publishing Object-Relational Data as XML Documents, Proceedings of the 26th International Conference on Very Large Data Bases, p.646-648, September 10-14, 2000</name><name>Alin Deutsch , Mary Fernandez , Daniela Florescu , Alon Levy , Dan Suciu, A query language for XML, Proceedings of the eighth international conference on World Wide Web, p.1155-1169, May 1999, Toronto, Canada</name><name>Mary Fern&#225;ndez , Wang-Chiew Tan , Dan Suciu, SilkRoute: trading between relations and XML, Proceedings of the 9th international World Wide Web conference on Computer networks: the international journal of computer and telecommunications networking, p.723-745, June 2000, Amsterdam, The Netherlands</name><name>M. 
Rys, Microsoft, Support WebCast: Microsoft SQL Server 2000: New XML Features, April, 2000 (http://support.microsoft.com/servicedesks/Webcasts/wc042800/wcblurb042800.asp)</name><name>Arnaud Sahuguet, Everything You Ever Wanted to Know About DTDs, But Were Afraid to Ask (Extended Abstract), Selected papers from the Third International Workshop WebDB 2000 on The World Wide Web and Databases, p.171-183, May 18-19, 2000</name><name>P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts</name><name>Jayavel Shanmugasundaram , Eugene J. Shekita , Rimon Barr , Michael J. Carey , Bruce G. Lindsay , Hamid Pirahesh , Berthold Reinwald, Efficiently Publishing Relational Data as XML Documents, Proceedings of the 26th International Conference on Very Large Data Bases, p.65-76, September 10-14, 2000</name><name>B. Wait, Oracle Corporation, "Using XML in Oracle Database Applications", Nov., 1999, (http://technet.oracle.com/tech/xml/info/htdocs/otnwp/about xml.htm)</name><name>Transaction Processing Performance Council, TPC-H (ad-hoc, decision support) benchmark, http://www.tpc.org/</name><name>XML Extender Administration and Programming, "IBM DB2 Universal Database XML Extender", (http://www-4.ibm.com/software/data/db2/extenders/xmlext/docs/v71wrk/english/index.htm)</name><name>World-Wide Web Consortium XSL Transformations (XSLT), Version 1.0. W3C Recommendation, Nov., 1999. http://www.w3.org/TR/xslt/.</name></citation><abstract>We address the problem of efficiently constructing materialized XML views of relational databases. In our setting, the XML view is specified by a query in the declarative query language of a middle-ware system, called SilkRoute. 
The middle-ware system evaluates a query by sending one or more SQL queries to the target relational database, integrating the resulting tuple streams, and adding the XML tags. We focus on how best to choose the SQL queries without having control over the target RDBMS.</abstract></paper><paper><title>Filtering algorithms and implementation for very fast publish/subscribe systems</title><author><AuthorName>Fran&#231;oise Fabret</AuthorName><institute><InstituteName>INRIA Rocquencourt</InstituteName><country></country></institute></author><author><AuthorName>H. Arno Jacobsen</AuthorName><institute><InstituteName>University of Toronto</InstituteName><country></country></institute></author><author><AuthorName>Fran&#231;ois Llirbat</AuthorName><institute><InstituteName>INRIA Rocquencourt</InstituteName><country></country></institute></author><author><AuthorName>Jo&#227;o Pereira</AuthorName><institute><InstituteName>INRIA Rocquencourt</InstituteName><country></country></institute></author><author><AuthorName>Kenneth A. Ross</AuthorName><institute><InstituteName>Columbia University</InstituteName><country></country></institute></author><author><AuthorName>Dennis Shasha</AuthorName><institute><InstituteName>Courant Institute of Mathematical Sciences, New York University</InstituteName><country></country></institute></author><year>2001</year><conference>International Conference on Management of Data</conference><citation><name>Marcos K. Aguilera , Robert E. Strom , Daniel C. Sturman , Mark Astley , Tushar D. Chandra, Matching events in a content-based subscription system, Proceedings of the eighteenth annual ACM symposium on Principles of distributed computing, p.53-61, May 04-06, 1999, Atlanta, Georgia, United States</name><name>Mehmet Altinel , Michael J. 
Franklin, Efficient Filtering of XML Documents for Selective Dissemination of Information, Proceedings of the 26th International Conference on Very Large Data Bases, p.53-64, September 10-14, 2000</name><name>Antonio Carzaniga , David S. Rosenblum , Alexander L. Wolf, Achieving scalability and expressiveness in an Internet-scale event notification service, Proceedings of the nineteenth annual ACM symposium on Principles of distributed computing, p.219-227, July 16-19, 2000, Portland, Oregon, United States</name><name>Jianjun Chen , David J. DeWitt , Feng Tian , Yuan Wang, NiagaraCQ: a scalable continuous query system for Internet databases, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.379-390, May 15-18, 2000, Dallas, Texas, United States</name><name>Phil Bernstein , Michael Brodie , Stefano Ceri , David DeWitt , Mike Franklin , Hector Garcia-Molina , Jim Gray , Jerry Held , Joe Hellerstein , H. V. Jagadish , Michael Lesk , Dave Maier , Jeff Naughton , Hamid Pirahesh , Mike Stonebraker , Jeff Ullman, The Asilomar report on database research, ACM SIGMOD Record, v.27 n.4, p.74-80, Dec. 1998</name><name>K. J. Gough and G. Smith. Efficient recognition of events in distributed systems. In Proceedings of ACSC-18, 1995.</name><name>Eric N. Hanson, Rule condition testing and action execution in Ariel, Proceedings of the 1992 ACM SIGMOD international conference on Management of data, p.49-58, June 02-05, 1992, San Diego, California, United States</name><name>E. Hanson, C. Carnes, L. Huang, M. Konyala, L. Noronha, S. Parasarathy, J. Park and A. Vernon. Scalable trigger processing. In Proceedings of the Int. Conf. on Data Engineering, 1999.</name><name>Eric N. 
Hanson , Moez Chaabouni , Chang-Ho Kim , Yu-Wang Wang, A predicate matching algorithm for database rule systems, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.271-280, May 23-26, 1990, Atlantic City, New Jersey, United States</name><name>New Era of Networks Inc. http://www.neonsoft.com/products/NEONet.html.</name><name>Jo&#227;o Pereira , Fran&#231;oise Fabret , Fran&#231;ois Llirbat , Radu Preotiuc-Pietro , Kenneth A. Ross , Dennis Shasha, Publish/Subscribe on the Web at Extreme Speed, Proceedings of the 26th International Conference on Very Large Data Bases, p.627-630, September 10-14, 2000</name><name>Jo&#227;o Pereira , Fran&#231;oise Fabret , Fran&#231;ois Llirbat , Dennis Shasha, Efficient Matching for Web-Based Publish/Subscribe Systems, Proceedings of the 7th International Conference on Cooperative Information Systems, p.162-173, September 06-08, 2000</name><name>Jun Rao , Kenneth A. Ross, Cache Conscious Indexing for Decision-Support in Main Memory, Proceedings of the 25th International Conference on Very Large Data Bases, p.78-89, September 07-10, 1999</name><name>Bill Segal and David Arnold. Elvin has left the building: A publish/subscribe notification service with quenching. In Proceedings of AUUG97, 1997.</name><name>Steven P. Vanderwiel , David J. Lilja, Data prefetch mechanisms, ACM Computing Surveys (CSUR), v.32 n.2, p.174-199, June 2000</name><name>T. Yan and H. Garcia-Molina. The sift information dissemination system. In ACM TODS 2000, 2000.</name></citation><abstract>Publish/Subscribe is the paradigm in which users express long-term interests (“subscriptions”) and some agent “publishes” events (e.g., offers). The job of Publish/Subscribe software is to send events to the owners of subscriptions satisfied by those events. For example, a user subscription may consist of an interest in an airplane of a certain type, not to exceed a certain price. 
A published event may consist of an offer of an airplane with certain properties, including price. Each subscription consists of a conjunction of (attribute, comparison operator, value) predicates. A subscription closely resembles a trigger in that it is a long-lived conditional query associated with an action (usually, informing the subscriber). However, it is less general than a trigger, so novel data structures and implementations may enable the creation of more scalable, high-performance publish/subscribe systems. This paper describes our attempt to construct such algorithms and to implement them. Using a combination of data structures, application-specific caching policies, and application-specific query processing, our system can handle 600 events per second for a typical workload containing 6 million subscriptions.</abstract></paper><paper><title>Adaptable query optimization and evaluation in temporal middleware</title><author><AuthorName>Giedrius Slivinskas</AuthorName><institute><InstituteName>Department of Computer Science, Aalborg University, Denmark</InstituteName><country></country></institute></author><author><AuthorName>Christian S. Jensen</AuthorName><institute><InstituteName>Department of Computer Science, Aalborg University, Denmark</InstituteName><country></country></institute></author><author><AuthorName>Richard Thomas Snodgrass</AuthorName><institute><InstituteName>Department of Computer Science, University of Arizona, AZ</InstituteName><country></country></institute></author><year>2001</year><conference>International Conference on Management of Data</conference><citation><name>Jochen van den Bercken , Jens-Peter Dittrich , Bernhard Seeger, javax.XXL: a prototype for a library of query processing algorithms, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.588, May 15-18, 2000, Dallas, Texas, United States</name><name>Michael H. 
Böhlen, Temporal database system implementations, ACM SIGMOD Record, v.24 n.4, p.53-60, Dec. 1995</name><name>M. H. Bohlen. The Tiger Temporal Database System. URL: www.cs.auc.dk/tigeradm/ (current as of February 23, 2001).</name><name>Weimin Du , Ravi Krishnamurthy , Ming-Chien Shan, Query Optimization in a Heterogeneous DBMS, Proceedings of the 18th International Conference on Very Large Data Bases, p.277-291, August 23-27, 1992</name><name>Ramez Elmasri , Shamkant B. Navathe, Fundamentals of database systems (2nd ed.), Benjamin-Cummings Publishing Co., Inc., Redwood City, CA, 1994</name><name>O. Etzion, S. Jajodia, and S. Sripada (eds.). Temporal Databases: Research and Practice. LNCS 1399. Springer-Verlag (1998).</name><name>J. A. G. Gendrano, R. Shah, R. T. Snodgrass, and J. Yang. University Information System (UIS) Dataset. TIMECENTER CD-1, September, 1998.</name><name>Goetz Graefe , William J. McKenna, The Volcano Optimizer Generator: Extensibility and Efficient Search, Proceedings of the Ninth International Conference on Data Engineering, p.209-218, April 19-23, 1993</name><name>Himawan Gunadhi , Arie Segev, A framework for query optimization in temporal databases, Proceedings of the fifth international conference on Statistical and scientific database management, p.131-147, March 1990, Charlotte, North Carolina, United States</name><name>W. H. Inmon, Building the data warehouse (2nd ed.), John Wiley & Sons, Inc., New York, NY, 1996</name><name>Matthias Jarke , Jurgen Koch, Query Optimization in Database Systems, ACM Computing Surveys (CSUR), v.16 n.2, p.111-152, June 1984</name><name>Christian S. Jensen , Richard Thomas Snodgrass, Temporal Data Management, IEEE Transactions on Knowledge and Data Engineering, v.11 n.1, p.36-44, January 1999</name><name>Nick Kline , Richard Thomas Snodgrass, Computing Temporal Aggregates, Proceedings of the Eleventh International Conference on Data Engineering, p.222-231, March 06-10, 1995</name><name>T. Y. C. Leung and R. 
R. Muntz. Stream Processing: Temporal Query Processing and Optimization. In Temporal Databases: Theory, Design, and Implementation, A. U. Tansel et al. (eds.), Benjamin/Cummings, pp. 329-355 (1993).</name><name>M. Tamer &#214;zsu , Patrick Valduriez, Principles of distributed database systems (2nd ed.), Prentice-Hall, Inc., Upper Saddle River, NJ, 1999</name><name>Mary Tork Roth , Peter M. Schwarz, Don't Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources, Proceedings of the 23rd International Conference on Very Large Data Bases, p.266-275, August 25-29, 1997</name><name>Arie Segev , Himawan Gunadhi , Rakesh Chandra , J. George Shanthikumar, Selectivity estimation of temporal data manipulations, Information Scien