icde_2006_elementary.txt
<proceedings><paper><title>General Chairs</title><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract></abstract></paper><paper><title>External Reviewers</title><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract></abstract></paper><paper><title>MiniCount: Efficient Rewriting of COUNT-Queries Using Views</title><author><AuthorName>Vaclav Lin</AuthorName><institute><InstituteName>University of Economics, Czech Republi</InstituteName><country></country></institute></author><author><AuthorName>Vasilis Vassalos</AuthorName><institute><InstituteName>Athens University of Economics and Business, Greec</InstituteName><country></country></institute></author><author><AuthorName>Prodromos Malakasiotis</AuthorName><institute><InstituteName>Athens University of Economics and Business, Greec</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>We present MiniCount, the first efficient sound and&#10;complete algorithm for finding maximally contained&#10;rewritings of conjunctive queries with count, using conjunctive&#10;views with count and conjunctive views without&#10;aggregation. An efficient and scalable solution to this&#10;problem yields significant benefits for data warehousing&#10;and decision support systems, as well as for powerful&#10;data integration systems.We first present a naive rewriting&#10;algorithm implicit in the recent theoretical results by&#10;Cohen et al. [5] and identify three independent sources of&#10;exponential complexity in the naive algorithm, including&#10;an expensive containment check. Then we present&#10;and discuss MiniCount and prove it sound and complete.&#10;We also present an experimental study that shows Mini-&#10;Count to be orders of magnitude faster than the naive&#10;algorithm, and to be able to scale to large numbers of&#10;views</abstract></paper><paper><title>Updates Through Views: A New Hope</title><author><AuthorName>Yannis Kotidis</AuthorName><institute><InstituteName>AT&T Lab</InstituteName><country></country></institute></author><author><AuthorName>Divesh Srivastava</AuthorName><institute><InstituteName>AT&T Lab</InstituteName><country></country></institute></author><author><AuthorName>Yannis Velegrakis</AuthorName><institute><InstituteName>AT&T Lab</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>Database views are extensively used to represent unmaterialized&#10;tables. Applications rarely distinguish between a&#10;materialized base table and a virtual view, thus, they may&#10;issue update requests on the views. Since views are virtual,&#10;update requests on them need to be translated to updates on&#10;the base tables. Existing literature has shown the difficulty&#10;of translating view updates in a side-effect free manner. To&#10;address this problem, we propose a novel approach for separating&#10;the data instance into a logical and a physical level.&#10;This separation allows us to achieve side-effect free translations&#10;of any kind of update on the view. Furthermore,&#10;deletes on a view can be translated without affecting the&#10;base tables. 
We describe the implementation of the framework&#10;and present our experimental results</abstract></paper><paper><title>Learning from Aggregate Views</title><author><AuthorName>Bee-Chung Chen</AuthorName><institute><InstituteName>University of Wisconsi</InstituteName><country></country></institute></author><author><AuthorName>Lei Chen</AuthorName><institute><InstituteName>University of Wisconsi</InstituteName><country></country></institute></author><author><AuthorName>Raghu Ramakrishnan</AuthorName><institute><InstituteName>University of Wisconsi</InstituteName><country></country></institute></author><author><AuthorName>David R. Musicant</AuthorName><institute><InstituteName>Carleton Colleg</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>In this paper, we introduce a new class of data mining&#10;problems called learning from aggregate views. In&#10;contrast to the traditional problem of learning from a&#10;single table of training examples, the new goal is to learn&#10;from multiple aggregate views of the underlying data,&#10;without access to the un-aggregated data. We motivate&#10;this new problem, present a general problem framework,&#10;develop learning methods for RFA (Restriction-Free&#10;Aggregate) views defined using COUNT, SUM, AVG and&#10;STDEV, and offer theoretical and experimental results that&#10;characterize the proposed methods.</abstract></paper><paper><title>C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking</title><author><AuthorName>Dong Xin</AuthorName><institute><InstituteName>University of Illinois at Urbana-Champaig</InstituteName><country></country></institute></author><author><AuthorName>Zheng Shao</AuthorName><institute><InstituteName>University of Illinois at Urbana-Champaig</InstituteName><country></country></institute></author><author><AuthorName>Jiawei Han</AuthorName><institute><InstituteName>University of Illinois at Urbana-Champaig</InstituteName><country></country></institute></author><author><AuthorName>Hongyan Liu</AuthorName><institute><InstituteName>Tsinghua University, Chin</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>It is well recognized that data cubing often produces&#10;huge outputs. Two popular efforts devoted to this problem&#10;are (1) iceberg cube, where only significant cells&#10;are kept, and (2) closed cube, where a group of cells&#10;which preserve roll-up/drill-down semantics are losslessly&#10;compressed to one cell. Due to its usability and&#10;importance, efficient computation of closed cubes still&#10;warrants a thorough study.&#10;In this paper, we propose a new measure, called&#10;closedness, for efficient closed data cubing. We show&#10;that closedness is an algebraic measure and can be computed&#10;efficiently and incrementally. Based on closedness&#10;measure, we develop an an aggregation-based approach,&#10;called C-Cubing (i.e., Closed-Cubing), and integrate&#10;it into two successful iceberg cubing algorithms:&#10;MM-Cubing and Star-Cubing. Our performance study&#10;shows that C-Cubing runs almost one order of magnitude&#10;faster than the previous approaches. 
We further&#10;study how the performance of the alternative algorithms&#10;of C-Cubing varies w.r.t the properties of the data sets.</abstract></paper><paper><title>A Primitive Operator for Similarity Joins in Data Cleaning</title><author><AuthorName>Surajit Chaudhuri</AuthorName><institute><InstituteName>Microsoft Researc</InstituteName><country></country></institute></author><author><AuthorName>Venkatesh Ganti</AuthorName><institute><InstituteName>Microsoft Researc</InstituteName><country></country></institute></author><author><AuthorName>Raghav Kaushik</AuthorName><institute><InstituteName>Microsoft Researc</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>Data cleaning based on similarities involves identification&#10;of &quot;close&quot; tuples, where closeness is evaluated using a&#10;variety of similarity functions chosen to suit the domain and&#10;application. Current approaches for efficiently implementing&#10;such similarity joins are tightly tied to the chosen similarity&#10;function. In this paper, we propose a new primitive&#10;operator which can be used as a foundation to implement&#10;similarity joins according to a variety of popular string similarity&#10;functions, and notions of similarity which go beyond&#10;textual similarity. We then propose efficient implementations&#10;for this operator. In an experimental evaluation using real&#10;datasets, we show that the implementation of similarity joins&#10;using our operator is comparable to, and often substantially&#10;better than, previous customized implementations for particular&#10;similarity functions.</abstract></paper><paper><title>Techniques for Warehousing of Sample Data</title><author><AuthorName>Paul G. Brown</AuthorName><institute><InstituteName>IBM Almaden Research Cente</InstituteName><country></country></institute></author><author><AuthorName>Peter J. Haas</AuthorName><institute><InstituteName>IBM Almaden Research Cente</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>We consider the problem of maintaining a warehouse of&#10;sampled data that &quot;shadows&quot; a full-scale data warehouse,&#10;in order to support quick approximate analytics and metadata&#10;discovery. The full-scale warehouse comprises many&#10;&quot;data sets,&quot; where a data set is a bag of values; the data sets&#10;can vary enormously in size. The values constituting a data&#10;set can arrive in batch or stream form. We provide and compare&#10;several new algorithms for independent and parallel&#10;uniform random sampling of data-set partitions, where the&#10;partitions are created by dividing the batch or splitting the&#10;stream. We also provide novel methods for merging samples&#10;to create a uniform sample from an arbitrary union&#10;of data-set partitions. Our sampling/merge methods are the&#10;first to simultaneously support statistical uniformity, a priori&#10;bounds on the sample footprint, and concise sample storage.&#10;As partitions are rolled in and out of the warehouse, the&#10;corresponding samples are rolled in and out of the sample&#10;warehouse. In this manner our sampling methods approximate&#10;the behavior of more sophisticated stream-sampling&#10;methods, while also supporting parallel processing. 
Experiments&#10;indicate that our methods are efficient and scalable,&#10;and provide guidance for their application.</abstract></paper><paper><title>Working Models for Uncertain Data</title><author><AuthorName>Anish Das Sarma</AuthorName><institute><InstituteName>Stanford Universit</InstituteName><country></country></institute></author><author><AuthorName>Omar Benjelloun</AuthorName><institute><InstituteName>Stanford Universit</InstituteName><country></country></institute></author><author><AuthorName>Alon Halevy</AuthorName><institute><InstituteName>University of Washingto</InstituteName><country></country></institute></author><author><AuthorName>Jennifer Widom</AuthorName><institute><InstituteName>Stanford Universit</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>This paper explores an inherent tension in modeling and&#10;querying uncertain data: simple, intuitive representations of&#10;uncertain data capture many application requirements, but&#10;these representations are generally incomplete&#8213;standard operations&#10;over the data may result in unrepresentable types of&#10;uncertainty. Complete models are theoretically attractive, but&#10;they can be nonintuitive and more complex than necessary&#10;for many applications. To address this tension, we propose a&#10;two-layer approach to managing uncertain data: an underlying&#10;logical model that is complete, and one or more working&#10;models that are easier to understand, visualize, and query,&#10;but may lose some information. We explore the space of incomplete&#10;working models, place several of them in a strict&#10;hierarchy based on expressive power, and study their closure&#10;properties. We describe how the two-layer approach is being&#10;used in our prototype DBMS for uncertain data, and we identify&#10;a number of interesting open problems to fully realize the&#10;approach.</abstract></paper><paper><title>Reasoning About Approximate Match Query Results</title><author><AuthorName>Sudipto Guha</AuthorName><institute><InstituteName>U of Pennsylvani</InstituteName><country></country></institute></author><author><AuthorName>Nick Koudas</AuthorName><institute><InstituteName>U of Toront</InstituteName><country></country></institute></author><author><AuthorName>Divesh Srivastava</AuthorName><institute><InstituteName>AT&T Labs Researc</InstituteName><country></country></institute></author><author><AuthorName>Xiaohui Yu</AuthorName><institute><InstituteName>U of Toront</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>Join techniques deploying approximate match predicates&#10;are fundamental data cleaning operations. A variety of predicates&#10;have been utilized to quantify approximate match in such&#10;operations and some have been embedded in a declarative data&#10;cleaning framework. These techniques return pairs of tuples&#10;from both relations, tagged with a score, signifying the degree&#10;of similarity between the tuples in the pair according to the&#10;specific approximate match predicate.&#10;In this paper, we consider the problem of estimating various&#10;parameters on the output of declarative approximate join&#10;algorithms for planning purposes. 
Such algorithms are highly&#10;time consuming, so precise knowledge of the result size as well&#10;as its score distribution is a pressing concern. This knowledge&#10;aids decisions as to which operations are more promising for&#10;identifying highly similar tuples, which is a key operation for&#10;data cleaning. We propose solution strategies that fully comply&#10;with a declarative framework and analytically reason about&#10;the quality of the estimates we obtain as well as the performance&#10;of our strategies.We present the results of a detailed performance evaluation&#10;of all strategies proposed. Our experimental results validate&#10;our analytical expectations and shed additional light on the&#10;quality and performance of our estimation framework. Our&#10;study offers a set of simple, fully declarative techniques for&#10;this problem, which can be readily deployed in data cleaning&#10;systems.</abstract></paper><paper><title>The Gauss-Tree: Efficient Object Identification in Databases of Probabilistic Feature Vectors</title><author><AuthorName>Christian Bohm</AuthorName><institute><InstituteName>University of Munich, German</InstituteName><country></country></institute></author><author><AuthorName>Alexey Pryakhin</AuthorName><institute><InstituteName>University of Munich, German</InstituteName><country></country></institute></author><author><AuthorName>Matthias Schubert</AuthorName><institute><InstituteName>University of Munich, German</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>In applications of biometric databases the typical task&#10;is to identify individuals according to features which are&#10;not exactly known. Reasons for this inexactness are varying&#10;measuring techniques or environmental circumstances.&#10;Since these circumstances are not necessarily the same&#10;when determining the features for different individuals, the&#10;exactness might strongly vary between the individuals as&#10;well as between the features. To identify individuals, similarity&#10;search on feature vectors is applicable, but even the&#10;use of adaptable distance measures is not capable to handle&#10;objects having an individual level of exactness. Therefore,&#10;we develop a comprehensive probabilistic theory in&#10;which uncertain observations are modeled by probabilistic&#10;feature vectors (pfv), i.e. feature vectors where the conventional&#10;feature values are replaced by Gaussian probability&#10;distribution functions. Each feature value of each object&#10;is complemented by a variance value indicating its uncertainty.&#10;We define two types of identification queries, k-mostlikely&#10;identification and threshold identification. For efficient&#10;query processing, we propose a novel index structure,&#10;the Gauss-tree. Our experimental evaluation demonstrates&#10;that pfv stored in a Gauss-tree significantly improve the result&#10;quality compared to traditional feature vectors. 
Additionally,&#10;we show that the Gauss-tree significantly speeds&#10;up query times compared to competitive methods.</abstract></paper><paper><title>Finding Fastest Paths on A Road Network with Speed Patterns</title><author><AuthorName>Evangelos Kanoulas</AuthorName><institute><InstituteName>Northeastern Universit</InstituteName><country></country></institute></author><author><AuthorName>Yang Du</AuthorName><institute><InstituteName>Northeastern Universit</InstituteName><country></country></institute></author><author><AuthorName>Tian Xia</AuthorName><institute><InstituteName>Northeastern Universit</InstituteName><country></country></institute></author><author><AuthorName>Donghui Zhang</AuthorName><institute><InstituteName>Northeastern Universit</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>This paper proposes and solves the Time-Interval All&#10;Fastest Path (allFP) query. Given a user-defined leaving or&#10;arrival time interval I, a source node s and an end node e,&#10;allFP asks for a set of all fastest paths from s to e, one for&#10;each sub-interval of I. Note that the query algorithm should&#10;find a partitioning of I into sub-intervals. Existing methods&#10;can only be used to solve a very special case of the problem,&#10;when the leaving time is a single time instant. A straightforward&#10;solution to the allFP query is to run existing methods&#10;many times, once for every time instant in I. This paper&#10;proposes a solution based on novel extensions to the A* algorithm.&#10;Instead of expanding the network many times, we&#10;expand once. The travel time on a path is kept as a function&#10;of leaving time. Methods to combine travel-time functions&#10;are provided to expand a path. A novel lower-bound estimator&#10;for travel time is proposed. Performance results reveal&#10;that our method is more efficient and more accurate than&#10;the discrete-time approach.</abstract></paper><paper><title>Approximation Techniques for Indexing the Earth Mover&#8217;s Distance in Multimedia Databases</title><author><AuthorName>Ira Assent</AuthorName><institute><InstituteName>RWTH Aachen University, German</InstituteName><country></country></institute></author><author><AuthorName>Andrea Wenning</AuthorName><institute><InstituteName>EXAConsult GmbH, German</InstituteName><country></country></institute></author><author><AuthorName>Thomas Seidl</AuthorName><institute><InstituteName>RWTH Aachen University, German</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>Todays abundance of storage coupled with digital technologies&#10;in virtually any scientific or commercial application&#10;such as medical and biological imaging or music&#10;archives deal with tremendous quantities of images, videos&#10;or audio files stored in large multimedia databases. For&#10;content-based data mining and retrieval purposes suitable&#10;similarity models are crucial. The Earth Mover&#8217;s Distance&#10;was introduced in Computer Vision to better approach human&#10;perceptual similarities. Its computation, however, is&#10;too complex for usage in interactive multimedia database&#10;scenarios. In order to enable efficient query processing&#10;in large databases, we propose an index-supported multistep&#10;algorithm. 
We therefore develop new lower bounding&#10;approximation techniques for the Earth Mover&#8217;s Distance&#10;which satisfy high quality criteria including completeness&#10;(no false drops), index-suitability and fast computation. We&#10;demonstrate the efficiency of our approach in extensive experiments&#10;on large image databases</abstract></paper><paper><title>Indexing for Dynamic Abstract Regions</title><author><AuthorName>Joxan Jaffar</AuthorName><institute><InstituteName>National University of Singapor</InstituteName><country></country></institute></author><author><AuthorName>Roland H.C. Yap</AuthorName><institute><InstituteName>National University of Singapor</InstituteName><country></country></institute></author><author><AuthorName>Kenny Q. Zhu</AuthorName><institute><InstituteName>Microsoft Corporatio</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>We propose a new main memory index structure for abstract&#10;regions (objects) which may heavily overlap, the RCtree.&#10;These objects are &quot;dynamic&quot; and may have short life&#10;spans. The novelty is that rather than representing an object&#10;by its minimum bounding rectangle (MBR), possibly&#10;with pre-processed segmentation into many small MBRs,&#10;we use the actual shape of the object to maintain the index.&#10;This saves significant space for objects with large spatial&#10;extents since pre-segmentation is not needed. We show&#10;that the query performance of RC-tree is much better than&#10;many indexing schemes on synthetic overlapping data sets.&#10;The performance is also competitive on real-life GIS nonoverlapping&#10;data sets.</abstract></paper><paper><title>Efficient Processing of Updates in Dynamic XML Data</title><author><AuthorName>Changqing Li</AuthorName><institute><InstituteName>National University of Singapor</InstituteName><country></country></institute></author><author><AuthorName>Tok Wang Ling</AuthorName><institute><InstituteName>National University of Singapor</InstituteName><country></country></institute></author><author><AuthorName>Min Hu</AuthorName><institute><InstituteName>National University of Singapor</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>It is important to process the updates when nodes are&#10;inserted into or deleted from the XML tree. All the&#10;existing labeling schemes have high update cost, thus in&#10;this paper we propose a novel Compact Dynamic Binary&#10;String (CDBS) encoding to efficiently process the&#10;updates. CDBS has two important properties which form&#10;the foundations of this paper: (1) CDBS supports that&#10;codes can be inserted between any two consecutive CDBS&#10;codes with the orders kept and without re-encoding the&#10;existing codes; (2) CDBS is orthogonal to specific&#10;labeling schemes, thus it can be applied broadly to&#10;different labeling schemes or other applications to&#10;efficiently process the updates. 
We report our&#10;experimental results to show that our CDBS is superior to&#10;previous approaches to process updates in terms of the&#10;number of nodes to re-label and the time for updating.</abstract></paper><paper><title>A Complete and Efficient Algebraic Compiler for XQuery</title><author><AuthorName>Christopher Re</AuthorName><institute><InstituteName>University of Washingto</InstituteName><country></country></institute></author><author><AuthorName>Jerome Simeon</AuthorName><institute><InstituteName>IBM T.J. Watson Research Cente</InstituteName><country></country></institute></author><author><AuthorName>Mary Fernandez</AuthorName><institute><InstituteName>AT&T Lab</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>As XQuery nears standardization, more sophisticated&#10;XQuery applications are emerging, which often exploit the&#10;entire language and are applied to non-trivial XML sources.&#10;We propose an algebra and optimization techniques that are&#10;suitable for building an XQuery compiler that is complete,&#10;correct, and efficient. We describe the compilation rules for&#10;the complete language into that algebra and present novel&#10;optimization techniques that address the needs of complex&#10;queries. These techniques include new query unnesting&#10;rewritings and specialized join algorithms that account for&#10;XQuery&#8217;s complex predicate semantics. The algebra and&#10;optimizations are implemented in the Galax XQuery engine,&#10;and yield execution plans that are up to three orders of magnitude&#10;faster than earlier versions of Galax.</abstract></paper><paper><title>Making Designer Schemas with Colors</title><author><AuthorName>Nuwee Wiwatwattana</AuthorName><institute><InstituteName>U of Michiga</InstituteName><country></country></institute></author><author><AuthorName>H. V. Jagadish</AuthorName><institute><InstituteName>U of Michiga</InstituteName><country></country></institute></author><author><AuthorName>Laks V. S. Lakshmanan</AuthorName><institute><InstituteName>U of British Columbi</InstituteName><country></country></institute></author><author><AuthorName>Divesh Srivastava</AuthorName><institute><InstituteName>AT&T Lab</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>XML schema design has two opposing goals: elimination&#10;of update anomalies requires that the schema be as&#10;normalized as possible; yet higher query performance and&#10;simpler query expression are often obtained through the&#10;use of schemas that permit redundancy. In this paper, we&#10;show that the recently proposed MCT data model, which extends&#10;XML by adding colors, can be used to address this dichotomy&#10;effectively. Specifically, we formalize the intuition&#10;of anomaly avoidance in MCT using notions of node normal&#10;and edge normal forms, and the goal of efficient query&#10;processing using notions of association recoverability and&#10;direct recoverability. We develop algorithms for transforming&#10;design specifications given as ER diagrams into MCT&#10;schemas that are in a node or edge normal form and satisfy&#10;association or direct recoverability. 
Experimental results&#10;using a wide variety of ER diagrams validate the benefits of&#10;our design methodology.</abstract></paper><paper><title>Mining Actionable Patterns by Role Models</title><author><AuthorName>Ke Wang</AuthorName><institute><InstituteName>Simon Fraser Universit</InstituteName><country></country></institute></author><author><AuthorName>Yuelong Jiang</AuthorName><institute><InstituteName>Simon Fraser Universit</InstituteName><country></country></institute></author><author><AuthorName>Alexander Tuzhilin</AuthorName><institute><InstituteName>New York Universit</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>Data mining promises to discover valid and potentially&#10;useful patterns in data. Often, discovered patterns are not&#10;useful to the user.&quot;Actionability&quot; addresses this problem in&#10;that a pattern is deemed actionable if the user can act upon&#10;it in her favor. We introduce the notion of &quot;action&quot; as a&#10;domain-independent way to model the domain knowledge.&#10;Given a data set about actionable features and an utility&#10;measure, a pattern is actionable if it summarizes a population&#10;that can be acted upon towards a more promising&#10;population observed with a higher utility. We present several&#10;pruning strategies taking into account the actionability&#10;requirement to reduce the search space, and algorithms for&#10;mining all actionable patterns as well as mining the top k&#10;actionable patterns. We evaluate the usefulness of patterns&#10;and the focus of search on a real-world application domain.</abstract></paper><paper><title>Systematic Approach for Optimizing Complex Mining Tasks on Multiple Databases</title><author><AuthorName>Ruoming Jin</AuthorName><institute><InstituteName>Kent State Universit</InstituteName><country></country></institute></author><author><AuthorName>Gagan Agrawal</AuthorName><institute><InstituteName>The Ohio State Universit</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>Many real world applications involve not just a single&#10;dataset, but a view of multiple datasets. These datasets may&#10;be collected from different sources and/or at different time instances.&#10;In such scenarios, comparing patterns or features&#10;from different datasets and understanding their relationships&#10;can be an extremely important part of the KDD process. This&#10;paper considers the problem of optimizing a mining task over&#10;multiple datasets, when it has been expressed using a highlevel&#10;interface. Specifically, we make the following contributions:&#10;1) We present an SQL-based mechanism for querying&#10;frequent patterns across multiple datasets, and establish&#10;an algebra for these queries. 2) We develop a systematic&#10;method for enumerating query plans and present several algorithms&#10;for finding optimized query plan which reduce execution&#10;costs. 
3) We evaluate our algorithms on real and synthetic&#10;datasets, and show up to an order of magnitude performance&#10;improvement</abstract></paper><paper><title>New Sampling-Based Estimators for OLAP Queries</title><author><AuthorName>Ruoming Jin</AuthorName><institute><InstituteName>Kent State Universit</InstituteName><country></country></institute></author><author><AuthorName>Leo Glimcher</AuthorName><institute><InstituteName>Ohio State Universit</InstituteName><country></country></institute></author><author><AuthorName>Chris Jermaine</AuthorName><institute><InstituteName>University of Florid</InstituteName><country></country></institute></author><author><AuthorName>Gagan Agrawal</AuthorName><institute><InstituteName>Ohio State Universit</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>One important way in which sampling for approximate query&#10;processing in a database environment differs from traditional&#10;applications of sampling is that in a database, it is feasible to&#10;collect accurate summary statistics from the data in addition&#10;to the sample. This paper describes a set of sampling-based&#10;estimators for approximate query processing that make use&#10;of simple summary statistics to to greatly increase the accuracy&#10;of sampling-based estimators. Our estimators are&#10;able to give tight probabilistic guarantees on estimation accuracy.&#10;They are suitable for low or high dimensional data,&#10;and work with categorical or numerical attributes. Furthermore,&#10;the information used by our estimators can easily be&#10;gathered in a single pass, making them suitable for use in a&#10;streaming environment.</abstract></paper><paper><title>Efficiently Evaluating Order Preserving Similarity Queries over Historical Market-Basket Data</title><author><AuthorName>Reza Sherkat</AuthorName><institute><InstituteName>University of Albert</InstituteName><country></country></institute></author><author><AuthorName>Davood Rafiei</AuthorName><institute><InstituteName>University of Albert</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>We introduce a new domain-independent framework for&#10;formulating and efficiently evaluating similarity queries&#10;over historical data, where given a history as a sequence&#10;of timestamped observations and the pair-wise similarity&#10;of observations, we want to find similar histories. For instance,&#10;given a database of customer transactions and a&#10;time period, we can find customers with similar purchasing&#10;behaviors over this period. Our work is different from&#10;the work on retrieving similar time series; it addresses the&#10;general problem in which a history cannot be modeled as&#10;a time series, hence the relevant conventional approaches&#10;are not applicable. We derive a similarity measure for histories,&#10;based on an aggregation of the similarities between&#10;the observations of the two histories, and propose efficient&#10;algorithms for finding an optimal alignment between two&#10;histories. Given the non-metric nature of our measure, we&#10;develop some upper bounds and an algorithm that makes&#10;use of those bounds to prune histories that are guaranteed&#10;not to be in the answer set. 
Our experimental results on&#10;real and synthetic data confirm the effectiveness and efficiency&#10;of our approach. For instance, when the minimum&#10;length of a match is provided, our algorithm achieves up to&#10;an order of magnitude speed-up over alternative methods.</abstract></paper><paper><title>End-biased Samples for Join Cardinality Estimation</title><author><AuthorName>Cristian Estan</AuthorName><institute><InstituteName>University of Wisconsin-Madiso</InstituteName><country></country></institute></author><author><AuthorName>Jeffrey F. Naughton</AuthorName><institute><InstituteName>University of Wisconsin-Madiso</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>We present a new technique for using samples to estimate&#10;join cardinalities. This technique, which we term&#10;&quot;end-biased samples,&quot; is inspired by recent work in network&#10;traffic measurement. It improves on random samples&#10;by using coordinated pseudo-random samples and retaining&#10;the sampled values in proportion to their frequency. We&#10;show that end-biased samples always provide more accurate&#10;estimates than random samples with the same sample&#10;size. The comparison with histograms is more interesting&#10;&#8213; while end-biased histograms are somewhat better than&#10;end-biased samples for uncorrelated data sets, end-biased&#10;samples dominate by a large margin when the data is correlated.&#10;Finally, we compare end-biased samples to the recently&#10;proposed &quot;skimmed sketches&quot; and show that neither&#10;dominates the other, that each has different and compelling&#10;strengths and weaknesses. These results suggest that endbiased&#10;samples may be a useful addition to the repertoire of&#10;techniques used for data summarization.</abstract></paper><paper><title>Laws for Rewriting Queries Containing Division Operators</title><author><AuthorName>Ralf Rantzau</AuthorName><institute><InstituteName>IBM Almaden Research Cente</InstituteName><country></country></institute></author><author><AuthorName>Christoph Mangold</AuthorName><institute><InstituteName>Universitat Stuttgart, German</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>Relational division, also known as small divide, is&#10;derived operator of the relational algebra that realizes&#10;many-to-one set containment test, where a set is represented&#10;as a group of tuples: Small divide discovers which&#10;sets in a dividend relation contain all elements of the set&#10;stored in a divisor relation. The great divide operator extends&#10;small divide by realizing many-to-many set containment&#10;tests. It is also similar to the set containment join&#10;operator for schemas that are not in first normal form.&#10;Neither small nor great divide has been implemented in&#10;commercial relational database systems although the operators&#10;solve important problems and many efficient algorithms&#10;for them exist. 
We present algebraic laws that allow&#10;rewriting expressions containing small or great divide,&#10;illustrate their importance for query optimization, and discuss&#10;the use of great divide for frequent itemset discovery,&#10;an important data mining primitive.A recent theoretic result shows that small divide must be&#10;implemented by special purpose algorithms and not be simulated&#10;by pure relational algebra expressions to achieve efficiency.&#10;Consequently, an efficient implementation requires&#10;that the optimizer treats small divide as a first-class operator&#10;and possesses powerful algebraic laws for query rewriting.</abstract></paper><paper><title>R-trees with Update Memos</title><author><AuthorName>Xiaopeng Xiong</AuthorName><institute><InstituteName>Purdue Universit</InstituteName><country></country></institute></author><author><AuthorName>Walid G. Aref</AuthorName><institute><InstituteName>Purdue Universit</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>The problem of frequently updating multi-dimensional&#10;indexes arises in many location-dependent applications.&#10;While the R-tree and its variants are one of the dominant&#10;choices for indexing multi-dimensional objects, the R-tree&#10;exhibits inferior performance in the presence of frequent updates.&#10;In this paper, we present an R-tree variant, termed&#10;the RUM-tree (stands for R-tree with Update Memo) that&#10;minimizes the cost of object updates. The RUM-tree processes&#10;updates in a memo-based approach that avoids disk&#10;accesses for purging old entries during an update process.&#10;Therefore, the cost of an update operation in the RUM-tree&#10;reduces to the cost of only an insert operation. The removal&#10;of old object entries is carried out by a garbage cleaner inside&#10;the RUM-tree. In this paper, we present the details of&#10;the RUM-tree and study its properties. Theoretical analysis&#10;and experimental evaluation demonstrate that the RUMtree&#10;outperforms other R-tree variants by up to a factor of&#10;eight in scenarios with frequent updates.</abstract></paper><paper><title>Compiled Query Execution Engine using JVM</title><author><AuthorName>Jun Rao</AuthorName><institute><InstituteName>IBM Almaden Research Cente</InstituteName><country></country></institute></author><author><AuthorName>Hamid Pirahesh</AuthorName><institute><InstituteName>IBM Almaden Research Cente</InstituteName><country></country></institute></author><author><AuthorName>C. Mohan</AuthorName><institute><InstituteName>IBM Almaden Research Cente</InstituteName><country></country></institute></author><author><AuthorName>Guy Lohman</AuthorName><institute><InstituteName>IBM Almaden Research Cente</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>A conventional query execution engine in a&#10;database system essentially uses a SQL virtual&#10;machine (SVM) to interpret a dataflow tree in&#10;which each node is associated with a relational&#10;operator. During query evaluation, a single tuple&#10;at a time is processed and passed among the&#10;operators. Such a model is popular because of its&#10;efficiency for pipelined processing. However,&#10;since each operator is implemented statically, it&#10;has to be very generic in order to deal with all&#10;possible queries. 
Such generality tends to&#10;introduce significant runtime inefficiency,&#10;especially in the context of memory-resident&#10;systems, because the granularity of data&#10;processing (a tuple) is too small compared with&#10;the associated overhead. Another disadvantage in&#10;such an engine is that each operator code is&#10;compiled statically, so query-specific&#10;optimization cannot be applied.&#10;To improve runtime efficiency, we propose a&#10;compiled execution engine, which, for a given&#10;query, generates new query-specific code on the&#10;fly, and then dynamically compiles and executes&#10;the code. The Java platform makes our approach&#10;particularly interesting for several reasons: (1)&#10;modern Java Virtual Machines (JVM) have Just-&#10;In-Time (JIT) compilers that optimize code at&#10;runtime based on the execution pattern, a key&#10;feature that SVMs lack; (2) because of Java&#8217;s&#10;continued popularity, JVMs keep improving at a&#10;faster pace than SVMs, allowing us to exploit&#10;new advances in the Java runtime in the future;&#10;(3) Java is a dynamic language, which makes it&#10;convenient to load a piece of new code on the&#10;fly. In this paper, we develop both an interpreted&#10;and a compiled query execution engine in a&#10;relational, Java-based, in-memory database&#10;prototype, and perform an experimental study.&#10;Our experimental results on the TPC-H data set&#10;show that, despite both engines benefiting from&#10;JIT, the compiled engine runs on average about&#10;twice as fast as the interpreted one, and&#10;significantly faster than an in-memory&#10;commercial system, using SVM.</abstract></paper><paper><title>\ell -Diversity: Privacy Beyond \kappa -Anonymity</title><author><AuthorName>Ashwin Machanavajjhala</AuthorName><institute><InstituteName>Cornell University</InstituteName><country></country></institute></author><author><AuthorName>Johannes Gehrke</AuthorName><institute><InstituteName>Cornell University</InstituteName><country></country></institute></author><author><AuthorName>Daniel Kifer</AuthorName><institute><InstituteName>Cornell University</InstituteName><country></country></institute></author><author><AuthorName>Muthuramakrishnan Venkitasubramaniam</AuthorName><institute><InstituteName>Cornell University</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>Publishing data about individuals without revealing sensitive&#10;information about them is an important problem. In&#10;recent years, a new definition of privacy called \kappa-anonymity&#10;has gained popularity. In a \kappa-anonymized dataset, each&#10;record is indistinguishable from at least k-1 other records&#10;with respect to certain &quot;identifying&quot; attributes.&#10;In this paper we show with two simple attacks that a&#10;\kappa-anonymized dataset has some subtle, but severe privacy&#10;problems. First, we show that an attacker can discover the&#10;values of sensitive attributes when there is little diversity&#10;in those sensitive attributes. Second, attackers often have&#10;background knowledge, and we show that \kappa-anonymity does&#10;not guarantee privacy against attackers using background&#10;knowledge. We give a detailed analysis of these two attacks&#10;and we propose a novel and powerful privacy definition&#10;called \ell-diversity. 
In addition to building a formal&#10;foundation for \ell-diversity, we show in an experimental evaluation&#10;that \ell-diversity is practical and can be implemented&#10;efficiently.&#10;&#10;&#10;&#10;&#10;&#10;&#10;&#10; &#10;&#10;</abstract></paper><paper><title>Mondrian Multidimensional K-Anonymity</title><author><AuthorName>Kristen LeFevre</AuthorName><institute><InstituteName>University of Wisconsin, Madiso</InstituteName><country></country></institute></author><author><AuthorName>David J. DeWitt</AuthorName><institute><InstituteName>University of Wisconsin, Madiso</InstituteName><country></country></institute></author><author><AuthorName>Raghu Ramakrishnan</AuthorName><institute><InstituteName>University of Wisconsin, Madiso</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>K-Anonymity has been proposed as a mechanism for protecting&#10;privacy in microdata publishing, and numerous recoding&#10;&quot;models&quot; have been considered for achieving anonymity.&#10;This paper proposes a new multidimensional&#10;model, which provides an additional degree of flexibility not&#10;seen in previous (single-dimensional) approaches. Often&#10;this flexibility leads to higher-quality anonymizations, as&#10;measured both by general-purpose metrics and more specific&#10;notions of query answerability.&#10;Optimal multidimensional anonymization is NP-hard&#10;(like previous optimal anonymity problems). However,&#10;we introduce a simple greedy approximation algorithm,&#10;and experimental results show that this greedy algorithm&#10;frequently leads to more desirable anonymizations than&#10;exhaustive optimal algorithms for two single-dimensional&#10;models.</abstract></paper><paper><title>Sovereign Joins</title><author><AuthorName>Rakesh Agrawal</AuthorName><institute><InstituteName>IBM Almaden Research Cente</InstituteName><country></country></institute></author><author><AuthorName>Dmitri Asonov</AuthorName><institute><InstituteName>IBM Almaden Research Cente</InstituteName><country></country></institute></author><author><AuthorName>Murat Kantarcioglu</AuthorName><institute><InstituteName>The University of Texas at Dalla</InstituteName><country></country></institute></author><author><AuthorName>Yaping Li</AuthorName><institute><InstituteName>University of California, Berkele</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>We present a secure network service for sovereign information&#10;sharing whose only trusted component is an off-theshelf&#10;secure coprocessor. The participating data providers&#10;send encrypted relations to the service that sends the encrypted&#10;results to the recipients. The technical challenge in&#10;implementing such a service arises from the limited capability&#10;of the secure coprocessors: they have small memory,&#10;no attached disk, and no facility for communicating directly&#10;with other machines in the network. The internal state of an&#10;ongoing computation within the secure coprocessor cannot&#10;be seen from outside, but its interactions with the server can&#10;be exploited by an adversary.&#10;We formulate the problem of computing join in this&#10;setting where the goal is to prevent information leakage&#10;through patterns in I/O while maximizing performance. 
We&#10;specify criteria for proving the security of a join algorithm&#10;and provide provably safe algorithms. These algorithms&#10;can be used to compute general joins involving arbitrary&#10;predicates and multiple sovereign databases. We thus enable&#10;a new class of applications requiring query processing&#10;across sovereign entities such that nothing apart from the&#10;result is revealed to the recipients.</abstract></paper><paper><title>Privacy Preserving Query Processing Using Third Parties</title><author><AuthorName>Fatih Emekci</AuthorName><institute><InstituteName>University of California Santa Barbar</InstituteName><country></country></institute></author><author><AuthorName>Divyakant Agrawal</AuthorName><institute><InstituteName>University of California Santa Barbar</InstituteName><country></country></institute></author><author><AuthorName>Amr El Abbadi</AuthorName><institute><InstituteName>University of Califonia Santa Barbar</InstituteName><country></country></institute></author><author><AuthorName>Aziz Gulbeden</AuthorName><institute><InstituteName>University of California Santa Barbar</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>Data integration from multiple autonomous data sources&#10;has emerged as an important practical problem. The key requirement&#10;for such data integration is that owners of such&#10;data need to cooperate in a competitive landscape in most&#10;of the cases. The research challenge in developing a query&#10;processing solution is that the answers to the queries need&#10;to be provided while preserving the privacy of the data&#10;sources. In general, allowing unrestricted read access to&#10;the whole data may give rise to potential vulnerabilities as&#10;well as may have legal implications. Therefore, there is a&#10;need for privacy preserving database operations for querying&#10;data residing at different parties. In this paper, we propose&#10;a new query processing technique using third parties in&#10;a peer-to-peer system. We propose and evaluate two different&#10;protocols for various database operations. Our scheme&#10;is able to answer queries without revealing any useful information&#10;to the data sources or to the third parties. Analytical&#10;comparison of the proposed approach with other recent proposals&#10;for privacy-preserving data integration establishes&#10;the superiority of the proposed approach in terms of query&#10;response time</abstract></paper><paper><title>Efficient Batch Top-k Search for Dictionary-based Entity Recognition</title><author><AuthorName>Amit Chandel</AuthorName><institute><InstituteName>IIT Bomba</InstituteName><country></country></institute></author><author><AuthorName>P. C. Nagesh</AuthorName><institute><InstituteName>IIT Bomba</InstituteName><country></country></institute></author><author><AuthorName>Sunita Sarawagi</AuthorName><institute><InstituteName>IIT Bomba</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>We consider the problem of speeding up Entity Recognition&#10;systems that exploit existing large databases of structured&#10;entities to improve extraction accuracy. These systems&#10;require the computation of the maximum similarity scores of&#10;several overlapping segments of the input text with the entity&#10;database. 
We formulate a Batch-Top-K problem with&#10;the goal of sharing computations across overlapping segments.&#10;Our proposed algorithm performs a factor of three&#10;faster than independent Top-K queries and only a factor of&#10;two slower than an unachievable lower bound on total cost.&#10;We then propose a novel modification of the popular Viterbi&#10;algorithm for recognizing entities so as to work with easily&#10;computable bounds on match scores, thereby reducing the&#10;total inference time by a factor of eight compared to stateof-&#10;the-art methods.</abstract></paper><paper><title>Integrating Unstructured Data into Relational Databases</title><author><AuthorName>Imran R. Mansuri</AuthorName><institute><InstituteName>IIT Bomba</InstituteName><country></country></institute></author><author><AuthorName>Sunita Sarawagi</AuthorName><institute><InstituteName>IIT Bomba</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>In this paper we present a system for automatically integrating&#10;unstructured text into a multi-relational database&#10;using state-of-the-art statistical models for structure extraction&#10;and matching. We show how to extend current highperforming&#10;models, Conditional Random Fields and their&#10;semi-markov counterparts, to effectively exploit a variety of&#10;recognition clues available in a database of entities, thereby&#10;significantly reducing the dependence on manually labeled&#10;training data. Our system is designed to load unstructured&#10;records into columns spread across multiple tables in the&#10;database while resolving the relationship of the extracted&#10;text with existing column values, and preserving the cardinality&#10;and link constraints of the database. We show how to&#10;combine the inference algorithms of statistical models with&#10;the database imposed constraints for optimal data integration.</abstract></paper><paper><title>Clean Answers over Dirty Databases: A Probabilistic Approach</title><author><AuthorName>Periklis Andritsos</AuthorName><institute><InstituteName>Univesity of Trent</InstituteName><country></country></institute></author><author><AuthorName>Ariel Fuxman</AuthorName><institute><InstituteName>University of Toront</InstituteName><country></country></institute></author><author><AuthorName>Renee J. Miller</AuthorName><institute><InstituteName>University of Toront</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>The detection of duplicate tuples, corresponding to the&#10;same real-world entity, is an important task in data integration&#10;and cleaning. While many techniques exist to identify&#10;such tuples, the merging or elimination of duplicates can be&#10;a difficult task that relies on ad-hoc and often manual solutions.&#10;We propose a complementary approach that permits&#10;declarative query answering over duplicated data, where&#10;each duplicate is associated with a probability of being in&#10;the clean database. We rewrite queries over a database containing&#10;duplicates to return each answer with the probability&#10;that the answer is in the clean database. 
Our rewritten&#10;queries are sensitive to the semantics of duplication and help&#10;a user understand which query answers are most likely to be&#10;present in the clean database.&#10;The semantics that we adopt is independent of the way&#10;the probabilities are produced, but is able to effectively exploit&#10;them during query answering. In the absence of external&#10;knowledge that associates each database tuple with a&#10;probability, we offer a technique, based on tuple summaries,&#10;that automates this task. We experimentally study the performance&#10;of our rewritten queries. Our studies show that&#10;the rewriting does not introduce a significant overhead in&#10;query execution time. This work is done in the context of&#10;the ConQuer project at the University of Toronto, which focuses&#10;on the efficient management of inconsistent and dirty&#10;databases.</abstract></paper><paper><title>Syntactic Rule Based Approach toWeb Service Composition</title><author><AuthorName>Ken Pu</AuthorName><institute><InstituteName>University of Toront</InstituteName><country></country></institute></author><author><AuthorName>Vagelis Hristidis</AuthorName><institute><InstituteName>Florida National International Universit</InstituteName><country></country></institute></author><author><AuthorName>Nick Koudas</AuthorName><institute><InstituteName>University of Toront</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>This paper studies a problem of web service composition&#10;from a syntactic approach. In contrast with other approaches&#10;on enriched semantic description such as statetransition&#10;description of web services, our focus is in the&#10;case when only the input-output type information from the&#10;WSDL specifications is available.&#10;The web service composition problem is formally formulated&#10;as deriving a given desired type from a collection&#10;of available types and web services using a prescribed&#10;set of rules with costs. We show that solving the minimal&#10;cost composition is NP-complete in general, and present&#10;a practical solution based on dynamic programming. Experiements&#10;using a mixture of synthetic and real data sets&#10;show that our approach is viable and produces good results.</abstract></paper><paper><title>Hilda: A High-Level Language for Data-DrivenWeb Applications</title><author><AuthorName>Fan Yang</AuthorName><institute><InstituteName>Cornell Universit</InstituteName><country></country></institute></author><author><AuthorName>Jayavel Shanmugasundaram</AuthorName><institute><InstituteName>Cornell Universit</InstituteName><country></country></institute></author><author><AuthorName>Mirek Riedewald</AuthorName><institute><InstituteName>Cornell Universit</InstituteName><country></country></institute></author><author><AuthorName>Johannes Gehrke</AuthorName><institute><InstituteName>Cornell Universit</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>We propose Hilda, a high-level language for developing&#10;data-driven web applications. 
The primary benefits of Hilda&#10;over existing development platforms are: (a) it uses a unified&#10;data model for all layers of the application, (b) it is&#10;declarative, (c) it models both application queries and updates,&#10;(d) it supports structured programming for web sites,&#10;and (e) it enables conflict detection for concurrent updates.&#10;We also describe the implementation of a simple proof-ofconcept&#10;Hilda compiler, which translates a Hilda application&#10;program into Java Servlet code.</abstract></paper><paper><title>UNIT: User-centric Transaction Management in Web-Database Systems</title><author><AuthorName>Huiming Qu</AuthorName><institute><InstituteName>University of Pittsburg</InstituteName><country></country></institute></author><author><AuthorName>Alexandros Labrinidis</AuthorName><institute><InstituteName>University of Pittsburg</InstituteName><country></country></institute></author><author><AuthorName>Daniel Mosse</AuthorName><institute><InstituteName>University of Pittsburg</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>Web-database systems are nowadays an integral part of&#10;everybody&#8217;s life, with applications ranging from monitoring/&#10;trading stock portfolios, to personalized blog aggregation&#10;and news services, to personalized weather tracking&#10;services. For most of these services to be successful (and&#10;their users to be kept satisfied), two criteria need to be met:&#10;user requests must be answered in a timely fashion and using&#10;fresh data. This paper presents a framework to balance&#10;both requirements from the users&#8217; perspective. Toward this,&#10;we propose a user satisfaction metric to measure the overall&#10;effectiveness of the Web-database system. We also provide&#10;a set of algorithms to dynamically optimize this metric,&#10;through query admission control and update frequency&#10;modulation. Finally, we present extensive experimental results&#10;which compare our proposed algorithms to the current&#10;state of the art and show that we outperform competitors&#10;under various workloads (generated based on real traces)&#10;and user requirements.</abstract></paper><paper><title>VBI-Tree: A Peer-to-Peer Framework for Supporting Multi-Dimensional Indexing Schemes</title><author><AuthorName>H. V. Jagadish</AuthorName><institute><InstituteName>University of Michiga</InstituteName><country></country></institute></author><author><AuthorName>Beng Chin Ooi</AuthorName><institute><InstituteName>NUS, Singapor</InstituteName><country></country></institute></author><author><AuthorName>Quang Hieu Vu</AuthorName><institute><InstituteName>NUS, Singapor</InstituteName><country></country></institute></author><author><AuthorName>Rong Zhang</AuthorName><institute><InstituteName>Fudan University, Chin</InstituteName><country></country></institute></author><author><AuthorName>Aoying Zhou</AuthorName><institute><InstituteName>Fudan University, Chin</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>Multi-dimensional data indexing has received much attention&#10;in a centralized database. However, not so much&#10;work has been done on this topic in the context of Peerto-&#10;Peer systems. 
In this paper, we propose a new Peer-to-&#10;Peer framework based on a balanced tree structure overlay,&#10;which can support extensible centralized mapping methods&#10;and query processing based on a variety of multidimensional&#10;tree structures, including R-Tree, X-Tree, SSTree,&#10;and M-Tree. Specifically, in a network with N nodes,&#10;our framework guarantees that point queries and range&#10;queries can be answered within O(logN) hops. We also&#10;provide an effective load balancing strategy to allow nodes&#10;to balance their work load efficiently. An experimental assessment&#10;validates the practicality of our proposal.</abstract></paper><paper><title>Transaction Time Support Inside a Database Engine</title><author><AuthorName>David Lomet</AuthorName><institute><InstituteName>Microsoft Researc</InstituteName><country></country></institute></author><author><AuthorName>Roger Barga</AuthorName><institute><InstituteName>Microsoft Researc</InstituteName><country></country></institute></author><author><AuthorName>Mohamed F. Mokbel</AuthorName><institute><InstituteName>University of Minnesot</InstituteName><country></country></institute></author><author><AuthorName>German Shegalov</AuthorName><institute><InstituteName>Max Planck Institute, German</InstituteName><country></country></institute></author><year>2006</year><conference>International Conference on Data Engineering</conference><citation></citation><abstract>Transaction time databases retain and provide access&#10;to prior states of a database. An update &quot;inserts&quot; a new&#10;record while preserving the old version. Immortal DB&#10;builds transaction time database support into a database&#10;engine, not in middleware. It supports as of queries returning&#10;records current at the specified time. It also supports&#10;snapshot isolation concurrency control. Versions are&#10;stamped with the &quot;clock times&quot; of their updating transactions.&#10;The timestamp order agrees with transaction serialization&#10;order. Lazy timestamping propagates timestamps&#10;to transaction updates after commit. Versions are kept in&#10;an integrated storage structure, with historical versions initially&#10;stored with current data. Time-splits of pages permit&#10;large histories to be maintained, and enable time based indexing,&#10;which is essential for high performance historical&#10