feasible solution based on individual optimal query plans.
We also map the materialized view design problem to a 0-1 integer
programming problem, whose solution can guarantee an optimal solution.</abstract></paper><paper><title>An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server.</title><author><AuthorName>Surajit Chaudhuri</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Vivek R. Narasayya</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><year>1997</year><conference>International Conference on Very Large Data Bases</conference><citation><name>Adaptive and Automated Index Selection in RDBMS.</name><name>Physical Database Design for Relational Databases.</name><name>Index Selection for OLAP.</name><name>Index Selection in a Self-Adaptive Data Base Management System.</name><name>Materialized View Maintenance and Integrity Constraint Checking: Trading Space for Time.</name><name>Implementing Data Cubes Efficiently.</name><name>Physical Database Design for Data Warehouses.</name><name>A Framework for Automating Physical Database Design.</name></citation><abstract>In this paper we describe novel techniques that make it
possible to build an industrial-strength tool for automating
the choice of indexes in the physical design of a SQL
database. The tool takes as input a workload of SQL queries,
and suggests a set of suitable indexes. We ensure that the
indexes chosen are effective in reducing the cost of the
workload by keeping the index selection tool and the query
optimizer "in step". The number of index sets that must be
evaluated to find the optimal configuration is very large. We
reduce the complexity of this problem using three
techniques. First, we remove a large number of spurious
indexes from consideration by taking into account both query
syntax and cost information. Second, we introduce
optimizations that make it possible to cheaply evaluate the
"goodness" of an index set. Third, we describe an iterative
approach to handle the complexity arising from multicolumn
indexes. The tool has been implemented on Microsoft SQL Server 7.0.
We performed extensive experiments over a range of workloads,
including TPC-D.
The results indicate that the tool is efficient and its choices are
close to optimal.</abstract></paper><paper><title>Materialized Views Selection in a Multidimensional Database.</title><author><AuthorName>Elena Baralis</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Stefano Paraboschi</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Ernest Teniente</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><year>1997</year><conference>International Conference on Very Large Data Bases</conference><citation><name>On the Computation of Multidimensional Aggregates.</name><name>Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total.</name><name>Index Selection for OLAP.</name><name>Selection of Views to Materialize in a Data Warehouse.</name><name>Implementing Data Cubes Efficiently.</name><name>The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses.
John Wiley 1996, ISBN 0-471-15337-0</name><name>Materialized View Maintenance and Integrity Constraint Checking: Trading Space for Time.</name><name>Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies.</name><name>View Maintenance in a Warehousing Environment.</name></citation><abstract>A multidimensional database is a data repository that supports the
efficient execution of complex business decision queries. Query
response can be significantly improved by storing an appropriate set
of materialized views. These views are selected from the
multidimensional lattice whose elements represent the solution space
of the problem.
Several techniques have been proposed in the past to perform the
selection of materialized views for databases with a reduced number of
dimensions. When the number and complexity of dimensions increase,
the proposed techniques do not scale well.
The technique we are proposing reduces the solution space by
considering only the relevant elements of the multidimensional
lattice. An additional statistical analysis allows a further
reduction of the solution space.</abstract></paper><paper><title>Efficient Construction of Regression Trees with Range and Region Splitting.</title><author><AuthorName>Yasuhiko Morimoto</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Hiromu Ishii</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Shinichi Morishita</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><year>1997</year><conference>International Conference on Very Large Data Bases</conference><citation><name>Polynomial-Time Solutions to Image Segmentation.</name><name>Mining Association Rules between Sets of Items in Large Databases.</name><name>Fast Algorithms for Mining Association Rules in Large Databases.</name><name>Classification and Regression Trees.
Wadsworth 1984, ISBN 0-534-98053-8</name><name>Mining Optimized Association Rules for Numeric Attributes.</name><name>Data Mining Using Two-Dimensional Optimized Association Rules: Scheme, Algorithms, and Visualization.</name><name>Constructing Efficient Decision Trees by Using Optimized Numeric Association Rules.</name><name>Discovery of Multiple-Level Association Rules from Large Databases.</name><name>SLIQ: A Fast Scalable Classifier for Data Mining.</name><name>An Effective Hash Based Algorithm for Mining Association Rules.</name><name>Knowledge Discovery in Databases.
AAAI/MIT Press 1991, ISBN 0-262-62080-4</name><name>Discovery, Analysis, and Presentation of Strong Rules.</name><name>Induction of Decision Trees.</name><name>C4.5: Programs for Machine Learning.</name><name>Mining Quantitative Association Rules in Large Relational Tables.</name><name>Computing Optimized Rectilinear Regions for Association Rules.</name></citation><abstract>We propose an efficient way of constructing regression trees in order to
predict the objective numeric attribute values of given tuples. A
regression tree is a rooted binary tree such that each internal node
contains a test, which can be expressed as an RDB query, for splitting
tuples into two disjoint classes and passing data in each class down to
the left or right subtree. The mean of the objective attribute values at
the leaf is used as the predicted value of the tuple.
To test a numeric attribute, traditional approaches use a guillotine-cut
splitting that classifies data into those below a given value and
others. Instead, we consider a family ℛ of grid-regions in the
plane associated with two given numeric attributes. We propose to use a
test that splits data into those that lie inside a region R in ℛ and those
that lie outside.
The contributions of this paper are as follows. We present an efficient
algorithm for computing the region R in ℛ that minimizes the mean squared
error after the introduction of the test with the region R.
Experiments confirmed that the use of region splitting gives a smaller
mean squared error of regression trees. Our approach can also generate
smaller regression trees.</abstract></paper><paper><title>Parallel Algorithms for High-dimensional Similarity Joins for Data Mining Applications.</title><author><AuthorName>John C. Shafer</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Rakesh Agrawal</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><year>1997</year><conference>International Conference on Very Large Data Bases</conference><citation><name>Efficient Similarity Search In Sequence Databases.</name><name>Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases.</name><name>Parallel Processing of Spatial Joins Using R-trees.</name><name>Efficient Processing of Spatial Joins Using R-Trees.</name><name>The Gamma Database Machine Project.</name><name>Multiattribute Hashing Using Gray Codes.</name><name>Fast Subsequence Matching in Time-Series Databases.</name><name>Linear Clustering of Objects with Multiple Attributes.</name><name>Size Separation Spatial Join.</name><name>Generating Seeded Trees from Data Sets.</name><name>Spatial Hash-Joins.</name><name>The Grid File: An Adaptable, Symmetric Multikey File Structure.</name><name>A Class of Data Structures for Associative Searching.</name><name>Partition Based Spatial-Merge Join.</name><name>High-Dimensional Similarity Joins.</name></citation><abstract>We consider the problem of parallelizing high-dimensional
proximity joins. We present
a parallel multidimensional join algorithm
based on an epsilon-kdB tree and compare
it with the more common approach of space
partitioning. An evaluation of the algorithm
on an IBM SP2 shared-nothing multiprocessor
is presented using both synthetic and real-life
datasets. We also examine the effectiveness
of the algorithms in the context of a specific
data-mining problem, that of finding similar
time-series. The empirical results show that
our algorithm exhibits good performance and
scalability, as well as ability to handle data-skew.</abstract></paper><paper><title>STING: A Statistical Information Grid Approach to Spatial Data Mining.</title><author><AuthorName>Wei Wang</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Jiong Yang</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Richard R. Muntz</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><year>1997</year><conference>International Conference on Very Large Data Bases</conference><citation><name>Data Mining: An Overview from a Database Perspective.</name><name>Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification.</name><name>A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise.</name><name>From Data Mining to Knowledge Discovery in Databases.</name><name>Advances in Knowledge Discovery and Data Mining.
AAAI/MIT Press 1996, ISBN 0-262-56097-6</name><name>Spatial Analysis and GIS.
Taylor &amp; Francis 1994, ISBN 0-7484-0103-2</name><name>Extraction of Spatial Proximity Patterns by Concept Generalization.</name><name>Spatial Data Mining: Progress and Challenges.</name><name>Efficient and Effective Clustering Methods for Spatial Data Mining.</name><name>The Design and Analysis of Spatial Data Structures.
Addison-Wesley 1990</name><name>The Sequoia 2000 Benchmark.</name><name>BIRCH: An Efficient Data Clustering Method for Very Large Databases.</name></citation><abstract>Spatial data mining,
i.e., discovery of interesting characteristics and patterns
that may implicitly exist in spatial databases,
is a challenging task due to the huge amounts of spatial data and
to the new conceptual nature of the problems
which must account for spatial distance.
Clustering and region oriented queries are common problems in this domain.
Several approaches have been presented in recent years,
all of which require at least one scan of all individual objects (points).
Consequently, the computational cost of answering each query is at least
linear in the number of objects.
In this paper,
we propose a hierarchical statistical information grid based approach for spatial data mining to reduce the cost further.
The idea is to capture statistical information associated with spatial cells in such a manner that whole classes of queries and clustering problems can be answered without recourse to the individual objects.
In theory, and confirmed by empirical studies,
this approach outperforms the best previous method by at least an order of magnitude, especially when the data set is very large.</abstract></paper><paper><title>Merging Ranks from Heterogeneous Internet Sources.</title><author><AuthorName>Luis Gravano</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><author><AuthorName>Hector Garcia-Molina</AuthorName><institute><InstituteName></InstituteName><country></country></institute></author><year>1997</year><conference>International Conference on Very Large Data Bases</conference><citation><name>Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer.
Addison-Wesley 1989, ISBN 0-201-12227-8</name><name>The QBIC Project: Querying Images by Content, Using Color, Texture, and Shape.</name><name>Combining Fuzzy Information from Multiple Systems.</name><name>Optimizing Queries over Multimedia Repositories.</name><name>The Collection Fusion Problem.</name><name>Searching Distributed Collections with Inference Networks.</name><name>STARTS: Stanford Proposal for Internet Meta-Searching (Experience Paper).</name><name>Amalgame: A Tool for Creating Interoperating, Persistent, Heterogeneous Components.</name><name>A Query Translation Scheme for Rapid Implementation of Wrappers.</name></citation><abstract>Many sources on the Internet and elsewhere rank the objects in query
results according to how well these objects match the original query.
For example, a real-estate agent might rank the available houses
according to how well they match t