http:^^www.cs.cornell.edu^info^people^praveen^projects^seq.html
来自「This data set contains WWW-pages collect」· HTML 代码 · 共 336 行
HTML
336 行
MIME-Version: 1.0
Server: CERN/3.0
Date: Sunday, 24-Nov-96 22:06:18 GMT
Content-Type: text/html
Content-Length: 12061
Last-Modified: Monday, 12-Aug-96 19:50:24 GMT
<HTML><HEAD><TITLE>SEQ Home Page</TITLE><H1>The SEQ Project: Querying Sequence Data</H1><H5>(Document under construction)</H5><HR></HEADER><BODY background="background.gif" TEXT="#000001"><!-- <img src="control.gif" align=middle> --><H3><em><img src="../pics/redball.gif">Time to put Order in the Database!<img src="../pics/redball.gif"><p><img src="../pics/blueball.gif">Order Time put in the Database!<img src="../pics/blueball.gif"><p><img src="../pics/greenball.gif">Time to put the Database in Order!<img src="../pics/greenball.gif"></em></H3><HR><H2>Document Contents:</H2><H4><UL><LI><A href="#Objective">Project Objectives</A><LI><A href="#Status">Current Status</A><LI><A href="#Example">Motivating Example</A><LI><A href="#Data Model">SEQ Data Model</A><LI><A href="#Language"><em> Sequin </em> Query Language</A><LI><A href="#Optimization">Optimization Techniques</A><LI><A href="#System">SEQ System Development</A><LI><A href="#Publications">Publications</A><LI><A href="#Related">Related Work</A><LI><A href="#Contacts">Contact Information</A></H4></UL><hr><A name="Objective"><H2>Project Objectives</H2></A><BLOCKQUOTE><H4> A number of important database applications require the processingof large amounts of ordered <em> sequence </em> data. The domains of theseapplications include financial management, historical analysis,economic and social sciences, metereology, medical sciences andbiological sciences. Existing relational databases are inadequate in this regard; data collections are treated as sets, not sequences. Consequently, expressing sequence queries is tedious, and evaluating them is inefficient.</H4></BLOCKQUOTE><H4><BLOCKQUOTE>Databases should <UL><LI>model the data using the abstraction of <em> sequences </em>,<LI>allow data sequences to be queried in a <em> declarative manner </em>,utilizing the ordered semantics<LI>take advantage of the unique opportunities available for query optimization and evaluation<LI>integrate sequence data with relational data, so that users canstore and query a combination of relation and sequences</UL>These requirements serve as the goals of the SEQ project.Various kinds of sequences need to be supported, temporal sequences being themost important kind. Queries should be expressible using notions like"next" and "previous" which are natural when considering sequences.These queries should be optimized so that they can be evaluated efficiently.These issues need to be studied in theory, and then a database system needs to be built that demonstrates the feasibility of the theoretical ideas.</BLOCKQUOTE></H4><hr><a name="Status"><H2>Project Status</H2></A><H4><BLOCKQUOTE>The current status of the project is:<ul><li>We have defined the <em> SEQ </em> data model that can support most important kinds of sequence data. We have also defined algebraicquery operators that can be composed to form sequence queries (analogousto the composition of relational algebra operators to form relation queries).<li>We have described how sequence queries can be efficiently processed,and have identified various optimization techniques.<li>We use a sequence query language <em> Sequin </em> that candeclaratively express queries over sequences. A <em> Sequin </em>query can include embedded expressions in a relational query language likeSQL, or vice-versa.<li>We are building a disk-based database system to demonstrate the feasibility of our proposals. The system implements the <em> SEQ </em> model using a nested complex object architecture. It is built over the SHORE storage manager and can process several megabytes of data.Relations and sequences are supported in an integrated and extensible manner.</ul></BLOCKQUOTE></H4><hr><a name="Example"><H2>Motivating Example of a Sequence Query</H2></A><p><H4><BLOCKQUOTE>A weather monitoring system records information about various meteorological phenomena. There is a sequentiality in the occurrence of these phenomena; the various meteorological events are sequenced by the time at which they are recorded. A scientist asks the query: <p><img src="../pics/redball.gif"> <em> "For which volcano eruptions didthe most recent earthquake have a strength greater than 7.0 on the Richter scale?"</em>. <p>If this query is to be expressed in a relational query language like SQL, complex features like groupby clauses, correlated subqueries and aggregatefunctions are required. Further, a conventional relational query optimizer would not find an efficient query execution plan, even given the knowledge that the Earthquakes and Volcano relations are sorted by time.<p>However a very efficient plan exists, if one models the data as sequencesordered by time. The two sequences can be scanned in lock step (similar to a sort merge join). The most recent earthquake record scanned can be stored in a temporary buffer. Whenever a volcano record is processed, the value of the most recent earthquake record stored in the buffer is checked to see if its strength was greater than 7.0, possibly generating an answer. This query can therefore be processed with a single scan of the two sequences, and using very little memory. The key to such optimization is the sequentiality of the data and the query. </BLOCKQUOTE></H4><hr><A NAME="Data Model"><H2>Data Model</H2></A><H4><BLOCKQUOTE>The details of the <em> SEQ </em> data model aredescribed in a published paper (click <a href="../papers/seq.de95.ps">here </a>for postscript version). Here we present the gist of it.The basic model of a sequence is a set of records mapped to an ordereddomain of ``positions''.This many-to-many relationship between records andpositions can be viewed in two dual but distinct ways: as a set of recordsmapped to each position, or as a set of positions mapped to each record.These two views are called ``Positional'' and ``Record-Oriented'' respectively,and each gives rise to a set of query operators based on that view.Queries on sequences could require operators of either or both flavors. The Record-Oriented operators are similar to relationaloperators and include various kinds of joins (overlap, containment, etc) andaggregates. Such operators have been extensively explored by researchersin the temporal database community.<p><img src="sequence.gif" ALT="(Picture of Sequence Mapping)"><p>The Positional operators include Next, Previous, Offset, MovingAggregates, etc. Further operators allow ``zooming'' operations on sequences by means of collapsing and expanding the ordering domains associated with the sequence. For instance, a daily sequence could be ``zoomed out'' (i.e.collapsed) to a weekly sequence, or ``zoomed in'' (i.e. expanded) to an hourly sequence. The last part of the model deals with operations on groups (i.e. sets) of sequences. The advantage is that this makes it easy to model queries involving sequence collections (which is the case in many real-worldsituations). All the sequence operators are extended to work with groups of similar sequences, instead of with single sequences. This extensionof the SEQ model indicates that a practical implementation ofSEQ would probably involve a nested complex object system.</BLOCKQUOTE></H4><hr><A NAME="Language"><H2><em> Sequin </em> Query Language</H2></A><H4><BLOCKQUOTE>We have devised a query language called <em> Sequin </em> usingwhich declarative sequence queries can be specified. The languageis similar in flavor to SQL, except that the inputs to queriesas well as the results of queries are sequences. Click <a href="sequin.doc"> here </a> for a description of the <em> Sequin </em>language with examples.</BLOCKQUOTE></H4><hr><A NAME="Optimization"><H2>Optimization Techniques</H2></A><H4><BLOCKQUOTE>We have proposed new optimization techniques for sequence queries involving Positional operators. There are existing techniques thathave been proposed for queries with Record-Oriented operators.Our optimizations use query transformations, meta--data, and caching of intermediate results to efficiently evaluate a query. An optimal query evaluation plan can be generated using an algorithm that relies on cost estimates. One of the important observations is that accessing sequence data in a single stream is probably very efficient, and evaluation strategies should take this into account.<p>The details of the optimization techniques aredescribed in a published paper (click <a href="../papers/seq.sigmod94.ps">here </a>for postscript version).</BLOCKQUOTE></H4><hr><A NAME="System"><H2>System Development</H2></A><H4><BLOCKQUOTE>The SEQ database system has a client-serverarchitecture, supporting multiple clients viaa multi-threaded server. The server is built ontop of the <a href="http://www.cs.wisc.edu/shore/">SHORE </a> storage manager. Both <em> Sequin </em>and a subset of SQL are supported as query languageswhich can be embedded inside each other. The data modelis a nested complex object model that allows arbitrarylevels of nesting of relations inside sequences and viceversa. The system is also extensible, providing supportfor new data types, new ordering domains, user-defined functions, new storage implementations and new query languages. For more detailson the SEQ system, click <a href="seqsys.html">here</a>.</BLOCKQUOTE></H4><hr><A NAME="Publications"><H2>Publications</H2></A><BLOCKQUOTE><H4><a href="../papers/seq.sigmod94.ps"><img src="../pics/redball.gif">Sequence Query Processing </a>Praveen Seshadri, Miron Livny and Raghu Ramakrishnan.Proceedings of the <em> ACM SIGMOD Conference on Data Management</em>, May 1994.<p><p><a href="../papers/seq.de95.ps"><img src="../pics/redball.gif">SEQ: A Framework for Sequence Data</a>Praveen Seshadri, Miron Livny and Raghu Ramakrishnan.Proceedings of the <em> IEEE Conference on Data Engineering</em>, March 1995.<p><p><a href="../papers/seq.sigmod96.ps"><img src="../pics/redball.gif">The Design and Implementation of a Sequence Database System</a>Praveen Seshadri, Miron Livny and Raghu Ramakrishnan.Submitted to VLDB 96.<p><p><a href="http://www.cs.wisc.edu/~mcheng/devise/paper2.ps"><img src="../pics/redball.gif">What's Next? Sequence Queries</a>Raghu Ramakrishnan, Michael Cheng, Miron Livny, and Praveen Seshadri.In Proceedings of the <em> International Conference on the Management of Data (COMAD)</em>, December, 1994.<p></BLOCKQUOTE></H4><hr><A NAME="Related"><H2>Related Work</H2></A><H4><BLOCKQUOTE>The<A HREF="http://www.cs.wisc.edu/~mcheng/devise/devise.html"><B>DEVise</B></A> project is complementary to SEQ. It provides a visualizationenvironment that can be used to explore sequence data. DEVise can act asa front-end through which queries can be posed against a SEQ database server,and the answers can be examined graphically.<p>Also see:<UL> <LI> <A HREF="http://www.cs.wisc.edu/shore/">SHORE Project: Storage Manager used for SEQ</A> <LI> <A HREF="people.html">People working on SEQ and related projects</A> <LI> <A HREF="http://www.cs.wisc.edu/~kristint/dbmshome.html">UW-Madison, Database Research Group</A> <LI> <A HREF="http://www.cs.wisc.edu/">UW-Madison, CS Department Server</A></UL></BLOCKQUOTE></H4><hr><A NAME="Contacts"><H2>Contact Information</H2></A><H4><BLOCKQUOTE>For more information, contact <p><A HREF="http://www.cs.wisc.edu/~praveen/praveen.html">Praveen Seshadri</A>,praveen@cs.wisc.edu<p><A HREF="http://www.cs.wisc.edu/~raghu/raghu.html">Raghu Ramakrishnan</A>,raghu@cs.wisc.edu<p><A HREF="http://www.cs.wisc.edu/~miron/miron.html">Miron Livny</A>,miron@cs.wisc.edu<p><ADDRESS>Computer Sciences Department, <br>University of Wisconsin, <br>1210, W.Dayton Street, <br>Madison, WI 53706.</ADDRESS></BLOCKQUOTE></H4><HR>Last modified: Fri Sep 15 1995 by Praveen Seshadri<ADDRESS>Praveen Seshadri / praveen@cs.wisc.edu</ADDRESS></BODY></HTML>
⌨️ 快捷键说明
复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?