<html><!--  == $Id: //open/mondrian-release/3.0/doc/architecture.html#2 $  == This software is subject to the terms of the Common Public License  == Agreement, available at the following URL:  == http://www.opensource.org/licenses/cpl.html.  == Copyright (C) 2001-2002 Kana Software, Inc.  == Copyright (C) 2001-2007 Julian Hyde  == All Rights Reserved.  == You must accept the terms of that agreement to use this software.  == jhyde, 24 September, 2002  --><head>    <link rel="stylesheet" type="text/css" href="stylesheet.css"/>	<title>Pentaho Analysis Services: Mondrian Architecture</title></head><body><!-- doc2web start --><script><!-- Beginfunction popUpSnapsHoriz(URL) {day = new Date();id = day.getTime();eval("page" + id + " = window.open(URL, '" + id + "', 'toolbar=0,scrollbars=0,location=0,statusbar=0,menubar=0,resizable=1,width=1024,height=768');");}function popUpSnapsVert(URL) {day = new Date();id = day.getTime();eval("page" + id + " = window.open(URL, '" + id + "', 'toolbar=0,scrollbars=0,location=0,statusbar=0,menubar=0,resizable=1,width=1024,height=768');");}// End --></script><!-- page title --><div class="contentheading">Architecture</div><!-- end page title --><!--#######################################  Layers of a Mondrian system ########################################## --><h3>Layers of a Mondrian system</h3><p>A Mondrian OLAP System consists of four layers; working from the eyes of the end-user to the bowels of the data center, these are as follows: the presentation layer, the dimensional layer, the star layer, and the storage layer. (See <a href="#Figure_1:_Mondrian_architecture">figure 1</a>.)</p><p>The presentation layer determines what the end-user sees on his or her monitor, and how he or she can interact to ask new questions. 
There are many ways to present multidimensional datasets, including pivot tables (an interactive version of the table shown above), pie, line and bar charts, and advanced visualization tools such as clickable maps and dynamic graphics. These might be written in Swing or JSP, rendered as charts in JPEG or GIF format, or transmitted to a remote application via XML. What all of these forms of presentation have in common is the multidimensional 'grammar' of dimensions, measures and cells in which the presentation layer asks the question and the OLAP server returns the answer.</p><p>The second layer is the dimensional layer. The dimensional layer parses, validates and executes MDX queries. A query is evaluated in multiple phases. The axes are computed first, then the values of the cells within the axes. For efficiency, the dimensional layer sends cell-requests to the aggregation layer in batches. A query transformer allows the application to manipulate existing queries, rather than building an MDX statement from scratch for each request. Metadata describes the dimensional model and how it maps onto the relational model.</p><p>The third layer is the star layer, which is responsible for maintaining an aggregate cache. An aggregation is a set of measure values ('cells') in memory, qualified by a set of dimension column values. The dimensional layer sends requests for sets of cells. If the requested cells are neither in the cache nor derivable by rolling up an aggregation in the cache, the aggregation manager sends a request to the storage layer.</p><p>The storage layer is an RDBMS. It is responsible for providing aggregated cell data, and members from dimension tables. I describe <a href="#Storage_and_aggregation_strategies">below</a> why I decided to use the features of the RDBMS rather than developing a storage system optimized for multidimensional data.</p><p>These components can all exist on the same machine, or can be distributed between machines. 
Layers 2 and 3, which comprise the Mondrian server, must be on the same machine. The storage layer could be on another machine, accessed via remote JDBC connection. In a multi-user system, the presentation layer would exist on each end-user's machine (except in the case of JSP pages generated on the server).</p><a name="Figure_1:_Mondrian_architecture">&nbsp;</a><table   width="500" class="whiteTable">	<tr>		<td align="center">			<table   width="200" class="whiteTable">				<tr>					<td>						<a href="javascript:popUpSnapsVert('http://mondrian.pentaho.org/images/arch_mondrian_v1_lrg.png')">						<img border="0" alt="Mondrian architecture" src="images/arch_mondrian_v1_tn.png" width="200" height="147"></a>					</td>				</tr>				<tr>					<td>						<table class="whiteTable" align="center">							<tr>								<td width="19">									<a href="javascript:popUpSnapsVert('http://mondrian.pentaho.org/images/arch_mondrian_v1_lrg.png')">									<img height="15" alt="Zoom" src="images/zoom.png" width="14" border="0" /></a>								</td>								<td>									<a href="javascript:popUpSnapsVert('http://mondrian.pentaho.org/images/arch_mondrian_v1_lrg.png')">										Zoom									</a>								</td>							</tr>						</table>					</td>				</tr>			</table>		</td>		<td align="center">			<table   width="200" class="whiteTable">				<tr>					<td>						<a href="javascript:popUpSnapsVert('http://mondrian.pentaho.org/images/architecture2_big.png')">						<img border="0" alt="Mondrian architecture (hand-drawn)" src="images/arch_mondrian_sketch_tn.png" width="201" height="158" align="middle"></a>					</td>				</tr>				<tr>					<td>						<table class="whiteTable" align="center">							<tr>								<td width="19">									<a href="javascript:popUpSnapsVert('http://mondrian.pentaho.org/images/architecture2_big.png')">									<img height="15" alt="Zoom" src="images/zoom.png" width="14" border="0" /></a>								</td>								<td>									<a 
href="javascript:popUpSnapsVert('http://mondrian.pentaho.org/images/architecture2_big.png')">										Zoom									</a>								</td>							</tr>						</table>					</td>				</tr>			</table>		</td>	</tr></table><!--##############################################  Storage and aggregation strategies ################################################# --><h3>Storage and aggregation strategies<a name="Storage_and_aggregation_strategies">&nbsp;</a></h3><p>OLAP servers are generally categorized according to how they store their data:</p><ul>	<li>A MOLAP (multidimensional OLAP) server stores all of its data on disk in structures optimized for multidimensional access. Typically, data is stored in dense arrays, requiring only 4 or 8 bytes per cell value. </li>	<li>A ROLAP (relational OLAP) server stores its data in a relational database. Each row in a fact table has a column for each dimension and measure. </li></ul><p>Three kinds of data need to be stored: fact table data (the transactional records), aggregates, and dimensions.</p><p>MOLAP databases store fact data in multidimensional format, but if there are more than a few dimensions, this data will be sparse, and the multidimensional format does not perform well. A HOLAP (hybrid OLAP) system solves this problem by leaving the most granular data in the relational database and storing aggregates in multidimensional format.</p><p>Pre-computed aggregates are necessary for large data sets; otherwise certain queries could not be answered without reading the entire contents of the fact table. MOLAP aggregates are often an image of the in-memory data structure, broken up into pages and stored on disk. ROLAP aggregates are stored in tables. 
In some ROLAP systems these are explicitly managed by the OLAP server; in other systems, the tables are declared as materialized views, and they are implicitly used when the OLAP server issues a query with the right combination of columns in the <code>GROUP BY</code> clause.</p><p>The final component of the aggregation strategy is the cache. The cache holds pre-computed aggregations in memory so subsequent queries can access cell values without going to disk. If the cache holds the required data set at a lower level of aggregation, it can compute the required data set by rolling up.</p><p>The cache is arguably the most important part of the aggregation strategy because it is adaptive. It is difficult to choose a set of aggregations to pre-compute that speeds up the system without using huge amounts of disk, particularly when the data has high dimensionality or when users submit unpredictable queries. And in a system where data is changing in real time, it is impractical to maintain pre-computed aggregates. A reasonably sized cache can allow a system to perform adequately in the face of unpredictable queries, with few or no pre-computed aggregates.</p><p>Mondrian's aggregation strategy is as follows:</p><ul>	<li>Fact data is stored in the RDBMS. Why develop a storage manager when the RDBMS already has one? </li>	<li>Aggregate data is read into the cache by submitting <code>GROUP BY</code> queries. Again, why develop an aggregator when the RDBMS has one? </li>	<li>If the RDBMS supports materialized views, and the database administrator chooses to create materialized views for particular aggregations, then Mondrian will use them implicitly. Ideally, Mondrian's aggregation manager should be aware that these materialized views exist and that those particular aggregations are cheap to compute. It should even offer tuning suggestions to the database administrator. </li></ul><p>The general idea is to delegate unto the database what is the database's. 
This places an additional burden on the database, but once those features are added to the database, all clients of the database will benefit from them. Multidimensional storage would reduce I/O and result in faster operation in some circumstances, but I don't think it warrants the complexity at this stage.</p><p>A wonderful side effect is that because Mondrian requires no storage of its own, it can be installed by adding a JAR file to the class path and be up and running immediately. Because there are no redundant data sets to manage, the data-loading process is easier, and Mondrian is ideally suited to do OLAP on data sets that change in real time.</p>
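<p>The roll-up behaviour of the cache described above can be illustrated with a minimal Java sketch. It is not Mondrian's actual code: the class <code>RollupCacheSketch</code>, its nested <code>Aggregation</code> type and the <code>storageRequests</code> counter are hypothetical names chosen for this illustration, and it assumes an additive measure (a sum), since only such measures can be rolled up by adding cells together.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an aggregate cache with roll-up; not Mondrian's real classes. */
public class RollupCacheSketch {

    /** One cached aggregation: the columns it is grouped by, and one summed cell per key. */
    static final class Aggregation {
        final List<String> columns;
        final Map<List<String>, Double> cells = new LinkedHashMap<>();

        Aggregation(List<String> columns) {
            this.columns = columns;
        }
    }

    private final List<Aggregation> cache = new ArrayList<>();

    /** Counts cache misses; a real server would send a GROUP BY query at this point. */
    int storageRequests = 0;

    void put(Aggregation agg) {
        cache.add(agg);
    }

    /**
     * Returns the cells grouped by the wanted columns. If a cached aggregation
     * is grouped by a superset of those columns, its cells are rolled up by
     * summing. Returns null on a cache miss.
     */
    Map<List<String>, Double> get(List<String> wanted) {
        for (Aggregation agg : cache) {
            if (!agg.columns.containsAll(wanted)) {
                continue; // this aggregation lacks a column we need
            }
            Map<List<String>, Double> result = new LinkedHashMap<>();
            for (Map.Entry<List<String>, Double> cell : agg.cells.entrySet()) {
                // Project the finer-grained key onto the wanted columns.
                List<String> key = new ArrayList<>();
                for (String column : wanted) {
                    key.add(cell.getKey().get(agg.columns.indexOf(column)));
                }
                // Roll up: add together all cells that share the projected key.
                result.merge(key, cell.getValue(), Double::sum);
            }
            return result;
        }
        storageRequests++;
        return null;
    }

    public static void main(String[] args) {
        RollupCacheSketch cache = new RollupCacheSketch();
        Aggregation byYearAndState = new Aggregation(Arrays.asList("year", "state"));
        byYearAndState.cells.put(Arrays.asList("1997", "CA"), 10.0);
        byYearAndState.cells.put(Arrays.asList("1997", "OR"), 5.0);
        byYearAndState.cells.put(Arrays.asList("1998", "CA"), 7.0);
        cache.put(byYearAndState);

        // Answered from the cache by rolling up over "state".
        System.out.println(cache.get(Arrays.asList("year"))); // prints {[1997]=15.0, [1998]=7.0}

        // "month" is not cached at any level, so this is a miss.
        System.out.println(cache.get(Arrays.asList("month"))); // prints null
        System.out.println(cache.storageRequests);             // prints 1
    }
}
```

<p>In this sketch a request for cells by <code>year</code> is answered from the cached (<code>year</code>, <code>state</code>) aggregation without touching storage, whereas a request involving a column that is not cached counts as a miss, which is where a real server would issue a <code>GROUP BY</code> query to the RDBMS.</p>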
