Predicates: Year=1997, Quarter={any except Q2}, Nation=USA</pre></td></tr><tr><th>Segment YQNS#1</th><td><pre>Year Quarter Nation State Unit Sales
1997 Q1      USA    OR    xxx
1997 Q1      USA    WA    xxx
1997 Q2      USA    OR    xxx
1997 Q2      USA    WA    xxx
Predicates: Year=1997, Quarter=any, Nation=USA, State={OR, WA}</pre></td></tr></table></blockquote><p>The effects are:<ul><li>Segment YN#1 has been deleted, because every cell in it could contain values for Oregon in 1997 Q2.</li><li>The constraints in YNS#1 have been strengthened. The constraint on the <code>State</code> column is modified from <code>State={OR, WA}</code> to <code>State={WA}</code> so that future requests for <code>(1997, Q2, USA, OR)</code> will not consider this segment.</li><li>The constraints in YQN#1 have been strengthened. The constraint on the <code>Quarter</code> column is modified from <code>Quarter=any</code> to <code>Quarter={any except Q2}</code>.</li><li>The constraints in YQNS#1 have been strengthened, similar to YNS#1.</li></ul><h4>3.2. More about cell regions<a name="More_about_cell_regions"> </a></h4><p>The previous example showed how to make a cell region consisting of a single member, and how to combine these regions into a two-dimensional region using a crossjoin. The CacheControl API supports several methods of creating regions:<ul> <li><code>createMemberRegion(Member, boolean)</code> creates a region containing a single member, optionally including its descendants.</li><li><code>createMemberRegion(boolean lowerInclusive, Member lowerMember, boolean upperInclusive, Member upperMember, boolean descendants)</code> creates a region containing a range of members, optionally including their descendants, and optionally including each endpoint. A range may be closed, or open at one end.</li><li><code>createCrossjoinRegion(CellRegion...)</code> combines several regions into a higher-dimensional region. 
The constituent regions must not have any dimensions in common.</li><li><code>createUnionRegion(CellRegion...)</code> unions several regions of the same dimensionality.</li><li><code>createMeasuresRegion(Cube)</code> creates a region containing all of the measures of a given cube.</li></ul><p>The second overload of <code>createMemberRegion()</code> is interesting because it allows a range of members to be flushed. Probably the most common use case for cache flush, flushing all cells since a given point in time, is expressed as a member range. For example, to flush all cells since February 15th, 2006, you would use the following code: <blockquote><code>// Look up members<br/>Cube salesCube =<br/> connection.getSchema().lookupCube(<br/> "Sales", true);<br/>SchemaReader schemaReader =<br/> salesCube.getSchemaReader(null);<br/>Member memberTimeFeb15 =<br/> schemaReader.getMemberByUniqueName(<br/> Id.Segment.toList("Time", "2006", "Q1", "2", "15"),<br/> true);<br/><br/>// Create a cache region defined by<br/>// [Time].[2006].[Q1].[2].[15] to +infinity,<br/>// crossjoin it with all measures, and flush it.<br/>CacheControl.CellRegion measuresRegion =<br/> cacheControl.createMeasuresRegion(<br/> salesCube);<br/>CacheControl.CellRegion regionTimeFeb15 =<br/> cacheControl.createMemberRegion(<br/> true, memberTimeFeb15, false, null, true);<br/>cacheControl.flush(<br/> cacheControl.createCrossjoinRegion(<br/> measuresRegion, regionTimeFeb15));</code></blockquote><p>Recall that the cell cache is organized in terms of columns, not members, which makes member ranges difficult for Mondrian to implement. Expressed as a predicate on the year, quarter, month and day columns, a range such as "February 15th 2006 onwards" becomes<blockquote><code><pre>year > 2006 || (year = 2006 && (quarter > 'Q1' || (quarter = 'Q1' && (month > 2 || (month = 2 && day >= 15)))))</pre></code></blockquote><p>The region returned by <code>createMeasuresRegion(Cube)</code> effectively encompasses the whole cube. 
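Setting the whole-cube case aside for a moment, note that the nested predicate for a member range follows a fixed pattern: each column either exceeds the endpoint's value, or equals it and leaves the decision to the next finer column. The following Java sketch generates such a predicate as a string from the column names and the lower endpoint's values, ordered from coarsest to finest. It is a hypothetical illustration, not Mondrian code: <code>RangePredicate</code> and <code>onwards()</code> are our own names, and we assume that quarter values are stored as quoted strings.

```java
// Hypothetical sketch: builds the nested column predicate for a member range
// that is closed at its lower end and open at its upper end.
// RangePredicate and onwards() are illustrative names, not Mondrian API.
class RangePredicate {

    /**
     * Returns a predicate matching every row whose (columns[0], ..., columns[n-1])
     * tuple is lexicographically greater than or equal to the given values.
     * columns and values must be non-empty, of equal length, and ordered from
     * the coarsest level (e.g. year) to the finest (e.g. day).
     */
    static String onwards(String[] columns, String[] values) {
        int last = columns.length - 1;
        // The finest column only needs to reach the endpoint value.
        String expr = columns[last] + " >= " + values[last];
        boolean compound = false;
        // Work outwards: a coarser column either exceeds its endpoint value,
        // or equals it and defers to the finer columns.
        for (int i = last - 1; i >= 0; i--) {
            String inner = compound ? "(" + expr + ")" : expr;
            expr = columns[i] + " > " + values[i]
                + " || (" + columns[i] + " = " + values[i] + " && " + inner + ")";
            compound = true;
        }
        return expr;
    }

    public static void main(String[] args) {
        // Lower endpoint: February 15th, 2006 (quarter values assumed quoted).
        String predicate = onwards(
            new String[] {"year", "quarter", "month", "day"},
            new String[] {"2006", "'Q1'", "2", "15"});
        System.out.println(predicate);
    }
}
```

Each additional column adds one level of nesting, which is part of what makes member ranges awkward for a column-oriented cache. By contrast, the whole-cube region returned by <code>createMeasuresRegion(Cube)</code> needs no range predicate at all.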
To flush all cubes in the schema, use a loop:</p><blockquote><code>// Assumes connection is an open mondrian.olap.Connection<br/>CacheControl cacheControl =<br/> connection.getCacheControl(null);<br/>for (Cube cube : connection.getSchema().getCubes()) {<br/> cacheControl.flush(<br/> cacheControl.createMeasuresRegion(cube));<br/>}</code></blockquote><h4>3.3. Merging and truncating segments<a name="Merging_and_truncating_segments"> </a></h4><p>The current implementation does not actually remove the cells from memory. For instance, in segment YNS#1 in the example above, the cell (1997, USA, OR) is still in the segment, even though it will never be accessed. It doesn't seem worth the effort to rebuild the segment to save a little memory, but we may revisit this decision.</p><p>In future, one possible strategy would be to remove a segment if more than a given percentage of its cells are unreachable.</p><p>It might also be useful to be able to merge segments which have the same dimensionality, to reduce fragmentation if the cache is flushed repeatedly over slightly different bounds. There are some limitations on when this can be done, since each predicate constrains only one column. For example, the segments <code>{(State=TX, Quarter=Q2)}</code> and <code>{(State=WA, Quarter=Q3)}</code> cannot be merged into a single segment: the per-column predicates <code>State={TX, WA}</code> and <code>Quarter={Q2, Q3}</code> would also admit the cells <code>(TX, Q3)</code> and <code>(WA, Q2)</code>, which neither segment contains. An alternative solution to fragmentation would be to simply remove all segments of a particular dimensionality if fragmentation is detected.</p><h3>4. Other cache control topics<a name="Other_cache_control_topics"> </a></h3><h4>4.1. Flushing the dimension cache<a name="Flushing_the_dimension_cache"> </a></h4><p>An application might also want to make modifications to a dimension table. Mondrian does not currently allow an application to control the cache of members, but we intend to do so in the future. Here are some notes toward implementing it.</p> <p>The main way that Mondrian caches dimensions in memory is via a cache of member children. 
That is to say, for a given member, the cache holds the list of all children of that member.</p><p>If a dimension table row is inserted or deleted, or its key attributes are updated, its parent's child list needs to be modified, and perhaps the child lists of other ancestors too. For example, if a customer Zachary William is added in the city Oakland, the child list of Oakland will need to be flushed. If Zachary is the first customer in Oakland, California's child list will also need to be flushed to accommodate the new member Oakland.</p><p>There are a few other ways that members can be cached:<ul><li>Each hierarchy has a list of root members, an 'all' member (which may or may not be visible), and a default member (which may or may not be the 'all' member).</li><li>Formulas defined against a cube may reference members.</li><li>All other references to members are ephemeral: they are built up during the execution of a query, and are discarded when the query has finished executing and its result set is forgotten.</li></ul><p>Possible APIs might be <code>flushMember(Member, boolean children)</code> or <code>flushMembers(CellRegion)</code>.</p><h4>4.2. Cache consistency<a name="Cache_consistency"> </a></h4><p>Mondrian's cache implementation must solve several challenges in order to prevent inconsistent query results. 
Suppose, for example, a connection executes the query <blockquote><code>SELECT {[Measures].[Unit Sales]} ON COLUMNS,<br/> {[Gender].Members} ON ROWS<br/>FROM [Sales]</code></blockquote><p>It would be unacceptable if, due to updates to the underlying database, the query yielded a result where the total for [All gender] did not equal the sum of [Female] and [Male], such as</p><blockquote><table style="border-collapse: collapse" border="1"><tr><td> </td><th>Unit Sales</th></tr> <tr><th>All gender</th><td>100,000</td></tr><tr><th>Female</th><td>60,000</td></tr><tr><th>Male</th><td>55,000</td></tr></table></blockquote><p>We cannot guarantee that the query result is absolutely up to date, but the query must represent the state of the database at some point in time. To do this, the implementation must ensure that both cache flush and cache population are atomic operations.</p> <p>First, Mondrian's implementation must provide atomic cache flush, so that from the perspective of any client of the cache, a flush takes effect all at once. Suppose that while the above query is being executed, another connection issues a cache flush request. Since the flush request and query are simultaneous, it is acceptable for the query to return the state of the database before the flush request or after it, but not a mixture of the two.</p> <p>The query needs to use two aggregates: one containing total sales, and another containing sales sliced by gender. To see a consistent view of the two aggregates, the implementation must ensure that from the perspective of the query, both aggregates are flushed simultaneously. The query evaluator will therefore see either both aggregates or neither.</p><p>Second, Mondrian must provide atomic cache population, so that the database is read consistently. 
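The first requirement, atomic flush, can be pictured with immutable snapshots: a query pins one snapshot of the cached aggregates when it starts, and a flush publishes a replacement snapshot in a single step, so a running query keeps seeing every aggregate from before the flush while later queries see none of them. The following Java sketch is a hypothetical illustration of that idea under our own assumptions, not Mondrian's cache implementation; <code>AggregateCache</code>, <code>pin()</code> and <code>flushAll()</code> are our own names, and the values are illustrative. It does not address atomic population.

```java
// Hypothetical sketch of atomic cache flush using immutable snapshots.
// AggregateCache, pin() and flushAll() are illustrative, not Mondrian's cache.
class AggregateCache {

    /** Immutable set of cached aggregates, stored as parallel arrays. */
    static final class Snapshot {
        private final String[] names;
        private final long[] values;

        Snapshot(String[] names, long[] values) {
            this.names = names.clone();
            this.values = values.clone();
        }

        boolean contains(String name) {
            return indexOf(name) >= 0;
        }

        long get(String name) {
            int i = indexOf(name);
            if (i < 0) {
                throw new IllegalStateException("not cached: " + name);
            }
            return values[i];
        }

        private int indexOf(String name) {
            for (int i = 0; i < names.length; i++) {
                if (names[i].equals(name)) {
                    return i;
                }
            }
            return -1;
        }
    }

    // A single volatile reference: replacing it swaps every aggregate at once.
    private volatile Snapshot current;

    AggregateCache(String[] names, long[] values) {
        current = new Snapshot(names, values);
    }

    /** Called once per query; the query reads only from the returned snapshot. */
    Snapshot pin() {
        return current;
    }

    /** Flushes all aggregates together by publishing an empty snapshot. */
    void flushAll() {
        current = new Snapshot(new String[0], new long[0]);
    }

    public static void main(String[] args) {
        // Illustrative values in which the total equals the sum of its parts.
        AggregateCache cache = new AggregateCache(
            new String[] {"All gender", "Female", "Male"},
            new long[] {100000, 60000, 40000});
        Snapshot pinned = cache.pin();   // a query starts
        cache.flushAll();                // another connection flushes meanwhile
        // The running query still sees a consistent set of aggregates.
        System.out.println(
            pinned.get("Female") + pinned.get("Male") == pinned.get("All gender")); // prints true
        // Later queries see an empty cache and must re-read from the database.
        System.out.println(cache.pin().contains("Female")); // prints false
    }
}
```

A query that re-read the cache reference before each lookup could combine a total from before a flush with a breakdown from after it; reading the reference once per query rules that out.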
Consider an example.<ol><li>The end user runs a query asking for the total sales:<blockquote><table style="border-collapse: collapse" border="1"><tr><td> </td><th>Unit Sales</th></tr> <tr><th>All gender</th><td>100,000</td></tr></table></blockquote>After that query has completed, the cache contains the total sales but not the sales for each gender.</li><li>New sales are added to the fact table.</li><li>The end user runs a query which shows total sales and sales for male and female customers. The query uses the cached value for total sales, but issues a query to the fact table to find the totals for male and female, and sees different data than when the cache was last populated. As a result, the query is inconsistent:<blockquote><table style="border-collapse: collapse" border="1"><tr><td> </td><th>Unit Sales</th></tr> <tr><th>All gender</th><td>100,000</td></tr><tr><th>Female</th><td>60,000</td></tr><tr><th>Male</th><td>55,000</td></tr></table></blockquote></li></ol><p>Atomic cache population is difficult to ensure if the database is being modified without Mondrian's knowledge. One solution, not currently implemented, would be for Mondrian to leverage the DBMS's support for read-consistent views of the data. Read-consistent views are expensive for the DBMS to implement (in Oracle, for example, long-running queries that rely on them can fail with the infamous 'Snapshot too old' error), so we would not want Mondrian to use them by default, least of all on a database which is known not to be changing.</p><p>Another solution might be to extend the CacheControl API so that the application can say 'this part of the database is currently undergoing modification'.</p><p>This scenario has not even considered aggregate tables. We have assumed that aggregate tables do not exist, or if they do, they are updated in sync with the fact table. How to deal with aggregate tables which are maintained asynchronously is still an open question.</p><h4>4.3. 
Metadata cache control<a name="Metadata_cache_control"> </a></h4><p>The CacheControl API tidies up a raft of (mostly equivalent) methods which had grown up for controlling metadata (schema XML files loaded into memory). The methods<ul><li><code>mondrian.rolap.RolapSchema.clearCache()</code></li><li><code>mondrian.olap.MondrianServer.flushSchemaCache()</code></li><li><code>mondrian.rolap.cache.CachePool.flush()</code></li><li><code>mondrian.rolap.RolapSchema.flushRolapStarCaches(boolean)</code></li><li><code>mondrian.rolap.RolapSchema.flushAllRolapStarCachedAggregations()</code></li><li><code>mondrian.rolap.RolapSchema.flushSchema(String,String,String,String)</code></li><li><code>mondrian.rolap.RolapSchema.flushSchema(DataSource,String)</code></li></ul>are all deprecated and are superseded by the CacheControl methods <ul><li><code>void flushSchemaCache();</code></li><li><code>void flushSchema(String catalogUrl, String connectionKey, String jdbcUser, String dataSourceStr);</code></li><li><code>void flushSchema(String catalogUrl, DataSource dataSource);</code></li></ul><hr noshade size="1"/><p> Author: Julian Hyde; last modified by Julian Hyde, March 2008.<br/> Version: $Id: //open/mondrian-release/3.0/doc/cache_control.html#2 $ (<a href="http://p4web.eigenbase.org/open/mondrian/doc/cache_control.html?ac=22">log</a>)<br/> Copyright (C) 2006-2008 Julian Hyde</p><br /><!-- doc2web end --></body></html>