From: GzLi (笑梨), Board: DataMining
Subject: [Collection] How are materialized views implemented in a data warehouse?
Site: Nanjing University Lily BBS (Sat Sep 21 12:46:30 2002), local post
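netsaint (大圣) wrote on Tue Sep 10 16:13:29 2002:
Many articles only talk about this in the abstract. How is it actually implemented?
Do you define your own format, or use a format someone else has already defined?
Any pointers or resources are welcome.
shg_w@yahoo.com.cn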
fervvac (高远) wrote on Tue Sep 10 22:46:30 2002:
What is your question?
To compute the MV, you simply modify existing cube computation algorithms to
skip the cuboids that do not need to be materialized. The modification is
straightforward for both top-down and bottom-up algorithms.
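A minimal Python sketch of the idea, assuming a SUM measure and facts given as (dimension-value tuple, measure) pairs; the function names and sample data are hypothetical. It builds the base cuboid, then computes only the selected cuboids, largest first, each rolled up from the smallest already computed cuboid that contains its dimensions. No cuboid outside the selection (other than the base) is ever built.

```python
from collections import defaultdict


def roll_up(cuboid, src_dims, dst_dims):
    """Aggregate a cuboid (dict: key tuple over src_dims -> SUM) down to dst_dims."""
    positions = [src_dims.index(d) for d in dst_dims]
    out = defaultdict(int)
    for key, total in cuboid.items():
        out[tuple(key[p] for p in positions)] += total
    return dict(out)


def compute_selected_cuboids(facts, dims, selected):
    """Materialize only the cuboids listed in `selected` (SUM measure, hypothetical API).

    facts:    list of (tuple of dimension values, measure)
    dims:     names of all dimensions, in the order used by the fact tuples
    selected: cuboids to materialize, each a tuple of dimension names
    """
    dims = tuple(dims)
    # The base cuboid (all dimensions) is the starting point for everything else.
    base = defaultdict(int)
    for key, measure in facts:
        base[tuple(key)] += measure
    computed = {dims: dict(base)}

    # Larger cuboids first, so they can serve as parents of smaller ones.
    for cuboid in sorted({tuple(c) for c in selected}, key=len, reverse=True):
        if cuboid in computed:
            continue
        # Candidate parents: already computed cuboids containing all of its dimensions.
        parents = [p for p in computed if set(cuboid) <= set(p)]
        parent = min(parents, key=lambda p: len(computed[p]))  # fewest rows to scan
        computed[cuboid] = roll_up(computed[parent], list(parent), list(cuboid))

    # Only the requested cuboids are returned; unselected ones were never computed.
    return {tuple(c): computed[tuple(c)] for c in selected}
```

Rolling up from any computed superset is valid here because SUM is distributive; a holistic measure such as MEDIAN would have to go back to the detailed data instead.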
If you are asking about the physical storage format, I think the commercial
systems still store the data in relational tables.
There is another line of approaches that use a multidimensional model for the
cube/cuboids. They store the precomputed results in chunks.
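netsaint (大圣) wrote on Wed Sep 11 09:28:32 2002:
I mean the physical storage model. Many systems now use chunks, but that approach is fairly hard to implement.
If the results are stored directly in relational tables, each combination of dimensions has its own aggregate values, so it is hard to settle on a uniform set of columns for the table.
I was thinking of storing (the aggregated data) in a temporary file such as a .txt file, but that seems even more troublesome.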
fervvac (高远) wrote on Wed Sep 11 14:33:12 2002:
For any serious implementation, it is a bad idea to store data in text
format.
First of all, you need to know which type of system you have: ROLAP or MOLAP.
Chunks are only used in MOLAP, and relations are only used in ROLAP.
The basic method in ROLAP is to store the data in relational table(s).
You can either create one table with all the dimension attributes plus the measure
attribute, or use multiple tables, one per cuboid. So there is no
fundamental difficulty there.
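A minimal sketch of the single-table option using Python's built-in sqlite3 module, assuming a hypothetical `sales(product, store, amount)` fact table. Every cuboid goes into the same `cube` table, and a dimension that has been aggregated away is stored as NULL, standing for "ALL", so the columns are the same for every combination of dimensions. This assumes the base data itself has no NULL dimension values; otherwise a dedicated "ALL" marker value is needed.

```python
import sqlite3
from itertools import combinations

# Hypothetical schema: fact table sales(product, store, amount).
DIMS = ("product", "store")


def build_cube_table(conn):
    """Store every cuboid in one table; NULL in a dimension column means 'ALL'."""
    conn.execute("CREATE TABLE cube (product TEXT, store TEXT, total REAL)")
    for k in range(len(DIMS), -1, -1):
        for cuboid in combinations(DIMS, k):
            cols = ", ".join(d if d in cuboid else "NULL" for d in DIMS)
            group_by = " GROUP BY " + ", ".join(cuboid) if cuboid else ""
            conn.execute(f"INSERT INTO cube SELECT {cols}, SUM(amount) FROM sales{group_by}")


def lookup(conn, **fixed):
    """Fetch one aggregate; dimensions not given are taken as aggregated away."""
    where, params = [], []
    for d in DIMS:
        if d in fixed:
            where.append(f"{d} = ?")
            params.append(fixed[d])
        else:
            where.append(f"{d} IS NULL")
    row = conn.execute("SELECT total FROM cube WHERE " + " AND ".join(where), params).fetchone()
    return row[0] if row else None
```

For example, `lookup(conn, product="p1")` reads the per-product cuboid, and `lookup(conn)` reads the grand total. The one-table-per-cuboid alternative avoids the NULL convention at the cost of managing many tables.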
For chunks, I am not sure how commercial systems actually implement them.
For most prototype systems, the problem is that an "extra-long" integer type,
and efficient arithmetic on it, is needed. The space trade-off is another issue.
Moreover, an appropriate compression method has to be chosen.
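To make the chunk and integer-size points concrete, here is a hedged Python sketch (hypothetical names, not taken from any commercial system) of row-major chunk addressing, with each chunk stored sparsely as an offset-to-value map so that empty cells take no space. Python integers are arbitrary-precision, so the cell-count overflow that would force an "extra-long" type in C does not show up here; four dimensions of 100,000 values each already give 10^20 cells, more than a 64-bit integer can hold.

```python
from math import prod


def cell_count(cardinalities):
    """Number of cells in the full array; grows multiplicatively with the dimensions."""
    return prod(cardinalities)


def chunk_address(coord, cardinalities, chunk_sizes):
    """Map a cell coordinate to (chunk_id, offset within that chunk), both row-major."""
    chunk_id, offset = 0, 0
    for x, n, c in zip(coord, cardinalities, chunk_sizes):
        chunks_along_dim = -(-n // c)  # ceil(n / c)
        chunk_id = chunk_id * chunks_along_dim + x // c
        offset = offset * c + x % c
    return chunk_id, offset


class ChunkedArray:
    """Hypothetical sparse chunked array: only non-empty cells are kept, per chunk."""

    def __init__(self, cardinalities, chunk_sizes):
        self.cardinalities = tuple(cardinalities)
        self.chunk_sizes = tuple(chunk_sizes)
        self.chunks = {}  # chunk_id -> {offset: value}

    def set(self, coord, value):
        cid, off = chunk_address(coord, self.cardinalities, self.chunk_sizes)
        self.chunks.setdefault(cid, {})[off] = value

    def get(self, coord, default=0):
        cid, off = chunk_address(coord, self.cardinalities, self.chunk_sizes)
        return self.chunks.get(cid, {}).get(off, default)
```

Storing (offset, value) pairs only pays off when a chunk is sparse; a dense chunk is cheaper to keep as a plain array, which is one form of the space trade-off mentioned above.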
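netsaint (大圣) wrote on Wed Sep 11 21:34:38 2002:
Ordinary ROLAP systems do store this kind of data in relational tables.
My reason for putting the data in temporary files is this:
keep the files resident in memory, so that each request first queries the in-memory file,
and only if the result is not found there does it go to the database and search the relational tables.
Is this feasible?
Also, how should the format of this temporary file be standardized, and how should the data be read back out of it (it seems to require reading and parsing strings)?
My head is spinning.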
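A minimal sketch of the lookup order described above, using Python's sqlite3 module as a stand-in for the database; the `mv(cuboid, cell, total)` table and the `CubeCache` class are hypothetical. The in-memory part is bounded with an LRU policy, on the assumption that the whole set of aggregates may not fit in memory.

```python
import sqlite3
from collections import OrderedDict


class CubeCache:
    """Answer aggregate lookups from memory first, falling back to the database.

    Hypothetical table: mv(cuboid TEXT, cell TEXT, total REAL).
    """

    def __init__(self, conn, capacity=10_000):
        self.conn = conn
        self.capacity = capacity
        self.cache = OrderedDict()  # (cuboid, cell) -> total, least recently used first
        self.db_queries = 0

    def get(self, cuboid, cell):
        key = (cuboid, cell)
        if key in self.cache:  # 1) try memory
            self.cache.move_to_end(key)
            return self.cache[key]
        self.db_queries += 1  # 2) fall back to the relational table
        row = self.conn.execute(
            "SELECT total FROM mv WHERE cuboid = ? AND cell = ?", key
        ).fetchone()
        value = row[0] if row else None  # misses are cached too (as None)
        self.cache[key] = value
        if len(self.cache) > self.capacity:  # evict the least recently used entry
            self.cache.popitem(last=False)
        return value
```

Keeping the cached values as Python objects in a dictionary, rather than as a text file in memory, avoids the string parsing step entirely.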
fervvac (高远) wrote on Thu Sep 12 11:52:42 2002:
1. An MV/cube is usually much larger than the available memory, so you cannot
put it entirely in memory.
To quickly return a precomputed result from the MV/cube, you need to index
it. Either traditional indexes or newer ones can be used. Related
techniques include the UB-tree, the Cubetree, etc.
Ross has an interesting paper in SSDBM (2000?) about actively caching
part of the cube in memory.
2. Not sure why you think it is difficult to store your results in a file. This
method, although slower, only requires basic knowledge of C/C++. Find any
C/C++ book and read the I/O part. BTW, you need to map dimension values to
integers first.
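A minimal Python sketch of the file-based approach, with hypothetical names: string dimension values are first mapped to dense integer ids, each cell is written as a fixed-size binary record with the `struct` module, and records are kept sorted so a lookup can binary-search the file instead of parsing text. This is just one simple way to index such a file; it is not the UB-tree or the Cubetree.

```python
import os
import struct
import tempfile


def record_format(n_dims):
    """n int32 dimension ids followed by one float64 measure (little-endian, no padding)."""
    return struct.Struct("<" + "i" * n_dims + "d")


def encode_dims(rows, n_dims):
    """Map each dimension's string values to dense integer ids 0, 1, 2, ..."""
    dictionaries = [{} for _ in range(n_dims)]
    encoded = []
    for values, measure in rows:
        ids = tuple(dictionaries[i].setdefault(v, len(dictionaries[i]))
                    for i, v in enumerate(values))
        encoded.append((ids, measure))
    return dictionaries, encoded


def write_cuboid(path, encoded, rec):
    """Write fixed-size binary records sorted by their dimension ids."""
    with open(path, "wb") as f:
        for ids, measure in sorted(encoded):
            f.write(rec.pack(*ids, measure))


def find_cell(path, ids, rec):
    """Binary search over the sorted records; returns the measure or None."""
    lo, hi = 0, os.path.getsize(path) // rec.size
    with open(path, "rb") as f:
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * rec.size)
            *key, measure = rec.unpack(f.read(rec.size))
            key = tuple(key)
            if key < ids:
                lo = mid + 1
            elif key > ids:
                hi = mid
            else:
                return measure
    return None


if __name__ == "__main__":
    rows = [(("p1", "s1"), 10.0), (("p2", "s1"), 7.0), (("p1", "s2"), 5.0)]
    dicts, enc = encode_dims(rows, 2)
    rec = record_format(2)
    path = os.path.join(tempfile.mkdtemp(), "cuboid.bin")
    write_cuboid(path, enc, rec)
    print(find_cell(path, (dicts[0]["p1"], dicts[1]["s2"]), rec))  # 5.0
```

The dictionaries returned by `encode_dims` have to be saved alongside the data file, since queries arrive with string values that must be translated to ids before the search.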
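netsaint (大圣) wrote on Sat Sep 14 09:22:56 2002:
I have been reading Ross's paper recently.
He uses a two-level storage scheme:
the first level stores "high-value" tuples, i.e., coarse-grained data, and the second level stores the most detailed data. This is an in-memory data structure he defined recently,
and it is also array-based.
But I don't know how he implements the arrays.
Does he rely on some existing middleware for storage?
If not, how is a multidimensional array implemented?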
fervvac (高远) wrote on Sun Sep 15 03:25:57 2002:
That's the SSDBM paper.
As I remember, the in-memory data structure is just hash tables.
As for the multidimensional array: linearize it.
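A short sketch of what "linearize it" can mean, in Python with a hypothetical `DenseArray` class: a d-dimensional array of shape (n1, ..., nd) is stored in one flat list, and index (i1, ..., id) maps to offset ((i1*n2 + i2)*n3 + i3)*... + id, i.e. row-major order. This is a generic technique and not necessarily what Ross's paper does.

```python
from math import prod


class DenseArray:
    """A d-dimensional array stored in one flat list (row-major linearization)."""

    def __init__(self, shape, fill=0):
        self.shape = tuple(shape)
        self.data = [fill] * prod(self.shape)

    def offset(self, index):
        """((i1*n2 + i2)*n3 + i3)*... + id, with bounds checking."""
        if len(index) != len(self.shape):
            raise IndexError(f"expected {len(self.shape)} indices, got {len(index)}")
        off = 0
        for i, n in zip(index, self.shape):
            if not 0 <= i < n:
                raise IndexError(index)
            off = off * n + i
        return off

    def __getitem__(self, index):
        return self.data[self.offset(index)]

    def __setitem__(self, index, value):
        self.data[self.offset(index)] = value
```

For example, in a 2 x 3 x 4 array, `a[1, 2, 3]` lives at flat position 1*12 + 2*4 + 3 = 23. For sparse cuboids, the same offset can instead serve as the key of a hash table (a Python dict), which matches the hash-table structure fervvac recalls.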