From: roamingo (漫步鸥), Board: DataMining
Subject: Re: Handling large datasets?
Posted: Nanjing University Lily BBS (Wed Dec 26 21:36:43 2001), on-site article
Alternatively, one can use 9 parts as the training set and 1 part as the
test set, i.e. standard n-fold cross-validation (here n = 10).
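A minimal sketch of the fold-splitting step of n-fold cross-validation (the function name and the toy sizes are illustrative, not from any particular library):

```python
# Sketch of n-fold cross-validation index splitting: each fold serves
# once as the test set while the other n-1 folds form the training set.

def kfold_indices(n_items, n_folds):
    """Yield (train_idx, test_idx) pairs for n-fold cross-validation."""
    fold_size = n_items // n_folds
    indices = list(range(n_items))
    for k in range(n_folds):
        start = k * fold_size
        # the last fold absorbs any remainder
        end = (k + 1) * fold_size if k < n_folds - 1 else n_items
        test_idx = indices[start:end]
        train_idx = indices[:start] + indices[end:]
        yield train_idx, test_idx

# Example: 10 records, 5 folds -> each test fold has 2 records,
# each training set the remaining 8.
for train_idx, test_idx in kfold_indices(10, 5):
    assert len(test_idx) == 2 and len(train_idx) == 8
```

In a real run one would train the classifier on `train_idx`, evaluate on `test_idx`, and average the n scores.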
However, an ideal data mining algorithm should scan the database
sequentially, starting from the first record, only ONCE (one scan).
It should be interruptible in the middle, saving a partial result,
and at a later time be able to continue with the remaining records
and refine the result (incremental). With these properties, large
datasets are not a problem.
(Argued in: Paul S. Bradley, U. Fayyad, C. Reina. Scaling EM
(Expectation Maximization) Clustering to Large Databases.
Revised October 1999. Microsoft Research tech. report No. MSR-TR-98-35.
http://citeseer.nj.nec.com/bradley99scaling.html )
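To make the three properties concrete, here is a toy illustration (NOT the Bradley-Fayyad-Reina EM algorithm from the cited report; just a running mean/variance via Welford's online update, whose partial state can be checkpointed and resumed):

```python
# Toy one-scan, interruptible, incremental computation: Welford's
# online mean/variance. The (n, mean, m2) triple is the "partial
# result" that can be saved mid-scan and resumed later.

class RunningStats:
    def __init__(self, n=0, mean=0.0, m2=0.0):
        self.n, self.mean, self.m2 = n, mean, m2   # resumable partial result

    def update(self, x):                           # one record at a time
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def state(self):                               # checkpoint mid-scan
        return (self.n, self.mean, self.m2)

# Scan the "database" once; interrupt after 3 records, save, resume.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
s = RunningStats()
for x in data[:3]:
    s.update(x)
saved = s.state()                                  # partial result persisted

s2 = RunningStats(*saved)                          # later: continue the scan
for x in data[3:]:
    s2.update(x)
# s2.mean is now 5.0, s2.m2 / s2.n is the population variance 4.0
```

Each record is touched exactly once, and the checkpoint is constant-size regardless of how many records have been scanned, which is what makes the approach viable for large databases.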
【 In the post by fervvac (高远): 】
: I know some paper uses the following approach:
: Divide the data into 10 parts, train your classifier (etc.) using one part and
: test against the following parts. Do this 10 times.
--
Read digitally, save a tree.
※ Source: Nanjing University Lily BBS bbs.nju.edu.cn [FROM: 202.120.7.27]