From: fervvac (高远), Board: DataMining
Subject: Re: Handling large data sets?
Posted at: Nanjing University Lily BBS (Thu Dec 27 14:12:22 2001), on-site mail
I see. I am not sure which part is used in training, but I do remember the
10-fold term you mentioned.
I think there are two different questions here: one is the data preprocessing
method used in classification, the other is the one used in other tasks,
say, AR (association rule) mining. The n-fold methods should be common practice
for testing classifiers. For AR mining on large data sets, there are some
approximate mining algorithms, for example, approaches that use sampling.
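
As a rough illustration of the n-fold idea (9 parts for training, 1 for testing, repeated so that every part is tested once), here is a minimal Python sketch. The function name `k_fold_indices` and its `seed` parameter are made up for this example; it only produces record indices and does not train any classifier.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_records))
    # Shuffle once so the folds do not depend on the storage order.
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # All remaining folds together form the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each record index ends up in exactly one test fold, so averaging the k test errors uses every record for testing once.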
However, I wonder if there is any AR mining algorithm that needs only
one scan over the data set? Even for FP-Tree based methods (which are believed
to be the fastest), several scans over the data (maybe not the original data
set, but the projected DBs) are necessary if memory is not enough.
And for incremental mining, you cannot stop at any arbitrary moment, can you?
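
To make the "one scan, interrupt, continue later" property concrete, here is a minimal Python sketch for a much simpler task than AR mining: keeping a running mean and variance with Welford's update. This is not the method from the cited paper and not an AR mining algorithm; the class name `RunningStats` and the JSON checkpoint format are assumptions made for illustration only.

```python
import json


class RunningStats:
    """Mean and variance maintained in one sequential pass (Welford's update)."""

    def __init__(self, count=0, mean=0.0, m2=0.0):
        self.count = count
        self.mean = mean
        self.m2 = m2  # sum of squared deviations from the current mean

    def update(self, x):
        # Each record is seen once and then discarded.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Population variance; 0.0 until at least one record has been seen.
        return self.m2 / self.count if self.count else 0.0

    def save(self):
        # The partial result is just three numbers, whatever the data size.
        return json.dumps({"count": self.count, "mean": self.mean, "m2": self.m2})

    @classmethod
    def load(cls, text):
        return cls(**json.loads(text))


records = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

stats = RunningStats()
for x in records[:3]:        # scan the first few records, then stop
    stats.update(x)
checkpoint = stats.save()    # partial result kept while interrupted

resumed = RunningStats.load(checkpoint)
for x in records[3:]:        # later: continue with the remaining records
    resumed.update(x)
```

This works because the state needed to continue is a small, fixed-size summary. Whether frequent itemsets admit such a summary is exactly the open question above.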
I am sorry I didn't read that paper before asking those possibly silly
questions, but I am really busy with other papers, :-( Hope you don't mind...
[ roamingo (漫步鸥) wrote: ]
: Alternatively, one can use 9 parts as training set and 1 for test,
: a.k.a. n-fold cross-validation.
: However, an ideal data mining algorithm should scan the database,
: sequentially starting from the first record, ONLY once (one scan).
: It could be interrupted in the middle and save a partial result,
: and at a later time, be continued with the remaining records and refine
: the result (incremental). With these nice attributes, a large dataset
: is not a problem.
: (Argued in: Paul S. Bradley, U. Fayyad, C. Reina. Scaling EM
: (Expectation Maximization) Clustering to Large Databases.
: Revised October 1999. Microsoft Research tech. report No. MSR-TR-98-35.
: http://citeseer.nj.nec.com/bradley99scaling.html )
: [ fervvac (高远) wrote: ]
: : I know some papers use the following approach:
: : Divide the data into 10 parts, train your classifier (etc.) using one par..
: : test against the following parts. Do this 10 times.
--
※ Source: Nanjing University Lily BBS bbs.nju.edu.cn [FROM: 饮水思源BBS]