From: roamingo (漫步鸥), Board: DataMining
Subject: Re: Handling large datasets?
Posted at: Nanjing University Lily BBS (Wed Dec 26 21:36:43 2001), local post

Alternatively, one can use 9 parts as the training set and 1 as the test set,
rotating through all 10 parts — i.e. 10-fold (more generally, k-fold)
cross-validation.
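A minimal sketch of the k-fold scheme described above, using only the
standard library. The majority-class "classifier" and the accuracy metric
are placeholders for illustration, not part of any particular paper's
method.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle record indices and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(records, labels, train_fn, predict_fn, k=10):
    """Train on k-1 folds, test on the held-out fold; repeat k times."""
    folds = k_fold_indices(len(records), k)
    scores = []
    for test_idx in folds:
        # Training set = every fold except the held-out one.
        train_idx = [j for f in folds if f is not test_idx for j in f]
        model = train_fn([records[j] for j in train_idx],
                         [labels[j] for j in train_idx])
        correct = sum(predict_fn(model, records[j]) == labels[j]
                      for j in test_idx)
        scores.append(correct / len(test_idx))
    return scores

# Toy classifier: always predict the most common training label.
def train_majority(xs, ys):
    return max(set(ys), key=ys.count)

def predict_majority(model, x):
    return model
```

Averaging the k held-out scores gives the usual cross-validated estimate
of generalization accuracy.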

However, an ideal data mining algorithm should scan the database,
sequentially starting from the first record, ONLY once (one scan).
It could be interrupted in the middle and save a partial result,
and at a later time be continued with the remaining records, refining
the result (incremental). With these nice properties, large datasets
are not a problem.
(Argued in: Paul S. Bradley, U. Fayyad, C. Reina. Scaling EM
(Expectation-Maximization) Clustering to Large Databases.
Microsoft Research Technical Report MSR-TR-98-35, revised October 1999.
http://citeseer.nj.nec.com/bradley99scaling.html )
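A toy illustration of the two properties above — each record is read
exactly once, and a partial result can be checkpointed and resumed.
This is not the Bradley/Fayyad/Reina EM algorithm itself; the "mining"
task here is just the running mean and variance of a numeric field,
maintained with Welford's online update.

```python
class OnlineStats:
    """Single-scan, interruptible statistics over a stream of records."""

    def __init__(self):
        self.n = 0        # records seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations

    def update(self, x):
        """Consume one record; each record is seen exactly once."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / self.n if self.n else 0.0

    def checkpoint(self):
        """Partial result that can be saved mid-scan..."""
        return (self.n, self.mean, self.m2)

    @classmethod
    def resume(cls, state):
        """...and restored later to continue with the remaining records."""
        s = cls()
        s.n, s.mean, s.m2 = state
        return s
```

Interrupting after some records, saving `checkpoint()`, and later calling
`resume()` on the rest yields exactly the same result as one uninterrupted
scan — the incremental property the post argues for.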

【 Quoting the post by fervvac (高远): 】
: I know some paper uses the following approach:
: Divide the data into 10 parts, train your classifier (etc.) using one part and
: test against the following parts.  Do this 10 times.

--
Read digitally, save a tree.

※ Source: Nanjing University Lily BBS bbs.nju.edu.cn [FROM: 202.120.7.27]
