From: GzLi (笑梨), Board: DataMining
Subject: [Digest] Several important concepts in Chapter 7
Posted at: Nanjing University Lily BBS (Tue Jan  7 11:02:10 2003)

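nohau (nohau) wrote on Thu Jan  2 15:58:28 2003:

Here is a list of several important concepts from Chapter 7.

I think these concepts are the main content of Chapter 7, and grasping them accurately is the basis for the next stage of study.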

Chapter 7 

Concepts:

7.2 

True error rate

True error: the true error of hypothesis h with respect to target concept c and distribution D is the probability that h will misclassify an instance drawn at random according to D.

       

Note:

The true error depends on the unknown distribution D.


Training error rate

Training error: the fraction of training examples misclassified by h.


Note:

1. The training error can be observed directly by the learner; the true error cannot.

2. The central question in analyzing the complexity of learning is: "how probable is it that the observed training error for h gives a misleading estimate of the true error?"


Sample error rate

Sample error (defined in Chapter 5): the sample error of a hypothesis with respect to some sample S of instances drawn from X is the fraction of S that it misclassifies.


Note:

If S is the set of training examples, the sample error is the training error.
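
The three error notions above can be compared on a toy problem. The sketch below is only an illustration under made-up assumptions: the instance space X, the uniform distribution D, the target concept c, the hypothesis h and the sample S are all hypothetical and do not come from the book.

```python
from fractions import Fraction

# Hypothetical toy setup: the instances are the integers 0..9.
X = range(10)

def c(x):
    """Target concept (assumed): x is at least 5."""
    return x >= 5

def h(x):
    """Hypothesis (assumed): x is at least 7, so it is wrong on 5 and 6."""
    return x >= 7

# Assumed distribution D over X: uniform, probability 1/10 per instance.
D = {x: Fraction(1, 10) for x in X}

def true_error(h, c, D):
    """Probability under D that h disagrees with c."""
    return sum(p for x, p in D.items() if h(x) != c(x))

def sample_error(h, c, S):
    """Fraction of the instances in S that h misclassifies."""
    return Fraction(sum(h(x) != c(x) for x in S), len(S))

S = [0, 1, 2, 8]              # a small training sample that misses 5 and 6
print(true_error(h, c, D))    # 1/5
print(sample_error(h, c, S))  # 0
```

Here the training error is 0 while the true error is 1/5, which is exactly the kind of misleading estimate the second note refers to. Note that true_error needs D, which a real learner does not know.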


PAC-learnable

Consider some class C of possible target concepts and a learner L using hypothesis space H. C is PAC-learnable by L using H if L satisfies two requirements. First, L must, with arbitrarily high probability (1-d), output a hypothesis having arbitrarily low error (e). Second, it must do so efficiently, in time that grows at most polynomially with 1/e and 1/d.


7.3

Sample complexity

Sample complexity: the growth in the number of required training examples with problem size is called the sample complexity of the learning problem.


Consistent learner

Consistent learner: a learner is called consistent if, whenever possible, it outputs a hypothesis that perfectly fits the training data.


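e-exhausted

e-exhausted: the version space VS_{H,D} is said to be e-exhausted with respect to c and D if every hypothesis h in VS_{H,D} has error less than e with respect to c and D. (In the subscript of VS_{H,D}, D denotes the training data; elsewhere D is the distribution over instances.)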
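
To make the e-exhausted definition concrete, here is a minimal sketch on a tiny finite problem. Everything in it is an assumption chosen for illustration: the four-instance space X, the uniform distribution D, the target c, and the hypothesis space H of all 16 labelings.

```python
from fractions import Fraction
from itertools import product

X = [0, 1, 2, 3]                      # toy instance space (assumption)
D = {x: Fraction(1, 4) for x in X}    # assumed uniform distribution
c = (0, 0, 1, 1)                      # target concept: one label per instance

# H: every possible labeling of X, so |H| = 2**4 = 16.
H = list(product([0, 1], repeat=len(X)))

def true_error(h):
    """Probability under D that h disagrees with c."""
    return sum(D[x] for x in X if h[x] != c[x])

def version_space(train):
    """Hypotheses in H consistent with every (x, label) pair in train."""
    return [h for h in H if all(h[x] == y for x, y in train)]

def is_e_exhausted(vs, e):
    """True if every hypothesis in the version space has true error < e."""
    return all(true_error(h) < e for h in vs)

train = [(0, 0), (2, 1), (3, 1)]      # instance 1 is never observed
vs = version_space(train)
print(len(vs))                             # 2
print(is_e_exhausted(vs, Fraction(1, 2)))  # True
print(is_e_exhausted(vs, Fraction(1, 4)))  # False
```

The version space keeps one hypothesis with true error 1/4 (it mislabels the unseen instance 1), so it is e-exhausted for e = 1/2 but not for e = 1/4.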



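GzLi (笑梨) wrote on Thu Jan  2 17:35:52 2003:

These concepts are indeed important.
But the chapter is mainly about the concepts I mentioned in post 5577.
Do you think what I said there is correct? I am not sure myself whether I put it accurately.
Let's discuss it.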



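nohau (nohau) wrote on Thu Jan  2 20:23:33 2003:

The moderator's post is a summary of the whole chapter. I only finished reading the first three sections this afternoon and wrote down the main points of what I read. I am still a beginner, so I would appreciate the moderator's guidance.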




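txytxy (nils) wrote on Thu Jan  2 21:07:16 2003:

My understanding is as follows; criticism is welcome:

    The minimum number of training examples is log|H| for active learning and n+1 for supervised learning. To discuss sample complexity under random sampling, we start from computing the true error of a hypothesis. "Error" here simply means a wrong classification.

    When the sample is fixed, the misclassification rate of a given hypothesis on that fixed set can be computed exactly. This is the "training error rate".

    When the sample is not fixed, i.e. a batch of examples is drawn at random according to distribution D, computing the true error of a hypothesis really means computing the probability that the hypothesis misclassifies a randomly drawn example. This is the "true error rate".

    So although the "training error rate" and the "true error rate" are both "error rates", one is a percentage and the other is a probability, and they are computed in quite different ways.

    Furthermore, since D is unknown, the "true error rate" cannot be computed directly and can only be approached through the sample error. The idea is this: when the sample error is 0 (i.e. the hypothesis belongs to the VS), if we can confirm that the VS it belongs to is e-exhausted, we can conclude that the "true error rate" of h is bounded (it is less than e). How many examples are needed for this to hold is determined jointly by the size of the hypothesis space and the number of training examples. This gives the sample complexity under random sampling, i.e. the minimum number of training examples stated in (7.2) of the book.

     ps: Regarding the proof of Theorem 7.1: "since k hypotheses are known to have error greater than e, the probability that at least one of them is inconsistent with all m training examples is at most:" I think "inconsistent" in this sentence should be "consistent". What does everyone think?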
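
The sample-complexity bound discussed above can be evaluated numerically. The sketch below assumes that the book's (7.2) is the bound m >= (1/e)(ln|H| + ln(1/d)), with d the allowed failure probability; the function name and the example numbers (conjunctions over 10 boolean attributes, so |H| = 3^10) are illustrative choices, not quotations from the thread.

```python
import math

def sample_complexity(h_size, e, d):
    """Smallest integer m with m >= (1/e) * (ln|H| + ln(1/d)).

    h_size: size of the hypothesis space |H|
    e:      error threshold for an acceptable hypothesis
    d:      allowed probability that the version space is not e-exhausted
    """
    return math.ceil((math.log(h_size) + math.log(1.0 / d)) / e)

# Illustrative values: |H| = 3**10, e = 0.1, d = 0.05.
m = sample_complexity(3 ** 10, e=0.1, d=0.05)
print(m)  # 140
```

The bound grows only logarithmically with |H| and with 1/d, but linearly with 1/e, so tightening the error threshold is the expensive part.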





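GzLi (笑梨) wrote on Thu Jan  2 22:46:37 2003:

I am a beginner too; let's all discuss more.
I think there is no mistake here and "inconsistent" is right, because this probability is the probability of being "inconsistent", not the probability of being "consistent".
I wonder what the others think?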

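【 txytxy (nils) wrote: 】
:      ps: Regarding the proof of Theorem 7.1: "since k hypotheses are known to have error greater than e, the probability that at least one ..
: is inconsistent with all m training examples is at most:" I think "inconsistent" in this sentence should be "consistent".
: What does everyone think?
: 
: 【 nohau wrote: 】
: (rest of quote omitted ... ...)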


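nohau (nohau) wrote on Fri Jan  3 09:15:37 2003:

I agree with txytxy; I also think it should be "consistent". Any hypothesis whose true error is greater than e is consistent with one randomly drawn example with probability at most (1-e), so it is consistent with m independently drawn examples with probability at most (1-e)^m. Therefore, among the k hypotheses whose true error is greater than e, the probability that at least one is consistent with all m examples is at most k(1-e)^m.

The significance of this probability is that it bounds the chance of a "bad" hypothesis, i.e. a hypothesis whose true error is greater than e but which still satisfies all the training examples.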
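
The chain of bounds in this argument can be checked numerically. The sketch below is an illustration only: it assumes the usual relaxation k(1-e)^m <= |H|(1-e)^m <= |H|exp(-e*m), and the function names and the values |H| = 3^10, e = 0.1, m = 140 are hypothetical choices.

```python
import math

def bad_consistent_bound(k, e, m):
    """Union bound: k bad hypotheses, each consistent with m independent
    examples with probability at most (1 - e)**m."""
    return k * (1 - e) ** m

def relaxed_bound(h_size, e, m):
    """Looser bound |H| * exp(-e*m), using k <= |H| and (1 - e) <= exp(-e)."""
    return h_size * math.exp(-e * m)

h_size, e, m = 3 ** 10, 0.1, 140
tight = bad_consistent_bound(h_size, e, m)   # worst case k = |H|
loose = relaxed_bound(h_size, e, m)
print(tight <= loose)   # True
print(loose <= 0.05)    # True
```

With these values the relaxed bound stays below 0.05, so with probability at least 0.95 no "bad" hypothesis survives all 140 training examples.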




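txytxy (nils) wrote on Fri Jan  3 16:58:08 2003:

1. "Dichotomy": equivalent to a target concept on the sample set. A set of n samples has 2^n possible dichotomies, so there are at most 2^n distinct concepts on it.

2. "Shatter": if for every target concept that exists on the sample set there is a corresponding hypothesis, then the hypothesis space H "shatters" the sample set S. In other words, if H can represent every target concept that exists on S, then H shatters S.

3. Relationship between sample size and hypothesis space: the larger the sample set, the more concepts it contains, and the harder it is to shatter. For a given hypothesis space, the sample sets it can shatter are limited in size; the size of the largest sample set it can shatter is called the "VC dimension". (If arbitrarily large finite sets can be shattered, the VC dimension is infinite.)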
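
Shattering can be tested by brute force on small sets. The sketch below is a minimal illustration under assumptions of my own: the hypothesis space is closed intervals [a, b] on the real line with endpoints on a small grid, and the helper names are hypothetical.

```python
from itertools import product

def dichotomies(H, S):
    """Distinct labelings of the points in S produced by hypotheses in H."""
    return {tuple(h(x) for x in S) for h in H}

def shatters(H, S):
    """H shatters S if it realizes all 2**|S| dichotomies of S."""
    return len(dichotomies(H, S)) == 2 ** len(S)

def interval(a, b):
    """Hypothesis that labels x positive iff a <= x <= b (empty if a > b)."""
    return lambda x: a <= x <= b

# Assumed hypothesis space: intervals with endpoints on 0.0, 0.5, ..., 4.0.
grid = [i / 2 for i in range(9)]
H = [interval(a, b) for a, b in product(grid, repeat=2)]

print(shatters(H, [1, 2]))     # True
print(shatters(H, [1, 2, 3]))  # False
```

Two points can be shattered, but no interval labels 1 and 3 positive while labeling 2 negative, so this three-point set is not shattered. This is consistent with the VC dimension of intervals on the line being 2, although the code only checks the specific sets passed to it.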

