From: sinokdd (KDD in China), Board: DataMining
Subject: Re: Experts, please come in and discuss!
Posted at: Nanjing University Lily BBS (Tue Sep 10 07:30:25 2002)


[ singhoo wrote: ]

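: I am planning to do research on web text classification (for my master's
: degree). After two months of reading papers, I had intended to build a
: prototype system implementing other people's algorithms, but a teacher in
: my lab who came back from the US said that others already have ready-made
: systems, so doing this is pointless! I should work on algorithm
: improvements and innovation instead.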

Two months of reading is not a short time. By now you should know
that some people have already implemented such systems.



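: This has left me quite flustered, so I have to rethink my plans.
: 1: As far as I understand it, for WWW text classification it is hard to
: make a breakthrough in the machine learning (classification) algorithms.
: Naive Bayes, KNN and SVM are all fairly mature, and I am not currently
: able to improve these algorithms, so I would like to do some research on
: feature extraction algorithms / hierarchical classification / using
: hyperlinks.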

As far as I know, someone at CMU has done this. He uses a bag of words
to represent the web page but gives each word a weight; for example,
words in the title, heading, or hyperlinks get more weight.
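
To make the idea concrete, here is a minimal sketch of a field-weighted bag of words. It is not the CMU system: the field names, the weight values, and the weighted_bag_of_words helper are all made up for illustration, and it assumes the page has already been split into title, anchor and body text.

```python
import re
from collections import Counter

# Hypothetical weights: the only known constraint is that title, heading
# and hyperlink words should count for more than plain body words.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "anchor": 2.0, "body": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weighted_bag_of_words(fields, weights=FIELD_WEIGHTS):
    """Sum per-field weights for every token found in a page.

    `fields` maps a field name (e.g. "title") to the text found there.
    Fields without a configured weight fall back to 1.0.
    """
    bag = Counter()
    for field, text in fields.items():
        weight = weights.get(field, 1.0)
        for token in tokenize(text):
            bag[token] += weight
    return bag


page = {
    "title": "Stock market news",
    "anchor": "market report",
    "body": "The stock market rose today",
}
bag = weighted_bag_of_words(page)
print(bag.most_common(3))
```

The resulting weighted counts can be fed to any of the usual classifiers (NB, KNN, SVM) in place of raw term frequencies. For Chinese pages the regex tokenizer would have to be replaced by a word segmenter.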


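: 2: Also, I plan to build my own dataset (Chinese news web pages). At
: present there seems to be no public web training set in China, so this
: should be of some value, right? But papers from abroad generally use a
: few standard datasets. Would a dataset like mine be accepted?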

It is a good idea. If you can make it publicly available, people may be
interested in it. And if you can take advantage of some special
characteristics of Chinese to improve the precision, that may be
interesting too.


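: 3: I am doing algorithm research and my time is limited, so I do not
: want to spend time implementing the classifiers (NB, KNN, SVM, etc.).
: Is there source code available?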

You can download most such systems from the Internet. Try
www.kdnuggets.com or search Google.




--

※ Source: Nanjing University Lily BBS http://bbs.nju.edu.cn [FROM: 129.128.23.55]
