Posted by: GzLi (笑梨), Board: DataMining
Subject: svm<5>Text Categorization
Posted from: Nanjing University Lily BBS (南京大学小百合站) (Tue Jun 4 18:43:36 2002), on-site post
Text Categorization
Text categorization is the assignment of natural language texts to one or
more predefined categories based on their content. Applications include
assigning subject categories to documents to support text retrieval, routing,
and filtering; sorting email or files into folder hierarchies; and sorting web
pages into search engine categories.
Reference(s):
T. Joachims. Text Categorization with Support Vector Machines: Learning with
Many Relevant Features. European Conference on Machine Learning (ECML), 1998.
S. Dumais, J. Platt, D. Heckerman, M. Sahami. Inductive Learning Algorithms
and Representations for Text Categorization. 7th International Conference on
Information and Knowledge Management, 1998.
H. Drucker, D. Wu, V. Vapnik. Support Vector Machines for Spam Categorization.
IEEE Trans. on Neural Networks, vol. 10, no. 5, pp. 1048-1054, 1999.
T. Joachims. Transductive Inference for Text Classification using Support
Vector Machines. International Conference on Machine Learning (ICML), 1999.
Reference link(s):
Joachims-98 Postscript, Joachims-98 PDF
Dumais et al 98
Drucker et al 98
Joachims-99 Postscript, Joachims-99 PDF
Data link(s):
Reuters-21578
Entered by: Isabelle Guyon <isabelle@clopinet.com> - Friday, September 17,
1999 at 15:19:48 (PDT). Last modified October 13, 1999.
Comments: Joachims-98 reports that SVMs are well suited to learning in very
high dimensional spaces (> 10000 inputs). They achieve substantial improvements
over the currently best performing methods and eliminate the need for feature
selection. The tests were run on the Ohsumed corpus of William Hersh and on
Reuters-21578. Dumais et al report that they use linear SVMs because they
are both accurate and fast (to train and to use): they are 35 times faster
to train than the next most accurate classifier that they tested (Decision
Trees). They have applied SVMs to the Reuters-21578 collection, emails, and
web pages. Drucker et al classify emails as spam or non-spam. They find
that boosting trees and SVMs have similar performance in terms of accuracy
and speed at test time, but SVMs train significantly faster. Joachims-99
reports that transduction is a very natural setting for many text
classification and information retrieval tasks. Transductive SVMs improve
performance especially in cases with very small amounts of labelled training
data.
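
To make the comments above a bit more concrete, here is a minimal sketch of a
linear SVM text classifier in the spirit of the spam setting. It is not taken
from any of the cited papers: the toy messages, the function names
(featurize, score, train_linear_svm, predict), and the hyperparameters
(epochs, eta, lam) are all assumptions made for illustration. It trains by
plain subgradient descent on the L2-regularized hinge loss, omits the bias
term for simplicity, and uses every word as a feature with no feature
selection.

```python
from collections import Counter


def featurize(text):
    """Bag-of-words: every lowercase token is a feature (no feature selection)."""
    return Counter(text.lower().split())


def score(weights, features):
    """Linear decision value w . x over a sparse feature vector."""
    return sum(weights.get(tok, 0.0) * cnt for tok, cnt in features.items())


def train_linear_svm(texts, labels, epochs=20, eta=0.1, lam=0.01):
    """Subgradient descent on lam/2 * |w|^2 + hinge loss; labels are +1 / -1.

    Assumption: fixed step size eta and no bias term, to keep the sketch short.
    """
    weights = {}
    data = [(featurize(t), y) for t, y in zip(texts, labels)]
    for _ in range(epochs):
        for feats, y in data:
            margin = y * score(weights, feats)
            # Gradient of the regularizer: shrink every weight slightly.
            for tok in weights:
                weights[tok] *= 1.0 - eta * lam
            # Hinge loss is active only when the margin is below 1.
            if margin < 1.0:
                for tok, cnt in feats.items():
                    weights[tok] = weights.get(tok, 0.0) + eta * y * cnt
    return weights


def predict(weights, text):
    """Return +1 (spam) or -1 (non-spam) for a new message."""
    return 1 if score(weights, featurize(text)) >= 0.0 else -1


# Hypothetical toy data: +1 = spam, -1 = non-spam.
train_texts = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "project meeting notes attached",
]
train_labels = [1, 1, -1, -1]

weights = train_linear_svm(train_texts, train_labels)
print(predict(weights, "buy cheap pills"))
print(predict(weights, "monday project meeting"))
```

On a real collection such as Reuters-21578 the vocabulary runs to many
thousands of words, so the sparse dictionary of weights stands in for the very
high dimensional input space mentioned above. A dedicated SVM package would
normally replace this hand-written training loop.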
--
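*** Dignified and steady, humble and tolerant, purposeful in action, mindful of helping others ***
Have you mined today? DataMining http://DataMining.bbs.lilybbs.net
A scratch-paper style language: Matlab http://bbs.sjtu.edu.cn/cgi-bin/bbsdoc?board=Matlab
※ Source: Nanjing University Lily BBS (南京大学小百合站) bbs.nju.edu.cn [FROM: 211.80.38.29]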