<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0067)http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html -->
<HTML><HEAD><TITLE>Naive Bayes algorithm for learning to classify text</TITLE>
<META http-equiv=Content-Type content="text/html; charset=gb2312"><!-- Changed by: Jason Rennie,  2-Feb-1997 -->
<META content="MSHTML 6.00.2600.0" name=GENERATOR></HEAD>
<BODY aLink=#5e5a80 bgColor=#eff7ff>
<H1>Naive Bayes algorithm for learning to classify text </H1>
<H3>Companion to Chapter 6 of <A 
href="http://www.cs.cmu.edu/~tom/mlbook.html"><I>Machine Learning</I></A> 
textbook. </H3>Naive Bayes classifiers are among the most successful known 
algorithms for learning to classify text documents. This page provides an 
implementation of the Naive Bayes learning algorithm similar to that described 
in Table 6.2 of the textbook. It also provides a dataset containing 20,000 
newsgroup messages drawn from the 20 newsgroups described in Table 6.3. As 
mentioned in the textbook, the dataset contains 1000 documents from each of the 
20 newsgroups. 
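<P>As a rough illustration of the kind of learner discussed here, the following Python sketch estimates class priors and add-one smoothed word probabilities from labeled documents, then classifies a new document by the highest-scoring class. It is not the Rainbow/Libbow code: the function names and the whitespace tokenizer are hypothetical, and it sums log probabilities instead of multiplying raw probabilities, a substitution made here to avoid numerical underflow on long documents.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer (an assumption): lowercase, split on whitespace.
    return text.lower().split()


def learn_naive_bayes_text(examples):
    """Estimate log P(class) and log P(word | class) from (text, label) pairs."""
    vocabulary = set()
    word_counts = defaultdict(Counter)  # label -> word -> occurrences
    doc_counts = Counter()              # label -> number of documents
    for text, label in examples:
        tokens = tokenize(text)
        vocabulary.update(tokens)
        word_counts[label].update(tokens)
        doc_counts[label] += 1

    log_priors = {}
    log_likelihoods = {}
    for label, n_docs in doc_counts.items():
        log_priors[label] = math.log(n_docs / len(examples))
        n = sum(word_counts[label].values())  # word positions in this class
        denom = n + len(vocabulary)
        # Add-one smoothing: (n_k + 1) / (n + |Vocabulary|)
        log_likelihoods[label] = {
            w: math.log((word_counts[label][w] + 1) / denom) for w in vocabulary
        }
    return vocabulary, log_priors, log_likelihoods


def classify_naive_bayes_text(doc, model):
    """Return the label maximizing log P(label) + sum of log P(word | label)."""
    vocabulary, log_priors, log_likelihoods = model
    # Words never seen in training are skipped.
    tokens = [t for t in tokenize(doc) if t in vocabulary]

    def score(label):
        return log_priors[label] + sum(log_likelihoods[label][t] for t in tokens)

    return max(log_priors, key=score)
```

<P>A model returned by <I>learn_naive_bayes_text</I> is passed unchanged to <I>classify_naive_bayes_text</I>, which returns one of the labels seen during training.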
<P>
<H3>Note on downloading </H3>This code and data are supported only under the 
Unix and Linux operating systems. (If you would like to volunteer support for 
Windows, please contact me.) To reconstruct the original files from a downloaded 
file such as xxx.tar.gz, type the following two commands at the Unix prompt: 
<P><I>gunzip xxx.tar.gz <BR>tar -xf xxx.tar </I>
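<P>Where the command-line tools are not available, the same unpacking step can be approximated with Python's standard <I>tarfile</I> module. This is only a sketch: the helper name <I>extract_tar_gz</I> is hypothetical, and unlike gunzip it leaves the original .tar.gz file in place.

```python
import tarfile
from pathlib import Path


def extract_tar_gz(archive_path, dest_dir="."):
    """Decompress and unpack a .tar.gz archive into dest_dir.

    Returns the member names stored in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters; "data"
            # rejects members that would be written outside dest.
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)
    return names
```

<P>For example, <I>extract_tar_gz("xxx.tar.gz", "data")</I> would unpack the archive into a directory named data.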
<P>
<H3>Code</H3>This code is based on the Rainbow/Libbow software package developed 
by Andrew McCallum. It includes efficient C code for indexing text documents 
along with code implementing the Naive Bayes learning algorithm. Libbow also 
provides implementations of two additional text learning algorithms: TFIDF and 
prTFIDF. This code may be used either as a building block for creating other 
programs or as a stand-alone learning/classification system. 
<P>Note: this code implements a minor variant of the algorithm described in Table 6.2 of <A 
href="http://www.cs.cmu.edu/~tom/mlbook.html"><I>Machine Learning</I></A>. 
<UL>
  <LI><A href="http://www.cs.cmu.edu/~mccallum/bow">Most recent Libbow source 
  code and documentation</A> 
  <LI><A 
  href="http://www.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes/bow-latest.tar.gz">Old 
  Libbow source code and documentation (tarred and gzipped)</A> </LI></UL>
<P>
<H3>Newsgroup Data</H3><!--One of the datasets which has been used to evaluate textlearning algorithms --><!-- -->
<UL>
  <LI>The <A 
  href="http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes/20_newsgroups.tar.gz">tarred 
  and gzipped data directory </A>(easiest for downloading). 
  <LI>A <A 
  href="http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes/mini_newsgroups.tar.gz">tarred 
  and gzipped</A> subset of the Newsgroup data which contains 100 randomly 
  selected messages from each newsgroup. This is a useful dataset for learning 
  to use Rainbow. </LI></UL>
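<P>For readers who want to inspect the data outside Rainbow, the sketch below reads an unpacked data directory into (text, label) pairs. It assumes, without confirmation from this page, that the archive unpacks into one subdirectory per newsgroup with one file per message; the function name <I>load_newsgroup_dir</I> is hypothetical, and latin-1 decoding is chosen only so that every byte decodes without error.

```python
from pathlib import Path


def load_newsgroup_dir(root):
    """Return a list of (message_text, newsgroup_name) pairs.

    Assumed layout: root/<newsgroup name>/<one file per message>.
    """
    examples = []
    for group_dir in sorted(Path(root).iterdir()):
        if not group_dir.is_dir():
            continue  # ignore stray files at the top level
        for message_path in sorted(group_dir.iterdir()):
            if message_path.is_file():
                text = message_path.read_text(encoding="latin-1")
                examples.append((text, group_dir.name))
    return examples
```

<P>Counting the labels in the returned list is a quick way to check how many messages each newsgroup contributed.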
<P>
<H3>On-Line Documentation</H3>
<UL>
  <LI><A href="http://www.cs.cmu.edu/~mccallum/bow/rainbow">Rainbow 
  Documentation</A> <!--  <LI><A HREF="/afs/cs/project/theo-11/www/naive-bayes/quick_intro.html">Quick 'n Dirty Intro to Rainbow</A> --></LI></UL>
<P><I>Visitors from outside CMU are invited to use this material free of charge 
for any educational purpose, provided attribution is given in any lectures or 
publications that make use of this material. </I>
<P><I>This page organized by Jason Rennie. </I><BR><IMG 
src="Naive Bayes algorithm for learning to classify text.files/colorsep.gif"> 
<CENTER><I><A href="mailto:jr6b@cs.cmu.edu">jr6b@cs.cmu.edu</A> | Last updated 
4/6/97 </I></CENTER></BODY></HTML>
