This is an English translation of the original French files. J Wynia translated them
but is not the original author. I try to answer questions as best I can, but the most
reliable answers come from the original author, in French. Questions in English can
be posted in the forums at http://www.phpgeek.com.

/*
  ***** BEGIN LICENSE BLOCK *****
   This file is part of PHP Naive Bayesian Filter.

   The Initial Developer of the Original Code is
   Loic d'Anterroches [loic xhtml.net].
   Portions created by the Initial Developer are Copyright (C) 2003
   the Initial Developer. All Rights Reserved.

   PHP Naive Bayesian Filter is free software; you can redistribute it 
   and/or modify it under the terms of the GNU General Public License as 
   published by the Free Software Foundation; either version 2 of 
   the License, or (at your option) any later version.

   PHP Naive Bayesian Filter is distributed in the hope that it will 
   be useful, but WITHOUT ANY WARRANTY; without even the implied
   warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  
   See the GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with PHP Naive Bayesian Filter; if not, write to the Free Software
   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

  ***** END LICENSE BLOCK *****
*/

** Quick Start **
Create a database and load the table definitions from mysql.sql into it. Edit the
sample file index.php to match your database credentials, then experiment with that
demo. Once it is up and running, use index.php as a model for your own software
that uses the classes.


** Presentation **

This is a general-purpose implementation of a filter based on Bayes' theorem. Its
most common application is spam filtering, but it is also useful for automatically
classifying almost any kind of document.

This program is based on the simple version of Bayes' theorem described by
Ken Williams (ken@mathforum.org) on the page
http://mathforum.org/~ken/bayes/bayes.html on 10/31/2003.


The system classifies text documents into categories of your choosing. To sort your
email into spam and non-spam, for example, you need two categories: a "Spam" and a
"nonspam".
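
As a rough illustration of how such a classification can be computed, here is a
minimal standalone sketch for PHP 7.3 or later. It is not the code of
class.naivebayesian.php: the function names (sketchTokenize, sketchTrain,
sketchScores), the add-one smoothing, the use of log-probabilities, and the sample
texts are all assumptions made for this example.

```php
<?php
// Hypothetical standalone sketch; not the code of class.naivebayesian.php.

// Split a text into lowercase ASCII word tokens.
function sketchTokenize(string $text): array
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return $words === false ? array() : $words;
}

// Count, per category, the documents seen and how often each word occurs.
// $examples is a list of array(category, text) pairs.
function sketchTrain(array $examples): array
{
    $model = array('docs' => array(), 'words' => array(), 'vocab' => array());
    foreach ($examples as $example) {
        list($category, $text) = $example;
        $model['docs'][$category] = ($model['docs'][$category] ?? 0) + 1;
        foreach (sketchTokenize($text) as $word) {
            $model['words'][$category][$word] = ($model['words'][$category][$word] ?? 0) + 1;
            $model['vocab'][$word] = true;
        }
    }
    return $model;
}

// Score each category as log P(category) + sum of log P(word | category).
// Add-one (Laplace) smoothing is an assumption of this sketch.
function sketchScores(array $model, string $text): array
{
    $totalDocs = array_sum($model['docs']);
    $vocabSize = count($model['vocab']);
    $tokens = sketchTokenize($text);
    $scores = array();
    foreach ($model['docs'] as $category => $docCount) {
        $counts = $model['words'][$category] ?? array();
        $totalWords = array_sum($counts);
        $score = log($docCount / $totalDocs);
        foreach ($tokens as $word) {
            if (!isset($model['vocab'][$word])) {
                continue; // words never seen in training carry no evidence
            }
            $score += log((($counts[$word] ?? 0) + 1) / ($totalWords + $vocabSize));
        }
        $scores[$category] = $score;
    }
    arsort($scores); // best-scoring category first
    return $scores;
}

$model = sketchTrain(array(
    array('spam', 'cheap pills buy now'),
    array('spam', 'buy cheap watches now'),
    array('nonspam', 'meeting notes for the project'),
    array('nonspam', 'lunch with the project team'),
));
$scores = sketchScores($model, 'buy cheap pills');
echo array_key_first($scores), "\n"; // prints "spam"
```

Because the scores are computed per category, the same sketch works unchanged with
more than two categories.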

This script was written because the subject is fashionable at the moment, particularly
for filtering comments and trackbacks on blogs. The system proposed here is not limited
to the two categories Spam and not-Spam, so in theory it can classify documents into
multiple categories. A small script, 'index.php', lets you test the system; you can
then include the classes in your own scripts.

The files class.naivebayesian.php and class.naivebayesianstorage.php can also be used
under the GNU Lesser General Public License, version 2.1 or later.

** Functionality **
- One class holds the basic logic; another is the interface to storage.
- Data is stored in a database. For the moment this is MySQL, but you can use whichever
  you want via the storage interface.
- Training and automatic filing of "reference" documents.
- The storage interface uses MySQL and relies on two classes by Olivier Meunier.
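
The split between logic and storage can be pictured with the sketch below. It is
purely illustrative: SketchWordStore and SketchMemoryStore are hypothetical names, and
the two methods are an assumed contract, not the one defined in
class.naivebayesianstorage.php. It needs PHP 7.1 or later.

```php
<?php
// Hypothetical contract; the real one is defined by class.naivebayesianstorage.php.
interface SketchWordStore
{
    public function addWord(string $category, string $word, int $count): void;
    public function getWordCount(string $category, string $word): int;
}

// In-memory backend for experiments. A database-backed class could implement
// the same interface without any change to the classification logic.
class SketchMemoryStore implements SketchWordStore
{
    private $counts = array();

    public function addWord(string $category, string $word, int $count): void
    {
        $this->counts[$category][$word] = $this->getWordCount($category, $word) + $count;
    }

    public function getWordCount(string $category, string $word): int
    {
        return $this->counts[$category][$word] ?? 0;
    }
}

$store = new SketchMemoryStore();
$store->addWord('spam', 'cheap', 2);
$store->addWord('spam', 'cheap', 1);
echo $store->getWordCount('spam', 'cheap'), "\n"; // prints 3
```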

** Use **
Look at the code of index.php. For correct use, you should create another class that
inherits from NaiveBayesian and provides your own function listing the words that
should not influence the score. This is not done in 'index.php'.

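// Subclass NaiveBayesian to supply your own list of ignored words.
class yourclass extends NaiveBayesian
{
    // Words returned here should not influence the score.
    function getIgnoreList()
    {
        return array('the', 'that', 'you', 'for', 'and');
    }
}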
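
To show what an ignore list does to a text, here is a small standalone sketch. The
helper sketchFilterWords is hypothetical and the tokenization rule is an assumption;
it does not reproduce how NaiveBayesian uses getIgnoreList() internally.

```php
<?php
// Hypothetical helper: one way an ignore list could be applied to a text.
function sketchFilterWords(string $text, array $ignoreList): array
{
    // Lowercase the text and split it on anything that is not a letter or digit.
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        return array();
    }
    // Flip the list so each lookup is a key check instead of a linear scan.
    $ignored = array_flip(array_map('strtolower', $ignoreList));
    $kept = array();
    foreach ($words as $word) {
        if (!isset($ignored[$word])) {
            $kept[] = $word;
        }
    }
    return $kept;
}

$ignoreList = array('the', 'that', 'you', 'for', 'and');
print_r(sketchFilterWords('Thank you for the cheap offer', $ignoreList));
// keeps: thank, cheap, offer
```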


** Questions **

The original author, who wrote the code in French, can be reached by email at
loic@xhtml.net or through http://www.xhtml.net/.
Questions in English can be posted in the forums at http://www.PHPGeek.com.
