📄 sb_notesfilter.py

📁 Mail filter implemented in Python
📖 Page 1 of 2
#! /usr/bin/env python

'''sb_notesfilter.py - Lotus Notes SpamBayes interface.

    This module uses SpamBayes as a filter against a Lotus Notes mail
    database.  The Notes client must be running when this process is
    executed.

    It requires a Notes folder, named as a parameter, with four
    subfolders:
        Spam
        Ham
        Train as Spam
        Train as Ham

    Depending on the execution parameters, it will do any or all of the
    following steps, in the order given.

    1. Train Spam from the Train as Spam folder (-t option)
    2. Train Ham from the Train as Ham folder (-t option)
    3. Replicate (-r option)
    4. Classify the inbox (-c option)

    Mail that is to be trained as spam should be manually moved to
    that folder by the user. Likewise mail that is to be trained as
    ham.  After training, spam is moved to the Spam folder and ham is
    moved to the Ham folder.

    Replication takes place if a remote server has been specified.
    This step may take a long time, depending on replication
    parameters and how much information there is to download, as well
    as line speed and server load.  Please be patient if you run with
    replication.  There is currently no progress bar or anything like
    that to tell you that it's working, but it is and will complete
    eventually.  There is also no mechanism for notifying you that the
    replication failed.  If it did, there is no harm done, and the program
    will continue execution.

    Mail that is classified as Spam is moved from the inbox to the
    Train as Spam folder.  You should occasionally review your Spam
    folder for Ham that has mistakenly been classified as Spam.  If
    there is any there, move it to the Train as Ham folder, so
    SpamBayes will be less likely to make this mistake again.

    Mail that is classified as Ham or Unsure is left in the inbox.
    There is currently no means of telling if a mail was classified as
    Ham or Unsure.

    You should occasionally select some Ham and move it to the Train
    as Ham folder, so Spambayes can tell the difference between Spam
    and Ham. The goal is to maintain an approximate balance between the
    number of Spam and the number of Ham that have been trained into
    the database. These numbers are reported every time this program
    executes.  However, if the amount of Spam you receive far exceeds
    the amount of Ham you receive, it may be very difficult to
    maintain this balance.  This is not a matter of great concern.
    SpamBayes will still make very few mistakes in this circumstance.
    But, if this is the case, you should review your Spam folder for
    falsely classified Ham, and retrain those that you find, on a
    regular basis.  This will prevent statistical error accumulation,
    which if allowed to continue, would cause SpamBayes to tend to
    classify everything as Spam.

    Because there is no programmatic way to determine if a particular
    mail has been previously processed by this classification program,
    it keeps a pickled dictionary of notes mail ids, so that once a
    mail has been classified, it will not be classified again.  The
    non-existence of this index file, named <local database>.sbindex,
    indicates to the system that this is an initialization execution.
    Rather than classify the inbox in this case, the contents of the
    inbox are placed in the index to note the 'starting point' of the
    system.  After that, any new messages in the inbox are eligible
    for classification.

Usage:
    sb_notesfilter [options]

        note: option values with spaces in them must be enclosed
              in double quotes

        options:
            -p  dbname  : pickled training database filename
            -d  dbname  : dbm training database filename
            -l  dbname  : database filename of local mail replica
                            e.g. localmail.nsf
            -r  server  : server address of the server mail database
                            e.g. d27ml602/27/M/IBM
                          if specified, will initiate a replication
            -f  folder  : Name of SpamBayes folder
                            must have subfolders: Spam
                                                  Ham
                                                  Train as Spam
                                                  Train as Ham
            -t          : train contents of Train as Spam and Train as Ham
            -c          : classify inbox
            -h          : help
            -P          : prompt "Press Enter to end" before ending
                          This is useful for automated executions where the
                          statistics output would otherwise be lost when the
                          window closes.
            -i filename : index file name
            -W          : password
            -L dbname   : log to database (template alog4.ntf)
            -o section:option:value :
                          set [section, option] in the options database
                          to value

Examples:

    Replicate and classify inbox
        sb_notesfilter -c -d notesbayes -r mynoteserv -l mail.nsf -f Spambayes

    Train Spam and Ham, then classify inbox
        sb_notesfilter -t -c -d notesbayes -l mail.nsf -f Spambayes

    Replicate, then classify inbox
        sb_notesfilter -c -d test7 -l mail.nsf -r nynoteserv -f Spambayes

To Do:
    o Dump/purge notesindex file
    o Create correct folders if they do not exist
    o Options for some of this stuff?
    o sb_server style training/configuration interface?
    o parameter to retrain?
    o Use spambayes.message MessageInfo db's rather than own database.
    o Suggestions?
'''

# This module is part of the spambayes project, which is Copyright 2002-5
# The Python Software Foundation and is covered by the Python Software
# Foundation license.

__author__ = "Tim Stone <tim@fourstonesExpressions.com>"
__credits__ = "Mark Hammond, for his remarkable win32 modules."

from __future__ import generators

try:
    True, False
except NameError:
    # Maintain compatibility with Python 2.2
    True, False = 1, 0
    def bool(val):
        return not not val

import sys
from spambayes import tokenizer, storage
from spambayes.Options import options
import cPickle as pickle
import errno
import win32com.client
import pywintypes
import getopt


def classifyInbox(v, vmoveto, bayes, ldbname, notesindex, log):

    # the notesindex hash ensures that a message is looked at only once

    if len(notesindex.keys()) == 0:
        firsttime = 1
    else:
        firsttime = 0

    docstomove = []
    numham = 0
    numspam = 0
    numuns = 0
    numdocs = 0

    doc = v.GetFirstDocument()
    while doc:
        nid = doc.NOTEID
        if firsttime:
            notesindex[nid] = 'never classified'
        else:
            if not notesindex.has_key(nid):

                numdocs += 1

                # Notes returns strings in unicode, and the Python
                # decoder has trouble with these strings when
                # you try to print them.  So don't...

                message = getMessage(doc)

                # generate_long_skips = True blows up on occasion,
                # probably due to this unicode problem.
                options["Tokenizer", "generate_long_skips"] = False
                tokens = tokenizer.tokenize(message)
                prob, clues = bayes.spamprob(tokens, evidence=True)

                if prob < options["Categorization", "ham_cutoff"]:
                    disposition = options["Headers", "header_ham_string"]
                    numham += 1
                elif prob > options["Categorization", "spam_cutoff"]:
                    disposition = options["Headers", "header_spam_string"]
                    docstomove += [doc]
                    numspam += 1
                else:
                    disposition = options["Headers", "header_unsure_string"]
                    numuns += 1

                notesindex[nid] = 'classified'
                try:
                    print "%s spamprob is %s" % (subj[:30], prob)
                    if log:
                        log.LogAction("%s spamprob is %s" % (subj[:30],
                                                             prob))
                except UnicodeError:
                    print "<subject not printed> spamprob is %s" % (prob)
                    if log:
                        log.LogAction("<subject not printed> spamprob " \
                                      "is %s" % (prob,))

                item = doc.ReplaceItemValue("Spam", prob)
                item.IsSummary = True
                doc.save(False, True, False)

        doc = v.GetNextDocument(doc)

    # docstomove list is built because moving documents in the middle of
    # the classification loop loses the iterator position
    for doc in docstomove:
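
The listing breaks off here at the end of page 1. As a rough illustration of the bookkeeping in classifyInbox, the following Python 3 sketch reproduces the first-run indexing and the ham/spam cutoff decision without Notes or SpamBayes. The function name classify_new, the scores dict (standing in for bayes.spamprob) and the cutoff values 0.2 and 0.9 are assumptions made for this example; none of them come from sb_notesfilter.py.

```python
def classify_new(doc_ids, scores, index, ham_cutoff=0.2, spam_cutoff=0.9):
    """Hypothetical stand-in for classifyInbox, with no Notes or SpamBayes.

    doc_ids     -- iterable of message ids currently in the inbox
    scores      -- dict mapping id -> spam probability (assumed precomputed)
    index       -- dict of ids already seen; updated in place
    ham_cutoff, spam_cutoff -- assumed thresholds for this sketch
    """
    counts = {"ham": 0, "spam": 0, "unsure": 0}
    to_move = []

    # An empty index marks an initialization run: record ids, classify nothing.
    # This is decided once, before the loop, so it does not flip mid-run.
    first_time = len(index) == 0

    for doc_id in doc_ids:
        if first_time:
            index[doc_id] = "never classified"
            continue
        if doc_id in index:
            continue  # already seen on an earlier run

        prob = scores[doc_id]
        if prob < ham_cutoff:
            counts["ham"] += 1
        elif prob > spam_cutoff:
            counts["spam"] += 1
            # Collect spam now and move it after the loop, as the original does.
            to_move.append(doc_id)
        else:
            counts["unsure"] += 1
        index[doc_id] = "classified"

    return counts, to_move


if __name__ == "__main__":
    seen = {}
    # First run only records the starting point of the inbox.
    classify_new(["a", "b"], {}, seen)
    # Second run classifies only the messages that arrived since then.
    print(classify_new(["a", "b", "c", "d", "e"],
                       {"c": 0.05, "d": 0.99, "e": 0.5}, seen))
```

Keeping the index as a plain dict mirrors the pickled dictionary described in the module docstring. Persisting it between runs (for example with pickle) is left out of this sketch.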
