nway.py

Mail filter implemented in Python
#!/usr/bin/env python

"""Demonstration of n-way classification possibilities.

Usage: %(prog)s [ -h ] tag=db ...

-h - print this message and exit.

All args are of the form 'tag=db' where 'tag' is the tag to be given in the
X-Spambayes-Classification: header.  A single message is read from stdin and
a modified message is sent to stdout.  The message is compared against each
database in turn.  If its score meets or exceeds the spam threshold when
scored against a particular database, an X-Spambayes-Classification header
giving that database's tag and the score is added, no further databases are
checked, and the modified message is written to stdout.  If none of the
comparisons yields a definite classification, the message is written with an
'X-Spambayes-Classification: unsure' header.

Training is left up to the user.  In general, you want to train so that a
message in a particular category will score above your spam threshold when
checked against that category's training database.  For example, suppose you
have the following mbox formatted files: python, music, family, cars.  If
you wanted to create a training database for each of them you could execute
this series of sb_mboxtrain.py commands:

    sb_mboxtrain.py -f -d python.db -s python -g music -g family -g cars
    sb_mboxtrain.py -f -d music.db  -g python -s music -g family -g cars
    sb_mboxtrain.py -f -d family.db -g python -g music -s family -g cars
    sb_mboxtrain.py -f -d cars.db   -g python -g music -g family -s cars

You'd then compare messages using a %(prog)s command like this:

    %(prog)s python=python.db music=music.db family=family.db cars=cars.db

Normal usage (at least as I envisioned it) would be to run the program via
procmail or something similar.  You'd then have a .procmailrc file which
looked something like this:

    :0 fw:sb.lock
    | %(prog)s spam=spam.db python=python.db music=music.db ...

    :0
    * ^X-Spambayes-Classification: spam
    spam

    :0
    * ^X-Spambayes-Classification: python
    python

    :0
    * ^X-Spambayes-Classification: music
    music

    ...

    :0
    * ^X-Spambayes-Classification: unsure
    unsure

Note that I've not tried this (yet).  It should simplify the logic in a
.procmailrc file and probably classify messages better than writing more
convoluted procmail rules.
"""

import getopt
import sys
import os

from spambayes import hammie, mboxutils, Options

prog = os.path.basename(sys.argv[0])


def help():
    # Print the module docstring with %(prog)s replaced by the script name.
    sys.stderr.write(__doc__ % globals())


def main(args):
    try:
        opts, args = getopt.getopt(args, "h")
    except getopt.GetoptError:
        # Unknown option: show usage and signal failure.
        help()
        return 1
    for opt, arg in opts:
        if opt == '-h':
            help()
            return 0

    # Read a single message from stdin and drop any classification header
    # left over from an earlier run.
    msg = mboxutils.get_message(sys.stdin)
    try:
        del msg["X-Spambayes-Classification"]
    except KeyError:
        pass

    # Try each tag=db pair in order; the first database whose score
    # reaches the spam cutoff determines the classification.
    for pair in args:
        tag, db = pair.split('=', 1)
        h = hammie.open(db, True, 'r')
        score = h.score(msg)
        if score >= Options.options["Categorization", "spam_cutoff"]:
            msg["X-Spambayes-Classification"] = "%s; %.2f" % (tag, score)
            break
    else:
        # No database produced a definite match.
        msg["X-Spambayes-Classification"] = "unsure"

    # Preserve the mbox "From " line only if the input message had one.
    sys.stdout.write(msg.as_string(unixfrom=(msg.get_unixfrom()
                                             is not None)))
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
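The heart of main() is a first-match loop over the tag=db pairs. The sketch below isolates that logic so it can be tried without SpamBayes installed. It is an illustration under stated assumptions, not the script's actual code: the scorers are hypothetical callables standing in for hammie databases, messages are plain strings, and the 0.90 default cutoff is only an illustrative value (nway.py itself reads Options.options["Categorization", "spam_cutoff"]).

```python
from typing import Callable, Iterable, Tuple

# Hypothetical stand-in for a trained database: any callable that maps a
# message (here just a string) to a score in [0.0, 1.0].
Scorer = Callable[[str], float]


def classify(msg: str, scorers: Iterable[Tuple[str, Scorer]],
             spam_cutoff: float = 0.90) -> str:
    """Return an X-Spambayes-Classification value for msg.

    Scorers are tried in the order given; the first one whose score
    reaches spam_cutoff wins, mirroring the for/else loop in nway.py.
    """
    for tag, score_fn in scorers:
        score = score_fn(msg)
        if score >= spam_cutoff:
            return "%s; %.2f" % (tag, score)
    # No scorer produced a confident match.
    return "unsure"


def keyword_scorer(word: str) -> Scorer:
    # Hypothetical toy scorer: 0.99 if the keyword appears, else 0.10.
    return lambda msg: 0.99 if word in msg.lower() else 0.10


scorers = [("python", keyword_scorer("python")),
           ("music", keyword_scorer("guitar"))]

print(classify("New guitar strings arrived", scorers))  # music; 0.99
print(classify("Lunch on Friday?", scorers))            # unsure
print(classify("Python and guitar", scorers))           # python; 0.99
```

The last call shows a consequence of the design: when several databases would match, the order of the tag=db arguments decides the winner, which is why the procmail example lists spam=spam.db first.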