📄 hammer.py

📁 Mail filter implemented in Python
📖 Page 1 of 3
#! /usr/bin/env python

# Part of the SpamBayes project.  Released under the Python Software
# Foundation license; see http://www.python.org/

import os, sys, re, random, textwrap

from spambayes import storage
from spambayes import tokenizer

__author__ = "Richie Hindle <richie@entrian.com>"

headerTemplate = """To: %(To)s
From: %(From)s
Subject: %(Subject)s
Date: %(Date)s

"""

# Create a fresh bayes object to train and classify.
FILENAME = '__hammer.db'
try:
    os.remove(FILENAME)
except OSError:
    pass
bayes = storage.open_storage(FILENAME, True)


def train(text, isSpam):
    """Trains the classifier on the given text."""
    tokens = tokenizer.tokenize(text)
    bayes.learn(tokens, isSpam)


def classify(text):
    """Classifies the given text, returning the spamprob."""
    tokens = tokenizer.tokenize(text)
    return bayes.spamprob(tokens)


def makeMessage(isSpam):
    """Builds a fake email message full of random words taken from a
    selection of ham and spam messages."""

    # Which set of messages shall we base this message on?
    if isSpam:
        messages = spam
    else:
        messages = ham

    # Take the headers from one of the messages.
    messageIndex = random.randrange(3)
    headers = headerTemplate % messages[messageIndex]

    # Build a body made from a random selection of words from each message
    # plus a few purely random words.
    bodyWords = []
    for i in range(3):
        body = messages[i]['Body']
        for j in range(10):
            offset = random.randrange(len(body) - 50)
            bodySection = body[offset:offset+50]
            bodyWords.extend(re.findall(r'[^\s]+', bodySection))

        # Add a few purely random words.
        for i in range(5):
            aToZ = 'abcdefghijklmnopqrstuvwxyz'
            wordLength = random.randrange(3, 8)
            word = ''.join([random.choice(aToZ) for j in range(wordLength)])
            bodyWords.append(word)

    body = '\n'.join(textwrap.wrap(' '.join(bodyWords)))

    # Build a message and return it.
    return headers + body


def hammer():
    """Trains and classifies repeatedly."""
    global bayes
    wellFlushed = False
    for i in range(1, 1000000):
        # Train.
        isSpam = random.choice([True, False])
        train(makeMessage(isSpam), isSpam)

        # Every thousand messages or so, flush the DB to disk.
        if random.randrange(1000) == 1:
            print "Flushing."
            bayes.store()
            if i > 500:
                wellFlushed = True

        # Classify.
        isSpam = random.choice([True, False])
        prob = classify(makeMessage(isSpam))
        if i < 10 or i % 100 == 0:
            print "%6.6d: %d, %.4f" % (i, isSpam, prob)

        # Every thousand messages or so, reopen the DB without closing it.
        # The way this works will open the new instance before the existing
        # one goes away, which can cause a DBRunRecoveryError.  Versions up
        # to 1.0a5 had a bug that did this, but people were still
        # reporting DBRunRecoveryErrors in 1.0a6, so I don't think we can
        # call it fixed.

        # We don't do this within the first few hundred messages, or before
        # the DB has been flushed, because that can give a "hamcount > nham"
        # error.  Despite this, you still see those errors.  Either I've got
        # something badly wrong, or they're the result of corrupt databases
        # that aren't caught by bsddb and turned into DBRunRecoveryErrors.
        if wellFlushed and random.randrange(1000) == 1:
            print "Re-opening."
            bayes = storage.open_storage(FILENAME, True)


def test():
    """Print a random ham message and a random spam message."""
    print makeMessage(False)
    print
    print makeMessage(True)


ham = [ {
'To': """<richie@entrian.com>, <spambayes-dev@python.org>""",
'Subject': """RE: [spambayes-dev] Experimental SpamBayes build available""",
'From': """"Tony Meyer" <tameyer@ihug.co.nz>""",
'Date': """Wed, 31 Dec 2003 11:51:29 +1300""",
'Body': """[I'll leave the install stuff for Mark, but I can sort out the rest of
these].
> The ini file for the proxy appeared in "C:\Documents and
> Settings\rjh\Application Data\SpamBayes\Proxy" as you'd
> expect, but the database and cache directories appeared in
> "C:\Program Files\SpamBayes\bin".
Did the ini file have the appropriate [Storage] lines in it?  It's meant to
add them in there, storing the directories in that directory, too.  You
didn't already have an ini file in there, did you?  (It only adds those
lines if it's a new file, so that it doesn't overwrite someone's settings).
> I'd question whether
> we need the Stop/Start command - why would I want the tray
> icon to stay there but the application to not run?
I was thinking this just yesterday. I'm not sure what the original reasoning
behind having it was (and it may have been me that put it there ;).
+1 to getting rid of it, unless someone does know the reasoning.  We can
dump the 'stopped' icon, then, too.  (I'd like to see a '!' icon, though,
which appeared when there were important status messages to review).
> After training through the web interface, the home page still
> says "Database has no training information ..." even though
> the stats say "Total emails trained: Spam: 3 Ham: 18".
Good spotting.  I've checked in a fix for this.
> Defaulting the "Maximum results" field in the Find pane to 1
> seems wrong. It made sense when all you could do was search
> for a message ID (because they're unique) but if I'm
> searching for text, I'll want to see all the hits.
Fair enough.  Line 435 of ui.html; change it to whatever you like most :)
> The Find pane only looks in the unknown cache, so it won't
> find anything once you've trained.  It ought to look in the
> ham and spam caches as well.
Are you positive?  The code has it looking in all three, and a quick test
here had it finding messages in more than one.
> I deliberately induced a false positive (by training on a
> thousand spams with no hams trained) then corrected it via
> the Review page, and the statistics now say "1 being false
> negatives" (plural: ack!) and "0 being false positives".
> That's the wrong way round.
Opps, my bad.  I've checking in a fix for this.  I think I've fixed all the
plurals, too.  If you've still got that false positive statistic around,
could you give it a run from cvs?
=Tony Meyer""",
},
{
'To': """'spambayes@python.org'" <spambayes@python.org>""",
'Subject': """[Spambayes] Spambayes Software and SPAM folder""",
'From': """Lily Cornely <lcornely@clfund.com>""",
'Date': """Tue, 30 Dec 2003 16:18:03 -0500""",
'Body': """Someone here told me that you need to not delete your SPAM folder as the
software needs it to know what is "spam". Is this correct? This would seem
to be a flaw as this file will build up dramatically over time and be
impossible to manage.
Please advise. Can I empty my SPAM folder - or not?
Lily Cornely
Marketing Consultant
CL Fund
1920 Gulf Tower
707 Grant Street
Pittsburgh, PA  15219
412.201.2450
412.201.2451 - Fax
lcornely@clfund.com
_______________________________________________
Spambayes@python.org
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html"""
},
{
'To': """<richie@entrian.com>""",
'Subject': """January Sale - up to 50% off!""",
'From': """"Firebox.com" <newsletter@firebox.com>""",
'Date': """Wed, 31 Dec 2003 15:41:46 GMT""",
'Body': """<html>
<head>
<title>Firebox.com #55 - January Sale - up to 50% off!</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<BODY text="#000000" link="#ff0000" vlink="#ff0000" MARGINHEIGHT="25" MARGINWIDTH="25" LEFTMARGIN="25" TOPMARGIN="25" RIGHTMARGIN="25" BOTTOMMARGIN="25"
BGCOLOR="#ffffff" BACKGROUND="http://www.firebox.com/i/fb_sale.gif">
<DIV ALIGN="center">
        <TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0" WIDTH="528">
                <TR>
                        <TD COLSPAN="2" ROWSPAN="2" ALIGN="left" HEIGHT="14" WIDTH="14"><IMG src="http://www.firebox.com/i/nl_corner_tl.gif" alt=""
border="0" height="14" width="14"></TD>
                        <TD BGCOLOR="#000000" HEIGHT="1" WIDTH="500"><IMG src="http://www.firebox.com/i/spacer.gif" alt="" border=0 height="1"
width="500"></TD>
                        <TD COLSPAN="2" ROWSPAN="2" ALIGN="right" HEIGHT="14" WIDTH="14"><IMG src="http://www.firebox.com/i/nl_corner_tr.gif" alt=""
border="0" height="14" width="14"></TD>
                </TR>
                <TR>
                        <TD BGCOLOR="#ffffff" HEIGHT="13" WIDTH="500"><IMG alt="" border=0 src="http://www.firebox.com/i/spacer.gif" height=13
width=500></TD>
                </TR>
                <TR>
                        <TD BGCOLOR="#000000" WIDTH="1"><img alt="" border="0" width="1" src="http://www.firebox.com/i/spacer.gif"></TD>
                        <TD BGCOLOR="#ffffff" WIDTH="13"><img
alt="" border="0" width="13" src="http://www.firebox.com/i/spacer.gif"></TD>
                        <TD BGCOLOR="#ffffff" WIDTH="500">
                        <table bgcolor="#ffffff" width="500" border="0" cellspacing="0" cellpadding="0" align="center">
                        <tr>
                        <td><!-- Content Starts Here //-->
              <DIV ALIGN="center"> <font face="Verdana, Arial, Helvetica, sans-serif" size="1">if
                you can't read this newsletter properly, <a href="http://www.firebox.com/newsletter/firebox_newsletter_55.html">click
                here</a></font><br>
                <br>
              </DIV>
              <a href="http://www.firebox.com/aff.php?aff=678" target="_blank"><img src="http://www.firebox.com/i/snow_logo_300.gif" width="300" height="41"
border=0 alt="firebox.com"></a><br>
              <br>
              <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>Firebox.com
                Newsletter #55 - January Sale! Up to 50% off!</b></font></p>
              <div align="center"><a
href="http://www.firebox.com/aff.php?aff=678&redirect=index.html?dir=firebox&action=search&searchstring=jansale&searchfeature=1" target="_blank"><img
src="http://www.firebox.com/i/jansale_feature.jpg" alt="January Sale - up to 50% off!" width="393" height="175" border="0"></a></div>
              <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Christmas
                is over for another year but we are still full of seasonal cheer!
                In time-honoured January Sale tradition we have slashed prices
                on a wide range of products, just a few of which are featured
                below. Be sure to check the <a
href="http://www.firebox.com/aff.php?aff=678&redirect=index.html?dir=firebox&action=search&searchstring=jansale&searchfeature=1">January
                Sale</a> page for the full list of bargains!</font></p>
              <table width="100%" border="0" cellspacing="0" cellpadding="0">
                <tr>
                  <td bgcolor="#FF6600"><img src="http://www.firebox.com/i/tasse_jansale.gif" align="left" width="301" height="26" alt="January Sale - Save
Up To 50%!"></td>
                </tr>
              </table> <br clear="all">
              <table width="100%" border="0" cellpadding="5">
                <tr align="center" valign="top">
                  <td width="33%">
                    <p><a href="http://www.firebox.com/aff.php?aff=678&redirect=product.php?pid=482" target="_blank"><img
src="http://www.firebox.com/pic/p482p.gif" alt="Atari  Classics 10-in-1" width="80" height="81" border="0"></a></p>
                    <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><a
href="http://www.firebox.com/aff.php?aff=678&redirect=product.php?pid=482" target="_blank"><b>Atari
                      10-in-1</b></a> <br>
                      was &pound;24.95<br>
                      now <font color="#FF3300"><strong>&pound;19.95</strong></font></font><br>
                      <font face="Verdana, Arial, Helvetica, sans-serif" color="#FF9900" size="1"><b>20%
                      off!</b></font> </p></td>
                  <td width="33%">
                    <p><a href="http://www.firebox.com/aff.php?aff=678&redirect=product.php?pid=353" target="_blank"><img
src="http://www.firebox.com/pic/p353p.gif" alt="Desktop Rover" width="80" height="91" border="0"></a></p>
                    <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><a
href="http://www.firebox.com/aff.php?aff=678&redirect=product.php?pid=353" target="_blank"><b>Desktop
                      Rover</b></a> <br>
                      was &pound;39.95<br>
                      now </font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><font
color="#FF3300"><strong>&pound;19.95</strong></font></font><br>
                      <font face="Verdana, Arial, Helvetica, sans-serif" color="#FF9900" size="1"><b>50%
                      off! </b></font></p>
                    <font face="Verdana, Arial, Helvetica, sans-serif" size="2">&nbsp;
                    </font></td>
                  <td width="33%"><p><a
href="http://www.firebox.com/aff.php?aff=678&redirect=product.php?pid=727"
target="_blank"><img src="http://www.firebox.com/pic/p727p.jpg" alt="SiPix
Pocket DV Camcorder" width="80" height="65" border="0"></a></p>
<p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><a
href="http://www.firebox.com/aff.php?aff=678&redirect=product.php?pid=727"
target="_blank"><b>SiPix DV Camcorder</b></a><br>
was &pound;99.95<br>
now </font><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><font
color="#FF3300"><strong>&pound;49.95</strong></font></font><br>
<font face="Verdana, Arial, Helvetica, sans-serif" color="#FF9900"
size="1"><b>50% off! </b></font></p></td>
                </tr>
                <tr align="center" valign="top">
                  <td width="33%">
                    <p><a href="http://www.firebox.com/aff.php?aff=678&redirect=product.php?pid=585" target="_blank"><img
src="http://www.firebox.com/pic/p585p.jpg" alt="Mini-K MP3 Player" width="80" height="108" border="0"></a></p>
                    <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><a
href="http://www.firebox.com/aff.php?aff=678&redirect=product.php?pid=585" target="_blank"><b>Mini-K
                      MP3 Player</b></a> <br>
                      was &pound;59.95<br>
                      now <font color="#FF3300"><strong>&pound;49.95</strong></font></font><br>
                      <font face="Verdana, Arial, Helvetica, sans-serif" color="#FF9900" size="1"><b>save
                      &pound;10!</b></font> </p></td>
                  <td width="33%">
                    <p><a href="http://www.firebox.com/aff.php?aff=678&redirect=product.php?pid=501" target="_blank"><img
src="http://www.firebox.com/pic/p501p.gif" alt="Pocket DV3" width="80" height="76" border="0"></a></p>
                    <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><a
href="http://www.firebox.com/aff.php?aff=678&redirect=product.php?pid=501" target="_blank"><b>Pocket
                      DV3</b></a> <br>
                      was &pound;129.95<br>
                      now <font color="#FF3300"><strong>&pound;99.95</strong></font></font><br>
                      <font face="Verdana, Arial, Helvetica, sans-serif" color="#FF9900" size="1"><b>save
                      &pound;30!</b></font> </p>
                    <font face="Verdana, Arial, Helvetica, sans-serif" size="2">&nbsp;
                    </font></td>
                  <td width="33%">
                    <p><a href="http://www.firebox.com/aff.php?aff=678&redirect=product.php?pid=689" target="_blank"><img
src="http://www.firebox.com/pic/p689p.jpg" alt="Micro Spinster" width="80"
height="68" border="0"></a></p>
                    <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><a
href="http://www.firebox.com/aff.php?aff=678&redirect=product.php?pid=689" target="_blank"><b>Micro
                      Spinster </b></a> <br>
                      was &pound;12.95<br>
                      now <font color="#FF3300"><strong>&pound;6.50</strong></font></font><br>
                      <font face="Verdana, Arial, Helvetica, sans-serif" color="#FF9900" size="1"><b>50%
                      off! </b></font></p></td>
                </tr>
                <tr align="center" valign="top">
                  <td>
                    <p><a href="http://www.firebox.com/aff.php?aff=678&redirect=product.php?pid=508" target="_blank"><img
src="http://www.firebox.com/pic/p508p.gif" alt="Storm Hopper" width="80" height="70" border="0"></a></p>
                    <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><a
href="http://www.firebox.com/aff.php?aff=678&redirect=product.php?pid=508" target="_blank"><b>Storm
                      Hopper </b></a> <br>
                      was &pound;29.95<br>
                      now <font color="#FF3300"><strong>&pound;19.95</strong></font></font><br>
                      <font face="Verdana, Arial, Helvetica, sans-serif" color="#FF9900" size="1"><b>save
                      &pound;10! </b></font></p></td>
                  <td>
                    <p><a href="http://www.firebox.com/aff.php?aff=678&redirect=product.php?pid=645" target="_blank"><img
src="http://www.firebox.com/pic/p645p.jpg" alt="D'Zign Digital Camera" width="80" height="76" border="0"></a></p>
                    <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><a
href="http://www.firebox.com/aff.php?aff=678&redirect=product.php?pid=645" target="_blank"><b>D'Zign
                      Digital Camera</b></a> <br>
                      was &pound;129.95<br>
                      now <font color="#FF3300"><strong>&pound;99.95</strong></font></font><br>
                      <font face="Verdana, Arial, Helvetica, sans-serif" color="#FF9900" size="1"><b>save
                      &pound;30!</b></font> </p></td>
                  <td>
                    <p><a href="http://www.firebox.com/aff.php?aff=678&redirect=product.php?pid=510" target="_blank"><img
src="http://www.firebox.com/pic/p510p.gif" alt="Mars Detector" width="80" height="77" border="0"></a></p>
                    <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><a
href="http://www.firebox.com/aff.php?aff=678&redirect=product.php?pid=510" target="_blank"><b>Mars
                      Detector</b></a> <br>
                      was &pound;29.95<br>
                      now <font color="#FF3300"><strong>&pound;19.95</strong></font></font><br>
                      <font face="Verdana, Arial, Helvetica, sans-serif" color="#FF9900" size="1"><b>save
                      &pound;10! </b></font></p></td>
                </tr>
              </table>
                        </td>
                </tr>
                  <tr>
            <td>
                        <br>
                        <div align="center">
                <p><font face="Verdana, Arial, Helvetica, sans-serif" size="4">
                  <a href="http://www.firebox.com/aff.php?aff=678&redirect=index.html?dir=firebox&action=search&searchstring=jansale&searchfeature=1"
target="_blank">Check
                  out the other great deals!</a></font></p>
                <p><font face="Verdana, Arial, Helvetica, sans-serif" size="1">Hurry - all offers only available while stocks last! Sale ends January 15th,
2004.</font></p>
              </div>
              <table width="100%" border="0" cellspacing="0" cellpadding="0">
                <tr>
                  <td bgcolor="#FF6600"><a href="http://www.firebox.com/aff.php?aff=678&redirect=index.html?dir=competitions&action=competition55"><img
src="http://www.firebox.com/i/tasse_comp.gif" alt="Competition Time" width="201" height="26" border="0" align="left"></a></td>
                </tr>
              </table>
              <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><b>WIN
                a &quot;Zorbing&quot; experience with Stuck on You! </b></font></p>
              <a href="http://www.firebox.com/aff.php?aff=678&redirect=index.html?dir=competitions&action=competition55" target="_blank"><img
src="http://www.firebox.com/i/soy_small.gif" alt="Stuck On You" width="100" height="149" hspace="10" border="0" align="left"></a>
              <p><font face="Verdana, Arial, Helvetica, sans-serif" size="2">
                Outrageous comedy and a lot of heart are joined at the hip in
                Stuck on You, the latest envelope-pushing exercise in hilarity
                from The Farrelly Brothers (There's Something About Mary, Dumb
                and Dumber). To celebrate the launch of the movie (in cinemas
                from 2nd January 2004) we are offering the chance for you and
                a partner to get 'stuck together' inside a huge bouncy ball and
                launched down a 300 metre hill! Yes, it is fun, really! <a
href="http://www.firebox.com/aff.php?aff=678&redirect=index.html?dir=competitions&action=competition55" target="_blank">enter&#8230;</a></font><br
clear="all">
              </p>
              <table width="100%" border="0" cellspacing="0" cellpadding="0">
