I mean to write more about markets and politics, but in the meantime, there are two small software tricks I’ve had to use this week. You must have noticed that most spam now comes in multiple copies, presumably because it’s sent out from zombied Windows machines. I have SpamAssassin set to save it all into an IMAP mailbox on pair. Here is a little Python script that will read an IMAP mailbox and delete all but one example of mail from the same sender. Obviously, this is only safe on a spam mailbox. But it cuts out about three quarters of the bulk of mine, which makes it practicable to riffle through quickly looking for false positives. I’m posting it because it might save someone else an hour or two trying to figure out Python and IMAP.
Note that the mailserver, password, and username will all have to be changed from these examples. Since first posting this, I have improved it to kill dictionary attacks, too. That leaves very little spam indeed to check.
    #! /usr/local/bin/python
    # quickie to dedupe spamassassin mailboxes
    import imaplib,string,os,re,sys

    # dictionaries and list for deduping
    known_scum={}   # a dictionary of spammers
    ss={}           # a dictionary of subjects
    whacklist=[]
    da=0            # number of dictionary attacks
    dupes=0         # number of dupes

    # regexes for nailing dictionary attackers
    goodme=re.compile('andrew|ingenstans|shoppping|andrewb|acb')
    spamfrom=re.compile('From:.+')
    spamto=re.compile('To: .*')
    spamsubject=re.compile('Subject: .*')

    # the host goes to the constructor, which opens the connection
    m=imaplib.IMAP4('mailserver.example.com')
    if not m.login('logon','password')[0]=='OK': # failed to login
        sys.exit(1)
    howmanyspams=m.select('caught_by_SA')[1][0] # choose own spam mbx
    typ,data=m.search(None,'ALL')

    # get list of dupes into whacklist
    for nums in data[0].split():
        resp,body=m.fetch(nums,'(BODY[HEADER.FIELDS (FROM TO SUBJECT)])')
        try:
            subject=spamsubject.search(body[0][1]).group()[18:-1]
        except AttributeError:
            subject= "This space intentionally left blank by acb"
        if not ss.has_key(subject):
            ss[subject]=nums
        try:
            spammer= spamfrom.search(body[0][1]).group()[6:]
            target=spamto.search(body[0][1]).group()[4:]
        except AttributeError:
            # maybe a malformed or missing address
            # so kill it anyway
            whacklist.append(m.store(nums,'FLAGS',r'\Deleted'))
            continue
        try:
            name,domain=target.split('@')
        except ValueError:
            name=target
        if not goodme.search(name):
            da=da+1
            whacklist.append(m.store(nums,'FLAGS',r'\Deleted')) # kill dictionary attacks
            if not known_scum.has_key(spammer):
                known_scum[spammer]=nums
            else:
                dupes=dupes+1
            continue
        if known_scum.has_key(spammer):
            whacklist.append(m.store(nums,'FLAGS',r'\Deleted')) # need the 'r' and 'Deleted' exactly
            dupes=dupes+1
        else:
            known_scum[spammer]=nums

    # killdupes
    Imap_response=m.expunge()
    bye=m.close()
    adieu=m.logout()
    print "%s spams checked and %s zapped.\n%s slimeballs sent mail on %s subjects;\n%s were dictionary attacks\nand %s duplicate messages" %(
        howmanyspams, len(whacklist), len(known_scum), len(ss), da, dupes)
    # print whacklist
    # print Imap_response
    #if howmanyspams >1:
    #    print "The subjects were"
    #    lies=ss.keys()
    #    lies.sort()
    #    for lie in lies:
    #        print lie
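For anyone on Python 3, here is a minimal sketch of just the decision logic (which messages to delete), kept separate from the IMAP calls so it can be tried on sample headers first. It is not the script above: the function name `pick_deletions`, the shortened `GOOD_LOCAL_PARTS` pattern, and the sample addresses are all made up for illustration, and it keys on the bare sender address via `email.utils.parseaddr`, whereas the script above compares the whole From line.

```python
import re
from email.parser import HeaderParser
from email.utils import parseaddr

# Hypothetical pattern: local parts that really exist at your domain.
GOOD_LOCAL_PARTS = re.compile(r"andrew|ingenstans|acb")


def pick_deletions(messages, good=GOOD_LOCAL_PARTS):
    """Return the ids to delete from an iterable of (msg_id, raw_headers).

    A message is marked when its From or To address is missing, when the
    recipient's local part does not match `good` (a dictionary attack),
    or when an earlier message came from the same sender (a duplicate).
    """
    seen_senders = set()
    doomed = []
    parser = HeaderParser()
    for msg_id, raw_headers in messages:
        headers = parser.parsestr(raw_headers)
        sender = parseaddr(headers.get("From", ""))[1].lower()
        recipient = parseaddr(headers.get("To", ""))[1].lower()
        if not sender or not recipient:
            # Malformed or missing address: delete it anyway.
            doomed.append(msg_id)
            continue
        local_part = recipient.split("@", 1)[0]
        if not good.search(local_part):
            # Dictionary attack: delete, but remember the sender.
            doomed.append(msg_id)
            seen_senders.add(sender)
            continue
        if sender in seen_senders:
            doomed.append(msg_id)  # duplicate from a known sender
        else:
            seen_senders.add(sender)  # first copy is kept for review
    return doomed


sample = [
    ("1", "From: Spam King <king@spam.example>\nTo: andrew@example.org\n\n"),
    ("2", "From: king@spam.example\nTo: andrew@example.org\n\n"),
    ("3", "From: other@spam.example\nTo: aardvark@example.org\n\n"),
    ("4", "Subject: no addresses at all\n\n"),
]
print(pick_deletions(sample))  # ['2', '3', '4']
```

The ids that come back could then be passed to `store` and `expunge` on an `imaplib` connection, as the script above does. Keeping the choice in a plain function means a bad regex shows up in a dry run rather than in an emptied mailbox.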
On a related note, I was rather pleased to receive comment spam from the Church of England the other day…
http://www.wibsite.com/wiblog/dave/comments.php?d=1088169030&s=0
It’s not really spam if they mean it personally.
Oh. Don’t say that – it’s my only thing to talk about at parties.
Well, if you’re short of stuff to read …