Known_scum as a variable

I mean to write more about markets and politics, but in the meantime, there are two small software tricks I’ve had to use this week. You must have noticed that most spam now comes in multiple copies, presumably because it’s sent out from zombied Windows machines. I have SpamAssassin set to save it all into an IMAP mailbox on pair. Here is a little Python script that will read an IMAP mailbox and delete all but one example of mail from the same sender. Obviously, this is only safe on a spam mailbox. But it cuts out about three quarters of the bulk of mine, which makes it practicable to riffle through quickly looking for false positives. I’m posting it because it might save someone else an hour or two trying to figure out Python and IMAP.


Note that the mailserver, password, and username will all have to be changed from these examples. Since first posting this, I have improved it to kill dictionary attacks, too. That leaves very little spam indeed to check.
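
A dictionary attack here means spam sent to guessed names at the domain, so the check comes down to whether the part of the To: address before the @ contains a name that is really in use. What follows is a minimal Python 3 sketch of that test on its own, not the script itself. The allow-list `GOOD_LOCAL_PARTS` and the helper name `is_dictionary_attack` are made up for illustration and would need replacing with your own names.

```python
import re

# Hypothetical allow-list of local parts that really receive mail.
# This is an assumption for the example; substitute your own names.
GOOD_LOCAL_PARTS = re.compile(r'andrew|acb')


def is_dictionary_attack(to_header):
    """Return True when the To: address uses a local part we never use."""
    # Drop the "To:" header name if it is present.
    address = to_header.split(':', 1)[-1].strip()
    # Everything before the first '@' is treated as the local part;
    # if there is no '@', the whole string is used.
    local_part = address.split('@', 1)[0]
    # A substring match is enough: display names such as
    # "Andrew B <acb" still contain a good name.
    return GOOD_LOCAL_PARTS.search(local_part) is None


print(is_dictionary_attack('To: bob@example.com'))
print(is_dictionary_attack('To: andrew@example.com'))
```

The match is deliberately loose (a case-sensitive substring search), which errs on the side of keeping mail rather than deleting it.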

#! /usr/local/bin/python

# quickie to dedupe spamassassin mailboxes

import imaplib, re

# dictionaries and list for deduping
known_scum={} # a dictionary of spammers
ss={} # a dictionary of subjects
whacklist=[]
da=0 # number of dictionary attacks
dupes=0 # number of dupes

# regexes for nailing dictionary attackers
goodme=re.compile('andrew|ingenstans|shoppping|andrewb|acb')
spamfrom=re.compile('From:.+')
spamto=re.compile('To: .*')
spamsubject=re.compile('Subject: .*')


m=imaplib.IMAP4('mailserver.example.com') # connects as soon as it is created
if not m.login('logon','password')[0]=='OK':
    # failed to login
    raise SystemExit(1)
howmanyspams=m.select('caught_by_SA')[1][0] # choose own spam mbx
typ,data=m.search(None,'ALL')
# get list of dupes into whacklist
for nums in data[0].split():
    resp,body=m.fetch(nums,'(BODY[HEADER.FIELDS (FROM TO SUBJECT)])')
    try:
        subject=spamsubject.search(body[0][1]).group()[18:-1]
    except AttributeError:
        subject= "This space intentionally left blank by acb"
    if not ss.has_key(subject):
        ss[subject]=nums
    try:
        spammer= spamfrom.search(body[0][1]).group()[6:]
        target=spamto.search(body[0][1]).group()[4:]
    except AttributeError: # maybe a malformed or missing address
        # so kill it anyway
        whacklist.append(m.store(nums,'FLAGS',r'\Deleted'))
        continue
    try:
        name,domain=target.split('@')
    except ValueError:
        name=target
    if not goodme.search(name):
        da=da+1
        whacklist.append(m.store(nums,'FLAGS',r'\Deleted')) # kill dictionary attacks
        if not known_scum.has_key(spammer):
            known_scum[spammer]=nums
        else:
            dupes=dupes+1
        continue
    if known_scum.has_key(spammer):
        whacklist.append(m.store(nums,'FLAGS',r'\Deleted')) # need the 'r' and 'Deleted' exactly
        dupes=dupes+1
    else:
        known_scum[spammer]=nums

# killdupes
Imap_response=m.expunge()
bye=m.close()
adieu=m.logout()

print "%s spams checked and %s zapped.\n%s slimeballs sent mail on %s subjects;\n%s were dictionary attacks\nand %s duplicate messages" % (
    howmanyspams,
    len(whacklist),
    len(known_scum),
    len(ss),
    da,
    dupes)
# print whacklist
# print Imap_response
#if howmanyspams >1:
#	print "The subjects were"
#	lies=ss.keys()
#	lies.sort()
#	for lie in lies:
#		print lie
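
For anyone who wants to try the decision logic without pointing it at a live mailbox, here is a rough Python 3 sketch of the same triage rules applied to plain (sender, recipient) pairs. It is an approximation for experimenting, not a port of the script: the function name `triage`, its arguments, and the sample addresses are all invented for the example, and it does no IMAP work at all.

```python
import re


def triage(messages, good_local_parts):
    """Decide which messages would be deleted.

    `messages` is a list of (sender, recipient) strings taken from the
    From: and To: headers; either may be None if the header is missing.
    Returns (doomed_indexes, dictionary_attacks, dupes).
    """
    known_senders = set()
    doomed = []
    dictionary_attacks = 0
    dupes = 0
    for index, (sender, recipient) in enumerate(messages):
        if sender is None or recipient is None:
            doomed.append(index)      # malformed headers: delete outright
            continue
        seen_before = sender in known_senders
        known_senders.add(sender)
        if seen_before:
            dupes += 1
        local_part = recipient.split('@', 1)[0]
        if not good_local_parts.search(local_part):
            dictionary_attacks += 1
            doomed.append(index)      # guessed address: always delete
        elif seen_before:
            doomed.append(index)      # keep only the first per sender
    return doomed, dictionary_attacks, dupes


# Hypothetical sample data; none of these addresses are real.
sample = [
    ('a@spam.example', 'andrew@example.com'),
    ('a@spam.example', 'andrew@example.com'),
    ('b@spam.example', 'bob@example.com'),
    (None, 'andrew@example.com'),
    ('b@spam.example', 'andrew@example.com'),
]
print(triage(sample, re.compile('andrew|acb')))
```

As in the script, a sender first seen in a dictionary attack still counts as known, so any later mail from that sender is treated as a duplicate.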

4 Responses to Known_scum as a variable

  1. Dave says:

    On a related note, I was rather pleased to receive comment spam from the Church of England the other day…

    http://www.wibsite.com/wiblog/dave/comments.php?d=1088169030&s=0

  2. el Patron says:

    It’s not really spam if they mean it personally.

  3. Dave says:

    Oh. Don’t say that – it’s my only thing to talk about at parties.

  4. el Patron says:

    Well, if you’re short of stuff to read …

    ~ --> bin/useful_tidying_scripts/killdupespam.py
    1517 spams checked and 1460 zapped.
    340 slimeballs sent mail on 316 subjects;
    1421 were dictionary attacks
    and 1176 duplicate messages
    
