{"id":1244,"date":"2004-07-06T16:34:14","date_gmt":"2004-07-06T20:34:14","guid":{"rendered":"http:\/\/www.thewormbook.com\/hlog\/?p=1244"},"modified":"2004-07-06T16:34:14","modified_gmt":"2004-07-06T20:34:14","slug":"known_scum-as-a-variable","status":"publish","type":"post","link":"http:\/\/www.thewormbook.com\/hlog\/?p=1244","title":{"rendered":"Known_scum as a variable"},"content":{"rendered":"<p>I mean to write more about markets and politics but in the meantime, there are two small software tricks I&#8217;ve had to use this week. You must have noticed that most spam now comes in multiple copies, presumably because it&#8217;s sent out from zombied Windows machines. I have SpamAssassin set to save it all into an IMAP mailbox on pair. Here is a little Python script that reads an IMAP mailbox and deletes all but one message from each sender. Obviously, this is only safe on a spam mailbox, but it cuts out about three-quarters of the bulk of mine, which makes it practicable to riffle through quickly looking for false positives. I&#8217;m posting it because it might save someone else an hour or two of figuring out Python and IMAP.<\/p>\n\n\n<p><!--more--><br \/>\nNote that the mail server, username, and password in the script will all have to be changed to your own, as will the spam mailbox name and the names of your real addresses in goodme. Since first posting this, I have improved it to kill dictionary attacks too: any mail addressed to a name that isn&#8217;t one of mine is deleted. That leaves very little spam indeed to check.<\/p>\n\n\n\n<pre>#! 
\/usr\/bin\/env python3\n\n# quickie to dedupe spamassassin mailboxes (Python 3)\n\nimport imaplib, re, sys\n\n# dictionaries and list for deduping\nknown_scum = {}  # a dictionary of spammers, keyed by sender\nss = {}          # a dictionary of subjects\nwhacklist = []   # one IMAP response per message flagged for deletion\nda = 0           # number of dictionary attacks\ndupes = 0        # number of dupes\n\n# regexes for nailing dictionary attackers\ngoodme = re.compile('andrew|ingenstans|shoppping|andrewb|acb')  # the names I really use\nspamfrom = re.compile('From:.+')\nspamto = re.compile('To: .*')\nspamsubject = re.compile('Subject: .*')\n\n# IMAP4() connects as soon as it is constructed, so give it the host here\nm = imaplib.IMAP4('mailserver.example.com')\nif m.login('logon', 'password')[0] != 'OK':\n    sys.exit('failed to login')  # a plain exit would not stop the script\nhowmanyspams = int(m.select('caught_by_SA')[1][0])  # choose own spam mbx\ntyp, data = m.search(None, 'ALL')\n# get list of dupes into whacklist\nfor nums in data[0].split():\n    resp, body = m.fetch(nums, '(BODY[HEADER.FIELDS (FROM TO SUBJECT)])')\n    headers = body[0][1].decode('latin-1')  # imaplib returns bytes\n    try:\n        # [18:-1] skips 'Subject: ' plus the next nine characters\n        # (presumably a spam tag) and the trailing carriage return\n        subject = spamsubject.search(headers).group()[18:-1]\n    except AttributeError:  # no Subject: header\n        subject = &quot;This space intentionally left blank by acb&quot;\n    if subject not in ss:\n        ss[subject] = nums\n    try:\n        spammer = spamfrom.search(headers).group()[6:]\n        target = spamto.search(headers).group()[4:]\n    except AttributeError:  # maybe a malformed or missing address\n        # so kill it anyway\n        whacklist.append(m.store(nums, 'FLAGS', r'\\Deleted'))\n        continue\n    try:\n        name, domain = target.split('@')\n    except ValueError:  # no '@', or more than one\n        name = target\n    if not goodme.search(name):\n        da = da + 1\n        whacklist.append(m.store(nums, 'FLAGS', r'\\Deleted'))  # kill dictionary attacks\n        if spammer not in known_scum:\n            known_scum[spammer] = nums\n        else:\n            dupes = dupes + 1\n        continue\n    if spammer in known_scum:\n        whacklist.append(m.store(nums, 'FLAGS', r'\\Deleted'))  # need the 'r' and 'Deleted' exactly\n        dupes = dupes + 1\n    else:\n        known_scum[spammer] = nums\n\n# killdupes\nImap_response = m.expunge()\nbye = m.close()\nadieu = m.logout()\n\nprint(&quot;%s spams checked and %s zapped.\\n%s slimeballs sent mail on %s subjects;\\n%s were dictionary attacks\\nand %s duplicate messages&quot; % (\n    howmanyspams,\n    len(whacklist),\n    len(known_scum),\n    len(ss),\n    da,\n    dupes))\n# print(whacklist)\n# print(Imap_response)\n#if howmanyspams &gt; 1:\n#\tprint(&quot;The subjects were&quot;)\n#\tfor lie in sorted(ss):\n#\t\tprint(lie)\n<\/pre>","protected":false},"excerpt":{"rendered":"<p>I mean to write more about markets and politics but in the meantime, there are two small software tricks I&#8217;ve had to use this week. You must have noticed that most spam now comes in multiple copies, presumably because it&#8217;s &hellip; <a href=\"http:\/\/www.thewormbook.com\/hlog\/?p=1244\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[19],"tags":[],"_links":{"self":[{"href":"http:\/\/www.thewormbook.com\/hlog\/index.php?rest_route=\/wp\/v2\/posts\/1244"}],"collection":[{"href":"http:\/\/www.thewormbook.com\/hlog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.thewormbook.com\/hlog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.thewormbook.com\/hlog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.thewormbook.com\/hlog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1244"}],"version-history":[{"count":0,"href":"http:\/\/www.thewormbook.com\/hlog\/index.php?rest_route=\/wp\/v2\/posts\/1244\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.thewormbook.com\/hlog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1244"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.thewormbook.com\/hlog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1244"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.thewormbook.com\/hlog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1244"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}