Published on October 7, 2007
SpamAssassin

I shall talk about three aspects:
- Spam-fighting techniques
- What SpamAssassin is, and why it's so good
- My office setup

Spam-fighting techniques
- SMTP restrictions: rDNS, bad syntax, no A or MX record
- RBLs (Realtime Blackhole Lists)
- Online databases
- Content analysis
- Bayesian filters
(Other options include graylisting, tarpitting, and charging micro-payments for sending email.)

SMTP restrictions
SMTP restrictions examine what a sending server announces when it connects, which is recorded in the Received headers of incoming mail. They check that the hostname has valid syntax, that both the hostname and the domain have a DNS A or MX record, and that reverse DNS works. The format of a Received header is:
    Received: from Announced-name (real-name [real-IP])

A typical spam
    Received: from femail.sdc.sfba.home.com (femail.sdc.sfba.home.com [18.104.22.168])
        by wantadilla.lemis.com (Postfix) with ESMTP id BCBFF6ACC0
        for <[email protected]>; Tue, 19 Jun 2001 13:50:57 +0930 (CST)
    Received: from u319 ([22.214.171.124])
        by femail2.sdc1.sfba.home.com (InterMail vM.4.01.03.20 201-229-121-120-20010223)
        with SMTP id <[email protected]>; Mon, 18 Jun 2001 21:20:05 -0700
    From: [email protected]
    To:
    Subject: stolen britney spears home video!!
    Date: Thu, 19 Jun 2025 13:52:44 -0200
Look at the second Received header. The sender announced itself as "u319", which is not a valid fully qualified hostname. No real name appears next to its IP address.

SMTP restrictions summary
+ Easy to set up. Most mail servers now incorporate them, and every legitimate server should comply.
- There are plenty of misconfigured servers out there, and some ISPs do not provide rDNS.

RBLs
RBLs (Realtime Blackhole Lists) such as SORBS, Spamhaus and SpamCop check the sending mail server against lists of known spam sites and open relays.
+ Again, easy to set up at MTA level, and they deter a significant amount of spam.
- Lookups take time, and lists frequently cost money. They tend to blacklist whole domains at once, and it is easy to get on an RBL but difficult to get off. If you implement them at MTA level, you will definitely lose legitimate mail.
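An RBL check is just a DNS query. Most DNS-based lists follow the same convention:
- reverse the octets of the connecting server's IPv4 address;
- append the list's zone;
- look up an A record.

Any answer (usually an address in 127.0.0.0/8) means the server is listed. NXDOMAIN means it is not. Here is a minimal Python sketch of that convention. The function names and the example zones are hypothetical, and this is not how SpamAssassin or Postfix implements the check internally.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query for an IPv4 address in a DNSBL zone.

    For example, 192.0.2.99 checked against dnsbl.example.org becomes
    99.2.0.192.dnsbl.example.org.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers with an A record, i.e. the IP is listed."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # NXDOMAIN, or the lookup itself failed: treat the address as not listed.
        return False
    return True
```

Treating a failed lookup as "not listed" fails open, so a DNS outage lets mail through rather than rejecting it. That fits the slide's warning about losing legitimate mail. Many lists publish 127.0.0.2 as a permanently listed test address, so `is_listed("127.0.0.2", zone)` is a quick reachability check for a real zone. In Postfix itself, the same lookup is configured with `reject_rbl_client` rather than written by hand.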
Online spam databases
(A bit like virus scanners…)
Razor, Pyzor and DCC are the best known. Users report spam to these databases. When new mail arrives, a fuzzy hash of the message is compared with the database.
+ Catches spam that can beat the other tests.
- Takes time, tends to hit “hard ham” (legitimate mail that looks like spam), and will only stop spam that has already been reported.

Content analysis
(This is the core of SpamAssassin.) Looks for suspicious words in the message, e.g. viagra, porn, c0ck, and also for spam-like qualities in the headers.
+ Considerably quicker than network tests, doesn't rely on the spam message or site being known, and can be very accurate.
- Difficult to catch the single-line spam message ("Check out this link!!!"). Spammers can also tailor their mail to bypass open-source products like SA.

Bayesian classifiers
The latest technique in content analysis. The message is tokenized. For each token, the number of times it has previously appeared in spam and in ham is compared, and the results are combined into a probability that the message is spam.
+ Capable of great subtlety and amazing accuracy.
+ Builds negative scores to let ham through.
+ Almost impossible for spammers to circumvent, as they can't know what your ham looks like.
- Takes time and disk space to build up a database of spam and ham.
- Works best on an individual's mail.
- Threatened by Bayes poisoning?

An example of a Bayes DB
Sample from my office SA setup (atime is the token's last-access time, as a Unix timestamp):
    spam prob  spam count  ham count  atime       word
    0.982      93          4          1072719949  blood
    0.198      2           20         1073305967  bloody
    0.988      36          1          1073388368  cock

Why SpamAssassin?
SpamAssassin is a Perl module that, given a message, runs tests and returns a score. By default, any result of 5 or more is considered spam. SA isn't a product but an anti-spam framework. It incorporates SMTP restrictions, RBLs, online databases, content analysis and Bayesian filters. Because it doesn't rely on just one strategy, it doesn't tag mail as spam merely because the mail fails one test. There are lots of low-scoring rules rather than a few high-scoring ones (and very few negative rules).

Why SpamAssassin cont.?
Cool features SA has:
- Mature product, widely used and deployed.
- The basic content filter is very good out of the box, without network tests and before Bayes is trained.
- AWL (auto-whitelisting).
- Auto-learning for Bayes.
- Easy to customize and to write your own rules.
- Good support on the spamassassin-talk mailing list.

What I wanted…
- An effective anti-spam product.
- An opportunity to demonstrate that open-source software could be used effectively.
- An SMTP mail relay server, so I wouldn't have to make any changes to our Exchange 5.5 box.
I was hoping to find a distro like IPCop!

My prior config
    Exchange 5.5 <-> proxy firewall <-> internet
30 users, 250-500 incoming mails a day. No direct connection to the internet from Exchange.

My current config
Exchange 5.5, the proxy firewall and the internet as before, plus an SMTP anti-spam relay that filters incoming mail before passing it on to Exchange.
If you're not using a proxy firewall, don't make your existing mail server the second MX record. Spammers often deliver straight to backup MX hosts, which would bypass the filter.

Before you begin…
You need to decide how you want to handle spam before choosing your implementation. You can:
- Tag it and let it through, so users handle it via client rules. This gives users flexibility and minimizes the consequences of false positives (FPs), but users still have to deal with the spam.
- Quarantine it.
- Bounce it with a notification message. This minimizes the burden on users and saves delivering and storing it.
- 5xx REJECT it. You must run a milter to keep the SMTP session open, but you avoid sending a non-delivery report (NDR).

What I used
There are many ways of implementing SA, and it can run on any *nix. I used these HOWTOs for background:
- http://www.geocities.com/scottlhenderson/spamfilter.html (RH9/Postfix/amavisd-new/SA, written for newbies)
- http://lawmonkey.org/anti-spam.html (OpenBSD/Postfix/amavisd-new/SA in a chroot jail)
Software:
- Red Hat Linux 9 (chosen for familiarity)
- Postfix MTA (the easiest mail server to configure)
- amavisd-new (recommended in the HOWTOs and on the SA site)
- SpamAssassin 2.60
- Razor2 (DCC wouldn't get through my firewall)
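The Bayesian classifiers slide describes turning per-token spam and ham counts into one probability for the message. Below is a minimal Python sketch of one common way to do that:
- Robinson-style smoothing of each token's probability toward a neutral prior;
- Fisher's chi-squared method (popularized by SpamBayes) to combine the token probabilities.

This is an illustration, not SpamAssassin's own code. The message totals `N_SPAM`/`N_HAM`, the constants `STRENGTH` and `PRIOR`, and the function names are all assumptions. The sketch will not reproduce the probabilities in the sample table, which depend on the real totals and on SA's own parameters. Only the token counts come from that table.

```python
import math
from typing import Dict, Iterable, Tuple

# Token -> (spam count, ham count), copied from the sample Bayes DB table.
TOKENS: Dict[str, Tuple[int, int]] = {
    "blood": (93, 4),
    "bloody": (2, 20),
    "cock": (36, 1),
}
N_SPAM = 1000  # assumed number of spam messages learned (hypothetical)
N_HAM = 1000   # assumed number of ham messages learned (hypothetical)

STRENGTH = 1.0  # assumed: how strongly rarely seen tokens are pulled toward PRIOR
PRIOR = 0.5     # assumed: probability given to a token with no evidence


def token_prob(spam: int, ham: int) -> float:
    """Smoothed spam probability of one token (Robinson's f(w)); needs spam + ham > 0."""
    spam_ratio = spam / N_SPAM
    ham_ratio = ham / N_HAM
    p = spam_ratio / (spam_ratio + ham_ratio)
    n = spam + ham
    return (STRENGTH * PRIOR + n * p) / (STRENGTH + n)


def chi2q(x2: float, dof: int) -> float:
    """P(chi-squared >= x2) for an even number of degrees of freedom."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def classify(tokens: Iterable[str]) -> float:
    """Combine known tokens' probabilities; near 1.0 means spam, near 0.0 ham."""
    probs = [token_prob(*TOKENS[t]) for t in set(tokens) if t in TOKENS]
    if not probs:
        return 0.5  # no known tokens: no evidence either way
    n = len(probs)
    spamminess = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    hamminess = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (spamminess - hamminess + 1.0) / 2.0
```

With these assumed totals:
- `classify(["blood", "cock"])` comes out close to 1.
- `classify(["bloody"])` comes out at about 0.11.

A single token simply returns its own smoothed probability. The smoothing matters because a token seen only once or twice, or only ever in spam, never gets a probability of exactly 0 or 1. Without that, one rare word could dominate the verdict.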
The hardware
- Compaq ProLiant 800
- Pentium Pro 200MHz
- 192MB RAM
- 2 x 4GB SCSI HDs, partitioned as:
    /dev/sda   /boot   100MB   ext3
               /       3.9GB   ext3
    /dev/sdb   swap    500MB
               /var    3.5GB   ext3
This is what I like about Linux!

Notes from an install…
- Minimal RH9 installed without problems.
- Downloaded and installed the 2.4.22 kernel (wise?).
- Got Postfix from source too: just type make, create user postfix and group postdrop, then make install.
- Stopped a number of services from auto-starting in runlevels 3-5: isdn, kudzu, pcmcia, xinetd.

Key configuration files
- /etc/postfix/main.cf (for Postfix)
- /etc/mail/spamassassin/*.cf (rules, and altered scores for rules)
- /etc/amavisd.conf (everything else; settings in this file, if present, override SA settings)

Configuring postfix
Add to /etc/postfix/main.cf:
    local_recipient_maps =
    smtpd_helo_restrictions = reject_invalid_hostname
    # reject_unknown_sender_domain rejects sender domains without an A or MX record
    smtpd_sender_restrictions = reject_unknown_sender_domain, check_sender_access hash:/etc/postfix/access
    smtpd_recipient_restrictions = check_recipient_access hash:/etc/postfix/recip_access, permit_mynetworks, reject_unauth_destination
/etc/postfix/access:
    worldwidesalesoffice.com    550 Mail Rejected.
    [email protected]    OK
/etc/postfix/recip_access:
    [email protected]    550 doesn't work here now
    [email protected]    550 Ex-employee -- spam!
Build the lookup tables after each change:
    # postmap /etc/postfix/access
    # postmap /etc/postfix/recip_access
Ending the recipient restrictions with reject_unauth_destination stops the relay from relaying mail for the whole internet. Postfix will not accept a recipient restriction list that lacks it or a similar rule.

Setting postfix as an SMTP relay
The MTA needs to be set up to relay messages for your domain to your internal mail server. Add this to /etc/postfix/main.cf:
    transport_maps = hash:/etc/postfix/transport
Add this to /etc/postfix/transport:
    #yourdomain.com    smtp:[ip addr]
    idmltd.co.uk       smtp:[192.168.0.1]
Create transport.db with:
    # postmap /etc/postfix/transport

Installing perl modules
Large number of dependencies!
You should use Perl 5.6+. Use CPAN:
    # perl -MCPAN -e shell
Essential, or near-essential, modules:
- HTML::Parser
- Sys::Syslog (if using spamd)
- DB_File (for Bayes)
- Net::DNS (for network tests)
- Mail::SpamAssassin

Installing amavisd-new
There is an even larger number of Perl dependencies. I'm not entirely sure how many, as they aren't listed in the docs. Important settings in amavisd.conf:
    $sa_local_tests_only = 0;
    $sa_auto_whitelist = 1;
    $sa_tag_level_deflt = 0;       # -999;
    $sa_tag2_level_deflt = 6;
    $sa_kill_level_deflt = 6;      # 15;
    $sa_spam_modifies_subject = 1;
    read_hash(\%whitelist_sender, '/var/amavis/whitelist');
    read_hash(\%blacklist_sender, '/var/amavis/blacklist');
    read_hash(\%spam_lovers, '/var/amavis/spam_lovers');
The three levels work as follows:
- tag_level adds the X-Spam headers.
- tag2_level marks the message as spam.
- kill_level triggers the configured spam action.
To whitelist a sender domain:
    # echo caspergasper.com >> /var/amavis/whitelist

Installing razor
- Can't get it from CPAN.
- Uses TCP port 2703 to call the Razor servers.
- Razor (2.36) needs patching, as SA 2.6 runs in taint mode (the patch comes with SA).
Create the config files:
    # razor-client
    # razor-admin -create
Register with the Razor network:
    # razor-admin -register -user [email protected]

Configuring SpamAssassin
Important settings in local.cf:
    use_razor2 1
    auto_learn 1
    use_bayes 1
    dns_available yes

Performance tips
- Don't use a Unicode system language. Change the language setting in /etc/sysconfig/i18n to LANG="en_GB".
- Disable synchronous logging of mail by syslogd. In /etc/syslog.conf, prefix the log file with "-":
    mail.*    -/var/log/mail.log
- Turn off logging for SA/amavisd-new/Razor2 once everything is working properly.

Learning from mistakes
If SA gets a message wrong, run it through Bayes manually (e.g. with sa-learn --spam or sa-learn --ham). Resending the message will destroy its headers, so it's best to copy it into an IMAP folder. SA remembers each message's ID, so it cannot learn twice from the same message.

Creating custom rules
It's very easy to write custom rules if you know regular expressions.
Place this in a /etc/mail/spamassassin/*.cf file:
    header   MEDICINE_RULE  Subject =~ /medicine|prescription/i
    describe MEDICINE_RULE  Mentions medicine or prescription in the subject line
    score    MEDICINE_RULE  2.1
Run spamassassin --lint to check for typos.

Meta rules
You can add cumulative rules that fire when two or more conditions are met:
    header __LOCAL_FROM_NEWS        From =~ /[email protected]\.com/i
    body   __LOCAL_SALES_FIGURES    /\bMonthly Sales Figures\b/
    meta   LOCAL_NEWS_SALES_FIGURES (__LOCAL_FROM_NEWS && __LOCAL_SALES_FIGURES)
    score  LOCAL_NEWS_SALES_FIGURES -1.0
Rules whose names start with "__" are sub-rules. They score nothing on their own and are only used inside meta rules.

The results…
Almost unmitigated success!
- Very accurate. So far there has been only one reported FP of any significance, and all other FPs were “hard hams”.
- Six months after installation, the MD (who gets a lot of spam) said it hadn't made a single mistake.
- No-one chose to opt out.
- It hasn't crashed once.
- I think a lot of users don't complain about spam because they don't realize something can be done about it.
- The network tests went down (my fault), yet the detection rate stayed high.

Problems…
- amavisd-new doesn't support encapsulation of messages.
- The Postfix reporting script pflogsumm double-counts received mail, because each message passes through Postfix twice (before and after amavisd-new).
- Despite high accuracy, a small number of false negatives (FNs) still make it through. Maybe DCC would help? Custom rulesets (especially EvilRules) have made an improvement.
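To show how the MEDICINE_RULE and meta-rule examples combine into a verdict, here is a minimal Python sketch that mirrors them. It is not SpamAssassin's engine, which is written in Perl and runs body rules on decoded, de-HTMLed text. The sketch simply:
- applies the same regular expressions to a toy message;
- adds up the scores of the rules that fire;
- compares the total against the default threshold of 5.

The From address is a hypothetical placeholder, because the original rule's address is not recoverable. `Message`, `fired_rules`, `score` and `other_points` are illustrative names.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Message:
    headers: Dict[str, str] = field(default_factory=dict)
    body: str = ""


# Python equivalents of the rule regexes. news@example.com is a hypothetical
# stand-in for the address in __LOCAL_FROM_NEWS.
MEDICINE_RE = re.compile(r"medicine|prescription", re.IGNORECASE)
FROM_NEWS_RE = re.compile(r"news@example\.com", re.IGNORECASE)
SALES_FIGURES_RE = re.compile(r"\bMonthly Sales Figures\b")

SCORES = {"MEDICINE_RULE": 2.1, "LOCAL_NEWS_SALES_FIGURES": -1.0}
REQUIRED_SCORE = 5.0  # SpamAssassin's default spam threshold


def fired_rules(msg: Message) -> Set[str]:
    """Return the names of the scored rules that match the message."""
    hits: Set[str] = set()
    if MEDICINE_RE.search(msg.headers.get("Subject", "")):
        hits.add("MEDICINE_RULE")
    # The "__" sub-rules score nothing themselves; they only feed the meta rule.
    from_news = bool(FROM_NEWS_RE.search(msg.headers.get("From", "")))
    sales_figures = bool(SALES_FIGURES_RE.search(msg.body))
    if from_news and sales_figures:
        hits.add("LOCAL_NEWS_SALES_FIGURES")
    return hits


def score(msg: Message, other_points: float = 0.0) -> Tuple[float, bool]:
    """Add local rule scores to points from all other tests (other_points).

    In this sketch a total of REQUIRED_SCORE or more counts as spam.
    """
    total = other_points + sum(SCORES[name] for name in fired_rules(msg))
    return total, total >= REQUIRED_SCORE
```

A subject mentioning "prescription" scores 2.1 on its own and is not tagged. If other tests have already contributed 3.0 points, the same message crosses the threshold. This is the "lots of low-scoring rules" design from the Why SpamAssassin? slide: no single rule decides the verdict. The negative meta rule, in turn, nudges a known internal newsletter away from the threshold without whitelisting it outright.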