Most of you are probably quite familiar with e-mail spam by now, and might even have noticed that it has increased in intensity in recent months. Rest assured, this phenomenon is by no means isolated to your mailbox. Network World recently published an article on exactly this, titled: What’s with all this spam?
What’s with all this spam?
Unwanted e-mail levels ’shot up like crazy’; image spam partly to blame, say experts.
By Cara Garretson, Network World, 11/08/06
Researchers and IT managers are confirming security vendors’ claims that spam levels have spiked in the past month – some say by as much as 80 % — and show no signs of decreasing.
…
The article cites the rise of image spam, spam that contains its text inside an image, as a leading cause of the recent spike in spam filter evasion. Unfortunately, the problem is not nearly as simple as that. Before I dive deeper into the issue, let me first briefly go over the art of mail filtering. Current mail filters rely on 3 distinct methods (simplified below for illustration purposes):
- Distributed Spam Checksum Networks like Vipul’s Razor.
- Each mail server in the network (including mail.grafitto) submits a checksum (analogous to a computer file’s fingerprint) for each email that it has classified as spam. The network then processes all the checksums and decides which ones it will treat as true spam (in case an individual mail server is wrong). This global database is then consulted by all mail servers in the network to block any new emails that match an existing checksum.
- Blacklists like SPEWS
- These blacklists are compiled by what essentially amount to spam watchdog groups. These groups scour the emails floating around the web and look at which parts of the internet are currently engaged in spamming. They then ask that mail servers voluntarily block emails emanating from these addresses (and sometimes entire zones). This method is akin to quarantining a section of the web until its admins fix their behavior.
- Bayesian Spam Filters like SpamAssassin
- This class of spam filters leverages Bayesian statistics to look at the frequency of words (and their patterns) in emails to classify them as either spam or ham. This class of filter is extremely powerful because it is able to continually “learn” by adjusting its Bayesian weights for each new piece of spam.
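To make the word-frequency idea concrete, here is a minimal sketch of a naive Bayes filter in Python. It is a toy under stated assumptions, not how SpamAssassin or any real filter is implemented: the class name `TinyBayesFilter`, the whitespace tokenizer, the add-one smoothing, and the four training messages are all made up for illustration.

```python
import math
from collections import Counter


class TinyBayesFilter:
    """Toy word-frequency filter (naive Bayes with add-one smoothing)."""

    def __init__(self):
        # Per-class word counts and per-class message counts.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        """Update the weights with one message labeled 'spam' or 'ham'."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def spam_score(self, text):
        """Return log-odds of spam vs. ham; a positive value leans spam."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            # The extra +1 reserves probability mass for never-seen words.
            denom = sum(counts.values()) + len(vocab) + 1
            score = math.log(
                (self.message_counts[label] + 1) / (total_messages + 2)
            )
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return scores["spam"] - scores["ham"]


# Hypothetical training data, invented for this example.
spam_filter = TinyBayesFilter()
spam_filter.learn("cheap pills buy now", "spam")
spam_filter.learn("buy cheap watches now", "spam")
spam_filter.learn("meeting notes attached", "ham")
spam_filter.learn("lunch at noon tomorrow", "ham")

spammy = spam_filter.spam_score("buy cheap pills")
hammy = spam_filter.spam_score("meeting at noon")
```

Every call to `learn` shifts the counts, which is the “learning” described above. It also means the filter is only as good as the messages it is fed.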
Among these 3 classes of spam filters, the Bayesian filter is the only one with preventive abilities. The other two classes would not recognize a spam as such until a large number of mail servers have already been hit. In the never-ending race between the spammers and the filters, it seems that spammers have finally figured out how to circumvent all of our current protections.
- Garbage Emails
- You have probably seen lots of emails with nonsensical text and wondered why the spammers are being so stupid. In fact, the spammers are not using them to advertise a product, but to confuse and misdirect the weights in a Bayesian filter.
- Semi-Real Emails
- A new class of spammers is now placing their ads above text copied from real webpages and mailing list emails. This is extremely messy for any heuristic algorithm and will definitely throw off any Bayesian filter.
- Image Emails
- Another way to get around a Bayesian filter is to simply have no words at all! Many spammers are putting their words in an image file and sending only the image file. Furthermore, changing a single pixel per image throws off all the checksum comparisons.
- Zombie Computers
- In order to circumvent blacklists, many spammers are now hacking random servers on the web and using them to send out their spam.
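The one-pixel trick against checksum networks is easy to demonstrate. The sketch below assumes the shared database stores plain SHA-1 digests of the raw message bytes. That is a simplification for illustration: real networks may use more tolerant signatures, and the function names and the stand-in “image” bytes are hypothetical.

```python
import hashlib


def checksum(message: bytes) -> str:
    """Exact fingerprint of a message (assumed here to be plain SHA-1)."""
    return hashlib.sha1(message).hexdigest()


def is_known_spam(message: bytes, database: set) -> bool:
    """Look the message's fingerprint up in the shared spam database."""
    return checksum(message) in database


# Stand-in for the bytes of an image attachment (hypothetical data).
original_image = bytes(range(256)) * 4

# One mail server reports the image, so its checksum enters the database.
spam_database = {checksum(original_image)}

# The spammer changes a single byte ("one pixel") before the next mailing.
tweaked = bytearray(original_image)
tweaked[100] ^= 0x01
tweaked_image = bytes(tweaked)

caught_original = is_known_spam(original_image, spam_database)
caught_tweaked = is_known_spam(tweaked_image, spam_database)
```

With an exact hash, `caught_original` is `True` while `caught_tweaked` is `False`: the two images look identical to a human, but their fingerprints have nothing in common.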
Spammers have in fact even started using other mediums for their messages:
- Wiki-spam
- Putting spam messages on wiki pages (abusing their open-edit nature) in the hope that people and search engines will see the page. Links are often put there as well in order to increase the Google PageRank of the spammer’s site.
- Referrer Spam
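- Repeatedly hitting a website that has a visit tracker like Webalizer, with the spammer’s site given as the referrer, so that the link shows up in the published visitor statistics and inflates the Google PageRank of the spammer’s site.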
- Webpage Spam
- Creating a bazillion copies of a webpage and putting them all over the web so that at least one copy appears at the top of Google’s search results.
All of this leads me to believe that we are definitely losing this war on spam (and very possibly permanently so). We don’t have to look any further back than the advent of Capital One (Junk Mail Master) to see how profitable junk mail can get (CEO at Capital One Cashed In Options for Nearly $250 Million). For every 1000 people direct-mailed, only 1 person needs to respond for the whole batch to break even. The break-even percentage of people who need to respond to an email is of course thousands if not millions of times smaller. Sigh, if only attorneys general would actually try to enforce the CAN-SPAM Act…
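The break-even argument can be written out as simple arithmetic: a batch breaks even when the response rate equals the cost per message divided by the profit per response. In the sketch below, every dollar figure is an assumption chosen only so that the direct-mail case reproduces the 1-in-1000 figure above. None of them are real Capital One or spammer numbers.

```python
def break_even_response_rate(cost_per_message, profit_per_response):
    """Fraction of recipients who must respond for a mailing to break even."""
    if profit_per_response <= 0:
        raise ValueError("profit_per_response must be positive")
    return cost_per_message / profit_per_response


# All three figures below are hypothetical, for illustration only.
PROFIT_PER_RESPONSE = 500.00  # assumed profit from one responding customer
POSTAL_COST = 0.50            # assumed cost to print and mail one letter
EMAIL_COST = 0.0001           # assumed cost to send one spam email

postal_rate = break_even_response_rate(POSTAL_COST, PROFIT_PER_RESPONSE)
email_rate = break_even_response_rate(EMAIL_COST, PROFIT_PER_RESPONSE)

# How many times fewer responders the emailer needs than the direct mailer.
advantage = postal_rate / email_rate
```

Under these made-up costs the direct mailer needs 1 response per 1000 letters, while the emailer needs thousands of times fewer. The cheaper each message gets, the closer the required response rate comes to zero.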





