Why Content Filters Can’t Eradicate Spam

WHITEPAPER
About Mimecast
Mimecast (www.mimecast.com) delivers cloud-based email management for Microsoft
Exchange, including archiving, continuity and security. By unifying disparate and fragmented
email environments into one holistic solution that is always available from the cloud,
Mimecast minimizes risk and reduces cost and complexity, while providing total end-to-end control of email. Founded in the United Kingdom in 2003, Mimecast serves over 5,000
customers worldwide and has offices in Europe, North America, Africa and the Channel
Islands.
For more information, please visit www.mimecast.com or email [email protected].
Contents
Defining the Problem
Current Spam Management Systems
Implications of Content Filtering for Spam Capture
Quarantining
False Positives
Connection Filtering
The Mimecast Approach to Spam – ARMed SMTP
www.mimecast.com
Defining the Problem
Spam means different things to different people. Some people define any unwanted or unsolicited email
as spam (even if they asked for it – but don’t want it anymore) while others consider all bulk emails (both
solicited and unsolicited newsletters and marketing announcements) to be spam. Many people label any
email from people they don’t know as spam.
This range of perceptions, coupled with spammers’ ability to vary message content to appear more
or less like spam, makes it difficult for any software product to eliminate spam completely. As a result, the
fight against spam has become an arms race, with spammers and Secure Email Gateway vendors
trying to outfox one another. Legal authorities have engaged in different activities to address the issue,
from shutting down sympathetic hosting providers, to working behind the scenes to kill botnets. While
not wholly effective, the process has at least forced unscrupulous email senders to operate from the
shadows to avoid the risk of legal prosecution – and more interestingly, it has provided mechanisms for
simpler identification of friend from foe.
Recent statistics suggest that spam outweighs legitimate email by more than four to one. Practically
all businesses now have spam management filters in place, but most have not yet solved the
problem. Often, implementing a spam filter has merely moved the spam problem elsewhere and
created a number of other message delivery issues.
Current Spam Management Systems
The vanguard of current spam management systems has been the content filter. These spam filters use
various policy driven techniques applied to the words and body content of email in order to determine
whether the email should be delivered to the recipient or not. The most common content filtering
techniques employ heuristic analysis and Bayesian analysis. Loosely described, these techniques
analyze the words or makeup of an email in order to detect a pattern, then compare that pattern
to a database of known bad or malicious email content.
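To make the Bayesian idea concrete, the following is a minimal sketch of a naive Bayes word classifier. It is an illustration only, not the method used by any particular product: the training messages, the function names `train` and `spam_probability`, and the use of Laplace smoothing are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical training data: (message text, is_spam).
TRAINING = [
    ("cheap pills buy now", True),
    ("win money now", True),
    ("meeting agenda for monday", False),
    ("quarterly report attached", False),
]

def train(messages):
    """Count words and messages per class (True = spam, False = legitimate)."""
    word_counts = {True: Counter(), False: Counter()}
    msg_counts = {True: 0, False: 0}
    for text, is_spam in messages:
        msg_counts[is_spam] += 1
        word_counts[is_spam].update(text.lower().split())
    return word_counts, msg_counts

def spam_probability(text, word_counts, msg_counts):
    """Estimate P(spam | words); assumes both classes have training messages."""
    vocab = set(word_counts[True]) | set(word_counts[False])
    totals = {label: sum(word_counts[label].values()) for label in (True, False)}
    # Start from the prior odds of spam versus legitimate mail.
    log_odds = math.log(msg_counts[True]) - math.log(msg_counts[False])
    for word in text.lower().split():
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        # Laplace (add-one) smoothing avoids zero probabilities.
        p_spam = (word_counts[True][word] + 1) / (totals[True] + len(vocab))
        p_ham = (word_counts[False][word] + 1) / (totals[False] + len(vocab))
        log_odds += math.log(p_spam) - math.log(p_ham)
    return 1 / (1 + math.exp(-log_odds))

word_counts, msg_counts = train(TRAINING)
print(spam_probability("buy cheap pills", word_counts, msg_counts))
print(spam_probability("monday meeting report", word_counts, msg_counts))
```

The classifier only knows the words it was trained on, which is why such filters depend on a steady supply of correctly labeled examples.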
Examining words or phrases commonly used in spam is a simple approach to the problem, but is not
completely effective because it assumes that those words and phrases will not appear in legitimate
emails. More sophisticated techniques apply weighting scores to “known bad” words or phrases and
if the total “score” for the email exceeds a certain limit, the filter considers the email to be spam. Some
content filtering systems try to learn about what types of emails the recipient prefers as distinct from
those that are rejected – most others require a regular update of “known bad” phrases and settings.
Some email security companies have even gone as far as to suggest that their systems are close to
artificial intelligence. But the arms race continues as the spammers learn how to circumvent the
latest techniques designed to defeat them.
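The weighted scoring technique described above can be sketched in a few lines. The word list, the weights and the threshold below are hypothetical values chosen for illustration; real filters use far larger, regularly updated rule sets.

```python
import re

# Hypothetical "known bad" words and weights, for illustration only.
WEIGHTS = {
    "winner": 3.0,
    "prescription": 2.5,
    "mortgage": 2.0,
    "free": 1.5,
}
THRESHOLD = 5.0  # hypothetical limit above which a message is treated as spam

def spam_score(text, weights=WEIGHTS):
    """Sum the weights of every known bad word found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(weights.get(word, 0.0) for word in words)

def is_spam(text, threshold=THRESHOLD):
    """Flag the message when its total score exceeds the threshold."""
    return spam_score(text) > threshold

spam = "You are a winner! Claim your free prescription"
legit = "Your mortgage statement and prescription refill are ready, free of charge"

print(spam_score(spam), is_spam(spam))
print(spam_score(legit), is_spam(legit))
```

With these assumed weights, the second message is a legitimate notice that still crosses the threshold, which shows how the same words that identify spam can also condemn genuine email.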
Implications of Content Filtering for Spam Capture
The problem with content filtering is not that it cannot block spam, but rather that it is the most expensive
method, in terms of technical processing, as well as the cost of deploying and maintaining that solution.
Content filtering deals with an effectively unbounded number of variables, which makes it error-prone. However, the key
failure in the process is the flawed assumption that a software program can determine what emails an
individual would like to receive. Systems using content as the only consideration are easily broken when
people subscribe to useful and legitimate email newsletters which have much in common with spam
emails.
Many legitimate emails will also be marked as spam by a content filter (so-called false positives) if they
include common spam words and phrases. This issue is a particular problem in industries such as
healthcare and financial services, whose everyday keywords are frequently the subject matter of real spam.
Quarantining
To try to combat the false positives that are symptomatic of content filters, vendors added
another break in the SMTP chain, called quarantining. Almost all email security systems available on
the market today use a quarantine folder to store all emails marked as spam. Email administrators have
traditionally been required to manually sift through these quarantines on a daily or hourly basis looking for
false positives.
Often, the first indication that an email has been mistakenly identified as spam is when the sender calls
to ask why there has been no response to an email. In a business context, that can mean lost sales and
clients. Some solutions require the end user to review their own quarantine folders in order to locate
incorrectly classified emails. In this scenario, they may as well receive the spam directly, because
scanning the quarantine takes about as long as scanning an overburdened email inbox.
Quarantines create new problems without solving the existing ones. They merely move email delivery
problems around, placing an additional burden on the email administration team or the end user.
False Positives
The more aggressive a spam filter becomes, the more likely it is to reject legitimate emails along with
spam emails. Each legitimate email rejected in this way is a false positive. Email vendors have traditionally focused on the idea
that NO spam should get through to the end user.
However, this approach does not take into account the fact that a single email, incorrectly blocked, is
many times more costly to deal with than one junk email received. It is therefore incorrect to assume that
1% or even 0.1% of emails incorrectly classified as spam is “acceptable”.
Email administrators have begun to realize that the cost of false positives greatly outweighs the cost of
dealing with the spam itself. False positives represent a breakdown in communication, lost opportunities
and productivity, and mistrust of email as a communications medium.
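A short worked example shows why even a small false positive rate can dominate the cost of spam handling. Every figure below is a hypothetical assumption chosen for illustration, not measured data: the message volumes (using the four-to-one spam ratio mentioned earlier), the filter's miss rate, and the cost assigned to each kind of error.

```python
def daily_costs(legit_per_day, spam_per_day, false_positive_rate, miss_rate,
                cost_per_false_positive, cost_per_delivered_spam):
    """Return (cost of blocked legitimate mail, cost of spam that got through)."""
    false_positives = legit_per_day * false_positive_rate
    delivered_spam = spam_per_day * miss_rate
    return (false_positives * cost_per_false_positive,
            delivered_spam * cost_per_delivered_spam)

# All values are hypothetical assumptions for this sketch.
fp_cost, spam_cost = daily_costs(
    legit_per_day=10_000,
    spam_per_day=40_000,           # four spam messages per legitimate one
    false_positive_rate=0.001,     # the "acceptable" 0.1%
    miss_rate=0.01,                # assumed share of spam reaching inboxes
    cost_per_false_positive=50.0,  # assumed cost of one lost business email
    cost_per_delivered_spam=0.05,  # assumed cost of deleting one junk email
)
print(f"false positives: {fp_cost:.2f} per day")
print(f"delivered spam:  {spam_cost:.2f} per day")
```

Under these assumptions, ten wrongly blocked emails a day cost far more than four hundred junk emails that reach inboxes. The exact ratio depends entirely on the costs assumed, but the asymmetry is the point of the argument above.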
Connection Filtering
More advanced Secure Email Gateway vendors, particularly those delivering their service from the cloud,
are able to offer an additional level of protection before the email is processed by the content filtering
engines.
Connection filtering seeks to examine the source, and in some cases the destination, of the email to
determine whether or not the sender has a reputation within that system already.
Reputations can be classified into a number of groups. ‘Globally Known Bad’ would imply the sender
already has a bad reputation maintained on the wider Internet, so would be listed by their IP address in
an RBL or Realtime Block List. ‘Locally Known Bad’ would imply the anti-spam solution maintains its own
database of known bad senders; some email administrators achieve this manually with organization-wide
block lists.
Good reputations would be classified as ‘Locally Known Good’, where the anti-spam solution
automatically maintains a database of known good communication pairs, i.e. the email addresses and
IP addresses your users regularly send email to. Anything received from those external contacts can be
assumed to be “Known Good”.
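The reputation groups described above can be sketched as a simple lookup. This is an illustrative model only: the function names, the order in which the lists are consulted, and the use of in-memory sets are assumptions. In practice an RBL is normally queried over DNS rather than held locally, and the addresses shown are reserved documentation values.

```python
from enum import Enum

class Reputation(Enum):
    GLOBALLY_KNOWN_BAD = "globally known bad"
    LOCALLY_KNOWN_BAD = "locally known bad"
    LOCALLY_KNOWN_GOOD = "locally known good"
    UNKNOWN = "unknown"

def record_outbound(known_good_pairs, internal_user, external_contact):
    """Remember that one of our users sent email to this external contact."""
    known_good_pairs.add((internal_user, external_contact))

def classify_sender(sender_ip, sender_address, recipient_address,
                    rbl_ips, local_block_list, known_good_pairs):
    """Classify an inbound connection; the precedence order is an assumption."""
    if sender_ip in rbl_ips:
        return Reputation.GLOBALLY_KNOWN_BAD
    if sender_ip in local_block_list or sender_address in local_block_list:
        return Reputation.LOCALLY_KNOWN_BAD
    if (recipient_address, sender_address) in known_good_pairs:
        return Reputation.LOCALLY_KNOWN_GOOD
    return Reputation.UNKNOWN

# Hypothetical data for the sketch.
rbl_ips = {"192.0.2.1"}
local_block_list = {"spammer@example.net"}
known_good_pairs = set()
record_outbound(known_good_pairs, "alice@example.com", "bob@example.org")

print(classify_sender("203.0.113.9", "bob@example.org", "alice@example.com",
                      rbl_ips, local_block_list, known_good_pairs))
```

Senders that fall into the unknown group are the ones that still need further testing, whether by protocol checks or by content examination.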
Building a reputation with a Secure Email Gateway is also possible, provided that the gateway supports
RFC compliance checking techniques such as greylisting. Passing such a test implies that whatever is
sending an email has queued and retried at RFC-compliant intervals, so is likely to be a legitimate
SMTP server.
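A minimal sketch of greylisting follows. The first delivery attempt from an unknown combination of sending IP address, sender and recipient is deferred with a temporary (4xx) SMTP failure, and a later retry is accepted. The class name, the five-minute minimum delay and the one-day expiry are assumptions for illustration; real gateways tune these values and persist the state.

```python
class Greylist:
    """Defer first-time senders and accept those that retry after a delay."""

    def __init__(self, min_delay=300, max_age=86_400):
        self.min_delay = min_delay  # seconds a sender must wait before retrying
        self.max_age = max_age      # seconds after which a first attempt expires
        self.first_seen = {}        # (ip, sender, recipient) -> first attempt time

    def check(self, ip, sender, recipient, now):
        """Return "defer" (reply with a 4xx code) or "accept" for this attempt."""
        key = (ip, sender, recipient)
        first = self.first_seen.get(key)
        if first is None or now - first > self.max_age:
            # Unknown or expired triplet: record it and ask the sender to retry.
            self.first_seen[key] = now
            return "defer"
        if now - first < self.min_delay:
            # Retried too quickly to look like a queueing mail server.
            return "defer"
        return "accept"

greylist = Greylist()
triplet = ("203.0.113.9", "bob@example.org", "alice@example.com")
print(greylist.check(*triplet, now=0))    # first attempt
print(greylist.check(*triplet, now=60))   # retry after one minute
print(greylist.check(*triplet, now=900))  # retry after fifteen minutes
```

The time is passed in explicitly so the behavior is easy to test. Software that never queues and retries, as is typical of bulk spam tools, never gets past the first deferral.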
The Mimecast Approach to Spam – ARMed SMTP
Mimecast precisely and correctly blends advanced reputation and protocol connection techniques into
a powerful and effective anti-spam system. This sophisticated capability is the result of the Mimecast
platform’s market-leading Mail Transfer Agent (MTA) architecture and the unique in-protocol and
connection level anti-spam tests that we apply.
While other vendors have supplemented their legacy content filtering approaches with some standard
connection filtering features to shore up their solutions, Mimecast ARMed SMTP centers on this
progressive methodology, reducing the reliance on content examination techniques for spam detection.
Mimecast can intercept the majority of spam email without examining the body content of an email
because our experience and our technology let us identify patterns of SMTP delivery behavior that are
typically exhibited by spammers. We identify these patterns by the way spammers deliver their emails
and not according to the content of the emails themselves.
This approach offers profound benefits. Mimecast leaves spam undelivered with the spammers, so there
is no local bandwidth loading; the overhead of managing quarantine folders is significantly reduced; and
the integrity of the SMTP protocol is maintained, so that no legitimate messages go missing. ARMed
SMTP is a way of dealing with emails and spam, developed for the way email is used today, rather than
how it was originally expected to be used twenty years ago. Our approach provides a more successful,
robust, cost effective and intelligent solution to the problem of spam and spam management.
© 2012 Mimecast. ALL RIGHTS RESERVED. WHI-WP-069-001