Overview of e-mail SPAM Elimination and its Efficiency

Tomáš Sochor
University of Ostrava
Czech Republic
8th Int. Conf. on Research Challenges in Information Science
Marrakesh, May 2014
Motivation of SPAM Study
•  E-mail is still the most frequently used service
   –  the highest number of users
   –  closely related to other services
      •  incl. social media
      •  account registration
      •  client verification etc.
•  SPAM amount is still increasing
   –  the percentage is now stable around 90%
   –  but absolute figures still rise!
T. Sochor – RCIS May 2014 Marrakesh
Global SPAM trends
SPAM measurement
•  What is SPAM?
   –  a message containing “Viagra”?
      •  YES, BUT NOT if you are a medical doctor (urologist)!
   –  the definition of SPAM depends on the recipient
      •  unlike e.g. computer virus, malware etc.
•  How to measure SPAM?
   –  SPAM ratio = (No. of SPAM messages detected) / (Total No. of messages)
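A minimal sketch of the SPAM ratio defined above. The function name and the sample figures are illustrative assumptions, not measured values from the study.

```python
def spam_ratio(spam_detected: int, total_messages: int) -> float:
    """SPAM ratio = detected SPAM messages / all messages received."""
    if total_messages <= 0:
        raise ValueError("total_messages must be positive")
    return spam_detected / total_messages


# Illustrative figures only: 45,000 detected SPAM messages out of 50,000.
print(f"{spam_ratio(45_000, 50_000):.1%}")
```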
SPAM measurement & detection
•  False positive detections are important
   –  it “costs” more to “dig out” one legitimate message from the SPAM box
      than to manually delete a SPAM message that was not detected
•  False_neg_ratio = (No. of SPAM non-detected) / (Total No. of messages)
   –  BUT: how to count non-detected SPAM?
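The filter itself cannot count the SPAM it misses. One possible approach, assumed here and not taken from the slides, is to review a sample of messages by hand and compare the manual label with the filter decision. The class and function names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LabelledMessage:
    is_spam: bool   # ground truth from manual review (assumed to be available)
    flagged: bool   # decision made by the SPAM filter


def error_ratios(sample: list[LabelledMessage]) -> tuple[float, float]:
    """Return (legit_blocked_ratio, spam_missed_ratio), both relative to the
    total number of messages in the hand-labelled sample."""
    total = len(sample)
    if total == 0:
        raise ValueError("sample must not be empty")
    legit_blocked = sum(1 for m in sample if m.flagged and not m.is_spam)
    spam_missed = sum(1 for m in sample if m.is_spam and not m.flagged)
    return legit_blocked / total, spam_missed / total
```

The quality of both ratios depends on how representative the reviewed sample is.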
SPAM control history
•  simple filtering
   –  “Viagra” -> SPAM
      •  easy to obfuscate: “\/iagra”
•  multifactorial filtering/Bayes heuristics
   –  multiple factors are considered
•  L4 filtering
   –  using TCP connection parameters
•  etc.
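A small sketch of why simple keyword filtering is easy to defeat. The word list and the look-alike table are hypothetical. The second filter illustrates only one possible countermeasure and is not a method described in the slides.

```python
import re

BLOCKED_WORDS = {"viagra"}          # hypothetical keyword list

# Hypothetical table of look-alike character sequences and their plain form.
LOOKALIKES = {"\\/": "v", "@": "a", "0": "o"}


def simple_filter(text: str) -> bool:
    """Flag the message if any blocked word appears as a whole word."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in BLOCKED_WORDS for word in words)


def normalised_filter(text: str) -> bool:
    """Undo known look-alike substitutions before keyword matching."""
    lowered = text.lower()
    for fake, real in LOOKALIKES.items():
        lowered = lowered.replace(fake, real)
    return simple_filter(lowered)


print(simple_filter("Cheap Viagra here"))        # True
print(simple_filter("Cheap \\/iagra here"))      # False: obfuscation slips through
print(normalised_filter("Cheap \\/iagra here"))  # True
```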
SPAM control history – phase 2
Non-traditional methods:
•  Collaborative filtering
•  Sender Policy Framework
•  Make sender pay
•  Authentication of sender
•  Challenge – Response
•  Most of these methods try to distinguish SPAM
before message delivery
SPAM control history - continues
•  various ways of filtering
   –  insufficient
   –  applied only to messages already DELIVERED
•  modern approaches try to eliminate SPAM during message delivery
   –  not applied on recipient’s computer
      •  mail server must check for SPAM
E-mail Operation (Reminder)
[Figure: e-mail operation diagram marking where anti-SPAM means should be applied]
•  Server under control
•  Before distribution to users
Typical SPAM control system
[Figure annotation: error message “450 Greylisted” answered here]
SPAM Totals in long-term (University of Ostrava)
[Figure: monthly SPAM totals; horizontal axis labelled November 2007 to July 2012; vertical axis 0 to 1 200 000]
SPAM percentage long-term
[Figure: monthly SPAM percentage; horizontal axis labelled October 2007 to August 2012; vertical axis 0,00% to 100,00%]
SPAM detection before delivery
•  Formal check
   –  e.g. existence of recipient address
•  Blacklisting
   –  message coming from SPAMming server
•  Greylisting
   –  temporary blocking
   –  only applied to unknown SMTP servers
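The three pre-delivery checks can be sketched as one decision function on the mail server. The order of the checks, the function and parameter names, and the reply texts other than “450 Greylisted” are assumptions made for illustration.

```python
def check_before_delivery(
    client_ip: str,
    recipient: str,
    *,
    known_recipients: set[str],
    blacklist: set[str],
    known_servers: set[str],
) -> str:
    """Return an SMTP-style reply for one incoming delivery attempt."""
    # Formal check: does the recipient address exist?
    if recipient not in known_recipients:
        return "550 Unknown recipient"
    # Blacklisting: is the sending server a known SPAM source?
    if client_ip in blacklist:
        return "554 Blacklisted"
    # Greylisting: temporary refusal, only for unknown SMTP servers.
    if client_ip not in known_servers:
        return "450 Greylisted"
    return "250 OK"
```

Replies starting with 5 are permanent refusals, while a reply starting with 4 asks the sending server to try again later.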
Blacklisting idea
•  verification of the sender
   –  against a BLACKlist of SPAMming servers
•  known problem:
   –  sender e-mail address can be spoofed easily
•  sender identification: IP address
   –  at present: even more addresses required
      •  IPv6
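Many third-party blacklists are queried through DNS: the sender's IP address is reversed and prepended to the blacklist zone. The sketch below only builds the query name and performs no lookup. The zone `dnsbl.example.org` is a placeholder, not a real blacklist.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str = "dnsbl.example.org") -> str:
    """Build the DNS name used to look up an IP address in a DNS blacklist.

    IPv4 addresses are reversed octet by octet, IPv6 addresses nibble by
    nibble, which gives 32 labels for a single IPv6 address.
    """
    addr = ipaddress.ip_address(ip)
    # reverse_pointer yields e.g. "1.2.0.192.in-addr.arpa"; drop the suffix.
    reversed_labels = addr.reverse_pointer.rsplit(".", 2)[0]
    return f"{reversed_labels}.{zone}"


print(dnsbl_query_name("192.0.2.1"))    # 1.2.0.192.dnsbl.example.org
print(dnsbl_query_name("2001:db8::1"))
```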
Blacklisting issues
•  Almost nobody is able to maintain their own blacklist
•  Third-party database (blacklist)
   –  not suitable for every organization
•  Legitimate message sender in blacklist:
   –  the delivery is usually impossible
   –  refusal is announced to the sender
      •  reason could be unclear
      •  sender has limited tools to ask for exclusion from blacklist
Blacklisting – error rate
•  Errors can happen
   –  usually as a result of wrong listing
   –  the frequency of such errors provides a metric for blacklist correctness
•  Errors are difficult to detect
   –  but they occur
Blacklisting potential errors
Blue column: No. of IP addresses blocked by blacklisting
Red column: the same number as of mid-2013
A drop means IP addresses were removed from the blacklist
Blacklisting error rate
Period           Tot_req   Err1   Err2   Err_ratio
December 2009     45,478      1      4       0.01%
December 2010     47,015     12      6       0.04%
December 2011     47,084     15      1       0.03%
December 2012     44,530      5      1       0.13%
January 2013       2,836      2      9       0.39%
February 2013      3,148      4      1       0.39%
Greylisting principle
•  simple idea:
   –  operates BEFORE message delivery
   –  inserting short delay in message delivery
      •  approx. 5 minutes
   –  SPAMmer does not repeat the attempt
   –  in practice only applied to unknown sources
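A minimal greylisting sketch under stated assumptions. Delivery attempts are identified by the triplet (client IP, sender, recipient), a common choice that the slides do not specify. The class name is hypothetical, and the time is passed in explicitly to keep the example deterministic.

```python
GREYLIST_DELAY = 300  # seconds, i.e. approx. 5 minutes


class Greylist:
    def __init__(self, delay: float = GREYLIST_DELAY) -> None:
        self.delay = delay
        # Maps (client_ip, sender, recipient) to the time of the first attempt.
        self.first_seen: dict[tuple[str, str, str], float] = {}

    def check(self, client_ip: str, sender: str, recipient: str, now: float) -> str:
        """Temporarily refuse until `delay` seconds have passed since the
        first attempt with the same triplet."""
        key = (client_ip, sender, recipient)
        first = self.first_seen.setdefault(key, now)
        if now - first < self.delay:
            return "450 Greylisted"   # temporary error: sender should retry
        return "250 OK"
```

A legitimate SMTP server retries after a temporary error and gets through on a later attempt, while a sender that never retries is blocked.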
Greylisting weakness
•  It is easy to adapt to a greylisted server
   –  so far it does not seem to be worthwhile for SPAMmers
•  SPAMmer can get into the AWL (auto-whitelist):
   –  After several successful deliveries through greylisting it is considered
      to be a legitimate source
      •  and not checked any more
   –  this behaviour can be eliminated by connection with SPAM scanner
      •  DNSBL
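The link between the SPAM scanner and the whitelist can be sketched as a feedback call that removes an offending IP address. The class, its method names and the threshold of 5 deliveries are hypothetical. The slides only say “several successful deliveries”.

```python
class AutoWhitelist:
    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold          # assumed value for "several"
        self.successes: dict[str, int] = {}

    def record_delivery(self, ip: str) -> None:
        """Count one delivery that passed greylisting."""
        self.successes[ip] = self.successes.get(ip, 0) + 1

    def is_whitelisted(self, ip: str) -> bool:
        """Whitelisted sources skip the greylisting delay."""
        return self.successes.get(ip, 0) >= self.threshold

    def report_spam(self, ip: str) -> None:
        """Feedback from the SPAM scanner: forget the source so it is
        greylisted again."""
        self.successes.pop(ip, None)
```

Without `report_spam`, a SPAMmer that retries a few times would stay whitelisted indefinitely.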
Efficiency of various SPAM control mechanisms
•  measurement at 2 independent universities
   –  20,000 – 50,000 attempts/day avg.
   –  i.e. 2,000 – 5,000 legitimate messages/day
   –  another, smaller SMTP server was studied for a shorter period
Blacklisting and greylisting efficiency
Greylisting and content-search efficiency
Greylisting efficiency comparison
•  3 SMTP servers as mentioned
•  Short-term comparison
   –  average for the only period available (March 2012)

Server         GL efficiency avg.
Ostrava Uni    89,3%
Nitra Uni      95,0%
Zebra          92,2%
SPAM Elimination Efficiency
[Figure: SPAM blocked by scanner vs. SPAM blocked by greylist; vertical axis 0 to 6, in units of 100 000]
Improving the SPAM detection efficiency
•  components of the multilevel SPAM protection do not share information
[Figure annotation: IP address of SPAMmer to remove from AWL]
SPAM search+greylisting linkage potential
[Figure: blocked by scanner vs. potentially blocked by greylisting; vertical axis 0 to 18, in thousands]
Conclusions
•  Blacklisting is the most efficient
   –  but not 100% error-free
•  Greylisting efficiency is stable in long-term
•  Combination of blacklisting, greylisting and SPAM scanner is recommended
•  Better cooperation between all components will improve the efficiency
Inlet filtering results
[Figure: share filtered by SBL and by other postfix filters, January 2010 to April 2010; vertical axis 0,00% to 70,00%]
Greylisting efficiency rectified
[Figure: series “Blocked deliveries modified”, monthly from 02/2007 to 04/2010; vertical axis 0,0% to 100,0%]
Thank you for your attention
•  Questions?
•  Comments?
•  [email protected]
•  http://www1.osu.cz/home/sochor