ThreatMetrix Labs Report

WHITEPAPER
How Successful are Targeted Phishing Attacks - A Real-World Example
ThreatMetrix® Labs Report May 2015
Author: Andreas Baumhof, CTO, ThreatMetrix Inc.
Contents
Introduction
Key Takeaways
The convergence of Phishing and Malware
Targeted Phishing Attack
  How to remove obviously fake entries
How the Internet makes it easy to "enrich" stolen PII with other available PII
  Reverse Telephone Lookup
    Match rates
  Social Networking Sites
    Example 1
    Example 2
    Example 3
    Example 4
    Example 5
Data
  How accurate is the Geolocation of IP addresses?
  Operating System
  Password stats (strength...)
    Common phrases used
    Password Strength
    Password Length
    Password distribution
  Demography
    Gender
    Age
Conclusion
Appendix A
Appendix B
Appendix C
More Information
Introduction
Personal information is being lost everywhere. Some called 2014 the year of the data breach and 2015
the year of the mega data breach. At the same time, we spend a lot of time looking at really
sophisticated malware attacks, but how successful are phishing attacks in 2015?
Well, it turns out they are very successful by every measure. Phishing attacks are well and truly alive.
They are certainly no longer those horrible-looking, poorly worded websites, and sophisticated
Trojans such as Dyre combine social engineering, sophisticated malware attacks and classical
phishing attacks into one hell of an attack.
So what can a fraudster expect in 2015 when running a sophisticated phishing attack? How much
personal information are people willing to provide if convinced properly? How easy is it to enrich the
data with other data sources (either public or private)?
This ThreatMetrix Labs report looks behind the scenes of one such phishing campaign [1] in detail, and the
results are shocking.
Key Takeaways
This research confirms one of the worst-kept secrets in the industry: targeted attacks produce
high-quality results. This phishing attack wasn't a poorly written website spammed out to millions of
Internet users around the world. It was very targeted, and as such the quality of the data is very
high.
Some key takeaways:
• It is mind-blowingly easy to remove the fake phishing entries. In fact, just three simple rules
  eliminated 100 percent of the fake entries and still left 17 percent of the entries as genuine,
  high-quality data, which was far beyond our expectations.
  o We found publicly available databases that confirm the remaining good data is indeed valid
    in 92 percent of the cases!
• It is very easy for a fraudster to "enrich" stolen data with other available data sources (either publicly
  available sources such as social media, or other data breach databases).
• IP geolocation is surprisingly accurate. The average distance between the geolocation of the IP address and
  the geolocation of the mailing address is just 63 miles, and in more than 50 percent of the cases the
  distance is less than 10 miles.
• The passwords chosen by the victims are a mess: more than 98 percent of the passwords used fall under
  the category of "shouldn't be used for anything serious."
• Almost 25 percent of the victims responded to the phishing attack on their mobile phone, which is very
  high considering that the phishing attack forced the victims to respond to 22 questions!
[1] We notified the relevant financial institutions immediately upon getting access to this information
to make sure the accounts of the victims can be protected.
The convergence of Phishing and Malware
Phishing became popular after 2001, mainly targeting the financial community. Phishing is a tactic whereby
an attacker tricks a victim into disclosing personal information on a website. Often the website mimics the
look and feel of the phished brand to trick the victim into entering his or her personal information.
Phishing quickly became very popular, peaking in 2009 and 2010, when this attack was very
successful.
After that time, more and more phishing attacks moved toward a more targeted approach. This had a lot
to do with many financial institutions implementing two-factor authentication, whereby phished
information (such as the one-time password) has a very limited lifespan.
But the fraudsters evolved too. I remember the case of a financial institution in Europe in 2008 finding
out that there was a phishing site asking users to provide their phone numbers. This bank had
implemented transactional two-factor authentication tokens and wondered what this was all about,
until they found out that the fraudsters would ring up the victims, pretending to call from the bank,
and trick them into disclosing the one-time password from their two-factor authentication devices.
More recently, malware such as Dyre includes phishing components as part of its social-engineered
attacks, combining malware infections with phishing sites.
In the end, phishing means stealing personal information from you, and there are hundreds of different
ways to do it. Only a fool believes that mitigating one successful attack vector will stop the fraud.
It will move and evolve, which is exactly what we are seeing with phishing.
Targeted Phishing Attack
The data we’ll be looking at in this report is from a targeted phishing attack. This attack included
many layers to hide its origin. It also encrypted the data and moved it around various
command-and-control (C2) servers.
But as mentioned in the previous section, in the end what mattered was the information that the
fraudsters were able to collect.
And it was a lot. Below is the information that we found in the data set. It comprises 22 individual
attributes.
• Username (Username of the online application)
• Password
• Description (This field was empty for all entries)
• Home phone
• Mobile phone
• ATM PIN
• Tel PIN
• Driver’s License (DL)
• Date of Birth (DOB)
• Mother’s Maiden Name (MMN)
• Social Security Number (SSN)
• Full Name
• Secret Question 1
• Secret Answer 1
• Secret Question 2
• Secret Answer 2
• Secret Question 3
• Secret Answer 3
• Secret Question 4
• Secret Answer 4
• Secret Question 5
• Secret Answer 5
• The IP address of the victim (IP)
• Browser String (The User Agent of the victim)
We then "enriched" this raw information by adding the following attributes:
• Browser Type
• Browser Name
• Browser Version
• OS Version
• IP Country
• IP ISP
• IP Region
• IP City
• IP Latitude
• IP Longitude
• IP First Seen (TMX)
• Trust Tags (TMX)
• IP Score (TMX)
How to remove obviously fake entries
The task was to remove all obviously fake entries from this database. We thought this would
involve a lot of manual work, but it turns out to be surprisingly easy and fully automatable.
First of all, as a fraudster, you have to be ready to be abused. The most common name was "F... you." It is
also amazing how many people think that they can hunt a fraudster down.
Just three rules (attached in Appendix A) eliminated 100 percent of the fake entries (!). These rules
eliminated 83 percent of the entries, leaving 17 percent, which is still pretty high.
Of the remaining entries, we ran detailed manual checks on approximately 10 percent, and all of them were
confirmed to be legitimate information (more on this later). We later found that in more than 75
percent of the cases, we could use a public reverse telephone number search engine to confirm that
the entry is indeed valid.
Two things are astonishing:
1. It is very easy for a fraudster to "weed" out the fake information and focus on the real, valuable
   information with just three simple rules.
2. Of the rest, we could establish very easily that 92 percent of the submitted information is genuine,
   which is much higher than we ever thought it would be.
   a. It is very easy to ascertain that this is real information, which is important if the fraudster
      doesn't intend to use the information but to sell it in underground markets. Better-quality data
      equals more money.
How the Internet makes it easy to "enrich" stolen PII with other available PII
One assumption that is quoted quite a bit is that once fraudsters have stolen some personally
identifiable information (PII), it is easy for them to complement it with other sources to build a
complete picture. In this section, we'll look into this claim a bit.
Reverse Telephone Lookup
Most of the phished users were from one particular country that was targeted, and there are really nice
websites available that allow you to search for people (like a telephone book or yellow pages). In many
countries, there are services available that allow you to do a reverse phone number lookup.
As the phished information contained the phone number but not the address, we checked for how many
entries the reverse lookup returned a record associated with the telephone number.
Match rates
In more than 92 percent of the results returned by the reverse phone lookup, the name matched
the phished name. This, together with all the information above, indicates that the remaining entries
are "good," legitimate entries.
This number is much higher than we anticipated and indicates that the quality of the data from this
phishing campaign is very high.
Social Networking Sites
There is this assumption that virtually all of our personal information has been leaked in one of the data
breaches over the last couple of years. While this is certainly true, we were very interested to find out
how much people are willingly sharing on social media sites and how easy it is to take one piece of data
and "enrich" it with publicly available data.
One particular "problem" we faced with this campaign is that many of the victims were not
millennials who engage heavily in social media. The average age of the (cleaned) dataset was 57.
We still found many examples of the power of social media sites; a few of them follow.
Example 1
• Facebook: Through Facebook, we can confirm that the person exists.
• LinkedIn: Provides the information that the person is a self-employed bookkeeper.
• Airbnb reveals:
  o The city, which matches the geolocation of the victim's IP address
  o The last two digits of the telephone number (which match, too)
  o The partner's name
  o A family picture
  o The date when the person joined Airbnb
  o A description of the household
  o Confirmation of the job from LinkedIn
• Another apartment rental website:
  o Confirms all of the above, including the telephone number in clear text
  o Reveals the address, which matches our records
o
Tells me the address, which matches our records
Example 2
• Twitter: Location from Twitter matches the geolocation of the IP address
• Dating site: Date of birth matches the date of birth of the phished credentials
Example 3
• Facebook: City on Facebook matches the city from the geolocation of the IP address
• Telephone book: Confirms the name and the address
Example 4
• MeetMe: Searching for the name confirms
  o The city (matches the geolocation of the IP address)
  o The age and date of birth
Example 5
• Government agency
  o Name, job title, employer, age and date of birth confirmed by a publicly available CV
  o Salary is publicly disclosed too
Data
How accurate is the Geolocation of IP addresses?
The dataset provided us a unique opportunity to assess the value of IP addresses. The original dataset did
not include a postal address, only an IP address. Through external services (social media sites, reverse
telephone lookups), we were able to enrich the data with the postal address.
• Armed with the IP address, we can use geolocation to get the latitude and longitude coordinates
  (plenty of providers are available).
• Armed with the postal address, we can likewise resolve the latitude and longitude coordinates of the
  postal address (plenty of services are available).
Now we can compare the two, and the geolocation of the IP address turns out to be surprisingly accurate.
For the data below, we removed the outliers (less than 3 percent of the entries) where the geolocation
was completely different (e.g. on a different continent).
The average distance between the geolocation of the IP address and the geolocation of the mailing
address is just 63 miles.
In more than half of the entries (54%), the distance between the geolocation of the IP address and the
geolocation of the mailing address was less than 10 miles!
[Figure: distribution of the distance between the IP geolocation and the mailing-address geolocation]
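The comparison described above boils down to a great-circle distance between two coordinate pairs. The following is a minimal sketch, assuming both points are already available as latitude and longitude in decimal degrees. The function names, the Earth radius constant and the sample distances are illustrative assumptions and are not taken from the dataset or from the tooling used for this report.

```python
import math
from statistics import mean, median

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumed constant


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def summarize(distances, near_miles=10.0):
    """Return (mean, median, share of distances below near_miles)."""
    near_share = sum(d < near_miles for d in distances) / len(distances)
    return mean(distances), median(distances), near_share


# Made-up distances: two close matches and one far-away outlier.
avg, mid, near = summarize([1.0, 2.0, 300.0])
```

A mean of 63 miles alongside a majority of entries under 10 miles suggests a long-tailed distribution, so reporting the median next to the mean, as `summarize` does, gives a fuller picture.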
Operating System
The operating systems of the victims’ devices aren’t really surprising, with Windows holding 60 percent.
More interesting is the split between desktop and mobile: it is quite surprising that the share
of victims on mobile platforms is almost 25%!
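A desktop versus mobile split like this can be derived from the captured browser string. The sketch below is a deliberately simple, assumed approach based on a handful of regular expressions. It is not the classifier used for this report, the pattern list is hypothetical and far from complete, and tablets are counted as mobile here.

```python
import re

# Order matters: Android user agents also contain "Linux", iOS ones contain
# "Mac OS X", and some Windows Phone ones mention "Android" and "iPhone".
_OS_PATTERNS = [
    ("Windows Phone", re.compile(r"Windows Phone", re.I)),
    ("Android", re.compile(r"Android", re.I)),
    ("iOS", re.compile(r"iPhone|iPad|iPod", re.I)),
    ("Windows", re.compile(r"Windows NT", re.I)),
    ("Mac OS X", re.compile(r"Mac OS X|Macintosh", re.I)),
    ("Linux", re.compile(r"Linux", re.I)),
]
_MOBILE_FAMILIES = {"Windows Phone", "Android", "iOS"}


def classify_user_agent(ua):
    """Return (os_family, 'mobile' | 'desktop' | 'unknown') for a User-Agent string."""
    for family, pattern in _OS_PATTERNS:
        if pattern.search(ua):
            return family, ("mobile" if family in _MOBILE_FAMILIES else "desktop")
    return "Other", "unknown"


def mobile_share(user_agents):
    """Fraction of user agents classified as mobile."""
    kinds = [classify_user_agent(ua)[1] for ua in user_agents]
    return kinds.count("mobile") / len(kinds) if kinds else 0.0
```

User-Agent strings are client-supplied and easy to spoof, so any figure derived this way is an estimate.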
Password stats (strength...)
There is no better introduction to this section than https://xkcd.com/936/:
“Through 20 years of effort, we’ve successfully trained everyone to use passwords that are hard for
humans to remember, but easy for computers to guess.”
With that said, we took the liberty of making some high-level assessments of the
passwords used by the phishing victims.
Common phrases used
The first thing we wanted to check was whether common passphrases were used. We used a very simple
list of only 7,187 common passwords (attached in Appendix B) and found that 7 percent of the
passwords were based on common swear words.
A quick Google search reveals hundreds of sites that provide password lists containing millions of common
passwords.
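A check of this kind can be sketched in a few lines. The list in Appendix B is not part of the public report, so the word list below is a tiny hypothetical stand-in. The matching strategy (case-insensitive, exact match or substring of at least four characters) is also an assumption about what "based on" means, not the method used for this report.

```python
def load_wordlist(lines):
    """Normalize a word list: lowercase, strip whitespace, drop empty lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def is_based_on_common(password, wordlist, min_len=4):
    """True if the password equals a listed entry or contains one of at least min_len characters."""
    candidate = password.lower()
    if candidate in wordlist:
        return True
    return any(len(word) >= min_len and word in candidate for word in wordlist)


def common_share(passwords, wordlist):
    """Fraction of passwords that are based on an entry in the word list."""
    if not passwords:
        return 0.0
    return sum(is_based_on_common(p, wordlist) for p in passwords) / len(passwords)


# Hypothetical stand-in for the real list of common passwords.
wordlist = load_wordlist(["password\n", "qwerty\n", "\n", "abc\n"])
share = common_share(["Password1", "Tr0ub4dor&3"], wordlist)
```

The `min_len` threshold keeps very short list entries from matching almost every password by accident.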
Password Strength
Having set the scene with the xkcd comic, we tried to evaluate password strength by calculating the
entropy of each password (in bits). The number of bits listed for entropy is an estimate based on
letter-pair combinations in the English language.
We then categorize the passwords into
• Very weak (< 28 bits)
• Weak (28 - 35 bits)
• Reasonable (36 - 59 bits)
• Strong (60 - 127 bits)
• Very strong (128 bits or more)
Anything less than 60 bits shouldn't be used for anything serious (such as online banking).
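The categories above can be applied mechanically once an entropy estimate is available. The sketch below substitutes a much cruder estimator than the letter-pair model mentioned in this section: it multiplies the password length by log2 of the size of the ASCII character pool in use. This is an assumption made for illustration only. It is an upper bound that overrates dictionary words, and the category boundaries are read as half-open intervals.

```python
import math
import string


def pool_size(password):
    """Size of the ASCII character pool implied by the character classes present."""
    size = 0
    if any(c in string.ascii_lowercase for c in password):
        size += 26
    if any(c in string.ascii_uppercase for c in password):
        size += 26
    if any(c in string.digits for c in password):
        size += 10
    if any(c in string.punctuation or c == " " for c in password):
        size += 33  # 32 ASCII punctuation characters plus the space
    return size


def entropy_bits(password):
    """Naive upper-bound estimate: length * log2(pool size)."""
    size = pool_size(password)
    return len(password) * math.log2(size) if size else 0.0


def strength_category(bits):
    """Map an entropy estimate in bits to the categories listed above."""
    if bits < 28:
        return "Very weak"
    if bits < 36:
        return "Weak"
    if bits < 60:
        return "Reasonable"
    if bits < 128:
        return "Strong"
    return "Very strong"
```

With this naive estimator, "password" scores about 37.6 bits and lands in "Reasonable", which illustrates why a language-aware estimate is the better basis for the figures in this report.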
The results are devastating:
• Not a single password was "very strong" as per this definition.
• 45% of the passwords chosen were "very weak."
  o With a typical desktop PC, a "very weak" password can be cracked in less than 10 minutes (often
    just seconds).
• More than 98% of the passwords used fall under the category of "shouldn't be used for anything serious."
Password Length
The vast majority of passwords (80%) had fewer than 10 characters, with eight characters being the most
common password length.
For an 8-character password, there are 6.63 quadrillion possibilities (95^8, counting all 95 printable
ASCII characters), which sounds like a lot. However, with the advent of dedicated password-cracking
hardware, 8-character passwords can be cracked in less than 6 hours!
(see http://arstechnica.com/security/2012/12/25-gpu-cluster-cracks-every-standard-windows-password-in-6-hours/)
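The arithmetic behind the 6.63 quadrillion figure is easy to reproduce. In the sketch below, the alphabet of 95 printable ASCII characters follows from that figure, while the rate of 350 billion guesses per second is an assumption (it is the order of magnitude reported for fast, unsalted Windows hashes on the cited 25-GPU cluster). Slower, salted hash functions would give very different numbers.

```python
PRINTABLE_ASCII = 95        # letters, digits, punctuation and space
GUESSES_PER_SECOND = 350e9  # assumed cracking rate for a fast, unsalted hash


def keyspace(length, alphabet=PRINTABLE_ASCII):
    """Number of possible passwords of exactly this length."""
    return alphabet ** length


def hours_to_exhaust(length, rate=GUESSES_PER_SECOND, alphabet=PRINTABLE_ASCII):
    """Worst-case hours needed to try every password of this length."""
    return keyspace(length, alphabet) / rate / 3600


total = keyspace(8)          # 6,634,204,312,890,625 candidates
hours = hours_to_exhaust(8)  # a little over 5 hours at the assumed rate
```

Each additional character multiplies the search space by 95, so at the same assumed rate a 9-character password would take roughly three weeks in the worst case.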
Password distribution
The most commonly used password was… “password” (big surprise there).
However, looking at the distribution, more than 98 percent of the passwords were unique, so there
wasn’t much overlap in terms of commonly used passwords.
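The statistics in this and the previous subsection are straightforward counts. A minimal sketch follows, assuming that "unique" means a password that occurs exactly once in the dataset. That definition, the function name and the sample values are assumptions made for illustration.

```python
from collections import Counter


def distribution_stats(passwords):
    """Most common password, share of passwords occurring exactly once, and most common length."""
    if not passwords:
        raise ValueError("passwords must not be empty")
    counts = Counter(passwords)
    most_common, top_count = counts.most_common(1)[0]
    unique_share = sum(1 for c in counts.values() if c == 1) / len(passwords)
    modal_length = Counter(len(p) for p in passwords).most_common(1)[0][0]
    return {
        "most_common": most_common,
        "most_common_count": top_count,
        "unique_share": unique_share,
        "modal_length": modal_length,
    }


# Made-up sample values, not entries from the dataset.
stats = distribution_stats(["password", "password", "abc12345", "hunter2!"])
```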
Demography
Gender
From a gender point of view, there was virtually no difference between male and female.
Age
The youngest victim was 15 and the oldest 90, with an average age of 57.
Conclusion
The data analytics exercise presented in this report shows that phishing attacks in 2015 are still highly
effective and that targeted phishing attacks produce high-quality results. The combination of phishing
with sophisticated malware (such as Dyre) is a trend that we expect to continue. The other trend
is that more complex technology deployed to the customer base (such as two-factor
authentication) actually opens up an opportunity for cybercriminals to leverage social-engineered
attacks. One great example of this was the aforementioned case in which phishing sites captured the
telephone numbers of victims and the fraudsters then rang up the victims to trick them into revealing
the two-factor authentication code.
Appendix A
The content in Appendix A is not available in the public version. Please request a private version of the
ThreatMetrix Labs report at [email protected].
Appendix B
The content in Appendix B is not available in the public version. Please request a private version of the
ThreatMetrix Labs report at [email protected].
Appendix C
The content in Appendix C is not available in the public version. Please request a private version of the
ThreatMetrix Labs report at [email protected].
More Information
For more information on this report please contact [email protected].
© 2015 ThreatMetrix. All rights reserved. ThreatMetrix, ThreatMetrix Labs, and the ThreatMetrix logo
are trademarks or registered trademarks of ThreatMetrix in the United States and other countries. All
other brand, service or product names are trademarks or registered trademarks of their respective
companies or owners.