
MEASURING THREAT DETECTION EFFECTIVENESS
INTRODUCTION
“You can’t improve what you can’t measure.” We know: It’s an overused cliché. But, like most clichés, it’s anchored
in truth. And when it comes to the field of data security—specifically threat detection—it’s never been more true—
or more critical.
“Threat detection”—an organization’s ability to detect attacks or risky activities before it’s too late—is a major area of investment in most enterprises today. But those enterprises are often making these investments more on hope than on measurable metrics—predicted or actual.
Here are the key metrics that should be at the start of any discussion about data security and threat detection:
• False Negative Rate: Don’t let a massive breach go undetected. A real attack that your systems never flag is a false negative, and you want your False Negative rate to be as low as possible.
• Detection Time: Once you know of an attack, a breach, or an activity that requires response, you can measure the lag between the earliest signs of this activity and the time it was detected. Several reports have indicated that the median number of days attackers were present on a victim’s network before being discovered is greater than 100 days.
• False Positive Rate: In this day and age, it’s simply impossible to do threat detection manually. To respond to this challenge, several companies have deployed “Big Data” solutions to collect the billions of events they encounter every day, along with systems that analyze all of this data and alert them. However—and virtually every large enterprise we’ve talked to agrees—these systems have very high false positive rates. Most companies reported these rates to be greater than 80%, and some put the number as high as 95%.
• Time Taken to Triage: Most of the large enterprises we talk to are not confident that they can rely on their automated threat detection systems because of high false negative and false positive rates. As a result, they have a team of security analysts who treat alerts generated by SIEM solutions as merely hints, and who have to investigate much deeper to determine whether an alert can be safely ignored or requires response and/or remediation.
What we’ve learned from talking to a large number of security teams in the last 12 months is that:
• Very few security teams have the instrumentation in place to report on these metrics.
• Most security teams believe that False Negatives cannot be measured.
Let’s tackle those two here.
MAKE YOUR WORKFLOW MEASURABLE
Many security teams use a ticketing system to keep track of incidents that are reported and investigated. Once an investigation is finished, you can require certain pieces of information to be entered before the team closes that ticket. For example, if you asked for the following:
• Date/Time of the earliest indicator of suspicious activity
• Date/Time when the investigation started
• Date/Time when the investigation finished
• Final outcome of the investigation (was it a false positive or a real incident)
It would not take very long to set up a workflow that requires this information. And you could then use this information to calculate:
• Detection Time: (time when the investigation started) - (time of the earliest indicator of suspicious activity)
• False Positive Rate: (number of investigated alerts that turned out to be false positives) / (total number of alerts investigated)
• Time Taken to Triage: (time when the investigation finished) - (time when the investigation started)
Keep in mind that investing a significant amount of time making these measurements extremely precise might not be worth the effort. But setting up this instrumentation should not take more than an hour in most systems. The harder part is almost always getting your team’s buy-in and adoption.
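As a rough sketch of how this calculation might look, the following Python snippet computes the three metrics above from a list of closed tickets. The field names (earliest_indicator, investigation_started, investigation_finished, real_incident) are illustrative assumptions, not the schema of any particular ticketing system; map them to whatever fields your workflow actually captures.

```python
from datetime import datetime
from statistics import median

# Hypothetical export of closed tickets; the field names are assumptions
# for illustration, not any specific ticketing system's schema.
tickets = [
    {
        "earliest_indicator": datetime(2017, 3, 1, 8, 15),
        "investigation_started": datetime(2017, 3, 20, 9, 0),
        "investigation_finished": datetime(2017, 3, 20, 11, 30),
        "real_incident": True,
    },
    {
        "earliest_indicator": datetime(2017, 3, 22, 14, 0),
        "investigation_started": datetime(2017, 3, 22, 14, 10),
        "investigation_finished": datetime(2017, 3, 22, 14, 40),
        "real_incident": False,
    },
]

# Detection Time: investigation start minus earliest indicator of suspicious activity
detection_times = [t["investigation_started"] - t["earliest_indicator"] for t in tickets]

# Time Taken to Triage: investigation finish minus investigation start
triage_times = [t["investigation_finished"] - t["investigation_started"] for t in tickets]

# False Positive Rate: fraction of investigated alerts that were not real incidents
false_positive_rate = sum(1 for t in tickets if not t["real_incident"]) / len(tickets)

print("Median detection time:", median(detection_times))
print("Median time to triage:", median(triage_times))
print("False positive rate: {:.0%}".format(false_positive_rate))
```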
MEASURING FALSE NEGATIVES
Why is measuring False Negatives so hard?
The most common response to “do you measure your false negative rate?” is “it can’t be measured”, because in order to truly measure your false negative rate you have to know which attacks you’ve missed. And if you knew which ones you missed, they wouldn’t be false negatives.
This last point is an important one: you will have to try to measure your false negative rates without knowing “all”
the attacks you might have missed. However, what if there was a “good enough” approach to estimating your false
negative rates?
Is “good enough” good enough?
The inspiration for our approach to data security comes from Nate Silver’s work. He successfully called the
outcomes in 49 of the 50 states in the 2008 U.S. Presidential election without ever asking hundreds of millions of
people how they would vote.
While a perfect metric might be extremely difficult to achieve, the question is whether we can get to “good enough” metrics. The way polls work is that their predictions come with two caveats: a margin of error and a confidence level. We can apply the same concept here, that is, estimate your false negative rate within a margin of error and with a high degree of confidence.
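To make “margin of error and confidence level” concrete, here is a minimal sketch, assuming you have run a sample of simulated attack techniques (for example, during a red team exercise) and recorded how many went undetected. It uses the standard normal approximation for a binomial proportion; the numbers are invented for illustration.

```python
import math

def false_negative_estimate(missed, total, z=1.96):
    """Estimate the false negative rate from a sample of simulated attacks.

    missed -- number of simulated attacks that went undetected
    total  -- number of simulated attacks executed
    z      -- z-score for the desired confidence level (1.96 is roughly 95%)

    Returns (point_estimate, margin_of_error) using the normal
    approximation to the binomial proportion.
    """
    p = missed / total
    margin = z * math.sqrt(p * (1 - p) / total)
    return p, margin

# Hypothetical exercise: 40 simulated attack techniques, 9 never detected.
rate, margin = false_negative_estimate(missed=9, total=40)
print("Estimated false negative rate: {:.0%} +/- {:.0%} (~95% confidence)".format(rate, margin))
```

The more simulated attacks you run, the tighter the margin of error becomes, which is exactly the trade-off pollsters make when choosing a sample size.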
Factoring in the strength of the adversary
When it comes to measuring false negatives, another key factor is the strength of the adversary. If the attacks are simple and obvious, most systems and teams will pick up on them easily, while a more sophisticated adversary will have a much easier time going undetected for long periods of time, if they’re discovered at all. That raises the question: how do we incorporate the strength of the adversary into the False Negative Rate metric? To answer it, let’s look at competitive games where teams of different skill levels go up against each other and are rated on the results.
Blue Score: Elo-like Rating for Threat Detection
A number of security teams run blue team vs. red team exercises to test their defenses by simulating
sophisticated attackers going after an enterprise and its IT assets. The blue team is made up of folks whose goal
is to detect and ideally prevent attackers from breaching the defenses. Red teams are cybersecurity folks who are
doing what real world attackers would do—try to breach the defenses.
What we need is a way to capture the results of these exercises and to standardize on a scoring mechanism, very much like we have credit ratings or Elo ratings. (The Elo rating system is a method for calculating the relative skill levels of players in competitor-versus-competitor games such as chess, and it applies just as well to cyber warfare exercises.)
Ideally, this score would tell you:
• How effective a particular blue team is against a group of attackers (red team)
• How one blue team’s effectiveness stacks up against all other blue teams
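As a sketch of what such a scoring mechanism could look like, the following applies a standard Elo update to red team vs. blue team engagements, where the blue team “wins” an engagement if it detects the simulated attack. The K-factor and the 1500 starting ratings are arbitrary assumptions for illustration, not values prescribed here.

```python
def expected_score(rating_a, rating_b):
    """Elo expected score: probability that side A wins the engagement."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_ratings(blue, red, blue_won, k=32):
    """Update blue and red team ratings after one engagement.

    blue_won -- True if the blue team detected the simulated attack.
    k        -- K-factor controlling how fast ratings move (assumed value).
    """
    expected_blue = expected_score(blue, red)
    actual_blue = 1.0 if blue_won else 0.0
    delta = k * (actual_blue - expected_blue)
    # Elo is zero-sum: the red team loses exactly what the blue team gains.
    return blue + delta, red - delta

# Hypothetical exercise: both teams start at 1500;
# the blue team detects three of five simulated attacks.
blue_score, red_score = 1500.0, 1500.0
for detected in [True, False, True, True, False]:
    blue_score, red_score = update_ratings(blue_score, red_score, detected)

print("Blue Score: {:.0f}, Red Score: {:.0f}".format(blue_score, red_score))
```

Because the update factors in the strength of the opponent, a blue team that detects most attacks from a highly rated red team ends up with a higher Blue Score than one that only catches attacks from a weak red team, which is exactly the adversary-strength adjustment we were looking for.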
Such a red team vs. blue team exercise typically runs as a two-day hackathon, and those two days are demanding and exhausting. But they are a significant improvement over penetration testing, which is both expensive and time-consuming (often taking up to two weeks).
RECOMMENDATIONS
To get an estimate of your false negative rate, you have several options:
1. Conduct penetration testing, which will typically take about two weeks.
2. Set up an internal red team vs blue team cyber security exercise, which will typically take two days.
3. Participate in a pre-packaged cyberhunt exercise, which will typically take a couple of hours.
Note: LogicHub has developed a free cyberhunt exercise (we call it ‘The LogicHub Challenge’) that takes two
hours and provides remarkable insights into your threat detection efficacy. If interested, please contact us at
www.logichub.com.
SUMMARY
To return to our opening point: “You can’t improve what you can’t measure.” And while there are many
measurements that matter in the field of data security, measuring false negatives is at the top. Because until
you start measuring them accurately, you won’t know how good your threat detection is. The above articulates a
number of ways to measure this. Our advice: pick one and start gaining true insights into how safe your data—and
your enterprise—really is.
RESOURCES
1. https://www2.fireeye.com/rs/848-DID-242/images/Mtrends2016.pdf
2. https://en.wikipedia.org/wiki/Elo_rating_system
© LogicHub 2017