
Detecting Threats, Not Sandboxes
(Characterizing Network Environments to Improve Malware Classification)
Blake Anderson ([email protected]), David McGrew ([email protected])
FloCon 2017
Data Collection and Training
[Diagram: malware flows captured in sandboxes and honeypots become malware records; benign records are collected separately; both feed training/storage of the extracted data features (metadata, packet lengths, TLS, DNS, HTTP), which produces the classifier/rules.]
Deploying Classifier/Rules
[Diagram: the trained classifier/rules are pushed out to many customer networks, Enterprise A through Enterprise Z.]
Problems with this Architecture
• Models will not necessarily translate to new environments
  • Will be biased towards the artifacts of the malicious/benign collection environments
• Collecting data from the target network is not always possible
*Feature selection / normalization is critical to ensure generalizability
Network Features in Academic Literature
• Various statistics on:
• Number of packets
• Bytes in packets/flow
• Timing between packets/flows
• Statistics about URLs
• Port numbers
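A minimal sketch of computing these per-flow statistics; the tuple layout and helper name are assumptions for illustration, not part of the talk:

```python
# Minimal sketch of the per-flow statistics commonly used in the literature.
# `packets` is assumed to be a list of (timestamp_seconds, payload_bytes)
# tuples for one flow; these names do not come from the talk itself.
from statistics import mean, pstdev

def flow_statistics(packets):
    times = [t for t, _ in packets]
    sizes = [n for _, n in packets]
    gaps = [b - a for a, b in zip(times, times[1:])]  # inter-packet timings
    return {
        "num_packets": len(sizes),
        "total_bytes": sum(sizes),
        "mean_bytes": mean(sizes) if sizes else 0.0,
        "stdev_bytes": pstdev(sizes) if sizes else 0.0,
        "mean_gap": mean(gaps) if gaps else 0.0,
        "max_gap": max(gaps, default=0.0),
    }
```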
Network/Transport-Level Robustness
Straightforward TCP Session
Multi-Packet Messages
Multi-Packet Messages – TCP Stacks
• Delayed ACKs, Nagle’s algorithm, slow-start, PSH behavior
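One way to blunt this sensitivity, sketched below under our own assumptions (not a method stated in the talk), is to merge consecutive same-direction packet lengths into approximate message lengths, which is invariant to how the stack segmented the send:

```python
# Sketch: collapse consecutive same-direction packets into one "message"
# so that segmentation choices (Nagle, PSH, congestion window) matter less.
# `flow` is assumed to be a list of (direction, length) pairs, with
# direction in {"in", "out"}; these names are illustrative only.
def merge_messages(flow):
    messages = []
    for direction, length in flow:
        if messages and messages[-1][0] == direction:
            messages[-1] = (direction, messages[-1][1] + length)
        else:
            messages.append((direction, length))
    return messages

# Example: with or without Nagle coalescing, the merged view is the same:
# merge_messages([("out", 700), ("out", 700), ("in", 1200)])
#   -> [("out", 1400), ("in", 1200)]
```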
Multi-Packet Messages – GRO
• Collection points significantly affect packet sizes
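GRO/LRO at the capture point coalesces segments; a hedged normalization sketch (the 1460-byte MSS is an assumption, not a measured value) splits observed lengths back into wire-sized segments:

```python
# Sketch: undo GRO-style coalescing by splitting observed segment lengths
# at an assumed MSS, so sandbox and enterprise capture points look alike.
def split_gro(lengths, mss=1460):
    out = []
    for n in lengths:
        while n > mss:
            out.append(mss)
            n -= mss
        out.append(n)
    return out

# split_gro([4380]) -> [1460, 1460, 1460]
```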
Source Port Allocation
• IANA port ranges: System 0-1023, Registered 1024-49151, Ephemeral 49152-65535
• Default ephemeral port ranges by OS:
  • WinXP: 1025-5000
  • Win7: 49152-65535
  • Linux: 32768-61000
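These defaults make the ephemeral source port a weak, easily-confounded endpoint signal; a heuristic sketch (default ranges only, and administrators routinely override them):

```python
# Heuristic sketch: map an ephemeral source port to candidate OS defaults.
# Reflects only the default ranges above; overlapping ranges mean the
# answer is often ambiguous, which is part of the point of this slide.
def likely_os_families(src_port):
    candidates = []
    if 1025 <= src_port <= 5000:
        candidates.append("WinXP")
    if 32768 <= src_port <= 61000:
        candidates.append("Linux")
    if 49152 <= src_port <= 65535:
        candidates.append("Win7/IANA dynamic")
    return candidates or ["unknown"]
```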
Presentation/Application-Level Robustness
TLS Client Fingerprinting
OpenSSL Versions
[Diagram: TLS ClientHello layout (record headers, random nonce, [session ID], cipher suites, compression methods, extensions); the cipher suites, compression methods, and extensions offered are indicative of the TLS client and separate OpenSSL 0.9.8, 1.0.0, 1.0.1, and 1.0.2.]
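In the spirit of this diagram, a sketch of deriving a client fingerprint by hashing the client-indicative ClientHello fields (illustrative only; not the authors' exact scheme):

```python
# Sketch: derive a TLS client fingerprint from the ClientHello fields the
# diagram marks as client-indicative. The input structure is assumed; a
# real implementation would parse these values from the handshake bytes.
import hashlib

def tls_client_fingerprint(version, cipher_suites, compression_methods, extensions):
    parts = [
        str(version),
        "-".join(f"{c:04x}" for c in cipher_suites),       # offered order matters
        "-".join(str(m) for m in compression_methods),
        "-".join(str(e) for e in extensions),
    ]
    return hashlib.sha256(",".join(parts).encode()).hexdigest()[:16]

# tls_client_fingerprint(0x0303, [0xc02b, 0xc02f], [0], [0, 11, 10])
#   -> a short stable identifier for this client configuration
```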
TLS Dependence on Environment
• 80 unique malware samples were run under both WinXP and Win7
  • 13 samples used the exact same TLS client parameters in both environments
  • 67 samples used the library provided by the underlying OS (some also had custom TLS clients)
• Affects the distribution of TLS parameters
  • Also has secondary effects w.r.t. packet lengths
HTTP Dependence on Environment
• 264 unique malware samples were run under both WinXP and Win7
  • 124 samples used the exact same set of HTTP fields in both environments
  • 140 samples varied their HTTP fields based on OS
• Affects the distribution of HTTP parameters
  • Also has secondary effects w.r.t. packet lengths
Solutions
Potential Solution I
• Collect training data from target environment
  ✗ Ground truth is difficult / expensive to obtain
  ✗ Models are only relevant to the target network
  ✓ Models are relevant to the target network
Potential Solution II
• Train models on network/endpoint-independent features (sketched below)
  ✗ It is not always obvious which features are network/endpoint-independent
  ✗ Ignoring features often also ignores interesting behavior
  ✓ Models can leverage previously captured/curated datasets
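A sketch of such filtering; the allowlist contents are assumptions, which is precisely the difficulty noted above:

```python
# Sketch: keep only features believed to be network/endpoint-independent.
# Deciding what belongs on this allowlist is exactly the hard part; the
# entries here are assumptions for illustration, not a vetted list.
INDEPENDENT_FEATURES = {"tls_cipher_suites", "tls_extensions", "http_content_type"}

def filter_features(flow_record):
    return {k: v for k, v in flow_record.items() if k in INDEPENDENT_FEATURES}
```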
Potential Solution III
• Modify existing training data to mimic target environment (sketched below)
  ✗ It is not always obvious which features are network/endpoint-independent
  ✓ Models can capture interesting network/endpoint-dependent behavior
  ✓ Models can leverage previously captured/curated datasets
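A sketch of what modifying sandbox flows could look like (illustrative transforms under assumed field names, not the authors' published procedure): remap the ephemeral source port into the target OS's default range and re-segment packet lengths at the target path's MSS:

```python
# Sketch: transform a sandbox-collected flow record so its environment-
# dependent fields mimic the target network. Both transforms and the
# record's field names are illustrative assumptions.
import random

def mimic_target(flow, target_port_range=(49152, 65535), target_mss=1460):
    flow = dict(flow)
    # Remap the ephemeral source port into the target OS's default range.
    lo, hi = target_port_range
    flow["src_port"] = random.randint(lo, hi)
    # Re-segment packet lengths at the target path's assumed MSS.
    resegmented = []
    for n in flow.get("pkt_lengths", []):
        while n > target_mss:
            resegmented.append(target_mss)
            n -= target_mss
        resegmented.append(n)
    flow["pkt_lengths"] = resegmented
    return flow
```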
Datasets
• Data Features:
  • Packet lengths
  • TLS handshake metadata
  • TLS record lengths
• Datasets (All from December):
  • Malware (sandbox): 5,512 flows
  • Ent_1: 14,997 flows
  • Ent_2: 14,981 flows
Methodology – True Samples
[Diagram: sandbox malware flows (5,512) and Ent_1 flows (14,997) feed feature selection and classifier training; the resulting classifier is applied to Ent_2 flows (14,981).]
Methodology – Synthetic Samples
[Diagram: as above, except the sandbox malware flows (5,512) are first modified to mimic the target environment before feature selection and training; the classifier is again applied to Ent_2 flows (14,981).]
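Both pipelines can be summarized in one sketch; the feature vector, dataset handling, and logistic-regression classifier below are placeholder assumptions, since the talk does not specify them:

```python
# Sketch of both methodologies: train on sandbox malware + Ent_1 benign
# flows, then raise alarms on Ent_2. Pass modify=mimic_target (defined
# above) for the synthetic-samples variant; featurize is a placeholder.
import numpy as np
from sklearn.linear_model import LogisticRegression

def featurize(flow):
    # Placeholder features: simple packet-length statistics only.
    lengths = flow.get("pkt_lengths") or [0]
    return [len(lengths), sum(lengths), max(lengths), min(lengths)]

def train_and_alarm(malware_flows, benign_flows, target_flows, modify=None):
    if modify is not None:  # synthetic-samples variant
        malware_flows = [modify(f) for f in malware_flows]
    X = np.array([featurize(f) for f in malware_flows + benign_flows])
    y = np.array([1] * len(malware_flows) + [0] * len(benign_flows))
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    preds = clf.predict(np.array([featurize(f) for f in target_flows]))
    return [f for f, p in zip(target_flows, preds) if p == 1]  # alarms
```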
Results
True Training Set
• 185 total alarms
• 28 “bad” flows
• 26 to external cloud providers
• 2 internal
• 157 “good” flows
• 3 to external cloud providers
• 154 internal
Synthetic Training Set
• 104 total alarms
• 81 “bad” flows
• 67 to external cloud providers
• 14 internal
• 23 “good” flows
• 3 to external cloud providers
• 20 internal
Conclusions
• It is necessary to understand and account for the biases present in different
environments
• Helps to create more robust models that can be effectively deployed in new environments
• Complements transfer learning and locally adaptive models
• Data collection was performed with our open source package for network data
capture and analysis: https://github.com/davidmcgrew/joy
Thank You