Detecting Threats, Not Sandboxes (Characterizing Network Environments to Improve Malware Classification) Blake Anderson ([email protected]), David McGrew ([email protected]) FloCon 2017 Data Collection and Training Malware Sandbox ... Malware Honeypot ... Malware Records Training/Storage Benign Records • • • • • Metadata Packet lengths TLS DNS HTTP Classifier/Rules Deploying Classifier/Rules Enterprise A ... Classifier/Rules … Enterprise Z ... Problems with this Architecture • Models will not necessarily translate to new environments • • Will be biased towards the artifacts of the malicious / benign collection environments Collecting data from the target network is not always possible *Feature selection / normalization is critical to ensure generalizability Network Features in Academic Literature • Various statistics on: • Number of packets • Bytes in packets/flow • Timing between packets/flows • Statistics about URLs • Port numbers Network/Transport-Level Robustness Straightforward TCP Session Multi-Packet Messages Multi-Packet Messages – TCP Stacks • Delayed ACKs, Nagle’s algorithm, slow-start, PSH behavior Multi-Packet Messages – GRO • Collection points significantly affect packet sizes Source Port Allocation WinXP: Win7: Linux: System Registered Ephemeral 0-1023 1024-49151 49152-65535 1025-5000 49152-65535 32768-61000 Presentation/ApplicationLevel Robustness TLS Client Fingerprinting OpenSSL Versions ClientHello Record Headers 1.0.2 Random Nonce 1.0.1 [Session ID] 1.0.0 Cipher suites Compression Methods Extensions Indicative of TLS Client 0.9.8 TLS Dependence on Environment • 80 unique malware samples were run under both WinXP and Win7 • • • 13 samples used the exact same TLS client parameters in both environments 67 samples used the library provided by the underlying OS (some also had custom TLS clients) Affects the distribution of TLS parameters • Also has secondary effects w.r.t. packet lengths HTTP Dependence on Environment • 264 unique malware samples were run under both WinXP and Win7 • • • 124 samples used the exact same set of HTTP fields in both environments 140 samples varied their HTTP fields based on OS Effects the distribution of HTTP parameters • Also has secondary effects w.r.t. packet lengths Solutions Potential Solution I • Collect training data from target environment Ground truth is difficult / expensive to obtain Models are only relevant to target network Models are relevant to target network Potential Solution II • Train models on network/endpoint-independent features It is not always obvious which features are network/endpoint-independent Ignoring features often also ignores interesting behavior Models can leverage previously captured/curated datasets Potential Solution III • Modify existing training data to mimic target environment It is not always obvious which features are network/endpoint-independent Models can capture interesting network/endpoint-dependent behavior Models can leverage previously captured/curated datasets Datasets • Data Features: • Packet lengths • TLS handshake metadata • TLS record lengths • Datasets (All from December): Malware Ent_1 Ent_2 14,997 Flows 14,981 Flows Malware Sandbox 5,512 Flows Methodology – True Samples Malware Malware Sandbox Ent_2 5,512 Flows Ent_1 14,997 Flows Classifier Feature Selection 14,981 Flows Methodology – Synthetic Samples Malware Malware Sandbox Ent_2 5,512 Flows Ent_1 14,997 Flows Classifier Modify Samples Feature Selection 14,981 Flows Results True Training Set • 185 total alarms • 28 “bad” flows • 26 to external cloud providers • 2 internal • 157 “good” flows • 3 to external cloud providers • 154 internal Synthetic Training Set • 104 total alarms • 81 “bad” flows • 67 to external cloud providers • 14 internal • 23 “good” flows • 3 to external cloud providers • 20 internal Conclusions • It is necessary to understand and account for the biases present in different environments • Helps to create more robust models that can be effectively deployed in new environments • Complements transfer learning and locally adaptive models • Data collection was performed with our open source package for network data capture and analysis: https://github.com/davidmcgrew/joy joy Thank You
© Copyright 2026 Paperzz