ZOZZLE: Fast and Precise In-Browser JavaScript Malware Detection

Charles Curtsinger
UMass at Amherst
Benjamin Livshits and Benjamin Zorn
Microsoft Research
Christian Seifert
Microsoft
20th USENIX Security Symposium
(August, 2011)
Microsoft Research Technical Report
(November, 2010)
Outline
- Introduction
- Observation on Offline Nozzle
- Design
- Experiment
- Evaluation

2011/5/24
A Seminar at Advanced Defense Lab
Introduction

In the last several years, we have seen
mass-scale exploitation of memory-based
vulnerabilities migrate towards
heap spraying attacks.

But many solutions are not lightweight
enough to be integrated into a
commercial browser.
About Nozzle

The overhead of this runtime technique
may be 10% or higher.

This paper is based on our experience
using NOZZLE for offline scanning.

Offline scanning is also not as effective
against transient malware that appears
and disappears frequently.
About Zozzle

ZOZZLE is integrated with the browser’s
JavaScript engine to collect and process
JavaScript code that is created at
runtime.

Our focus in this paper is on creating a
very low false positive, low overhead
scanner.
Observation on Offline Nozzle

Once we determined that JavaScript was
malicious, we invested considerable
effort in examining the code by hand
and categorizing it in various ways.

We investigated 169 malware samples.
Distribution of Different Exploit Samples
Transience of Detected Malicious URLs
JavaScript eval Unfolding
Distribution of Context Counts
Design
Training Data Extraction and
Labeling

We start by augmenting the JavaScript
engine in a browser with a “deobfuscator”
that extracts and collects individual
fragments of JavaScript.
- Detours
- jscript.dll
- Compile function (COlescript::Compile())
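The idea of recording every code fragment handed to the engine can be sketched in plain JavaScript. This is only a userland analogue under stated assumptions: the slides describe a native hook on the engine's Compile function, whereas `hookCompile`, `collectedContexts`, and `hookedEval` below are hypothetical names that merely wrap an eval-like function.

```javascript
// Hypothetical userland analogue of the deobfuscator. The real hook sits on the
// engine's native Compile function; here we only wrap an eval-like function.
const collectedContexts = [];

// Wrap a code-evaluating function so each source string is recorded before it runs.
function hookCompile(compileFn) {
  return function hooked(source) {
    collectedContexts.push(String(source));
    return compileFn(source);
  };
}

// Indirect eval runs the code in global scope, so nested calls can find hookedEval.
const hookedEval = hookCompile((source) => (0, eval)(source));
globalThis.hookedEval = hookedEval;

// An "obfuscated" script: the outer string unfolds into a second piece of code.
const result = hookedEval("hookedEval('1 + ' + '1')");
console.log(result, collectedContexts);
```

Each layer of unfolding shows up as its own entry in `collectedContexts`, which is the property that makes the inner, deobfuscated code available for classification.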
Feature Extraction

We create features based on the
hierarchical structure of the JavaScript
abstract syntax tree (AST).
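A minimal sketch of AST-based feature extraction follows. It assumes that a feature pairs the enclosing construct (for example a loop or a try block) with the text of a node; the slide does not spell out the exact feature shape, so that pairing, the hand-built `ast` object, and the names `CONTEXT_NODES` and `extractFeatures` are all illustrative assumptions rather than the authors' implementation.

```javascript
// Hypothetical, hand-built AST in a simplified ESTree-like shape (not parser output).
const ast = {
  type: "Program",
  body: [
    { type: "ForStatement", body: [{ type: "CallExpression", text: "unescape" }] },
    { type: "TryStatement", body: [{ type: "Identifier", text: "shellcode" }] },
  ],
};

// Node types that open a new context (an assumed, illustrative selection).
const CONTEXT_NODES = new Set([
  "ForStatement",
  "TryStatement",
  "IfStatement",
  "FunctionDeclaration",
]);

// Walk the tree and emit one "context:text" feature per node that carries text.
function extractFeatures(node, context = "Program", out = new Set()) {
  const here = CONTEXT_NODES.has(node.type) ? node.type : context;
  if (node.text !== undefined) out.add(`${here}:${node.text}`);
  for (const child of node.body || []) extractFeatures(child, here, out);
  return out;
}

const features = extractFeatures(ast);
console.log([...features]);
```

Keeping the enclosing construct in the feature lets the same string count differently depending on where it occurs, for instance inside a loop versus at top level.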
Feature Selection

χ² test

             With feature   Without feature
malicious    A              C
benign       B              D

\[
\chi^2 = \frac{(AD - CB)^2}{(A + C)(B + D)(A + B)(C + D)}
\]

\[
\chi^2 > 10.83 \quad (99.9\%\ \text{confidence})
\]
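The χ² selection step can be sketched as a small function over the four contingency counts. One assumption is labeled loudly here: the standard Pearson statistic for a 2×2 table carries a factor N = A + B + C + D in the numerator, and the sketch includes it on the assumption that the 10.83 threshold (99.9% confidence at one degree of freedom) refers to that standard statistic. The names `chiSquare`, `THRESHOLD`, and `isSelected` are hypothetical.

```javascript
// A, B, C, D follow the contingency table: A/C = malicious with/without the
// feature, B/D = benign with/without the feature.
function chiSquare(A, B, C, D) {
  const N = A + B + C + D; // assumption: standard Pearson form with the factor N
  const denominator = (A + C) * (B + D) * (A + B) * (C + D);
  if (denominator === 0) return 0; // degenerate table: the feature carries no signal
  return (N * (A * D - C * B) ** 2) / denominator;
}

const THRESHOLD = 10.83; // 99.9% confidence, as on the slide

// Keep a feature only if its statistic clears the threshold.
function isSelected(A, B, C, D) {
  return chiSquare(A, B, C, D) > THRESHOLD;
}

console.log(isSelected(90, 5, 10, 95)); // feature common in malicious, rare in benign
console.log(isSelected(50, 50, 50, 50)); // feature equally common in both classes
```

A feature that appears equally often in both classes scores zero and is dropped, while one concentrated in the malicious samples scores well above the threshold.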
Classifier Training

Naïve Bayesian classifier

\[
P(L_i \mid F_1, \ldots, F_n) = \frac{P(L_i)\, P(F_1, \ldots, F_n \mid L_i)}{P(F_1, \ldots, F_n)}
\]

\[
P(F_1, \ldots, F_n \mid L_i) = \prod_{k=1}^{n} P(F_k \mid F_1, \ldots, F_{k-1}, L_i)
\]

Assume the features to be conditionally independent:

\[
P(L_i \mid F_1, \ldots, F_n) = \frac{P(L_i) \prod_{k=1}^{n} P(F_k \mid F_1, \ldots, F_{k-1}, L_i)}{P(F_1, \ldots, F_n)} = \frac{P(L_i) \prod_{k=1}^{n} P(F_k \mid L_i)}{P(F_1, \ldots, F_n)}
\]
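Naïve Bayesian classifier

\[
C_{\text{script}} = \arg\max_{i \in \text{possibleLabels}} P(L_i \mid F_1, \ldots, F_n) = \arg\max_{i \in \text{possibleLabels}} \frac{P(L_i) \prod_{k=1}^{n} P(F_k \mid L_i)}{P(F_1, \ldots, F_n)}
\]

The denominator does not depend on i, so it can be dropped:

\[
C_{\text{script}} = \arg\max_{i \in \text{possibleLabels}} P(L_i) \prod_{k=1}^{n} P(F_k \mid L_i)
\]
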
Complexity: linear time
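The argmax above can be sketched as a small classifier over binary (present or absent) features. Several choices here are assumptions not stated on the slides: add-one (Laplace) smoothing of the per-feature probabilities, summing logarithms instead of multiplying probabilities to avoid underflow, and the names `trainNaiveBayes` and `classify`.

```javascript
// samples: [{ label: string, features: Set<string> }]
function trainNaiveBayes(samples, featureList) {
  const labels = [...new Set(samples.map((s) => s.label))];
  const model = { featureList, logPrior: {}, pFeature: {} };
  for (const label of labels) {
    const group = samples.filter((s) => s.label === label);
    model.logPrior[label] = Math.log(group.length / samples.length); // log P(L_i)
    model.pFeature[label] = {};
    for (const f of featureList) {
      const hits = group.filter((s) => s.features.has(f)).length;
      // Add-one smoothing (an assumption) keeps P(F_k | L_i) away from 0 and 1.
      model.pFeature[label][f] = (hits + 1) / (group.length + 2);
    }
  }
  return model;
}

// Returns the label maximizing log P(L_i) + sum_k log P(F_k | L_i).
function classify(model, features) {
  let best = null;
  let bestScore = -Infinity;
  for (const label of Object.keys(model.logPrior)) {
    let score = model.logPrior[label];
    for (const f of model.featureList) {
      const p = model.pFeature[label][f];
      score += Math.log(features.has(f) ? p : 1 - p);
    }
    if (score > bestScore) {
      bestScore = score;
      best = label;
    }
  }
  return best;
}

// Toy training data with hypothetical feature names.
const samples = [
  { label: "malicious", features: new Set(["unescape", "shellcode"]) },
  { label: "malicious", features: new Set(["unescape", "shellcode"]) },
  { label: "benign", features: new Set(["jquery"]) },
  { label: "benign", features: new Set(["jquery"]) },
];
const model = trainNaiveBayes(samples, ["unescape", "shellcode", "jquery"]);
console.log(classify(model, new Set(["unescape"])));
```

Classification makes one pass over the feature list per label, which matches the linear-time claim on the slide.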
Fast Pattern Matching
Fast Pattern Matching (cont.)
Experiment

Malicious Samples
- 919 deobfuscated malicious contexts

Benign Samples
- Alexa top 50 URLs
- 7,976 contexts
Feature Selection

hand-picked vs. automatically selected
Evaluation

HP xw4600 workstation
- Intel Core2 Duo 3.16 GHz
- 4 GB memory
- Windows 7 64-bit Enterprise
Effectiveness
Training Set Size
Feature Set Size
Comparison with Other Techniques
Performance: Context Size
Performance: Feature Set
I think these are all of them…
unescape("%48%65%6c%6c%6f%57%6f%72%6c%64")
eval("alert(1)");
document.write("alert('1')");
"H976e246l3l2o19W42o45r7l88d734".replace(/[0-9]/g, "")
"\u0048\u0065\u006C\u006C\u006F\u0057\u006F\u0072\u006C\u0064"
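As a quick check, the three string-producing forms on this slide can be evaluated side by side. The sketch assumes the digit-stripping variant uses the character class `[0-9]`; the variable names are hypothetical, and `unescape` is a legacy global that is still available in current engines.

```javascript
// Percent-encoded bytes decoded by the legacy unescape() global.
const viaUnescape = unescape("%48%65%6c%6c%6f%57%6f%72%6c%64");

// Junk digits interleaved with the payload, stripped by a regular expression.
const viaReplace = "H976e246l3l2o19W42o45r7l88d734".replace(/[0-9]/g, "");

// Unicode escapes are resolved by the parser before the code ever runs.
const viaUnicode = "\u0048\u0065\u006C\u006C\u006F\u0057\u006F\u0072\u006C\u0064";

console.log(viaUnescape, viaReplace, viaUnicode);
```

All three spell the same text, so a scanner that looks only at the surface form of the source sees three unrelated strings.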
If I want to eval…

<script>
  Function("alert('1')")();
  setTimeout("alert('1')");
  execScript("alert('1')", "javascript");
  [].constructor.constructor('alert(1)')();
  window["eval"]("alert('1')");
</script>
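Three of these routes can be exercised outside a browser with a harmless payload. The sketch makes two substitutions that are assumptions for portability: `globalThis` stands in for `window`, and `1 + 1` stands in for `alert`. The string form of `setTimeout` and `execScript` are left out because they depend on the browser environment.

```javascript
// The Function constructor compiles a string into a function body.
const viaFunction = Function("return 1 + 1")();

// [].constructor is Array, and Array.constructor is Function itself.
const viaConstructor = [].constructor.constructor("return 1 + 1")();
const sameFunction = [].constructor.constructor === Function;

// Looking eval up by a string key avoids writing the identifier eval( directly.
const viaProperty = globalThis["eval"]("1 + 1");

console.log(viaFunction, viaConstructor, viaProperty, sameFunction);
```

None of these calls contains the literal token `eval(`, which is why matching on that token alone misses them.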
On the Internet, I found …

<script>
([][(![]+[])[!+[]+!+[]+!+[]]+(!![]+[][(![]+[])[+[]]+([![]]+[][[
]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[]
)[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[+!+[]+[+[]]]+(!![]+[])[
+!+[]]+(!![]+[])[+[]]][([][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[
+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]
+!+[]]+(!![]+[])[+!+[]]]+[])[!+[]+!+[]+!+[]]+(![]+[])[+!+[]
]+(![]+[])[!+[]+!+[]]+(![]+[])[!+[]+!+[]]]()[(![]+[])[+!+[]]
+(![]+[])[!+[]+!+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!
+[]]+(!![]+[])[+[]]])(+!+[])

</script>
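The building blocks of this encoding can be checked in isolation. The sketch below reproduces only the last bracketed fragment of the script above, which spells a property name, plus the final argument; the variable names are hypothetical and the rest of the script is not analyzed here.

```javascript
// Numbers built from an empty array and the ! and + operators.
const one = +!+[];                  // +[] is 0, !0 is true, +true is 1
const two = !+[] + !+[];            // true + true is 2
const three = !+[] + !+[] + !+[];   // true + true + true is 3

// Booleans turned into strings by adding an empty array.
const falseText = ![] + [];         // "false"
const trueText = !![] + [];         // "true"

// Characters picked out of those strings by index.
const word =
  (![] + [])[+!+[]] +               // "false"[1] is "a"
  (![] + [])[!+[] + !+[]] +         // "false"[2] is "l"
  (!![] + [])[!+[] + !+[] + !+[]] + // "true"[3] is "e"
  (!![] + [])[+!+[]] +              // "true"[1] is "r"
  (!![] + [])[+[]];                 // "true"[0] is "t"

console.log(one, two, three, falseText, trueText, word);
```

The fragment spells `alert` and the trailing `(+!+[])` passes `1`, using only the characters `[`, `]`, `(`, `)`, `!`, and `+`.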