DAWN: A Novel Strategy for Detecting ASCII Worms in Networks

Parbati Kumar Manna
Sanjay Ranka
Shigang Chen
Department of Computer and Information Science and Engineering, University of Florida
IEEE INFOCOM 08
1
Outline
Introduction
ASCII Worm
Detection Strategies
Probabilistic Analysis
Implementation
Evaluation
Conclusions
2
Introduction
Almost any ASCII string translates into a syntactically correct sequence of instructions
The proportion of branch instructions in ASCII data is significantly higher than in binary data
Prune the number of paths to be inspected
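A rough byte-level count illustrates the branch-density claim. This sketch assumes every byte is equally likely to be an opcode and counts only the one-byte short conditional jumps (0x70 to 0x7F); it ignores operand bytes and other branch opcodes, so it is an illustration, not the measurement behind the slide. The helper name `jcc_fraction` is hypothetical.

```python
# Rough illustration: share of byte values that are one-byte short
# conditional-jump opcodes (0x70-0x7F: jo ... jg), comparing printable
# ASCII with all 256 byte values. Assumes each byte is an opcode position.
PRINTABLE = range(0x20, 0x7F)   # 0x20..0x7E
ALL_BYTES = range(0x00, 0x100)
SHORT_JCC = range(0x70, 0x80)   # 0x7F (jg) lies outside printable ASCII


def jcc_fraction(byte_values) -> float:
    """Fraction of the given byte values that are short conditional jumps."""
    values = list(byte_values)
    return sum(b in SHORT_JCC for b in values) / len(values)


if __name__ == "__main__":
    print(f"printable ASCII: {jcc_fraction(PRINTABLE):.3f}")  # 15/95
    print(f"all bytes:       {jcc_fraction(ALL_BYTES):.3f}")  # 16/256
```

Under this crude model a printable byte is roughly 2.5 times as likely to be a conditional jump as an arbitrary byte.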
3
ASCII Worm
ASCII data: bytes 0x20 to 0x7E (printable characters)
Maximal valid instruction sequence
LMVI: Length of the Maximal Valid Instruction sequence
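As a minimal sketch, assuming an earlier disassembly step has already marked each instruction of a linear stream as valid or invalid, LMVI is the longest run of consecutive valid instructions. The function names and the boolean-list input are hypothetical; the slides do not specify this interface, and following branch targets is not modeled here.

```python
from typing import Iterable


def is_printable_ascii(data: bytes) -> bool:
    """True if every byte lies in the printable range 0x20..0x7E."""
    return all(0x20 <= b <= 0x7E for b in data)


def lmvi(valid_flags: Iterable[bool]) -> int:
    """Length, in instructions, of the longest run of valid instructions."""
    best = run = 0
    for ok in valid_flags:
        run = run + 1 if ok else 0   # an invalid instruction ends the run
        best = max(best, run)
    return best


if __name__ == "__main__":
    print(is_printable_ascii(b"PYj0X40"))                             # True
    print(lmvi([True, True, False, True, True, True, False, True]))  # 3
```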
4
ASCII Worm
Intel opcodes in ASCII
Dual-operand register/memory manipulation
and, sub, xor, imul
Single-operand register manipulation
inc, dec
Stack-manipulation
push, pop, popa
Jump
jo, jno, jb, jae, je, jne, jbe, ja, js, jns, jp, jnp, jnge, jnl, jng
5
ASCII Worm
I/O operation
insb, insd, outsb, outsd
Miscellaneous
aaa, daa, das, bound, arpl
Operand and Segment override prefixes
cs, ds, es, fs, gs, ss, a16, o16
mov eax, ebx (not encodable in ASCII) → push ebx; pop eax
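The opcode groups listed above cover every printable byte when it is read as a one-byte opcode in 32-bit mode. The hypothetical `classify` table below makes that concrete. The category labels are illustrative, a few opcodes not named on the slides (`cmp` 0x38 to 0x3D, `aas` 0x3F, `pusha` 0x60, `push imm` 0x68/0x6A) are filled in from the Intel one-byte opcode map, and ModRM or operand details are ignored.

```python
from collections import Counter


def classify(b: int) -> str:
    """Group a printable byte, read as a 32-bit one-byte opcode, by category."""
    if not 0x20 <= b <= 0x7E:
        raise ValueError("not printable ASCII")
    if b in (0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65, 0x66, 0x67):
        return "prefix"          # es, cs, ss, ds, fs, gs, o16, a16
    if 0x6C <= b <= 0x6F:
        return "io"              # insb, insd, outsb, outsd
    if 0x70 <= b <= 0x7E:
        return "jump"            # jo ... jle (jng)
    if 0x40 <= b <= 0x4F:
        return "single-operand"  # inc r32 / dec r32
    if 0x50 <= b <= 0x61 or b in (0x68, 0x6A):
        return "stack"           # push/pop r32, pusha, popa, push imm
    if b in (0x27, 0x2F, 0x37, 0x3F, 0x62, 0x63):
        return "misc"            # daa, das, aaa, aas, bound, arpl
    return "dual-operand"        # and, sub, xor, cmp, imul


if __name__ == "__main__":
    print(Counter(classify(b) for b in range(0x20, 0x7F)))
    # push ebx is 'S' (0x53) and pop eax is 'X' (0x58): the ASCII "mov" is "SX"
    print(classify(ord("S")), classify(ord("X")))
```

The last line shows why the push/pop substitution matters: the two-character printable string "SX" performs the register copy that no printable `mov` encoding can.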
6
ASCII Worm
7
ASCII Worm
Both the decrypter and the encrypted payload must be ASCII
The decrypter should be small
There should be no significant size difference between the encrypted payload and the cleartext
8
Detection Strategies
Constraints of an ASCII Worm
Opcode Unavailability
Difficulty in Encryption
Control Flow Constraints
Self-mutation is a mandatory constraint
n bytes of binary instructions → O(n) bytes of decrypter
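A common technique in ASCII shellcode for producing a non-printable byte is to start from zero and subtract printable constants. Every value 0x00 to 0xFF is reachable with three printable subtrahends, because their sum spans 0x60 to 0x17A and therefore covers every residue mod 256. The sketch below works one byte at a time and ignores the borrows that a real `sub eax, imm32` encoder must handle across a 32-bit word; `printable_sub_triple` is a hypothetical name. Since every payload byte needs its own printable constants, a decrypter built this way grows linearly with the payload, in line with the O(n) bound.

```python
from typing import Optional, Tuple

PRINTABLE = range(0x20, 0x7F)  # 0x20..0x7E


def printable_sub_triple(target: int) -> Optional[Tuple[int, int, int]]:
    """Find printable a, b, c with (0 - a - b - c) mod 256 == target.

    Byte-wise simplification: carries/borrows between bytes are ignored.
    """
    need = (-target) % 256  # a + b + c must be congruent to this mod 256
    for a in PRINTABLE:
        for b in PRINTABLE:
            c = (need - a - b) % 256
            if 0x20 <= c <= 0x7E:
                return a, b, c
    return None


if __name__ == "__main__":
    # Derive three example non-printable bytes from printable constants only.
    for t in (0x90, 0xCD, 0x80):
        a, b, c = printable_sub_triple(t)
        print(f"{t:#04x} = 0 - {a:#04x} - {b:#04x} - {c:#04x} (mod 256)")
```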
9
Detection Strategies
Prevalence of Privileged Instructions
l, m, n, o (0x6C to 0x6F) → insb, insd, outsb, outsd
Illegal Memory Access
Uninitialized register
Wrong Segment selector
Explicit Memory Address
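The rules on this slide can be sketched as a predicate over an already-decoded instruction. The `Instr` record and its fields are hypothetical, and treating `fs`/`gs` overrides as the "wrong" segment selectors is an illustrative assumption passed in as a parameter; the slides do not spell out the exact rule set.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Privileged string I/O opcodes: 'l' insb, 'm' insd, 'n' outsb, 'o' outsd.
IO_OPCODES = frozenset({0x6C, 0x6D, 0x6E, 0x6F})

# Hypothetical choice of segment overrides treated as "wrong".
DEFAULT_WRONG_SEGMENTS: FrozenSet[str] = frozenset({"fs", "gs"})


@dataclass
class Instr:
    """Hypothetical, simplified decoded instruction."""
    opcode: int
    accesses_memory: bool = False
    segment_override: Optional[str] = None  # e.g. "fs"; None if no prefix
    base_initialized: bool = True           # address register set earlier?
    explicit_address: bool = False          # absolute address, no register


def is_invalid(ins: Instr,
               wrong_segments: FrozenSet[str] = DEFAULT_WRONG_SEGMENTS) -> bool:
    """Apply the slide's three invalidity checks to one instruction."""
    if ins.opcode in IO_OPCODES:
        return True                   # privileged: typically faults in user mode
    if ins.accesses_memory:
        if not ins.base_initialized:
            return True               # uninitialized register
        if ins.segment_override in wrong_segments:
            return True               # wrong segment selector
        if ins.explicit_address:
            return True               # explicit memory address
    return False
```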
10
Probabilistic Analysis
Assumptions:
The characters in the traffic are independently distributed
Each instruction is a Bernoulli trial: invalid or valid
11
Probabilistic Analysis
Invalid instruction
Privileged instruction
Memory-accessing instructions
12
Probabilistic Analysis
Notation:
p: the probability that an instruction is invalid
n: the total number of instructions
N: the total number of invalid instructions (equivalently, the number of valid instruction sequences)
Instruction stream: $S_1 S_2 S_3 \cdots S_N$
$X_i$: the length of $S_i$
$X_{\max} = \max\{X_1, X_2, \ldots, X_N\}$
13
Probabilistic Analysis
p.m.f. of N: $P(N) = \binom{n}{N} p^N (1-p)^{n-N}$
p.m.f. of $X_i$: $P(X_i = x) = p(1-p)^{x-1}$
c.d.f. of $X_i$: $P(X_i \le x) = 1 - (1-p)^x$
14
Probabilistic Analysis
For an instance with exactly N sequences
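A sketch of this step, under the model's simplifying assumption that, given N, the lengths $X_1, \ldots, X_N$ are independent with geometric c.d.f. $P(X_i \le x) = 1-(1-p)^x$ (this need not match the slide's original formula exactly): the maximum stays at or below x only if every sequence does, so

$$
P(X_{\max} \le x \mid N) = \prod_{i=1}^{N} P(X_i \le x) = \bigl(1 - (1-p)^x\bigr)^N
$$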
15
Probabilistic Analysis
The c.d.f. of $X_{\max}$
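One derivation under the same independence assumptions: averaging the conditional c.d.f. $\bigl(1-(1-p)^x\bigr)^N$ over the binomial distribution of N and applying the binomial theorem yields a closed form.

$$
\begin{aligned}
P(X_{\max} \le x) &= \sum_{N=0}^{n} \binom{n}{N} p^N (1-p)^{n-N} \bigl(1-(1-p)^x\bigr)^N \\
&= \Bigl(p\bigl(1-(1-p)^x\bigr) + (1-p)\Bigr)^n = \bigl(1 - p(1-p)^x\bigr)^n
\end{aligned}
$$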
16
Probabilistic Analysis
The p.m.f. of $X_{\max}$
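Differencing the closed-form c.d.f. $P(X_{\max} \le x) = \bigl(1 - p(1-p)^x\bigr)^n$, which follows from the binomial model of N and geometric $X_i$ under independence assumptions, gives the p.m.f. for integer $x \ge 1$:

$$
P(X_{\max} = x) = \bigl(1 - p(1-p)^x\bigr)^n - \bigl(1 - p(1-p)^{x-1}\bigr)^n
$$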
17
Probabilistic Analysis
Verifying Model
Using Monte-Carlo Simulation
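A minimal Monte Carlo check in the spirit of this slide, assuming each instruction is independently invalid with probability p: it simulates Bernoulli instruction streams, records the longest run of valid instructions, and compares the tail frequency with the approximation $1-\bigl(1-p(1-p)^x\bigr)^n$ that the independence model yields. The function names, trial count, and seed are arbitrary choices; the slides do not give the simulation parameters.

```python
import random


def longest_valid_run(n: int, p: float, rng: random.Random) -> int:
    """Longest run of valid instructions in a stream of n Bernoulli(p) trials."""
    best = run = 0
    for _ in range(n):
        if rng.random() < p:   # invalid instruction ends the current run
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best


def simulated_tail(p: float, n: int, x: int, trials: int = 1000, seed: int = 1) -> float:
    """Empirical probability that the longest valid run is at least x."""
    rng = random.Random(seed)
    hits = sum(longest_valid_run(n, p, rng) >= x for _ in range(trials))
    return hits / trials


def model_tail(p: float, n: int, x: int) -> float:
    """Model approximation: 1 - (1 - p(1-p)^x)^n."""
    return 1.0 - (1.0 - p * (1.0 - p) ** x) ** n


if __name__ == "__main__":
    p, n = 0.227, 1540  # the p and n used for the 4K-character test data
    for x in (25, 40):
        print(x, simulated_tail(p, n, x), round(model_tail(p, n, x), 4))
```

Because $X_i$ includes the invalid instruction that terminates $S_i$, "$X_{\max} > x$" corresponds to a run of at least x valid instructions, which is what the simulation counts.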
18
Probabilistic Analysis
Threshold τ:
$\tau = \dfrac{\log\left(1 - (1-\alpha)^{1/n}\right) - \log p}{\log(1-p)}$
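The threshold formula is easy to evaluate numerically. This sketch (the function name `lmvi_threshold` is hypothetical) uses `math.log1p` and `math.expm1` so that $1-(1-\alpha)^{1/n}$, a very small number for large n, is computed without cancellation. With p = 0.227, n = 1540 and α = 0.01, the values used in the evaluation, it returns about 40.6, consistent with the threshold of 40 reported there.

```python
import math


def lmvi_threshold(p: float, n: int, alpha: float) -> float:
    """tau = (log(1 - (1 - alpha)^(1/n)) - log p) / log(1 - p)."""
    # 1 - (1 - alpha)^(1/n), computed stably as -expm1(log1p(-alpha) / n)
    tail = -math.expm1(math.log1p(-alpha) / n)
    return (math.log(tail) - math.log(p)) / math.log1p(-p)


if __name__ == "__main__":
    print(lmvi_threshold(0.227, 1540, 0.01))
```

A smaller α or a longer input (larger n) both raise τ, since either makes the permitted tail probability per instruction smaller.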
19
Implementation
Instruction Disassembly
Instruction Sequence Analysis
20
Evaluation
Creation of the Test Data
Benign data: 100 cases, each containing nearly 4K printable ASCII characters
21
Evaluation
Determining Appropriate Thresholds for the Test Data
Determining p: 0.227
Determining n: 1540
Determining the threshold τ: 40 (when α = 0.01)
22
Evaluation
Experimental Results and Assessing the Effectiveness of the Detection Method
23
Evaluation
24
Conclusions
An ASCII worm must self-mutate to generate binary opcodes
This mutation requires many memory-writing instructions
The decrypter of an ASCII worm is therefore relatively large
25
Conclusions
Benign ASCII data does not contain such long executable instruction sequences
The length of the maximal valid instruction sequence can therefore be used to distinguish benign from malicious data
26
Determining p
Prob[I/O instruction] + Prob[wrong-segment-override memory-accessing instruction]
= 18.5% + 4.2% = 22.7%
27
Determining n
E[length of instruction] = E[length of prefix chain] + E[length of actual instruction] = 2.6
n = total number of input characters / E[length of instruction]
= 4000 / 2.6 ≈ 1540
28