DAWN:
A Novel Strategy for Detecting
ASCII Worms in Networks
Parbati Kumar Manna
Sanjay Ranka
Shigang Chen
Department of Computer and Information Science and
Engineering, University of Florida
IEEE INFOCOM 08
1
Outline
Introduction
ASCII Worm
Detection Strategies
Probabilistic Analysis
Implementation
Evaluation
Conclusions
2
Introduction
Almost any ASCII string translates into a
syntactically correct sequence of
instructions
The proportion of branch instructions for
ASCII data is significantly higher than that
of binary data
Prune the number of paths to be inspected
3
ASCII Worm
ASCII data: 0x20 ~ 0x7E
Maximal valid instruction sequence
LMVI: Length of Maximal Valid Instruction
sequence
4
ASCII Worm
Intel opcodes in ASCII
Dual-operand register/memory manipulation
sub, xor, inc, imul
Single-operand register manipulation
inc, dec
Stack-manipulation
push, pop, popa
Jump
jo, jno, jb, jae, je, jne, jbe, ja, js, jns, jp, jnp, jnge, jnl,
jng
5
ASCII Worm
I/O operation
insb, insd, outsb, outsd
Miscellaneous
aaa, daa, das, bound, arpl
Operand and Segment override prefixes
cs, ds, es, fs, gs, ss, a16, o16
mov eax, ebx → push ebx; pop eax
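The substitution on this slide (push ebx; pop eax standing in for mov eax, ebx) can be checked at the byte level. The sketch below is illustrative only: the helper name `is_printable_ascii` is hypothetical, the byte values are the standard 32-bit x86 encodings, and the printable range is the 0x20 to 0x7E range given earlier in the slides.

```python
def is_printable_ascii(code: bytes) -> bool:
    """Return True if every byte lies in the printable ASCII range 0x20..0x7E."""
    return all(0x20 <= b <= 0x7E for b in code)


# Standard 32-bit x86 encodings (one of the two valid encodings for mov).
MOV_EAX_EBX = bytes([0x89, 0xD8])       # mov eax, ebx
PUSH_EBX_POP_EAX = bytes([0x53, 0x58])  # push ebx ; pop eax

print(is_printable_ascii(MOV_EAX_EBX))       # False
print(is_printable_ascii(PUSH_EBX_POP_EAX))  # True
print(PUSH_EBX_POP_EAX.decode("ascii"))      # SX
```

The two-byte stack sequence is the text "SX", which is why it can appear in an ASCII-only payload while the direct mov cannot.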
6
ASCII Worm
7
ASCII Worm
Both the decrypter and the encrypted
payload should be ASCII
The size of the decrypter should be small
There should not be a significant size
discrepancy between the encrypted
payload and the cleartext
8
Detection Strategies
Constraints of an ASCII Worm
Opcode Unavailability
Difficulty in Encryption
Control Flow Constraints
Self-mutation is a mandatory constraint
n bytes of instructions → O(n) bytes of
decrypter
9
Detection Strategies
Prevalence of Privileged Instructions
The characters l, m, n, o decode as insb, insd, outsb, outsd
Illegal Memory Access
Uninitialized register
Wrong Segment selector
Explicit Memory Address
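To illustrate why privileged I/O opcodes are common in text, the sketch below counts how often the four characters that decode as insb, insd, outsb, outsd occur in a string. This is a rough indicator only, not the method of the slides: a byte is executed as one of these instructions only when it falls at an opcode position, which requires actual disassembly. The names `IO_OPCODES` and `io_char_fraction` are hypothetical.

```python
# Byte values 0x6C..0x6F are the x86 opcodes for the string I/O instructions.
IO_OPCODES = {0x6C: "insb", 0x6D: "insd", 0x6E: "outsb", 0x6F: "outsd"}


def io_char_fraction(text: str) -> float:
    """Fraction of characters whose byte value is an I/O opcode (l, m, n, o)."""
    data = text.encode("ascii")
    if not data:
        return 0.0
    hits = sum(1 for b in data if b in IO_OPCODES)
    return hits / len(data)


sample = "hello world"
print([IO_OPCODES[b] for b in sample.encode("ascii") if b in IO_OPCODES])
print(round(io_char_fraction(sample), 3))  # 0.455
```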
10
Probabilistic Analysis
Assumptions:
The characters in the traffic are independently
distributed
Bernoulli trial
11
Probabilistic Analysis
Invalid instruction
Privileged instruction
Memory-accessing instructions
12
Probabilistic Analysis
Notation:
p: the probability of invalid instruction
n: the total number of instructions
N: the total number of invalid instructions (equal to
the number of valid instruction sequences)
Instruction stream (S1S2S3…SN)
Xi: the length of Si
Xmax: max{X1,X2,…,XN}
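The notation above can be made concrete with a small sketch. It assumes the disassembler has already produced one flag per instruction (True for an invalid instruction) and that each sequence length Xi counts the invalid instruction that terminates it; a trailing run with no terminating invalid instruction is ignored so that the number of sequences equals N. The function names are hypothetical.

```python
from typing import List


def sequence_lengths(invalid: List[bool]) -> List[int]:
    """Lengths X_1..X_N of the sequences S_1..S_N, each ended by an invalid instruction."""
    lengths = []
    run = 0
    for is_invalid in invalid:
        run += 1
        if is_invalid:
            lengths.append(run)  # sequence closed by this invalid instruction
            run = 0
    return lengths  # an unterminated trailing run is dropped (assumption)


def x_max(invalid: List[bool]) -> int:
    """Xmax = max{X_1, ..., X_N}, or 0 when there is no invalid instruction."""
    lengths = sequence_lengths(invalid)
    return max(lengths) if lengths else 0


flags = [False, False, True, True, False, False, False, True, False]
print(sequence_lengths(flags))  # [3, 1, 4]
print(x_max(flags))             # 4
```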
13
Probabilistic Analysis
p.m.f of N: $P[N] = \binom{n}{N}\, p^{N} (1-p)^{n-N}$
p.m.f of Xi: $P[X_i = x] = p\,(1-p)^{x-1}$
c.d.f of Xi: $P[X_i \le x] = 1 - (1-p)^{x}$
14
Probabilistic Analysis
For an instance of exactly N sequences
15
Probabilistic Analysis
The c.d.f of Xmax
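The formula for this slide is not present in the text. A sketch of how it would follow from the earlier definitions, assuming the Xi are independent with c.d.f. $1-(1-p)^{x}$ and N is binomial with parameters n and p (the closed form in the last step uses the binomial theorem):

```latex
\begin{align*}
P[X_{\max} \le x \mid N]
  &= \prod_{i=1}^{N} P[X_i \le x]
   = \left(1 - (1-p)^{x}\right)^{N}, \\
P[X_{\max} \le x]
  &= \sum_{N=0}^{n} \binom{n}{N}\, p^{N} (1-p)^{n-N} \left(1 - (1-p)^{x}\right)^{N}
   = \left(1 - p\,(1-p)^{x}\right)^{n}.
\end{align*}
```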
16
Probabilistic Analysis
The p.m.f of Xmax
17
Probabilistic Analysis
Verifying Model
Using Monte-Carlo Simulation
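A minimal sketch of what such a Monte-Carlo check might look like, under the independence assumption stated earlier: each of n instructions is invalid with probability p, and the simulation estimates how often Xmax exceeds a given threshold. The function names, trial count, and seed are illustrative choices, not taken from the slides, and each sequence length is assumed to include its terminating invalid instruction.

```python
import random


def simulate_x_max(n: int, p: float, rng: random.Random) -> int:
    """Draw n Bernoulli(p) 'invalid' flags and return the longest terminated sequence."""
    longest = 0
    run = 0
    for _ in range(n):
        run += 1
        if rng.random() < p:  # this instruction is invalid and ends the sequence
            longest = max(longest, run)
            run = 0
    return longest


def exceed_rate(n: int, p: float, tau: int, trials: int, seed: int = 0) -> float:
    """Empirical estimate of P[Xmax > tau] over the given number of trials."""
    rng = random.Random(seed)
    hits = sum(simulate_x_max(n, p, rng) > tau for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    # Parameter values as reported on the evaluation slides.
    print(exceed_rate(n=1540, p=0.227, tau=40, trials=2000))
```

If the model holds, the printed rate should be close to the false-positive level α chosen for the threshold.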
18
Probabilistic Analysis
Threshold τ
$\tau = \dfrac{\log\left(1 - (1-\alpha)^{1/n}\right) - \log p}{\log(1-p)}$
19
Implementation
Instruction Disassembly
Instruction Sequence Analysis
20
Evaluation
Creation of the Test Data
Benign data: 100 cases, each containing nearly
4K printable ASCII characters
21
Evaluation
Determining Appropriate Thresholds for
the Test Data
Determining p
0.227
Determining n
1540
Determining the threshold τ
40 (when α = 0.01)
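The threshold value can be reproduced numerically. The sketch below assumes the threshold has the closed form τ = [log(1 − (1 − α)^(1/n)) − log p] / log(1 − p), as reconstructed from the threshold slide; the function name `threshold` is hypothetical.

```python
import math


def threshold(p: float, n: float, alpha: float) -> float:
    """Threshold tau such that P[Xmax > tau] is about alpha under the model."""
    numerator = math.log(1.0 - (1.0 - alpha) ** (1.0 / n)) - math.log(p)
    return numerator / math.log(1.0 - p)


tau = threshold(p=0.227, n=1540, alpha=0.01)
print(math.floor(tau))  # 40
```

The unrounded value lies between 40 and 41, which agrees with the reported threshold of 40.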
22
Evaluation
Experimental Results and Assessing the
Effectiveness of the Detection Method
23
Evaluation
24
Conclusions
An ASCII worm must self-mutate to
generate binary opcodes
This mutation requires a lot of memory-writing instructions
The decrypter of an ASCII worm is relatively
large
25
Conclusions
Benign ASCII data does not have such a
long executable instruction sequence
The length of the maximal valid instruction
sequence can be used to differentiate
between benign and malicious data
26
Determining p
Prob[I/O instruction]
+ Prob[memory-accessing instruction with a wrong segment override]
= 18.5% + 4.2% = 22.7%
27
Determining n
E[length of instruction]
= E[length of prefix chain]
+E[length of actual instruction] = 2.6
n = total number of input characters /
E[instruction size]
= 4000 / 2.6 ≈ 1540
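The arithmetic behind the two parameters can be restated as a short sketch, using only the figures given on these slides; the variable names are illustrative.

```python
# Probability that an instruction is invalid: I/O instructions plus
# memory accesses with a wrong segment override.
p = 0.185 + 0.042

# Expected number of instructions in about 4000 input characters.
expected_instruction_length = 2.6
n = 4000 / expected_instruction_length

print(f"p = {p:.3f}, n = {n:.0f}")  # p = 0.227, n = 1538
```

The quotient is about 1538, which the slides round to 1540.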
28