Christopher Kruegel & Giovanni Vigna
CCS ‘03
Presented By: Payas Gupta
Outline
Overview
Approach
Model Description
Evaluation
Conclusion
Web based attacks
XSS attacks
Buffer overflow
Directory traversal
Input validation
Code Red
Anomaly Detection vs. Misuse Detection
Data Model
Only GET requests are analyzed; request headers and bodies are not considered.
Only queries that contain a query string are considered (path-only requests are ignored).
Example log entry:
169.229.60.105 - johndoe [6/Nov/2002:23:59:59 -0800] "GET /scripts/access.pl?user=johndoe&cred=admin" 200 2122
Path: /scripts/access.pl
Query: user=johndoe&cred=admin, i.e. attributes a1 = user (value johndoe) and a2 = cred (value admin)
For query q, Sq = {a1, a2}
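A minimal sketch (Python, not from the paper) of how a request URI could be decomposed into the path and the attribute set Sq used by the models; the function and variable names are illustrative:

from urllib.parse import urlsplit, parse_qsl

def decompose(request_uri):
    # Split a request URI into the program path and its attribute-value pairs
    parts = urlsplit(request_uri)
    attributes = dict(parse_qsl(parts.query))   # e.g. {'user': 'johndoe', 'cred': 'admin'}
    return parts.path, attributes

path, attrs = decompose("/scripts/access.pl?user=johndoe&cred=admin")
# path  -> '/scripts/access.pl'
# attrs -> {'user': 'johndoe', 'cred': 'admin'}; Sq = set(attrs)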
Detection model
Each model m is associated with a weight wm.
Each model returns a probability pm that the observed feature is normal.
A value of pm close to 0 indicates an anomalous event (equivalently, 1 − pm close to 1).
The anomaly score of a query combines the models as a weighted sum: AnomalyScore = Σm wm · (1 − pm).
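A minimal sketch (Python) of this weighted combination; the model names, weights and probabilities below are made-up placeholders, not values from the paper:

def anomaly_score(model_outputs, weights):
    # Combine per-model normality probabilities p_m into a single anomaly score
    # model_outputs: model name -> p_m in [0, 1] (1 = normal, 0 = anomalous)
    # weights:       model name -> w_m
    return sum(weights[m] * (1.0 - p) for m, p in model_outputs.items())

score = anomaly_score({"length": 0.9, "char_dist": 0.95, "structure": 0.1},
                      {"length": 1.0, "char_dist": 1.0, "structure": 1.0})
# A query is reported as anomalous if the score exceeds a threshold.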
Attribute Length
Normal Parameters
Fixed-size tokens (session identifiers)
Short strings (input from HTML form)
So, attribute lengths do not vary much among requests for the same server-side program.
Malicious activity
E.g. a buffer overflow typically requires an unusually long attribute value.
Goal: to approximate the actual but unknown
distribution of the parameter lengths and detect
deviation from the normal
Learning & Detection
Learning
Calculate the mean μ and variance σ² of the lengths l1, l2, ..., ln observed for this attribute
over the n queries that contain it.
Detection
Chebyshev inequality: p(|x − μ| > |l − μ|) < σ² / (l − μ)²
This bound is intentionally weak, resulting in a high degree of tolerance (very weak).
Only obvious outliers are flagged as suspicious.
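A minimal sketch (Python) of the length model, assuming the Chebyshev bound above; the class and method names are mine:

class LengthModel:
    def __init__(self):
        self.lengths = []

    def train(self, value):
        self.lengths.append(len(value))

    def finalize(self):
        n = len(self.lengths)
        self.mean = sum(self.lengths) / n
        self.var = sum((l - self.mean) ** 2 for l in self.lengths) / n

    def probability(self, value):
        # Return p_m in [0, 1]; small values mean the length is anomalous
        dist_sq = (len(value) - self.mean) ** 2
        if dist_sq == 0:
            return 1.0
        # Chebyshev bound on observing a length at least this far from the mean
        return min(1.0, self.var / dist_sq)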
Attribute character distribution
Attributes have regular structure, printable characters
There are similarities between the character frequencies
of query parameters.
Relative character frequencies of the attribute are sorted in descending order.
Example: 'passwd' (ASCII codes 112 97 115 115 119 100)
Sorted relative frequencies: 0.33, 0.17, 0.17, 0.17, 0.17, then 0 for the remaining 251 characters
ICD(0) = 0.33, ICD(1) to ICD(4) = 0.17, ICD(5) to ICD(255) = 0
Normal
frequencies decrease slowly in value
Malicious
frequencies drop extremely fast (a peak caused by a single dominating character)
or hardly drop at all (random values)
Why is it useful?
Cannot be evaded by some well-known attempts to
hide malicious code in the string.
E.g. nop operations substituted by instructions with similar behavior (such as add rA,rA,0)
But not useful when a small exploit routine causes only a minor change in the payload's character distribution
Learning and detection
Learning
For each query attribute, its character distribution
is stored
The ICD is obtained by averaging all the stored
character distributions
Example: sorted character distributions of three training queries and their average (the ICD):
        #1    #2    #3    #4    #5
q1     .50   .25   .25   .00   .00
q2     .75   .20   .10   .00   .00
q3     .25   .25   .25   .25   .00
avg    .50   .23   .20   .08   .00
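A minimal sketch (Python) of this learning step, sorting each value's relative character frequencies and averaging them into the ICD; the helper names are mine:

from collections import Counter

def sorted_char_distribution(value):
    # Relative character frequencies of a string, sorted in descending order (256 slots)
    counts = Counter(value.encode("latin-1", errors="replace"))
    freqs = sorted((c / len(value) for c in counts.values()), reverse=True)
    return freqs + [0.0] * (256 - len(freqs))

def idealized_char_distribution(training_values):
    # Average the sorted distributions of all training values into the ICD
    dists = [sorted_char_distribution(v) for v in training_values]
    return [sum(col) / len(dists) for col in zip(*dists)]

icd = idealized_char_distribution(["johndoe", "janedoe", "admin"])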
Learning and detection (cont...)
Pearson chi-square test
Not necessary to operate on all values of ICD
consider a small number of intervals, i.e. bins
Calculate observed and expected frequencies
Oi = observed frequency for each bin
Ei = expected frequency for each bin (relative frequency of the bin in the ICD × length of the attribute)
Compute χ² = Σi (Oi − Ei)² / Ei
Derive the probability pm from a predefined χ² table
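A minimal sketch (Python) of the detection step; the six bin boundaries are my assumption about the paper's segments, and sorted_char_distribution is the helper from the learning sketch above:

from scipy.stats import chi2 as chi2_dist

# Assumed bins over the sorted character ranks: [0], [1-3], [4-6], [7-11], [12-15], [16-255]
BINS = [(0, 0), (1, 3), (4, 6), (7, 11), (12, 15), (16, 255)]

def char_dist_probability(value, icd):
    # Return p_m for an attribute value given the learned ICD (sorted, 256 entries)
    observed_dist = sorted_char_distribution(value)
    length = len(value)
    chi_sq = 0.0
    for lo, hi in BINS:
        observed = sum(observed_dist[lo:hi + 1]) * length   # O_i: observed count in bin
        expected = sum(icd[lo:hi + 1]) * length              # E_i: expected count from ICD
        if expected > 0:
            chi_sq += (observed - expected) ** 2 / expected
    # The paper reads the probability from a predefined table; scipy is used here as a stand-in,
    # with |bins| - 1 degrees of freedom
    return chi2_dist.sf(chi_sq, df=len(BINS) - 1)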
Structural inference
The structure of a parameter is the regular grammar that describes
all of its normal, legitimate values.
Why??
An attacker can craft an attack in a manner that makes its
manifestation appear more regular.
For example, non-printable characters can be
replaced by groups of printable characters.
Learning and detection
Basic approach is to generalize grammar as long as
it seems reasonable and stop before too much
structural information is lost.
Markov model and Bayesian probability
NFA
Each state S has a set of nS possible output symbols o,
which are emitted with probability pS(o).
Each transition t is marked with a probability p(t),
the likelihood that the transition is taken.
Learning and detection (cont...)
[Figure: example Markov model with Start and Terminal states; transitions carry probabilities (0.3, 0.7, 0.2, 0.4, 1.0, ...) and states emit symbols with probabilities such as p(a) = 0.5, p(b) = 0.5, p(c) = 1.]
The probability of the word 'ab' is the sum, over all paths that emit 'ab', of the products of the transition and emission probabilities along each path:
P(ab) = (1.0 · 0.3 · 0.5 · 0.2 · 0.5 · 0.4) + (1.0 · 0.7 · 1.0 · 1.0 · 1.0 · 1.0)
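A minimal sketch (Python) of this computation for a toy model; the states, transition and emission probabilities below are illustrative, not the figure's exact values:

def word_probability(word, transitions, emissions, state="start"):
    # Probability of emitting `word` and reaching the terminal state, summed over all paths
    # transitions: state -> list of (next_state, transition_probability)
    # emissions:   state -> {symbol: emission_probability}
    if not word:
        return sum(p for nxt, p in transitions.get(state, []) if nxt == "terminal")
    total = 0.0
    for nxt, p_t in transitions.get(state, []):
        p_e = emissions.get(nxt, {}).get(word[0], 0.0)
        if p_e > 0:
            total += p_t * p_e * word_probability(word[1:], transitions, emissions, nxt)
    return total

transitions = {"start": [("s1", 0.3), ("s2", 0.7)],
               "s1": [("s3", 1.0)], "s2": [("s3", 1.0)], "s3": [("terminal", 1.0)]}
emissions = {"s1": {"a": 0.5, "b": 0.5}, "s2": {"a": 1.0}, "s3": {"b": 1.0}}
p_ab = word_probability("ab", transitions, emissions)   # sums the two paths that emit 'ab'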
Learning and detection (cont...)
p(TrainingData|Model) is obtained by adding the probabilities calculated, as above, for each element of the training input.
Learning and detection (cont...)
Aim: maximize the product p(Model) · p(TrainingData|Model).
There is a conflict between simple models, which tend to overgeneralize, and models that fit the data perfectly but are too complex.
Simple model: high a priori probability p(Model), but the likelihood of producing the training data is extremely low, so the product is low.
Complex model: high likelihood of producing the training data, but low a priori probability; the product is still low.
Model building starts with a model that exactly reflects the training input; states are then merged step by step, with the Viterbi algorithm used to approximate the probabilities.
Learning and detection (cont...)
Detection
The problem is that even a legitimate input that has
been seen regularly during the training phase may
receive a very small probability value
The probability values of all possible input words sum to 1
Therefore, the model returns 1 if the value can be derived
from the inferred grammar (a valid word), and 0 when it
cannot.
Token finder
Determines whether the values of an attribute are drawn from a
limited set of possible alternatives (an enumeration).
When a malicious user passes illegal
values to the application, the attack can be
detected.
Learning and detection
Learning
Enumeration: the number of different parameter values is bounded by some threshold t.
Random: the number of different values grows proportionally with the total number of occurrences of the parameter.
Classify by calculating the statistical correlation between the number of occurrences and a function that increases when a new value is seen and decreases when a value is repeated.
Learning and detection (cont...)
Correlation < 0: enumeration
Correlation > 0: random
Detection
If the attribute was classified as an enumeration and an
unexpected value appears, the model returns 0, otherwise 1;
if the attribute was classified as random, it always returns 1.
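A minimal sketch (Python) of the token finder, assuming the correlation-based classification above; the helper names are mine:

from statistics import correlation   # Python 3.10+

class TokenFinder:
    def __init__(self):
        self.seen = set()
        self.f = []   # f(x) = x, the number of values observed so far
        self.g = []   # increases for new values, decreases for repeated ones

    def train(self, value):
        prev = self.g[-1] if self.g else 0
        self.g.append(prev + 1 if value not in self.seen else prev - 1)
        self.f.append(len(self.f) + 1)
        self.seen.add(value)

    def finalize(self):
        # Negative correlation -> enumeration, positive -> random values
        self.is_enumeration = correlation(self.f, self.g) < 0

    def probability(self, value):
        if self.is_enumeration and value not in self.seen:
            return 0.0
        return 1.0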
Attribute presence or absence
Client-side programs, scripts or HTML forms pre-
process the data and transform it into a suitable
request.
Handcrafted attacks focus on exploiting a
vulnerability in the code that processes a certain
parameter value, and little attention is paid to whether
the remaining parameters form a valid combination.
Learning and detection
Learning
Model of acceptable subsets
Recording each distinct subset Sq={ai,...ak} of
attributes that is seen during the training phase.
Detection
For each query, the algorithm looks up the query's
attribute set among the recorded subsets.
If the set was encountered during training, the model returns 1, otherwise 0.
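A minimal sketch (Python) of this model; the names are illustrative:

class PresenceAbsenceModel:
    def __init__(self):
        self.known_sets = set()

    def train(self, attributes):
        # Record each distinct attribute set S_q seen during training
        self.known_sets.add(frozenset(attributes))

    def probability(self, attributes):
        # 1 if exactly this combination of attributes was seen in training, else 0
        return 1.0 if frozenset(attributes) in self.known_sets else 0.0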
Attribute order
Legitimate invocations of server-side programs
often contain the same parameters in the same
order.
Handcrafted attacks often do not.
The model tests whether the given order is consistent
with the order constraints deduced during the learning phase.
Learning and detection
Learning:
The order constraints are stored as a set of attribute pairs O, derived with the help of a directed graph G.
Each vertex vi in the directed graph G is associated with the
corresponding attribute ai.
For every query, the ordered list of its attributes is processed:
for each attribute pair (as, at) in this list, with s ≠ t and 1 ≤ s, t ≤ i,
where as precedes at, a directed edge is inserted into the graph from vs to vt.
Learning and detection (cont...)
Graph G contains all order constraints imposed
by queries in the training data.
An attribute as is considered to precede at if there is a
directed edge, or more generally a path, from vs to vt.
Detection
Given a query with attributes a1, a2, ..., ai and the set of
order constraints O, all parameter pairs (aj, ak)
with j ≠ k and 1 ≤ j, k ≤ i are checked against O.
If a violation is found, the model returns 0, otherwise 1.
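A minimal sketch (Python) of the order model; cycles in the graph (which the paper handles separately) are not dealt with here, and the names are mine:

class OrderModel:
    def __init__(self):
        self.edges = {}   # attribute -> set of attributes that must come after it

    def train(self, ordered_attributes):
        # Insert a directed edge v_s -> v_t for every pair where a_s precedes a_t
        for i, a_s in enumerate(ordered_attributes):
            for a_t in ordered_attributes[i + 1:]:
                if a_s != a_t:
                    self.edges.setdefault(a_s, set()).add(a_t)

    def _reachable(self, src, dst):
        # Depth-first search: is there a path from src to dst?
        stack, visited = [src], set()
        while stack:
            v = stack.pop()
            if v == dst:
                return True
            if v not in visited:
                visited.add(v)
                stack.extend(self.edges.get(v, ()))
        return False

    def probability(self, ordered_attributes):
        # Violation: a_k appears before a_j although training imposed a_j before a_k
        for i, a_j in enumerate(ordered_attributes):
            for a_k in ordered_attributes[:i]:
                if a_j != a_k and self._reachable(a_j, a_k):
                    return 0.0
        return 1.0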
Evaluation
Data sets
Apache web server access logs from three sites:
Google
University of California, Santa Barbara
Technical University, Vienna
1000 queries used for training
All the rest used for testing
Model Validation
The logs contained a significant number of entries caused by the
Nimda and Code Red worms; these were removed.
Only queries that resulted from the invocation of existing
programs were included in the training and detection
process.
For Google, the thresholds were also adjusted to account
for the higher variability in its traffic.
Detection effectiveness
Conclusions
An anomaly-based intrusion detection system for web-based attacks.
Takes advantage of application-specific
correlation between server-side programs and
parameters used in their invocation.
Parameter characteristics are learned from the
input data.
Tested on data from Google and from two universities in the
US and Europe.
Q/A