Anomaly Detection of Web-based Attacks

Christopher Kruegel & Giovanni Vigna
CCS ‘03
Presented By: Payas Gupta
Outline
Overview
Approach
Model Description
Evaluation
Conclusion
Web based attacks
 XSS attacks
 Buffer overflow
 Directory traversal
 Input validation
 Code red
 Anomaly Detection v/s Misuse Detection
Data Model
 Only GET requests; headers and request body are not analyzed
 169.229.60.105 − johndoe [6/Nov/2002:23:59:59 −0800] "GET
/scripts/access.pl?user=johndoe&cred=admin" 200 2122
 Path: /scripts/access.pl
 Query: user=johndoe&cred=admin, i.e. attributes a1=v1, a2=v2
 Only the query string is analyzed, not the path
 For query q, Sq={a1,a2}
Detection model
 Each model m is associated with a weight wm.
 Each model returns a probability pm.
 A value of pm close to 0 indicates an anomalous
event; a value close to 1 indicates a normal event.
Attribute Length
 Normal parameters
 Fixed-size tokens (session identifiers)
 Short strings (input from HTML forms)
 So, for a given program, the length doesn't vary much.
 Malicious activity
 E.g. the long padding needed for a buffer overflow
 Goal: approximate the actual but unknown
distribution of the parameter lengths and detect
deviations from the normal.
Learning & Detection
 Learning
 Calculate mean μ and variance σ² of the lengths
l1,l2,...,ln over the N queries that contain this
attribute.
 Detection
 Chebyshev inequality:
p(|x − μ| > |l − μ|) < σ² / (l − μ)²
 This bound is deliberately weak, resulting in a
high degree of tolerance (very weak).
 Only obvious outliers are flagged as suspicious.
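The length model above can be sketched in a few lines. This is a minimal illustration, not the authors' code; the function names and the example lengths are made up.

```python
def learn_length_model(lengths):
    """Learning phase: mean and variance of the training lengths."""
    n = len(lengths)
    mean = sum(lengths) / n
    var = sum((l - mean) ** 2 for l in lengths) / n
    return mean, var

def length_probability(length, mean, var):
    """Detection phase: Chebyshev bound
    p(|x - mean| > |length - mean|) < var / (length - mean)^2,
    capped at 1. Only gross outliers score near 0."""
    diff = (length - mean) ** 2
    if diff == 0:
        return 1.0
    return min(1.0, var / diff)

mean, var = learn_length_model([8, 9, 8, 10, 9, 8])
p_normal = length_probability(9, mean, var)     # close to 1
p_attack = length_probability(500, mean, var)   # close to 0
```

Because the bound is weak, a length only slightly outside the training range still scores near 1; the probability collapses only for drastic outliers such as overflow padding.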
Attribute character distribution
 Attributes have a regular structure and mostly printable
characters.
 There are similarities between the character frequencies
of query parameters.
 The relative character frequencies of an attribute are
sorted in descending order.
 E.g. "passwd" – ASCII codes 112 97 115 115 119 100
 Sorted relative frequencies: 0.33 0.17 0.17 0.17 0.17,
followed by 251 zeros
 ICD(0) = 0.33, ICD(1) to ICD(4) = 0.17, ICD(5...255) = 0
 Normal
 Frequencies slowly decrease in value
 Malicious
 Drop extremely fast (peak caused by a single dominating character)
 Or hardly at all (random byte values)
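The sorted-frequency computation for a single attribute value can be sketched as follows (an illustrative helper, not the paper's code):

```python
from collections import Counter

def char_distribution(value):
    """Relative character frequencies of `value`, sorted in
    descending order and padded to 256 entries."""
    counts = Counter(value)
    total = len(value)
    freqs = sorted((c / total for c in counts.values()), reverse=True)
    # pad with zeros so the distribution always covers 256 slots
    return freqs + [0.0] * (256 - len(freqs))

icd = char_distribution("passwd")
# icd[0] == 2/6 (the repeated 's'), icd[1..4] == 1/6, the rest 0
```

A normal string yields a slowly decaying curve; a single-character overflow payload concentrates all mass in `icd[0]`, while random shellcode bytes yield an almost flat curve.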
Why is it useful?
 Cannot be evaded by some well-known attempts to
hide malicious code in the string.
 E.g. nop operations substituted by instructions with
similar behavior (add rA,rA,0)
 But not useful when the attack causes only a small
change in the payload distribution.
Learning and detection
 Learning
 For each query attribute, its character distribution
is stored.
 The ICD is obtained by averaging over all the stored
character distributions, e.g.:

        ICD(0)  ICD(1)  ICD(2)  ICD(3)  ICD(4)
  q1     .5      .25     .25     0       0
  q2     .75     .2      .1      0       0
  q3     .25     .25     .25     .25     0
  avg    .5      .23     .2      .08     0
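The averaging step in the table above is a plain element-wise mean over the stored per-query distributions. A minimal sketch (helper name made up):

```python
def average_distributions(dists):
    """ICD = element-wise average of the stored per-query
    character distributions."""
    n = len(dists)
    return [sum(col) / n for col in zip(*dists)]

# the three training queries from the table (first five ICD slots)
avg = average_distributions([
    [0.5, 0.25, 0.25, 0.0, 0.0],
    [0.75, 0.2, 0.1, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25, 0.0],
])
```
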
Learning and detection (cont...)
 Pearson chi-square test
 Not necessary to operate on all 256 values of the ICD;
consider a small number of intervals, i.e. bins
 Calculate observed and expected frequencies
 Oi = observed frequency for each bin
 Ei = relative frequency of the bin in the ICD * length of the attribute
 Compute the chi-square statistic
 Look up the probability in a predefined chi-square table
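The detection step can be sketched as below. This is an illustration only: the bin boundaries and the example distributions are made-up values, not the ones used in the paper, and the final table lookup is left as a comment.

```python
# Illustrative bin layout over the sorted-frequency indices 0..255
BINS = [(0, 0), (1, 3), (4, 6), (7, 11), (12, 15), (16, 255)]

def bin_counts(sorted_freqs, length):
    """Aggregate a sorted character distribution into per-bin
    counts, scaled by the attribute length."""
    return [sum(sorted_freqs[lo:hi + 1]) * length for lo, hi in BINS]

def chi_square(observed, expected):
    """Pearson chi-square statistic over the bins (bins with
    zero expected count are skipped)."""
    return sum((o - e) ** 2 / e
               for o, e in zip(observed, expected) if e > 0)

# expected counts come from the learned ICD, observed from a new attribute
icd = [0.5, 0.22, 0.2, 0.05, 0.02, 0.01] + [0.0] * 250
new = [0.9, 0.05, 0.05] + [0.0] * 253
length = 100
expected = bin_counts(icd, length)
stat_normal = chi_square(bin_counts(icd, length), expected)  # identical -> 0
stat_attack = chi_square(bin_counts(new, length), expected)  # large
# compare the statistic against a chi-square table with
# len(BINS) - 1 degrees of freedom to obtain the probability
```
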
Structural inference
 Structure is the regular grammar that describes
all of an attribute's normal, legitimate values.
 Why??
 An attacker can craft an attack in a manner that
makes its manifestation appear more regular.
 For example, non-printable characters can be
replaced by groups of printable characters.
Learning and detection
 Basic approach is to generalize grammar as long as
it seems reasonable and stop before too much
structural information is lost.
 MARKOV model and Bayesian probability
 NFA
 Each state S has a set of ns possible output symbols
o which are emitted with the probability of ps(o).
 Each transition t is marked with probability p(t),
likelihood that the transition is taken.
Learning and detection (cont...)
0.3
So, probability of ‘ab’
Start
a|p(a) = 0.5
b|p(b) = 0.5
0.7
0.2
a|p(a) = 1
0.4
0.4
1.0
c|p(c) = 1
b|p(b) = 1
1.0
Terminal
1.0
P(w) = (1.0*0.3*0.5*0.2*0.5*0.4)+
(1.0*0.7*1.0*1.0*1.0*1.0)
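The word-probability computation is a sum over accepting paths of the product of the transition and emission probabilities along each path. A minimal sketch (the path lists below are taken from the slide's example; the function name is made up):

```python
def word_probability(paths):
    """P(word) = sum over all accepting paths of the product of
    the transition/emission probabilities along the path."""
    total = 0.0
    for path in paths:
        p = 1.0
        for prob in path:
            p *= prob
        total += p
    return total

# the two paths that emit 'ab' in the example automaton
p_ab = word_probability([
    [1.0, 0.3, 0.5, 0.2, 0.5, 0.4],
    [1.0, 0.7, 1.0, 1.0, 1.0, 1.0],
])
# p_ab == 0.006 + 0.7 == 0.706
```
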
Learning and detection (cont...)
 The probability of the whole training set is obtained by
adding the probabilities calculated for each input training element.
Learning and detection (cont...)
 Aim: maximize the product of the model's prior
probability and the likelihood of the training data.
 Conflict between simple models that tend to over-generalize
and models that perfectly fit the data but are too complex.
 Simple model – high prior probability, but the likelihood
of producing the training data is extremely low, so the
product is low.
 Complex model – low prior probability, even though the
likelihood of producing the training data is high; the
product is still low.
 The model starts out exactly reflecting the input data;
states are then merged step by step, using the Viterbi
algorithm.
Learning and detection (cont...)
 Detection
 The problem: even a legitimate input that has been
seen regularly during the training phase may receive
a very small probability value, because the probability
values of all possible input words sum to 1.
 Therefore the model returns 1 if the word is a valid
output of the grammar, and 0 when the value cannot
be derived from the grammar.
Token finder
 Determines whether the values of an attribute are drawn
from a limited set of possible alternatives (an enumeration).
 When a malicious user passes illegal values to the
application, the attack can be detected.
Learning and detection
 Learning
 Enumeration: the number of different parameter
values is bound by some threshold t.
 Random: the number of different argument
instances grows proportionally with the number of queries.
 Decide between the two by calculating a statistical correlation.
Learning and detection (cont...)
< 0, enumeration
> 0, random
 Detection
 If any unexpected happens in case of enumeration,
then it returns 0, otherwise 1 and in case of
randomness it always return 1.
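The enumeration-vs-random decision can be sketched as follows. This is an illustrative simplification of the idea, not the paper's exact statistics: `f` grows with every query, `g` grows only when a new value appears and shrinks otherwise, and the sign of their correlation distinguishes the two cases. All names are made up.

```python
def correlation_sign(values):
    """Pearson correlation between f (query count) and g
    (net count of previously-unseen values)."""
    f, g, seen, g_val = [], [], set(), 0
    for i, v in enumerate(values, start=1):
        if v not in seen:
            seen.add(v)
            g_val += 1       # new value: g increases
        else:
            g_val -= 1       # repeated value: g decreases
        f.append(i)
        g.append(g_val)
    n = len(values)
    mf, mg = sum(f) / n, sum(g) / n
    cov = sum((a - mf) * (b - mg) for a, b in zip(f, g))
    var_f = sum((a - mf) ** 2 for a in f)
    var_g = sum((b - mg) ** 2 for b in g)
    if var_g == 0:
        return 0.0
    return cov / (var_f * var_g) ** 0.5

rho_enum = correlation_sign(["on", "off"] * 20)           # negative
rho_rand = correlation_sign([str(i) for i in range(40)])  # positive
```
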
Attribute presence or absence
 Client-side programs, scripts or HTML forms pre-
process the data and transform it into a suitable
request.
 Hand-crafted attacks focus on exploiting a
vulnerability in the code that processes a certain
parameter value; little attention is paid to the rest
of the request.
Learning and detection
 Learning
 Model of acceptable subsets
 Recording each distinct subset Sq={ai,...ak} of
attributes that is seen during the training phase.
 Detection
 The algorithm performs, for each query, a lookup of
the current attribute set.
 If the set was seen during training, return 1, otherwise 0.
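The learning and detection steps for this model reduce to set membership. A minimal sketch (function names made up):

```python
def learn_attribute_sets(training_queries):
    """Learning: record each distinct attribute set seen in training."""
    return {frozenset(attrs) for attrs in training_queries}

def check_attributes(query_attrs, known_sets):
    """Detection: 1 if this exact attribute set was seen
    during training, otherwise 0."""
    return 1 if frozenset(query_attrs) in known_sets else 0

known = learn_attribute_sets([["user", "cred"], ["user"]])
check_attributes(["user", "cred"], known)   # -> 1
check_attributes(["user", "debug"], known)  # -> 0
```
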
Attribute order
 Legitimate invocations of server-side programs
often contain the same parameters in the same
order.
 Hand-crafted attacks often don't.
 Test whether a given order is consistent
with the model deduced during the learning phase.
Learning and detection
 Learning:
 Build a set of attribute pairs O using a directed graph G:
 Each vertex vi in G is associated with the
corresponding attribute ai.
 For every query, the ordered attribute list is processed.
 For each attribute pair (as,at) in this list, with s ≠ t and
1 ≤ s,t ≤ i, a directed edge is inserted into the graph
from vs to vt.
Learning and detection (cont...)
 Graph G contains all order constraints imposed
by queries in the training data.
 Order is determined by
 a directed edge, or
 a path
 Detection
 Given a query with attributes a1,a2,...,ai and the set of
order constraints O, check all parameter pairs (aj,ak)
with j ≠ k and 1 ≤ j,k ≤ i.
 If any constraint is violated, return 0, otherwise 1.
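A simplified sketch of the order model follows (it omits the cycle handling a full implementation would need; all names are made up):

```python
def learn_order(training_queries):
    """Learning: a directed edge a -> b for every pair where a
    precedes b in some training query."""
    edges = set()
    for attrs in training_queries:
        for i, a in enumerate(attrs):
            for b in attrs[i + 1:]:
                if a != b:
                    edges.add((a, b))
    return edges

def reaches(edges, src, dst):
    """Is there a directed path src -> dst in the constraint graph?"""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(b for a, b in edges if a == node)
    return False

def check_order(attrs, edges):
    """Detection: 1 if no pair in the query contradicts a learned
    order constraint, otherwise 0."""
    for i, a in enumerate(attrs):
        for b in attrs[i + 1:]:
            if reaches(edges, b, a):  # constraint says b precedes a
                return 0
    return 1

edges = learn_order([["user", "cred", "id"]])
check_order(["user", "id"], edges)    # -> 1 (consistent)
check_order(["cred", "user"], edges)  # -> 0 (order violated)
```
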
Evaluation
 Data sets
 Apache web server access logs from:
 Google
 University of California, Santa Barbara
 Technical University, Vienna
 1000 queries for training
 All the rest for testing
Model Validation
 The logs contained significant numbers of entries for
the Nimda and Code Red worms, but these were removed.
 Only queries that result from the invocation of
existing programs are included in the training
and detection process.
 For Google, thresholds were also adjusted to account
for the higher variability in traffic.
Detection effectiveness
Conclusions
 Anomaly-based intrusion detection system for the web.
 Takes advantage of application-specific
correlation between server-side programs and
parameters used in their invocation.
 Parameter characteristics are learned from the
input data.
 Tested on logs from Google and two universities, one
in the US and one in Europe.
Q/A