Discovering Outlier Filtering Rules from Unlabeled Data

Authors: Kenji Yamanishi & Jun-ichi Takeuchi
Advisor: Dr. Hsu
Graduate: Chia-Hsien Wu
Outline
 Motivation
 Objective
 Introduction
 Main Framework
 Outlier Detector - SmartSifter
 Rule Generator – DL-ESC/DL-SC
 Experimentation – Network intrusion detection
 Experimental Results
 Conclusion
 Opinion
Motivation
The accuracy of SmartSifter needs improvement
SmartSifter cannot find a general pattern among
the outliers it identifies
Objective
 Improving the accuracy of SmartSifter
 Discovering a new pattern that outliers in a
specific group may have in common
Introduction
 Developing SmartSifter: an on-line outlier
detection algorithm
 Improving the power of SmartSifter by
combining it with a supervised learning method
Main Framework
A New Rule
Classifier L
Outlier Detector - SmartSifter ->SS
 Using a probabilistic (Gaussian mixture)
model: p(x, y) = p(x) p(y|x)
 Employing an on-line discounting learning
algorithm (SDLE)/(SDEM) to update the
model
 Giving a score to each datum
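The three steps above (probabilistic model, on-line discounting update, per-datum score) can be illustrated with a much simplified sketch. It is not the SDEM algorithm: it assumes a single one-dimensional Gaussian rather than a mixture, a hypothetical discounting rate r, and a logarithmic-loss score computed before each update. The class and function names are illustrative only.

```python
import math


class DiscountingGaussian:
    """Single 1-D Gaussian updated on-line with discounting rate r.

    Simplified, hypothetical stand-in for a discounting learner;
    older data are down-weighted by a factor (1 - r) at every step.
    """

    def __init__(self, r=0.05, mean=0.0, var=1.0):
        self.r = r
        self.mean = mean
        self.second_moment = var + mean * mean

    @property
    def var(self):
        # Floor the variance so the density stays well defined.
        return max(self.second_moment - self.mean ** 2, 1e-6)

    def log_density(self, x):
        v = self.var
        return -0.5 * (math.log(2 * math.pi * v) + (x - self.mean) ** 2 / v)

    def score_and_update(self, x):
        # Score the datum under the model learned so far, then update.
        score = -self.log_density(x)
        self.mean = (1 - self.r) * self.mean + self.r * x
        self.second_moment = (1 - self.r) * self.second_moment + self.r * x * x
        return score


def score_stream(xs, r=0.05):
    """Return one score per datum, in arrival order."""
    model = DiscountingGaussian(r=r)
    return [model.score_and_update(x) for x in xs]
```

In this sketch a datum far from what the model has seen recently receives a large score, while data consistent with the current model receive small ones.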
Outlier Detector - SmartSifter ->SS
(cont.)
SDLE algorithm: An on-line discounting
variant of the Laplace law based estimation
algorithm
SDEM algorithm: An on-line discounting
variant of the incremental EM (Expectation
Maximization) algorithm
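As a rough illustration of a discounting Laplace-law estimate for a categorical variable, the sketch below keeps exponentially discounted cell counts and smooths them with a constant beta. The parameter names (r, beta, n_cells) and the class name are assumptions made for this example, and the exact update used in SDLE may differ.

```python
class DiscountingLaplace:
    """Discounted, Laplace-smoothed frequency estimate over n_cells cells."""

    def __init__(self, n_cells, r=0.01, beta=1.0):
        self.k = n_cells
        self.r = r            # discounting rate, 0 < r < 1 (assumed)
        self.beta = beta      # Laplace smoothing constant (assumed)
        self.counts = [0.0] * n_cells
        self.t = 0

    def update(self, cell):
        # Discount all past counts, then add the new observation.
        self.t += 1
        self.counts = [(1 - self.r) * c for c in self.counts]
        self.counts[cell] += 1.0

    def prob(self, cell):
        # Sum of discount weights after t steps: 1 + (1-r) + ... + (1-r)^(t-1).
        total = (1 - (1 - self.r) ** self.t) / self.r
        return (self.counts[cell] + self.beta) / (total + self.k * self.beta)
```

Before any data arrive, every cell has probability 1/k; as data arrive, recent observations weigh more than old ones.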
Outlier Detector - SmartSifter ->SS
(cont.)
Outputting a sorted dataset
A high score indicates a high possibility
that the datum is an outlier
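A minimal sketch of the sorting step follows. The second function, which marks the top-scored fraction as provisional outliers so that a supervised learner can be trained on them, is an assumption about how the sorted output feeds the rule generator; top_fraction is a hypothetical parameter.

```python
def rank_by_score(records, scores):
    """Return (record, score) pairs sorted from highest to lowest score."""
    order = sorted(range(len(records)), key=lambda i: scores[i], reverse=True)
    return [(records[i], scores[i]) for i in order]


def pseudo_label(ranked, top_fraction=0.05):
    """Label the top-scored fraction 1 (outlier) and the rest 0 (assumed scheme)."""
    n_pos = max(1, int(len(ranked) * top_fraction))
    return [(rec, 1 if i < n_pos else 0) for i, (rec, _) in enumerate(ranked)]
```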
Rule Generator – DL-ESC/DL-SC
 Using a stochastic decision list
 Employing the principle of minimizing extended
stochastic complexity or stochastic complexity
Rule Generator – DL-ESC/DL-SC
(cont.)
 If ξ makes t1 true, then μ = v1 with probability p1
else if ξ makes t2 true, then μ = v2 with probability p2
………………………
else μ = vs with probability ps
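The if / else-if form above can be sketched as an ordered list of (test, label, probability) triples, evaluated top to bottom. The example rules, thresholds, and probabilities below are invented purely for illustration and are not rules from the experiment; only the attribute names come from the dataset.

```python
def apply_decision_list(rules, default, record):
    """Return (label, probability) from the first rule whose test is true.

    rules   : list of (test, label, probability), tried in order
    default : (label, probability) used when no test fires
    """
    for test, label, prob in rules:
        if test(record):
            return label, prob
    return default


# Hypothetical rules: label 1 = outlier, 0 = normal.
EXAMPLE_RULES = [
    (lambda rec: rec["service"] == "telnet" and rec["src_bytes"] > 1000, 1, 0.9),
    (lambda rec: rec["duration"] > 300, 1, 0.7),
]
EXAMPLE_DEFAULT = (0, 0.99)
```

Because the list is ordered, a record that satisfies several tests is handled by the earliest one only.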
Experimentation - Network intrusion detection
 The purpose of our experiment is to detect
intrusions without making use of the labels
concerning intrusions
Experimentation – Dataset (cont.)
 Using the dataset KDD Cup 1999 prepared for
network intrusion detection
 Using the 13 attributes for DL-ESC
 Using four attributes for SmartSifter
(service, duration, src_bytes, dst_bytes)
 Only “service” is categorical
 y = log(x + 0.1), where the logarithm is natural (base e)
 Generating five datasets S0,S1,S2,S3,S4
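The logarithmic transform in the list above can be written as follows. The function name and the offset parameter are chosen for this example, and it is an assumption that the transform is applied to the continuous attributes (duration, src_bytes, dst_bytes).

```python
import math


def log_transform(x, offset=0.1):
    """Natural-log transform y = ln(x + offset); offset keeps x = 0 finite."""
    return math.log(x + offset)
```

The offset of 0.1 means that a zero value, which is common for byte counts and durations, maps to ln(0.1) instead of being undefined.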
Experimentation – Dataset (cont.)
Experimentation – Illustration by an Example
(cont.)
First Rule – S1
Update Rule – S1
Update Rule – S2
Experimental Results
 SS : SmartSifter
 R&S: Rule and SmartSifter (This framework)
 Using S0 as a training set to construct a filtering
rule, each of S1,S2,S3,and S4 is used for test
Experimental Results (cont.)
Conclusion
 This new framework has two features:
Improving the power of SmartSifter
Helping the user discover a general pattern
Opinion
 Making the detection process more effective and
more understandable
 This framework can be applied to other fields