5. Combining Base Learners

Bias Management in Time-Changing Data Streams
• We assume that, within a context, data is generated randomly according to a stationary distribution.
• Data comes in the form of streams through time.
• Examples are network monitoring, web applications, and sensor networks.
• When the distribution changes, this calls for Adaptive Learning Algorithms that change their bias through time.
Contexts
[Figure: a data stream as a sequence of contexts (Context 1, Context 2, Context 3, …); within each context the data follows the same stationary distribution.]
How can we detect when there is a change of context?
Change Detection
[Figure: the stream as Context 1, Context 2, Context 3, with a "?" marking each point where a change of context may have occurred.]
Online algorithms – Detect a change in real time
Offline algorithms – Analyze the whole sequence of data
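An online detector of this kind can be sketched, for example, as a one-sided CUSUM over the error signal. This is only a minimal illustration of online detection, not an algorithm from the lecture; the `drift` and `threshold` parameters are illustrative assumptions:

```python
def cusum(stream, drift=0.005, threshold=5.0):
    """One-sided CUSUM sketch: flags the index at which the signal
    (e.g. the model's error rate) rises significantly above its
    running mean.  Returns None if no change is detected."""
    mean, s, n = 0.0, 0.0, 0
    for i, x in enumerate(stream):
        n += 1
        mean += (x - mean) / n              # running mean of the signal
        s = max(0.0, s + x - mean - drift)  # accumulate positive deviations
        if s > threshold:
            return i                        # change detected at index i
    return None

# A stable error rate of 0.1 that jumps to 0.9 halfway through:
print(cusum([0.1] * 100 + [0.9] * 100))     # detects shortly after index 100
print(cusum([0.1] * 200))                   # None: no change
```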
Tracking Drifting Concepts
Window Approach
Maintain a sliding window over the h most recent examples:
• Continuously monitor the accuracy and coverage of the model.
• If no changes are detected, increase h.
• If changes are detected, decrease h.
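The window-adaptation rule can be sketched as follows. The tolerance, step sizes, and bounds are illustrative assumptions, not values from the lecture:

```python
def adapt_window(window_size, accuracy, baseline, tol=0.05,
                 min_size=50, max_size=5000):
    """Sketch of the window approach: grow the window while accuracy
    stays near its baseline, shrink it when accuracy drops so that
    outdated examples are forgotten faster after a context change."""
    if baseline - accuracy > tol:               # change suspected
        return max(min_size, window_size // 2)  # decrease h
    return min(max_size, window_size + 50)      # stable: increase h
```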
Dynamic Bias Selection
Very Fast Decision Tree Algorithm
Each new example updates the sufficient statistics at a leaf node.
If the evidence is strong enough, a new sub-tree is attached to the leaf node.
Hoeffding Bound:
We have made n independent observations of a random variable r with range R.
With probability at least 1 − δ, the true mean of r lies within the range:

  r̄ ± ε,  where  ε = √( R² ln(1/δ) / 2n )
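The bound is easy to compute directly. A small sketch; the choice δ = 10⁻⁷ is only an illustrative value:

```python
import math

def hoeffding_epsilon(R, delta, n):
    """With probability at least 1 - delta, the true mean of a random
    variable with range R lies within epsilon of the mean observed
    over n independent observations."""
    return math.sqrt(R * R * math.log(1.0 / delta) / (2.0 * n))

# For a two-class entropy criterion the range is R = 1:
print(hoeffding_epsilon(1.0, 1e-7, 1000))   # ≈ 0.0898, shrinks as n grows
```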
Now let ΔH = H(xa) − H(xb),
where xa and xb are the attributes with the highest and second-highest values of the splitting criterion H( ).
Then if ΔH > ε after n examples seen at
the leaf node, with probability at least 1 − δ,
xa is truly the attribute with the highest value of H( ).
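The resulting split test can be sketched as below. R = 1 assumes a two-class entropy criterion, and δ is again an illustrative value:

```python
import math

def should_split(h_best, h_second, n, R=1.0, delta=1e-7):
    """VFDT-style test: split on the best attribute once the observed
    gap H(xa) - H(xb) exceeds the Hoeffding epsilon for the n examples
    seen at the leaf."""
    epsilon = math.sqrt(R * R * math.log(1.0 / delta) / (2.0 * n))
    return (h_best - h_second) > epsilon

print(should_split(0.9, 0.1, 1000))   # True: clear winner, enough evidence
print(should_split(0.51, 0.50, 100))  # False: gap too small, keep waiting
```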
Bayesian Network Classifiers
• Start with a simple Naïve Bayes (no attribute dependency is assumed).
• Add dependencies if they bring an increase in performance.
• But too many dependencies increase the number of parameters drastically.
K-DBC stands for k-Dependence Bayesian Classifier: a Bayesian network
classifier that allows each attribute to have at most k attributes as parents.
We can iteratively add arcs between attributes to maximize a score until no
further improvement is achieved.
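The greedy arc-addition loop can be sketched as follows. The `score` function is a hypothetical user-supplied evaluator of an arc set (e.g. cross-validated accuracy), and for simplicity this sketch does not check for cycles:

```python
def greedy_kdbc_arcs(attributes, score, k=2):
    """Greedy structure search for a k-DBC sketch: repeatedly add the
    single attribute-to-attribute arc that most improves the score,
    respecting at most k parents per attribute, until no arc helps."""
    arcs = set()
    parents = {a: 0 for a in attributes}   # attribute-parent counts
    best = score(arcs)
    improved = True
    while improved:
        improved = False
        for src in attributes:
            for dst in attributes:
                if src == dst or (src, dst) in arcs or parents[dst] >= k:
                    continue
                s = score(arcs | {(src, dst)})
                if s > best:
                    best, best_arc = s, (src, dst)
                    improved = True
        if improved:                       # commit the single best arc
            arcs.add(best_arc)
            parents[best_arc[1]] += 1
    return arcs
```

With a toy score that rewards only the arc `("a", "b")`, the search adds exactly that arc and stops.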
Shewhart P-Chart for Concept Drift
• Monitor error with limits and warning zones.
• When the error increases beyond the tolerance limits, a new model is created.
• Algorithm used: Naive Bayes
[Figure: Shewhart P-chart of the error rate, with control limits and warning zones.]
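The chart's limits can be sketched with the standard P-chart construction: for an average error rate p̄ over batches of n examples, place the warning limit at 2σ and the action limit at 3σ. The reaction policy in `check` is an illustrative assumption:

```python
import math

def p_chart_limits(p_bar, n):
    """Shewhart P-chart limits for a monitored proportion, e.g. the
    error rate of the Naive Bayes model over batches of n examples."""
    sigma = math.sqrt(p_bar * (1.0 - p_bar) / n)
    return {"warning": p_bar + 2.0 * sigma,   # enter the warning zone
            "action": p_bar + 3.0 * sigma}    # out of control

def check(p_bar, n, observed_error):
    """Illustrative policy: replace the model on an action signal."""
    limits = p_chart_limits(p_bar, n)
    if observed_error > limits["action"]:
        return "drift"        # create a new model
    if observed_error > limits["warning"]:
        return "warning"      # change suspected, keep watching
    return "in control"

# With p_bar = 0.1 and n = 100, sigma = 0.03, so the limits
# sit at 0.16 (warning) and 0.19 (action):
print(check(0.1, 100, 0.25))  # drift
print(check(0.1, 100, 0.17))  # warning
print(check(0.1, 100, 0.12))  # in control
```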
Lessons Learned
• There is a trade-off between the cost of updating the model and the improvement in performance.
• Strong variance-management methods are adequate for small datasets.
• But simple methods have high bias.