
Algorithm Selection on Data Streams
Jan N. van Rijn¹, Geoffrey Holmes², Bernhard Pfahringer², Joaquin Vanschoren³
¹ Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
² Department of Computer Science, University of Waikato, Hamilton, New Zealand
³ Department of Mathematics and Computer Science, TU Eindhoven, Eindhoven, The Netherlands
Introduction
Modern society produces vast amounts of data, coming for instance from sensor networks and from text sources on the internet. Machine learning algorithms are able to capture general trends in such data and make predictions for future observations with a reasonable success rate.
Using meta-learning, our goal is to establish knowledge about which type of algorithm works well on which kind of data.
Discovery 1
Using basic meta-features, we can predict the best-performing classifier from the following set: {Naive Bayes, k-NN, Hoeffding Tree, SPegasos, Stochastic Gradient Descent}.
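As a rough illustration of this prediction step, the sketch below trains a meta-classifier that maps window-level meta-features to the name of the best base learner. The feature set, the toy data, and the choice of a random forest are illustrative assumptions, not the exact setup of the poster.

```python
# Sketch of the meta-learning step: a meta-classifier maps meta-features
# of a stream window to the expected best base learner.
# Feature names and data here are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# One row per window, e.g. [n_instances, n_attributes, class_entropy].
X_meta = np.array([
    [1000, 54, 0.92],
    [1000, 9,  0.35],
    [1000, 41, 0.77],
])
# Label: which base classifier scored highest on that window.
y_meta = np.array(["HoeffdingTree", "NaiveBayes", "HoeffdingTree"])

meta_clf = RandomForestClassifier(n_estimators=100, random_state=0)
meta_clf.fit(X_meta, y_meta)

# For a new window, predict which classifier to deploy.
print(meta_clf.predict([[1000, 20, 0.60]]))
```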
[Pie chart: fraction of intervals on which each classifier performs best. Hoeffding Tree (HT) 65%, Naive Bayes (NB) 17%, SPegasos (SPeg) 11%, SGD 5%, k-NN 2%.]
The pie chart shows how skewed this distribution is: Hoeffding Trees perform best on most intervals. Measured over all data streams, a meta-classifier was able to predict the best-performing classifier in more than 80% of the cases. It improved on the baseline in both meta-level accuracy and base-level accuracy. The results are competitive with state-of-the-art meta-classifiers.
Data Streams
Requirements (as stated by [1]):
• Process an example at a time,
and inspect it only once
• Use a limited amount of memory
• Work in a limited amount of time
• Be ready to predict at any point
Interleaved test-then-train: each instance is first used to test the model, after which it can be used to train the model (e.g., predicting the stock market). A minimal sketch of this loop follows below.
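The sketch below implements this evaluation loop, with a toy incremental learner standing in for the MOA classifiers; all names here are illustrative.

```python
# Minimal sketch of interleaved test-then-train (prequential) evaluation.
class MajorityClassLearner:
    """Toy incremental learner: always predicts the most frequent label."""
    def __init__(self):
        self.counts = {}

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None

    def train(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1

def prequential_accuracy(stream, learner):
    correct = total = 0
    for x, y in stream:               # each example is inspected exactly once
        if learner.predict(x) == y:   # 1) first test on the example ...
            correct += 1
        learner.train(x, y)           # 2) ... then use it for training
        total += 1
    return correct / total

stream = [([0.1], "a"), ([0.4], "b"), ([0.3], "a"), ([0.2], "a")]
print(prequential_accuracy(stream, MajorityClassLearner()))
```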
Discovery 2
Leveraging Bagging is a state-of-the-art ensemble (meta-learning) technique that has repeatedly proven superior to other algorithms. It is particularly successful when used with Hoeffding Trees.
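For reference, here is a sketch of the resampling core of Leveraging Bagging: online bagging in which each ensemble member sees every incoming example k ~ Poisson(λ) times, with λ = 6 instead of the Poisson(1) of plain online bagging. The ADWIN-based change detection of the full method is omitted, and the learner interface matches the toy sketch above.

```python
import math
import random
from collections import Counter

class LeveragingBag:
    """Online-bagging core of Leveraging Bagging (ADWIN resets omitted)."""
    def __init__(self, make_learner, n_members=10, lam=6.0, seed=1):
        self.members = [make_learner() for _ in range(n_members)]
        self.lam = lam
        self.rng = random.Random(seed)

    def train(self, x, y):
        for member in self.members:
            # Each member trains on the example k ~ Poisson(lam) times,
            # which simulates bootstrap resampling in the online setting.
            for _ in range(self._poisson()):
                member.train(x, y)

    def predict(self, x):
        votes = Counter(m.predict(x) for m in self.members)
        return votes.most_common(1)[0][0]

    def _poisson(self):
        # Knuth's algorithm; adequate for small lambda.
        threshold, k, p = math.exp(-self.lam), 0, 1.0
        while p > threshold:
            k += 1
            p *= self.rng.random()
        return k - 1

# e.g. ensemble = LeveragingBag(MajorityClassLearner), reusing the toy
# learner from the prequential sketch above.
```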
Meta-Learning
It is common for learning curves on data streams to cross multiple times, as illustrated by this plot of the Electricity data stream.
[Line plot: accuracy per interval (0 to 40) on the Electricity data stream for Hoeffding Tree, Naive Bayes, SPegasos and k-NN; the curves cross repeatedly.]
[Bar plot: accuracy(k-NN)-accuracy(LB-kNN) per data stream, over a large collection of (mostly BNG-generated) streams.]
The latter plot shows that applying Leveraging Bagging to k-NN actually decreases performance! Breiman [4] reported similar behaviour in the classical (batch) setting.
Idea: what if we could determine dynamically, at any point in the stream, which algorithm to use?
Discovery 3
Algorithm Selection
Naive Bayes works well
when the meta-feature measuring the number
of changes detected in the stream is high.
Naive Bayes generally needs only relatively
few observations to achieve good accuracy
compared to more sophisticated algorithms such
as Hoeffding Trees. Assuming that a high
number of changes detected by this landmarker
indicates that the concept of the stream is
indeed changing quickly, this could explain why
a classifier like Naive Bayes outperforms more
sophisticated learning algorithms that need
more observations of the same concept to perform well.
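One plausible way to compute such a "changes detected" meta-feature is to feed the 0/1 error stream of a cheap landmark learner into a change detector and count its alarms. The sketch below uses a Page-Hinkley test with illustrative thresholds; the actual detector and parameters behind the poster's results may differ.

```python
# Sketch of a change-count meta-feature: run the 0/1 error stream of a
# landmark learner through a Page-Hinkley detector and count alarms.
# Thresholds (delta, lam) are illustrative, not values from the paper.
def count_changes(error_stream, delta=0.005, lam=50.0):
    changes, mean, m, m_min, n = 0, 0.0, 0.0, 0.0, 0
    for err in error_stream:          # err is 0 (correct) or 1 (mistake)
        n += 1
        mean += (err - mean) / n      # running mean of the error rate
        m += err - mean - delta       # cumulative deviation above the mean
        m_min = min(m_min, m)
        if m - m_min > lam:           # Page-Hinkley alarm: concept change
            changes += 1
            mean, m, m_min, n = 0.0, 0.0, 0.0, 0   # restart after an alarm
    return changes
```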
• Run a full factorial of algorithms over a set of data streams.
• All algorithms as implemented in the MOA framework [1].
• We used real-world data streams, data generators, and semi-generated data.
• Split the data streams into windows of n instances.
• Calculate a set of meta-features on each interval (SSIL); see the sketch after this list.
• Introduced novel stream-based meta-features.
• Store the evaluation score of each algorithm on each interval.
• All results are available on http://www.openml.org/ [3].
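A condensed sketch of this pipeline, under simplifying assumptions (a stream held as a list of (x, y) pairs, precomputed per-window accuracies, and a handful of basic meta-features; all names are illustrative):

```python
# Sketch of the experiment pipeline: split a stream into windows of n
# instances, compute simple meta-features per window, and record which
# algorithm scored best on each window.
import math
from collections import Counter

def window_meta_features(window):
    """window: list of (x, y) pairs; returns a basic meta-feature dict."""
    labels = Counter(y for _, y in window)
    total = sum(labels.values())
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in labels.values())
    return {
        "n_instances": total,
        "n_classes": len(labels),
        "class_entropy": entropy,
        "default_accuracy": labels.most_common(1)[0][1] / total,
    }

def build_meta_dataset(stream, n, scores_per_window):
    """scores_per_window[i] maps algorithm name -> accuracy on window i."""
    windows = [stream[i:i + n] for i in range(0, len(stream), n)]
    meta = []
    for window, scores in zip(windows, scores_per_window):
        best = max(scores, key=scores.get)   # label: best algorithm
        meta.append((window_meta_features(window), best))
    return meta
```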
References
[1] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer. MOA: Massive Online Analysis. J. Mach. Learn. Res. 11, pages 1601–1604, 2010.
[2] J. N. van Rijn, G. Holmes, B. Pfahringer, and J. Vanschoren. Algorithm Selection on Data Streams. Discovery Science, 17th International Conference, pages 325–336, 2014.
[3] J. Vanschoren, J. N. van Rijn, B. Bischl, and L. Torgo. OpenML: Networked Science in Machine Learning. ACM SIGKDD Explorations Newsletter 15, pages 49–60, 2014.
[4] L. Breiman. Bagging Predictors. Machine Learning 24(2), pages 123–140, 1996.