
An Adaptive Regression Tree for Non-stationary Data Streams
ABSTRACT
Data streams are endless flows of data produced at high speed, in large volumes and usually in non-stationary environments. These characteristics make them suitable for modeling a significant part of data mining applications. Learning from non-stationary streaming data has received much attention in recent years. The main property of such streams is the occurrence of concept drifts. Existing methods handle concept drifts by updating their models either on a regular basis or upon detecting them. Most successful methods use ensemble algorithms to deal with the many concepts found in the stream; however, they usually lack acceptable running time. Decision trees have been shown to be a powerful approach for accurate and fast learning of data streams. In this paper, we present an incremental regression tree that can predict the target variable of newly arriving instances. The labels of the instances are first predicted by the regression tree. Then, the true labels are revealed to the algorithm. The labeled instances are used to incrementally construct the tree. The tree is updated when concept drifts occur, either by altering its structure or by updating its embedded models. Experimental results show the effectiveness of our algorithm in terms of speed and accuracy in comparison to the best state-of-the-art methods.
Categories and Subject Descriptors
I.2.6 [Artificial Intelligence]: Learning – concept learning,
regression trees, data streams, non-stationary environments; I.5.2
[Pattern Recognition]: Design Methodologies – classifier design
and evaluation
General Terms
Algorithm, Design, Performance.
Keywords
regression tree, model tree, data stream, concept drift, non-stationary environment, drift detection
1. INTRODUCTION
One very important class of regression models is regression trees. Their significance stems from the fact that, in practice, a single regression model cannot address many regression problems; hence, regression tree algorithms, which work by recursively partitioning the data and fitting local models in the leaves, can be an adequate solution. As in the case of decision trees, regression trees are generally built top-down by partitioning a feature space; the main difference is that the dependent variable to be predicted is continuous [1].
From another perspective, there are many situations in which incremental learning is required instead of a batch processing technique. This is because many sources produce continuous flows of data in the form of streams. As a convenient example, consider an agent operating in a real-time environment that may need to constantly process the latest sensor information to determine its next action [2]. Data streams are distinguished by three major characteristics: 1) they are open-ended, 2) they flow at high speed, and 3) they are generated from non-stationary distributions, which introduces drift into the target function. Considering these features, online learning algorithms that address data streams should have certain capabilities: keeping up with the rate at which data arrive, using a single pass over the data and a fixed amount of memory, maintaining a prediction model at any time, and being able to keep the model consistent with the most recent data.
Traditional estimates of prediction error are mostly based on the assumption that training instances are observed from an independent and identically distributed (i.i.d.) sample of the domain data [3]. It is important to note that the i.i.d. condition does not hold in streams in which concept drift occurs, but it is rational to assume that small batches of data satisfy it [4].
In this paper we present an algorithm named Adaptive Regression
Tree (ART) that focuses on incrementally building a fast and
accurate regression tree based on recently seen instances in a
timely manner. As its name suggests, the most important
advantage of ART is its adaptability, that is, its ability to detect concept drifts and adapt the tree structure effectively to the new concept. In fact, upon detecting a concept drift in a portion of the tree, the associated subtree is updated rather than discarded. This preserves the speed and accuracy of the algorithm.
The rest of the paper is organized as follows. In section 2, related work is discussed. The proposed method is presented in section
3. Section 4 evaluates the method and compares it with some
previous algorithms. Finally, section 5 concludes the paper and
suggests some future research directions.
2. RELATED WORK
Instances of data streams fall into two categories: stationary and
time-changing. Many approaches are used in literature for
processing open-ended but stationary streams in dynamic
environments. Some methods use decision trees for learning
evolving data streams; examples of efforts in this field include [5-7]. In [7], one such algorithm, the Very Fast Decision Tree (VFDT), is proposed. VFDT is a decision tree learning algorithm that dynamically adjusts its bias as new instances arrive.
When an instance is available, it traverses the tree, following the
branch corresponding to the attribute’s value in the instance.
When it reaches a leaf, the sufficient statistics are updated. Then,
each possible condition based on attribute-values is evaluated. If
there is enough statistical support in favor of one test over the
others, the leaf is changed to a decision node. The new decision
node will have as many descendant leaves as the number of
possible values for the chosen attribute (therefore this tree is not
necessarily binary). The decision nodes only maintain the
information about the split-test installed within them. To the best
of our knowledge, however, there are few methods addressing the
problem of incremental learning using model trees [8].
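For concreteness, the "enough statistical support" used by VFDT is the Hoeffding bound. The fragment below is only an illustrative sketch of that decision rule, not code from [7]; the range R, confidence parameter delta, tie threshold tau and the two best evaluation scores observed at a leaf are assumed inputs.

#include <cmath>

// Hoeffding bound: with probability 1 - delta, the true mean of a random
// variable with range R differs from its empirical mean over n samples by
// at most epsilon.
double hoeffding_bound(double R, double delta, long n) {
    return std::sqrt((R * R * std::log(1.0 / delta)) / (2.0 * n));
}

// VFDT-style test: split when the best attribute's merit exceeds the
// runner-up's by more than epsilon (or when the bound drops below a tie
// threshold tau). best and second_best are, e.g., information-gain values.
bool should_split(double best, double second_best,
                  double R, double delta, long n, double tau = 0.05) {
    double eps = hoeffding_bound(R, delta, n);
    return (best - second_best > eps) || (eps < tau);
}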
Time-changing data streams, on the other hand, have attracted less interest. Two serious obstacles in this realm are the detection of drifts and the development of effective algorithms that cope with the unique characteristics of drifting streams. [9] is an example that proposes a method for dealing with the former. An idea that has successfully addressed the latter is the use of decision trees [6, 8]. In [6], VFDT is extended to deal with non-stationary data streams and the so-called CVFDT algorithm is introduced. By storing some statistical variables associated with each node over a window of instances, the algorithm is able to perform regular periodic checks of its splitting decisions. If some split is recognized to be invalid, a new decision tree rooted at the related node starts growing. In [8], a fast incremental model tree with drift detection (FIMT-DD) is proposed. The algorithm starts with an empty leaf and reads instances in the order of arrival. Each instance is passed down to a leaf, where the necessary statistics are updated. Given the first portion of instances, the algorithm finds the best split for each attribute and then ranks the attributes according to some evaluation measure. If the splitting criterion is satisfied, it makes a split on the best attribute, creating two new leaves. When new instances arrive at a recently created split, they are passed down along the branch corresponding to the outcome of the split test on their values. Change detection tests are updated with every instance from the stream. If a change is detected, an adaptation of the tree structure is performed.
3. PROPOSED METHODOLOGY
3.1 Overall Algorithm
In this section, a learning algorithm, named Adaptive Regression
Tree (ART), is presented. The main goal is to incrementally build
a regression tree that can predict the target variable of newly arriving instances with high accuracy and in a timely manner. The
labels of the instances are first predicted by the regression tree.
Then, the true labels are revealed to the algorithm to be used to
update the current tree. Since the stream of instances may contain
concept drifts, the tree is adapted in a way that makes it consistent
with new concepts. The adaptation process either alters the tree
structure or updates the nodes’ regression models while
maintaining the structure unchanged. A number of variables used for drift detection, tree expansion and prediction of the target labels are stored in each node and updated while processing consecutive instances. The main novelty of the proposed algorithm is in the handling of drifts: ART seeks to adapt the tree so that it becomes suitable for the present concept instead of building a new subtree for that region.
The ART algorithm is presented in Procedure 1. The algorithm starts with a single leaf (line 1). When a new instance arrives, it is first classified by the regression tree built so far (line 3). For this purpose, the instance is passed over the nodes, starting from the root, down to a leaf of the tree. In the leaf, a very simple model predicts the label of the instance: the model outputs the average of the labels of the instances that have reached that leaf. Then the real label is uncovered and the process of updating the tree begins.
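As an illustration, this leaf model amounts to maintaining a running mean of the observed target values. The fragment below is a minimal sketch of such a model; the structure and names are ours and are not taken from the ART implementation.

// Simplest leaf model described above: predict the mean of the target
// values of all instances that have reached this leaf so far.
struct LeafMeanModel {
    long   count = 0;
    double sum   = 0.0;

    // Incorporate the true label once it is revealed.
    void update(double y) {
        ++count;
        sum += y;
    }

    // Prediction for any instance routed to this leaf.
    double predict() const {
        return count > 0 ? sum / count : 0.0;
    }
};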
Once more, the instance is traversed through the tree. At each node it visits, the variables associated with the drift detection test, called the PH test, are first updated according to this instance (line 6) and the PH test is run. The drift detection test is discussed in section 3.3.
Input: data stream along with description of attributes
Output: predicted labels of instances

 1  root ← init_tree()
 2  foreach instance i do
 3      l[i] ← root.predict(i)
 4      cur_node ← root
 5      while (cur_node ≠ null) do    // the following operations refer to cur_node
 6          update_PH()
 7          if PHTestPassed()
 8              tagNode("drifted")
 9              resetSubtree()
10          if isDrifted() or isLeaf()
11              updateEBST(i)
12          if !isLeaf()
13              if isDrifted()
14                  if adaptRequired()
15                      adapt()
16              if adapted() or driftPassed()
17                  resetEBST()
18                  setDrifted(false)
19              cur_node ← cur_node.next(i)
20          else  // leaf
21              if !driftedNow()
22                  update_model(i)
23              if splitTestPassed()
24                  split()

Procedure 1. Overall pseudocode of the ART algorithm
If a drift has occurred (line 7), the node is marked as "drifted" and all variables associated with this node and its successors are reset to their initial values, since the previous values no longer describe the environment properly (lines 8, 9). A summary of the attribute values of previous instances is stored in a so-called E-BST data structure per node. These statistics help identify the best attribute to split on. If the node is either a leaf or faced with a drift, a split may be required and thus these structures should be updated with the new instance (lines 10, 11). Otherwise, no update is performed, because large E-BST structures make the updating process very inefficient, so they are kept as small as possible. Details are explained in section 3.2. A non-leaf drifted node is first given the opportunity to become consistent with the new concept by updating the models of its subtree, which avoids structural alteration of the tree. Only if it fails to update itself appropriately does the altering process go through (line 15). If the drift is not due to a change in the order of discriminative attributes, structural alteration should not be performed, so a strict condition (line 14) is set to prevent the tree from such unnecessary changes. In addition, compared to structural alteration, adapting the tree just by updating the leaves' models is preferable because it consumes much less time. Section 3.4 describes the structural altering process in detail. If the altering process is done, or a sufficient number of instances have been processed after the occurrence of a drift, the E-BST structures are reset and the node is unmarked (lines 16-18). If the node under review is a leaf, its regression model is updated provided that no concept drift has just occurred (lines 21, 22). This condition for updating the models is used to alleviate the problem of noisy instances: even a single noisy instance may cause a concept drift detection, so the models are not updated right after a drift is detected. As the final step of each loop, if the necessary conditions for a split (see section 3.2) are fulfilled in the leaf, a split occurs (lines 23, 24).
3.2 Split Test
Once a sufficient number of instances reach a leaf, it becomes a candidate to be split over some attribute. The simple split condition that must be satisfied within a leaf is that the number of instances reaching that leaf is at least Nsplit, a parameter of the algorithm.
In order to make a split, the best split value over each attribute has
to be identified. Because attributes are assumed to be numerical, any real value can potentially act as a split value, and thus the process of finding the best one is not straightforward. Furthermore, given the intrinsic characteristics of data streams, storing the attribute values of arriving instances is not practical. Consequently, a method based on the extended binary search tree (E-BST), proposed in [8], is used in ART.
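Whatever the exact tree organization, the essential point is that only a few aggregates of the target variable need to be stored per candidate split. The fragment below is a simplified illustration, not the E-BST of [8] itself: it keeps the count, sum and sum of squares of the target, from which the standard deviation required by the SDR criterion below can be recovered.

#include <cmath>

// Sufficient statistics of the target variable for one side of a candidate
// split: count, sum and sum of squares are enough to recover the standard
// deviation used by the SDR criterion.
struct TargetStats {
    long   n = 0;
    double s = 0.0;   // sum of target values
    double q = 0.0;   // sum of squared target values

    void add(double y) { ++n; s += y; q += y * y; }

    double sd() const {
        if (n == 0) return 0.0;
        double var = (q - (s * s) / n) / n;   // population variance
        return var > 0.0 ? std::sqrt(var) : 0.0;
    }
};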
As the E-BST method suggests, for each possible split value h, a corresponding Standard Deviation Reduction (SDR) is calculated:

SDR(h) = sd(S) - (N_L / N) sd(S_L) - (N_R / N) sd(S_R),    (1)

sd(S) = sqrt( (1/N) ( Σ_{y ∈ S} y² - (1/N) ( Σ_{y ∈ S} y )² ) ),    (2)

where S is the set of size N containing the target values of all instances seen at the leaf, and S_L (S_R) is the subset of size N_L (N_R) corresponding to instances whose attribute value is less than or equal to h (greater than h).
For each attribute, the best value to split over is then the one with the maximum SDR. Finally, the attribute corresponding to the highest SDR value is chosen as the splitter of that leaf.
As new instances arrive, the SDR values change and should be recalculated; to limit running time, this is done only when the number of instances is a multiple of a user-defined parameter, Nmin.
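Under the same assumptions, equation (1) can be evaluated from these aggregates alone. The sketch below reuses the TargetStats structure above and, for simplicity, a flat list of candidate thresholds in place of the binary-search-tree layout of [8].

#include <cstddef>
#include <vector>

// SDR of splitting a sample S (stats for all instances in the leaf) into
// S_L (values <= h) and S_R (the rest), following equation (1).
double sdr(const TargetStats& all, const TargetStats& left) {
    TargetStats right;
    right.n = all.n - left.n;
    right.s = all.s - left.s;
    right.q = all.q - left.q;
    if (all.n == 0) return 0.0;
    return all.sd()
         - (double(left.n)  / all.n) * left.sd()
         - (double(right.n) / all.n) * right.sd();
}

// Pick the candidate threshold with maximum SDR for one attribute.
// below[i] holds the statistics of instances whose attribute value is
// <= thresholds[i]; all covers every instance in the leaf.
double best_split(const std::vector<double>& thresholds,
                  const std::vector<TargetStats>& below,
                  const TargetStats& all, double& best_sdr) {
    best_sdr = -1.0;
    double best_h = thresholds.empty() ? 0.0 : thresholds.front();
    for (std::size_t i = 0; i < thresholds.size(); ++i) {
        double v = sdr(all, below[i]);
        if (v > best_sdr) { best_sdr = v; best_h = thresholds[i]; }
    }
    return best_h;
}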
There are also two mechanisms that prevent the tree from growing beyond a reasonable size. The first is a constraint on a variable, called split_countA, that counts how many times a split has been made on an attribute A within a single path from the root to a specified node. If the attribute has been used max_split times, there are no more splits on it for the successors of that node. The second is a constraint on the depth of the tree: if splitting a leaf would make the depth exceed the max_height value, the leaf is not split. max_height and max_split are two parameters of the algorithm that are to be tuned.
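A minimal sketch of how these two constraints might be checked before a split is shown below; the parameter names follow the text, while the function itself is only illustrative.

#include <map>
#include <string>

// Returns true if a leaf at depth `depth` may still split on `attribute`,
// given how many times that attribute has already been used on the path
// from the root (split_count) and the two global limits.
bool split_allowed(const std::map<std::string, int>& split_count,
                   const std::string& attribute,
                   int depth, int max_split, int max_height) {
    if (depth + 1 > max_height) return false;          // would exceed the depth limit
    auto it = split_count.find(attribute);
    int used = (it == split_count.end()) ? 0 : it->second;
    return used < max_split;                            // per-attribute quota on this path
}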
3.3 Drift Detection
When a substantial decrease in accuracy is detected, a concept drift is assumed to have occurred. A simple test named Page-Hinckley (PH), proposed in [10], is used with some modifications to detect drifts. The test monitors the cumulative prediction error up to the new instance compared to that of previous instances, with the underlying aim of tracking the tree's accuracy.
Suppose that for instance i, l_i and p_i denote its true and predicted labels, respectively. Then

x_i = | l_i - p_i |.    (3)

If the total number of instances reaching a node since the previous drift is n_t, then the average of these error values is

x̄ = (1 / n_t) Σ_{i=1}^{n_t} x_i,    (4)

and

m_T = Σ_{i=1}^{n_t} ( x_i - x̄ - δ ),    (5)

where δ is a constant. Define

M_T = min_{t=1,...,T} m_t,    (6)

PH_T = m_T - M_T.    (7)

When a new instance arrives, if the value of m_T shows a substantial change from its previous values, that is, if the difference between m_T and its minimum over the period starting from the last occurrence of drift exceeds a predefined threshold λ (i.e., PH_T > λ), then a concept drift has been detected.
Once a drift is detected, all variables of the PH test, including x̄, m_T, M_T and PH, are reset to zero so that later drifts can be detected. A similar drift detection mechanism is used in [8], but there it is based on changes in the values of the labels of the instances reaching a node, whereas in ART it is based on the accuracy of the models.
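The PH test reduces to a few scalar updates per instance. The class below is a minimal sketch of equations (3)-(7); following common incremental implementations, it uses the running mean of the errors at each step, and the names delta and lambda correspond to the constant and the threshold mentioned above.

#include <algorithm>
#include <cmath>

// Page-Hinckley test on the absolute prediction error, as in section 3.3:
// x_t = |l_t - p_t|, m accumulates (x_t - mean_t - delta), M is the running
// minimum of m, and a drift is flagged when m - M > lambda.
class PageHinckley {
public:
    PageHinckley(double delta, double lambda)
        : delta_(delta), lambda_(lambda) {}

    // Feed one instance; returns true if a drift is detected.
    bool update(double true_label, double predicted_label) {
        double x = std::fabs(true_label - predicted_label);   // eq. (3)
        ++n_;
        sum_ += x;
        double mean = sum_ / n_;                               // eq. (4), running mean
        m_ += x - mean - delta_;                               // eq. (5)
        min_m_ = std::min(min_m_, m_);                         // eq. (6)
        return (m_ - min_m_) > lambda_;                        // eq. (7): PH > lambda
    }

    // Reset after a drift so the test can look for the next one.
    void reset() { n_ = 0; sum_ = 0.0; m_ = 0.0; min_m_ = 0.0; }

private:
    double delta_, lambda_;
    long   n_   = 0;
    double sum_ = 0.0, m_ = 0.0, min_m_ = 0.0;
};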
3.4 Altering Tree Structure
In the case of detecting a concept drift, two options are available. One is to disregard the subtree rooted at the drifted node and build it again from scratch. The other is to keep the current subtree and apply some structural changes in order to adapt the tree to the new concept.
The former option is too costly, both in time and in performance, to be used on data streams. Therefore, we suggest using the latter. Nevertheless, structural adaptation is also a time-consuming procedure, so it is done only under certain circumstances, explained below. Briefly, if a drift is detected in a node for the first time, the tree's structure is not adapted immediately. Only if further investigation, specifically detection of a drift for the second time, confirms that the node really needs a change in the structure of its subtree is the adaptation procedure invoked.
When the Page-Hinckley test for drift detection (discussed in section 3.3) fires for a node, the node is marked as "drifted". In this case, the only immediate reaction is to reset all variables, including the E-BST structures, the PH test variables and the models in the leaves, to their initial values. This is done for this node as well as for all of its descendants. These resets are due to the fact that the E-BSTs have to be built from scratch to identify the best attribute after the drift, and the PH test has to be restarted to check for further drifts. As new instances keep arriving, two different situations are likely to happen in this node. First, the node may recover by updating its model so that it is adjusted to the new instances; second, the PH test may warn about a second occurrence of drift in that node. If the former takes place, the tree is spared structural changes and time is also saved. The "drifted" tag is removed after Nmin instances reach the node without a sign of a second drift. On the contrary, if the latter happens, structural adaptation is performed.
From a conceptual point of view, an adaptation of the structure is supposed to replace the attribute used to split a node with another attribute that is more consistent with the new concept. The procedure for finding a better attribute is the same as the one used when splitting a node: following the split test, the attribute that currently has the maximum SDR value is the new attribute to substitute for the old one.
This new attribute and the previously used one are related to each
other in one of the four cases discussed here:
Case I. Two attributes are identical.
In this case, no additional changes are required and the “drifted”
tag will be removed.
Case II. The new attribute is the attribute used to split one of the two children of this node, and the two children have different splitting attributes.
Figure 1 shows this situation. The tags on the nodes denote the attribute on which the split occurs. Assume that node A is to be adapted, and that the new attribute known to be the best after the drift is B, exactly the one used in the node's left child. The adaptation process changes the subtree structure as shown in the figure. The two nodes denoted by A in the adapted tree are copies of node A in the original tree, identical in every aspect.
Figure 1. Altering tree structure (case II)
In the special case that B and C are leaves, the new attribute simply substitutes for the old one and all of the variables of these two children are reset. The reason for the reset is that instances seen before the adaptation should not influence the model that will be built after the adaptation.
As the figure shows, a new attribute, namely B, is added to the paths A-C-G and A-C-F. This may push the tree beyond the constraints on the maximum number of splits and/or the maximum height defined for it. Thus, a pruning procedure is activated; it traverses all the nodes below node B and prunes the subtree beneath any node in which at least one of these two rules is violated.
Finally, the models and PH test variables of the right subtree, C, F, G and all of their children, are reset to their initial values.
Exactly the same discussion holds for the case in which the new attribute is C, the one used in the right child of the current node.
Case III. The new attribute is the attribute used to split one of the two children of this node, and the two children have identical splitting attributes.
Figure 2 shows this situation. Structural adaptation is done in a slightly different manner from that of case II.
Figure 2. Altering tree structure (case III)
Case IV. Otherwise.
This case is concerned with situations in which the new attribute is neither identical to the old splitting attribute nor the one used in one of the children. This means that the attribute is either used in indirect children of the node or is not used at all. In either case, a change more drastic than the ones described in cases II and III would have to be applied; in other words, a global change in the structure of the tree would be required, which may even lead to higher error values. Instead, the change, replacing the old attribute with the new one, can take place with a short delay in later adaptations of the tree, provided that those adaptations fall into case II or III.
The sensible idea behind these adaptations is that the previously built structure of the tree is worth keeping. The new structure should be as close as possible to the old one; that is, instances should be conducted through the same path to the node they would have reached before the adaptation. This is because the model adopted for an instance is still the best model for that instance even in the first moments after the drift has occurred.
4. EXPERIMENTAL RESULTS
In order to evaluate ART, its implementation was developed in the C++ language1. Some computer experiments were conducted based on this implementation. A major hurdle in regression problems, especially in the presence of concept drift, is the lack of suitable datasets. We found five real datasets appropriate for our testing purposes. In the following, these datasets are first described. Then parameter tuning and evaluation metrics are discussed. Finally, the results of the proposed method on the introduced datasets are reported. The algorithm is compared with the FIMT-DD method. According to [8], FIMT-DD outperformed other algorithms proposed in the field of regression on data streams with concept drift. In addition, FIMT-DD also uses a regression tree as its model infrastructure and is the method most similar to ours. The experiments show the effectiveness of our algorithm in comparison to FIMT-DD.
4.1 Datasets
We used five real datasets on which the performance of the algorithm is evaluated. Three of these datasets were manipulated in order to make them appropriate for regression, while the other two remained unchanged.
Sensor: This dataset, introduced by [11], consists of about 2,219,000 records and 5 attributes, namely time, humidity, light, voltage and temperature. The records contain information collected from 54 sensors deployed in the Intel Berkeley Research laboratory over a two-month period. The type and place of concept drift are not specified. The dataset was prepared for classification tasks and the target label is the sensor ID. Thus, in order to make it usable for regression, a few modifications are necessary: the sensor ID is omitted and the temperature is used as the target value. Noisy data are also removed. The resulting dataset contains 4 attributes, one target attribute and 1,500,000 instances.
1 Accessible from {first_author_home_page/ART/code}
Table 1. Comparison of MSE and time on all datasets

Dataset        ART                       FIMT-DD
               MSE         time(s)       MSE         time(s)
Sensor         0.338       218.731       0.348       271.524
Electricity    63976.200   4.795         1216282     1.720
Houses         0.295       3.703         0.336       3.950
Dodgers        43.656      5.182         81.766      2.88
Airline        786.487     79.421        799.491     79.512
Electricity: The data were first described in [12]. They were collected from the Australian New South Wales Electricity Market, where prices are affected by demand and supply and are set every five minutes. The dataset contains 45,312 instances dated from 7 May 1996 to 5 December 1998. Each instance refers to a period of 30 minutes. The dataset has 8 attributes and the original goal is to predict whether the price goes "up" or "down" compared to the last 24 hours. For our regression purpose, it is modified to have 6 attributes, including date and price, and the target is to predict the electricity demand.
Houses: This dataset first appeared in [13]. It contains 20,640 instances on housing prices in California with 9 economic covariates.
Dodgers: This loop sensor data, accessible from [14], was collected on the Glendale on-ramp of the 101 North freeway in Los Angeles. The location is close enough to the stadium to see unusual traffic after a Dodgers game, but not so close that the signal for the extra traffic is overly obvious. The observations were taken over 25 weeks, with 288 time slices per day (5-minute count aggregates). The original goal is to predict the presence of a baseball game at the stadium. The dataset is changed so that the number of cars passing the freeway, given the 5 attributes month, day, year, hour and minute, is the target variable. Records containing missing values are removed, and the final size of the dataset is 47,497.
Airline: Introduced by [8], this dataset consists of about 116 million records and 13 attributes. The records include flight arrival and departure information for commercial flights within the USA from 1987 to 2008. The target value is the "arrival delay". The last 1 million records are used in our tests.
4.2 Parameters
All experiments were run on an Intel Pentium 3.4 GHz CPU with 2 GB of RAM. Furthermore, some variables have to be initialized to constant values before the algorithms start to run. The same values of these parameters are used for the first four datasets; for the airline dataset, however, different values have to be selected due to its different properties. These parameters are categorized into three parts:
a. E-BST parameters: including Nsplit, used for the split test, and Nmin, denoting the chunk size on which the split condition is tested. Nsplit and Nmin are set to 60 and 200 for all datasets.
b. PH test parameters: including δ, the constant used in the PH formula, and λ, the drift detection threshold. These are set to 20 and 0.005, respectively, for the first four datasets and to 10000 and 0.01 for the airline dataset.
c. Pruning parameters: including max_split, identifying how many times a split on each attribute is allowed within a single path, and max_height, identifying the maximum allowed height of the tree. max_split and max_height are set to 4 and 15 for the first four datasets and to 2 and 3 for the airline dataset.
Two sets of default parameters are used in the experiments of the FIMT-DD method; in order to provide a fair comparison, the best set is used for all datasets [8].
4.3 Evaluation Method and Metrics
In data stream mining, the most frequently used measure for evaluating the predictive error of a model is the predictive sequential, or prequential, error. For each instance of the stream, the current model makes a prediction based only on the instance's attribute values. The prequential error is then computed as an accumulated sum of a loss function between the predicted and real values [15]. Here, the Mean Squared Error (MSE) is used to compare the prequential errors of the regression models. Running time is also an important evaluation factor. The results given in this section are averages over 5 runs of each configuration of the algorithms.
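As an illustration, prequential MSE evaluation interleaves prediction and training on every instance. The sketch below assumes a generic model interface (predict, then update) and is not tied to the actual ART or FIMT-DD implementations.

#include <cstddef>

// Generic interface a stream regressor must expose for prequential
// evaluation: predict first, then learn from the revealed label.
struct StreamRegressor {
    virtual double predict(const double* x, std::size_t d) const = 0;
    virtual void   update(const double* x, std::size_t d, double y) = 0;
    virtual ~StreamRegressor() = default;
};

// Prequential Mean Squared Error: the model predicts each instance before
// seeing its true label; squared errors are accumulated over the stream.
double prequential_mse(StreamRegressor& model,
                       const double* const* X, const double* y,
                       std::size_t n, std::size_t d) {
    double sq_err = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double p = model.predict(X[i], d);   // test on the new instance...
        sq_err += (p - y[i]) * (p - y[i]);
        model.update(X[i], d, y[i]);         // ...then train on it
    }
    return n > 0 ? sq_err / n : 0.0;
}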
4.4 Results
Running the ART and FIMT-DD algorithms on the five described datasets leads to the results reported in Table 1. ART shows lower errors in all cases and better running times in most of them. Specifically, for the sensor, houses and airline datasets, both the MSE and the running time of ART are lower than those of FIMT-DD. The running time on the sensor dataset is much lower (by about 20%), while for the houses and airline datasets the running times are almost the same. For the other two datasets, i.e. electricity and dodgers, ART shows a significantly lower MSE at the cost of more running time. By setting the max_split and max_height parameters to lower values, better running times could also be obtained.
The main advantage of ART over FIMT-DD, which leads to these results, is that in ART a drifted area within the tree is not discarded completely; instead, it is restructured to adapt to the new concept. In FIMT-DD, by contrast, a drifted subtree is replaced entirely by a new tree built on the instances that arrive after the drift.
In the following figures, a comparison of the MSE of ART and FIMT-DD is given for each dataset separately. The horizontal axis shows the arrival of instances in chronological order. The variation of the MSE due to concept drifts can be seen in these plots.
Figure 3. Comparison of MSE on sensor dataset
Figure 4. Comparison of MSE on electricity dataset
Figure 5. Comparison of MSE on houses dataset
Figure 6. Comparison of MSE on dodgers dataset
Figure 7. Comparison of MSE on airline dataset
5. CONCLUSION AND FUTURE WORKS
We proposed a regression tree algorithm, named ART, for data streams in the presence of concept drift. In this method, a tree is built incrementally for predicting the target value of incoming instances. After the prediction, the real label of each instance is revealed to the algorithm and used to update the tree. When concept drift is detected in a portion of the tree, it is handled either by altering the tree structure or by updating the nodes' regression models. The method was compared to FIMT-DD, and it was shown to improve the accuracy or the running time on the datasets used.
Some future research directions include the following. First, new conditions for altering the tree structure could be tested. Second, other efficient methods could be used instead of the current drift detection or splitting tests. Third, the algorithm could be run on more real datasets in order to achieve more reliable results.
6. REFERENCES
[1] Malerba, D., Appice, A., Ceci, M., and Monopoli, M. 2002. Trading-off local versus global effects of regression nodes in model trees. In Proceedings of the 13th International Symposium on Foundations of Intelligent Systems, LNCS, vol. 2366. Springer, Berlin, 393–402.
[2] Potts, D., and Sammut, C. 2005. Incremental learning of linear model trees. J. Mach. Learn. Res. 6, 15–48. doi:10.1007/s10994-005-1121-8.
[3] Rodrigues, P. P., Gama, J., and Bosnic, Z. 2008. Online reliability estimates for individual predictions in data streams. In Proceedings of the IEEE International Conference on Data Mining Workshops. IEEE Computer Society, Los Alamitos, CA, 36–45.
[4] Hosseini, M. J., Ahmadi, Z., and Beigy, H. 2011. Pool and Accuracy Based Stream Classification: a new ensemble algorithm on data stream classification using recurring concepts detection. In Proceedings of the 11th International Conference on Data Mining Workshops, Vancouver, Canada.
[5] Gama, J., Medas, P., and Rodrigues, P. 2005. Learning decision trees from dynamic data streams. In Proceedings of the 2005 ACM Symposium on Applied Computing, 573–577.
[6] Hulten, G., Spencer, L., and Domingos, P. 2001. Mining time-changing data streams. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 97–106.
[7] Domingos, P., and Hulten, G. 2000. Mining high-speed data streams. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 71–80.
[8] Ikonomovska, E., Gama, J., and Dzeroski, S. 2011. Learning model trees from evolving data streams. Data Mining and Knowledge Discovery 23, 128–168. Kluwer Academic Publishers, Hingham, MA, USA.
[9] Song, X., Wu, M., Jermaine, C., and Ranka, S. 2007. Statistical change detection for multidimensional data. In Proceedings of the 13th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 667–676.
[10] Mouss, H., Mouss, D., Mouss, N., and Sefouhi, L. 2004. Test of Page-Hinckley, an approach for fault detection in an agro-alimentary production system. In Proceedings of the 5th Asian Control Conference, vol. 2. IEEE Computer Society, Los Alamitos, CA, 815–818.
[11] Zhu, X. 2010. Stream Data Mining Repository. Accessed Sep 2012. Available from: http://www.cse.fau.edu/~xqzhu/stream.html.
[12] Harries, M. 1999. Splice-2 comparative evaluation: electricity pricing. Technical report, The University of New South Wales.
[13] Pace, R. K., and Barry, R. 1997. Sparse spatial autoregressions. Statistics and Probability Letters, vol. 83, no. 3, 291–297. Dataset accessible from http://lib.stat.cmu.edu/datasets/houses.zip.
[14] Dodgers Loop Sensor dataset. http://archive.ics.uci.edu/ml/datasets/Dodgers+Loop+Sensor. Accessed July 2012.
[15] Gama, J. 2010. Knowledge Discovery from Data Streams. Chapman & Hall/CRC.