Tracking Concept Drifting with an Online-Optimized Incremental Learning Framework

Jun Wu*, Dayong Ding (AI Lab, Tsinghua University, Beijing 100084, P. R. China; {wujun01, ddy01}@mails.tsinghua.edu.cn)
Xian-Sheng Hua (Microsoft Research Asia, Beijing 100080, P. R. China; [email protected])
Bo Zhang (AI Lab, Tsinghua University, Beijing 100084, P. R. China; [email protected])

ABSTRACT

Concept drifting is an important and challenging research issue in the field of machine learning. This paper mainly addresses the issue of semantic concept drifting in time series, such as video streams, over a relatively long period of time. An Online-Optimized Incremental Learning framework is proposed as an example learning system for tracking drifting concepts. Furthermore, a set of measures is defined to track the process of concept drifting in the learning system. These tracking measures are also applied to determine the corresponding parameters used for model updating in order to obtain optimal up-to-date classifiers. Experiments on the data set of TREC Video Retrieval Evaluation 2004 not only demonstrate the internal concept drifting process of the learning system, but also show that the proposed learning framework is promising for tackling the issue of concept drifting.

Categories and Subject Descriptors

H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing - indexing methods; I.2.10 [Artificial Intelligence]: Vision and Scene Understanding - video analysis.

General Terms

Algorithms, Experimentation.

Keywords

Incremental Learning, Gaussian Mixture Model, Video Content Analysis, Concept Drifting, TREC Video Retrieval Evaluation

1. INTRODUCTION

In time series, the underlying data distribution, or the concept that we are trying to learn from the data sequences, typically evolves constantly over time. Often these changes make the models built on old data inconsistent with the new data, so instant updating of the models is required [1].
This problem, known as concept drifting [2], complicates the task of learning concepts from data. An effective learner should be able to track such changes and quickly adapt to them [1]. Modeling concept drifting in time sequences has thus become an important and challenging task.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MIR'05, November 10-11, 2005, Singapore. Copyright 2005 ACM 1-59593-244-5/05/0011…$5.00.

Klinkenberg et al. [3][4] propose a method to recognize and handle concept changes with support vector machines, which maintains an automatically adjusted window on the training data so that the estimated generalization error is minimized. Fan [5] points out that additional old data do not always help produce a more accurate hypothesis than using the most recent data only; they increase accuracy only in some "lucky" situations. In [6], Fan also demonstrates a random decision-tree ensemble based engine, named StreamMiner, for mining concept drifts in data streams; in StreamMiner, systematic selection of old and new data chunks is utilized to compute the optimal model that best fits the changing data streams. Wang et al. [7] propose a general framework for mining drifting concepts in data streams using ensemble classifiers weighted by their expected classification accuracy on the test data under the time-evolving environment. Though many methods have been proposed to deal with concept drifting, few researchers have covered the issue of how to track concept drifting from a systematic viewpoint.
This paper addresses this issue based on a novel online learning framework, termed OOIL (Online-Optimized Incremental Learning) [11]. The evolving processes of the drifting concepts, as well as a couple of tracking measures related to concept drifting, are investigated.

The remainder of this paper is organized as follows. Section 2 briefly introduces the issue of concept drifting. The online-optimized incremental learning framework is presented in Section 3. Section 4 discusses in detail how to track concept drifting. Experiments are introduced in Section 5, followed by conclusions and future work in Section 6.

2. CONCEPT DRIFTING

For incoming data streams, there are two important issues: data sufficiency and concept drifting [5]. Traditional machine learning schemes typically do not consider the problem of concept drifting. Actually, if there is no concept drifting and the training data set is sufficient, there is no need to update the models. But when concept drifting occurs, besides the new data, the old data should also be considered to enhance the performance of the system. However, how much old data should be used, and how to use these data, are not trivial issues [5].

* Supported by National NSF of China (No. 60135010), National NSF of China (No. 60321002) and the Chinese National Key Foundation Research Development Plan (2004CB318108).

Most existing research on concept drifting in machine learning mainly concerns the final classification results. A more sophisticated way is to make clear, in a quantitative way before classification, whether there is concept drifting in the system and how much the concepts are drifting. This paper follows this idea and defines a couple of measures to investigate the intrinsic properties of the learning systems.

3. OOIL FRAMEWORK - AN EXAMPLE LEARNING SYSTEM

3.1 Framework Overview

As mentioned above, the scenario we are investigating is a batch learning problem.
That is, data are supposed to arrive over time in batches. Different from traditional batch learning [16][17], we suppose all of the upcoming batches are unlabeled; only a small preliminary pre-labeled training data set is required during the whole learning process.

3.2 Symbol Definitions

To describe our system more clearly, the symbols used in this paper are listed in Table 1 and Table 2. Note that y^t and x^t denote the t-th batch and the first portion of the t-th batch, respectively. For simplicity, if there is no confusion, we denote by y any element batch in D and by x the part (a small portion) of y (i.e., the superscript t is omitted).

Table 1. Main symbol definitions (1)

Symbol                                                Meaning
Y = [Y(1), Y(2), ..., Y(d)]^T                         feature vector
y = [y(1), y(2), ..., y(d)]^T                         an outcome of Y
y*_c = {y_{1,c}, y_{2,c}, ..., y_{n(c),c}}            pre-labeled training set for concept c
y^t = {y^t_1, y^t_2, ..., y^t_{N(t)}}                 t-th batch
D = {y^1, ..., y^{t-2}, y^{t-1}, y^t, ...}            set of all batches
x^t = {x^t_1, x^t_2, ..., x^t_T} (1 < T < N(t))       part (first portion) of the t-th batch

Table 2. Main symbol definitions (2)

Symbol   Meaning
c        semantic concept c (1 ≤ c ≤ C)
C        total number of considered concepts
d        dimension of the feature vector
n(c)     number of labeled samples for concept c
N(t)     number of samples in the t-th batch
y        any element batch in D
x        the first portion of y

3.3 OOIL Framework

As aforementioned, our proposed OOIL framework consists of three main modules, which are presented in detail below.

[Figure 1. Online-Optimized Incremental Learning Framework]

Figure 1 shows the flowchart of our proposed online-optimized incremental learning (OOIL) framework. As illustrated, Videos 1, 2, ..., K (the box at top-left) form the pre-labeled training data set, and batches 1, 2, ..., N (the white boxes below) are the unlabeled upcoming data batches. The OOIL framework consists of three primary modules: Global Model Pre-Training (GMPT), Local Adaptation (LA) and Global Model Incremental Updating (GMIU).
When a concept drifts along the timeline, the underlying data distribution, which is related to the corresponding model, changes between batches. Therefore, the key issue of our proposed framework is how to update the models instantly in order to adapt to the distribution change of each new upcoming data batch. In more detail, firstly C (the number of target concepts) original GMMs (termed Global GMMs) are generated by training on the pre-labeled training samples using the so-called Agglomerative EM (AEM) algorithm [8]. Then, for the under-test t-th batch, its first portion (x^t) is also modeled by a GMM using the AEM algorithm. Thereafter, this "reference GMM" is compared with all the Global GMMs to generate C corresponding locally optimized GMMs, which are then applied to classify the samples in the under-test batch. At last, all the Global GMMs are updated by combining the locally optimized models and the original global models, and these updated global models are used in the next round for the newly-upcoming batch.

3.3.1 Global Model Pre-Training

Firstly, the pre-labeled training samples are used to train a set of Global GMMs by the AEM algorithm [8]. Suppose a certain semantic concept c has a finite mixture distribution in feature space Y, represented by

f_Y(y*_c | θ_{k,c}) = Σ_{m=1}^{k} α_{m,c} N(μ_{m,c}, Σ_{m,c}),    (1)

where k is the number of components, N(μ_{m,c}, Σ_{m,c}) is a Gaussian component, θ_{m,c} = (μ_{m,c}, Σ_{m,c}) collects its mean and covariance matrix, and the α_{m,c} are the mixing probabilities (Σ_{m=1}^{k} α_{m,c} = 1). Let θ_{k,c} = {θ_{1,c}, θ_{2,c}, ..., θ_{k,c}, α_{1,c}, α_{2,c}, ..., α_{k-1,c}} be the parameter set defining a given mixture.
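As an illustration of the mixture density in Eq. (1), the following sketch evaluates a small GMM with NumPy; the function names and the toy two-component model are ours, not the paper's.

```python
import numpy as np

# A minimal sketch of the mixture density in Eq. (1). The function
# names and the toy two-component model are ours, not the paper's.
def gaussian_pdf(y, mu, cov):
    """Multivariate normal density N(y | mu, cov)."""
    d = len(mu)
    diff = np.asarray(y, dtype=float) - np.asarray(mu, dtype=float)
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ inv @ diff) / norm)

def gmm_pdf(y, alphas, mus, covs):
    """Eq. (1): f(y | theta_k) = sum_m alpha_m N(y | mu_m, Sigma_m)."""
    assert abs(sum(alphas) - 1.0) < 1e-9  # mixing weights must sum to 1
    return sum(a * gaussian_pdf(y, m, c)
               for a, m, c in zip(alphas, mus, covs))
```

Any off-the-shelf GMM library would serve equally well; the point is only that the concept model is a weighted sum of Gaussian components.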
The typical EM algorithm iteratively maximizes the likelihood (ML) to estimate θ_{k,c} from the training samples y_c:

θ̂_{k,c} = arg max_{θ_{k,c}} L(θ_{k,c}, y_c).    (2)

In this paper, a modified EM algorithm, AEM, based on the Mixture Minimum Description Length (MMDL) criterion [8], is adopted to estimate the best θ_{k,c} (the hat "^" is omitted for simplicity) and the best k = k(c) for the GMMs from the labeled samples.

3.3.2 Local Adaptation (Optimization)

As mentioned above, semantic concepts have a so-called "Time Local Consistency" property, which suggests improving the classification accuracy by exploring the characteristics of a certain amount of the unlabeled samples in the current under-test batch. Recall that x is the first portion of unlabeled samples in the under-test batch. Similar to the pre-training process in the above subsection, we estimate a GMM for the sample set x. Suppose the estimated GMM parameters are denoted by

θ^t_{k^t} = {θ^t_1, θ^t_2, ..., θ^t_{k^t}, α^t_1, α^t_2, ..., α^t_{k^t-1}}.    (3)

Local adaptation finds a set of local GMMs, each a combination of the original global GMMs, represented by {θ_{k(c),c}, 1 ≤ c ≤ C}, and θ^t_{k^t}, aiming at optimizing the classification performance on the current under-test batch, i.e., y. As aforementioned, the semantic concepts may drift gradually over time. The GMMs' local adaptation is designed to reduce the effect of concept drifting by locally adapting the GMMs on a small portion of the under-test batch, in the following steps.

Step 1: Compute the symmetric Kullback-Leibler (KL) divergence (distance) [9] (D_s) of every pair of Gaussian components in the GMMs represented by θ_{k(c),c} and θ^t_{k^t}:

D_s(c, i, j) = D_s(N(y_c | μ_{i,c}, Σ_{i,c}), N(x^(t) | μ^t_j, Σ^t_j))
             = (1/2) tr[(Σ_{i,c}^{-1} + (Σ^t_j)^{-1})(μ_{i,c} - μ^t_j)(μ_{i,c} - μ^t_j)^T + Σ_{i,c}^{-1} Σ^t_j + (Σ^t_j)^{-1} Σ_{i,c} - 2I].    (4)
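The symmetric divergence of Step 1 has a standard closed form for Gaussians; a minimal sketch (function names are ours, inputs hypothetical):

```python
import numpy as np

# Closed-form symmetric KL divergence between two Gaussian components,
# as used in Step 1 / Eq. (4). Function names are ours.
def kl_gauss(mu0, cov0, mu1, cov1):
    """D_KL(N(mu0, cov0) || N(mu1, cov1))."""
    d = len(mu0)
    inv1 = np.linalg.inv(cov1)
    diff = np.asarray(mu1, dtype=float) - np.asarray(mu0, dtype=float)
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def sym_kl(mu0, cov0, mu1, cov1):
    """Symmetrized divergence D_s = D_KL(p || q) + D_KL(q || p)."""
    return kl_gauss(mu0, cov0, mu1, cov1) + kl_gauss(mu1, cov1, mu0, cov0)
```

Symmetrizing matters here because D_s is used as a distance between components, and plain KL is not symmetric.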
Step 2: Compute the distance between each semantic concept c and each Gaussian component in θ^t_{k^t}, defined by

D(c, j) = min_{1 ≤ i ≤ k(c)} D_s(c, i, j).    (5)

Step 3: Let J^t(c) be a subset of {1, ..., k^t}, defined by

J^t(c) = {j : c = arg min_{1 ≤ s ≤ C} D(s, j)}.    (6)

The Gaussian components N(μ^t_j, Σ^t_j), j ∈ J^t(c), are taken as a new GMM estimation of semantic concept c for the under-test t-th batch.
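Steps 2-3 amount to a nearest-concept assignment of the reference GMM's components; a sketch with hypothetical distance values (names are ours):

```python
import numpy as np

# Sketch of Steps 2-3 (Eqs. (5)-(6)): Ds[c] is a (k(c) x k_t) matrix of
# symmetric KL distances between concept c's global components and the
# reference GMM's components; the numbers used below are hypothetical.
def assign_components(Ds):
    """Return J[c]: the reference-component indices claimed by concept c."""
    C = len(Ds)
    # Eq. (5): D(c, j) = min_i Ds(c, i, j)
    D = np.array([Ds[c].min(axis=0) for c in range(C)])
    # Eq. (6): component j goes to the concept with the smallest D(c, j)
    winner = D.argmin(axis=0)
    return [set(np.flatnonzero(winner == c)) for c in range(C)]
```

Note that every reference component is claimed by exactly one concept, so the J^t(c) sets partition {1, ..., k^t}.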
That is, the local GMM for a certain semantic concept c is

f_X(x | θ^l_{k^l(c),c}) = Σ_{j ∈ J^t(c)} (α^t_j / α^l(c)) N(μ^l_j, Σ^l_j),    (7)

where

α^l(c) = Σ_{j ∈ J^t(c)} α^t_j,    (8)

and k^l(c) = |J^t(c)| is the number of Gaussian components in the localized GMM. It should be noted that some concepts may fail to obtain a local model. In that case, the global model is applied as the local model and the global model incremental updating process in the next subsection is skipped. Therefore, for a sample y_i in the under-test batch, the classification result is determined by

c(y_i) = arg max_{1 ≤ s ≤ C} f_X(y_i | θ^l_{k^l(s),s}).    (9)

That is, the sample y_i is classified to semantic concept c(y_i).

3.3.3 Global Model Incremental Updating

Since drifting concepts accord with the change of parameters of generative models, in this subsection we investigate how to update the parameters of the GMMs. A GMM is completely determined by a series of Gaussian components and their corresponding weights. In theory there are three basic operations for changing GMM components. The first is adding: one or more new Gaussian components are added into the model. The second is deleting: some existing components are removed from the new model. The third is drifting: some components drift to a new position in the parametric space. We therefore decompose the updating procedure of the Gaussian components into these three cases.

Based on the above observations, we introduce the scheme for updating the global GMMs by combining the original global models with the localized models. The updated models are applied as up-to-date global models for new upcoming batches. For convenience, the localized GMM for concept c is represented by θ^l_{k^l(c),c}.

Step 1: For each Gaussian component N(μ^l_{j,c}, Σ^l_{j,c}) in the localized GMM, find the closest component N(μ_{i,c}, Σ_{i,c}) in the global GMM θ_{k(c),c} by comparing Kullback-Leibler divergences (D_KL) [9]:

i = arg min_{1 ≤ m ≤ k(c)} D_KL(N(μ^l_{j,c}, Σ^l_{j,c}), N(μ_{m,c}, Σ_{m,c})).    (10)

If the minimum D_KL in (10) is larger than a predefined threshold (denoted KLDThresh), go to Step 2' (adding new components); otherwise, go to Step 2 (drifting existing components).
Step 2 (Drifting Existing Components): The Gaussian component N(μ_{i,c}, Σ_{i,c}) in the global GMM θ_{k(c),c} is replaced by N(μ*_{i,c}, Σ*_{i,c}), which is defined by

(μ*_{i,c}, Σ*_{i,c}) = arg min D_KL((1 - α) N(μ_{i,c}, Σ_{i,c}) + α N(μ^l_{j,c}, Σ^l_{j,c}), N(μ*_{i,c}, Σ*_{i,c})),    (11)

where α is a parameter standing for the "updating speed", discussed in detail in Section 4.1.8. According to reference [10], (μ*_{i,c}, Σ*_{i,c}) has a closed form:

μ*_{i,c} = (1 - α) μ_{i,c} + α μ^l_{j,c},    (12)
Σ*_{i,c} = (1 - α)(Σ_{i,c} + μ_{i,c} μ_{i,c}^T) + α (Σ^l_{j,c} + μ^l_{j,c} (μ^l_{j,c})^T) - μ*_{i,c} (μ*_{i,c})^T.    (13)

If a component N(μ_{i,c}, Σ_{i,c}) in θ_{k(c),c} drifts to a new position, add label i into a label set J_G(c), which is initialized as an empty set (at the same time, set k_G(c) = 0; note that k_G(c) is the number of elements in J_G(c)), and set k_G(c) = k_G(c) + 1.

Step 2' (Adding New Components): N(μ^l_{j,c}, Σ^l_{j,c}) is added as a new component into θ_{k(c),c}, and the updated global Gaussian model has the form

f_Y(y_c | θ'_{k(c)+1,c}) = Σ_{m=1}^{k(c)} (1 - β) α_{m,c} N(μ_{m,c}, Σ_{m,c}) + β Σ_{j=1}^{N_add} N(μ^l_{j,c}, Σ^l_{j,c}),    (14)

where N_add is the total number of components added from the local model, β = β* · W_add (0 < β* ≤ 1.0) is also a parameter controlling the "updating speed" (discussed in detail in Section 4.1.8), W_add = Σ_j W^l_j is the total weight of the newly-added components from the local model N(μ^l_{j,c}, Σ^l_{j,c}), and k_G(c) = k_G(c) + N_add. Finally, add the label {k_G(c) + j} into the label set J_G(c). The current Global GMM has N_1 = k + N_add components.

Step 3 (Deleting Outdated Components): For 1 ≤ i ≤ k_G(c), if i ∉ J_G(c), delete the i-th component N(μ_{i,c}, Σ_{i,c}) in θ_{k(c),c} by setting the corresponding weight α_{i,c} = 0 and normalizing the weights of the remaining components so that their sum equals 1.0. Supposing the total number of deleted components is N_delete, the current Global GMM now has N_2 = k + N_add - N_delete Gaussian components. Finally, the updated GMM has the form

f(y_c | θ_{k_G(c),c}) = Σ_{m=1}^{k_G(c)} (α^old_{m,c} / Σ_{m=1}^{k_G(c)} α^old_{m,c}) N(μ_{m,c}, Σ_{m,c}) + Σ_{j=1}^{N_add} α^new_j N(μ^l_{j,c}, Σ^l_{j,c}),    (15)

in which α^old_{m,c} is the weight of the remaining components in θ_{k(c),c}, and α^new_p, p ∈ J_G(c), p ≥ k(c), is the weight of the new components added from θ^l_{k^l(c),c}.
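The closed form of Eqs. (12)-(13) is moment matching: the two-component mixture is collapsed into one Gaussian that preserves its first two moments. A small sketch (names are ours):

```python
import numpy as np

# Sketch of the closed-form "drifting" update of Eqs. (12)-(13): the
# mixture (1-a)*N_old + a*N_local is collapsed into a single Gaussian
# by matching its first two moments. Function name is ours.
def merge_gaussians(a, mu_old, cov_old, mu_loc, cov_loc):
    mu = (1 - a) * mu_old + a * mu_loc                      # Eq. (12)
    cov = ((1 - a) * (cov_old + np.outer(mu_old, mu_old))   # Eq. (13)
           + a * (cov_loc + np.outer(mu_loc, mu_loc))
           - np.outer(mu, mu))
    return mu, cov
```

Setting a = 0 leaves the old component unchanged, while a = 1 replaces it with the local one; intermediate values blend the two at the stated updating speed.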
4. TRACKING CONCEPT DRIFTING

In this section, we present how to track the drifting concepts in the learning framework presented in Section 3. In addition, we discuss how to set, in a learning system, the two parameters (α in Equation (13) and β in Equation (14)) which have not been detailed in the previous section. They are related to the "tracking measures" introduced in this section.

4.1 Measure Definitions

For easier analysis of concept drifting, based on the three typical operations in the updating procedure for GMM components mentioned in Section 3.3.3, in this section we define several measures for tracking the semantic concepts in the evolving process, to describe in detail how concepts drift in a learning system. Practically, when a concept is represented by a generative model, the concept drifting can be mapped to the changes of parameters in the parametric space.

4.1.1 Component Lifecycle

As aforementioned, a certain component in a GMM may disappear at a certain stage. As a result, how long it survives in the concept model is an important property. For component i within a GMM, the Component Lifecycle (CL) is defined as the period from its first "birth" to its "death", i.e.,

LC_i = T_delete - T_added.    (16)

The "Maximum Lifecycle Component" (MLC) is the one with the longest Component Lifecycle in the whole GMM, which demonstrates the importance of a component in the current GMM-based learning system. CL and MLC are mainly used to describe the importance of a certain component; that is to say, the longer the Component Lifecycle, the more important the corresponding component.

Figure 2 gives an example of a GMM's evolving process. It illustrates a GMM with four components (labeled C-1, ..., C-4 in the figure) at time t = 1. At t = 4, C-4 is deleted and C-5 is added. At t = 7, C-3 is also deleted, and at this time there are only three components in the GMM. Each column represents a snapshot of the GMM at a certain moment, each block represents one component, and the height of each block equals its weight in the GMM.

[Figure 2. A Toy Example of a GMM's Evolving Process: Component Lifecycles and Weight Distributions over t = 1, ..., 8.]
4.1.2 Dominant Component

As shown in Figure 2, for a particular component such as C-1, its weight also changes along the timeline. Under the formulation of a GMM, we can define the significance of a certain component according to its weight. The Dominant Component (DC) is the one with the largest weight among all currently existing Gaussian components at a certain moment, i.e.,

DC = arg max_{1 ≤ p ≤ k(c)} α_p.    (17)

As illustrated in Figure 2, at t = 1, component 1 (labeled C-1 in the figure) is the DC, and at t = 7, C-5 is the DC.

4.1.3 Component Saliency

As described in Section 4.1.2, the weight of a particular component in a GMM may keep changing along the timeline. The average weight of a component is defined as its Component Saliency (CS), which reflects the importance of the component over time. Though C-1 and C-2 are both Maximum Lifecycle Components, the Component Saliency of C-1 is greater than that of C-2; as a result, C-1 is more important than C-2 according to their Component Saliencies.

4.1.4 Component Saliency Variance and Component Saliency Drifting Curve

The variance of the weight of a component over time is defined as its Component Saliency Variance (CSV), which describes the stability of this component; it can be visualized by the Component Saliency Drifting Curve (CSDC). Component Saliency Variance is a component property over time in a GMM.
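The weight-based measures above can be computed directly from a component weight history; a sketch over a hypothetical (T x K) weight matrix (function names are ours):

```python
import numpy as np

# Sketch of the measures of Sections 4.1.2-4.1.4, computed from a
# hypothetical (T x K) weight-history matrix W, where W[t, i] is
# component i's mixing weight at time t (0 where it does not exist).
def dominant_component(W, t):
    """DC at time t: the component with the largest weight (Eq. (17))."""
    return int(np.argmax(W[t]))

def component_saliency(W):
    """CS: each component's average weight over the steps it is alive."""
    alive = W > 0
    return W.sum(axis=0) / np.maximum(alive.sum(axis=0), 1)

def component_saliency_variance(W):
    """CSV: variance of each component's weight over its alive steps."""
    cs = component_saliency(W)
    alive = W > 0
    return ((W - cs) ** 2 * alive).sum(axis=0) / np.maximum(alive.sum(axis=0), 1)
```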
4.1.5 Dominant Component over Time

As aforementioned, the measure Component Lifecycle only considers the factor of time, while Dominant Component only regards the factor of weight. If we combine both time and weight, the Dominant Component over Time (DCT) can be defined as

DCT = arg max_{1 ≤ m ≤ I} [AW_m · CL_m],    (18)

where AW_m is the average weight (i.e., the Component Saliency) of component m, and I is the maximum number of possible components in the whole learning system.

4.1.6 Component Drifting Distance

As mentioned, a GMM component may gradually drift along the timeline. Supposing the component N(μ_{i,c}, Σ_{i,c}) in θ_{k(c),c} drifts to N(μ^l_{i,c}, Σ^l_{i,c}) in θ^l_{k^l(c),c}, the Component Drifting Distance (CDD) is defined as the KL divergence [9] between the original component and the drifted one:

CDD^t_{c,i} = D_KL[N(μ_{i,c}, Σ_{i,c}), N(μ^l_{i,c}, Σ^l_{i,c})].    (19)

CDD reflects not only the individual drifting magnitude but also the overall drifting "speed" of the whole learning system. Figure 3 illustrates the CDD values of the toy example given in Figure 2.

[Figure 3. A Toy Example - Component Drifting Distance (CDD values along the timeline).]

4.1.7 System Stability

The numbers of adding and deleting operations, as well as the magnitudes of the CDDs, reflect the stability of the concept tracking system. Suppose that at a certain moment the GMM has been updated H times in total. We define System Stability (SS) as

SS = Σ_{u=1}^{H} S_u,    (20)

where

S_u = Σ_{j=1}^{k(c)} α_j CDD^c_{u,j} δ(u, j) + γ [N_u(delete) + N_u(add)],    (21)

δ(u, j) = 1 if component j is drifting at time u, and 0 otherwise.    (22)

Here N_u(delete) (or N_u(add)) is the total number of deleted (or newly-added) components in the current under-investigation GMM, and γ is a predefined parameter balancing the two sum terms. We can also graphically observe the GMM's evolving process by plotting the S_u curve (S_u ~ u). The SS value and the S_u curve reflect the overall property of the whole learning system, and they are very important and helpful measures for judging whether the concept is drifting in the system. A real-world example will be provided in Section 5.2.2.
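Eqs. (20)-(22) can be sketched as follows; all names and numbers are hypothetical illustrations, not values from the paper:

```python
# Sketch of Eqs. (20)-(22): the per-update term S_u from weighted
# component-drifting distances plus add/delete counts.
def s_u(alphas, cdds, drifted, n_delete, n_add, gamma):
    """Eq. (21): sum_j alpha_j * CDD_{u,j} * delta(u,j) + gamma * (N_del + N_add)."""
    drift_term = sum(a * d for a, d, f in zip(alphas, cdds, drifted) if f)
    return drift_term + gamma * (n_delete + n_add)

def system_stability(updates, gamma):
    """Eq. (20): SS = sum of S_u over the H recorded updates."""
    return sum(s_u(*u, gamma) for u in updates)
```

Plotting the per-update values returned by `s_u` against u gives the S_u curve described above.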
4.1.8 Updating Speed

So far we have defined several useful measures for describing the concept drifting system. Our main goal is to judge whether there is concept drifting and how much the concepts are drifting in the learning system based on these defined measures, especially the SS values. According to these measures, we are able to set the corresponding parameters in our proposed OOIL framework introduced in Section 3.

As mentioned above, the parameter α, appearing in Equation (13), is used to control the updating speed for the components' drifting, while the parameter β in Equation (14) is utilized to control the weights of newly-added components, which reflects their "adding speed". The parameter α, ranging from 0 to 1.0, represents the Updating Intensity (UI) of the Global Model Incremental Updating (GMIU) procedure. Based on the values of System Stability, when the dynamic system drifts very quickly, more information from the newly-trained local model should be considered; on the contrary, when the system drifts slowly, the global model is considered the more reliable one. As a result, System Stability helps guide how to control the updating speed for the original global models: in the former case, the parameter α should be set larger, and in the latter case, smaller.

In addition, the parameter β in Equation (14) controls the components' "adding speed" and ranges from 0 to W_add (recall that W_add is the total weight of the newly-added components in the locally optimized model). Therefore, we name it the Newly-Added Component Ratio (NACR). When NACR is larger, the reallocated weights of the newly-added components are also larger, which means the newly-added components in the updated global model have relatively higher importance; when NACR is smaller, the newly-added components play a less important role in the updated global model.

At the initial stage, these two parameters are pre-set according to prior knowledge obtained from analyzing the defined tracking measures on the preliminary training data. As each new batch becomes available, existing tracking information such as the System Stability on all outdated batches is computed as reference information for tuning these two parameters. That is, if System Stability is getting larger, the two parameters are increased; otherwise, they are decreased.
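The paper specifies only the direction of these adjustments (increase both parameters when System Stability grows, decrease them otherwise); the step size and clamping in the following sketch are our own assumptions:

```python
# Hedged sketch of the tuning rule described above: raise the updating
# speed parameters when System Stability grows, lower them when it
# shrinks. The step size and [0, 1] clamping are our assumptions; the
# paper does not specify them.
def tune(alpha, beta_star, ss_prev, ss_curr, step=0.05):
    direction = 1 if ss_curr > ss_prev else -1
    clamp = lambda v: min(1.0, max(0.0, v))
    return clamp(alpha + direction * step), clamp(beta_star + direction * step)
```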
5. EXPERIMENTS

We execute a series of experiments on the development data set of TREC Video Retrieval Evaluation (TRECVID) 2004 [12]. Our goal is to investigate the tracking measures of concept drifting on this data set, and at the same time to evaluate the OOIL framework by comparing it with several other similar schemes.

5.1 Data Set and Feature

The TRECVID 2004 development data [13] used in our experiments include 114 news videos (spanning four months in 1998) from ABC and CNN, about 60 hours in total; each news video is about half an hour. We deal with the ABC and CNN collections separately, since the properties of videos from the two different stations (sources) are quite different. For each collection, the 57 ABC/CNN news videos are divided into 11 data batches (groups) along the timeline, with 5 videos (about 2.5 hours of video, recorded every other day or so) per batch.

We select two semantic concepts as samples to test our schemes: basketball and Studio_Setting [12]. All pre-labeled shots for the concepts are from the annotations provided by TRECVID 2004. The training data set in each batch is derived by extracting features from all frames in the positive-labeled shots. In total, there are 520 positive basketball shots (139 for ABC, 381 for CNN; 2% of the whole collections) and 6225 positive Studio_Setting shots (3223 for ABC, 3032 for CNN; 19% of the whole collections).
The test data set consists of all the key-frames (possibly multiple key-frames per shot in the common shot boundary reference, also provided by TRECVID 2004) in the whole data batch. One video (about half an hour) in each batch is considered as the reference video, especially for our OOIL framework. In addition, the feature used in our experiments is the correlogram in HSV color space, a 144-dimension feature vector in total (denoted HSVCorrelogram144 hereinafter), as presented in [19]. For lack of sufficient training samples in the current data batch to train such a high-dimension GMM, the standard Principal Component Analysis (PCA) method is applied to reduce the feature dimension to 6. In addition, the mean of each component has not been normalized to zero in our experiments.
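The dimension-reduction step can be sketched with a plain SVD-based PCA (the 144-d input here is random stand-in data, not the real correlogram features):

```python
import numpy as np

# Sketch of the dimension-reduction step: project (hypothetical) 144-d
# HSVCorrelogram144 features onto their top-6 principal components via
# an SVD of the centered data. sklearn's PCA would work equally well.
def pca_reduce(X, k=6):
    """Reduce an (n x d) feature matrix to its first k principal components."""
    Xc = X - X.mean(axis=0)                        # center the features
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                           # scores on the top-k axes
```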
5.2 Insight into Data with Tracking Measures

In this subsection, we examine the tracking measures (defined in Section 4.1) for the concept tracking system based on the TRECVID 2004 development data set. According to the statistical properties obtained directly from the data, we survey how to utilize the information of the data itself in order to track concept drifting in a more sophisticated way. To avoid unpredictable differences between the performances of different learning systems, we directly train the models from the pre-labeled training data on each batch. For an under-investigation concept, 11 GMMs in total are trained from the pre-labeled samples in the 11 data batches. If there is concept drifting among these 11 data batches, these models will vary remarkably. Our goal in this subsection is to track the differences among these GMMs. For convenience, in the experiments of this section, the number of components of these GMMs is always set to 5. Therefore, it should be noted that the measures illustrated in this subsection are not exactly the same as (but are estimations of) the ones defined in Section 4.1.

5.2.1 Component Drifting Distance and Component Saliency Drifting Curve

As mentioned in Section 4.1, the corresponding components between two adjacent GMMs (trained from adjacent batches) may drift along the timeline; the KL divergence between them is defined as the Component Drifting Distance (CDD). In Section 4.1, Figure 3 gave a toy example for CDD. In this subsection, we introduce how to obtain a CDD figure from a real-world data set such as the TRECVID 2004 development data set. After training all the GMMs on all the data batches, we compare the differences between adjacent model pairs. For simplicity, we do not consider the case of adding or deleting components here, and all the GMMs have 5 components. Our goal here is to track the drifting information between these GMMs and search for the nearest pair of components between adjacent GMMs. After comparing pairwise KL divergences, we regard the matched pair of components as belonging to the same group. Thereafter, we label all these GMM components (5 × 11 = 55 in total) into five groups denoted C-1, C-2, ..., C-5, similar to the case illustrated in Figure 2. All the components in a group are considered to keep drifting along the timeline. CDD is then computed between the adjacent two components belonging to the same group.

Besides CDD, the Component Saliency Drifting Curve can also be illustrated explicitly, which reflects the stability of one particular component (the figure is omitted due to space limitations).
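The grouping step above (matching each component to its nearest neighbor in the next batch's GMM) can be sketched as follows; 1-D Gaussians are used for brevity, and all numbers are hypothetical:

```python
import numpy as np

# Sketch of the grouping step: each component of batch t's GMM is
# matched to its nearest component in batch t+1's GMM by symmetric KL.
# 1-D Gaussians (mean, variance) are used here for brevity; the real
# features are 6-dimensional.
def sym_kl_1d(m0, v0, m1, v1):
    kl = lambda ma, va, mb, vb: 0.5 * (va / vb + (mb - ma) ** 2 / vb
                                       - 1 + np.log(vb / va))
    return kl(m0, v0, m1, v1) + kl(m1, v1, m0, v0)

def match_components(gmm_a, gmm_b):
    """For each (mean, var) in gmm_a, the index of its nearest match in gmm_b."""
    return [int(np.argmin([sym_kl_1d(m, v, m2, v2) for m2, v2 in gmm_b]))
            for m, v in gmm_a]
```

Chaining these matches across the 11 batch GMMs yields the five component groups C-1, ..., C-5, and the per-step divergences along each chain are the CDD values.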
5.2.2 System Stability Comparison

As aforementioned, the SS value reflects the system stability from a global viewpoint. To compare the System Stability between different data collections, the SS values for concept Studio_Setting on both the ABC and CNN collections are illustrated in Figure 4. Since we do not consider the adding and deleting cases here, the S_u value defined in Equation (21) simplifies to

S_u = Σ_{j=1}^{k(c)} α_j CDD^c_{u,j} δ(u, j).    (23)

[Figure 4. System Stability Comparison between ABC and CNN - Studio_Setting.]

It can be found that the S_u values differ across data batches within the ABC/CNN collections, and they also differ between the ABC and CNN collections. For the Studio_Setting concept, SS_{ABC+S} = 560.4 and SS_{CNN+S} = 18132.0; for the basketball concept, SS_{CNN+B} = 652.9. As SS_{ABC+S} is much smaller than SS_{CNN+S}, it can be concluded that concept Studio_Setting drifts much more evidently in the CNN collection than in the ABC collection. Moreover, in the CNN collection, concept basketball is much more stable than concept Studio_Setting, but it drifts a little more evidently than Studio_Setting does in the ABC collection. As a result, we update the models faster when tracking concept Studio_Setting on the CNN collection than on the ABC collection, and the updating speed for concept basketball is a little faster than that for concept Studio_Setting in ABC. In addition, these results explain why the schemes' performances on the ABC collection are a little better than those on the CNN collection, as detailed in Section 5.3.3.

This section has illustrated some of the defined tracking measures on the ABC and CNN collections, which helps to find a more sophisticated way to update the models. Based on these illustrated figures of the defined tracking measures, we are able to obtain more reliable knowledge on how to set the initial corresponding parameters involved in the learning systems.

5.3 OOIL Framework and Other Schemes

To evaluate the proposed OOIL framework in tracking drifting concepts, we compare it with a related scheme and with the same scheme under different settings.

5.3.1 Evaluation on TRECVID Data

We use the evaluation method applied in the TRECVID 2004 task on "high-level feature extraction". Due to multiple key-frames in common shots, the maximum value among the GMM's outputs of all the key-frames within one shot is considered as the final score of this shot.
The final classification result is a ranked shot list for each concept. An evaluation tool named trec_eval [15] is used to compute the AP (Average Precision) value [19] as the ultimate evaluation measure. This evaluation method mainly focuses on returning a highly-ranked list, and there is no need to determine whether each sample is positive or not. To achieve a higher AP value, the system must rank the most-likely samples as high in the list as possible (see the appendix of the TREC-10 Proceedings on common evaluation measures for detailed information [20]).

5.3.2 Schemes for Comparison

As we have presented, the OOIL framework has three major features, i.e., effectively utilizing unlabeled samples in under-test batches, GMM local adaptation (LA), and Global Model Incremental Updating (GMIU). Experiments are designed to evaluate the effectiveness of each feature. Accordingly, three different schemes are designed as follows.

S1 - Our proposed OOIL framework.

S2 - OOIL but using updated global models for classification: compared with S1, the online-updated global GMMs are employed for online classification.

S3 - Allowed to check more "labeled" data: similar to S1, except that the local models are obtained directly from the manually labeled training set of the current under-test batch, instead of from unlabeled samples via LA as presented in Section 3.3.2. Obviously, this scheme uses more training data than S1 and S2, which need only a preliminary pre-labeled training data set while the up-coming batches contain no labeled data. We will show that the performance of our proposed scheme (S1) is close to that of this scheme, but without using any labeled samples from the current under-test batch.

5.3.3 Results

We apply these different schemes to the ABC and CNN data collections separately. For Studio_Setting, the scheme comparison is executed on both the ABC and CNN collections.
For basketball, the scheme comparison is executed only on the CNN collection, since there are very few basketball shots in the ABC collection.

Table 3. Overall AP Comparisons on ABC and CNN collections

Collection  Concept         S1     S2     S3
ABC         Studio_Setting  0.139  0.098  0.203
CNN         basketball      0.039  0.028  0.052
CNN         Studio_Setting  0.124  0.098  0.194

Table 3 shows the overall performance comparisons of the different schemes for different concepts on the ABC and CNN collections. For each scheme, there is an overall Average Precision (AP) value over all batches (that is, the average AP value along the timeline). From this table, it can be concluded that our proposed OOIL framework achieves better overall performance than S2 (though for a few batches its performance is not the best), and is very close to the ideal case (scheme S3) on both the ABC and CNN data collections. That is to say, our OOIL framework is an effective learning system for tracking concept drifting.

Figure 5. Comparisons on CNN Collection - basketball.

For concept basketball on the CNN collection, the series of Average Precision values of each scheme is represented by a corresponding curve, as illustrated in Figure 5. The x-coordinate is the index of the data batch and the y-coordinate is the Average Precision value for that batch. In addition, it can be seen that for concept Studio_Setting, the overall performance on the ABC collection (0.139) is better than that on the CNN collection (0.124). These results can be explained by the comparison of System Stability illustrated in Section 5.2.2.

5.3.4 Parameters Setting

Parameters for GMMs: based on our experiments, the number of components of the global GMM is pre-set to 5, and the number of components of the reference GMM (recall that it is trained from the reference video in the OOIL framework) is 30.
In the GMM training procedure, the maximum number of iterations is set to 100, and the alternative termination condition for the basic EM algorithm is that the difference between two successive iterations is smaller than a pre-defined threshold, which is preset to 0.01.

Parameter UI: as aforementioned, the parameter UI stands for the Updating Intensity of the global model updating procedure. The dynamic system may drift very quickly or relatively slowly; in the former case, the System Stability value will be larger than in the latter case. As a result, if a large System Stability value is calculated from the pre-labeled data set, the parameter should be set larger at the initial stage of the OOIL system, even close to 1.0; otherwise, it should be set smaller. From the analysis of the System Stability on the ABC and CNN data collections in Section 5.2.2, it can be found that for the Studio_Setting concept, the ABC collection is stabler than the CNN collection, i.e., the SS value on CNN is larger than that on ABC. Furthermore, concept Studio_Setting on the ABC collection is stabler than concept basketball on CNN.

Parameter NACR: the parameter NACR is applied when a certain component in the locally optimized model is added to the global model. In our experiments, the range of NACR is set to [0, 0.5].

Threshold for the GMIU Process: in the Global Model Incremental Updating (GMIU) procedure of the OOIL framework, there is a pre-defined threshold named KLDThresh, which controls the alternative condition of whether to drift an old component in the global model or to add a new component from the local model to the global model. If the KL divergence defined in Equation (10) is larger than KLDThresh, the corresponding new component is added into the global model; otherwise, the existing component is drifted according to the nearest component in the local model.
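The drift-or-add decision of the GMIU step can be sketched as follows. This is a hypothetical one-dimensional illustration: the actual framework operates on multivariate GMM components via Equation (10), and the linear drift rule with intensity `ui` shown here is our own assumption, not the paper's exact update:

```python
import math

def kl_gaussian(mu_p, var_p, mu_q, var_q):
    """Closed-form KL divergence KL(p || q) between two 1-D Gaussians."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def gmiu_step(global_comps, local_comp, kld_thresh=5.0, ui=0.1):
    """Add the local component if it is far from every global component
    (KL divergence above kld_thresh); otherwise drift the nearest global
    component toward it with Updating Intensity ui."""
    dists = [kl_gaussian(*local_comp, *g) for g in global_comps]
    nearest = min(range(len(dists)), key=dists.__getitem__)
    if dists[nearest] > kld_thresh:
        global_comps.append(local_comp)            # add as a new component
    else:
        mu_g, var_g = global_comps[nearest]
        mu_l, var_l = local_comp
        global_comps[nearest] = ((1 - ui) * mu_g + ui * mu_l,
                                 (1 - ui) * var_g + ui * var_l)
    return global_comps

comps = [(0.0, 1.0), (10.0, 1.0)]
gmiu_step(comps, (0.5, 1.0))   # close to the first component: drift it
gmiu_step(comps, (50.0, 1.0))  # far from all components: add it
print(comps)
```

A larger UI pushes the global model faster toward the newest batch, which matches the observation above that less stable collections (larger SS) warrant a larger UI.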
As a summary, for the OOIL framework, the initial pre-set parameters for concepts basketball and Studio_Setting on both the ABC and CNN collections are listed in Table 4.

Table 4. OOIL Parameter Pre-setting

Collection  Concept         UI    NACR  KLDThresh
ABC         basketball      -     -     -
ABC         Studio_Setting  0.1   0.05  5.0
CNN         basketball      0.15  0.05  5.0
CNN         Studio_Setting  0.3   0.2   5.0

(The dashes indicate that basketball is not evaluated on the ABC collection.)

Adaptive Threshold Determination: the parameters of the OOIL framework (UI, NACR, and KLDThresh) are first pre-set as described above at the initial stage. As more new batches become available, the corresponding parameters are adaptively adjusted according to the existing tracking information, such as the SS values, obtained from the Global Model Incremental Updating (GMIU) process. These adjusted parameters are then applied in the next learning process for the new upcoming batch.

6. CONCLUSIONS AND FUTURE WORKS

In this paper, we have proposed a set of systematic measures for tracking concept drifting based on a novel Online-Optimized Incremental Learning framework. A limitation of the proposed concept drifting tracking scheme is that it currently only works for GMM-based learning systems. However, similar methods may be easily applied to other generative-model-based learning algorithms. Future work will focus on applying our framework to more semantic concepts and larger data collections, as well as on finding a better or optimal way to determine the systematic parameters. Furthermore, studying concept drifting for discriminative-model-based learning systems (such as SVM and boosting) may be another direction for future work.

7. REFERENCES

[1] A. Tsymbal. The problem of concept drift: definitions and related work. Available at: http://www.cs.tcd.ie.
[2] G. Widmer, et al. Learning in the presence of concept drifting and hidden contexts. Machine Learning, 23(1), 1996.
[3] R. Klinkenberg, et al. Detecting Concept Drift with Support Vector Machines. ICML 2000, pp. 487-494.
[4] R. Klinkenberg. Using Labeled and Unlabeled Data to Learn Drifting Concepts. IJCAI-2001 Workshop on Learning from Temporal and Spatial Data, pp. 16-24.
[5] W. Fan. Systematic Data Selection to Mine Concept-Drifting Data Streams. ACM SIGKDD 2004.
[6] W. Fan. StreamMiner: A Classifier Ensemble-based Engine to Mine Concept Drifting Data Streams. VLDB 2004.
[7] H.X. Wang, et al. Mining Concept-Drifting Data Streams using Ensemble Classifiers. ACM SIGKDD 2003.
[8] M. Figueiredo, et al. On Fitting Mixture Models. Energy Minimization Methods in Computer Vision and Pattern Recognition, E. Hancock and M. Pellilo (Eds.), Springer-Verlag, 1999.
[9] S. Kullback. Information Theory and Statistics. J. Wiley & Sons, New York, 1959.
[10] M. West, J. Harrison. Bayesian Forecasting and Dynamic Models. Springer-Verlag, New York, 1989.
[11] J. Wu, et al. An Online-Optimized Incremental Learning Framework for Video Semantic Classification. ACM MM'04.
[12] TREC Video Retrieval Evaluation (NIST, USA) Homepage. Available at: http://www-nlpir.nist.gov/projects/trecvid/.
[13] TREC Video Retrieval Evaluation Past Data. Available at: http://www-nlpir.nist.gov/projects/trecvid/trecvid.data.html.
[14] C.-Y. Lin, B. Tseng, J.R. Smith (IBM T.J. Watson Research Center). Video Collaborative Annotation Forum: Establishing Ground-Truth Labels on Large Multimedia Datasets. Available at: http://www-nlpir.nist.gov/projects/tvpubs/tvpapers03/ibm.final2.paper.pdf.
[15] TREC Video Retrieval Evaluation Tools. Available at: http://www-nlpir.nist.gov/projects/trecvid/trecvid.tools/.
[16] S.H. Clearwater, T.-P. Cheng, H. Hirsh, B.G. Buchanan. Incremental batch learning. Proceedings of the Sixth International Workshop on Machine Learning, Ithaca, New York, USA, pp. 366-370, 1989.
[17] D.A. Pomerleau. Efficient Training of Artificial Neural Networks for Autonomous Navigation. Neural Computation, vol. 3, 1991.
[18] L. Zhang, F.Z. Lin, B. Zhang. A CBIR method based on color-spatial feature. IEEE Region 10 Annual International Conference (TENCON'99), Cheju, Korea, 1999, pp. 166-169.
[19] J. Huang, S.R. Kumar, M. Mitra, et al. Image Indexing Using Color Correlograms. IEEE Conference on CVPR 1997, pp. 762-768.
[20] TREC-10 Proceedings appendix on common evaluation measures. Available at: http://trec.nist.gov/pubs/trec10/appendices/measures.pdf.