Negative Correlation Discovery for Big Multimedia Data Semantic Concept Mining and Retrieval

Yilin Yan, Mei-Ling Shyu
Department of Electrical and Computer Engineering, University of Miami, Coral Gables, FL 33146, USA
Email: [email protected], [email protected]

Qiusha Zhu
Senzari, Inc., 601 Brickell Key Drive, Miami, FL 33131, USA
Email: [email protected]

Abstract—With massive amounts of data being produced each day in almost every field, traditional data processing techniques have become more and more inadequate. However, research on effectively managing and retrieving these big data is still under development. Multimedia high-level semantic concept mining and retrieval in big data is one of the most challenging research topics, and it requires joint efforts from researchers in both the big data mining and multimedia domains. In order to bridge the semantic gap between high-level concepts and low-level visual features, correlation discovery in semantic concept mining is worth exploring. Meanwhile, correlation discovery is a computationally intensive task in the sense that it requires a deep analysis of very large and growing repositories. This paper presents a novel system for discovering negative correlations for semantic concept mining and retrieval. It is designed to fit the Hadoop MapReduce framework and is further extended to utilize Spark, a more efficient and general cluster computing engine. The experimental results demonstrate the feasibility of utilizing big data technologies in negative correlation discovery.

Keywords—Big Data, Negative Correlation, Hadoop, MapReduce, Spark, Multimedia Semantic Mining and Retrieval, Information Integration.

I. INTRODUCTION

In recent years, we have witnessed a deluge of big multimedia data such as texts, images, and videos. The data is big in terms of volume, variety, velocity, etc. For instance, Facebook ingests 500 terabytes of new data every day, while Walmart handles more than 1 million customer transactions every hour. Take the Boeing 737 as another example: it generates 240 terabytes of flight data during a single flight across the US, as its infrastructure and sensors produce massive log data in real time. Big data are not just numbers, dates, and strings, but also geospatial data, 3D data, audio data, video data, and unstructured texts (including log files and social media). Therefore, different approaches, techniques, tools, and architectures are required for mining and retrieving knowledge, and these are still in the development stage [1][2][3][4][5].

Despite the ubiquity of big multimedia data, effective management and retrieval of big multimedia data are considered challenging research topics. For example, the conventional tag-based indexing and searching technologies suffer from the noisy and missing tag issue. As a result, more and more researchers turn to content-based approaches [6][7][8][9][10][11][12][13][14]. One main research task in the content-based big multimedia data retrieval field is multimedia concept mining and retrieval [15][16][17][18][19][20][21][22], which focuses on mining semantic concepts such as face, car, and airplane directly from raw videos. Many approaches treat each concept as an individual class and convert one multi-concept detection problem into multiple binary classification problems [23]; this, however, ignores the correlations among different concepts. Nevertheless, the concepts are correlated in real-world multimedia data sets.
For instance, some concepts co-occur frequently, such as sky and cloud, while others rarely co-occur, such as road and fish. Such correlations provide important context cues that can assist concept detection. Therefore, calculating the correlations between concepts can greatly help semantic concept mining and retrieval. There are many types of correlations, including the Pearson correlation, the Spearman correlation, and cross-correlation, which quantify how strongly two variables are associated [24].

One major challenge in correlation discovery is the huge volume of the related datasets. With the rapid development of multimedia, communication, and Web 2.0 technologies, massive amounts of multimedia data have become increasingly available on desktops and smart mobile devices via the Internet. Statistics show that 72 hours of video with all sorts of tags are uploaded to YouTube every minute and about 1.54 million photos are uploaded to Flickr every day [25]. Accordingly, the annual TREC Video Retrieval Evaluation (TRECVID) competition organized by the National Institute of Standards and Technology (NIST) has a "Semantic Indexing" task for concept detection from a large amount of videos collected from the Internet [26].

In this paper, we focus on a novel framework that calculates negative concept correlations. In order to do this efficiently, two popular and widely used big data technologies, namely Hadoop and Spark, are adopted. The Apache Hadoop software library is designed to scale up from a single server to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, so it delivers a highly available service on top of a cluster of computers, each of which may be prone to failures. Apache Spark is a fast and general engine for large-scale data processing. Compared to Hadoop, Spark is more advanced and contains many new features, and has therefore been deployed in many popular big data systems. Using these techniques, an efficient system is built to discover negative concept correlations from multimedia big data.

In summary, the contributions of this paper are as follows. First, we propose a novel ICF (Integrated Correlation Factor) algorithm to estimate negative correlations. Second, an efficient big multimedia data system using ICF-based negative correlation discovery is built. Finally, ICF is utilized for multimedia concept mining and retrieval with promising performance.

This paper is organized as follows. In Section II, we list and introduce several types of correlations and discuss their usages. In Section III, our proposed framework, which includes negative correlation discovery and the ICF (Integrated Correlation Factor) algorithm, is discussed. In Section IV, the proposed MapReduce algorithm on Hadoop is introduced. In Section V, an advanced system based on Spark is presented with its important modules explained in detail. Section VI shows the comparison results of using the generated ICF values on the TRECVID 2015 dataset. Section VII draws the conclusion and identifies future research directions.

II. CORRELATION COEFFICIENT

Based on the category of the input data, correlations can be divided into the following four types.

A. Nominal Scale

As the name implies, a nominal scale simply places the data into categories, without any order or structure.
Nominal scales are used for labeling variables without any quantitative values. "Nominal" essentially means "name", so nominal scales could simply be called "labels". A physical example of a nominal scale is the set of terms we use for colors: the underlying spectrum is ordered, but the names are nominal. In research activities, a YES/NO scale is nominal; it has no order and there is no distance between YES and NO.

B. Ordinal Scale

With ordinal scales, it is the order of the values that carries the importance and significance, so smaller (<) and bigger (>) can be applied, but the differences between values are not really known. The simplest ordinal scale is a ranking. There is no objective distance between any two points on the subjective scale: in one case the top item may be far superior to the second, while in another case the distance may be subjectively small. An ordinal scale only lets you interpret a gross order, not the relative positional distances. Spearman's rank correlation coefficient is a nonparametric measure of statistical dependence between two variables. It assesses how well the relationship between two variables can be described using a monotonic function. If there are no repeated data values, a perfect Spearman correlation of -1 or +1 occurs when each of the variables is a perfect monotone function of the other.

C. Interval Scale

Interval scales are numeric scales in which not only the order but also the exact differences between the values are known. The classic example of an interval scale is Celsius temperature. An interval scale is convenient because it opens up the realm of statistical analysis on such data sets. It applies to numeric variables, and addition (+) and subtraction (−) are meaningful. The Pearson product-moment correlation coefficient (ρ or r) is a measure of the linear correlation between two variables X and Y, giving a value between -1 and +1 inclusive, where -1 is a total negative correlation, 0 is no correlation, and +1 is a total positive correlation. It is widely used in the sciences as a measure of the degree of linear dependence between two variables.

D. Ratio Scale

Ratio scales are the ultimate nirvana when it comes to measurement scales because they tell us about the order, they tell us the exact value between units, and they also have an absolute zero, which allows a wide range of both descriptive and inferential statistics to be applied. A ratio scale provides a wealth of possibilities for statistical analysis. It applies to numeric variables with an absolute zero, such as temperature in Kelvin, mass, etc., and multiplication (×) and division (÷) can also be applied. This kind of scale is not often available in multimedia and social research, while most correlation coefficient methods for interval scales can also be applied to ratio scales.
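As a concrete illustration of the two coefficients discussed above, the following minimal Python sketch (an illustration added for this description, not part of the paper's system; it assumes SciPy is available and uses made-up binary concept labels) computes both the Pearson and the Spearman coefficients for two label vectors.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical binary label vectors for two concepts over 10 shots:
# 1 means the concept appears in the shot, -1 means it does not.
labels_sky   = np.array([ 1,  1, -1,  1, -1,  1,  1, -1,  1, -1])
labels_cloud = np.array([ 1,  1, -1,  1, -1, -1,  1, -1,  1,  1])

r, _   = pearsonr(labels_sky, labels_cloud)    # interval-scale coefficient
rho, _ = spearmanr(labels_sky, labels_cloud)   # rank-based (ordinal) coefficient

print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```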
III. INTEGRATED CORRELATION FACTOR (ICF)

While positive correlations are widely used in semantic concept mining and retrieval, very few research approaches explore negative correlations to improve the performance. Some studies speculated that negative correlations captured by a negative graph might slightly improve the performance. Jiang et al. [27] studied the effect of negative correlations and conducted an experiment on the TRECVID data set showing that the performance gain is merely 1.3% when using the negative graph alone.

Although the co-occurrence of two concepts in one video shot/image increases the probability that they are positively correlated, the fact that one concept does not occur when the other appears does not indicate that they are negatively correlated. For example, an image that depicts a cat with no human face in the picture does not mean that the appearance of a cat excludes the appearance of a face. This makes it more challenging to discover negative correlations. To address this challenge, a two-step hierarchical selection strategy is proposed in this study.

First, a conditional probability-based coarse filtering approach is applied. The purpose of this step is to eliminate irrelevant correlations in an efficient way. Specifically, for a target concept C_T, C_T^+ and C_T^- represent the events that a data instance is positive or negative for C_T. Likewise, for a reference concept C_R, C_R^+ and C_R^- represent the events that a data instance is positive or negative for C_R. If C_T and C_R are negatively correlated, the following conditions must hold:

\frac{P(C_T^+ \mid C_R^-)}{P(C_T^+)} > 1    (1)

\frac{P(C_T^- \mid C_R^+)}{P(C_T^-)} > 1    (2)

Here, P(E) indicates the probability of event E. The threshold of 1 in the above inequalities is not selected arbitrarily: it is the necessary condition for a negative correlation, i.e., P(C_T^+ | C_R^-) > P(C_T^+) and P(C_T^- | C_R^+) > P(C_T^-). The first inequality indicates that the probability of C_T appearing increases if C_R does not appear, while the second inequality indicates that the probability of C_T appearing decreases if C_R appears. From the association rule point of view, these two values are related to the conviction measure introduced in [28]. In fact, for each concept pair we only need to compute Equation (1), since

\frac{P(C_T^- \mid C_R^+)}{P(C_T^-)} = \frac{P(C_T^- C_R^+)}{P(C_R^+)\,P(C_T^-)} = \frac{P(C_R^+ \mid C_T^-)}{P(C_R^+)}    (3)

Therefore, once we have the result of Equation (1) for all concepts, the value of Equation (2) can simply be looked up in the result table (by switching the target concept and the reference concept), which saves half of the computation time. We then define a Negative Independent Coefficient (NIC) to measure the negative correlation between two concepts:

NIC(T, R) = \frac{P(C_T^+ \mid C_R^-)}{P(C_T^+)} + \frac{P(C_T^- \mid C_R^+)}{P(C_T^-)}    (4)

All concept pairs with an NIC value less than a certain threshold are filtered out. This step eliminates a large number of possible correlations and reduces the computational complexity significantly.

The second step filters the selected concept pairs more rigorously. The reason for adding this step is the observation that a huge number of data instances are not labeled and are given the inferred label 0, which is common in a large multimedia dataset since manual labeling is very expensive. For example, the concept pair "Indoor" and "Outdoor" should show a perfect negative correlation. However, in our experiment, 104054 out of 115806 instances have negative labels for both concepts. Therefore, the conditional probability P(C_Outdoor^+ | C_Indoor^-) is only 0.0786, which severely deviates from 1. The problem cannot be solved by simply discarding those data instances, as doing so would introduce a bias toward concluding that the two concepts are negatively correlated. Such a bias is clearly not what we want, since we wish to use correlation information to improve concept detection for all kinds of testing instances. In order to tackle this difficulty, a novel strategy is proposed.
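Before turning to that strategy, the first-step filtering can be made concrete. The following Python sketch (an illustration written for this description, not the paper's code) estimates the ratios in Equations (1) and (2) and the NIC value of Equation (4) from two binary label vectors.

```python
import numpy as np

def nic(target_labels, reference_labels):
    """Negative Independent Coefficient (Equation (4)) estimated from
    binary label vectors, where 1 = concept present and -1 = absent.
    Assumes both label values occur for each concept."""
    t = np.asarray(target_labels) == 1
    r = np.asarray(reference_labels) == 1

    # Ratio of Equation (1): P(T+ | R-) / P(T+)
    ratio1 = np.mean(t[~r]) / np.mean(t)
    # Ratio of Equation (2): P(T- | R+) / P(T-)
    ratio2 = np.mean(~t[r]) / np.mean(~t)
    return ratio1 + ratio2

# Hypothetical labels for a target and a reference concept over 8 shots.
target    = [ 1, -1, -1,  1, -1,  1, -1, -1]
reference = [-1,  1,  1, -1,  1, -1,  1,  1]
print(nic(target, reference))  # larger values suggest a stronger negative correlation
```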
The general assumption is that if two concepts are negatively correlated, their correlation would not be affected by the existence of a third concept, which is named a control concept in this study. To formulate this problem, we define an integrated correlation factor (ICF) between the target concept and the reference concept, as given in Equation (5):

ICF(T, R) = \frac{1}{|\Omega| - 2} \sum_{D \in \Omega,\, D \neq T,\, D \neq R} \rho(C_T, C_R \mid C_D^+)    (5)

Here, Ω indicates the set of all concepts and |Ω| indicates the total number of concepts. C_D represents the control concept, C_D^+ is the condition that a data instance is positive for C_D, and ρ(C_T, C_R | C_D^+) indicates the Pearson product-moment correlation coefficient [29] between the labels of C_T and C_R given C_D^+. The reason for adding the control concept C_D is that the ICF then represents an average quantitative metric of the correlation under different conditions. For special cases where ρ(C_T, C_R | C_D^+) is not defined, default values are assigned as given in Table I. In this table, "All C_T^1" indicates that all data instances are positive for C_T and "All C_T^-1" indicates that all data instances are negative for C_T; correspondingly, "All C_R^1" and "All C_R^-1" have similar meanings. All these special cases occur because the data instances have unique labels for either C_T or C_R, in which case the Pearson correlation coefficient is not defined. As shown in Table I, as long as C_T and C_R co-occur at least once, the value is set to a positive value, which imposes a relatively large penalty on that concept pair. Our empirical studies showed that when this value is 0.5, most of the significant negative concept pairs are captured.

TABLE I
THE VALUES TO SET UNDER SPECIAL CONDITIONS

C_T                            | C_R                            | Value To Set
All C_T^1                      | All C_R^1                      | 1
All C_T^-1                     | All C_R^-1                     | 0
All C_T^1                      | All C_R^-1                     | Average value of negative Pearson correlation coefficients
All C_T^-1                     | All C_R^1                      | Same as above
Both C_T^-1 and C_T^1 appear   | All C_R^-1                     | Same as above
All C_T^-1                     | Both C_R^1 and C_R^-1 appear   | Same as above
All C_T^1                      | Both C_R^1 and C_R^-1 appear   | 0.5
Both C_T^-1 and C_T^1 appear   | All C_R^1                      | 0.5

After sorting all the ICF values in ascending order, we observe that the combined correlation coefficients follow a quasi-normal distribution. Hence, a Gaussian probability density function is fit to all the ICF values, and different thresholds corresponding to significance levels such as 95% and 67% can be set to select the significant negative correlations. As the Pearson product-moment correlation coefficient is symmetric, the concept pairs are selected without distinguishing the two concepts; the concept whose corresponding detector is less accurate is then chosen as the target concept and the other one as the reference concept. The selection results are presented in Section VI-B.
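To make Equation (5) concrete before moving to the distributed implementations, here is a minimal Python sketch (an illustration, not the paper's code; the special-case handling of Table I is simplified to a single fallback constant).

```python
import numpy as np

def icf(labels, target, reference, fallback=0.5):
    """Integrated Correlation Factor (Equation (5)).

    labels: dict mapping concept name -> numpy array of 1/-1 labels per shot.
    For every control concept D other than the target and the reference, the
    Pearson correlation of the target and reference labels is computed over
    the shots that are positive for D, then averaged. Table I of the paper
    assigns several different defaults for undefined correlations; here a
    single fallback value is used for simplicity.
    """
    others = [d for d in labels if d not in (target, reference)]
    total = 0.0
    for d in others:
        mask = labels[d] == 1                     # shots positive for the control concept
        t, r = labels[target][mask], labels[reference][mask]
        if len(t) < 2 or t.std() == 0 or r.std() == 0:
            total += fallback                     # correlation undefined -> default value
        else:
            total += np.corrcoef(t, r)[0, 1]      # Pearson correlation given C_D^+
    return total / len(others)
```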
IV. CORRELATION DISCOVERY BY HADOOP MAPREDUCE

MapReduce (MR) is a popular programming model introduced by Google [30], on which distributed applications can be developed to efficiently process large amounts of data. Hadoop [31], its open-source implementation, provides MR together with a distributed file system called the Hadoop Distributed File System (HDFS). Hadoop splits the files into large blocks and distributes them amongst the nodes in the cluster. To process the data, Hadoop MR transfers the packaged code to the nodes so that each node processes its portion of the data in parallel [32]. An MR program consists of two user-defined functions: a map function to process pieces of the input data called input splits, and a reduce function to aggregate the outputs of the invocations of the map function. Both functions use user-defined key-value pairs as the input and output.

A collection of videos can be represented by Table II. Let S denote a shot in a video, C denote a concept, and v denote whether a concept appears in a shot, where value 1 represents that the concept appears in the shot and value -1 represents otherwise. Assume that there are in total M concepts and N shots in this video collection.

TABLE II
A VIDEO DATASET

Shot ID | Concept ID | Value
S1      | C1         | v11
...     | ...        | ...
S1      | CM         | vM1
S2      | C1         | v12
...     | ...        | ...
S2      | CM         | vM2
...     | ...        | ...
SN      | C1         | v1N
...     | ...        | ...
SN      | CM         | vMN

To build the MR framework, we read the concept IDs and the corresponding values from each video file and build the <Key, Value> pairs. Hence, each video file yields several <Key, Value> pairs like:

Key = (C1, C2), Value = (v1, v2)
Key = (C1, C3), Value = (v1, v3)
...
Key = (C1, CM), Value = (v1, vM)
Key = (C2, C1), Value = (v2, v1)
...
Key = (C2, CM), Value = (v2, vM)
...
Key = (CM−1, CM), Value = (vM−1, vM)

The Map() function is used to generate the key-value pairs mentioned above, while the Reduce() function is used to calculate the conditional probability value in Equation (1) for each key. As discussed before, these two functions use the defined key-value pairs as the input and output. Different from many symmetric correlations such as the Pearson and Spearman correlations, conditional probability values are asymmetric; therefore, there are in total N × M × M keys and M × M outputs. However, since so many keys are generated, the computation gain is limited: the mappers' running time complexity is O(NM^2) while the reducers' is O(M^2). Even though running multiple reducers in parallel yields a speed improvement that grows with the cluster scale, the time spent in the mapping phase is much more than the time saved in the reducing phase. In the next step, we can filter the outputs as mentioned in Section III to generate the ICF values for each concept pair.
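The following Python sketch simulates this Map()/Reduce() logic in a single process (a hedged illustration of the algorithm described above, not the actual Hadoop job, which would be written against the Hadoop Java or streaming API; the concept names and data are made up).

```python
from collections import defaultdict

def map_phase(shot_records):
    """shot_records: list of dicts {concept_id: value}, one dict per shot,
    with value 1 if the concept appears in the shot and -1 otherwise.
    Emits ((C_i, C_j), (v_i, v_j)) for every ordered concept pair, as in Section IV."""
    for record in shot_records:
        concepts = sorted(record)
        for ci in concepts:
            for cj in concepts:
                if ci != cj:
                    yield (ci, cj), (record[ci], record[cj])

def reduce_phase(pairs):
    """Groups the mapper output by key and estimates the ratio of Equation (1),
    P(C_T^+ | C_R^-) / P(C_T^+), for each (target, reference) key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    results = {}
    for (target, ref), values in grouped.items():
        p_t_pos = sum(vt == 1 for vt, _ in values) / len(values)
        ref_neg = [vt for vt, vr in values if vr == -1]
        if ref_neg and p_t_pos > 0:
            results[(target, ref)] = (sum(v == 1 for v in ref_neg) / len(ref_neg)) / p_t_pos
    return results

# Tiny made-up example with three concepts and four shots.
shots = [{"sky": 1, "cloud": 1, "fish": -1},
         {"sky": 1, "cloud": -1, "fish": -1},
         {"sky": -1, "cloud": -1, "fish": 1},
         {"sky": -1, "cloud": -1, "fish": 1}]
print(reduce_phase(map_phase(shots)))
```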
V. CORRELATION DISCOVERY ON SPARK

In contrast to Hadoop's two-stage, disk-based MapReduce paradigm, Spark's multi-stage in-memory primitives provide performance up to 100 times faster for certain applications [33][34][35]. Spark provides many operations called transformations, such as map, filter, flatMap, sample, groupByKey, reduceByKey, union, join, cogroup, mapValues, sort, and partitionBy, while Hadoop only provides two operations, namely map and reduce. Spark Core is the foundation of the overall project; it provides distributed task dispatching, scheduling, and basic I/O functionalities. The fundamental programming abstraction is called Resilient Distributed Datasets (RDDs), a logical collection of data partitioned across machines. RDDs can be created by referencing datasets in external storage systems or by applying coarse-grained transformations (e.g., map, filter, reduce, join) to existing RDDs. The RDD abstraction is exposed through a language-integrated API in Java, Python, Scala, and R, similar to local, in-process collections. This simplifies the programming complexity because applications manipulate RDDs in much the same way as local collections of data. Our Spark solution is implemented as a single Java driver class that uses the rich Spark API to find the correlations of all concepts versus all concepts.

In order to build a better correlation discovery system using Spark, we first read the keys (shot IDs) and values in Table II. After reading the data from all video files, we can group the values as an array or an iterator in Java using combineByKey. Table III shows the resulting data representation.

TABLE III
CORRELATION MATRIX

    | C1  | C2  | C3  | ... | CM
S1  | v11 | v21 | v31 | ... | vM1
S2  | v12 | v22 | v32 | ... | vM2
... | ... | ... | ... | ... | ...
SN  | v1N | v2N | v3N | ... | vMN

After grouping, each concept is associated with the vector of its values over all shots:

Key = (C1), Value = (v11, v12, ..., v1N)
Key = (C2), Value = (v21, v22, ..., v2N)
Key = (C3), Value = (v31, v32, ..., v3N)
...
Key = (CM), Value = (vM1, vM2, ..., vMN)

To correlate all concepts versus all concepts, we have to create the Cartesian product of these groups with themselves. This generates all possible combinations of (Ci, Cj). These combinations are the output values of the Cartesian product, which become the input keys for computing the correlations:

Key = (C1, C2), Value = (v1[], v2[])
Key = (C1, C3), Value = (v1[], v3[])
...
Key = (C1, CM), Value = (v1[], vM[])
Key = (C2, C1), Value = (v2[], v1[])
...
Key = (C2, CM), Value = (v2[], vM[])
...
Key = (CM−1, CM), Value = (vM−1[], vM[])

Now, we are ready to calculate the conditional probability values for all possible combinations of concepts.
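The paper's driver is a Java class; as a rough PySpark rendering of the same pipeline (an illustration only, with made-up records and function names), the sketch below groups the labels by concept, forms the Cartesian product, and computes the Equation (1) ratio for every ordered concept pair. The NIC filtering and ICF steps described next would follow as additional filter and map stages.

```python
from pyspark import SparkContext

sc = SparkContext(appName="negative-correlation-discovery")

# (concept_id, (shot_id, value)) records, value is 1 or -1; tiny made-up sample.
records = sc.parallelize([
    ("sky", ("s1", 1)), ("cloud", ("s1", 1)), ("fish", ("s1", -1)),
    ("sky", ("s2", 1)), ("cloud", ("s2", -1)), ("fish", ("s2", -1)),
    ("sky", ("s3", -1)), ("cloud", ("s3", -1)), ("fish", ("s3", 1)),
])

# Group each concept's labels into one vector ordered by shot ID (rows of Table III).
vectors = records.groupByKey().mapValues(
    lambda vals: [v for _, v in sorted(vals)])

def eq1_ratio(pair):
    """Ratio of Equation (1), P(T+ | R-) / P(T+), for one ordered concept pair."""
    (t_name, t), (r_name, r) = pair
    p_t_pos = sum(v == 1 for v in t) / len(t)
    t_given_r_neg = [tv for tv, rv in zip(t, r) if rv == -1]
    ratio = (sum(v == 1 for v in t_given_r_neg) / len(t_given_r_neg)) / p_t_pos \
        if t_given_r_neg and p_t_pos else 0.0
    return (t_name, r_name), ratio

# All-versus-all pairs via the Cartesian product, excluding self-pairs.
ratios = (vectors.cartesian(vectors)
                 .filter(lambda p: p[0][0] != p[1][0])
                 .map(eq1_ratio))

print(ratios.collect())
```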
Afterward, we can filter out half of the key pairs, namely those with i > j in (Ci, Cj), and generate the NIC values using Equation (4). The key pairs with NIC values less than a threshold are discarded. In the next step, the ICF values are generated. We create new key-value pairs as follows: the keys are those that survived the NIC-based filtering and the values remain the same; however, for these keys the target concept and the reference concept are now distinguished, as required by Equation (5). For each key pair, the remaining concepts serve as the control concepts. The ICF values are then generated for each key pair, and the final output is shown as follows:

Key = (C1, C2), Value = (ICF1,2)
...
Key = (C1, CM), Value = (ICF1,M)
Key = (C2, C3), Value = (ICF2,3)
...
Key = (C2, CM), Value = (ICF2,M)
...
Key = (CM−1, CM), Value = (ICFM−1,M)

VI. EXPERIMENTS AND RESULTS

A. Dataset

In the experiment, the IACC.1.A dataset is chosen from the TRECVID 2015 benchmark [36], whose semantic indexing (SIN) task aims to recognize the semantic concepts contained within a video shot, an essential technology for retrieval, categorization, and other video exploitations. Here, the concepts refer to high-level semantic objects such as car, road, and tree. The task has several challenges such as data imbalance, scalability, and the semantic gap [37][38]. This dataset contains 200 hours of videos with durations between 10 seconds and 3.5 minutes. The labels of 130 concepts were given by the collaborative annotation organized by NIST. After data pre-processing, there are 144774 shots, and one keyframe was extracted from each shot. The detection scores of all shots were downloaded from the DVMM Lab of Columbia University [39]. The list of concepts and their detailed explanations can be found in [26]. In order to increase the amount of ground truth available to the negative association selection module, the TRECVID 2015 training labels are also utilized.

B. Negative Correlation Selection Results

For the 130 concepts, there are in total 8385 pair-wise concept correlations. The top 10 negative correlations obtained using the NIC-based selection and the ICF-based selection are shown in Table IV. For the NIC-based selection, the concept pairs are ranked by the sum of the two probability ratios on the left-hand sides of Equations (1) and (2). It should be pointed out that some negative correlations are caused by the definitions of the concepts. For example, the "Two People" concept indicates that there must be exactly two people in the video shot, so the "Single Person" concept cannot occur; the "Building" concept refers to shots of the exterior of a building, so it has a negative correlation with the "Indoor" concept. The full explanations of all concepts can be found in [26].

TABLE IV
COMPARISON OF NEGATIVE CORRELATION SELECTION

Rank | NIC-based                   | ICF-based
1    | Entertainment, Building     | Road, Waterscape Waterfront
2    | Infants, Industrial         | Indoor, Plant
3    | Person, Helicopter Hovering | Indoor, Vegetation
4    | Person, Natural-Disaster    | Daytime Outdoor, Indoor
5    | Person, Airplane Flying     | Indoor, Outdoor
6    | Canoe, Bus                  | Suburban, Indoor
7    | Telephones, Swimming        | Indoor, Building
8    | Cats, Person                | Trees, Indoor
9    | Canoe, Car Racing           | Male Person, Female Human Face Closeup
10   | Person, Birds               | Two People, Single Person

As introduced in Section III, we used the average value minus one standard deviation as the threshold, which corresponds to the 0.67 significance level, to select the negative concept pairs. The selected correlations are shown in the right column of Table IV. There are in total seven target concepts among the top 10 ICF-based negative correlations: "Road", "Indoor", "Daytime Outdoor", "Suburban", "Trees", "Male Person", and "Two People". These concepts are common and appear in many papers [40][41][42].
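As an illustration of the threshold selection described above (the average ICF value minus one standard deviation), a minimal sketch might look as follows; the function name and data layout are assumptions for this illustration, not the paper's code.

```python
import numpy as np

def select_negative_pairs(icf_values):
    """icf_values: dict mapping (target, reference) -> ICF value.
    Keeps the pairs whose ICF is below mean - 1 * std of all ICF values,
    i.e., the threshold described in Section III (roughly the 0.67 level)."""
    values = np.array(list(icf_values.values()))
    threshold = values.mean() - values.std()
    return {pair: v for pair, v in icf_values.items() if v < threshold}
```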
We obtain the shot IDs and the corresponding values from all the training video files and use the Map() function to build the key-value pairs (shot ID, value). Next, we use the combine function in Spark to combine these pairs by the shot IDs and obtain a table similar to Table III. This table is used as the input for the next layer. In the next layer, the output keys are the key pairs from the Cartesian product of all concepts, while the output values are the pairwise vectors that contain the values from each shot. The conditional probability values are then calculated from the vector pairs to generate the NIC values. As mentioned in Sections III and V, some key pairs are discarded due to low NIC values and the rest are kept to generate the ICF values. These steps are shown in Fig. 1.

Fig. 1. Spark implementation of the ICF calculation

A framework of semantic concept mining and retrieval was proposed in [43]. Its training section consists of the "Multimedia Concept Mining" subcomponent and the "Concept Mining Enhancement" subcomponent. The basic idea of the "Multimedia Concept Mining" subcomponent is as follows: for N data instances (e.g., images/video shots) and M concepts, M concept detection models are trained such that, for each instance, model m outputs a score measuring the likelihood that concept m exists in that data instance. A more detailed discussion of this subcomponent can be found in [43]. In this paper, an enhanced "Concept Mining Enhancement" subcomponent within the multimedia big data infrastructure is proposed, which improves the overall semantic concept mining and retrieval framework. First, all the class labels are organized into a label matrix so that each row contains the labels of different concepts for one data instance. Next, significant negative correlations are selected using this label matrix in the negative correlation selection module. A set of features is extracted from the original training dataset to train a Multiple Correspondence Analysis (MCA) based negative weight estimation model [44][45][46][47]. The weights generated from this model, together with the output scores from the "Multimedia Concept Mining" subcomponent, are normalized and then used to train a regression-based score integration model. The selected negative correlations, MCA models, and regression models are stored for the testing section. In the testing section, the testing data instances are fed into all the concept detection models to generate the testing scores. The same features are extracted from each data instance to obtain the MCA-based weights. After the scores and weights are normalized, they are input to the regression-based score integration model to generate a new set of re-ranked scores. Finally, the new output scores are evaluated. The score integration module can be viewed as a two-layer neural network.

C. Performance of the Proposed Framework

In order to evaluate the proposed negative correlation discovery method, it is compared with the following four approaches. The first one, denoted as "Base", applies no modification to the raw scores. The second one, denoted as "Subtraction", is an intuitive approach that subtracts the scores of a reference concept from those of a target concept. The third one, denoted as "Random", randomly selects a reference concept; in the experiment, the randomly selected reference concepts are: "Running" for "Road"; "Hand", "Old People", and "Computer" for "Indoor"; "Beards" for "Daytime Outdoor"; "Greeting" for "Suburban"; "Doorway" for "Trees"; "Chair" for "Male Person"; and "Infants" for "Two People". The fourth one, denoted as "DASD", applies the domain adaptive semantic diffusion (DASD) framework [48]; all negative affinities are kept as described in [48] and the number of iterations is set to 20. We compare our results with DASD since it also focuses on correlation discovery and achieves the best performance among many similar technologies. The Mean Average Precision (MAP) values at different numbers of retrieved data instances are reported. All results are the average of three-fold cross-validation over the seven ICF-based target concepts listed in Table IV.

TABLE V
MAP VALUES AT DIFFERENT NUMBERS OF INSTANCES RETRIEVED

Framework   | MAP@10  | MAP@20  | MAP@30  | MAP@40  | MAP@50  | MAP@100 | MAP@200 | MAP@500 | MAP@2000
Base        | 0.45077 | 0.40841 | 0.37285 | 0.35755 | 0.33353 | 0.24407 | 0.15928 | 0.13049 | 0.16538
Subtraction | 0.47292 | 0.39973 | 0.36313 | 0.33913 | 0.31965 | 0.22274 | 0.15056 | 0.11547 | 0.12814
Random      | 0.36005 | 0.31563 | 0.25743 | 0.23170 | 0.20680 | 0.13966 | 0.09827 | 0.08455 | 0.09293
DASD        | 0.48268 | 0.40204 | 0.36449 | 0.33404 | 0.32726 | 0.24311 | 0.15950 | 0.12220 | 0.13391
ICF-based   | 0.86262 | 0.73554 | 0.65211 | 0.60537 | 0.57122 | 0.47294 | 0.36569 | 0.33970 | 0.40617

Table V shows the MAP comparison results. As can be seen, our proposed ICF-based method outperforms the four comparison methods by a large margin across all MAP measures. "Random", as expected, shows the worst performance. "DASD" obtains a slightly higher MAP@10 value than "Base" and "Subtraction", but its advantage quickly drops after MAP@20; in addition, we found that further increasing the number of iterations did not improve its performance. "Subtraction" shows a small gain at MAP@10 compared to "Base", but its values also decline when more retrieval results are considered. Moreover, the proposed ICF-based negative correlation discovery adapts to Hadoop and further to Spark, which makes it ready to scale up to larger datasets, in terms of both the number of concepts and the number of data instances.
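For reference, the evaluation metric can be sketched as follows. MAP@k definitions vary slightly in the literature and the paper does not spell out its normalization, so this is one common formulation rather than the paper's exact evaluation code.

```python
def average_precision_at_k(relevance, k):
    """relevance: list of 0/1 ground-truth labels ordered by descending score.
    Returns AP@k: the mean of precision@i over the relevant items in the top k."""
    hits, precision_sum = 0, 0.0
    for i, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / i
    return precision_sum / hits if hits else 0.0

def map_at_k(relevance_per_concept, k):
    """Mean of AP@k over all target concepts."""
    aps = [average_precision_at_k(rel, k) for rel in relevance_per_concept]
    return sum(aps) / len(aps)

# Toy example with two concepts and the top-5 retrieved shots for each.
print(map_at_k([[1, 0, 1, 1, 0], [0, 1, 1, 0, 0]], k=5))
```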
VII. CONCLUSION AND FUTURE WORK

In this paper, we propose a novel system for learning negative correlations to provide valuable context cues for concept mining and retrieval. The system adopts both Hadoop and Spark for efficient big data processing. Our proposed framework is compared with four approaches in terms of semantic concept mining and retrieval performance. The experimental results demonstrate that the negative correlations between the reference concepts and the target concepts discovered by the ICF-based method can effectively help retrieve the target concepts. The proposed correlation discovery technique can be extended to many other kinds of correlations and coefficients; the only difference lies in the last steps of Fig. 1, meaning that different correlation algorithms can be applied to obtain the desired correlations in different circumstances. Furthermore, some correlations can be ternary or even quaternary. In such cases, the Cartesian product part can be replaced to generate the corresponding groups for correlation discovery.

REFERENCES

[1] D. Liu, Y. Yan, M.-L. Shyu, G. Zhao, and M. Chen, "Spatio-temporal analysis for human action detection and recognition in uncontrolled environments," Int. J. Multimed. Data Eng. Manag., vol. 6, no. 1, pp. 1–18, Jan. 2015.
[2] Y. Yan, Y. Liu, M.-L. Shyu, and M. Chen, "Utilizing concept correlations for effective imbalanced data classification," in Proceedings of the IEEE 15th International Conference on Information Reuse and Integration, Aug 2014, pp. 561–568.
[3] X. Li, S.-C. Chen, M.-L. Shyu, and B. Furht, "An effective content-based visual image retrieval system," in Proceedings of the Computer Software and Applications Conference, 2002, pp. 914–919.
[4] S.-C. Chen, S. Sista, M.-L. Shyu, and R. Kashyap, "Augmented transition networks as video browsing models for multimedia databases and multimedia information systems," in Proceedings of the 11th IEEE International Conference on Tools with Artificial Intelligence, 1999, pp. 175–182.
[5] S.-C. Chen, M.-L. Shyu, and R. Kashyap, "Augmented transition network as a semantic model for video data," International Journal of Networking and Information Systems, vol. 3, no. 1, pp. 9–25, 2000.
[6] L.-C. Chen, J.-W. Hsieh, Y. Yan, and D.-Y. Chen, "Vehicle make and model recognition using sparse representation and symmetrical SURFs," Pattern Recognition, vol. 48, no. 6, pp. 1979–1998, 2015.
[7] C. Chen, Q. Zhu, L. Lin, and M.-L. Shyu, "Web media semantic concept retrieval via tag removal and model fusion," ACM Transactions on Intelligent Systems and Technology, vol. 4, no. 4, pp. 61:1–61:22, October 2013.
[8] K.-T. Chuang, J.-W. Hsieh, and Y. Yan, "Modeling and recognizing action contexts in persons using sparse representation," in Advances in Intelligent Systems and Applications - Volume 2, ser. Smart Innovation, Systems and Technologies, J.-S. Pan, C.-N. Yang, and C.-C. Lin, Eds. Springer Berlin Heidelberg, 2013, vol. 21, pp. 531–541.
[9] L. Lin and M.-L. Shyu, "Weighted association rule mining for video semantic detection," International Journal of Multimedia Data Engineering and Management, vol. 1, no. 1, pp. 37–54, 2010.
[10] Q. Zhu, L. Lin, M.-L. Shyu, and S.-C.
Chen, “Feature selection using correlation and reliability based scoring metric for video semantic detection,” in Proceedings of the Fourth IEEE International Conference on Semantic Computing, 2010, pp. 462–469. [11] L. Lin, G. Ravitz, M.-L. Shyu, and S.-C. Chen, “Video semantic concept discovery using multimodal-based association classification,” in Proceedings of the IEEE International Conference on Multimedia & Expo, July 2007, pp. 859–862. [12] X. Li, S.-C. Chen, M.-L. Shyu, and B. Furht, “Image retrieval by color, texture, and spatial information,” in Proceedings of the 8th International Conference on Distributed Multimedia Systems, September 2002, pp. 152–159. [13] X. Huang, S.-C. Chen, M.-L. Shyu, and C. Zhang, “User concept pattern discovery using relevance feedback and multiple instance learning for content-based image retrieval,” in Proceedings of the Third International Workshop on Multimedia Data Mining, in conjunction with the 8th ACM International Conference on Knowledge Discovery & Data Mining, July 2002, pp. 100–108. [14] M.-L. Shyu, S.-C. Chen, and R. Kashyap, “Generalized affinitybased association rule mining for multimedia database queries,” Knowledge and Information Systems (KAIS): An International Journal, vol. 3, no. 3, pp. 319–337, August 2001. [15] D. Liu and M.-L. Shyu, “Semantic motion concept retrieval in nonstatic background utilizing spatial and temporal visual information,” International Journal of Semantic Computing, vol. 7, p. 43:67, 2013. [16] Y. Yan, J.-W. Hsieh, H.-F. Chiang, S.-C. Cheng, and D.-Y. Chen, “Plsa-based sparse representation for object classification,” in Proceedings of the 22nd International Conference on Pattern Recognition, Aug 2014, pp. 1295–1300. [17] S.-C. Chen, S. Rubin, M.-L. Shyu, and C. Zhang, “A dynamic user concept pattern learning framework for content-based image retrieval,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 36, no. 6, pp. 772–783, Nov 2006. [18] M.-L. Shyu, C. Haruechaiyasak, S.-C. Chen, and N. Zhao, “Collaborative filtering by mining association rules from user access sequences,” in Proceedings of the International Workshop on Challenges in Web Information Retrieval and Integration, April 2005, pp. 128–135. [19] T. Meng and M.-L. Shyu, “Leveraging concept association network for multimedia rare concept mining and retrieval,” in Proceedings of the IEEE International Conference on Multimedia & Expo, July 2012, pp. 860–865. [20] S.-C. Chen, M.-L. Shyu, and C. Zhang, “Innovative shot boundary detection for video indexing,” in Video Data Management and Information Retrieval, S. Deb, Ed. Idea Group Publishing, 2005, pp. 217–236. [21] S.-C. Chen, M.-L. Shyu, C. Zhang, and R. L. Kashyap, “Identifying overlapped objects for video indexing and modeling in multimedia database systems,” International Journal on Artificial Intelligence Tools, vol. 10, no. 4, pp. 715–734, 2001. [22] S.-C. Chen, M.-L. Shyu, and C. Zhang, “An intelligent framework for spatio-temporal vehicle tracking,” in Proceedings of the 4th IEEE International Conference on Intelligent Transportation Systems, August 2001, pp. 213–218. [23] E. A. Cherman, J. Metz, and M. C. Monard, “Incorporating label dependency into the binary relevance framework for multi-label classification,” Expert Systems with Applications, vol. 39, no. 2, pp. 1647–1655, February 2011. [24] S. Perera, Instant MapReduce Patterns - Hadoop Essentials How-to. Packt Publishing, May 2005. [25] E. Gabarron, L. Fernandez-Luque, M. Armayones, and A. Y. 
Lau, "Identifying measures used for assessing quality of youtube videos with patient health information: A review of current literature," Interactive Journal of Medical Research, vol. 2, no. 1, March 2013.
[26] A. F. Smeaton, P. Over, and W. Kraaij, "Evaluation campaigns and TRECVid," in Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval, October 2006, pp. 321–330.
[27] Y.-G. Jiang, J. Wang, S.-F. Chang, and C.-W. Ngo, "Domain adaptive semantic diffusion for large scale context-based video annotation," in Proceedings of the 12th IEEE International Conference on Computer Vision, Sept 2009, pp. 1420–1427.
[28] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, "Dynamic itemset counting and implication rules for market basket data," in Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 26, 1997, pp. 255–264.
[29] K. Pearson, "Notes on regression and inheritance in the case of two parents," Proceedings of the Royal Society of London, vol. 58, pp. 240–242, 1895.
[30] J. Dean and S. Ghemawat, "Mapreduce: Simplified data processing on large clusters," Commun. ACM, vol. 51, no. 1, pp. 107–113, Jan. 2008. [Online]. Available: http://doi.acm.org/10.1145/1327452.1327492
[31] Apache, "Hadoop," http://hadoop.apache.org, accessed Oct. 2015.
[32] F. Fleites, S. Cocke, S.-C. Chen, and S. Hamid, "Efficiently integrating mapreduce-based computing into a hurricane loss projection model," in Proceedings of the 14th IEEE International Conference on Information Reuse and Integration, Aug 2013, pp. 402–407.
[33] Apache, "Spark," https://spark.apache.org, accessed Oct. 2015.
[34] U. Berkeley, "AMPLab," https://amplab.cs.berkeley.edu, accessed Oct. 2015.
[35] R. S. Xin, J. Rosen, M. Zaharia, M. J. Franklin, S. Shenker, and I. Stoica, "Shark: SQL and rich analytics at scale," in Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD '13. New York, NY, USA: ACM, 2013, pp. 13–24. [Online]. Available: http://doi.acm.org/10.1145/2463676.2465288
[36] P. Over, G. Awad, M. Michel, J. Fiscus, G. Sanders, W. Kraaij, A. F. Smeaton, G. Quenot, and R. Ordelman, "TRECVID 2015 – an overview of the goals, tasks, data, evaluation mechanisms and metrics," in Proceedings of TRECVID 2015. NIST, USA, 2015.
[37] Q. Zhu, L. Lin, M.-L. Shyu, and D. Liu, "Utilizing context information to enhance content-based image classification," International Journal of Multimedia Data Engineering and Management, vol. 2, no. 3, pp. 34–51, 2011.
[38] L. Lin, C. Chen, M.-L. Shyu, and S.-C. Chen, "Weighted subspace filtering and ranking algorithms for video concept retrieval," IEEE Multimedia, vol. 18, no. 3, pp. 32–43, March 2011.
[39] Y.-G. Jiang, "Prediction scores on TRECVID 2010 data set," 2010, last accessed on September 8, 2011. [Online]. Available: http://www.ee.columbia.edu/ln/dvmm/CU-VIREO374/
[40] L.-C. Chen, J.-W. Hsieh, Y. Yan, and D.-Y. Chen, "Vehicle make and model recognition using sparse representation and symmetrical surfs," in Proceedings of the 16th IEEE International Conference on Intelligent Transportation Systems, Oct 2013, pp. 1143–1148.
[41] J.-W. Hsieh, K.-T. Chuang, Y. Yan, and L.-C. Chen, "Sparse representation for recognizing object-to-object actions under occlusions," in Proceedings of the Fifth International Conference on Internet Multimedia Computing and Service, ser. ICIMCS '13. New York, NY, USA: ACM, 2013, pp. 117–120.
[42] L.-C.
Chen, J.-W. Hsieh, Y. Yan, and B.-Y. Wong, "Real-time vehicle make and model recognition from roads," in Proceedings of the 12th International Conference on Information Technology and Applications in Outlying Islands, May 2013, pp. 1033–1040.
[43] T. Meng, Y. Liu, M.-L. Shyu, Y. Yan, and C.-M. Shu, "Enhancing multimedia semantic concept mining and retrieval by incorporating negative correlations," in Proceedings of the IEEE International Conference on Semantic Computing, June 2014, pp. 28–35.
[44] Q. Zhu, Z. Li, H. Wang, Y. Yang, and M.-L. Shyu, "Multimodal sparse linear integration for content-based item recommendation," in Proceedings of the IEEE International Symposium on Multimedia, 2013, pp. 187–194.
[45] Q. Zhu, M.-L. Shyu, and H. Wang, "Videotopic: Content-based video recommendation using a topic model," in Proceedings of the IEEE International Symposium on Multimedia, 2013, pp. 219–222.
[46] Q. Zhu and M.-L. Shyu, "Sparse linear integration of content and context modalities for semantic concept retrieval," IEEE Transactions on Emerging Topics in Computing, vol. 3, no. 2, pp. 152–160, June 2015.
[47] Q. Zhu, M.-L. Shyu, and H. Wang, "Videotopic: Modeling user interests for content-based video recommendation," International Journal of Multimedia Data Engineering and Management (IJMDEM), vol. 5, no. 4, pp. 1–21, October 2014.
[48] Y.-G. Jiang, J. Wang, S.-F. Chang, and C.-W. Ngo, "Domain adaptive semantic diffusion for large scale context-based video annotation," in Proceedings of the International Conference on Computer Vision, Kyoto, Japan, September 2009, pp. 1420–1427.