Multimedia Tools and Applications, 13, 93–118, 2001
© 2001 Kluwer Academic Publishers. Manufactured in The Netherlands.

Retrieval of Commercials by Semantic Content: The Semiotic Perspective

CARLO COLOMBO
ALBERTO DEL BIMBO
PIETRO PALA
Dipartimento di Sistemi e Informatica, Università di Firenze, via Santa Marta 3, I-50139 Firenze, Italy

Abstract. Video information processing and retrieval is a key aspect of future multimedia technologies and applications. Commercial videos encode several planes of expression through a rich and dense use of colors, editing effects, viewpoints and rhythms, which are exploited together to attract potential purchasers. Databases of commercials can be accessed in order to analyze how a commercial has been developed, retrieve commercials similar to an example, and catalog commercials according to the kind of message conveyed to the user. In this paper, we present a system allowing the retrieval of commercial streams based on their salient semantics. Semantics is regarded from the semiotics perspective: collections of signs and semantic features like colors, editing effects, motion, etc. are used as basic blocks with which the meaning of a commercial is constructed. In our system, it is possible to retrieve commercials according both to the meaning they convey and to their similarity to examples.

Keywords: video content analysis, video libraries, video indexing, semiotics

1. Introduction

The extraction and manipulation of the information embedded in video data is a key challenge for future multimedia technologies and emerging applications such as digital libraries and interactive video analysis. The expansion of low cost mass storage devices, improved compression techniques, the availability of high transmission rates and computing power have all contributed to make video archives accessible through digital networks and available on everyday desktop computers.
However, the advances of computer systems towards becoming “true” multimedia applications largely depend on the availability of tools for manipulating video information oriented to cataloguing videos into a content-searchable database. Much more than text, video conveys information through a multiplicity of planes of communication which encompass what is represented in the images, how the images are linked together, how the subject is imaged and so on.

Several recent papers have addressed aspects and problems related to the access and retrieval by content of video streams. Research on automatic segmentation has been presented by several authors [2, 7, 13, 16, 18, 20]. All of them analyze interframe differences to automatically detect sharp and gradual shot transitions (cuts, fades and dissolves) as well as other special transition effects (such as mattes and wipes), and to estimate the motion of camera and objects in the scene. Automatic segmentation into higher level aggregates has also been addressed by some authors. In [1], a set of rules to identify macro segments of a video was proposed. Algorithms to extract story units from video are described in [22]. Also in [8, 19] the specific characteristics of a video type are exploited to build higher level aggregates of shots.

While significant research efforts have been devoted to the analysis of movies, news reports, and sport videos, the analysis of commercial videos has been virtually neglected by the research community. Only quite recently have a few works explicitly addressing commercials investigated the possibility of detecting and extracting advertising content from a stream of video data [6, 15]. The automatic characterization of a commercial video is complicated by the intrinsic and peculiar features of its time-varying content.
First, the effectiveness of a commercial is related more to its perceptual impact than to its mere content or explicit message: the way colors are chosen and modified throughout a spot, characters are coupled and shooting techniques are selected creates a large part of the message in a commercial, while the extraction of canonical contents (e.g., imaged objects) has less conceptual relevance than in other contexts. Second, strict time requirements compel the director to make a condensed use of color, rhythm, camerawork, sound, etc. Finally, in commercials, traditional editing effects are augmented with novel and specific artifacts (e.g., computer graphics, cartoons, etc.), so as to best draw the audience’s attention and emotions to the product being promoted.

Until recently, the design and production of commercials has been a discipline based on a set of usually effective yet empirical rules. Formalized by professionals in the marketing field, such rules associate each single induced impression and emotion with given combinations of editing effects [12]. The use of color effects provides a significant example: color editing is the discipline or method of working with, creating, manipulating, and/or selecting specific colors for the explicit purpose of improving a product’s saleability through aesthetics, decorative composition or function and quality. A color designer must be able to understand all that affects a product’s color, from composition to marketing strategy, and to assimilate this information so as to provide the most appropriate colors for a product.

Quite recently, marketing companies have introduced semiotic methodologies in the process of spot making, so as to ground the principles of advertising in a solid scientific context, and to better combine the artistic quality of a commercial with its communication effectiveness. Semiotics is more concerned with the sense being conveyed than with the set of induced emotions.
According to semioticians, the analysis (and production) of commercials must be focused on the same narrative mechanisms and structures fairy-tales are based on: this leads to characterizing a commercial according to four broad semiotic classes, or categories [4, 10].

In this paper, we address the problem of retrieval by semantic content of commercials based on the message conveyed at the semiotic level of the signification hierarchy. The choice of semiotic level retrieval is motivated by the peculiar characteristics of commercials. Strategies based on higher semantic levels, such as retrieval by external keywords, would prove less appropriate or even impossible for this purpose, because the meaning of a commercial is mainly conveyed at a syntactic level. In fact, commercials usually involve and steer the audience through a rich use and combination of editing effects, colors, graphics, camera movements, etc., while the emphasis on the story is quite poor.

In our system, a four-category semiotic characterization of commercials is produced; the four semiotic categories are then used to access a database of commercial videos both by explicit query and by similarity with a template. Perceptual level features of visual content (colors, motion, cuts, dissolves, etc.) are extracted through both standard and ad hoc visual processing techniques. The mapping rules connecting perceptual features to semiotic categories have been developed by conducting an extensive empirical model validation starting from heuristics provided by semiotics and marketing experts.

The paper is organized as follows: in Section 2 the basic semiotic categories of commercials are introduced. In Section 3 the perceptual features characterizing commercials are described. In Section 4 the rules used to construct the semiotic level starting from the feature level are presented and motivated.
In Section 5 the system and its interface are described, and experimental results of commercials retrieval are discussed. Finally, in Section 6 conclusions are drawn and future work is addressed.

2. Semiotics and commercials

Semiotics is the science of signs as carriers of sense. In semiotics, a “sign” is anything which conveys a sense: words, pictures, sounds, gestures, clothes, etc. Semiotics suggests that signs are related to their meaning by social conventions (or, in the semiotics jargon, codes), i.e., by a specific cultural context.

Semiotic principles represent a useful reference framework for the analysis of videos. Codes in a video include genre, editing effects (cuts, fades, dissolves, cutting rate and rhythm), camerawork (shot size, focus, camera movement, angle, slope of framing), manipulation of time (compression, flashbacks, flashforward, slow motion), and well defined choices of lighting, color, sound, graphics (text or cartoons) and narrative style. The difficulty of recognizing the existence of these codes is mainly due to the fact that we are often so used to them that the meaning of signs deceptively seems to be natural and univocal. Through a semiotic analysis, the nature and use of each code can be highlighted, thus making it explicit how signs are properly organized for the construction of sense.

Commercials are a particular kind of video where, due to time constraints, the link between sign and sense is particularly stressed, so as to obtain the best quality and effectiveness for the conveyed message. Therefore it is not surprising that, quite recently, semiotics studies have appeared explicitly addressing the analysis of commercial videos and their characteristics [4]. According to research in this field, the narrative structure of commercials conforms to a four-element morphology closely related to the one introduced by Propp for fairy-tales [10].
Semiotics introduces a classification of commercials into four different categories, related to the narrative element which is relevant w.r.t. the others.

Practical commercials emphasize the qualities of a product according to a common set of values. These commercials represent everyday life scenes, commonplaces that are recognized by the audience. The product is described in a familiar environment so that the audience naturally perceives it as useful in everyday life.

Critical commercials introduce a hierarchy of reference values. In this kind of commercial, the product is the subject of the story. It is a real story, which makes it possible to focus on the qualities of the product through an apparently objective description of its features.

Utopic commercials provide the definite evidence that the product is able to succeed in critical tests. In this kind of commercial, the story doesn’t follow a realistic plot: rather, situations are presented as in a dream. Wide scenarios are used to present the product, which is shown to succeed in critical conditions often in a fantastic and unrealistic way.

Playful commercials emphasize the match between user’s needs and product’s qualities. These commercials represent a manifest parody of the other typologies of commercials: it is clearly stated to the audience that they are watching advertising material. Situations and places are visibly different from everyday life, and deformed in such a caricatural and grotesque fashion that the agreement between product qualities and purchaser’s needs is often remarked in an ironical way (e.g., an old woman driving a Ferrari at 30 km/hour).

A common representation of commercials categories uses the semiotic square [11]. This square makes it possible to combine in pairs four semiotic objects at the same semantic level according to three basic relationships: opposition, completion, contradiction (see figure 1).

Figure 1. The semiotic square for commercials.
Of these, only the last one has a quantitative characterization: this implies that the objects placed at opposite ends of the square diagonals are strongly related to each other, being complementary. In Section 4 it is shown how all the conceptual categories introduced here come into play in the creative process of spot making, the “spot language.”

3. Feature extraction

In this section we introduce the perceptual-level features which are significant for a semiotic-level characterization, and summarize the algorithms devised to extract such features automatically in a commercial video.

3.1. Video Segmentation

The primary task of video analysis is its segmentation, i.e., the identification of the start and end points of each shot that has been edited, in order to characterize the entire shot through its most representative keyframes. The automatic recognition of the beginning and end of each shot implies solving two problems: i) avoiding incorrect identification of shot changes due to rapid motion or sudden lighting change in the scene; ii) identification of sharp transitions (cuts) as well as gradual transitions (dissolves).

3.1.0.1. Cuts. Rapid motion in the scene and sudden change in lighting yield low correlation between contiguous frames, especially when a high temporal subsampling rate is adopted. To avoid false cut detection, a metric has been studied which proves highly insensitive to such variations, while remaining reliable in detecting “true” cuts [8]. For this reason, each frame has been partitioned into nine subframes. Each subframe is represented by considering the color histograms in the HSI color space. More precisely, to improve independence with respect to lighting conditions, the histogram takes into account only the hue H and saturation S properties. It can thus be represented as a function $H(H, S) : \mathbb{R}^2 \to \mathbb{R}^+$.
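The subframe representation just described can be sketched as follows. This is only an illustrative sketch, not the implementation used in the system: the function name `subframe_hs_histograms`, the number of bins, the 3 x 3 regular grid and the assumption that hue and saturation are already scaled to [0, 1] are all hypothetical choices.

```python
def subframe_hs_histograms(hue, sat, bins=8, grid=3):
    """Normalized hue-saturation histograms for a grid x grid frame partition.

    hue, sat: equally sized 2-D lists of floats, assumed to lie in [0, 1].
    Returns grid*grid histograms (row-major), each a bins x bins list of lists.
    """
    height, width = len(hue), len(hue[0])
    hists = [[[0.0] * bins for _ in range(bins)] for _ in range(grid * grid)]
    counts = [0] * (grid * grid)
    for y in range(height):
        for x in range(width):
            # Index of the subframe that contains pixel (x, y).
            k = (y * grid // height) * grid + (x * grid // width)
            # Clamp so that a value of exactly 1.0 falls in the last bin.
            h_bin = min(int(hue[y][x] * bins), bins - 1)
            s_bin = min(int(sat[y][x] * bins), bins - 1)
            hists[k][h_bin][s_bin] += 1.0
            counts[k] += 1
    # Normalize each histogram to unit mass (an assumption of this sketch).
    for k, n in enumerate(counts):
        if n:
            hists[k] = [[v / n for v in row] for row in hists[k]]
    return hists
```

Intensity is deliberately not an input, mirroring the choice of discarding it to reduce sensitivity to lighting.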
Cut detection is performed by considering the volume of the difference of subframe histograms in two consecutive frames. If $H_i^j(H, S)$ is the histogram computed for the j-th subframe (j ranging from 1 to 9) of the i-th video frame, for each subframe the following quantity is computed:

$$v_i^j = \iint \left[ H_i^j(H, S) - H_{i+1}^j(H, S) \right] dH \, dS. \qquad (1)$$

Such quantity represents for the j-th subframe the volume of the difference of histograms in two consecutive frames. The presence of a cut at frame i is detected by thresholding the average value of $v_i^j$ for the nine subframes.

3.1.0.2. Dissolves. The dissolve effect merges two sequences by partly overlapping them. Dissolve detection in commercials is particularly difficult because of the very limited duration of dissolves. Due to this peculiarity, existing approaches to dissolve detection (developed for movie segmentation purposes) have shown a very poor performance. We have developed instead an original method, based on corner statistics, as a means to detect dissolves [5]. Indeed, during the dissolve, the first sequence gradually fades out (i.e., is darkened) while the second sequence fades in. Therefore, during the editing effect, corners associated with the first sequence gradually disappear and those associated with the second sequence gradually appear. This yields a local minimum in the number of corners detected during the dissolve.

Corner detection is based on the algorithm presented in [17]. An image location (x, y) is defined as a corner if the intensity gradient $\nabla I$ in a patch (x + u, y + v) around it is not isotropic, i.e., it is distributed along two preferred directions. Operationally speaking, a corner is characterized by large and distinct values of $\lambda_1(x, y)$ and $\lambda_2(x, y)$, the eigenvalues of the gradient auto-correlation matrix

$$A(x, y) = \begin{pmatrix} \langle I_x^2 \rangle & \langle I_x I_y \rangle \\ \langle I_y I_x \rangle & \langle I_y^2 \rangle \end{pmatrix}, \qquad (2)$$

where $(I_x, I_y) = \nabla I$ and $\langle \cdot \rangle$ denotes Gaussian smoothing over the patch.
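The two detectors of this section can be sketched in a few lines. The code below is a hedged illustration under stated assumptions, not the authors' implementation: the discrete sum of absolute bin differences stands in for the double integral of Eq. (1) (the use of the absolute value is an assumption), and the names `cut_score`, `tensor_eigenvalues` and `dissolve_candidates`, together with the `window` and `min_drop` parameters, are hypothetical.

```python
import math

def cut_score(hists_a, hists_b):
    """Average over subframes of the histogram difference volume (cf. Eq. (1)).

    hists_a, hists_b: lists of 2-D histograms, one per subframe, for two
    consecutive frames. Assumption: absolute bin-wise differences are summed.
    """
    volumes = [
        sum(abs(a - b)
            for row_a, row_b in zip(sub_a, sub_b)
            for a, b in zip(row_a, row_b))
        for sub_a, sub_b in zip(hists_a, hists_b)
    ]
    return sum(volumes) / len(volumes)

def tensor_eigenvalues(ixx, ixy, iyy):
    """Eigenvalues, largest first, of [[ixx, ixy], [ixy, iyy]] (cf. Eq. (2)).

    The arguments are the already smoothed products <Ix^2>, <Ix Iy>, <Iy^2>.
    """
    mean = 0.5 * (ixx + iyy)
    radius = math.hypot(0.5 * (ixx - iyy), ixy)
    return mean + radius, mean - radius

def dissolve_candidates(corner_counts, window=3, min_drop=0.3):
    """Frame indices where the per-frame corner count has a marked local minimum."""
    found = []
    for i in range(window, len(corner_counts) - window):
        left = corner_counts[i - window:i]
        right = corner_counts[i + 1:i + 1 + window]
        cur = corner_counts[i]
        is_minimum = cur < min(left) and cur <= min(right)
        # Require the dip to be deep relative to both sides (hypothetical rule).
        deep_enough = cur <= (1.0 - min_drop) * min(max(left), max(right))
        if is_minimum and deep_enough:
            found.append(i)
    return found
```

A cut would then be declared when `cut_score` exceeds a threshold, and a pixel counted as a corner when both eigenvalues are large; the thresholds themselves are not given in the text and would have to be tuned.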
The overall features related to the presence of cuts and dissolves in a video are obtained as

$$F_{\mathrm{cuts}} = \frac{\#\mathrm{cuts}}{\#\mathrm{cuts} + \#\mathrm{dissolves}} \quad \text{and} \quad F_{\mathrm{dissolves}} = \frac{\#\mathrm{dissolves}}{\#\mathrm{cuts} + \#\mathrm{dissolves}},$$

with $F_{\mathrm{cuts}}, F_{\mathrm{dissolves}} \in [0, 1]$.

3.1.0.3. Rhythm. Another relevant video editing characteristic is the rhythm of a sequence of shots, as related to shot duration and to the use of cuts and dissolves to join shots. For instance, a sequence of short shots can be used to keep the audience’s attention continuously alive and emphasize dynamism, modernity, etc., while a sequence of gradually shorter shots can induce an increase of tension. The rhythm $r(i_1, i_2) \in [0, 1]$ of a video sequence over a frame interval $[i_1, i_2]$ is defined as

$$r(i_1, i_2) = \frac{\#\mathrm{cuts} + \#\mathrm{dissolves}}{i_2 - i_1 + 1}, \qquad (3)$$

where #cuts and #dissolves are measured in the same interval. A simple feature measuring the internal rhythm of an entire sequence is the average rhythm, as related to the overall number of breaks: $F_{\mathrm{breaks}} = r(1, \#\mathrm{frames})$. This is a normalized quantity, such that $F_{\mathrm{breaks}} = F_{\mathrm{cuts}} + F_{\mathrm{dissolves}}$. The absence of breaks is obviously described by the dual feature $F_{\mathrm{continuous}} = 1 - F_{\mathrm{breaks}}$.

3.2. Shot content

Once a video has been fragmented into shots and video editing features have been extracted, the content of each shot needs to be internally described. To this end, features are extracted from each shot keyframe describing characteristics such as the presence and distribution of relevant colors in the scene, and the distribution and orientation of lines highlighting specific camera takes.

3.2.0.4. Colors. A description of the shot chromatic content is obtained by performing keyframe color segmentation, thus highlighting its main color regions. In our system, image segmentation is carried out by color cluster analysis: segmentation is then achieved by backprojecting cluster centroids (feature space) onto the image [9].
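The editing features of Section 3.1 reduce to simple counting once cuts and dissolves have been detected. The sketch below assumes that the cut and dissolve features are the shares of each effect among all detected breaks, and computes the rhythm of Eq. (3) from a list of break positions; the function names and the input format are hypothetical.

```python
def editing_features(n_cuts, n_dissolves):
    """Return (f_cuts, f_dissolves), the share of each effect among all breaks.

    Assumption of this sketch: both features are 0 when no break is present.
    """
    breaks = n_cuts + n_dissolves
    if breaks == 0:
        return 0.0, 0.0
    return n_cuts / breaks, n_dissolves / breaks

def rhythm(break_frames, i1, i2):
    """Eq. (3): number of breaks per frame in the inclusive interval [i1, i2].

    break_frames: frame indices at which a cut or a dissolve was detected.
    """
    n_breaks = sum(1 for f in break_frames if i1 <= f <= i2)
    return n_breaks / (i2 - i1 + 1)
```

For example, a spot with 9 cuts and 1 dissolve yields `editing_features(9, 1) == (0.9, 0.1)`, and three breaks in a 100-frame interval give a rhythm of 0.03.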
The use of the LUV color space ensures that small feature distances correspond to similar colors in the perceptual domain. Clustering in the 3-dimensional feature space is obtained using an improved version of the standard K-means algorithm [14], which avoids convergence to non-optimal solutions. Competitive learning is adopted as the basic technique for grouping points in the color space as in [21].

The chromatic content of a video sequence is expressed using a set of eight numbers $c_i \in [0, 1]$, with i denoting one color out of the set {red, orange, yellow, green, blue, purple, white, black}, each number quantifying the presence in the keyframe of a region exhibiting the i-th color. Color related features used in the system are $F_{\mathrm{recurrent}} \in [0, 1]$ (expressing the ratio between colors which recur in a high percentage of keyframes and the overall number of significant colors in the video sequence) and $F_{\mathrm{saturated}}$ (expressing the relative presence of saturated colors in the scene). Their duals are respectively $F_{\mathrm{sporadic}} = 1 - F_{\mathrm{recurrent}}$ and $F_{\mathrm{unsaturated}} = 1 - F_{\mathrm{saturated}}$.

3.2.0.5. Lines. The detection of significant line slopes in the keyframe is accomplished by exploiting the Hough transform [3] to generate a line slope histogram. The feature $F_{\mathrm{slanted}} \in [0, 1]$ gives the ratio of slanted lines (i.e., with a slope neither horizontal nor vertical) with respect to the overall number of lines in the sequence. Its dual is $F_{\mathrm{hor/vert}} = 1 - F_{\mathrm{slanted}}$.

4. Mapping features onto semiotic categories

We summarize below the mapping between the set of perceptual features (lower semantic level) and each of the four semiotic categories (higher semantic level) of commercials. Such a mapping makes it possible to organize a database of commercials on the basis of their semiotic content and provide content-based access facilities.
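As an illustration of the shot content features of Section 3.2, the following sketch computes a slanted-line ratio from a list of line angles and a recurrent-color ratio from per-keyframe color presences. The thresholds (`tolerance_deg`, `presence`, `recurrence`) are hypothetical, since the text does not state what counts as "slanted", "significant" or "a high percentage of keyframes"; the input formats are likewise assumptions.

```python
def slanted_ratio(angles_deg, tolerance_deg=10.0):
    """Fraction of lines that are neither horizontal nor vertical.

    angles_deg: line directions in degrees (0 = horizontal, 90 = vertical).
    """
    if not angles_deg:
        return 0.0
    slanted = 0
    for angle in angles_deg:
        a = angle % 90.0
        # Angular distance from the nearest horizontal or vertical direction.
        if min(a, 90.0 - a) > tolerance_deg:
            slanted += 1
    return slanted / len(angles_deg)

def recurrent_ratio(keyframe_colors, presence=0.1, recurrence=0.5):
    """Recurrent colors divided by significant colors over a sequence.

    keyframe_colors: one dict per keyframe, mapping a color name to its
    presence value c_i in [0, 1].
    """
    counts = {}
    for frame in keyframe_colors:
        for color, c in frame.items():
            if c >= presence:  # the color is significant in this keyframe
                counts[color] = counts.get(color, 0) + 1
    if not counts:
        return 0.0
    n_frames = len(keyframe_colors)
    recurrent = sum(1 for n in counts.values() if n / n_frames >= recurrence)
    return recurrent / len(counts)
```

The dual features follow directly, e.g. `1.0 - slanted_ratio(angles)` for the horizontal/vertical feature.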
The idea is to express the degree of similarity S of each video with a given query by a simple weighted average of the partial scores expressing the match with each individual semiotic category:

$$S = q_1 S_{\mathrm{practical}} + q_2 S_{\mathrm{playful}} + q_3 S_{\mathrm{utopic}} + q_4 S_{\mathrm{critical}}, \qquad (4)$$

the query being expressed in terms of the set of weights $\{q_i\}$.

The way partial scores are obtained highlights the language link for each semiotic category. Language links have been formalized with the support of experts in the semiotics and marketing fields, who provided a number of heuristics also used in the practice of commercial production (a good survey can be found in [4]; see also [12]). The heuristics are used to identify the perceptual features contributing to each semiotic category. Specifically, we assume that each partial score is obtained as a linear combination of perceptual features according to a set of weights $w_{ij} \in [0, 1]$, $i = 1 \ldots 4$, $j = 1 \ldots 3$. Weights are then empirically adjusted by linear regression based on ground truth data provided by the experts. The rationale for the mapping follows.

Practical commercials have a linear narrative style: everything in the video must appear real and close to everyday experience. Camera takes are usually frontal, and care is taken that all transitions take place in a smooth and natural way. This implies choosing long dissolves for merging shots (short dissolves are deliberately interpreted by the system as cuts), and the prevalence of horizontal and vertical lines, which give respectively the impression of relaxation and solidity, over slanted lines:

$$S_{\mathrm{practical}} = w_{11} F_{\mathrm{dissolves}} + w_{12} F_{\mathrm{hor/vert}} + w_{13} F_{\mathrm{unsaturated}}. \qquad (5)$$

In playful commercials, the presence of the camera is always emphasized, and all possible effects are used to stimulate the active participation of the audience in the creation of sense. Everything looks strange and “false” (colors are unnatural, camera takes are usually improbable, etc.).
Hence

$$S_{\mathrm{playful}} = w_{21} F_{\mathrm{cuts}} + w_{22} F_{\mathrm{slanted}} + w_{23} F_{\mathrm{saturated}}. \qquad (6)$$

The main characteristic of utopic commercials is to present the product as part of a world which looks real but doesn’t resemble everyday life (i.e., it is a realistically rendered ideal world). For this reason, care is taken to produce a movie-like atmosphere, with a set of dominant colors defining a closed chromatic world and with all of the traditional editing effects (cuts, dissolves) possibly taking place:

$$S_{\mathrm{utopic}} = w_{31} F_{\mathrm{cuts}} + w_{32} F_{\mathrm{dissolves}} + w_{33} F_{\mathrm{recurrent}}. \qquad (7)$$

Critical commercials spend most of the spot time displaying the product (typically in central and frontal views), while the audio comment continues listing its qualities: the scene has to appear “more realistic than reality itself.” For this reason, the number of breaks is kept low, while the ever changing colors in the background due to smooth camera motions contribute to draw the attention to the (constant) color of the product:

$$S_{\mathrm{critical}} = w_{41} F_{\mathrm{continuous}} + w_{42} F_{\mathrm{sporadic}} + w_{43} F_{\mathrm{hor/vert}}. \qquad (8)$$

The discussion above validates the usual semiotic square representation of commercials categories (see again figure 1). Specifically, the pairs practical-playful and utopic-critical appear to be strongly complementary: notice for instance that the features characterizing the “playful” category (Eq. (6)) are virtually dual to those of the “practical” category (Eq. (5)). Table 1 reports typical ranges of values of the perceptual features for the four semiotic categories.

Table 1. Typical ranges of values of the perceptual features for the four semiotic categories. The symbol “–” means that the feature is not relevant for the category.

P. feature       Practical   Playful    Utopic     Critical
F_dissolves      –           [0, 1]     –          [0, 0.5]
F_cuts           –           [0.6, 1]   [0, 0.5]   –
F_slanted        –           [0.3, 1]   –          –
F_hor/vert       [0, 1]      –          –          [0, 1]
F_unsaturated    [0.5, 1]    –          –          –
F_saturated      –           [0.5, 1]   –          –
F_recurrent      –           –          [0.3, 1]   –
F_sporadic       –           –          –          [0.3, 1]
F_continuous     –           –          –          [0.3, 1]
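The mapping of Eqs. (4) to (8) can be sketched as a pair of weighted sums. The regression-tuned weights are not reported in the text, so the equal weights of 1/3 used below are placeholders, and the helper names (`with_duals`, `category_scores`, `match_score`) and the dictionary layout are hypothetical.

```python
THIRD = 1.0 / 3.0

# Placeholder weights w_ij: the paper tunes them by linear regression.
WEIGHTS = {
    "practical": {"dissolves": THIRD, "hor_vert": THIRD, "unsaturated": THIRD},
    "playful": {"cuts": THIRD, "slanted": THIRD, "saturated": THIRD},
    "utopic": {"cuts": THIRD, "dissolves": THIRD, "recurrent": THIRD},
    "critical": {"continuous": THIRD, "sporadic": THIRD, "hor_vert": THIRD},
}

def with_duals(cuts, dissolves, slanted, saturated, recurrent, breaks):
    """Complete a feature set with the dual features defined in Section 3."""
    return {
        "cuts": cuts, "dissolves": dissolves,
        "slanted": slanted, "hor_vert": 1.0 - slanted,
        "saturated": saturated, "unsaturated": 1.0 - saturated,
        "recurrent": recurrent, "sporadic": 1.0 - recurrent,
        "continuous": 1.0 - breaks,
    }

def category_scores(features, weights=WEIGHTS):
    """Partial scores of Eqs. (5)-(8) as linear combinations of features."""
    return {
        category: sum(w * features[name] for name, w in terms.items())
        for category, terms in weights.items()
    }

def match_score(scores, query):
    """Eq. (4): combine the partial scores with the query weights q_i."""
    return sum(q * scores[category] for category, q in query.items())
```

A query for purely playful commercials would be written as `{"practical": 0, "playful": 1, "utopic": 0, "critical": 0}`, and database videos would then be sorted by decreasing `match_score`.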
5. Implementation and results

5.1. System setup and implementation

The retrieval system has been implemented on a stand-alone platform, featuring a Silicon Graphics R4600 133 MHz processor, 128 MB of memory and the IRIX 5.3 operating system. The system is composed of three main components: i) a feature extraction engine, ii) a retrieval engine and iii) a graphic interface. At database population time, each commercial is automatically processed through the feature extraction engine. This associates with each video a set of four scores ($S_{\mathrm{practical}}$, $S_{\mathrm{playful}}$, $S_{\mathrm{utopic}}$, $S_{\mathrm{critical}}$) representing the extent to which the video conforms to the four semiotic categories (as expounded in Sections 3 and 4). Processing of a 450-frame video requires about 5 minutes. Scores extracted from all the database videos are stored in an index signature file associating each set of scores with the video they refer to. Presently the system includes over 150 commercial videos digitized from several Italian TV channels.

The retrieval engine and the graphic interface (see figure 3) have been developed using JDK 1.1.3. The graphic interface is designed to support video browsing (upper left part of the interface) and two retrieval modalities:

– the user can select the degree to which the video should conform to the four basic semiotic categories (upper central part of the interface);
– the user can select one of the videos from the database and query for similar videos (upper right part of the interface).

In the first case, a set of four weights is extracted according to the values selected by the user for the degree of conformity to each category. Categories are arranged according to the semiotic square of figure 1: the relevance of each category, ranging from 0% to 100%, can be selected through a scroll-bar. The matching score S is computed for each video in the database according to Eq. (4), and videos are presented in decreasing order of match in the lower left part of the interface. In the case of search by global similarity, matching scores are computed by considering the correlation between the characteristic features of the video used as reference and those of the rest of the database. Videos are then presented to the user in decreasing order of similarity. Also, by selecting a video from the output list, the video can be viewed either in full or in its most salient keyframes through a movie player application. In both cases, retrieval is performed in about 4 sec.

Figure 2. Representation of database commercials semiotic features (see text).

5.2. Experimental results

Several tests were conducted to assess system performance and conformity with the judgement of human experts. Each video was classified in advance by a team composed of 5 experts in the semiotic and marketing fields. The experts were asked to classify the commercials by associating with each commercial a position on the semiotic square (figure 2(a) shows the practical-playful and critical-utopic diagonals of the square as coordinate axes). To ease the classification task, 25 rectangular regions were identified in the square through a regular partition, supporting the definition of three bands of conformity of a commercial to a generic category (the bands correspond to conformity degrees of 0–20%, 20–60%, 60–100% respectively).

Figure 3. Retrieval of playful commercials.

Figure 4. Some of the frames for the first ranked spot in figure 3 (“Tacchini”).

Figure 5. Some of the frames for the second ranked spot in figure 3 (“Audi”).

Table 2. Agreement (in terms of city block distance) between system and experts classification of commercials with reference to queries for purely practical, critical, utopic and playful commercials.

Rank    Practical Dist.   Critical Dist.   Utopic Dist.   Playful Dist.
1       0                 0                1              0
2       5                 3                0              1
3       4                 1                4              0
4       3                 3                3              1
5       4                 3                0              0

Figure 2 shows how database commercials were classified by the semiotic experts (figure 2(b)) and by the system (figure 2(c)): each region in the square is associated with a vertical bar, whose height is proportional to the percentage of commercials located in that region. Notice that many database commercials conform to the playful category, which points to a general trend of present-day commercials to prefer an unconventional and “smart” way of presenting the product.

A more accurate quantitative measure of system effectiveness is shown in Table 2. This table shows the agreement between the system and experts with reference to queries for purely practical, critical, utopic and playful commercials. For each query, the five best ranked commercials are considered. For a generic commercial, the agreement between the system and experts is measured as the city block distance between the two blocks in the semiotic square where the system and the experts located the commercial. The best average agreement corresponds to the query for playful commercials, which indicates the effectiveness of the features used to model this category of commercials. The worst performance corresponds to queries for practical commercials: in this case, many commercials that are classified as practical by the system are classified as critical by the experts. The reason why the experts classify these commercials as critical is that these commercials feature a remarkable presence of foreground views of the promoted product. However, the system cannot handle this feature: at present it is unable to check whether the scene represents a foreground object or not. Consequently, the system classifies as practical all those critical commercials whose distinguishing feature is the foreground view of the promoted product.
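The agreement measure of Table 2 can be reproduced with a few lines of code. In this sketch a block of the 5 x 5 partition is identified by a hypothetical (row, column) pair; the distance values are copied from Table 2, and the averages are derived here from those values rather than quoted from the text.

```python
def city_block(block_a, block_b):
    """City block (L1) distance between two blocks of the semiotic square."""
    return abs(block_a[0] - block_b[0]) + abs(block_a[1] - block_b[1])

# Distances for the five best ranked commercials of each query (Table 2).
TABLE_2 = {
    "practical": [0, 5, 4, 3, 4],
    "critical": [0, 3, 1, 3, 3],
    "utopic": [1, 0, 4, 3, 0],
    "playful": [0, 1, 0, 1, 0],
}

# Average disagreement per query: lower means closer to the experts.
mean_distance = {q: sum(d) / len(d) for q, d in TABLE_2.items()}

best = min(mean_distance, key=mean_distance.get)
worst = max(mean_distance, key=mean_distance.get)
```

With the values of Table 2 the averages are 3.2 (practical), 2.0 (critical), 1.6 (utopic) and 0.4 (playful), consistent with the best and worst cases discussed above.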
Figure 3 shows the output of the retrieval system in response to a query for purely playful commercials. At the right of the list of retrieved items, a bar graph displays the degree to which each shot of the best ranked video belongs to the playful category (cuts and dissolves are represented by thin white vertical lines). The best ranked videos in this category all advertise products for “young and smart” people (sportswear, sport watches, blue jeans etc.): this is not accidental, but reflects instead the common marketing practice of calibrating a commercial to a specific audience.

Figures 4 and 5 report some of the keyframes for the two best ranked spots. The first ranked commercial features a high conformity to the playful category and a weak conformity to the critical one ($S_{\mathrm{playful}}$ = 90% and $S_{\mathrm{critical}}$ = 15%). The second ranked commercial, instead, features a high conformity to the playful category and a weak conformity to the utopic one ($S_{\mathrm{playful}}$ = 89% and $S_{\mathrm{utopic}}$ = 21%). The first ranked spot (advertising sportswear) presents all the typical features of a playful commercial, including a very fast rhythm, unorthodox camera takes and very saturated colors; it also shows situations, like skating on a tennis court, that reflect semantic issues at a higher level than that of our computer analysis. Similar features are present in the second ranked spot. Concerning the second ranked spot (figure 5), notice the presence of quasi-identical keyframes (e.g., the close-ups of the man), which are typical of a non linear development of the story. Indeed, in this spot the camera frenetically switches between close-up views of the man and his dogs, almost never alternating details and global views like a utopic spot would do.

Figure 6. Feature values for the “Tacchini” commercial.
Figures 6 and 7 show, respectively, the diagrams of the characteristic features (saturation, hue, luminance, line slopes, number of cuts and dissolves) for the two best ranked commercials in figure 3. Notice how in playful commercials cuts are by far the prevailing editing effect, while dissolves are almost absent. Also, as evident from the figures, line slopes are typically slanted, and colors are saturated.

Figure 7. Feature values for the “Audi” commercial.

Figure 8. Retrieval of practical commercials.

Figure 9. Some of the frames for the first ranked spot in figure 8 (“Knorr”).

Figure 8 shows the output of the retrieval system in response to a query for purely practical commercials. This time, the best ranked videos obtained in response to the query all advertise typical family products (the first three retrieved commercials advertise a soup, a soap and a kind of rice, respectively): this again highlights the fact that the kind of promoted product often drives the choice of the semiotic category to use. Figures 9 and 10 show some relevant frames for the two best ranked commercials in figure 8. The first ranked commercial features a high conformity to the practical category and a weak conformity to the critical one ($S_{\mathrm{practical}}$ = 86% and $S_{\mathrm{critical}}$ = 20%). The second ranked commercial features a high conformity to the practical category and a moderate conformity to the critical one ($S_{\mathrm{practical}}$ = 85% and $S_{\mathrm{critical}}$ = 50%). From a rapid inspection of these frame sequences, it is evident that practical spots have a quite linear narrative structure, making it easy to reconstruct the story told in the spot and fill in the “semantic gaps” from frame to frame: this could be a very hard task with the playful sequences of figures 4 and 5. Figures 11 and 12 show the feature distributions for the two best ranked commercials in figure 8.
Notice how, in contrast to playful commercials, in practical commercials long dissolves prevail over cuts (see again figure 9), horizontal and vertical lines are dominant, and colors are non-saturated. Figure 13 shows retrieval results for a query in which a database commercial is used as a template. In this case, the system output lists commercials in decreasing order of similarity to the template. Figures 14, 15 and 16 show some keyframes of the template and of the two best ranked commercials, respectively. Similarity between commercials is evaluated by global correlation between perceptual level features. The commercial used as template in figure 13 is a practical commercial (for a floor wax), featuring long dissolves, lines with horizontal slopes, and non-saturated colors (see figure 17). Although their feature distributions appear at first sight to differ from those of the template, the two best ranked commercials score quite high in characteristic feature similarity. As an example, notice that color saturation and cut density in figure 18 are quite similar to those of figure 17. Similarly, despite a higher saturation value, figure 19 has hue and luminance distributions very close to those of the template commercial, and thus reaches a high similarity score through linear superposition of the partial scores.

Figure 10. Some of the frames for the second ranked spot in figure 8 (“Dixan”).

Figure 11. Feature values for the “Knorr” commercial.

Figure 12. Feature values for the “Dixan” commercial.

Figure 13. Retrieval by similarity.

Figure 14. Some of the frames for the commercial used as template in figure 13 (“Emulsio”).

Figure 15. Some of the frames for the first best ranked commercial in figure 13 (“Barilla”).

Figure 16. Some of the frames for the second best ranked commercial in figure 13 (“Egitto”).
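The retrieval-by-similarity scheme discussed above (a partial score per perceptual feature, combined by linear superposition) can be illustrated with a minimal sketch. The text does not specify the exact correlation measure, the feature representation or the weights, so everything below is an assumption made for illustration: features are taken to be fixed-length histograms, the partial score is a Pearson correlation clipped at zero, and the weights are equal. The function names (correlation, partial_score, similarity, rank) are hypothetical and do not come from the system described in this paper.

```python
from math import sqrt


def correlation(p, q):
    """Pearson correlation between two equal-length feature histograms."""
    n = len(p)
    if n == 0 or n != len(q):
        raise ValueError("histograms must be non-empty and of equal length")
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    if var_p == 0 or var_q == 0:
        # Flat histograms have no shape: count them as equal only if both are flat.
        return 1.0 if var_p == var_q else 0.0
    return cov / sqrt(var_p * var_q)


def partial_score(p, q):
    """Partial similarity in [0, 1]: negative correlations are clipped to 0 (assumption)."""
    return max(0.0, correlation(p, q))


def similarity(template, candidate, weights=None):
    """Linear superposition of the per-feature partial scores.

    template and candidate map a feature name (e.g. "saturation") to a histogram.
    Equal weights are assumed when none are given.
    """
    features = sorted(template)
    if weights is None:
        weights = {f: 1.0 / len(features) for f in features}
    return sum(weights[f] * partial_score(template[f], candidate[f]) for f in features)


def rank(template, database):
    """Return the names of the database commercials in decreasing order of similarity."""
    scored = [(similarity(template, feats), name) for name, feats in database.items()]
    return [name for _, name in sorted(scored, reverse=True)]
```

With a weighted sum of this kind, a candidate that differs from the template on one feature (for example, a higher saturation) can still rank high if its other distributions correlate well, which is the behaviour described for figure 19.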
Figure 17. Feature values for the “Emulsio” commercial.

Figure 18. Feature values for the “Barilla” commercial.

Figure 19. Feature values for the “Egitto” commercial.

6. Conclusions and future work

In this paper, we have addressed the problem of the semantic characterization of commercial videos from a semiotic perspective. A specific set of rules allows us to map low level (perceptual) features of a commercial into higher level semantic categories, thus making it possible to represent and retrieve commercials based both on their dominant semiotic characterization and on their global similarity to a template video. Our system is intended to support marketing professionals both in the creation process (as a source of inspiration for choosing a message with predefined characteristics) and in the detection of possible “sense overlaps” and conflicts that occur when two or more semiotic categories coexist in the same advertising campaign or even in the same spot. Future work will address the inclusion of a larger number of perceptual features in the system, operation on very large archives (e.g., several thousand videos), and a further analysis of the effectiveness and usability of the system with the aid of experts in the marketing and semiotics fields.

Acknowledgments

The authors warmly thank Bruno Bertelli, Laura Lombardi, Mauro Caliani and Jacopo M. Corridoni for their help in the development of this research.

References

1. P. Aigrain, P. Joly, P. Lepain, and V. Longueville, “Medium knowledge-based macro segmentation of video sequences,” in Intelligent Multimedia Information Retrieval, M. Maybury (Ed.), 1996.
2. F. Arman, A. Hsu, and M. Chi, “Feature management for large video databases,” in Conf. on Storage and Retrieval for Image and Video Databases, W. Niblack (Ed.), San Jose, CA, May 1993, pp. 2–12.
3. D.H. Ballard and C.M. Brown, Computer Vision, Prentice-Hall: Englewood Cliffs, NJ, 1982.
4. B. Bertelli, “La pubblicitazione: la citazione delle avanguardie storiche nelle pubblicità video e stampa,” Ph.D. Thesis, Dept. of Visual Arts, University of Siena, Italy, 1997.
5. M. Caliani, “Ricerca per contenuto di filmati pubblicitari,” Master’s Thesis, Dept. of Systems and Informatics, University of Florence, Italy, 1997.
6. M. Caliani, C. Colombo, A. Del Bimbo, and P. Pala, “Commercial video retrieval by induced semantics,” in Proc. IEEE Int’l Workshop on Content-based Access of Image and Video Databases CAIVD’98, Bombay, India, Jan. 1998, pp. 72–80.
7. J.M. Corridoni and A. Del Bimbo, “Film editing reconstruction and semantic analysis,” in Proc. CAIP’95, Prague, Czech Republic, Sept. 1995.
8. J.M. Corridoni and A. Del Bimbo, “Structured digital video indexing,” in Proc. 13th Int’l Conf. on Pattern Recognition ICPR’96, Wien, Austria, Aug. 1996, pp. (III):125–129.
9. J.M. Corridoni, A. Del Bimbo, and P. Pala, “Sensations and psychological effects in color image databases,” ACM Multimedia Systems Journal, Vol. 7, No. 3, May 1999, pp. 175–183.
10. J.-M. Floch, Sémiotique, marketing et communication. Sous les signes, les stratégies, Presses Universitaires de France: Paris, France, 1990.
11. A.J. Greimas, Sémantique structurale, Larousse: Paris, France, 1966.
12. C.R. Haas, Pratique de la publicité, Bordas: Paris, France, 1988.
13. A. Hampapur, R. Jain, and T. Weymouth, “Digital video segmentation,” in 2nd Annual ACM Multimedia Conference and Exposition, San Francisco, CA, Oct. 1994.
14. A.K. Jain, Algorithms for Clustering Data, Prentice Hall: Englewood Cliffs, NJ, 1991.
15. R. Lienhart, C. Kuhmünch, and W. Effelsberg, “On the detection and recognition of television commercials,” in Proc. Int’l Conf. on Multimedia Computing and Systems, Ottawa, Canada, June 1997, pp. 509–516.
16. A. Nagasaka and Y. Tanaka, “Automatic video indexing and full video search for object appearances,” in IFIP Trans., Visual Database Systems II, W.E. Knuth (Ed.), 1992, pp. 113–128.
17. J.M. Pike and C.G. Harris, “A combined corner and edge detector,” in Proc. Fourth Alvey Vision Conference, 1988, pp. 147–151.
18. S. Smoliar and H. Zhang, “Content-based video indexing and retrieval,” IEEE Multimedia, Vol. 2, No. 1, pp. 63–75, Summer 1994.
19. D. Swanberg, C.F. Shu, and R. Jain, “Knowledge guided parsing in video databases,” in Conf. on Storage and Retrieval for Image and Video Databases, W. Niblack (Ed.), San Jose, CA, May 1993, pp. 13–24.
20. Y. Tonomura, A. Akutsu, Y. Taniguchi, and G. Suzuki, “Structured video computing,” IEEE Multimedia, Vol. 1, No. 3, pp. 35–43, Fall 1994.
21. T. Uchiyama and M.A. Arbib, “Color image segmentation using competitive learning,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 16, No. 12, pp. 1197–1206, Dec. 1994.
22. M. Yeung, B.L. Yeo, and B. Liu, “Extracting story units from long programs for video browsing and navigation,” in Proc. IEEE Int’l Conf. on Multimedia Computing and Systems, Hiroshima, Japan, June 1996, pp. 296–305.

Carlo Colombo holds an M.S. in electronic engineering from the University of Florence, Italy (1992) and a Ph.D. in robotics from the Scuola Superiore di Studi Universitari e di Perfezionamento Sant’Anna, Pisa, Italy (1996). He is presently an assistant professor at the Faculty of Engineering of the University of Florence, Italy. His main research interests are in computer vision and its applications to advanced human-machine interfaces, robotics and multimedia.

Alberto Del Bimbo is Full Professor and Director of the Department of Sistemi e Informatica at the Università degli Studi di Firenze, Italy. He is also the Director of the Master in Multimedia at the same University.
His scientific interests and activity have addressed the subject of Image Technology and Multimedia, with particular reference to object recognition and image sequence analysis, content-based retrieval for image and video databases, visual languages and advanced man-machine interaction. Prof. Del Bimbo is the author of over 150 publications, which have appeared in the most distinguished international journals and conference proceedings, and is the author of the monograph “Visual Information Retrieval”, published by Morgan Kaufmann in 1999. He has also been the Guest Editor of several special issues of international journals and the Chairman of several conferences in the field of Image Processing, Image Databases and Multimedia. He is an IAPR Fellow and presently a Member of the Steering Committee of IEEE ICME, Int. Conference on Multimedia and Expo, and of the VISUAL conference series. From 1996 to 2000 he was the President of the Italian Chapter of IAPR, the International Association for Pattern Recognition. Since 1999 he has been a Member of the IEEE Publications Board. He presently serves as Associate Editor of IEEE Trans. on Multimedia, IEEE Trans. on Pattern Analysis and Machine Intelligence, Pattern Recognition, Journal of Visual Languages and Computing, and Multimedia Tools and Applications.

Pietro Pala holds a Laurea degree in Electronic Engineering (1994) and a Ph.D. in Computer Science (1998), both from the Università di Firenze, Italy. He is presently an Assistant Professor at the Dipartimento Sistemi e Informatica of the Università di Firenze, Italy. His main research interests include pattern recognition, image and video databases, neural networks and related applications.