2012 IEEE International Conference on Multimedia and Expo

Salient Object Detection through Over-Segmentation

Xuejie Zhang, Zhixiang Ren, Deepu Rajan
School of Computer Engineering, Nanyang Technological University, Singapore
{zhangxuejie,renz0002,asdrajan}@ntu.edu.sg

Yiqun Hu
School of Computer Science & Software Engineering, University of Western Australia, Perth, Australia
[email protected]

Abstract—In this paper we present a salient object detection model based on an over-segmented image. The input image is first segmented by the mean-shift segmentation algorithm and then over-segmented by a quad mesh into even smaller segments. Such segments overcome the disadvantages of using patches or single pixels to compute saliency. Segments that are similar and spread over the image receive low saliency, while a segment that is distinct in the whole image or in a local region receives high saliency. We express this as a color compactness measure from which the saliency level is derived directly. Our method is shown to outperform six existing methods on a saliency detection database containing images with human-labeled object-contour ground truth. The proposed saliency model is also shown to be useful for an image retargeting application.

Keywords—Saliency detection, image segmentation, image retargeting

I. INTRODUCTION

Saliency detection is the process of detecting interesting visual information in an image. It is used in applications such as image and video retargeting, object detection and recognition [1], and video analysis [2]. One of the most popular saliency models is based on a center-surround mechanism inspired by biological processes in the mammalian visual system [3]. In this model, the center-surround mechanism is realized by filtering feature maps of the input image with Difference-of-Gaussian (DoG) filters; the DoG response maps from different scales are combined to form a saliency map.

Other saliency models analyze visual information in the frequency domain. By observing the properties of DoG filters in the frequency domain, Achanta et al. proposed the frequency-tuned saliency model [4], which approximates the effect of a group of DoG filters as the difference between a large and a small Gaussian kernel; the saliency map is generated by filtering the image with these two kernels only. Another frequency-domain method is the spectral residual method of Hou and Zhang [5], which produces saliency by computing the spectral residual of the image, thereby removing redundant or non-salient information. A further approach to saliency detection is based on the theory of sparse coding. The incremental coding length method [6] belongs to this category: it models visual information with sparse basis functions and detects salient feature channels, and a saliency map is produced from the responses to those channels. Gopalakrishnan et al. presented a method that derives saliency by modeling the distributions of color and orientation [7]. Goferman et al. proposed a context-aware saliency model that compares each image patch with other patches while taking the surrounding context into account [8].

In this paper we propose a simple but effective saliency detection model that over-segments an image and analyzes the color compactness within it. The proposed model detects salient objects with accurate contours, which many existing models cannot do.
Many saliency detection models, such as [6], [8], use image patches as the processing unit for saliency analysis. Image patches suffer from the curse of dimensionality, and patches with a complex distribution of colors appear more salient, i.e., more different from other patches, than they should. In our method the input image is over-segmented into small segments with perceptually uniform color. These small segments are the basic processing units, and their colors and spatial positions are used for saliency detection. We also propose a color compactness measure: the relationships between the small segments in both the spatial and the color domains are used to calculate the color compactness of a region. We demonstrate the effectiveness of the proposed saliency model on a benchmark data set and by applying it to image retargeting with improved results.

The rest of this paper is organized as follows. Section II presents our model, including the over-segmentation and the color compactness measure. Section III presents a benchmarking experiment that compares our model with other saliency models, and also presents an image retargeting application. Section IV concludes the paper.

II. THE PROPOSED METHOD

We present a simple and efficient method to identify salient objects in an image in a bottom-up manner. The first step is to obtain an over-segmented image in which each segment is small and has uniform color. This improves saliency detection accuracy compared to patch-based techniques, which are not suitable for describing region appearances. Specifically, such methods assign high saliency to patches falling on object boundaries, since those patches differ most from their surroundings in the high-dimensional patch space [8]. Although multi-scale approaches have been proposed to alleviate this problem, we show in the experimental results that such methods, even with additional contextual information, are not sufficient to extract the salient object completely. Following over-segmentation, each segment is compared to all other segments through a compactness measure that is a function of color similarity and spatial distance. This simple compactness measure serves directly as the measure of saliency of a segment.

A. Image Over-Segmentation

An initial segmentation of the image is obtained with the popular mean-shift segmentation [9]. Our objective is to model the compactness of color information. With the initial segmentation, the segments vary widely in size and have irregular shapes; for such segments it is not meaningful to compute the compactness of color. On the other hand, considering each pixel individually leads to computationally expensive algorithms. Hence, we take the middle ground of over-segmentation, which yields small segments with uniform chromatic properties while not suffering from the curse of dimensionality. Over-segmentation is achieved by overlaying a quad mesh on the mean-shift segmented image and splitting each segment into smaller ones. We use a quad size of 20×20 pixels.

If a quad contains only one label, as determined by the initial segmentation, it is considered one segment; even if a neighboring quad carries the same label, the two quads are treated as different segments. This is illustrated in fig. 1, which shows an eagle image on the left and a visualization of the mean-shift segmentation of a zoomed region on the top right, where each color represents a region label. The over-segmented visualization on the bottom right shows that the sky region is split into several segments although they all carry the same label. If a quad contains more than one label, each label within the quad is considered a separate segment. This is visible in the over-segmentation of the wing regions, where a quad may contain segments from the wing as well as from the background sky. Fig. 2 shows the result of over-segmentation for the entire image: figs. 2(a) and (b) show the original image and the mean-shift segmented image, respectively, while figs. 2(c) and (d) visualize the initial segments and the over-segmented regions, respectively.

Figure 1: Left: Original image; Top right: Visualization of the mean-shift segmentation of a zoomed region; Bottom right: Visualization of the over-segmentation using a mesh grid.

Figure 2: (a) Input image, (b) Initial mean-shift segmentation, (c) Visualization of initial segmentation labels and (d) Visualization of segmentation labels after over-segmentation.
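For concreteness, the splitting rule can be sketched as a relabeling of the initial label map: every (quad cell, initial label) pair becomes its own segment. The Python sketch below is illustrative only; the function name, array layout and the final label-compaction step are our own assumptions, not the authors' implementation.

```python
import numpy as np

def oversegment(labels0, quad=20):
    """Split an initial segmentation into quad-bounded segments.

    Every (quad cell, initial label) pair becomes its own segment, so a
    large mean-shift region is cut into pieces of at most quad x quad
    pixels, as described in Section II-A.
    """
    h, w = labels0.shape
    ys, xs = np.mgrid[0:h, 0:w]
    quads_per_row = (w + quad - 1) // quad
    quad_id = (ys // quad) * quads_per_row + (xs // quad)
    # a pixel's segment is determined jointly by its quad and its initial label
    key = quad_id.astype(np.int64) * (int(labels0.max()) + 1) + labels0
    # compact the keys to consecutive segment indices 0..K-1
    _, segments = np.unique(key.ravel(), return_inverse=True)
    return segments.reshape(h, w)
```

Note how this reproduces both rules above: the same initial label in two adjacent quads yields two different keys, and two initial labels inside one quad also yield two different keys.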
B. Compactness Measure

We define saliency as the compactness of the color information of a particular segment. Since the segments themselves are small, comparing the color compactness across segments gives saliency information at a finer resolution. Two hypotheses guide our formulation of the color compactness of a segment:

1) When two segments have the same or similar color, the nearer they are to each other, the higher is the compactness of that color.
2) Given two segments at a certain distance from each other, the more similar their colors are, the lower is the compactness of that color.

Hypothesis 1 is illustrated in fig. 3: the spread of the red color in the left image implies that the color is not compact, whereas in the right image it is compact, leading to higher saliency of the color. Hypothesis 2 is illustrated in fig. 4, in which dissimilar colors of segments separated by the same distance indicate higher saliency; thus, in the left image the red color has lower saliency than in the right image. The compactness of color information in an image can therefore be measured through the relationship between each segment and all remaining segments. We express these two hypotheses as a compactness measure of the following form:

$$\mathrm{COMP}(i) \;=\; \frac{1}{N}\sum_{j \neq i} \frac{\varepsilon_1 + C(i,j)}{\varepsilon_2 + P(i,j)} \qquad (1)$$

where C(i, j) is the Euclidean distance between the mean colors of segments i and j in CIELUV color space, P(i, j) is the distance between the centroids of segments i and j in the image, and N is the number of segments j. Both C and P are normalized to the range [0, 1]. The reason for the small constant ε₁ (taken as 0.1) is that without it, if two segments have the same color, i.e., C(i, j) = 0, the compactness measure would be independent of how far apart the segments are, contrary to the first hypothesis. ε₂ in the denominator avoids division by zero when P(i, j) = 0; this is a rare case in which the centroid of a segment coincides with that of a neighboring segment, possible when one of them is concave.

In order to ignore segments whose color is very different from the segment s under consideration, we compute (1) over a subset of segments rather than over all segments in the image: we consider only those segments i such that C(s, i) < mean(C(s, j)), j ≠ s. In other words, we use a flexible color-distance (C) threshold to select the segments over which the compactness is computed, instead of a fixed number of segments. Note that the spatial distance (P) between segment s and the other segments is not used as a selection threshold, since we want the color compactness over the entire image.

After deriving the color compactness of each segment, each pixel in the image is assigned a saliency level equal to the color compactness of the segment to which it belongs. We smooth this initial saliency map with a Gaussian kernel of size 8×8 and standard deviation 3. This only provides a better visual experience of the saliency map by filtering out the over-segmentation boundaries; it is optional and does not affect the final results significantly. Fig. 5(a) shows the saliency map of the eagle image of fig. 2. Compared to the result of [8] in fig. 5(b), our method produces a consistently high saliency level on the salient object, and the boundary of the eagle is well preserved. The drawback of [8] is that it produces high saliency only at regions with very high-contrast boundaries, such as the boundaries of the eagle's head and tail, and leaves the space inside these boundaries hollow with low saliency.

Figure 3: Spread of a color indicating its compactness.

Figure 4: The distances between the two segments in the left and right images are the same. Red is more compact in the right image.

Figure 5: The saliency map of the eagle image: (a) Our result and (b) The result of [8].
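Eq. (1) and the flexible threshold translate into a short computation over per-segment statistics. The sketch below assumes the label map from the previous sketch and an image already converted to CIELUV; since the paper fixes only ε₁ = 0.1, the value of ε₂ (set here to 0.1 as well) and all names are our assumptions.

```python
import numpy as np

def compactness_saliency(segments, luv, eps1=0.1, eps2=0.1):
    """Per-pixel saliency from the color compactness of Eq. (1).

    segments: integer label map (0..K-1); luv: image in CIELUV space.
    eps2 = 0.1 is an assumed value; the paper specifies only eps1 = 0.1.
    """
    k = int(segments.max()) + 1
    h, w = segments.shape
    ys, xs = np.mgrid[0:h, 0:w]
    idx = segments.ravel()
    counts = np.bincount(idx, minlength=k).astype(float)
    # mean color and spatial centroid of every segment
    mean_col = np.stack([np.bincount(idx, luv[..., c].ravel(), minlength=k)
                         for c in range(3)], axis=1) / counts[:, None]
    cx = np.bincount(idx, xs.ravel(), minlength=k) / counts
    cy = np.bincount(idx, ys.ravel(), minlength=k) / counts
    # pairwise color (C) and spatial (P) distances, normalized to [0, 1]
    C = np.linalg.norm(mean_col[:, None] - mean_col[None], axis=-1)
    P = np.hypot(cx[:, None] - cx[None], cy[:, None] - cy[None])
    C /= C.max()
    P /= P.max()
    sal = np.zeros(k)
    for s in range(k):
        # flexible threshold: keep only segments closer in color than the mean
        thr = C[s].sum() / (k - 1)              # mean over j != s (C[s, s] = 0)
        keep = (C[s] < thr) & (np.arange(k) != s)
        n = keep.sum()
        if n:
            sal[s] = ((eps1 + C[s, keep]) / (eps2 + P[s, keep])).sum() / n
    return sal[segments]                         # lift segment saliency to pixels
```

The optional smoothing step described above could then be approximated with, e.g., scipy.ndimage.gaussian_filter(sal_map, sigma=3).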
C. Why Over-Segmentation

The reason for over-segmenting with a mesh grid is to divide the image into small patches with uniform color properties. It is natural to ask what happens if over-segmentation is skipped and the compactness measure is applied to the initial segmentation directly. The compactness of a color may not be derived correctly if the segments are large, which can be shown through a simple experiment. Fig. 6(a) shows the ground-truth salient region for the original image shown in fig. 7(a). If the over-segmentation step of the proposed method is replaced by a normal mean-shift segmentation, the resulting saliency map is the one shown in fig. 6(b). In such a scene, where a large background segment spreads across the image, the large segment can attain a high saliency level even though it is not compact in the image. Fig. 6(c) is the result of our method using the mesh grid for over-segmentation; it is clearly closer to human perception and to the ground truth.

It is also possible to drive the mean-shift algorithm itself into an over-segmentation mode by varying its parameters. However, this type of over-segmentation fails as well. Consider fig. 7(b), which shows the segmented regions when the mean-shift parameters are set as follows: spatial radius = 2, feature radius = 4, and minimum segment size = 10. Even in this case the entire background still forms a single large segment that cannot be compared directly with the other segments for color compactness, because its size is much larger than that of the rest. Fig. 7(c) shows our over-segmentation result using a mesh grid, which is suitable for deriving color compactness since the segments are all small. The saliency map obtained using the mean-shift over-segmentation is shown in fig. 6(d): although the background receives slightly lower saliency, it is not eliminated as completely as with the proposed method shown in fig. 6(c).

Figure 6: (a) Ground-truth salient region as labeled by a human. Saliency maps using (b) the initial mean-shift segmentation only (the image border is included for illustration), (c) the proposed over-segmentation method and (d) mean-shift over-segmentation.

Figure 7: (a) Original image, (b) Mean-shift over-segmentation, and (c) Our over-segmentation.
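For readers who wish to reproduce a baseline like fig. 7(b), one rough stand-in is OpenCV's pyramid mean-shift filtering followed by connected-component labeling of the smoothed colors. This is not the segmenter of [9]: the coarse color quantization and the crude minimum-size handling below are our own simplifications.

```python
import cv2
import numpy as np
from skimage.measure import label

def meanshift_segments(bgr, spatial_radius=2, feature_radius=4, min_size=10):
    """Approximate mean-shift over-segmentation (cf. fig. 7(b) parameters).

    cv2.pyrMeanShiftFiltering only smooths colors; regions are recovered by
    labeling connected runs of coarsely quantized identical color. Regions
    below min_size pixels are crudely relabeled as background.
    """
    filtered = cv2.pyrMeanShiftFiltering(bgr, sp=spatial_radius, sr=feature_radius)
    quant = (filtered // 4).astype(np.int64)   # coarse quantization (assumption)
    key = (quant[..., 0] * 64 + quant[..., 1]) * 64 + quant[..., 2] + 1
    labels = label(key, connectivity=2)        # same value + adjacency = one region
    sizes = np.bincount(labels.ravel())
    labels[np.isin(labels, np.flatnonzero(sizes < min_size))] = 0
    return labels
```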
III. EXPERIMENTAL RESULTS

We evaluate the performance of the proposed salient object detection method on a benchmark data set of 1000 images selected from the salient object database [10], whose ground truth was manually labeled [4]. We compare our performance with six existing models that report results on the same database and whose code is publicly available: Itti's saliency model (ITTI) [3], frequency-tuned saliency (FT) [4], the spectral residual model (SR) [5], the incremental coding length method (ICL) [6], the color and orientation modeling method (COD) [7], and the context-aware saliency detection model (CA) [8]. For the COD method, only the distribution of color information is used here, for a fair comparison, because our model uses color information only. We also demonstrate the effectiveness of the proposed method through an application in image retargeting.

A. Saliency Detection

Fig. 8 shows the saliency maps of the six methods together with ours. Since ITTI models center-surround contrast, it tends to follow strong contours, which give high responses to DoG filters across the frequency spectrum; it is not able to mark the whole object when the salient object is large. The FT method tends to assign high saliency to regions of very uniform intensity, which produces poor saliency maps in cases such as rows 1 and 2 of fig. 8, where the bowl and the black table receive high saliency. The SR method does not generate satisfying results when the image contains textured regions spread across the image, as seen in rows 3, 4 and 7. Both FT and SR ignore color information, so they may lose salient colored objects such as the brown cross in row 3. The ICL method divides the image into overlapping patches and detects salient feature channels; it therefore produces high saliency at strong contours and corners, because these are rare patches in the image. Similar to ICL, the CA method detects mainly corners and contours and may need further hole-filling to mark the entire salient object. The results of ICL and CA show that using image patches directly as the processing unit can lead to high saliency at contours and corners, as seen in rows 3 and 9. The COD method also models distributions of color, but since it is patch-based, the boundaries of the salient objects are not extracted faithfully; it produces reasonable results for most images but not a satisfying one for the brown-cross image. In summary, our method shows the most promising performance in detecting salient objects with well-preserved contours.

Figure 8: Comparison of saliency maps obtained by different methods: (a) Original image, (b) Itti's method [3], (c) Frequency-tuned saliency [4], (d) Spectral residual [5], (e) Incremental coding length [6], (f) Color and orientation modeling [7], (g) Context-aware saliency [8], (h) Our method and (i) Ground truth.

We also present a quantitative evaluation of our method against the other six methods using ROC curves. Each saliency map is split into a salient and a non-salient region by a threshold and compared with the human-generated ground-truth saliency map, with salient considered positive and non-salient negative. The true positive rate (TPR) is the ratio of the number of pixels correctly classified as salient to the total number of salient pixels in the ground truth. The false positive rate (FPR) is the ratio of the number of pixels wrongly classified as salient to the total number of non-salient pixels in the ground truth. The ROC curve is a plot of TPR versus FPR over different thresholds, and a larger area under the ROC curve indicates better agreement with the human-labeled ground-truth saliency maps.
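A minimal sketch of this evaluation protocol follows, using the standard TPR/FPR definitions above; the function name and the number of threshold steps are our choices.

```python
import numpy as np

def roc_points(sal_map, gt_mask, n_thresholds=256):
    """Sweep a threshold over a saliency map and return (FPR, TPR) arrays."""
    sal = (sal_map - sal_map.min()) / (np.ptp(sal_map) + 1e-12)
    gt = gt_mask.astype(bool)
    fpr, tpr = [], []
    for t in np.linspace(0.0, 1.0, n_thresholds):
        pred = sal >= t
        tpr.append((pred & gt).sum() / gt.sum())       # recovered salient pixels
        fpr.append((pred & ~gt).sum() / (~gt).sum())   # false alarms on background
    return np.array(fpr), np.array(tpr)

# Area under the curve via the trapezoidal rule, after sorting by FPR:
# fpr, tpr = roc_points(sal_map, gt_mask)
# order = np.argsort(fpr)
# f, t = fpr[order], tpr[order]
# auc = np.sum(np.diff(f) * (t[1:] + t[:-1]) / 2)
```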
Figure 9: Average ROC curves for the different methods.

Fig. 9 shows the average ROC curves over the 1000 images. The area under the curve of our method is the largest among all methods, indicating superior agreement with the human-labeled salient object ground truth.

Failure cases. Fig. 10 shows two cases in which the proposed method fails. In the first image, our method produces high saliency only for the egg-yolk regions, because the yellow color is the most compact in the image, whereas the human subject marked the whole egg region in the pan. For the second image, the human subject marked the monk as the salient object, while our method produces high saliency only for the monk's red robe. It is difficult for saliency detection methods to assign high saliency to the monk's face and body without incorporating some kind of face recognition. There are indeed methods that include face recognition as part of salient object detection, e.g. [11], but such methods are not as simple as the proposed one; moreover, our method performs very well for the face image in row 8 of fig. 8. The other methods listed in fig. 8 produce results that are comparable to or even worse than ours on these two images.

Figure 10: Failure cases: (a) Original image, (b) Ground-truth salient object and (c) Our results.

B. An Image Retargeting Application

Saliency models can be used in many applications. Here we demonstrate an image retargeting application that combines our saliency map with the seam carving method [12]. Image retargeting is the process of adaptively resizing an image to fit another display size while preserving the important content of the image. The seam carving method reduces the image size by removing connected seams from the input image, where the removed seam has the lowest gradient energy among all possible seams. Since our method successfully determines the salient region in the image, we substitute our saliency map for the simple gradient map in the seam carving method. Fig. 11 shows the results of retargeting the input images to 75% of their width. Compared to the original seam carving method, the saliency maps generated by our method help to better preserve the salient regions of the image.

Figure 11: Illustration of image retargeting using the proposed salient object detection method: (a) Original image, (b) Proposed saliency map, and image retargeting by (c) Seam carving [12] and (d) Seam carving using our saliency maps.
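To illustrate the substitution, the following sketch removes one vertical seam using the saliency map directly as the energy function. It assumes 8-connected vertical seams and is not the implementation of [12]; the names are ours.

```python
import numpy as np

def remove_vertical_seam(img, energy):
    """Remove the 8-connected vertical seam with the lowest total energy.

    energy is the per-pixel saliency map, used in place of the gradient
    map of the original seam carving method [12].
    """
    h, w = energy.shape
    cost = energy.astype(float).copy()
    # dynamic programming: accumulate the cheapest path cost row by row
    for y in range(1, h):
        left = np.concatenate(([np.inf], cost[y - 1, :-1]))
        right = np.concatenate((cost[y - 1, 1:], [np.inf]))
        cost[y] += np.minimum(np.minimum(left, cost[y - 1]), right)
    # backtrack the cheapest seam from the bottom row upwards
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 2, -1, -1):
        x = seam[y + 1]
        lo, hi = max(x - 1, 0), min(x + 2, w)
        seam[y] = lo + int(np.argmin(cost[y, lo:hi]))
    keep = np.ones((h, w), dtype=bool)
    keep[np.arange(h), seam] = False
    return img[keep].reshape(h, w - 1, -1), energy[keep].reshape(h, w - 1)

# Retargeting to 75% width: remove seams until the target width is reached.
# while img.shape[1] > int(0.75 * original_width):
#     img, sal = remove_vertical_seam(img, sal)
```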
IV. CONCLUSIONS

We have presented a simple and effective method for salient object detection in images. The method generates full-resolution saliency maps and preserves object contours accurately. In experiments on a benchmark image data set labeled with ground-truth salient regions, it outperforms six existing saliency models. The model can serve many potential applications, such as photo composition, image editing and video compression, and we demonstrated an image retargeting application using the saliency maps produced by our model.

ACKNOWLEDGMENT

This research is supported by the Singapore National Research Foundation under its Interactive & Digital Media (IDM) Public Sector R&D Funding Initiative and administered by the IDM Programme Office (Grant No. NRF2008IDM-IDM004-032).

REFERENCES

[1] D. Gao and N. Vasconcelos, "Integrated learning of saliency, complex features, and object detectors from cluttered scenes," in CVPR, 2005.
[2] N. Jacobson and T. Q. Nguyen, "Video processing with scale-aware saliency: Application to frame rate up-conversion," in ICASSP, 2011.
[3] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254–1259, 1998.
[4] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, "Frequency-tuned salient region detection," in CVPR, 2009.
[5] X. Hou and L. Zhang, "Saliency detection: A spectral residual approach," in CVPR, 2007.
[6] X. Hou and L. Zhang, "Dynamic visual attention: Searching for coding length increments," in NIPS, 2008, pp. 681–688.
[7] V. Gopalakrishnan, Y. Hu, and D. Rajan, "Salient region detection by modeling distributions of color and orientation," IEEE Trans. on Multimedia, vol. 11, no. 5, pp. 892–905, 2009.
[8] S. Goferman, L. Zelnik-Manor, and A. Tal, "Context-aware saliency detection," in CVPR, 2010.
[9] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603–619, 2002.
[10] T. Liu, J. Sun, N.-N. Zheng, X. Tang, and H.-Y. Shum, "Learning to detect a salient object," in CVPR, 2007.
[11] M. Cerf, J. Harel, W. Einhauser, and C. Koch, "Predicting human gaze using low-level saliency combined with face detection," in NIPS, 2007.
[12] S. Avidan and A. Shamir, "Seam carving for content-aware image resizing," in SIGGRAPH, 2007.