Stable Image Descriptions using Gestalt Principles

Yi-Zhe Song and Peter M. Hall

Media Technology Research Centre, Department of Computer Science, University of Bath

Abstract. This paper addresses the problem of grouping image primitives; its principal contribution is an explicit definition of the Gestalt principle of Prägnanz, which organizes primitives into descriptions of images that are both simple and stable. Our definition of Prägnanz assumes just two things: that a vector of free variables controls some general grouping algorithm, and that a scalar function measures the information in a grouping. Stable descriptions exist where the gradient of the function is zero, and these can be ordered by information content (simplicity) to create a “grouping” or “Gestalt” scale description. We provide a simple measure for information in a grouping based on its structure alone, leaving our grouper free to exploit other Gestalt principles as we see fit. We demonstrate the value of our definition of Prägnanz on several real-world images.

1 Introduction

Partitioning an image into meaningful structures is a central problem of Computer Vision. “Top-down” solutions fit a model of some kind; they can perform well but restrict image content. In contrast, “bottom-up” solutions aggregate image primitives; they impose less restriction on content, but their performance can be questionable. The essential problem with bottom-up methods is deciding which groupings to use from amongst a vast number of possibilities.

This paper addresses the problem of bottom-up grouping by appeal to Gestalt principles. In the early 1920s, psychologists proposed that Gestalt principles play an important role in human perceptual organization, including proximity, continuity, similarity, closure, and symmetry; later common-region and connectedness were added [1–3]. We call these “simple” principles because they act on a few primitives at any one time. Many of these principles have been used in the computational literature.
Lowe [4] uses proximity; Carreira et al. use parallelism [5]. Others have sought to use more than one principle at once. Dolan and Weiss [6] use proximity and continuity, a pairing also used by Parent and Zucker [7] and by Feldman [8]. Elder and Goldberg [9] aimed for contour completion by studying the mutual relationships amongst proximity, continuity and similarity in the task of contour grouping, and concluded that proximity is the most important among those studied. Despite this work, a full computational account of how Gestalt principles interact is yet to be given.

Although the above work groups all primitives, and some of the methods operate hierarchically and so can handle more than one primitive at once, none of them makes use of any principle of global organization. It is a common observation that context, by which we mean the presence (or absence) of structures in an image, affects the outcome of grouping. Therefore, we argue, some notion of global organization must be included in any account that seeks to form groupings, and such an account should be integral to the way in which Gestalt principles are combined.

Prägnanz is the Gestalt principle that seeks to organize all primitives at once, and so acts in a “global” sense. Introduced by Wertheimer [3], Prägnanz was developed by Koffka [1], who advocated that “of several geometrically possible organizations that one will actually occur which possesses the best, simplest and most stable shape.” Kanizsa [10] too suggested that Prägnanz implies an orderly, rule-based, non-random and stable organization of primitives. In a computational setting, the notion of non-randomness influenced Marr [11], and was explicitly proposed as “common cause” (amongst other names) independently by Witkin and Tenenbaum [12] and by Lowe [4]. Lowe argued that it is highly unlikely for organized image structures to occur by chance; hence they are salient.
Yet common cause can be used without reference to any principle of global organization; for example, a local curve may be the common cause of a set of points. Common cause is not Prägnanz.

Techniques that aim to find best groupings in a global sense do exist. Guy and Medioni [13] combine proximity and continuation to find contours; their method differs from those previously cited in that it introduces a global voting scheme in which each pixel receives votes from all others. Probably the most recognized global grouping technique, and certainly one that has gained considerable popularity, is the normalized cut technique of Shi and Malik [14]. They apply normalized cut to an affinity matrix built from proximal distances and pixel similarities to generate an adjacency matrix, forests of which are groups. Normalized cut has also been used by Yu and Shi [15], who advocated the use of prior knowledge (such as the position of a foreground object, often input interactively for a particular image). Sarkar and Soundararajan [16] used a modified version of normalized cut and applied it to the adjacency graphs obtained using a small set of Bayesian networks, each of which corresponds to a certain Gestalt principle and is trained by a set of mutually competing automata. Despite its popularity, normalized cut does not provide an explicit definition of Prägnanz; rather, its advocates suggest the method adheres to that principle because it operates globally.

Of the literature we read, only Ommer and Buhmann [17] explicitly define Prägnanz, in their case as the minimum of an entropy function based on the probability that pairs of primitives should be grouped. By minimizing entropy, these authors ensure simple groupings, as do the other global methods. But simplicity is just one half of Koffka’s requirement: these methods ensure that the global organization is simple, but give no guarantee that it is stable.
Lecture Notes in Computer Science

[Figure 1: pipeline diagram. Panel labels: line fit over an edge map; area segment; line points to regions; region points to lines; Gestalt grouper; Grouping I; Grouping II.]

Fig. 1. A picture is processed into edges and areas. Lines are fitted to edges which reference back to areas. Our Gestalt grouper optimally connects lines into groups. In this and all Figures in this paper, groups are color coded; singleton groups are in grey. Please refer to the electronic version for best viewing results.

The contributions this paper makes are, in order of importance:

1. We are the first to explicitly define Prägnanz in a way that accounts for both the stability and the simplicity of groupings and that integrates simpler Gestalt principles.
2. We uniquely allow multiple possible groupings over one image; these groupings naturally form a subset/superset structure that leads to the notion of “grouping scale”.

In addition, we provide a flexible system that is easy to experiment with, which is based on a rich description of images including both edge and area primitives.

In the remainder of this paper, we describe our approach (Section 2), including our definition of Prägnanz in subsection 2.1. In the results, Section 3, we show that our method produces multiple solutions that appeal to intuition. Finally, conclusions and a discussion of future work are offered in Section 4.

2 A description of our grouper

The input to our grouping algorithm is a graph-based description of an image in which each node is one of two types: one type corresponds to straight line primitives over edges, the other to image areas. The graph is such that given a line the areas it separates can be determined, and given an area its lines can be identified. The input graph is bipartite: nodes of the same type are not connected. The output of our grouper is a graph embellished with arcs that link nodes of the same type.
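As an illustration only, the bipartite description just outlined might be held in a structure like the following. This is a minimal sketch under our own assumptions: the class name `ImageGraph`, its method names and the string identifiers are hypothetical and do not come from the system described in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class ImageGraph:
    """Bipartite description: line nodes connect only to area nodes."""
    line_to_areas: dict = field(default_factory=dict)  # line id -> set of area ids
    area_to_lines: dict = field(default_factory=dict)  # area id -> set of line ids
    line_arcs: set = field(default_factory=set)        # same-type arcs added by a grouper

    def connect(self, line_id, area_id):
        """Record a line/area association so it can be looked up in both directions."""
        self.line_to_areas.setdefault(line_id, set()).add(area_id)
        self.area_to_lines.setdefault(area_id, set()).add(line_id)

    def areas_of(self, line_id):
        """Areas associated with a line (empty set if the line is unknown)."""
        return self.line_to_areas.get(line_id, set())

    def lines_of(self, area_id):
        """Lines associated with an area (empty set if the area is unknown)."""
        return self.area_to_lines.get(area_id, set())

    def link_lines(self, i, j):
        """Embellish the graph with an undirected arc between two line nodes."""
        self.line_arcs.add((min(i, j), max(i, j)))


# Hypothetical toy description: line l0 separates areas a0 and a1.
graph = ImageGraph()
graph.connect("l0", "a0")
graph.connect("l0", "a1")
graph.connect("l1", "a1")
graph.link_lines("l1", "l0")  # an arc a grouper might add
```

Keeping both lookup tables makes the two queries mentioned above (areas of a line, lines of an area) constant-time dictionary accesses, at the cost of storing each association twice.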
In common with other literature [14–16], forests in the embellishment delimit groups, but unlike others we recognize that there is no “absolute best” solution and so produce more than one plausible grouping.

The input graph is not formed using any Gestalt principle, but does aggregate pixels into convenient image primitives. Straight line primitives are fitted to an edge map obtained by thresholding the output of an Elder-Zucker edge detector [18]; this requires monochrome images. Area primitives are connected pixel sets, determined by a color-image segmentation from Sinclair [19]. To forestall any controversy regarding the grouping these processes enact, we have tuned both of them to operate conservatively: fitting produces many straight lines and the image is typically over-segmented. This graph is conveniently mid-level in the sense that its primitives are super-pixel, but it is nonetheless a weak description. It is the task of our grouper to strengthen this description. Figure 1 shows a typical example of the full process.

As a note, we reason in favor of independently detected lines and areas on the grounds that area boundaries do not necessarily correspond to object edges, and that the presence of an edge does not necessarily imply an area boundary; see Figure 1. Independent line and area processes raise the information content of the input graph, and the competition (disagreement) and cooperation (agreement) between them is implicitly resolved by our grouper.

2.1 Prägnanz: stable and simple groupings

We provide a computational definition of Prägnanz that takes into account both stability and simplicity, as required by Koffka [1]. Our definition is sufficiently general to be used as a controlling principle for any grouping algorithm whose output depends upon a set of control variables, not just our own grouper.

Let P be a set of image primitives; straight line segments in our case. Let Q be a partition of P; Prägnanz selects between the many possible partitions.
Suppose f(Q) is a scalar-valued function that measures some property of a given partition, typically information content. In practice, any partitioning will depend on a vector of free variables, $x \in \mathbb{R}^k$ (threshold variables, for example); hence f(.) also depends upon x. We define the stability of a partition as the magnitude of the gradient of f(.) with respect to the control vector:

$$ s(Q(x)) = \left( \sum_{i=1}^{k} \left( \frac{\partial f(Q(x))}{\partial x_i} \right)^{2} \right)^{1/2} $$

We define a partition to be stable if s(Q(x)) = 0. In practice, the discrete nature of the control variables means zeros are rarely observed, so we seek local minima of s(.). We define a partition to be (consistent with) Prägnanz if it is stable and can be justified by appeal to simple Gestalt principles. In the next subsection we will explain how we generate partitions, but here we continue discussing our general approach to Prägnanz.

As mentioned, we wish f(.) to measure the information content of a given partition. Any partition can be represented by a graph of nodes and arcs; forests in the graph delimit particular groups (see [14–16]). Of the many alternatives, we have found that the most satisfactory results are produced by a simple analysis of these graphs; no reference to the “affinity” between primitive pairs is required. Let $M_g$ be the maximum number of groups in any observed partition, and $M_a$ be the maximum number of arcs in any observed partition. Let $N_g$ and $N_a$ be the number of groups and arcs in a particular partition. We define f(.) as

$$ f(Q(x)) = -G(x)^{A(x)} \log G(x) $$

in which $G = N_g / M_g$ is the normalized number of groups, and $A = N_a / M_a$ is the normalized number of arcs. This function is monotonic, but, importantly, contains saddle points corresponding to stable groups. Intuitively, f(.) relates to the number of binary digits required to encode the grouping, when compared to the most complex alternative.
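The two definitions above can be sketched numerically as follows. This is a minimal illustration, not our implementation: it assumes the information measure is read as f = −G^A log G, approximates the gradient by finite differences over a regular grid of control values, and tests only axis-aligned neighbours when looking for local minima of s(.). The function names (`information`, `stability`, `stable_points`) and the toy group and arc counts are hypothetical.

```python
import numpy as np


def information(n_groups, n_arcs):
    """f = -G**A * log(G), with G and A normalized by their observed maxima."""
    n_groups = np.asarray(n_groups, dtype=float)
    n_arcs = np.asarray(n_arcs, dtype=float)
    G = n_groups / n_groups.max()          # N_g / M_g, in (0, 1]
    A = n_arcs / max(n_arcs.max(), 1.0)    # N_a / M_a, guarding against zero arcs
    return -(G ** A) * np.log(G)


def stability(f, *spacings):
    """Magnitude of the finite-difference gradient of f over the control grid."""
    grads = np.gradient(f, *spacings)
    if f.ndim == 1:                        # np.gradient returns a bare array in 1-D
        grads = [grads]
    return np.sqrt(sum(g ** 2 for g in grads))


def stable_points(s):
    """Grid indices where s is no larger than any axis-aligned neighbour."""
    padded = np.pad(s, 1, mode="constant", constant_values=np.inf)
    core = tuple(slice(1, -1) for _ in range(s.ndim))
    is_min = np.ones(s.shape, dtype=bool)
    for axis in range(s.ndim):
        for shift in (-1, 1):
            neighbour = np.roll(padded, shift, axis=axis)[core]
            is_min &= s <= neighbour
    return np.argwhere(is_min)


# Hypothetical counts observed on a 3x3 grid of control settings (x1, x2).
n_groups = np.array([[9, 8, 8], [6, 5, 5], [2, 1, 1]])
n_arcs = np.array([[0, 1, 2], [4, 6, 7], [9, 11, 12]])

f = information(n_groups, n_arcs)
s = stability(f)
# Stable candidates, ordered so that smaller f (simpler groupings) come first.
candidates = sorted(stable_points(s).tolist(), key=lambda idx: f[tuple(idx)])
```

Because the control variables are discrete, the sketch looks for local minima of s rather than exact zeros, in line with the remark above.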
Also, the number of arcs in a group relates to group stability: a change in control vector may add new arcs to a stable group, but may merge unstable groups.

By construction we find groupings that are stable at local minima of s(.), which depends on the derivative of f(.). But Koffka’s definition [1] requires simplicity too. Because our measure f(.) is related to entropy, it can also be used to measure simplicity: we need only favor stable groupings with smaller f(.).

2.2 Grouping image primitives by combining Gestalt principles

To manufacture a partition of image primitives, we combine two simple Gestalt principles, proximity and common-region, each of which depends upon its own control parameter. We adopt a simple approach that is strongly influenced by the work of Feldman [8], who argued in favor of combining Gestalt principles via logical propositions.

In our case, image primitives are line segments, and the proximal distance between any pair is just the smallest distance between their end points. For line segments i and j, we define a “proximity proposition”:

$$ p(i, j \mid x_1) = \begin{cases} 1, & \text{if } d(i, j) < x_1 \\ 0, & \text{otherwise} \end{cases} $$

where d(., .) is the smallest distance between the ends of the line segments, and $x_1$ is a threshold. Similarly, we define a “common-region proposition”:

$$ c(i, j \mid x_2) = \begin{cases} 1, & \text{if } r(i, j) > x_2 \\ 0, & \text{otherwise} \end{cases} $$

where r(i, j) counts the number of areas the line primitives have in common and $x_2$ is a threshold. The number of common regions is readily determined by intersecting the lists of area identifiers associated with each line segment (recall our embellished image description in Section 2). The outcome of the combination determines the value in the adjacency matrix used to create a partition. A simple “or” combination was found to be effective:

$$ a(i, j \mid [x_1, x_2]) = p(i, j \mid x_1) \lor c(i, j \mid x_2) $$

The control vector $x = [x_1, x_2]$ therefore determines the adjacency matrix, hence the partitioning (grouping).
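The propositions above can be sketched in a few lines. This is an illustrative sketch rather than our implementation: it assumes each segment is a pair of 2-D end points, each segment carries a set of area identifiers, and groups are read off as the connected components of the adjacency matrix. The function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def endpoint_distance(seg_a, seg_b):
    """d(i, j): smallest distance between an end point of seg_a and one of seg_b."""
    return min(math.dist(p, q) for p in seg_a for q in seg_b)


def adjacency(segments, areas, x1, x2):
    """a(i, j) = p(i, j | x1) or c(i, j | x2) for every pair of segments."""
    n = len(segments)
    adj = [[False] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        proximal = endpoint_distance(segments[i], segments[j]) < x1
        common = len(areas[i] & areas[j]) > x2   # r(i, j) > x2
        adj[i][j] = adj[j][i] = proximal or common
    return adj


def groups(adj):
    """Connected components of the adjacency matrix, found with union-find."""
    parent = list(range(len(adj)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, row in enumerate(adj):
        for j, linked in enumerate(row):
            if linked:
                parent[find(i)] = find(j)
    comps = {}
    for i in range(len(adj)):
        comps.setdefault(find(i), []).append(i)
    return sorted(comps.values())


# Hypothetical toy input: four segments and the areas each one references.
segments = [((0, 0), (1, 0)), ((1.2, 0), (2, 0)),
            ((10, 10), (11, 10)), ((20, 0), (21, 0))]
areas = [{"a"}, {"b"}, {"c"}, {"c", "d"}]

coarse = groups(adjacency(segments, areas, x1=0.5, x2=0))  # [[0, 1], [2, 3]]
fine = groups(adjacency(segments, areas, x1=0.1, x2=1))    # all singletons
```

In the first call, segments 0 and 1 are linked by proximity alone and segments 2 and 3 by a shared area alone, which shows how the “or” combination lets either principle create an arc.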
[Figure 2: panels (a) Original Image, (b) Grouping I, (c) Grouping II.]

Fig. 2. Should we be interested in two distinct eagles (left), or one fighting pair (right)? Our grouper finds both interpretations.

3 Experiments and Results

In this section, we demonstrate the value of our definition of Prägnanz using several real-world images. Please note that for all Figures in this section, primitives of different groups have been uniquely color coded and singleton groups are drawn as gray lines of width 1. Due to space restrictions, only three representative groupings are shown for each image.

The first test image shows two eagles fighting in the sky. The image is relatively simple in that the background is of a rather flat color. The main purpose of showing it is to demonstrate that our technique is capable of finding several plausible groupings; plausibility is a qualitative measure, by which we mean that the grouping can be readily understood and described by a human. In this particular case, humans tend to perceive either two individual eagles or one pair of fighting birds, the latter being at a “coarser” scale than the former. Both interpretations are found by our grouper, as shown in Figure 2.

Figure 3 shows three groupings of “musician”, ordered by simplicity. The first and most favored, Grouping I, is at the “finest” scale; windows are differentiated, for example. Grouping II is at a “middle” scale, at which the musician and trees become visible (subject to clutter). Grouping III is at the “coarse” scale, which tends to discriminate yet larger objects. Interestingly, the musician grouping survives this scale change. If we made the scale large enough, then all primitives would be grouped into one.

“Bus”, in Figure 4, again shows three stable groupings ordered by simplicity. As before, Grouping I, the first stable grouping and the simplest of all, demonstrates local structures such as windows.
In Grouping II these groupings are given context as they aggregate into larger-scale structures. In Grouping III, there are two large groups: one corresponds to the buildings, which can be treated as background, and the other corresponds to the bus in front, which is the foreground.

[Figure 3: panels (a) Original Image, (b) Grouping I, (c) Grouping II, (d) Grouping III.]

Fig. 3. Three groupings for “musician”: top-right, Grouping I; bottom-left, Grouping II; bottom-right, Grouping III, representing different grouping scales from “fine” to “coarse”. As scale rises, the number of groups increases and more primitives are aggregated together. Salient structures tend to persist across scales.

Following the pattern of previous examples, the grouping results for “alpine”, Figure 5, are ordered by simplicity. Our explanation follows the same pattern too: fine-scale structures are preferred by our grouper, then mid-scale structures, and finally coarse-scale structures. Across scales the house group barely changes. Clutter might be removed by appeal to the Gestalt principle of closure, for example: notice that much of the clutter is in the form of “spurs”, and this principle could be invoked to better differentiate the car from the road. Grouping III offers a more global grouping compared to the previous two, and plausibly renders the image into foreground, mid-ground, and background.

4 Discussion and conclusions

We have explicitly defined Prägnanz, for the first time, requiring groupings to be both stable and simple. Furthermore, our definition can be used as a controlling mechanism for any potential grouping algorithm; it is not restricted to our own. In addition, we introduced a rich underlying image description as a graph of nodes and arcs, with nodes representing image primitives (in this case line segments, edge pixels and areas), and used our grouper to embellish this description.
In principle we could continue this trend toward a hierarchy in which a node at any “level” is semantically more meaningful than its ancestors.

An important benefit of our definition of Prägnanz is that it yields more than one grouping, ordered by simplicity, and that an interesting subset/superset structure can be observed in this ordering. In particular, the notion of scale is seen to play an important role, with fine scales corresponding to simpler image elements, while coarse scales can be associated with broad descriptive terms. Mid-range scales tend to correspond to semantic objects; whether this is an artefact of the scale at which they appear in images is an open question. The ordered groupings we produce and the relations between them might be employed by applications, matching for example. We believe that “grouping scale” or “Gestalt scale” deserves further investigation. Another immediate direction would be to incorporate more Gestalt principles, in the hope that they would “clean up” some of the clutter; such research would be of interest for it would have to address the issue of combining Gestalt principles, for which there is no clear solution at present.

[Figure 4: panels (a) Original Image, (b) Grouping I, (c) Grouping II, (d) Grouping III.]

Fig. 4. Three groupings for “bus”, increasing in scale top-right to bottom-left. As with other examples, each grouping is plausible, albeit subject to some clutter, and salient structures persist over scale.

[Figure 5: panels (a) Original Image, (b) Grouping I, (c) Grouping II, (d) Grouping III.]

Fig. 5. “Alpine” again shows groupings at increasing scale (starting top-right), over which structures merge in a plausible way, eventually relating to foreground, middle-ground, and background. The hut is visible over all scales, and the car over two, again subject to clutter. The clutter may be removable by invocation of a Gestalt principle such as closure.
We conclude that:

(i) Koffka’s observation on Prägnanz is a useful one, to which our definition adheres.
(ii) Groupers should be able to return more than one solution at different scales; investigating Gestalt scale is likely to prove rewarding, as might investigations into biasing the grouper with a prior to favor a description at a specific scale.
(iii) Proximity and common-region produce clutter which might be reduced using other simple Gestalt principles in addition.

References

1. Koffka, K.: Principles of Gestalt Psychology. New York: Harcourt (1935)
2. Kohler, W.: Gestalt Psychology. New York: Liveright (1929)
3. Wertheimer, M.: Laws of Organization in Perceptual Forms. Volume 4. Psychologische Forschung (1923) Translation published in Ellis, W. (1938).
4. Lowe, D.G.: Perceptual Organization and Visual Recognition. Boston, MA: Kluwer (1985)
5. Carreira, M., Mirmehdi, M., Thomas, B., Penas, M.: Perceptual primitives from an extended 4D Hough transform. Image and Vision Computing 20 (2002) 969–980
6. Dolan, J., Weiss, R.: Perceptual grouping of curved lines. In: Proceedings of a workshop on Image understanding workshop, San Francisco, CA, USA, Morgan Kaufmann Publishers Inc. (1989) 1135–1145
7. Parent, P., Zucker, S.W.: Trace inference, curvature consistency, and curve detection. IEEE Trans. Pattern Anal. Mach. Intell. 11 (1989) 823–839
8. Feldman, J.: Perceptual grouping by selection of a logically minimal model. Int. J. Comput. Vision 55 (2003) 5–25
9. Elder, J.H., Goldberg, R.M.: Ecological statistics of Gestalt laws for the perceptual organization of contours. J. Vis. 2 (2002) 324–353
10. Kanizsa, G.: Organization in Vision. New York: Praeger (1979)
11. Marr, D.: VISION: A Computational Investigation into the Human Representation and Processing of Visual Information. Freeman, San Francisco (1981)
12. Witkin, A., Tenenbaum, J.: On the role of structure in vision. In J. Beck, B.H., Rosenfeld, A., eds.: Human and Machine Vision. New York: Academic (1983) 481–543
13. Guy, G., Medioni, G.: Inferring global perceptual contours from local features. Int. J. Comput. Vision 20 (1996) 113–133
14. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22 (2000) 888–905
15. Yu, S.X., Shi, J.: Segmentation given partial grouping constraints. IEEE Trans. Pattern Anal. Mach. Intell. 26 (2004) 173–183
16. Sarkar, S., Soundararajan, P.: Supervised learning of large perceptual organization: Graph spectral partitioning and learning automata. IEEE Trans. Pattern Anal. Mach. Intell. 22 (2000) 504–525
17. Ommer, B., Buhmann, J.M.: A compositionality architecture for perceptual feature grouping. Energy Minimization Methods in Computer Vision and Pattern Recognition (2003) 275–290
18. Elder, J.H., Zucker, S.W.: Local scale control for edge detection and blur estimation. IEEE Trans. Pattern Anal. Mach. Intell. 20 (1998) 699–716
19. Sinclair, D.: Voronoi seeded colour image segmentation. Technical Report TR9904, AT&T Laboratories Cambridge (1999)