Stable Image Descriptions using Gestalt Principles

Yi-Zhe Song and Peter M. Hall
Media Technology Research Centre
Department of Computer Science
University of Bath
Abstract. This paper addresses the problem of grouping image primitives; its principal contribution is an explicit definition of the Gestalt
principle of Prägnanz, which organizes primitives into descriptions of
images that are both simple and stable. Our definition of Prägnanz assumes just two things: that a vector of free variables controls some general
grouping algorithm, and that a scalar function measures the information in
a grouping. Stable descriptions exist where the gradient of the function
is zero, and these can be ordered by information content (simplicity) to
create a “grouping” or “Gestalt” scale description. We provide a simple
measure for information in a grouping based on its structure alone, leaving our grouper free to exploit other Gestalt principles as we see fit. We
demonstrate the value of our definition of Prägnanz on several real-world
images.
1 Introduction
Partitioning an image into meaningful structures is a central problem of Computer Vision. “Top-down” solutions fit a model of some kind; they can perform well but restrict image content. In contrast, “bottom-up” solutions aggregate image primitives; they impose less restriction on content, but their performance
can be questionable. The essential problem with bottom-up methods is deciding which groupings to use from amongst a vast number of possibilities. This paper
addresses the problem of bottom-up grouping by appeal to Gestalt principles.
In the early 1920s, psychologists proposed that Gestalt principles play an
important role in human perceptual organization, including proximity, continuity, similarity, closure, and symmetry; later common-region and connectedness
were added [1–3]. We call these “simple” principles because they act on a few
primitives at any one time. Many of these principles have been used in the computational literature. Lowe [4] uses proximity; Carreira et al. [5] use parallelism.
Others have sought to use more than one principle at once. Dolan and Weiss [6]
use proximity and continuity, a pairing also used by Parent and Zucker [7], and
also Feldman [8]. Elder and Goldberg [9] aimed for contour completion by studying
the mutual relationships amongst proximity, continuity and similarity in the task
of contour grouping, and concluded that proximity is the most important among
those studied. Despite this work, a full computational account of how Gestalt
principles interact is yet to be given.
Although the above work groups all primitives, and some of it operates
hierarchically and so can handle more than one primitive at once, it remains true
that none of it makes use of any principle of global organization. It is a
common observation that context, by which we mean the presence (or absence)
of structures in an image, affects the outcome of grouping. Therefore, we argue,
some notion of global organization must be included in any account that seeks
to form groupings, and such an account should be integral to the way in which
Gestalt principles are combined.
Prägnanz is the Gestalt principle that seeks to organize all primitives at once,
and so acts in a “global” sense. Introduced by Wertheimer [3], Prägnanz was developed by Koffka [1], who advocated that “of several geometrically possible
organizations that one will actually occur which possesses the best, simplest and
most stable shape.” Kanizsa [10] too suggested that Prägnanz implies an orderly, rule-based, non-random and stable organization of primitives. In a computational
setting, the notion of non-randomness influenced Marr [11], and was explicitly
proposed as “common cause” (amongst other names) independently by Witkin
and Tenenbaum [12], and Lowe [4]. Lowe argued that it is highly unlikely for
organized image structures to occur by chance; hence they are salient. Yet common cause can be used without reference to any principle of global organization:
for example, a local curve may be the common cause of a set of points. Common
cause is therefore not Prägnanz.
Techniques that aim to find best groupings in a global sense do exist. Guy
and Medioni [13] combine proximity and continuation to find contours; their approach
differs from those previously cited in that it introduces a global voting scheme,
in which each pixel receives votes from all others. Probably the most recognized
global grouping technique, and certainly one that has gained considerable popularity, is
the normalized cut technique of Shi and Malik [14]. They use normalized cut
over an affinity matrix built from proximal distances and pixel similarities to
generate an adjacency matrix, forests of which are groups. Normalized cut has
also been used by Yu and Shi [15], who advocated the use of prior knowledge (such as the position of a foreground object, often input interactively for a
particular image). Sarkar and Soundararajan [16] used a modified version of normalized cut, and applied it to the adjacency graphs obtained using a small set
of Bayesian networks, each of which corresponds to a certain Gestalt principle
and is trained by a set of mutually competing automata. Despite the popularity
of normalized cut, it does not provide an explicit definition of Prägnanz, rather
its advocates suggest the method adheres to that principle because it operates
globally. Of the literature we read, only Ommer and Buhmann [17] explicitly
define Prägnanz, in their case as the minimum of an entropy function based
on the probability that pairs of primitives should be grouped. By minimizing
entropy, these authors ensure simple groupings, as do the other global methods.
But simplicity is just one half of Koffka’s requirement: global organization
should be stable as well as simple, and these methods give no guarantee that the organization is stable.
Lecture Notes in Computer Science
[Figure 1 diagram labels: line fit over an edge map + area segment; line points to regions; region points to lines; Gestalt grouper; Grouping I; Grouping II]
Fig. 1. A picture is processed into edges and areas. Lines are fitted to edges which
reference back to areas. Our Gestalt grouper optimally connects lines into groups. In
this and all Figures in this paper, groups are color coded; singleton groups are in grey.
Please refer to the electronic version for best viewing results.
The contributions this paper makes are, in order of importance:
1. We are the first to explicitly define Prägnanz, to account for both stability
and simplicity of groupings, and which integrates simpler Gestalt principles.
2. We uniquely allow multiple possible groupings over one image; these groupings naturally form a subset/superset structure that leads to the notion
of “grouping scale”.
In addition, we provide a flexible system that is easy to experiment with, which
is based on a rich description of images including both edge and area primitives.
In the remainder of this paper, we describe our approach (Section 2), including our definition of Prägnanz in subsection 2.1. In the results, Section 3, we
show our method produces multiple solutions that appeal to intuition. Finally,
a conclusion and a discussion of future work are offered in Section 4.
2 A description of our grouper
The input to our grouping algorithm is a graph based description of an image
in which a node is one of two different types; one type corresponds to straight
line primitives over edges, the other image areas. The graph is such that given
a line the areas it separates can be determined, and given an area its lines can
be identified. The input graph is bipartite — nodes of the same type are not
connected. The output of our grouper is a graph embellished with arcs that
link nodes of the same type. In common with other literature [14–16] forests
in the embellishment delimit groups, but unlike others we recognize there is no
“absolute best” solution and so produce more than one plausible grouping.
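As an illustration only, the bipartite input graph described above might be held as two lookup tables, one from each line to the areas it separates and one from each area to its lines. The Python sketch below assumes that every line already carries a set of area identifiers; the function name `build_bipartite` and the toy identifiers are hypothetical and are not taken from the system described here.

```python
from collections import defaultdict


def build_bipartite(line_to_areas):
    """Return (line_to_areas, area_to_lines) for a bipartite line/area graph.

    Arcs only ever join a line node to an area node, so two dictionaries
    of sets are enough to answer both queries mentioned in the text.
    """
    area_to_lines = defaultdict(set)
    for line_id, area_ids in line_to_areas.items():
        for area_id in area_ids:
            area_to_lines[area_id].add(line_id)
    lines = {line_id: set(area_ids) for line_id, area_ids in line_to_areas.items()}
    return lines, dict(area_to_lines)


# Hypothetical toy input: three line segments bordering three areas.
toy = {"l0": {"a0", "a1"}, "l1": {"a1", "a2"}, "l2": {"a2"}}
lines, areas = build_bipartite(toy)
```

Here `lines["l0"]` gives the areas separated by line `l0`, and `areas["a1"]` gives the lines that border area `a1`.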
The input graph is not formed using any Gestalt principle, but does aggregate
pixels into convenient image primitives. Straight line primitives are fitted to an
edge map obtained by thresholding the output of an Elder-Zucker edge detector [18]; this requires monochrome images. Area primitives are connected pixel sets, determined
by a color-image segmentation from Sinclair [19]. To forestall any controversy
regarding the grouping these processes enact, we have tuned both of them to
operate conservatively: fitting produces many straight lines and the image is
typically over-segmented. This graph is conveniently mid-level in the sense that
its primitives are super-pixel, but is nonetheless a weak description. It is the task
of our grouper to strengthen this description. Figure 1 shows a typical example
of the full process.
As a note, we reason in favor of independently detected lines and areas on the
grounds that area boundaries do not necessarily correspond to object edges, and
that the presence of an edge does not necessarily imply an area boundary, see
Figure 1. Independent line and area processes raise the information content of the
input graph, and the competition (disagreement) and cooperation (agreement)
between them is implicitly resolved by our grouper.
2.1 Prägnanz: stable and simple groupings
We provide a computational definition of Prägnanz that takes into account both
stability and simplicity, as required by Koffka [1]. Our definition is sufficiently
general to be used as a controlling principle for any grouping algorithm whose
output depends upon a set of control variables, not just our own grouper.
Let P be a set of image primitives; straight line segments in our case. Let Q be
a partition of P; Prägnanz selects between the many possible partitions. Suppose f(Q)
is a scalar-valued function that measures some property of a given partition,
typically information content. In practice, any partitioning will depend on a
vector of free variables, x ∈ ℝ^k (threshold variables, for example); hence f(.) also
depends upon x.
We define the stability of a partition as the magnitude of the gradient of f (.)
with respect to the control vector:
\[
s(Q(x)) = \left( \sum_{i=1}^{k} \left( \frac{\partial f(Q(x))}{\partial x_i} \right)^{2} \right)^{1/2}
\]
We define a partition to be stable if s(Q(x)) = 0. In practice, the discrete nature
of the control variables means zeros are rarely observed, so we seek local minima
of s(.). We define a partition to be (consistent with) Prägnanz, if it is stable and
can be justified by appeal to simple Gestalt principles. In the next subsection
we will explain how we generate partitions, but here we continue discussing our
general approach to Prägnanz.
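To make the stability definition concrete, the following Python sketch assumes k = 2 and that f has already been sampled on a regular grid of control values with unit spacing (at least two samples per axis). Both are assumptions made for illustration. The partial derivatives are approximated by finite differences, a choice the text does not prescribe, and a grid cell is reported when s is no larger than at any of its neighbors. The names `stability` and `local_minima` are hypothetical.

```python
import math


def stability(f_grid):
    """Approximate s = |grad f| on a 2-D grid with unit spacing.

    Central differences are used in the interior and one-sided
    differences at the borders of the grid.
    """
    rows, cols = len(f_grid), len(f_grid[0])

    def diff(values, idx):
        lo, hi = max(idx - 1, 0), min(idx + 1, len(values) - 1)
        return (values[hi] - values[lo]) / (hi - lo)

    s = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            column = [f_grid[a][j] for a in range(rows)]
            d1 = diff(column, i)       # derivative along x1
            d2 = diff(f_grid[i], j)    # derivative along x2
            s[i][j] = math.hypot(d1, d2)
    return s


def local_minima(s):
    """Return grid cells whose s value is <= all of their neighbors."""
    rows, cols = len(s), len(s[0])
    found = []
    for i in range(rows):
        for j in range(cols):
            neighbors = [
                s[a][b]
                for a in range(max(i - 1, 0), min(i + 2, rows))
                for b in range(max(j - 1, 0), min(j + 2, cols))
                if (a, b) != (i, j)
            ]
            if all(s[i][j] <= v for v in neighbors):
                found.append((i, j))
    return found


# Hypothetical f sampled on a 5 x 5 grid of control values.
f_grid = [[(i - 2) ** 2 + (j - 2) ** 2 for j in range(5)] for i in range(5)]
s_grid = stability(f_grid)
stable_cells = local_minima(s_grid)
```

On this toy surface the only stable cell is the centre of the grid, where the approximate gradient vanishes.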
As mentioned, we wish f (.) to somehow measure the information content of
a given partition. Any partition can be represented by a graph of nodes and
arcs; forests in the graph delimit the groups (see [14–16]). Of the
many alternatives we have found the most satisfactory results are produced by a
simple analysis of these graphs; no reference to the “affinity” between primitive
pairs is required. Let Mg be the maximum number of groups in any observed
partition, and Ma be the maximum number of arcs in any observed partition.
Let Ng and Na be the number of groups and arcs in a particular partition. We
define f (.) as
\[
f(Q(x)) = -G(x) \log \frac{A(x)}{G(x)}
\]
in which G = Ng /Mg is the normalized number of groups, and A = Na /Ma is
the normalized number of arcs. This function is monotonic, but, importantly,
contains saddle points corresponding to stable groups. Intuitively, f (.) relates to
the number of binary digits required to encode the grouping, when compared
to the most complex alternative. Also, the number of arcs in a group relates to
group stability: a change in control vector may add new arcs to a stable group,
but may merge unstable groups.
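A minimal Python sketch of this measure follows. The base of the logarithm is not stated in the text; base 2 is assumed here because of the reference to binary digits. The behaviour for a partition with no arcs is also not specified, so the sketch simply rejects that case. The function name `grouping_information` is hypothetical.

```python
import math


def grouping_information(n_groups, n_arcs, max_groups, max_arcs):
    """Evaluate f = -G * log2(A / G) with G = Ng / Mg and A = Na / Ma."""
    if n_groups <= 0 or n_arcs <= 0:
        # Assumption: the text does not define f when A or G is zero.
        raise ValueError("f is undefined for zero groups or zero arcs")
    G = n_groups / max_groups  # normalized number of groups
    A = n_arcs / max_arcs      # normalized number of arcs
    return -G * math.log2(A / G)


# Hypothetical counts: 2 groups out of at most 8, 16 arcs out of at most 16.
f_value = grouping_information(2, 16, 8, 16)
```

With these counts G = 0.25 and A = 1, so f = -0.25 × log2(4) = -0.5.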
By construction we find groupings that are stable at local minima of s(.),
which depends on the differential of f (.). But Koffka’s definition [1] requires
simplicity too. Because our measure f(.) is related to entropy, it can also be used to measure
simplicity: we need only favor stable groupings with smaller f(.).
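The ordering step can be sketched in a few lines of Python. The sketch assumes that the stable control vectors (local minima of s) and their f values have already been computed; the function name `gestalt_scale` and the numerical values are hypothetical.

```python
def gestalt_scale(stable_partitions):
    """Order stable partitions from most favored (smallest f) to least.

    stable_partitions: iterable of (x, f_value) pairs, where x is a control
    vector at a local minimum of s and f_value is f(Q(x)).
    """
    return sorted(stable_partitions, key=lambda pair: pair[1])


# Hypothetical f values for three stable control vectors [x1, x2].
stable = [((12.0, 1), -0.10), ((4.0, 0), -0.45), ((8.0, 0), -0.30)]
ordered = gestalt_scale(stable)
```

The first entry of `ordered` is then the stable grouping with the smallest f(.), followed by the others in increasing order of f(.).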
2.2 Grouping image primitives by combining Gestalt principles
To manufacture a partition of image primitives, we combine two simple Gestalt
principles: proximity and common-region, each of which depends upon its own
control parameter. We adopt a simple approach that is strongly influenced by
the work of Feldman [8], who argued in favor of combining Gestalt principles via
logical propositions.
In our case, image primitives are line segments, and the proximal distance
between any pair is just the smallest distance between their end points. For line
segments i and j, we define a “proximity proposition”: p(i, j | x1) = 1 if d(i, j) < x1,
and 0 otherwise, where d(., .) is the smallest distance between the ends of
the line segments, and x1 is a threshold.
Similarly, we define a “common-region proposition”: c(i, j | x2) = 1 if r(i, j) > x2,
and 0 otherwise, where r(i, j) counts the number of areas the line primitives
have in common and x2 is a threshold. The number of common regions is readily
determined by simply intersecting the lists of area identifiers associated with the two line
segments (recall our image description in Section 2).
The outcome of the combination determines the value in the adjacency matrix
used to create a partition. A simple “or” combination was found to be effective:
\[
a(i, j \mid [x_1, x_2]) = p(i, j \mid x_1) \lor c(i, j \mid x_2)
\]
The control vector x = [x1 , x2 ] therefore determines the adjacency matrix, hence
the partitioning (grouping).
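The two propositions and their “or” combination can be sketched in Python as follows. The sketch reads “groups” as the connected components of the resulting adjacency relation and finds them with a small union-find structure; that reading, the data layout (segments as pairs of end points, areas as sets of identifiers) and the names `endpoint_distance` and `group_segments` are assumptions made for illustration.

```python
import math
from itertools import combinations


def endpoint_distance(seg_a, seg_b):
    """d(i, j): smallest distance between an end of seg_a and an end of seg_b."""
    return min(math.dist(p, q) for p in seg_a for q in seg_b)


def group_segments(segments, areas, x1, x2):
    """Partition segment indices using a(i, j) = p(i, j | x1) or c(i, j | x2)."""
    parent = list(range(len(segments)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(segments)), 2):
        p = endpoint_distance(segments[i], segments[j]) < x1  # proximity
        c = len(areas[i] & areas[j]) > x2                     # common region
        if p or c:
            parent[find(i)] = find(j)  # add an arc: merge the two groups

    groups = {}
    for i in range(len(segments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical toy data: four segments and the areas each one borders.
segments = [((0, 0), (1, 0)), ((1.5, 0), (3, 0)),
            ((10, 10), (12, 10)), ((20, 0), (21, 0))]
areas = [{"a0"}, {"a0", "a1"}, {"a2"}, {"a2"}]

coarse = group_segments(segments, areas, x1=1.0, x2=0)
fine = group_segments(segments, areas, x1=0.1, x2=1)
```

With the looser thresholds, segments 0 and 1 are joined by proximity and segments 2 and 3 by their common region; with the stricter thresholds every segment remains a singleton group.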
[Figure 2 panels: (a) Original Image; (b) Grouping I; (c) Grouping II]
Fig. 2. Should we be interested in two distinct eagles (left), or one fighting pair (right)?
Our grouper finds both interpretations.
3 Experiments and Results
In this section, we demonstrate the value of our definition of Prägnanz using
several real-world images. Please note that for all Figures in this section, primitives of different groups have been uniquely color coded and singleton groups
are drawn as gray lines of width 1. Due to space restrictions, at most three representative groupings for each image are shown.
The first test image shows two eagles fighting in the sky. The
image is relatively simple in that the background is of a rather flat color. The
main purpose of showing this is to demonstrate that our technique is capable of
finding several plausible groupings; plausibility is a qualitative measure, by which we mean the
grouping can be readily understood and described by a human. In this particular
case, humans tend to perceive either two individual eagles, or else one pair of
fighting birds; the latter being at a more “coarse scale” than the former. Both
these interpretations are found by our grouper, as shown in Figure 2.
Figure 3 shows three groupings of “musician”, ordered by simplicity. The first
and most favored, Grouping I, is at the “finest” scale; windows are differentiated,
for example. Grouping II is “middle” scale, at which the musician and trees
become visible (subject to clutter). Grouping III is at the “coarse” scale, which
tends to discriminate yet larger objects. Interestingly, the musician grouping
survives this scale change. If we made the scale large enough, then all primitives
would be grouped into one.
“Bus”, in Figure 4, again shows three stable groupings ordered in terms of
their simplicity. As before, Grouping I, the first stable grouping and the simplest
of all, demonstrates local structures such as windows. In Grouping II these groupings are given context as they aggregate into larger-scale structures. In Grouping
III, there are two large groups: one corresponds to the buildings, which can be
treated as background, and the other corresponds to the bus in front, which
is the foreground.
[Figure 3 panels: (a) Original Image; (b) Grouping I; (c) Grouping II; (d) Grouping III]
Fig. 3. Three groupings for “musician”: top-right, Grouping I; bottom-left, Grouping
II; bottom-right, Grouping III, representing different grouping scales from “fine” to
“coarse”. As scale rises, the number of groups increases and more primitives are aggregated together. Salient structures tend to persist across scales.
Following the pattern of previous examples, grouping results in “alpine”,
Figure 5, are ordered by simplicity. Our explanation follows the same pattern
too: fine-scale structures are preferred by our grouper, then mid-scale structures,
finally coarse-scale. Across scales the house group barely changed. Clutter might
be removed by appeal to the Gestalt principle of closure, for example: notice
that much of the clutter is in the form of “spurs”, and this principle could be
invoked to better differentiate the car from the road. Grouping III offers a more
global grouping, compared with the previous two, and plausibly renders the image
into foreground, mid-ground, and background.
4 Discussion and conclusions
We have explicitly defined Prägnanz, for the first time, requiring groupings to
be both stable and simple. Furthermore, our definition can be used as a controlling mechanism for any potential grouping algorithm; it is not restricted to
our own. In addition, we introduced a rich underlying image description as a
graph of nodes and arcs, with nodes representing image primitives (in this case,
line segments, edge pixels and areas), and used our grouper to embellish this description. In principle we could continue this trend toward a hierarchy in which
a node at any “level” is semantically more meaningful than its ancestors.
An important benefit of our definition of Prägnanz is that it yields more than
one grouping, ordered by simplicity, and that interesting subset/superset structure can be observed in this ordering. In particular, the notion of scale is seen to
play an important role, with fine scales corresponding to simpler image elements,
while coarse scales can be associated with broad descriptive terms. Mid-range
scales tend to correspond to semantic objects; whether this is an artefact of the
scale at which they appear in images is an open question. The ordered groupings
we produce and the relation between them might be employed by applications,
matching for example. We believe that “grouping scale” or “Gestalt scale” deserves further investigation.
Another immediate direction for further work would be to incorporate more Gestalt principles, in the hope that they would “clean up” some of the clutter; such research
[Figure 4 panels: (a) Original Image; (b) Grouping I; (c) Grouping II; (d) Grouping III]
Fig. 4. Three groupings for “bus”, increasing in scale top-right to bottom-left. As with
other examples, each grouping is plausible, albeit subject to some clutter, and salient
structures persist over scale.
[Figure 5 panels: (a) Original Image; (b) Grouping I; (c) Grouping II; (d) Grouping III]
Fig. 5. “Alpine” again shows groupings at increasing scale from top-right, over which
structures merge in a plausible way, eventually relating to foreground, middle-ground,
and background. The hut is visible at all scales, and the car at two, again subject
to clutter. The clutter may be removable by invoking a Gestalt principle such as
closure.
would be of interest, for it would have to address the issue of combining Gestalt
principles, for which there is no clear solution at present.
We conclude that: (i) Koffka’s observation on Prägnanz is a useful one, to
which our definition adheres. (ii) Groupers should be able to return more than
one solution at different scales; investigating Gestalt scale is likely to prove rewarding, as might investigations into biasing the grouper with a prior to favor a
description at a specific scale. (iii) Proximity and common-region produce clutter
which might be reduced using other simple Gestalt principles in addition.
References
1. Koffka, K.: Principles of Gestalt Psychology. New York: Harcourt (1935)
2. Kohler, W.: Gestalt Psychology. New York: Liveright (1929)
3. Wertheimer, M.: Laws of Organization in Perceptual Forms. Volume 4. Psychologische Forschung (1923) Translation published in Ellis, W. (1938).
4. Lowe, D.G.: Perceptual Organization and Visual Recognition. Kluwer, Boston, MA
(1985)
5. Carreira, M., Mirmehdi, M., Thomas, B., Penas, M.: Perceptual primitives from an
extended 4D Hough transform. Image and Vision Computing 20 (2002) 969–980
6. Dolan, J., Weiss, R.: Perceptual grouping of curved lines. In: Proceedings of a
workshop on Image understanding workshop, San Francisco, CA, USA, Morgan
Kaufmann Publishers Inc. (1989) 1135–1145
7. Parent, P., Zucker, S.W.: Trace inference, curvature consistency, and curve detection. IEEE Trans. Pattern Anal. Mach. Intell. 11 (1989) 823–839
8. Feldman, J.: Perceptual grouping by selection of a logically minimal model. Int.
J. Comput. Vision 55 (2003) 5–25
9. Elder, J.H., Goldberg, R.M.: Ecological statistics of Gestalt laws for the perceptual
organization of contours. J. Vis. 2 (2002) 324–353
10. Kanizsa, G.: Organization in Vision. New York: Praeger (1979)
11. Marr, D.: VISION: A Computational Investigation into the Human Representation
and Processing of Visual Information. Freeman, San Francisco (1981)
12. Witkin, A., Tenenbaum, J.: On the role of structure in vision. In Beck, J., Hope, B.,
Rosenfeld, A., eds.: Human and Machine Vision. New York: Academic (1983)
481–543
13. Guy, G., Medioni, G.: Inferring global perceptual contours from local features. Int.
J. Comput. Vision 20 (1996) 113–133
14. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern
Anal. Mach. Intell. 22 (2000) 888–905
15. Yu, S.X., Shi, J.: Segmentation given partial grouping constraints. IEEE Trans.
Pattern Anal. Mach. Intell. 26 (2004) 173–183
16. Sarkar, S., Soundararajan, P.: Supervised learning of large perceptual organization:
Graph spectral partitioning and learning automata. IEEE Trans. Pattern Anal.
Mach. Intell. 22 (2000) 504–525
17. Ommer, B., Buhmann, J.M.: A compositionality architecture for perceptual feature grouping. Energy Minimization Methods in Computer Vision and Pattern
Recognition (2003) 275–290
18. Elder, J.H., Zucker, S.W.: Local scale control for edge detection and blur estimation. IEEE Trans. Pattern Anal. Mach. Intell. 20 (1998) 699–716
19. Sinclair, D.: Voronoi seeded colour image segmentation. Technical Report TR9904, AT&T Laboratories Cambridge (1999)