The regular polygon detector

Nick Barnes1, Gareth Loy2 and David Shaw1
1. NICTA, Locked Bag 8001, Canberra, ACT, 2601, Australia, and Dept. Information Engineering, The Australian National University.
2. Computer Vision & Active Perception Laboratory, Royal Institute of Technology (KTH), Stockholm
1. Introduction
Regular polygons are a frequent feature of our built environment. In road
scenes, triangular, rectangular and octagonal signs frequently display critical
information. Polygonal shapes occur frequently in the structure of office environments, outdoor urban scenes, and in manufactured parts. Partial regular
polygons can be repeatably found over short distances and so form useful
interest points. Regular polygons also frequently occur in lattice structures
as part of texture.
However, regular polygons are complex shapes. Matching-based algorithms are the standard methods for their detection; these start by recovering line segments and then construct shapes from them. This is inherently slow, as the number of possible match candidates grows exponentially with the number of edges. It is also non-robust, performing poorly under partial occlusion. Alternatively, one may take a perceptual grouping approach; however, as such methods attempt to solve for more general shapes, they have a higher computational cost.
Preprint submitted to Elsevier
August 4, 2009
We pose regular polygon space as a five dimensional space describing the possible appearance of all regular polygons, and introduce the regular polygon transform, which projects from this space into a three dimensional oriented edge image. Thus, for a given oriented edge image, we may derive the a posteriori probability for the appearance of a mixture of regular polygons, and its
probability density function. To simplify the search for regular polygons, we
may collapse the space into three dimensions to give an upper bound on the
probability density of the location and size of regular polygons of unknown
number of sides and orientation. This simplified detection problem can be
solved efficiently as an upper bound on a discretised likelihood function. We
derive an efficient algorithm for recovering the maximum likelihood number
of sides and orientation only at the locations of the most likely polygons. The
first stage of the detection is posed as a discrete Hough-based algorithm. The
second stage takes an approximation to the full likelihood function to recover
orientation and number of sides.
We evaluate the detector, including stability in the face of noise, and
performance on traffic sign detection, as well as image-based matching. We
differentiate sign types using the second stage of the algorithm, and investigate the application of a priori information for real-time performance.
This paper gives a complete presentation of the regular polygon detector
[2, 3, 17, 28]. The new contributions here are a full derivation of the algorithm for efficiently recovering number of sides and orientation, and extensive
results of this recovery against a large database of road sign images.
2. Related literature
2.1. Finding shapes
In perceptual grouping, image elements are grouped according to some
perceptual criteria. Several authors have investigated iterative relaxation style operators for grouping edgels. This began with Shashua and Ullman [27], whose method required many iterations. Parent and Zucker [21] examined curvature and cocircularity to give curvature consistency of neighbours. Such approaches were refined by Guy and Medioni [11], among others, to be O(k²), where k is the number of edgels; their method examines which shapes each pair of edgels may contribute to. At a high level this can be posed as recovering a graph of relational arrangement. A common approach is spectral graph theory [22, 26]. Probabilistic approaches in this area include Bayes nets [8] and combining evidence from raw edge attributes [6]. More recent approaches have used EM [25, 7]. One of the basic algorithmic approaches here is examining pairwise relations; for edge pixels, the complexity is at least O(N²), where N is the number of edge elements.
Alternatively, we may fit functions to image data to find particular shapes. In the Hough transform [12], edgels ‘vote’ in the parameter space of the shape to be detected. The circular Hough transform [20] detects circle centre points and radii. Each pixel votes for each feature that it could contribute to. This approach is inherently robust: gaps, noise, and partial occlusion do not prevent detection, but merely decrease the support for the feature. These methods have been generalised [1] to detect arbitrary shapes, where the detection space has one dimension
for each parameter of the shape. In high dimensional spaces this becomes computationally intensive, and in practice the Hough transform is seldom used in more than three dimensions [31]. The distributed Hough transform sits between perceptual grouping and Hough algorithms, performing a Hough transform over possible combinations of edgels [14]. Castano and Hutchinson [5] examined probabilistic perceptual grouping for straight lines and bilateral symmetries, with a Gaussian over the distance of points from the line.
The Hough transform can also be formulated as maximum likelihood estimation [29]. Assuming independent noise between points, Kiryati and Bruckstein [15] formulated robust maximum likelihood line finding on a grid, in a Hough-like algorithm. Geyer et al. [10] applied a sampled likelihood function approach to estimation of the essential matrix. Variants of shape detectors have also included edge orientation information to constrain the voting dimensionality of edge points. The fast radial symmetry detector [18] improves on the circular Hough transform, giving O(Nl) performance, where l is the range of radii of the shapes being examined. This allows it to be used in real-time applications [4]. The radial symmetry detector is the closest work to this paper; circle detection is a degenerate case of the algorithm presented (a regular polygon with an infinite number of sides).
2.2. Sign detection
Road sign recognition research has been active since the mid-1980s. Good results may be achieved for classification by approaches such as normalised cross correlation [23]. Approaches such as a nearest feature transform and coarse-to-fine matching [9] have been used to ease the heavy computational burden, with limited success.
Frequently, systems apply a low computational cost detection stage, aiming to detect most signs with a limited number of false positives, massively reducing the input stream. A sign must be reliably detected over the period it is visible, rather than in every frame. Colour-based methods are sometimes based on the assumption that only the intensity of lighting changes in outdoor environments, often claiming that HSV or HSI spaces are invariant to lighting conditions [24]. However, these methods break down under variation in lighting chrominance.
Road scenes are generally well-structured, so some approaches search only part of the image by combining assumptions about structure with colour segmentation (e.g., [13]). Such assumptions often break down on hills and sharp corners. Piccoli et al. [23] remove large uniform regions corresponding to road and sky. However, this will fail with overhead branches and road shadows (e.g., Figure 1). A recent reliable system [19] uses two-stage detection: colour first, followed by shape-based detection using distance from the bounding box as a feature. The second stage of detection and the subsequent classification are performed using an SVM. Such a system demonstrates the utility of shape-based detection followed by classification.
This paper presents a computationally efficient algorithm to recover regular polygons using an a posteriori probability approach that takes advantage of locality and gradient information. The algorithm breaks the task into two stages: 1) finding likely regular polygons in a lower dimensional space; and, 2) resolving all parameters for the most likely regular polygons. This allows us to recover all regular polygons in one computationally efficient algorithm.

Figure 1: Road scene images may be quite complex with the sky broken up by branches, shadows falling across the road, and the sign in partial light and shadow.

The first stage is O(Nlrw), where N is the size of the image, l is the maximum radius length, r is the number of radii being considered, and w is the width of the operation of the distance function. The parameter w is close to one pixel for most implementations, and r and l tend to be small numbers, leading to an efficient algorithm. The second stage of the algorithm is O(mpn²), where m is the number of significant peaks in the distribution found in the first stage, n is the maximum number of sides considered on a regular polygon, and p is the number of significant peaks considered in the distribution of orientations.
For comparison, a generalised Hough transform implementation would be O(Nrlon), where o is the discretisation of orientation, and n is the range of the number of sides. This is two significant dimensions larger.
3. Regular polygons
Archimedes bounded π by the sum of the side lengths of a regular polygon of n sides inscribed in a circle, and the sum of the lengths of the sides circumscribed around a circle. We take our regular polygon definition from that circumscribed around a circle. For an n sided regular polygon, divide the circle into a set of n isosceles triangles of angle 2π/n, where the centre point of the baseline falls on the circle. The length of the individual sides is:

l(n, r) = 2r tan(π/n).    (1)
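Equation 1 can be checked directly; the following is a minimal Python sketch (the function name is ours, not from the paper):

```python
import math

def side_length(n, r):
    """Side length l(n, r) = 2 r tan(pi / n) of a regular n-gon
    circumscribed about a circle of radius r (Equation 1)."""
    return 2.0 * r * math.tan(math.pi / n)
```

For example, a square circumscribed about a unit circle has side length 2, and as n grows the side length approaches 2πr/n, consistent with the circle as the degenerate case.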
We may define a regular polygon as having a centroid (cx, cy), the radius r of the circle about which it is circumscribed, an orientation γ, and a number of sides n. Thus we may define the regular polygon transform as a function frp that maps the space of all possible regular polygons (regular polygon space) to 3D space. Regular polygon space is five dimensional, and the transform is a map frp : R⁴ × Z → R³, where the integer dimension is the number of sides (three or greater for closed polygons) and the space mapped to is position in the image plus orientation of edges. Specifically, we label regular polygon space Φ = (cx, cy, r, γ, n). An example shape explaining the parameters is shown in Figure 2. Note that a circle is the degenerate case.
The mapping of all regular polygons into the image from 5D regular polygon space is the integral across the space:

frp = ∫_Φ frp(φ) dφ.    (2)

Figure 2: Left, a sample five sided (n = 5) polygon. Right, the line over which the centre may be located for a given oriented edge pixel, as described in Equation 11.
A typical image is a sampling over a regular Cartesian grid. We represent the image as a set of gradient pixels x = {xj : j = 1, ..., s} corresponding to the edge pixels of the regular polygons, plus any noise. Each xj consists of position (x, y) and orientation θ (the direction of the maximum gradient).
4. Likelihood formulation
The gradient points x may be regarded as the result of the transform from regular polygon space of some set of polygons φm ∈ Φ, where m = 1, . . . , M. In order to recover φm from the xj, we may estimate the probability density function over regular polygon space given xj, considering the image as the result of a mixture of regular polygons, each of which is a Gaussian. That is:

f = Σ_{m=1}^{M} αm p(φm|x),    (3)

where αm is a mixing parameter. If we assume a uniform mixture, then αm is constant for all m. We may apply Bayes' Law:

Σ_{m=1}^{M} p(φm|x) = Σ_{m=1}^{M} p(x|φm)p(φm) / p(x).    (4)
Note the distinction between a regular polygon in the scene and what
appears in an image. An accidental view where several straight lines at
different scene depths align to form a polygon is not a world structure, but is
an image polygon. We may define detection as finding all apparent regular
polygons. Further, with incomplete data, any edge pixel may be regarded as
a partial regular polygon. Thus, the edge image can be seen as being made
up of regular polygons with noise.
Noise in a single pixel intensity, or irregular fading of a road sign can
cause errors in orientation estimates. For small angles this will be linear, and
therefore may be approximated by a Gaussian. Noise in intensity, uncertainty
due to image sampling, and artefacts in edge detectors can lead to position
displacement, which can also be approximated by a Gaussian. As gradient
magnitude is highly correlated with edge contrast, we threshold to suppress
noise, and set the remaining xj to unit magnitude.
We may take it that the xj are the result of a set of polygons with missing data, and that the noise in the appearance of the edge pixels, in their position and estimated orientation, is additive and independent between the points. Then we may take the probability of the Gaussian mixture of regular polygons in the image for the full set of points to be:

Σ_{m=1}^{M} p(φm|x) = Σ_{m=1}^{M} Π_j p(xj|φm)p(φm) / p(xj).    (5)
The probability density for an individual polygon φm and edge point xj can be modelled as a Gaussian in 3D (xj, yj, θj) with zero mean and a point-specific covariance matrix:

p(xj|φm) = exp(−g(xj, frp(Φ)) Σj⁻¹ g(xj, frp(Φ))) / (2π |Σj|^{1/2}),    (6)

where g(xj, frp(Φ)) is a function giving the distance from the edge pixel to the closest point on the image projection of the regular polygon, and Σj is the covariance matrix:

       ( σ²xj   σxyj   σxθj )
Σj =   ( σxyj   σ²yj   σyθj ),    (7)
       ( σxθj   σyθj   σ²θj )
and |Σj| is the determinant of the covariance matrix. A similar formulation has been applied to straight lines [15]. By the independence of noise between points, we may take the product over all edge points:

p(x|φm) = Π_j exp(−g(xj, frp(Φ)) Σj⁻¹ g(xj, frp(Φ))) / (2π |Σj|^{1/2}).    (8)

Substituting back into Equation 4:
Σ_{m=1}^{M} p(φm|x) = Σ_{m=1}^{M} Π_j exp(−g(xj, frp(Φ)) Σj⁻¹ g(xj, frp(Φ))) p(φm) / (2π |Σj|^{1/2} p(xj)).    (9)
Taking log-likelihoods and collecting constants:

Σ_{m=1}^{M} log L(φm|x) = Σ_{m=1}^{M} Σ_j −g(xj, frp(Φ)) Σj⁻¹ g(xj, frp(Φ)) + const.    (10)
4.1. Matching to the regular polygon function
Equation (10) gives a maximum likelihood formulation for the appearance of regular polygons in an image. An ideal algorithmic implementation
of this function algorithm would produce an optimal regular polygon detector. However, Φ is a continuous 5D space (computationally intractable), and
g(xj , frp (Φ)) requires finding the nearest point on the regular polygon. In
this section, we derive an algorithm that is a discretised approximation to
Equation (10). The approximations made are precisely described here, and
leave the algorithm close to an optimal regular polygon detector, and as we
shall see, highly effective in practice.
As regular polygon space has more dimensions than the image, and edge points are generally sparse, we may pose the problem as a mapping from image edge pixels to possible regular polygons, by forming the inverse mapping frp⁻¹ from an xj to all regular polygons that it may be a part of, i.e., Φ.

Consider frp⁻¹ for a given xj, and unknown r and γ. If we do not consider orientation, xj may form a part of a regular polygon with centres in all positions; xj may be an element of a large subspace of Φ, and the function frp⁻¹ is intractable. However, using orientation (e.g., from Sobel) gives xj = (xj, yj, θj), where θj is the direction of the maximum intensity gradient. Then, regardless of n, (cx, cy) is constrained to be on a line, orthogonal to the gradient edge, at a distance r from (xj, yj), as shown in Figure 2. Call this line the line of the projected centre for xj and some Φ. Only the length of this line is determined by n, which is twice the side length of the corresponding regular polygon. Representing a line in the complex plane [30], with z1 a complex number of argument θj − π/2 and modulus 1, we have:

h(xj) = ∫_{t=−l/2r}^{l/2r} xj + r(z1 t + iz1) dt,    (11)

where xj is the (xj, yj) component of the point represented in the complex plane, and l is the side length given n and r.
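The geometry behind Equation 11 can be illustrated in a few lines: each oriented edge pixel constrains the centre to lie on a segment at distance r along its gradient direction, extended perpendicular to the gradient. A minimal Python sketch under that reading (function and variable names are ours, with the segment half-length taken from Equation 1):

```python
import math

def projected_centre_line(x, y, theta, r, n=3):
    """Endpoints of the line of the projected centre for an edge pixel at
    (x, y) with gradient direction theta, for an n-gon of radius r."""
    # The candidate centre lies at distance r along the gradient direction.
    cx, cy = x + r * math.cos(theta), y + r * math.sin(theta)
    # The segment extends perpendicular to the gradient by half a side length.
    half = r * math.tan(math.pi / n)        # l(n, r) / 2, from Equation 1
    dx, dy = -math.sin(theta), math.cos(theta)
    return (cx - half * dx, cy - half * dy), (cx + half * dx, cy + half * dy)
```

For an edge pixel at the origin with a horizontal gradient and r = 5, n = 4, the candidate centres form a vertical segment through (5, 0).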
Using Equation 11, we may define a new regular polygon detection subspace in 3D that is ambiguous in γ and n, computing the likelihood density function of Equation 10 over (cx , cy , r). Let this new subspace be Ψ =
(cx , cy , r), and the function from this space to the image be frps (s for subspace). This suggests a two part algorithm: first find the likelihood density
of any polygon; then, resolve γ and n.
4.2. Finding likely regular polygons
In order to collapse n, we assume n = 3, which gives the longest side
length. If, in fact, n > 3 this leads to some edge pixels being considered that
are not actually part of the true regular polygon, see Figure 5 for example.
In this case the likelihood will be an overestimate, and thus an upper bound.
Provided the algorithm recovers all polygons that have likelihood higher than
a required threshold in Ψ, it will be sufficient to ensure that all regular
polygons in Φ of likelihood greater than this threshold are recovered. We may
also calculate the correct likelihood after resolving (γ, n) if this is desired.
For any particular ψ ∈ Ψ, we may define g as the minimum distance from ψ to the line defined in Equation 11:

Σ_{m=1}^{M} log L(ψm|x) = Σ_{m=1}^{M} Σ_j −g(frps⁻¹(xj), Ψm) Σj⁻¹ g(frps⁻¹(xj), Ψm) + const.    (12)

Expanding frps⁻¹ we have:

Σ_{m=1}^{M} log L(ψm|x) = Σ_{m=1}^{M} Σ_j −g(∫_{t=−l/2r}^{l/2r} xj + r(z1 t + iz1) dt, Ψm) Σj⁻¹ g(∫_{t=−l/2r}^{l/2r} xj + r(z1 t + iz1) dt, Ψm) + const.    (13)

Here l is the side length assuming n = 3 and given radius r.
4.2.1. Distance function
Consider a reparameterisation of the error in xj, suggested in Figure 3, in terms of (∆θb, ∆ρb, ∆lb). The shortest distance between the infinite line from xj in the direction of the maximum gradient and the image projection of the centroid of ψ is named lb. For n = 3 and given r, the maximum length that lb can be without error is known via Equation 1; let us call this length la.

Figure 3: The distance between the location of xj and the position it would need to be in for it to be part of a regular polygon corresponding to ψ.

Define ∆lb = lb − la if lb > la, or zero otherwise. Let ∆ρb be the distance between the image projection of the centroid and the closest point on the line of the projected centre. ∆ρb and ∆lb are parametric functions of ∆θb.
Let us consider two different values of ∆θb for g ≥ 0. For each value, ∆ρb and ∆lb have a single possible value (aside from sign reversals). In Figure 4 we can see that, for the same xj, a family of lines passes through the image projection of the centre of a given Φ for different values of (∆θb, ∆ρb, ∆lb). This family can be parameterised by a single variable, which we can take to be ∆θb; thus we may form a parametric function for g in terms of ∆θb.
Firstly, consider lb ≤ la; then g is a combination of the distance arising from ∆θb and ∆ρb. Define ||Φ − x|| = √((Φx − xx)² + (Φy − xy)²), and let ∠cx be the angle formed between the line joining (xx, xy) and (Φx, Φy) and the same coordinate system as xθ (as shown in Figure 3). Then:

∆ρb = r − ||Φ − x|| cos(∆θb).    (14)

Figure 4: The likelihood of a regular polygon Φ given xj has a one dimensional family of solutions with error in ρ and θ.
Now consider the case where lb > la. Equation 14 is unchanged, but an additional error appears in the direction of the line of the projected centre. The limit is defined as in Equation 1, where n = 3 and r = Φr. The general error function ∆lb can be defined:

∆lb = 0                          if lb ≤ 2φr tan(π/φn)
∆lb = lb − 2φr tan(π/φn)         otherwise,    (15)

where n = 3. Note that lb is also a function of ∆θb, specifically:

lb = ||c − x|| sin(∠cx − (xθ + ∆θb)).    (16)

Figure 5: The likelihood of the hexagon may be overestimated if the ambiguity of the greater edge length is not resolved in the second stage of the algorithm.
Thus, we may form our entire equation for g as a function of ∆θb:

g = fθb(∆θb) + fρb(r − ||Φ − x|| cos(∆θb))                                if lb ≤ 2φr tan(π/φn)
g = fθb(∆θb) + fρb(r − ||Φ − x|| cos(∆θb)) + flb(lb − 2φr tan(π/φn))      otherwise,    (17)

where fθb, fρb and flb are the distance functions over ∆θb, ∆ρb and ∆lb respectively. For known covariance there is a minimum of g that can be obtained by a one dimensional minimisation.
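To make the components of Equations 14 to 17 concrete, here is a hedged Python sketch of the two error terms for a trial angular offset ∆θb (names are ours; the distance functions fθb, fρb and flb would then weight these terms):

```python
import math

def distance_terms(cx, cy, r, x, y, x_theta, d_theta, n=3):
    """Radial error (Equation 14) and side-overshoot (Equations 15-16)
    for edge pixel (x, y, x_theta) against a candidate centre (cx, cy, r)."""
    dist = math.hypot(cx - x, cy - y)               # ||Phi - x||
    d_rho = r - dist * math.cos(d_theta)            # Equation 14
    angle_cx = math.atan2(cy - y, cx - x)           # angle from x_j to the centre
    l_b = abs(dist * math.sin(angle_cx - (x_theta + d_theta)))  # Equation 16
    limit = 2.0 * r * math.tan(math.pi / n)         # threshold from Equation 15
    d_l = max(0.0, l_b - limit)                     # Equation 15
    return d_rho, d_l
```

For an edge pixel sitting exactly on the circle with its gradient pointing at the centre, both terms vanish, as expected for a perfect fit.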
4.3. Regular polygon detection
Using Equation 13 to find likelihood density across Ψ, we may take the
peaks of the distribution as the set of n regular polygons that are most
likely in this subspace. This assumes zero mean, and that the actual regular
polygons are spatially separate. We will deal with the difficulties that arise
from the assumption of spatial separation subsequently. With a noisy image,
each peak will appear as a cluster of maxima over a small spatial region.
This suggests a first stage of the algorithm to detect likely regular polygons.
If we truncate the distance function, we may focus computation on image
and polygon space regions of most likely polygons.
In practice the second stage consists of: recovering n and γ; and, resolving
overlaid regular polygons.
4.4. Resolving orientation and shape
From Equation 13 we have values for ψh = (cxh, cyh, rh) ∈ Ψ for the set of regular polygons that are most likely in Ψ. We may construct a likelihood function given this information to resolve the remaining dimensions, that is:

log L(γ, n|x, cx, cy, r) = Σ_j −g(xj, frp(Φ)) Σj⁻¹ g(xj, frp(Φ)) + const.    (18)
To recover the likelihood function, we seek the γ and n that maximise:

log L(γ, n|x, cx, cy, r) = Σ_{m=0}^{n−1} Σ_j −g(∫_{t=−l/2r}^{l/2r} cx,y + Φr p̂ + r(z1 t + iz1) dt, xj) Σj⁻¹ g(∫_{t=−l/2r}^{l/2r} cx,y + Φr p̂ + r(z1 t + iz1) dt, xj) + const.    (19)

where p̂ is a unit vector from (Φx, Φy) in direction Φγ + m2π/n, m ∈ {0, 1, ..., n − 1}, i.e., towards the centre of each of the edge segments in turn. This is shown in Figure 6, with Φ having a set of implied edges, and the distance likelihood for a particular value of n and γ being the sum of the likelihoods of the closest implied polygonal edge given the presence of the edge segments xj.
Figure 6: The likelihood of a regular polygon Φ given an edge xj is the likelihood of that
edge forming part of the nearest implied edge of the polygon.
For any edge segment, we may form a likelihood function similar to Equation 17 over the angle of the closest oriented edge, that is, γ + m2π/n. Letting ϑ = γ + m2π/n, we have:

g = fγ(γ + m2π/n − xθ) + fρ(r − ||Φ − x|| cos ϑ)                            if lb ≤ 2φr tan(π/φn)
g = fγ(γ + m2π/n − xθ) + fρ(r − ||Φ − x|| cos ϑ) + flb(lb − 2φr tan(π/φn))  otherwise.    (20)
We can combine all such functions for all the xj that contribute to a given
Φ. For each point in detection subspace, this is a finite set using a truncated
distance function for finite discrete images.
Within this set, we wish to evaluate the likelihood density over γ and n.
4.5. An algorithm for searching over n and γ
We now have an objective function to optimise. The problem is that of finding the actual edges of the shape in orientation space from the subset of xj identified in the first stage. The error function of Equation 20 has the number of sides as a parameter. The total solution can be optimised by forming a likelihood density function over orientation from these points for each n, then, in that density function, performing a one dimensional search from angle 0 to 2π/n, summing the likelihood density at the points i · 2π/n, i = 0, ..., n, offset from the candidate angle. The maximum of this is the maximum likelihood polygon of n sides for this location. This sum can be performed for all possible n.
4.6. An efficient algorithm for searching over n and γ
Any strong polygon of reasonable size must have edges with many pixels.
Thus, at least one edge will be a significant peak in the likelihood distribution
function. Thus, for a given n, we need not search the complete space, but
only need search in the regions of the top k peaks in the function.
If computation is pressing, we may also bound the algorithm. Suppose that rather than forming the likelihood density functions for all n, we simply take n = 3, and search in this. The n = 3 likelihood density function is an upper bound on all density functions for n > 3, as for smaller lb, Equation 20 is always less. The full likelihood could then also be calculated for these explicit candidates. In practice, use of this bound without a full search leads to confusion between factors of n, e.g., finding a square instead of an octagon, as shown in Figure 11(t).
5. Implementation
5.1. Stage 1
Our implementation aims to find the complete parameters of the most likely regular polygons in the image. We make approximations to the likelihood distribution for a fast implementation:
1. discretely sample the likelihood distribution;
2. truncate g; and,
3. approximate Equation 11 by variance around the line of the projected centre.
Approximation (1) means we may use a Hough-like algorithm; note that a coarse-to-fine strategy may be employed, or found peaks may be better located by subsequent EM based on Equation 13.
As we use squared distance, the likelihood contribution to a regular polygon of xj rapidly becomes small as distance increases, so (2) has little impact
on accuracy, but it keeps the contributions of edges localised.
The error in estimated edge position is small for clear contrast edges (mostly less than a pixel). The most likely error is that an edge pixel is not detected. Sobel causes ghosting of edges, so even if the edge is shifted slightly, it is likely that an edge response will be within a pixel of the correct location. However, edge orientation estimation is the main source of error in edge parameters.
Our implementation is to step along the line of h for the width implied
by , and adjust the log likelihood of each sample point of the distribution
according to L(xj |Ψ). Alternatively, we may approximate by adjusting only
the closest pixels to the line, and convolving the final likelihood image with
a Gaussian. Thus for each xj , we update log-likelihoods for rlw points in
Ψ, where w is the width implied by . In practice, small w is sufficient,
for a fast implementation this may be one to three pixels. Also, for specific
applications, r may be over a small range.
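The stage-one update described above can be sketched as a simple vote accumulation: each oriented edge pixel increments the sampled likelihood along its line of projected centre. This minimal Python version (pure lists, names ours) omits the Gaussian weighting and truncated distance function discussed in the text:

```python
import math

def accumulate_votes(edges, r, shape, n=3):
    """Accumulate a discrete vote image for centres of n-gons of radius r.
    edges: iterable of (x, y, theta) oriented edge pixels."""
    h, w = shape
    votes = [[0] * w for _ in range(h)]
    for x, y, theta in edges:
        # Centre candidate lies at distance r along the gradient direction.
        cx, cy = x + r * math.cos(theta), y + r * math.sin(theta)
        half = r * math.tan(math.pi / n)      # half the segment length
        dx, dy = -math.sin(theta), math.cos(theta)
        steps = max(1, int(2 * half))
        for k in range(steps + 1):
            t = -half + k * (2 * half / steps)
            px = int(round(cx + t * dx))
            py = int(round(cy + t * dy))
            if 0 <= px < w and 0 <= py < h:
                votes[py][px] += 1
    return votes
```

Peaks of (a smoothed version of) this vote image give the candidate (cx, cy) for each r, i.e., the reduced subspace Ψ from which stage two proceeds.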
In fact, to best represent orientation error we should rotate the line of
the projected centre about its centre, and set the likelihoods based on ∆θb .
However, if we set the width based on the expected orientation variation, we
will simply overestimate the points close to the centre. This results in a faster
implementation, and alleviates the quantisation issues typical of Hough-based
methods. The cost is to potentially alias the peaks. Given contributions of
multiple separate edges from different directions, a well supported peak will
be clear. Some imprecision is apparent in the locations of detected regular
polygons in some of our results.
5.2. Stage 2
We take the xj that voted to a peak (assuming n = 3). Equation 20 forms
a distribution that will find the maximum fit for each edge point to allow for
error in position of the point. For computational purposes we approximated
Given edge elements xj of an image, detect regular polygons with n ∈ N sides. For each polygon radius r:
1. Estimate likelihood image:
(a) For each xj: compute all triangle locations pk that could generate xj, accumulate locations in a vote image, and record information: pk, xθ and distance la.
(b) Convolve the vote image with a Gaussian to generate a discrete approximation of the likelihood image.
2. Evaluate each maximum qi for each n ∈ N:
(a) Build a weighted angular histogram of all recorded xθ values with la < l(n, r)/2, weighted by a Gaussian of ||pk − qi||².
(b) Convolve the histogram with a string of impulse functions δ(γ − 2π/n); the γ that maximises this convolution is the most likely orientation of an n sided polygon, and the values at γ ± m2π/n, m ∈ Z indicate the support for each side.
(c) Total support is determined as a function of the support for each side of the shape.

Figure 7: Summary of the algorithm.
this simply by squared distance over ∆θb. This will lead to underestimates of likelihood for points for which errors in position are present. A fast approximation can be formed by convolving the orientation histogram of the set of xj that voted to the peak.
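The second stage can be sketched in the same spirit: build an angular histogram of the gradient orientations that voted for a peak, then score each candidate γ by summing the histogram at the n offsets γ + m2π/n. A minimal Python version, with names ours and uniform bin weighting in place of the Gaussian weighting of Figure 7:

```python
import math

def best_orientation(thetas, n, bins=360):
    """Return the orientation (radians) and support maximising the sum of
    an angular histogram of edge orientations at n offsets of 2*pi/n."""
    hist = [0] * bins
    for th in thetas:
        b = int(round((th % (2 * math.pi)) / (2 * math.pi) * bins)) % bins
        hist[b] += 1
    best_b, best_support = 0, -1
    for b in range(bins):               # candidate orientation, in bins
        support = sum(hist[(b + m * bins // n) % bins] for m in range(n))
        if support > best_support:
            best_b, best_support = b, support
    return 2 * math.pi * best_b / bins, best_support
```

For example, the four edge normals of an axis-aligned square (0, π/2, π, 3π/2) align at γ = 0 for n = 4, giving support 4.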
In practice, one typically either seeks the set of k regular polygons with
maximum likelihood, or all regular polygons with a likelihood greater than a
threshold. In either case, this search may be restricted by taking only the top
m peaks of the above histogram. For smaller radii, only the first few regular
polygons can actually be reliably discriminated from the image information
Figure 8: (a) Noise test image, (b) with SNR = 3dB (σ = 0.5), (c) mean and standard deviation of detection with additive Gaussian noise.
due to pixel quantisation. This stage only requires a small number of accesses of the composed histogram; for our experiments m = 5 was adequate. For larger polygons, the search may initially only treat prime numbers of sides, and explore further if these have a high likelihood. We perform the search by considering each n in turn, and each of the m peaks in turn. For each combination of m and n we look up the values in the histogram at the positions of the edges at intervals of 2π/n offset from the peak. The sum of the values over each edge forms an approximation to summing over Equation 20.
We may quickly refine these estimates by allowing n > 3. For any likelihoods that we find to be significant, we may remove xj that fall outside the
edge length implied by r, n. Implementation is summarised in Figure 7.
6. Experimental results
We evaluate algorithm performance in the presence of noise on artificial
images as well as on real images, both a database of road signs, and sequences
taken from a vehicle. We also demonstrate the application of the detector to
wide-baseline matching.
Figure 9: (a) Input image, (b) likelihood image for r = 17, (c) results for r = 17, and (d) results across all radii. Thickness indicates strength of detection.
6.1. On still images
Figure 8(a) was corrupted by additive Gaussian noise of σ = 0, 0.05, ..., 1 and performance was evaluated over 20 runs at each noise level. Figure 8(c) shows the percentage of the top six shape hypotheses deemed correct (number of sides, and location within 4 pixels). Above 3dB, detection falls below 50% (Figure 8(b)).
Figure 9 shows highly ranked detection results for 3 to 8 sided regular
polygons for r ∈ {8, 11, 14, 17}. (b) shows the likelihood image. In (d) we
see good detection of all shapes, with some inaccuracy in centroid estimation
as expected. Several polygons are aligned to only partial edges (e.g., the
triangle on the middle left), others find a smaller number of sides.
Figure 10 shows the second stage: resolving n and γ. (b) shows the input gradient histogram used for the second stage, with dominant peaks at 90, 210, and 330 degrees. In (c) we see the edges still present around the local area, but with other peaks, while in (d) only the vertical peak remains dominant in the whole image.
Figure 10: The gradient orientation histogram. (a) input image, (b) gradient orientation histogram for the edge pixels that voted for the give way sign, and (c) the gradient orientation histogram for the whole image.
6.1.1. Road sign database
Members of the NICTA Smart Cars project collected digital camera
road sign images around Canberra, and hand annotated the contents (giveway/roundabout, stop, diamond information). The images were 64x64 with
a large visible sign approximately centred. The regular polygon detector was
run on each returning the most likely sign, number of sides in {3,4,5,6,7,8},
radius in {15, 18, 22} pixels, orientation and location.
Table 1 shows results by sign type: the number of images; the frequency of correct classification of the number of sides; the frequency that the estimated centre was less than three pixels in x and less than three pixels in y from the deemed ground truth; and results associated with visual verification, namely the frequency of correctness with respect to visual classification, and the same when we only consider position. Finally, the database included images that are rotated out of the plane; naturally, performance decreases on such images, so the final line shows the frequency when these are discarded.
image set:                   give way   diamond   stop
total number                 160        174       106
sides correct                0.9938     0.9540    0.8679
< 3 pixels in x and y        0.9563     1.0       0.9811
visually correct             0.9814     0.9540    0.8584
visually correct position    0.9814     0.9713    0.8868
visually correct position,
  removing tilted errors     0.9938     0.9769    0.9583

Table 1: Road sign database classification. Note all signs were localised to within five pixels
in x and five pixels in y.
Figure 11 shows example detection images with the most likely regular
polygon superimposed. (a), (b) and (c) show typical correct classifications.
(d) to (l) illustrate successfully detected difficult cases including occlusion,
out of plane rotations, a dent, and difficult lighting conditions. In (m)
and (n), localisation is impeded by rotation and occlusion respectively.
Results for approximate location of give way signs are excellent. Of the
11 with shortcomings, eight were significantly tilted (e.g., (m)), one of which was
also dented. Occlusion impairs localisation in two cases (e.g., (n)). (o) is the
only misclassification, with rotation and a strong background shadow. Fifteen
stop signs have incorrect n (octagons are more complex shapes), of which ten
are tilted (e.g., (p), (q)). A square aligned with the sign text had higher likelihood in
four cases (e.g., (r), (s)). Finally, in (t) a diamond was found with greater
likelihood. Eight diamonds were incorrectly classified, two due to tilting
(e.g., (u)). The remainder were due to poor contrast: as warning-only signs,
their edges contrast less, and the roundabout symbol was detected more
Figure 11: The full detector applied to a set of sign sample images for up to eight-sided
polygons. (a)-(c) Correct detection in typical cases. (d)-(n) Correct recognition of difficult
signs. (o)-(y) Error cases.
strongly for three (e.g., (v)); and (w) - (y) show very low contrast.
Excluding tilted signs, overall recognition is greater than 95%. As we will
illustrate in road trials, we expect recognition to improve further when the top n candidates are considered.
6.1.2. Real-time sign detection
We applied a constrained form of the regular polygon detector (triangles only)
to two sequences from the Smart Car; see Figure 12. For detection, we
required the sign to be one of the two most likely triangles at any radius
in an initial image, and then one of the top 10 in each of the next k images. This
removes false positives due to accidental alignment of features at different
Figure 12: Sample images from two roundabout sequences: a head-on approach of 100
frames; and a sequence in which the sign appears early, before the car passes through the
roundabout. Sequence 1 - (a) frame 52, where the sign first becomes of sufficient size for
classification, (b) the middle frame (75), and (c) the end frame (99). Sequence 2 - (d) the first and (f) the last image
where the sign is visible, and (e) one half way in between.
depths. Radii of 6, 8, 10, and 12 pixels were used. The a priori constraint
over n and r meant that all orientations other than 0±12°, 120±12°, and 240±12°
were discarded, and the second algorithm stage was not required. For a
320 × 240 image, detection in C++ on a standard PC took less than 50 ms
per frame, so many frames can be processed while a sign is visible. For
recognition, we apply normalised cross correlation.
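The normalised cross correlation used for recognition can be computed as below, comparing a detected candidate patch against a sign template; a minimal sketch, not the deployed implementation.

```python
import numpy as np

def ncc(patch, template):
    """Normalised cross correlation between two equal-sized greyscale
    patches. Returns a score in [-1, 1]; 1 indicates a perfect match."""
    a = patch - patch.mean()
    b = template - template.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    # Flat (zero-variance) patches carry no structure to match against.
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

Classification then simply selects the sign template with the highest score over the candidate locations.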
For the first sequence we show: a sample detection image (Figure 13
(a), (b)); false positive and false negative curves (Figure 13 (c)); and a
summary of correct and false detections for different k (Figure 14). The
sign is detected many times over the sequence, and with so few
false positives that classification need only investigate at most four image locations per frame. The sign was correctly classified by NCC 17 times during the sequence, with no false positives. Performance on the second
sequence is similar. The complete processed sequences can be viewed at
http://users.rsise.anu.edu.au/~nmb/regularpolygon/.
Figure 13: (a) Most likely triangle midway through the sequence; (b) likelihood function
for radius 8; and (c) average and standard deviation of false positives (top) and false negatives
(bottom), against the number of frames the sign is required to be present.
num frames    sign present                          sign absent
              correct detections  false positives   false positives
2             0.936               0.979             3.01
3             0.936               0.936             2.80
4             0.936               0.809             2.46
5             0.936               0.771             2.26
Figure 14: For the number of frames the sign is required to be present, the average correct
detection rate and false positive rate over the 48 frames where the sign was detectable, and the
false positive rate over the 52 frames where the sign was too small for detection.
6.2. Indoor scenes
We applied the detector to robotic mapping using wide-baseline images, with
baselines over time of typically around 1 m, on a 69-image sequence as a robot moves down
a corridor (Figure 15) and turns a corner (Figure 15 (c), (d)). We searched
for n = 4 features at three radii, resolving γ after initial detection, and kept
all squares above a low likelihood threshold. Features were matched according to size
and orientation, local greyscale environment, and position. Typically, around
70% of points were matched, with reduced rates during rotation. Figure 15
Figure 15: Regular polygons for feature matching. (a) Square features: colour represents
likelihood decreasing from black to light grey. (b),(c) Matches for turning a corner. The
black lines represent correct matches, while the light ones represent erroneous matches.
(d) Only one match was found by SIFT for this pair.
(d) shows the matches obtained by SIFT [16], using the reference implementation with its packaged parameters, showing only one effective match.1
Note that the detector will be most effective in this type of rectilinear environment; however, it is well suited to form part of a battery of detectors
for matching. See [28] for complete implementation details.
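The matching step described above (size, orientation, local greyscale environment, and position) might be sketched as a greedy nearest-neighbour match over a combined cost. The weights, normalisation constants, and threshold below are our assumptions for illustration, not values from [28].

```python
import numpy as np

def match_features(feats_a, feats_b, w=(1.0, 1.0, 1.0, 1.0), max_cost=2.0):
    """Greedy nearest-neighbour matching of square features between two
    frames. Each feature is (x, y, radius, orientation_deg, patch); the
    cost combines position, size, orientation, and greyscale differences."""
    def cost(fa, fb):
        dx = np.hypot(fa[0] - fb[0], fa[1] - fb[1]) / 100.0   # position (assumed scale)
        dr = abs(fa[2] - fb[2]) / max(fa[2], fb[2])           # relative size
        dth = abs(fa[3] - fb[3]) / 45.0                       # orientation
        dg = float(np.abs(fa[4] - fb[4]).mean())              # local greyscale patch
        return w[0] * dx + w[1] * dr + w[2] * dth + w[3] * dg

    matches, used = [], set()
    for i, fa in enumerate(feats_a):
        costs = [(cost(fa, fb), j) for j, fb in enumerate(feats_b) if j not in used]
        if costs:
            c, j = min(costs)
            if c <= max_cost:        # reject matches that differ too much
                matches.append((i, j))
                used.add(j)
    return matches
```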
7. Conclusions
Regular polygon detection is an important problem with multiple applications in robotics and computer vision. Using an a posteriori probability
approach, we presented a new algorithm for detecting regular polygons. We
defined the continuous log-likelihood of the probability density function of
regular polygons. To make the algorithm computationally efficient,
we find initial likely regular polygons in a lower-dimensional space, and then
resolve the remaining parameters for the likely candidates. We presented
1 http://www.cs.ubc.ca/~lowe/keypoints/
an efficient algorithm based on this, and adaptations of it that run at
20 Hz detecting signs in an intelligent vehicle application. Experimental
results show the efficacy and robustness of the algorithm, its application
to driver assistance, and its use as a feature detector for matching on a
robot sequence for mapping.
Acknowledgements
NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the
Australian Research Council through the ICT Centre of Excellence program.
The support of the STINT foundation through the KTH-ANU grant IG20012011-03, and the EU FP7 ICT Integrated Project “CogX” (FP7-ICT-215181)
are gratefully acknowledged.
References
[1] D. H. Ballard, Generalizing the Hough transform to detect arbitrary
shapes, Pattern Recognition, 13(2), 111–122 (1981).
[2] N. Barnes and G. Loy, Real-time regular polygonal sign detection, in
Proc. 5th Int Conf on Field and Service Robotics (FSR05) (2005).
[3] N. Barnes, G. Loy, D. Shaw, and A. Robles-Kelly, Regular polygon
detection, in International Conference on Computer Vision (2005).
[4] N. Barnes, A. Zelinsky, and L. Fletcher, Real-time speed sign detection
using the radial symmetry detector, IEEE Trans. on Intelligent Transportation Systems, 9(2), 321–332 (2008).
[5] R. L. Castano and S. Hutchinson, A probabilistic approach to perceptual
grouping, Computer Vision and Image Understanding, 64(3), 399–419
(1996).
[6] I. J. Cox, J. M. Rehg, and S. L. Hingorani, A Bayesian multiple hypothesis approach to contour segmentation, International Journal of
Computer Vision, 11, 5–24 (1993).
[7] D. Crevier, A probabilistic method for extracting chains of collinear
segments, Computer Vision and Image Understanding, 76(1), 36–53 (1999).
[8] W. Dickson, Feature grouping in a hierarchical probabilistic network,
Image and Vision Computing, 9(1), 51–57 (1991).
[9] D. M. Gavrila, A road sign recognition system based on dynamic visual
model, in Proc 14th Int. Conf. on Pattern Recognition, 1, 16–20 (1998).
[10] C. Geyer, S. Sastry, and R. Bajcsy, Euclid meets Fourier: Applying harmonic analysis to essential matrix estimation in omnidirectional cameras, in Proc. OMNIVIS'04 (2004).
[11] G. Guy and G. Medioni, Inferring global perceptual contours from local
features, International Journal of Computer Vision, 20(1-2), 113–33
(1996).
[12] P. V. C. Hough, Method and means for recognizing complex patterns,
U.S. Patent 3,069,654 (1962).
[13] S. H. Hsu and C. L. Huang, Road sign detection and recognition using
matching pursuit method, Image and Vision Computing, 19, 119–129
(2001).
[14] J. N. Huddleston and J. Ben-Arie, Grouping edgels into structural entities using circular symmetry, the distributed Hough transform and probabilistic non-accidentalness, CVGIP: Image Understanding, 57(2), 227–242 (1993).
[15] N. Kiryati and A. M. Bruckstein, Heteroscedastic Hough transform
(HtHT): An efficient method for robust line fitting in the 'errors in the
variables' problem, Computer Vision and Image Understanding, 78(1),
69–83 (2000).
[16] D. G. Lowe, Distinctive image features from scale-invariant keypoints,
International Journal of Computer Vision, 60(2), 91–110 (2004).
[17] G. Loy and N. Barnes, Fast shape-based road sign detection for a driver
assistance system, in Proc. 2004 IEEE/RSJ International Conference
on Intelligent Robots and Systems. IROS2004 (2004).
[18] G. Loy and A. Zelinsky, Fast radial symmetry for detecting points of
interest, IEEE Trans Pattern Analysis and Machine Intelligence, 25(8),
959–973 (2003).
[19] S. Maldonado-Bascon, S. Lafuente-Arroyo, P. Gil-Jimenez, H. Gomez-Moreno, and F. Lopez-Ferreras, Road-sign detection and recognition
based on support vector machines, IEEE Trans. on Intelligent Transportation Systems, 8(2), 264–278 (2007).
[20] L. G. Minor and J. Sklansky, Detection and segmentation of blobs in
infrared images, IEEE Trans. on Systems, Man, and Cybernetics, 11(3),
194–201 (1981).
[21] P. Parent and S. W. Zucker, Trace inference, curvature consistency, and
curve detection, IEEE Trans. on Pattern Analysis and Machine Intelligence, 11(8), 823–839 (1989).
[22] P. Perona and W. T. Freeman, A factorization approach to grouping, in
European Conference on Computer Vision, 655–670 (1998).
[23] G. Piccioli, E. D. Micheli, P. Parodi, and M. Campani, Robust method
for road sign detection and recognition, Image and Vision Computing,
14(3), 209–223 (1996).
[24] L. Priese, J. Klieber, R. Lakmann, V. Rehrmann, and R. Schian, New
results on traffic sign recognition, in Proc. Intelligent Vehicles Symposium, Paris, IEEE Press, 249–254 (1994).
[25] A. Robles-Kelly and E. R. Hancock, A probabilistic spectral framework
for grouping and segmentation, Pattern Recognition, 37(7), 1387–1405
(2004).
[26] S. Sarkar and K. L. Boyer, Quantitative measures of change based on
feature organisation: Eigenvalues and eigenvectors, Computer Vision
and Image Understanding, 71(1), 110–136 (1998).
[27] A. Shashua and S. Ullman, Structural saliency: The detection of globally
salient structures using a locally connected network, in Proc. International Conference on Computer Vision, 321–327 (1988).
[28] D. Shaw and N. Barnes, Regular polygon detection as an interest point
operator for slam, in Proc 2004 Australasian Conference on Robotics &
Automation (2004), www.araa.asn.au/acra/acra2004.
[29] R. S. Stephens, Probabilistic approach to the Hough transform, Image
and Vision Computing, 9(1), 66–71 (1991).
[30] C. F. R. Weiman and G. Chaikin, Logarithmic spiral grids for image
processing and display, Computer Graphics and Image Processing, 11,
197–226 (1979).
[31] Z. Zhang, Clustering or Hough transform, Technical report, INRIA
(1996).