Real-Time Natural Scene Analysis for a Blind Prosthesis

Real-Time N a t u r a l Scene Analysis f o r a B l i n d Prosthesis
Michael Deering
C a r t e r Collins
C o m p u t e r S c i e n c e D i v i s i o n , D e p a r t m e n t of EECS
University of California, Berkeley
B e r k e l e y . C a l i f o r n i a 94720
S m i t h - K e t t l e w e l l I n s t i t u t e o f Visual Sciences
San F r a n c i s c o . C a l i f o r n i a 94115
ABSTRACT
A real-time c o m p u t e r vision system designed for the
l i m i t e d e n v i r o n m e n t o f c i t y sidewalks i s p r e s e n t e d . This
s y s t e m i s p a r t o f a p r o t o t y p e m o b i l i t y aid f o r t h e b l i n d .
The o v e r a l l d e v i c e e n d e a v o r s t o k e e p b l i n d p e d e s t r i a n s
on a safe p a t h d o w n t h e sidewalk, a n d also w a r n of
u p c o m i n g o b s t a c l e s . The scene analysis a l g o r i t h m uses
s e m a n t i c m o d e l s o f t h e e n v i r o n m e n t t o i n t e r p r e t edges
in t h e m u l t i - f r a m e i m a g e d a t a as b o r d e r s of v a r i o u s
o b j e c t s , as w e l l as to assign d i s t a n c e e s t i m a t e s to t h e s e
o b j e c t s . The i n p u t is a 64 by 64 by 6 b i t g r a y - s c a l e i m a g e
t a k e n f r o m t h e v a n t a g e p o i n t of t h e s h o u l d e r of a pedest r i a n once a s e c o n d . Along w i t h each i m a g e , t h e t h r e e
d i m e n s i o n a l t r a n s f o r m a t i o n o f t h e c a m e r a l o c a t i o n since
the previous f r a m e is assumed to be provided by
h a r d w a r e . A f t e r a n i n i t i a l s e g m e n t a t i o n i n t o edge lines
r e p r e s e n t e d as a r c s of c i r c l e s , p r e d i c t i o n s of edges ( g e n e r a t e d b y a n a l y s i s o f p r e v i o u s f r a m e s ) are used t o i d e n t i f y edges i n t h e c u r r e n t f r a m e . Edges n o t i d e n t i f i e d b y
this process are i n c o r p o r a t e d into the p o r t i o n of the
t h r e e d i m e n s i o n a l w o r l d m o d e l t h a t t h e y are t h e m o s t
c o n s i s t e n t w i t h . The i n d u c e d t h r e e d i m e n s i o n a l w o r l d
m o d e l o f o b j e c t s can t h e n b e used t o p r o v i d e m o b i l i t y
i n f o r m a t i o n t o t h e b l i n d user. The e m p h a s i s t h r o u g h o u t
the s y s t e m has b e e n on e f f i c i e n c y . The d e s i g n t r a d e - o f f s
a n d t e c h n i q u e s used t o o b t a i n h i g h p r o c e s s i n g r a t e s are
d i s c u s s e d . Most o f t h e v i s i o n s y s t e m i s c u r r e n t l y r u n n i n g
in r e a l - t i m e on a 16 b i t m i c r o - p r o c e s s o r . Field t r i a l s of
t h e c o m p l e t e p r o t o t y p e device w i l l b e g i n soon.
I
Most r e a l - t i m e v i s i o n s y s t e m s t o d a t e have d e a l t
w i t h v e r y c o n s t r a i n e d i m a g e d o m a i n s , due t o t h e enormous c o m p u t a t i o n a l r e q u i r e m e n t s inherent in visual proc e s s i n g . These i n c l u d e i n d u s t r i a l p a r t s r e c o g n i t i o n [ l ] ,
b l o o d cell c o u n t i n g , and a u t o m a t i c n a v i g a t i o n . F a s t e r
h a r d w a r e and m o r e e f f i c i e n t s o f t w a r e t e c h n i q u e s w i l l
g r a d u a l l y allow m o r e c o m p l e x d o m a i n s t o b e h a n d l e d a t
h i g h speeds. Our a p p r o a c h has been t o s t a r t w i t h v e r y
fast s e g m e n t a t i o n , and a m o r t i z e s e m a n t i c p r o c e s s i n g
over s e v e r a l f r a m e s , u t i l i z i n g p r e d i c t i o n s f r o m m o d e l s o f
previously recognized objects to guide the parse of the
c u r r e n t image. At the high end, our system is similar to
m a n y s e m a n t i c a l l y o r i e n t e d s y s t e m s , such a s [ 2 ] and [ 3 ] .
F i n a l l y o u r use of m u l t i - f r a m e d a t a is s i m i l a r to m a n y
aspects of [4] [ 5 ] [ 6 ] .
II
Constraints
I n o r d e r t o achieve r e a l - t i m e p r o c e s s i n g , w e have
had to impose several constraints upon the operations of
the system:
1.
I n p u t is r e s t r i c t e d to c l e a n , s u n - l i t . m o s t l y shadowfree sidewalks.
2.
C e r t a i n i n i t i a l s t a r t i n g c o n d i t i o n s will b e s u p p l i e d t o
t h e s y s t e m f r o m t h e o u t s i d e ( w h i c h way t h e c a m e r a
is p o i n t i n g , w h e r e t h e sun is, e t c )
3.
False positives are allowed ( i t is OK to o c c a s i o n a l l y
warn about non-existent obstacles)
4.
We m u s t accept t h a t w i t h i n our resolution and processing
time
certain
classes
of
objects
are
u n d e t e c t a b l e . These i n c l u d e o b j e c t s whose w i d t h
falls below t h e N y q u i s t s a m p l i n g r a t e o f t h e c a m e r a
( m a i n l y s k i n n y poles), a n d o b j e c t s w i t h v e r y low c o n t r a s t with respect to the b a c k g r o u n d , or those
a g a i n s t a w i l d l y c h a n g i n g b a c k g r o u n d . At t h e same
t i m e , we wished to c o n s t r u c t the p r o g r a m modulely.
Introduction
A n e f f o r t t o p r o d u c e a n o p t i c a l l y based e l e c t r o n i c
m o b i l i t y aid f o r b l i n d p e d e s t r i a n s has l e d t o t h e d e v e l o p m e n t o f a n a t u r a l scene a n a l y s i s p r o g r a m f o r t h e t y p i c a l
scenes e n c o u n t e r e d by a p e d e s t r i a n . The r e s t r i c t i o n to
t h e s e m a n t i c l y r i c h d o m a i n o f c i t y sidewalks has allowed
t h e visual p r o c e s s i n g to be p e r f o r m e d in r e a l t i m e on a
16 b i t m i c r o p r o c e s s o r . The n a t u r e of t h e t a s k is s u c h
t h a t p e r f e c t o b j e c t d e t e c t i o n a n d r e c o g n i t i o n are n o t
required, but rather the probabilistic detection of potent i a l o b s t a c l e s . The m a i n g o a l o f t h e s y s t e m i s t o d e t e r m i n e a p p r o x i m a t e l y w h e r e t h e sidewalk i s ( w i t h r e s p e c t
to the user), and secondarily to warn of objects blocking
t h e p a t h a h e a d . This t a s k m u s t b e p e r f o r m e d i n r e a l t i m e o n r e a l - w o r l d sidewalks.
allowing k n o w l e d g e a b o u t o b j e c t s and scenes to be
separated from the control s t r u c t u r e (but without
sacrificing efficiency.)
Ill
Overview of t h e P r o g r a m
The i n p u t scenes a r e successive 64 by 64 by 6 b i t
g r a y wide angle images t a k e n f r o m t h e v a n t a g e p o i n t o f
t h e s h o u l d e r of a p e d e s t r i a n , at a r a t e of one or two
f r a m e s per s e c o n d . This r e l a t i v e l y low r e s o l u t i o n is t h e
h i g h e s t possible u n d e r t h e h a r d w a r e and t i m i n g c o n s t r a i n t s . The o v e r a l l o r g a n i z a t i o n o f the p r o g r a m is:
A f t e r video a c q u i s i t i o n o f t h e i n p u t scene, d i g i t i z a t i o n
and noise-removal, the i n f o r m a t i o n processed in t h r e e
passes as follows:
• This w o r k was s u p p o r t e d by NSF g r a n t n o . PFR-7906299
f r o m t h e S c i e n c e a n d T e c h n o l o g y t o Aid t h e H a n d i c a p p e d
Program.
704
1.
s e g m e n t t h e p i c t u r e i n t o l i n k e d c h a i n s o f edges,
2.
fit curves to these chains and put the m a t h e m a t i c a l
d e s c r i p t i o n of the curves into an associative database, a n d
3.
m a t c h t h e s e c u r v e s a g a i n s t several d a t a bases ( t h e
world model) which include curve predictions f r o m
p r e v i o u s scenes.
The r e s u l t s o f these m a t c h e s i d e n t i f y s e m a n t i c a l l y
t h e o b j e c t s b e l o n g i n g t o t h e c u r v e s . Knowing w h e t h e r a n
o b j e c t i s h o r i z o n t a l o r v e r t i c a l allow one t o p r o j e c t t h e
c u r v e s o u t i n t o t h r e e d i m e n s i o n a l space t o d e t e r m i n e
their direction and range.
Further heuristics are
employed that utilize location information from previous
f r a m e s t o m a k e a n i n d e p e n d e n t m o t i o n s t e r e o based
e s t i m a t e o f t h e o b j e c t r a n g e s . A t t h i s stage t h e p r o g r a m
s h o u l d know w h e r e t h e sidewalk a n d any close o b s t a c l e s
are located, and can p r o c e e d to o u t p u t this i n f o r m a t i o n .
(Obstacles a r e t h e c o m m o n s o r t s o f l a r g e p h y s i c a l
o b j e c t s t h a t one m i g h t e n c o u n t e r o n a c i t y s i d e w a l k :
p h o n e poles, l a m p p o s t s , Are h y d r a n t s , t r a s h c a n s , sign
p o s t s , a u t o m o b i l e s (off t o one side), p a r k i n g m e t e r s ,
trees, bushes, etc.)
IV
Coordinate System
The c o o r d i n a t e s y s t e m used for t h e t h r e e d i m e n sional o u t s i d e w o r l d i s c e n t e r e d o n t h e f o c a l p o i n t o f t h e
c a m e r a as it moves t h r o u g h space.
The Z-axis is
oriented in the direction that the camera is pointing, the
Y-axis p o i n t s s t r a i g h t u p , and t h e X-axis p o i n t s t o t h e
r i g h t o f t h e c a m e r a . Thus t h e t h r e e d i m e n s i o n a l l o c a t i o n
of a l l o b j e c t s is always d e t e r m i n e d r e l a t i v e to t h e p e d e s t r i a n , and are r e - c o m p u t e d each frame. Points in the
w o r l d are m a p p e d t o p o i n t s i n t h e i m a g e plane t h r o u g h
t h e usual p r o j e c t i v e g e o m e t r i c a l e q u a t i o n s .
equations to obtain the location within the c u r r e n t image
plane of a w o r l d p o i n t f r o m a p r e v i o u s f r a m e .
V
The World Model ( t h e Sidewalk World)
Our w o r l d m o d e l i s t h e t y p i c a l sidewalk e n v i r o n m e n t
as e n c o u n t e r e d by a p e d e s t r i a n . Most m o d e r n s i d e w a l k s
are c o n s t r u c t e d o f slabs o f w h i t e c o n c r e t e , a n d a r e t h r e e
t o twelve f e e t wide. Many r u n i n s t r a i g h t lines f o r a n
e n t i r e b l o c k b e f o r e e n d i n g i n a c o r n e r , while o t h e r s m a y
be c u r v e d . For s i m p l i c i t y it is a s s u m e d t h a t all sidewalks
e n c o u n t e r e d b y t h e vision s y s t e m a r e s t r a i g h t f o r t h i r t y
f e e t b e y o n d t h e c a m e r a unless a c o r n e r is a h e a d . Sidewalks m a i n l y d i f f e r i n t h e i r w i d t h a n d t h e p r e s e n c e ( o r
absence) of a grass b o r d e r on t h e i r s t r e e t side. These
v a r i a t i o n s are m o d e l e d by a few s i m p l e p a r a m e t e r s .
W i t h i n t h e 64 x 64 i m a g e t h e b o r d e r s of t h e s i d e w a l k
on e i t h e r side will o f t e n a p p e a r as h i g h c o n t r a s t lines.
These lines w i l l be in t h e lower half of t h e i m a g e , at a
h i g h l y i n c l i n e d angle. V a r i o u s o b j e c t s b o r d e r i n g o n t h e
s i d e w a l k s o m e t i m e s are o f s i m i l a r o p t i c a l i n t e n s i t y ,
r e d u c i n g t h e c o n t r a s t of t h e sidewalk edges. A l a r g e
v a r i e t y o f o b j e c t s c a n b o r d e r c i t y a n d s u b u r b a n sidewalks. These i n c l u d e : bushes, s h r u b s , t r e e s , grass, d i r t ,
d r i v e w a y s , w a l k w a y s , walls of a l l t y p e s , d o o r s , a n d w i n dows. On t h e s t r e e t side u s u a l l y one finds p a v e m e n t and
a u t o m o b i l e s . These o b j e c t s o c c u r a t f a i r l y p r e d i c t a b l e
l o c a t i o n s , and m a n y t i m e s w i t h good c o n t r a s t c o m p a r e d
t o t h e w h i t e sidewalk.
Most o b j e c t s l o c a t e d u p o n t h e sidewalk p r o p e r have
t h r e e f o r t u n a t e p r o p e r t i e s : t h e y d o n o t m o v e , have
s t e r e o t y p i c a l l o c a t i o n s w i t h r e s p e c t t o t h e edge o f t h e
s i d e w a l k , a n d u s u a l l y d o n o t b l o c k t h e p a t h . Objects i n
t h i s class i n c l u d e : p h o n e poles, l a m p posts, fire h y d r a n t s ,
t r a f f i c signs, p a r k i n g m e t e r s , t r e e s , m a i l boxes, phone
b o o t h s , m o s t t r a s h c a n s , a n d bushes. Many of t h e s e
o b j e c t s also have t h e p r o p e r t y o f being r e c t i l i n e a r , and
a p p r o x i m a t a b l e a s c y l i n d e r s o r boxes ( a n d t h u s p r o d u c e
g o o d , h i g h c o n t r a s t edges.)
One o f t h e f u n d a m e n t a l p r o b l e m s o f c o m p u t e r vision
is that this projection of the three dimensional world
(X.Y.Z) t o t h e i m a g e p l a n e ( x . y ) c a n n o t b e r e v e r s e d
w i t h o u t a d d i t i o n a l d a t a o f some k i n d . One o f t h e m a i n
goals of t h e s e m a n t i c phase of o u r s y s t e m is to p r o v i d e
t h i s a d d i t i o n a l d a t a via s e m a n t i c k n o w l e d g e a b o u t t h e
probable
locations,
orientations,
and
relationships
b e t w e e n t y p i c a l o b j e c t s e n c o u n t e r e d w i t h i n t h e sidewalk
e n v i r o n m e n t . This a d d i t i o n a l i n f o r m a t i o n u s u a l l y i s i n
t h e f o r m of a h y p o t h e s i s on t h e value of one of X.Y, or Z.
Given t h i s v a l u e , along w i t h t h e i m a g e plane f e a t u r e locat i o n ( x , y ) , t h e r e m a i n i n g two t h r e e - d i m e n s i o n a l c o o r d i nates can be found by suitable m a n i p u l a t i o n of the projection equations.
U n f o r t u n a t e l y other objects are m o r e unpredictable. These i n c l u d e : p a p e r boxes, bags, n e w s p a p e r s ,
t r a s h , garbage cans, p a r k e d bicycles, and badly p a r k e d
c a r s . S u c h o b s t a c l e s c a n a p p e a r a n y w h e r e o n t h e sidew a l k , a n d are n o t always v e r y r e c t i l i n e a r . However, t h e y
d o have s o m e p r o p e r t i e s t h a t f a c i l i t a t e t h e i r d e t e c t i o n .
Many a r e s h o r t , s o t h e i r b o r d e r s are w i t h i n t h e t w o edges
o f t h e sidewalk. They also r e s t d i r e c t l y u p o n t h e g r o u n d ,
e n a b l i n g t h e i r d i s t a n c e t o b e a c c u r a t e l y d e t e r m i n e d and
v e r i f i e d over s e v e r a l f r a m e s .
F i n a l l y t h e r e is t h e class of m o v i n g o b j e c t s w h i c h
a r e t h e h a r d e s t t o h a n d l e , a s t h e i r shape a n d l o c a t i o n
m a y c h a n g e d r a s t i c a l l y f r o m f r a m e t o f r a m e . This class
i n c l u d e s : p e d e s t r i a n s , dogs, b i c y c l e s , o c c a s i o n a l l y a c a r
crossing the sidewalk, and wind-blown t r a s h .
Fort u n a t e l y , m o s t m o b i l e o b s t a c l e s are alive o r c o n t r o l l e d
by humans, and will t r y not to collide with a pedestrian.
The m o t i o n o f t h e c a m e r a i n t h e w o r l d b e t w e e n
f r a m e s w i l l cause t h e p r o j e c t i o n s o f edges o f t h r e e
d i m e n s i o n a l o b j e c t s o n t o t h e i m a g e plane t o c h a n g e .
The g e n e r a l case o f t h e c a m e r a t r a n s f o r m a t i o n involves
six p a r a m e t e r s . (AX.AY,AZ,t),^,p). For a c a m e r a m o u n t e d
u p o n t h e s h o u l d e r of a p e d e s t r i a n it c a n be safely
a s s u m e d t h a t AY a n d p a r e a p p r o x i m a t e l y zero, as t h e r e
is little t o r s i o n a l r o t a t i o n , and the height of a p a r t i c u l a r
pedestrian remains roughly constant.
(It should be
noted t h a t the mechanics of the h u m a n visual system
goes to a g r e a t deal of e f f o r t to k e e p p n e a r 0, a p r o t a t i o n of up to six d e g r e e s of t h e h e a d is c o u n t e r e d by an
o p p o s i t e r o t a t i o n o f t h e e y e b a l l i n t h e s o c k e t . The shape
of the horoptor in many animals indicate that the height
of t h e a n i m a l is t a k e n to be a c o n s t a n t f o r s o m e v i s u a l
p r o c e s s i n g . ) The e q u a t i o n s t o p e r f o r m t h i s t r a n s f o r m a t i o n can be c o m b i n e d w i t h the image plane p r o j e c t i o n
VI
Low Level Processing
I n i t i a l s e g m e n t a t i o n of d i g i t a l images is now a f a i r l y
well d e v e l o p e d a r t . b u t n o g e n e r a l t e c h n i q u e i s k n o w t o
be (close t o ) o p t i m a l f o r t h e large class of n a t u r a l
i m a g e s , a n d t h e speed o f v a r i o u s a l g o r i t h m s c a n d i f f e r b y
o r d e r s o f m a g n i t u d e . The necessity f o r r e a l - t i m e p e r f o r m a n c e i s t h e m o s t severe c o n s t r a i n t i m p o s e d u p o n o u r
s y s t e m . Many p r o m i s i n g s e g m e n t a t i o n t e c h n i q u e s h a d
to be r e j e c t e d out of hand on efficiency grounds.
705
VII
The s p e e d o f t h e i n i t i a l s e g m e n t a t i o n a l g o r i t h m
dominates the p e r f o r m a n c e of the overall system, as the
s e m a n t i c phase is u s u a l l y m a n y t i m e s f a s t e r [ 1 ] . Thus a
poor quality (but fast) segmentation algorithm may be
p r e f e r a b l e t o h i g h e r q u a l i f y ( b u t s l o w e r ) a l g o r i t h m i f one
c a n m a k e t h e s e m a n t i c phase w o r k a l i t t l e h a r d e r . This
i s t h e case i n o u r s y s t e m . Our s e g m e n t a t i o n a l g o r i t h m
o n l y d i r e c t l y c o m p a r e s two p i x e l s at a t i m e , a n d t h u s is
sensitive to noise, b u t r u n s at a very high speed. In some
sense e v e r y m o d u l e i n t h e s y s t e m a f t e r t h e p i x e l c o m p a r i s o n has s o m e c o m p o n e n t who's j o b i s t o h e l p c o r r e c t
for the defects i n t r o d u c e d by the initial fast segmentation.
The F i t t i n g of Edges to Curves
Pass 2 g a t h e r s i n f o r m a t i o n a b o u t e a c h edge l i n e ,
s u m m a r i z e s its a t t r i b u t e s , and sorts it to p e r m i t quick
s e a r c h e s . Pass 1 s o r t s t h e edge lines by x-y l o c a t i o n a n d
c o m p u t e s t h e i r l e n g t h . Pass 2 c o m p u t e s t h e i r " c u r v a t u r e " and angle o f i n c l i n a t i o n f r o m t h e h o r i z o n t a l , and
s o r t s t h e m b y angle. Edge lines t h a t a r e t o o c o n v o l u t e d
a r e b r o k e n u p i n t o s m a l l e r ( a n d s i m p l e r ) s e g m e n t s . This
covers t h e few cases in w h i c h t h e c o n s e r v a t i v e edge follower d e s c r i b e d above is n o t s u f f i c i e n t l y c o n s e r v a t i v e .
This can o c c u r when two s t r a i g h t line s e g m e n t s i n t e r s e c t
at a shallow angle. Pass 1 c a n n o t d i s t i n g u i s h t h i s case
f r o m a single shallowly c u r v e d l i n e s e g m e n t . Pass 2's
c u r v a t u r e s t a t i s t i c s are n e e d e d t o resolve t h i s case.
The s e g m e n t a t i o n a l g o r i t h m used is an edge following a l g o r i t h m t h a t differs f r o m the usual (such as
d e s c r i b e d i n [ 7 ] ) . i n t h a t w e follow s e v e r a l edges s i m u l t a n e o u s l y . Most edge f o l l o w e r s g r o w an edge l i n e p o i n t
by p o i n t s e r i a l l y f r o m one end of t h e l i n e . We i n s t e a d
g r o w m a n y edge lines i n p a r a l l e l b y a d d i n g t o b o t h ends
of m a n y edge lines s i m u l t a n e o u s l y . The a d v a n t a g e s of
this m e t h o d corresponds roughly to those gained by a
b r e a d t h first versus a d e p t h first search, in t h a t t h e r e is
m o r e g l o b a l i n f o r m a t i o n available w h e n one i s f o r c e d t o
m a k e l o c a l d e c i s i o n s . This allows edge t h i n n i n g t o t a k e
p l a c e a t t h e s a m e t i m e a s edge f o l l o w i n g , a n d c o n t r i butes to the speed of the a l g o r i t h m .
We devised a f a s t a l g o r i t h m to fit lines to a r c s c i r cles. The m a i n p o i n t for o u r a p p l i c a t i o n was to w i t h i n a
m i l l i - s e c o n d classify a g i v e n l i n e s e g m e n t as a r o u g h l y
s t r a i g h t l i n e , a g e n t l y c u r v i n g l i n e , or noise. It is possible t h a t in t h e f u t u r e we m a y m a k e use of f i n e r d e t a i l s of
the curves, but in an environment of r o t a t i n g three
d i m e n s i o n a l o b j e c t s a b s o l u t e c u r v a t u r e i s o f l i t t l e use.
and m o r e general i n f o r m a t i o n on object surface o r i e n t a
t i o n and d i s t a n c e is n e e d e d ( s u c h as d i s c u s s e d in [ 10].)
VIII
I n m o r e d e t a i l , edge p o i n t s i n t h e i n p u t are f o u n d
using a p s e u d o - r a n d o m s c a n [ 8 ] . I n t h e a r e a a r o u n d
e a c h p o i n t a s e a r c h is m a d e f o r e x i s t i n g edge lines and
o t h e r i s o l a t e d edge p o i n t s . Based o n c o m p l e x d e c i s i o n
r u l e s , a n e x i s t i n g edge line m a y b e e x t e n d e d t o t h e new
p o i n t , a new edge l i n e m a y be c r e a t e d b e t w e e n t h e new
p o i n t a n d a n o t h e r p o i n t , o r t w o edge lines m a y b e j o i n e d
t h r o u g h t h e new p o i n t . (These r u l e s a r e s i m i l a r t o t h o s e
f o u n d i n [ 9 ] , b u t w i t h less c o m p l e x w e i g h t i n g s . ) The d e c i sion r u l e s m e n t i o n e d above h e l p t h i n t h e edges, m a i n l y
by f o r c i n g edge lines to be e s s e n t i a l l y c o n t i n u o u s . The
f i n a l r e s u l t of pass 1 is t h e c o l l e c t i o n of edge lines
o b t a i n e d a f t e r a l l t h e edge p o i n t s have b e e n p r o c e s s e d .
Representation of Objects
Our t h r e e d i m e n s i o n a l r e p r e s e n t a t i o n n e c e s s a r i y
e m p h a s i s t h e edges of o b j e c t s , g i v e n t h e n a t u r e of o u r
i n i t i a l s e g m e n t a t i o n . The r e p r e s e n t a t i o n r o u g h l y r e s e m bles a c o l l e c t i o n of t h r e e d i m e n s i o n a l edges of t h e
o b j e c t , b u t t h e l o c a t i o n s o f t h e edge lines r e l a t i v e t o
e a c h o t h e r is not f i x e d as in 3D w i r e f r a m e m o d e l s in
c o m p u t e r g r a p h i c s , b u t r a t h e r are allowed t o v a r y a s
n e e d e d . As m o s t o b j e c t s in t h e sidewalk w o r l d are s o m e w h a t r e c t i l i n e a r , m a n y are r e p r e s e n t e d b y planes p a r a l lel to t h e X.Y or Z axis (a s i m i l a r r e p r e s e n t a t i o n was used
in [ 4 ] . ) For e x a m p l e , a p h o n e pole c a n be f a i r l y w e l l
a p p r o x i m a t e d by a r e c t a n g l e f a c i n g t h e o b s e r v e r . The
s i d e w a l k p r o p e r is a p p r o x i m a t e d by a r e c t a n g u l a r slab
who's p o s i t i o n and w i d t h i s u p d a t e d e v e r y f r a m e w i t h new
data.
The edge f o l l o w e r t e n d s to be c o n s e r v a t i v e , as it
knows t h a t pass 3 w i l l c o n n e c t b r o k e n lines. This is possible as pass 3 has access to m o r e g l o b a l i n f o r m a t i o n
a b o u t t h e o b j e c t s w i t h i n t h e scene, and t h u s m a y have
r e a s o n t o believe t h a t t h r e e r o u g h l y c o l l i n e a r l i n e segm e n t s m a y in f a c t be t h e edge of one o b j e c t . Pass 1 does
n o t have e n o u g h i n f o r m a t i o n t o d e c i d e i f a g a p b e t w e e n
t w o line s e g m e n t s is due to noise b r e a k i n g up a single
edge, or is r e a l l y an o c c l u d i n g o b j e c t or a g a p b e t w e e n
two objects.
To achieve h i g h p r o c e s s i n g speeds, some of o u r
representation is procedural r a t h e r than semantic
But.
as we have b u i l t up t h e n u m b e r of o b j e c t s t h a t we n a n
dle, a n u m b e r of c o m m o n s u b r o u t i n e s have e m e r g e d ,
allowing new o b j e c t s to be a d d e d a n d r e p r e s e n t e d f a i r l y
easily. H i g h p r o c e s s i n g speeds verses s e p a r a t i o n of c o n t r o l s t r u c t u r e s and k n o w l e d g e are n o t n e c e s s a r i l y i n c o m p a t i b l e , b u t t o o b t a i n b o t h one m u s t have i n t e r v e n i n g
s o f t w a r e s t e p t h a t t r a n s f o r m s h i g h level a b s t r a c t i o n s
i n t o a f o r m c o m b i n a b l e w i t h c o n t r o l s t r u c t u r e s . I t also
helps to have a very flexible c o n t r o l s t r u c t u r e . Our syst e m p u t s b o t h edge d a t a and o b j e c t b u i l d i n g p r o c e d u r e s
i n t o associative d a t a - b a s e s , t h u s a l l o w i n g t h e flow of c o n t r o l to be determined by the data. In retrospect, most
o f t h e o b j e c t h a n d l i n g p r o c e d u r e s c o u l d have been g e n erated by machine f r o m static descriptors rather than
by h a n d , a n d we m a y go to s u c h a s y s t e m in t h e f u t u r e .
I n o u r s y s t e m , noise i n a single p i x e l m a n y t i m e s
c a n l e a d t o t h e b r e a k - u p o f a p o t e n t i a l edge l i n e i n t o t w o
pieces. The d e f e c t s i n t h i s edge s e g m e n t a t i o n c a n b e
m o d e l e d as h i g h e r l e v e l noise. T h a t is. as b r o k e n edges,
m i s s i n g edges, a n d m i s o r i e n t e d edges. S e m a n t i c r u l e s
a b o u t l i n e s e g m e n t s c a n t a k e these d e f e c t s i n t o a c c o u n t ,
a n d s o m e t i m e s even m a k e use o f c e r t a i n p r o p e r t i e s o f
t h e " n o i s e " . For e x a m p l e , t h e b r e a k u p of a edge line
c o r r e s p o n d i n g t o t h e edge o f a n o b j e c t i n t h e scene m a y
be c a u s e d by a s u r f a c e d i s c o l o r a t i o n n e a r t h e o b j e c t
edge. I f t h i s i s t h e case, t h e n t h e " n o i s e " w i l l b e s e r i a l l y
c o r r e l a t e d f r o m f r a m e t o f r a m e . Thus t h e b r o k e n edge
w i l l b e b r o k e n i n t h e s a m e way i n s e v e r a l f r a m e s , a n d t h e
r e l a t i v e l o c a t i o n of t h e b r e a k c a n be ( a n d is) used as a
f e a t u r e o f t h a t edge ( w h i c h c a n h e l p t o r e - i d e n t i f y i t i n
successive f r a m e s . )
DC
Representation of Visual Knowledge
With t h e k n o w l e d g e of how to r e c o g n i z e and
represent objects handled by the object representation,
the r e m a i n i n g visual knowledge of i n t e r e s t is t h a t t h a t
t e l l s y o u where an o b j e c t is ( i t ' s d i s t a n c e and d i r e c t i o n . )
A n u m b e r of h u e r i s t i c s of v a r y i n g d e g r e e s of g e n e r a l i t y
706
s i m p l i f y t h e s e m a n t i c analysis t a s k .
e x i s t t o d o t h i s j o b . w i t h v a r y i n g d e g r e e s o f a c c u r a c y and
c o n s t r a i n t s . These a r e :
1.
If t h e o b j e c t is k n o w to be r e s t i n g on (or v e r y n e a r )
t h e g r o u n d p l a n e , t h e n Y is k n o w n to be - U s e r H i g t h ,
a n d X a n d Z c a n be o b t a i n e d by b a c k p r o j e c t i o n .
2.
If t h e o b j e c t has a k n o w n d i s t a n c e f r o m t h e edge of
t h e s i d e w a l k ( f o r e x a m p l e , p h o n e poles a r e u s u a l l y
one f o o t i n ) , a n d all one has is a piece of an edge of
t h e o b j e c t ( u s u a l l y n o t t h e g r o u n d plane i n t e r s e c t i o n ) , t h e n one c a n o b t a i n t h e o b j e c t s (X.Y.Z) locat i o n a s follows: t a k e t h e i m a g e p l a n e ( x . y ) l o c a t i o n
of t h e edge p i e c e , p r o j e c t it as a line t h r o u g h t h e
o r i g i n ( t h e f o c a l p o i n t o f t h e c a m e r a ) i n t o (X.Y.Z)
s p a c e , a n d i n t e r s e c t t h i s line w i t h t h e plane w h i c h i s
t h e c o n s t a n t d i s t a n c e i n f r o m t h e sidewalk. This
i n t e r s e c t i o n X and Z w i l l be t h e o b j e c t ' s l o c a t i o n on
t h e g r o u n d p l a n e . (The e q u a t i o n o f t h e plane p a r a l l e l t o t h e s i d e w a l k i s o b t a i n a b l e because t h e equat i o n of t h e sidewalk edge is a s s u m e d to be k n o w n . )
Even i f t h e c o n s t a n t d i s t a n c e i n f r o m t h e sidewalk
edge is i n c o r r e c t l y g u e s s e d , w h i c h can lead to dist a n c e e r r o r on t h e o r d e r of 50% or m o r e , at least
s o m e d i s t a n c e i n f o r m a t i o n has b e e n p r o v i d e d , and
one c a n m a k e s i m p l e d e c i s i o n s l i k e "will 1 r u n i n t o i t
i n t w o seconds o r t w e n t y s e c o n d s " .
In f u t u r e
frames, m o t i o n stereo can t i g h t e n up this distance
estimate (and c o r r e c t the constant distance t e r m ) .
By the t i m e an object initially sighted twenty or
m o r e f e e t away c o m e s t o w i t h i n f i v e feet o f t h e
p e d e s t r i a n t h e l o c a t i o n o f t h e o b j e c t w i l l have
a p p e a r e d i n t h i r t y f r a m e s o f solid d a t a , h o p e f u l l y
p i n p o i n t i n g i t ' s l o c a t i o n t o w i t h i n a foot.
3.
Once t h e e f f e c t s of c a m e r a t i l t a n d pan have been
s u b t r a c t e d , t h e d i f f e r e n c e s in l o c a t i o n s of a f e a t u r e
i n successive f r a m e s c a n b e used t o d e t e r m i n e i t s
r a n g e and d i s t a n c e b y w o r k i n g b a c k w a r d s f r o m t h e
p r o j e c t i o n and c a m e r a t r a n s f o r m a t i o n equations.
"We e m p l o y m o t i o n s t e r e o as a s e c o n d a r y d i s t a n c e
cue t h a t is used to c h e c k up on our p r i m a r y cues 1
a n d 2 above.
4.
T h e r e a r e s o m e d i s t a n c e h e u r i s t i c s t h a t are only of
use for d e t e r m i n i n g t h e e q u a t i o n of t h e s i d e w a l k ' s
b o r d e r s . These i n c l u d e m a k i n g use o f t h e k n o w n
constant width of the sidewalk.
5.
Most o b j e c t s i n t h e s i d e w a l k w o r l d r e s t o n t h e
g r o u n d plane ( t h o u g h we do not assume t h a t t h e i r
p o i n t of c o n t a c t is visible.)
2.
Most o b j e c t s c a n be r o u g h l y a p p r o x i m a t e d by planes
p a r a l l e l to t h e x, y. or z axis (as in [ 4 ] . )
3.
L o c a t i o n a c c u r a c y n e e d o n l y be e n o u g h to avoid
o b j e c t s most o f t h e t i m e . For e x a m p l e , d i s t a n c e s t o
objects need only be c o m p u t e d with an a c c u r a c y of
±20% when o b j e c t s a r e c l o s e r t h a n 8 f e e t , a n d ±40%
w h e n o b j e c t s a r e f u r t h e r away.
4.
The c a m e r a t r a n s f o r m a t i o n w i l l b e c o r r e c t l y supplied most of the t i m e (by hardware) to within 1%
a n g u l a r a c c u r a c y a n d 10% t r a n s l a t i o n a c c u r a c y .
In o r d e r to speed t h e i d e n t i f i c a t i o n of edges in a new
f r a m e , p r e d i c t i o n s o f edge l o c a t i o n s f r o m p r e v i o u s
f r a m e s a r e u s e d . M u c h of t h e speed of t h e s e m a n t i c pass
is due to t h e e s s e n t i a l l y h a r d w a r e s o l u t i o n of t h e successive f r a m e r e g i s t r a t i o n p r o b l e m .
Most r e - o c c u r r i n g
edges c a n have t h e i r l o c a t i o n i n successive f r a m e s d e t e r m i n e e d to w i t h a few p i x e l s be using t h e c a m e r a
t r a n s f o r m a t i o n . The o v e r a l l e f f e c t is a s o r t of " b o o t s t r a p p i n g " r e - i d e n t i f i c a t i o n o f scene f e a t u r e s , s i m i l a r t o
t h a t described in [ l l ] . (It may be possible t h a t in the
f u t u r e w e c a n dispense w i t h t h e s p e c i a l c a m e r a m o t i o n
t r a c k i n g hardware and recover this information increm e n t a l l y f r o m o p t i c a l flow.)
I n m o r e d e t a i l , t h e s e m a n t i c phase i s b r o k e n u p i n t o
s e v e r a l s u b p a r t s . These a r e :
F i n i a l l y , t h e l o c a t i o n of an i m a g e f e a t u r e r e l a t i v e to
t h e c u r r e n t , i n t e r p r e t a t i o n of scene can be used to
o b t a i n a p r o b a b l e d i s t a n c e e s t i m a t e . For e x a m p l e ,
edges n e a r t h e v a n i s h i n g p o i n t of t h e sidewalk are
p r o b a b l y ( b u t n o t n e c e s s a r i l y ) f a r away. Edges way
off to one side and in t h e sky m o s t l i k e l y belong to
u p p e r s t o r i e s o f b u i l d i n g s o r t h e b a c k g r o u n d , and
m a y be safely i g n o r e d . (One misses o v e r h a n g s t h i s
way, b u t o v e r h a n g s i n g e n e r a l are v e r y h a r d t o
r e c o g n i z e , . a s m a n y a r e o f v e r y low c o n t r a s t t o b e g i n
with.)
X
These a r e :
1.
Semantic Analysis
The s e m a n t i c phase e n d e a v o r s to b u i l d a t h r e e
dimensional model of the outside world that it is moving
through,
such
t h a t edges
in
the
image
frames
c o r r e s p o n d t o edges o f o b j e c t s i n t h r e e d i m e n s i o n a l
m o d e l . The v a r i o u s d i s t a n c e h u e r i s t i c s l i s t e d above a r e
e m p l o y e d b o t h to i n i t i a l l y place o b j e c t s as well as to v e r ify t h e i r l o c a t i o n / i d e n t i t y over several f r a m e s .
(An
o b j e c t whose d i s t a n c e v a r i e s w i l d l y f r o m f r a m e t o f r a m e
may be mis-identified.) Further constraints exist that
1.
Edge
lines f r o m t h e p r e v i o u s f r a m e a r e f i r s t
t r a n s f o r m e d b y t h e c a m e r a t r a n s f o r m a t i o n and
t h e n m a t c h e d a g a i n s t edge lines i n t h e c u r r e n t
f r a m e , c u r r e n t lines t h a t a p p e a r t o b e d i r e c t desc e n d e n t s o f p r e v i o u s l i n e s are r e m o v e d f r o m t h e
c u r r e n t d a t a base a s " e x p l a i n e d " . The o r d e r i n
w h i c h o l d o b j e c t ' s edges a r e s e a r c h e d f o r i s d e t e r m i n e d b y t h e i r r e l a t i v e s e m a n t i c i m p o r t a n c e and
d a t a q u a l i t y . Thus t h e edges of t h e sidewalk are
u s u a l l y s e a r c h e d f o r f i r s t , followed b y t h e o t h e r
o b j e c t s r o u g h l y r a n k e d b y t h e i r n u m b e r o f (visible)
edges.
2.
The m a t c h e s m a d e in 1 i n d u c e new i n f o r m a t i o n t h a t
c a n b e used t o c o n s t r u c t a n u p d a t e d t h r e e d i m e n s i o n a l m o d e l o f t h e o b j e c t s t h a t t h e edges belong to.
These m o d e l s c a n t h e n m a k e c l a i m s f o r gaps i n
t h e i r edge o u t l i n e s .
3.
The c l a i m s m a d e in 2, along
c l a i m s f o r new o b j e c t s are
r e m a i n i n g d a t a base of edge
w i l l be c l a i m e d as b a c k g r o u n d
4.
New o b j e c t edges o b t a i n e d in 3 allow f o r f u r t h e r
updating of the three dimensional models, which at
t h i s p o i n t c a n b e u s e d b y t h e b l i n d n a v i g a t i o n system.
P r e d i c t i o n s and s e a r c h s c h e d u l i n g f o r t h e
n e x t f r a m e a r e m a d e a t t h i s p o i n t . Objects who's
e x i s t e n c e i s n o l o n g e r s u p p o r t e d b y t h e edge l i n e
e v i d e n c e a r e d e l e t e d i n favor o f m o r e c o n s i s t e n t
interpretations.
with various generic
m a t c h e d against t h e
lines. Residual lines
noise.
Thus a t any one p o i n t i n t i m e t h e w o r l d m o d e l d a t a
base of t h e s y s t e m c o n t a i n s m o d e l s of s e v e r a l o b j e c t s
( p h o n e poles, bushes, a u t o m o b i l e s , e t c . ) t h a t a r e m o v i n g
by. as w e l l as a m o d e l of t h e sidewalk p r o p e r .
707
XI
E x p e r i m e n t a l Results
Figure 1 is a sample image taken from an Bmm
movie of a sidewalk. One of our test sequences consists
of 30 digitized images from this movie. The forward
motion between each frame was one and a half feet. On
our half speed XL68.000, passes 1 and 2 can process this
movie at the rate of one second per frame. The semantic processing of pass three takes an additional half
second per frame. When applied to this movie, the system correctly discovered and tracked the sidewalk
edges, as well as edges of several objects off to the right
of the sidewalk. No objects were found to be blocking
the sidewalk. Figure 2 displays a digitized image from
the middle of this sequence with the wire-frame model of
pass 3 superimposed. (A similar fit is made for each
frame of the movie.) A computer animated reconstruction of the outside environment given the world model
produced by pass 3 is seen in figure 3. (The detail on the
parked car is simulated.) We expect to be running field
trials of the entire system in a portable cart trailing
behind a blind subject shortly.
XII
I n c o r p o r a t i o n i n t o a B l i n d Aid
The overall functioning of this system as a blind aid
is part of the lineage of a large number of previous tactile blind aids devices designed over the last twenty
years [12][ 13][l4). The computer vision component and
the blind interface component have been separated out
from each other via the following reasoning:
1. Assuming a perfect computer vision system that
knows where every object of interest is located, how
could one best communicate this information to a
blind pedestrian? What sort of user interfaces and
interactions will allow the user to make rapid, accurate use of the information provided?
2. How does one build a portable (wearable!) computer
vision system that will locate at least the majority of
the objects of interest?
Our solution to 1 has been presented in the previous
portions of this paper. Our solution to 2 is to use two
output channels - stereo synthetic speech for cognitive
(high level) information, and a linear array of 18 tactile
elements as a pointing device. The stereo speech unit is
a combination of a normal speech synthesis system with
a audio processing unit that can "throw" the computer's
speech, allowing it to appear to come from a particular
direction and distance. (For example, a phone pole
might seem to announce "phone pole".) The tactile
array is a skin tapping device worn as a head-band, each
element corresponds to a particular angular direction,
and the frequency of taps of an element corresponds
inversely to the distance of the feature being pointed to.
Currently we intention to have the speech unit make
major announcements (the blind don't want it babbling
all the time, they listen to sound shadows and street
sounds.) The tactile output will be used for communicating more mundane information, such as "you're veering
off to the left of the sidewalk, veer a bit to the right", by
"flashing" the edge of the sidewalk on the appropriate
side of the tactile display, should the user veer toward it
In any case, one of the main reasons for putting the
whole system on a micro processor, rather than simulating it on a mainframe, was to have the capability of
expermenting in the real world with various blind interface systems. Also, despite years of experience in testing blind aids, it is very hard to tell how the blind will
react to a particular device without letting them make
extensive use of it under real world conditions
XIII
Perspectives on F u t u r e Directions
Within the hardware and timing constraints
imposed, we feel that the current system performs well.
and cannot be much improved upon. However, for use in
a robust blind aid. the system has several limitations
which must be overcome. These include the low sensitivity to low contrast edges in shadowed or complex
scenes, and the low resolution (""1 degree/pixel ) More
important is the limitation to processing of edge data
only, at the exclusion of surface data. It would be nice to
have more information about the sidewalk than just
where it's edges are. such as how flat it is, are there any
broken sidewalk slabs or holes 9 Also, the current system
must treat any high contrast edge on the sidewalk as a
potential object edge, even though most are only flat
shadows or stains.
Various surface processing techniques proposed in
the literature can solve many of these limitations Texture gradients [15] should provide a fairly robust broad
classification of the scene into flat and upright surfaces
Optical flow can provide approximate distance estimates.
Luminance gradients could indicate surface curvature,
which could be used in identifying (and segmenting)
phone poles, walls, automobiles, etc. [10). Finally,
stereo gradients can not only determine general distance estimates, but for the special case of the almost
flat sidewalk, they can be tunned to spot vertical devia-
t i o n s as s m a l l as half an i n c h ( s u c h as un-even sidewalk
t i l e s t h a t one m i g h t t r i p u p o n ) . More i m p o r t a n t l y , s t e r e o
c a n d e t e r m i n e t h a t a d a r k p a t c h is flat, a n d can be
safely i g n o r e d . This w o u l d b e s i m i l a r t o t h e s y s t e m
d e s c r i b e d i n [ 1 6 ] . However, f o r t h i s s p e c i a l i z e d s t e r e o
a l g o r i t h m to w o r k , it m u s t have a v e r y good e s t i m a t e as
to w h e r e t h e flat s i d e w a l k is in t h e f i r s t p l a c e . This is
where the other surface processing techniques enter the
loop. Such a s y s t e m c o u l d p r o v i d e v e r y r o b u s t p e r f o r m a n c e u n d e r even e x t r e m e c o n d i t i o n s ( s u c h a s wet (and
reflet l i v e ! ) sidewalks in t h e r a i n ) , b u t at t h e expense of
special purpose hardware.
[ 1 0 ] T e n e n b a u m . J. a n d B a r r o w , H. " R e c o v e r i n g I n t r i n s i c
Scene C h a r a c t e r i s t i c s f r o m I m a g e s . " i n Computer
Vision Systems, H a n s o n , A. and R i s e m a n , E. eds., 326 ( A c a d e m i c Press, New Y o r k . NY, 1978).
[ 1 1 ] H a n n a h , M. " B o o t s t r a p S t e r e o . "
Stanford.CA. A u g u s t . 1980. 38-40.
[2]
B r o o k s . R., G r e m e r . R. a n d B i n f o r d . T. "The ACRONYM Model-Based Vision S y s t e m . " In Proc IJCAI-79
T o k y o . August. 1979. 105-113.
[ 1 4 ] Collins. C, S c a d d e n . L. and A l d e n . A. " M o b i l i t y s t u dies w i t h a t a c t i l e i m a g i n g d e v i c e . " Proc Conf. on
Systems
and
Devices for
the
Disabled.
Seattle,
(1977) 170-174.
[ 1 5 ] W i t k i n . A. "A S t a t i s t i c a l T e c h n i q u e f o r R e c o v e r i n g
Surface
Orientation
from
Texture
in
Natural
I m a g e r y . " In Proc First NCAI. Stanford.CA. A u g u s t .
1980. 1-3.
[ 1 6 ] G e n n e r y . D. "A S t e r e o Vision S y s t e m f o r an A u t o n o m o u s V e h i c l e . " In Proc IJCAI-77, C a m b r i d g e . MA.
August, 1977, 576-582.
[ 3 ] S a p h i r o . L.. M o r i a r t y . J.. M u l g a o n k a r . P. and H a r a l i c k . R.
" S t i c k s . P l a t e s , and Blobs: A ThreeDimensional
Object
Representation
for
Scene
A n a l y s i s . " In Proc First NCAI Stanford.CA. August,
1980. 26-30.
[4]
Williams. T. " D e p t h f r o m C a m e r a Motion in a Real
World S c e n e . " IEEE Trans
Pattern Analysis and
Machine Intelligence
v o l . PAMI-2, no. 6 (1980) 5 1 1 516.
[5]
Tsuji, S.. Osada. M and Y a c h i d a . M. " T r a c k i n g and
S e g m e n t a t i o n of Moving Objects in D y n a m i c Line
I m a g e s . " IEEE Trans
Pattern Analysis and Machine
Intelligence
vol. PAM1-2. no. 6 (1980) 516-522.
[6]
Roach. J. and A g g a r w a l , J. " D e t e r m i n i n g t h e T h r e e D i m e n s i o n a l M o t i o n a n d Model of Objects f r o m a
Sequence of I m a g e s " . U n i v e r s i t y of Texas at A u s t i n
L a b o r a t o r y f o r I m a g e and Signal Analysis r e s e a r c h
r e p o r t TR-B0-2.
[7]
McKee, J. and A g g a r w a l , J. " F i n d i n g t h e edges of t h e
surfaces of three-dimensional curved objects by
c o m p u t e r . " Pattern Recognition
vol. 7 (1975) 2552.
[B]
Kavasznay. L., and Joseph. H. " I m a g e p r o c e s s i n g . "
P r o c IRE no. 43 (1955) 56-57.
[9]
P r a g e r . J . " E x t r a c t i n g a n d Labeling B o u n d a r y Segm e n t s in N a t u r a l Scenes." IEEE Trans
Pattern
Analysis and Machine Intelligence
vol. PAM1-2. no.
1 (1980) 16-27.
NCAI.
[ 1 3 ] Collins, C. a n d Madey. J. " T a c t i l e sensory r e p l a c e ment."
Proc.
San
Diego
Biomedical
Symposium
Vol. 1 3 ( 1 9 7 4 ) 15-28.
REFERENCES
P e r k i n s . W. A. "A m o d e l based vision s y s t e m f o r
i n d u s t r i a l p a r t s . " IEEE Trans Comput
vol. C-27
(1978) 126-143.
First
[ 1 2 ] Collins, C.
" T a c t i l e t e l e v i s i o n : M e c h a n i c a l and
E l e c t r i c a l I m a g e P r o j e c t i o n . " IEEE Trans
on ManMachine System
no. 11 (1970) 6 5 - 7 1 .
C u r r e n t l y we a r e in t h e i n i t i a l stages of d e s i g n i n g a
s u r f a c e p r o c e s s i n g o r i e n t e d v e r s i o n of o u r s y s t e m syst e m along t h e lines d e s c r i b e d above, w h i c h is to be
i m p l e m e n t e d i n VLSI. This s y s t e m will b e c h a r a c t e r i z e
b y h i g h e r r e s o l u t i o n ( w i t h s e p a r a t e foveal a n d p e r i p h e r a l
r e s o l u t i o n ) , h i g h e r f r a m e r a t e s ( a p p r o a c h i n g 30 f r a m e s a
s e c o n d ) , s t e r e o p r o c e s s i n g , a n d extensive use of s u r f a c e
p r o c e s s i n g t e c h n i q u e s . I t w i l l d i f f e r f r o m o t h e r vision
s y s t e m s i n t h a t i t w i l l b e o p t i m i z e d f o r t h e sidewalk
e n v i r o n m e n t . For e x a m p l e , t h e s t e r e o s e c t i o n will n o t
have t o deal w i t h t h e s t e r e o f r a m e r e g i s t r a t i o n p r o b l e m
i n i t s g e n e r a l f o r m , b u t f o r the m u c h s i m p h e r case o f
e x t r a c t i n g t h e ( m o s t l y f l a t ) g r o u n d plane. Evidence i n d i cates t h a t the vision system of many animals (including
m a n ' s ) has a b u i l t in s p e c i a l case s o l u t i o n f o r e x t r a c t i n g
the ground plane, which is similar to our proposed technique.
[l]
Proc.
709