Tang, Dershuen A. (1982). "Verbal and Nonverbal Aspects of Machine Perception and Knowledge Representation in Artificial Intelligence."

VERBAL AND NONVERBAL ASPECTS OF MACHINE PERCEPTION
AND KNOWLEDGE REPRESENTATION IN ARTIFICIAL INTELLIGENCE

by

DERSHUEN ALLEN TANG

A thesis submitted to the Graduate Faculty of
North Carolina State University
in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy

BIOMATHEMATICS PROGRAM
DEPARTMENT OF STATISTICS

RALEIGH

1981
APPROVED BY:
Co-chairman of advisory committee
BIOGRAPHY
Dershuen Allen Tang was born in China in 1946. In 1967, he graduated
from National Taiwan University, Taipei, Taiwan, Republic of China. He
entered the Graduate School of Purdue University at West Lafayette,
Indiana in 1969 and obtained an MS degree in Biology in 1971. In the
summer of 1971 he enrolled in the Biomathematics Program at North
Carolina State University at Raleigh. After many courses and part-time
jobs, he became a Ph.D. candidate in 1978. He has been a systems
engineer at Research Triangle Institute since 1980.
ACKNOWLEDGEMENTS
I would like to express my sincere appreciation to my mentor in the
Biomathematics Program, Harvey J. Gold, for his invaluable counseling
and support, for allowing me the opportunity to work on this topic and
for many inspirational discussions. To my co-chairman in Computer
Science, Alan L. Tharp, for suggesting knowledge representation as an
area to study and for his advice and guidance. To other members of my
committee, Wesley E. Snyder and H. R. van der Vaart, for their helpful
suggestions and advice. Wesley E. Snyder, whose course in Pattern
Recognition introduced me to the wonderful world of machine perception,
deserves special thanks. I also wish to thank Chris M. Davis and James
G. Haidt of Research Triangle Institute for their support and
encouragement. To my wife and daughter, whose love and patience make
all the difference, I dedicate this thesis.
CONTENTS

Chapter

I.   INTRODUCTION

II.  SEMANTIC NETWORK

     2.1  Introduction
     2.2  Criticisms of semantic networks
          2.2.1  The corresponding net manipulators are often not
                 explicitly defined
          2.2.2  Entities in many SNs are represented in a superficially
                 identical manner
                 2.2.2.1  Primitive links and nonprimitive links in many
                          SNs are represented in a superficially
                          identical manner
                 2.2.2.2  Primitives on different levels are represented
                          in a superficially identical manner
                 2.2.2.3  Attribute-specifying links at concept nodes and
                          at instance nodes are represented in a
                          superficially identical manner
                 2.2.2.4  Attribute-specifying links at the instance node
                          and attribute-describing links at the concept
                          node are represented identically
                 2.2.2.5  Attribute-specifying links and arbitrary
                          relation links at an instance node are
                          represented in a superficially identical manner
          2.2.3  A small set of uniformly applied case links
          2.2.4  No structural body for concept nodes
     2.3  The Structured-Inheritance Network
          2.3.1  Concepts in SI-Net
          2.3.2  Role description nodes
          2.3.3  Structural conditions
          2.3.4  Instances of a concept in an SI-Net
          2.3.5  Subconcepts and brother concepts in SI-Net
          2.3.6  The representational paradigm of SI-Net
          2.3.7  SI-Net and domain-specific knowledge representation

III. CHARACTERIZATION OF THE DICHOTOMY OF EXTERNAL WORLD KNOWLEDGE
     REPRESENTATIONS

     3.1  A representation is a representor/representee pair
     3.2  Nonverbal representations are modality-specific
     3.3  "Similarity" constraints on nonverbal external representations
     3.4  Verbal representations are interpreted
     3.5  Verbal representations are lumped while nonverbal
          representations are graduated
     3.6  Things denoted in verbal representations may not exist
          nonverbally
          3.6.1  Things referred to are not currently existing in front
                 of us
          3.6.2  Simple verbal labels may represent complex experiences
          3.6.3  Generalized concepts and abstract relations can only be
                 represented in verbal terms
     3.7  Simple nonverbal experiences form the basis for verbal
          descriptions
     3.8  Verbal representations have syntax for communication

IV.  FORMALIZATION OF THE VERBAL VERSUS NONVERBAL DICHOTOMY OF EXTERNAL
     REPRESENTATIONS

     4.1  External representations
     4.2  Five cases of observability and two types of perceptual tasks
     4.3  The components of an IPS
     4.4  Sensory-driven perceptual subsystem
          4.4.1  Transducers and feature extractors
          4.4.2  Transducers and feature extractors must have fixed
                 behaviors
          4.4.3  External scene must be stable to produce perceptual
                 contents as forced responses
     4.5  Figure-ground segregation and perceptual content
          4.5.1  Guidelines for the segregation of figure and ground
          4.5.2  Perceptual contents and memory traces
     4.6  Positional parameter as a viewing context
     4.7  Perceptual similarity
          4.7.1  Perceptual similarity and the positional viewing
                 parameter
     4.8  Recognition task and LTM
          4.8.1  CAR: Content Addressable Retrieval
          4.8.2  Nonverbal recognition and nonverbal external
                 representations
          4.8.3  LR: Link Retrieval
          4.8.4  Verbal recognition and verbal external representation

V.   AN EXAMPLE IN COMPUTER VISION

     5.1  IPS.SIMPLE
     5.2  Perceptual subsystem of IPS.SIMPLE
          5.2.1  Transducers, sensory positioners and external scenes
          5.2.2  Local feature extractors
          5.2.3  SRPC processors and figure processors
                 5.2.3.1  Segregation of WRPC into SRPCs
                 5.2.3.2  Moving to a new viewing position
                 5.2.3.3  Perceptual size and egocentric position
                          parameters of SRPC
                 5.2.3.4  Addressing information of processors in the
                          system
                 5.2.3.5  Nonegocentric positional relationship between
                          SRPCs
                 5.2.3.6  Updating stored SRPCs after moving to a new
                          viewing position
                 5.2.3.7  Figure processors
     5.3  Perceptual content and perceptual context of an external figure
          in IPS.SIMPLE
          5.3.1  Perceptual network
          5.3.2  The scope of a node in PNet
          5.3.3  The set of intra-scope links
          5.3.4  The set of inter-scope links
          5.3.5  Perceptual content of an external figure
          5.3.6  Perceptual context of an external figure
     5.4  Perceptual similarity
          5.4.1  Network isomorphism
          5.4.2  Relaxing the condition of network isomorphism
          5.4.3  Discrimination and recognition
     5.5  Long-term memory in IPS.SIMPLE
          5.5.1  Data storing process of LTM
          5.5.2  Retrieval processes of LTM
                 5.5.2.1  CAR in IPS.SIMPLE
                 5.5.2.2  LR in IPS.SIMPLE
     5.6  Verbal and nonverbal external representations

VI.  SUMMARY AND FUTURE EXTENSIONS

     6.1  Summary
     6.2  Future extensions
          6.2.1  Future extensions in machine perception
          6.2.2  Implementation of the result in advanced semiconductor
                 technology
          6.2.3  Explanation of phenomena in cognitive psychology

BIBLIOGRAPHY
Chapter I
INTRODUCTION
The issue of knowledge representation (KR) has become the focal point
of current research endeavors in artificial intelligence (AI). A recent
survey (Brachman & Smith, 1980) showed that most AI researchers seem to
accept as an undisputed first principle that developing a computational
model of intelligent thought processes will involve constructing a
representation scheme of some sort. Beyond this common consensus,
however, the same survey indicates that there is much confusion and
disagreement as to what is being represented or is representable in
what form, or even as to what constitutes a representation. This thesis
makes an attempt toward answering that question by analyzing and
formalizing verbal and nonverbal aspects of perceptual knowledge
representation. The following is a brief overview of its chapters.
One of the most popular forms proposed for KR has been the semantic
network (SN), in which "names of concepts" are encircled in nodes
representing pieces of knowledge, associated by labeled links
reflecting the knowledge of the relationship between these concepts.
However, many SN designers neglect to make an explicit distinction
between the concept-structuring (syntax) aspect of the representation
scheme and the domain-specific concepts themselves (the content part)
being represented by the aggregates of nodes and links in an SN. This
results in a representation scheme that seems to be too uniform and
structureless (beyond a spaghetti-like structure of linked nodes) to
express the intensional structure of a complex concept in an
unambiguous and extendable manner. This significant philosophical
weakness has been eloquently pointed out in a series of papers by
Brachman (1975, 1978, 1979). Chapter 2 reviews Brachman's points of
view and the solution that he proposed in his doctoral thesis
(Brachman, 1978).
Chapter 3 turns our attention to the task of perceptual knowledge
representation. With a moment of introspection, we would all come to
agreement that we seem to have an effortless experience in learning to
recognize the ordinary everyday objects surrounding us as compared to
the way we learn our languages. Thus, the form of the internal
representation of perceptual knowledge in a human or machine memory has
been the subject of perennial debate in the literature of artificial
intelligence and cognitive psychology. Should such internal
representations be in the form of "propositions", or must an additional
"nonverbal", "analogic", "isomorphic", "imagery" type of scheme also be
used? This question is difficult to answer partly because many of the
terms used critically in the debate, such as "propositional",
"symbolic", "nonverbal", "analogic", "isomorphic", etc., are not
defined comprehensively or consistently. Most often, readers must rely
on their intuitive understanding of these terms in order to grasp what
is being discussed. Therefore, it seems to us that it should be a
fruitful research effort to make an attempt to define some of these
terms in a way that will help us to delineate the forms of
representation that they are used to describe.
Chapter 4 presents a formalization of the concept of verbal versus
nonverbal external representation. This formalization is based on the
consideration that an external representation should involve three
critical components, namely, a representor, a representee, and an
information processing system (IPS) that is viewing the representation.
Intuitively, a nonverbal representor represents its representee by
being perceived as similar in its appearance to that of its
representee. On the other hand, a verbal representor can be learned to
represent anything, whether there is a similarity perceived between
them or not. Therefore, the formalization focuses on how an external
object can be perceived by a given IPS; what is its perceptual content;
in what way that content can be compared for similarity with others;
how these contents are stored in the memory of the IPS; and how they
are retrieved in a perceptual task.

In Chapter 5, the concepts formalized in Chapter 4 are further
examined, using an example in computer vision. Many implementational
and conceptual details relating to machine perception will be
illustrated and discussed. Chapter 6 gives suggestions as to the
possible directions of future research.
Chapter II
SEMANTIC NETWORK
2.1 INTRODUCTION
As computer scientists and artificial intelligence (AI) researchers
have strived to enable computers to understand natural language (NL)
inputs, they have come to realize the great importance of the world
knowledge inside the comprehender's - human's or machine's - mind or
memory (Winograd, 1972; Minsky, 1975; Goldstein and Papert, 1977;
Bobrow and Winograd, 1977a; Schank, 1973, 1975; just to name a few of
the more notable papers on this subject). Therefore, tremendous efforts
have been made in studying how such knowledge could and should be
represented in a machine's memory. Among the well-known knowledge
representation systems are Quillian's classic semantic network
(Quillian, 1967, 1968, 1969), Schank's conceptual dependency theory and
script theory (Schank, 1973, 1975; Schank and Abelson, 1977; Lehnert,
1977), Minsky's frame theory (Minsky, 1975) and, more recently, the
Knowledge Representation Language (KRL) (Bobrow and Winograd, 1977a,
1977b). Earlier, the predicate calculus (Nilsson, 1971) was used for
knowledge representation but was gradually absorbed into later, more
general systems, such as the partitioned semantic network (Hendrix,
1975) and the extended semantic network (Schubert, 1976). There also
have been controversies on whether to represent knowledge procedurally
(Winograd, 1972, 1973) or declaratively. The advantages and
disadvantages of both representations have been discussed by Winograd
(1975), Hendrix (1977) and Anderson (1976).
One of the knowledge representation formalisms that has been studied
and used widely is the semantic network (SN). Semantic networks, which
are basically labeled directed graphs, were first formally proposed by
Quillian (1967, 1968, 1969) as an attempt to reach the goal of allowing
"representation (in memory) of anything that can be stated in natural
languages" (Quillian, 1969, p. 460). It has gone through many stages of
improvements and extensions, e.g., SN with predicate calculus
expressions for expressing logical quantifications (Shapiro, 1971,
1977; Schubert, 1976), SN with partitions (Hendrix, 1975), SN with
partitions for handling focus in discourse (Grosz, 1977), SN using case
grammar representations (Simmons, 1973).
2.2 CRITICISMS OF SEMANTIC NETWORKS
More recently, Woods (1975) and Brachman (1977, 1978a, 1978b) have made
significant and fundamental criticisms of SNs as knowledge
representation formalisms. Following is a brief summary of their
remarks.
2.2.1 The corresponding net manipulators are often not explicitly
defined
Understanding anything, whether it be a verbal message or a nonverbal
visual scene, is a task which consists of processes capable of
producing appropriate responses (internal or external) to an input.
Passive and static structures such as SNs cannot be the only elements
of an understanding system (Hendrix, 1977). There must be a
corresponding net manipulator which is capable of processing the
network in tasks related to understanding processes, such as accessing
the net to retrieve relevant information stored, or modifying parts of
the net as internal responses to the new information understood and
extracted from the given input. More importantly, the semantics of SN
representations themselves depend on the interpretation of the
primitive elements (see sections 2.2.2.1 and 2.2.2.2) by the net
manipulator with a set of rules (also frequently embodied in the
processing routines of the net manipulator) for combining them into
non-primitive terms (Brachman, 1978a, p. 30).

Authors of many SNs have often described the semantics of their
representations intuitively. As the network notations grew in
complexity so as to handle input sentences with more complicated
meaning structures, more and more of the assumptions about the
corresponding net manipulators were left to the reader's imagination.
The problem is compounded when, as in many SNs, entities (links and
nodes) with fundamentally different imports to the net manipulator are
represented in a misguidedly uniform manner (see the following
subsections) and offer no help in disambiguating the different
underlying net-manipulating operations required for each type of
entity.
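
To make the role of the net manipulator concrete, the following sketch
(rendered in Python purely as an editorial illustration; the data
layout and function name are ours, not those of any SN in the
literature) stores a net as NODE/LINK/NODE triples, anticipating the
TELEPHONE example of section 2.2.2.1, and retrieves stored information
from it. Note that to this manipulator the membership link ISA and the
attribute link COLOR are indistinguishable strings, which is precisely
the ambiguity criticized here:

    # A toy semantic network: a list of NODE/LINK/NODE triples.
    net = [("TELEPHONE", "COLOR", "BLACK"),   # attribute link
           ("TELEPHONE", "ISA", "T1"),        # membership link, same form
           ("TELEPHONE", "ISA", "T2")]

    def retrieve(net, source, link):
        # The only operation available: match the entire link label as a
        # keyword.  Nothing marks ISA as primitive or COLOR as
        # domain-specific; both are bare strings.
        return [t for (s, l, t) in net if s == source and l == link]

    print(retrieve(net, "TELEPHONE", "ISA"))   # ['T1', 'T2']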
2.2.2 Entities in many SNs are represented in a superficially identical
manner

In many SNs, no special markers exist to distinguish different types of
links (or nodes) that require fundamentally different processing by the
net manipulator. Some SNs use different shapes (squares or circles) to
enclose different types of nodes, or use different special markers such
as '*' to precede the label names of different types of links. But not
so in many other SNs, and for these SNs the corresponding net
manipulators presumably have to recognize the entire label as a keyword
in the system (perhaps checking it against a symbol name table or
lexicon) when it is necessary to perform different operations depending
on the type of the entity. Following are several specific and important
cases to illustrate the problem.
2.2.2.1 Primitive links and nonprimitive links in many SNs are
represented in a superficially identical manner
One prevalent and intuitive notion about a concept node is that it
denotes a class[1] of individual objects in the world, all sharing a
set of common properties or attributes. Thus, one often sees in many
SNs links such as the COLOR links in Figure 1 that are used to attach
various kinds of attributes to a concept node, as well as links such as
the ISA links in Figure 1 that are used to point to the instances or
members of that concept.

Although the COLOR attribute links and the ISA membership links share
the same superficial form (a link with a label), there are significant
and fundamental differences between them. First of all, attribute links
such as COLOR links are concept- and domain-specific; that is, the

[1] Most SNs also tacitly assume that a concept node denotes more than
just the extension of that concept. That is, not only should the
concept node link to all of its actual members, but it should also
somehow capture intensionally what it takes to be a (potential or
actual) member of that concept. See Woods (1975, p. 149) and Brachman
(1977, p. 138; 1978b, p. 84) for a detailed discussion on the
implications of this subtle and implicit shift from extensional class
nodes to intensional concept nodes.
COLOR attribute may or may not be a relevant attribute of a given
concept depending on what specifically that concept is. COLOR links are
domain-specific; their inclusion depends on what domain the SN is
proposed to consider. On the other hand, the membership ISA link is
relevant for all concepts in all SNs. All SNs proposed in the
literature thus far have this membership link in many disguises such as
"ISA", "MEMBER/OF", "INSTANCES/OF", etc. The universal importance of
membership links lies in the notion that they signal to the net
manipulator the basic operation (implicit in many SNs) of "property
inheritance" (Brachman, 1978b, p. 83, p. 90, p. 91) to be performed
between the concept node and its instance nodes. Thus, T1 (or T2) in
Figure 1, being an instance of the concept TELEPHONE, can be said to
have the link <T1 (or T2) COLOR BLACK>[2] inherited from its concept
node. While attribute links like COLOR are inheritable links, special
links like ISA should not be inheritable (Brachman, 1978, p. 90). These
special links are primitive links which themselves are not represented
within the same SN by other network entities. Their meaning comes
directly from their own special-purpose interpreting routines in the
corresponding net manipulator.
On one hand, links like COLOR are just as conceptual as the concept
node TELEPHONE, but they happen to be relational. There are situations
(see Figure 1) where they should be represented as nodes and be pointed
to from other nodes by appropriate links (Brachman, 1978, p. 88). The
SN designer should carefully delineate the primitives of his SN (using
special characters as prefixes, etc.) so that the reader can judge what
is being represented by what, especially when the corresponding net
manipulator is not explicitly defined.

[2] Here and in the following text, whenever it is more convenient to
do so, we use the common and equivalent lexical form:
    <source-node-label link-label target-node-label>
to represent the corresponding NODE/LINK/NODE triple in an SN graph.
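
As a hedged illustration of the special-purpose interpreting routine
that a primitive link such as ISA demands, the following sketch (our
own, using the triple notation of the footnote; Figure 1 draws ISA from
the concept node to its instances) implements property inheritance for
the TELEPHONE example:

    net = [("TELEPHONE", "COLOR", "BLACK"),
           ("TELEPHONE", "ISA", "T1")]

    def attributes_of(net, node):
        # Attributes asserted directly at `node` ...
        attrs = {l: t for (s, l, t) in net if s == node and l != "ISA"}
        # ... plus attributes inherited from any concept that points at
        # `node` with the primitive ISA membership link.
        for concept in {s for (s, l, t) in net if l == "ISA" and t == node}:
            for (s, l, t) in net:
                if s == concept and l != "ISA":
                    attrs.setdefault(l, t)   # locally asserted values win
        return attrs

    print(attributes_of(net, "T1"))   # {'COLOR': 'BLACK'}, inherited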
Figure 1: A simple SN showing a prevalent way of using attribute and
membership links

2.2.2.2 Primitives on different levels are represented in a
superficially identical manner
Whether an entity in an SN is primitive or not depends on whether it
has its own special-purpose interpreting routines in the corresponding
net manipulator. Therefore, there is no reason to prevent one from
making attribute links (like the COLOR link in Figure 1) primitive
links in one's SN. One prominent example is the set of conceptual
primitives, such as INGEST and PROPEL, proposed by Schank. However,
these primitives are on a level different from those primitives such as
ISA links. The former are conceptual primitives that label particular
actions, while the latter are domain-independent concept-structuring
primitives that serve as markers to the net manipulator on how to build
or interpret a concept in terms of other entities in the same net,
regardless of what specific meaning the concept has. Their differences
are somewhat like those between content words and function words in
English. Again, one should delineate these primitives of different
levels in one's net, since they require fundamentally different
handling mechanisms in the net manipulator for their semantics.
2.2.2.3 Attribute-specifying links at concept nodes and at instance
nodes are represented in a superficially identical manner
In section 2.2.2.1, we note that <TELEPHONE COLOR BLACK> implies <T1
COLOR BLACK> through property inheritance. Many SNs would also attach
the same COLOR link to T1 to explicitly represent the fact that T1 is
black. However, the same attribute link means something different
depending on whether it attaches to a concept node or to an instance
node. Thus, <TELEPHONE COLOR BLACK> means that for all i such that i is
an instance of TELEPHONE, i is black. On the other hand, <T1 COLOR
BLACK> means simply that T1 is black.

To use the same attribute link at the concept node and at the instance
node would result in an unnecessary local ambiguity, which needs to be
resolved by carefully considering the context (in this case, by
examining whether a given node is attached to a concept/class node or
its instance).
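
The local ambiguity can be made concrete with a small sketch (again
ours, not drawn from any particular SN): to recover the correct
reading, the interpreter needs an extra marker - here a set of known
concept nodes - that the link notation itself does not supply:

    concept_nodes = {"TELEPHONE"}   # a marker the notation itself lacks

    def interpret(source, link, target):
        if source in concept_nodes:
            # universal reading: <TELEPHONE COLOR BLACK>
            return f"for every instance i of {source}: {link}(i) = {target}"
        # particular reading: <T1 COLOR BLACK>
        return f"{link}({source}) = {target}"

    print(interpret("TELEPHONE", "COLOR", "BLACK"))
    print(interpret("T1", "COLOR", "BLACK"))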
2.2.2.4 Attribute-specifying links at the instance node and
attribute-describing links at the concept node are represented
identically
The COLOR link in the example of the previous sections is an
attribute-specifying link (whether it is attached to a concept node or
an instance node). However, in many other situations, it is often more
desirable for an attribute link at a concept node to circumscribe a
range of legitimate values rather than to specify a single value (and
thus demand the same value of that attribute for all of its instances).
In many SNs, this attribute-describing link at the concept node and its
corresponding attribute-specifying link at its instance nodes are often
represented identically. For example, in Figure 2, the same HEIGHT link
is used in an attribute-describing sense (describing a range of values)
at the concept node HUMAN as well as in an attribute-specifying sense
at its instance node JOHN.[3] Note that the mechanism of property
inheritance here requires that the HEIGHT attribute link at the
instance node does not inherit the whole range of that attribute as
described at its concept node. Rather, the instance should specify a
single value within that range. Note also that when a concept has more
than one of these attribute-describing links, a subconcept (i.e.,
subclass) can be formed by selectively instantiating a number of these
attributes while leaving others unchanged. This serves to illustrate
that property inheritance is more than a simple copying mechanism, and
that ambiguous net notations will cause the corresponding net
manipulator to be very difficult to design.

[3] In statistical terms, one can say that HEIGHT at the concept node
specifies a population parameter, while the same HEIGHT link at the
instance node specifies an individual observation value. Most
statisticians would not use the same label to represent these two
conceptually different denotations.
Figure 2: An SN with attribute-describing links and attribute-specifying
links
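
A minimal sketch of the distinction (the numeric bounds are invented
for illustration, as Figure 2's values are not reproduced here): the
concept node describes a range, and the instance node must specify one
value within it, echoing footnote 3's population parameter versus
individual observation:

    # HEIGHT at the concept node HUMAN: a range of legitimate values.
    human_height_range = (1.0, 8.0)   # feet; bounds assumed, not from Fig. 2

    def specify(value, described_range):
        # Inheritance is not copying: the instance node must commit to a
        # single value drawn from the range described at the concept.
        low, high = described_range
        if not (low <= value <= high):
            raise ValueError(f"{value} outside described range {described_range}")
        return value

    john_height = specify(5.9, human_height_range)   # HEIGHT at JOHN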
2.2.2.5 Attribute-specifying links and arbitrary relation links at an
instance node are represented in a superficially identical manner
As pointed out by Woods (1975, pp. 53-54), in many SNs, attribute links
such as the HEIGHT links in Figure 2 that specify criterial properties
of an individual take the same superficial form as the HIT link in
Figure 2, which denotes incidental relationships involving that
individual. The former is a necessary property that the individual must
satisfy or possess in order to be an instance of its class concept,
while the latter is a relationship in which the individual may or may
not participate. As pointed out in a previous section, HIT, like COLOR,
is a domain-specific relational concept and as such it should be
represented as a node. Although both links help to describe an
individual, they are of different types and contribute differently to
the possible inferences the net manipulator makes from the net. They
also behave differently in the complex processes of property
inheritance. The net manipulator should be able to infer that JOHN,
being a member of HUMAN in Figure 2, must have a HEIGHT (although a
specific value may not be known at the time). It would not be able to
infer that JOHN must have HIT someone, since that is not necessarily
required of a member of HUMAN. The type of relationship that HIT
denotes is fundamentally different from that of an attribute such as
HEIGHT.

From the above examples, it is evident that there are different types
of net entities in an SN. Any SN notation should provide unambiguous
indications of these different types for the benefit of the underlying
mechanical net manipulator. "It is not sufficient to leave it to
intuition of the human reader, we must know how the machine will know
to treat these different types of links correctly" (Woods, 1975,
p. 54).
2.2.3 A small set of uniformly applied case links
One popular way to represent relational verb concepts like HIT in
Figure 2 as concept nodes instead of links is to use the case
structures developed by the linguist Fillmore. He argued that every
noun phrase in English can be related to the governing verb in a
sentence by a small number of cases such as AGENT, OBJECT, SOURCE,
INSTRUMENT, GOAL, etc. Thus, in Figure 3, the sentence "John hit Mary
with a hammer" is represented in SN notation with case structures. As
another example, Figure 4 is the sentence "A dog bit a postman." with
case structures. The case links in these two figures are ACTOR, OBJECT,
INSTRUMENT, ASSAILANT, VICTIM. Note the similarity between a case link
in a verb concept and an attribute link in a class concept or its
instance nodes discussed in previous sections. If a node were thought
of as a verb concept, its associated attribute/value pairs could easily
be case/filler pairs (Brachman, 1978a, p. 12).
Figure 3: "John hit Mary with a hammer." represented in an SN with case
structures

Figure 4: "A dog bit a postman." adapted from Hendrix (1975)

The intuitive attraction of a case structure is that perhaps a small
number of case structures can be defined in such a way that they are
universally applicable to any verb concept, which is then expressed in
terms of an appropriate selection of these cases. The meaning of an
input sentence can be analyzed by sorting out the noun phrases
according to what case slot of the governing verb concept each noun
phrase can or should fill. In addition, if there are case slots left
unfilled, the system may launch an effort to fill them by looking back
and checking to see if any previously entered information can be a
plausible filler.
Or this may keep the system on the alert in the future for any new
information that may be considered for those unfilled case slots.
(Actually, the same intuition is applied in the use of attribute
links.)

This sounds very attractive, but in practice it is very difficult to
explicitly define a small set of cases all at one place and expect them
to be applicable across the board to any verb concept that uses them.
Many SN designers who adopt the above simplistic approach do not leave
any flexibility so as to allow local concept-specific variations to be
made at the site of each verb concept. They usually use a single case
link to capture two distinct aspects of a case; namely, the range of
values that can fill a case slot, and the functional role that each
case serves in the verb concept, usually as suggested by the name of
the case, e.g., AGENT, INSTRUMENT (Brachman, 1978b, p. 93). This forces
us to believe that the possible range of values can be defined together
with the functional role of that case once and for all at the global
level; e.g., that every ACTOR is a HUMAN, regardless of whether ACTOR
is a case of the verb HIT or of the verb BITE.

Because of the similarity between case links and attribute links, the
above criticisms also apply to the way attribute links are used in many
SNs.
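
For concreteness, here is a case-frame rendering of the sentence of
Figure 3 (the dictionary layout and the helper are our own sketch, not
the figures' notation); it also illustrates the slot-filling intuition
of unfilled cases described above:

    # "John hit Mary with a hammer" as a case frame.
    hit1 = {"ISA": "HIT",
            "ACTOR": "JOHN",
            "OBJECT": "MARY",
            "INSTRUMENT": "HAMMER1"}   # HAMMER1: a hypothetical instance

    def unfilled_cases(event, expected):
        # Case slots the system might still try to fill from previously
        # entered information, or keep on the alert for.
        return [c for c in expected if c not in event]

    print(unfilled_cases(hit1, ["ACTOR", "OBJECT", "INSTRUMENT", "GOAL"]))
    # ['GOAL'] -- left open, awaiting a plausible filler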
2.2.4 No structural body for concept nodes
From the most recent sections, it is clear that in many SNs, a concept
node is described as a collection of attribute/case links (together
with other concepts which sit at the other end of these links). Many
SNs stop at this point. There are no further descriptive structures to
specify how these parts and properties go together in a Gestalt sense
for the concept in question. Just knowing that a verb concept KICK has
an animate ACTOR, JOHN, and an inanimate OBJECT, BALL, juxtaposed
together gives no clue to the actual and specific process which takes
place between them. What makes a concept LOVE as opposed to any other
verb concept with a similar case? (Brachman, 1978a, p. 29) In general,
the representation of a concept as a structureless list of
attribute/case links apparently independent of one another is like
representing a subroutine in a program by giving only a list of
arguments, without a procedure body that would specify how these
arguments interact with one another and with other variables in the
system so as to produce results. One needs concept-independent
notations that one can use with each concept to express these Gestalt
structures, so as to make a concept description complete.
2.3 THE STRUCTURED-INHERITANCE NETWORK
Brachman (1978b) proposed a "Structured-Inheritance Network" (SI-Net)
as his response to the deficiencies of previous vague SN notations.
Following is a brief sketch of the SI-Net. See Figure 5 for a graphical
illustration of his notations.
2.3.1 Concepts in SI-Net
A concept in SI-Net is operationally defined as a node with:

1. a collection of outgoing DATTRS links, each of which points to a
   role description node (see section 2.3.2). The DATTRS/role-
   description-node pair serves to describe one attribute/case of the
   current concept. Note that attribute links and case links are
   treated identically in an SI-Net.

2. a collection of outgoing STRUCTURE links which explicitly express
   the interactive relationships between these attribute/case roles
   (see section 2.3.3).
3. a collection of outgoing DMODS and DIFFS links pointing to role
   description nodes, and DINSTS links pointing to ROLE nodes. These
   links are used instead of DATTRS links when the current concept is
   a derived one, such as a subconcept derived from a superconcept, or
   a brother concept derived through analogy from another concept, by
   modifying the corresponding role description nodes of the
   superconcept or the other concept.

Figure 5: Concept nodes and instance nodes in SI-Net (adapted from
Figs. 4.5 and 4.10 in Brachman (1978b))
2.3.2 Role description nodes
A role description node at the arrow end of a DATTRS link describes a
part (as an attribute/case) of the internal portion of a concept. It
contains the following information:

1. the name and definition source of the function role played by that
   part. It consists of either of the following:

   a) a ROLENAME link pointing to a literal string, such as AGENT,
      LINTEL, etc., when the current role function is not defined
      elsewhere. In this case, the role function is defined by being
      embodied in the structure of the current concept.

   b) a ROLE link pointing to another role description node of a more
      general concept (not necessarily a superconcept of the current
      concept) where such a role function is defined.

2. a range of legitimate entities that are considered acceptable as
   fillers of that role. This is indicated by a "VALUE/RESTRICTION"
   (V/R) link pointing to another concept node which serves to
   circumscribe that range.

3. the number of entities that are expected or required to fill that
   role in an instance of the current concept.

4. a "MODALITY" indicating the criteriality of the role to the concept
   as a whole. Currently three values, NECESSARY, OPTIONAL or DERIVED,
   can be specified for this link. (It seems that these values are
   primitives in SI-Net, to be interpreted by special routines of the
   net manipulator. This is not made explicit by Brachman (1978b), but
   is evident as these values are not dressed up as concepts (see
   Figure 5).)

Note that DATTRS links and role description nodes provide a mechanism
to define (when necessary) an attribute/case right at the site of a
concept. Also, the two different aspects of role function and the range
of potential fillers are separately expressed. These flexibilities
allow the expression of any local variation of an attribute/case - an
inevitably needed mechanism, as discussed earlier in the criticisms of
previous SN notations.
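
The following data-structure sketch (our own Python paraphrase; the
field names follow Brachman's link names, while the value restrictions
for the ARCH of Figure 5 are assumed, since the figure's V/R targets
are not reproduced here) summarizes sections 2.3.1 and 2.3.2:

    from dataclasses import dataclass, field

    @dataclass
    class RoleDescription:
        rolename: str                 # ROLENAME, e.g. "LINTEL"
        value_restriction: str        # V/R: concept circumscribing fillers
        number: int = 1               # expected number of fillers
        modality: str = "NECESSARY"   # NECESSARY, OPTIONAL or DERIVED

    @dataclass
    class Concept:
        name: str
        dattrs: list = field(default_factory=list)      # role descriptions
        structure: list = field(default_factory=list)   # S/C nodes (2.3.3)

    # The ARCH of Figure 5, with assumed value restrictions.
    arch = Concept("ARCH", dattrs=[
        RoleDescription("LINTEL", value_restriction="BRICK"),
        RoleDescription("UPRIGHT", value_restriction="BRICK", number=2),
    ])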
2.3.3 Structural conditions
A structural condition (S/C) of a concept in SI-Net is a node pointed
to by a STRUCTURE link. It serves to explicitly capture how the various
role fillers of a concept go together to form the "Gestalt" of that
concept. It is expressed in terms of other concepts that exist
elsewhere in the net; it thereby captures the way we describe what we
know in terms of other concepts we are familiar with (Brachman, 1978b,
p. 67). For example, in Figure 5, the structural condition "SUPPORT"
expresses the fact that the UPRIGHTs in the concept of an ARCH should
support the LINTEL. It uses the concept SUPPORT, pointed to from the
S/C node through a PARAINDIVIDUATES link. In Figure 5, SUPPORT as a
concept has as its internal parts a set of role description nodes (at
the end of DATTRS links). Although not expressed in Figure 5 (and not
specifically mentioned by Brachman), SUPPORT should also have its own
S/C node that tells how to decide, in a general way, whether "things
being supported" are supported by a "supporter". Here we can think of
the SUPPORT concept as being a general procedure which is called by the
S/C node of the ARCH concept to check whether the LINTEL role of a
potential instance of an ARCH is indeed supported by the UPRIGHT roles
of that instance. In a sense, the S/C of an ARCH is a parametrized
instance (hence the name "PARAINDIVIDUATES") of the SUPPORT concept,
with the potential filler of the LINTEL role of an ARCH filling the
"thing/supported" role of the SUPPORT concept and with the potential
fillers of the UPRIGHT roles of an ARCH filling the "SUPPORTER" role of
the SUPPORT concept. The COREFVAL links (see Figure 5) establish the
correspondences (bindings) of the arguments of the SUPPORT concept with
the roles of the ARCH concept. Note that the role description nodes of
the ARCH concept only parametrize the arguments of the SUPPORT concept
rather than actually provide specific values for the SUPPORT concept to
execute upon. In other words, while the correspondences of arguments
are made at the concept level, there is no actual execution of the
SUPPORT concept at that level. Execution of the SUPPORT concept comes
only when an instantiation of the ARCH concept is being attempted (with
specific values for those arguments supplied by the instance).

Role description nodes and S/C nodes together provide a mechanism to
describe the internal structure of a concept.
2.3.4 Instances of a concept in an SI-Net
An individual instance of a concept in an SI-Net is a node with an
"INDIVIDUATES" link pointing to that concept node. In order to be a
member of that concept, the instance must have an internal structure
which corresponds to that of its concept in a manner required by that
concept. This correspondence is expressed by a set of ROLE nodes, each
of which is pointed to by a DINSTS link from the instance node. Each of
these ROLE nodes has:

1. a ROLE link pointing to the corresponding role description node in
   the concept, for which a specific filler is provided by this ROLE
   node in the instance.

2. a VAL link pointing to a node which denotes the specific value that
   is to be the filler of the role in this instance. See Figure 5.

In addition, the S/C of the concept would be activated at the time when
this instantiation is being attempted, to check whether the structural
interrelationships among the roles are satisfied.

Note that now a cable of links (an INDIVIDUATES link plus a set of ROLE
links) exists between a concept and one of its instances, instead of
just a single ISA link. This serves to make sure that the entire
internal structure of a concept is considered as a whole, either in
property inheritance or in checking out potential instances.
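
Building on the Concept and RoleDescription sketch of section 2.3.2,
the following hedged paraphrase of instantiation returns the cable of
links just described; the satisfies and structural_ok parameters stand
in for the V/R test and the S/C (e.g., SUPPORT) that is executed only
at instantiation time:

    def individuate(concept, fillers, satisfies, structural_ok):
        # fillers: {rolename: value}.  `satisfies` tests a value against
        # a role's V/R; `structural_ok` runs the concept's S/C on the
        # proposed fillers -- only now is the S/C actually executed.
        for role in concept.dattrs:
            if role.modality == "NECESSARY" and role.rolename not in fillers:
                raise ValueError(f"necessary role {role.rolename} unfilled")
            if role.rolename in fillers and not satisfies(
                    fillers[role.rolename], role.value_restriction):
                raise ValueError(f"filler violates V/R of {role.rolename}")
        if not structural_ok(fillers):
            raise ValueError("structural condition not satisfied")
        # the resulting cable of links: INDIVIDUATES plus ROLE/VAL pairs
        return {"INDIVIDUATES": concept.name,
                "DINSTS": [{"ROLE": r, "VAL": v} for r, v in fillers.items()]}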
2.3.5 Subconcepts and brother concepts in SI-Net

A concept can be a subconcept of an existing concept (called a
superconcept relative to its subconcepts, with a DSUPERC link pointing
to the superconcept from the subconcept) through a combination of the
following operations (Brachman, 1978b, pp. 77-78):
1. restriction - the role description nodes of the subconcept further
   restrict the range of the potential fillers specified by the
   corresponding role description nodes in its superconcept. This is
   expressed by a DMODS link from the subconcept node to a role
   description node which provides this further restriction. A ROLE
   link pointing from this role description node in the subconcept to
   the corresponding role description node in the superconcept
   indicates which role in the superconcept is thus restricted.

2. role differentiation - a role of the superconcept may have several
   subroles that are to be explicitly distinguished in the subconcept.
   A set of DIFFS links each points to a role description node
   describing a subrole in the subconcept. Again, ROLE links are used
   to associate each such role description node with the corresponding
   role description node in the superconcept, from which the
   differentiation originates.

3. particularization - rather than alter the description of the
   potential fillers of a role, the subconcept may selectively
   instantiate some (but not all) of the role descriptions of its
   superconcept (e.g., a REDHEAD is a person whose hair color is RED).
   This operation is the same as used in individual instances
   described in the preceding section, and thus the same DINSTS link
   is used to indicate the particular value for the role. The
   particularized role is considered to have been filled for all
   subconcepts further down from this subconcept. The value of the
   filler is itself inherited directly.
In the case of brother concepts, we consider concepts that can be
derived from one another by analogy (Brachman, 1978b, p. 79). Most
aspects of the two brother concepts are assumed to be similar, with
only the ones that are different being explicitly pointed out. SI-Net
uses a DBROTHERC link to point from a concept to its brother concept,
all of whose role descriptions and role instances are to be inherited
intact, except for those explicitly pointed to by ROLE links. Instead
of modifying or particularizing the roles of the brother node pointed
to with ROLE links, however, the new brother applies its DMODS, DIFFS
and DINSTS to the superconcept of its brother (Brachman, 1978b, p. 79).
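
As a rough sketch of the first of these operations, restriction, the
helper below (hypothetical, and modeling the ROLE link simply by
matching role names) derives a subconcept from the ARCH of section
2.3.2 by narrowing one role's value restriction:

    import copy

    def restrict(superconcept, new_name, rolename, narrower_vr):
        # DMODS: a role description in the subconcept narrows the V/R of
        # the corresponding role in the superconcept.
        sub = copy.deepcopy(superconcept)
        sub.name = new_name
        for role in sub.dattrs:
            if role.rolename == rolename:
                role.value_restriction = narrower_vr
        sub.dsuperc = superconcept.name   # DSUPERC back to the superconcept
        return sub

    red_arch = restrict(arch, "RED/ARCH", "LINTEL", "RED/BRICK")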
2.3.6 The representational paradigm of SI-Net
To summarize the underlying philosophy of Brachman's SI-Net, the
following are excerpts taken from his thesis (Brachman, 1978b, pp. 5-6,
pp. 60-61, pp. 282-284):
1. The representation of structured objects.

   Most of the everyday-life objects that we perceive have internal
   structures with parts that are also perceivable and describable. A
   small set of "universal" relationships such as the case links used
   in older SNs is not adequate to handle all of the possible
   relationships between the parts of an object. The approach that
   SI-Net takes is to define a concept as a set of function roles tied
   together with an explicit structuring interrelationship, built from
   other concepts existing in the same network. This allows the
   definition of the parts and their relationship at the site of each
   concept, with the emphasis on the structural condition which
   describes how the internal parts of a concept interact in a
   specific manner so as to form the "Gestalt" of that concept.
2. Deriving new concepts from old.

   Understanding involves the ability to perceive a new structure in
   terms of already known concepts. For example, new concepts can be
   defined by relating the similarity of their parts to those of other
   concepts, or by fitting together new types of parts in ways defined
   by known relationships. In any case, there are many types of
   definitional connections between concepts that must be explicitly
   accounted for by an adequate representation. In SI-Net, the
   primitive links expressing binding (DINSTS, ROLE, VAL) provide a
   precisely-defined individuation facility; since the individuation
   mechanism is defined in terms of primitive links only, it is always
   clear how to derive an individuation from a concept (even a new,
   unanticipated one). By the same token, the modification links
   (DMODS, DIFFS) provide an unambiguous mechanism for forming
   subconcepts, again well defined for all concepts, existing or
   potential.
3. The representation of idiosyncratic interpretations.

   Much of our knowledge is incomplete, vague, or stylized, and to
   perform intelligent activities a knowledge-based program must allow
   for flexibility in the definition of a concept. Each person has an
   idiosyncratic understanding of the concepts s/he knows; a
   representation must provide the means to express a concept in terms
   of the current set of concepts available in a particular data base
   (not in terms of universal "knowledge primitives"). By allowing the
   structural condition to define the relationships between the roles
   in terms of arbitrary combinations of other available concepts,
   SI-Net provides a mechanism for expressing idiosyncratic
   interpretations of concepts; therefore, there is no need to insist
   on "canonical" interpretations in terms of a set of predetermined
   knowledge primitives. (A prominent example of canonical approaches
   is Schank's conceptual dependency theory. See Woods (1975,
   pp. 45-48) for a criticism of canonical approaches.) This is
   especially important in the document-consulting domain, where the
   understanding of references and particular topics varies with
   experience and exposure to other parts of the literature.
4. Representation with conceptual well-formedness.

   The major difference between an SI-Net and older SNs is the
   constrained repertoire of link types and the particular
   relationships that they represent. The only thing that the user
   defines using this scheme is the set of his domain-specific nodes
   in the network. He cannot create new link types, nor can he
   construct arbitrary groupings of links at nodes. This is because
   the nodes are typed and each node type has a fixed syntax for the
   links that can emerge from it. This guarantees consistent
   interpretation by network processing routines, and gives us a
   criterion for conceptual well-formedness.
Thus, in summary, many SN designers, in Brachman's opinion, neglect to
make an explicit distinction between the concept-structuring (syntax)
aspect of the representation scheme and the domain-specific concepts
themselves (the content part) being represented by the aggregates of
nodes and links in an SN. This results in a representation scheme that
seems to be too uniform and structureless (beyond a spaghetti-like
structure of linked nodes) to express the intensional structure of a
complex concept in an unambiguous and extendable manner. To remedy this
weakness, he proposed a set of concept-structuring primitives in which
to embed any definition of domain-specific concepts, in order to ensure
the consistency and well-formedness of the representation. These
concept-structuring primitives can be called the syntax of SI-Net in
the sense that they are not used to directly represent or correspond to
the structure of the concepts themselves, but rather they specify the
structure of the representation language. Thus, we seem to come around
a full turn of a philosophical spiral as we recognize the need to
establish a new set of syntax rules for SN, while the original goal is
to represent the meaning of verbal messages independent of the syntax
of the surface language.
2.3.7 SI-Net and domain-specific knowledge representation

While Brachman claimed that his SI-Net representation paradigm provides
a solid foundation at the concept-structuring ("epistemological")
level, he acknowledged that with no domain concepts as "primitives",
the question arises as to how a program using this type of knowledge
structure might "get started" - how it would relate its conceptual
knowledge of the domain with low-level perceptual mechanisms (Brachman,
1978b, p. 282, footnote). He proposed in passing that procedures
directly accessible from the net as structural conditions would reflect
our direct correlation of certain predicates with perceptual mechanisms
(e.g., "red", "sweet", etc.) and thus account for the basic
domain-specific concepts ("knowledge primitives") needed to start the
otherwise circular definitional mechanism (Brachman, 1978b, p. 296). It
is the aspects of knowledge representation at this perceptual level
that we shall turn our attention to in the next chapter. It seems that
the external representation of human knowledge at this level can be
roughly classified into two types: verbal and nonverbal.
Nontechnically, nonverbal knowledge representations are those that can
be conveyed and understood regardless of what language the comprehender
speaks. Verbal knowledge representations, on the other hand, are those
that can be understood only when the comprehender speaks the same
language in which the representations are embedded. In the next
chapter, several contrasting differences between these two types of
knowledge representations are examined.
Chapter III
CHARACTERIZATION OF THE DICHOTOMY OF EXTERNAL WORLD KNOWLEDGE
REPRESENTATIONS
In the previous chapter, the semantic network (SN) as a formalism to
represent knowledge for understanding natural languages was discussed.
Criticisms of SNs as such were given, and Brachman's Structured-
Inheritance Net (SI-Net), as his way of answering these criticisms, was
also introduced. However, the SI-Net formalism deals with the problems
at the concept-structuring level; there is still no clear indication in
the formalism of how specific knowledge of the external world should be
represented at the perceptual level. It is through these perceptual
processes that man or machine comes into close contact with the
detailed phenomena of the external world. Without direct linkages to
these perceptual processes, knowledge about real objects in the
external world expressed in any formalism, no matter how logically
sound it is at the concept-structuring level, would seem sparse,
detached and circular.

The form of the internal representation of perceptual knowledge of the
external world in a human or machine memory is the subject of current
debate in the literature of cognitive psychology, computational
linguistics and artificial intelligence. Should such internal
representations be in the form of "propositions", or must an additional
"nonverbal", "analogic", "isomorphic" or "imagery" type of scheme also
be used? (For contrasting summaries on the debate, see Kosslyn and
Pomerantz (1977) and Pylyshyn (1978); also see Fischler (1978),
Kosslyn (1978), Palmer (1978), Pylyshyn (1978), Sloman (1978), Chafe
(1978).) This question is difficult to answer, at least partly because
many of the terms that are used critically in the debate, such as
"propositional", "verbal", "symbolic", "nonverbal", "analogic" and
"isomorphic", are not precisely or operationally defined within the
current context with respect to the question posed. Most often, readers
must rely on their intuitive understanding of these terms in order to
grasp what is being discussed.
Now, if we turn away from the subject of internal representational
forms inside our memory and look at the external representations that
we see, hear or otherwise sense, then it is a common observation that
external representations of a visual scene (for example) can be
dichotomized into two extreme types:

1. a so-called "verbal", "symbolic", "linguistic", "propositional"
   type, e.g., a verbal description of the scene in English;

2. a so-called "nonverbal", "analogic", "isomorphic" type, e.g., a
   photograph of the same scene.

A similar dichotomy exists for external representations of sound
patterns in the audio modality. In what follows, the word "verbal" will
generally denote the first type of the dichotomy, and the word
"nonverbal" the second type.
It is our common experience of this dichotomy which leads to our
intuitive understanding of these contrasting terms. It seems to trigger
the aforementioned debate by begging the question of whether a roughly
parallel dichotomy can be extrapolated into our internal system of
knowledge representations. However, before we use these terms in
arguments about internal representation types, it is appropriate to
closely examine the differentiating characteristics that contrast these
two extreme types of external representations.

These contrasting characteristics between verbal and nonverbal external
representations will be listed and informally discussed in this
chapter. Since an external representation is meaningful only to the
viewer/interpreter who perceives it in a certain way, we should bring
the background and response of the viewer into focus in these
discussions. Although ultimately our interest is in machine perception
and understanding, in the context of this chapter the word "viewer" is
restricted to mean a human viewer. Often the word "external" will be
dropped when it is too cumbersome to write "external representation"
repeatedly; it should be emphasized that all the representations that
are being discussed in this chapter are, unless specified explicitly
otherwise, external representations, i.e., they are not internal
representations intended to be stored in the memory of the potential
viewer. Also, in order to bring out contrasting characteristics, we try
to look at extreme ends of the dichotomy; the boundary may be fuzzy
when middle cases are considered.
3.1 A REPRESENTATION IS A REPRESENTOR/REPRESENTEE PAIR
When we talk about any particular representation, we are always
actually talking about a representor/representee pair. The representee
is the (real or imagined, concrete or abstract) object, concept, etc.,
that is considered by the viewer as what is being represented by the
representor, which must be an external and observable object in one
modality or another.
It is important to explicitly consider any representation as a
representor/representee pair when we want to categorize whether the
representation in question is a verbal or a nonverbal one. A given
representor may be regarded as a nonverbal representation with respect
to one representee or a verbal representation with respect to another
representee. For example, the written word "apple" is a verbal
representor with respect to a real apple as the representee. It is a
nonverbal representor with respect to the string of these five written
letters.
3.2 NONVERBAL REPRESENTATIONS ARE MODALITY-SPECIFIC
A nonverbal external representor for a given representee in a given
modality may not be considered a nonverbal representor for that same
representee in another modality. For example, a photograph of a real
apple, a nonverbal external representor in the visual modality, would
not usually smell or taste like an apple and hence would not be
considered a nonverbal representor of an apple in the olfactory or
taste modality. In other words, although we can still smell or taste a
photograph of an apple (i.e., it is still observable in these
modalities), it would taste or smell like any other photograph instead
of an apple. If one were to smell or taste the photograph without
looking at it, one could not say that it is a photograph of an apple.
For the same reason, a cup of apple juice would smell or taste like an
apple; hence it would be an external nonverbal representor of an apple
only in the olfactory or taste modality but not in the visual modality.
A special case here is that when the representee is observable, it can
be used as its own external representor. Such an external representor
would be a nonverbal representor for itself in every modality in which
it is observable.

On the other hand, the written word "apple", observable visually and
hence an external representor in the visual modality, denotes its
verbal representee (an apple) without being visually similar to the
verbal representee.
3.3 "SIMILARITY" CONSTRAINTS ON NONVERBAL EXTERNAL REPRESENTATIONS
Note from the last section that there seem to be further detailed
qualities in each of our modalities, enabling us to tell things apart.
Through these qualities, two observable objects can be compared for
their perceptual similarities in each modality. A nonverbal representor
must "resemble" its representee in at least one of our modalities,
while no such constraint is imposed upon verbal representors and their
representees. The ability to judge the perceptual similarity of two
external objects in an idiosyncratic way and to discriminate them on
that basis seems to be innate and shared by all human beings. The
terms, i.e., idiosyncratic and shared, mean that we are all born able
to hear a certain range of sound frequencies, to see visible light but
not x-rays, etc. We even share, to a great extent, these idiosyncratic
"hardware" designs of our motor sensory and perceptual systems with
other animals. On the other hand, verbal representors are arbitrary
symbols. The associations between a verbal representor and its
representee are established through an artificial and arbitrary
linguistic system without any regard to the perceptual similarity
constraints between them.
To illustrate this, let us suppose that there is someone whose native language is not English (and hence he has never seen the visual pattern of the written English word "apple" before) and who has never seen any photograph of an apple before, but who is familiar with real apples (and hence the visual patterns of a real apple). When he is shown, for the first time, a good color photograph of an apple (the kind he is familiar with), he would soon acknowledge his recognition that there is a pattern in the photograph that is visually similar to an apple, no matter what language he speaks. On the other hand, when he is shown the written word "apple", he would not be able to recognize the word as representing an apple, no matter how hard he may try to visually examine the pattern of the written word in the same way he does with the photograph.
On visually examining the word, he may take notice of how the strokes or lines curve up and down, left or right, into circles, etc. He may even memorize this visual pattern so that when he sees it again he would recognize it and feel familiar with it, in the same way he learns, recognizes and feels familiar with any visual pattern. But to know that this visual pattern denotes an apple, he must be informed⁴ of this association by the users of the language and commit this additional information into his memory. This is a rote learning task, since in
⁴ Here the learning of a verbal language involves another human being in some kind of communication (written, spoken, body language, etc.). In contrast, a human being, indeed even an animal, may learn how to recognize the visual pattern of an apple (or its nonverbal representor, e.g., its photograph) without the help of another human being.
general there is no perceptual linkage between a verbal symbol and what is being represented by the symbol, as there is between the visual pattern on the photograph and the visual pattern of what is being photographed. Clearly, there is an additional step involved here for the recognition of verbal representations.
First, the observer must become familiar with the perceptual patterns of the verbal symbols in the same way that he does with the nonverbal perceptual patterns of any external real object. In addition, the specific association between a particular verbal symbol (or a group of such symbols) and what is being represented verbally must be learned from a convention (initially adopted arbitrarily in the perceptual sense) and memorized.
Note that when part of a verbal representation is fixed through previous learning, it may impose constraints (e.g., consistency) to at least partially fix the remaining part of that representation. For example, the meaning of the word "tri-angle" is more or less self-evident, if one already has the knowledge about the meaning of the components: "tri" and "angle".
However, this further constraint happens only after a major portion of the verbal representor/representee association has been fixed in an arbitrary manner through the social conventions of that verbal language system, instead of through the innate perceptual sensory system. Hence, the resultant verbal written (visual) representor "tri-angle" is still arbitrarily associated with its representee in the perceptual sense. That is, if one does not know English, one would not realize the association between the written word "tri-angle" and its verbal representee by mere visual examination of the word.
3.4 VERBAL REPRESENTATIONS ARE INTERPRETED
Let us illustrate this with an example. Suppose that we are confronted with a scene or a photograph in which a man is holding a book in his hands. As long as time permits (say, it is a photograph), we can focus our visual attention on any part of the photograph, limited only by the combined resolution of the photograph and our visual system.
We can shift our focus continuously with as much overlap as we wish. We can, if we want to, take note of how thick the book is, or whether the man is looking at the book, etc. Thus, when we look at a real scene, or a nonverbal representation of it, we are free to look at it, process it or "digest" it in any way we choose to. All the details and features that can be extracted by our perceptual systems are still there in an unprocessed form.
Now suppose that we are reading a verbal description of the same scene instead. Necessarily, the scene has already gone through somebody's sensory system, been recognized and processed by his perceptual system, and then been verbalized according to what is deemed by him as important to tell, in the language he chooses to use, with the associated words in that language he selects.
This verbal description, as prepackaged by this particular observer/speaker, cannot be scanned⁵ for the visual information of the representee in the manner of a photograph. What we would do here is to reverse the verbal generation process: recognizing the verbal terms used in this description, retrieving any relevant pieces of information in our memory about the things that we think are denoted by these verbal terms, and reassembling them into some kind of mental representation of the scene described.

⁵ Again, as pointed out earlier, we can scan the verbal message as a nonverbal input representing itself. For example, in a written message, we can focus on any part of it: is it handwritten or typed? what kind of paper was used? But scanning this way we do not get visual information of the representee.
As another example, it has been calculated from an information-theoretic standpoint that a TV picture is worth 50,000 English words (Raphael, 1976, pp. 46-48). This means that, in the context of information theory, it will take approximately the same amount of transmission time for a given transmission channel with a given transmission speed to transmit either 50,000 English words or a 500x500-point digitized TV picture. Here the transmission time is used as a concrete measure of the information content.
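To make the channel comparison concrete, here is a rough back-of-the-envelope calculation in Python. Every constant in it (6 bits per picture point, six characters per word at 5 bits per character, a 2,400 bits-per-second channel) is an illustrative assumption of ours, not Raphael's actual parameter; the point is only that, under such assumptions, the two messages carry about the same number of bits and hence take about the same time to transmit.

    # Back-of-the-envelope comparison; every constant here is an assumed value.
    PICTURE_POINTS = 500 * 500      # a 500x500-point digitized TV picture
    BITS_PER_POINT = 6              # assumed grey-scale resolution per point
    WORDS = 50_000                  # English words in the verbal message
    CHARS_PER_WORD = 6              # assumed: ~5 letters plus a space
    BITS_PER_CHAR = 5               # assumed character code

    picture_bits = PICTURE_POINTS * BITS_PER_POINT        # 1,500,000 bits
    text_bits = WORDS * CHARS_PER_WORD * BITS_PER_CHAR    # 1,500,000 bits

    CHANNEL_BPS = 2_400             # assumed transmission speed
    print(picture_bits / CHANNEL_BPS, text_bits / CHANNEL_BPS)  # 625.0 s each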
It is quite another matter to say that the two have the same "knowledge"⁶ content, as that depends on the way the verbalizer interprets the picture and on how the receiver reconstitutes the denoted TV picture inside his mind. Often, verbal statements like "test pattern" or "commercial" would be enough to convey an entire series of TV pictures.
⁶ Using Raphael's words: "... the word 'information' was ruined by communication scientists.... We shall use the word 'knowledge' to refer to what used to be called 'information': to mean, roughly speaking, the data that must be transmitted through a transmission channel in order to convey a message, not in all its detail, but well enough for the receiver to understand its meaning." (Raphael, 1976, p. 47).
3.5 VERBAL REPRESENTATIONS ARE LUMPED WHILE NONVERBAL REPRESENTATIONS ARE GRADUATED
The point of the previous section is that verbal representations are prepackaged and interpreted products of human minds, made through the use of some arbitrary associative link system. Here we consider the situation in which we can take measurements of some feature extracted, or to be extracted, from the scene denoted. In a nonverbal representation, there will be an identifiable physical entity which will reflect the amount of that feature quantitatively, so that the perceiver will recognize not only what feature is being represented but also how much of it, regardless of the language this perceiver speaks.
Thus, when we say "there are two apples, one large and one small", we cannot detect any identifiable physical entity in the words "one", "two", "large" or "small" which would reflect, in any quantitative and language-free manner, the amount of the features denoted by these words. They, like the word "apple", are just arbitrary (language-dependent) labels of the "thing" denoted.
If we represent the scene with a line drawing like the one in Figure 6, "two" is represented by the two separately enclosed figures in the drawing. "Large" vs. "small" is reflected by the relative amount of area enclosed in the two figures. In addition, the skeleton contour lines are also analogic and are not drawn completely arbitrarily.
However, the line drawings are still products (output) of a human mind, whose processing and prepackaging effects on the drawing are very obvious.
Figure 6: A line drawing of two apples, one large and one small
3.6 THINGS DENOTED IN VERBAL REPRESENTATIONS MAY NOT EXIST NONVERBALLY
Care was taken in choosing the previous examples so that the things referred to by the verbal labels are simple, concrete, real and physical. However, since the link between a verbal label and its representee is arbitrary, it is a simple and natural step to promote this status so that we can create and manipulate verbal labels which refer to anything, existing or not. Actually, most things that we talk about either do not exist at all, or have such complex relationships with existing things that it would not be a simple matter for us to produce and exhibit these things and express nonverbally what we are talking about. (Compare the amount of effort between writing a novel and filming the novel into a movie.)
Consider the following:

3.6.1 Things referred to may not be currently existing in front of us
We can verbally refer to and talk about things that are far removed in time and space. This is called the displacement effect (Aitchison, 1976, p. 39; Cohen, 1977, p. 77). We can talk about things that existed only in the past, or that will probably happen in the future, or that currently exist in some other place, far enough away that our sensory and perceptual system is not able to examine them at the time when they are being talked about.
3.6.2 Simple verbal labels may represent complex experiences
Words, inasmuch as they are arbitrary labels of things referred to, are often shorthand notations. They allow a single label (or a small group of labels) to represent what might actually be a complex experience (Lindsay and Norman, 1977, p. 483). To be sure, these words or labels, when being recognized as representing themselves, constitute nonverbal experiences (see previous sections). But here, we are talking about things referred to by these verbal labels after they (the labels) are recognized and interpreted by our sensory and perceptual system.
Thus, when one says "I took a trip to N.Y. City", the actual motor-sensory experiences referred to are very "complex", involving, in part, driving a car for many hours, staying at someone's home, or going to restaurants, etc.
The concept of complexity of an experience may be explained in the context of hierarchical systems theory, which considers descriptions of the world at many levels of abstraction (Mesarovic, Macko and Takahara, 1970, p. 37). For each level there is a set of relevant features and variables, laws and principles in terms of which the system's behavior is described. Thus, a human body can be viewed as an interacting set of cells at the cellular level, and then each cell as a collection of interacting molecules at the molecular level, etc.
Or, going in the macroscopic direction, each human being can be viewed as assuming a certain role in a family, and a family assumes a certain role in a society. There are actually many such macroscopic directions in which we can go. Thus, human beings are part of the creatures living on earth, which is part of the solar system, which is part of the Milky Way Galaxy, etc.
Due to the idiosyncrasies of our motor-sensory system⁷ and the accompanying "innate" perceptual processes, we would find things at certain level(s) in this hierarchy very "simple" and probably innate for us to understand and describe. These are the levels where we usually start our learning processes.
"Simple"
motions,
either
the
steady state
tyPe,
e.g.,
walking man, running dog, or the "one-shot" type, e.g., John hit Mary,
etc., are also quickly learned.
These simple experiences and their verbal labels are then used as building blocks or components in describing complex experiences. Intuitively, simple things are those that we can learn early in our life, and that we can visually examine in one or two looks (spatially and temporally) relative to the level of detail involved.
Thus, looking at the earth from the moon, or at a photograph of the earth taken from such a distance, we can easily see the whole earth as a globe in one visual field. Then it would be a relatively simple thing to be convinced that the earth is indeed round. But from that global view, it would not be easy to see that the apple in my kitchen is round.
Similarly, at the microscopic level, which is beyond the resolution of our visual system, verbal terms like cells, molecules, atoms, etc. are not simple things to learn at all.
⁷ Such as the optical resolutions of our eyes, the olfactory receptors in our noses, etc.
3.6.3 Generalized concepts and abstract relations can only be represented in verbal terms
Even "simple" things and their verbal label s, like "apple" or "hunan
being", which we learn early in our lives, will later be promoted into
generalized concepts, and as such can only exist in verbal terms.
For
exanple, when we talk about the concept of "hunan being", we either talk
.about the entire class of hunan beings (the extension of the term "hunan
being") or talk about what one needs to have in order to be a hunan
being
(the
intension
of
particular individual.
"hunan
being")
without
commitment
to
any
fbwever, anyone that we actually perceive and
recognize as being a hunan being (nonverbally) is a specific human being
with a nane that mayor may not be known to us.
A photograph of a hlll1an
being amounts to the same.thing; a real, existing specific· instance of
the concept of "hunan being" must be in front of the canera in· order to
produce the photograph.
Hence, we cannot perceive a generalized verbal concept directly. At most, we can only perceive particular instances of that verbal concept.
As another example, we cannot perceive nonverbally the numerical concepts 1, 2, 3, 4, ..., or pi. We either perceive the labels themselves (which can take many forms in many languages), or perceive one finger, two apples, a circle and its diameter, etc.
Abstract relations, such as ownership or father/son relationships, can be talked about as generalized concepts or as particular instances of such concepts. As noted above, even particular instances of abstract relation concepts cannot be perceived directly (nonverbally). Thus, consider a scene in which a man is driving a car. There is nothing overt in the scene which we would recognize as nonverbally representing that the man owns the car.
The ownership card that he may have in the car's glovebox is a verbal representation of this instance of ownership. To assert that the relationship between this man and this car is an instance of ownership, we usually need a verbal indication such as the ownership card, which is a verbal shorthand notation indicating the occurrence of any one of many possible transactions, e.g., buying the car from its manufacturer or its previous owner, or being given the car as a gift by its previous owner, etc. These transactions themselves are usually complex, temporally and spatially, relative to our sensory systems. Also, the ownership relation recursively appears in many of these transactions.
Hence, from these examples, it seems that we need verbal ability in order to manipulate and communicate generalized concepts and complex relationships. Because computers are symbol-manipulating machines, we can expect that they would be very good at handling these shorthand or abstract relationships or concepts, which exist only as labels in our verbal systems. This is indeed the case, as can be judged by the relative success of computers in solving mathematical and numerical problems.
3.7 SIMPLE NONVERBAL EXPERIENCES FORM THE BASIS FOR VERBAL DESCRIPTIONS
In the previous section, we discussed verbal relationships that are either shorthand notations of very complex nonverbal experiences, or abstract relationships which cannot be demonstrated nonverbally.
On the other hand, the ability to recognize and understand many simple nonverbal experiences seems to have been genetically prewired into our perceptual systems, and forms the basis on which our verbal system is built.
For example, evidence seems to suggest that there are innate local feature extractors, such as line detectors, slope analyzers, and motion analyzers, in our visual system (Anderson, 1975, pp. 38-39; Lindsay and Norman, 1977, pp. 232-237). Also, the mechanisms for processing phonemic distinctions seem to be innate (Anderson, 1975, pp. 75-76).
This means that our verbal system, instead of describing these simple nonverbal experiences, uses them as primitives to describe other, more complex ones. This is both natural and necessary. It is necessary because, without these nonverbal motor-sensory experiences, the verbal system cannot be anchored and will contain circular definition loops (Lindsay and Norman, 1977, pp. 390-391). It is natural because these are the things we understand well, so that further description is seldom needed; expressions like "sings like a bird", "barks like a dog", "wharf, wharf" are about as far as we go verbally in this respect.
Furthermore, they often provide us with intuitive insights. For example, consider the value of visual aids or graphs in textbooks (see more discussion on this in Boden, 1977, pp. 342-343).
These simple nonverbal motor-sensory experiences are the most difficult to represent verbally in computers, which lack our motor-sensory hardware. Consider, for example, how difficult it would be to describe an auditory experience to a person who was born deaf.
3.8 VERBAL REPRESENTATIONS HAVE SYNTAX FOR COMMUNICATION
Since the labels in verbal representations are arbitrarily associated with the things referred to, we are free to choose any form for the labels and any structure (called syntax; see later) in which to embed these labels, as long as the users know and understand the convention of these choices.
Consider the development of natural languages. Almost all natural languages first evolved as spoken languages. Written options were adopted later.
One major difference between a sound stimulus and a visual stimulus is that sound stimuli are always transient, involving an active process at the source, while visual stimuli can be either transient (when the source of the visual stimuli is not stationary) or static. A static object (such as an apple or written symbols on paper) to be viewed by our visual system is passive, in that it only needs to be sitting there, shone upon by light, in order to be seen by us again and again. On the other hand, once the sound source has made a sound, unless it chooses to repeat, the sound will be gone.
As a result of this transient property of sound, we are forced to accept the spoken symbols (called phonemes⁸) in the same order as they come to our ears, serially in time. The speaker of this language must choose not only a certain phoneme to be vocalized at a given instant of time, according to the meaning of the message to be conveyed, but also a certain rule from the syntax by which to string a group (called a sentence in English syntax) of these phonemes serially in time.

⁸ A sound that, by itself, can change the meaning of a word is called a phoneme in that language (Lindsay and Norman, 1977, p. 270). The part of a word that contains the basic unit of meaning is called a morpheme in that language (Lindsay and Norman, 1977, p. 484).
This time-serial processing order of spoken language is retained in the written mode of that language. Thus, although we can scan⁹ the written symbols of a language in any way we choose to (top-down, diagonally, horizontally, repeatedly), we always have to process and understand them in the same order as they would be processed when spoken. For example, Chinese can be written from top to bottom, from right to left, or from left to right, all with equal readability. A Chinese reader would quickly find the right way in a given written message, and read on in that order. A non-Chinese reader would be at a total loss as to what direction the message was written and should be read in.
What needs to be emphasized here is that, for a given verbal message in a language, there is only one way to vocalize it temporally, but there are many ways to write it spatially. Vertically written Chinese cannot be understood by reading it sideways. If it is written in a spiral, one must follow the spiral in order to read it. A written message needs to be read not only time-serially but also in an order predetermined at the time of writing.
In other words, verbal representations are produced solely for the purpose of communication, using arbitrary (in the perceptual sense) conventions relating the verbal representors (which are the signals actually transmitted in the communication) and the representee. For the message to get through, the reader/listener must scan the representor following the same convention adopted by the writer/speaker who created the message. In contrast, for a nonverbal visual scene such as a real apple or its photograph or line drawing, there is no predetermined way of scanning it spatially in order to understand it.

⁹ "Scan" is a word denoting the behavior of focusing on various parts of the visual stimulus in a time-serial order. When a static scene is unfamiliar and complex, we usually scan the scene (Anderson, 1975, p. 73). Obviously, we also scan a moving scene as long as time permits. We move our eyes and our heads with the moving objects to prolong the time that we can spend focusing on the object.
Frequently in the literature, language is described as a one-dimensional string of symbols. As discussed above, the word "one-dimensional" means reading or writing serially in time. When a page of verbal message is already written and the reader can no longer follow the writing process in time, this temporal order becomes an abstract and verbal relationship, which the reader must know through learning his language. The reader must be able to pick each written symbol out as a unit (among a jumble of visual patterns) and know where (spatially) to find the next symbol to read after (in the temporal sense) the current one. The temporal order in the spoken language partially transforms into a spatial order (or direction) in the written form, such as top-down or left-to-right. (Note: just using words like "horizontally" or "vertically" would not be enough; an order or direction must also be specified.) Since each written symbol must at least be a 2-dimensional visual pattern, the term "1-dimensional" can easily be confused with these spatial dimensions.
To illustrate the above points, let us consider the transmission of visual information through the use of television (TV) signals. The TV camera linearizes a 2-dimensional picture by scanning it in a series of horizontal lines. 262.5 such horizontal lines, stacked from top to bottom, form one field of a TV picture in one sixtieth of a second. The "linearized" video signal can then be transmitted to the TV receiver, where the scanning process is reversed in synchronism to reproduce the picture. In Figure 7, the waveforms of the video signals for two such horizontal lines are shown. The horizontal sync pulses in Figure 7 provide the synchronizing signals and delimit one horizontal line from the next. (At the end of each field, another series of synchronizing pulses, called vertical sync, delimits one picture field from the next.) A small sketch of this linearization, with toy sync markers, follows the numbered list below.
I"
......;:...-"'-_1===::;-
I_~""'_o.;B;,;:LACK
~......---,--------_f_--~-----.W;,,;,HITE
:: .
•
. ..
I
I I
I
'_I_'~'_I
I
I
--"
Figure 7:
1.
I
••"
I
I
I
'_"_'
I
• ••
, I
, I,
-- I
t I
,
I
I
I
I
.-----
BLACK
WHITE
Waveforms of the video signals for tw) horizontal scanning
lines
These sync pulses, as well as the number of horizontal lines per picture field, are, like the syntax of a language, additional structures designed solely for the purpose of organization of the video information into more conveniently communicable forms of signals between the transmitter and the receiver.

1. They do not reflect any specific visual structures in the original picture. The choices of these additional structures were made arbitrarily, that is, without any regard to correspondence with any specific visual structure of the scene. As a matter of fact, the European convention chosen for these additional structures, such as the number of horizontal lines per picture field and when to come up with the sync pulses, is sufficiently different from the American convention, resulting in two incompatible systems. Thus, once a convention is reached on these choices, the receiver must follow the specific convention adopted by the transmitter in order to reproduce the same picture.
2. Although these additional structures have no bearing on the specific visual structures encoded in the signal, any disturbance that jumbles these additional structures will render the entire video signal unreceivable.
3. When these TV signals are propagating as electromagnetic waves in space or in the transmission lines, they are not directly perceivable by a human viewer, and hence cannot serve as an external representor (verbal or nonverbal) for him. These waveforms can be viewed through the use of proper displaying devices, such as an oscilloscope, which produces the pictures in Figure 7. As such, they cannot be judged as a nonverbal representor of the visual scene. A human viewer looking at a series of these time-linear waveforms, and unaware of the convention used, cannot visually recognize whether it is a picture of an apple or not, in the same way that he could when looking at a photograph of an apple.
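The sketch promised above follows. It is our own toy illustration, not a model of the actual broadcast standards: the sync marker value and line width are arbitrary stand-ins. It shows the essential points of the list: the sync structure reflects a convention rather than the scene, and a receiver assuming a different convention cannot reproduce the picture.

    # A minimal sketch (assumed values, not a broadcast standard): linearize a
    # 2-D picture into a 1-D signal delimited by sync markers, then rebuild it.
    SYNC = -1  # an arbitrary in-band marker standing in for the horizontal sync pulse

    def transmit(picture):
        # Scan the picture row by row, prefixing each line with a sync marker.
        signal = []
        for row in picture:
            signal.append(SYNC)   # the marker reflects the convention, not the scene
            signal.extend(row)
        return signal

    def receive(signal, line_length):
        # Rebuild the picture; works only if line_length matches the transmitter's.
        picture, i = [], 0
        while i < len(signal):
            assert signal[i] == SYNC, "convention mismatch: expected a sync marker"
            picture.append(signal[i + 1 : i + 1 + line_length])
            i += 1 + line_length
        return picture

    scene = [[0, 7, 7], [0, 0, 7]]        # a toy 3-point-wide, 2-line picture
    assert receive(transmit(scene), 3) == scene
    # receive(transmit(scene), 4) would fail its assertion: a receiver using a
    # different convention cannot reproduce the picture, as with the European
    # versus American TV systems.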
Thus, the syntax of a language, like the sync pulses of TV signals, is a set of add-on structures that specify, arbitrarily (in the perceptual sense), how individual verbal labels are organized together for the communication of complex messages. Due to the time-serial constraint of auditory signals, the syntaxes of natural languages all specify linear structures. First, a syntax classifies verbal labels into syntactic classes (noun, verb, etc.) arbitrarily, without any regard to the perceptual structures of the representee being referred to. Then, it specifies rules (again arbitrary in the perceptual sense) about how to linearly string (concatenate) verbal labels together into sentences according to the syntactic classes of the component verbal labels. Therefore, the particular syntactic structure of a given sentence is not selected to directly reflect any perceptual structures of the representee referred to in the sentence.
On the other hand, in nonverbal representations such as a map, even though the individual components, such as cities and towns, may be represented verbally, their spatial structures on the map are directly dictated by the spatial structure actually existing between them.
Chapter IV
FORMALIZATION OF THE VERBAL VERSUS NONVERBAL DICHOTOMY OF EXTERNAL REPRESENTATIONS
4.1 EXTERNAL REPRESENTATIONS
In order to formalize the verbal versus nonverbal dichotomy of external representations, we need to be precise about what is meant by an external representation. In Section 3.1, it was pointed out that a representation is a representor-representee pair. It was also pointed out that when we talk about an external representation, a (human) viewer is often implicitly involved, to whom the representation is external and by whom it is viewed and considered.
Accordingly, an external representation is defined as the triple¹⁰:

    (RTOR, RTEE, IPS)

where RTOR is the representor -- a real object which is observable, in a sense to be discussed, by the system IPS; RTEE is the representee -- the "object" being represented (it may be real or imagined, concrete or abstract); and IPS is an information processing system, to which the RTOR is external and observable, and by which the representational relationship between the RTEE and the RTOR is being considered. Note the following:

¹⁰ The formalization developed in this chapter is an expanded version of a paper published previously (Tang, Gold, and Tharp, 1979).
1. The requirement that the representor be observable is obvious; an external object must first be observed before it can be considered as representing itself or anything else. While we require an RTOR to be observable to the IPS in question, we do not require an RTEE to be observable to the same IPS. However, it is helpful to restrict most of our attention to a special subclass of external representations -- R.OBS -- where the RTEE is also observable by the IPS in question. A special case in R.OBS is the kind of representation where RTEE = RTOR, that is, where they are one and the same external and observable object.
2. While the information processing system (IPS) that concerns cognitive psychologists is the human mind, we would like to put it in a broader perspective and consider the human brain, in our context, as belonging to information processing systems in general. We label the human information processing system as IPS.HUMAN, a subclass of IPS. Another important type of information processing system of interest here is that of computers -- IPS.COMPUTERs. It is therefore important to explicitly consider the involvement of an IPS in an external representation. A given RTOR observable by IPS.X may not be observable by IPS.Y. For example, the visual pattern of the word "apple" is observable (and hence can be an external representor) to an IPS.HUMAN. But the same visual pattern is not observable to an IPS.COMPUTER which is not equipped with any hardware for vision. Thus, it cannot be an RTOR to such an IPS.COMPUTER, and hence must be an RTEE represented by some other RTOR (such as the pattern of punched holes on an IBM card) which is observable to the IPS.COMPUTER in question.
3. For a given RTOR, many different RTEEs can be considered by the IPS in question. For example, the word "apple" as an RTOR to an IPS.HUMAN can represent the visual pattern of the 5-letter character string, or it can represent a real apple, as it usually does in an English verbal message. Likewise, for a given RTEE, many different RTORs can be proposed to represent it. (A minimal sketch of the triple as a data structure follows this list.)
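For concreteness, the triple and the observability requirement on the RTOR can be rendered as a small data structure. This sketch is our illustration, not part of the formalization itself; the class name, the observability predicate, and the toy IPS.COMPUTER test are all hypothetical.

    # A minimal sketch (hypothetical names) of the external-representation
    # triple (RTOR, RTEE, IPS), with the observability requirement on the RTOR.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class ExternalRepresentation:
        rtor: object                     # the representor: must be observable by ips
        rtee: object                     # the representee: may be real or imagined
        ips: Callable[[object], bool]    # stands in for the IPS's observability test

        def __post_init__(self):
            # The RTOR must be observable by the IPS in question; the RTEE need not be.
            if not self.ips(self.rtor):
                raise ValueError("RTOR is not observable by this IPS")

    # An IPS.COMPUTER without vision hardware cannot take the written word "apple"
    # as an RTOR, but can take a punched-card-like byte pattern representing it:
    card_reader_only = lambda obj: isinstance(obj, bytes)   # toy observability test
    rep = ExternalRepresentation(rtor=b"APPLE", rtee="a real apple", ips=card_reader_only)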
4.2 FOUR CASES OF OBSERVABILITY AND TWO TYPES OF PERCEPTUAL TASKS
In the last section, we introduced the requirement that an RTOR be an object "observable" by the IPS. To be more specific, we can distinguish the following four cases of observability:

Case (1): The object is currently being observed by the IPS in question.

Case (2): The object has just been observed by the IPS but is not being observed currently. We say an observable object is accessible to an IPS if it stably exists in the vicinity of the IPS, so that the IPS can move itself or its sensory organ(s) to "focus" on that object.

Case (3): The object had been observed some time in the past but is not currently observable and accessible.

Case (4): The object is observable but has never actually been observed by the IPS in question.
The formalization of the verbal vs. nonverbal dichotomy is motivated by two types of basic perceptual tasks that an IPS.HUMAN often performs upon the external representations from the set R.OBS. One is a discrimination task, where two accessible objects R and R' are being examined side by side, and the task is to decide in what respects R is perceptually similar (a concept to be defined later) to R', and hence a nonverbal representor of R' in these respects. The other task is one of recognition, where the RTOR R' is accessible to the IPS and the task is to decide whether R' is something it has observed before, i.e., whether there existed an RTEE R, observed by the IPS in the Case (3) sense, such that R = R'. (Another way of stating the difference between a discrimination task and a recognition task is given in the footnote on page 73.)
These are the two types of perceptual tasks that an IPS is often called upon to perform when it is engaged in considering the relationship between the representor and the representee of an external representation. In both tasks, two observable objects (the representor and the representee) are examined and compared for their perceptual similarity (to be discussed later) by the IPS in question.
4.3 THE COMPONENTS OF AN IPS
Having adopted a broader point of view which includes computers as a subclass of IPS in our discussion, we need to be specific about what a computer needs to have in order to be a member of IPS. An IPS in our discussion is a system capable of engaging in a perceptual task and of acquiring perceptual knowledge and using it intelligently; e.g., interpreting the current scene, or clarifying ambiguities in natural language conversations. An IPS here should be understood to consist of at least the following subsystems (a skeletal sketch follows this list):
1. a sensory-driven perceptual system (SDPS), which consists of a finite set of transducers and feature extractors organized into different sensory organs that convert external stimulus signals into internal sensory codes.
2. a short-term sensory store (STSS), which is a limited-capacity buffer memory where the sensory information generated by the SDPS can be temporarily stored.
3. a long-term memory (LTM), where appropriate information can be stored for an indefinite amount of time, for subsequent retrieval and use.
4. a finite set of motor output effectors (MOE), which can be used by the IPS to move its sensory organs to focus¹¹ on different parts of the scene or to manipulate the objects under observation.
5. an executive system (ES), which is the central coordinator of the above four components. For example, the ES decides where one of its sensory organs is to "focus" and directs the corresponding MOE to achieve that goal. Subsequently, this global positional value will be used by the ES to interpret and index the sensory information coming from that sensory organ. In this chapter, ES and IPS are sometimes used interchangeably.
¹¹ Terms related to visual perception are used here for concreteness. However, this is not meant to imply limitation to the visual modality.
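As a companion to the list above, the five subsystems can be put into a single skeletal class. The sketch is ours and deliberately simplistic: each subsystem is reduced to a container or a stub, and the executive system is played by the methods themselves.

    # A skeletal, illustrative rendering (ours) of the five IPS subsystems.
    class IPS:
        def __init__(self, transducers, feature_extractors):
            self.sdps = (transducers, feature_extractors)  # 1. sensory-driven perceptual system
            self.stss = []                                 # 2. short-term sensory store (limited buffer)
            self.ltm = {}                                  # 3. long-term memory
            self.moe_position = 0                          # 4. motor output effectors, reduced to a position
            # 5. the executive system (ES) is played by the methods below

        def focus(self, scene, p):
            # ES directs the MOE to position the sensory organ at p, then reads the SDPS.
            self.moe_position = p
            transducers, extractors = self.sdps
            codes = [t(scene, p) for t in transducers]     # transducer outputs t^m(X, p)
            content = [f(codes) for f in extractors]       # perceptual content f^m(X, p)
            self.stss = content[:16]                       # toy capacity limit on the STSS
            return content

        def store(self, key, content, links=()):
            # Lay down a memory trace {c, link1, ...} in LTM for later recognition tasks.
            self.ltm[key] = (list(content), list(links))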
4.4 SENSORY-DRIVEN PERCEPTUAL SUBSYSTEM
The key concept in formalizing the dichotomy of external verbal versus nonverbal representations is that of perceptual similarity. However, before we can formalize the concept of perceptual similarity, we need to specify what content will be compared for perceptual similarity. Perceptual content comes from the sensory inputs. So let us look at the sensory end of the perceptual subsystem of an IPS.
4.4.1 Transducers and feature extractors
As pointed out earlier, the perceptual subsystem of an IPS consists of transducers and feature extractors.

The set of transducers constitutes the sensory windows of an IPS. Because of the realities of the physics and chemistry of matter and energy interactions, one cannot expect to have all-purpose transducers, natural or artificial, that are equally sensitive to all forms of signals. Therefore, transducers are specialized and can be classified by the types of signals that they are especially sensitive to. A group of transducers of similar types may be coherently organized into a sensory organ. Such a grouping of transducers is also often referred to as a modality. Thus, when we say that an external representor must be observable, we mean that, for a given IPS, an external representor must produce (emit or reflect) signals to which the transducers in one (or more) of the sensory organs of the IPS in question are sensitive. The outputs of the transducers in a given sensory organ are then further processed and transformed by a set of feature extractors. The collected outputs of these feature extractors form the perceptual contents (to be discussed later) that reflect the current scene in focus in that modality.
Let us formalize this for one such sensory organ in modality m.¹²

Let

    T^m = {T^m_1, T^m_2, ..., T^m_NT}

denote an enumeration of the set of transducers in that sensory organ. Referring to Figure 8, each transducer T^m_i responds locally to the information-carrying signal x^m_i from the scene that impinges on it, and outputs the internal code t^m_i as a response.

Figure 8: A transducer in modality m. [T^m_i converts the impinging signal x^m_i into the internal code t^m_i.]

Thus, for a given accessible scene X (see Figure 10), let

    t^m(X, p^m_r) = {t^m_1, t^m_2, ..., t^m_NT}(X, p^m_r)

be the collected outputs of these transducers as the ES directs its MOE so that the sensory organ is positioned to "focus" on some part of X (as indicated and indexed by the positional parameter p^m_r). The outputs t^m(X, p^m_r) are further processed by a set of feature extractors:

    F^m = {F^m_1, F^m_2, ..., F^m_NF}

Each F^m_i embodies a function which accepts a number of inputs and produces a single output f^m_i as a response (see Figure 9). The number of input channels for each different feature extractor in the set may be different. Each input of a given feature extractor may be either an output from a transducer (t^m_j) or an output from another feature extractor (f^m_k).

Figure 9: A feature extractor in modality m. [F^m_i maps its inputs, each of which may be either an output from a transducer (t^m_j) or an output from another feature extractor (f^m_k), to the single output f^m_i.]

Thus, for the scene X "viewed" at position p^m_r (see Figure 10), we have

    f^m(X, p^m_r) = {f^m_1, f^m_2, ..., f^m_NF}(X, p^m_r)

as the perceptual content that reflects the current scene. A subsetting operation in the feature space, called figure-ground segregation (see the section on figure-ground segregation), is then performed on f^m(X, p^m_r) to segregate the perceptual content of the object R of current interest (the figure) from the rest (the ground) in that modality.

¹² Notational conventions used here are the following: (1) capital letters stand for functions, devices, processes, and operations that work on input data; (2) lower-case letters stand for the codes or data that are inputs or outputs of these devices; (3) the superscript m indicates the modality m to which these devices or data belong; (4) subscripts distinguish different individual members of a set; letters without a subscript indicate the entire set.
Figure 10: Perceptual contents in modality m. [The figure diagrams the flow from the scene X, viewed by the sensory organ T^m positioned at p^m_r, to the perceptual content f^m(X, p^m_r) = {f^m_1, f^m_2, ..., f^m_NF}(X, p^m_r); figure-ground segregation (a subsetting choice by the ES) splits this into f^m(R, p^m_r), the perceptual content of the figure R in X, and f^m(G_R, p^m_r), the perceptual content of the ground G_R in X, from which the memory traces {c^m(R, p^m_r), link1, ...} (of R) and c^m(G_R, p^m_r) (of G_R) are laid down.]

Apart from the requirement that each transducer or feature extractor produces a single output, the above formalization of the set of transducers and feature extractors is very general and applicable to any type of modality. For example, they allow (a toy rendering of the pipeline follows this list):

1. loops or feedbacks to exist in the configuration of feature extractors in each modality;

2. line detectors and edge detectors as feature extractors in the visual modality;

3. low-pass or high-pass filters of spatial or audio frequencies as feature extractors;

4. feature extractors that perform temporal integration, by allowing the input vector to have some elements which come from other units with different time delays, and thus come from the external scene at different times.
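To make the data flow concrete, the following toy pipeline (ours; the particular transducers and feature extractors are arbitrary stand-ins) computes t^m(X, p^m_r) and then f^m(X, p^m_r) for one sensory organ, including a feature extractor that feeds on another extractor's output, as allowed above.

    # A toy rendering (ours) of t^m(X, p) and f^m(X, p) for one sensory organ.
    def make_transducer(i):
        # T^m_i responds locally to the part of the signal impinging on it at position p.
        return lambda scene, p: scene[(p + i) % len(scene)]

    T = [make_transducer(i) for i in range(4)]         # T^m = {T^m_1, ..., T^m_4}

    def t_m(scene, p):
        return [T_i(scene, p) for T_i in T]            # collected transducer outputs t^m(X, p)

    # F^m: fan-in may differ per extractor; inputs may be transducer outputs
    # or the outputs of other feature extractors.
    F = [
        lambda t, f: t[0] - t[1],                      # a toy "edge" detector on two inputs
        lambda t, f: max(t),                           # a toy peak detector over all inputs
        lambda t, f: abs(f[0]),                        # feeds on extractor 0's output
    ]

    def f_m(scene, p):
        t = t_m(scene, p)
        f = []
        for F_i in F:
            f.append(F_i(t, f))                        # earlier outputs are visible to later units
        return f                                       # perceptual content f^m(X, p)

    print(f_m((3, 1, 4, 1, 5, 9), p=0))                # -> [2, 4, 2]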
At this level of generalization, we shall not make any assumption concerning the details of how these transducers and feature extractors are interconnected or organized, nor concerning the type of internal codes generated by each transducer or feature extractor (such as frequency-modulated nerve impulses, chemical neurotransmitters, graded postsynaptic potentials, binary digits, etc.).
Although these transducers and feature extractors are formalized in terms of abstract functions, it is helpful in our conceptualization to think of them as actually implemented by physical devices, such as neurons, circuits of neurons, or finite state machines, each operating in real time and in a parallel, distributed manner.
Although we deal with these transducers and feature extractors and their outputs at a very general level, we wish to emphasize the considerations of the following two subsections.
4.4.2 Transducers and feature extractors must have fixed behaviors
Notice that the job of a transducer or a feature extractor is to produce perceptual content that selectively reflects certain aspects of an external scene, in a reproducible manner, for perceptual tasks such as discrimination and recognition. In order to do this job, transducers and feature extractors are not one-to-one functions, but their functional behavior should be fixed, at least within the time span considered for a recognition task. These operators are not one-to-one functions, so that they can selectively respond to input signals and extract various features. They should have fixed functional behaviors, so that their input-output relationships can be reproducible. Thus, the same scene X, viewed the same way at two different times, should produce the same perceptual content.
4.4.3 External scene must be stable to produce perceptual contents as forced responses
If the transducers and feature extractors are considered to be real physical devices, they cannot respond to input signals instantly. For each application of a new set of signals at their inputs, there is a finite time delay before the corresponding outputs (the forced response that reflects the new driving inputs) are produced by these devices. Within this time delay, transient outputs will be produced as the previous old output decays into the new output. Hence, there is the question of how long the new inputs must stay "stable" before they again change into new values. The inputs must stay stable long enough to induce the outputs as forced responses, for we are interested in the case where the perceptual contents reproducibly reflect the current scene being viewed. In other words, the perceptual content must be the forced response driven by the current scene. Therefore, we primarily deal with those external scenes (or representors) that are stable enough to produce the corresponding perceptual contents as forced responses.
However, what is to be considered "stable" depends on the nature of the transducers and feature extractors that work on these input signals. Thus, a set of transducers and feature extractors that track a 1000 Hz audio sine wave may consider a constant 1000 Hz sine wave input a stable scene, while the same scene may be considered fast-changing and unstable by some other set of transducers and feature extractors. In the human auditory modality, beats with frequency above 20 Hz are no longer heard as distinctive beats, but as a merged and stable sound frequency. The spokes of a fast-turning wheel give a stable visual feeling entirely different from that of a slow-turning wheel. These are two real-life examples indicating that there may be feature extractors that extract stable features from a scene that is not ordinarily considered stable.
Therefore, in many cases, the entire input scene need not be stable; it need only have a stable subset (called the figure; see the next section) to drive a subset of the feature extractors and produce the perceptual content of the figure of current interest. Here, "stable" is used in a sense relative to the subset of feature extractors used.
4.5 FIGURE-GROUND SEGREGATION AND PERCEPTUAL CONTENT
As briefly outlined in the previous section (referring to Figure 10), an external scene X emits signals detectable in modality m. As the ES positions its sensory organ towards the scene, the transducer array T^m intercepts part of these signals and converts them into internal transducer output codes t^m(X, p^m_r), indexed by the positional viewing parameter p^m_r (see the next section). At the next stage, t^m(X, p^m_r) is further processed by the set of feature extractors F^m in that modality. The collected output from F^m, namely f^m(X, p^m_r), is called the perceptual content of the external scene X as viewed at position p^m_r by the sensory organ in modality m.
Since, usually, only a part of the scene X is of current interest, the perceptual content f^m(X, p^m_r) undergoes a subsetting operation called figure-ground segregation. In this operation, the ES, on the basis of its past experiences and its current goal state, selects a subset from f^m(X, p^m_r) for further consideration in the perceptual task (discrimination or recognition) to be performed. The subset selected is called the perceptual content of a figure R in the scene X viewed at p^m_r; it is labeled f^m(R, p^m_r) in Figure 10. Relative to f^m(R, p^m_r) is the perceptual content of the ground G_R, which, labeled f^m(G_R, p^m_r), is a subset of f^m(X, p^m_r), and the intersection between f^m(G_R, p^m_r) and f^m(R, p^m_r) is empty.
For example, a visual system may have, among its set of visual feature extractors, various line segment detectors, each detecting certain line segments at certain retinal positions. (Retinal positions refer to the positions of the transducers within the visual sensory organ and are different from the positional viewing parameter p^m_r, which is the position of the sensory organ as a whole relative to the scene being viewed.)
Suppose F^m_y stands for a visual feature extractor which detects the presence of a line segment at retinal position y. Suppose that, in a particular subsetting resulting in the perceptual content of a particular figure R, the positive output of this feature extractor F^m_y is selectively excluded from the subset f^m(R, p^m_r). This simply means that the perceptual content of the figure R being considered does not include a line detected at that retinal position (hence at some position on X).
As another example, a feature extractor in an auditory system may be responding to some sounds at high audio frequencies. Selectively ignoring the output of that feature extractor may mean that the ES is trying to follow a human voice made at low audio frequencies amidst some high background music made by a violin.
Note that:

1. The definition of the figure R is approached from the internal processes within the IPS in question, through the use of perceptual contents. Since the perceptual contents reflect the external scene being viewed in that modality, the figure R, whose perceptual content is a subset of the perceptual content of the scene X, should correspond to some subpart (which we have labeled R) of X. We did not use the simpler and more direct definition of figures, by merely saying that R is a subset of X, because such a definition does not take into consideration the manner in which X and R are being viewed by the IPS in question. Without the explicit consideration of how a scene is being viewed, the formalization of figure-ground segregation would be difficult.
2. The subsetting choice is an internal operation made at the ES level. The SDPS (the transducers and feature extractors in various modalities, whose job is to reflect external scenes) does not and should not concern itself with what particular figure should be segregated out from the scene currently being viewed. The external scene simply exists as a whole and does not dictate what is to be the figure and what is to be the ground. For a given scene, different subsettings can be done on its perceptual content inside an IPS, resulting in different figure-ground segregations, depending on how it is viewed by that IPS, and on what the past experiences and current goal of that IPS are.
4.5.1 Guidelines for the segregation of figure from ground
The following is a list of possible guidelines that may be used by an ES for segregating the perceptual content of a particular figure from the current scene (a small sketch of the subsetting operation itself follows this list):
1. The segregation can be achieved through active manipulation. The object that can be manipulated as a whole by the "hand" (MOE) is regarded as the source of the perceptual content of the figure. The ES looks for the part of the perceptual content of the current scene that is expected to correspond to the manipulation. For example, an edge (as detected by a certain visual feature extractor) can be considered as belonging to the figure of current interest when it is found at locations corresponding to the expected movement of the object which the MOE, under the direction of the ES, is manipulating. As another example, in the auditory modality, the sound which was not there before a particular musical instrument was struck by the IPS must be considered as belonging to the auditory figure pattern of current interest.
2. The perceptual content corresponding to a moving object can be segregated out as the figure when the movement of the object is being followed, by casting the visual sensory organ to track the object. The object being followed in this way will be relatively stable with respect to some subset of the visual feature extractors, while the still background viewed by the moving sensory organ will be relatively unstable to another subset of feature extractors. If there is a way for the ES to distinguish a forced response from the "blurred" transient response coming from a visual feature extractor, the perceptual content of the moving object can be segregated from the perceptual content of the background.
3. The segregation process is a continuation from the last viewing. Since intelligent perceptual tasks usually require many related viewings of the external scene, those feature extractors that were selected at the last viewing can continue to be selected, so that their outputs form the subset of the perceptual content to be considered as the figure of the current viewing. For example, in vision, a line segment of a figure which extends outside of the last viewing angle may be followed in the current viewing task to see where it leads. Those feature extractors that are expected to produce results in this respect will be selected by the ES. Their outputs will be considered to be part of the perceptual content of the figure in the current viewing. In an auditory example, a particular subset of auditory feature extractors may be monitored for a period of time. These feature extractors may be selected because they are the ones that respond to sounds that are considered as originating from a particular spatial location (when the auditory system is equipped with stereo reception). Or, they may be selected as having some synchrony in intensity changes at a particular subset of audio frequencies. This may be the way in which the voice of a particular person is followed in a room full of people speaking at the same time.
4. Internally stored perceptual contents in LTM, which are retrieved through verbal linkages (to be discussed in later sections) to verbal representors recognized, can be used in a top-down manner to help the segregation of the figure of interest in the current scene. For example, the spoken word "apple" (or the written word "apple", for that matter) may trigger the retrieval of the stored visual perceptual contents of an apple through previously learned and stored linkages. These may then be used to direct the ES in its selection of the subset of visual features needed to segregate a figure of an apple (if there is one) from the current scene.
5. The perceptual content of the current scene, f^m(X, p^m_r), can be subsetted in a "cut-and-try" manner. This process can be iterated until a desirable goal of the intended perceptual task is reached (e.g., a particular subsetting results in the recognition of a familiar figure) or until an intolerable amount of time has elapsed.
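Here is the sketch promised above: a minimal rendering (ours) of the subsetting operation. The selection criterion passed in stands in for whichever of the five guidelines the ES happens to be following; the point is that figure and ground are complementary subsets of the same perceptual content, chosen inside the IPS rather than dictated by the scene.

    # A minimal sketch (ours) of figure-ground segregation as subsetting.
    def segregate(content, select):
        # `content` is the perceptual content f^m(X, p); `select(i, value)` is the
        # ES's stand-in criterion for including feature-extractor output i in the figure.
        figure = {i: v for i, v in enumerate(content) if select(i, v)}
        ground = {i: v for i, v in enumerate(content) if i not in figure}
        return figure, ground      # disjoint subsets whose union is the whole content

    content = [2, 4, 2, 0, 7]      # some perceptual content f^m(X, p)
    # One viewing: continue the selection made at the last viewing (guideline 3).
    figure, ground = segregate(content, lambda i, v: i < 3)
    # The same scene under a different goal yields a different figure (note 2 above).
    figure2, ground2 = segregate(content, lambda i, v: v > 1)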
4.5.2 Perceptual contents and memory traces
In perceptual tasks, the ES deals with the perceptual content of a segregated figure and the related ground, rather than the perceptual content of the entire unsegregated scene. When the relationship between the representor and the representee in a given external representation is being considered, the external representor (and the representee) must first be viewed and its (their) internal perceptual content(s) produced and segregated as the perceptual content of a figure in the input scene. If the current figure (the representor R) is considered to be a new figure, its perceptual content f^m(R, p^m_r), along with any pertinent "links" (to be discussed in a later section), can be stored in the LTM as a newly acquired piece of external world knowledge which can be used in future recognition tasks.
The stored memory trace corresponding to the perceptual content of the current figure, f^m(R, p^m_r), is labeled c^m(R, p^m_r) in Figure 10. The stored memory trace corresponding to the perceptual content of the ground is labeled c^m(G_R, p^m_r). Further subsetting and/or transformation can be formulated when a memory trace, c^m(R, p^m_r), is being laid down from f^m(R, p^m_r). However, to simplify the following discussion, we assume that the modality-specific part of the memory trace to be stored is a verbatim copy of the sensory perceptual content, i.e.,

    c^m(R, p^m_r) = f^m(R, p^m_r)

The complete memory trace laid down for the current figure is

    {c^m(R, p^m_r), link1, link2, ...}

where c^m(R, p^m_r) is called the content of the memory trace, and {link1, link2, ...} is a finite set of specific links pointing to other parts of the memory.
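Under the verbatim-copy assumption, a complete memory trace is thus just the stored content paired with its links. The rendering below is ours, and the link label is hypothetical.

    # A minimal rendering (ours) of the memory trace {c^m(R, p), link1, link2, ...}
    # under the verbatim-copy assumption c^m(R, p) = f^m(R, p).
    from dataclasses import dataclass, field

    @dataclass
    class MemoryTrace:
        content: list                                 # c^m(R, p): verbatim copy of f^m(R, p)
        links: list = field(default_factory=list)     # links pointing into other parts of memory

    f_R = [2, 4, 2]                                   # perceptual content of the current figure R
    trace = MemoryTrace(content=list(f_R), links=["word:apple"])   # hypothetical verbal link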
4.6 POSITIONAL PARAMETER AS A VIEWING CONTEXT
At a lower level, the positional parameter p^m_r indicates the viewing position (including relative orientation) of the sensory organ as a whole, relative to the part of the scene currently being viewed. This parameter provides important contextual information for perceptual tasks such as recognition. The perceptual content of a given scene X, or of a given figure R, will depend upon the positional viewing context. Hence, the positional viewing context must be brought to bear on the perceptual process at hand. The value for the positional parameter may be made available to the ES in question in several ways:
1. Through feedback from its own measuring and manipulation of the external scene, such as by the pacing or moving of its sensory organ (or of itself), or by moving the object scene back and forth to study the perceptual contents of the figure. The ES would notice the way the perceptual content of the figure changes with the various p^m_r's, and would store the corresponding traces in LTM with the various associated p^m_r's as indices. Here each p^m_r is expressed in an internal form, such as the amount of effort (either as commands the ES issues to the MOE in order to focus, or as feedback from the MOE) needed to bring the external figure into proper focus.
2. Through an external distance-measuring device, or by communication with another IPS. Here the p^m_r is expressed in external forms, such as a number of feet, etc. These external forms must be interpreted and internalized into forms compatible with those mentioned above, for future use in pattern recognition.
3. Through the retrieval of some related memory trace c^m(R, p^m_r), whose known value of p^m_r can then be used as an internal estimate for the current figure.
4. Through a set of general transformation rules that can be used by the ES to relate certain aspects (such as size, or shape distortion in perspective) to the positional parameter. Those transformation rules may be learned by the ES through studying the perceptual contents of the figure at various p^m_r's, as mentioned above.
At this level, the viewing position is said to be egocentric, as the IPS itself is taken to be the reference point relative to which the viewing position of the current scene is measured or estimated.

At a higher level, the positional parameter p^m_r may be processed to reflect spatial relationships in the external world in a nonegocentric manner. That is, the spatial position of the current scene can be related to some external "landmark" or reference point, whose position does not change with the IPS as it moves from place to place. This involves the perception and recognition of the "landmark" as well as of the current scene.
Then the two respective egocentric p^m_r's are related in a way that is meaningful to the ES, e.g., the amount of effort needed for the IPS to move itself from the landmark to the scene. If the landmark is not currently accessible, then the LTM must be used to establish the nonegocentric positional relationship between the landmark and the current scene.
4.7 PERCEPTUAL SIMILARITY
Now that we have formalized the internal perceptual content of figures in an external scene, we are in a position to formalize the concept of perceptual similarity between the perceptual contents of two figures. Intuitively, when we want to compare the perceptual similarity of two objects R and R', we must first observe and view them in the same perceptual manner. This means that we would not compare the perceptual content of R observed in modality m (e.g., vision) with the perceptual content of R' observed in another modality m' (e.g., taste). It would be absurd to say that an apple is perceptually similar to an orange because the shape of an apple is similar to the taste of an orange.
Even within the same modality, we would not compare the output of one feature extractor with the output of another, different feature extractor. Different feature extractors in the same modality extract different properties of the same scene. It would be just as absurd to say that an apple is perceptually similar to an orange because the shape of an apple is similar to the color of an orange. Therefore, we formalize perceptual similarity as follows:
    Given an IPS, two figure patterns R and R' in scenes X and X',
    viewed at positions p^m_r and p^m_r', are said to be perceptually
    similar in modality m if they have the same perceptual contents in
    that modality, i.e.,

        f^m(R, p^m_r) = f^m(R', p^m_r')

    by matching componentwise, i.e.,

        f^m_i(R, p^m_r) = f^m_i(R', p^m_r')    for i = 1, 2, ..., N_p
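As a concrete (and purely illustrative) reading of this componentwise
match, the following Python sketch represents a perceptual content
f^m(R, p^m_r) as a tuple of N_p feature values; the feature values
shown are invented for the example and are not part of the
formalization.

    # Hypothetical sketch: a perceptual content is a tuple of N_p
    # feature values, one per feature extractor in modality m.
    def perceptually_similar(f_R, f_R_prime):
        """Componentwise match of f^m(R, p^m_r) and f^m(R', p^m_r')."""
        if len(f_R) != len(f_R_prime):  # must come from the same N_p extractors
            return False
        return all(a == b for a, b in zip(f_R, f_R_prime))

    # Two figures match only when every feature component agrees.
    print(perceptually_similar(("round", "red", 5), ("round", "red", 5)))    # True
    print(perceptually_similar(("round", "red", 5), ("round", "green", 5)))  # False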
The requirement of equality in the formalization seems to be too
stringent, but consider the following:

1.  Transducers and feature extractors are generally not one-to-one
    functions.  They cannot be if they are to do their jobs of
    selecting and extracting specific features out of the input
    signals from the external scene.
2.  Perceptual similarity is performed on the perceptual contents of
    figures, which are subsets chosen by the ES from the perceptual
    content of the entire input scene being viewed at p^m_r in the
    given modality.  For a given external scene viewed at the same
    position p^m_r, different subsets can be chosen in different
    perceptual tasks in which perceptual similarity is compared.

Therefore, an exact match in feature space required for perceptual
similarity does not necessarily result in an exact match in external
scenes.  In other words, perceptual similarity not only depends on
what the external scenes are but also on how they are viewed by the
IPS in question (e.g., in which modality, what subsets of features are
selected as the perceptual contents of the figures, etc.).
Perceptual similarity is one of the key concepts that underlie the
dichotomy of verbal versus nonverbal external representations.
When an external representation, (RTOR, RTEE, IPS), is being
considered in a discrimination task (where both the RTOR and the RTEE
are accessible), the perceptual contents that correspond to the RTOR
and the RTEE being considered must first be segregated out as figures
from the external scene.  If their selected perceptual contents are
found to be the same (perceptually similar), then the RTOR is said to
be a nonverbal representor of the RTEE in those respects that reflect
how their perceptual contents are selected and derived.
Thus, the word "apple" written in red color may be said to be a
nonverbal representor of a red apple with respect to color.  However,
visual features other than just color may be selected for the
perceptual contents of the figures in question, so that the red
written word "apple" would no longer be considered to be perceptually
similar to the red apple.  That is, the word "apple" with this richer
perceptual content selected would not be a nonverbal representor of
the red apple in the visual modality.
Therefore, in a discrimination task, the question of whether a RTOR is
to be a nonverbal representor of a RTEE must be assessed in the light
of what is chosen by the IPS as the perceptual contents to be compared
for perceptual similarity.  On the other hand, an external figure
(RTOR) can be considered as a verbal representor of a given RTEE
without the perceptual contents of the RTOR and the RTEE being matched
for perceptual similarity.  To fully characterize this, recognition
tasks involving LTM must be considered.  We will do this in Section
4.8.
To prepare for the discussion of recognition tasks, note that we have
assumed that stored memory traces are verbatim copies of the
perceptual contents of figures previously experienced.  Therefore, the
operation of perceptual similarity comparison can be performed between
the perceptual content of the current figure and the memory traces of
past figures in the same modality, or even between memory traces in
the same modality.  The comparison of perceptual similarity between
the perceptual content of the current figure and those of past figures
stored as memory traces is the basis of the nonverbal recognition to
be discussed in Section 4.8.
4.7.1  Perceptual similarity and the positional viewing parameter

Each perceptual content is indexed with a positional viewing
parameter, p^m_r, whose role in perceptual similarity will now be
discussed.  As I mentioned in Section 4.6, the positional viewing
parameter provides the viewing context, especially when the perceptual
contents of the figures being considered are rich enough to include
features that are sensitive to the p^m_r's (e.g., size and shape in
vision, loudness of sound in hearing).  The perceptual content of a
given scene X or of a given figure R will depend upon the viewing
position, p^m_r, taken by the IPS.
In perceptual tasks, the perceptual content of the current figure R,
f^m(R, p^m_r), is compared for perceptual similarity either with that
of a figure R', f^m(R', p^m_r'), in discrimination tasks, or with
c^m(R', p^m_r') in recognition tasks.13

13  How the candidate figure R' is found among the search space will
    be discussed in Section 4.8.2.  The difference between
    discrimination and recognition is that in discrimination tasks,
    the perceptual content of the candidate figure R' will be derived
    from the perceptual contents of the current scene (i.e., the
    search space for R' is the current scene); in recognition tasks,
    the search space for R' is in the LTM of the IPS in question.
If the match is not successful, the system might attempt to see if the
discrepancies are due to the difference in the viewing positions,
p^m_r and p^m_r'; if so, a set of internal transformation rules (as
argued in Section 4.6) should be available to adjust the perceptual
contents accordingly, and the matching process is performed again.  On
the other hand, if the perceptual contents can be matched without any
transformations, then we would like to say that the viewing positions,
whether known or not, are equivalent in the egocentric sense.  This
has two consequences:

1.  When one of the positions is not known to the system, the other,
    known one can be used as an estimate.
2.  When both positions are known independently, then they must either
    be equivalent or some equivalence transformation must be found
    which "accounts" for the discrepancy.  Thus, in vision, if two
    figures R and R' have the same perceptual content but with
    different viewing distances, then their perceptual contents
    undergo internal transformations to reflect the relative
    difference in their viewing distances.  The one that is nearer
    will be interpreted as being smaller in size.
As an illustrative example, consider the Muller-Lyer illusion (Figure
11).  The illusion is that, to the human eye, the vertical line on the
left appears to be shorter than the one on the right, even though they
can be verified to be of the same length.  One explanation (Gregory,
1968) is as follows.  The figure on the left is recognized to be the
outside corner of a three-dimensional structure, while the figure on
the right is recognized to be the inside corner.  These recognitions
provide new estimates for the relative viewing distances of the
vertical lines.

Figure 11:  Muller-Lyer illusion.

The vertical line on the left, being part of an outside corner,
appears to protrude to the front of the background, while the one on
the right appears to dip into the background.  Taking these two
vertical lines themselves as figures, their perceptual contents
(length, etc.) are matched (perceptually similar).  However, their
relative length is readjusted due to the new estimates of their
relative viewing distances, through the recognition of the context in
which they are imbedded (see Figure 12).
Figure 12:  Interpretation of the Muller-Lyer illusion.

4.8  RECOGNITION TASK AND LTM
Due to the long time span in many recognition tasks, LTM has to be
consulted concerning the information about the RTEE, which is not
currently accessible, given the perceptual content of the RTOR, which
is currently accessible.

From a theoretical standpoint, a memory consists of data stored
together with their addresses.  In general terms, an address is not
restricted to the physical location where a specific datum is stored;
rather, it refers to the scheme used to retrieve a piece of stored
data.
There is usually more than one way to retrieve a given piece of stored
data, depending on what is known about the data.  Here we wish to
consider two types of retrieval situations that are important to this
discussion:
(a) RETRIEVAL BY KNOWN CONTENT: Here the full or partial content of
the target data is known before retrieval, and the retrieval involves
finding those pieces of stored data that have the same or similar
content.  The retrieval is accomplished through searching and
matching, where searching involves a scheme to step through the memory
(it need not be exhaustive) and to match the content of the data being
accessed.
The kind of memory architecture that allows this searching and
matching to be done in parallel is called content addressable memory
in the computer science literature (Foster (1976)).  For our purpose,
it is not important whether it is done in parallel or in serial.  As
an illustration, we will give a biochemical example.  Transfer-RNAs
are cellular macromolecules that carry out translations of genetic
codes.  Each of them has at least two recognition sites on the same
molecule, one for a specific triplet code on messenger-RNA, the other
for a specific amino-acyl-enzyme complex (Mahler and Cordes (1971),
pp. 970ff).  Each of these two sites can be considered as the basis of
a search for a piece of data and of a match of its content with a
known and specific one.
(b) RETRIEVAL BY ASSOCIATION: Here one piece of data may be associated
with another piece of data without any regard to whether their
contents are matched in any way.  The association allows one piece of
data to be in a retrievable state when the associated data is being
retrieved.  The association can be implemented either

1.  by a physical linkage, such as the case of the two recognition
    sites on the transfer-RNA linked together by being physically
    bound in the same molecule; or

2.  by storing the addressing information at each other's site, such
    as the pointer linkage technique often used in computer science.
In the literature, the terms "associative memory" and "content
addressable memory" are often used as synonyms (Matick (1975), p. 179;
Hill and Peterson (1973), p. 405; Arbib (1972), p. 186; Kohonen
(1977), pp. 161ff).  However, the words "associative" and
"association", when used to refer to memory retrieval, connote a
significantly different aspect from that of content addressability.
Hence we distinguish their usage here as outlined above.  To avoid
confusion, we will use the word "link" in place of "associative" or
"association" in the following discussion.
Now consider a nonverbal recognition task.  Since a nonverbal RTOR is
perceptually similar to its RTEE, one can say that a nonverbal RTOR
specifies the perceptual content of its RTEE.  Note that the
addressing processes of LTM are private internal processes of each
IPS, while the perceptual content reflects the sensory information of
an external public object.  Therefore, one cannot expect that the
perceptual content will contain within itself the addressing
information of the stored memory trace of its RTEE.  It follows that
nonverbal recognition of a nonverbal RTOR involves a content
addressing problem for LTM.  In contrast, a verbal RTOR need bear no
perceptual similarity to its verbal RTEE; the internal information of
its RTEE stored in LTM is not directly retrieved, even through a
content addressing scheme.  For a verbal RTOR, LTM is first accessed
with the perceptual content of the verbal symbol itself, in the same
way that a nonverbal recognition task is performed.
When the stored data in LTM for the verbal symbol is retrieved, the
verbal RTEE is then accessed through link addressing.  Such a link has
to be learned and stored previously for each specific verbal
representation; otherwise the verbal representation cannot be
recognized.
4.8.1  CAR: Content Addressable Retrieval

The concept of content addressable retrieval (CAR) is formalized as
follows:

    Given an IPS and given a perceptual content f^m(R, p^m_r) in
    modality m as retrieval cue, CAR is an internal process which
    causes the active regions of its LTM to be searched and a memory
    trace d^m(R', p^m_r') to be retrieved, whose content,
    c^m(R', p^m_r'), satisfies the perceptual similarity match:

        c^m(R', p^m_r') = f^m(R, p^m_r)
Note the following:

1.  The active region of LTM is the search space delineating a subset
    of LTM in which the memory traces are searched and matched against
    the retrieval cue for perceptual similarity.

2.  The detailed mechanism of the search process is left open in this
    formalization.  The search process may be conducted in serial or
    in parallel.  It may be exhaustive or heuristic.  It may race
    against time and stop, whether it has succeeded or not, when time
    is up.  The crucial point is that the memory trace retrieved (if
    the search is successful) is perceptually similar to the retrieval
    cue.
3.  When the retrieval task is successful, the response, i.e., the
    memory trace retrieved, may not be unique; there may be more than
    one memory trace in the active region that is perceptually similar
    to the retrieval cue.  When time permits, the system may examine
    as many as possible for the perceptual task at hand.
    Alternatively, the first one which responds (assuming all
    responses do not come in at the same time; this is certainly so if
    the search is serial) may be considered.
4.  The retrieval cue may not have to be the perceptual content of the
    current figure under investigation.  It may be synthesized from
    memory traces retrieved in previous but still active perceptual
    tasks.  For example, the retrieval cue for the next memory access
    could be a subset (chosen by the ES) of the memory trace just
    retrieved in the current task.
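To make the CAR formalization and the notes above concrete, here is a
minimal Python sketch (not part of the thesis's own machinery).  The
Trace class and the serial, exhaustive search are illustrative
assumptions; as note 2 states, the actual search mechanism is left
open.

    from dataclasses import dataclass, field

    @dataclass
    class Trace:
        """A stored memory trace d^m(R', p^m_r'); content plays the role of c^m."""
        content: tuple
        links: list = field(default_factory=list)  # used by link retrieval (Section 4.8.3)

    def car(cue, active_region):
        """Content addressable retrieval: search the active region of LTM
        and return every trace whose content matches the retrieval cue
        (per note 3, the response need not be unique)."""
        return [trace for trace in active_region if trace.content == cue]

    # A serial, exhaustive search over a toy active region.
    ltm_active = [Trace(("round", "red", 5)), Trace(("square", "blue", 3))]
    matches = car(("round", "red", 5), ltm_active)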
4.8.2  Nonverbal recognition and nonverbal external representations

When an IPS is confronted with a scene X (in modality m), in order to
make sense of the situation, it must try to recognize familiar figures
from the scene.  To (nonverbally) recognize a figure R in scene X is
to say that the IPS in question had some previous perceptual
experience of R (or of some R' that is perceptually similar to R)
stored as memory trace(s), and that the retrieval of the memory
trace(s) of R or R' is successful upon being given the current
perceptual content of R.  To do this, the IPS must first segregate R
from the scene X and derive the perceptual content of R,
f^m(R, p^m_r), by selecting an appropriate subset from the perceptual
content of X.  This is performed according to a set of rules as
outlined in Section 4.5.1.
It is worth emphasizing here that these rules do not require that the
figure R be first recognized.  The recognition task then proceeds by
posing the perceptual content of R as retrieval cue, and the memory
traces in the active region of LTM are accessed in the content
addressable manner.  Often the memory trace that corresponds to R may
be retrieved before R is segregated and recognized, by means of the
link retrieval scheme (Section 4.8.3).  When this is the case, the
segregation of the figure R may be aided by the memory so retrieved
(see Section 4.5.1).  Recognition of R can then be confirmed after
segregation.  In either case, the figure R is said to be nonverbally
recognized by the IPS in question if one (or more) memory traces
d^m(R', p^m_r') in LTM, whose content c^m(R', p^m_r') is perceptually
similar to the perceptual content of the current figure R,
f^m(R, p^m_r), is retrieved.
The content of the memory trace retrieved, c^m(R', p^m_r'),
corresponds to a past perceptual experience (by the IPS) of some
external figure R'.  Thus, the triple (R, R', IPS) is said to be a
nonverbal external representation in the recognition sense.
4.8.3  LR: Link Retrieval

In Section 4.8.1, we have discussed the concept of the content
addressable retrieval scheme, whereby a particular piece of memory
trace stored in LTM can be retrieved through an internal process of
searching and matching.  Now, suppose that the specific information
(called addressing information) about where to find a particular piece
of memory trace can be expressed in a way that is meaningful to the
retrieval system.
Then, if appropriate addressing information is available before
retrieval, the system can locate and retrieve a particular piece of
memory trace directly, without having to go through the process of
searching and matching.  The mechanism by which addressing information
is expressed or implemented is different for different retrieval
systems.  For example:
1.  The target data can be directly reached by following a specific
    piece of wire or path.  Thus, a semantic network drawn on a piece
    of paper, coupled with a human vision system as the retrieval
    system, is an instance of such an implementation; a node can be
    accessed from another node by visually following the drawn line
    that links the two nodes.  Neurons with outgoing axons that make
    contact with other neurons seem to be good candidate components
    with which to implement another such system.

2.  The location of each piece of stored data can be expressed by some
    unique representation called an address (also known as a pointer
    in some computer language systems).  The address is meaningful to
    the retrieval system in such a way that knowing the address is
    equivalent to knowing the location of the target data.  Hence, the
    address is all the system needs in order to access the content of
    the data stored at that location.
No matter how the addressing information is expressed or implemented,
it has the following characteristic that concerns us in the context of
this discussion: since the addressing information specifies the
location rather than the content of the target data or memory trace
stored, the target can be retrieved directly, without searching and
without requiring its content to be matched for perceptual similarity
to anything.  This is in direct contrast to the CAR scheme.
The way to use the addressing information to augment content
addressing retrieval is to use it to connect two specific pieces of
memory trace together so as to reflect a retrieval relationship which
has nothing to do with any perceptual similarity between the contents
of the two.  Thus, to formalize, let the link

    link(d^m(R, p^m_r))

denote the specific addressing information leading to the particular
memory trace d^m(R, p^m_r) expressed as the argument of the link.  The
link is meaningful to the link retrieval (LR) process of the IPS in
question in the sense that when the LR process is applied to the link,
the memory trace linked will be retrieved.  That is,

    LR(link(d^m(R, p^m_r))) = d^m(R, p^m_r)
We say that a link exists from a source memory trace d^m'(R', p^m'_r')
to a target memory trace d^m(R, p^m_r) if the addressing information
of the target memory trace, link(d^m(R, p^m_r)), is stored as part of
the source memory trace.  Note the following:

1.  Since link retrieval does not require a perceptual similarity
    match between the linked memory traces, the modality m need not be
    equal to m'.

2.  The links, as formalized above, are unidirectional.  A
    bidirectional link can be expressed by two links,
    link(d^m(R, p^m_r)) stored at d^m'(R', p^m'_r') and
    link(d^m'(R', p^m'_r')) stored at d^m(R, p^m_r).
3.  Each stored memory trace d^m(R, p^m_r) has two parts:

    - a perceptual content, c^m(R, p^m_r), corresponding to some past
      perceptual experience of an external figure R;

    - a finite set of links, {link1, link2, ...}, each a piece of
      specific addressing information leading to some other memory
      trace in LTM.
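The two-part trace structure and the LR operation might be sketched as
follows (a hypothetical illustration in which ordinary Python object
references stand in for addressing information):

    from dataclasses import dataclass, field

    @dataclass
    class Trace:
        content: tuple                             # c^m(R, p^m_r)
        links: list = field(default_factory=list)  # {link1, link2, ...}

    def lr(link):
        """Link retrieval: the link is itself the addressing information,
        so the target trace is reached directly, with no searching and no
        content matching."""
        return link

    # A unidirectional link is stored as part of the source trace.
    target = Trace(("round", "red", 5))                 # trace of the apple itself
    source = Trace(("written word 'apple'",), links=[target])
    assert lr(source.links[0]) is target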
4.8.4  Verbal recognition and verbal external representation

The concept of verbal recognition can be formalized as follows.  We
say that an external figure R is recognized as verbally representing
another external figure R' by the IPS in question if the following
events take place:

1.  the figure R is nonverbally recognized by the given IPS; that is,
    the figure R is perceived and segregated in modality m with
    resulting perceptual content f^m(R, p^m_r), and a memory trace,
    d^m(R'', p^m_r''), whose content is perceptually similar to
    f^m(R, p^m_r), is retrieved in a nonverbal recognition manner.

2.  there exists a link stored in d^m(R'', p^m_r'') leading to the
    memory trace d^m'(R', p^m'_r') corresponding to the perceptual
    experience of the external figure R'.  The memory trace
    d^m'(R', p^m'_r') is retrieved through the link.
Note the following:

1.  R and R'' must be perceptually similar, while R and R' need not
    be.

2.  The specific link involved is not intrinsic and must be learned.
    Since memory traces record the particular perceptual experiences
    of each IPS as an individual, their contents and stored locations
    would be different as different IPS's encounter different external
    scenes.  Therefore, the link between two particular memory traces
    can not be intrinsic and must be established through learning.  It
    must be established at a time when the addressing information of
    the target memory trace is known to the system.  Since the
    addressing information of a memory trace is known to the system
    when it is being retrieved, one such time during which a specific
    link may be installed is when both R and R' are present in the
    current scene, triggering the retrieval of their memory traces.
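Combining content addressable retrieval and link retrieval, verbal
recognition as formalized above can be outlined as follows.  This is
again a hypothetical sketch, with a (content, links) pair standing in
for a stored memory trace:

    def verbally_recognize(cue, ltm_active):
        """cue: perceptual content f^m(R, p^m_r) of the verbal symbol R.
        ltm_active: list of (content, links) pairs; links hold direct
        references (addressing information) to other pairs.
        Returns the trace of the verbal RTEE R', or None."""
        # Event 1: content addressable retrieval of the symbol's own trace.
        matches = [t for t in ltm_active if t[0] == cue]
        if not matches:
            return None            # the symbol itself is unfamiliar
        content, links = matches[0]  # d^m(R'', p^m_r'')
        # Event 2: link retrieval of the RTEE's trace through a learned link.
        return links[0] if links else None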
The concepts formalized in this chapter will be further illustrated
and discussed in the next chapter with the use of a hypothetical
example in computer vision.
Chapter V

AN EXAMPLE IN COMPUTER VISION

5.1  IPS.SIMPLE

In this chapter we will describe a simple and hypothetical machine,
which shall be named IPS.SIMPLE.  It will be used as a consolidated
example in vision to illustrate the major concepts developed in the
last chapter and to serve as a vehicle for our discussion of various
issues concerning machine perception and knowledge representation.
5.2  PERCEPTUAL SUBSYSTEM OF IPS.SIMPLE

The sensory driven perceptual subsystem (SDPS) of IPS.SIMPLE consists
of an array of light transducers, a set of feature and object
processors, and a set of sensory positioners.  They will be described
in detail in the following sections.
5.2.1  Transducers, sensory positioners and external scenes

The light transducers serve as a window through which external stimuli
(light patterns) will be converted into internal codes to be processed
further.  This set, called T, of transducers consists of 81 elements
arranged in a 9x9 rectangular array behind a properly focused lens
assembly.  This array T, also called the retina, together with the
lens system, will be referred to as the eye/camera, which can be moved
by the sensory positioner as a whole.  Due to its 2-dimensional
rectangular arrangement, the individual transducers in the array can
be enumerated and labeled as T_ij, that is,

    T = { T_ij | i, j are integers ranging from -4 to 4 }.

We will omit the superscript m since we have only one modality for
IPS.SIMPLE.
Each transducer T_ij is of the simple thresholding type, outputting a
binary code (i.e., the output t_ij is in the set {1, 0}) depending on
whether or not the total amount of local light energy impinging on
T_ij, sampled during a small time interval s_T (for scan time length),
exceeds a threshold.  It is assumed that all T_ij in T have the same
level of threshold, which will stay constant over time.  Hence, as
time progresses, a train of such codes will be output by each T_ij,
with each code lasting at least s_T time units long.
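A minimal Python sketch of one scan of this transducer array follows.
The threshold value and the light-energy samples are made-up
illustrations, and the array indices run 0..8 rather than -4..4.

    THRESHOLD = 0.5   # the common threshold level shared by all T_ij

    def scan(light_energy):
        """One scan of the 9x9 retina: t_ij is 1 iff the local light
        energy sampled over one scan interval s_T exceeds the threshold."""
        return [[1 if light_energy[i][j] > THRESHOLD else 0
                 for j in range(9)]
                for i in range(9)]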
Therefore, normally, each t_ij should also be indexed by time in
increments of s_T.  However, if the part of the external scene being
viewed is static, each t_ij will have a stable output (after an
initial transient period) at each viewing position.  Hence, in this
case, the t_ij can be indexed by the viewing positions instead.  When
part of the scene moves, the retinal image of the moving object can be
stabilized by using the sensory positioner to track the moving object.
Therefore, temporal indexing with the viewing position is involved in
the case of tracking.  However, these aspects of indexing will not be
explicitly pursued here.  Instead, they would be implicitly imbedded
in the internal representation, as will be discussed later.
Using the human visual system as a model, we observe that there are
many ways in which different viewing positions can be brought about.
For example, the viewing position can be changed by:

1.  moving the head/body (or eye/camera for IPS.SIMPLE) in the x or y
    direction,14 resulting in a translational change of the retinal
    image of the object viewed;

2.  moving the head/body in the z direction, resulting in a change of
    the size, hence resolution, of the retinal image of the object
    viewed; (In a camera system with a variable focal length lens
    system, a similar effect can be brought about by zooming the lens
    in or out.)

3.  panning around the y axis;

4.  tilting around the x axis;

5.  swinging around the z axis;

6.  coordinated converging movements of both eyes for stereoscopic
    vision.

See Sobel (1974) for a discussion of a computer controlled camera
system in computer vision that incorporates and deals with most of the
above features.
To keep our example simple for illustrative purposes, let us assume
that the eye/camera has already been positioned at the proper distance
in the z direction from the scene to be viewed, with proper focusing
adjusted.  The only ways being considered for the eye/camera to be
moved by the sensory positioner are on the x-y plane parallel to the
scene.  Furthermore, the amount of movement in any of the four allowed
directions (i.e., +x, -x, +y, -y) is discretized and coordinated so
that the resultant image on the retina will move an integral number of
transducer cell distances.  Therefore, with these simplifying
assumptions, the external scene can be thought of as 2-dimensional
tape squares, such as in Figure 13.  The transducer array acts as 9x9
tape heads which read a block of 9x9 tape squares at each viewing
position.  Hence, at the viewing position marked by a rectangle in
Figure 13, the transducer array outputs would look like Figure 14(a).
Let us call this viewing position P1 for future reference.  We will
discuss how to represent these viewing positions internally in a later
section.  Because of the 2-dimensional nature of the external scene
which IPS.SIMPLE sees, the word "figure" will often be used
interchangeably with the word "object" used in previous chapters.

14  The axis of the lens pointing toward the retina is the positive z
    axis; x and y are axes perpendicular to the z axis.  The x axis
    points from left to right and the y axis points from bottom to top
    with regard to the spatial orientation of the eye/camera system.
Note: the code "*" is used instead of the code "1" to make the
patterns more obvious for the human readers.

Figure 13:  External scene of the IPS.SIMPLE world as 2-D tape
squares.  The region marked by a rectangle is the first viewing
position P1 (see text).
Figure 14:  (a) An example of transducer array outputs.  (b) An
example of local feature extractor array outputs, WRPC_1.

5.2.2  Local feature extractors
The next stage of internal processing after transduction is conducted
by a set of 49 local feature extractors arranged in a 7x7 rectangular
array:

    F = { F_ij | i, j are integers ranging from -3 to 3 }

Each F_ij looks at the outputs of a 3x3 local block of transducers
centered at the retinal position (i, j).  Thus, more specifically, the
inputs for F_ij are the outputs of 9 transducers:

    { T_{i-1,j+1}, T_{i,j+1}, T_{i+1,j+1},
      T_{i-1,j},   T_{i,j},   T_{i+1,j},
      T_{i-1,j-1}, T_{i,j-1}, T_{i+1,j-1} }

Note that to avoid the problem of partial inputs at the boundary of
the retina, the size of this local feature extractor array is 2 rows
and 2 columns less than that of the transducer array.
The output response of each F_ij is a 4-character "descriptive code"
corresponding to the detection of the various 3x3 binary picture
patterns at its inputs.  There are 2^9, or 512, different patterns for
a 3x3 block; not all of them are detected and coded by F_ij.  The
input-output response behavior of each F_ij is described as follows:

1.  The total of 512 possible patterns is divided into 2 categories,
    positive patterns and negative patterns, depending on whether the
    output of the center transducer at (i, j) is 1 or 0.  Those 256,
    or half of 512, patterns with T_ij of 1 are called positive
    patterns.  78 of these positive patterns are selectively detected
    and coded with specific output response codes by F_ij.  These 78
    input-output responses are listed in Figure 15.  The remaining 178
    of the 256 positive input patterns are given the output code
    '???+'.  Note that the 4th character of the output response code
    is always '+', indicating a positive pattern is detected.

2.  The other half of the 512 input patterns, with center T_ij
    outputting 0, are called negative patterns.  78 of these negative
    patterns are complementary to those given in Figure 15.  They are
    detected and coded by F_ij with the same first 3 characters as
    their corresponding complementary positive patterns.  The 4th
    character of the code is '-' for negative patterns.  See Figure 16
    for an example.  The remaining 178 are detected and coded as
    '???-'.
Thus, Figure 14(b) gives the collected outputs of the 7x7 F array when
the transducer array outputs in Figure 14(a) are given as their
inputs.
Figure 15:  (Continued.)  A subset of input-output responses of a
local feature extractor, including: (6) local X-intersection
detectors; (7) a local isolated spot detector; (8) a local uniform
intensity detector; (9) local possible-line detectors; and (10) local
possible-X-intersection detectors.

Figure 16:  An example of a negative pattern and its feature extractor
output (the pattern 111 / 000 / 111 is coded LN1-).
Now we pause to comment on the following points:

1.  It is assumed that the outputs of the 3x3 block of transducers are
    sampled by each F_ij at the same time, and that this sampling is
    coordinated with the scan time interval s_T of the transducer
    array so that valid outputs of the transducers will be taken as
    inputs.15  Thus, like the case of the transducer outputs, the
    output of each F_ij should also be indexed by time increments of
    s_F (which is a combination of s_T and the F_ij response time).
    And again, when the external scene being viewed is static, the
    f_ij can be indexed by the viewing position instead.  Note that
    there are two types of positions involved here.  One is each
    retinal position, explicitly expressed by the index (i, j) or
    implicitly expressed in a matrix form such as in Figure 14(a) and
    Figure 14(b).  The other is the position of the eye/camera as a
    whole, movable by the sensory positioner.  We shall call the
    former the retinal position and the latter the eye/camera position
    or the viewing position.  It is the issue of indexing each f_ij
    with the corresponding eye/camera position that is being discussed
    here.  We chose not to index each f_ij explicitly but instead will
    discuss a scheme to imbed such positional information in a later
    section.  This is because an external scene in general is not
    clearly marked in some form of (x,y) or (x,y,z) coordinates;
    rather, the viewing position is derived perceptually from the
    relative positions of objects recognized in the scene.

15  See Arbib (1972, p. 50) and Ullman (1981) for discussions of cases
    where delay elements are added for motion detection and motion
    feature (e.g., direction) extraction.
2.  One can think of the input-output response of each F_ij as a
    function which divides the 512 possible input patterns into 124
    (or however many) equivalence classes, each class labeled by one
    of the 124 distinctive 4-character codes.  These 4-character codes
    are simply a somewhat arbitrary and convenient way to
    distinctively symbolize these different equivalence classes of
    inputs recognized.  Any coding scheme that can serve this labeling
    purpose and is amenable to internal processing by the IPS in
    question would do just as well.
3.  One can carry the feature extraction process one step finer by
    thinking of the detection of each of these equivalence classes as
    being carried out by one feature extractor.  For example, there is
    a 'LN1+' feature extractor for the equivalence class labeled by
    'LN1+' at the (i,j) retinal position, whose job is to detect the
    presence of the input pattern whose output code is LN1+.  Thus,
    each F_ij is actually a collection of these feature extractors at
    this finer level, each outputting a binary code: 1 if the pattern
    is detected as present, 0 if absent.  It is because of the mutual
    exclusiveness of the occurrence and the detection of these
    patterns that we can use a scheme of pooling and coding them into
    a set of 4-character codes instead of 124 (or however many)
    dimensions of binary numbers (only one of which will be 1, namely
    the one corresponding to the 4-character code produced; the rest
    will be 0 for each scan).  However, one should note that this is
    so not because of the nature of the signal of the external scene
    but because one chooses to design it this way.  Thus, if one wants
    to extend this concept and use a continuous probability function
    or a continuous membership function (in the fuzzy set theoretic
    sense) for each 3x3 input pattern occurring, then 124 (or however
    many) dimensions must be used.  This is because many of these
    feature detectors so designed may output a nonzero and different
    probability or membership value; i.e., it is no longer a situation
    of all 0's but one 1.
5.2.3  SRPC processors and figure processors

The array of F_ij outputs as illustrated in Figure 14(b) is the set of
local feature values collected over the entire retina, unsegregated,
as a whole; hence, it shall be called the whole retinal perceptual
content, or WRPC, of the external scene as viewed at a given
eye/camera position by IPS.SIMPLE.  The WRPC corresponds to the
concept formalized as f^m(X, p^m_r) in the last chapter.
Our goal is toward defining a perceptual content corresponding to a
part of the external scene recognized as an external object or RTOR.
This definition should be flexible so that it can be applied to any
part of the external scene regardless of the size of its retinal
image.  (For a given fixed size of retina, one cannot guarantee that
the retinal image of any external figure of interest will always be
smaller than the retinal size when viewed at a given distance.)
Furthermore, our perceptual experience tells us that an external scene
is more understandable if it can be segregated into figures or
objects, the recognition of which can be pursued either separately or
jointly as parts of a global context, or both; our conscious internal
knowledge about our world is object-centered.
This segregation and recognition process is often likened to the
chicken-and-egg problem (Mackworth (1977), Havens (1978)).  In order
to segregate the scene meaningfully, i.e., in a way consistent with
the internal model of the world, at least some of the figures must be
recognized so that the interpretation of their role in the scene can
be asserted.  On the other hand, in order to recognize any figure, it
needs to be properly segregated from the scene first.  Moreover, in a
set-up like the eye/camera, a finite-sized retina, no matter how
large, can only be aimed at a limited part of the external scene at
any given time.  In order to expand the range, our approach, modeled
after human vision, is to provide mobility to the eye/camera with a
set of sensory positioners so that different parts of the scene can be
visited and scanned.  This set of sensory positioners was briefly
mentioned in the last section and is assumed to have limited freedom
of movement for illustrative purposes.  This moving and scanning
process makes an ever-present problem even more acute.  That is,
sensory data come in a piece-meal manner, e.g., one WRPC per scanning
stop for IPS.SIMPLE.  They must be accumulated and integrated, in
close coordination with the sensory positioners, into a sort of
cognitive map16 more useful for perception.
Our approach to these problems is to propose a finite set of 2K+1
processors, or SRPC processors:

    G = { G_k | k is an integer ranging from -K to K }.

Each of these processors can be viewed either as a reentrant
processing routine which a central computer executes on demand or as a
separate physical processor.  Although there is an important physical
and philosophical distinction between these two views, we don't push
that distinction critically in our discussion here.

16  Cognitive maps are internal representations of the spatial layout
    of one's environment (Anderson (1980), p. 86).
In order to see how these SRPC processors work and why we propose
them, let us examine the situation more closely.

5.2.3.1  Segregation of WRPC into SRPCs

Consider the whole retinal perceptual content WRPC_1 in Figure 14(b).
In order to apply the intuitively powerful principle of
divide-and-conquer, a set of preliminary segregating algorithms, such
as those based on edge information, region growing, or connectedness,
can be put to work at this level.  To illustrate, let us consider
using the connectedness criterion for segregation for IPS.SIMPLE.
For rectangular grid arrays like T_ij or F_ij, there are two types of
connectedness, called 4-connectedness and 8-connectedness, stemming
from the fact that each cell in the grid has two kinds of neighbors:
those 4 sharing a common edge with it and those 4 sharing only one
common vertex with it.  Thus, two adjacent cells in a grid having the
same or similar values are said to be 4-connected if they share a
common edge.  They are said to be 8-connected if they share at least
one common vertex.  Two cells, (i_1, j_1) and (i_n, j_n), are said to
be connected by a 4- (or 8-) path if there is a sequence of cells,
{(i_1, j_1), (i_2, j_2), ..., (i_n, j_n)}, such that (i_m, j_m) is 4-
(or 8-) connected to (i_{m+1}, j_{m+1}), for m ranging from 1 to n-1.
A collection of cells is said to be a 4- (or 8-) connected region if
any two cells in the region are connected by a 4- (or 8-) path.  Thus,
4-connectedness or 8-connectedness serves to segregate a rectangular
grid array into regions.  Note that a segregation process like this is
a subsetting operation; the segregated regions are subsets of the
whole.  We will use the words 'region' and 'subset' interchangeably in
this context.
However, there is a well documented dilemma (Pavlidis (1977, p. 57),
Rosenfeld and Kak (1976, p. 335), Duda and Hart (1973, p. 284)).  If
we use 4-connectedness on a picture like Figure 14(a), the figure 'A'
would be segregated into a bottom part shaped like 'H' and three
isolated dots at the top, without granting connectedness to the region
inside 'A' and outside.  If, on the other hand, we use
8-connectedness, then the figure 'A' is segregated out as a connected
region; at the same time, the region of 0's inside 'A' is also
connected to the region of 0's outside.  A solution to this dilemma
provided by the literature cited above is to apply 8-connectedness to
segregate regions of 1's and 4-connectedness to segregate regions of
0's, or vice versa.  In the following discussions, we assume that
IPS.SIMPLE uses 8-connectedness to segregate positive patterns as
figures in WRPCs and 4-connectedness to segregate negative patterns as
background.
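As a rough sketch of this segregation step (an illustration, not the
thesis's algorithm), the following Python function flood-fills
8-connected regions of 1's; 4-connected segregation of the 0
background would use only the first four neighbor offsets.

    def connected_regions(grid, connectivity=8):
        """Segregate a binary array into connected regions of 1's."""
        nbrs = [(-1, 0), (1, 0), (0, -1), (0, 1)]         # 4-connected neighbors
        if connectivity == 8:
            nbrs += [(-1, -1), (-1, 1), (1, -1), (1, 1)]  # add vertex neighbors
        rows, cols, seen, regions = len(grid), len(grid[0]), set(), []
        for i in range(rows):
            for j in range(cols):
                if grid[i][j] == 1 and (i, j) not in seen:
                    stack, region = [(i, j)], set()
                    while stack:                          # flood fill one region
                        ci, cj = stack.pop()
                        if (ci, cj) in seen:
                            continue
                        seen.add((ci, cj))
                        region.add((ci, cj))
                        stack.extend((ci + di, cj + dj) for di, dj in nbrs
                                     if 0 <= ci + di < rows and 0 <= cj + dj < cols
                                     and grid[ci + di][cj + dj] == 1)
                    regions.append(region)
        return regions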
We note, but will not elaborate, that it is within the choices and the
power of IPS.SIMPLE to switch between the two opposite choices, and
that its knowledge of which connectedness is applied to which would be
used by it for the analysis of the result of segregation.  We also
note in passing that the ability to consciously apply different
segregation algorithms in the perception of a scene seems to provide a
basis for an explanation of the well-known phenomenon of figure-ground
reversal.
Thus, using the connectedness criterion discussed above, WRPC_1 can be
segregated into 4 subsets: two positive subsets, SRPC+1 and SRPC+2,
and two negative subsets, SRPC-1 and SRPC-2, as in Figure 17(a)-(d).
SRPC stands for segregated retinal perceptual content, from which the
SRPC processors derive their names.  SRPC corresponds to the concept
formalized as f^m(R, p^m_r) in the last chapter.
5.2.3.2  Moving to a new viewing position

Since we have assumed that the sensory positioner can only move the
eye/camera in discrete steps in the x-y plane, let us express the
relative eye/camera movement from one viewing position to the next as
(delta.i, delta.j), where delta.i and delta.j are the components of
the relative movement of the eye/camera on the x and y axes, positive
if they are toward the positive direction of the x and y axes (i.e.,
right and up respectively).  Using this expression, let us assume that
the sensory positioners move the eye/camera from the current position,
P1, with (delta.i, delta.j) = (3, 0), to a new viewing position, that
is, 3 columns to its right.  Let us call this new viewing position P2
for future reference.  The transducer outputs at this new viewing
position are given in Figure 18(a), and the feature extractor array
outputs are WRPC_2, as given in Figure 18(b).  The same segregating
processing is applied to WRPC_2 and segregates it into 5 SRPCs, given
in Figure 19(a)-(e) as SRPC+3, SRPC+4, SRPC-3, SRPC-4, SRPC-5.
Figure 17:  SRPCs segregated from WRPC_1: (a) SRPC+1; (b) SRPC+2;
(c) SRPC-1; (d) SRPC-2.
Since IPS.SIMPLE made this scanning movement, it should be able to use
this information to correlate the results of the two scans; i.e.,
SRPC+3 is the right part of SRPC+1, SRPC+2 was the left part of
SRPC+4, and so on.17

17  Furthermore, it should also be able to check any changes not due
    to the scanning movement itself, i.e., motion detection.
Figure 18:  (a) Transducer array outputs at the new viewing position
(see text).  (b) Corresponding local feature extractor array outputs,
WRPC_2.
To achieve this goal, the SRPC processors are proposed to perform,
among other things, this bookkeeping role from scan to scan.  This
works as follows.  After WRPC_1 from viewing position P1 is segregated
into 4 subsets, each subset is assigned to an SRPC processor from a
pool of available SRPC processors.  Let us suppose that this is the
very first scan in IPS.SIMPLE's life and all SRPC processors are
available.18  Let G_0 be the one responsible for carrying out the
preliminary segregation process and for assigning SRPCs to the other
processors.  Because of this role, G_0 will be referred to as the
dispatcher.  Let the following assignments be made:

    G+1 to SRPC+1
    G+2 to SRPC+2
    G-1 to SRPC-1
    G-2 to SRPC-2

18  This situation should be very rare.  In general, there will always
    be some SRPC processors already assigned and active from previous
    scans, even when the scan movement is large.
Figure 19:  SRPCs segregated from WRPC_2: (a) SRPC+3; (b) SRPC+4;
(c) SRPC-3; (d) SRPC-4; (e) SRPC-5.
These selected processors with assigned SRPCs are said to be active.
The F_ij which is part of an SRPC assigned to a G_k is said to have
been attached to that G_k.  The segregation process strives to attach
all elements in F at each viewing position.  Each active G_k stores
the assigned SRPC in its memory before the outputs of the F array
change when the eye/camera moves to a new viewing position.  When the
output of an F_ij changes, it seeks a new attachment.

The collection of the memories of all active processors can be called
the short-term sensory store, or STSS, mentioned in the last chapter.
5.2.3.3  Perceptual size and egocentric position parameters of the SRPC

Each active G_k computes and updates two perceptual parameters for its
assigned SRPC:

1.  The retinal image size of the SRPC: (size.i, size.j), where

        size.i = i_max - i_min + 1
        size.j = j_max - j_min + 1

    and

        i_max = MAX{ i | F_ij in SRPC }
        i_min = MIN{ i | F_ij in SRPC }
        j_max = MAX{ j | F_ij in SRPC }
        j_min = MIN{ j | F_ij in SRPC }

    Hence (size.i, size.j) is the size of the smallest rectangle that
    contains the SRPC.  (Other shapes, e.g., circles and ellipses, can
    also be used, either instead or jointly, providing different
    assessments of the overall geometric shape of the SRPC.)  For
    example, the size of SRPC+1 is (5,6).
2.  The egocentric position of the SRPC: (ego.i, ego.j), where

        ego.i = -(nearest integer of (size.i - 1)/2 + i_min)
        ego.j = -(nearest integer of (size.j - 1)/2 + j_min)

    Hence (-ego.i, -ego.j) is the retinal position of the center of
    the rectangle mentioned above.  Or, to put it more generally,
    (-ego.i, -ego.j) is the amount of relative movement by which the
    eye/camera has to be moved from its current position to a viewing
    position so as to place the SRPC at the center of the retina.  For
    example, the egocentric position of SRPC+1 is (1,-1).
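These two computations can be sketched directly from the formulas
above (an illustrative Python rendering; retinal indices as in the
thesis, from -3 to 3, and size // 2 plays the role of "nearest integer
of (size - 1)/2", rounding halves up):

    def size_and_ego(cells):
        """cells: set of retinal positions (i, j) of the F_ij in the SRPC."""
        i_vals = [i for i, _ in cells]
        j_vals = [j for _, j in cells]
        size_i = max(i_vals) - min(i_vals) + 1
        size_j = max(j_vals) - min(j_vals) + 1
        ego = (-(size_i // 2 + min(i_vals)),    # ego.i
               -(size_j // 2 + min(j_vals)))    # ego.j
        return (size_i, size_j), ego

    # e.g., an SRPC spanning i in [-3, 1] and j in [-2, 3] has size
    # (5, 6) and egocentric position (1, -1), as for SRPC+1 above.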
The SRPC, its size, and its egocentric position are called the
perceptual values of the content of each active G_k.  See Figure 20
for a summary of the perceptual values of each active G_k at the
viewing position P1.  Note that the egocentric position is the
positional relationship between the part of the external figure being
viewed and the viewer (i.e., the eye/camera of IPS.SIMPLE).  When the
viewer moves relative to the figure, the egocentric position will
change.  Later, when we introduce the concept of the nonegocentric
positional relationship, we will show that egocentric position is a
special case of nonegocentric relationship.  Note also that if
movement in the z direction were allowed, the perceptual size
parameter would change with distance.  Hence, the computation of both
parameters could lead to a perceptual algorithm for translational and
size invariance.
Each processor in the system is identified by a unique address, which
can be used to establish various links between these processors.  In
the following discussions, we symbolize the unique address by the name
(i.e., 'G_k') of the processor itself.  These links not only signify
the relationship between processors but also allow messages to flow
between them.  One actual use of such addresses is in the
establishment of the nonegocentric positional relationships described
in the following section.
5.2.3.4  Nonegocentric positional relationship between SRPCs

Each active G_k compiles and keeps a list of nonegocentric positional
relationship pointers, or links, to other active processors.  These
relationship pointers take the form of (type, addresses, values),
where:
1.  type: a label indicating the type of the relationship.  In the
    case of the nonegocentric positional relationship, we give it the
    label 'NONEGO'.

2.  addresses: the addressing information of the source and target
    processors involved in the relationship.  Since the pointer of the
    relationship is always stored in the source processor, only the
    address of the target processor needs to be explicitly expressed
    most of the time.

3.  values: specific perceptual information pertaining to the
    relationship.  In the case of a nonegocentric positional
    relationship, there are two kinds of values.  One indicates
    whether the two SRPCs are connected or not, by a label from the
    set {CONNECTED, NOT-CONNECTED, .}, where '.' means the status of
    connectedness is not known at the time.  The other value gives the
    information about how far apart the two SRPCs are, in terms of the
    amount of relative movement that the eye/camera needs to be moved
    from the egocentric position of the source SRPC to that of the
    target SRPC.  It is of the form (nonego.i, nonego.j), where

        nonego.i = ego.i_source - ego.i_target
        nonego.j = ego.j_source - ego.j_target

For example, after viewing position P1, the list of nonegocentric
relationships of G+1 with the other active processors consists of
three pointers:

    (NONEGO, (G+1, G+2), (NOT-CONNECTED, (4, 0)))
    (NONEGO, (G+1, G-1), (NOT-CONNECTED, (1, -1)))
    (NONEGO, (G+1, G-2), (NOT-CONNECTED, (0, 0)))

Note that since all SRPCs are viewed at this single viewing position,
(delta.i, delta.j) = (0, 0) for the above calculations.  See Figure 20
for a summary of the nonegocentric relationship pointers established
at the first scanning position.
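A sketch of how a source processor might compute one such pointer
(processor names standing in for addresses, as the thesis does; the
function name is invented for the example):

    def nonego_pointer(src, src_ego, tgt, tgt_ego, connected="NOT-CONNECTED"):
        """Build the (type, addresses, values) pointer stored at the source."""
        value = (src_ego[0] - tgt_ego[0], src_ego[1] - tgt_ego[1])
        return ("NONEGO", (src, tgt), (connected, value))

    # Reproducing the third pointer above: by the formula, G+1 and G-2
    # share the egocentric position (1, -1) at P1, giving movement (0, 0).
    print(nonego_pointer("G+1", (1, -1), "G-2", (1, -1)))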
Not every NONEGO link between every possible pair of active processors
needs to be implemented.  As long as all active processors are linked
by at least one unbroken path of NONEGO links (in one direction), the
rest of all possible links can be inferred from a chain of properly
chosen links.  But this is an issue of the trade-off between
processing time and storage space.  While storage space can be saved
by not storing redundant links, processing time must be spent in
inferring them when needed.
The egocentric position parameter introduced earlier can be considered
as a special case of the NONEGO link, provided we assign a special
positional status to the dispatcher G_0.  Let G_0 be assigned the
position status of the center of the retina, (0,0).  That assignment
will never be changed.  That is, G_0 always acts as a viewer-centered
reference point.  The egocentric position of each other active SRPC
processor is actually a positional relationship between that SRPC
processor and G_0.  Thus the egocentric positional parameter should be
stored in each G_k (k not equal to 0) in the following form:

    (EGO, (G_k, G_0), (NOT-CONNECTED, (ego.i, ego.j)))

with (ego.i, ego.j) initially computed as mentioned before.  It is
NOT-CONNECTED unless the figure associated with G_k is perceived as
part of IPS.SIMPLE itself.  From this point on, we shall regard the
egocentric positional parameter in each G_k (k not 0) as a link with
an implied target node (G_0).  The EGO links remain a separate type
from the NONEGO links because of their special status.

For each of these EGO links, there is a reverse link stored in G_0:

    (EGO, (G_0, G_k), (NOT-CONNECTED, (-ego.i, -ego.j)))

This list of EGO links in G_0 helps to keep it informed of how many
active processors are in the system at any given time.
When the eye/camera is moved by an amount of (delta.i, delta.j) to a
new viewing position, the above links must be updated to:

    (EGO, (G_k, G_0), (NOT-CONNECTED, (delta.i + ego.i, delta.j + ego.j)))
    (EGO, (G_0, G_k), (NOT-CONNECTED, (-delta.i - ego.i, -delta.j - ego.j)))

We can let G_0 be responsible for keeping track of the eye/camera
movement by communicating with the sensory positioner and initiating
the update process in each active G_k by sending the information
(delta.i, delta.j) to each G_k through its EGO links.
5.2.3.6  Updating stored SRPCs after moving to a new viewing position

Let us consider what happens when the eye/camera moves an amount of
(+3,0) to the next scanning position, P2, producing WRPC_2 at that
position (Figure 18(b)).  With the knowledge of the amount of
eye/camera movement, it would be straightforward for each active SRPC
processor to decide which of the following cases happened:
1.  case (a): the figure corresponding to its assigned SRPC is being
    shifted out of the retina by the scanning movement.

2.  case (b): a new part of a figure is being shifted into the retina.

3.  case (c): both case (a) and case (b), when the perceptual size of
    the figure is large in the direction of the scanning movement.

These three cases will be discussed in more detail below.
An example of case (a) is illustrated by SRPC+1 (in Figure 17(a))
before the move and SRPC+3 (in Figure 19(a)) after the move.  In this
case, knowing the direction and the amount of eye/camera movement, G+1
would anticipate that its SRPC is going to be shifted partially out of
the retina.  This means that if nothing changes on the part of the
external figure whose perceptual content was SRPC+1, then the new
perceptual content of that figure at the new viewing position would be
a subset of SRPC+1.  Thus, G+1 would update the egocentric positional
parameter from (1, -1) to (4, -1), which is the new egocentric
position expected of SRPC+1 after the movement of (+3, 0).  There is
no need to change the actual stored values of SRPC+1 if it is found
that the part of SRPC+1 still in the retina after the move (presumably
SRPC+3) agrees with the old information in SRPC+1.  G+1 checks this by
carrying out the following comparison.  For each f_ij in SRPC+3, it
compares it with the stored value of SRPC+1 at the retinal position
(i + delta.i, j + delta.j), where (delta.i, delta.j) is the amount of
eye/camera movement, (+3, 0).  If there is some temporal change of the
figure during the movement, either due to change in illumination or
due to figure movement, then a mismatch would occur.  This can serve
as a basis for motion detection.  For a static scene such as in this
example, the comparison would result in a match, and G+1 would keep
the old SRPC+1 as the stored perceptual content, with the egocentric
positional parameter modified as mentioned above.  There is no change
in the perceptual size of SRPC+1.  The new F_ij whose output is in
SRPC+3 attaches itself to G+1, signifying that it is part of the
perceptual content of a figure which has been viewed before as SRPC+1.
There is no change to any nonegocentric positional relationships
involving SRPC+1.
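The case (a) check amounts to a shift-and-compare, sketched below
(SRPCs represented as dicts from retinal position to feature code; the
names are illustrative, not the thesis's own):

    def temporal_mismatches(old_srpc, new_srpc, delta):
        """Compare each f_ij in the new SRPC with the value stored in the
        old SRPC at retinal position (i + delta.i, j + delta.j).  Any
        mismatch signals a temporal change, a basis for motion detection."""
        di, dj = delta
        return [(i, j) for (i, j), code in new_srpc.items()
                if (i + di, j + dj) in old_srpc
                and old_srpc[(i + di, j + dj)] != code]

    # For a static scene an empty list comes back, and only the egocentric
    # positional parameter needs updating, e.g., from (1, -1) to (4, -1).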
Case (b) is illustrated by SRPC+2 in Figure 17(b) and SRPC+4 in Figure
19(b), before and after the (+3,0) movement.  It is the reverse of
case (a): the new SRPC, SRPC+4, has more perceptual content of the
external figure than the old SRPC, SRPC+2.  In this case, G+2 would,
after assuring itself that there is no temporal change by checking the
shifted SRPC+2 against the relevant part of SRPC+4, update its SRPC by
using SRPC+4 as its new assigned SRPC.  Since G+2 now has a new SRPC,
its nonegocentric positional relationships with other active
processors would also need to be updated.  Note that this updating
involves not only its own stored NONEGO pointers but also those NONEGO
pointers stored in the target processors.  Such a rippling-off change
can be initiated by sending proper control messages along the links to
the involved target processors.  See Figure 21 and compare the link
lists with those in Figure 20 for the effect of updating perceptual
values in case (b).
Case (c) is illustrated by SRPC-1 in Figure 17(c) and SRPC-3 in Figure
19(c).  The processor involved, G-1, checks the relevant parts of
SRPC-1 and SRPC-3 for temporal changes.  And again, in this example,
agreement would be found.
Figure 21:  Summary of updated perceptual values in the active SRPC
processors after moving to the second viewing position P2.  (Note: G_0
and its stored EGO links are omitted from this list.)
agreement would be fotmd.
However, in this case, both SRPes contains
some non-overlapping information and one can not totally replace the
other.
Hence, for updating, it presents an engineering, perhaps even
philosophical, dilemma.
If one tries to combine SRPC-1 and SRPC-3 into a single entity by taking the set union of the two, then the SRPC processor would end up having to deal with an array of perceptual values whose size is larger than the size of the F array. Worse yet, there is no conceptual limit to that size, since an external figure of interest may be arbitrarily large. We have just contained this problem by setting a limit on the retinal size; now we would invite it back by requiring each SRPC processor to have the ability to handle it all. Besides, this would create a problem of how to express and keep track of the egocentric and nonegocentric positional relationships for the constituent SRPCs in a way that is still compatible with what we have just defined. On the other hand, if the maximum size of the SRPC each processor has to store is kept as the size of the F array, then two SRPC processors have to be assigned in this case, each storing an SRPC with overlapping information between the two, resulting in storage inefficiency. To solve this dilemma, we opt for the second approach, but with a higher-level structure added which not only can eliminate the efficiency problem but also lends itself to more flexibility in the organization of perceptual data. This higher-level structure is described in the following section.
5.2.3.7 Figure processors

A higher level of structure is proposed to serve the purpose of organizing the SRPC processors in the case of multiple SRPCs corresponding to an external figure being viewed. It takes the form of yet another set of processors called figure perceptual content processors, or simply figure processors:

H = { H_m | m = 1, ..., M }
When it is realized that the perceptual content of a figure being viewed consists of more than one SRPC, a figure processor is chosen from a pool of free H processors to capture and manage these SRPCs through their SRPC processors. Let us assume that in our example H1 is chosen to capture the processors G-1 and G-3. The capturing process is implemented by establishing two new types of links (or pointers), PART-OF and HAS-AS-PART, in the three processors involved. The PART-OF and HAS-AS-PART links and their variations have been widely used in knowledge representation in artificial intelligence; for example, see Winston (1975, 1979). Here we provide a perceptual and operational meaning to these links by giving an algorithm to establish them as a result of a perceptual process.
Two PART-OF links are established and stored, one in G-1:

(PART-OF, (G-1, H1)),

and one in G-3:

(PART-OF, (G-3, H1)).

Two HAS-AS-PART links are stored in H1:

(HAS-AS-PART, (H1, G-1)),
(HAS-AS-PART, (H1, G-3)).
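In present-day code, the capture step amounts to a pair of bookkeeping insertions. This is our sketch only; the dictionary-of-link-lists layout is an assumption, and the link tuples follow the (TYPE, (source, target)) form above.

    def capture(h_name, g_names, links):
        """Figure processor h_name captures the SRPC processors g_names:
        a PART-OF link is stored at each captured processor's site, and
        a matching HAS-AS-PART link is stored at the figure processor."""
        for g in g_names:
            links.setdefault(g, []).append(("PART-OF", (g, h_name)))
            links.setdefault(h_name, []).append(("HAS-AS-PART", (h_name, g)))

    links = {}
    capture("H1", ["G-1", "G-3"], links)
    # links["G-1"] == [("PART-OF", ("G-1", "H1"))]
    # links["H1"]  == [("HAS-AS-PART", ("H1", "G-1")),
    #                  ("HAS-AS-PART", ("H1", "G-3"))]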
As in the case of G processors, we use the names of H processors to symbolize their addressing information in the links. See Figure 21 for a summary of all the active processors, their perceptual values and the links established between them as the scene is first viewed at P1 and then at P2.

Although the information in Figure 21 is presented in a tabular form with links expressed as a list of pointers, close to the way they would be implemented in a digital computer, the same information can be presented in a network form such as in Figure 22. (In order to achieve a clear layout, not all NONEGO links in Figure 21 are drawn in Figure 22.)

Note that the NONEGO links between G-1 and G-3 have perceptual value CONNECTED, indicating that they are perceived as connected parts of a figure.
In general, connectedness is not a necessary attribute between parts of a figure and hence is not a necessary condition for a group of SRPC processors to be captured by a figure processor. Other sufficient conditions that can trigger an IPS to conceptually group active processors into a higher-level structure include, for example, the perception of figures that move together, the perception of figures clustering together, being told about the figure and its composition, etc. We will not go into them for IPS.SIMPLE. Note that conceptually, any number of figures can be grouped together to form a composite figure. Thus, not only can an H processor capture G processors, but it can also capture other active H processors. In general, if IPS.SIMPLE realizes that several active processors should, for any reason, be grouped together and perceived as part of a new composite figure, it can do so by assigning a new H processor to stand for that figure and capture those active processors as its parts.

Figure 22: The organization of active processors in Figure 21 re-expressed in network form. (Note: vertical links are NONEGO links; not all NONEGO links listed in Figure 21 are drawn here.)
The roles of an active figure processor include:

1. Reducing the amount of overlap of the SRPCs stored in its captured SRPC processors, by checking to see if a newly captured SRPC (at a new viewing position) can replace a previously captured SRPC without loss of information. This needs to be done at the level of H processors because, in case (c), the redundant information in one SRPC may be distributed over several other SRPCs. It can be done in a way similar to case (b), by checking the sizes, egocentric positions and nonegocentric position links of the captured SRPCs.

2. Regenerating on demand any intermediate SRPCs from two or more of its captured SRPCs. This is the reverse process of the one described above. This regeneration process may be needed for the comparison of perceptual similarity of two figures.
As an example, let us suppose that the eye/camera of IPS.SIMPLE scans the scene of Figure 13, starting at P1, then at P2, then moving one column to the right at a time until it finally moves a total of (delta.i, delta.j) = (24, 0) from P1, building its network of active processors as it scans. Figure 23 shows a portion of that network at the final point. It illustrates how some of the processors in that network might be grouped into a higher-level structure through the use of H processors.
Figure 23: Partial drawing of the network of active processors after the scene in Figure 13 is scanned by IPS.SIMPLE (see text). (Note: vertical links are NONEGO links; not all possible links are drawn here. NC: NOT-CONNECTED; CON: CONNECTED. The SRPCs are documented in Figure 24.)

Figure 24: SRPCs used in Figure 23 (except SRPC+1, which is in Figure 17).

In the above discussions, we have mentioned loosely the perceptual content of an external figure. Our main goal is toward the
formalization of the concept of external figures or objects in terms of the way they are (or can be) perceived by a given IPS. We believe that there is no universally applicable way to segregate a given scene into its figure compositions independent of how the scene is to be perceived by a given IPS. The external scene just exists as a whole. Only when the signals from it are filtered through the perceptual system of a given IPS does it become represented in an object-centered manner. And it is in this object-centered manner that perceivable parts of the external scene are cast in the role of RTOR in an external representation, as discussed in the last chapter. In the following sections, we will use the example of IPS.SIMPLE as a basis to develop a formalization of the concept of the perceptual content of an external figure.
5.3 PERCEPTUAL CONTENT AND PERCEPTUAL CONTEXT OF AN EXTERNAL FIGURE IN IPS.SIMPLE

5.3.1 Perceptual network

In general, after many (not necessarily exhaustive or overlapping) viewings of an external scene, the perceptual subsystem of IPS.SIMPLE would develop an active network of interacting SRPC processors and figure processors whose stored perceptual values and interactions represent the current perception of the scene. Each viewing would cause some addition or modification of parts of the network.

We call this active network of processors in IPS.SIMPLE a perceptual network, or simply a PNet. Formally, a PNet is a 2-tuple (N, L) where:

1. N is a set of active node processors, or simply nodes, chosen from the union of the sets G and H.
2. L is a set of labeled links of the form

(TYPE, (n_s, n_t), VALUES)

where TYPE is the label of the link indicating its type, (n_s, n_t) are the addresses of the source node and the target node of the link, and VALUES are a set of optional information pertaining to perception.
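As a data structure, the 2-tuple (N, L) can be rendered directly. The sketch below is ours; the thesis fixes only the abstract form, so the Python field names are illustrative.

    from dataclasses import dataclass, field

    @dataclass
    class Link:
        """A labeled link (TYPE, (n_s, n_t), VALUES)."""
        type: str          # "EGO", "NONEGO", "PART-OF" or "HAS-AS-PART"
        source: str        # address (here: name) of the source node n_s
        target: str        # address (here: name) of the target node n_t
        values: tuple = () # optional perceptual values

    @dataclass
    class PNet:
        """A perceptual network: active nodes N and labeled links L."""
        nodes: set = field(default_factory=set)    # N, drawn from G union H
        links: list = field(default_factory=list)  # L

    net = PNet()
    net.nodes |= {"G+2", "G+3", "H2"}
    net.links.append(Link("NONEGO", "G+2", "G+3", ("CONNECTED", (4, 0))))
    net.links.append(Link("HAS-AS-PART", "H2", "G+2"))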
To recapitulate what we have discussed and illustrated in previous sections, we list the following brief summary points about the perceptual network (N, L) in IPS.SIMPLE:
1. Each node has a unique address in the system, which can be used by other nodes to access that node. In our discussion, we symbolize the address of, and thus identify, each node by the name of the processor.
2. Some nodes have specific perceptual values developed and stored pertaining to the perception of parts of the scene viewed. We have proposed the following node values for each node processor from the set G: the SRPC and its perceptual size. These values are updated when the node processor absorbs new perceptual information related to that SRPC.
3. There are rules for assigning new processors, leading to the creation of new nodes in the perceptual network as perception goes on. Generally, whenever there is a change in the sensory data (in our examples, this is illustrated mainly as due to the change of viewing positions of IPS.SIMPLE) resulting in new SRPCs which cannot be entirely absorbed into the network, new processors will be assigned to attach and store these SRPCs, and new relationships between the new processors and existing ones will be established.
4. Each link represents a perceptual relationship between two nodes in the network, perceptual in the sense that the links are established as a result of the perceptual processing of an external scene and reflect certain relationships between objects in the scene. We have proposed four types of links: EGO, NONEGO, PART-OF, HAS-AS-PART. Rules for establishing these links have been discussed and illustrated.
5. Each link is directional and is always stored at the site of the source node, which, by knowing the address of the target node, can actually use the link to send messages to the target node. We have mentioned two occasions for such communication. After each eye/camera movement, G0 initiates the updating process of egocentric positions in the other active processors by sending (delta.i, delta.j) to each processor along its EGO links. When one SRPC processor updates its SRPC in such a way that the nonegocentric relationships with other processors need to be revised, it sends out messages to the target nodes involved so that the reverse NONEGO links can be updated at the target node.
6. There can be perceptual values as part of the link. We have proposed, for NONEGO links, two types of perceptual values: the connectedness and the amount of the nonegocentric relationship.
7. Each node is a processor because it embodies a set of processes which are invoked to enact its role in the perceptual network. They include the updating of its perceptual values, the sending and receiving of messages to and from other processors, etc. Because of these processes, the perceptual network is always in a state of flux.
Given such a PNet, (N, L), at some point of perception, we have noted that each node in the net manages the perception process corresponding to some part of an external scene being viewed, either by virtue of the perceptual values developed at its own node or through other node processors captured by it via the various HAS-AS-PART links that it has stored. This observation will be considered as a basis for a formalization of the perceptual content of an external figure perceived in the scene. Let us introduce several definitions in the following sections leading to that formalization.
5.3.2 The scope of a node in PNet

Given a PNet (N, L) and a node n in N, we say the node n' is in the set S(n), called the scope of the node n, if (a) n = n', or (b) n' is connected to n through a chain of HAS-AS-PART links in L, originating from n and ending at n'.

Thus, the scope of a node is the node itself and all the nodes in N captured (HAS-AS-PART) directly by it or indirectly through other nodes captured by it. For example, in Figure 23,

S(G+2) = { G+2 }
S(H3) = { H3, {H2, G+2, G+3}, G+4, G+5 }
5.3.3 The set of intra-scope links

Given a link q in L and a scope S(n) of a node n in N, we say q is in the set U(S(n)) of intra-scope links if both the target node and the source node of q are in S(n). For example, in Figure 23,

U(S(G+2)) = the null set
U(S(H2)) = { (HAS-AS-PART, (H2, G+2)),
             (HAS-AS-PART, (H2, G+3)),
             (PART-OF, (G+2, H2)),
             (PART-OF, (G+3, H2)),
             (NONEGO, (G+2, G+3), CONNECTED, (4,0)),
             (NONEGO, (G+3, G+2), CONNECTED, (-4,0)) }
5.3.4 The set of inter-scope links

Given a link q in L and a scope S(n) of a node n in N, we say that q is in the set V(S(n)) of inter-scope links if either the source node or the target node (but not both) of q is in S(n).

Thus, the link (PART-OF, (H2, H3)) in Figure 23 is an inter-scope link for S(H2) but an intra-scope link for S(H3). Note also that EGO links are always inter-scope links, unless the corresponding figure is perceived as IPS.SIMPLE itself. V(S(n)) serves as a "cable" of links between nodes in S(n) and nodes outside of S(n).
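All three sets are computable directly from the link list. The following is our sketch only; links are reduced to (TYPE, source, target) triples, with the optional VALUES omitted.

    def scope(n, links):
        """S(n): n plus every node reachable from n along chains of
        HAS-AS-PART links, i.e. everything the node captures."""
        s, frontier = {n}, [n]
        while frontier:
            cur = frontier.pop()
            for typ, src, tgt in links:
                if typ == "HAS-AS-PART" and src == cur and tgt not in s:
                    s.add(tgt)
                    frontier.append(tgt)
        return s

    def intra_scope(s, links):
        """U(S(n)): links whose source and target both lie in the scope."""
        return [l for l in links if l[1] in s and l[2] in s]

    def inter_scope(s, links):
        """V(S(n)): links with exactly one endpoint in the scope -- the
        'cable' between the subnet and the rest of the net."""
        return [l for l in links if (l[1] in s) != (l[2] in s)]

    # The Figure 23 example: H2 captures G+2 and G+3.
    links = [("HAS-AS-PART", "H2", "G+2"), ("HAS-AS-PART", "H2", "G+3"),
             ("PART-OF", "G+2", "H2"), ("PART-OF", "G+3", "H2"),
             ("NONEGO", "G+2", "G+3"), ("PART-OF", "H2", "H3")]
    s = scope("H2", links)      # {'H2', 'G+2', 'G+3'}
    u = intra_scope(s, links)   # everything except the PART-OF link to H3
    v = inter_scope(s, links)   # [('PART-OF', 'H2', 'H3')]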
5.3.5 Perceptual content of an external figure

Given a PNet (N, L) and a node n in N, we say that the subnet spanned by the node n is the 2-tuple (S(n), U(S(n))), where

S(n) is the scope of node n;
U(S(n)) is the set of intra-scope links of the scope S(n).

Note that a subnet, like the PNet itself, includes all the perceptual values stored in the node processors in S(n), as well as all the perceptual values stored in all the links in U(S(n)). We have noted that each node in PNet stands for the perception of some figure in the external scene. Thus, the subnet (S(n), U(S(n))) spanned by the node n is said to be the perceptual content of an external figure perceived in the scene.
Note that for a figure to have a perceptual content in this formalization, not only does it have to be perceptible by the IPS in question, but it must also have been already perceived as such, with a corresponding node created in PNet. For example, given the PNet in Figure 23, the external pattern 'LE' in Figure 13 has not been perceived as a composite figure by IPS.SIMPLE so far.
5.3.6 Perceptual context of an external figure

Given a PNet (N, L) and a subnet (S(n), U(S(n))) as the perceptual content of an external figure, the perceptual context of that figure is the set of inter-scope links, V(S(n)). Thus, the perceptual context of an external figure provides its perceived relationships with other figures in the external scene. It specifies the particular environment in which that figure is perceived.
For example, one perceptual context of the external figure corresponding to the node H2 in Figure 23 is that it is perceived as part of the figure corresponding to the node H3.

It is worth noting that a given link in (N, L) can be part of a perceptual content or part of a perceptual context, depending on the figure in question.
5.4 PERCEPTUAL SIMILARITY

The perceptual similarity of two external figures is judged by their perceptual contents, independent of their perceptual contexts. The concept of perceptual similarity in the case of IPS.SIMPLE is formalized by the concept of network isomorphism.
5.4.1 Network isomorphism

The prerequisite for comparing two external figures, R and R', perceptually is that they have corresponding nodes, n and n', and perceptual contents, (S(n), U(S(n))) and (S(n'), U(S(n'))), in the current PNet of IPS.SIMPLE. Then the question of perceptual similarity between R and R' is settled by comparing their perceptual contents. While the perceptual values stored in each node and the organization of the net are related to the perception of an external figure, the name of the node processor (hence its address) is incidental and immaterial. This leads to the concept of network isomorphism.

Formally, given two subnets, (S(n), U(S(n))) and (S(n'), U(S(n'))), we say they are isomorphic if there exists a one-one and onto mapping

w from S(n) to S(n')
such that:

1. the perceptual values of corresponding nodes are identical. This means, in our case, that if n1 in S(n) is an SRPC node, then

SRPC(w(n1)) = SRPC(n1)
size(w(n1)) = size(n1)

2. the organizational structures of the two subnets are matched. That is, if n1 and n2 in S(n) have an intra-scope link

(TYPE, (n1, n2), VALUE)

then there must be a corresponding link in U(S(n')), i.e., the link

(TYPE, (w(n1), w(n2)), VALUE)

must exist in U(S(n')), be of the same type and have the same value.
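Stated operationally, the test can be sketched as a brute-force search for such a mapping w. The Python below is our illustration only; it is exponential in the subnet size and is meant as a specification, not an implementation.

    from itertools import permutations

    def isomorphic(nodes_a, links_a, values_a, nodes_b, links_b, values_b):
        """Brute-force test of the two conditions above: search for a
        one-one, onto mapping w that preserves node values and carries
        every intra-scope link (TYPE, n1, n2, VALUE) onto an identical
        link in the other subnet."""
        nodes_a, nodes_b = sorted(nodes_a), sorted(nodes_b)
        if len(nodes_a) != len(nodes_b):
            return False
        link_set_b = set(links_b)
        for perm in permutations(nodes_b):
            w = dict(zip(nodes_a, perm))
            if any(values_a[n] != values_b[w[n]] for n in nodes_a):
                continue                          # condition 1 fails
            if all((typ, w[s], w[t], val) in link_set_b
                   for typ, s, t, val in links_a):
                return True                       # both conditions hold
        return False

    # Two one-link subnets differing only in node names:
    va = {"G1": ("srpcA", (5, 6)), "H1": (None, None)}
    vb = {"G9": ("srpcA", (5, 6)), "H7": (None, None)}
    la = [("HAS-AS-PART", "H1", "G1", ())]
    lb = [("HAS-AS-PART", "H7", "G9", ())]
    print(isomorphic(va, la, va, vb, lb, vb))     # True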
Two external figures, R and R', are said to be perceptually similar to IPS.SIMPLE if their perceptual contents in IPS.SIMPLE are isomorphic.

Although the requirement of network isomorphism is very tight, it should be noted that the figures are compared only in what is perceptible and what has been currently perceived of them by the IPS. Thus, even in the case of human vision, two boxes with identical outsides can be judged as perceptually similar by us, until they can be opened and their insides examined. If a difference is found at that time, the similarity judgement will be retracted. Note here that the difference between what is not perceptible and what is perceptible but not yet perceived depends on how easily the boxes can be opened for examination by the individual in question.
5.4.2 Relaxing the condition of network isomorphism

To allow more flexibility in the comparison of perceptual similarity, the condition of network isomorphism can be weakened in many ways. For example, the mapping w can be one-one but not onto (yielding a monomorphism), or onto but not one-one (an epimorphism), or simply a function (a homomorphism). Or the mapping can be a partial function. We say that a function w from S(n) to S(n') is a partial function if there is a subset d(S(n)) of S(n) for which w from d(S(n)) to S(n') is a function. This means that some nodes in S(n) and some nodes in S(n') can be omitted in the matching process. See Shapiro and Haralick (1981) for a discussion of structure isomorphism and homomorphism and their use in structure description and inexact matching.
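One simple way to realize the partial-function weakening, sketched under the same assumed data layout (it reuses the brute-force isomorphic test above, so it is likewise only a specification-level illustration):

    from itertools import combinations

    def partially_isomorphic(nodes_a, links_a, values_a,
                             nodes_b, links_b, values_b, keep):
        """Weakened matching: look for `keep` nodes on each side whose
        induced subnets are isomorphic, omitting the rest (a partial,
        one-one mapping w defined only on a subset d(S(n)))."""
        def induced(nodes, links):
            ns = set(nodes)
            return [l for l in links if l[1] in ns and l[2] in ns]
        for da in combinations(sorted(nodes_a), keep):
            for db in combinations(sorted(nodes_b), keep):
                if isomorphic(da, induced(da, links_a), values_a,
                              db, induced(db, links_b), values_b):
                    return True
        return False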
5.4.3 Discrimination vs. recognition

In the above discussion, it is assumed that both external figures are close enough (either spatially or temporally) so that perception of both can be managed by the IPS at the same time; that is, the perceptual contents of both figures are contained in the current PNet. This is typically the case in discrimination tasks, where figures or objects to be discriminated are shown side by side (or in rapid temporal succession with repetition) so that the viewer can alternately examine any of them and build their perceptual contents at the same time.

In recognition tasks, one figure R is being viewed and the goal is to search through one's memory to see if a perceptual content similar (isomorphic) to that of R can be found. We will discuss the subject of long-term memory and its role in a recognition task in the next section.
5.5 LONG-TERM MEMORY IN IPS.SIMPLE

From a perceptual point of consideration, long-term memory (LTM) is important not so much for its capability of storing data for a long time span as for its role in catching the "overflow" of data from the perceptual subsystem.

We have discussed the aspects of adding new node processors to the PNet as new sensory data come in. Soon, we will either run out of the supply of available processors or face a combinatorial explosion of the number of EGO and NONEGO links which the perceptual subsystem has to install, maintain and update. At some point, we either have to throw away some data or push them onto some kind of holding stack where they can be "hibernating" until some perceptual action calls for their retrieval. While hibernating, they can be suspended from the consideration of updating their perceptual values: no more EGO position updates. LTM serves the purpose of this holding stack.

First, let us consider the data storing process of LTM.
5.5.1 Data storing process of LTM

LTM in IPS.SIMPLE can be considered as an extension of its PNet. It is designed to have a structure similar to that of the PNet. Let us assume that nodes in LTM can be created on demand and have unique addresses for access. Each node created is treated as a unit just like a node processor in PNet, having the capability of storing exactly the same kinds of perceptual values and links as node processors do. The difference is that the nodes in LTM do not have the processing capability of node processors and do not participate in the updating process of the perceptual values stored(19). In other words, each node in LTM is a "place-token"(20) for one node processor in PNet.
The perceptual data (including all links except the EGO link, which is discarded) developed in one node processor can be dumped to one node in LTM. All perceptual values (except egocentric positions) can be dumped without change. The addresses contained in the links referring to the dumping node processor must be changed to the address of the node in LTM; this includes the references in the reverse links also. The process of changing the address of the target node in the reverse links can be performed in the same manner as discussed earlier in updating EGO and NONEGO link addresses. The task can be performed by the node processor just prior to the dumping. After dumping its data, the node processor is freed and ready for a new assignment in PNet. The selection of which node processor in PNet is to be freed may be based on a heuristic rule, such as the rule that selects the "oldest" or least referenced one as the front-most candidate to be freed next.
(19) They may be involved in the node address updating process related to the links stored. This aspect will be discussed later.

(20) The word "place-token" has been used in the literature (e.g., Marr, 1978, p. 62) to refer to internal symbols or codes marking the spatial location of the occurrence of (visual) features. A node in LTM is a place-token since it marks the position of a node processor in PNet at a particular time of perception, so as to capture its perceptual activity at that moment during a particular perceptual experience. Therefore, the nodes in LTM assume a "static" role of node processors, that of preserving perceptual values for later reference. In a structure like the human brain, where LTM is believed to be composed of more dynamic components (e.g., circuits of neurons), the "place-tokens" may indeed be processors themselves, ready to be called into action in the perceptual arena; while waiting, their main role would be to refresh the data which may otherwise fade. It is in this sense of a data-preserving role that they are referred to as static.
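Operationally, the dumping step amounts to: copy the perceptual values into a fresh LTM record, drop the EGO link, and rewrite every remaining reference on both ends. A sketch under our own assumed layout (the function and variable names are illustrative):

    def dump_to_ltm(g_name, pnet_links, pnet_values, ltm, new_addr):
        """Dump the perceptual data of node processor g_name into a newly
        created LTM node at new_addr, then free the processor.  EGO links
        are discarded; every other link is kept, with each reference to
        g_name (forward and reverse alike) rewritten to the LTM address."""
        ltm[new_addr] = pnet_values.pop(g_name)    # values survive unchanged
        kept = []
        for typ, src, tgt, val in pnet_links:
            if typ == "EGO" and g_name in (src, tgt):
                continue                           # EGO links are dropped
            if src == g_name:
                src = new_addr
            if tgt == g_name:
                tgt = new_addr                     # fixes the reverse links
            kept.append((typ, src, tgt, val))
        return kept                                # g_name is now free

    pnet_values = {"G-1": ("srpcA", (5, 6))}
    links = [("EGO", "G0", "G-1", ()),
             ("NONEGO", "G-3", "G-1", ("CONNECTED", (3, 0)))]
    ltm = {}
    links = dump_to_ltm("G-1", links, pnet_values, ltm, "LTM#1")
    # links == [("NONEGO", "G-3", "LTM#1", ("CONNECTED", (3, 0)))]
    # "LTM#1" is an exposed node: an active processor still links to it.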
This dumping and freeing process is usually performed on an on-demand basis; its rate depends on the types of node processors involved. The SRPC processors would normally be freed at a higher rate than the H processors because the sensory data at that level change the most rapidly.
As a node in LTM is freshly created from a node processor in PNet, it may very likely still be the target node of many links originating from other active processors in PNet. These nodes in LTM which are still the target nodes of links pointing from active processors in the current PNet are called exposed nodes in LTM. Since its address is known from the PNet, an exposed node in LTM can be reactivated and "popped out" of LTM by selecting a free processor to assume all the perceptual data stored. (And again, all the necessary node address changes in all the involved parties must be performed.)

On the other hand, an exposed node in LTM may be pushed deeper and deeper as more and more of the node processors to which it still links become themselves nodes in LTM. A node in LTM which is not the target node of any link stored in any active processor in the current PNet is called a buried node in LTM. A buried node is suspended from any address updating activities until it is re-exposed. A buried node is re-exposed when it becomes the target node of links pointing from active processors in the current PNet, usually because one or more of the nodes in LTM to which it links are reactivated and replaced by node processors in the current PNet.
Because of the existence of LTM, the PNet in IPS.SIMPLE, after it reaches a pre-designed size, will cease to grow; its role is always related to the perception at present, maintaining, at the same time, some ties to the past by linking to some of the nodes in LTM. These ties will shift dynamically, depending on what is being perceived at present and what the current goals of the IPS are.
5.5.2 Retrieval processes of LTM

If LTM serves only as a data sink into which perceptual data can be dumped, without any means of re-using the data dumped for the benefit of the perceptual task at hand, then its value to the system is no more than that of a waste basket. To be useful, LTM should allow mechanisms whereby a node stored in it, regardless of how deeply it may be buried, can be accessed, exposed and reactivated.

We have formalized two types of retrieval processes, content addressable retrieval (CAR) and link retrieval (LR), in the previous chapter. They will be illustrated for the LTM in IPS.SIMPLE.
5.5.2.1 CAR in IPS.SIMPLE

Since LTM in IPS.SIMPLE is designed to have the same kind of structure as that of the PNet, the definitions of the concepts of scopes, intra-scope links, inter-scope links, and subnets are applicable to nodes in LTM also. Furthermore, since each node in LTM was created from a node processor in the PNet of some time in the past, it corresponds to a certain external figure perceived by the IPS at that time. Therefore, for a given node n in LTM, the subnet (S(n), U(S(n))) spanned by n is the perceptual content of a certain external figure perceived by IPS.SIMPLE and stored in its LTM.

When a subnet (S(n'), U(S(n'))) is being developed in the current PNet, n' being the node processor corresponding to an external figure R' being perceived, the process of finding and retrieving one (or more) subnet (S(n''), U(S(n''))) stored in LTM such that (S(n'), U(S(n'))) and (S(n''), U(S(n''))) are isomorphic is called content addressable retrieval, or CAR, in IPS.SIMPLE. When its perceptual content has been found isomorphic to some previous perceptual content(s), the external figure is said to have been recognized by IPS.SIMPLE.

For this retrieval process to work, methods of searching LTM must be provided. Since SRPC nodes contain detailed "intra-node" perceptual values to be matched, one can start the search process with the SRPC nodes of the subnet. Suppose that the SRPC nodes in LTM are prearranged in some order related to their specific perceptual values (the so-called hashing technique in computer science); then this order can be used to facilitate the search and match process at the SRPC node level. When enough bottom SRPC node processors in a subnet have found their matched nodes in LTM, the top nodes of the candidate subnets in LTM can be accessed via the intra-scope links (such as the PART-OF links) from the bottom SRPC nodes. This search approach is called the bottom-up search.
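A minimal sketch of the bottom-up search follows (ours; the dictionary index stands in for the hashing arrangement, and the PART-OF climb assumes an acyclic part hierarchy):

    def bottom_up_candidates(probe_srpcs, ltm_values, ltm_links):
        """Bottom-up search: use the detailed SRPC values of bottom nodes
        as hash keys into LTM, then climb PART-OF links from each match
        to collect the top nodes of candidate subnets for the full
        isomorphism test."""
        index = {}               # the 'prearranged order': value -> nodes
        for node, srpc in ltm_values.items():
            index.setdefault(srpc, []).append(node)
        candidates = set()
        for srpc in probe_srpcs:
            for node in index.get(srpc, []):
                cur, climbed = node, True
                while climbed:   # follow PART-OF links upward
                    climbed = False
                    for typ, src, tgt, _ in ltm_links:
                        if typ == "PART-OF" and src == cur:
                            cur, climbed = tgt, True
                            break
                candidates.add(cur)
        return candidates

    ltm_values = {"L1": "srpcA", "L2": "srpcB"}
    ltm_links = [("PART-OF", "L1", "L9", ()), ("PART-OF", "L2", "L9", ())]
    print(bottom_up_candidates(["srpcA"], ltm_values, ltm_links))   # {'L9'}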
On the other hand, the top nodes of some subnets in LTM, as potential candidates for a match, may have already been exposed and their addresses known to the PNet; then their bottom nodes can be accessed for further matching via intra-scope links (such as the HAS-AS-PART links). This search approach is called the top-down search.

Either way, the target subnet in LTM is found by matching for isomorphism between it and the given subnet in PNet. In other words, an
external figure is recognized because it is found perceptually similar to another external figure viewed and remembered by the IPS.

Once an isomorphic subnet in LTM is found, these nodes can be reactivated and their perceptual information re-used (modified if desired) by re-installing node processors into them. Since there is already a temporary isomorphic "copy" of that subnet in PNet, there is usually no need to find a new supply of node processors: the node processors in the isomorphic subnet can be re-used directly in the re-installing process. This re-use of part of LTM helps to prevent LTM from growing blindly and rapidly as more and more sensory data pour in. More important is that the old subnet retrieved in this manner contains inter-scope links to other related parts of LTM which are potentially useful to the perceptual task at hand.
5.5.2.2 LR in IPS.SIMPLE

When a subnet (S(n), U(S(n))) in LTM is content-addressably retrieved and re-instated into the current PNet, other nodes or subnets in LTM are exposed by being, or containing, the target nodes of inter-scope links from S(n). These nodes or subnets exposed in this way are accessible to the current PNet without further search, and this retrieval process is called (inter-scope) link retrieval. Since the inter-scope links are called the context of a given subnet, (inter-scope) link retrieval will be called context retrieval here.

The important point to note about context retrieval is that the nodes or subnets in LTM are context-retrieved not because they are isomorphic to some subnets in the current PNet, but because they are part of a
previously perceived context of an external figure whose perceptual content is isomorphic to that of the current figure in question.

This fundamental difference between CAR and LR (i.e., context retrieval) is considered to be the basis on which external representations can be classified into two extreme types: verbal and nonverbal. We will illustrate and discuss this aspect in the next section.
5.6 VERBAL AND NONVERBAL EXTERNAL REPRESENTATIONS

Let us suppose that IPS.SIMPLE encounters the external scene shown in Figure 25. After it spends some time viewing it, a subnet corresponding to figures in that scene would be developed in its PNet, representing a way the scene can be perceived by it. This subnet is shown as (S(H10), U(S(H10))) in Figure 26. Then it moves to other parts of its world. As more sensory data pour in, its PNet finally pushes that subnet into the LTM. Although their perceptual data remain unchanged, the names (and the addresses) of the nodes of the subnet in LTM will not be the same as in Figure 26. However, in the following discussion, we will still refer to the nodes of the subnet in LTM using their old names as in Figure 26. We will also use these node names to refer to the external figures to which they correspond. Thus, H10 refers to the top figure ("APPLE") of the scene in Figure 25, and H9 to the bottom one (the figure with a round shape).

Some time later, after that subnet is buried in LTM, it sees the external scene as in Figure 13. Let us assume that the subnet corresponding to that scene is the subnet (S(H3), U(S(H3))) in Figure 23.
In a CAR process, the subnet (S(H10), U(S(H10))) in LTM is found isomorphic to the subnet (S(H3), U(S(H3))) in PNet. Therefore, the external figure H3 in the current scene is recognized by IPS.SIMPLE as a pattern that it has seen before. That is, the figure H3 is a representor (RTOR); its nonverbal representee (RTEE) is the figure H10; and the triple

(H3, H10, IPS.SIMPLE)

is a nonverbal representation.

Figure 25: Another external scene for IPS.SIMPLE. (A binary image whose top figure is the written word "APPLE" and whose bottom figure is a round shape.)
Figure 26: Perceptual contents of external figures of the scene in Figure 25. (Note: vertical links are NONEGO links; not all possible links are drawn here. NC: NOT-CONNECTED; CON: CONNECTED.)

Upon recognition, the subnet (S(H10), U(S(H10))) in LTM is reinstated into PNet. The inter-scope (with respect to S(H10)) NONEGO links stored in the nodes in S(H10), pointing to the nodes in S(H9) in LTM, re-expose them and put them in a state immediately retrievable from the PNet. By accessing these nodes in S(H9), IPS.SIMPLE would be reminded of the context in which the external figure H3 was viewed by it. Thus, the triple

(H3, H9, IPS.SIMPLE)

is a verbal representation.
Note that:

1. If IPS.SIMPLE were to make an attempt to compare the perceptual content of the RTOR and that of its verbal RTEE, it would usually find no isomorphism existing between them. This means that CAR cannot be used directly to retrieve verbal RTEEs from their RTORs.
2. There are two distinctive steps that lead to the recognition of a verbal representation:

(a) the recognition of the RTOR itself through CAR;

(b) the retrieval of the internal representation of the verbal RTEE through LR, via associative links between them which have to be learned and stored prior to the recognition task.

If the links are absent (not learned), then the verbal representation will not be recognized by the IPS in question. Learning of the associative links is achieved, in the case of IPS.SIMPLE, by exposing the RTOR and the RTEE in close proximity in the external scene, so that direct NONEGO links between their perceptual contents can be established in PNet as IPS.SIMPLE scans them. This can be compared to the expected learning activity of children when they see verbal labels such as "APPLE" written beside a picture of an apple in their books. A sketch of the two-step process follows.
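The sketch below is ours and purely illustrative: `similar` stands for whatever perceptual-similarity test is in force (e.g., the isomorphism test of Section 5.4), and the NONEGO link plays the role of the learned associative link.

    def recognize_verbal(rtor_content, ltm_contents, ltm_links, similar):
        """Two-step recognition of a verbal representation:
        (a) CAR: find a stored perceptual content similar (isomorphic)
            to that of the viewed RTOR;
        (b) LR:  follow the previously learned associative links from
            the recognized node to reach its verbal RTEE."""
        for node, content in ltm_contents.items():
            if similar(rtor_content, content):                  # step (a)
                return [tgt for typ, src, tgt, _ in ltm_links
                        if src == node and typ == "NONEGO"]     # step (b)
        return []    # RTOR unrecognized, or the links were never learned

    ltm_contents = {"H10": "apple-word-content", "H9": "round-shape-content"}
    ltm_links = [("NONEGO", "H10", "H9", ("NOT-CONNECTED", (0, 12)))]
    print(recognize_verbal("apple-word-content", ltm_contents, ltm_links,
                           lambda a, b: a == b))                # ['H9']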
In the next chapter, the concepts formalized and illustrated in this thesis will be summarized, and directions for future extension will be given.
Chapter VI
SUMMARY AND FUTURE EXTENSIONS

6.1 SUMMARY

In summary, differentiation between verbal and nonverbal external representations cannot be discussed without explicitly considering the IPS which is processing the representation. It is proposed that the differentiation is based on:

1. the perceptual similarity between the RTOR and the RTEE within the viewing context selected by the IPS;

2. the manner in which the perceptual content of the RTEE stored in the LTM is accessed, given the perceptual content of the RTOR as current sensory input.
An external object which is observable in a modality of the IPS in question will give rise to a set of internal descriptive codes called the perceptual content, whose values depend on the object itself as well as on how it is "viewed". A nonverbal representor is an external observable object which, when viewed in the modality in question, gives rise to a perceptual content that is regarded by the IPS as specifying directly the full or partial perceptual content of its representee when viewed in the same way. On the other hand, a verbal representor is an external observable object whose perceptual content causes the recognition of the representor itself. The information (if any) regarded as nonverbally representing its verbal representee is then derived through previously established internal associative links with the recognized representor.
This difference is conceptualized by considering the extreme situations in the dichotomy. In reality, middle cases are abundant, where the two different types of memory access are used in a mixed manner or in stages. For example, some aspects of the RTEE may be represented in a verbal manner while some other aspects are represented in a nonverbal manner.
6.2 FUTURE EXTENSIONS

The research pursued in this thesis can be extended in the following directions:

6.2.1 Future extensions in machine perception

In the previous chapters, we have mentioned but not pursued many possible enhancements. Given the results developed in this thesis, they are good candidate subjects for further research in machine perception. For example:

1. the perception of external objects, especially their segregation from the scene, can be further enhanced through the active manipulation of the object by the viewing IPS;

2. the various aspects of the perception of an object need to be investigated when more freedom is allowed in moving the eye/camera around (for example, the impact on the size parameter of a change of viewing distance);

3. relaxation of the condition of network isomorphism and its impact on the matching process in comparing perceptual similarity need to be studied;

4. the specific concepts of perceptual network and subnetwork developed in Chapter 5 can be further tested by applying them to other modalities, especially that of hearing.
6.2.2 Implementation of the results in advanced semiconductor technology

As semiconductor technology matures into and beyond the stage of very-large-scale integration (VLSI), the concept of a network of node processors interacting toward the goal of perception forms a fertile ground for the application of the VLSI technology. The main problem that needs to be solved in this regard seems to be the design of effective mechanisms to allow smooth interactions between a large number of node processors.
6.2.3 Explanation of phenomena in cognitive psychology

Although human perception is used as a working model for the study of machine perception, the judgement of success of a theory in machine perception is based more on its performance after implementation than on its power in explaining phenomena in cognitive psychology. However, it is always reassuring if such a theory can also provide reasonable explanations in that direction.

Some of the phenomena in cognitive psychology whose explanation can be sought in the light of the concepts developed in this thesis are listed below:

1. the tip-of-the-tongue phenomenon - perhaps it is the feeling of imminent retrieval of the top node in a bottom-up search;

2. the figure-ground reversal phenomenon - possibly bistable different configurations (organizations) of PNet using different figure segregation criteria;

3. why recognition is easier than recall - probably related to the questions of what is the perceptual content used as the retrieval cue, what are the target perceptual content(s) to be retrieved, and what are the retrieval processes involved.
BIBLIOGRAPHY

Aitchison, Jean. The Articulate Mammal: An Introduction to Psycholinguistics. New York: Universe Books, 1976.

Anderson, Barry F. Cognitive Psychology: The Study of Knowing, Learning and Thinking. New York: Academic Press, 1975.

Anderson, John R. Cognitive Psychology and Its Implications. San Francisco: W. H. Freeman and Co., 1980.

Arbib, Michael A. The Metaphorical Brain: An Introduction to Cybernetics as Artificial Intelligence and Brain Theory. New York: Wiley-Interscience, 1972.

Bobrow, D. G. and T. Winograd. An Overview of KRL, a Knowledge Representation Language. Cognitive Science 1(1), pp. 3-46, 1977(a).

Bobrow, D. G. and T. Winograd. Experience with KRL-0: One Cycle of a Knowledge Representation Language. In IJCAI-5, pp. 213-222, 1977(b).

Boden, M. A. Artificial Intelligence and Natural Man. New York: Basic Books, 1977.

Brachman, Ronald J. What's in a Concept: Structural Foundations for Semantic Networks. International Journal of Man-Machine Studies 9, pp. 127-152, 1977.

Brachman, Ronald J. On the Epistemological Status of Semantic Networks. Bolt Beranek and Newman Inc., Report No. 3807, 1978(a).

Brachman, Ronald J. A Structural Paradigm for Representing Knowledge. PhD Thesis, Bolt Beranek and Newman Inc., Report No. 3605, 1978(b).

Brachman, Ronald J. On the Epistemological Status of Semantic Networks. In Associative Networks: Representation and Use of Knowledge by Computers, edited by Nicholas V. Findler. New York: Academic Press, 1979.

Brachman, Ronald J. and B. C. Smith. Special Issue on Knowledge Representation. SIGART Newsletter No. 70, Feb. 1980.

Chafe, Wallace L. Creativity in Verbalization as Evidence of Analogic Knowledge. In Theoretical Issues in Natural Language Processing, eds. R. C. Schank and B. N. Nash-Webber. Proc. Workshop in Computational Linguistics, Psychology, Linguistics & Artificial Intelligence, June 1975.

Cohen, Gillian. The Psychology of Cognition. New York: Academic Press, 1977.

Collins, Allan. The Trouble with Memory Distinctions. In Theoretical Issues in Natural Language Processing, eds. R. C. Schank and B. N. Nash-Webber. Proc. Workshop in Computational Linguistics, Psychology, Linguistics & Artificial Intelligence, June 1975.

Duda, Richard O. and Peter Hart. Pattern Classification and Scene Analysis. New York: John Wiley and Sons, 1973.

Fischler, Martin A. On the Representation of Natural Scenes. In Computer Vision Systems, pp. 47-52, edited by A. R. Hanson and E. M. Riseman. New York: Academic Press, 1978.

Foster, Caxton C. Content Addressable Parallel Processors. New York: Van Nostrand Reinhold Company, Ltd., 1976.

Goldstein, Ira and Seymour Papert. Artificial Intelligence, Language and the Study of Knowledge. Cognitive Science 1, p. 84, 1977.

Gregory, Richard L. Visual Illusions. Scientific American, November 1968.

Grosz, B. J. The Representation and Use of Focus in a System for Understanding Dialogs. In IJCAI-5, pp. 67-76, 1977.

Havens, William S. A Procedural Model of Recognition for Machine Perception. PhD Thesis, The University of British Columbia, 1978.

Hendrix, G. G. Expanding the Utility of Semantic Networks through Partitioning. In IJCAI-4, pp. 115-121, 1975.

Hendrix, G. G. Some General Comments on Semantic Networks. In IJCAI-5, pp. 984-985, 1977.

Hill, Fredrick J. and Gerald R. Peterson. Digital Systems: Hardware Organization and Design. New York: John Wiley & Sons, Inc., 1973.

Kohonen, Teuvo. Associative Memory. Berlin: Springer-Verlag, 1977.

Kosslyn, Stephen M. On Retrieving Information from Visual Images. In Theoretical Issues in Natural Language Processing, eds. R. C. Schank and B. N. Nash-Webber. Proc. Workshop in Computational Linguistics, Psychology, Linguistics & Artificial Intelligence, June 1975.

Kosslyn, Stephen M. and J. R. Pomerantz. Imagery, Propositions, and the Forms of Internal Representations. Cognitive Psychology 9, pp. 52-76, 1977.

Lehnert, Wendy. Human and Computational Question Answering. Cognitive Science 1, p. 47, 1977.

Lindsay, Peter H. and Donald A. Norman. Human Information Processing: An Introduction to Psychology. New York: Academic Press, 1977.

Mackworth, A. K. How to See a Simple World. In Machine Intelligence 8, E. W. Elcock and D. Michie (eds.). New York: Halsted Press, 1977.

Mahler, Henry R. and Eugene H. Cordes. Biological Chemistry. Second Edition. New York: Harper & Row, 1971.

Marr, David. Representing Visual Information. In Computer Vision Systems, eds. A. Hanson and E. M. Riseman. New York: Academic Press, 1978.

Matick, Richard E. Memory and Storage. In H. Stone (ed.), Introduction to Computer Architecture. Chicago: Science Research Associates, Inc., 1975.

Mesarovic, M. D., D. Macko and Y. Takahara. Theory of Hierarchical, Multilevel Systems. New York: Academic Press, 1970.

Minsky, Marvin. A Framework for Representing Knowledge. In The Psychology of Computer Vision, edited by Patrick H. Winston. New York: McGraw-Hill Book Company, 1975.

Nilsson, N. J. Problem Solving Methods in Artificial Intelligence. New York: McGraw-Hill, 1971.

Palmer, Stephen E. The Nature of Perceptual Representation: An Examination of the Analog/Propositional Controversy. In Theoretical Issues in Natural Language Processing, eds. R. C. Schank and B. N. Nash-Webber. Proc. Workshop in Computational Linguistics, Psychology, Linguistics & Artificial Intelligence, June 1975.

Pavlidis, Theodosios. Structural Pattern Recognition. Berlin: Springer-Verlag, 1977.

Pylyshyn, Zenon W. Do We Need Images and Analogues? In Theoretical Issues in Natural Language Processing, eds. R. C. Schank and B. N. Nash-Webber. Proc. Workshop in Computational Linguistics, Psychology, Linguistics & Artificial Intelligence, June 1975.

Pylyshyn, Zenon W. Imagery and Artificial Intelligence. In W. Savage (ed.), Minnesota Studies in the Philosophy of Science, Vol. 9, pp. 19-55, 1978.

Quillian, M. Ross. Word Concepts: A Theory and Simulation of Some Basic Semantic Capabilities. Behavioral Science 12, pp. 410-430, 1967.

Quillian, M. Ross. The Teachable Language Comprehender: A Simulation Program and Theory of Language. Communications of the ACM 12(8), pp. 459-476, 1969.

Raphael, Bertram. The Thinking Computer: Mind Inside Matter. San Francisco: W. H. Freeman and Co., 1976.

Rosenfeld, Azriel and Avinash C. Kak. Digital Picture Processing. New York: Academic Press, 1976.

Schank, Roger C. Identification of Conceptualizations Underlying Natural Language. In Schank, Roger C., and K. M. Colby (eds.), Computer Models of Thought and Language. San Francisco: W. H. Freeman, 1973.

Schank, Roger C. The Structure of Episodes in Memory. In Representation and Understanding: Studies in Cognitive Science, eds. D. G. Bobrow and A. Collins. New York: Academic Press, 1975.

Schubert, L. K. Extending the Expressive Power of Semantic Networks. Artificial Intelligence 7, pp. 163-198, 1976.

Shapiro, Linda G. and Robert M. Haralick. Structural Descriptions and Inexact Matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 3, pp. 504-519, 1981.

Shapiro, Stewart C. A Net Structure for Semantic Information Storage, Deduction and Retrieval. In IJCAI-2, pp. 512-523, 1971.

Shapiro, Stewart C. Representing and Locating Deduction Rules in a Semantic Network. SIGART Newsletter 63, pp. 14-18, 1977.

Sloman, Aaron. Afterthoughts on Analogical Representation. In Theoretical Issues in Natural Language Processing, eds. R. C. Schank and B. N. Nash-Webber. Proc. Workshop in Computational Linguistics, Psychology, Linguistics & Artificial Intelligence, June 1975.

Sobel, Irwin. On Calibrating Computer Controlled Cameras for Perceiving 3-D Scenes. Artificial Intelligence 5, pp. 185-198, 1974.

Tang, Dershuen A., Harvey J. Gold and Alan L. Tharp. Formalization of the Concepts of Verbal vs. Nonverbal External Representation of Perceptual Knowledge. In Proceedings of the International Conference on Cybernetics and Society, Denver, Colorado, pp. 740-745, October 8-10, 1979.

Winograd, Terry. Understanding Natural Language. New York: Academic Press, 1972.

Winograd, Terry. Frame Representations and the Declarative-Procedural Controversy. In Representation and Understanding: Studies in Cognitive Science, eds. D. G. Bobrow and A. Collins. New York: Academic Press, 1975.

Winston, Patrick H. Learning Structural Descriptions from Examples. PhD Thesis; in The Psychology of Computer Vision, edited by Patrick H. Winston. New York: McGraw-Hill Book Company, 1975.

Winston, Patrick H. Artificial Intelligence. Reading, Mass.: Addison-Wesley Publishing Company, 1979.

Woods, William A. What's in a Link: Foundations for Semantic Networks. In Representation and Understanding: Studies in Cognitive Science, eds. D. G. Bobrow and A. Collins. New York: Academic Press, 1975.

Ullman, Shimon. Analysis of Visual Motion by Biological and Computer Systems. Computer 14, pp. 57-69, 1981.