
Grounding Symbols:
Labelling and pronoun resolution with fLIF
neurons
Fawad Jamshed and Christian Huyck
School Of Engineering and Information Sciences- Middlesex University
The Burroughs, London NW4 4BT, UK
[email protected], [email protected]
Abstract
If a system can represent knowledge symbolically, and ground those symbols in an environment, then it has
access to a vast range of data from that environment. The system described in this paper acts in a simple
virtual world. It is implemented solely in fatiguing Leaky Integrate and Fire neurons; it views the environment,
processes natural language commands, plans and acts. This paper describes how visual representations are
labelled, thus gaining associations with symbols. The labelling is done in a semi-supervised manner with
simultaneous presentation of the word (label) and a corresponding item in the visual field. The paper then
shows how these grounded symbols can be useful in reference resolution. All tests performed worked perfectly.
1. Introduction
A major hurdle in the development of an artificial intelligent agent is the symbol grounding problem (SGP) [6,
20]. A symbol can be defined as an association with an object due to a social convention and usually has an
arbitrary shape with no resemblance to its referent. Each symbol is part of a wider and more complex system
[20, 22]. Any symbol is meaningless to its user unless, somehow, it is given some meaning. Once a symbol gets
its meaning, it is grounded. How an artificial agent can develop the meanings of symbols autonomously is the
SGP [3, 4, 7, 8].
The SGP is one of the most important open questions in the philosophy of information [23].
Manipulating meaningless symbols into other meaningless symbols is not intelligence [19]. Most artificial
agents do not understand the meanings of the symbols they process; they mostly just process information
according to predesigned algorithms. Instead of defining symbols in terms of other ungrounded symbols, a
system might ground them in such a way that they have meaning independently, without significant help from
any external source [1, 2, 21].
2. Theoretical Background and Previous Work
The SGP has been in existence for hundreds of years. As knowledge about human cognition has advanced, more
candidate symbol grounding solutions have been proposed. Especially since the development of connectionist
systems, which are inspired by biological neurons, there have been more ideas and solutions to address the SGP.
This paper presents simulations that begin to address the SGP. The simulations are based on fatiguing Leaky
Integrate and Fire (fLIF) neurons [13]. They also make use of the cell assembly (CA) concept; a CA is a set of
neurons with high mutual synaptic strength that is the neural representation of a concept [9]. A brief description
of CAs and fLIF neurons is given below.
2.1 Fatiguing Leaky Integrate and Fire (fLIF) neurons
fLIF neurons are a relatively simple model of biological neurons [12]. The model used in this paper makes use
of discrete cycles. Each neuron has some activation, which it receives from other neurons. If a neuron has
enough activation at the end of a cycle, it will fire, spread activation to connected neurons, and will lose all its
energy. Neurons are connected to other neurons with unidirectional, weighted connections. If a neuron fires, it
will pass the activation value of the weight of the connection. If a neuron’s activation is less than the threshold,
it will not fire but will lose some of its activation as it leaks away. fLIF neurons also fatigue just like biological
neurons [14]: if a neuron fires regularly, it becomes more difficult to fire again. This is modelled by
increasing the neuron's threshold as described in equation 1.
T(t) = T(t-1) + Fc
Equation 1
Where T(t) is the threshold at time t, T(t-1) is the threshold at time t-1, and Fc is the fatigue constant. If a
neuron does not fire, its threshold decreases by the fatigue recovery constant Fr, as shown in equation 2. The
threshold never goes below the base activation threshold.
T(t) = T(t-1) - Fr
Equation 2
If a neuron does not fire at a given time, some of its energy leaks away, but it still integrates energy from the
surrounding active neurons. This is modelled by calculating the activation as described in equation 3.
A(t) = A(t-1)/D + C
Equation 3
Where A(t) is the activation at time t: the activation A(t-1), reduced by the decay constant D, plus C, the
summed incoming activation from all connected neurons that fired at time t-1. The value of C is determined by
multiplying the incoming activation on each connected link by the associated weight of that link.
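As a minimal sketch, equations 1-3 can be combined into one discrete-cycle neuron update. The class below is illustrative; the parameter values are arbitrary defaults, not the tuned values used in the simulations.

```python
# A minimal sketch of one fLIF neuron's discrete-cycle update,
# following equations 1-3. Parameter values are illustrative only.

class FLIFNeuron:
    def __init__(self, base_threshold=4.0, decay=1.2,
                 fatigue=0.5, fatigue_recovery=1.0):
        self.base_threshold = base_threshold      # resting firing threshold
        self.threshold = base_threshold           # T(t), raised by fatigue
        self.decay = decay                        # D in equation 3
        self.fatigue = fatigue                    # Fc in equation 1
        self.fatigue_recovery = fatigue_recovery  # Fr in equation 2
        self.activation = 0.0                     # A(t)

    def step(self, incoming):
        """One cycle: integrate weighted input, then fire or leak.

        `incoming` is C, the summed weighted activation from neurons
        that fired on the previous cycle. Returns True on firing.
        """
        # Equation 3: leak old activation, integrate new input.
        self.activation = self.activation / self.decay + incoming
        if self.activation >= self.threshold:
            # Firing: lose all energy; fatigue raises the threshold (eq. 1).
            self.activation = 0.0
            self.threshold += self.fatigue
            return True
        # Not firing: threshold recovers toward its base (eq. 2).
        self.threshold = max(self.base_threshold,
                             self.threshold - self.fatigue_recovery)
        return False
```

A neuron driven above threshold fires, loses its energy, and becomes harder to fire on the next cycle, exactly the fatigue behaviour described above.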
2.2 Cell Assemblies
CAs were proposed by Hebb over sixty years ago [9] and still successfully explain how the human brain learns
and stores concepts. A single neuron does not represent a memory; instead, a large number of
neurons represent each concept in the human brain. The neurons for a particular concept have high mutual
connection strength that can support a reverberating circuit. This circuit is a CA, and can continue to fire after
the initial stimulus ceases. The learning of a CA is done by a Hebbian learning rule. Hebbian learning states that
the connection strength between two neurons is related to how frequently they fire simultaneously.
When an external input is applied to neurons, the strength of the connections between neurons is
adjusted accordingly. The repeated presentation of input increases the strength of the connection between
simultaneously active neurons while decreasing the connection strength between other neurons. The set of
neurons with increased synaptic strength forms a CA. CAs are reverberating circuits. Initial firing of some
neurons in the CA can lead to further firing of other neurons in the CA due to high connection strength. This
then can lead to a cascade of firing called CA ignition [11, 13, 15, 24].
One advantage of CAs is that they can act as both long and short term memories. A short term
memory persists as long as neurons are firing. Long term memories are formed by synaptic modification due to
the Hebbian learning rule. This dual dynamics (ignition and learning) of a CA makes it more suitable for
developing powerful computational devices [17]. Thus a wide range of tasks can be modelled using CAs.
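The ignition dynamics described above can be shown with a toy network: a set of mutually well-connected threshold neurons, briefly stimulated on only some of its members, recruits the rest and keeps reverberating after the stimulus ceases. The network size, weights, and threshold below are illustrative, not taken from the paper.

```python
# A toy sketch of CA ignition: strongly interconnected threshold
# neurons keep firing after a brief external stimulus to a subset.
# All sizes and weights here are illustrative.
import numpy as np

def ignite(weights, stimulated, threshold=1.0, steps=5):
    """Run a discrete-cycle network; return the set of neurons firing
    on each cycle. External input is applied only on cycle 0."""
    n = weights.shape[0]
    firing = np.zeros(n, dtype=bool)
    firing[list(stimulated)] = True          # external stimulus, cycle 0
    history = [set(np.flatnonzero(firing))]
    for _ in range(steps):
        # Each neuron sums weighted input from currently firing neurons.
        total_input = weights.T @ firing.astype(float)
        firing = total_input >= threshold    # no further external input
        history.append(set(np.flatnonzero(firing)))
    return history

# Four neurons with strong mutual connections form one CA.
w = np.full((4, 4), 0.6)
np.fill_diagonal(w, 0.0)

# Stimulating three members recruits the fourth, and the whole CA
# then reverberates with no external input.
history = ignite(w, stimulated={0, 1, 2})
```

The cascade from partial stimulation to full, self-sustaining activity is the ignition described in the text.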
3 Proposed Work
Labelling is a simple form of symbol grounding. A system, based on an existing agent that contains an existing
semantic CA and an existing label CA, is developed. An association between the semantic and label CAs is
learned. Next, these labels are exploited to provide a form of reference resolution. These are relatively simple
tasks that are a proof of concept.
Labelling depends on categories. Categories are very important as they help in identifying the class of
an object. By putting things together which have similar features, the system learns to categorise things [5]. A
category is represented by a CA. Prior work has shown that CAs can be learned from environmental stimuli
[13]. While they may be learned, it is also possible to set the topology of the system so that a particular CA
already exists.
One theory states that a concept is represented by a semantic pole and a phonological pole [18]. A CA
for the category would represent the semantic pole, and a different CA for the label would be the phonological
pole. If a system has a semantic CA, and a label CA, it can attach them to each other, which means the symbol is
now grounded. By having this iconic representation of categories, the system has attached a name to a category.
Symbol grounding can be used to address the reference resolution problem. Reference resolution is a
common problem in natural language. For example in the sentence
We saw a doll with a black jacket on and it was quite big.
Example 1
The pronoun it can refer either to the doll or to the jacket. If the system can decide which, it is resolving the
pronoun. In resolving the pronoun, the system could ignite both the semantic CA and the label CA associated
with the item to which the pronoun is resolved.
4 Simulations
The simulations described below are an extension of the first version of the Cell Assembly Robot (CABot1)
[16]. CABot1 does no real learning. The first simulation shows how a slight modification along with learning
allows the attachment of labels. The second simulation shows how the now labelled semantic CAs can be used
for reference resolution.
CABot
The main aim of CABot is to develop an agent in simulated neurons, which can take natural language as input
and interact with the environment without any external help. By interacting with the environment, it is hoped
that it can learn the semantics of the environment sufficiently well to improve language processing.
For CABot1, a virtual 3D environment was established based on the Crystal Space games engine. Two
agents were placed in the environment, the first controlled by a user, and the second was the CABot1 agent. All
processing in CABot1 was done by a complex network of fLIF neurons, though it emitted symbolic commands
to the Crystal Space stub.
Figure 1 Instance of pyramid in virtual environment
Figure 2 Instance of stalactite in virtual environment
A complete description of CABot1 is beyond the scope of this paper but further information can be found
elsewhere [16]. A total of 21 sub-networks are used to subdivide the tasks of vision, natural language parsing,
planning, action and system control.
The important subnets for the purposes of this paper are the vision nets and the word nets. There are
three vision subnets, a simulated retina, a primary visual cortex and a secondary visual cortex (V2). These
systems were hard coded, so there was no learning. Visual input was in the form of a bitmap representation of a
view of the game from the agent’s perspective. In particular, the secondary visual cortex subnet was set to
recognise pyramids and stalactites. If one of these was present in the game, a CA in V2 ignited. There were
several position and size dependent CAs associated with both pyramid and stalactite. Figures 1 and 2 show
instances of a pyramid and a stalactite respectively.
Similarly, the parsing component had CAs for words. In the game, the user issues natural language
commands to tell the agent what to do. There was a noun subnet used during parsing and an instance subnet to
store semantic roles during parsing. Both noun and instance subnets had CAs for both pyramid and stalactite
labels.
4.1 Labelling Experiment
The labelling simulation associates label CAs with semantic CAs. Initially pictures and labels are presented at
the same time so the weights between the semantic and label categories can be adjusted. The connections
between visual CAs and the appropriate label CAs are adjusted and learned using Hebbian learning. According
to Hebbian learning, the weights of the connections are increased when both the pre and the post synaptic
neurons fire, and when the pre but not the post synaptic neuron fires, the weights are decreased. The learning is
bi-directional with weights on connections from the visual net to the label net and weights from the label net to
visual being learned at the same time, though these connections usually have different weights.
Each shape is represented by a CA of 2500 neurons in the V2net, whereas each label is represented by a CA of
150 neurons in the Instance net. Every 10th neuron of a visual shape in the V2net has 10 random connections
with stalactite and pyramid CAs in the instance net. Each neuron of a label instance in the instance net has 10
random connections with stalactite and pyramid shape CAs in the V2net. The initial weights between labels and
visual input are set low (~0.01) so that the weights can be adjusted to the appropriate level. The other
parameters were selected by trial and error. For the V2 net, a decay rate of 1.1, fatigue rate of 1.0, fatigue
recovery rate of 3.0, learning rate of 0.1 and activation threshold of 5 were used; for the Instance net, a decay
rate of 1.5, fatigue rate of 0.4, fatigue recovery rate of 1.2, learning rate of 0.1 and activation threshold of 4.
The correlation rule was used for learning connections [11]. According to the correlation learning rule, when
both the pre- and post-synaptic neurons fire the weight is increased, and when the pre-synaptic neuron fires but
the post-synaptic neuron does not, the weight is decreased. The complete code for this experiment can be found
at
(http://www.cwa.mdx.ac.uk/CABot/label/labelling.html ).
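The correlation rule just described can be sketched as a per-connection update. The multiplicative form and the learning rate below are illustrative; only the direction of each update (strengthen on co-firing, weaken on pre-only firing) comes from the text.

```python
# A sketch of the correlation learning rule described above: the weight
# increases when both pre- and post-synaptic neurons fire, decreases
# when only the pre-synaptic neuron fires, and is otherwise unchanged.
# The update form and rate are illustrative assumptions.

def correlate(weight, pre_fired, post_fired, rate=0.1):
    """Return the updated weight for one pre->post connection."""
    if pre_fired and post_fired:
        return weight + rate * (1.0 - weight)   # strengthen toward 1
    if pre_fired and not post_fired:
        return weight - rate * weight           # weaken toward 0
    return weight                               # pre silent: no change

# Repeated co-presentation of a shape and its label drives the
# visual->label weight up from its low initial value (~0.01).
w = 0.01
for _ in range(50):
    w = correlate(w, pre_fired=True, post_fired=True)
```

With simultaneous presentation of picture and label, the initially weak visual-to-label weights grow until the visual CA alone can ignite the label CA, which is what the testing phase checks.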
During testing, different pictures are presented while the label input to the instance net is switched off
to test the connection efficiency between the V2 net and the instance net. At the end, the V2 net input is turned
off and different labels are presented to test the connection efficiency from the instance net to the V2 net.
Figures 3 and 4 show an instance of a pyramid and an instance of a stalactite respectively, as presented as
visual input to CABot1.
Figure 3 A representation of pyramid in visual net
Figure 4 A representation of stalactite in visual net
Results
The test is fully automated and runs without human interaction. The result is calculated from the number of
neurons firing at the 49th step of presenting each picture or label. At least 10 percent of a category's total
neurons must fire for it to qualify as an active category. Thirty-eight percent of the images were used to adjust
the weights and the remaining sixty-two percent were used to test the system. Each picture was given 50
CABot steps for training and testing purposes. Individual results for an instance of pyramid and stalactite are
produced after every 49th cycle.
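The activity criterion above can be sketched directly: a category counts as active if at least 10 percent of its neurons fire at the measured step. The category names and the dictionary representation are ours; the sizes and firing counts are those reported in the text and in Table 1.

```python
# A sketch of the test criterion described above: a category is active
# if, at the measured step, at least 10 percent of its neurons fire.
# The dictionary representation is an illustrative assumption.

def active_categories(firing_counts, ca_sizes, fraction=0.10):
    """Return the categories whose firing neurons meet the threshold."""
    return {name for name, fired in firing_counts.items()
            if fired >= fraction * ca_sizes[name]}

# CA sizes from the text: 2500 neurons per shape, 150 per label.
sizes = {"visual_pyramid": 2500, "visual_stalactite": 2500,
         "label_pyramid": 150, "label_stalactite": 150}
# Firing counts like those in Table 1: pyramid CAs active, stalactite silent.
fired = {"visual_pyramid": 304, "visual_stalactite": 0,
         "label_pyramid": 97, "label_stalactite": 0}
```

On these counts, 304 of 2500 and 97 of 150 both clear the 10 percent bar, so only the two pyramid CAs register as active.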
Graph 1 Labelling when visual net input is off
Graph 2 Labelling when instance input is off
Graph 1 shows the result of a run when the input to the V2 net is switched off and input is applied to the
instance net. When an input is applied to a CA in the instance net, the correct associated CA of the V2 net
becomes excited as well, even though it has no external input. Graph 2 shows the results of a run when the
input to the instance net is turned off and input is applied to the V2 net. The appropriate CAs of the V2 net also
ignite the corresponding CAs of the instance net even though there is no lexical input to the instance net. These
graphs clearly show that the appropriate connections are being learned. Table 1 shows the number of neurons
firing when an input is applied to the pyramid CA of the instance net, which in turn ignites the corresponding
pyramid CA of the V2 net, which has no external input.
No. of Neurons Firing
Visual Pyramid | Visual Stalactite | Label Pyramid | Label Stalactite
304            | 0                 | 97            | 0
Table 1 Labelling experiment result for a shape
The test runs for 2000 steps. In the first 600 steps, 6 different pictures of pyramid and stalactite are presented.
Then for the next 1000 steps, 10 different pictures of pyramid and stalactite are presented while the instance net
receives no external input. For the next 398 steps, there is no visual input to the V2 net and input to the instance
net is turned on, rotating between pyramid and stalactite labels after every 50 steps.
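The 2000-step schedule above can be sketched as a step-to-phase mapping. The phase boundaries follow the text; the phase names, and the assumption that the label rotation starts with pyramid, are ours.

```python
# A sketch of the 2000-step test schedule described above. Boundaries
# (600 / 1000 / 398 steps) follow the text; phase names and the order
# of label rotation are illustrative assumptions.

def phase(step):
    """Return the schedule phase for a 0-based CABot step."""
    if step < 600:
        return "train: pictures and labels together"
    if step < 1600:
        return "test: pictures only, no label input"
    return "test: labels only, no visual input"

def label_shown(step):
    """In the final phase, labels rotate every 50 steps
    (starting label assumed to be pyramid)."""
    if phase(step) != "test: labels only, no visual input":
        return None
    return ("pyramid", "stalactite")[((step - 1600) // 50) % 2]
```

Such a mapping makes the automated test reproducible: every step knows which inputs are on and which label, if any, is being presented.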
Different instances of visual pyramids and stalactites were used, with varying angles and edge sizes. The
results were very promising: in all 28 tests carried out, the correct association between shapes and their labels
was learned.
4.2 Pronoun Resolution
For the pronoun resolution experiment, the connections and their weights are set within the instance net (from
the it CA to all other CAs in the instance net) and from the V2 net to the instance net, the latter reflecting the
weights learned in the labelling experiment. The connections from the it CA in the Noun net to the it CA in the
instance net are also set, along with their weights. Input from both the V2 net and the it CA in the Instance net
is needed to ignite the stalactite or pyramid instance CAs in the Instance net.
Each shape in the V2 net, label in the Instance net and word in the Noun net is represented by a CA of 500
neurons. Every 11th neuron of a shape CA in the V2 net and of the it CA in the instance net has 3 random
connections with the stalactite and pyramid CAs in the instance net, with weights of 0.7 and 0.6 respectively.
Every 5th neuron of the it CA in the Noun net has 3 random connections with the it CA in the instance net,
with a weight of 1.2. The other parameters were selected by trial and error. For the Instance net, a decay rate of
1.4, fatigue rate of 0.5, fatigue recovery rate of 1.4 and activation threshold of 3.8 were used. The learning rate
and other parameters are not significant for this experiment and thus are given default values.
In the pronoun resolution experiment, a sentence containing an ambiguous pronoun [10] and a shape to which
the pronoun refers were presented to the system. The activation is set at such a level that neither the it CA nor
the shape CAs from the V2 net can ignite the stalactite or pyramid CAs in the instance net on their own. The
combined activation from the visual field and the noun field is able to ignite the symbol label. The pronoun it
is thus linked dynamically to the presented shape each time a different shape is presented.
Figure 5 A run of the pronoun ambiguity experiment
Figure 5 shows a typical run of the pronoun ambiguity resolution experiment. In this run a pyramid is
presented to the system, as shown in the Visual Input net, and this causes the ignition of the pyramid CA in the
V2 net. When a sentence is presented to the system, it is parsed and the words in the sentence ignite the
corresponding word CAs in the Noun net. In this run the sentence contains the ambiguous pronoun it, which
causes the ignition of the it CA in the Instance net. The pyramid label CA in the Instance net receives
activation both from the V2 net and from the it CA in the Instance net, and hence ignites.
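The combined-activation gating at the heart of this mechanism can be sketched in a few lines: neither the visual input nor the it CA alone reaches the label CA's threshold, but their sum does, so the pronoun binds to whichever shape is currently visible. The numeric values are illustrative, not the tuned weights.

```python
# A sketch of the conjunction gating described above: a label CA in
# the Instance net ignites only when it receives activation from both
# the V2 net (vision) and the it CA (language). Values are illustrative.

def label_ignites(visual_input, it_input, threshold=1.0):
    """True when the combined input reaches the ignition threshold."""
    return visual_input + it_input >= threshold

visual = 0.7   # sub-threshold on its own
pronoun = 0.6  # sub-threshold on its own
```

Because only the label CA of the currently seen shape receives the visual contribution, the same it input resolves to different referents as the scene changes, which is the dynamic binding the experiment demonstrates.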
For 1000 cycles, the system is tested by presenting different shapes as input together with a sentence
containing an ambiguous it pronoun. If, while the system is presented with the pronoun it, the matching shape
CA turns on in the instance net, then the system has associated the pronoun dynamically with the presented
shape and hence has resolved the ambiguity. The current system is not advanced enough to look for specific
shapes when nothing is present in the visual input, or to work when more than one shape is presented. Future
systems will be able to locate specific shapes and to deal with multiple shapes at a time when resolving
pronouns.
Results
Whenever the ambiguous pronoun is presented, the neurons of the associated CA of the V2 net fire at the same
time, thus resolving what the pronoun refers to.
No. of Neurons Firing
Visual Pyramid | Visual Stalactite | It | Label Pyramid | Label Stalactite
253            | 0                 | 70 | 90            | 0
Table 2 Pronoun resolution experiment result
Table 2 shows the result of a run in which a pyramid was presented as visual input and a sentence containing
an ambiguous it pronoun was presented simultaneously. The word instance CAs receive connections from both
the it instance and the corresponding visual CAs; a word instance ignites only when it gets activation from
both nets. The results in table 2 show that the pronoun it is dynamically resolved via connections from the
active visual pyramid CA. The experiment was run 20 times, and each time the ambiguous pronoun was
successfully associated dynamically with the visual item presented at the time.
5 Conclusions and Future Work
The results obtained from the above experiments are promising. The labelling experiment learned the correct
association between shapes and their corresponding labels in all 28 runs that were conducted. The labelling
experiment is a small but important step towards a solution of the SGP. Labelling is an essential aspect of
symbol grounding because it attaches symbols to sub-symbolic representations.
The pronoun resolution experiment creates a dynamic association between ambiguous pronouns and
shape categories. Pronoun resolution is not required to ground symbols, but the experiment shows one of the
many benefits of symbol grounding. Though the model presented is not as complex as the biological brain, it
has been shown that it can be used towards a solution of the SGP. The promising results of these experiments
show that Hebbian learning can be used effectively to ground semantic symbols, and indicate that the model
and technique can be applied to other aspects of symbol grounding. The main goal of this research is to
develop an agent that can ground symbols and, by using those grounded symbols, effectively perceive and
interact with its surrounding environment.
The future model of this agent will not only be able to learn and label new shapes but will also be able to learn
and label new symbols from what has already been learned and labelled. Other, more demanding aspects of the
SGP which need to be addressed in order to ground symbols include symbolic theft and functional symbol
grounding. Symbolic theft evolves new categories by combining or decomposing existing categories; for
example, by combining stripes with horse, a new category, zebra, can be created. Functional symbol grounding
grounds a symbol according to the context in which it is used; the use of symbols is individual as well as
domain and situation specific [23]. By taking a functional approach to the SGP, the usefulness and thus the
accuracy of the system can be enhanced.
Other useful steps include alignment and the use of environmental feedback. Alignment modifies a symbol to
cohere with the meaning of the same symbol held by an experienced agent or a human. Environmental
feedback readjusts an agent's already grounded symbols in response to the feedback it receives from the
environment, including the behaviour of other agents. None of the above-mentioned aspects of the SGP is
enough to ground symbols on its own; labelling is needed to attach any symbol to its semantics.
References
[1] C. Breazeal,"Sociable Machines: Expressive Social Exchange between Humans and Robots". Sc.D.
dissertation, Department of Electrical Engineering and Computer Science, MIT (2000).
[2] A. Cangelosi, “Evolution of Communication and Language Using Signals, Symbols and Words”, IEEE
Transaction in Evolution Computation, 5, pp. 93-101, (2001).
[3] A. Cangelosi, A. Greco and S. Harnad, “From Robotic Toil to Symbolic Theft: Grounding Transfer
from Entry-Level to Higher-Level Categories”, Connection Science, 12, pp. 143-162, (2000).
[4] A. Cangelosi, A. Greco and S. Harnad, “Symbol Grounding and the Symbolic Theft Hypothesis”, in
Simulating the Evolution of Language, A. Cangelosi and D. Parisi, Eds., London, Springer, pp. 191-210, (2002).
[5] P. Davidsson, “Toward a General Solution to the SGP: Combining Machine Learning and Computer
Vision“, in AAAI Fall Symposium Series, Machine Learning in Computer Vision: What, Why and
How? pp. 157-161 (1993).
[6] S. Harnad, “The SGP”, Physica D, pp. 335-346, (1990).
[7] S. Harnad, “Symbol Grounding in an Empirical Problem: Neural Nets are just a Candidate
Component”, in Proceedings of the Fifteenth Annual Meeting of the Cognitive Science Society, (1993).
[8] S. Harnad, “Grounding Symbols in the Analog World with Neural Nets – a Hybrid Model”,
Psychology, 12, pp. 12-78, (2001).
[9] D. Hebb. The Organization of Behavior. John Wiley and Sons, New York (1949).
[10] D. Hindle and M. Rooth, “Structural Ambiguity and Lexical Relations”, Meeting of the Association for
Computational Linguistics (1993).
[11] C. Huyck. “Overlapping CA from correlators”. Neurocomputing 56:435–9 (2004).
[12] C. Huyck. Developing Artificial intelligence by Modeling the Brain (2005).
[13] C. Huyck. “Creating hierarchical categories using CA”. Connection Science (2007).
[14] C. Huyck and R. Bowles, “Spontaneous neural firing in biological and artificial neural systems”,
Journal of Cognitive Systems 6:1:31-40 (2004).
[15] C. Huyck, and V.Orengo. “Information retrieval and categorisation using a cell assembly network”.
Neural Computing and Applications (2005).
[16] C. Huyck, “CABot1: a Videogame Agent Implemented in fLIF Neurons”, IEEE Systems, Man and
Cybernetics Society, London, pp. 115-120 (2008).
[17] I. Kenny, and C. Huyck. An embodied conversational agent for interactive videogame environments. In
Proceedings of the AISB’05 Symposium on Conversational Informatics for Supporting Social
Intelligence and Interaction, 58–63 (2005).
[18] R. Langacker. Foundations of Cognitive Grammar. Vol. 1. Stanford, CA. Stanford University Press
(1987).
[19] J. Searle, “Minds, Brains, and Programs”, Behavioral and Brain Sciences, 3, pp. 417-458, (1980).
[20] L. Steels, “The Symbol Grounding Problem has been solved. So what’s next?”, in Symbols, Embodiment
and Meaning, Oxford University Press, (2007).
[21] M. Mayo, “Symbol Grounding and its Implication for Artificial Intelligence”, in Twenty-Sixth
Australian Computer Science Conference , pp. 55-60 (2003).
[22] R. Sun, “Symbol Grounding: A New Look at an Old Idea”, Philosophical Psychology, 13, pp. 149-172,
(2000).
[23] M. Taddeo and L. Floridi Solving the Symbol Grounding Problem: a Critical Review of Fifteen Years
of Research (2003).
[24] T. Wennekers and G. Palm, “Cell Assemblies, Associative Memory and Temporal Structure in Brain
Signals”, in Time and the Brain: Conceptual Advances in Brain Research (Vol. 2), Harwood Academic
Publishers, pp. 251-274, (2000).