Hierarchies in Dictionary Definition Space

I
S C
Hierarchies in Dictionary Definition Space
Institut des sciences
cognitives
Olivier Picard1, Alexandre Blondin Massé2, Stevan Harnad1, Odile Marcotte3, Guillaume Chicoisne1 and Yassine Gargouri1.
1
Institut des sciences cognitives, Université du Québec à Montréal, Montreal, Canada
2
Laboratoire de Combinatoire et d’informatique mathématique, Université du Québec à Montréal, Montreal, Canada
3
Groupe d’études et de recherche en analyse des décisions and UQÀM, HEC Montréal, Montreal, Canada
Introduction
Dictionaries and Graphs
A category is a kind of thing (object, event, action, trait or state). To categorize is to do the right thing
(eat, fight, flee, mate, etc.) with the right kind of thing. All species can acquire categories through
trial and error sensorimotor induction. We are the only species that can also acquire and transmit
categories through verbal instruction, by naming and defining them. The words in our dictionaries
are almost all the names of categories, followed by their definitions.
In principle, all categories can be acquired through verbal definition, but we cannot acquire all of
them that way: we have to know the meanings of some of the defining words already, by some other
means. This is the “symbol grounding problem” and presumably that other means of acquiring categories is sensorimotor induction. But how many words — and which ones — need to be grounded
directly through sensorimotor induction in order to allow all the rest to be acquired through verbal
definition?
Word
apple
bad
banana
color
dark
edible
fruit
Definition
red fruit
not good
yellow fruit
light dark
not light
good
edible
Word
good
light
no
not
red
tomato
yellow
Results
Definition
not bad
not dark
not
no
dark color
red fruit
light color
Value
Value
AOA
400
TLF
300
Table 1: Artificially contrived (toy) dictionary of 14 words. Each defining word is defined. No word defines itself.
300
BF
I
C
200
AOA
TLF
400
BF
I
C
200
100
100
0 1 2 3 4 5 6
We reduced dictionaries to their “grounding kernels” (GKs), about 10% of the dictionary, from
which all the other words could be defined. The GK words turned out to have psycholinguistic
correlates: they were learned at an earlier age and more concrete than the rest of the dictionary. But
one can compress still more: the GK turns out to have internal structure, with a strongly connected
“kernel core” (KC) and a surrounding layer, from which a hierarchy of definitional distances can be
derived, all the way out to the periphery of the full dictionary. These definitional distances, too, are
correlated with psycholinguistic variables (age of acquisition, concreteness, imageability, oral and
written frequency) and hence perhaps with the “mental lexicon” in each of our heads.
Level
fruit
edible
0 1 2 3 4 5 6
Level
tomato
banana
Figure 3: For both dictionaries tested — Cambridge CIDE (left) and Longmans LDOCE (right) —, Kernel Core (KC)
words are acquired younger (AOA) and more frequent (BF, TLF). Concreteness (C) and imageability (I) decrease continuously with definitional distance levels.
apple
good
AOA
yellow
0.3
red
0.3
***
I
All categories, even the most “concrete” are in fact abstractions, because we must abstract from
particular cases, even concrete sensorimotor ones, in order to find the invariant features that distinguish category members from nonmembers and allow us to do the right thing with the right kind of
thing. But the more that categories are based on other categories, the more abstract they become,
and this is reflected by the distances in our induced definitional space. It is in the nature of words
to be amenable to combination and recombination in such a way as to define or describe ever more
categories. Defining, like eating, is something we do. Our more concrete categories are answerable to the constraints of the sensorimotor world in which they are grounded, but our more abstract
categories are increasingly answerable only to combinations of other categories, as we describe and
define them. In abstract mathematics, that constraint, though only formal, is still a rigorous one.
In more hermeneutic discourse (e.g., constitutional law or theology) the main constraint on words
increasingly becomes just other words. Our mental lexicon must encode the meaning of all the
words we use in our thought and discourse. Hierarchies in dictionary space may turn out to have
counterparts in cognitive space.
not
color
light
0.2
bad
***
***
I
C
0.1
no
GK
I
0.2
I
***
AOA
0.1
BF
dark
BF
AOA
0.0
KC
**
*
C
AOA
BF
0.0
C
TLF
BF
TLF
TLF
TLF
C
Figure 1: Graph of toy dictionary in Table 1. All words in dictionary are definable out of the 6 words in the Grounding
Kernel (GK, light blue) alone. The Kernel Core (KC, dark blue) is a strongly connected component (SCC) of GK.
Levels 0–6
Levels 0–6
Levels 1–6
Levels 1–6
Figure 4: Multiple regression results. Left, GK Hierarchy: GK words are significantly younger and more imageable
but not if the 0 level (GK) words are excluded. Right, SCC Hierarchy: C and I correlation significant through all levels.
Definitional Distance Hierarchies
Summary
apple
References
fruit
GK
KC
edible
bad
Blondin Massé, A., Chicoisne, G., Gargouri, Y., Harnad, S., Marcotte, O., Picard, O. (2008)
How is meaning grounded in dictionary definitions? TextGraphs ’08: Proceedings of the 3rd
Textgraphs Workshop on Graph-Based Algorithms for Natural Language Processing Manchester,
United Kingdom.
Chicoisne, G., Blondin Massé, A., Picard, O., Harnad, S. (2008) Grounding abstract word definitions in prior concrete experience 6th Int. Conf. on the Mental Lexicon Banff, Alberta.
Harnad, S. (1990) The symbol grounding problem Physica D 42:335-346.
Harnad, S. (2005) To Cognize is to Categorize: Cognition is Categorization. In Lefebvre, C. and
Cohen, H., Eds. Handbook of Categorization Elsevier.
Contact: [email protected].
banana
good
not
no
0
dark
Imageability (I)
Concreteness (C)
tomato
1
light
lo
3
2
GK
KC
4
lo
red
Spoken Frequency (BF)
lo
hi
color
hi hi
lo hi
0
1
2
3
5
4
6
yellow
lo
Written Frequency (TLF)
Figure 2: Definition-distance hierarchy induced by the strongly connected components of the graph. Level 0 words
“no” and “not” correspond exactly to the Kernel Core (KC). The level 1 words “bad” and “good” along with “dark” and
“light” form the two next strongly connected components. Higher levels are further in definitional distance. (Levels 0
plus 1 happen to be the grounding kernel in this toy example, but this is not the case in general.) Notice that all arrows
are either contained in the same level or going to an upper level.
Age of acquisition (AOA)
hi
Figure 5: Overall effect: Correlation with definitional distance from KC is one-step for age (AOA) and frequency (BF,
TLF) and continuous for concreteness (C) and imageability (I).