Hierarchies in Dictionary Definition Space

Olivier Picard¹, Alexandre Blondin Massé², Stevan Harnad¹, Odile Marcotte³, Guillaume Chicoisne¹ and Yassine Gargouri¹

¹ Institut des sciences cognitives, Université du Québec à Montréal, Montreal, Canada
² Laboratoire de Combinatoire et d’informatique mathématique, Université du Québec à Montréal, Montreal, Canada
³ Groupe d’études et de recherche en analyse des décisions and UQÀM, HEC Montréal, Montreal, Canada

Introduction

A category is a kind of thing (object, event, action, trait or state). To categorize is to do the right thing (eat, fight, flee, mate, etc.) with the right kind of thing. All species can acquire categories through trial-and-error sensorimotor induction. We are the only species that can also acquire and transmit categories through verbal instruction, by naming and defining them. The words in our dictionaries are almost all names of categories, followed by their definitions. In principle, any single category can be acquired through verbal definition, but not all of them can be acquired that way: we have to know the meanings of some of the defining words already, by some other means. This is the “symbol grounding problem”, and presumably that other means of acquiring categories is sensorimotor induction. But how many words, and which ones, need to be grounded directly through sensorimotor induction in order to allow all the rest to be acquired through verbal definition?

Dictionaries and Graphs

Word      Definition
apple     red fruit
bad       not good
banana    yellow fruit
color     light dark
dark      not light
edible    good
fruit     edible
good      not bad
light     not dark
no        not
not       no
red       dark color
tomato    red fruit
yellow    light color

Table 1: Artificially contrived (toy) dictionary of 14 words. Every defining word is itself defined, and no word defines itself.

Results
We reduced dictionaries to their “grounding kernels” (GKs), about 10% of the dictionary, from which all the other words could be defined. The GK words turned out to have psycholinguistic correlates: they were learned at an earlier age and were more concrete than the rest of the dictionary. But one can compress still more: the GK turns out to have internal structure, with a strongly connected “kernel core” (KC) and a surrounding layer, from which a hierarchy of definitional distances can be derived, all the way out to the periphery of the full dictionary. These definitional distances, too, are correlated with psycholinguistic variables (age of acquisition, concreteness, imageability, oral and written frequency) and hence perhaps with the “mental lexicon” in each of our heads.

[Figure 3 plots: vertical axis Value (ticks up to 400), horizontal axis Level (0 to 6), series AOA, TLF, BF, I and C; one panel per dictionary.]
Figure 3: For both dictionaries tested, Cambridge CIDE (left) and Longman LDOCE (right), Kernel Core (KC) words are acquired at a younger age (AOA) and are more frequent (BF, TLF). Concreteness (C) and imageability (I) decrease continuously across definitional distance levels.

All categories, even the most “concrete”, are in fact abstractions, because we must abstract from particular cases, even concrete sensorimotor ones, in order to find the invariant features that distinguish category members from nonmembers and allow us to do the right thing with the right kind of thing. But the more categories are based on other categories, the more abstract they become, and this is reflected in the distances in our induced definitional space. It is in the nature of words to be amenable to combination and recombination in such a way as to define or describe ever more categories. Defining, like eating, is something we do.
Our more concrete categories are answerable to the constraints of the sensorimotor world in which they are grounded, but our more abstract categories are increasingly answerable only to combinations of other categories, as we describe and define them. In abstract mathematics, that constraint, though only formal, is still a rigorous one. In more hermeneutic discourse (e.g., constitutional law or theology), the main constraint on words increasingly becomes just other words. Our mental lexicon must encode the meaning of all the words we use in our thought and discourse. Hierarchies in dictionary space may turn out to have counterparts in cognitive space.

Figure 1: Graph of the toy dictionary in Table 1. All words in the dictionary can be defined from the 6 words in the Grounding Kernel (GK, light blue) alone. The Kernel Core (KC, dark blue) is a strongly connected component (SCC) of the GK.

[Figure 4 plots: bars for AOA, I, C, BF and TLF on a vertical axis from 0.0 to 0.3, with asterisks on some bars; panels titled Levels 0–6 and Levels 1–6 for each hierarchy.]
Figure 4: Multiple regression results. Left, GK hierarchy: GK words are acquired significantly younger and are significantly more imageable, but not when the level-0 (GK) words are excluded. Right, SCC hierarchy: the correlations with C and I are significant through all levels.

[Poster panel titles: Definitional Distance Hierarchies; Summary]

References

Blondin Massé, A., Chicoisne, G., Gargouri, Y., Harnad, S., Marcotte, O., Picard, O. (2008) How is meaning grounded in dictionary definitions? TextGraphs ’08: Proceedings of the 3rd Textgraphs Workshop on Graph-Based Algorithms for Natural Language Processing, Manchester, United Kingdom.
Chicoisne, G., Blondin Massé, A., Picard, O., Harnad, S. (2008) Grounding abstract word definitions in prior concrete experience. 6th Int. Conf. on the Mental Lexicon, Banff, Alberta.
Harnad, S. (1990) The symbol grounding problem. Physica D 42:335-346.
Harnad, S. (2005) To Cognize is to Categorize: Cognition is Categorization.
In Lefebvre, C. and Cohen, H., Eds. Handbook of Categorization. Elsevier.

Contact: [email protected].

[Figure 2 diagram: the words of the toy dictionary arranged by level, with the GK and KC marked.]
Figure 2: Definition-distance hierarchy induced by the strongly connected components of the graph. The level 0 words “no” and “not” correspond exactly to the Kernel Core (KC). The level 1 words “bad” and “good”, along with “dark” and “light”, form the next two strongly connected components. Higher levels are farther from the KC in definitional distance. (Levels 0 and 1 together happen to be the grounding kernel in this toy example, but this is not the case in general.) Notice that every arrow either stays within one level or leads to a higher level.

[Figure 5 panels, each scaled from lo to hi across levels 0 to 6: Imageability (I), Concreteness (C), Spoken Frequency (BF), Written Frequency (TLF), Age of acquisition (AOA).]
Figure 5: Overall effect: the correlation with definitional distance from the KC takes the form of a single step for age of acquisition (AOA) and frequency (BF, TLF), and is continuous for concreteness (C) and imageability (I).
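The toy example of Table 1 and Figures 1 and 2 can be reproduced with a short sketch. This is not the authors' code, and it rests on two assumptions that the poster does not spell out: the grounding kernel is obtained by repeatedly discarding words that no remaining word uses in its definition, and the level of a strongly connected component is one more than the highest level among the components it draws its defining words from. The function names are hypothetical.

```python
# Toy dictionary from Table 1: each word maps to the words in its definition.
TOY = {
    "apple": ["red", "fruit"], "bad": ["not", "good"],
    "banana": ["yellow", "fruit"], "color": ["light", "dark"],
    "dark": ["not", "light"], "edible": ["good"], "fruit": ["edible"],
    "good": ["not", "bad"], "light": ["not", "dark"], "no": ["not"],
    "not": ["no"], "red": ["dark", "color"], "tomato": ["red", "fruit"],
    "yellow": ["light", "color"],
}


def grounding_kernel(dictionary):
    """Assumed reduction: drop words that define no remaining word, until stable."""
    kernel = set(dictionary)
    while True:
        used = {d for w in kernel for d in dictionary[w] if d in kernel}
        if used == kernel:
            return kernel
        kernel = used


def reachable(dictionary, start):
    """Words reachable from `start` by following definitions (iterative DFS)."""
    seen, stack = set(), [start]
    while stack:
        for nxt in dictionary[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def components(dictionary):
    """Map each word to its strongly connected component (mutual reachability).

    Quadratic, which is fine for a toy graph; a real dictionary would call
    for a linear-time SCC algorithm such as Tarjan's.
    """
    reach = {w: reachable(dictionary, w) for w in dictionary}
    return {
        w: frozenset({w} | {v for v in reach[w] if w in reach[v]})
        for w in dictionary
    }


def definitional_levels(dictionary):
    """Assumed rule: level 0 for components with no outside defining words,
    otherwise 1 + the highest level among the components they depend on."""
    component = components(dictionary)
    cache = {}

    def level(scc):
        if scc not in cache:
            deps = {component[d] for w in scc for d in dictionary[w]} - {scc}
            cache[scc] = 1 + max(map(level, deps)) if deps else 0
        return cache[scc]

    return {w: level(component[w]) for w in dictionary}


if __name__ == "__main__":
    print(sorted(grounding_kernel(TOY)))
    for word, lvl in sorted(definitional_levels(TOY).items(), key=lambda p: p[1]):
        print(lvl, word)
```

On the toy dictionary this yields a six-word kernel (bad, dark, good, light, no, not) and places "no" and "not" at level 0, in agreement with the captions of Figures 1 and 2. Whether the same level rule matches the one used for the real dictionaries is not something the poster settles.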