Collecting Cognitively Salient Semantic Relations

Gerhard Kremer

Introduction

Motivation: The electronic learner's dictionary for German and Italian (ELDIT) shows, among other things, semantically related words for a target word, but the entries were selected manually, on an arbitrary basis.

Goal: Semi-automatically harvest instances of cognitively salient semantic relations from text resources on the web.

Example, concept dog: has paws, a tail, barks (salient); has a heart, breathes (not salient).

Experiment: Feature Production

Task: Describe the given concept in short phrases.
Stimuli: 50 concrete concepts from 10 classes (DE & IT).
Data: Categorised into types of relations (cf. McRae et al., 2005).
Analysis: Deviations of the top 6 relation types from the overall production frequency distribution, for each concept class.

[Figure: two panels (German, Italian) plotting relation type (category, part, quality, behaviour, function, location) against concept class (mammal, bird, fruit, vegetable, bodypart, clothing, implement, vehicle, furniture, building), shaded by Pearson residuals. Legend extremes recovered: 22.9 and 23.3 (maxima), −14.2 and −12.6 (minima). p-value < 2.22e−16 in both panels.]

Results: The preferred relation types depend on the (super-)class the concept belongs to. The German and Italian data show similar patterns.

General Procedure: For a given concept, select the relation types that are prominent within its concept class, and for each of these find the relations that are cognitively most salient.

Preliminary Focus

Composed part relations (adjective modifier + part noun), e.g.
rabbit: has long ears
dog: has a wet nose

Method

Goal: Assuming that salient instances of part relations for a given concept have already been identified, find the most salient modifiers. Production experiment data was used both as input (concept, part) and for evaluation (modifier-part pairs produced for a concept).

Approach: Based on occurrence frequencies of concept/part/modifier combinations in the WaCky web corpus, create a ranked list and select the 5 highest-ranked modifier candidates.

Webcorpus Excerpt Example

". . . Die mittelgroßen Affen leben in Gruppen von etwa 15 Tieren auf Bäumen im Regenwald. [. . .] Die Kipunji sind verwandt mit anderen Mangaben, doch sie weisen einige Besonderheiten auf. Sie haben braunes oder hellbraunes Fell und geben Töne von sich. . ."

(English: ". . . The medium-sized monkeys live in groups of about 15 animals in trees in the rainforest. [. . .] The kipunji are related to other mangabeys, but they show some distinctive features. They have brown or light brown fur and emit sounds. . .")

Evaluated Rank Lists (DE)

in concept context: modifier [...] part within 4 words, and within 20 sentences of the concept
without considering concept: same, but not considering the concept context
combined: multiplication of the frequencies from both sources, freq(contextless) x freq(in context)
reranked combined list: pull up those modifiers that are similar to modifiers at higher ranks, by calculating their cosine distance based on the nouns they co-occur with

[Figure: precision-recall curves for the four rank lists (x-axis: Recall, y-axis: Precision, both from 0.0 to 1.0); panels labelled DE and IT.]

Production vs. Perception (DE)

Follow-up judgement experiment: is the statement "The part of a concept is modifier." plausible or not?

Result for the acceptance rate of 0.75: Small overlap of modifiers both produced and accepted (46), compared to modifiers only produced (53) or only accepted (42).

Conclusion

The best method combines in-context and contextless information (with similar performance for German and Italian), and yields both produced and perceived modifiers.

Open questions: Can the reranking be improved? Is the method adaptable to part-noun collection and other relation types?

References

Kremer, G., Abel, A., and Baroni, M. (2008). Cognitively salient relations for multilingual lexicography. In Proceedings of the Workshop on Cognitive Aspects of the Lexicon (COGALEX 2008), pages 94–101, Manchester, United Kingdom. Coling 2008 Organizing Committee.

McRae, K., Cree, G. S., Seidenberg, M. S., and McNorgan, C. (2005). Semantic feature production norms for a large set of living and nonliving things. Behaviour Research Methods, Instruments & Computers, 37(4):547–559.

Spence, D. P. and Owens, K. C. (1990). Lexical co-occurrence and association strength. Journal of Psycholinguistic Research, 19(5):317–330.

Advisors: Marco Baroni (CIMeC), Andrea Abel (EURAC)
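
Sketch of the ranking approach: the Method and Evaluated Rank Lists sections describe combining contextless and in-context frequencies by multiplication, selecting the highest-ranked modifiers, and pulling up modifiers that resemble higher-ranked ones. The Python sketch below illustrates one way this could look. All frequencies and co-occurrence vectors are invented toy values, the function names are hypothetical, and the boosting formula in `rerank` is an assumption, since the poster does not state how the pull-up is computed. The sketch uses cosine similarity, the complement of the cosine distance named on the poster.

```python
from math import sqrt


def combine(freq_contextless, freq_in_context):
    """Multiply the two corpus frequencies of each modifier candidate."""
    return {
        mod: freq_contextless[mod] * freq_in_context.get(mod, 0)
        for mod in freq_contextless
    }


def top_k(scores, k=5):
    """Return the k highest-scoring modifiers (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [mod for mod, _ in ranked[:k]]


def cosine(u, v):
    """Cosine similarity of two sparse noun co-occurrence vectors (dicts)."""
    dot = sum(u[n] * v[n] for n in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def rerank(ranked, scores, vectors, weight=1.0):
    """Assumed scheme: boost each modifier's score by its highest
    similarity to any modifier ranked above it, then sort again."""
    boosted = {}
    for i, mod in enumerate(ranked):
        best = max(
            (cosine(vectors[mod], vectors[higher]) for higher in ranked[:i]),
            default=0.0,
        )
        boosted[mod] = scores[mod] * (1.0 + weight * best)
    return top_k(boosted, k=len(ranked))


# Invented toy counts for concept "rabbit", part "ears".
contextless = {"long": 50, "big": 40, "left": 60, "floppy": 10}
in_context = {"long": 8, "big": 2, "left": 0, "floppy": 5}

# Invented noun co-occurrence counts for each modifier.
vectors = {
    "long": {"ears": 5, "tail": 3},
    "big": {"house": 6, "ears": 1},
    "floppy": {"ears": 4, "tail": 2},
    "left": {"hand": 7},
}

combined = combine(contextless, in_context)
ranked = top_k(combined, k=4)
reranked = rerank(ranked, combined, vectors)

print(ranked)    # ['long', 'big', 'floppy', 'left']
print(reranked)  # ['long', 'floppy', 'big', 'left']
```

In this toy data, "left" is frequent next to "ears" in general but never near the concept, so multiplication drops it to the bottom; "floppy" moves above "big" after reranking because its noun co-occurrences resemble those of the top-ranked "long".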