When stems are «transparent»: the orthography-semantics consistency
Simona Amenta, Marco Marelli, Davide Crepaldi

What makes the following word-sets different?*
Transparent set (A)   Opaque set (B)
Widow                 Whisk
Cheer                 Cheek
Herb                  Helm
Poet                  Pond
Quiet                 Quest
Train                 Trail
*Words like these are used as stem targets in the majority of morphological priming studies.

A classic masked priming paradigm
Trial sequence: mask ########## (500 ms) -> prime (40 ms) -> TARGET
Condition     Target   Related prime   Unrelated prime
transparent   SPEAK    SPEAKER         POTTERY
opaque        ARCH     ARCHER          POCKET

Something got overlooked…
Main effect of semantic transparency
• Target stems used in the transparent condition elicit faster RTs than targets in the opaque condition, independently of prime type (related vs. unrelated).
• This effect is consistent across most studies of morphological priming.

A meta-analysis of previous experiments

Exp.1: Validating the «stem transparency» effect
• Validate the observed effect while excluding potential methodological and lexical confounds
• Test the effect under a simpler experimental condition -> unprimed words
• Dataset: items from previous experiments that are also included in the British Lexicon Project (BLP; Keuleers et al., 2012)
◦ 325 words (157 «transparent», e.g., WIDOW; 168 «opaque», e.g., WHISK)
The regression model confirmed the difference observed in previous experiments: stems from transparent sets > stems from opaque sets (i.e., faster responses to transparent-set stems).

The case of the widow and the whisk
• Why do we have a «stem transparency» effect?
• The right question might be a different one: how good is a word as a cue to its meaning?
◦ Widow: widower, widowood, widowed…
◦ Whisk: whisky, whiskey, whiskered, whisker, whiskery…
• The association between form and meaning might be weaker for stems from the «opaque» set: «opaque» stems are worse symbols of their meaning than «transparent» stems.

A measure of orthography-to-semantics consistency
• A family of «orthographic relatives» for each of the 325 words
◦ built by collecting all words starting with the stem from a list of the 30k most frequent content words in a 2.8-billion-word corpus (ukWaC, English Wikipedia, BNC)
• A measure of semantic similarity between a stem and each of its orthographic relatives
◦ obtained with methods from distributional semantics (a HAL-like model or LSA)

Distributional semantics
• The meaning of a word can be approximated by the way that word co-occurs with other words in the lexicon
• In a Distributional Semantic Model (henceforth, DSM), word meanings are represented as vectors derived from these co-occurrences
• The more two words tend to occur with the same set of other words (i.e., in similar contexts), the closer their vectors, and the more similar their meanings are considered to be
• Geometrically, this amounts to measuring the cosine of the angle formed by the two vectors: the more similar the vectors, the smaller the angle between them and the higher their cosine

Orthography-Semantics Consistency
$$\mathrm{OSC}(t) = \frac{\sum_{x=1}^{k} f_{r_x} \cdot \cos(t, r_x)}{\sum_{x=1}^{k} f_{r_x}}$$
where t is the target word (the stem), r_x is each of its k orthographic relatives, and f_{r_x} is the corresponding frequency extracted from the corpus.
• OSC is the frequency-weighted average semantic similarity between a stem and its orthographic relatives
• OSC is a 0-to-1 score

Exp.2: Testing the OSC hypothesis
We tested whether OSC (a) distinguishes the stems used in the opaque vs. transparent sets of previous priming experiments and (b) predicts lexical decision latencies for those items in the BLP.

OSC density in the two sets
• OSC was larger in the «transparent» set (.72 ± .012) than in the «opaque» set (.50 ± .017)
• The difference between the two sets was significant (t(322) = 7.41, p = .0001)

Substituting OSC for transparency
• Log-transformed RTs extracted from the BLP were regressed on OSC; log frequency (from SUBTLEX-UK), family size (from CELEX) and length in letters were also included in the analysis
• OSC explains RTs in lexical decision (b = -0.046, t = 3.47, p = .0006; estimates of OSC replacing the «semantic transparency» factor in the previous analyses)

Exp.2: Conclusions
• OSC distinguishes stems from the opaque and transparent sets of previous priming experiments
◦ Stems taken from the transparent sets have significantly higher OSC -> selection bias?
• OSC predicts lexical decision latencies for items in the BLP
◦ OSC has a facilitatory effect on RTs in lexical decision -> learning?
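The OSC computation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-dimensional vectors and frequencies below are invented toy data (a real DSM such as a HAL-like model or LSA would supply high-dimensional vectors and corpus counts), and the helper names `cosine`, `orthographic_relatives` and `osc` are hypothetical. Relatives are found by plain prefix matching over a frequency list, mirroring the «all words starting with the stem» criterion; excluding the stem from its own family (`include_self=False`) is our assumption.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def orthographic_relatives(stem: str, freqs: Dict[str, int],
                           include_self: bool = False) -> List[str]:
    """Hypothetical helper: every word in the frequency list starting with `stem`.
    Whether the stem itself belongs to its family is an assumption (excluded by default)."""
    return [w for w in freqs
            if w.startswith(stem) and (include_self or w != stem)]


def osc(stem: str, vectors: Dict[str, Vector], freqs: Dict[str, int],
        include_self: bool = False) -> float:
    """Frequency-weighted average cosine between a stem and its orthographic relatives."""
    relatives = orthographic_relatives(stem, freqs, include_self)
    total_freq = sum(freqs[r] for r in relatives)
    if total_freq == 0:
        raise ValueError(f"no relatives with non-zero frequency for {stem!r}")
    weighted = sum(freqs[r] * cosine(vectors[stem], vectors[r]) for r in relatives)
    return weighted / total_freq


# Toy data, invented for illustration only (NOT values from the study).
VECTORS = {
    "widow": [1.0, 0.0], "widower": [1.0, 0.0], "widowed": [3.0, 4.0],
    "whisk": [1.0, 0.0], "whisky": [0.0, 1.0], "whisker": [0.0, 2.0],
    "whiskered": [3.0, 4.0],
}
FREQS = {
    "widow": 50, "widower": 10, "widowed": 30,
    "whisk": 20, "whisky": 80, "whisker": 15, "whiskered": 5,
}

print(round(osc("widow", VECTORS, FREQS), 3))  # 0.7
print(round(osc("whisk", VECTORS, FREQS), 3))  # 0.03
```

With these made-up numbers, the frequent but semantically unrelated relative "whisky" pulls the score of "whisk" down, while the relatives of "widow" keep its score high. Because relatives are found by prefix matching alone, the family can also include words that are not morphological relatives of the stem.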
The «stem transparency effect» may be explained by how consistent, across the whole lexicon, the orthographic information carried by a stem is with its associated meaning.

Exp.3: Generalizing the OSC effect
• 1821 words, randomly sampled from the words included both in the semantic space created for the previous experiment and in the BLP
• OSC has a facilitatory effect on RTs (Est. = -0.0254; t = 3.84; p = .0002): the higher the OSC score, the shorter the RT
• OSC does not correlate with frequency, family size or length
Even over a large set of items, the effect of OSC on lexical decision latencies is significant.

Concluding remarks
• The strength of the association between orthography and semantics contributes to determining how easily a word is recognized
• Hence, consistency measures such as OSC should be taken into account whenever we investigate visual word recognition (at least)

Back to morphology…
• OSC is not, strictly speaking, a morphological measure:
◦ its input also includes words that are not morphological relatives of the stem (e.g., brother for the stem broth) or are not even morphologically parsable (e.g., brothel)
• However, morphological relatives contribute substantially to the computation of OSC:
◦ morphologically related words share their onset and are mostly related in meaning
• OSC is thus in line with views of morphology as a phenomenon that emerges from form-meaning patterns

Future directions
• What are the consequences, if any, for the priming phenomenon?
• To what extent could OSC explain the priming phenomenon?

Thank you!