An Unnatural Process

An Unnatural Process
Janet B. Pierrehumbert
Northwestern University
Evanston, IL
November 13, 2002
11/02 Written version of paper delivered at 8th meeting on Laboratory Phonology,
New Haven, 6/02. Under review for Laboratory Phonology 8, Mouton de Gruyter
1
Introduction
This paper presents an experimental study of the productivity of the /k/-/s/ alternation exhibited
in derivational pairs such as electric, electricity. This alternation is exemplified in numerous word
pairs in English involving various suffixes, among them -ity, -ism and -ist. The Collins on-line
English dictionary includes 108 clear examples of words ending in -ity, -ism, or -ist which have a
stem-final /k/ softening to /s/ before the suffix. The largest single group, and the main topic of
this paper, is words formed with the suffix -ity. There are only twelve stems ending in /k/ which
fail to soften before one of these three suffixes (e.g. anarchy, anarchism, soliloquy, soliloquist). All
of these involve affixes other than -ity.
The productivity of the alternation is disputable for a number of reasons. First, there are very
few forms which would support extension of the alternation beyond an orthographic ”ic” followed by
one of the triggering suffixes. The only such pairs listed in the Collins are Greek/Grecism, opaque,
opacity , it raucous, raucity, reciprocal/reciprocate, reciprocity, and pharmacology, pharmacist. A
few additional pairs, most of which would only be known to individuals with a high vocabulary level,
are presented in Myers (1999). Second, as Myers also notes, the /k/-/s/ alternation as presently
found in English is not natural (in the sense of Anderson, 1981). Though it originates historically
in lenition and fronting of the velar stop before a front vowel, in the synchronic phonology, suffixes
with surface high front vowel do not in general trigger the softening of /k/ to /s/. The dictionary
includes more than a hundred pairs such as cheek, cheeky in which /k/ does not soften before -y,
and none in which it does. In contrast, the suffix -ize, beginning with a low vowel, does trigger
velar softening. Secondly, any phonetic pressure which turned /k/ to /s/ would presumably also
affect /t/. However, the affixes -ity and -ism never soften /t/ to /s/, as we see from /the uniform
preservation of /t/ in the 82 examples in the dictionary such as mute, mutism and sanctum, sanctity.
1
(In contrast, -y does turn /t/ into /s/, as in more than 150 pairs such as accountant, accountancy).
Lastly, as discussed in Lavoie (2001), weakening of stop closures generally create approximates,
rather than fricatives. Fricatives involve additional, active adjustments. The vocal folds are more
spread than for voiceless stops, in order to maximize turbulence at the constriction, and /s/ further
involves an exacting shaping of the tongue to direct a jet of air against the teeth. Thus, the /k/-/s/
alternation involves an articulatory reorganization which reflects structure preservation. Structure
preservation is a characteristic of an abstract and categorical level of representation rather than of
the phonetic surface.
Understanding productivity is important because it provides a crucial line of evidence about
cognitive abstractions. If an alternation completely fails to generalize beyond attested forms, it
suggests that these forms are learned individually and that no abstract generalization over them
is formed. If the alternation is aggressively and reliably extended, even to forms which differ
substantially from attested forms, it follows that a very broad abstraction has been formed. For
example, the reliable and aggressive extension of the regular English plural pattern indicates that
it abstracts away from many segmental and prosodic properties of the word. If the situation lies
somewhere in between, then the exact pattern of the extensions which are observed can provide
information about the exact character of the abstraction which has been formed.
Phonotactics is the area in which most research has been done on the availability of phonological patterns in the lexicon for use in novel forms. A considerable number of studies, reviewed in
Pierrehumbert (in press), indicate that the type frequency (frequency in the lexicon) of a phonological pattern affects the likelihood and perceived well-formedness of novel words containing that
pattern. This dependence is gradient. Sequences which are extremely numerous in the lexicon
are readily extended to new words, sequences which are moderately numerous are somewhat extended, and rare sequences are avoided. An example of such a study is Hay, Pierrehumbert, and
Beckman (in press). Their experimental study on heterosyllabic nasal-obstruent clusters found that
perceived well-formedness of novel words such as /strImpi/ and /zæpô/ was a gradient function of
the frequency of the nasal-obstruent (N.O) cluster, on the most likely parse of the novel word. In
this study, the frequency for a tautomorphemic cluster was estimated as its frequency in trochaic
monomorphemic words containing a lax front vowel in the CELEX monomorphemes. (See Baayen
et al., 1995, regarding the CELEX on-line dictionary, and Hay et al., in press, for a summary of
how the set of monomorphemes was determined). In short, frequencies were estimated over words
of the same general character as the stimuli in the experiment.
The choice of universe for estimating frequencies in Hay et al. (in press) was opportunistic, and
obscures a central issue in understanding the relation of lexical frequencies to pattern productivity.
This issue is taken up with Figure 1. Figure 1 shows a partial lattice of heterosyllabic N.O clusters.
The atoms on the bottom are individual phoneme clusters (further details of prosodic position and
context are disregarded). The nodes above the atoms are some of the various available natural
classes of such clusters. As is well-known, natural classes can be formed using partial descriptions
of phonological patterns. For example, the sequence /nt/ is an element of the set of clusters of /n/
followed by any stop; it is also an element of the set of clusters containing a homorganic nasal and
stop. The cluster /np/ belongs to the former set but not the latter; the cluster /mp/ belongs to
the latter set but not the former. The lattice is organized from specific (on the bottom) to general
2
(at the top). Each node is labelled with the probability of the indicated descriptor with respect
to the universe of N.O clusters. Probabilities have been estimated from counts in the CELEX
monomorphemes, as above. Clearly, the less specific the description, the more cases it encompasses
and the larger the natural class it describes. Thus, the probabilities go up as we follow the lines
up the lattice, but the exact way they go up depends on exactly what is lumped together in each
superset. The topmost case, any nasal followed by any obstruent is taken to define the universe for
the probabilities which are indicated below each node. If the universe were larger, the probabilities
shown on the figure would all be smaller, but their rankings would remain the same.
The question raised by Figure 1 is: Of all the probabilities which may be defined using partial
phonological descriptions of a pattern, which are cognitively relevant? Different answers to this
question lead to completely different predictions about productivity and perceived well-formedness.
For example, if the description Ni Oi (p = .88) versus Ni Oj (p = .12) were the relevant level of
abstraction – as is typically proposed in phonology textbooks – then the feature [+/- continuant]
would have no importance for the evaluation and productivity of these clusters, and inhomorganic
nasal-fricative clusters would seem every bit as bad as inhomorganic nasal-stop clusters. However,
results in Hay et al. (in press) clearly show differential outcomes for nasal-stop and nasal-fricative
clusters, indicating that this degree of generalization is too great. The next level down shows two
alternative ways to break out the cases (these are staggered solely for the clarity of the figure). One
fixes the nasal consonant and generalizes over the following the obstruent. The other generalizes
over place. It separates homorganic from inhomorganic clusters regardless of place, but it maintains
information about the continuance of the upcoming consonant. If the first alternative were the
cognitively relevant description, then /np/ would be as acceptable as /nt/. This is false. The second
alternative also groups /nt/ in a superset with other clusters, namely /mp/ and /Nk/ but not /np/.
It is closer to the true state of affairs, since it captures both a strong effect of homoganicity on
nasal-stop clusters and a weaker effect on nasal-fricative clusters. However, there is nothing about
the lattice structure which tells us that one line of generalization is more relevant or close to the
truth than the other. Although any phonologist would sensibly prefer one line of generalization to
the other, there is no explicit formal account of what this ”sensibleness” consists of. Still less is
there an explanation of why listeners proved to operate at a detailed level of description, separating
related effects for different clusters, rather than forming the simple overarching generalization which
groups them together, as in phonology textbooks.
The same issue arises in a different guise in dealing with morphophonological alternations. When
such alternations are language particular, they must be learned from examples. There is by now
abundant evidence that the productivity of an alternation depends on its type frequency (though not
necessarily in a transparent fashion). Alternations which are exhibited in extremely few types, such
as irregular conjugations for auxiliaries, will not be extended to novel forms no matter how frequently
the irregular forms may be used. However, the universe of examples relevant for any given process,
and the types of formal generalizations which are made over these examples, is not well understood.
For example, Bybee (1995) and Clahsen and Rothweiler (1992) come to different conclusions about
frequency effects in the formation of German participles, based on different assumptions about the
cognitively relevant universe.
In comparison to phonotactics, the domain of morphophonology provides both challenges and
3
opportunities in addressing this issue. It is challenging because alternations such as the /k/-/s/
alternation are generalizations over word pairs rather than merely over words. For phonotactics,
first-order logic provides a conveniently organized hierarchy of abstraction over word forms, taking
the shape of a lattice of partial descriptions as in Figure 1. For word pairs, in contrast, the
proper formal toolkit is not as evident. Is it partial descriptions of the base which are relevant?
Or partial descriptions of the complex form? Or relations of partial descriptions of the base and
the complex form? The research literature contains case studies which appear to argue for all of
these possibilities. As discussed in Myers (1999) and also below, the /k/-/s/ alternation has to be
formalized with respect to word pairs. However, claiming that word pairs are relevant does not in
itself define the relevant universe of word pairs. For determining the pronunciation of a novel form
clemicity given the base clemic, the pair conic, conicity should plainly be included. But how about
Turk, Turkism (involving a different affix)? Or ferocious, ferocity (for which no bare form of the
base exists)? Or morbid, morbidity (illustrating preservation of a final stop before the same affix)?
In short, the expectation (based on results in phonotactics and in psychology) that implicit
knowledge of morphophonology is stochastic does not in itself tell us what probabilities will be
relevant. Probabilities can be estimated for any formal description that can be tabulated. This can
include first-order descriptions, second-order descriptions, and even relations across levels of the
system. Many of the ”analytic biases” mentioned in Steriade’s original commentary on this session
(Steriade, 2002) can be viewed as claims about what statistics are available to the cognitive system.
For example, one analytic bias she discussed, in relation to data presented in Goldrick (2002), is to
the effect that voicing pairs should alternate alike. This can be viewed as claiming that statistics on
formal descriptions which conflate place information, but maintain voicing information, are highly
available for phonological acquisition and hence in the ultimate form of the phonological grammar.
Issues of mathematical factorability, as discussed in Nearey (in press) provide a possible grounding
for this availability.
Balancing the formal complexity of morphophonological alternations is the fact that the investigation of unnatural alternations sidestep one of the most persistent and recalcitrant problems of
phonology, This the relationship of frequency to the phonetic foundations of phonological systems.
Since the very beginning of phonology as a field, scholars have reported that some sounds seem
phonetically simpler than others, and that the simpler ones tend to be more common and tend to
play a default role in the phonological grammar. The typological survey of Greenberg (1966) reveals
the striking consistency across languages in which phoneme types are rare and which are common.
Phoneme types which are rare across languages also tend to be rare in the individual languages
in which they do appear. Quantitative models by Lindblom and colleagues have also gone some
distance towards elucidating the articulatory and psychoacoustic basis for more and less common
segment types in terms of their robustness and contrastiveness in segmental systems. (Liljencrants
and Lindblom, 1972; Lindblom Lindblom, B., P. MacNeilage and M. Studdert-Kennedy, 1984; Lindblom, 1986). In the light of such research, there is a risk of confusing correlation and cause when
interpreting experimental findings such as Hay et al. (in press). Although the experiment demonstrated a high correlation (r2 = 0.65) between lexical log frequency and perceived well-formedness,
this correlation could in principle arise from a concealed factor, namely phonological markedness.
Possibly, the phonetically simplest clusters are judged to be better because they are simpler and
4
they are also used more often in words because they are simpler. This would lead to a correlation
between lexical frequency and perceived well-formedness, even in the absence of any ability to learn
lexical frequencies.
Some results in the literature tend to argue against this viewpoint. For example, language
specific phonotactics can be shown to be psychologically real, as in Cutler and Otake’s (1998)
findings on the perception of nasal-obstruent clusters in Japanese versus Dutch; and several studies
show that people have good success in learning the statistics of artificial languages. However, the
present inventory of experimental findings and the present understanding of phonetic naturalness
are very incomplete. This an important issue for further research. Work on unnatural processes,
such as /k/-/s/ alternation, can make a contribution to this research by permitting an examination
of frequency effects in a area where the phonetic foundation is poor. Since the /k/-/s/ alternation
is neither pervasive in English nor ubiquitous across languages, any frequency effects which are
observed can be presumptively attributed to the the effects of experiences with words of English.
2
Methods
The experiment uses a wugs paradigm, pioneered by by Berko (1958) in experiments on language
acquisition by children. In this paradigm, subjects are taught a novel stem and then invited to use
it as the base for a complex form. ”Here is a wug. Look, now there are two of them. There are two
????? ”. This paradigm has been used by many researchers in the area of inflectional morphology,
especially to explore the competition between regular and irregular forms in the formation of verb
tenses and plurals in various languages. These studies include Bybee and Pardo (1981) and Albright
and Hayes (2002). In the present experiment, it is extended to derivational morphology, an extension
also made in Zuraw’s (2000) study of deverbal (agentive) nouns in Tagalog.
Most early studies of derivational morphology, such as studies of the English Vowel shift discussed in Cena (1978), Jaeger (1980, 1984), and McCawley (1986), use concept formation tasks or
judgments of words presented in pairs, rather than a wugs paradigm. These tasks have the potential drawback of priming awareness of the regularity being studied through the very design of the
stimulus materials. The wugs paradigm is more conservative, because of its open-response format.
In the present experiment, there were no examples (in either the instructions or the test phase) of
an extant word which participates in a /k/-/s/ alternation. Thus, the stimulus materials did not
cause the subjects to notice an alternation which might otherwise have escaped them.
Two related experiments are reported. In both, the stimuli were two sentence paragraphs, in
which the first sentence introduced a target word and a morphologically related word was missing
in the second sentence. The subject’s task was to supply this missing word. For one group of
subjects, the first sentence introduced a base adjective, and the task was to complete the second
sentence by creating an abstract noun from the adjective. The instructions drew attention to the
variety of means that English has for turning adjectives into nouns, such as affixing -ity (as in
virgin bride/ virginity) and affixing -ness (as in bright/brightness). Subjects were told to make a
noun in any way they wished, and to respond as soon as an idea occured to them. For the other
group, the format of the materials was reversed, and the subject’s job was to back-form the base
adjective from an abstract noun. Subjects practiced backforming virgin from virginity and bright
5
from brightness. Subjects for both experiments were young adults recruited through Northwestern
University and Ohio State University. Some of the subjects were in the Northwestern Linguistics
Department subject pool, enrolled in introductory linguistics courses. Other subjects were paid $8
for doing the experiment. Subject pool and paid subjects were evenly divided between the noun
formation and the backformation conditions. There were 10 subjects in the noun formation task
and 7 subjects in the back-formation task. No subject did both tasks.
For both groups, the instructions and the materials were presented orally. Subjects never saw
the written form of the materials. The items consisted of 16 baseline items (extant words with an
established nominal form ending in -ness or -ity), 16 fillers (existing and novel words that present
some degree of uncertainty between -ity and -ness), and 32 target words. None of the example
items, baseline items, or fillers involved /k/, /s/, or an alternation between /k/ and /s/. Thus the
/k/-/s/ alternation is not directly primed in the stimuli. The target word in all stimuli was the
last or next-to-last word in the sentence. The same set of bases figured in both the noun formation
and the backformation task. A full listing of baseline, target, and filler words can be found in the
appendix, including IPA transcriptions for nonwords used in filler and target items.
Of the 16 baseline items in the experiment, eight were words for which there is an established
noun in -ness and no established noun in -ity, such as [1].
[1] When Anna discovered a new doughnut shop, she was very happy. For her, a warm doughnut
means ?????. (ANSWER: happiness, *happyity)
Eight were words for which there was an established form in -ity, with the corresponding form
in -ness, if any, deemed to have a contextually inappropriate meaning.
[2] Bob’s short-term bonds were among his most liquid assets. After he got arrested, he was able
to post bail because of his high ?????. (ANSWER: liquidity, *liquidness)
The sixteen filler items were evenly divided between real words and nonwords. For real words,
adjectives were selected for which the nominal form appears to be unstable.
[3] My brother has always been very frugal. Reusing aluminum foil is just one symptom of his
?????. (ANSWER: frugalness OR frugality)
For the nonwords, adjectives were constructed for which both -ness and -ity forms were plausible, such as [4].
[4] Anthropologists working in Manuka found all the hallmarks of a caustive society. In fact, it
became a textbook example of ?????. (ANSWER: caustiveness OR caustivity).
There were three different types of target items, differing in their prototypicality as hosts for a
/k/-/s/ alternation. Eight items were Latinate pseudowords, ending in a front schwa followed by a
velar stop (e.g. in the phonetic form of the suffix -ic, /Ik/). All words were polysyllabic, and some
6
of them suggested existing words through their prefixes or stems.
[5] Halley’s comet is a very interponic comet. Its orbital period varies because of its ?????.
The second set of eight target items, also ending in /k/, had a main stress on the initial syllable
and a secondary stress on the last syllable. The (unreduced) vowel in the last syllable was /E/, /æ/
or /a/, thus the last syllable was not a good example of the suffix -ic.
[6] Before Pierre stood an electrifyingly hovac sculpture. In his entire career as curator, he had
never before seen such a perfect example of ????? .
Third, the targets included a set of eight nonLatinate bases. These were monosyllabic nonwords
(in some cases with a prefix over- or under-).
[7] Inside, the light was so dim it was entirely mork. We couldn’t read the instructions in the ?????.
A fourth set of target items, non-Latinate novel words ending in /s/, presents no real prospect
of variability in the noun formation task. These items are included because they are needed in the
backformation task to ensure a balance between /knEs/ and /snEs/ forms. A 50-50 balance of /k/
and /s/ stems when removing -ness is needed to ensure that the outcome of the -ness forms is not
priming a /k/ response for the cases of -ity removal. In the noun formation task, such items were
as in [8].
[8] Our organic vegies are guaranteed to reach you in dwess condition. If you are not using them
immediately, spritz them with a dilute solution of white vinegar to maintain their ????? .
In the backformation condition, the two sentence paragraphs were reworked so that the first sentence introduced a complex noun, and the second sentence had a missing adjective. The semantic
content of each paragraph was were minimally modified so as to maintain the semantic contextualization of the forms. For example, the reversed version of [5] is [9], and the reversed version of [6]
was [10].
[9] The period of Halle’s comet varies because of its interponicity. It is a very ????? comet.
[10] In Pierre’s entire career as a curator, he had never before seen such a perfect example of hovacity. It was an electrifyingly ????? sculpture.
Stimuli were block randomized in eight blocks of eight, with each block including one example
of all conditions, and each block including one nonword and one real word filler. Subjects repeated
the target adjective out loud in a pause after the first sentence. Their pronunciation was corrected
if necessary. Substitutions of a lower front vowel for /E/ in words such as bowdec were, however,
accepted, as some speakers appeared to have a near merger of /æ/ and /E/. The stimuli were read
7
aloud to each subject individually, and their responses were transcribed as they occurred. In cases
where a subject suggested several responses, the first response was scored. Subjects who stumbled
in the middle of a response were asked to repeat the whole word. A high quality digital recording
was also made at the same time, which was used to resolve any uncertainties in the transcription.
To score the noun formation data, the subset of responses in which subjects selected the affix
-ity after a stem ending in /k/ was extracted. The frequency with which /k/ is softened to /s/ is
computed on this subset. The size of the subset differed considerably across subjects due to the
open response format. To score the backformation data, the subset of responses was extracted in
which subjects removed all and only the affix -ity from a noun ending in /sIti/. The frequency with
which the bare stem ended in /k/ (as opposed to /s/ or some other consonant) was computed.
Subjects in both groups understood the instructions and generally succeeded at the task. All
baseline items behaved as expected except for inane, inanity, for which a number of subjects
replied inaneness. In the noun formation task, pnly one subject was able to guess at the end of
the experiment that the /k/-/s/ alternation was being investigated. In the backformation task, no
subject guessed what was being scored in the experiment.
3
Results
In the noun formation condition, subjects were generally successful in carrying out the task and
produced -ity and -ness responses about equally often. 80% of baseline items predicted to have -ity
did indeed have it. 82% of baseline items with -ness had -ness. 53% of fillers were produced with
-ity and 47% with -ness. There was a moderate preference for using -ness on nonword bases, so
that 42% of nonword fillers were produced with -ity versus 63% of word fillers.
The ten subjects in the noun formation task produced a total of 71 words in which -ity was
suffixed to a base ending in /k/. The results break out by target type as in [11]:
[11]
Target type
Latinate -ic targets
Semi-Latinate targets
Non-Latinate Targets
-ity
30
36
5
/s/s
28
30
0
%/s/
93
83
0
Of the 10 subjects, eight produced examples of velar softening, with the number of examples
ranging from 4 to 13 per subject. These results indicate that velar softening is indeed productive for
most educated adults, applying nearly 100% of the time in the Latinate targets. There is a hint of a
prototypicality effect in the finding that softening was somewhat less reliable for the semi-Latinate
targets than for the Latinate ones. However, the sample size is not big enough to firmly conclude
that the softening rate is lower for the semi-Latinate than for the Latinate stems. For the few cases
in which -ity was attached to a non-Latinate target, softening never applied. There is only a small
number of such cases, but the outcome is readily and reliably confirmed by informant judgments, so
I will view the lack of softening for the non-Latinate targets as a fact which needs to be explained.
Thus, the main effects which require explanation are the high productivity of softening for the
8
semi-Latinate targets, a group for which there is no critical mass of extant forms, and the lack of
softening in the non-Latinate stems (given the softening that was observed in the Latinate stems).
Subjects were generally successful on the back-formation task. For the baseline items there
were only two errors out of 112 items (inanity -> inate and profanity -> profound). Over all items,
subjects produced a form with a bare stem 86% of the time. Other responses involved either addition
of a suffix (snilkness -> snilky) or truncation/lexical intrusions (tromucal -> trauma, justicality ->
just). As a result of this success, there were 68 target outcomes in which -ity was removed from a
target word so as to yield a bare base. These forms implicitly provide a choice of either preserving
the observed /s/ or undoing velar softening to yield /k/. Results are as in [12].
[12]
Target type
bare
Latinate ”-ic” targets 32
Semi-Latinate targets 36
/k/s
6
5
%/k/ %/s/
18
82
13
87
In interpreting this table, recall that the materials did not include non-Latinate targets ending
in -ity. The non-Latinate targets all involved /k/ or /s/ before -ness.
The fact that some reversals of /s/ to /k/ occur is evidence of the psychological reality of the
/k/-/s/ alternation. However, the rate of back-formation of /k/ from /s/ is much lower than the
rate of velar softening in the noun formation task. Two subjects out of seven were responsible for
all cases of backformation of /s/ to /k/. These backformed at rates of 33% and 75%, respectively.
(Both produced examples of /k/ for both Latinate and semi-Latinate bases). The difference in the
size and reliability of the effect requires explanation.
4
Discussion
The experiment provided evidence that the /k/-/s/ alternation is psychologically real, as indicated
by its productive extension in an open-response experiment. The alternation is extended variably
rather than in exceptionless fashion. Thus, it would be desirable to identify the relationship between
the variability in the subjects responses and the variability in the words in the lexicon. In exploring
this question, I will make several simplifying assumptions. One is that the relevant probabilities
can be approximated over word pairs involving the exact affix in the experiments, -ity. I make this
assumption because the cognitive relevance, for the subjects’ task, of exceptions to velar softening
involving other affixes is unclear to me. A second assumption is that any given word pair either is,
is not, in the universe over which a probability is defined. I will not consider models in which the
importance of a word pair is weighted on a scale by its similarity to the current target. Although
there are some hints of such effects in the data, I leave the exploration of these effects for future
research.
A straightforward extension of previous experimental studies of phonotactics, such as the Hay
et al. experiment discussed above, would seek to identify stochastic output constraints to which the
products of -ity affixation must conform. This extension would be in the spirit of Optimality Theory,
as well as of many prior studies in prosodic morphology, in seeking to reduce morphophonological
alternations to patterns which are generally true of the language. However, it is not the case that
9
the consonant preceding the word-final strings /Iti/ and /Izm/ (or the morphemes -ity and -ism),
is usually /s/. Only 25% of words ending in /Iti/ end in /sIti/, and only 9% of words ending
in /Izm/ end in /sIzm/. For both endings, the phoneme /l/ is more common than /s/, although
even /l/ does not achieve a majority of the forms. Simple frequency tracking on the output would
predict that subjects would tend to substitute /l/ for /k/ (or for any other consonant!) but only
at a rate of around 25%. Statistics on the output do not explain either the extremely high rate of
substitution observed for the Latinate and semi-Latinate forms ending in /k/, or the failure of all
other consonants to be affected in the same way. Similarly, outcomes for the back-formation task
also appear to reflect implicit knowledge of specific morphological relationships. Without knowledge
of such relationships, there would be no pressure at all to backform /s/ to /k/, since /s/ is both
more faithful to the stimulus and more common in word-final position.
In the light of this situation, knowledge of the alternation must be a generalization over morphologically related word pairs. This assumption echoes the treatment of velar softening as a
derivational rule in Chomsky and Halle (1968). It is carried through in the non-OT literature on
computational morphology, notably Skousen (1989), Daelemans et al. (1999), Baayen (in press),
Ernestus and Baayen (2002) and Albright and Hayes (to appear). These works all share the assumption that variable outcomes in morphophonology are related to conditional probabilities defined on
word pairs.
The models of Skousen (1989) and Baayen (in press) are particularly relevant because they are
general statistical inference models which make no fundamental difference between morphological
derivation and back-formation. The variability of new complex forms as a function of base forms,
and the variability of new base forms as a function of complex forms, are both treated in the same
way. For both, the pronunciation of an unknown form is inferred from a related known form in the
light of a universe of known word pairs exemplifying the same relationship. That is, these models set
up analogies in which the unknown variable in the analogy may be either a base or a derived form:
they pose equally questions such as conic: conicity :: clemic : ? and questions such as conicity:
conic :: clemicity : ? . An analogical approach does predicts that different rates of productivity may
be found for morphological derivation and back-formation, because the probability of A given B,
and the probability of B given A are mathematically distinct and would often have different values.
Thus, the question is what universe of generalization provides the probabilities which are reflected
in the outcomes; and why the cognitive system takes this particular universe of generalization to be
the operational one, as opposed to others which are equally available from a mathematical point of
view.
The following discussion is premised on three claims about the operational level of generalization, which bring together some of the threads of the literature on morphological processing by
people and by machines.
PREMISE 1: All other things being equal, the cognitive system prefers a generalizations which
yields more information about the outcome over those which yield less.
This claim, a plain language statement of the information-theoretic proposals of Daelemans et
al. (1999), means that descriptions of the data which are associated with extreme probabilities are
10
more relevant than ones which are associated with probabilities near .5. Probabilities of 1.0 and 0.0
provide unequivocal information about the nature of the outcome. For example, in Figure 1, the
generalizations Ni Stopi (p = .72) and Ni Stopj (p = .04) are both preferable to the generalization
nStop (p = .43), because the former are farther from p = 0.5. A high probability provides positive
information about the form of the output, and a low probability tends to rule out the described
type of output. A probability of .5, representing complete uncertainty, provides no information
either way. Typically this premise will tend to favor generalizations based on small sets of words
over generalizations based on bigger sets, as smaller sets tend to be more homogeneous (to exhibit
more uniform outcomes) and bigger sets tend to be more heterogeneous (to exhibit more diverse
outcomes). But this is not always the case, as the study of learning of English verb morphology
by Derwing and Skousen (1994) indicates. They apply the AML model of Skousen (1989), which
anticipates the conclusions of Daelemans et al. (1999) by providing an algorithm for automatically
growing smaller analogical sets to bigger ones exactly when the homogeneity of the set is not compromised. More recently, Albright and Hayes (2000) also adopt information-theoretic weighting of
generalizations over examples.
PREMISE 2: All other things being equal, the cognitive system prefers generalizations based
on larger sets of data to those based on smaller sets of data.
This preference is justified because increasing the sample size increases the reliability of the
estimate of the probability of a pattern. In a morphological system directed towards automatic
part-of-speech tagging, Mikheev (1997) brings together premises 1 and 2 by assigning rule scores
based on the lower edge of the 90% confidence interval for the probability associated with the rule.
For rules with probabilities above 0.5 (e.g rules which positively specify the nature of the outcome),
this number increases both with the estimated probability and with the size of the sample from
which the probability is estimated.
A further claim made in Mikheev (1997) is:
PREMISE 3: All other things being equal, longer phonological descriptors are more better than
shorter ones.
This claim is directly at odds with the assumption of classical generative grammar that the
structural descriptions in rules should be as simple and general as they can be without encountering counterexamples. It finds a certain precedent in the Elsewhere Condition of Kiparsky (1973),
a broad principle of cognition according to which specific knowledge takes priority over general
knowledge. Recent experimental work in phonetics and psycholinguistics has shown that cognitive representations are much more redundant than was imagined in the early days of generative
grammar; see discussion in Broe and Pierrehumbert (2000), Bybee (2001) and Baayen (in press).
These results bear on the present data in suggesting that phonological characteristics which are
true of all word pairs exhibiting an alternation would be maintained in the general template for
that alternation even if they are redundant.
These premises readily explain why the Latinate and semi-Latinate stems in the noun formation
11
task displayed such a high rate of velar softening. Extant Latinate stems provide a core of multisyllabic stems ending in /Ik/, for which velar softening occurs at p=1.0 before the affix -ity. By
premise 1, this is a highly cognitively relevant generalization. Expanding the set to include multisyllabic stems ending in any /Vk/ (e.g to include the few pairs such as opaque, opacity, reciprocate,
reciprocity) increases the sample size without compromising the reliability of the generalization,
since the probability of softening is still 1.0. However, dropping the shared prosodic features, such
that the relevant universe is simply words ending in /k/ with a counterpart in -ity, simplifies the
description without expanding the sample size. The sample size remains the same because there are
no monosyllabic words ending in /k/ with a derivative in -ity. By premise 3, then, the generalization
stops short of this simplification. Hence, its structural description is not met in novel forms such
as bleckity, and it fails to apply. A general implication of Mikheev’s claim is that morphological
alternations would generalize to forms which lie in the cracks between existing forms, so to speak,
and but not to forms which break entirely new ground.
The fact that velar softening did not occur 100% of the time in the experiment may be attributed
to regression towards the mean. Experimental outcomes normally show variability, and with the
predicted probability at ceiling, the only possible effect of variability is to lower it. One source
of this variability could be variation in the vocabulary level of the subjects (since many of the
relevant word pairs are quite erudite, and lower verbal subjects would know fewer of the them).
Another could be be variation amongst subjects in the extent to a which cognitive relationships
were established for the word pairs involving -ity. For example, a pair such as septic, septicity will
only be related by someone who knows the broad meaning of septic rather than its specific use in
the phrase septic tank.
Turning now to the back-formation data, a generalization at the same level relates words of four
or more syllables ending in /VsIti/ to stems of two or more syllables ending in /VC/, where V is any
vowel. This universe includes everything in the universe for the noun formation task, plus pairings
such as strenuous, strenuosity and porous, porosity. The probability of /k/ in the stem given /s/ in
the derived form is 0.44 over this universe. The minority of /k/s is due to the greater number of
words examples such as porosity , in which the surface /s/ really does relate to a base /s/. With
the rate of /k/s being close to 0.5, the level of generalization which was operational in the noun
formation task yields, for the back-formation task, a statistic which is very unattractive from the
point of view of premise 1. Furthermore, the predicted rate of back-formation is much higher than
the rate which was observed.
Both of these problems can be solved by a shrewd increase in the universe of word pairs. The
universe just discussed did not include pairs such as liquid, liquidity or concave, concavity, because
these do not end in /sIti/. Introducing such pairs into the universe leads to extremely heterogenous
outcomes, since all the various consonants except /s/ that appear in -ity words are faithfully copied
from the stem. However, if one introduce variables into the analysis, then such pairs may be
characterized in a homogeneous way; all examples except for the /k/-/s/ pairs involve identity
between the stem-final consonant and the consonant appearing before -ity. Just as alpha notation
in the framework of Chomsky and Halle (1968) can enforce matches between structural description
and output, we allow indexing to enforce points of identity between the complex form and the base.
The pairs liquidity, liquid and concavity, concave are then both examples of the abstract pairing
12
(Ci Iti, Ci ). The need for variables in linguistic abstraction is extensively discussed in Marcus (2001)
on the basis of other experimentally findings on generalization and productivity.
Figure 2 shows the observed probability of maintaining the same C in the stem, for complex
words ending in -ity, as a function of the universe of lexical items included in making the estimate.
The leftmost universe on the graph is the one discussed just above, in which the probability of
maintaining the same C is 0.56 (that is, one minus the probability of an /s/, /k/ pairing). Successively bigger universes are nested supersets of this universe. As the graph indicates, the probability
increases monotonically as the description of the universe is relaxed to include more and more cases.
This is because the /k/-/s/ alternation is the only source of alternation for the lexical sets included
in the graph. As more supersets of /s/ are included, the percentage of nonalternation goes up.
Given that the probability of maintaining the same consonant is already above 0.5 for the smallest
set, enlarging the set steadily increases certainty about the outcome.
Insofar as enlarging the universe of comparison brings in more and more examples of nonalternating (faithful) input-output pairings, this line of reasoning about backformation yields a bias
towards an underlying representation which is faithfully reflected in the surface form. We may compare it to the principle of lexicon optimization proposed in Prince and Smolensky (1993) and further
discussed by Zuraw (2000 p. 87) amongst others. According to this principle, when estimating the
underlying representation from the surface form, the language uses selects the underlying representation which makes input(UR)-output(Surface) pair maximally harmonic. For the present case,
we would be comparing candidates /k/ and /s/ amongst others. The /s/ candidate conforms to
faithfulness, whereas the /k/ candidate always violates it. Given that the /s/ candidate is otherwise
unproblematic, it would always win over the /k/ candidate.
There are important empirical differences between these the OT accounts and the present account. Without emendation, OT lexical optimization predicts a single outcome, whereas actually
the outcome is variable. The maximally inclusive universe in Figure 2 predicts a back-formation
rate of 10%, which is within the statistical uncertainty for zero examples per individual, given the
actual counts for stripping -ity from a Latinate or semi-Latinate form. The behavior of the two
individuals who did produce backformations to /k/ can then be understood as a arising from a
more narrow conception of the universe than other people, a conception closer to the universe used
in the noun formation taks.
A second difference between the OT accounts and the present one is that the general pressure
towards faithfulness which can be read into Figure 2 depends critically on monotonicity of the
graph and on the statistics of the most narrowly described case (the leftmost point on the graph).
Faithfulness is not relevant in all situations, but only in those in which it resolves uncertainty. The
outcomes for the noun formation task provides a case in point. Although this task was an exact
counterpart to the backformation task, the productivity of the alternation proved to be entirely
different. There was at best a sporadic penchant for a faithful outcome, in contrast to the strong
preference found in the backformation task. This difference can be explained by noting that in the
forward direction, the /k/-/s/ alternation is statistically reliable; expanding the universe of description would only introduce variability and weaken a certain inference. For the backformation task,
in contrast, expanding the universe increases certainty, since the inference was so very uncertain to
begin with.
13
Given this line of argument, it is important to figure out why the analogy set based on words
in -icity /"IsIti/ did not appear to be active or relevant in the present experiment. It may have
had more importance for two subjects than for the others, but even they did not backform to /k/
reliably on words ending in -icity. This descriptor is more specific than those displayed in the figure,
and the set it describes has a baseform ending in /k/ with P = 1.0. If this set were operational, the
experimental results should have displayed a distinction between the Latinate targets (which would
backform to /k/) and the others (for which /s/ would be the more probable outcome). One might
suggest that the many words ending in -ic should not be counted separately, but rather treated as
multiple instantiations of a single suffix. However, a listing of the lexical entries shows that the
decomposability of many of the more common words in -ic is extremely disputable. Examples of
words ending in /Ik/ which can further add -ity include:
[13] eccentric/eccentricity (*eccenter)
plastic/plasticity (*plast)
public/publicity (*puble)
rustic/rusticity (*rust)
toxic/toxicity (*tox)
In addition, if we treat all -ic words under a single entry, the number of examples of velar
softening is so highly reduced that its productivity becomes surprising.
The failure of words in -icity to support a strong pattern of backformation casts doubt on the
form of premise 3. Even if moderately long descriptions are preferred to very minimal ones, it
may not be the case that very long descriptions are preferable to long ones. In fact, Mikheev’s
suggestion is opposite to that advanced in Pierrehumbert (2001). Based on Monte Carlo simulations of phonotactic learning, Pierrehumbert argues that extremely detailed descriptions tend to
be statistical unstable, since they pertain to so few cases that their statistics are not robust across
individual differences in vocabulary. These proposals can be reconciled by suggesting that the entry
level for generalizations over phonological patterns is moderately detailed – more detailed than the
simplest descriptions that Mikheev dismisses, but still more broadly applicable than the worst cases
considered by Pierrehumbert. These entry level descriptions are then further generalized with the
generalization increases certainty about the outcome, as discussed above.
5
Conclusion
In conclusion, the /k/-/s/ alternation was found to be highly productive in noun formation, and
some evidence of its psychological reality is also found in backformation. An approach based on
statistical inference over word pairs enjoys considerable success in explaining the outcomes. A key
assumption are that the universe of comparison grows more general to provide a critical mass of
examples and to reduce uncertainty in the predicted outcome. A second key to success is the idea
that word pairings can involve variables, so that pairings in which the same consonant is preserved
no matter what its character can act together in influencing the outcome.
Explaining why /k/ is preserved in forms such as /blEk/–/blEkIti/ requires the assumption that
in abstracting over a universe of examples, the cognitive system prefers to maintain somewhat rich
14
and redundant characterizations of those examples. Thus, abstractions are not simplified beyond
what is required by differences amongst the examples. At the same time, the preservation of /s/
in backformations such as /IntôpanIsIti/–/Intôpan@s/ indicates some limits on how fine-grained and
detailed abstractions can be. The behavior of these cases suggests that arbitrary phonological descriptors are not actually relevant to forming the universe for the morphophonological inference.
Instead, there may be a privileged degree of granularity in analysis, reminiscent of the basic level
of categorization which privileges the concept ”dog” over ”dalmation” or ”animal”. Initial generalizations made at this level can then be adjusted upward or downward to achieve more certainty
about the outcomes.
6
References
Albright and Hayes (2002) Rules vs. Analogy in English Past Tenses: A Computational/ Experimental Study. Paper presented at the 76th Annual Meeting of the Linguistic Society of America, San
Francisco, Jan 3-6. Downloadable from http://www.humnet.ucla.edu/humnet/linguistics/people/hayes
Anderson, S. R. (1981) ”Why Phonology Isn’t ’Natural’”, Linguistic Inquiry 12:4, 493-539.
Baayen, H. (in press) Probabilistic Morphology. R. Bod, J. Hay and S. Jannedy (eds.), 2002. Probability Theory in Linguistics. The MIT Press.
Baayen, R.H, Piepenbrock, R. and Gulikens, L. (1995) The CELEX lexical database (release 2)
CD-ROM, Philadelphia, PA: Linguistic Data Consortium, University of Pennsylvania (Distributor), Philadelphia, PA.
Berko, J. (1958) The child’s learning of English morphology. Word 14, 150-177.
Boersma, P. (1998) Functional Phonology: Formalizing the Interactions between Articulatory and
Perceptual Drives. University of Amsterdam, Amsterdam, Holland.
Boersma, P. and B. Hayes (2001) Empirical Tests of the Gradual Learning Algorithm, Linguistic
Inquiry 32, 45-86. (Downloadable from http://fonsg3.let.uva.nl/paul/)
Broe, M. and J. Pierrehumbert, eds. (2000) Introduction, Papers in Laboratory Phonology V: Acquisition and the Lexicon, Cambridge University Press.
Bybee, J. (1995) Regular Morphology and the Lexicon, Language and Congnitive Processes 10:5,
425-435.
Bybee, J. (2001) Usage-Based Phonology. Cambridge University Press, Cambridge UK.
15
Bybee, J. and E. Pardo (1981) Morphological and lexical conditioning of rules: Experimental evidence from Spanish. Linguistics 19, 937-968.
Cena, R. M. (1978) When is a phonological generalization psychologically real. Indiana University
Linguistics Club.
H. Clahsen, H. and M. Rothweiler (1992) Inflectional Rules in Children’s Grammars: Evidence from
the Development of Participles in German, Yearbook of Morphology, 1-34.
Cutler, A. and T. Otake (1998) Assimilation of Place in Japanese and dutch, Proceedings of the
Fifth International Conference on Spoken Language Processing, Sydney, Australia, 1751-1754.
Daelemans, W., Zavel, J., Van der Sloot, K. and Van den Bosch, A. (1999) TiMBL: Tilburg memory
based learner reference guide 2.0. Report 99-01, Computational Linguistics, Tilburg University.
Derwing, Bruce, and Royal Skousen. (1994). ”Productivity and the English Past Tense: Testing
Skousen’s Analogy Model”. The Reality of Linguistic Rules, edited by Susan D. Lima, Roberta L.
Corrigan, and Gregory K. Iverson, 193-218. Amsterdam: John Benjamins.
Ernestus, M. and H. Baayen (2002) The Functionality of Incomplete Neutralization in Dutch: The
Case of Past-tense Formation Paper presented at Labphon8, Jun3 27-30 2002, New Haven CT.
Flemming, E. (1995) Auditory Phonology, Ph.D dissertation, UCLA.
Goldrick, M. (2002) Probabilistic Phonotactic Constraints and Sub-segmental Representations: Evidence from Speech Errors. Paper presented at Labphon8, Jun3 27-30 2002, New Haven CT.
Greenberg, J. (1966) Language Universals: with special reference to feature hierarchies. Mouton,
the Hague.
Hay, J. B., Pierrehumbert, J. B., and Beckman, M. E. (in press) Speech Perception, Well-Formedness,
and the Statistics of the Lexicon. J. Local, R. Ogden, and R. Temple (eds) Papers in Laboratory
Phonology VI, Cambridge University Press, Cambridge UK.
Jaeger, J. (1980) Categorization in phonology: an experimental approach. Ph.D dissertation, U.C.
Berkeley.
Jaeger, J. (1984) Assessing the psychological reality of the Vowel Shift Rule. Journal of Psycholinguistic Research. 13, 13-36.
Kiparsky, P. (1973) ”Elsewhere” in phonology. In S. Anderson & P. Kiparsky (eds.). A Festschrift
for Morris Halle. New York: Holt. 93-106.
16
Lavoie, L. (2001) Consonant Strength: Phonological Patterns and Phonetic Manifestations. Routledge.
Liljencrants, J. and B. Lindblom (1972) Numerical Simulation of Vowel Quality Systems: The role
of Perceptual Contrast, Language 48, 839-861.
Lindblom, B., P. MacNeilage and M. Studdert-Kennedy (1984) Self-organizing processes and the
explanation of phonological universals. in B. Butterworth, B. Comrie and O. Dahl, eds., Explanations for language universals. Mouton, New York, 181-203.
Lindblom, B. (1986) Phonetic Universals in Vowel Systems, in J. Ohala and J. Jaeger (eds) Experimental Phonology, Academic Press, Orlando FL, 1773-1781.
Marcus(2001) The Algebraic Mind: Integrating Connectionism and Cognitive Science (Learning,
Development and Conceptual Change). MIT Press, Cambridge MA.
McCawley, J. D. (1986) Today the world, tomorrow phonology. Phonology Yearbook 3, 27-45.
Mikheev, A. (1997) Part-of-Speech Guessing Rules: Learning and Evaluation. Computational Linguistics 23, 3, 405-423. Myers, J. (1999) Lexical Phonology and the Lexicon. MS Graduate Institute
of Linguistics, National Chung-Cheng University, Taiwan. Downloadable from Rutgers Optimality
Archive.
Nearey, T. M. (in press) On the factorability of phonological units in speech perception, J. Local,
R. Ogden, and R. Temple (eds) Papers in Laboratory Phonology VI, Cambridge University Press,
Cambridge UK.
Pierrehumbert, J. (2001) Why phonological constraints are so coarse-grained. In J. McQueen and
A. Cutler (Eds) SWAP special issue, Language and Cognitive Processes 16 5/6, 691-698.
Pierrehumbert, J. (in press) Probabilistic Phonology: Discrimation and Robustness. R. Bod, J.
Hay and S. Jannedy (eds.), 2002. Probability Theory in Linguistics. The MIT Press (to appear)
Prince, A. and P. Smolensky (1993) Optimality Theory: Constraint interaction in Generative Grammar. Technical repots of the Rutgers University Center for Cognitive Science TR-2.
Skousen (1989) Analogical Modelling of Language. Kluwer, Dordrecht.
Steriade, D. (2002) comentary presented at Labphon8, June 27-39 2002, New Haven, CT.
Zuraw(2000) Patterned Exceptions in Phonology. Ph.D dissertation, UCLA.
17
7
Appendix
[1] Baseline Items
NESS-WORDS
calm
correct
happy
icy
kind
light
lonely
sweet
calmness
correctness
happiness
iciness
kindness
lightness
loneliness
sweetness
ITY-WORDS
abnormal
captive
inane
liquid
profane
senile
valid
virile
abnormality
captivity
inanity
liquidity
profanity
senility
validity
virility
NESS-WORDS
arid
aridness
frugal
frugalness
human
humanness
odd
oddness
ITY-WORDS
arboreal
binomial
liberal
real
arboreality
binomiality
liberality
reality
[2A] Word Fillers
[2B] Nonword fillers
NESS-WORDS
Spelling
clipid
demarte
flader
mastive
ITY-WORDS
Spelling
bordal
caustive
justical
tromucal
IPA
Spelling
IPA
clipidness
kl"IpIdn@s
demarteness d@m"aôtn@s
fladerness
fl"edôn@s
mastiveness m"æstIvn@s
IPA
Spelling
bordality
caustivity
justicality
tromucality
kl"IpId
d@m"aôt
fl"edô
m"æstIv
b"Oôdl
k"OstIv
Ã"2stIkl
tr"amjukl
IPA
bOôd"ælIti
kOst"IvIti
Ã2stIk"ælIti
tramjuk"ælIti
[3A] ”Latinate” targets with stem in a sequence construable as the suffix ”-ic”.
18
Spelling
IPA
kl"EmIk
clemic
criotic
kôaj"atIk
extric
EkstôIk
hytronic
hajtô"anIk
interponic Intôp"anIk
malatonic mæl@t"anIk
phynomic fajn"omIk
runomic
ôun"amIk
Spelling
IPA
clemicity
klEm"IsIti
crioticity
kôaj@t"IsIti
extricity
Ekstô"IsIti
hytronicity
hajtô@n"IsIti
interponicity Intôpan"IsIti
malatonicity mæl@t@n"IsIti
phynomicity fajn@m"IsIti
runomicity
run@m"Isiti
[3B] ”Semi-Latinate” targets ending in a 2ndary stressed syllable not construable as ”-ic”.
Spelling
bowdec
hovac
nodac
pavoc
solvoc
stanorac
strenoc
trylec
IPA
b"odEk
h"ovæk
n"odæk
p"ævak
s"alvak
st"æn@ôæk
stô"Enak
tô"ajlEk
Spelling
bowdecity
hovacity
nodacity
pavocity
solvocity
stanoracity
strenocity
trylecity
IPA
bod"EsIti
hov"æsIti
nod"æsIti
pæv"asIti
solv"asIti
stæn@ô"æsIti
stôEn"asIti
tôajl"EsIti
[3C] ”Non-Latinate” targets ending in /k/
Spelling
bleck
mork
over-glique
shruk
snilk
tuque
twake
under-grack
IPA
blEk
m"Oôk
ovôgl"ik
Sô"2k
sn"Ilk
t"uk
tw"ek
2ndôgô"æk
Spelling
IPA
bleckness
bl"Ekn@s
morkness
m"Oôkn@s
over-gliqueness ovôgl"ikn@s
shrukness
Sô"2kn@s
snilkness
sn"Ilkn@s
tuqueness
t"ukn@s
twakeness
tw"ekn@s
under-grackness 2ndôgô"ækn@s
[3D] ”Non-Latinate” targets ending in /s/
Spelling
blarse
deploose
dwess
druss
jace
melse
queece
under-dass
IPA
bl"aôs
d@pl"us
dw"Es
dô"2s
Ã"es
m"Els
kw"is
2ndôd"æs
Spelling
blarseness
deplooseness
dwessness
drussness
jaceness
melseness
queeceness
under-dassness
19
IPA
bl"aôsn@s
d@pl"usn@s
dw"Esn@s
dô"2sn@s
Ã"esn@s
m"Elsn@s
kw"isn@s
2ndôd"æsn@s
8
Figure Captions
Figure 1: Partial lattice of probabilities for heterosyllabic nasal obstruent clusters in monomorphemic words of English. Counts are established with respect to the Celex monomorphemes, as
discussed in Hay et al. (in press).
Italic indices are used for convenience to indicate homorganicity or nonhomorganicity. (Actual
phonological structures for homorganic consonants have have feature sharing.) Capital N denotes
any nasal. Probabilities of superordinate categories include frequencies of atoms which are not
shown at the bottom, but which are described by the superordinate note. For example, the probability of Ni Stopi includes velar clusters, and the probability of nFric includes /n.z/ etc.
Figure 2: Rate of faithful outcomes for the final consonant of the base in relation to the consonant
preceding -ity in the complex form. The leftmost point is computed for the university of -ity words
ending in /sIti/. The universe is expanded by taking successively larger supersets of this set.
20
Partial lattice of descriptors for NO clusters in monomorphemes.
NO
1.0
NiOj
.12
NiOi
.88
NiStopi
.72
n.t
.19
mStop
.22
nFric
.21
nStop
.43
n.p
0.0
NiFrici
.16
n.s
.11
n.f
.03
mFric
.20
NiFricj
.08
NiStopj
.04
m.t
.0008
m.p
.12
m.s
.0048
m.f
.009
B
P(=C)|Xity
1
Probability Bof C[i] = C[j]
[+cons]
0.8
0.6
[+cont]
s
s,t,d,l,n [ + c o r ]
s,t,d
Maximum
uncertainty
0.4
0.2
0
100
200
300
400
Count of lexical
items
A
500
600