Journal of the Phonetic Society
of Japan, Vol.14 No.1
April 2010, pp.00–00
Optimality Theory: Experimental Extensions
Jeroen van de WEIJER*
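Abstract: This paper discusses a proposal concerning the methodology for grounding markedness
constraints in phonetics (ease of articulation) and faithfulness constraints in psycholinguistics
(efficiency of word recognition). The basic tenet of Optimality Theory is that all constraints
themselves are universal and that language-specific variation resides only in the constraint
hierarchy (= grammar); it is precisely the grounding-based approach that derives this tenet
(substantively, rather than merely assuming it). Furthermore, the ordering of constraints in an
individual constraint hierarchy depends not only on grounding in phonetics and psycholinguistics
but also on grounding in typological predictions. Accordingly, while focusing on how such a
grounding-based approach can be developed experimentally, the paper also considers its
implications for grammatical theory as a whole.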
Key words: Optimality Theory, markedness, phonetics, faithfulness, psycholinguistics, typology, experimental
linguistics
1. Introduction
Recent times have witnessed two parallel but contradictory trends in phonology. First, the field
is becoming increasingly specialized: some phonologists have come to concentrate almost
exclusively on particular subfields, such as tonology, vowel harmony, intonation studies,
metrical structure, or the interface of phonology with morphology or syntax (van Oostendorp and
van de Weijer 2005). At the same time, there is an increasing need and call for a “holistic”
theory of linguistics, in which researchers cross disciplinary boundary lines (see Jackendoff
2007, van de Weijer 2009a). Due to the advance of psycholinguistics, there is increased interest
in the competence and performance of single individuals on specific occasions, and in the study
of variation on a small scale, in communities or across generations. Increased technological
possibilities, e.g. in phonetics, have made possible kinds of empirical verification and
experimental testing that were unheard of ten or twenty years ago, as well as other kinds of
research, such as corpus linguistics and increasingly realistic modelling in computational
theories such as connectionism. As a result, further integration between disciplines is called
for and now seems more feasible than ever. In this paper I will address one area in which
theoretical phonology, specifically Optimality Theory, might connect with more “concrete”
approaches such as phonetics and psycholinguistics.
2. Grounding Optimality Theory
The introduction of OT (Prince and Smolensky 1993/2004) represented a “paradigm revolution”
(Kuhn 1962) in many ways. OT proposed a new view of how phonology (and other aspects of
language) works, and was highly successful in its application to a number of old problems,
especially where these involved inherent conflict, such as the sometimes contradictory demands
of the prosodic and segmental levels, or the need to fit morphology into a particular accepted
phonological shape. However, this did mean that many questions that were fashionable before the
advent of OT suddenly no longer received the attention they had enjoyed beforehand (see again
van Oostendorp and van de Weijer 2005 for discussion and partial redress). These included some
of the basic issues in phonology, such as the content of the set of distinctive features, the
“arity” of these features, their universality, and their organization into a geometry. The same
goes for higher prosodic levels, such as the structure of the syllable, questions regarding
foot structure and inventory, and the role of such notions as government and licensing.
* Full Professor, College of English Language and Literature, Shanghai International Studies University
Special feature: “Experimental Verification of Optimality Theory and Theoretical Development of Experimental Phonetics”
One piece of (unjustified) criticism that has been levelled at Optimality Theory is that in OT
“anything goes”. This seems to be a misinterpretation of the fact that all constraints in OT are
directed at the level of the output, while inputs, or underlying representations, are
constraint-free (“Richness of the Base”). This criticism underestimates the extent to which all
parts of OT can and should be justified: markedness constraints should be grounded from a
phonetic point of view, faithfulness constraints should be anchored in the basics of word
recognition, constraint hierarchies make predictions as to the typological recurrence of
linguistic patterns, and inputs should be constrained by important principles such as Lexicon
Optimization. In all these respects, previous theories, such as standard generative phonology,
do not even begin to compare with OT in terms of accountability and restrictiveness. It is
natural to extend this accountability to candidate outputs (see below).
Of course, the traditional position (Prince and Smolensky 1993/2004) is that constraints are
universal. This does not entail at all that “constraints are part of Universal Grammar and
therefore don’t have to be learned”, which in turn could again be (spuriously) equated with
“constraints are part of Universal Grammar and therefore don’t have to be motivated”. An
increasing number of linguists have taken the more interesting position and require that
constraints be grounded (see e.g. Archangeli and Pulleyblank 1994, Hayes, Kirchner and Steriade
2004). In fact, the basic model of OT can be derived from a simple model of communication
between a speaker and a listener (or communication in other media, such as sign language): the
speaker wishes to put the message across with as little effort as possible, while listeners
wish to retrieve the message with the least amount of perceptual effort (Jurafsky et al. 2001,
van de Weijer 2009a; see also Passy 1891, cited in Boersma 1998). It makes sense that speakers
(who are listeners too) take into account the needs of listeners, so that grammar will
typically mediate between the two opposing needs and form a kind of compromise. Since both
speaking and listening are common to the human species (neither ears nor mouths nor brains
differ much across the planet), constraints derived from these basic activities must be
universal too. In this way, a theory which demands that its constraints be grounded derives
rather than stipulates universality. Let us, in the next subsections, examine the subparts of
OT in more detail, with special attention to experimental verification and grounding.
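The ranked-constraint evaluation that the following subsections presuppose can be made concrete with a small sketch. It is only an illustration under stated assumptions: the toy input "pat", the three candidates, and the crude length-based definitions of MAX and DEP are invented here and are not taken from the paper. The one property it is meant to show is strict domination: candidates are compared on the highest-ranked constraint first, and lower-ranked constraints only break ties.

```python
from typing import Callable, List

# A constraint maps (input, candidate) to a number of violations.
Constraint = Callable[[str, str], int]

VOWELS = set("aeiou")  # toy vowel inventory (assumption)


def no_coda(inp: str, cand: str) -> int:
    """Markedness: one violation if the candidate ends in a consonant."""
    return int(bool(cand) and cand[-1] not in VOWELS)


def max_io(inp: str, cand: str) -> int:
    """Faithfulness, crude proxy: one violation per segment the candidate lacks."""
    return max(len(inp) - len(cand), 0)


def dep_io(inp: str, cand: str) -> int:
    """Faithfulness, crude proxy: one violation per segment the candidate adds."""
    return max(len(cand) - len(inp), 0)


def evaluate(inp: str, candidates: List[str], ranking: List[Constraint]) -> str:
    """Return the optimal candidate under strict domination.

    Violation profiles are tuples ordered by rank, so Python's tuple
    comparison is exactly a lexicographic (strict-domination) comparison.
    """
    return min(candidates, key=lambda c: tuple(con(inp, c) for con in ranking))


CANDIDATES = ["pat", "pa", "pata"]  # faithful, deletion, epenthesis

print(evaluate("pat", CANDIDATES, [no_coda, dep_io, max_io]))  # deletion repairs the coda
print(evaluate("pat", CANDIDATES, [no_coda, max_io, dep_io]))  # epenthesis repairs the coda
print(evaluate("pat", CANDIDATES, [max_io, dep_io, no_coda]))  # the coda is tolerated
```

Reranking the same three constraints yields three different winners, which is the sense in which variation between languages resides in the hierarchy rather than in the constraints.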
2.1 Markedness
Markedness constraints should, and mostly can, be grounded in phonetics, taking into account
both ease of articulation and perception. All well-known markedness constraints, such as
Identical Cluster constraints (Pulleyblank 1997) and constraints relating to syllable
well-formedness (ONSET, NOCODA, *COMPLEX), can be motivated in this way. Two factors deserve
special attention. First, the formulation of these constraints depends crucially on the
theories of segmental structure and syllable structure adopted. In a theory such as “CV-only”
(Lowenstamm 1996), for instance, in which a word like priest is represented as in (1), there is
no need for any of the three syllable structure constraints mentioned just above, because there
are no complex clusters and no codas to begin with, and C-positions and V-positions always come
in pairs:
(1)  C V C V C V C V C V
     | | | | | | | | | |
     x x x x x x x x x x
     [melody tier: p r i s t, with each segment linked to a skeletal position]
However, in such a theory, constraints or principles of a similar nature are necessary to
explain why in English the various empty positions in (1) are allowed to persist, and why
languages differ in which structures they allow and which they do not. There is a large body of
literature in Government Phonology and related frameworks (see e.g. van der Hulst and Ritter
1999 for discussion) which tries to do exactly this. In the end, there may be no fundamental
difference between this exercise and the standard OT programme. It remains to be seen whether
other theories of syllable structure (e.g. mora theory, Hayes 1989, or X-bar syllable
structure, Levin 1985) offer particular advantages in this respect.
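To make the dependence on the representation concrete, the three syllable-structure constraints mentioned above (ONSET, NOCODA, *COMPLEX) can be stated as violation counters over a conventional onset-nucleus-coda parse. This is a minimal sketch under assumptions: the `Syllable` type, the function names and the parse of priest as a single syllable are illustrative choices, not part of the paper. In a CV-only representation these counters would have nothing to count, since every unit has exactly one C and one V position.

```python
from typing import List, NamedTuple


class Syllable(NamedTuple):
    """A conventional parse: lists of segments per syllable constituent."""
    onset: List[str]
    nucleus: List[str]
    coda: List[str]


def onset_violations(word: List[Syllable]) -> int:
    """ONSET: one violation per syllable without an onset."""
    return sum(1 for s in word if not s.onset)


def no_coda_violations(word: List[Syllable]) -> int:
    """NOCODA: one violation per syllable with a coda."""
    return sum(1 for s in word if s.coda)


def complex_violations(word: List[Syllable]) -> int:
    """*COMPLEX: one violation per onset or coda with more than one segment."""
    return sum((len(s.onset) > 1) + (len(s.coda) > 1) for s in word)


# priest parsed as one syllable with a complex onset and a complex coda
priest = [Syllable(onset=["p", "r"], nucleus=["iː"], coda=["s", "t"])]

print(onset_violations(priest), no_coda_violations(priest), complex_violations(priest))
```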
Second, it should be noted that feature frameworks have a specific contribution to make where
the formulation of segmental constraints is concerned. Consider, for instance, the constraint
*VOICE (cf. Kager 1999, p. 40): no obstruent may be voiced, or in terms of distinctive
features: *[–son, +voice]. There is a large body of research on the arity of the feature
[voice], which suggests that there is only one value ([voice]; see Wetzels and Mascaró 2001 for
discussion and an opposing view). This entails that the constraint *VOICE can be simplified to
*[–son, voice] (or *[son, voice], on the view that sonorants and obstruents form two distinct,
equipollently opposed, classes). Crucially, there is no room in the theory for a corresponding
constraint *NONVOICE (*[–son, –voice]), because there is no object [–voice] that can be
referred to in phonological constraints. In other words, the results obtained in unary
frameworks (e.g. Dependency Phonology, Anderson and Ewen 1987) remain highly valid in OT and,
to the extent that those results and research in OT point in the same direction, provide
unequivocal confirmation of both approaches, both in terms of the content of phonological
representations and the content of constraints.
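The consequence of a unary [voice] for the constraint set can be sketched in code. The representation below is an assumption made for illustration: segments carry a binary major-class value plus a set of privative features, and the hypothetical helper `star` can only name features that are present. Under that design *VOICE is expressible, while a constraint such as *NONVOICE cannot even be written, because absence of [voice] is not an object the constraint language can mention.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Segment:
    symbol: str
    sonorant: bool                       # binary major-class feature [±son]
    unary: FrozenSet[str] = frozenset()  # privative features that are present


def star(sonorant: bool, *present: str) -> Callable[[List[Segment]], int]:
    """Build a markedness constraint *[±son, F1, F2, ...].

    The builder accepts only features that must be PRESENT; it offers no
    way to require that a unary feature be absent.
    """
    required = frozenset(present)

    def constraint(form: List[Segment]) -> int:
        return sum(1 for s in form if s.sonorant == sonorant and required <= s.unary)

    return constraint


star_voice = star(False, "voice")  # *VOICE = *[-son, voice]

t = Segment("t", sonorant=False)
d = Segment("d", sonorant=False, unary=frozenset({"voice"}))
a = Segment("a", sonorant=True)

print(star_voice([d, a, d]))  # two voiced obstruents
print(star_voice([t, a, t]))  # voiceless obstruents carry no [voice] to penalize
```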
2.2 Faithfulness
Faithfulness constraints, such as DEP, MAX, IDENT and LINEARITY, penalize changes to inputs.
Faithfulness constraints can be motivated from a listener perspective. There is a direct
connection to psycholinguistics here, in terms of (speed of) word recognition, because changes
that have occurred in outputs make it harder to relate them to a corresponding input; that is,
roughly, such changes make outputs harder “to recognize”, all things being equal. (In actual
fact, things are hardly ever going to be “equal”, because in context markedness constraints
will affect the input candidate; in such cases, the output will be matched with an input form
according to the grammar.)
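As a rough illustration of how such constraints count changes, the sketch below aligns an input with an output and tallies MAX (deleted segments), DEP (inserted segments) and IDENT (substituted segments). Several simplifications are assumed: the alignment comes from Python's standard `difflib.SequenceMatcher` rather than from a phonological correspondence relation, IDENT is counted per segment rather than per feature, and LINEARITY is not modelled. The example forms are invented.

```python
from difflib import SequenceMatcher
from typing import Dict, Sequence


def faithfulness(inp: Sequence[str], out: Sequence[str]) -> Dict[str, int]:
    """Count MAX, DEP and IDENT violations from an input/output alignment."""
    counts = {"MAX": 0, "DEP": 0, "IDENT": 0}
    matcher = SequenceMatcher(None, inp, out, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        n_in, n_out = i2 - i1, j2 - j1
        if tag == "delete":        # input segments with no output correspondent
            counts["MAX"] += n_in
        elif tag == "insert":      # output segments with no input correspondent
            counts["DEP"] += n_out
        elif tag == "replace":     # paired segments differ; any surplus is MAX or DEP
            paired = min(n_in, n_out)
            counts["IDENT"] += paired
            counts["MAX"] += n_in - paired
            counts["DEP"] += n_out - paired
    return counts


print(faithfulness("kat", "ka"))   # final consonant deleted
print(faithfulness("pt", "pet"))   # vowel epenthesized
print(faithfulness("bad", "bat"))  # final consonant changed
```

The more an output departs from its input, the larger these counts become, which is one way to picture the claim that unfaithful outputs are harder to relate to a stored form.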
Note that an important assumption is made here, viz. that there is one input form to which
surface outputs can be compared. What if there is no single abstract input form, as in theories
like Exemplar Theory (see e.g. Bybee 2006 and many references cited there)? In Exemplar Theory,
(multiple) surface forms are stored directly in some degree of phonetic detail. No abstraction
takes place, only categorization (see e.g. Dell 2000; see also note 1). In that case, two
approaches are possible. In the first, faithfulness could still be evaluated by comparing an
output form to which a listener is exposed with any of the exemplars present in the lexicon.
This would probably diminish, but not obviate, the need for a faithfulness component in
grammar: some “recognition mechanism”, involving a likeness evaluation metric akin to
faithfulness, is still needed. A second approach would be to consider the possibility that some
degree of abstraction among exemplars still takes place, so that there is still some role to
play for an abstract, generalized form with special status, to which output forms may be
compared; this abstract form may or may not correspond to the traditional “underlying form” of
generative grammar or the “input” in Optimality Theory. In recent work, Sloos (2009; see
references cited there) investigates the latter approach for a variable set of data in Dutch
and finds that it makes the correct predictions. In short, the operation of faithfulness
constraints is intimately related to our theory of the lexicon, to which we turn next.
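The first approach, in which an incoming form is compared with any stored exemplar, can be sketched as a nearest-exemplar search. Everything specific here is an assumption made for illustration: the likeness metric is simply the similarity ratio of Python's `difflib.SequenceMatcher`, and the tiny lexicon with its transcriptions is invented. The paper does not commit to any particular metric.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


def similarity(a: str, b: str) -> float:
    """Stand-in likeness metric: 1.0 for identical forms, lower otherwise."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def recognize(heard: str, lexicon: Dict[str, List[str]]) -> Optional[Tuple[str, str, float]]:
    """Return (word, best-matching exemplar, score) for the heard form."""
    best: Optional[Tuple[str, str, float]] = None
    for word, exemplars in lexicon.items():
        for exemplar in exemplars:
            score = similarity(heard, exemplar)
            if best is None or score > best[2]:
                best = (word, exemplar, score)
    return best


# Invented exemplar clouds: each word stores several surface forms directly.
LEXICON = {
    "police": ["pəlis", "plis"],
    "please": ["pliz"],
}

print(recognize("plis", LEXICON))  # matches a stored reduced exemplar exactly
print(recognize("pəli", LEXICON))  # matches no exemplar exactly, still closest to "police"
```

Storing the reduced form as an exemplar makes an exact match available, which illustrates why a richer lexicon may diminish, without removing, the work done by a faithfulness-like metric.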
2.3 The lexicon
One aspect of OT that might be improved upon in terms of psycholinguistic realism is its
assumptions about the lexicon. A lexicon on which no constraints hold, and an infinite set of
generated candidates which need to be checked by the Evaluator, are often criticized for not
being “psycholinguistically realistic” (see Goldrick, to appear, for discussion of this
notion). In generative grammar, the lexicon consisted of a list of the words of the language,
which specified only nonredundant information. All predictable alternations (and other
properties of the output) were supplied by rule. That is, generative grammar is a so-called
dual-processing model. Whether this is adequate is still a matter of great controversy (see
e.g. Pinker 1999, Plaut 2003 for discussion). The question is whether OT is necessarily a
dual-processing model like its generative predecessors or whether it is in fact more malleable
and could contribute to a “compromise model” which is both theoretically and
psycholinguistically attractive (see Alderete and Frisch 2009 for recent discussion).
Two remarks on the nature of underlying representations should be made here. First, instead of
solitary inputs, why shouldn’t the lexicon consist, as in Exemplar Theory, of the words (and
phrases) that a speaker/listener has actually been exposed to, with a rather good reflection of
what these sound like and in which contexts they are used, and in which words that have been
encountered often are stored more robustly than other, less frequently encountered words? In
such a conception, both phonological and semantic generalizations are made between words,
leading to a multidimensional array. Grammar does still play a role: there is a selection
mechanism to pick out one form, suited to its specific phonological environment and suitable to
the required speech style and other contextual factors, while incoming forms must be matched to
existing exemplars (see above). In this approach, both production and perception are subject to
a language-specific hierarchy of constraints: both can be well expressed as an OT grammar.
A second remark concerns the specification of forms which alternate. Consider the example of
the English plural, which alternates between [s], [z] and [ɪz]. Will a plural morpheme be
stored in the lexicon? In Exemplar Theory, it will not be, which accords well with word-based
theories of the lexicon (e.g. Bybee 1988; see note 2). If it is, as in morpheme-based models
and dual-processing theories, there are two possibilities for such lexical entries in a
surface-oriented approach. Consider the point of view of the learner: she is exposed to three
different allomorphs with exactly the same meaning. One possibility is that all three forms are
stored and applied to new forms in conformity with emergent generalizations or the constraint
grammar; alternatively, the most frequent form could be stored and adapted to novel situations
in analogical fashion. The second logical possibility would be that the learner stores the
“lowest common denominator”, e.g. a “coronal fricative” underspecified for voice. This
possibility, which can be shown to work (van de Weijer 2009b), would involve underspecification,
again showing the importance of considering the role of theories of segmental structure in OT
and the emergent lexicon (see note 3).
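The second possibility, a stored coronal fricative unspecified for voice whose surface shape is decided by the constraint grammar, can be sketched as follows. This is not the analysis of van de Weijer (2009b), which is not reproduced in this paper; the constraint names, their definitions, the ranking and the transcriptions are assumptions chosen only to show the mechanism. Because the suffix has no underlying voicing, neither [s] nor [z] incurs a faithfulness violation, so markedness decides.

```python
from typing import List

VOICED = set("bdgvzʒ")       # voiced obstruents (toy inventory, assumption)
VOICELESS = set("ptkfsʃθ")   # voiceless obstruents (toy inventory, assumption)
SIBILANTS = set("szʃʒ")


def sib_sib(stem: str, cand: str) -> int:
    """No adjacent sibilants."""
    return sum(1 for x, y in zip(cand, cand[1:]) if x in SIBILANTS and y in SIBILANTS)


def agree_voice(stem: str, cand: str) -> int:
    """Adjacent obstruents must agree in voicing."""
    return sum(
        1
        for x, y in zip(cand, cand[1:])
        if (x in VOICED and y in VOICELESS) or (x in VOICELESS and y in VOICED)
    )


def dep(stem: str, cand: str) -> int:
    """No epenthesis: count segments beyond stem plus one suffix consonant."""
    return len(cand) - len(stem) - 1


RANKING = [sib_sib, agree_voice, dep]  # assumed ranking for this sketch


def plural(stem: str) -> str:
    """Pick the best realization of stem + underspecified coronal fricative."""
    candidates: List[str] = [stem + "s", stem + "z", stem + "ɪz"]
    return min(candidates, key=lambda c: tuple(con(stem, c) for con in RANKING))


for stem in ["kæt", "dɔg", "bʌs"]:
    print(stem, "->", plural(stem))
```

Stems ending in a vowel or sonorant are deliberately left out: with these three constraints [s] and [z] tie there, and choosing between them would need a further assumption that this sketch does not make.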
Finally, we must allow for the role of frequency (cf. also Diessel 2007 for a review of the
role of frequency in language acquisition, language use and diachronic change). Exemplar Theory
offers a direct translation of the notion of token frequency, namely as the degree of
entrenchment of tokens in exemplar clouds, and of type frequency, which is represented by the
connections between exemplars which have, for instance, the same affix and are therefore
related in meaning. These frequencies could be (partially and/or indirectly) related to the
weight of constraints in novel approaches in Optimality Theory, such as Harmonic Serialism (see
e.g. McCarthy 2008). It is also possible that these weights are not related to constraints, but
are, rather, a property of lexical entries themselves (see below).
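The two notions of frequency can be given a simple data-structure reading. The class below is a hypothetical sketch, not a model proposed in the paper: token frequency is the count of a surface form within a word's exemplar cloud (its degree of entrenchment), and type frequency is the number of distinct stored words that share an affix. All names and example data are invented.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


class ExemplarLexicon:
    """Toy exemplar store with token and type frequency."""

    def __init__(self) -> None:
        self.clouds: Dict[str, Counter] = defaultdict(Counter)  # word -> form counts
        self.affix_of: Dict[str, str] = {}                      # word -> affix label

    def hear(self, word: str, form: str, affix: Optional[str] = None) -> None:
        """Store one more token of `form` in the cloud of `word`."""
        self.clouds[word][form] += 1
        if affix is not None:
            self.affix_of[word] = affix

    def token_frequency(self, word: str, form: str) -> int:
        """Entrenchment of one surface form within a word's cloud."""
        return self.clouds.get(word, Counter())[form]

    def type_frequency(self, affix: str) -> int:
        """Number of distinct words connected by the same affix."""
        return sum(1 for a in self.affix_of.values() if a == affix)

    def most_entrenched(self, word: str) -> str:
        """The form most likely to be selected on frequency alone."""
        return self.clouds[word].most_common(1)[0][0]


lex = ExemplarLexicon()
for _ in range(3):
    lex.hear("cats", "kæts", affix="PL")
lex.hear("cats", "kæs", affix="PL")   # an invented reduced token
lex.hear("dogs", "dɔgz", affix="PL")

print(lex.token_frequency("cats", "kæts"), lex.type_frequency("PL"), lex.most_entrenched("cats"))
```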
2.4 The structure of the grammar
The architecture of OT, and in particular the notion of “freedom of generation” in the
component GEN, does not lend itself well to psycholinguistic implementation. The generation of
an infinite number of candidates which must be evaluated, although perhaps unproblematic from a
computational linguistics point of view (e.g. Bíro 2006), does not fit well with the idea of
real language production, which takes place in real time. In this respect, Harmonic Serialism
(see again McCarthy 2008 and references cited there) also presents an improvement, since it
dramatically decreases the power of GEN: instead of generating an infinite number of
candidates, this component is only allowed to make one change in a candidate; if this change
improves markedness, further changes are possible, but if there is no markedness improvement,
the derivation stops. As before, markedness and faithfulness constraints play a crucial role in
evaluation. On the one hand, Harmonic Serialism re-introduces the concept of derivation (and
rule ordering, albeit in a precisely limited and restrictive way) into OT, which many
researchers have been arguing for ever since OT was first introduced. On the other hand, it is
also welcome because it permits a much smaller role for grammar, more compatible with a
real-time approach.
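The loop described above, in which GEN makes at most one change per step and the derivation stops when no change improves on the current form, can be sketched as follows. The details are assumptions for illustration only: the restricted GEN offers single deletions and single schwa insertions, the constraints are a toy *CC, DEP and MAX, and the ranking *CC >> DEP >> MAX is invented. Faithfulness is evaluated against the input of the current step, and ties between equally good candidates are broken arbitrarily by generation order.

```python
from typing import List, Tuple

VOWELS = set("aeiouə")
Candidate = Tuple[str, str]  # (form, operation that produced it)


def gen(current: str) -> List[Candidate]:
    """Restricted GEN: the unchanged form plus every form one operation away."""
    cands: List[Candidate] = [(current, "none")]
    for i in range(len(current)):
        cands.append((current[:i] + current[i + 1:], "delete"))
    for i in range(len(current) + 1):
        cands.append((current[:i] + "ə" + current[i:], "insert"))
    return cands


def star_cc(form: str, op: str) -> int:
    """Markedness: one violation per pair of adjacent consonants."""
    return sum(1 for x, y in zip(form, form[1:]) if x not in VOWELS and y not in VOWELS)


def dep(form: str, op: str) -> int:
    """Faithfulness to the current step's input: penalize an insertion."""
    return int(op == "insert")


def max_(form: str, op: str) -> int:
    """Faithfulness to the current step's input: penalize a deletion."""
    return int(op == "delete")


RANKING = [star_cc, dep, max_]  # assumed ranking


def derive(underlying: str) -> List[str]:
    """Apply one optimal change at a time until the unchanged form wins."""
    steps = [underlying]
    while True:
        winner, _ = min(gen(steps[-1]), key=lambda c: tuple(con(*c) for con in RANKING))
        if winner == steps[-1]:  # no harmonic improvement: the derivation converges
            return steps
        steps.append(winner)


print(derive("aptka"))
```

Each step evaluates only a candidate set whose size is linear in the length of the form, which is the sense in which the power of GEN is reduced. What counts as a single operation is stipulated here, and it depends on the theory of segmental structure adopted.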
Thus, in this conception of grammar it is again important to delve into segmental structure, so
that it can be defined what counts as a “single step” in a derivation (see also McCarthy 2009).
Secondly, any constraint grammar makes predictions in terms of the “factorial typology”; in
other words, the introduction of new constraints must be motivated by exploring their
typological consequences. In this sense, too, OT grammars are highly accountable.
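A factorial typology can be computed mechanically: every permutation of the constraint set is a possible grammar, and the distinct winners across all permutations are the predicted language types. The tableau below is a toy assumption (one input of shape CVC, three candidates, each violating exactly one of NOCODA, MAX and DEP) and serves only to show the procedure.

```python
from itertools import permutations
from typing import Dict, List, Tuple

# Violation profiles for an input of shape CVC (toy tableau, assumption).
TABLEAU: Dict[str, Dict[str, int]] = {
    "CVC":  {"NOCODA": 1, "MAX": 0, "DEP": 0},  # faithful, keeps the coda
    "CV":   {"NOCODA": 0, "MAX": 1, "DEP": 0},  # deletes the final consonant
    "CVCV": {"NOCODA": 0, "MAX": 0, "DEP": 1},  # epenthesizes a final vowel
}


def winner(ranking: Tuple[str, ...]) -> str:
    """Optimal candidate under strict domination for one ranking."""
    return min(TABLEAU, key=lambda cand: tuple(TABLEAU[cand][c] for c in ranking))


def factorial_typology(constraints: List[str]) -> Dict[str, List[Tuple[str, ...]]]:
    """Map each attested winner to the rankings that select it."""
    typology: Dict[str, List[Tuple[str, ...]]] = {}
    for ranking in permutations(constraints):
        typology.setdefault(winner(ranking), []).append(ranking)
    return typology


typology = factorial_typology(["NOCODA", "MAX", "DEP"])
for output, rankings in typology.items():
    print(output, len(rankings))
```

Adding a fourth constraint raises the number of grammars from 6 to 24, and any new output pattern that appears among the winners is a typological prediction the new constraint has to answer for.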
3. Conclusion
As a surface-oriented theory, Optimality Theory stands a better chance than some to match up
with psycholinguistically realistic models of language production and perception. One area in
which it could accommodate such models is its theory of the lexicon, which might be formulated
more realistically than is presently usually the case. Rather than an unstructured list without
redundant information (as in earlier generative grammar), the lexicon will be a rich
repertoire, with a wealth of redundant information on lexical entries and manifold relations
(phonological as well as semantic) between them. It remains to be seen whether morphemes are
separate entries in such a lexicon. Lexical entries may be strongly entrenched due to frequent
use and therefore easily retrieved and recognized. Lexical entries may also be forgotten or
become obscured. No two individuals have the same lexical repertoire, paving the way for
interspeaker variation and language change. Stylistic variation (or other forms of intraspeaker
variation) can be regarded as the operation of slightly different grammars (or slightly
differently weighted constraints therein) within the same speaker. The fact that lexicons are
shared to a large extent makes it reasonable to speak of the “same language” for a group of
speakers. Smaller differences can be referred to as “dialect differences”, noting that terms
such as “language” or “dialect” are essentially meaningless.
If we accept that there is not a single underlying form for a given lexical entry, but a small
(or large) “exemplar cloud”, it is still the case that one of these exemplars must be picked
out for production on any given occasion. Some exemplars within the cloud may be more prominent
and therefore stand a better chance of being selected (and of being selected more quickly):
this is the frequency effect which has been observed in many ways. Still, the planned output
will appear in a particular phonological context, be uttered in a particular speech style, and
be responsive to its context in many other ways. That is, the selection of the output must
meet, in the best possible way, a number of constraints at the same time. For this purpose, and
for the reverse procedure, i.e. that of matching an output form with an already stored
exemplar, OT remains eminently suited.
Acknowledgements
This paper was presented at the August 2009 workshop on Experimental Optimality Theory, Kobe
University, in the context of the project “Autonomy,
Harmony and Typology”. I would like to thank the
project leader, Prof. Shosuke Haraguchi (Meikai
University), for making this workshop possible, and the
presenters in this workshop, Andries Coetzee, Jongho
Jun and René Kager, for their stimulating presentations.
Thanks to Marjoleine Sloos for comments on a prefinal
version.
Notes
1) A de facto diminished role of inputs is also seen in
Optimality approaches assigning a (smaller or larger) role
to Output-Output correspondence (Benua 1997, Burzio
2000, and others).
2) In a word-based approach, morphological boundaries
may still be assumed to exist in lexical entries. On this
view, Alignment constraints should be regarded as faithfulness constraints.
3) Note that the build-up of such a lexicon and the acquisition of the constraints that play a role in an emergent language must go hand in hand. See Kager (this volume) for
discussion.
References
Alderete, John and Stefan Frisch (2009) “Phonotactic Learning without A Priori Constraints: A Connectionist Analysis
of Arabic Cooccurrence Restrictions.” Ms. ROA-1055.
Anderson, John M. and Colin J. Ewen (1987) Principles of
Dependency Phonology. Cambridge: Cambridge University
Press.
Archangeli, Diana and Douglas Pulleyblank (1994) Grounded
Phonology. Cambridge, MA: MIT Press.
Benua, Laura (1997) Transderivational Identity: Phonological
Relations Between Words. Doctoral dissertation, University
of Massachusetts, Amherst.
Bíro, Tamás (2006) “Squeezing the Infinite into the Finite:
Handling the OT Candidate Set with Finite State
Technology.” In Anssi Yli-Jyrä, Lauri Karttunen and Juhani
Karhumäki (eds.) Finite-State Methods and Natural
Language Processing (pp.21–31). Berlin: Springer.
Boersma, Paul (1998) Functional Phonology. Formalizing the
interactions between articulatory and perceptual drives.
The Hague: Holland Academic Graphics. [LOT International Series 11]. Doctoral thesis, University of Amsterdam.
Burzio, Luigi (2000) “Segmental Contrast meets Output-to-Output Faithfulness,” The Linguistic Review 17, 368–384.
Bybee, Joan L. (1988) “Morphology as Lexical Organization.”
In Michael Hammond and Michael Noonan (eds.) Theoretical Morphology. Approaches to Modern Linguistics
(pp.119–142). San Diego: Academic Press.
Bybee, Joan L. (2006) “From Usage to Grammar: The Mind’s
Response to Repetition,” Language 82, 711–733.
Coetzee, Andries W. (2008) “Grammaticality and Ungrammaticality in Phonology,” Language 84, 218–257.
Dell, Gary S. (2000) “Commentary: Counting, Connectionism, and Lexical Representation.” In Michael B. Broe and
Janet B. Pierrehumbert (eds.) Papers in Laboratory Phonology V: Acquisition and the Lexicon (pp.335–348). Cambridge: Cambridge University Press.
Diessel, Holger (2007) “Frequency Effects in Language
Acquisition, Language Use, and Diachronic Change,” New
Ideas in Psychology 25, 108–127.
Goldrick, Matthew (to appear) “Using Psychological Realism
to Advance Phonological Theory. Slightly revised version
to appear.” In John Goldsmith, Jason Riggle and Alan Yu
(eds.) Handbook of Phonological Theory (2nd edition).
Oxford: Blackwell. ROA-1039.
Hayes, Bruce (1989) “Compensatory Lengthening in Moraic
Phonology,” Linguistic Inquiry 20, 253–306.
Hayes, Bruce, Robert Kirchner and Donca Steriade (2004)
Phonetically-Based Phonology. Cambridge: Cambridge
University Press.
Hulst, Harry van der and Nancy A. Ritter (1999) The Syllable:
Views and Facts. Berlin: Mouton de Gruyter.
Jackendoff, Ray (2007) “A Whole Lot of Challenges for Linguistics,” Journal of English Linguistics 35, 253–262.
Jurafsky, Daniel, Alan Bell, Michelle Gregory and William D.
Raymond (2001) “Probabilistic Relations between Words:
Evidence from Reduction in Lexical Production.” In Joan
B. Bybee and Paul Hopper (eds.) Frequency and the Emergence of Linguistic Structure (pp.229–253). Amsterdam:
John Benjamins.
Kuhn, Thomas S. (1962) The Structure of Scientific Revolutions. Chicago: University of Chicago Press.
Levin, Juliette (1985) A Metrical Theory of Syllabicity. Doctoral dissertation, MIT, Cambridge (Mass.).
Lowenstamm, Jean (1996) “CV as the Only Syllable Type.” In
Jacques Durand and Bernard Laks (eds.) Current Trends in
Phonology: Models and Methods (pp.419–442). University
of Salford: European Studies Research Institute.
McCarthy, John J. (2008) “The Gradual Path to Cluster Simplification,” Phonology 25, 271–319.
McCarthy, John J. (2009) “Studying GEN,” Journal of the
Phonetic Society of Japan 13. ROA-1049
Oostendorp, Marc van and Jeroen van de Weijer (2005) “Phonological Alphabets and the Structure of the Segment.” In
Marc van Oostendorp and Jeroen van de Weijer (eds.) The
Internal Organization of Phonological Segments (pp.1–23).
Berlin: Mouton de Gruyter.
Passy, Paul (1891) “Etude sur les changements phonétiques et
leurs caractères généraux.” Paris: Librairie Firmin-Didot.
Pinker, Steven (1999) Words and Rules: The Ingredients of
Language. New York: Basic Books.
Plaut, David C. (2003) “Connectionist Modeling of Language:
Examples and Implications.” In Marie T. Banich and Molly
Ann Mack (eds.) Mind, Brain, and Language: Multidisciplinary Perspectives (pp.143–167). Mahwah, NJ: Erlbaum.
Prince, Alan, and Paul Smolensky (1993/2004) Optimality
Theory: Constraint Interaction in Generative Grammar.
Technical Report TR-2, Rutgers Center for Cognitive Science, Rutgers University, New Brunswick, NJ. [Reproduced
by Blackwell, New York in 2004.]
Pulleyblank, Douglas (1997) “Optimality Theory and Features.” In Diana Archangeli and Terry Langendoen (eds.)
Optimality Theory: An Overview (pp.59–101). Cambridge:
Blackwell.
Sloos, Marjoleine (2009) Frequency effects are sensitive to
phonological grammar. The interaction of resyllabification
and pretonic schwa deletion as a frequency effect in Dutch.
Research MA thesis, Leiden University.
Weijer, Jeroen van de (2009a) “Optimality Theory and Exemplar Theory,” Phonological Studies 12, 117–124.
Weijer, Jeroen van de (2009b) Cats and Dogs Revisited for the
Twelfth Time: An Optimality Analysis of the Plural and
Ordinal Suffix in English. Ms.
Wetzels, Leo and Joan Mascaró (2001) “The Typology of
Voicing and Devoicing,” Language 77, 207–244.
(Received Nov. 14, 2009, Accepted May 12, 2010)