Bridge verbs and V2 verbs - the same thing in spades? Sam

1
Bridge verbs and V2 verbs the same thing in spades?
Sam Featherston, Universität Tübingen
Sam Featherston
Universität Tübingen
SFB 441 Linguistische Datenstrukturen
Nauklerstrasse 35
D-72074 Tübingen
e-mail: [email protected]
Tel.: +49 7071 2977152
Fax: +49 7071 295830
2
Bridge verbs and V2 verbs - the same thing in spades?
Sam Featherston, Universität Tübingen
In this paper we investigate the quality of ‘bridgeness’, the extent to which a verb
permits extraction from its complement clause. In spite of considerable study,
linguists have only a limited idea of the nature of the factor involved. We address
the question whether the quality of being a bridge verb is the same as the quality
which permits a verb to take a verb-second complement clause in German. We
report the results of two experiments using the technique of magnitude estimation,
a method of eliciting grammaticality judgements with a high degree of
differentiation. The data reveals that the ‘bridge’ quality and the ‘V2’ quality covary very closely, suggesting that they may well be facets of the same factor. We
discuss the implications this finding has for the nature of the bridge quality.
1 Introduction
The cross-linguistic tendency of languages to permit constituents to occur phonologically
elsewhere in the sound string from where they would normally be interpreted is one of the most
remarkable features of the grammar. This dissociation of phonological and syntactic form has
been analysed as syntactic movement within the generative tradition of syntax. However, in
spite of four decades of intense study and undoubted progress, the nature of this displacement is
still controversial and many outstanding questions remain, for example about the motor of
movement processes and the obligatoriness of its application. The phenomenon of bridge verbs
provides a case in point: while possible extractions are standardly captured in lexis-independent
syntactic configurations, in practice only a subset of verbs seem to allow extraction of
constituents from their complement clauses, demonstrating that movement restrictions are not
merely syntactic but must also be related to lexical factors. We illustrate this in (1), where the
verb meinen ‘think’ allows the extraction, hoffen ‘hope’ seems less natural, and bezweifeln
‘doubt’ feels inappropriate in the structure.
(1)
Wen meint /?hofft /*bezweifelt Hans, hat der Lehrer ausgeschimpft?
who thinks /hopes /doubts
Hans has the teacher scolded
'
Who does Hans think/hope/doubt the teacher has scolded?'
In the light of this, the nature of these ‘bridge’ verbs has received considerable interest, linguists
generally adopting the tactic of attempting to identify first what characteristics, semantic or
syntactic, make a bridge verb, with the hope that this information will enable further conclusions
This work was supported by the Deutsche Forschungsgemeinschaft within SFB441 Linguistische Datenstrukturen.
Many thanks to Wolfgang Sternefeld, project leader, Tanja Kiziak, and many other members of the SFB. All
remaining errors are my own.
3
to be drawn. A question which has been raised in this regard with reference to German data is
whether the verbs permitting extraction are exactly the same group as those allowing a Verb
Second (V2) complementizerless complement clause as in (2). This too is a feature which
appears to adhere to certain predicates and not to others. Note that we shall refer in the text to
verbs which permit a complementizerless V2 complement as V2 verbs, for brevity.
(2)
Max hofft, er hat die Geschworenen überzeugt.
Max hopes he has the jury
convinced
'
Max hopes he has convinced the jury.'
In this study we employed the magnitude estimation methodology (Bard, Robertson & Sorace
1996, Cowart 1997, Keller et al 1998) to carefully controlled sets of materials. On the firm basis
of this empirically obtained data, we are able to evaluate hypotheses relevant to this structure
which have been advanced in the literature. We interpret the results as showing that the bridge
verb feature and the V2 verb feature are at least closely related, but that the bridge quality is a
continuum which interacts cumulatively with a range of other factors. It turns out that the
standard structure assumed to be the test of the bridge quality, extraction from a verb-final
complement, is in fact less useful as a measure of the ‘bridge’ quality, than is extraction from a
V2 complement clause, since the latter shows greater amplitude, and thus clarity, of variation.
We argue that judgements which have been advanced to support other conceptions of the bridge
feature have been misleading, since they have used a binary grammaticality model, which is
inappropriate for this data.1
2 The nature of bridge verbs
The most common theme in discussion about bridge verbs is the question what decides whether a
verb will join the class or not (for some early work see Erteschik-Shir 1973). A fundamental
insight is that the meaning components of the verb play a role. The various authors treat these
rather differently, but these differences are for our purposes fairly unimportant. For example, it
seems to be generally agreed that verbs with a more factive interpretation permit extraction more
readily (Kiparsky & Kiparsky 1970), so that predicates of stating and thinking are usually better
than verbs of asking and doubting. A number of linguists have attempted to identify the relevant
semantic features more precisely. For example, Cattell (1978) puts forward a relevant division
between verbs which presuppose that the proposition within their complement clause is part of
common belief, and those which do not. He then distinguishes within this second group between
volunteered stance predicate types (eg claim, assert, say, allege) and response stance predicate
types (eg deny, admit, confirm, agree). Cattell argues that it is the volunteered stance verbs
which permit bridge effects. In his examples this is measured by the reference of an initial why
to an embedded clause, as in Why do they claim/deny that she killed him? He judges those
predicates to have the bridge feature which readily permit the reading where the why is
understood as applying to the complement clause, ie why did she kill him? not why do they
claim/deny it?
In their discussion of island effects, Erteschik-Shir & Lappin (1979) analyse the relevant
factor as dominance, which is essentially a sort of discourse focus: ‘A constituent is dominant
1
Much heat and little light is generated by discussions of precisely what is meant by grammaticality. We use the term here to
refer to the construct measured in grammaticality judgements, after performance factors have been discounted to the extent that
they can be recognized as such. We intend no theoretical claim about the boundaries of syntactic competence, but if we find
effects related to structure and independent of lexis, for which there is no readily available processing, production, plausibility or
other performance explanation, then we shall refer to this as related to grammaticality, using the term in the sense of syntactic
well-formedness. For a discussion of the nature of syntactic well-formedness see Featherston (2004).
4
[...] when a speaker intends to draw attention to its semantic content.’ (45). They admit that this
term is not clearly distinguishable from topic in the topic - comment pair. On this analysis,
bridge verbs are those which contribute least ‘semantic content’ and ‘weight’ (Erteschik-Shir &
Lappin 1979: 62-3). The use of a verb with more semantic weight in a matrix clause must be
assumed to result from the speakers'intention to draw attention to its content.
Kluender (1992) argues in a rather parallel manner, but discusses how the referential
specificity of the complementizer, the subject, the extracted element, and the potential bridge
verb may cumulatively be the causal factor of island effects. In the case of matrix verbs
‘referential specificity’ simply means more detailed and thus more specific semantic content. He
argues that it is the additional processing load of a more semantically complex verb which makes
extractions involving them seem more ‘difficult’ , and thus of dubious acceptability. All of these
accounts can be seen as differing only in perspective, and identifying essentially the same factor,
that of semantic weight. However, not all approaches stress the effect of meaning components:
for example, Müller & Sternefeld (1995) assume that a complement CP has an NP shell, and the
bridge quality is a reflexion of the ease with which this NP is incorporated into the verb.
The most obvious competitor to these would be frequency: if a verb is sufficiently
frequent it becomes more transparent to movement. Consideration of some good and bad bridge
verbs makes it seem plausible that this frequency plays a role. In order to investigate this
possibility we performed some frequency counts of the eight verbs that we tested in our
experiments reported in this paper below. In table 1 we give their spoken and written lemma
frequencies (CELEX, MPI Nijmegen) and infinitive form frequencies (COSMAS, IDS
Mannheim).2 From these we calculate a mean log frequency score, which is intended to abstract
away from the vagaries of individual corpus contents. We first convert each of the four
frequency figures into logarithms, as is standard for frequency measures, and then sum the four
of them, and divide by four. We compare these mean log frequencies with the ‘bridgeness’ score
which we derive from the grammaticality judgements we elicited in our experiments reported
below. These bridge feature scores are a measure of how good these verbs were judged to be as
bridge verbs; technically the figures represent the mean normalized judgements of the extraction
conditions in experiments. The origin and significance of these figures will become clear when
we lay out our studies below, but readers should accept on trust for the moment that these scores
constitute an empirically obtained relative measure of the extent to which speakers judge that
these verbs permit extractions from their clausal complements, a higher score indicating greater
‘bridgeness’ .
Verb
sagen ‘say’
CELEX
spoken
3545
CELEX COSMAS COSMAS Mean log Bridgeness
score
written
spoken
written frequency
8614
3637
93345
4.0039
-0.002
glauben ‘believe’
1076
1728
194
28877
3.2544
-0.155
behaupten ‘claim’
64
702
40
15126
2.6086
-0.134
hoffen ‘hope’
99
569
64
26255
2.7440
-0.235
fürchten ‘fear’
35
311
14
11954
2.3151
-0.395
115
731
492
17796
2.9667
-0.446
erzählen ‘tell (story)’
2
The CELEX statistics come from the CELEX lexical database, release 2, Max Planck Institut für Psycholinguistik, Nijmegen,
of 6 million words of German. The COSMAS data is the results of searches in the spoken and written public archives in
COSMAS I, Institut für deutsche Sprache, Mannheim, for the infinitive forms of these verbs. At the time of these searches, this
corpus contained 541 million accessible word-forms.
5
erklären ‘explain’
106
1680
88
27946
2.9104
-0.619
bezweifeln ‘doubt’
11
77
4
2241
1.7201
-0.764
Table 1: Frequency as a determinant of ‘bridge’ quality. The Mean Log Frequency is derived
from the four raw frequency counts, the Bridgeness Score is from the experiments we report
later in the paper.
Inspection of the frequency scores reveals a fair degree of consistency between the four
frequency measures. Pearson'
s coefficients confirm strong correlations between them (all
pairwise coefficients =>0.95). The mean frequency figures and bridgeness figures show a much
weaker but nevertheless still significant correlation (coefficient 0.725). Although this data set
confirms that there is a degree of correlation, it would appear that frequency is not the only factor
affecting transparency to extraction. It is striking that the monosyllabic stems of hoffen and
fürchten seem to form better bridge verbs than the bisyllabic stems of erzählen and erklären, in
spite of the higher frequency of these latter. However, differences may come to from other
sources; frequency scores are only ever a reflexion of frequency in the sample of texts that make
up the corpus base, and these are inevitably subject to bias. For example, it seems implausible
that erzählen should be eight times as frequent in the spoken language as hoffen (as it is in the
COSMAS spoken corpus). The good correlation among the four frequency scores may reflect
the fact that their constituent text bases are largely composed of the same ‘public’ language style.
The difference between the frequencies and our experimentally obtained judgements of
‘bridgeness’ may represent the distance between this public variety of the language and the
‘informal’ style far more frequent in everyday use that our experimental subjects assume as their
stylistically neutral language. To summarize, we may say that this data demonstrates that
frequency correlates quite closely with the bridge quality, but it does not and cannot demonstrate
that frequency is the determining factor and indeed it would tend to suggest that other factors too
are relevant.
One possibility is that frequency is not itself a causal factor of bridgeness at all, but
merely relates to the parameter of semantic weight, the common denominator of the approaches
of Cattell (1978), Erteschik-Shir & Lappin (1979) and Kluender (1992), which we mentioned
above. Now it seems plausible that semantically light verbs are better bridge verbs, but this fact
is of little use to us in constructing a linguistic account, since semantic weight is not a precise
term and is not obviously quantifiable. Additionally, the causal relationship between frequency
and semantic weight is unclear: Is a verb frequent because it is semantically light and therefore
fits many circumstances, or does frequency of use erode semantic content and phonological
form? Are both of these true? In the light of this, it will be clear that only a very woolly and
untestable account of the bridge quality can be built on this basis, but in spite of these problems,
it does seem probable that semantic weight is relevant. One possibility might be that
semantically light verbs permit themselves to be more easily semantically compounded with
their embedded verbs, thus effectively reducing the extraction distance to a single proposition.
This has a degree of similarity with the incorporation account of Müller & Sternefeld (1995)
mentioned above. On this semantic lightness account we would therefore derive the difference
in acceptability between (3a-c) from the difference in feasibility of the compound predicates in
(4a-c). The more plausible a compound predicate is, the easier it is to extract over it, since this
requires the construction of the complex predicate.
(3)
a.
Who does John think Mary likes?
b. ?Who does John believe Mary likes?
c. ??Who does John doubt Mary likes?
6
(4)
a.
b.
c.
THINKS.LIKES(Thomas, Mary, John)
THINKS.TRUE.LIKES(Thomas, Mary John)
THINKS.NOT.TRUE.LIKES(Thomas, Mary, John)
The difficulties of any such a attempted explanation are obvious, and we shall take this approach
no further here, noting only that this phenomenon has many similarities with that of ‘coherence’
in clauses (Bech 1955). Both properties are scalar rather than categorial and most obviously
determined by lexical properties of the matrix verb, but they are also sensitive to semantic effects
such as negation (5a) as well as surface effects such as heaviness (5b).
(5)
a.
b.
?Who doesn'
t John think Mary likes?
?Who does John think on the balance of the evidence Mary likes?
It is thus apparent that the bridge quality is not something which is going to be pinned down
easily, since it seems to straddle the boundaries between syntax, semantics and surface factors
and depend on a wide range of variables. An example of the last: good bridge verbs tend crosslinguistically to be short, but whether this is really a causal factor or whether it is simply a
correlate with frequency (Zipf'
s Law) or semantic simplicity seems hard to tease apart. All of the
above accounts of this phenomenon share this difficulty to some degree: they are all hypotheses,
and in order to move the debate on we need to produce evidence which would decide between
them. It is however not immediately obvious how this might be achieved, especially as it seems
that these factors co-vary to a considerable extent.
What can however be tested is how the syntactic behaviour of bridge verbs relates to
other syntactic behaviour, and this is our aim here. The particular question we shall address
relates to German. In this language there are two finite complement clause types, those with and
without complementizers. While these correspond in many ways to complement clauses in
English with and without complementizers, there is one major difference: while a clause with a
complementizer has a clause-final verb (6a), a complementizerless clause has an initial topic
position followed by the finite verb just like a matrix clause (6b).
(6)
a.
b.
Max hofft, dass er die Geschworenen überzeugt hat.
Max hopes that he the jury
convinced has
'
Max hopes that he has convinced the jury.'
Max hofft, er hat die Geschworenen überzeugt.
Max hopes he has the jury
convinced
'
Max hopes he has convinced the jury.'
It is clear that there is a significant overlap between bridge verbs and those verbs which permit
verb-second complementizerless complement clauses. It is controversial, however, whether
these verbs are indeed identical or whether they merely overlap. For example, Grewendorf
(1988) suggests that they are indeed the same set of verbs, and refers to Haider (1984). Reis
(1994) on the other hand argues against this proposal pointing out a number of differences, and
Müller & Sternefeld (1995) come to the same conclusion. A finding that the bridge feature and
the V2 feature were the same would be of theoretical significance, since it would provide a real
clue to the nature of the bridge feature.
It is also controversial what structure is the criterial condition for being a bridge verb in
German. While the general view is that the grammaticality of extraction from a V-final
complement is the most appropriate test, our own view is that extraction from V2 clauses is no
less and in some ways more suitable. In our experiments reported below we attempted to obtain
replicable data which would throw light on the relationship between the ability of a verb to take a
7
V2 complement clause, and the extent to which it licenses extraction from verb-final and V2
complement clauses.
3 Experiment 1
3.1 Design
Our first aim was to determine what grammaticality patterns the different complement clause
types and extractions would produce, and whether these would co-vary. If the bridge verb and
V2 complement verb characteristics are linked, we should expect to see them behaving in a
parallel fashion across verbs. If not, we should expect to see some contrasts. For each of eight
verbs (see below), we tested the same syntactic conditions: there will be hierarchies of
acceptability for each. If the pattern of response across verbs gives a parallel signature in both
the extraction and V2 complement conditions (ie verbs are either good, bad or marginal in each,
but not good in one and poor in the other) then we have strong evidence that the qualities making
for a good bridge verb and for a good V2 verb are related. However, in order to make more
meaningful statements about the relationship, we must also be able to distinguish between effects
of extraction and effects of complement clause type. We therefore tested in six syntactic
conditions: subject and object extraction from V2 and V-final clauses, and Yes/No questions
with V2 and V-final clauses.
(7)
a.
b.
c.
d.
e.
f.
Wer glaubst du, hat den Schüler ausgeschimpft?
who think you has the pupil scolded
'
Who do you think scolded the pupil?'
Wer glaubst du, dass den Schüler ausgeschimpft hat?
who think you that the pupil scolded
has
Wen glaubst du, hat der Lehrer ausgeschimpft?
whom think you has the teacher scolded
'
Whom do you think the teacher scolded?'
Wen glaubst du, dass der Lehrer ausgeschimpft hat?
who think you that the teacher scolded
has
Glaubst du, der Lehrer hat den Schüler ausgeschimpft?
think
you the teacher has the pupil scolded
'
Do you think the teacher scolded the pupil?'
Glaubst du, dass der Lehrer den Schüler ausgeschimpft hat?
think you that the teacher the pupil scolded
has
The contrast of (7ab) with (7cd) allows us to control for asymmetries in the extraction of subjects
and objects; and the contrast (7ac) with (7bd) allows us to identify the effect of clause type. We
contrast these with interrogatives (7ef) so that our conditions without extraction remain
maximally similar to the extraction conditions.
We tested eight verbs, on a continuum of bridgeness and in semantically related pairs, to
the extent that this is possible: sagen ‘say’ und behaupten ‘claim’ , erklären ‘explain, declare’
und erzählen ‘tell (a story), relate’ , hoffen ‘hope’ and fürchten ‘fear’ , and glauben ‘think,
believe’ and bezweifeln ‘doubt’ . The contrast of sagen and behaupten might highlight effects of
frequency and length, while the pairs hoffen/fürchten, and glauben/bezweifeln might reveal the
effect of negativity and, in the second case, factivity. The pair erzählen/erklären were included
as closely matched ‘manner of saying’ verbs, which seemed to be poor but not impossible bridge
verbs. Let us be clear that no claims of strong adequacy are intended for these matchings, and it
was the continuum of bridgeness which was our primary concern.
8
3.2 Eliciting judgements
The magnitude estimation methodology (Bard, Robertson & Sorace 1996, Cowart 1997, Keller
2000) allows us to obtain maximally differentiated judgements from a group of informants and
compare them meaningfully. It derives from a methodology used to grade physical sensations
such as heat and brightness and developed from there for use in attitude and opinion
measurement (Stevens 1975). It varies from standard elicitation of grammaticality judgements in
several ways. The first innovation is that subjects are asked to provide purely relative
judgements: at no point is an absolute criterion of grammaticality applied. Also, all judgements
are proportional; ie subjects are asked to state how much better or worse sentence A is than
sentence B. This process is started by the presentation of a reference item, relative to which all
subsequent judgements are made. This reference item remains visible throughout the
experiment, and acts as a sort of anchor for the scale of judgements. Subjects themselves fix the
value of the reference item, so that they can use a set of numbers that they feel comfortable with.
Furthermore, the scale along which judgements are made is open-ended: subjects can always
introduce a score better or worse than all those already used. Lastly, the scale has no minimum
division: subjects can always produce an additional intermediate rating. The result is that
subjects are able to produce judgements which distinguish the all the differences they perceive
with minimal interference from the scale.
A major strength of this approach is that the results are numerical and form an interval
scale, so that standard statistical tests can be used to analyze the results. Since the design of this
experiment is within-subjects, that is, all informants see all conditions and we expect each person
to make the differences in judgements that we are interested in, we utilize repeated measures
analyses of variance as our main statistical test (Clark 1973).
3.3 Materials and procedure
Eighteen basic texts were constructed from which the syntactic conditions could be derived by
embedding them in a matrix clause with one of our eight matrix verbs and extracting the subject
or object as necessary (eg der Lehrer hat den Schüler ausgeschimpft ‘the teacher has told off the
pupil’ ). The lexis in these experimental materials was strictly controlled for length (range 6-9
letters, subjects'mean 7.4 letters, objects'mean 7.6 letters) and lemma frequency (range 1-120
per million, subjects mean 44.2, objects mean 40.9).3 All arguments were preceded by definite
articles. The verbs in the embedded clauses were all chosen as having a negative connotation
(eg beleidigen ‘insult’ , verachten ‘despise’ , bestechen ‘bribe’ ) so as to prevent semantic
mismatch with the matrix verb fürchten ‘fear’ . Otherwise, sentences such as Who do you fear
that the doctor has cured? might well be scored badly because of their implausibility. The
matrix subject was always du ‘you’ , as this seems most natural in questions. The homogeneity
and plausibility of the materials were informally pre-tested in order to minimize variation
between items.
Eighteen versions of these experimental materials were constructed such that each subject
saw each combination of syntactic condition and matrix verb once, and each item appeared
equally often. Subjects also saw twelve filler items, so that the total number of sentences each
subject saw was 60. Subjects were recruited by flier in Tübingen university and by newsgroup
listing. A financial incentive to take part was offered. Twenty-five subjects took part, but two
subjects' data was excluded from further analysis because of doubts about its quality.4
3
All frequencies were derived from the CELEX lexical database (release 2) of the Max Planck Institut, Nijmegen).
4
In both cases, the subjects produced data patterns which were precisely the inverse of that of the other subjects. A colleague
9
Participants were asked to supply their names, ages (mean age 30.4, range 20-52), sex (11
females, 12 males), occupations (all except one students or graduates) and dialect backgrounds
(10 from the southern areas of Baden-Württemberg and Bavaria, 8 from the centre, 2 from the
north, and 3 claiming to speak only standard German).
Subjects logged themselves on to the experimental website and participated in the
experiment remotely, as the experiment was made available on the web using the WebExp
experimental software package (Keller et al 1998, see http://www.language-experiments.org).
The experiment proceeded as follows: first subjects read a page of instructions outlining their
task. The criterion they were to judge by was defined as whether the sentences ‘sound natural’ .
They next filled in a personal details sheet. The first practice phase was designed to familiarize
them with magnitude estimation; they were instructed to assign numeric values to line lengths
relative to a reference line. This was followed by a second practice phase which extended the
use of magnitude estimation to judging sentence acceptability. Only after this did the experiment
itself begin.
3.4 Results
For graphical presentation the data was first normalized by subtracting the subject'
s mean from
each score and then dividing it by the subject'
s standard deviation (conversion to z-scores). This
effectively unifies the different scales that the individual subjects adopted for themselves, and
allows us to inspect the results visually. For perspicuity we present the results in two stages.
Figure 1 presents just the extraction conditions (subject vs object extraction, V2 vs verb-final
clause type). The vertical axis shows the grammaticality judgements assigned by our subjects:
higher scores indicate greater perceived naturalness, but note that these scores are purely relative:
there is no point which indicates absolute (un)grammaticality nor any boundary between
grammaticality and ungrammaticality. Along the horizontal axis, the structures are grouped by
matrix verb, which are ordered by their ‘bridgeness’ in these extraction conditions. The error
bars show the mean normalized grammaticality judgement scores and their 95% confidence
intervals.
pointed out to us that he had forgotten during a trial of our experiment whether better sentences should be given higher or lower
scores, no doubt under the influence of the standard German educational marking system, which runs from 1 (very good) to 6
(very bad). We hypothesize that these two subjects made the same mistake and exclude their data, though this does not
materially alter our results. Subjects in subsequent experiments see an on-screen reminder that higher scores should be used for
better sentences.
10
Figure 1: Subject and object extractions from V2 and V-final complement clauses, by matrix
verb (sagen ‘say’ , glauben ‘think, believe’ , behaupten ’ claim’ , hoffen ‘hope’ , fürchten ‘fear’ ,
erzählen ‘tell (a story)’ , erklären ‘explain, declare’ , bezweifeln ‘doubt’ ).
The first point to note: the subject and object extractions from V2 clauses are identical: subjects
plainly do not distinguish between them.5 The equivalent extractions from V-final clauses on the
other hand, show a subject-object asymmetry: here object extraction is clearly better than subject
extraction. We analyzed these results by subjects and by items using a repeated measures anova
procedure. The main effects for clause type (ie V2 clause, V-final clause) (F1=358.4 p<0.001,
F2=77.66 p<0.001), and extraction type (ie subject extraction, object extraction) (F1=24.24
p<0.001, F2=24.45 p<0.001), and the interaction of clause type and extraction type (F1=39.38
p<0.001, F2=12.91 p=0.002) are all statistically robust.6 This interaction is the classic that-trace
effect, whose existence in German is often doubted (see Featherston (to appear) for discussion).
It is clear that extraction from V2 clauses is generally better than from V-final clauses, but that
the preference for object extraction is consistent across verbs (no significant interaction of verb
and extraction type, nor of verb, clause type and extraction type (both p>0.5). In fact it rather
looks as if the difference between English and German is that extraction from clauses with
complementizers is more acceptable in English. That is, there is much less difference between
(8a) and (8b) than between their German equivalents (7cd) above.
(8)
a.
b.
Who do you think the teacher told off?
Who do you think that the teacher told off?
Thus the dividing line between the grammatical and ungrammatical is drawn between the subject
and object extractions from clauses with complementizers in English (hence the that-trace effect)
but most commonly between the extractions from V2 and the extractions from V-final clauses in
5
In fact this close correspondence makes this result interesting in itself from a methodological perspective, because it gives
some idea of the sensitivity of the methodology and our implementation of it.
6
It will be noted that the tests for the by items analyses produce weaker scores than the by subjects analyses. This may be
because some empty cells in the data set of the by items analysis had to be filled with zeros, since the repeated measures
procedure requires all cells in the design x lexicalizations grid to have values.
11
German (eg Grewendorf 1988).
Both pairs of extraction conditions decline in acceptability as the matrix verbs become less
core members of the class of bridge verbs (effect of verb type: F1=22.56 p<0.001, F2=10.71
p<0.001), but the V2 extractions worsen much more obviously than the extractions from V-final
clauses (interaction of verb and clause type F1=9.79 p<0.001, F2=4.12 p=0.002). There appears
from figure 1 to be a less consistent downward trend in the V-final clause extractions than in the
V2 extractions, but this is merely apparent, the result of our ordering the matrix verbs on the
basis of their V2 extraction scores, on the basis that this structure reveals ‘bridgeness’ most
clearly. The key point here is that the fall in well-formedness in the V-final extractions
corresponds very closely to that in the V2 extractions, the smaller visual amplitude being a result
of the lower starting point in grammaticality. These V-final extractions are standardly regarded
as ungrammatical in most varieties of German and it is unsurprising that we find a reduction in
elasticity of judgements among such highly marginal structures which are closer to the lower
margins of perceived ungrammaticality.
This becomes rather clearer when we look at figure 2. Here we have merged the subject
and object extractions in each clause type, and included the data from the polarity questions.
This makes it much clearer that the two extraction types are decreasing in grammaticality across
the verbs in a very similar manner, with only the amplitude in the fall distinguishing them. The
behaviour of the polarity question conditions, by their contrast, makes it clear how very similar
the extractions conditions are.
Figure 2: Experiment 1 results - the contrast of the extraction and non-extraction conditions in
both V2 and V-final clause types.
While the extraction conditions reflect the hierarchy of bridge quality among our matrix verbs
that one might have expected, the polarity questions behave very differently, and this requires
some explanation. On four verbs, behaupten, glauben, hoffen and fürchten, the scores are fairly
evenly better than any of the extraction conditions. This is no surprise, since any biclausal
extraction may be expected to have some cost in naturalness. On sagen and erklären, both
polarity questions dip quite clearly, and on erzählen and bezweifeln the V2 question dips more
strongly than the V-final question. This pattern of data bears no resemblance to the extraction
conditions and is clearly not caused by just the bridge characteristic. Now the cause of the
variation in the polarity question conditions cannot be established beyond doubt, but it seems
12
likely that two factors are interacting. Firstly the V2 polarity questions are influenced by the
ability of the matrix verb to license a V2 complement. This probably explains why the V2
questions are sometimes worse than the V-final questions. But the major part is probably due to
pragmatic factors. Recall that the matrix verbs all have second person subjects; the polarity
questions are therefore questions to a speaker whether he or she is carrying out the speech act of
the matrix verb. It is pragmatically fairly natural to ask a person if they think, claim, hope, doubt
or fear something, since this is not directly observable, but rather less so to ask if they say,
explain, or tell something. We suspect this factor is playing a role in producing these naturalness
judgements. While there are complement choice effects in this result, they are badly obscured by
what are most likely pragmatic effects. It is thus difficult to make any strong statement on the
relationship of the bridge feature and the V2 feature on the basis of this data. In order to clarify
this issue, we carried out a second study in which we sought to exclude the pragmatic oddness
clouding our first set of results.
4 Experiment 2
4.1 Design
In this follow-up experiment we tested the same verbs but examined only four syntactic
conditions. Since the data from extraction conditions was so clear in the first experiment and
anyway the contrast between subject and object extractions was not crucial to our argument, we
included only object extractions from V2 (9a) and V-final clauses (9b), and compared them with
3rd person singular declaratives with V2 (9c) and V-final complement clauses (9d).7
(9)
a.
b.
c.
d.
Wen sagt er, hat der Lehrer ausgeschimpft?
who says he has the teacher scolded
’Who does he say the teacher has scolded?’
Wen sagt er, dass der Lehrer ausgeschimpft hat?
who says he that the teacher scolded
has
Er sagt, der Lehrer hat den Schüler ausgeschimpft.
he says the teacher has the pupil scolded
'
He says the teacher has scolded the pupil.'
Er sagt, dass der Lehrer den Schüler ausgeschimpft hat.
he says that the teacher the pupil scolded
has
There were thus three changes from the materials in experiment 1. First, the non-extraction
conditions were declaratives instead of interrogatives. Second, the matrix subject was third
person, not second person. These two changes were made in order to avoid the pragmatic
oddness which clouded the previous results. One more change was made: the reference sentence
relative to which subjects initially make their judgements was different. In all other ways the
methodology and procedure were exactly as before.
Using a subset of the previous materials ten versions of the experimental materials were
prepared. Subjects saw 32 conditions and eight filler items. Thirty subjects were recruited by
flier in Tübingen university with the offer of a financial incentive. Two subjects'data was
excluded from further analysis because of doubts about its quality.8 Participants were asked to
7
Let us note that we did in fact include some subject extractions from V-final clauses in the filler items to this experiment, so
that the mean response of the subjects would change as little as possible.
8
There were two causes for concern in the first excluded subject’s data. First, the subject used a reference value of 100, but gave
answers exact to one place of decimals, eg 73.8, 94.6. This is considerably beyond the fineness of distinction that most people
feel they can detect. Second, one score was several standard deviations away from all the others, but did not resemble a simple
typing error because it too was given to one decimal place. Although the general pattern of scores given was consistent with that
13
supply their names, ages (mean age 25.0, range 20-36), sex (9 females, 19 males), occupations
(all except two students or graduates) and dialect backgrounds (14 from south Germany, BadenWürttemberg and Bavaria, 3 from the centre, 5 from the north, and 6 claiming to speak only
standard German).
4.2 Results
The results of experiment 2 are presented in figure 3. In the extraction conditions which
replicate experiment 1 we may note some minor variation: in particular erzählen seems
somewhat better than before relative to hoffen and fürchten. This difference is within the normal
tolerances for variation, as indicated by the confidence interval error bars, but it is interesting
that this second measurement of the hierarchy of bridgeness brings it more in line with the verb
frequency data we noted above. More worryingly, the verb sagen performs worse than any other
verb in the V-final extraction condition (Wen sagt er, dass der SUBJ geVERBt hat? ‘Who does
he say that the SUBJ has VERBed?’ ). This surprisingly low score seems too deviant to dismiss
as mere random variation; it is also consistently visible across subjects, and so it must be the
result of some experimental bias in experiment 2 absent from experiment 1. This limits the
range of possibilities, but the most likely explanation is that it is due to the similarity of this
sentence type to the reference item of this experiment (Wer sagst du, hat den Politiker
bestochen? ‘Who do you say has bribed the politician?’ ), with which subjects were instructed to
compare each experimental sentence. In fact the reference sentence was the subject V2
extraction form of one of the experimental items. It seems probable that the direct comparability
of the V-final extraction condition with sagen and the reference item caused subjects to judge
this condition more harshly. This account is supported by the relatively good scores subjects
gave to the V2 extraction condition with sagen, which may show a similar effect, but resulting in
a relative upgrading of this experimental condition. At any rate we may be certain that this is
some irrelevant effect, since it is plainly false that sagen is the worst bridge verb of our set, a fact
confirmed by experiment 1 and the other conditions with sagen here in experiment 2.
of other subjects, this subject’s data was therefore disregarded. The second excluded subject assigned all but two conditions the
same score as the reference sentence (and the two exceptions received the same). This suggests that the subject had poorly
understood the instructions, but such data is also problematic for the analysis in terms of multiples of standard deviations (zscores) because it causes the little differentiation the subject does make to be assigned excessive relative weight. These
exclusions do not change the results in any material way.
14
Figure 3: Experiment 2 - Object extractions and declaratives in both V2 and V-final clause types,
presented by matrix verb (sagen ‘say’ , glauben ‘think’ , behaupten ‘claim’ , hoffen ‘hope’ ,
fürchten ‘fear’ , erzählen tell (a story), erklären ‘explain, declare’ , bezweifeln ‘doubt’ ).
The results of this second experiment are otherwise very clear. The V-final declaratives show no
effect of the bridge quality of the matrix verb, and are scored consistently high across the top of
the graph. The lack of variation between verbs here supports our supposition that pragmatic
factors were affecting the scores of the Yes/No question conditions in experiment 1. Here, by
using third person subjects and declaratives instead of interrogatives we have been able to
remove the violent swings of acceptability of experiment 1, which would tend to confirm our
attribution of this variation to plausibility. The very even scores for V-final declaratives also
provide a very useful baseline for comparison, which allows us, for example, to exclude the
possibility that some of the variation in the other conditions is due to verb preference for a
clausal complement. Since this condition shows all of these verbs to be very natural with a
clausal complement, every variation from the declarative V-final score can be reliably attributed
to the extraction and V2 complement qualities that we are interested in.
The extractions from V2 start fairly high but decline quite rapidly along the continuum of
bridgeness. The V-final extractions, discounting sagen, start much lower and decline more
gently. Apart from the quirky behaviour of sagen, both extraction conditions confirm the results
from experiment 1, in that the extractions from V2 and the extractions from V-final correspond
closely, differing only in the perceived amplitude of variation. Even the slight tendency for
extraction over erzählen to be rather better than in experiment 1 is clearly visible in both
extraction types. We therefore see some confirmation of our hypothesis that the bridge quality is
affecting these two structures equally, and that the essence of the bridge quality is that of
permitting extraction from a complement clause, whether this is a V2 or a V-final clause. This is
supported by the statistical analysis: in a repeated measures anova, there are significant main
effects for Verb (ie sagen, glauben etc) and Structure (ie wh-extraction or declarative), and
significant interactions of Verb and Clause, Structure and Clause, and the three-way interaction
Verb, Structure and Clause (all p<0.005).9
9
Verb (F1=16.12, p<0.001; F2=15.9, p<0.001), Structure (F1=539.31, p<0.001; F2=313.55, p<0.001), Verb x Clause (F1=12.04,
p<0.001; F2=7.98, p<0.001), Structure x Clause (F1=113.04, p<0.001; F2=186.54, p<0.001), Verb x Structure x Clause (F1=4.77,
p<0.001; F2=3.84, p=0.004,
15
Let us now consider the correspondence of the V2 declaratives with the V2 extractions.
Again the match is very clear, and the contrast with the V-final structures could hardly be more
striking. Both V2 conditions are judged weaker as they descend the bridge continuum, in
remarkable parallel, while the V-final declaratives are invariably good and the V-final
extractions very poor. The generalization seems to be that the V2 declaratives are consistently a
little better than the V2 extractions, which comes as no surprise. Even the variations across the
verbs match very closely across the two V2 conditions. After sagen, the next pair of verbs
behaupten and glauben are about the same in both conditions, but there is a clear fall off down to
hoffen, which however is only a little better than fürchten again in both conditions. The scores of
erklären are a little better than fürchten in this second experiment, but there is a clear descent to
erzählen, and again more steeply to bezweifeln, in both these two V2 conditions. Even the
variation between this experiment and experiment 1 is consistent across these two conditions: in
each erzählen is judged a little better than before relative to fürchten. Only sagen and bezweifeln
spoil this otherwise exact correspondence. It is possible that the declarative V2 sagen is
suffering from the same experimental effect as the extraction from V-final sagen, since it too
seems to be worse than the equivalent behaupten score, which is counter-intuitive. The only
clear breach of the pattern is on bezweifeln, where the V2 declarative is actually worse than the
V2 extraction, but even here their relative positions are well within the normal range of variation.
This close correspondence is reflected too in the statistical analysis: the Pearson’s correlation
coefficient of the V2 declaratives and V2 extractions is 0.209, which is significant (p=0.002).
The only other significant correlation between the structural conditions in this data was a
negative correlation between the V2 declaratives and the V-final declaratives of -0.190
(p=0.004). This testifies to the sensitivity of V2 declaratives to the bridge factor, contrasted with
the insensitivity of the V-final declaratives to it. All in all therefore, this data provides strong
support to the thesis that the lexical feature permitting a V2 complement and the feature
permitting extraction are the same. We see no reason to distinguish between them in this data.
5 Weighing up the evidence
In figure 4 we combine the extraction and declarative data from the two experiments, which is
feasible since both sets are normalized. This allows us to take a look at the broader picture and
reduces the degree of random variation in those conditions which were tested in both
experiments, since in these conditions each plotted data point is the mean of fifty-one subjects’
scores. The overall picture is clear: we see three groups of structures. The declarative V-finals
cross the top of the graph, showing no effect of the verb. The extractions from V-final cross the
bottom of the graph, the subject extractions somewhat worse than the object extractions. There
is a clear but relatively weak drop-off in the scores from left to right, which is slightly obscured
by the rogue sagen score in the object extractions, which is artificially low due to some irrelevant
effect, we have argued.
The middle group interests us the most. They form a clear group, bunching fairly tightly.
The object extractions from V2 form the straightest line, which is to be expected since this
condition was tested in both experiments, and so it is formed of more data points. The other two
structures, subject extractions from V2 and declarative V2 show a little more variation, but the
general trends of all three conditions correspond very closely. This is particulary clear in
contrast to the other three conditions, which are not part of this group. It is also very clear when
one looks back at figure 2, where the Yes/No questions plainly do not co-vary with the whquestions.
16
Figure 4: The declarative conditions and the combined extraction conditions of both
experiments.
This strong correspondence supports the analysis according to which the V2 feature is strongly
related to with the bridge feature, as has occasionally been argued (and denied) in the literature
(eg Grewendorf 1988, Müller & Sternefeld 1995). Such a conclusion is naturally tentative: our
sample of verbs and structures is necessarily limited, but this data provides no reason not to
associate these two lexical characteristics, on the contrary, the clear covariance of bridge feature
and V2 feature provides strong evidence in favour.
But this support does not only consist of this covariance; further inspection of these data
sets reveals other evidence, which concerns the conclusiveness of apparent counter-examples. In
experiment 1 we saw that judgements of relevant sentences can easily be affected by plausibility
factors, which means that we must be very careful in selecting data if we wish to identify
example structures which disconfirm the association of the V2 and bridge characteristics. Such
apparent counter-examples may simply be revealing a difference in plausibility between
structures in the bridge environment and structures in the V2 environment. To identify a
counter-example it is necessary to exclude plausibility systematically, as can be illustrated in our
experimental data. In experiment 1, the V2 and V-final interrogative conditions largely co-vary,
which must feed the suspicion that plausibility is the cause of their variation. In experiment 2 on
the other hand, the V2 and V-final declaratives behave independently, suggesting that
plausibility is playing no role.
Now the necessity of controlling for plausibility belongs to the ABC of data collection and
must surely be no surprise to anyone. But the pattern of well-formedness judgements we observe
provide another, stronger reason to question whether apparent counter-examples actually
demonstrate a dissociation of V2 and bridge features. It is namely no counter-evidence to
provide an example of a verb which permits a V2 complement but does not seem to permit
extraction from a V-final complement; for our data makes it clear that V2 structures are in
general better than extractions from V-final, in fact V2 declaratives seem to be a little better than
V2 extractions. The claim that the V2 feature is the same as the bridge feature does not therefore
predict that any verb which permits the one will also permit the other with the same degree of
acceptability, rather it asserts that verbs will occupy the same position in the hierarchy of V2
licensing that they occupy in the bridge hierarchy. Verbs will thus either be good or bad
17
exemplars of both characteristics, relative to other verbs, but the two characteristics need not
give rise to identical judgements. In the light of this, it will be clear that examples advanced as
counter-evidence on the basis that the V2 structure seems feasible but the bridge structure does
not (eg Reis 1994), do not in fact defeat the proposition that the two characteristics are aspects of
the same feature.
Let us note here an additional argument which Reis has raised against the association of
V2 and bridge features. She suggests that data from certain preference predicates displays a
very different pattern to that found in the verbs we have looked at, so much so that it undermines
the case for the V2 and bridge features being mere aspects of the same characteristic. Let us
examine some of these cases in the four relevant structures: V2 complement (10), V-final
complement (11), extraction from V2 (12), extraction from V-final (13) (analyses and
judgements from Reis 1994; Reis has more examples, but these are representative). While both
simple declarative structure types are fine or at least feasible (10, 11), there is a mismatch
between the extraction conditions. Extraction from a V2 complement (12) is quite impossible,
but extraction from a V-final complement clause (13), although poor, is not completely excluded.
These extractions are topicalizations rather than wh-elements, but this should not alter the case in
any relevant way. Why should these structures behave so very differently if they are both long
extractions?
(10)
(11)
(12)
(13)
a.
Mir wäre
lieber,
Hans würde die Papiere vernichten.
to.me would.be preferable Hans would the papers destroy
'
It would be preferable to me if Hans destroyed the papers.'
b.
Vernünftig wäre,
du gingest morgen zu einem Spezialisten.
sensible would.be you went tomorrow to a
specialist
'
It would be sensible for you to go to a specialist tomorrow.'
c.
Ich würde vorziehen, du sagtest mir endlich über diesen Vorfall die Wahrheit.
I would prefer
you told me at.last about this
incident the truth
'
I would prefer it if you at last told me the truth about this incident.'
a.
Mir wäre
lieber,
dass Hans die Papiere vernichten würde.
to.me would.be preferable that Hans the papers destroy
would
b.
Vernünftig wäre,
dass du morgen zu einem Spezialisten gingest.
sensible would.be that you tomorrow to a
specialist
went
c.
Ich würde vorziehen,dass du mir endlich über diesen Vorfall die Wahrheit sagtest.
I would prefer
that you me at.last about this
incident the truth
told
a.
*Die Papiere wäre
mir lieber,
würde Hans vernichten.
the papers would.be to.me preferable would Hans destroy
b.
*Zu einem Spezialisten wäre
vernünftig, gingest du morgen.
to a
specialist would.be sensible went you tomorrow
c.
*Über diesen Vorfall würde ich vorziehen, sagtest du mir endlich die Wahrheit.
about this incident would I prefer
told you me at.last the truth
a. ?? Die Papiere wäre
mir lieber,
dass Hans vernichten würde.
the papers would.be to.me preferable that Hans destroy
would
b. ?? Zu einem Spezialisten wäre
vernünftig, dass du morgen gingest.
to a
specialist would.be sensible that you tomorrow went
c. ?? Über diesen Vorfall würde ich vorziehen, dass du mir endlich die Wahrheit sagtest.
about this incident would I prefer
that you me at.last the truth
told
Now this data would indeed be a problem, if it were not that there is another analysis of these
structures. It is surely no coincidence that all the matrix predicates Reis cites can take an quasiexpletive correlative pronoun (Mir wäre es lieber, Es wäre vernünftig, Ich würde es vorziehen).
18
Now this pronoun can in some cases be non-overt, but it is nevertheless part of the argument
structure of the predicate, and can usually at least optionally be overt (for discussion see Reis
2000, Sandberg 1998). We would therefore argue that the embedded clause is the correlate of
this pronoun, essentially standing in apposition to it; the clause is thus not the direct complement
of the predicate. It follows that the relationship of this type of clause with its matrix verb is
different to the relationship of a complement clause to its matrix verb, even if the two structures
are not immediately distinguishable on the surface, which will be the case if the quasi-expletive
es is non-overt and the complement clause is extraposed. We see this superficial similarity
illustrated in (14) and (15).
(14) Wenn du fremdgehen würdest, würde ich (es) vorziehen, dass du mir die Wahrheit sagst.
if
you had.an.affair would would I it prefer
that you me the truth
say
(15) Wenn du fremdgehen würdest, würde ich (*es) hoffen, dass du mir die Wahrheit sagst.
if
you have.an.affair would would I
it hope that you me the truth
say
In these examples we add a preceding context so as to permit both matrix verbs to appear
felicitously as conditionals. In (14) the complement of the verb is the correlative es, which may
optionally be non-overt, and which stands in a relationship similar to apposition to the embedded
clause. The superficially similar structure in (15) can take no correlative es, as the embedded
clause is the immediate complement of the verb, even though it has been extraposed. We argue
therefore that the embedded clause is a complement in neither (10) nor (11), since these
structures are similar to (14). If, as we suggest, (12) and (13) are not extractions from
complement clauses, then they lose much of their force as counter-examples. Nevertheless, we
must admit that this alternative analysis of these structures does not entirely defeat the force of
Reis'
s argument, for it is nevertheless true that the extraction examples in (12) are much worse
than those in (13), which our analysis does not predict (quite the opposite indeed). Even if
neither (10) nor (11) have a complement clause but merely an embedded correlative clause, why
should extraction in (12) be so much worse than in (13)?
One tempting possibility is that it is the role of the proposition as correlate of the subject
which impedes extraction, since many correlative pronouns occupy the subject position, and
subjects are known islands for extraction. However the (10c-13c) examples do not seem to
confirm this: the structure type has a correlative pronoun, but it unambiguously occupies the
object position, and the same constraint on extraction in (12c) seems to apply. On the other hand
the object correlate (13c) does seem to be better than the subject correlates (13ab). Subject
extraction restrictions cannot remove the problem however, since the V-final extractions in (13)
are poor, but not impossible, and for our hypothesis that the bridge and V2 features are identical
to go through, we must explain why therefore the extractions from V2 embedded clauses are so
much worse.
An approach to such questions which frequently yields results is to examine the data more
closely. A complicating factor in these examples seems to be the adjunct status of many of the
topics, and the lack of overt case marking in those which are arguments. Adjuncts and
arguments which are not overtly case-marked are very easily parsed as related to the matrix
clause, which can cause distortion of judgements. While the judger rationally knows that the
topic should be associated with the embedded clause, not the matrix clause, it seems that we are
not fully able to prevent the goodness of the construction on the matrix clause association
reading affecting our judgements of the other reading. To exclude the possibility that we are
unconsciously associating the topic with the matrix clause we must consider examples with
overtly case-marked argument extraction, as in (16), where we have additionally standardized the
lexis.
19
(16)
a.
b.
c.
*Den Brief wäre
mir lieber,
dass Hans vernichten würde.
the letter would.be to.me preferable that Hans destroy would
*Den Brief wäre
vernünftig, dass Hans vernichten würde.
the letter would.be sensible that Hans destroy
would
*Den Brief würde ich vorziehen, dass Hans vernichten würde.
the letter would I prefer
that Hans destroy would
Once we have standardized the data and excluded the possibility of a matrix association of the
topicalized element, two things become apparent. First, the apparent difference between the
subject correlates (16ab) and the object correlate (16c) disappears: these extractions are all
judged about equally bad, which would counter the suggestion that the grammatical function of
the correlative es plays a role. Second, these examples are unambiguously ungrammatical, in the
same way that their standardized and disambiguated equivalents in (17) are.
(17)
a.
b.
c.
*Den Brief wäre
mir lieber,
würde Hans vernichten.
the letter would.be to.me preferable would Hans destroy
*Den Brief wäre
vernünftig, würde Hans vernichten.
the letter would.be sensible would Hans destroy
*Den Brief würde ich vorziehen, würde Hans vernichten.
the letter would I prefer
would Hans destroy
The intermediate judgements reported by Reis in (13) seem therefore to be largely a product of
the ambiguity of attachment of the specific examples with adjunct or non-overtly case-marked
topics. Let us admit, however, that while the examples in (16) are worse than those in (13) and
must be regarded as no part of the language, they still seem not quite as bad as those in (17). We
have no immediate explanation for this, but the explanation might be related to the certainty of
analysis which the inclusion of a complementizer provides. A complementizer, by definition,
introduces a clausal complement, which at least lets the parser know what structure it is dealing
with even if this is not normally licit. In the absence of a complementizer, many different
structures are possible, which leaves the parser in a more confused state. This is of course
somewhat speculative and it is clear that further work will be necessary to clarify which factors
are having what effects here (cf Reis 1996, Kiziak 2004).
We conclude that these predicates which take correlative pronouns can not take clausal
complements themselves, but do permit V2 or V-final embedded clauses to be correlated with
their pronominal objects. They do not allow extraction from any clause associated with their
correlative pronominal objects, which can be readily understood as a sub-case of the more
general prohibition of extraction from a clause under an NP complement. Since V2 clauses and
V-final clauses behave in a parallel manner in these structures, this state of affairs is quite
compatible with the identity of bridge and V2 features.
6 Concluding remarks
Our results throw up a number of interesting points. Firstly, our data underlines that the bridge
feature is a continuum and not a categorial distinction: there is no absolute group of bridge verbs,
only better and worse ones. Now few linguists would doubt this point (eg Reis 1994, Erteshik
1973) but equally few seem to follow its implications through consistently. Instead they
continue to apply an essentially binary model of grammaticality to the data, though admittedly
generally allowing intermediate positions. In the case of bridge and V2 verbs, this abstraction is
plainly obscuring the picture. The hypothesis of the identity of V2 and bridge features does not
make the prediction that if one of these structures is possible, the other one will be too, rather it
20
predicts that any verb will occupy the same position in the hierarchies of V2 licensing and bridge
licensing, since these are essentially the same.
This insight requires us to accept a gradient notion of grammaticality, at least for the
structures in question here; since any attempt to divide the structures we tested into
‘grammatical’ and ‘ungrammatical’ groups would plainly be arbitrary. The best we might
manage would be to distinguish a group of structures which are good enough for them to be
used, and another group, which are so bad that they never occur in practice. Note that this does
not in itself prejudice a structural account of syntactic phenomena, since the controlled elicitation
of introspective judgements of well-formedness from multiple subjects always produces such a
pattern. Given the freedom to do so, informants expressing judgements make far finer
distinctions than standard assumptions about well-formedness would suggest, and they show
little sign of finding absolute endpoints which might reflect categorical (un)grammaticality. If
syntactic theory is to make full use of this rich data source it must come to terms with the
inadequacy of the binary grammatical/ungrammatical abstraction of well-formedness which is
generally assumed (see Featherston 2004 for more discussion of this).
The next conclusion we might draw from this data set would be that extraction from V2
complement clauses is a better criterial structure for the judgement of the bridge qualities of a
particular verb than extraction from a V-final clause. The reason for this is simple: this structure
shows the same hierarchy but reveals greater amplitude of difference. This too requires that we
suspend the assumption of binarity and think about grammatical acceptability as a continuum.
The probable reason why extraction from V-final has been generally thought of the criterial
structure is that it more readily produces outputs which are so bad that they are thought of as
excluded from the language, while extractions just become ‘less good’ , a finding which has little
meaning outside of a model of gradient grammaticality.
Third, our results provide some suggestive evidence on the factors which predict bridge
quality. In our set of eight verbs matched into related pairs, the pairs show some degree of
common behaviour. Sagen and behaupten are both good, while the attitudinal hoffen and
fürchten are both in the mid-range. Fürchten and bezweifeln, which both have a negative aspect,
are both poorer than their positive pairs. Erzählen and erklären are both poor. There is also
evidence of other factors: of the four polysyllabic stemmed verbs, three are in the lower half of
the bridge and V2 scale. Intuitively it does seem to hold that the better scoring verbs are more
semantically light than the weaker scoring ones; they are also more frequent. This data would
therefore tend to support the suggestion that it is some combination of the factors of semantic
lightness, shortness, and frequency which determines bridge and V2 quality. The causal
relations between these factors, however, our data cannot illuminate.
Last, we see no reason to distinguish between the grammars of north and south German
speakers in this data, contrary to the general assumption that only southern German dialects
permit extraction from V-final clauses (eg Grewendorf 1988; ch7). Figure 5 below shows the
mean scores across verbs of the four extraction conditions and the two declaratives, by the
regional background of the subjects. We define the ‘South’ as Baden-Württemberg and Bavaria
(n=24), the ‘North’ as Hamburg, Bremen, Berlin, Schleswig-Holstein, Niedersachsen,
Brandenburg and Mecklenburg-Vorpommern (n=7), the ‘Centre’ as all parts in between (n=11).
Note that the ‘No dialect’ class (n=10) includes both those explicitly claiming no dialect
influence as well as those giving an inappropriate response (almost always ‘German’ ). There is
no visible systematic difference between the subjects from the different linguistic regions.
Northern speakers make exactly the same distinctions that southern speakers do, and they permit
extractions from V-final clauses to the same extent. This was reflected too in the statistical
analysis: the factor Region was wholly insignificant (F=0.285 p=0.836).
21
Figure 5: The extraction and declarative conditions (of both experiments combined)
distinguishing the dialect background of subjects.
This finding leaves a question mark over the much discussed differences between northern and
southern speakers of German in their judgements of these long extraction structures. Why do we
fail to detect any sign of this well-known and recognized phenomenon?
One approach to this puzzle would be to locate the contrast in the data: the conventional
grammaticality judgement technique asks for a binary judgement whether a sentence is
grammatical or ungrammatical, not how good it is relative to other structures. Since our data
makes it clear that all speakers make the same relative judgements, it seems likely that the
attested dialectal differences are due to them drawing the rather arbitrary line of absolute
ungrammaticality in different places: southern speakers draw it rather lower on the scale and thus
include more of the marginal structures within the ‘grammatical’ group, northern speakers draw
it rather higher and tend to exclude them. But the grammars of both groups appear to be
identical: what perhaps varies is the judgement of what counts as full ungrammaticality, and this
seems likely to at least partly driven by frequency of occurrence. While the relative judgements
of naturalness that we have employed in our studies largely reflect the computational complexity
of a structure, the intuition of absolute grammaticality or ungrammaticality includes this
information but is one step further removed. We hypothesize that the factor which drives the
intuition of absolute grammaticality is our judgement whether a structure is not only good but
also (potentially) frequent; the intuition of absolute ungrammaticality depends upon the feeling
that a structure is so bad that it would never be (deliberately) used (see Featherston 2004 for
discussion). On this account therefore, the difference in north and south German in extraction is
independent of their grammars - which are identical as our data confirms - and merely reflects
that northerners simply do not use and therefore do not hear this construction so often. Given a
conventional grammaticality judgement task, they therefore tend to group these structures with
the ‘ungrammaticals’ , rather than with the ‘grammaticals’ as southerners do. A training study by
Fanselow et al (1999) who exposed north Germans to such structures and found increased use of
them thereafter would offer some tentative support to the exposure-based suggestion.
7 References
22
Bard, Ellen, Dan Robertson & Antonella Sorace (1996). Magnitude estimation of linguistic
acceptability. Language 72: 32-68.
Bech, Gunnar (1955). Studien über das deutsche Verbum infinitum, Band 1. Kopenhagen. (New
identical edition Tübingen: Niemeyer, 1983).
Cowart, Wayne (1997). Experimental Syntax: Applying Objective Methods to Sentence
Judgements. Thousand Oaks, California: Sage.
Cattell, Ray (1978). On the source of interrogative adverbs. Language 54: 61-77.
Clark, Herbert (1973). The language-as-fixed-effect fallacy: A critique of language statistics in
psychological research. Journal of Verbal Learning & Verbal Behavior 12: 335-359.
Erteschik-Shir, Nomi (1973). On the nature of island constraints. PhD dissertation, MIT.
Erteschik-Shir, Nomi & Shalom Lappin (1979). Dominance and the functional explanation of
island phenomena. Theoretical Linguistics 6: 41-86.
Fanselow, Gisbert, Reinhold Kliegl & Matthias Schlesewsky (1999). ‘Long’ movement in
Northern German: A training study. ms, University of Potsdam.
Featherston, Samuel (2004). The Decathlon Model: Design features for an empirical syntax. To
appear in Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives,
Marga Reis & Stephan Kepser (eds.). Tübingen: Narr (available at
http://www.sfb441.uni-tuebingen.de/~sam/papers/LingEvid.pdf).
Featherston, Samuel (to appear). Universals and grammaticality: Wh-constraints in German and
English. Linguistics 43.
Grewendorf, Günther (1988). Aspect der Deutschen Syntax. Eine Rektions-Bindungs-Analyse.
Tübingen: Narr.
Haider, Hubert (1984). Topic, Focus and verb second. Groninger Arbeiten zur Germanistischen
Linguistik 22: 47-100.
Keller, Frank (2000). Gradience in grammar: Experimental and computational aspects of degrees
of grammaticality. PhD Thesis, University of Edinburgh.
Keller, Frank, Martin Corley, Steffan Corley, Lars Konieczny & Amalia Todirascu (1998).
WebExp: A Java Toolbox for Web-Based Psychological Experiments. (Technical Report
HCRC/TR-99.) Human Communication Research Centre, University of Edinburgh.
Kiparsky, Paul & Carl Kiparsky (1970). Fact. In Progress in Linguistics, Manfred Bierwisch &
Karl Heidolph (eds.), 143-73. The Hague: Mouton.
Kiziak, Tanja (2004). Einschub oder Bewegung? Empirische Evidenz zur Parenthese-Hypothese.
MA thesis, Eberhard-Karls-Universität Tübingen .
Kluender, Robert (1992). Deriving island constraints from principles of predication. In Island
Constraints. Theory, Acquisition and Processing, Helen Goodluck & Michael
Rochemont (eds.), 223-258. Dordrecht: Kluwer.
Meyer, Roland (1999). Studien zu Inselbeschränkungen, vor allem im Russischen:
Deklarativsätze und Nominalphrasen. MA thesis, Eberhard-Karls-Universität Tübingen.
Müller, Gereon & Wolfgang Sternefeld (1995). Extraction, lexical variation, and the theory of
barriers. In Lexical Knowledge in the Organization of the Language. (Current Issues in
Linguistic Theory, 114.), Urs Egli, Peter Pause, Christoph Schwarze, Arnim von Stechow
& Götz Wienold (eds.), 35-80. Amsterdam: Benjamins.
Reis, Marga (1994). Brückeneigenschaften von Matrixsätzen bei langer Extraktion im
Deutschen. Manuscript of talk at the Deutsche Gesellschaft für Sprachwissenschaft
Conference, Münster, 9-11.3.1994.
Reis, Marga (1996). Extractions from verb-second clauses in German? In On Extraction and
Extraposition in German. (Linguistik Aktuell/Current Linguistics, 11.), Ulli Lutz &
Jürgen Pafel (eds.), 45-88. Amsterdam: Benjamins.
Reis, Marga (2000). Was tun mit konfligierenden Daten? Handout of talk at Potsdam-Tübingen
Arbeitstreffen zur Datenproblematik, Tübingen 27-28.10.2000.
23
Sandberg, Bengt (1998). Zum es bei Transitiven Verben vor Satzförmigem Akkusativobjekt.
Tübingen: Narr.
Stevens, Stanley (1975). Psychophysics: Introduction to its Perceptual, Neural and Social
Prospects. New York: Wiley.