1 Bridge verbs and V2 verbs the same thing in spades? Sam Featherston, Universität Tübingen Sam Featherston Universität Tübingen SFB 441 Linguistische Datenstrukturen Nauklerstrasse 35 D-72074 Tübingen e-mail: [email protected] Tel.: +49 7071 2977152 Fax: +49 7071 295830 2 Bridge verbs and V2 verbs - the same thing in spades? Sam Featherston, Universität Tübingen In this paper we investigate the quality of ‘bridgeness’, the extent to which a verb permits extraction from its complement clause. In spite of considerable study, linguists have only a limited idea of the nature of the factor involved. We address the question whether the quality of being a bridge verb is the same as the quality which permits a verb to take a verb-second complement clause in German. We report the results of two experiments using the technique of magnitude estimation, a method of eliciting grammaticality judgements with a high degree of differentiation. The data reveals that the ‘bridge’ quality and the ‘V2’ quality covary very closely, suggesting that they may well be facets of the same factor. We discuss the implications this finding has for the nature of the bridge quality. 1 Introduction The cross-linguistic tendency of languages to permit constituents to occur phonologically elsewhere in the sound string from where they would normally be interpreted is one of the most remarkable features of the grammar. This dissociation of phonological and syntactic form has been analysed as syntactic movement within the generative tradition of syntax. However, in spite of four decades of intense study and undoubted progress, the nature of this displacement is still controversial and many outstanding questions remain, for example about the motor of movement processes and the obligatoriness of its application. The phenomenon of bridge verbs provides a case in point: while possible extractions are standardly captured in lexis-independent syntactic configurations, in practice only a subset of verbs seem to allow extraction of constituents from their complement clauses, demonstrating that movement restrictions are not merely syntactic but must also be related to lexical factors. We illustrate this in (1), where the verb meinen ‘think’ allows the extraction, hoffen ‘hope’ seems less natural, and bezweifeln ‘doubt’ feels inappropriate in the structure. (1) Wen meint /?hofft /*bezweifelt Hans, hat der Lehrer ausgeschimpft? who thinks /hopes /doubts Hans has the teacher scolded ' Who does Hans think/hope/doubt the teacher has scolded?' In the light of this, the nature of these ‘bridge’ verbs has received considerable interest, linguists generally adopting the tactic of attempting to identify first what characteristics, semantic or syntactic, make a bridge verb, with the hope that this information will enable further conclusions This work was supported by the Deutsche Forschungsgemeinschaft within SFB441 Linguistische Datenstrukturen. Many thanks to Wolfgang Sternefeld, project leader, Tanja Kiziak, and many other members of the SFB. All remaining errors are my own. 3 to be drawn. A question which has been raised in this regard with reference to German data is whether the verbs permitting extraction are exactly the same group as those allowing a Verb Second (V2) complementizerless complement clause as in (2). This too is a feature which appears to adhere to certain predicates and not to others. Note that we shall refer in the text to verbs which permit a complementizerless V2 complement as V2 verbs, for brevity. (2) Max hofft, er hat die Geschworenen überzeugt. Max hopes he has the jury convinced ' Max hopes he has convinced the jury.' In this study we employed the magnitude estimation methodology (Bard, Robertson & Sorace 1996, Cowart 1997, Keller et al 1998) to carefully controlled sets of materials. On the firm basis of this empirically obtained data, we are able to evaluate hypotheses relevant to this structure which have been advanced in the literature. We interpret the results as showing that the bridge verb feature and the V2 verb feature are at least closely related, but that the bridge quality is a continuum which interacts cumulatively with a range of other factors. It turns out that the standard structure assumed to be the test of the bridge quality, extraction from a verb-final complement, is in fact less useful as a measure of the ‘bridge’ quality, than is extraction from a V2 complement clause, since the latter shows greater amplitude, and thus clarity, of variation. We argue that judgements which have been advanced to support other conceptions of the bridge feature have been misleading, since they have used a binary grammaticality model, which is inappropriate for this data.1 2 The nature of bridge verbs The most common theme in discussion about bridge verbs is the question what decides whether a verb will join the class or not (for some early work see Erteschik-Shir 1973). A fundamental insight is that the meaning components of the verb play a role. The various authors treat these rather differently, but these differences are for our purposes fairly unimportant. For example, it seems to be generally agreed that verbs with a more factive interpretation permit extraction more readily (Kiparsky & Kiparsky 1970), so that predicates of stating and thinking are usually better than verbs of asking and doubting. A number of linguists have attempted to identify the relevant semantic features more precisely. For example, Cattell (1978) puts forward a relevant division between verbs which presuppose that the proposition within their complement clause is part of common belief, and those which do not. He then distinguishes within this second group between volunteered stance predicate types (eg claim, assert, say, allege) and response stance predicate types (eg deny, admit, confirm, agree). Cattell argues that it is the volunteered stance verbs which permit bridge effects. In his examples this is measured by the reference of an initial why to an embedded clause, as in Why do they claim/deny that she killed him? He judges those predicates to have the bridge feature which readily permit the reading where the why is understood as applying to the complement clause, ie why did she kill him? not why do they claim/deny it? In their discussion of island effects, Erteschik-Shir & Lappin (1979) analyse the relevant factor as dominance, which is essentially a sort of discourse focus: ‘A constituent is dominant 1 Much heat and little light is generated by discussions of precisely what is meant by grammaticality. We use the term here to refer to the construct measured in grammaticality judgements, after performance factors have been discounted to the extent that they can be recognized as such. We intend no theoretical claim about the boundaries of syntactic competence, but if we find effects related to structure and independent of lexis, for which there is no readily available processing, production, plausibility or other performance explanation, then we shall refer to this as related to grammaticality, using the term in the sense of syntactic well-formedness. For a discussion of the nature of syntactic well-formedness see Featherston (2004). 4 [...] when a speaker intends to draw attention to its semantic content.’ (45). They admit that this term is not clearly distinguishable from topic in the topic - comment pair. On this analysis, bridge verbs are those which contribute least ‘semantic content’ and ‘weight’ (Erteschik-Shir & Lappin 1979: 62-3). The use of a verb with more semantic weight in a matrix clause must be assumed to result from the speakers'intention to draw attention to its content. Kluender (1992) argues in a rather parallel manner, but discusses how the referential specificity of the complementizer, the subject, the extracted element, and the potential bridge verb may cumulatively be the causal factor of island effects. In the case of matrix verbs ‘referential specificity’ simply means more detailed and thus more specific semantic content. He argues that it is the additional processing load of a more semantically complex verb which makes extractions involving them seem more ‘difficult’ , and thus of dubious acceptability. All of these accounts can be seen as differing only in perspective, and identifying essentially the same factor, that of semantic weight. However, not all approaches stress the effect of meaning components: for example, Müller & Sternefeld (1995) assume that a complement CP has an NP shell, and the bridge quality is a reflexion of the ease with which this NP is incorporated into the verb. The most obvious competitor to these would be frequency: if a verb is sufficiently frequent it becomes more transparent to movement. Consideration of some good and bad bridge verbs makes it seem plausible that this frequency plays a role. In order to investigate this possibility we performed some frequency counts of the eight verbs that we tested in our experiments reported in this paper below. In table 1 we give their spoken and written lemma frequencies (CELEX, MPI Nijmegen) and infinitive form frequencies (COSMAS, IDS Mannheim).2 From these we calculate a mean log frequency score, which is intended to abstract away from the vagaries of individual corpus contents. We first convert each of the four frequency figures into logarithms, as is standard for frequency measures, and then sum the four of them, and divide by four. We compare these mean log frequencies with the ‘bridgeness’ score which we derive from the grammaticality judgements we elicited in our experiments reported below. These bridge feature scores are a measure of how good these verbs were judged to be as bridge verbs; technically the figures represent the mean normalized judgements of the extraction conditions in experiments. The origin and significance of these figures will become clear when we lay out our studies below, but readers should accept on trust for the moment that these scores constitute an empirically obtained relative measure of the extent to which speakers judge that these verbs permit extractions from their clausal complements, a higher score indicating greater ‘bridgeness’ . Verb sagen ‘say’ CELEX spoken 3545 CELEX COSMAS COSMAS Mean log Bridgeness score written spoken written frequency 8614 3637 93345 4.0039 -0.002 glauben ‘believe’ 1076 1728 194 28877 3.2544 -0.155 behaupten ‘claim’ 64 702 40 15126 2.6086 -0.134 hoffen ‘hope’ 99 569 64 26255 2.7440 -0.235 fürchten ‘fear’ 35 311 14 11954 2.3151 -0.395 115 731 492 17796 2.9667 -0.446 erzählen ‘tell (story)’ 2 The CELEX statistics come from the CELEX lexical database, release 2, Max Planck Institut für Psycholinguistik, Nijmegen, of 6 million words of German. The COSMAS data is the results of searches in the spoken and written public archives in COSMAS I, Institut für deutsche Sprache, Mannheim, for the infinitive forms of these verbs. At the time of these searches, this corpus contained 541 million accessible word-forms. 5 erklären ‘explain’ 106 1680 88 27946 2.9104 -0.619 bezweifeln ‘doubt’ 11 77 4 2241 1.7201 -0.764 Table 1: Frequency as a determinant of ‘bridge’ quality. The Mean Log Frequency is derived from the four raw frequency counts, the Bridgeness Score is from the experiments we report later in the paper. Inspection of the frequency scores reveals a fair degree of consistency between the four frequency measures. Pearson' s coefficients confirm strong correlations between them (all pairwise coefficients =>0.95). The mean frequency figures and bridgeness figures show a much weaker but nevertheless still significant correlation (coefficient 0.725). Although this data set confirms that there is a degree of correlation, it would appear that frequency is not the only factor affecting transparency to extraction. It is striking that the monosyllabic stems of hoffen and fürchten seem to form better bridge verbs than the bisyllabic stems of erzählen and erklären, in spite of the higher frequency of these latter. However, differences may come to from other sources; frequency scores are only ever a reflexion of frequency in the sample of texts that make up the corpus base, and these are inevitably subject to bias. For example, it seems implausible that erzählen should be eight times as frequent in the spoken language as hoffen (as it is in the COSMAS spoken corpus). The good correlation among the four frequency scores may reflect the fact that their constituent text bases are largely composed of the same ‘public’ language style. The difference between the frequencies and our experimentally obtained judgements of ‘bridgeness’ may represent the distance between this public variety of the language and the ‘informal’ style far more frequent in everyday use that our experimental subjects assume as their stylistically neutral language. To summarize, we may say that this data demonstrates that frequency correlates quite closely with the bridge quality, but it does not and cannot demonstrate that frequency is the determining factor and indeed it would tend to suggest that other factors too are relevant. One possibility is that frequency is not itself a causal factor of bridgeness at all, but merely relates to the parameter of semantic weight, the common denominator of the approaches of Cattell (1978), Erteschik-Shir & Lappin (1979) and Kluender (1992), which we mentioned above. Now it seems plausible that semantically light verbs are better bridge verbs, but this fact is of little use to us in constructing a linguistic account, since semantic weight is not a precise term and is not obviously quantifiable. Additionally, the causal relationship between frequency and semantic weight is unclear: Is a verb frequent because it is semantically light and therefore fits many circumstances, or does frequency of use erode semantic content and phonological form? Are both of these true? In the light of this, it will be clear that only a very woolly and untestable account of the bridge quality can be built on this basis, but in spite of these problems, it does seem probable that semantic weight is relevant. One possibility might be that semantically light verbs permit themselves to be more easily semantically compounded with their embedded verbs, thus effectively reducing the extraction distance to a single proposition. This has a degree of similarity with the incorporation account of Müller & Sternefeld (1995) mentioned above. On this semantic lightness account we would therefore derive the difference in acceptability between (3a-c) from the difference in feasibility of the compound predicates in (4a-c). The more plausible a compound predicate is, the easier it is to extract over it, since this requires the construction of the complex predicate. (3) a. Who does John think Mary likes? b. ?Who does John believe Mary likes? c. ??Who does John doubt Mary likes? 6 (4) a. b. c. THINKS.LIKES(Thomas, Mary, John) THINKS.TRUE.LIKES(Thomas, Mary John) THINKS.NOT.TRUE.LIKES(Thomas, Mary, John) The difficulties of any such a attempted explanation are obvious, and we shall take this approach no further here, noting only that this phenomenon has many similarities with that of ‘coherence’ in clauses (Bech 1955). Both properties are scalar rather than categorial and most obviously determined by lexical properties of the matrix verb, but they are also sensitive to semantic effects such as negation (5a) as well as surface effects such as heaviness (5b). (5) a. b. ?Who doesn' t John think Mary likes? ?Who does John think on the balance of the evidence Mary likes? It is thus apparent that the bridge quality is not something which is going to be pinned down easily, since it seems to straddle the boundaries between syntax, semantics and surface factors and depend on a wide range of variables. An example of the last: good bridge verbs tend crosslinguistically to be short, but whether this is really a causal factor or whether it is simply a correlate with frequency (Zipf' s Law) or semantic simplicity seems hard to tease apart. All of the above accounts of this phenomenon share this difficulty to some degree: they are all hypotheses, and in order to move the debate on we need to produce evidence which would decide between them. It is however not immediately obvious how this might be achieved, especially as it seems that these factors co-vary to a considerable extent. What can however be tested is how the syntactic behaviour of bridge verbs relates to other syntactic behaviour, and this is our aim here. The particular question we shall address relates to German. In this language there are two finite complement clause types, those with and without complementizers. While these correspond in many ways to complement clauses in English with and without complementizers, there is one major difference: while a clause with a complementizer has a clause-final verb (6a), a complementizerless clause has an initial topic position followed by the finite verb just like a matrix clause (6b). (6) a. b. Max hofft, dass er die Geschworenen überzeugt hat. Max hopes that he the jury convinced has ' Max hopes that he has convinced the jury.' Max hofft, er hat die Geschworenen überzeugt. Max hopes he has the jury convinced ' Max hopes he has convinced the jury.' It is clear that there is a significant overlap between bridge verbs and those verbs which permit verb-second complementizerless complement clauses. It is controversial, however, whether these verbs are indeed identical or whether they merely overlap. For example, Grewendorf (1988) suggests that they are indeed the same set of verbs, and refers to Haider (1984). Reis (1994) on the other hand argues against this proposal pointing out a number of differences, and Müller & Sternefeld (1995) come to the same conclusion. A finding that the bridge feature and the V2 feature were the same would be of theoretical significance, since it would provide a real clue to the nature of the bridge feature. It is also controversial what structure is the criterial condition for being a bridge verb in German. While the general view is that the grammaticality of extraction from a V-final complement is the most appropriate test, our own view is that extraction from V2 clauses is no less and in some ways more suitable. In our experiments reported below we attempted to obtain replicable data which would throw light on the relationship between the ability of a verb to take a 7 V2 complement clause, and the extent to which it licenses extraction from verb-final and V2 complement clauses. 3 Experiment 1 3.1 Design Our first aim was to determine what grammaticality patterns the different complement clause types and extractions would produce, and whether these would co-vary. If the bridge verb and V2 complement verb characteristics are linked, we should expect to see them behaving in a parallel fashion across verbs. If not, we should expect to see some contrasts. For each of eight verbs (see below), we tested the same syntactic conditions: there will be hierarchies of acceptability for each. If the pattern of response across verbs gives a parallel signature in both the extraction and V2 complement conditions (ie verbs are either good, bad or marginal in each, but not good in one and poor in the other) then we have strong evidence that the qualities making for a good bridge verb and for a good V2 verb are related. However, in order to make more meaningful statements about the relationship, we must also be able to distinguish between effects of extraction and effects of complement clause type. We therefore tested in six syntactic conditions: subject and object extraction from V2 and V-final clauses, and Yes/No questions with V2 and V-final clauses. (7) a. b. c. d. e. f. Wer glaubst du, hat den Schüler ausgeschimpft? who think you has the pupil scolded ' Who do you think scolded the pupil?' Wer glaubst du, dass den Schüler ausgeschimpft hat? who think you that the pupil scolded has Wen glaubst du, hat der Lehrer ausgeschimpft? whom think you has the teacher scolded ' Whom do you think the teacher scolded?' Wen glaubst du, dass der Lehrer ausgeschimpft hat? who think you that the teacher scolded has Glaubst du, der Lehrer hat den Schüler ausgeschimpft? think you the teacher has the pupil scolded ' Do you think the teacher scolded the pupil?' Glaubst du, dass der Lehrer den Schüler ausgeschimpft hat? think you that the teacher the pupil scolded has The contrast of (7ab) with (7cd) allows us to control for asymmetries in the extraction of subjects and objects; and the contrast (7ac) with (7bd) allows us to identify the effect of clause type. We contrast these with interrogatives (7ef) so that our conditions without extraction remain maximally similar to the extraction conditions. We tested eight verbs, on a continuum of bridgeness and in semantically related pairs, to the extent that this is possible: sagen ‘say’ und behaupten ‘claim’ , erklären ‘explain, declare’ und erzählen ‘tell (a story), relate’ , hoffen ‘hope’ and fürchten ‘fear’ , and glauben ‘think, believe’ and bezweifeln ‘doubt’ . The contrast of sagen and behaupten might highlight effects of frequency and length, while the pairs hoffen/fürchten, and glauben/bezweifeln might reveal the effect of negativity and, in the second case, factivity. The pair erzählen/erklären were included as closely matched ‘manner of saying’ verbs, which seemed to be poor but not impossible bridge verbs. Let us be clear that no claims of strong adequacy are intended for these matchings, and it was the continuum of bridgeness which was our primary concern. 8 3.2 Eliciting judgements The magnitude estimation methodology (Bard, Robertson & Sorace 1996, Cowart 1997, Keller 2000) allows us to obtain maximally differentiated judgements from a group of informants and compare them meaningfully. It derives from a methodology used to grade physical sensations such as heat and brightness and developed from there for use in attitude and opinion measurement (Stevens 1975). It varies from standard elicitation of grammaticality judgements in several ways. The first innovation is that subjects are asked to provide purely relative judgements: at no point is an absolute criterion of grammaticality applied. Also, all judgements are proportional; ie subjects are asked to state how much better or worse sentence A is than sentence B. This process is started by the presentation of a reference item, relative to which all subsequent judgements are made. This reference item remains visible throughout the experiment, and acts as a sort of anchor for the scale of judgements. Subjects themselves fix the value of the reference item, so that they can use a set of numbers that they feel comfortable with. Furthermore, the scale along which judgements are made is open-ended: subjects can always introduce a score better or worse than all those already used. Lastly, the scale has no minimum division: subjects can always produce an additional intermediate rating. The result is that subjects are able to produce judgements which distinguish the all the differences they perceive with minimal interference from the scale. A major strength of this approach is that the results are numerical and form an interval scale, so that standard statistical tests can be used to analyze the results. Since the design of this experiment is within-subjects, that is, all informants see all conditions and we expect each person to make the differences in judgements that we are interested in, we utilize repeated measures analyses of variance as our main statistical test (Clark 1973). 3.3 Materials and procedure Eighteen basic texts were constructed from which the syntactic conditions could be derived by embedding them in a matrix clause with one of our eight matrix verbs and extracting the subject or object as necessary (eg der Lehrer hat den Schüler ausgeschimpft ‘the teacher has told off the pupil’ ). The lexis in these experimental materials was strictly controlled for length (range 6-9 letters, subjects'mean 7.4 letters, objects'mean 7.6 letters) and lemma frequency (range 1-120 per million, subjects mean 44.2, objects mean 40.9).3 All arguments were preceded by definite articles. The verbs in the embedded clauses were all chosen as having a negative connotation (eg beleidigen ‘insult’ , verachten ‘despise’ , bestechen ‘bribe’ ) so as to prevent semantic mismatch with the matrix verb fürchten ‘fear’ . Otherwise, sentences such as Who do you fear that the doctor has cured? might well be scored badly because of their implausibility. The matrix subject was always du ‘you’ , as this seems most natural in questions. The homogeneity and plausibility of the materials were informally pre-tested in order to minimize variation between items. Eighteen versions of these experimental materials were constructed such that each subject saw each combination of syntactic condition and matrix verb once, and each item appeared equally often. Subjects also saw twelve filler items, so that the total number of sentences each subject saw was 60. Subjects were recruited by flier in Tübingen university and by newsgroup listing. A financial incentive to take part was offered. Twenty-five subjects took part, but two subjects' data was excluded from further analysis because of doubts about its quality.4 3 All frequencies were derived from the CELEX lexical database (release 2) of the Max Planck Institut, Nijmegen). 4 In both cases, the subjects produced data patterns which were precisely the inverse of that of the other subjects. A colleague 9 Participants were asked to supply their names, ages (mean age 30.4, range 20-52), sex (11 females, 12 males), occupations (all except one students or graduates) and dialect backgrounds (10 from the southern areas of Baden-Württemberg and Bavaria, 8 from the centre, 2 from the north, and 3 claiming to speak only standard German). Subjects logged themselves on to the experimental website and participated in the experiment remotely, as the experiment was made available on the web using the WebExp experimental software package (Keller et al 1998, see http://www.language-experiments.org). The experiment proceeded as follows: first subjects read a page of instructions outlining their task. The criterion they were to judge by was defined as whether the sentences ‘sound natural’ . They next filled in a personal details sheet. The first practice phase was designed to familiarize them with magnitude estimation; they were instructed to assign numeric values to line lengths relative to a reference line. This was followed by a second practice phase which extended the use of magnitude estimation to judging sentence acceptability. Only after this did the experiment itself begin. 3.4 Results For graphical presentation the data was first normalized by subtracting the subject' s mean from each score and then dividing it by the subject' s standard deviation (conversion to z-scores). This effectively unifies the different scales that the individual subjects adopted for themselves, and allows us to inspect the results visually. For perspicuity we present the results in two stages. Figure 1 presents just the extraction conditions (subject vs object extraction, V2 vs verb-final clause type). The vertical axis shows the grammaticality judgements assigned by our subjects: higher scores indicate greater perceived naturalness, but note that these scores are purely relative: there is no point which indicates absolute (un)grammaticality nor any boundary between grammaticality and ungrammaticality. Along the horizontal axis, the structures are grouped by matrix verb, which are ordered by their ‘bridgeness’ in these extraction conditions. The error bars show the mean normalized grammaticality judgement scores and their 95% confidence intervals. pointed out to us that he had forgotten during a trial of our experiment whether better sentences should be given higher or lower scores, no doubt under the influence of the standard German educational marking system, which runs from 1 (very good) to 6 (very bad). We hypothesize that these two subjects made the same mistake and exclude their data, though this does not materially alter our results. Subjects in subsequent experiments see an on-screen reminder that higher scores should be used for better sentences. 10 Figure 1: Subject and object extractions from V2 and V-final complement clauses, by matrix verb (sagen ‘say’ , glauben ‘think, believe’ , behaupten ’ claim’ , hoffen ‘hope’ , fürchten ‘fear’ , erzählen ‘tell (a story)’ , erklären ‘explain, declare’ , bezweifeln ‘doubt’ ). The first point to note: the subject and object extractions from V2 clauses are identical: subjects plainly do not distinguish between them.5 The equivalent extractions from V-final clauses on the other hand, show a subject-object asymmetry: here object extraction is clearly better than subject extraction. We analyzed these results by subjects and by items using a repeated measures anova procedure. The main effects for clause type (ie V2 clause, V-final clause) (F1=358.4 p<0.001, F2=77.66 p<0.001), and extraction type (ie subject extraction, object extraction) (F1=24.24 p<0.001, F2=24.45 p<0.001), and the interaction of clause type and extraction type (F1=39.38 p<0.001, F2=12.91 p=0.002) are all statistically robust.6 This interaction is the classic that-trace effect, whose existence in German is often doubted (see Featherston (to appear) for discussion). It is clear that extraction from V2 clauses is generally better than from V-final clauses, but that the preference for object extraction is consistent across verbs (no significant interaction of verb and extraction type, nor of verb, clause type and extraction type (both p>0.5). In fact it rather looks as if the difference between English and German is that extraction from clauses with complementizers is more acceptable in English. That is, there is much less difference between (8a) and (8b) than between their German equivalents (7cd) above. (8) a. b. Who do you think the teacher told off? Who do you think that the teacher told off? Thus the dividing line between the grammatical and ungrammatical is drawn between the subject and object extractions from clauses with complementizers in English (hence the that-trace effect) but most commonly between the extractions from V2 and the extractions from V-final clauses in 5 In fact this close correspondence makes this result interesting in itself from a methodological perspective, because it gives some idea of the sensitivity of the methodology and our implementation of it. 6 It will be noted that the tests for the by items analyses produce weaker scores than the by subjects analyses. This may be because some empty cells in the data set of the by items analysis had to be filled with zeros, since the repeated measures procedure requires all cells in the design x lexicalizations grid to have values. 11 German (eg Grewendorf 1988). Both pairs of extraction conditions decline in acceptability as the matrix verbs become less core members of the class of bridge verbs (effect of verb type: F1=22.56 p<0.001, F2=10.71 p<0.001), but the V2 extractions worsen much more obviously than the extractions from V-final clauses (interaction of verb and clause type F1=9.79 p<0.001, F2=4.12 p=0.002). There appears from figure 1 to be a less consistent downward trend in the V-final clause extractions than in the V2 extractions, but this is merely apparent, the result of our ordering the matrix verbs on the basis of their V2 extraction scores, on the basis that this structure reveals ‘bridgeness’ most clearly. The key point here is that the fall in well-formedness in the V-final extractions corresponds very closely to that in the V2 extractions, the smaller visual amplitude being a result of the lower starting point in grammaticality. These V-final extractions are standardly regarded as ungrammatical in most varieties of German and it is unsurprising that we find a reduction in elasticity of judgements among such highly marginal structures which are closer to the lower margins of perceived ungrammaticality. This becomes rather clearer when we look at figure 2. Here we have merged the subject and object extractions in each clause type, and included the data from the polarity questions. This makes it much clearer that the two extraction types are decreasing in grammaticality across the verbs in a very similar manner, with only the amplitude in the fall distinguishing them. The behaviour of the polarity question conditions, by their contrast, makes it clear how very similar the extractions conditions are. Figure 2: Experiment 1 results - the contrast of the extraction and non-extraction conditions in both V2 and V-final clause types. While the extraction conditions reflect the hierarchy of bridge quality among our matrix verbs that one might have expected, the polarity questions behave very differently, and this requires some explanation. On four verbs, behaupten, glauben, hoffen and fürchten, the scores are fairly evenly better than any of the extraction conditions. This is no surprise, since any biclausal extraction may be expected to have some cost in naturalness. On sagen and erklären, both polarity questions dip quite clearly, and on erzählen and bezweifeln the V2 question dips more strongly than the V-final question. This pattern of data bears no resemblance to the extraction conditions and is clearly not caused by just the bridge characteristic. Now the cause of the variation in the polarity question conditions cannot be established beyond doubt, but it seems 12 likely that two factors are interacting. Firstly the V2 polarity questions are influenced by the ability of the matrix verb to license a V2 complement. This probably explains why the V2 questions are sometimes worse than the V-final questions. But the major part is probably due to pragmatic factors. Recall that the matrix verbs all have second person subjects; the polarity questions are therefore questions to a speaker whether he or she is carrying out the speech act of the matrix verb. It is pragmatically fairly natural to ask a person if they think, claim, hope, doubt or fear something, since this is not directly observable, but rather less so to ask if they say, explain, or tell something. We suspect this factor is playing a role in producing these naturalness judgements. While there are complement choice effects in this result, they are badly obscured by what are most likely pragmatic effects. It is thus difficult to make any strong statement on the relationship of the bridge feature and the V2 feature on the basis of this data. In order to clarify this issue, we carried out a second study in which we sought to exclude the pragmatic oddness clouding our first set of results. 4 Experiment 2 4.1 Design In this follow-up experiment we tested the same verbs but examined only four syntactic conditions. Since the data from extraction conditions was so clear in the first experiment and anyway the contrast between subject and object extractions was not crucial to our argument, we included only object extractions from V2 (9a) and V-final clauses (9b), and compared them with 3rd person singular declaratives with V2 (9c) and V-final complement clauses (9d).7 (9) a. b. c. d. Wen sagt er, hat der Lehrer ausgeschimpft? who says he has the teacher scolded ’Who does he say the teacher has scolded?’ Wen sagt er, dass der Lehrer ausgeschimpft hat? who says he that the teacher scolded has Er sagt, der Lehrer hat den Schüler ausgeschimpft. he says the teacher has the pupil scolded ' He says the teacher has scolded the pupil.' Er sagt, dass der Lehrer den Schüler ausgeschimpft hat. he says that the teacher the pupil scolded has There were thus three changes from the materials in experiment 1. First, the non-extraction conditions were declaratives instead of interrogatives. Second, the matrix subject was third person, not second person. These two changes were made in order to avoid the pragmatic oddness which clouded the previous results. One more change was made: the reference sentence relative to which subjects initially make their judgements was different. In all other ways the methodology and procedure were exactly as before. Using a subset of the previous materials ten versions of the experimental materials were prepared. Subjects saw 32 conditions and eight filler items. Thirty subjects were recruited by flier in Tübingen university with the offer of a financial incentive. Two subjects'data was excluded from further analysis because of doubts about its quality.8 Participants were asked to 7 Let us note that we did in fact include some subject extractions from V-final clauses in the filler items to this experiment, so that the mean response of the subjects would change as little as possible. 8 There were two causes for concern in the first excluded subject’s data. First, the subject used a reference value of 100, but gave answers exact to one place of decimals, eg 73.8, 94.6. This is considerably beyond the fineness of distinction that most people feel they can detect. Second, one score was several standard deviations away from all the others, but did not resemble a simple typing error because it too was given to one decimal place. Although the general pattern of scores given was consistent with that 13 supply their names, ages (mean age 25.0, range 20-36), sex (9 females, 19 males), occupations (all except two students or graduates) and dialect backgrounds (14 from south Germany, BadenWürttemberg and Bavaria, 3 from the centre, 5 from the north, and 6 claiming to speak only standard German). 4.2 Results The results of experiment 2 are presented in figure 3. In the extraction conditions which replicate experiment 1 we may note some minor variation: in particular erzählen seems somewhat better than before relative to hoffen and fürchten. This difference is within the normal tolerances for variation, as indicated by the confidence interval error bars, but it is interesting that this second measurement of the hierarchy of bridgeness brings it more in line with the verb frequency data we noted above. More worryingly, the verb sagen performs worse than any other verb in the V-final extraction condition (Wen sagt er, dass der SUBJ geVERBt hat? ‘Who does he say that the SUBJ has VERBed?’ ). This surprisingly low score seems too deviant to dismiss as mere random variation; it is also consistently visible across subjects, and so it must be the result of some experimental bias in experiment 2 absent from experiment 1. This limits the range of possibilities, but the most likely explanation is that it is due to the similarity of this sentence type to the reference item of this experiment (Wer sagst du, hat den Politiker bestochen? ‘Who do you say has bribed the politician?’ ), with which subjects were instructed to compare each experimental sentence. In fact the reference sentence was the subject V2 extraction form of one of the experimental items. It seems probable that the direct comparability of the V-final extraction condition with sagen and the reference item caused subjects to judge this condition more harshly. This account is supported by the relatively good scores subjects gave to the V2 extraction condition with sagen, which may show a similar effect, but resulting in a relative upgrading of this experimental condition. At any rate we may be certain that this is some irrelevant effect, since it is plainly false that sagen is the worst bridge verb of our set, a fact confirmed by experiment 1 and the other conditions with sagen here in experiment 2. of other subjects, this subject’s data was therefore disregarded. The second excluded subject assigned all but two conditions the same score as the reference sentence (and the two exceptions received the same). This suggests that the subject had poorly understood the instructions, but such data is also problematic for the analysis in terms of multiples of standard deviations (zscores) because it causes the little differentiation the subject does make to be assigned excessive relative weight. These exclusions do not change the results in any material way. 14 Figure 3: Experiment 2 - Object extractions and declaratives in both V2 and V-final clause types, presented by matrix verb (sagen ‘say’ , glauben ‘think’ , behaupten ‘claim’ , hoffen ‘hope’ , fürchten ‘fear’ , erzählen tell (a story), erklären ‘explain, declare’ , bezweifeln ‘doubt’ ). The results of this second experiment are otherwise very clear. The V-final declaratives show no effect of the bridge quality of the matrix verb, and are scored consistently high across the top of the graph. The lack of variation between verbs here supports our supposition that pragmatic factors were affecting the scores of the Yes/No question conditions in experiment 1. Here, by using third person subjects and declaratives instead of interrogatives we have been able to remove the violent swings of acceptability of experiment 1, which would tend to confirm our attribution of this variation to plausibility. The very even scores for V-final declaratives also provide a very useful baseline for comparison, which allows us, for example, to exclude the possibility that some of the variation in the other conditions is due to verb preference for a clausal complement. Since this condition shows all of these verbs to be very natural with a clausal complement, every variation from the declarative V-final score can be reliably attributed to the extraction and V2 complement qualities that we are interested in. The extractions from V2 start fairly high but decline quite rapidly along the continuum of bridgeness. The V-final extractions, discounting sagen, start much lower and decline more gently. Apart from the quirky behaviour of sagen, both extraction conditions confirm the results from experiment 1, in that the extractions from V2 and the extractions from V-final correspond closely, differing only in the perceived amplitude of variation. Even the slight tendency for extraction over erzählen to be rather better than in experiment 1 is clearly visible in both extraction types. We therefore see some confirmation of our hypothesis that the bridge quality is affecting these two structures equally, and that the essence of the bridge quality is that of permitting extraction from a complement clause, whether this is a V2 or a V-final clause. This is supported by the statistical analysis: in a repeated measures anova, there are significant main effects for Verb (ie sagen, glauben etc) and Structure (ie wh-extraction or declarative), and significant interactions of Verb and Clause, Structure and Clause, and the three-way interaction Verb, Structure and Clause (all p<0.005).9 9 Verb (F1=16.12, p<0.001; F2=15.9, p<0.001), Structure (F1=539.31, p<0.001; F2=313.55, p<0.001), Verb x Clause (F1=12.04, p<0.001; F2=7.98, p<0.001), Structure x Clause (F1=113.04, p<0.001; F2=186.54, p<0.001), Verb x Structure x Clause (F1=4.77, p<0.001; F2=3.84, p=0.004, 15 Let us now consider the correspondence of the V2 declaratives with the V2 extractions. Again the match is very clear, and the contrast with the V-final structures could hardly be more striking. Both V2 conditions are judged weaker as they descend the bridge continuum, in remarkable parallel, while the V-final declaratives are invariably good and the V-final extractions very poor. The generalization seems to be that the V2 declaratives are consistently a little better than the V2 extractions, which comes as no surprise. Even the variations across the verbs match very closely across the two V2 conditions. After sagen, the next pair of verbs behaupten and glauben are about the same in both conditions, but there is a clear fall off down to hoffen, which however is only a little better than fürchten again in both conditions. The scores of erklären are a little better than fürchten in this second experiment, but there is a clear descent to erzählen, and again more steeply to bezweifeln, in both these two V2 conditions. Even the variation between this experiment and experiment 1 is consistent across these two conditions: in each erzählen is judged a little better than before relative to fürchten. Only sagen and bezweifeln spoil this otherwise exact correspondence. It is possible that the declarative V2 sagen is suffering from the same experimental effect as the extraction from V-final sagen, since it too seems to be worse than the equivalent behaupten score, which is counter-intuitive. The only clear breach of the pattern is on bezweifeln, where the V2 declarative is actually worse than the V2 extraction, but even here their relative positions are well within the normal range of variation. This close correspondence is reflected too in the statistical analysis: the Pearson’s correlation coefficient of the V2 declaratives and V2 extractions is 0.209, which is significant (p=0.002). The only other significant correlation between the structural conditions in this data was a negative correlation between the V2 declaratives and the V-final declaratives of -0.190 (p=0.004). This testifies to the sensitivity of V2 declaratives to the bridge factor, contrasted with the insensitivity of the V-final declaratives to it. All in all therefore, this data provides strong support to the thesis that the lexical feature permitting a V2 complement and the feature permitting extraction are the same. We see no reason to distinguish between them in this data. 5 Weighing up the evidence In figure 4 we combine the extraction and declarative data from the two experiments, which is feasible since both sets are normalized. This allows us to take a look at the broader picture and reduces the degree of random variation in those conditions which were tested in both experiments, since in these conditions each plotted data point is the mean of fifty-one subjects’ scores. The overall picture is clear: we see three groups of structures. The declarative V-finals cross the top of the graph, showing no effect of the verb. The extractions from V-final cross the bottom of the graph, the subject extractions somewhat worse than the object extractions. There is a clear but relatively weak drop-off in the scores from left to right, which is slightly obscured by the rogue sagen score in the object extractions, which is artificially low due to some irrelevant effect, we have argued. The middle group interests us the most. They form a clear group, bunching fairly tightly. The object extractions from V2 form the straightest line, which is to be expected since this condition was tested in both experiments, and so it is formed of more data points. The other two structures, subject extractions from V2 and declarative V2 show a little more variation, but the general trends of all three conditions correspond very closely. This is particulary clear in contrast to the other three conditions, which are not part of this group. It is also very clear when one looks back at figure 2, where the Yes/No questions plainly do not co-vary with the whquestions. 16 Figure 4: The declarative conditions and the combined extraction conditions of both experiments. This strong correspondence supports the analysis according to which the V2 feature is strongly related to with the bridge feature, as has occasionally been argued (and denied) in the literature (eg Grewendorf 1988, Müller & Sternefeld 1995). Such a conclusion is naturally tentative: our sample of verbs and structures is necessarily limited, but this data provides no reason not to associate these two lexical characteristics, on the contrary, the clear covariance of bridge feature and V2 feature provides strong evidence in favour. But this support does not only consist of this covariance; further inspection of these data sets reveals other evidence, which concerns the conclusiveness of apparent counter-examples. In experiment 1 we saw that judgements of relevant sentences can easily be affected by plausibility factors, which means that we must be very careful in selecting data if we wish to identify example structures which disconfirm the association of the V2 and bridge characteristics. Such apparent counter-examples may simply be revealing a difference in plausibility between structures in the bridge environment and structures in the V2 environment. To identify a counter-example it is necessary to exclude plausibility systematically, as can be illustrated in our experimental data. In experiment 1, the V2 and V-final interrogative conditions largely co-vary, which must feed the suspicion that plausibility is the cause of their variation. In experiment 2 on the other hand, the V2 and V-final declaratives behave independently, suggesting that plausibility is playing no role. Now the necessity of controlling for plausibility belongs to the ABC of data collection and must surely be no surprise to anyone. But the pattern of well-formedness judgements we observe provide another, stronger reason to question whether apparent counter-examples actually demonstrate a dissociation of V2 and bridge features. It is namely no counter-evidence to provide an example of a verb which permits a V2 complement but does not seem to permit extraction from a V-final complement; for our data makes it clear that V2 structures are in general better than extractions from V-final, in fact V2 declaratives seem to be a little better than V2 extractions. The claim that the V2 feature is the same as the bridge feature does not therefore predict that any verb which permits the one will also permit the other with the same degree of acceptability, rather it asserts that verbs will occupy the same position in the hierarchy of V2 licensing that they occupy in the bridge hierarchy. Verbs will thus either be good or bad 17 exemplars of both characteristics, relative to other verbs, but the two characteristics need not give rise to identical judgements. In the light of this, it will be clear that examples advanced as counter-evidence on the basis that the V2 structure seems feasible but the bridge structure does not (eg Reis 1994), do not in fact defeat the proposition that the two characteristics are aspects of the same feature. Let us note here an additional argument which Reis has raised against the association of V2 and bridge features. She suggests that data from certain preference predicates displays a very different pattern to that found in the verbs we have looked at, so much so that it undermines the case for the V2 and bridge features being mere aspects of the same characteristic. Let us examine some of these cases in the four relevant structures: V2 complement (10), V-final complement (11), extraction from V2 (12), extraction from V-final (13) (analyses and judgements from Reis 1994; Reis has more examples, but these are representative). While both simple declarative structure types are fine or at least feasible (10, 11), there is a mismatch between the extraction conditions. Extraction from a V2 complement (12) is quite impossible, but extraction from a V-final complement clause (13), although poor, is not completely excluded. These extractions are topicalizations rather than wh-elements, but this should not alter the case in any relevant way. Why should these structures behave so very differently if they are both long extractions? (10) (11) (12) (13) a. Mir wäre lieber, Hans würde die Papiere vernichten. to.me would.be preferable Hans would the papers destroy ' It would be preferable to me if Hans destroyed the papers.' b. Vernünftig wäre, du gingest morgen zu einem Spezialisten. sensible would.be you went tomorrow to a specialist ' It would be sensible for you to go to a specialist tomorrow.' c. Ich würde vorziehen, du sagtest mir endlich über diesen Vorfall die Wahrheit. I would prefer you told me at.last about this incident the truth ' I would prefer it if you at last told me the truth about this incident.' a. Mir wäre lieber, dass Hans die Papiere vernichten würde. to.me would.be preferable that Hans the papers destroy would b. Vernünftig wäre, dass du morgen zu einem Spezialisten gingest. sensible would.be that you tomorrow to a specialist went c. Ich würde vorziehen,dass du mir endlich über diesen Vorfall die Wahrheit sagtest. I would prefer that you me at.last about this incident the truth told a. *Die Papiere wäre mir lieber, würde Hans vernichten. the papers would.be to.me preferable would Hans destroy b. *Zu einem Spezialisten wäre vernünftig, gingest du morgen. to a specialist would.be sensible went you tomorrow c. *Über diesen Vorfall würde ich vorziehen, sagtest du mir endlich die Wahrheit. about this incident would I prefer told you me at.last the truth a. ?? Die Papiere wäre mir lieber, dass Hans vernichten würde. the papers would.be to.me preferable that Hans destroy would b. ?? Zu einem Spezialisten wäre vernünftig, dass du morgen gingest. to a specialist would.be sensible that you tomorrow went c. ?? Über diesen Vorfall würde ich vorziehen, dass du mir endlich die Wahrheit sagtest. about this incident would I prefer that you me at.last the truth told Now this data would indeed be a problem, if it were not that there is another analysis of these structures. It is surely no coincidence that all the matrix predicates Reis cites can take an quasiexpletive correlative pronoun (Mir wäre es lieber, Es wäre vernünftig, Ich würde es vorziehen). 18 Now this pronoun can in some cases be non-overt, but it is nevertheless part of the argument structure of the predicate, and can usually at least optionally be overt (for discussion see Reis 2000, Sandberg 1998). We would therefore argue that the embedded clause is the correlate of this pronoun, essentially standing in apposition to it; the clause is thus not the direct complement of the predicate. It follows that the relationship of this type of clause with its matrix verb is different to the relationship of a complement clause to its matrix verb, even if the two structures are not immediately distinguishable on the surface, which will be the case if the quasi-expletive es is non-overt and the complement clause is extraposed. We see this superficial similarity illustrated in (14) and (15). (14) Wenn du fremdgehen würdest, würde ich (es) vorziehen, dass du mir die Wahrheit sagst. if you had.an.affair would would I it prefer that you me the truth say (15) Wenn du fremdgehen würdest, würde ich (*es) hoffen, dass du mir die Wahrheit sagst. if you have.an.affair would would I it hope that you me the truth say In these examples we add a preceding context so as to permit both matrix verbs to appear felicitously as conditionals. In (14) the complement of the verb is the correlative es, which may optionally be non-overt, and which stands in a relationship similar to apposition to the embedded clause. The superficially similar structure in (15) can take no correlative es, as the embedded clause is the immediate complement of the verb, even though it has been extraposed. We argue therefore that the embedded clause is a complement in neither (10) nor (11), since these structures are similar to (14). If, as we suggest, (12) and (13) are not extractions from complement clauses, then they lose much of their force as counter-examples. Nevertheless, we must admit that this alternative analysis of these structures does not entirely defeat the force of Reis' s argument, for it is nevertheless true that the extraction examples in (12) are much worse than those in (13), which our analysis does not predict (quite the opposite indeed). Even if neither (10) nor (11) have a complement clause but merely an embedded correlative clause, why should extraction in (12) be so much worse than in (13)? One tempting possibility is that it is the role of the proposition as correlate of the subject which impedes extraction, since many correlative pronouns occupy the subject position, and subjects are known islands for extraction. However the (10c-13c) examples do not seem to confirm this: the structure type has a correlative pronoun, but it unambiguously occupies the object position, and the same constraint on extraction in (12c) seems to apply. On the other hand the object correlate (13c) does seem to be better than the subject correlates (13ab). Subject extraction restrictions cannot remove the problem however, since the V-final extractions in (13) are poor, but not impossible, and for our hypothesis that the bridge and V2 features are identical to go through, we must explain why therefore the extractions from V2 embedded clauses are so much worse. An approach to such questions which frequently yields results is to examine the data more closely. A complicating factor in these examples seems to be the adjunct status of many of the topics, and the lack of overt case marking in those which are arguments. Adjuncts and arguments which are not overtly case-marked are very easily parsed as related to the matrix clause, which can cause distortion of judgements. While the judger rationally knows that the topic should be associated with the embedded clause, not the matrix clause, it seems that we are not fully able to prevent the goodness of the construction on the matrix clause association reading affecting our judgements of the other reading. To exclude the possibility that we are unconsciously associating the topic with the matrix clause we must consider examples with overtly case-marked argument extraction, as in (16), where we have additionally standardized the lexis. 19 (16) a. b. c. *Den Brief wäre mir lieber, dass Hans vernichten würde. the letter would.be to.me preferable that Hans destroy would *Den Brief wäre vernünftig, dass Hans vernichten würde. the letter would.be sensible that Hans destroy would *Den Brief würde ich vorziehen, dass Hans vernichten würde. the letter would I prefer that Hans destroy would Once we have standardized the data and excluded the possibility of a matrix association of the topicalized element, two things become apparent. First, the apparent difference between the subject correlates (16ab) and the object correlate (16c) disappears: these extractions are all judged about equally bad, which would counter the suggestion that the grammatical function of the correlative es plays a role. Second, these examples are unambiguously ungrammatical, in the same way that their standardized and disambiguated equivalents in (17) are. (17) a. b. c. *Den Brief wäre mir lieber, würde Hans vernichten. the letter would.be to.me preferable would Hans destroy *Den Brief wäre vernünftig, würde Hans vernichten. the letter would.be sensible would Hans destroy *Den Brief würde ich vorziehen, würde Hans vernichten. the letter would I prefer would Hans destroy The intermediate judgements reported by Reis in (13) seem therefore to be largely a product of the ambiguity of attachment of the specific examples with adjunct or non-overtly case-marked topics. Let us admit, however, that while the examples in (16) are worse than those in (13) and must be regarded as no part of the language, they still seem not quite as bad as those in (17). We have no immediate explanation for this, but the explanation might be related to the certainty of analysis which the inclusion of a complementizer provides. A complementizer, by definition, introduces a clausal complement, which at least lets the parser know what structure it is dealing with even if this is not normally licit. In the absence of a complementizer, many different structures are possible, which leaves the parser in a more confused state. This is of course somewhat speculative and it is clear that further work will be necessary to clarify which factors are having what effects here (cf Reis 1996, Kiziak 2004). We conclude that these predicates which take correlative pronouns can not take clausal complements themselves, but do permit V2 or V-final embedded clauses to be correlated with their pronominal objects. They do not allow extraction from any clause associated with their correlative pronominal objects, which can be readily understood as a sub-case of the more general prohibition of extraction from a clause under an NP complement. Since V2 clauses and V-final clauses behave in a parallel manner in these structures, this state of affairs is quite compatible with the identity of bridge and V2 features. 6 Concluding remarks Our results throw up a number of interesting points. Firstly, our data underlines that the bridge feature is a continuum and not a categorial distinction: there is no absolute group of bridge verbs, only better and worse ones. Now few linguists would doubt this point (eg Reis 1994, Erteshik 1973) but equally few seem to follow its implications through consistently. Instead they continue to apply an essentially binary model of grammaticality to the data, though admittedly generally allowing intermediate positions. In the case of bridge and V2 verbs, this abstraction is plainly obscuring the picture. The hypothesis of the identity of V2 and bridge features does not make the prediction that if one of these structures is possible, the other one will be too, rather it 20 predicts that any verb will occupy the same position in the hierarchies of V2 licensing and bridge licensing, since these are essentially the same. This insight requires us to accept a gradient notion of grammaticality, at least for the structures in question here; since any attempt to divide the structures we tested into ‘grammatical’ and ‘ungrammatical’ groups would plainly be arbitrary. The best we might manage would be to distinguish a group of structures which are good enough for them to be used, and another group, which are so bad that they never occur in practice. Note that this does not in itself prejudice a structural account of syntactic phenomena, since the controlled elicitation of introspective judgements of well-formedness from multiple subjects always produces such a pattern. Given the freedom to do so, informants expressing judgements make far finer distinctions than standard assumptions about well-formedness would suggest, and they show little sign of finding absolute endpoints which might reflect categorical (un)grammaticality. If syntactic theory is to make full use of this rich data source it must come to terms with the inadequacy of the binary grammatical/ungrammatical abstraction of well-formedness which is generally assumed (see Featherston 2004 for more discussion of this). The next conclusion we might draw from this data set would be that extraction from V2 complement clauses is a better criterial structure for the judgement of the bridge qualities of a particular verb than extraction from a V-final clause. The reason for this is simple: this structure shows the same hierarchy but reveals greater amplitude of difference. This too requires that we suspend the assumption of binarity and think about grammatical acceptability as a continuum. The probable reason why extraction from V-final has been generally thought of the criterial structure is that it more readily produces outputs which are so bad that they are thought of as excluded from the language, while extractions just become ‘less good’ , a finding which has little meaning outside of a model of gradient grammaticality. Third, our results provide some suggestive evidence on the factors which predict bridge quality. In our set of eight verbs matched into related pairs, the pairs show some degree of common behaviour. Sagen and behaupten are both good, while the attitudinal hoffen and fürchten are both in the mid-range. Fürchten and bezweifeln, which both have a negative aspect, are both poorer than their positive pairs. Erzählen and erklären are both poor. There is also evidence of other factors: of the four polysyllabic stemmed verbs, three are in the lower half of the bridge and V2 scale. Intuitively it does seem to hold that the better scoring verbs are more semantically light than the weaker scoring ones; they are also more frequent. This data would therefore tend to support the suggestion that it is some combination of the factors of semantic lightness, shortness, and frequency which determines bridge and V2 quality. The causal relations between these factors, however, our data cannot illuminate. Last, we see no reason to distinguish between the grammars of north and south German speakers in this data, contrary to the general assumption that only southern German dialects permit extraction from V-final clauses (eg Grewendorf 1988; ch7). Figure 5 below shows the mean scores across verbs of the four extraction conditions and the two declaratives, by the regional background of the subjects. We define the ‘South’ as Baden-Württemberg and Bavaria (n=24), the ‘North’ as Hamburg, Bremen, Berlin, Schleswig-Holstein, Niedersachsen, Brandenburg and Mecklenburg-Vorpommern (n=7), the ‘Centre’ as all parts in between (n=11). Note that the ‘No dialect’ class (n=10) includes both those explicitly claiming no dialect influence as well as those giving an inappropriate response (almost always ‘German’ ). There is no visible systematic difference between the subjects from the different linguistic regions. Northern speakers make exactly the same distinctions that southern speakers do, and they permit extractions from V-final clauses to the same extent. This was reflected too in the statistical analysis: the factor Region was wholly insignificant (F=0.285 p=0.836). 21 Figure 5: The extraction and declarative conditions (of both experiments combined) distinguishing the dialect background of subjects. This finding leaves a question mark over the much discussed differences between northern and southern speakers of German in their judgements of these long extraction structures. Why do we fail to detect any sign of this well-known and recognized phenomenon? One approach to this puzzle would be to locate the contrast in the data: the conventional grammaticality judgement technique asks for a binary judgement whether a sentence is grammatical or ungrammatical, not how good it is relative to other structures. Since our data makes it clear that all speakers make the same relative judgements, it seems likely that the attested dialectal differences are due to them drawing the rather arbitrary line of absolute ungrammaticality in different places: southern speakers draw it rather lower on the scale and thus include more of the marginal structures within the ‘grammatical’ group, northern speakers draw it rather higher and tend to exclude them. But the grammars of both groups appear to be identical: what perhaps varies is the judgement of what counts as full ungrammaticality, and this seems likely to at least partly driven by frequency of occurrence. While the relative judgements of naturalness that we have employed in our studies largely reflect the computational complexity of a structure, the intuition of absolute grammaticality or ungrammaticality includes this information but is one step further removed. We hypothesize that the factor which drives the intuition of absolute grammaticality is our judgement whether a structure is not only good but also (potentially) frequent; the intuition of absolute ungrammaticality depends upon the feeling that a structure is so bad that it would never be (deliberately) used (see Featherston 2004 for discussion). On this account therefore, the difference in north and south German in extraction is independent of their grammars - which are identical as our data confirms - and merely reflects that northerners simply do not use and therefore do not hear this construction so often. Given a conventional grammaticality judgement task, they therefore tend to group these structures with the ‘ungrammaticals’ , rather than with the ‘grammaticals’ as southerners do. A training study by Fanselow et al (1999) who exposed north Germans to such structures and found increased use of them thereafter would offer some tentative support to the exposure-based suggestion. 7 References 22 Bard, Ellen, Dan Robertson & Antonella Sorace (1996). Magnitude estimation of linguistic acceptability. Language 72: 32-68. Bech, Gunnar (1955). Studien über das deutsche Verbum infinitum, Band 1. Kopenhagen. (New identical edition Tübingen: Niemeyer, 1983). Cowart, Wayne (1997). Experimental Syntax: Applying Objective Methods to Sentence Judgements. Thousand Oaks, California: Sage. Cattell, Ray (1978). On the source of interrogative adverbs. Language 54: 61-77. Clark, Herbert (1973). The language-as-fixed-effect fallacy: A critique of language statistics in psychological research. Journal of Verbal Learning & Verbal Behavior 12: 335-359. Erteschik-Shir, Nomi (1973). On the nature of island constraints. PhD dissertation, MIT. Erteschik-Shir, Nomi & Shalom Lappin (1979). Dominance and the functional explanation of island phenomena. Theoretical Linguistics 6: 41-86. Fanselow, Gisbert, Reinhold Kliegl & Matthias Schlesewsky (1999). ‘Long’ movement in Northern German: A training study. ms, University of Potsdam. Featherston, Samuel (2004). The Decathlon Model: Design features for an empirical syntax. To appear in Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives, Marga Reis & Stephan Kepser (eds.). Tübingen: Narr (available at http://www.sfb441.uni-tuebingen.de/~sam/papers/LingEvid.pdf). Featherston, Samuel (to appear). Universals and grammaticality: Wh-constraints in German and English. Linguistics 43. Grewendorf, Günther (1988). Aspect der Deutschen Syntax. Eine Rektions-Bindungs-Analyse. Tübingen: Narr. Haider, Hubert (1984). Topic, Focus and verb second. Groninger Arbeiten zur Germanistischen Linguistik 22: 47-100. Keller, Frank (2000). Gradience in grammar: Experimental and computational aspects of degrees of grammaticality. PhD Thesis, University of Edinburgh. Keller, Frank, Martin Corley, Steffan Corley, Lars Konieczny & Amalia Todirascu (1998). WebExp: A Java Toolbox for Web-Based Psychological Experiments. (Technical Report HCRC/TR-99.) Human Communication Research Centre, University of Edinburgh. Kiparsky, Paul & Carl Kiparsky (1970). Fact. In Progress in Linguistics, Manfred Bierwisch & Karl Heidolph (eds.), 143-73. The Hague: Mouton. Kiziak, Tanja (2004). Einschub oder Bewegung? Empirische Evidenz zur Parenthese-Hypothese. MA thesis, Eberhard-Karls-Universität Tübingen . Kluender, Robert (1992). Deriving island constraints from principles of predication. In Island Constraints. Theory, Acquisition and Processing, Helen Goodluck & Michael Rochemont (eds.), 223-258. Dordrecht: Kluwer. Meyer, Roland (1999). Studien zu Inselbeschränkungen, vor allem im Russischen: Deklarativsätze und Nominalphrasen. MA thesis, Eberhard-Karls-Universität Tübingen. Müller, Gereon & Wolfgang Sternefeld (1995). Extraction, lexical variation, and the theory of barriers. In Lexical Knowledge in the Organization of the Language. (Current Issues in Linguistic Theory, 114.), Urs Egli, Peter Pause, Christoph Schwarze, Arnim von Stechow & Götz Wienold (eds.), 35-80. Amsterdam: Benjamins. Reis, Marga (1994). Brückeneigenschaften von Matrixsätzen bei langer Extraktion im Deutschen. Manuscript of talk at the Deutsche Gesellschaft für Sprachwissenschaft Conference, Münster, 9-11.3.1994. Reis, Marga (1996). Extractions from verb-second clauses in German? In On Extraction and Extraposition in German. (Linguistik Aktuell/Current Linguistics, 11.), Ulli Lutz & Jürgen Pafel (eds.), 45-88. Amsterdam: Benjamins. Reis, Marga (2000). Was tun mit konfligierenden Daten? Handout of talk at Potsdam-Tübingen Arbeitstreffen zur Datenproblematik, Tübingen 27-28.10.2000. 23 Sandberg, Bengt (1998). Zum es bei Transitiven Verben vor Satzförmigem Akkusativobjekt. Tübingen: Narr. Stevens, Stanley (1975). Psychophysics: Introduction to its Perceptual, Neural and Social Prospects. New York: Wiley.
© Copyright 2026 Paperzz