Applied Linguistics Advance Access published January 17, 2011 Applied Linguistics: 1–25 doi:10.1093/applin/amq053 ! Oxford University Press 2011 A Multidimensional Analysis of a Written L2 Spanish Corpus *YULY ASENCIÓN-DELANEY and JOSEPH COLLENTINE Northern Arizona University, Modern Languages, Box 6004, Flagstaff, AZ 86011, USA *E-mail: [email protected] INTRODUCTION Second language acquisition (SLA) research views learning not only as knowing individual lexical and grammatical features of a second language (L2) but also how these features work together to generate types of discourse (e.g. narratives, descriptions). In discussing certain mismatches between linguistic theories, psycholinguistics, and corpus-based investigations, Ellis (2002) comments that there has been traditionally a ‘blind faith’ in the (psycholinguistic) reality of structural categories (e.g. noun, verb). Collentine and AsenciónDelaney (2010) use corpus-based techniques to show that, statistically speaking, seemingly disparate lexical and grammatical L2 structures (e.g. verbs and adjectives) develop in tandem and work together to affect different types of discourse. Indeed, as Ellis et al. (2008) note, there is evidence that ‘human production grammar must store probabilistic relations between words’ (p. 377). Furthermore, Ellis et al. (2008) postulate that an important difference between native and non-native speakers is that ‘native speakers have extracted underlying co-occurrence information’. Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 The present study adds to our understanding of how learners employ lexical and grammatical phenomena to communicate in writing in different types of interlanguage discourse. A multidimensional (factor) analysis of a corpus of L2 Spanish writing (202,241 words) generated by second- and third-year, university-level learners was performed. The analysis uncovered four significant clusters that can be considered distinct discourse types with two main stylistic variations: narrative (characterized by verbal features) and expository (characterized by nominal features). Results also provide examples of the multiple ways that stylistic sophistication and linguistic complexity occur in the L2. Although the Spanish learners’ discourse did not show signs of syntactic complexity (e.g. frequent use of relative clauses, subordinate clauses, use of clitics), the frequent use of nominal features affects informational density due to the presence of numerous derivational morphemes. Inflectional complexity in the form of marked forms was not predominant in the data set. Still, the learners’ verbal inflections did vary, which is a sign of L2 development (Howard 2002, 2006; Collentine 2004; Marsden and David 2008). 2 ANALYSIS OF A WRITTEN L2 SPANISH CORPUS Developing L2 discursive abilities Canale and Swain’s (1980) and Bachman’s (1990) theories of L2 communicative competence afford discourse/textual competence a central role in SLA. The American Council on the Teaching of Foreign Languages (ACTFL 2001) proficiency scales assume a developmental progression, with learners developing conversational, then descriptive, then narratives abilities and finally more complex abilities. In general, the study of discursive abilities entails how learners obtain the ability to produce multipropositional segments whose semantic and structural interrelationships (e.g. gender/number agreement) cohere (Givón 1985). This research has considered how pragmatic abilities affect L2 discursive abilities, the assignment of discursive roles to grammatical constructs, and learners’ use of discourse markers. Pragmatic research concerned with the discursive context studies how we structure information across sentences, treating issues such as how syntactic structures affect coherence, topic/focus, and propositional presupposition (Horn and Ward 2006: xiii–xix). SLA pragmatic research, as Kasper (2001) notes, examines how learners organize discourse and how they employ grammatical structures to produce multipropositional messages (e.g. marking cause-effect chains with conditional if structures). The acquisition of what may be termed ‘grammatical-pragmatic’ abilities entails not only lexical and grammatical phenomena’s denotations but also their connotations as managed by readers/listeners to interpret multisentential messages. Research suggests that L2 discursive abilities develop as learners attain both grammatical and pragmatic abilities. In uninstructed learning contexts, functional necessities (e.g. distinguishing between topic and comment) influence which lexical and grammatical tools learners grammaticalize (Skiba and Dittmar 1992). Geyer (2007) presents data suggesting that once Japanese learners develop ‘several (grammatical-pragmatic) devices’ (p. 362) they can Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 Researchers have powerful analytical tools for studying how lexical and grammatical features generate discourse types (Biber and Conrad 2001). Still, Myles (2005) and Myles and Mitchell (2005) argue that SLA research has been slow to embrace corpus-based technologies even though corpus linguistics is effective at helping theoretical linguistics to understand the nature of discourse types. Myles and Mitchell (2005) also note that corpus research can complement and increase the generalizability of SLA research by examining large amounts of data. We present the first-known multidimensional analysis of a written Spanish learner corpus examining how learners combine lexical and grammatical phenomena to generate different discourse types. Before presenting our research questions, we provide an overview of research about the acquisition of discursive abilities, how corpora have been used in SLA to understand the mapping of form to function, and the usefulness of the multidimensional corpus analysis. Y. ASENCIÓN-DELANEY AND J. COLLENTINE 3 SLA corpus research on the development of structural and lexical phenomena Corpus-based L2 research has helped us to understand how learners map form to function.2 It also elucidates the role of L2 formulaic segments. It has provided important insights into the interaction of grammar and the lexicon. What is consistent within this research is that L2 lexical phenomena and grammatical constructs work in tandem. Finally, corpus-based L2 research has explored ways to efficiently measure L2 syntactic complexity. Regarding the mapping of form to function, Klein and Perdue (1997) used a learner corpus to posit that uninstructed settings generate a ‘basic variety’, where word order and grammatical constructs result from functional communicative necessities. Granger et al. (2002) and Belz (2004) use corpus techniques to document how social and institutional pressures affect new L2 phenomena (e.g. da-compounds such as dazu ‘there-to’, davon ‘there-from’, darin ‘there-in’). Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 chain utterances to achieve discursive organization. Geyer reports that advanced learners use coordinate and adverbial conjunctions to make distinctions, such as foreground/background and cohesive propositional segmenting. The research on the acquisition of Spanish’s two copulas ser and estar shows how learners associate lexical and grammatical features with particular discourse types. Cheng et al. (2008), examining a corpus of Mandarin Chinese L1 learners of Spanish, report that exploratory writing led to more copula estar + adjective usage. They also noted that this structure was mostly associated with narratives and descriptions. Collentine and Asención-Delaney (2010), in their corpus-based study of the Spanish copula ser + adjective and estar + adjective, report that overall, when learners begin mastering simple discourse types, lexical, syntactic, and morphological diversity increases and complexity increases.1 They found that ser + adjective appears in descriptive and evaluative discourse where lexical and morphological diversity and complexity occurs. However, estar + adjective is present in narrations, descriptions, and hypothetical discourse where, nonetheless, surprisingly little linguistic complexity occurs. Other discourse related research has studied L2 discourse markers. Upton and Connor (2001) use corpus techniques to compare formulaics in ‘application letters’ by L1 speakers of (American) English and L2 learners of English from Belgian and Finland. The L2 writers were much less formulaic than the L1 writers with politeness strategies, due to L1 influence. Forsberg (2005) uses an L2 French learner corpus from L1 Swedish speakers to study different types of idioms (e.g. grammatical, pragmatic, purely lexical idioms). Forsberg found that lexical idioms increase in writing over time. She also found that formulaics with a discursive role were ‘overused’ by learners of all levels. 4 ANALYSIS OF A WRITTEN L2 SPANISH CORPUS Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 Some investigators conceptualize formulaic segments (e.g. in other words, for example) as singular dictionary entries (cf. Ellis 2006). Using the British National Corpus (BNC Consortium 2001), Ellis et al. (2008) found that L2 learners at early stages of acquisition recognize formulaics based on their frequency in the input. Interestingly, however, Ellis et al. (2008) argue that native-like use of formulaics ultimately requires learners to alter their processing of words ‘for coherence: for co-occurrence greater than chance’ (p. 391). Regarding the interaction between morphology and the lexicon, Marsden and David (2008) study vocabulary development among L2 learners of Spanish at ages 9 and 13. The 13-year-old learners, who had 450 more hours of classroom instruction, showed greater lexical and inflectional diversity. Their analysis supports the notion that, in inflectionally rich languages like Spanish, an important indicator of development is not only accuracy, as measured by grammatical errors, but also increased inflectional/derivational variation (Howard 2002, 2006; Collentine 2009). Their research also indicates that a key factor in differentiating levels of development is changes in the part of speech (e.g. nouns, verbs, adjectives) that predominate in production. Marsden and David (2008) present evidence suggesting that as learners progress, in speech they produce more verbs than nouns, and that as they begin to produce more verbs, they also start to produce more adjectives. Since learners produce different discourse types as they progress, it is logical to surmise that these changes in lexical and grammatical production parallel changes in the discourse learners’ produce. Two research projects have used corpus techniques to study the emergence of linguistic complexity and semantic density. Operationalizing ‘complexity’ and ‘density’ in corpus-based studies is challenging but important. Ortega (2000) uses a corpus of intermediate-level L2-Spanish learners’ writing to identify reliable measures of syntactic complexity. Her analysis indicated that clause length, phrasal elaboration, and amount of subordination best predict syntactic complexity. Collentine (2004) employs corpus techniques to compare morphological and lexical complexity in an in-class learning context and a study-abroad context. After a semester, the study-abroad learners employed more morphological narrative complexity by using past-tense verbs, thirdperson morphology, past participles, and present participles (as well as public verbs; e.g. decir que ‘to say that’). From a learner’s perspective, morphological complexity in Spanish requires the use of a range of aspectual, person, number, and gender inflections beyond simple verb tenses (e.g. the present) and other unmarked morphemes (e.g. masculine-singular nouns/adjectives; cf. Collentine 2004: 237). The in-class group was more lexically complex, producing a higher concentration of nominal features—nouns and adjectives—and so semantically dense discourse (Biber 1988). Finally, Grant and Ginther (2000) urge corpus-based SLA researchers to combine qualitative with quantitative techniques to increase generalizability of results. Many approaches (e.g. part-of-speech tagging) do not generally factor in errors and small corpora bias from sampling particular tasks. While Y. ASENCIÓN-DELANEY AND J. COLLENTINE 5 we do not report error rates or omissions in this study, we are confident that our multidimensional approach and its mixed method research design (i.e. quantitative + qualitative analyses), together with the size of our corpus, address Grant and Ginther’s (2000) concerns. The multidimensional analysis and relevant corpus linguistic techniques Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 Multidimensional corpus analysis shows how lexical and grammatical features bundle together to produce different and new types of discourse (Biber and Conrad 2001). It combines technological tools, exploratory factor analysis, and qualitative analysis of texts to determine which lexical and grammatical features reliably co-occur in a large corpus. Using a large collection of texts coupled with powerful statistical tools introduces fewer reliability threats than small-scale studies, where participants’ particular characteristics or raters’ judgement heavily influence results. Biber et al. (2006) provide the first multidimensional analysis of nativespeaker Spanish. They analyze a 20-million-word Spanish corpus (4,049 texts) with written and oral data representing 19 registers (e.g. face-to-face conversation, business letters). The analysis uncovered both well-known discourse types like narratives as well as new ones. For example, the following characterize ‘informationally rich’ discourse in Spanish: nouns, adjectives, definite articles, prepositions, derived nouns, type-token ratio, long words (i.e. multisyllabic words), and ergative se constructions. Unique to Spanish is also hypothetical discourse, containing concentration of structures such as the conditional and the subjunctive as well as future verb forms, verbs of obligation and causation (e.g. dejar, permitir, hacer + infinitive) and the conjunction que. Parodi (2007) used a 2.5-million-word corpus of native-speaker Spanish to study the differences between written and spoken Spanish. His analysis complements Biber et al. (2006) in that it reveals how lexical and grammatical features cluster together, based on whether the register is context dependent (i.e. the interpretation of important features depends on the ‘speech situation’), written academic, commissive in nature (in the pragmatic sense), attitudinal, or informational in focus. Additionally, Parodi provides a more refined definition (although based on many fewer tokens) of what constitutes Spanish narrative discourse than Biber et al. (2006); interestingly, Parodi’s analysis indicates that English and Spanish share many of the same narrative features. We present the first-known multidimensional analysis of L2 Spanish, studying how learners use various lexical and grammatical phenomena to generate IL-specific discourse types. We do not make a priori assumptions about which of these phenomena work in tandem nor do we assume that learner discourse types are the same as those of native-speakers. Our analysis serves as a first 6 ANALYSIS OF A WRITTEN L2 SPANISH CORPUS step towards characterizing learner discourse at the second and third years of Spanish L2 instruction. Research questions We characterize Spanish learner discourse via a multidimensional analysis of a corpus of L2 Spanish generated by second- and third-year, universitylevel learners. Specifically, the study addresses the following research questions: METHOD Corpus description We used a 202,241-word corpus of written Spanish, comprising edited and non-edited compositions collected from English-speaking Spanish learners at the second-year (109,224 words) and third-year (93,017 words), levels in which more variety of texts could be collected due to more exposure to language instruction. To estimate the written proficiency of each level of instruction based on the ACTFL Writing Proficiency Scale (ACTFL 2001), the researchers selected a random sample of 50 entire documents from each of the two instructional levels (N = 100). After a training session on working with the scale, the researchers rated the samples independently. Each level on the scale was assigned a numerical value, with 0 representing a novice low and 9 a superior rating. We estimated the inter-rater reliability of the subjects’ proficiency ratings with a Pearson correlation since the datasets represented interval scales [r(df = 98) 0.97; p < 0.01]. The second-year learners wrote at the intermediate high level and the third-year learners at the advanced low level. This suggests that while the third-year learners generally narrated and produced a limited number of cohesive devices as well as a variety of complex syntactic structures, the second-year learners were beginning to produce narrative structure, although their control of the verbal constructs was still developing. The second-year learners also produced few cohesive devices and limited subordination. The corpus comprises writing samples used for course assessment purposes: letters, narratives, descriptions, summaries, and argumentative essays, both in and out of class as well as on exams. Given the study’s exploratory and descriptive nature, we did not control the type of tasks or topics within the corpus. Topics related to textbook themes (e.g. family, childhood) and cultural readings. Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 1 How do lexical and grammatical phenomena cluster together in L2 Spanish writing? 2 What types of discourses (e.g. narrative, descriptive, hypothetical) surround the concentration of these lexical and grammatical features? Y. ASENCIÓN-DELANEY AND J. COLLENTINE 7 Procedures: tagging, searching and norming Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 Understanding multidimensional analyses requires knowledge of three key concepts: part-of-speech tagging, search pattern, and statistical techniques for eliminating various biases in how tokens are counted. To access information about particular texts, each file includes a header with information about its topic, source type, author biographical information, and purpose (argumentative essay, narrative). To search for morphosyntactic information (e.g. all adjectives, all subjunctive forms), one needs a part-ofspeech tagger: software that annotates every word with information about its major word classes (e.g. adjective, noun, verb, determiner), basic morphological information (e.g. plural, preterit), as well as its lemma (i.e. its unmarked, dictionary root, such as a verb’s infinitive or a noun’s masculine, singular form). Part-of-speech tagging requires a dictionary with lexical and grammatical information. It also requires a pretagged corpus to train the software routines to determine unknown or ambiguous words’ probable tags. We compiled our own dictionary, utilized a training set from samples from the Corpus del español (Biber et al. 2006), and tagged the corpus with n-gram software routines from the Natural Language Tool Kit (NLTK; http://www.nltk.org/). Additionally, we wrote Spanish-specific routines (e.g. clitic sequences, derivational morphemes) to complement the NLTK routines to achieve greater tagging precision. After the corpus is tagged in this way, the investigator must verify the accuracy of the tagging and fix errors through further programming. We studied 78 lexical, grammatical, and lexico-grammatical features.3 The features involved all parts of speech, common morphosyntactic constructs studied by learners, as well as additional constructs studied in Biber et al. (2006). They represented adjectives (e.g. derived, postnominal position), nouns (e.g. derived, feminine), adverbs (e.g. place, time), verb classes (e.g. imperfect aspect, past participle), verb phrases (e.g. communication, knowledge), and certain morphosyntactic features like dependent clauses, noun phrase configurations (e.g. article plus noun), and pronoun usage (e.g. clitic—third person). Textual frequencies often require two mathematical conversions. First, norming transforms a phenomenon’s count to its normed frequency. Since text lengths vary, longer texts inflate certain items’ importance. To offset this text-length bias one scales a phenomenon’s frequency per text, such as per 1,000 words. Second, normalizing eliminates the feature-concentration bias: some phenomena are naturally scarce in a document (e.g. the subjunctive) while others are naturally common (e.g. articles) (cf. Biber and Conrad 2001). Normalizing converts a normed frequency to its z-score value vis-à-vis its normed frequency in each document. Consequently, one can measure the relative presence of two or more linguistic features within any given text. Additionally, one can sum various z-scores to determine how concentrated a set of features occur in any text or group of texts. 8 ANALYSIS OF A WRITTEN L2 SPANISH CORPUS Factor analysis Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 To determine how learners cluster lexical and grammatical features, we employ an exploratory, principal factor analysis (PFA) (Biber and Conrad 2001). PFA identifies dimensions—also referred to as factors—along which the 78 features in Table 1 co-vary statistically. The result is a series of dimensions according to which one could classify the texts of a corpus and the features characterizing each dimension. Frequently, a factor may have two opposing clusters, which is why this technique is often referred to as a multidimensional analysis. Opposing clusters of features occur in complementary distribution within the corpus. The factors differ in terms of how much variation they account for. The dimension accounting for most variance represents the primary, superset factor according to which all texts can be classified. Dimensions that represent less variance are subset factors of the superset. A superset factor may identify the features of formal and informal language, with the remaining factors representing genres within the superset. Each factor has an eigenvalue, which represents its importance relevant to the others. The examination of factors’ eigenvalues in scree plots of each factor’s total variance helps to determine how many factors are important enough to report. In a scree plot, the eigenvalues for each factor typically flatten out at some point, which is an indication of relatively unimportant factors. The measurement of a feature’s importance within a factor is referred to as its loading, which takes the form of a correlation coefficient that represents how much the feature correlates with the cluster identified, varying from 0.00 or no correlation to !1.00 or absolute correlation. A dimension with two significant clusters will have two sets of loadings, a set of positive loadings and another set of negative loadings, differing mathematically in terms of their sign (although the direction of a cluster’s sign is irrelevant). Higher absolute loadings are more useful in the interpretation of a cluster’s communicative function. For a cluster to represent some meaningful discourse type, Biber (1988) recommends that it should have at least five loadings at or above the !0.30 cutoff. What is useful about this approach is that it identifies different discourse types in a corpus, and the loadings reveal the features that most represent those discourse types. Biber et al. (2006), for instance, found six dimensions in oral and written native-speaker Spanish: four had significant clusters of both positive and negative features; two had only one significant cluster. In the first (superset) dimension, the positive features (e.g. nouns, post-modifying adjectives, long words, high-type token ratio) represented semantically dense, literate discourse and the negative features’ (e.g. suasive nominal clauses, second-person pronouns, demonstratives) oral discourse. To obtain a factor analysis that includes the fewest, most representative features in each factor while considering idiosyncrasies of textual data, linguistic PFA results are commonly ‘rotated’ (Biber 1988). This mathematical technique maximizes high loading values and minimizes low values. Following Biber (1988), we rotated the factors using the Promax method. Y. ASENCIÓN-DELANEY AND J. COLLENTINE 9 Table 1: Targeted linguistic features Variable Noun Derived nouns (NOUN_DER) Feminine nouns (NOUN_FEM) Masculine nouns (NOUN_MSC) Non-pluralizing nouns, e.g. virus ‘virus’ (NOUN_NPL) Plural nouns (NOUN_PLR) Singular nouns (NOUN_SNG) Apocopated adjectives, e.g. buen ‘good’ (ADJ_SHRT) Derived adjectives (ADJ_DERV) Feminine adjectives (ADJ_FEM) Four-inflection adjectives (ADJ_TYP1) Masculine adjectives (ADJ_MSC) Plural adjectives (ADJ_PLR) Postmodifying adjectives (ADJ_POST) Premodifying adjectives (ADJ_PRE) Singular adjectives (ADJ_SNG) Two-inflection adjectives (ADJ_TYP2) Third-person clitics (PRON_3RD) Preverb clitics (CLT_PRE) se + 3s verb, e.g. se come ‘one eats’ (SE_VRB3S) Subject pronouns (PRON_SUB) Unplanned se (SE_UNPLANNED) Article—noun segments (ART_NOUN) Definite articles (DEFART) Possessive adjectives (DET_POSS) Possessive syntax (WO_POSS) Present participle modified by noun, e.g. agua hirviendo ‘boiling water’ (PS_NOUN) Third-person verb (VRB_3RD) Conditional (VRB_COND) Copula plus adjective (not in other predicate categories) (COP_ADJ) Evaluative predicates (PRD_EVAL) Future (VRB_FUT) Gustar-like verbs (VRB_GULI) Imperfect (VRB_IMP) Infinitive (VRB_INF) Adjective Pronoun Other noun phrase elements Verbs Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 Variable class 10 ANALYSIS OF A WRITTEN L2 SPANISH CORPUS Variable class Miscellaneous Infinitive not preceded by verb or article (VRB_INVA) Past participle (VRB_PS) Past perfect (VRB_PPRF) Past subjunctive (VRB_PSBJ) Perfect tense verb (VRB_PRFC) Periphrastic future, e.g. voy a comer ‘I’m going to eat’(VRB_PERI) Predicates of conclusions. e.g. es obvio ‘it is obvious’ (PRD_CONC) Predicates of probability (PRD_PROB) Present participle (VRB_PP) Preterit (VRB_PRET) Progressive aspect (VRB_PRGA) Saber/conocer (SAB_CON) Suasive predicates (PRD_SUAS) Suasive verbs (VRB_SUAS) Subjunctive (VRB_SBJ) Verbs of communication, reporting (VRB_COM) Verbs of conclusions (VRB_CONC) Verbs of knowledge; cognitive verbs (VRB_CONO) Verbs of observation (VRB_OBS) Verbs of probability (VRB_PROB) Adverb of time (ADV_TIME) Adverbs of intensity (ADV_INTS) Adverbs of manner (ADV_MAN) Adverbs of place (ADV_PLC) Adverbs of probability (ADV_PROB) Comparatives (COMPARE) Conjunctions followed by finite verb (CONJ_VRB) Irregular comparatives (COMP_IRR) Lexical density ratio (DEN_TOK)4 Long words (LONG)5 Prepositions (PREPS) Que subordinator (QUE) Type token ratio (TYP_TOK) Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 Adverbs Variable Y. ASENCIÓN-DELANEY AND J. COLLENTINE 11 Variable Adverbial clauses Adverbial conjunctions of mode, e.g. lo hice según me dijeron ‘I did it the way they told me’ (ADVCLS_M) Adverbial conjunctions of place, e.g. la casa donde vivo ‘the house where I live’ (ADVCLS_L) Adverbials clauses of time (ADVCLS_T) Causal adverbial conjunctions (ADVCLS_C) If clauses (ADVCLS_S) Purpose adverbial clauses, i.e. para que ‘so that’ (ADVCLS_P) Relative pronouns with pre-posed preposition (REL_EN) Relative pronouns with pre-posed preposition (REL_NO_EN) Relative clauses on subjects, e.g. la casa que está allá ‘the house that is there’(REL_SUBJ) Nominal clauses—non-subjunctive (NOM_IND) Nominal clauses—subjunctive (NOM_SUBJ) Adjective clauses Nominal clauses A factor’s meaning requires interpretation: What linguistic activity does this cluster of features represent? To aid in this interpretive process one sums the z-scores of a cluster’s significant loadings in order to easily identify the most representative texts of a factor or cluster. Then, in the qualitative analysis of our study, we examined the most representative texts using Biber et al. (2006) and other researchers’ (e.g. Longacre 1983) description of Spanish discourse types in order to determine the type of discourse portrayed by the cluster of linguistic features in those texts (see Figure 1 for distribution of words in the corpus by discourse type). RESULTS The following section summarizes the findings for the multidimensional analysis. First, we identify the five dimensions yielded by the statistical analysis. For each dimension, we present the lexical and grammatical features clustering for positive and negative loading sets of the dimension, and their communicative functions. Finally, differences between second and third year in each dimension are addressed. Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 Variable class 12 ANALYSIS OF A WRITTEN L2 SPANISH CORPUS Factors and dimensions The scree-plot analysis of the eigenvalues for the rotated factors of the PFA indicated that a five-factor solution was optimal, representing 20.2% of the shared variance (Table 2). Table 3 shows the feature clusters in the five factors with their loadings meeting the !0.30 cutoff. The following details the findings and our interpretations of the clusters’ discursive function, along with qualitative analysis of representative samples of the factors. Table 2: Eigenvalues of the rotated factor analysis Factors Eigenvalues Percent of variance Cumulative (%) 1 2 3 4 5 4.459 3.691 2.697 2.561 1.789 5.945 4.922 3.596 3.415 2.386 5.945 10.867 14.463 17.877 20.263 Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 Figure 1: Corpus word counts by discourse type VRB_IMP (0.56) VRB_PRGA (0.55) VRB_PRET (0.53) VRB_SBJ (0.42) PRON_3RD (0.35) VRB_CONO (0.35) VRB_PSBJ (0.34) VRB_COND (0.33) CLT_PRE (0.33) SAB_CON (0.30) Negative cluster (loadings) ART_NOUN ("0.65) NOUN_FEM ("0.56) NOUN_DER ("0.53) NOUN_PLR ("0.51) NOUN_MSC ("0.36) ADJ_TYP1 ("0.36) ADJ_POST ("0.35) DEFART ("0.31) NOUN_SNG ("0.31) ADJ_FEM ("0.30) 1 ADJ_DERV (0.71) ADJ_PLR (0.59) ADJ_SNG (0.52) ADJ_TYP1 (0.51) ADJ_MSC (0.47) TYP_TOK (0.44) ADJ_TYP2 (0.43) ADJ_POST (0.41) ADJ_FEM (0.35) LONG (0.30) NOUN_SNG ("0.40) DEN_TOK ("0.38) 2 5 VRB_INVA (0.36) VRB_INF (0.35) 4 PREPS (0.53) VRB_INVA (0.52) VRB_INF (0.49) ADJ_PLR (0.31) VRB_IMP ("0.45) ADJ_SNG ("0.59) DEFART ("0.53) VRB_PRGA ("0.42) ADJ_MSC ("0.44) ADJ_PLR ("0.40) PREPS ("0.30) VRB_3RD (0.63) QUE (0.55) SAB_CON (0.43) VRB_CONO (0.42) REL_NO_EN (0.38) VRB_PROB (0.30) 3 Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 Positive cluster (loadings) Factor Table 3: Summary of factors (i.e. dimensions) and significant clusters Y. ASENCIÓN-DELANEY AND J. COLLENTINE 13 14 ANALYSIS OF A WRITTEN L2 SPANISH CORPUS Factor 1: Narrative vs. expository prose Narrative. Era un dı́a lleno de sol y el aire lleno de aromas diferentes a flores silvestres.. . . Al salir de mi dormitorio, todo parecı́a normal. No habı́a ninguna nube que se pudiera ver en el cielo, entonces después de hacer esta observación continué con mi plan de salir de mi dormitorio. Comencé a correr por los caminitos designados, y como el dı́a todavı́a parecı́a muy bonito decidı́ meterme adentro del bosque. (It was a sunny day and the air filled with aromas of different wildflowers.. . . Upon leaving my room, everything seemed normal. There was no cloud that could be seen in the sky, then after making this observation I continued with my plan to leave my bedroom. I started to run around the designated trails and as the day still looked very nice I decided to get into the woods.) Figure 2 describes the text types that have the highest average summed z-scores for the features in the narrative cluster. Text types with the highest average summed z-score are most representative of the cluster; while those texts with the highest opposing (i.e. the opposing sign) are antithetical to the cluster’s function. Interestingly, not only do narrative texts have high positive scores in dimension 1, but also argumentative essays and summaries. Students frequently approached summaries and argumentation with narrative elements; perhaps to compensate for a lack of more sophisticated abilities, which has shown to be the case for L2 writers of English in expository writing (Hinkel 2004). For instance, one second-year learner wrote an argumentative text about stereotypes of the colonial times and used evidence from a story about a fray in the new land. Argumentative essay. El cuento de fraile Bartolomé y los indı́genas maya es una estereotipo de los tiempos de la conquista de América Central. También las relaciones malas entre las culturas europeas y indı́genas. El cura fue a guatemala para convertir a los indı́genas. El misionero Bartolomé pensaba que su cultura de españa serı́a más avanzado que los maya. En última instancia este pensamiento fue su último. (The story of Brother Bartholomew and the indigenous Maya is a stereotype of the time of the conquest of Central America. Also, the bad relations between indigenous and European cultures. The priest went to Guatemala to convert the Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 Factor 1 contained two significant clusters. The analysis indicates that secondand third-year learners’ written production tends to be either narrative or expository. The lexical and grammatical features in the positive set include eight verbal and two pronominal features, which constitute a narrative discourse. Two of the verbal features with the largest loadings (i.e. imperfect and preterit) are grammatical features used to present events and background descriptions in the past (Biber et al. 2006). Also, third-person pronouns and clitics are used to refer to presupposed participants or story protagonists in the narratives (Longacre 1983). Y. ASENCIÓN-DELANEY AND J. COLLENTINE 15 Indians. The missionary Bartolome thought that his culture from Spain would be more advanced than the Mayan. Ultimately this was his last thought.) In these narratives, the writers are present and involved. The cluster has grammatical variables including verbal features such as subjunctive, past subjunctive, conditional, and progressive aspect, the first two of which represent ‘irrealis’ modality (Biber et al. 2006), where learners describe feelings and attitudes towards possible eventualities. The presence of a lexical feature, such as knowledge verbs, indicates the writer’s perceptions (Weber and Bentivoglio 1991). These features personalize the narratives. Finally, it is also noteworthy that the presence of subjunctive forms in narratives indicates some degree of syntactic complexity in the form of subordination (Parodi 2007). The third-year learners trended towards using more narrative behaviors, whereas the third-year learners averaged a higher summed z-score on this cluster (M = 4.1; SD = 5.9) than the second-year learners (M = 3.0; SD = 5.9); the difference is notable but did not approach significance [F(1, 634) = 3.7; p = 0.06]. Dimension 1’s negative cluster included, exclusively, features associated with noun phrases. Discourse concentrated with nominal features such as nouns, articles, and adjectives is largely expository, where information is condensed into a few words (Biber et al. 2006). Such semantically dense discourse is often achieved through the use of multiple derivational morphemes Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 Figure 2: Concentration of narrative features by text type and averaged summed z-score 16 ANALYSIS OF A WRITTEN L2 SPANISH CORPUS (Biber 1988; Collentine 2004). Interestingly, learners in their second and third years already possess their own strategies for producing ‘literate’ discourse, some of which overlap with native-speaker features (cf. Biber et al. 2006). Figure 3 reports the average summed z-score of the expository features by text type, confirming our interpretation that the negative cluster of dimension 1 represents expository discourse. The expository cluster also occurs in mini-essay questions and descriptions, which the Expository essay section exemplifies. Expository essay. Los Estados Unidos es un paı́s bastante rico y los jóvenes tienen bastantes oportunidades en la vida. No obstante, la mitad de los jóvenes del Estados Unidos son malsanos y más y más tienen los problemas con hiperactividad y aprendizaje. Es obvio que muchos jóvenes están faltando los elementos necesarios del contentamiento y una vida saludable. (The United States is a rather rich country and young people have many opportunities in life. However, half of the youth of the United States are unhealthy and more and more have problems with Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 Figure 3: Concentration of expository features by text type and averaged summed z-scores Y. ASENCIÓN-DELANEY AND J. COLLENTINE 17 hyperactivity and learning. It is clear that many young people are lacking the necessary elements of a happy and healthy life.) The third-year learners produced more nominal discourse features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.4; SD = 4.7) than the second-year learners (M = 0.5; SD = 5.2), with the difference being statistically significant [F(1, 634) = 16.3; p # 0.00]. Factor 2: Descriptive expository prose Descriptive expository essay. El sueño americano se infiltra desde juventud, es evidente por televisión, escuela, la cultura y ejemplos del gobierno y polı́ticos. Esta influencia es subconsciente pero fuerte y se enseña el americano que la única cosa que se necesita hacer es trabaja fielmente y comprar las cosas correctas y eventualmente se recibirá la vida perfecta. Entonces, en la mente americano la definición del suceso es ser rico, ser bonito y joven y tener un montón de las cosas mejores y ricas. [The American dream infiltrates from youth, it is evident on television, school, culture and examples of government and politicians. This influence is subconscious but strong and they teach the American that the only thing that one needs to do is to work faithfully and buy the right things and eventually he will get the perfect life. Then, in the American mind the definition of event (sic: success) is to be rich, be beautiful and young and have a lot of the best and rich things.] Here we see a variety of uses of adjectives, such as nominalized adjectives (e.g. el americano ‘Americans’), descriptor chains (e.g. subconsciente pero fuerte ‘subconscious but strong’), and the copula ser or estar in predicative sentences. Furthermore, this sub-discourse type is probably more challenging to produce since the third-year learners produced more descriptive expository features. Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 The positive cluster of dimension 2—the only significant one—represents a discourse that is both expository and descriptive. Thus, to the extent that this cluster represents a sub-discourse of the expository type identified in dimension 1, it seems that these learners do use lexico-grammatical features (e.g. adjectives with different morphological information) reflecting descriptive elements in expository writing but not as much as narrative elements. This cluster also contains measures of lexical complexity: long words and type/ token ratio. Adjectives add informational density to learners’ written prose. Still, post-nominal adjectives predominate in texts with this cluster. Additionally, at these instructional levels, a low degree of inflectional accuracy (i.e. agreement errors) is common. Texts involving expository prose and description have large average summed z-scores. Figure 4’s similarity to Figure 3 corroborates that this cluster is a specialized expository discourse. The difference becomes more evident upon consideration of the Descriptive expository essay section, which was written by a third-year learner. 18 ANALYSIS OF A WRITTEN L2 SPANISH CORPUS The third-year learners averaged a higher summed z-score on this cluster (M = 2.8; SD = 5.3) than the second-year learners (M = "1.4; SD = 5.0), with the difference being statistically significant [F(1, 634) = 75.4; p # 0.00]. Learners at these levels exhibit some degree of morphological and syntactic sophistication when attempting to use lexico-grammatical features such as adjectives with inflections for number and gender in various morphological and syntactical (i.e. attributive and predicative roles) functions. Also, long words tend to be dense in derivational morphology and so, in content; a high type/token ratio indicates that various adjectives are employed in this discourse. Finally, the inclusion of several adjectival features reflects a slightly more developed syntax (Parodi 2007), since word order is a critical consideration with Spanish adjectives, which can be pre- or post-nominal, and descriptors are often chained together (i.e. X has qualities A, B, and C). The negative cluster of features—singular nouns and the lexical density measure—does not constitute a text type. Instead, it indicates what descriptive expository texts lack. Referential lexical items in the form of nouns are missing as well as a variety of parts of speech. Thus, descriptive expository texts lack overall specificity and contain numerous and varied adjectives. Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 Figure 4: Concentration of descriptive expository prose features by text type and averaged summed z-score Y. ASENCIÓN-DELANEY AND J. COLLENTINE 19 Factor 3: Expository prose with a stance The six positive features of dimension 3 are indicative of discourse where writers are reporting both their own thoughts about the veracity of eventualities as well as others’ perspectives. This cluster includes mostly lexical features that indicate writers’ stance (Biber et al. 2006), or their commitment to the truth of some proposition, taking the form of verbs of knowledge, cognitive verbs, and verbs of probability (cf. Collentine 1995). The use of a grammatical feature such as third-person verb inflections is also said to minimize the writer’s risk of misconstruing the reference in the discourse (Castellano 2000). The cluster also includes complex syntactic features: the subordinating conjunction que and relative pronouns without prepositions (i.e. que, quien, cual), which help the reader to identify references in the discourse. Figure 5’s detailing of the average summed z-scores by text type confirms the expository function of this cluster. Argumentative essay. El autor Mario Benedetti posiblemente quiere explicar la imaginación de los niños. Los niños no piensan lógicamente porque no Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 Figure 5: Concentration of expository prose with stance features by text type and averaged summed z-score 20 ANALYSIS OF A WRITTEN L2 SPANISH CORPUS Factors 4 and 5 The last two factors did not include enough lexical and grammatical features to meet the five-feature threshold above the !0.30 level. However, Factors 4 and 5 share some linguistic features. For both factors, some elements in nominal clauses (i.e. singular and masculine adjectives in Factor 4, and plural adjectives and definite articles in Factor 5) appeared in complementary distribution with infinitive verbs. This is probably evidence to support the notion that nominal and verbal elements appear in complementary distribution in these learners’ writing. DISCUSSION AND CONCLUSIONS The development of learners’ interlanguage can be seen as the acquisition of the ability to create meanings by combining lexical, sentential, pragmatic, and discourse features. This study describes whether and how Spanish learners use lexico-grammatical features to produce interlanguage-specific discourse types. In response to our initial research questions, we found that second- and third-year Spanish L2 learners’ lexical and grammatical features clustered Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 saben la realidad. Estaba la primera vez que Osvaldo mire la televisión. Cuando un niño o niña mira una persona mirando y hablando en su dirección, piensan que están hablando a ellos. Es una que nosotros no pensar, porque nos sabemos la realidad. También, los niños piensan que los personas en el televisor son sus amigos. [The author Mario Benedetti possibly wants to explain the imagination of children. Children do not think logically because they do not know reality. It was the first time Osvaldo watched TV. When a boy or a girl watches a person looking at or talking in their direction, they think they are talking to them. It is one that we do not to think (sic: think about), because we know the reality.] The writer here speculates about an author’s intention and argues for a specific motive by reporting on the thoughts and attitudes of the characters in the story. Dimension 3’s negative cluster only indicates what is missing where stance predominates. The features include lexical and grammatical narrative tools, namely, imperfect verbs, progressive aspect, and prepositions. These features describe the scene or the background context (e.g. time, place, people, and other co-occurring events) in which the main events occur, suggesting perhaps that these learners do not couple stances with narrative elements. Support for this is found in a comparison of the two levels of learners. The data indicate that expository prose with a stance occurs mostly in less proficient learners, which would explain why it is disassociated with certain narrative elements. The second-year learners averaged a higher summed z-score on this cluster (M = 3.9; SD = 4.5) than the third-year learners (M = 2.2; SD = 2.7), with the difference being statistically significant [F(1, 634) = 17.6; p # 0.00]. Y. ASENCIÓN-DELANEY AND J. COLLENTINE 21 Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 reliably into five factors that mainly combined verbal or nominal features. The analysis uncovered four significant clusters that can be considered distinct discourse types. The communicative functions of these discourse types break down into two main stylistic variations: narrative (characterized by verbal features) and expository (characterized by nominal features). Only 20% of the variance in learners’ interlanguage could be explained by the five factors while Biber et al.’s (2006) native-speaker multidimensional analysis accounted for 45% of the targeted corpus’ variance. Biber et al.’s (2006) clusters were more numerous and contained more features. And, although their native-speaker analysis included data from the written and oral modes, whereas this study included data from only the written mode, the L2 clusters possibly had few features because clustering of features of any kind develops slowly in the L2, and the psycholinguistic consolidation of productive associations requires time and practice (Collentine and Asención-Delaney 2010). If so, it should also not be surprising that the discourse types favored by Spanish learners at the second and third year of university-level instruction are novel in comparison with native-speaker clusters in a number of respects. Additionally, the fact that the clusters are relatively homogeneous in terms of the parts of speech they represent (i.e. either verbal or nominal) indicates that these learners depend on a limited repertoire of lexico-grammatical features to achieve their communicative goals. The L2 expository dimension constitutes a concentrated use of nominal features. This discourse type represents a certain discourse complexity since in Biber’s (2001) study of English texts and in Parodi’s (2007) study of Spanish texts, nominal features work together to generate semantically dense discourse that is informationally rich. This is corroborated by the observation that advanced-low third-year learners produced more expository discourse. However, the learner discourse type uncovered here differs from native-speaker expository discourse in that intermediate-high second-year learners produce narrative elements in argumentation, such as arguing for a point with an illustrative story. Future research may reveal whether the narrative elements constitute a compensatory strategy as reported by Hinkel (2004) with English learners and/or a function of the Spanish curriculum, which highly emphasizes verb morphology (i.e. conjugations for preterit and imperfect). The narrative dimension showed that Spanish learners’ narrative discourse is not limited to presenting an account of events in the past, but it also reflects learners’ personal speculations, feelings, and attitudes towards the events. This involvement may be due to task requirements or to beginner–intermediate learners’ tendency to produce the L2 to talk about themselves, which is typical in the Spanish L2 curriculum. Finally, the remaining two discourse types are probably more or less useful to L2 Spanish writers, depending on their level of development. The descriptive expository prose is probably a more sophisticated version of learner expository prose in general, since the advanced-low learners utilized it more. On 22 ANALYSIS OF A WRITTEN L2 SPANISH CORPUS NOTES 1 Morphological diversity occurs when a learner uses a variety rather than a small subset of the language’s inflectional and derivational morphemes (Howard 2002, 2006). Morphological complexity occurs where learners either employ words with multiple derivational morphemes or marked inflectional morphemes (e.g. subjunctive, conditional, future, past participle). 2 Just as digital technologies have altered various classroom practices, so has the ability to amass large corpora of digitized representations of the L2 changed how we design pedagogical grammars (O’Keeffe et al. 2007). Corpus-based data-driven learning strategies are beginning to be incorporated into pedagogical materials (Boulton 2009). 3 A lexical category is one identified by a search expression or measurement (e.g. lexical density ratio) focusing on the lexical properties of a word or segment, which primarily entails checking a word’s lemma. A grammatical category is one whose search expression focuses on the inflectional properties of a word or segment (e.g. person, Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 the other hand, the expository prose with a stance is peculiar in that it contains features associated with stance and subordination, and yet the second-year learners used it more than the third-year learners. It was, nevertheless, shown that this discourse type does not appear intermixed with other important feature sets, as it is disassociated with important narrative features. The second-year learners may only have been able to produce segments with these features and not simultaneously generate other discourse types. Clearly, the emergence of stance and how it is operationalized in L2 writing deserves a more fine-grained analysis in future research. The findings provide examples of the multiple ways that linguistic complexity occurs in the L2. Although Spanish learners’ discourse did not show signs of syntactic complexity (e.g. frequent use of relative clauses, subordinate clauses, use of clitics; cf. Wolfe-Quintero et al. 1998; Ortega 2000), the frequent use of nominal features affects informational density due to the presence of numerous derivational morphemes. Inflectional complexity in the form of marked forms was not predominant in the data set. Still, the learners’ verbal inflections did vary, which is a sign of L2 development occurring (Howard 2002, 2006; Collentine 2004; Marsden and David 2008). Regarding the study’s limitations, the multidimensional analysis would have benefited from the inclusion of spontaneous learner samples (e.g. emails, chat scripts) to understand L2 discourse where there are high online cognitive demands. Additionally, our data are limited to intermediate-high and advanced-low learners of Spanish. Finally, given the nature of the multidimensional analysis that examines what appears in communication, as opposed to what does not, the present analysis does not consider learner errors (Grant and Ginther 2000). Nevertheless, the data and the qualitative analyses provide complementary perspectives on how L2 communication occurs in relatively extended discourse where learners must consider numerous lexical, inflectional, derivational, and syntactic features. Y. ASENCIÓN-DELANEY AND J. COLLENTINE number, tense); it may also entail high-frequency functors whose role is largely syntactic (e.g. si ‘if’, articles, comparative constructions). A lexicogrammatical category is identified by both lexical and grammatical properties, such as feminine nouns, and adverbial conjunctions. 4 Lexical density is derived from the total number of non-functor words 23 (nouns + adjectives + verbs + derived adverbs) divided by the total words in a text. 5 Long words are defined here as any word whose length is greater than the mean word length of the corpus plus the standard deviation word length of the corpus, e.g. mean word length: 4.6 + SD: 4.9 = 10.5–11. American Council on the Teaching of Foreign Languages. 2001. ‘ACTFL Proficiency Guidelines – Writing, available at http://www.actfl.org/i4a/pages/index.cfm? pageid=3326. Accessed 27 September 2010. Bachman, L. 1990. Fundamental Considerations in Language Testing. Oxford University Press. Belz, J. 2004. ‘Learner corpus analysis and the development of foreign language proficiency,’ System 32/4: 577–97. Biber, D. 1988. Variation across Speech and Writing. Cambridge University Press. Biber, D. 2001. ‘On the complexity of discourse complexity: A multidimensional analysis’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 215–40. Biber, D. and S. Conrad. 2001. ‘Introduction: Multidimensional analysis and the study of register variation’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 3–13. Biber, D., M. Davies, J. Jones, and N. TracyVentura. 2006. ‘Spoken and written register variation in Spanish: A multi-dimensional analysis,’ Corpora 1: 1–37. Boulton, A. 2009. ‘Testing the limits of data-driven learning: Language proficiency and training,’ ReCALL 21/1: 37–54. British National Consortium. 2001. ‘BNC World [Database], available at http:// www.natcorp.ox.ac.uk/. Accessed 09 July 2009. Canale, M. and M. Swain. 1980. ‘Theoretical bases of communicative approaches to second language teaching and testing,’ Applied Linguistics 1/1: 1–47. Castellano, A. 2000. ‘Ambigüedad y variación del pronombre personal sujeto’ in M. Muñoz, G. Fernández, A. Rodrı́guez, and V. Benı́tez (eds): IV Congreso de Lingüı́stica General. Universidad de Cádiz, pp. 521–31. Cheng, C., H-C. Lu, and P. Giannakouros. 2008. ‘The uses of Spanish copulas by Chinese-speaking learners in a free writing task,’ Bilingualism: Language and Cognition 11/ 3: 301–17. Collentine, J. 1995. ‘The development of complex syntax and mood-selection abilities by intermediate-level learners of Spanish,’ Hispania 78/1: 123–36. Collentine, J. 2004. ‘The effects of learning contexts on morphosyntactic and lexical development,’ Studies in Second Language Acquisition 26/ 2: 227–48. Collentine, J. 2009. ‘Study abroad research: Findings, implications and future directions’ in M. Long and C. Doughty (eds): The Handbook of Language Teaching. WileyBlackwell Press, pp. 218–34. Collentine, J. and Y. Asención-Delaney. 2010. ‘A corpus-based analysis of the discourse functions of Ser/Estar+adjective in three levels of Spanish FL learners,’ Language Learning 60/2: 409–45. Ellis, N. C. 2002. ‘Reflections on frequency effects in language processing,’ Studies in Second Language Acquisition 24/2: 297–339. Ellis, N. C. 2006. ‘Language acquisition as rational contingency learning,’ Applied Linguistics 27/1: 1–24. Ellis, N. C., R. Simpson-Vlach, and C. Maynard. 2008. ‘Formulaic language in native and second language speakers: Psycholinguistics, corpus Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 REFERENCES 24 ANALYSIS OF A WRITTEN L2 SPANISH CORPUS Marsden, E. and A. David. 2008. ‘Vocabulary use during conversation: a cross-sectional study of development from year 9 to year 13 among learners of Spanish and French,’ Language Learning Journal 36/2: 181–98. Myles, F. 2005. ‘Interlanguage corpora and second language acquisition research,’ Second Language Research 21/4: 373–91. Myles, F. and R. Mitchell. 2005. ‘Using information technology to support empirical SLA research,’ Journal of Applied Linguistics 1/2: 169–96. O’Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom. Cambridge University Press. Ortega, L. 2000. Understanding Syntactic Complexity: The Measurement of Change in the Syntax of Instructed L2 Spanish Learners. Unpublished dissertation, University of Hawaii at Manoa. Parodi, G. 2007. ‘Variation across registers in Spanish: Exploring the El Grial PUCV Corpus’ in G. Parodi (ed.): Working with Spanish Corpora. Continuum, pp. 11–53. Skiba, R. and N. Dittmar. 1992. ‘Pragmatic, semantic, and syntactic constraints and grammaticalization: A longitudinal perspective,’ Studies in Second Language Acquisition 14/3: 323–49. Upton, T. A. and U. Connor. 2001. ‘Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,’ English for Specific Purposes 20/4: 313–29. Weber, E. and P. Bentivoglio. 1991. ‘Verbs of cognition in spoken Spanish: A discourse profile’ in S. Fleischman and L. Waugh (eds): Discourse Pragmatics and the Verb: Evidence from Romance. Routledge, pp. 194–213. Wolfe-Quintero, K., S. Inagaki, and S. Kim. 1998. ‘Second language development in writing: measures of fluency, accuracy and complexity’, Technical Report # 17. University of Hawai’i, Second Language Teaching and Curriculum Center. Downloaded from applij.oxfordjournals.org by guest on January 29, 2011 linguistics, and TESOL,’ TESOL Quarterly 42/3: 375–96. Forsberg, F. 2005. ‘Ready-to-speak’: Prefabricated sequences in spoken French as a second language and as a native language. [‘Prêt-à-parler’ séquences préfabriquées en français parlé L2 et L1],’ Cahiers De l’Institut De Linguistique De Louvain 31: 183–95. Geyer, N. 2007. ‘Self-qualification in L2 Japanese: An interface of pragmatic, grammatical, and discourse competences,’ Language Learning 57/3: 337–67. Givón, T. 1985. ‘Function, structure, and language acquisition’ in D. Slobin (ed.): The Cross Linguistic Study of Language Acquisition. Lawrence Erlbaum, pp. 1008–25. Granger, S., J. Hung, and S. Petch-Tyson. 2002. Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. John Benjamins. Grant, L. and A. Ginther. 2000. ‘Using computer-tagged linguistic features to describe L2 writing differences,’ Journal of Second Language Writing 9/2: 123–45. Hinkel, E. 2004. ‘Tense, aspect and the passive voice in L1 and L2 academic texts,’ Language Teaching Research 8/1: 5–29. Horn, L. and G. Ward. 2006. The Handbook of Pragmatics. Wiley-Blackwell. Howard, M. 2002. ‘Prototypical and non-prototypical marking in the advanced learner’s aspectuo-temporal system,’ EuroSLA Yearbook 2: 87–114. Howard, M. 2006. ‘Variation in advanced French interlanguage: A comparison of three (socio)linguistic variables,’ Canadian Modern Language Review 62/3: 379–400. Kasper, G. 2001. ‘Four perspectives on L2 pragmatic development,’ Applied Linguistics 22/4: 502–30. Klein, W. and C. Perdue. 1997. ‘The basic variety (Or: Couldn’t natural languages be much simpler?),’ Second Language Research 13/4: 301–47. Longacre, R. 1983. The Grammar of Discourse. Plenum. Y. ASENCIÓN-DELANEY AND J. COLLENTINE 25 NOTES ON CONTRIBUTORS Yuly Asención-Delaney is an Associate Professor of Spanish at Northern Arizona University. Her research interests include corpus linguistics, Spanish second language acquisition, second language writing, and second language assessment. Address for correspondence: Northern Arizona University, Modern Languages, Box 6004, Flagstaff, AZ 86011, USA. <[email protected]> Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Department at Northern Arizona University. His research interests include the study of L2 morphosyntax, the acquisition of mood, corpus linguistics, study abroad, and computer assisted language learning. Downloaded from applij.oxfordjournals.org by guest on January 29, 2011
© Copyright 2026 Paperzz