Conference of the International Association for World Englishes (IAWE)
World Englishes and World's Languages
December 3-5, 2008, City University of Hong Kong, Hong Kong SAR, China

Sentence complexity and clause linking in African academic writing: a critical empirical comparison

Josef Schmied
Chair English Language & Linguistics
Chemnitz University of Technology
www.tu-chemnitz.de/phil/english/schmied
[email protected]

IAWE08: Aims · Databases/Corpora · Concepts & Methods · Comparison:ICE · Comparison:Nordic · Outlook

Aims

• compare discourse features of African Englishes
• discuss key concepts of complexity and cohesion
• present a pilot study of new methodologies of analysis
  - a top-down/text-based automatic text-analysis of statistical complexity variables using ComplexAna
  - a bottom-up/item-based human analysis of different adverb-types
• use an old (ICE-EA) and a new corpus (NORDIC Journal) and compare results
• test standard hypotheses on differences between Kenyan and Tanzanian English and discuss East African English in a larger African perspective

1. Databases/Corpora

• International Corpus of English – East Africa (1990-96): a stratified corpus of English as a Second Language: 500 text-types of 2000 words each
• Nordic Journal of African Studies: English as an Academic lingua franca mainly for African scholars

Appendix 6: List of written texts from Tanzania (word count)

PRINTED

| Category | Sub-category | Text codes | Words |
|---|---|---|---|
| Informational: Learned | Humanities | W2A001T – W2A010T | 20,172 |
| | Social Science | W2A011T – W2A020T | 20,151 |
| | Natural Science | W2A021T – W2A027T | 20,114 |
| | Technology/Agriculture/Environmental dev. | W2A031T – W2A040T | 20,148 |
| | total | | 80,585 |
| Informational: Popular | Humanities | W2B001T – W2B010T | 20,133 |
| | Social Science | W2B011T – W2B020T | 20,223 |
| | Natural Science | W2B021T – W2B24T | 6,542 |
| | Technology/Agriculture/Small Industry | W2B031T – W2B040T | 20,065 |
| | General | W2BGEN1T - W2BGEN8T | 13,789 |
| | total | | 80,752 |
| Informational: Reportage | Splash | W2C001T - W2C0010T | 20,018 |
| | Reportage/Features | W2C011T - W2C020T | 20,139 |
| | total | | 40,157 |
| Instructional | administrative/regulatory | W2D001T - W2D010T | 20,120 |
| Persuasive | Institutional | W2E001T – W2E010T | 20,078 |
| | Personal Column | W2E011T – W2E020T | 20,125 |
| | total | | 40,203 |

2. Concepts & Methods

automatic corpus processing:
• POS tagging using Penn Treebank + TreeTagger
• ComplexAna: Complexity Analyser: morphosyntactic (type/token) and semantic (unknown words in WordNet) with flexible parameter weight

human corpus analysis:
• domain-specific adjuncts: linking, modal, evaluative, domain
• searches with AntConc for 181 adverbs

Table: Modal adjuncts in Huddleston/Pullum 2006:768

strong
i    assuredly, certainly, clearly, definitely, incontestably, indubitably, ineluctably, inescapably, manifestly, necessarily, obviously, patently, plainly, surely, truly, unarguably, unavoidably, undeniably, undoubtedly, unquestionably
ii   apparently, doubtless, evidently, presumably, seemingly
iii  arguably, likely, probably
iv   conceivably, maybe, perhaps, possibly
weak

Evaluative adjuncts in Huddleston/Pullum 2006:771

2.1. Automatic analysis with ComplexAna

• POS-tags texts
• counts types/tokens
• identifies nominal items
• processes stoplist(s)
• searches WordNet for nominals
• calculates single score of semantic complexity

2.2. Human analysis with AntConc

| Adverb | tag |
|---|---|
| accordingly | v3 |
| accurately | v8 |
| actually | v4 |
| additionally | link |
| admittedly | v4 |
| alternatively | link |
| analytically | s |
| apparently | m9 |
| artificially | s |
| astonishingly | m9 |
| asymmetrically | s |
| asymptotically | s |
| automatically | s |
| autonomously | s |
| basically | v2 |
| briefly | v2 |
| carefully | v9 |

Further column labels on the slide (no cell values survive): BW1A, BW2L, CM1L, CM2H, ET1C, ET2H, GH1L, KE1L, KE2E

3. Comparison: Ke and Tz in ICE-EA

Hypotheses:
• Complexity ICE-KE > ICE-TZ
• Clause linking by adjuncts ICE-KE > ICE-TZ
including a few surprises

3.1. Complexity of informational learned texts in ICE-EA

| COMPLEXITY variables | ldhumK | ldhumT | ldnatsK | ldnatsT | ldsocsK | ldsocsT | ldtechsK | ldtechsT | Σ KE | Σ TZ | Ø KE | Ø TZ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of Tokens | 22305 | 22378 | 22210 | 22264 | 22154 | 22678 | 22098 | 22386 | 88767 | 89706 | 22191.75 | 22426.5 |
| Number of Words | 19831 | 19785 | 19484 | 19496 | 19694 | 19727 | 19788 | 19751 | 78797 | 78759 | 19699.25 | 19689.75 |
| Maximum number of words in a sentence | 89 | 119 | 85 | 109 | 103 | 98 | 87 | 111 | 364 | 437 | 91 | 109.25 |
| Mean number of words in a sentence | 22 | 23 | 19 | 24 | 24 | 23 | 22 | 23 | 87.215 | 93.001 | 21.803787 | 23.250357 |
| Number of nouns in text | 5672 | 6039 | 6513 | 6252 | 6062 | 6251 | 6363 | 6569 | 24610 | 25111 | 6152.5 | 6277.75 |
| Number of nouns considered (not in stoplist) | 1941 | 1739 | 2137 | 1899 | 1797 | 1706 | 1746 | 1793 | 7621 | 7137 | 1905.25 | 1784.25 |
| Nouns known to WordNet (%) | 82.12 | 85.68 | 83.11 | 83.78 | 89.43 | 88.45 | 88.2 | 90.57 | 342.86 | 348.48 | 85.715 | 87.12 |
| Nouns unknown to WordNet (%) | 17.88 | 14.32 | 16.89 | 16.22 | 10.57 | 11.55 | 11.8 | 9.43 | 57.14 | 51.52 | 14.285 | 12.88 |
| Nouns not in frequency list (%) | 59.2 | 58.77 | 65.28 | 61.98 | 55.48 | 54.34 | 55.73 | 53.88 | 235.69 | 228.97 | 58.9225 | 57.2425 |
| Maximum length of a considered noun | 37 | 38 | 26 | 32 | 28 | 37 | 58 | 39 | 149 | 146 | 37.25 | 36.5 |
| Mean length of a considered noun | 7.84 | 7.82 | 7.62 | 7.77 | 7.86 | 8.01 | 7.78 | 7.85 | 31.1 | 31.45 | 7.7750675 | 7.8624543 |
| Number of commas | 803 | 781 | 841 | 815 | 878 | 1054 | 793 | 874 | 3315 | 3524 | 828.75 | 881 |
| Max. number of commas/sentence | 8 | 7 | 11 | 7 | 12 | 16 | 10 | 12 | 41 | 42 | 10.25 | 10.5 |
| Max. degree of noun specification | 16 | 16 | 18 | 16 | 16 | 15 | 16 | 16 | 66 | 63 | 16.5 | 15.75 |
| Degree of Semantic Specialization of text | 8.27 | 8.17 | 8.35 | 8.33 | 8.36 | 8.31 | 8.18 | 8.31 | 33.162 | 33.122 | 8.2906088 | 8.2805363 |
| Degree of Semantic Difficulty | 21.95 | 21.47 | 23.56 | 22.31 | 20.42 | 20.07 | 20.82 | 20.07 | 86.746 | 83.923 | 21.686461 | 20.980642 |

hypo supported: KE more complex

Complexity of informational popular texts in ICE-EA

| COMPLEXITY variables | ppgenT | pphumK | pphumT | ppnatsK | ppnatsT | ppsocsK | ppsocsT | pptechK | pptechT | Σ KE | Σ TZ | Ø KE | Ø TZ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of Tokens | 15156 | 22382 | 22965 | 22217 | 7246 | 22249 | 22163 | 21958 | 22037 | 88806 | 89567 | 22202 | 17913 |
| Number of Words | 13487 | 19713 | 19784 | 19608 | 6390 | 19640 | 19487 | 19665 | 19429 | 78626 | 78577 | 19657 | 15715 |
| Maximum number of words in a sentence | 82 | 84 | 96 | 85 | 66 | 80 | 110 | 84 | 72 | 333 | 426 | 83.25 | 85.2 |
| Mean number of words in a sentence | 23 | 22 | 21 | 20 | 23 | 21 | 24 | 22 | 23 | 84.673 | 114.28 | 21.17 | 22.86 |
| Number of nouns in text | 4172 | 5927 | 6176 | 5714 | 2129 | 5885 | 6211 | 6192 | 6945 | 23718 | 25633 | 5929.5 | 5126.6 |
| Number of nouns considered (not in stoplist) | 1619 | 2510 | 2729 | 1986 | 982 | 2184 | 1807 | 1929 | 1943 | 8609 | 9080 | 2152.25 | 1816 |
| Nouns known to WordNet (%) | 85.48 | 78.92 | 75.45 | 87.71 | 84.42 | 86.26 | 82.01 | 87.92 | 79.26 | 340.81 | 406.62 | 85.20 | 81.32 |
| Nouns unknown to WordNet (%) | 14.52 | 21.08 | 24.55 | 12.29 | 15.58 | 13.74 | 17.99 | 12.08 | 20.74 | 59.19 | 93.38 | 14.7975 | 18.676 |
| Nouns not in frequency list (%) | 55.4 | 63.98 | 65.52 | 61.08 | 60.49 | 58.7 | 59.21 | 58.74 | 61.81 | 242.5 | 302.43 | 60.63 | 60.49 |
| Maximum length of a considered noun | 53 | 42 | 43 | 33 | 24 | 31 | 33 | 33 | 41 | 139 | 194 | 34.75 | 38.8 |
| Mean length of a considered noun | 7.49 | 7.16 | 7.13 | 7.33 | 7.40 | 7.54 | 7.34 | 7.43 | 7.51 | 29.462 | 36.87 | 7.37 | 7.37 |
| Number of commas | 523 | 986 | 1183 | 829 | 299 | 959 | 806 | 782 | 877 | 3556 | 3688 | 889 | 737.6 |
| Max. number of commas/sentence | 9 | 8 | 13 | 11 | 8 | 9 | 10 | 11 | 12 | 39 | 52 | 9.75 | 10.4 |
| Max. degree of noun specification | 16 | 15 | 15 | 17 | 17 | 17 | 17 | 17 | 18 | 66 | 83 | 16.5 | 16.6 |
| Degree of Semantic Specialization of text | 8.40 | 8.44 | 8.56 | 8.36 | 8.37 | 8.44 | 8.39 | 8.37 | 8.42 | 33.606 | 42.14 | 8.40 | 8.43 |
| Degree of Semantic Difficulty | 21.06 | 23.05 | 23.76 | 22.05 | 22.17 | 21.73 | 22.26 | 21.56 | 23.43 | 88.389 | 112.69 | 22.10 | 22.54 |

hypo doubted: TZ more complex! less professional journalists?

3.2. Clause linkers

Clause linkers ICE-EA spoken

The first two column pairs are the dialogue sub-categories DiaPriv and DiaPub; the source does not preserve which pair is which, so they are labelled Dia-1 and Dia-2.

| | Dia-1 KE | Dia-1 TZ | Dia-2 KE | Dia-2 TZ | Dialog KE | Dialog TZ | MonoScri KE | MonoScri TZ | spoken KE | spoken TZ |
|---|---|---|---|---|---|---|---|---|---|---|
| link | 109 | 73 | 22 | 1 | 131 | 74 | 111 | 57 | 242 | 131 |
| Σm1 | 1 | 0 | 1 | 0 | 2 | 0 | 2 | 0 | 4 | 0 |
| Σm3 | 6 | 2 | 0 | 0 | 6 | 2 | 2 | 6 | 8 | 8 |
| Σm6 | 13 | 3 | 0 | 1 | 13 | 4 | 8 | 9 | 21 | 13 |
| Σm7 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| Σm9 | 49 | 18 | 16 | 0 | 65 | 18 | 31 | 41 | 96 | 59 |
| modal | 70 | 23 | 17 | 1 | 87 | 24 | 43 | 56 | 130 | 80 |
| spec | 17 | 12 | 4 | 2 | 21 | 14 | 27 | 11 | 48 | 25 |
| Σv1 | 81 | 32 | 11 | 0 | 92 | 32 | 79 | 45 | 171 | 77 |
| Σv2 | 52 | 13 | 8 | 3 | 60 | 16 | 13 | 11 | 73 | 27 |
| Σv3 | 45 | 24 | 13 | 2 | 58 | 26 | 29 | 24 | 87 | 50 |
| Σv4 | 216 | 23 | 50 | 5 | 266 | 28 | 20 | 56 | 286 | 84 |
| Σv5 | 73 | 38 | 13 | 3 | 86 | 41 | 62 | 38 | 148 | 79 |
| Σv6 | 9 | 4 | 0 | 0 | 9 | 4 | 17 | 6 | 26 | 10 |
| Σv7 | 335 | 120 | 78 | 14 | 413 | 134 | 142 | 183 | 555 | 317 |
| Σv8 | 29 | 20 | 4 | 3 | 33 | 23 | 34 | 42 | 67 | 65 |
| Σv9 | 66 | 32 | 6 | 7 | 72 | 39 | 29 | 22 | 101 | 61 |
| eval total | 906 | 306 | 183 | 37 | 1089 | 343 | 425 | 427 | 1514 | 770 |
| total | 1102 | 414 | 226 | 41 | 1328 | 455 | 606 | 551 | 1934 | 1006 |

v4: 215 actually
v7: 268 only

hypo supported: more clause linkers in KE

Clause linkers ICE-EA written

Column order of the five printed sub-categories is inferred from the text-category sequence.

| | NonPrinted KE | NonPrinted TZ | InfLearned KE | InfLearned TZ | InfPop KE | InfPop TZ | InfReport KE | InfReport TZ | Creative KE | Creative TZ | Persuasiv KE | Persuasiv TZ | Printed KE | Printed TZ | written KE | written TZ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| link | 37 | 157 | 95 | 88 | 59 | 66 | 15 | 21 | 20 | 21 | 21 | 17 | 224 | 213 | 261 | 370 |
| Σm1 | 0 | 0 | 4 | 7 | 3 | 1 | 1 | 0 | 2 | 3 | 2 | 0 | 13 | 11 | 13 | 11 |
| Σm3 | 1 | 7 | 3 | 7 | 4 | 2 | 0 | 0 | 2 | 0 | 3 | 0 | 12 | 9 | 13 | 16 |
| Σm6 | 9 | 8 | 14 | 43 | 20 | 7 | 3 | 9 | 9 | 7 | 0 | 0 | 52 | 66 | 61 | 74 |
| Σm7 | 0 | 1 | 1 | 2 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 1 | 3 | 5 | 3 | 6 |
| Σm9 | 17 | 27 | 19 | 26 | 18 | 24 | 12 | 10 | 27 | 35 | 12 | 14 | 96 | 109 | 113 | 136 |
| modal | 27 | 43 | 41 | 85 | 45 | 34 | 16 | 19 | 42 | 47 | 17 | 15 | 176 | 200 | 203 | 243 |
| specific | 14 | 13 | 17 | 18 | 9 | 8 | 1 | 6 | 4 | 12 | 2 | 4 | 39 | 48 | 53 | 61 |
| Σv1 | 26 | 48 | 52 | 58 | 58 | 72 | 34 | 24 | 28 | 47 | 42 | 36 | 220 | 237 | 246 | 285 |
| Σv2 | 6 | 8 | 17 | 12 | 7 | 10 | 2 | 3 | 2 | 4 | 6 | 2 | 35 | 31 | 41 | 39 |
| Σv3 | 5 | 46 | 28 | 31 | 18 | 26 | 7 | 16 | 17 | 9 | 4 | 0 | 84 | 82 | 89 | 128 |
| Σv4 | 28 | 26 | 19 | 10 | 15 | 15 | 3 | 3 | 12 | 10 | 4 | 6 | 54 | 44 | 82 | 70 |
| Σv5 | 32 | 72 | 65 | 56 | 68 | 56 | 21 | 15 | 22 | 28 | 12 | 19 | 201 | 174 | 233 | 246 |
| Σv6 | 3 | 11 | 5 | 4 | 2 | 4 | 0 | 2 | 0 | 2 | 8 | 3 | 16 | 15 | 19 | 26 |
| Σv7 | 136 | 204 | 89 | 115 | 120 | 123 | 80 | 70 | 53 | 90 | 79 | 89 | 428 | 487 | 564 | 691 |
| Σv8 | 27 | 29 | 32 | 38 | 42 | 29 | 15 | 17 | 18 | 22 | 13 | 14 | 130 | 120 | 157 | 149 |
| Σv9 | 18 | 24 | 23 | 29 | 24 | 19 | 3 | 3 | 28 | 21 | 9 | 15 | 94 | 87 | 112 | 111 |
| evaluative | 281 | 468 | 330 | 353 | 354 | 354 | 165 | 153 | 180 | 233 | 177 | 184 | 1262 | 1277 | 1543 | 1745 |
| total | 359 | 681 | 483 | 544 | 467 | 462 | 197 | 199 | 246 | 313 | 217 | 220 | 1701 | 1738 | 2060 | 2419 |

more linkers!!

4. Comparison: Ke and Tz in NordicJ

Hypotheses:
• Complexity ICE-KE > ICE-TZ
• Clause linking by adjuncts ICE-KE > ICE-TZ
including a few surprises

4.1. Complexity of KE/TZ articles in NordicJournal Corpus

| ComplexAna Scores | KE01h | KE02h | TZ01h | TZ02h | CM all | UK01h | mean21 |
|---|---|---|---|---|---|---|---|
| Tokens | 10279 | 6000 | 7882 | 4449 | 8355.3 | 5709 | 7552 |
| Types | 8650 | 5092 | 6690 | 3792 | 7063.6 | 4780 | 6372 |
| Max. words per sentence | 92 | 96 | 86 | 69 | 118.75 | 128 | 101 |
| Mean words per sentence | 15.99 | 21.67 | 18.23 | 19.35 | 21.25 | 20.43 | 20.49 |
| nouns | 3099 | 1732 | 1822 | 1350 | 2524.4 | 1958 | 2262.43 |
| nouns considered | 1062 | 616 | 730 | 576 | 969.81 | 1958 | 987.76 |
| nouns known to WordNet (%) | 68.64 | 83.93 | 86.03 | 72.22 | 78.56 | 70.84 | 78.36 |
| nouns unknown to WordNet (%) | 31.36 | 16.07 | 13.97 | 27.78 | 21.44 | 29.16 | 21.64 |
| nouns not in frequency list (%) | 64.97 | 52.11 | 52.74 | 63.54 | 62.07 | 66.70 | 60.12 |
| Max. length noun | 63 | 77 | 18 | 24 | 27 | 21 | 33.17 |
| Mean length noun | 6.71 | 7.54 | 7.07 | 7.13 | 7.18 | 5.64 | 6.95 |
| commas | 375 | 219 | 437 | 173 | 397.06 | 208 | 360.28 |
| Max. commas in a sentence | 7 | 9 | 8 | 19 | 15.688 | 9 | 13.99 |
| Max. degree sem. specialization of a noun | 14 | 14 | 15 | 15 | 14.875 | 14 | 14.99 |
| Sem. specialization of the text | 8.28 | 8.45 | 8.24 | 8.09 | 8.26 | 8.34 | 8.24 |
| Degree of Semantic Difficulty | 24.13 | 20.24 | 19.77 | 23.31 | 22.44 | 23.86 | 22.16 |

4.2. Clause Linkers

Conjuncts in the NordicJournal Corpus

| ClauseLink | KE01h | KE02h | TZ01h | TZ02h | CMall16 | UK01h | mean22 |
|---|---|---|---|---|---|---|---|
| but | 24 | 8 | 66 | 30 | 15 | 6 | 19.1 |
| although | 12 | 6 | 4 | 2 | 4 | 1 | 3.5 |
| while | 48 | 4 | 4 | 6 | 5 | 7 | 13.8 |
| if | 12 | 2 | 20 | 8 | 7 | 5 | 14.2 |
| whether | 8 | 4 | 4 | 2 | 1 | 6 | 4.2 |
| because | 4 | 4 | 50 | 10 | 7 | 7 | 11.8 |
| in order to | | | | | 2 | | 4.0 |
| since | 18 | 30 | 10 | | 5 | 4 | 7.6 |
| sum conjuncts | 126 | 58 | 158 | 58 | 43 | 36 | 67.8 |

Adjuncts in the NordicJournal Corpus

Adjuncts searched: firstly, secondly, on the one hand, on the other hand, finally, lastly, also, furthermore, however, moreover, similarly, nevertheless, though, yet, anyway, otherwise, accordingly, consequently, therefore, thus

| ClauseLink | KE01h | KE02h | TZ01h | TZ02h | CMall16 | UK01h | mean22 |
|---|---|---|---|---|---|---|---|
| sum adjuncts | 188 | 90 | 40 | 54 | 41 | 21 | 66.7 |
| sum conjuncts+adjuncts | 314 | 148 | 198 | 112 | 84 | 57 | 134.5 |

The per-adjunct cell values cannot be realigned with their rows; in source order they read: 4 4 4 / 6 60 38 10 68 22 14 16 2 10 / 3 1 1 1 1 1 2.4 4.0 14 2 7 7 9 3 2 / 4 / 2 2 1.5 1.3 1.0 1.5 4.0 2.0 1 4 2.0 4.7 4 2 2 2 2 2 3.0 2.0 5.8 2.6 2 3 6 4 1 1 2.4 12.9 6.6 / 2 38 22 4 8 8 / 1 / 27.0 2.0 11.8 2 8 1 2

5. Outlook

practical:
• expand the data base to achieve significance: NORDICJournal and ICEweb
• compare varieties in ICE, incl. diachronic and text-type variation
• use similar methodology for Specialised and Popular Academic English (SPACE) Corpus

theoretical:
• sequence of ESL author preferences in New Englishes, e.g. cohesion from formal explicit to semantic implicit
• reader adaptation where possible and necessary
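
Illustration for 2.1 (surface complexity variables). The following is a minimal sketch, not ComplexAna itself, of how some of the surface variables reported in the complexity tables (tokens, words, words per sentence, commas, type/token ratio) could be counted for one text. The function names `surface_complexity` and `variety_mean`, the regular-expression tokenisation, and the sentence-splitting rule are all assumptions made for illustration; the noun-based and WordNet-based variables would additionally need a POS tagger and a lexical database and are left out.

```python
import re

# Assumption: a word is a run of ASCII letters, optionally joined by - or '.
WORD = r"[A-Za-z]+(?:[-'][A-Za-z]+)*"


def surface_complexity(text: str) -> dict:
    """Return a few surface complexity variables for one text (illustrative only)."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Assumption: tokens are words plus every other single non-space character.
    tokens = re.findall(WORD + r"|[^\sA-Za-z]", text)
    words = [t for t in tokens if re.fullmatch(WORD, t)]
    words_per_sentence = [len(re.findall(WORD, s)) for s in sentences]
    commas_per_sentence = [s.count(",") for s in sentences]
    types = {w.lower() for w in words}
    return {
        "tokens": len(tokens),
        "words": len(words),
        "max_words_per_sentence": max(words_per_sentence, default=0),
        "mean_words_per_sentence": (
            sum(words_per_sentence) / len(sentences) if sentences else 0.0
        ),
        "commas": text.count(","),
        "max_commas_per_sentence": max(commas_per_sentence, default=0),
        "type_token_ratio": len(types) / len(words) if words else 0.0,
    }


def variety_mean(scores: list[dict], variable: str) -> float:
    """Average one variable over the texts of a variety (the Ø columns)."""
    return sum(s[variable] for s in scores) / len(scores)


if __name__ == "__main__":
    sample = "English is used widely. However, in Kenya, it is used more often, perhaps."
    print(surface_complexity(sample))
    # Comma counts of the four Kenyan learned texts from the table in 3.1.
    kenya = [{"commas": n} for n in (803, 841, 878, 793)]
    print(variety_mean(kenya, "commas"))
```

Applied to the four Kenyan comma counts from the table in 3.1, `variety_mean` gives 828.75, the Ø KE value reported there. A real replication would replace the naive tokeniser with the tagger output.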
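
Illustration for 2.2 (item-based adverb counts). The sketch below shows one way the tagged adverb list could be turned into the fine-grained (Σv4, Σm9, ...) and coarse (link, modal, specific, evaluative) counts of the clause-linker tables. It is not the AntConc workflow used in the study: the dictionary holds only a subset of the adverbs shown on the slide, the names `ADVERB_TAGS`, `top_category` and `count_adverbs` are hypothetical, and the grouping of "m" tags as modal, "v" tags as evaluative and "s" as specific is an assumption read off the row labels.

```python
import re
from collections import Counter

# Subset of the adverb/tag list from the slide (the full list has 181 adverbs).
ADVERB_TAGS = {
    "accordingly": "v3",
    "accurately": "v8",
    "actually": "v4",
    "additionally": "link",
    "admittedly": "v4",
    "alternatively": "link",
    "analytically": "s",
    "apparently": "m9",
    "basically": "v2",
    "briefly": "v2",
    "carefully": "v9",
}


def top_category(tag: str) -> str:
    """Map a fine-grained tag to a coarse category (assumed grouping)."""
    if tag == "link":
        return "link"
    if tag == "s":
        return "specific"
    return {"m": "modal", "v": "evaluative"}[tag[0]]


def count_adverbs(text: str, tags: dict = ADVERB_TAGS) -> tuple[Counter, Counter]:
    """Count listed adverbs in a text by fine tag and by coarse category."""
    fine = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        tag = tags.get(word)
        if tag is not None:
            fine[tag] += 1
    coarse = Counter()
    for tag, n in fine.items():
        coarse[top_category(tag)] += n
    return fine, coarse


if __name__ == "__main__":
    sample = (
        "Actually, the results are apparently robust. Additionally, we briefly "
        "and carefully tested them; actually twice."
    )
    fine, coarse = count_adverbs(sample)
    print(dict(fine))
    print(dict(coarse))
```

Matching on word forms alone ignores context, so every hit is counted even where the adverb does not function as an adjunct; a human check of the concordance lines would still be needed.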