CHI-SQUARE TEST
John Fry
Boise State University
Linguistics 497: Corpus Linguistics, Spring 2011, Boise State University


The one-sample t test

• One common statistic for hypothesis testing is the t statistic

    $$t = \frac{\bar{x} - \mu}{\sqrt{s^2 / N}}$$

• The t test looks at the mean x̄ and variance s² of a sample of size N
• The null hypothesis is that the sample is drawn from a population with mean µ (that is, we expect x̄ ≈ µ)
• If |t| is large enough, we can reject the null hypothesis and conclude that the sample is not drawn from that population


Welch’s two-sample t test

Welch’s two-sample t test compares the means of two samples

    $$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}}$$


Welch’s two-sample t test in R

    > t.test(c(72,73,76,76,78),c(67,72,76,76,84))

            Welch Two Sample t-test

    data:  c(72, 73, 76, 76, 78) and c(67, 72, 76, 76, 84)
    t = 0, df = 5.202, p-value = 1
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -7.622409  7.622409
    sample estimates:
    mean of x mean of y
           75        75


Pearson’s χ² (‘chi square’) test

• The most popular hypothesis test in corpus linguistics is the χ² (‘chi square’) test
• The χ² test compares a set of observed frequencies O with a set of expected frequencies E

    $$\chi^2 = \sum \frac{(O - E)^2}{E}$$

• If the difference between observed and expected frequencies is large, we can reject the null hypothesis of independence

    H0: χ² = 0
    H1: χ² > 0


Differences between t and χ² tests

• The t test compares the means of continuous (interval or ratio) variables (e.g., height, weight, rainfall)
• The χ² test is for the observed frequencies of nominal (categorical) variables (e.g., male vs. female)
• The t test assumes that the population is normally distributed
  Normality is a reasonable assumption in many empirical sciences, but probably not corpus linguistics (cf. Zipf’s Law)


χ² example: phrasal verbs

• In phrasal verbs, the object (O) and particle (P) can alternate

    VPO construction    brought back the book
    VOP construction    brought the book back

• Is one construction more common than the other?
• Say we looked in a large corpus and found the VOP pattern (e.g., brought the book back) is more frequent

                VPO    VOP
    Observed    194    209

  Are these results statistically significant?


χ² test in R

• Null hypothesis: both constructions are equally frequent
• Running the χ² test in R

    > chisq.test(c(194, 209))

            Chi-squared test for given probabilities

    data:  c(194, 209)
    X-squared = 0.5583, df = 1, p-value = 0.4549

• Interpretation: The difference is not statistically significant (χ² = 0.56; df = 1; p = 0.455), so we cannot reject the null hypothesis that the two constructions are equally frequent in the population for which the sample is representative


χ² test in R

• Note that chisq.test only needs the vector of observed frequencies; it computes the expected frequencies itself
• Use str to see the structure of the test result (the expected frequency 201.5 is displayed rounded to 202)

    > str(chisq.test(c(194, 209)))
    List of 8
     $ statistic: Named num 0.558
      ..- attr(*, "names")= chr "X-squared"
     $ parameter: Named num 1
      ..- attr(*, "names")= chr "df"
     $ p.value  : num 0.455
     $ method   : chr "Chi-squared test for given probabilities"
     $ data.name: chr "c(194, 209)"
     $ observed : num [1:2] 194 209
     $ expected : num [1:2] 202 202
     $ residuals: num [1:2] -0.528 0.528
     - attr(*, "class")= chr "htest"


χ² example: word frequencies in Moby Dick

    # Read in Moby Dick and tokenize it
    > moby <- scan(what="c", sep="\n", file="melville-moby_dick.txt")
    Read 19252 items
    > moby <- tolower(moby)
    > words <- unlist(strsplit(moby, "\\W+"))
    > tokens <- words[words != ""]

    # Is the observed frequency of each word type in Moby Dick
    # essentially the same?
    > chisq.test(table(tokens))

            Chi-squared test for given probabilities

    data:  table(tokens)
    X-squared = 33580949, df = 16873, p-value < 2.2e-16

    # We reject the null hypothesis that all word types are
    # equally frequent. Duh!


2 × 2 contingency tables

• Another form of the χ² test is for a 2 × 2 contingency table, containing observed frequencies W, X, Y, and Z

                 Condition C    Condition ¬C
    Result R          W              X
    Result ¬R         Y              Z

• The conditions and results are both categorical variables, such as ‘treatment’ vs. ‘placebo’, or ‘male’ vs. ‘female’
• For such a 2 × 2 contingency table, we compute χ² as

    $$\chi^2 = \frac{(W + X + Y + Z)(WZ - XY)^2}{(W + X)(W + Y)(X + Z)(Y + Z)}$$


Example: treatment vs. placebo

• Step 1: assemble our data into a 2 × 2 contingency table

                                  Placebo    Treatment
    Number of people cured           25          35
    Number of people not cured       60          51

• Step 2: calculate χ² as follows:

    $$\chi^2 = \frac{(25 + 35 + 60 + 51)(25 \times 51 - 35 \times 60)^2}{(25 + 35)(25 + 60)(35 + 51)(60 + 51)} = 2.39$$

• Step 3: determine significance
  If χ² > 3.841, then we can reject the null hypothesis with 95% confidence (p < 0.05)


Interpreting χ² results

• χ² table for a 2 × 2 contingency table:

    α     0.5      0.10     0.05     0.02     0.01     0.001
    χ²    0.455    2.706    3.841    5.412    6.635    10.827

• In the placebo example, χ² = 2.39, which is too small; we cannot reject the null hypothesis
• The p-value is the probability of obtaining a result at least as extreme as a given data point, under the null hypothesis
• One rejects the null hypothesis only if p is smaller than or equal to a previously chosen significance level α


Treatment vs. placebo data in R

• We put the data in the form of a 2 × 2 matrix

    > chisq.test(matrix(c(25,60,35,51), nrow=2))

            Pearson's Chi-squared test with Yates' continuity correction

    data:  matrix(c(25, 60, 35, 51), nrow = 2)
    X-squared = 1.9208, df = 1, p-value = 0.1658

• Without the correction, results match our hand calculation

    > chisq.test(matrix(c(25,60,35,51), nrow=2), correct=FALSE)

            Pearson's Chi-squared test

    data:  matrix(c(25, 60, 35, 51), nrow = 2)
    X-squared = 2.3906, df = 1, p-value = 0.1221


Interpreting χ² results (continued)

• The standard significance level used in the social sciences is α = 0.05, but corpus linguists often use the stricter α = 0.01
• One quasi-standard way of reporting results:

    p < 0.05     ‘significant’
    p < 0.01     ‘very significant’
    p < 0.001    ‘highly significant’


Example: heaviness of subject NPs

• Aarts (1971) (ch. 4) examined the ‘heaviness’ of NPs that occur in Subject position in the SEU corpus
  – ‘Light’ NPs include N, Det N, pronouns, names
  – ‘Heavy’ NPs contain adjectives, PPs, and other modifiers
• The null hypothesis is that there is no difference; subject NPs are no lighter or heavier than non-subject NPs
• Aarts (1971) Table 4.10 (p. 45):

                Subject position    Non-subject position
    Light NP         6749                 4770
    Heavy NP         1160                 4331


Example: heaviness of subject NPs

• Run the χ² test in R

    > chisq.test(matrix(c(6749,1160,4770,4331), nrow=2))

            Pearson's Chi-squared test with Yates' continuity correction

    data:  matrix(c(6749, 1160, 4770, 4331), nrow = 2)
    X-squared = 2096.486, df = 1, p-value < 2.2e-16

• The χ² value for this table is enormous (p < 0.001), which means we can confidently reject the null hypothesis and conclude that subjects are ‘lighter’


Phrasal verbs and concreteness

• In phrasal verbs, the object (O) and particle (P) can alternate

    VOP construction    brought the book back
    VPO construction    brought back the book

• Gries (2003) looked at whether the object (O) was abstract (like peace) or concrete (like book)

    Object    Abstract    Concrete
    VPO         125          69
    VOP          64         145

    > chisq.test(matrix(c(125,64,69,145), nrow=2))

            Pearson's Chi-squared test with Yates' continuity correction

    data:  matrix(c(125, 64, 69, 145), nrow = 2)
    X-squared = 44.8365, df = 1, p-value = 2.142e-11


Other uses for the χ² statistic

• Finding translation pairs in aligned corpora

               cow     ¬cow
    vache       59        6
    ¬vache       8   570934

  Here χ² = 456400, so we conclude these are translation pairs
• As a metric for corpus similarity (Kilgarriff & Rose 1998)

              Corpus 1    Corpus 2
    word 1        60          9
    word 2       500         76
    word 3       124         20
    ...

  Since the count ratios are similar, we cannot reject the null hypothesis that both corpora are drawn from the same source
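
The two hand calculations used in these slides, the sum of (O − E)²/E for a vector of observed frequencies and the shortcut formula for a 2 × 2 contingency table, can be checked against chisq.test. The following is a minimal sketch, not code from the course: the helper names chisq_gof and chisq_2x2 are hypothetical, and chisq_gof assumes that all categories are equally likely under the null hypothesis (the default that chisq.test applies to a plain vector).

```r
# Hypothetical helper: goodness-of-fit statistic sum((O - E)^2 / E),
# assuming every category is equally likely under the null hypothesis.
chisq_gof <- function(observed) {
  expected <- rep(sum(observed) / length(observed), length(observed))
  sum((observed - expected)^2 / expected)
}

# Hypothetical helper: Pearson's chi-square for a 2 x 2 table via the
# shortcut formula, with cells W, X in the top row and Y, Z in the bottom row.
chisq_2x2 <- function(m) {
  stopifnot(is.matrix(m), all(dim(m) == c(2, 2)))
  W <- m[1, 1]; X <- m[1, 2]
  Y <- m[2, 1]; Z <- m[2, 2]
  n <- W + X + Y + Z
  n * (W * Z - X * Y)^2 / ((W + X) * (W + Y) * (X + Z) * (Y + Z))
}

# Phrasal verb counts (VPO = 194, VOP = 209)
stat_gof <- chisq_gof(c(194, 209))

# Treatment vs. placebo data; matrix() fills column by column
placebo <- matrix(c(25, 60, 35, 51), nrow = 2)
stat_2x2 <- chisq_2x2(placebo)

# Critical value for alpha = 0.05 with df = 1 (about 3.841)
critical <- qchisq(0.95, df = 1)

# R's built-in test without Yates' continuity correction, for comparison
builtin <- chisq.test(placebo, correct = FALSE)

print(round(stat_gof, 4))                                            # 0.5583
print(round(c(hand = stat_2x2, builtin = unname(builtin$statistic)), 4))  # both 2.3906
print(stat_2x2 > critical)                                           # FALSE
```

Because 2.39 is below the critical value, the last line prints FALSE, which agrees with the conclusion that the null hypothesis cannot be rejected at α = 0.05 for the placebo data.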