chi-square test - Boise State University

CHI-SQUARE TEST
John Fry
Boise State University

The one-sample t test
• One common statistic for hypothesis testing is the t statistic
$$t = \frac{\bar{x} - \mu}{\sqrt{s^2/N}}$$
• The t test looks at the mean x̄ and variance s² of a sample
• The null hypothesis is that the sample is drawn from a
population with mean µ (that is, we expect x̄ ≈ µ)
• If |t| is large enough, we can reject the null hypothesis and
conclude that the sample is not drawn from that population
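
As a minimal sketch of the formula above, the t statistic can be computed by hand in R and compared with the built-in t.test. The sample values and the hypothesized population mean mu = 70 are made up for illustration; they are not course data.

```r
# Hypothetical sample and hypothesized population mean (illustrative only)
x  <- c(72, 73, 76, 76, 78)
mu <- 70

# t = (sample mean - mu) / sqrt(s^2 / N)
t_manual <- (mean(x) - mu) / sqrt(var(x) / length(x))

# The built-in one-sample t test should report the same statistic
t_builtin <- unname(t.test(x, mu = mu)$statistic)

print(c(manual = t_manual, builtin = t_builtin))
```

Both values should agree, since t.test applies the same formula before looking up the p-value.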
Linguistics 497: Corpus Linguistics, Spring 2011, Boise State University
Welch’s two-sample t test
Welch’s two-sample t test compares the means of two samples
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}}$$

Welch’s two-sample t test in R
> t.test(c(72,73,76,76,78),c(67,72,76,76,84))
        Welch Two Sample t-test
data: c(72, 73, 76, 76, 78) and c(67, 72, 76, 76, 84)
t = 0, df = 5.202, p-value = 1
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -7.622409 7.622409
sample estimates:
mean of x mean of y
       75        75
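
The following sketch recomputes the Welch statistic for the two samples above directly from the formula. The degrees-of-freedom line uses the Welch-Satterthwaite approximation, which the slides do not spell out; it is included here on the assumption that it is what produces the df = 5.202 in the output.

```r
x <- c(72, 73, 76, 76, 78)
y <- c(67, 72, 76, 76, 84)

# Per-sample variance of the mean: s^2 / N
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch's t statistic
t_manual <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite degrees of freedom (not given in the slides)
df_manual <- (vx + vy)^2 /
  (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

res <- t.test(x, y)
print(c(t = t_manual, df = df_manual))
```

Because both samples have mean 75, the numerator is zero and so is t, no matter how different the two variances are.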
Pearson’s χ² (‘chi square’) test
• The most popular hypothesis test in corpus linguistics is the
χ² (‘chi square’) test
• The χ² test compares a set of observed frequencies O with a
set of expected frequencies E
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
• If the difference between observed and expected frequencies
is large, we can reject the null hypothesis of independence
    H0: χ² = 0
    H1: χ² > 0

Differences between t and χ² tests
• The t test compares the means of continuous (interval or
ratio) variables (e.g., height, weight, rainfall)
• The χ² test is for the observed frequencies of nominal
(categorical) variables (e.g., male vs. female)
• The t test assumes that the population is normally distributed
Normality is a reasonable assumption in many empirical
sciences, but probably not corpus linguistics (cf. Zipf’s Law)
χ² example: phrasal verbs
• In phrasal verbs, the object (O) and particle (P) can alternate
    VPO construction: brought back the book
    VOP construction: brought the book back
• Is one construction more common than the other?
• Say we looked in a large corpus and found the VOP pattern
(e.g., brought the book back) is more frequent
                VPO   VOP
    Observed    194   209
Are these results statistically significant?

χ² test in R
Null hypothesis: both constructions are equally frequent
• Running the χ² test in R
> chisq.test(c(194, 209))
        Chi-squared test for given probabilities
data: c(194, 209)
X-squared = 0.5583, df = 1, p-value = 0.4549
• Interpretation: The difference is not statistically significant
(χ² = 0.56; df = 1; p = 0.455), so we cannot reject the null
hypothesis that the two constructions are equally frequent in the
population for which the sample is representative
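
To connect the R output back to the formula χ² = Σ (O − E)²/E, here is a small sketch that computes the statistic for the phrasal-verb counts by hand. The variable names are illustrative; the only data used are the two observed frequencies, 194 and 209.

```r
observed <- c(VPO = 194, VOP = 209)

# Under the null hypothesis both constructions are equally frequent,
# so each expected frequency is half of the total
expected <- rep(sum(observed) / length(observed), length(observed))

# Pearson's chi-square statistic and its degrees of freedom
chi_sq <- sum((observed - expected)^2 / expected)
df     <- length(observed) - 1

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# The built-in test exposes the same quantities as list components
res <- chisq.test(observed)
print(c(chi_sq = chi_sq, p_value = p_value))
print(res$expected)
```

The components res$statistic, res$p.value, and res$expected hold the values that chisq.test prints, so they can be reused in later calculations instead of being copied from the console.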
χ² test in R
• Note that chisq.test only needs the vector of observed
frequencies; it computes the expected frequencies itself
(by default it assumes all categories are equally likely)
• Use str to see the structure of the test result
> str(chisq.test(c(194, 209)))
List of 8
 $ statistic: Named num 0.558
  ..- attr(*, "names")= chr "X-squared"
 $ parameter: Named num 1
  ..- attr(*, "names")= chr "df"
 $ p.value  : num 0.455
 $ method   : chr "Chi-squared test for given probabilities"
 $ data.name: chr "c(194, 209)"
 $ observed : num [1:2] 194 209
 $ expected : num [1:2] 202 202
 $ residuals: num [1:2] -0.528 0.528
 - attr(*, "class")= chr "htest"

χ² example: word frequencies in Moby Dick
# Read in Moby Dick and tokenize it
> moby <- scan(what="c", sep="\n", file="melville-moby_dick.txt")
Read 19252 items
> moby <- tolower(moby)
> words <- unlist(strsplit(moby, "\\W+"))
> tokens <- words[words != ""]
# Is the observed frequency of each word type in Moby Dick
# essentially the same?
> chisq.test(table(tokens))
        Chi-squared test for given probabilities
data: table(tokens)
X-squared = 33580949, df = 16873, p-value < 2.2e-16
# We reject the null hypothesis that all word types are
# equally frequent. Duh!

2 × 2 contingency tables
• Another form of the χ² test is for a 2 × 2 contingency table,
containing observed frequencies W, X, Y, and Z
                 Condition C   Condition ¬C
    Result R          W             X
    Result ¬R         Y             Z
• The conditions and results are both categorical variables, such
as ‘treatment’ vs. ‘placebo’, or ‘male’ vs. ‘female’
• For such a 2 × 2 contingency table, we compute χ² as
$$\chi^2 = \frac{(W + X + Y + Z)(WZ - XY)^2}{(W + X)(W + Y)(X + Z)(Y + Z)}$$

Example: treatment vs. placebo
• Step 1: assemble our data into a 2 × 2 contingency table
                                  Placebo   Treatment
    Number of people cured           25         35
    Number of people not cured       60         51
• Step 2: calculate χ² as follows:
$$\chi^2 = \frac{(25 + 35 + 60 + 51)(25 \times 51 - 35 \times 60)^2}{(25 + 35)(25 + 60)(35 + 51)(60 + 51)} = 2.39$$
• Step 3: determine significance
If χ² > 3.841, then we can reject the null hypothesis at the
5% significance level (p < 0.05)
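
The three steps of the treatment vs. placebo example can be sketched in R as follows. The helper name chi_sq_2x2 is hypothetical (it is not an R built-in); it implements the 2 × 2 shortcut formula given above, and qchisq is used to look up the 3.841 cutoff rather than hard-coding it.

```r
# Shortcut formula for a 2 x 2 table with cells
#   W X
#   Y Z
# (chi_sq_2x2 is an illustrative helper, not part of base R)
chi_sq_2x2 <- function(w, x, y, z) {
  n <- w + x + y + z
  n * (w * z - x * y)^2 / ((w + x) * (w + y) * (x + z) * (y + z))
}

# Step 1 and 2: cured / not cured counts for placebo and treatment
stat <- chi_sq_2x2(w = 25, x = 35, y = 60, z = 51)

# Step 3: critical value for alpha = 0.05 with one degree of freedom
critical <- qchisq(0.95, df = 1)
reject   <- stat > critical

print(c(chi_sq = stat, critical = critical))
print(reject)
```

Since 2.39 is below the critical value, reject comes out FALSE, which agrees with the conclusion that the null hypothesis cannot be rejected for these data.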
Interpreting χ² results
• χ² critical values for a 2 × 2 contingency table (df = 1):
    α        χ²
    0.5      0.455
    0.10     2.706
    0.05     3.841
    0.02     5.412
    0.01     6.635
    0.001    10.827
• The p-value is the probability of obtaining a result at least as
extreme as the one observed, assuming the null hypothesis is true
• One rejects the null hypothesis only if p is smaller than or
equal to a previously chosen significance level α

Treatment vs. placebo data in R
• We put the data in the form of a 2 × 2 matrix
> chisq.test(matrix(c(25,60,35,51), nrow=2))
        Pearson's Chi-squared test with Yates' continuity correction
data: matrix(c(25, 60, 35, 51), nrow = 2)
X-squared = 1.9208, df = 1, p-value = 0.1658
• Without the correction, results match our hand calculation
> chisq.test(matrix(c(25,60,35,51), nrow=2), correct=FALSE)
        Pearson's Chi-squared test
data: matrix(c(25, 60, 35, 51), nrow = 2)
X-squared = 2.3906, df = 1, p-value = 0.1221

Interpreting χ² results
• In the placebo example, χ² = 2.39, which is below the critical
value of 3.841 for α = 0.05; we cannot reject the null hypothesis
• The standard significance level used in the social sciences is
α = 0.05, but corpus linguists often use the stricter α = 0.01
• One quasi-standard way of reporting results:
    p < 0.05     ‘significant’
    p < 0.01     ‘very significant’
    p < 0.001    ‘highly significant’

Example: heaviness of subject NPs
• Aarts (1971) (ch. 4) examined the ‘heaviness’ of NPs that
occur in subject position in the SEU corpus
    – ‘Light’ NPs include N, Det N, pronouns, names
    – ‘Heavy’ NPs contain adjectives, PPs, and other modifiers
• The null hypothesis is that there is no difference; subject NPs
are no lighter or heavier than non-subject NPs
• Aarts (1971) Table 4.10 (p. 45):
                            Light NP   Heavy NP
    Subject position          6749       1160
    Non-subject position      4770       4331
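
Before running the test on the Aarts (1971) counts, it can help to see what the null hypothesis predicts. The sketch below assumes the usual rule for a contingency table, which the slides do not state explicitly: each expected frequency is (row total × column total) / N. The dimension labels are added here for readability.

```r
# Counts from Aarts (1971) Table 4.10, filled column by column
np <- matrix(c(6749, 1160, 4770, 4331), nrow = 2,
             dimnames = list(weight   = c("light", "heavy"),
                             position = c("subject", "non-subject")))

# Expected frequencies if NP weight were independent of position
expected <- outer(rowSums(np), colSums(np)) / sum(np)

# Share of light NPs within each position
prop_light <- np["light", ] / colSums(np)

print(round(expected, 1))
print(round(prop_light, 3))
```

The observed share of light NPs is much higher in subject position than in non-subject position, so the observed counts lie far from the expected ones; the χ² statistic measures exactly that distance.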
Example: heaviness of subject NPs
• Run the χ² test in R
> chisq.test(matrix(c(6749,1160,4770,4331), nrow=2))
        Pearson's Chi-squared test with Yates' continuity correction
data: matrix(c(6749, 1160, 4770, 4331), nrow = 2)
X-squared = 2096.486, df = 1, p-value < 2.2e-16
• The χ² value for this table is enormous (p < 0.001), which
means we can confidently reject the null hypothesis and
conclude that subjects are ‘lighter’

Phrasal verbs and concreteness
• In phrasal verbs, the object (O) and particle (P) can alternate
    VOP construction: brought the book back
    VPO construction: brought back the book
• Gries (2003) looked at whether the object (O) was abstract
(like peace) or concrete (like book)
                  Object
             Abstract   Concrete
    VPO         125         69
    VOP          64        145
> chisq.test(matrix(c(125,64,69,145), nrow=2))
        Pearson's Chi-squared test with Yates' continuity correction
data: matrix(c(125, 64, 69, 145), nrow = 2)
X-squared = 44.8365, df = 1, p-value = 2.142e-11
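
For 2 × 2 tables, chisq.test applies Yates’ continuity correction by default, which is why its output differs slightly from the plain Pearson formula. The sketch below reproduces both versions by hand for the Gries (2003) counts. It assumes the common form of the correction, subtracting 0.5 from each |O − E| before squaring, which is valid here because every |O − E| is well above 0.5.

```r
# Counts from the abstract/concrete table, filled column by column
gries <- matrix(c(125, 64, 69, 145), nrow = 2,
                dimnames = list(construction = c("VPO", "VOP"),
                                object       = c("abstract", "concrete")))

# Expected frequencies under independence
expected <- outer(rowSums(gries), colSums(gries)) / sum(gries)

# Plain Pearson statistic and Yates-corrected statistic
plain <- sum((gries - expected)^2 / expected)
yates <- sum((abs(gries - expected) - 0.5)^2 / expected)

res_plain <- chisq.test(gries, correct = FALSE)
res_yates <- chisq.test(gries)

print(c(plain = plain, yates = yates))
```

The correction always pulls the statistic toward zero, so the corrected value is the more conservative of the two; with counts this large the conclusion is the same either way.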
Other uses for the χ² statistic
• Finding translation pairs in aligned corpora
                cow      ¬cow
    vache        59         6
    ¬vache        8    570934
Here χ² = 456400, so we conclude these are translation pairs
• As a metric for corpus similarity (Kilgarriff & Rose 1998)
                Corpus 1   Corpus 2
    word 1          60          9
    word 2         500         76
    word 3         124         20
    ...
Since the count ratios are similar, we cannot reject the null
hypothesis that both corpora are drawn from the same source
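
Both uses can be checked with a few lines of R. This is only a sketch: chi_sq_2x2 is a hypothetical helper implementing the 2 × 2 shortcut formula from earlier in the slides, and the corpus-similarity test uses only the three word rows shown in the table, since the remaining rows are elided in the source.

```r
# 2 x 2 shortcut formula (illustrative helper, not part of base R)
chi_sq_2x2 <- function(w, x, y, z) {
  n <- w + x + y + z
  n * (w * z - x * y)^2 / ((w + x) * (w + y) * (x + z) * (y + z))
}

# Translation pairs: co-occurrence counts for 'cow' and 'vache'
pair_stat <- chi_sq_2x2(w = 59, x = 6, y = 8, z = 570934)
print(round(pair_stat))

# Corpus similarity: only the three rows listed in the slide
corpora <- matrix(c(60, 500, 124, 9, 76, 20), ncol = 2,
                  dimnames = list(c("word 1", "word 2", "word 3"),
                                  c("Corpus 1", "Corpus 2")))
sim <- chisq.test(corpora)
print(sim$p.value)
```

A huge statistic in the first case signals that the two words co-occur far more often than chance would predict, while a large p-value in the second case means the word counts give no evidence that the corpora differ.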