Chapter 8 - El Camino College

Chapter 12
Section 12.1 – Goodness of Fit
We used the chi-square distribution to test variance and standard deviation. We will also use it to
test goodness of fit and the independence of two variables.
When one is testing to see whether a frequency distribution fits a specific pattern, the chi-square
goodness of fit test is used.
Example #1
Suppose a market analyst wished to see whether consumers have any preference among five flavors of a
new soda. The following data were obtained:
Flavor      Cherry  Strawberry  Orange  Lime  Grape
Frequency   32      28          16      14    10
If there was NO preference, one would expect each flavor to be selected with equal frequency.
This is referred to as the ______________________distribution
            Cherry  Strawberry  Orange  Lime  Grape
OBSERVED    32      28          16      14    10
EXPECTED    ____    ____        ____    ____  ____
The observed and expected frequencies will almost always differ. The question is:
“Are these differences significant, or are they due to chance?”
Let’s start.
Step 1 – State the hypotheses
Remember that the null hypothesis should state that there is NO difference or NO change.
H0: Consumers show no preference among the flavors of the fruit soda.
H1: Consumers show a preference.

Step 2 – Find the critical value (CV) using the χ² distribution with df = 5 − 1 = 4 (number of categories − 1)
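As a quick check on the table lookup in Step 2, the right-tail area of the χ² distribution with 4 degrees of freedom has a simple closed form, exp(−x/2)(1 + x/2). The sketch below assumes a significance level of 0.05 (the example does not state one) and uses the standard table value 9.488 for df = 4; the function name is only illustrative.

```python
import math

def chi2_right_tail_df4(x):
    """Right-tail area P(X > x) for a chi-square variable with df = 4."""
    return math.exp(-x / 2) * (1 + x / 2)

alpha = 0.05            # assumed: Example #1 does not give a significance level
critical_value = 9.488  # chi-square table value for df = 4, alpha = 0.05

tail_area = chi2_right_tail_df4(critical_value)
# The area to the right of the CV should be very close to alpha.
print(round(tail_area, 3))
```

A test value larger than the CV falls in this right tail, which is the rejection region.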

Step 3 – Compute the test value using the Goodness of Fit formula
$$\text{Test Value} = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
            Cherry  Strawberry  Orange  Lime  Grape
OBSERVED    32      28          16      14    10
EXPECTED    20      20          20      20    20
Using the FORMULA …
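One way to check the hand calculation is a short Python sketch of the Goodness of Fit formula applied to the table above. The variable names are illustrative, and the critical value 9.488 assumes a 0.05 significance level with df = 4, which the example does not state.

```python
# Observed counts from the soda flavor table
observed = {"Cherry": 32, "Strawberry": 28, "Orange": 16, "Lime": 14, "Grape": 10}

total = sum(observed.values())
# With no preference, every flavor has the same expected frequency
expected = total / len(observed)

# Sum of (Observed - Expected)^2 / Expected over all categories
test_value = sum((o - expected) ** 2 / expected for o in observed.values())

critical_value = 9.488  # assumed: alpha = 0.05, df = 4
reject_null = test_value > critical_value

print(round(test_value, 2), reject_null)
```

Under these assumptions the test value is about 18, which is larger than the critical value, so the null hypothesis of no preference would be rejected.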

Step 4 – Make a decision

Summarize
Using Commands in CALCULATOR
Example #2
Randomly selected deaths from car crashes were obtained, and the results are included in the table
below. Use a 0.05 significance level to test the claim that car crash fatalities occur with equal frequency
on the different days of the week.
Day of the week        Sunday  Monday  Tuesday  Wednesday  Thursday  Friday  Saturday
Frequency (observed)   132     98      95       98         105       133     158
EXPECTED               ____    ____    ____     ____       ____      ____    ____
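For checking your work on this example, here is a minimal Python sketch. It assumes the claim of equal frequency means each day's expected count is the total divided by 7, and it uses the standard table value 12.592 for df = 6 at the 0.05 level. Variable names are illustrative.

```python
days = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
observed = [132, 98, 95, 98, 105, 133, 158]

n = sum(observed)
expected = n / len(observed)  # equal frequency on every day

# Goodness of fit test value
test_value = sum((o - expected) ** 2 / expected for o in observed)

df = len(observed) - 1
critical_value = 12.592  # chi-square table value for df = 6, alpha = 0.05
reject_null = test_value > critical_value

for day, o in zip(days, observed):
    print(f"{day:<10} observed={o:<4} expected={expected:.1f}")
print(f"test value = {test_value:.2f}, df = {df}, reject H0: {reject_null}")
```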
Example #3
A researcher has developed a theoretical model for predicting eye color. After examining a random
sample of parents, she predicts the eye color of the first child. The table below lists the eye color of
offspring. Based on her theory, she predicted that 87% of the offspring would have brown eyes, 8%
would have blue eyes and 5% would have green eyes. Use a significance level of 0.05 to test the claim that
the actual frequencies correspond to her predicted distribution.
             Brown eyes  Blue eyes  Green eyes
Frequency    132         17         0
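When the expected frequencies are not all equal, each one is the sample size times the predicted proportion. The sketch below works through that for the eye color data; the variable names are illustrative, and 5.991 is the standard table value for df = 2 at the 0.05 level.

```python
observed = {"Brown": 132, "Blue": 17, "Green": 0}
predicted = {"Brown": 0.87, "Blue": 0.08, "Green": 0.05}

n = sum(observed.values())

# Expected count = sample size * predicted proportion
expected = {color: n * p for color, p in predicted.items()}

test_value = sum(
    (observed[color] - expected[color]) ** 2 / expected[color]
    for color in observed
)

df = len(observed) - 1
critical_value = 5.991  # chi-square table value for df = 2, alpha = 0.05
reject_null = test_value > critical_value

for color in observed:
    print(f"{color:<6} observed={observed[color]:<4} expected={expected[color]:.2f}")
print(f"test value = {test_value:.2f}, reject H0: {reject_null}")
```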
12.2 – Contingency Tables and Association
Definitions
1) A contingency table relates two categories of data.
2) A marginal distribution is a frequency or relative frequency distribution of either the row or
column variable.
3) A conditional distribution lists the relative frequencies of one variable, given a specific value of the other variable.
Example #1
Suppose a sociologist wishes to see whether the number of years of college a person has completed is
related to her or his place of residence. A sample of 88 people is selected and classified as shown.
Location    No College  4 Year Degree  Advanced Degree  Total
Urban       15          12             8                ____
Suburban    8           15             9                ____
Rural       6           8              7                ____
TOTAL       ____        ____           ____             ____
a) Construct a frequency marginal distribution.
b) Construct a relative frequency marginal distribution
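Parts a) and b) amount to summing each row and each column, then dividing those sums by the grand total. A small Python sketch, with illustrative variable names, could look like this:

```python
columns = ["No College", "4 Year Degree", "Advanced Degree"]
table = {
    "Urban": [15, 12, 8],
    "Suburban": [8, 15, 9],
    "Rural": [6, 8, 7],
}

# a) Frequency marginal distributions: row totals and column totals
row_totals = {loc: sum(counts) for loc, counts in table.items()}
col_totals = {
    col: sum(counts[j] for counts in table.values())
    for j, col in enumerate(columns)
}
grand_total = sum(row_totals.values())

# b) Relative frequency marginal distributions: each total / grand total
row_relative = {loc: t / grand_total for loc, t in row_totals.items()}
col_relative = {col: t / grand_total for col, t in col_totals.items()}

print(row_totals, col_totals, grand_total)
print({k: round(v, 3) for k, v in row_relative.items()})
print({k: round(v, 3) for k, v in col_relative.items()})
```

Each relative frequency marginal distribution should add up to 1, which is a handy check on the arithmetic.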
12.3 – Test for Independence and Homogeneity
a) The χ² test for independence is used to determine whether there is an association between the row
variable and the column variable in a contingency table.
1. Null hypothesis: the variables are NOT associated, i.e., the variables are independent.
2. Alternate hypothesis: the variables are dependent.
Example #1
Suppose a sociologist wishes to see whether the number of years of college a person has completed is
related to her or his place of residence. A sample of 88 people is selected and classified as shown.
At a significance level of 0.05, can the sociologist conclude that a person’s location is dependent on the
number of years of college?
Location    No College  4 Year Degree  Advanced Degree  Total
Urban       15          12             8                ____
Suburban    8           15             9                ____
Rural       6           8              7                ____
TOTAL       ____        ____           ____             ____

Step 1 – state the hypothesis
H0: A person’s place of residence is independent of the number of years of college completed
H1: A person’s place of residence is dependent on the number of years of college completed

Step 2 – Find the CV where df = (rows − 1)(columns − 1) = (3 − 1)(3 − 1) = (2)(2) = 4

Step 3 – Find the expected value of each cell using the formula below, and write it in the contingency
table.
$$\text{Expected} = \frac{(\text{row sum})(\text{column sum})}{\text{grand total}}$$
After you have all EXPECTED VALUES, compute the test value using
$$\text{Test Value} = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
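The expected value and test value formulas can be sketched in Python for the residence and college table from this example. Variable names are illustrative, and 9.488 is the standard table value for df = 4 at the 0.05 level.

```python
# Rows: Urban, Suburban, Rural
# Columns: No College, 4 Year Degree, Advanced Degree
observed = [
    [15, 12, 8],
    [8, 15, 9],
    [6, 8, 7],
]

row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
grand_total = sum(row_sums)

# Expected = (row sum)(column sum) / grand total, for every cell
expected = [[r * c / grand_total for c in col_sums] for r in row_sums]

# Test value = sum over all cells of (Observed - Expected)^2 / Expected
test_value = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(row_sums) - 1) * (len(col_sums) - 1)
critical_value = 9.488  # chi-square table value for df = 4, alpha = 0.05
reject_null = test_value > critical_value

for row in expected:
    print([round(e, 2) for e in row])
print(f"test value = {test_value:.2f}, df = {df}, reject H0: {reject_null}")
```

With these data the test value comes out near 3.01, below the critical value, so this sketch would not reject the null hypothesis of independence.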
Make Decision and summarize
Example #1
A random sample of 90 adults is classified according to gender and the number of hours they
watch television during a week.
Hours/Week            Male  Female
More than 25 hours    15    29
Less than 25 hours    27    19
Use a 0.01 significance level to test the claim that the time spent watching television and the gender of a
viewer are not related.
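For a 2 × 2 table, df = (2 − 1)(2 − 1) = 1. The sketch below computes the test value the same way as before and then, as an alternative to looking up the critical value, computes a p-value. It relies on the fact that for df = 1 the right-tail area of the χ² distribution equals erfc(√(x/2)). The p-value approach is an addition for illustration, not the method used in this handout; 6.635 is the standard table value for df = 1 at the 0.01 level.

```python
import math

# Rows: more than 25 hours, less than 25 hours; columns: male, female
observed = [
    [15, 29],
    [27, 19],
]

row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
grand_total = sum(row_sums)

expected = [[r * c / grand_total for c in col_sums] for r in row_sums]

test_value = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

alpha = 0.01
critical_value = 6.635  # chi-square table value for df = 1, alpha = 0.01

# Right-tail area for df = 1 only: P(X > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(test_value / 2))

reject_by_cv = test_value > critical_value
reject_by_p = p_value < alpha

print(f"test value = {test_value:.2f}, p-value = {p_value:.4f}")
print(f"reject H0 (CV): {reject_by_cv}, reject H0 (p-value): {reject_by_p}")
```

Both decision rules always agree; the p-value simply reports how far into the tail the test value falls.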
Example #2
If you’re using a graphing calculator, please write the function you’re using. Don’t forget to include
the equation I gave in class. Follow the format we went over in class!!
Suppose a study of speeding violations and drivers who use car phones produced the following fictional
data:
                        Speeding violation    No speeding violation    Total
                        in the last year      in the last year
Car phone user          25                    280                      305
Not a car phone user    45                    405                      450
Total                   70                    685                      755
a) Compute the MARGINAL Frequency distributions.
b) P(person is a car phone user) =
c) P(person had no violation in the last year) =
d) P(person had no violation in the last year AND was a car phone user) =
e) P(person is a car phone user OR person had no violation in the last year) =
h) P(person is a car phone user GIVEN person had a violation in the last year) =
i) P(person had no violation last year GIVEN person was not a car phone user) =
j) Is using a “car phone” independent of receiving a speeding violation? Use the 5-Step
hypothesis test to answer this question.
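The probability parts of this exercise all come from dividing a cell count or a marginal total by the appropriate total. The sketch below uses Python's fractions module so the answers stay exact; the dictionary keys and the helper name count are illustrative choices, not notation from class.

```python
from fractions import Fraction

counts = {
    ("user", "violation"): 25,
    ("user", "no violation"): 280,
    ("non-user", "violation"): 45,
    ("non-user", "no violation"): 405,
}
total = sum(counts.values())

def count(phone=None, record=None):
    """Add up the cells that match the given phone status and/or record."""
    return sum(
        n for (p, r), n in counts.items()
        if (phone is None or p == phone) and (record is None or r == record)
    )

p_user = Fraction(count(phone="user"), total)
p_no_violation = Fraction(count(record="no violation"), total)
p_both = Fraction(count(phone="user", record="no violation"), total)

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_either = p_user + p_no_violation - p_both

# Conditional probability: restrict the total to the "given" group
p_user_given_violation = Fraction(
    count(phone="user", record="violation"), count(record="violation")
)
p_no_violation_given_non_user = Fraction(
    count(phone="non-user", record="no violation"), count(phone="non-user")
)

print(p_user, p_no_violation, p_both, p_either)
print(p_user_given_violation, p_no_violation_given_non_user)
```

Fraction reduces each answer to lowest terms, so 305/755 prints as 61/151.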