Confidence interval

Confidence Intervals & Hypothesis Testing for Proportions

Know the symbols and the meanings
We can always calculate/know a statistic
Often we don’t know (and will never know)
the value of the parameter(s); we would
have to take a census... Is that even
possible?

Half of U.S. college graduates agree their
education was worth the cost.

What are some questions you have about these
findings? Discuss with a partner for 2 minutes &
then be prepared to share out. Don’t look
ahead! 

http://www.gallup.com/services/185888/gallup-purdue-index-report2015.aspxg_source=REPORT&g_medium=topic&g_campaign=tiles

Who did Gallup ask? AA grads? BA/BS grads? MA/MS grads?

What was Gallup’s process for selecting the graduates?

What was the question that Gallup asked (how was it worded)?

How many graduates did Gallup ask?

Since they probably didn’t ask EVERY college graduate (do you
agree with this assumption?), Gallup is trying to make a
statement about ALL college graduates based on just a sample of
college graduates.

So do you think the sample results exactly predict the entire
population of US college graduates? Do you think the entire
population who feels this way is exactly 50%? More? Less? Do
you agree that it could be within a range of reasonable values?

The examples have described situations in which
we know the value of the population parameter,
p.

Very unrealistic

The whole point of carrying out a survey (most of
the time) is that we don’t know the value of p,
but we want to estimate it

Think about the elections... Parties are taking a
lot of sample surveys (polls) to see who is
‘leading’

A random sample of 2,928 adults in the US was
asked whether they believed that reducing the
spread of AIDS and other infectious diseases was
an important policy goal for the US government.

1,551 responded ‘yes;’ 53%

Spiral back for a moment... Random? Large
sample? Big population?
(So, if we wanted to find a probability using
the CLT, we could...)

These are the exact conditions we must
check to create a confidence interval as well

More on confidence intervals in a few...

The above percentage just tells us about OUR
sample of those specific 2,928 people.

What about another sample? Would we get a
different % of yes’s?

What about the percentage of all adults in the
US who believe this? Do you think the
percentage of all adults in the US who believe
this is exactly 53%? If not, then what is a
reasonable ‘guess range’? Come up to the board
and write your reasonable/likely ‘guess range.’

How about: From 0% to 100%? What do you
think about that reasonable ‘guess range’?

We don’t know p, the population parameter; we do
know p̂ for this sample; it’s 53%. We also know:

Our estimate (53%) is unbiased; remember
sampling distributions? (maybe not exactly = p;
maybe just a little low or a little high)

Standard error (typical amount of variability) is
about

$$SE = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.53(1 - 0.53)}{2928}} \approx 0.0092 \approx 0.9\%$$

Because we have a ‘large sample,’ the probability
distribution of our p̂ values is close to
Normally distributed & centered around the true
population parameter.

Sampling distribution of p̂: approximately Normal,
centered at the true, unknown population parameter p
(our best estimate of p is 0.53); standard error (SD;
amount of variability in sample statistic) = 0.009; so ...

About 68% of sample proportions land within 1
standard error of the unknown population
parameter, p

About 95% of sample proportions land within 2 standard
errors of the unknown population parameter, p

About 99.7% of sample proportions land within 3 standard
errors of the unknown population parameter, p


So we can be highly confident, 99.7% confident, that
the true, unknown population proportion, p, is
between
0.53 – (3)(0.009) = 0.503
and
0.53 + (3)(0.009) = 0.557

This is a confidence interval; we are 99.7% confident
that the interval from about 50.3% to 55.7%
captures the true, unknown population proportion of
Americans who believe that reducing the spread of
AIDS and other infectious diseases is an important
policy goal for the US government.

How did we do on our reasonable guesses?
Look on the board...

What if we wanted to construct a confidence
interval in which we are 95% confident? Let’s
try it now. Let’s do the calculations...

What about a 90% confidence interval? Can we
do the calculations by hand? Why/why not?

Let’s use StatCrunch...
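
For a level such as 90%, the multiplier is no longer a whole number of standard errors, which is why software helps. Below is a minimal Python sketch of the calculation, assuming the standard large-sample formula p̂ ± z*·SE. The function name `proportion_ci` is ours (not from the slides), and StatCrunch's output may differ slightly depending on the method it is set to use.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, level):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n                      # sample proportion
    se = sqrt(p_hat * (1 - p_hat) / n)         # standard error of p-hat
    # z* cuts off the middle `level` of the standard Normal curve
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * se
    return p_hat - margin, p_hat + margin

# Poll data from the slides: 1,551 'yes' out of 2,928 adults
for level in (0.90, 0.95, 0.997):
    low, high = proportion_ci(1551, 2928, level)
    print(f"{level:.1%} CI: {low:.3f} to {high:.3f}")
```

The exact multipliers are close to, but not exactly, the 2 and 3 of the 68-95-99.7 rule (about 1.96 for 95%), so these intervals differ slightly from the by-hand versions.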

What proportion of us have at least one tattoo? So our
sample statistic, our p̂ = _____

If we were to ask another group of COC students, we would
get another (likely different) p̂?

445 Math 075 students were asked this last Spring; 133/445 =
0.299 = 29.9% had at least one tattoo

Remember, larger n, generally less variation; but still
centered at same value (unbiased estimator)

We want to be able to say with a high level of certainty what
proportion of all COC students have at least one tattoo. But
we don’t know the true, unknown population parameter, p.

We don’t know p (population parameter)

We do know p̂ (sample statistic); actually we have 2
sample statistics – our class and the Math 075 data

Our estimators are unbiased (what does that mean?)

Let’s check conditions for each of our samples:
Random selection; Large sample; Big population

Can we use either sample statistic (either p̂)? If so,
which should we use?

Calculate our standard deviation (our standard error)

Our distribution is ≈ Normal (because our
conditions are met), centered around p̂; 68%
within 1 SD; 95% within 2 SDs; 99.7% within 3 SDs

Let’s create a 95% confidence interval ...


We are 95% confident that the interval from
_____ to _____ captures the true, unknown
population parameter, p, the proportion of all
COC students that have at least one tattoo.

This is a confidence interval with a 95%
confidence level


Let’s create a 99.7% confidence interval ...


We are 99.7% confident that the interval from
_____ to _____ captures the true, unknown
population parameter, p, the proportion of all
COC students that have at least one tattoo.

This is a confidence interval with a 99.7%
confidence level
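
As a sketch of the arithmetic behind the 95% and 99.7% intervals, here is the 68-95-99.7 rule applied to the Math 075 data (133 of 445). This only shows the calculation; whether that sample meets the Random condition is still a question for the class, and the helper name `rule_of_thumb_interval` is ours.

```python
from math import sqrt

successes, n = 133, 445              # Math 075 data from last Spring
p_hat = successes / n                # sample proportion, about 0.299
se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of p-hat

def rule_of_thumb_interval(multiplier):
    """p-hat plus or minus a whole number of standard errors."""
    return p_hat - multiplier * se, p_hat + multiplier * se

# 2 standard errors for about 95%, 3 standard errors for about 99.7%
for label, k in (("95%", 2), ("99.7%", 3)):
    low, high = rule_of_thumb_interval(k)
    print(f"{label}: {low:.3f} to {high:.3f}")
```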


How about a 90% confidence interval or 99%
confidence interval? StatCrunch!


We are xx% confident that the interval from _____ to
_____ captures the true, unknown population
parameter, p, the proportion of all COC students that
have at least one tattoo.

How will we ever know if we did a good job
estimating the true proportion of all COC students
who have at least one tattoo?

What did you notice about the lengths of our
confidence intervals as we changed
confidence levels?

More on this a little later...

Statistical inference provides methods for
drawing conclusions about a population based on
sample data

Methods used for statistical inference assume
that the data were produced by a properly
randomized design

Confidence intervals are one type of inference
and are based on sampling distributions of
statistics. The other type of inference we will
learn and practice is hypothesis testing (more on
this later).

Estimator ± margin of error

The estimator we just used was our sample proportion, p̂

The margin of error we just used was our standard error
(our standard deviation) multiplied by the number of
standard deviations we are away from the center

Margin of error tells us the amount we are most likely ‘off’
with our estimate

Margin of error helps account for sampling variability (NOT
any of the biases we discussed... voluntary response,
nonresponse, etc.)

How confident are you...

that the mean temperature in Santa Clarita
in degrees Fahrenheit is between -50 and
150?

that the mean temperature in Santa Clarita
in degrees Fahrenheit is between 70 and
70.001?

In general, large interval → high confidence
level; small interval → lower confidence
level

(Sketches: intervals at the 99%, 95%, and 90%
confidence levels)

Typically we want both: a reasonably high
confidence level AND a reasonably small
interval; but there are trade-offs; more on
this in a little bit

Will we ever know for sure if we captured
the true unknown population parameter p?
No. Actual p is unknown.

Interpretation of a confidence interval:
“I am ___% confident that the interval
from _____ to _____ captures the true,
unknown population proportion of
(context).”
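
Written as a formula, the ‘estimator ± margin of error’ structure for a proportion looks like this. Here z* is our notation (not used on the slides) for the number of standard errors that matches the confidence level: roughly 1, 2, or 3 for 68%, 95%, or 99.7%.

$$\hat{p} \pm z^{*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$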

Stat, Proportion Stat, One Sample, With Summary
# of successes: 7
# of observations: 10
Confidence Interval for p
Level: .90

Now, change to 700, 1000, & .90; what do you
observe about the width of the confidence
interval?

Now, change to 7000, 10000, & .90; what do you
observe about the width of the confidence
interval?

Stat, Proportion Stat, One Sample, With Summary
# of successes: 7000
# of observations: 10,000
Confidence Interval for p
Change Level to: .99

What do you observe about the width of the
confidence interval?

Now, change to 0.60 level; what do you
observe about the width of the confidence
interval?

The lower the confidence level (say 10% confident),
the shorter/more narrow the confidence interval (I
am 10% confident that the mean temperature in
Santa Clarita is between 70.01 degrees and 70.02
degrees)

The higher the confidence level (say 99% confident),
the wider the confidence interval (I am 99% confident
that the mean temperature in Santa Clarita is
between 40 degrees and 100 degrees)

Also, the larger the n (sample size), the shorter the
confidence interval (small MOE)

The smaller the n (sample size), the longer the confidence
interval (large MOE)

So, if you want (need) a high confidence level
AND a small(er) interval (margin of error), it is
possible if you are willing to increase n

Can be expensive, time-consuming

Sometimes not realistic (why?)

In reality, you may need to compromise on the
confidence level (lower confidence level)
and/or your n (smaller n).
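
The trade-offs above can be sketched numerically with the same numbers used in the StatCrunch activity. This is a rough illustration assuming the large-sample formula (width = 2 · z* · SE); `ci_width` is a name we made up. Note that 7 successes out of 10 does not actually meet the Large sample condition, so that first width is for comparison only.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(successes, n, level):
    """Width of the large-sample interval: 2 * z* * standard error."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z_star * se

# Same sample proportion (0.70) and level (90%), growing sample size
for successes, n in ((7, 10), (700, 1000), (7000, 10000)):
    print(f"n = {n}: width = {ci_width(successes, n, 0.90):.4f}")

# Same sample (7,000 of 10,000), changing confidence level
for level in (0.60, 0.90, 0.99):
    print(f"level = {level}: width = {ci_width(7000, 10000, level):.4f}")
```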

Alcohol abuse is considered by some as the #1
problem on college campuses. How common is it?

A recent SRS of 10,904 US college students
collected information on drinking behavior &
alcohol-related problems. The researchers
defined “frequent binge drinking” as having 5 or
more drinks in a row 3 or more times in the past 2
weeks. According to this definition, 2,486 students
were classified as frequent binge drinkers.

Based on these data, what can we say about the
proportion of all US college students who have
engaged in frequent binge drinking?

Let’s create a confidence interval so we can
approximate the true population proportion of
all US college students who engaged in frequent
binge drinking.

How confident do we want to be (i.e, what
confidence level do we want to use)?

We must check conditions before we calculate a
confidence interval...
Random?
Large sample?
Big population?

Perform Stat Crunch calculations
stat, proportion stat, 1 sample, with summary

Always conclude with interpretation, in
context

I am 99% confident that the interval from
about 22% to 24% contains the true, unknown
population parameter, p, the actual
proportion of all US college students who
have engaged in frequent binge drinking.
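
A short sketch of the same steps in Python, for readers who want to see the arithmetic behind the StatCrunch output. It assumes the large-sample formula; the function names are ours. Only the Large sample condition is checked in code, since Random and Big population are judgments about how the data were collected.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_ok(successes, n):
    """Large sample condition: at least 10 successes and 10 failures."""
    return successes >= 10 and (n - successes) >= 10

def proportion_ci(successes, n, level):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return p_hat - z_star * se, p_hat + z_star * se

# Binge-drinking data: 2,486 of 10,904 students, 99% confidence level
print("Large sample condition met:", large_sample_ok(2486, 10904))
low, high = proportion_ci(2486, 10904, 0.99)
print(f"99% CI: {low:.3f} to {high:.3f}")
```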
In a random sample of 400 Americans, each person
was asked if they are satisfied with the amount
of vacation time they are given by their employers.
336 of them said that they were not satisfied
with their vacation time.
In Stat Crunch, construct a 99% confidence interval
in order to estimate the true percent of all
Americans who are not satisfied with their
vacation time. Remember to check conditions &
provide a well-worded conclusion.
Review: What would happen to the width of the
confidence interval if we created a 90%
confidence interval?
In a random sample of 72 adults in Santa
Clarita, each person was asked if they
support the death penalty. 31 adults in the
sample said they do support the death
penalty.
Using Stat Crunch, calculate a 95% confidence
interval to estimate the true proportion of
all people in Santa Clarita that support the
death penalty. Remember to check
conditions & provide a well-worded
conclusion.
What percent of eligible Americans vote? In
2008, a random sample of 3,000 American
adults that were eligible to vote was taken
and we found that 2,040 of them voted.
Construct a 90% confidence interval estimate
of the true population percent of all
Americans that vote. Don’t forget about
conditions & provide a well-worded
conclusion.

Choose a data set from the Math 140 Spring data (that you
have not used before); it should be ‘yes/no’ or
‘black/brown/red’ or ... What type of data am I telling you
to choose? Why?

Cut and paste into Stat Crunch

Check conditions to be sure you can calculate a meaningful
confidence interval

Calculate a confidence interval (you choose the confidence
level) so you can confidently say something about ALL COC
students based on the Math 140 data

Interpret your results; be sure to use the word ‘all’ in your
interpretation; print out; turn in
The basics of Significance Testing

Already discussed confidence intervals for unknown
population parameter, p

Confidence intervals are used when the goal is to estimate
an unknown population parameter like p (like when we
estimated the true proportion of all 20,000 COC
students who have at least one tattoo)

Now... statistical inference through significance tests

Evaluate evidence (a statistic) provided by sample
data about some claim concerning an unknown
population parameter like p

There once were four students who missed
the midterm for their statistics class. They
went to the professor together and said,
“Please let us make up the exam. We
carpool together, and on our way to the
exam, we got a flat tire. That’s why we
missed the exam.” The professor didn’t
believe them, but instead of arguing he said,
“Sure, you can make up the exam. Be in my
office tomorrow at 8.”

The next day, they met in his office. He sent
each student to a separate room and gave
them an exam. The exam consisted of only
one question: “Which tire?”

Let’s imagine all four students answered, “left
rear tire.”

So... what do you think? Were the students most
likely telling the truth? Lying?

What are the chances of all of them guessing the
same tire?

Let’s simulate; using StatCrunch, input RFront,
LFront, RRear, LRear

Data, sample, choose your data, sample size 4,
number of samples 10, sample with replacement

How many times, just by random chance, does
Stat Crunch choose the same tire? Let’s create a
dot plot on the board; what do you think?

Assuming the students were lying, the chances
that all four of them would guess the same
tire, just by random chance, according to our
simulation, is ... look at our dot plot...

If we carried out this simulation again, would
we get the same data? The same exact dot
plot?
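
The same simulation can be sketched in Python, which makes it easy to compare 10 samples with many more. This is an illustration under the assumption that each student guesses independently and each tire is equally likely; the function name and seed are ours.

```python
import random

TIRES = ["RFront", "LFront", "RRear", "LRear"]

def all_same_rate(trials, seed=0):
    """Fraction of trials in which 4 independent random guesses all match."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    matches = 0
    for _ in range(trials):
        guesses = [rng.choice(TIRES) for _ in range(4)]   # sample with replacement
        if len(set(guesses)) == 1:     # all four students named the same tire
            matches += 1
    return matches / trials

print(all_same_rate(10))        # a small run, like the class simulation
print(all_same_rate(100_000))   # many more samples give a steadier estimate
```

Changing the seed changes the small run a lot and the large run very little, which is the point of the question about repeating the simulation.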

Surprised or not?

The professor suspected they had been lying. That’s
why he did what he did.

Maybe they just got lucky ... just by chance they all
guessed the same tire. How ‘lucky’ would they have to
be?

The theoretical probability that all four students
would guess the same tire is about ...

Do you consider it likely/typical or unlikely/rare
that they could have simply, by chance, guessed
the same tire? Look at our dot plot...

Let’s think about another hypothesis
test/inference example/situation...

I claim that in the last 5 years of playing
basketball, I, on average, make 90% of my
basketball free throws.

To test my claim, I am asked to shoot 10 free
throws. I make 2 of the 10 (only 20%).

Do you still believe my claim that I make 90%
of my free throws? Why or why not?

Do you agree that statistics vary from sample
to sample? So if I attempted another 10 free
throws, chances are I would make something
other than 2 of them?

So the question is... would my actually making
only 2 out of 10 (or 20%) happen so, so very
rarely (assuming my 90% claim were true) that
now you are starting to question/doubt if my
claim really is true?

Let’s simulate. Let’s assume that we believe the claim.
Pull up the random digits table. Let’s say 0 through 8
represent making a basket; 9 represents missing a
basket.

Go into the table on a random line, look at ten 1-digit
numbers, duplicates are OK.

Count the number of ‘baskets’ made in ten 1-digit
numbers. Do this three times. Put your magnets up on
the class dot plot.

So the question is... would my actually making only 2
out of 10 (or 20%) happen so, so very rarely (assuming
my 90% claim were true) that now you are starting to
question/doubt if my claim really is true?
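
The random-digit simulation can also be sketched in Python. Digits 0 through 8 stand for a made basket and 9 for a miss, exactly as in the table activity; the exact calculation at the end assumes the 10 shots are independent, which is our modeling assumption.

```python
import random
from math import comb

def baskets_in_ten(rng):
    """Ten random digits; 0 through 8 count as a made basket, 9 as a miss."""
    return sum(1 for _ in range(10) if rng.randint(0, 9) <= 8)

rng = random.Random(1)   # fixed seed so the run is repeatable
counts = [baskets_in_ten(rng) for _ in range(1000)]
print("smallest number of baskets in 1000 simulated sets:", min(counts))

# Exact binomial probability of 2 or fewer baskets if the 90% claim is true
p_two_or_fewer = sum(comb(10, k) * 0.9**k * 0.1**(10 - k) for k in range(3))
print("P(2 or fewer out of 10):", p_two_or_fewer)
```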
Hypothesis test: a formal procedure that enables us
to choose between two hypotheses when we are
uncertain about our measurements.

Basic idea... An outcome that would rarely
happen if a claim were really true is good
evidence that the claim is not true.

If we flip a penny, we can agree that the
probability of heads or tails is 0.50; fair.

However, some claim that if we spin a penny on a
table, because the heads side bulges
outwards, the lack of symmetry will cause
the spinning coin to land on one side more
often than the other; probability is not 0.50
for each side; unfair

Some people might find this claim
outrageous; completely false
Null hypothesis, Ho: p = 0.50
- null hypothesis is always neutral, no
change, always equal
- null hypothesis is always in terms of a
population parameter (like p or μ)
Alternative hypothesis, Ha: p ≠ 0.50
- alternative hypothesis is always <, >, or ≠
- alternative hypothesis is always in terms
of a population parameter (like p or μ)

Null hypothesis, Ho: p = 0.50
In the beginning, we assume the null is true
(like a defendant is assumed not guilty in the
beginning of a trial) until there is
overwhelming evidence that suggests this is not
so; then we may reject this belief if/when the
evidence is clearly against it
Alternative hypothesis, Ha: p ≠ 0.50

The null hypothesis always gets the benefit
of the doubt and is assumed to be true
throughout the hypothesis-testing procedure.
If we decide at the last step that the
observed outcome (our sample statistic) is
extremely unusual under this assumption,
then and only then do we reject the null
hypothesis.

If the null hypothesis is correct, then when we
spin a coin a number of times, about ½ of
the outcomes should be heads. If the null
hypothesis is wrong, we will see either a
much larger or much smaller proportion.

Let’s spin some pennies. Spin (on desk) 20
times. Count the # of heads

Calculate the sample proportion, p̂, of heads
and write on board

Let’s look at our sampling distribution;
describe using SOCS (review)

If we did this again, would we get different
results?

How ‘extreme’ of a result would we need for
you to not believe our null hypothesis/to
reject the null?

We will come back to this later in the
chapter...

What’s wrong with ...
Ho: p = 0.17      Ha: p ≠ 0.19
Ho: p = -0.20     Ha: p < 0.15
Ho: p > 0.45      Ha: p = 0.45
Ho: p = 1.50      Ha: p > 1.50
Ho: p̂ = 0.92      Ha: p̂ < 0.92
A recent Gallup Poll report on a national
survey of 1028 teenagers revealed that
72% of teens said they rarely or never
argue with their friends. You wonder
whether this national result would be
different in your school. So you conduct
your own survey of a random sample of
students at your school.

The proportion of people who live after
suffering a stroke is 0.85. A drug
manufacturer has just developed a new
treatment that they claim will increase the
survival rate.

A change is made that should improve student
satisfaction with the parking situation at
COC. The null hypothesis, that there is an
improvement, is tested versus the
alternative, that there is no change.

A researcher tests the following null
hypothesis
Ho: p̂ = 0.80

A statistics instructor at COC read that 90%
of all college students use social media on
a regular basis. She wonders if the
percent of COC students who use social
media on a regular basis is different.
Ho: p = 0.90
Ha: p > 0.91

The Census Bureau reports that households
typically spend 31% of their total spending on
housing. A homebuilders association in
Cleveland believes that it is lower in their
area. They interview a sample of 40
households in the Cleveland metropolitan
area to learn what percent of their spending
goes toward housing. Take p to be the typical
percent of spending devoted to housing
among all Cleveland households.
H0: p = 31%
Ha: p < 31%

Surprise itself; when something unexpected
occurs (like only making 20% of free throws when
we claimed to make 90%)

Null hypothesis tells us what to expect; it’s what
we believe throughout the process until we see
evidence otherwise

If we see something unexpected, then we should
doubt the null hypothesis

If we are really surprised, then we should
reject it altogether

Instead of just not surprising, kind of surprising,
very surprising, etc., we have...

p-value

A p-value is a probability. Assuming the null
hypothesis is true, the p-value is the probability
of getting an outcome as extreme as or more
extreme than the one we actually got (our
statistic). A small p-value suggests that a
surprising outcome has occurred and discredits
the null hypothesis.
A p-value is a quantitative measure of how
rare/unlikely a finding is
Small p-values are evidence against Ho
Large p-values fail to give evidence against Ho

Understanding how to interpret a p-value is
crucial to understanding hypothesis testing.

StatCrunch will calculate the p-value, but we
need to understand how the software did the
calculation

The meaning of the phrase ‘as extreme as
or more extreme than’ depends on the
alternative hypothesis

Note: the closer the number of heads is to 10, the larger the
p-value
Also note the p-value for an outcome of 11 heads is the same
as for 9 heads, etc.

Most of the time, we take one more step to
assess evidence against Ho

We compare the p-value to some predetermined
value (versus just ‘unlikely’) called a
significance level, symbol α (alpha)

Can think of this as a rejection zone (sketch)

Significance level makes ‘not likely’ more
exact, more informative

Most common α levels are α = 0.05 or α =
0.01

Interpretation:
At α = 0.05, data give evidence against Ho so
strong it would happen no more than 5% of the
time if Ho were true

If p-value is as small or smaller than α, we
say data are statistically significant at level α

Note: ‘significant’ in statistics doesn’t mean
important (like in English); it means not
likely to happen by chance

Ho: p = ...
Ha: p ...

I gathered sample data, and calculated a
p-value based on sample data (probability of
getting that value or more extreme assuming
that null hypothesis is true)

(Sketches: 1-sided and 2-sided rejection zones)

If p-value is p = 0.03... this is significant at α
= 0.05 level (in rejection zone)

If p-value is p = 0.03... this is not significant
at α = 0.01 level (not in rejection zone)
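
To make ‘as extreme as or more extreme than’ concrete, here is a small Python sketch for the penny-spinning setup. It assumes 20 spins, Ho: p = 0.50, and a 2-sided alternative, and it uses exact binomial probabilities; the method behind the slide's figure may differ, and the function name is ours.

```python
from math import comb

def two_sided_p_value(heads, spins=20, p0=0.5):
    """If Ho is true, probability of a count at least as far from the
    expected count as `heads` (valid as written for the symmetric case p0 = 0.5)."""
    expected = spins * p0
    distance = abs(heads - expected)
    return sum(
        comb(spins, k) * p0**k * (1 - p0) ** (spins - k)
        for k in range(spins + 1)
        if abs(k - expected) >= distance
    )

alpha = 0.05
for heads in (10, 9, 11, 5):
    p_value = two_sided_p_value(heads)
    decision = "significant" if p_value <= alpha else "not significant"
    print(f"{heads} heads: p-value = {p_value:.4f} ({decision} at alpha = {alpha})")
```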
Reject Ho (Null Hypothesis):
This happens when the sample statistic is
statistically significant: the p-value is at or
below α, so a result this extreme is too
unlikely to have occurred by chance if the null
hypothesis were true (we don’t believe the null
hypothesis); in the rejection zone
Wording must reference all of the following for
a complete interpretation... p-value, α level,
reject Ho, and conclusion in context (caution
about using the word ‘cause’ or ‘prove’).
Fail to Reject Ho (Null Hypothesis):
This happens when the sample statistic could have
occurred by chance (we don’t have enough
evidence against the null hypothesis; this does
not prove the null is true); not in rejection zone
Wording must reference all of the following for
a complete interpretation... p-value, α level,
fail to reject Ho, and conclusion in context
(caution about using the word ‘cause’ or
‘prove’)

Random Sample ... randomly selected or randomly
assigned

Large Sample Size; Normality (see next slide) ...
np0 ≥ 10 and n(1 – p0) ≥ 10; the sample has at least
10 expected successes and at least 10 expected
failures

Big Population (Independence) ... Population at
least 10 times sample size; and each observation
has no influence on any other

...if these conditions are satisfied, then we
can use the Central Limit Theorem for
sample proportions; distribution is ≈ Normal!
That’s a great thing!

When doing a hypothesis test, you MUST
check conditions... this is an essential part of
the hypothesis testing process
According to the National Institute for Occupational Safety and Health,
job stress poses a major threat to the health of workers. A national
survey of restaurant employees found that 75% said that work stress
had a negative impact on their personal lives.
A random sample of 100 employees from a large restaurant chain finds
that 68 answer “Yes” when asked, “Does work stress have a negative
impact on your personal life?” Is this good reason to think that the
proportion of all employees in this chain who would say “Yes” differs
from the national proportion p0 = 0.75?
H0: p = 0.75
Ha: p ≠ 0.75
We want to test a claim about p, the true proportion of all of this
chain's employees who would say that work stress has a negative
impact on their personal lives.
Conditions: 1-sample proportion hypothesis test;
α = 5% (rejection zone)
Random Sample – stated in problem
Large Sample Size/Normality - The expected
number of “Yes” and “No” responses are
(100)(0.75) = 75 and (100)(0.25) = 25,
respectively. Both are at least 10.
Big Population (Independence) - Since we are
sampling without replacement, this “large
chain” must have at least (10)(100) = 1000
employees.
Calculations for 1-sample proportion 2-sided
hypothesis test; use Stat Crunch
Stat, proportion stats, 1 sample, with summary
z = -1.616
P-value = 0.1059
Interpretation:
Fail to reject Ho. With a p-value of 0.1059 and
an α = 5%, we fail to reject the null
hypothesis and conclude that there is not
enough evidence to suggest that the
proportion of this chain restaurant's
employees who suffer from work stress is
different from the national survey result,
0.75.
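
For readers curious about what StatCrunch computes here, the following is a minimal Python sketch of a 1-sample proportion z test. It assumes the usual test statistic, with the standard error computed from the null value p0; the function name and the `alternative` argument are ours.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """z statistic and p-value for Ho: p = p0."""
    p_hat = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)        # standard error assuming Ho is true
    z = (p_hat - p0) / se0
    cdf = NormalDist().cdf(z)            # area to the left of z
    if alternative == "two-sided":       # Ha: p ≠ p0
        p_value = 2 * min(cdf, 1 - cdf)
    elif alternative == "less":          # Ha: p < p0
        p_value = cdf
    else:                                # "greater", Ha: p > p0
        p_value = 1 - cdf
    return z, p_value

# Job stress example: 68 'Yes' out of 100, Ho: p = 0.75, Ha: p ≠ 0.75
z, p_value = one_proportion_z_test(68, 100, 0.75)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```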
Run a Confidence Interval; which confidence
level? Why can we do this?

In a recent study, 73% of first-year college
students responding to a national survey
identified “being very well-off financially” as an
important personal goal. A state university finds
that 132 of a random sample of 200 of its
first-year students say that this goal is important.

Is there evidence that the proportion of all
first-year students at this university who think being
very well-off is important differs from the
national value, 73%? Carry out a significance test
to help answer this question.
We want to test
Ho: p = 0.73 versus Ha: p ≠ 0.73
where p is the proportion of all first-year
students at this university who think being
very well-off is important.
Conditions: 1-sample proportion hypothesis test; α =
5% (rejection zone)
Random Sample/SRS – stated in problem
Large Sample Size/Normality – np0 ≥ 10 & n(1 – p0) ≥ 10
(200)(0.73) = 146 ≥ 10 & (200)(1 – 0.73) = 54 ≥ 10
Big Population (Independence) – We must assume at
least (10)(200) = 2,000 first-year students in the
population.
Calculations... Stat Crunch
Stat, proportion stats, 1 sample, with summary
z = -2.22
P-value = 0.0258
Reject Ho. With a p-value of 0.0258, and
assuming an α = 0.05, we have statistically
significant evidence that the proportion of
all first-year students at this university who
think being very well-off is important differs
from the national value.
(decision, p-value, α, and context... always in
terms of alternative hypothesis)
What if.... Our alpha had been 1%? Would our
decision have changed?
Let’s go back to 5% alpha level; Can we calculate a
confidence interval? Why? How? Does it
confirm our findings?
Do we have to calculate a CI to confirm our
findings every time we conduct a hypothesis
test?
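
One way to explore these questions is to compute the matching confidence intervals and see whether the null value 0.73 falls inside. The sketch below assumes the large-sample interval formula (names are ours). Because the interval uses p̂ in its standard error while the test uses p0, the two approaches usually agree for a 2-sided test but are not guaranteed to.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, level):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return p_hat - z_star * se, p_hat + z_star * se

p0 = 0.73   # national value from the null hypothesis
# A 95% interval pairs with alpha = 5%; a 99% interval pairs with alpha = 1%
for level in (0.95, 0.99):
    low, high = proportion_ci(132, 200, level)
    verdict = "inside" if low <= p0 <= high else "outside"
    print(f"{level:.0%} CI: {low:.3f} to {high:.3f}; 0.73 is {verdict} the interval")
```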
Researchers wondered whether a greater proportion
of people now dream in color than did so before
color television and movies became as prominent
as they are today. In the past, before color TV
and movies, this proportion was 0.29. Recently
researchers took a random sample of 113 people.
Of these 113 people, 92 reported dreaming in
color.
Is there evidence (at a significance level of 1%) that
more people today dream in color than in the past
(before color TV and movies became as prominent
as they are today)? Carry out an appropriate
hypothesis test to help answer this question.
What are our null and alternative hypotheses?
Conditions: 1-sample proportion; α = 1% (rejection zone)
Random Sample –
Large Sample Size/Normality –
Big Population/Independence –
Calculations –
Determination and interpretation –

What are our null and alternative hypotheses?
Ho: p = 0.29 Ha: p > 0.29
Conditions: 1-sample proportion hypothesis test; α = 1% (rejection
zone)
Random Sample
Large Sample Size/Normality
Big Population/Independence
Calculations
z = 12.28, p-value ≈ 0
Decision and interpretation
Reject null hypothesis. At an alpha level of 1%, and a p-value of
about zero, there is sufficient evidence to suggest that more
people today dream in color than in the past (before color TVs,
etc.)
Can we/should we calculate a confidence interval to confirm our
findings?

Ho: p1 = p2
Ha: p1 ≠ or > or < p2
Stat Crunch will calculate this for us; no need
to memorize

Random; each sample must be randomly selected
or randomly assigned; each sample must be
independent from the other

Large Count/Normality: Each of the
following must be ≥ 10:

$$n_1\hat{p}_1 \ge 10, \quad n_1(1-\hat{p}_1) \ge 10, \quad n_2\hat{p}_2 \ge 10, \quad n_2(1-\hat{p}_2) \ge 10$$

Big Population
Each of the populations must be at least 10
times the corresponding sample size
To study the long-term effects of preschool programs for
poor children, a research foundation followed two
randomly-chosen/assigned groups of Michigan children
since early childhood. A control group of 61 children
represents population 1, poor children with no pre-school.
Another group of 62 children from the same area and with
similar backgrounds attended pre-school as 3- and
4-year-olds; it represents population 2, poor children who
attend pre-school. Sizes are n1 = 61 and n2 = 62.
One response variable of interest is the need for social
services as adults. In the past ten years, 38 of the
pre-school sample and 49 of the control sample have
needed social services (mainly welfare). Carry out a
hypothesis test to determine whether there is significant
evidence that pre-school reduces or increases the later
need for social services.
State null and alternative hypotheses
Ho: p no pre-school = p pre-school
OR p no pre-school − p pre-school = 0
Ha: p no pre-school ≠ p pre-school
OR p no pre-school − p pre-school ≠ 0
Where each p is the true, unknown proportion of all
children like these who need social services as adults
Procedure: 2-proportion hypothesis test
Conditions: Random, Large Count/Normal, Big Population
Use StatCrunch to calculate the test statistic, p-value,
etc.
Stat > Proportion Stats > Two Sample > With Summary
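z = -2.3201 (for p̂ pre-school − p̂ no pre-school; the sign flips if the groups are entered in the other order)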
p-value = 0.0203
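As a cross-check on the software output, the same test can be sketched in a few lines of Python using only the standard library. This assumes the usual pooled two-proportion z statistic; the function name and the `alternative` argument are illustrative, not part of StatCrunch. The pre-school group is entered first, which is the order that produces a negative z.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Pooled two-proportion z test; returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # combined proportion under Ho
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    cdf = NormalDist().cdf(z)               # area to the left of z
    if alternative == "two-sided":
        p_value = 2 * min(cdf, 1 - cdf)
    elif alternative == "greater":          # Ha: p1 > p2
        p_value = 1 - cdf
    elif alternative == "less":             # Ha: p1 < p2
        p_value = cdf
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p_value


# Sample 1 = pre-school (38 of 62), sample 2 = no pre-school (49 of 61).
z, p_value = two_proportion_z_test(38, 62, 49, 61)
print(f"z = {z:.4f}, p-value = {p_value:.4f}")
```

The printed values should agree with the z and p-value above up to rounding.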
Interpretation:
Reject null hypothesis. At a significance level of
5% (α = 0.05), and with a p-value of approximately
0.02, there is sufficient evidence to show that
p no pre-school ≠ p pre-school (that is, evidence that
pre-school changes, by reducing or increasing, the
later need for social services)
Can/should we calculate a confidence interval
to confirm our findings? Why? How?
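One possible answer to "How?" is a confidence interval for the difference p no pre-school − p pre-school. The sketch below assumes the usual large-sample interval with an unpooled standard error, and a 95% confidence level chosen to pair with the two-sided test at α = 0.05; the function name is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


# Sample 1 = no pre-school (49 of 61), sample 2 = pre-school (38 of 62).
low, high = two_proportion_ci(49, 61, 38, 62)
print(f"95% CI for the difference: ({low:.4f}, {high:.4f})")
```

If the interval does not contain 0, it tells the same story as rejecting Ho, and it also gives a range of plausible sizes for the difference.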
The elderly fear crime more than younger people,
even though they are less likely to be victims of
crime. One of the few studies that looked at
older blacks recruited random samples of 56
black women and 63 black men over the age of
65 from Atlantic City, New Jersey. Of the women,
27 said they “felt vulnerable” to crime; 46 of the
men said this.
What proportion of women in the sample feel
vulnerable? Of men? (Note: Men are victims of
crime more often than women, so we expect a
higher proportion of men to feel vulnerable.)
Test the hypothesis that the true, unknown
population proportion of all elderly black
males who feel vulnerable is higher than that
of all elderly black women who feel
vulnerable. You may assume that all
conditions have been checked and met.

sample statistics: 46/63 ≈ 73.0% of men & 27/56 ≈ 48.2% of women

z = 2.7731
P-value = 0.0028

Reject null hypothesis. At any reasonable alpha
level, with a p-value less than 1%, we have
evidence to suggest that the proportion of all
elderly black men who feel vulnerable is higher than
the proportion of all elderly black women who feel
vulnerable.

Can/should we calculate a confidence interval to
confirm our findings?
California’s controversial ‘three strikes law’
requires judges to sentence anyone
convicted of three felony offenses to life in
prison. Supporters say that this decreases
crime; opponents argue that people serving
life sentences have nothing to lose, so
violence within the prison system increases.

Researchers looked at data from the California
Department of Corrections.
Of 734 randomly-selected prisoners who had
three strikes, 163 of them had committed
‘serious’ offenses while in the prison system
 Of 3,188 randomly-selected prisoners who did
not have three strikes, 974 had committed
‘serious’ offenses while in the prison system


Determine whether those with three strikes tend
to have more offenses than those who do not.
Use a 5% significance level.
Sample statistic for prisoners who had three strikes
was 163/734 ≈ 22.2%
Sample statistic for prisoners who did not have
three strikes was 974/3188 ≈ 30.6%
z = -4.49
P-value = 0.9999
Fail to reject null hypothesis. At a 5% alpha level
and with a p-value ≈ 1, there is not sufficient
evidence to conclude that the proportion of all prisoners
with three strikes who commit serious offenses
within the prison system is higher than the proportion
of all prisoners without three strikes who do.
Confidence interval? Why? Why not?
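The p-value near 1 comes from the direction of the alternative: the sample difference points the opposite way from Ha. The sketch below, assuming the usual pooled z statistic and variable names chosen for illustration, computes the one-sided p-value in both directions to make that visible.

```python
from math import sqrt
from statistics import NormalDist

x_three, n_three = 163, 734      # prisoners with three strikes
x_not, n_not = 974, 3188         # prisoners without three strikes

pooled = (x_three + x_not) / (n_three + n_not)
se = sqrt(pooled * (1 - pooled) * (1 / n_three + 1 / n_not))
z = (x_three / n_three - x_not / n_not) / se

# Ha: p three strikes > p no three strikes, so the p-value is the area
# to the RIGHT of z. Because z is far below 0, that area is almost 1.
p_greater = 1 - NormalDist().cdf(z)

# For comparison only: the area to the LEFT of z, which is what the
# opposite alternative (three strikes < no three strikes) would use.
p_less = NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"p-value for Ha 'greater': {p_greater:.4f}")
print(f"p-value for Ha 'less':    {p_less:.6f}")
```

A test of "greater" cannot find evidence when the sample proportion for three-strikes prisoners is the smaller one, no matter how large the gap is.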

High levels of cholesterol in the blood are associated
with higher risk of heart attacks. Will using a drug to
lower blood cholesterol reduce heart attacks? The
Helsinki Heart Study looked at this question.

Middle-aged men were assigned at random to one of
two treatments: 2,051 men took the drug gemfibrozil
to reduce their cholesterol levels, and a control
group of 2,030 men took a placebo. During the next
five years, 56 men in the gemfibrozil group and 84
men in the placebo group had heart attacks.

Is the apparent benefit of gemfibrozil statistically
significant? Use a 1% alpha level.
We want to draw conclusions about p1, the
proportion of all middle-aged men who would
suffer heart attacks after taking gemfibrozil,
and p2, the proportion of all middle-aged men
who would suffer heart attacks if they only
took a placebo.
We hope to show that gemfibrozil reduces heart
attacks, so we have a one-sided alternative.
n gemfibrozil = 2,051
x gemfibrozil = 56
n placebo = 2,030
x placebo = 84
Sample statistic for gemfibrozil = 56/2051 ≈ 2.7%
had heart attacks
Sample statistic for placebo = 84/2030 ≈ 4.1% had
heart attacks
Is this difference just due to chance? Or is there
really a difference between the medication and
the placebo? Perform a hypothesis test to help
you come to a conclusion.
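One way to carry out this test by hand is sketched below. It assumes the pooled two-proportion z statistic and the one-sided alternative Ha: p gemfibrozil < p placebo described above; the variable names are illustrative. Try the problem yourself before running it.

```python
from math import sqrt
from statistics import NormalDist

x_drug, n_drug = 56, 2051        # heart attacks, gemfibrozil group
x_plac, n_plac = 84, 2030        # heart attacks, placebo group
alpha = 0.01

p_drug, p_plac = x_drug / n_drug, x_plac / n_plac
pooled = (x_drug + x_plac) / (n_drug + n_plac)
se = sqrt(pooled * (1 - pooled) * (1 / n_drug + 1 / n_plac))
z = (p_drug - p_plac) / se

# One-sided alternative (drug lowers the proportion): area to the left of z.
p_value = NormalDist().cdf(z)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject Ho" if p_value < alpha else "Fail to reject Ho")
```

Because the men were randomly assigned to treatments, a significant result here supports a cause-and-effect statement more strongly than it would in an observational study.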

Significance tests are used in a variety of settings...
Marketing, FDA drug testing, discrimination court
cases, etc.

Significance tests quantify how unlikely an observed result
would be if it occurred simply by chance

Different levels of significance (α) are chosen
depending on the given situation; typically α = 0.10,
0.05, or 0.01

Continue to use caution when using “prove” or
“cause”... even when doing hypothesis testing
P-values allow us to decide individually if
evidence is sufficiently strong
But, there is still no practical distinction
between p-values of, say, 0.049 and 0.051 if
our alpha level was, say, 5%
Statistical inference does not correct basic
flaws in survey or experimental design, such
as ...
Sometimes we do everything correctly... data
collection, conditions, calculations, interpretation...
but we still make an incorrect
decision/determination... perhaps we just happen to
get a sample statistic that is very extreme... that
really doesn’t represent our population accurately
... we reject the null hypothesis when we really should
have failed to reject (Ho was really true)
OR we fail to reject the null hypothesis when we really
should have rejected the null hypothesis (Ho was
really false)
... we make an “error”... that is NOT our fault!
Type I Error
We reject Ho (null hypothesis) when Ho is really
true
In other words, we determine Ha (alternative
hypothesis) is true when, in actuality, Ho (null
hypothesis) is true

Type II Error
We fail to reject Ho (null hypothesis) when Ho is
really false
In other words, we determine Ho (null hypothesis)
is true, when, in reality, Ha (alternative
hypothesis) is true
Probability of Type I Error (rejecting Ho when
null is really true): α, your significance level
for the hypothesis test.
Probability of Type II Error (failing to reject
Ho when alternative is really true): β. Very
complicated to calculate.
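The claim that the Type I error probability equals α can be seen in a small simulation. Everything in this sketch is an assumption made for illustration: a one-sided, one-proportion z test of Ho: p = 0.30 against Ha: p > 0.30, samples of n = 200, and a fixed random seed. Every sample is drawn from a population where Ho really is true, so each rejection is a Type I error.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_type_i_rate(p0=0.30, n=200, alpha=0.05, trials=4000, seed=1):
    """Share of samples drawn under a TRUE Ho that still lead to rejecting Ho."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # cutoff for Ha: p > p0
    se0 = sqrt(p0 * (1 - p0) / n)
    rejections = 0
    for _ in range(trials):
        # Draw one random sample of size n from a population with p = p0.
        successes = sum(rng.random() < p0 for _ in range(n))
        z = (successes / n - p0) / se0
        if z > z_crit:
            rejections += 1                    # a Type I error
    return rejections / trials


rate = simulate_type_i_rate()
print(f"share of true-null samples rejected: {rate:.3f}")
```

The share should land close to α, though not exactly on it, because counts are whole numbers and the normal model is only an approximation.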
Power: Probability that a test will reject Ho
when Ha is true
Think of power as making the correct
decision, not making an error, not making a
mistake
High level of power is a good thing
Power = 1 – β (remember β is the probability of
making a Type II error); so ‘power’ and β are
complementary
How can we increase power (making the
correct decision)?
Increase α
Increase n
Decrease standard deviation (same effect as
increasing the sample size, n)
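The effect of α and n on power can be illustrated with an approximate calculation. This sketch assumes a one-sided, one-proportion z test (Ho: p = p0 against Ha: p > p0) and the normal approximation; the function name and the numbers p0 = 0.30 and true p = 0.35 are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_one_proportion(p0, p_true, n, alpha):
    """Approximate power of the test Ho: p = p0 vs Ha: p > p0 when p = p_true."""
    norm = NormalDist()
    se0 = sqrt(p0 * (1 - p0) / n)                   # std. error if Ho is true
    se_true = sqrt(p_true * (1 - p_true) / n)       # std. error at the true p
    cutoff = p0 + norm.inv_cdf(1 - alpha) * se0     # p-hat above this rejects Ho
    # Power = chance that p-hat lands above the cutoff when p = p_true.
    return 1 - norm.cdf((cutoff - p_true) / se_true)


base = power_one_proportion(0.30, 0.35, n=200, alpha=0.05)
more_alpha = power_one_proportion(0.30, 0.35, n=200, alpha=0.10)
more_n = power_one_proportion(0.30, 0.35, n=800, alpha=0.05)

print(f"n=200, alpha=0.05: power = {base:.3f}")
print(f"n=200, alpha=0.10: power = {more_alpha:.3f}")
print(f"n=800, alpha=0.05: power = {more_n:.3f}")
```

Raising α buys power at the cost of more Type I errors, while raising n improves power without that trade-off.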
The following 3 problems are practice
problems in which you need to decide what
procedure to do and why. Be prepared to
defend your choice of procedures.
After deciding which procedure, go ahead
and check conditions and run the procedure.
Work with a partner.
You try it for 15-20 minutes, then we will
debrief.
1. A question was asked in a tattoo magazine whether a man or a woman is
more likely to have a tattoo. A random sample of 857 men found that 146
of them had at least one tattoo. A random sample of 794 women found
that 137 of them had at least one tattoo.
2. A body mass index of 20-25 indicates that a woman is of a healthy
weight. A recent international study reported that 30% of all adult
women in the world maintain a healthy BMI. A random survey of 745
women living in Los Angeles found that 198 of them had a healthy BMI
score. We wonder if the international study results are also true for
women who live in Los Angeles.
3. In March 2003, a research group asked 2400 randomly selected Americans
whether they believed that the U.S. made the right or wrong decision to
use military force in Iraq. Of the 2400 adults, 1862 said that they
believed that the U.S. made the correct decision. In February
2008, the question was asked again of 2180 randomly selected
Americans, and 684 of them said that the U.S. made the correct
decision. Has the proportion of all Americans who believe the U.S.
made the correct decision changed between 2003 and 2008?
1-proportion confidence interval
2-proportion confidence interval
1-proportion hypothesis test
2-proportion hypothesis test
All are inference processes/methods: making a
statement about a population based on a
sample statistic