Lecture 10 - Penn State Department of Statistics

Sept. 19 Statistic for the day:
Percent of Pennsylvania adults who feel
that the NCAA sanctions against Penn
State were too severe: 44%
not severe enough: 14%
Margin of error: 2.5%
Do you have a tattoo?
          Yes     No
Men       7%      93%
Women     16%     84%
Assignment: Finish Reading Chapter 5
Exercises: pp. 100-101 #7, 8, 10, 13
What is the margin of error for the
different percentages?
What should you report?
Example: women who answer yes 16%
Based on: 83 men and 137 women (STAT 100, FA 05)
N = 137
Margin of error = .09 or 9%
Report: 16% ± 9% (that is, 7% to 25%)
So there is a huge margin of error
and the 16% is fairly uncertain.
Tattoo results, Fall 2008
          Yes      No       Yes in Fall 2005
Men       16.5%    83.5%    7%
Women     13%      87%      16%
Based on: 97 men and 132 women (STAT 100, FA 08)
What percent has a tattoo?
Students in one section of STAT 100:
            Men              Women
FA 2001     11% (n=37)       13% (n=77)
SP 2004     15% (n=100)      23% (n=136)
FA 2005     7% (n=83)        16% (n=137)
FA 2008     16.5% (n=97)     13% (n=132)
Let’s focus on the men in FA 2008.
16.5% (n=97)
What percent has a tattoo?
Careful: The answer could change depending on percent
of whom!
•  If I want to know the percent of all STAT 100
males in FA 2008, then the answer is
16.5%. (When you have the entire
population as your sample, it’s called a
census.)
•  If I want to know the percent of all males at
Penn State University Park, then the
answer is 16.5% ± 10%.
Note: 10% is roughly 1/sqrt(97).
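The 1/sqrt(n) rule in the note above can be checked with a few lines of Python. This is a minimal sketch, assuming the rule is used as a rough margin of error for a percentage from a random sample of size n; the function name `margin_of_error` is made up for illustration.

```python
import math

def margin_of_error(n):
    """Rough margin of error for a sample percentage: 1/sqrt(n)."""
    return 1 / math.sqrt(n)

# Percentages and sample sizes quoted in the lecture
groups = {
    "FA 2005 women": (0.16, 137),
    "FA 2008 men": (0.165, 97),
}

for label, (p, n) in groups.items():
    moe = margin_of_error(n)
    print(f"{label}: {p:.1%} ± {moe:.0%} (n={n})")
```

Because n sits under a square root, cutting the margin of error in half takes about four times as many respondents.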
Is anything wrong with the tattoo
survey?
What is the target population?
Recall this example:
Do you think that the ‘morning-after’ contraceptive
pill should be available over the counter?
Yes         59.1%
No          37.1%
Not sure    3.8%
What sort of sample do we have?
USA Today call-in poll, 2004
• The responding group is not representative of any larger group!
• Opinions reflect only those of the people who decide to respond.
• These polls are unscientific and worthless.
Volunteer response vs. volunteer sample
(p. 71)
Contraceptive call-in poll?
Volunteer sample!
1936 Literary Digest poll?
Volunteer response!
Which is worse?
Volunteer sample!
Sampling methods
•  (Simple) random sampling
•  Stratified random sampling
•  Cluster sampling
•  Systematic sampling
•  Bad: Haphazard or convenience sampling (as in tattoo
survey)
Simple random sampling
Roughly speaking, ensure that each individual has the
same chance of being selected. More precisely:
n Draw your sample of size n in such a manner that ALL
possible samples of size n have the same probability of
being selected.
Example: Pennsylvania’s “MATCH 6” lottery, in
which 6 numbers are picked from 49.
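A lottery-style draw like this one can be sketched with Python's standard `random` module. This is an illustration only, assuming the numbers run from 1 to 49; the helper name `simple_random_sample` and the seed value are made up here. `random.sample` draws without replacement, so every set of 6 distinct numbers is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals so that every size-n subset is equally likely."""
    rng = random.Random(seed)  # a seed only makes the sketch reproducible
    return rng.sample(population, n)

# Assumed lottery setup: numbers 1 through 49, pick 6
numbers = list(range(1, 50))
draw = sorted(simple_random_sample(numbers, 6, seed=2012))
print(draw)
```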
Stratified random sampling
n Divide population into subgroups, or strata
n From each stratum, select a random sample
Example: Select a simple random sample from
each of four groups of students (in-state nonminority, in-state minority, out-of-state nonminority, out-of-state minority) to ensure
adequate representation of each group.
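The four-group student example could be sketched as follows. The rosters, group sizes, and the choice of 10 students per stratum are all invented for illustration; the only idea taken from the slide is that each stratum gets its own simple random sample.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of n_per_stratum from every stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, n_per_stratum)
        for name, members in strata.items()
    }

# Hypothetical rosters: IDs and group sizes are made up
strata = {
    "in-state nonminority": [f"A{i}" for i in range(600)],
    "in-state minority": [f"B{i}" for i in range(150)],
    "out-of-state nonminority": [f"C{i}" for i in range(200)],
    "out-of-state minority": [f"D{i}" for i in range(50)],
}

sample = stratified_sample(strata, n_per_stratum=10, seed=1)
for name, chosen in sample.items():
    print(name, len(chosen))
```

Even the smallest group (50 students here) ends up with 10 people in the sample, which a single simple random sample of 40 from the whole population would not guarantee.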
Cluster sampling
n Divide population into subgroups, or clusters
n Select a simple random sample of clusters
n Measure individuals within selected clusters according
to some plan
Example: To study high schoolers, first take a
random sample of schools and then look in depth
at all students in selected schools
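The high school example might look like this in code. The school names, enrollment of 30 per school, and the choice of 3 schools are hypothetical; the "plan" used here is the one in the example, namely keeping every student in each selected school.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters, then keep everyone inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster names
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: 8 schools with 30 students each
schools = {
    f"School {c}": [f"{c}-student-{i}" for i in range(30)]
    for c in "ABCDEFGH"
}

selected = cluster_sample(schools, n_clusters=3, seed=1)
for name, students in selected.items():
    print(name, len(students))
```

The random step acts on schools, not on students, which is what makes the school the primary sampling unit.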
Systematic sampling
•  From a list of individuals in the population, select every
kth individual
Example: Does anyone know the origin of the word
“decimate”?
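Selecting every kth individual from a list is a one-line slice in Python. In this sketch the random starting position is an added assumption (the slide only says "every kth individual"), and the list of 100 numbered individuals is made up; k = 10 echoes the one-in-ten selection that "decimate" historically refers to.

```python
import random

def systematic_sample(individuals, k, seed=None):
    """Pick a random start among the first k, then take every kth individual."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # assumed: random start in positions 0..k-1
    return individuals[start::k]

# Hypothetical list of 100 individuals, numbered 1 to 100
roster = list(range(1, 101))
picked = systematic_sample(roster, k=10, seed=3)
print(picked)
```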
A bad example of sampling:
The Hite report on female sexuality (1976)
•  Around 100,000 questionnaires mailed out
•  4.5% response rate
•  Anger of women was one theme, but angry
women would have been more likely to
respond.
Recall the Literary Digest Poll of 1936
(page 73 of the text)
Cluster sampling vs. stratified sampling
•  In cluster sampling, the cluster is treated as the
primary sampling unit.
•  In stratified sampling, the individual is treated as
the primary sampling unit, just like in simple
random sampling – but each stratum gets its own
simple random sample.
Note: A subgroup may be called either a “cluster” or
a “stratum,” depending on the context.
Quiz: The Gallup Poll is a good poll
because:
a) It uses a random sample of the population.
b) It uses a large sample.
c) It is done over the telephone.
d) Gallup is the oldest polling organization.
Chapter 5: Experiments and
observational studies
Both of these types of studies often have:
EXPLANATORY VARIABLE -- says which
population we sampled from.
RESPONSE VARIABLE -- says what we
measured or counted.
Typical research question: How does the explanatory
variable relate to the response variable?
Suppose we design a study to compare the SAT
scores of the College of Liberal Arts and the
College of Education. We want to see if we
can claim there is a significant difference.
Do we have
an observational study?
a randomized experiment?
What are the response and explanatory
variables?
Suppose your roommate is part of a
comparative study to see if Vitamin C is
effective at reducing the effects of a head
cold.
Do we have
an observational study?
a randomized experiment?
What are the response and explanatory
variables?
The key to a good observational study or
a good randomized experiment is
RANDOMIZATION
in both cases.
•  In observational studies we need a random
sample from each population.
•  In randomized experiments we must
randomize the subjects to the different
treatments (or treatment and control groups).
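Randomizing subjects to groups can be sketched by shuffling the subject list and splitting it in half. The subject names, the group labels, and the even split are assumptions for illustration, loosely following the Vitamin C example above.

```python
import random

def randomize_to_groups(subjects, seed=None):
    """Shuffle the subjects, then split them into treatment and control."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical volunteers for a Vitamin C study
subjects = [f"subject-{i}" for i in range(1, 21)]
groups = randomize_to_groups(subjects, seed=10)
print(len(groups["treatment"]), len(groups["control"]))
```

Because chance alone decides who gets the treatment, neither the researcher nor the subjects can steer particular people into one group.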