Elementary Statistics and Inference
22S:025 or 7P:025
Lecture 25
Chapter 19 – Sample Surveys
A.
Most research is conducted with data from a Sample of observations drawn from a Population. The results of the analyses are used to make inferences about the nature of the Population.
• Population – The entire collection of subjects (observations) in a Universe.
• Sample – A subset of observations from the Population.
• Population Facts – Parameters (e.g., mean = µ, std dev = σ).
• Sample Facts – Statistics (e.g., mean = x̄, std dev = S).
• Investigators estimate population parameters from sample statistics.
B.
Bad Estimates Can be Made of Parameters
• When the sample is not representative of the members of the population. The 1936 Literary Digest Poll used telephone directories and club membership lists to select a sample of 2.4 million voters. The sample did not reflect the mood of the country because many people were unemployed and had no telephone.
C.
Dewey vs. Truman 1948
• Big Sampling Bias from quota sampling: interviewers had wide latitude in selecting the persons to be interviewed, so the sample was not representative of the population.
D.
Using Chance (Probability) in Survey Work
• Put the name of each survey candidate on a ticket, put the tickets in a box, and randomly select the names of the persons to be surveyed.
• Each person has the same chance of being selected to answer the survey (Simple Random Sampling).
Example: N = 1000 names on tickets in a box. Randomly select 100 tickets for the persons to be surveyed, without putting any ticket back. Everyone has the same chance of being selected for the survey. This is Sampling Without Replacement.
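The ticket-in-a-box example can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the names in `box` and the helper `simple_random_sample` are hypothetical, and the seed is fixed only so the draw can be repeated.

```python
import random

def simple_random_sample(names, n, seed=None):
    """Draw n distinct names; every name is equally likely to be chosen."""
    rng = random.Random(seed)
    # random.sample draws without replacement, like pulling tickets from a box
    return rng.sample(names, n)

# Hypothetical box of N = 1000 tickets
box = [f"person_{i:04d}" for i in range(1000)]
surveyed = simple_random_sample(box, 100, seed=25)

# 100 tickets drawn, and no ticket appears twice
print(len(surveyed), len(set(surveyed)))
```

Because the draw is without replacement, the two printed counts are both 100: no person can be selected twice.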
• More sophisticated methods are now used to select representative samples of the population.
• Multistage Cluster Sampling – the USA is divided into 4 geographical regions; within each region, representative population centers are selected; within each population center there are wards; and within each ward there are precincts.
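The nesting of regions, population centers, wards, and precincts can be sketched as follows. Everything in the frame is made up for illustration: the region names, the number of centers, wards, and precincts, and the choice of exactly one unit at each stage are assumptions, not the procedure of any actual polling organization.

```python
import random

def build_frame():
    """Hypothetical nested frame: region -> center -> ward -> list of precincts."""
    frame = {}
    for region in ["Northeast", "South", "Midwest", "West"]:
        frame[region] = {
            f"{region}-center{c}": {
                f"ward{w}": [f"precinct{p}" for p in range(1, 6)]
                for w in range(1, 4)
            }
            for c in range(1, 4)
        }
    return frame

def multistage_sample(frame, rng):
    """In each region pick one center, then one ward, then one precinct, all by chance."""
    chosen = []
    for region, centers in frame.items():
        center = rng.choice(sorted(centers))
        ward = rng.choice(sorted(centers[center]))
        precinct = rng.choice(centers[center][ward])
        chosen.append((region, center, ward, precinct))
    return chosen

rng = random.Random(19)
sample = multistage_sample(build_frame(), rng)
for row in sample:
    print(row)
```

At no stage does a person decide which unit is chosen; each selection is left to the random number generator.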
• The advantage of multistage sampling is that the interviewer has no discretion at all as to whom to interview. There is a definite procedure for selecting the representative sample, and it involves the planned use of chance (probability) in selecting the sample.
E.
Do Probability Sampling Methods Work?
• Since 1948 the error in estimating the winner in presidential elections has been very small, because impartial probability methods have been used to select the sample of likely voters.
F.
A Closer Look At the Gallup Poll
• In the 1984 election, the Gallup Organization tried to screen its survey respondents to include only those who were likely to vote.
• They screened out the nonvoters, as well as the undecided, from the voters.
• The survey respondent dropped his/her candidate preference, sealed in an envelope, into a box, so the choice was unknown to the interviewer.
G.
Telephone Surveys
• Cheaper and just as effective, since nearly every home has a phone.
• Sample picked:
  Random – area code (e.g., 319)
  Random – exchange within area code (e.g., 430)
  Bank – (e.g., 31)
  Digits – (e.g., 99)
  Call: 319-430-3199
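The way a telephone number is assembled from its parts can be sketched as below. The table `EXCHANGES` is a hypothetical frame of area codes and exchanges, and the sketch assumes that the bank and the final two digits are also drawn at random, which the slide does not state explicitly.

```python
import random

# Hypothetical frame: area codes and the exchanges in use within each one
EXCHANGES = {"319": ["430", "335", "351"], "515": ["294", "281"]}

def random_phone_number(rng):
    area = rng.choice(sorted(EXCHANGES))      # random area code
    exchange = rng.choice(EXCHANGES[area])    # random exchange within that area code
    bank = rng.randrange(100)                 # first two of the last four digits (assumed random)
    digits = rng.randrange(100)               # last two digits (assumed random)
    return f"{area}-{exchange}-{bank:02d}{digits:02d}"

rng = random.Random(16)
numbers = [random_phone_number(rng) for _ in range(5)]
print(numbers)
```

With area code 319, exchange 430, bank 31, and digits 99, the same formatting yields the number on the slide, 319-430-3199.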
H.
Chance Errors and Bias
• Imagine a box with a very large number of tickets marked with “ones” and “zeros”.
• Select a sample of tickets without replacement:
percentage of 1’s in sample = percentage of 1’s in the box + chance error.
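A small simulation can make the chance error visible. The box below is hypothetical (100,000 tickets, 40% of them 1's), and the sample sizes and the 200 repetitions are arbitrary choices for illustration.

```python
import random

rng = random.Random(18)
# Hypothetical box: 100,000 tickets, 40% marked 1 and 60% marked 0
box = [1] * 40_000 + [0] * 60_000
box_pct = 100 * sum(box) / len(box)

def chance_error(sample_size):
    """Percentage of 1's in one sample minus the percentage of 1's in the box."""
    sample = rng.sample(box, sample_size)   # draws without replacement
    sample_pct = 100 * sum(sample) / sample_size
    return sample_pct - box_pct

for n in (100, 400, 1600):
    errors = [chance_error(n) for _ in range(200)]
    # root-mean-square size of the 200 chance errors
    rms = (sum(e * e for e in errors) / len(errors)) ** 0.5
    print(f"n = {n:5d}   typical chance error ≈ {rms:.2f} percentage points")
```

Running it should show the typical chance error shrinking as the sample size grows.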
• Questions we will address about chance errors:
  ◦ How big are they likely to be?
  ◦ How do they depend on the size of the sample? On the size of the population?
  ◦ How big does the sample have to be to keep the chance errors under control?
• In complicated samples, the equation has to take bias into account:
Estimate = parameter + bias + chance error
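The three terms of this equation can be separated in a toy example. The population below is invented for illustration: the counts of phone owners and of supporters of a hypothetical candidate A are assumptions chosen to echo the telephone-directory problem, not historical data.

```python
import random

rng = random.Random(20)
# Hypothetical population: (has_phone, supports_A) for each voter
population = (
    [(True, True)] * 3_000 + [(True, False)] * 5_000 +     # phone owners
    [(False, True)] * 9_000 + [(False, False)] * 3_000     # no phone
)

def pct_for_a(voters):
    """Percentage of the given voters who support candidate A."""
    return 100 * sum(supports for _, supports in voters) / len(voters)

parameter = pct_for_a(population)                 # 60% of the whole population
frame = [v for v in population if v[0]]           # biased frame: phone owners only
bias = pct_for_a(frame) - parameter               # 37.5 - 60 = -22.5 points
estimate = pct_for_a(rng.sample(frame, 1_000))    # survey 1,000 phone owners
chance_error = estimate - parameter - bias        # what is left over is chance

print(f"parameter = {parameter:.1f}%, bias = {bias:+.1f}, "
      f"chance error = {chance_error:+.1f}, estimate = {estimate:.1f}%")
```

Drawing a larger sample from the same frame would shrink the chance error but leave the bias of -22.5 points untouched.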
Exercise Set A – (pp. 349-351) #1, 2, 3, 4, 5
I.
Review Exercises – (pp. 351-353) #2, 5, 7, 9, 10
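#10.
$n = 100$, $P(H) = \tfrac{1}{2}$
$\text{avg} = \tfrac{1}{2}$, $\text{SD} = \tfrac{1}{2}$
$E(\text{Heads}) = n \cdot \text{avg} = 100 \left(\tfrac{1}{2}\right) = 50$
$SE(\text{Heads}) = \sqrt{n} \cdot \text{SD} = \sqrt{100} \left(\tfrac{1}{2}\right) = 5$
[Figure: axis for the number of heads, marked at 45, 50, 55]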
• I would pick 45-55 because these numbers have the greatest likelihood of containing the number of heads in 100 tosses of the coin: 68.27% of the time we would obtain a sum in this range.
• If you count the end points, the range is 44.5 – 55.5 (73%).
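The two percentages can be checked with the normal approximation. The sketch below assumes the normal curve is an adequate approximation for 100 tosses; `normal_area` is a hypothetical helper built on the standard library's `math.erf`.

```python
import math

def normal_area(z_low, z_high):
    """Area under the standard normal curve between z_low and z_high."""
    phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return phi(z_high) - phi(z_low)

n, avg, sd = 100, 0.5, 0.5
expected = n * avg               # 50 heads
se = math.sqrt(n) * sd           # 5 heads

for low, high in [(45, 55), (44.5, 55.5)]:
    area = normal_area((low - expected) / se, (high - expected) / se)
    print(f"{low} to {high}: {100 * area:.2f}%")
```

The first range is one SE on either side of 50 (z = ±1) and gives about 68.27%; the second (z = ±1.1) gives about 72.87%, which rounds to the 73% quoted above.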
#7. p. 352
A coin is tossed 1000 times. There are two options:
i) To win $1.00 if the number of heads is between 490 and 510
ii) To win $1.00 if the percentage of heads is between 48% and 52% (480-520 heads)
Which option is better?
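Box: H = 1, T = 0
$\text{avg of the box} = \tfrac{1}{2}$
$\text{SD of the box} = \tfrac{1}{2}$
$E(\text{Heads}) = n \cdot \text{avg of the box} = 1000 \left(\tfrac{1}{2}\right) = 500$
$SE(\text{Heads}) = \sqrt{n} \cdot \text{SD of the box} = \sqrt{1000} \left(\tfrac{1}{2}\right) = 31.62 \left(\tfrac{1}{2}\right) = 15.81$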
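SE = 15.81
$Z = \dfrac{X - \text{Mean}}{SE}$
Option i (490 to 510 heads):
$\dfrac{490 - 500}{15.81} = -0.63$, $\dfrac{510 - 500}{15.81} = 0.63$
Option ii (480 to 520 heads):
$\dfrac{480 - 500}{15.81} = -1.265$, $\dfrac{520 - 500}{15.81} = 1.265$
[Figure: axis for the number of heads, marked at 480, 490, 500, 510, 520]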
The chance that the number of heads is between 480 and 520 is greater than the chance that it is between 490 and 510, so option ii is better.
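The comparison can be put in numbers with the normal approximation. As before, this is a sketch under the assumption that the normal curve approximates the number of heads well; `normal_area` and `chance_between` are hypothetical helper names.

```python
import math

def normal_area(z_low, z_high):
    """Area under the standard normal curve between z_low and z_high."""
    phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return phi(z_high) - phi(z_low)

n, avg, sd = 1000, 0.5, 0.5
expected = n * avg               # 500 heads
se = math.sqrt(n) * sd           # about 15.81 heads

def chance_between(low, high):
    """Approximate chance that the number of heads falls between low and high."""
    return normal_area((low - expected) / se, (high - expected) / se)

option_i = chance_between(490, 510)    # z = ±0.63
option_ii = chance_between(480, 520)   # z = ±1.265
print(f"option i:  {100 * option_i:.1f}%")
print(f"option ii: {100 * option_ii:.1f}%")
```

The approximate chances come out to about 47% for option i and about 79% for option ii, consistent with the conclusion that the wider range is the better bet.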