How a Sampling Distribution Is Formed (sampling distribution)

Chapter 9
Means and
Proportions
as Random Variables
Review
• In Chapter 8 we learned how to work with
random variables, including how to make
probability statements about ranges of
possible values, what to expect as the long-run
average, and how to identify special
classes of random variables.
• Today we will study Chapter 9, “Means and
Proportions as Random Variables”.
2017/7/29
Learning objectives
• This chapter introduces the reasoning that allows
researchers to make conclusions about entire
populations using relatively small samples of
individuals.
• Based on statistical theory, we will be able to
determine how close our sample answer is to
what we really want to know, the true answer for
the population.
Learning contents
1. Understanding Dissimilarity Among
Samples
2. Sampling Distributions for Sample
Proportions
3. Sampling Distributions for Sample Means
4. What to Expect in Other Situations: CLT
5. Sampling Distribution for Any Statistic
Learning Emphases and Difficulties
• Determine the sampling distributions of:
– Means.
– Proportions.
• Explain the Central Limit Theorem.
• Determine the effect on the sampling distribution
when the samples are relatively large compared
to the population from which they are drawn
Teaching methods
• Both English and Chinese
• Both PPT and writing on blackboard
9.1 Understanding Dissimilarity
Among Samples p293
• Key: to understand how inference from a sample works, we
need to understand what kind of dissimilarity
we should expect to see among different samples
from the same population.
• For example, suppose we knew that most samples were
likely to provide an answer that is within 10% of the
population answer.
• Then we would also know that the population answer
should be within 10% of whatever our specific sample
gave.
• => We would have a good guess about the population value based
on just the sample value. p294
• Statisticians have worked out similar techniques for a
variety of sample measurements.
Statistics and Parameters
A statistic is a numerical value computed from a
sample. Its value may differ for different samples.
e.g. sample mean x̄, sample standard deviation s, and
sample proportion p̂.
A parameter is a numerical value associated with a
population. It is considered fixed and unchanging.
e.g. population mean μ, population standard deviation
σ, and population proportion p.
Sampling Distributions
• Example: Population: 1, 2, 3, 4; repeated
sampling with replacement.
• Sample size n = 2. There are 16 possible outcomes:

First observed     Second observed number
number             1      2      3      4
1                  1,1    1,2    1,3    1,4
2                  2,1    2,2    2,3    2,4
3                  3,1    3,2    3,3    3,4
4                  4,1    4,2    4,3    4,4
• Consider the sample mean. For the different samples, the
sample mean takes the values 1.0, 1.5, 2.0, 2.5, 1.5, ...
• What can you see from this example?

Sample mean (x̄) of each of the 16 samples:

First observed     Second observed number
number             1      2      3      4
1                  1.0    1.5    2.0    2.5
2                  1.5    2.0    2.5    3.0
3                  2.0    2.5    3.0    3.5
4                  2.5    3.0    3.5    4.0
[Figure: the same table of sample means shown next to a bar chart of the sampling distribution of the sample mean; horizontal axis x̄ from 1.0 to 4.0 in steps of 0.5, vertical axis P(x̄) from 0 to 0.3]
• Take repeated samples of the same size from
a population.
• Every time a new sample is taken, the sample
statistic (here, the sample mean) will change.
• Think of the “sample mean” as a random
variable.
• Because it is a random variable, it has a distribution
of possible values.
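The example with population 1, 2, 3, 4 and n = 2 is small enough to enumerate completely. The following is a minimal Python sketch, not part of the original slides, that lists all 16 samples and tabulates the distribution of the sample mean; the variable names are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]   # population from the slide
n = 2                       # sample size

# All ordered samples of size n drawn with replacement (4**2 = 16 of them).
samples = list(product(population, repeat=n))

# Sample mean of every possible sample, kept exact with Fraction.
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution: P(x-bar = value) = count / number of samples.
counts = Counter(means)
sampling_dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

mean_of_means = sum(m * p for m, p in sampling_dist.items())
var_of_means = sum((m - mean_of_means) ** 2 * p for m, p in sampling_dist.items())

for m, p in sampling_dist.items():
    print(f"x-bar = {float(m):.1f}  P = {float(p):.4f}")
print("mean of x-bar:", float(mean_of_means))      # 2.5
print("variance of x-bar:", float(var_of_means))   # 0.625
```

The most likely value is x̄ = 2.5 (probability 4/16), and the distribution is symmetric around it even though the population itself is flat.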
What Is a Sampling Distribution?
Each new sample taken =>
the sample statistic will change => think of the
statistic as a random variable.
p295 The distribution of possible values of a statistic
for repeated samples of the same size from a
population is called the sampling distribution of the
statistic.
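Sampling Distribution
1. A sample statistic is a random variable. It has a number of possible
values, each with a certain probability, and together these form its
probability distribution, which in statistics is called the sampling
distribution. In short, the sampling distribution is the probability
distribution of a sample statistic.
– For example: the sample mean, the sample proportion, the sample variance, etc.
2. The probability distribution of a sample statistic is a theoretical distribution.
– It is the relative frequency distribution formed by all possible values
of the statistic when samples of size n are drawn repeatedly.
3. The results come from all possible samples of the same size.
4. It provides long-run, stable information about the sample statistic. It is
the theoretical basis for inference and an important justification for the
scientific validity of sampling inference.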
How a Sampling Distribution Is Formed
(sampling distribution)
[Diagram: population -> sample -> compute a sample statistic, e.g. the sample mean, proportion, or variance]
Example 9.1
Mean Hours of Sleep
for College Students
Survey of n = 190 college students.
“How many hours of sleep did you get last night?”
Sample mean = 7.1 hours.
If we repeatedly took
samples of 190 and each
time computed the sample
mean, the histogram of the
resulting sample mean
values would look like the
histogram at the right:
Many statistics of interest have sampling distributions
that are approximately normal distributions
9.2 Sampling Distributions
for Sample Proportions p296
• Suppose we conduct a binomial experiment with n
trials and get successes on x of the trials. Or
suppose we measure a categorical variable for a
representative sample of n individuals. In each case,
we can compute a statistic: the sample
proportion p̂.
• If we repeated the binomial experiment or collected
a new sample, we would probably get a different
value for the sample proportion.
• For instance, suppose we would like to know what
proportion of a large population carries the gene
for a certain disease.
• We sample 25 people from the population. Suppose
that in truth 40% of the population carries the
gene.
• The sample proportion is a random variable.
• Four possible random samples of 25 people:
Many Possible Samples
Sample 1: X =12, proportion with gene =12/25 = 0.48 or 48%.
Sample 2: X = 9, proportion with gene = 9/25 = 0.36 or 36%.
Sample 3: X = 10, proportion with gene = 10/25 = 0.40 or 40%.
Sample 4: X = 7, proportion with gene = 7/25 = 0.28 or 28%.
Note:
• p296: Each sample gave a different answer, which did
not always match the population value of 40%.
• p297: Although we cannot determine whether one sample
will accurately reflect the population, statisticians have
determined what to expect for most possible samples.
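The variation among samples can be reproduced by simulation. The sketch below is an illustration rather than the method used to produce the four samples on the slide; the seed, the function name `sample_proportion`, and the number of repetitions are assumptions.

```python
import random

def sample_proportion(n, p, rng):
    """Draw one random sample of size n and return the proportion of successes."""
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n

rng = random.Random(2017)   # arbitrary seed, assumed only for reproducibility
p_true, n = 0.40, 25

# Four samples, as on the slide; the values differ from the slide's by chance.
four = [sample_proportion(n, p_true, rng) for _ in range(4)]
print(four)

# Many samples: the proportions scatter around p_true.
many = [sample_proportion(n, p_true, rng) for _ in range(10_000)]
avg = sum(many) / len(many)
print(round(avg, 3))
```

Individual samples of 25 can miss 40% by a wide margin, but the average over many samples settles close to the population value.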
The Normal Curve Approximation Rule
for Sample Proportions
Let p = population proportion of interest
or binomial probability of success.
Let p̂ = sample proportion or proportion of successes.
If numerous random samples or repetitions of the same size n
are taken, the distribution of possible values of p̂ is
approximately a normal curve distribution with
• Mean = p
• Standard deviation = s.d.(p̂) = \( \sqrt{p(1-p)/n} \)
This approximate distribution is the sampling distribution of p̂.
The Normal Curve Approximation Rule for
Sample Proportions
The Normal Approximation Rule can be applied in two
situations:
Situation 1: A random sample is taken from a
population.
Situation 2: A binomial experiment is repeated
numerous times.
In each situation, three conditions must be met:
• Condition 1: The Physical Situation
There is an actual population or repeatable situation.
• Condition 2: Data Collection
A random sample is obtained or situation repeated
many times.
• Condition 3: The Size of the Sample or Number of
Trials
The size of the sample or number of repetitions is
relatively large:
np and n(1-p) must each be at least 5, and preferably at
least 10.
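Condition 3 is easy to check numerically. The helper below is a hypothetical sketch (the name `normal_rule_ok` and its `minimum` parameter are assumptions, not from the textbook) that tests both np and n(1-p) against the chosen threshold.

```python
def normal_rule_ok(n, p, minimum=10):
    """Return True if both n*p and n*(1-p) reach the chosen minimum (5 or 10)."""
    return n * p >= minimum and n * (1 - p) >= minimum

# Election poll with n = 2400 and p = 0.40: np = 960, n(1-p) = 1440.
print(normal_rule_ok(2400, 0.40))

# A rare event in a modest sample: np = 2, n(1-p) = 98.
print(normal_rule_ok(100, 0.02, minimum=5))
```

Both products matter: a very small or very large p can fail the condition even when n looks large.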
Examples for which Rule Applies
• Election Polls: to estimate proportion who favor a
candidate; units = all voters.
• Television Ratings: to estimate proportion of households
watching TV program; units = all households with TV.
• Consumer Preferences: to estimate proportion of
consumers who prefer new recipe compared with old;
units = all consumers.
• Testing ESP: to estimate probability a person can
successfully guess which of 5 symbols on a hidden card;
repeatable situation = a guess.
Example 9.2 p298 Possible Sample Proportions
Favoring a Candidate
Suppose 40% of all voters favor Candidate X. Pollsters take a
sample of n = 2400 voters. The Rule states that the sample proportion
who favor X will have approximately a normal distribution with
mean = p = 0.4 and
s.d.(p̂) = \( \sqrt{p(1-p)/n} = \sqrt{0.4(1-0.4)/2400} = 0.01 \)
[Figure: histogram of the sample proportions resulting
from simulating this situation 400 times]
Note 1: p300
In practice, we don’t know the true population proportion p,
so we cannot compute the standard deviation of p̂,
s.d.(p̂) = \( \sqrt{p(1-p)/n} \).
We take only one random sample, so we have only one sample
proportion p̂. Replacing p with p̂ in the standard deviation
expression gives us an estimate that is called the standard
error of p̂:
s.e.(p̂) = \( \sqrt{\hat{p}(1-\hat{p})/n} \).
Note 2
• When the population is finite and sampling is
without replacement, the standard error is
multiplied by what is called the “finite population
correction factor” (FPC).
– Rule of Thumb: Use the FPC when n > 5% of N.
– Applies to: standard errors of the mean and of the proportion.
FPC = \( \sqrt{(N-n)/(N-1)} \)
For example
• In a certain university, the proportion of students
failing an introductory statistics course is
30%. For a sample of 300 students taking the unit
this semester, what is the probability that fewer
than 25% will fail the course?
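One way to work this question is with the normal curve approximation for p̂. The sketch below is an illustrative solution under that assumption (np = 90 and n(1-p) = 210, so the rule applies); `normal_cdf` is a helper defined here, not a library function.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p, n = 0.30, 300
sd = sqrt(p * (1 - p) / n)     # s.d. of p-hat, about 0.0265
z = (0.25 - p) / sd            # standardized score, about -1.89
prob = normal_cdf(z)           # P(p-hat < 0.25), about 0.029
print(round(sd, 4), round(z, 2), round(prob, 3))
```

So a failure rate below 25% in a sample of 300 would be fairly unusual, occurring in roughly 3% of samples.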
Example
• Seventy percent of the 1000 households in a certain
suburb have VCRs. If a sample of 100 is taken,
what is the probability that at least 80% have a
VCR?
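Here the sample is 10% of the population, so the finite population correction applies. The following sketch is one possible working, assuming sampling without replacement and the normal approximation; `normal_cdf` is a helper defined here for illustration.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

N, n, p = 1000, 100, 0.70
sd_plain = sqrt(p * (1 - p) / n)        # about 0.0458, ignoring the finite population
fpc = sqrt((N - n) / (N - 1))           # about 0.949, used because n > 5% of N
sd_adj = sd_plain * fpc                 # about 0.0435
z = (0.80 - p) / sd_adj                 # about 2.30
prob = 1 - normal_cdf(z)                # P(p-hat >= 0.80), about 0.011
print(round(fpc, 3), round(sd_adj, 4), round(z, 2), round(prob, 3))
```

Without the correction the standardized score would be about 2.18 instead of 2.30, so the FPC makes the event look somewhat less likely.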
9.3 Sampling Distributions for
Sample Means p301
• Suppose we want to estimate the mean weight loss for all
who attend clinic for 10 weeks. Suppose (unknown to us) the
distribution of weight loss is approximately N(8 pounds, 5
pounds).
• We will take a random sample of 25 people from this
population and record for each X = weight loss.
• We know the value of the sample mean will vary
for different samples of n = 25.
• What do we expect those means to be?
Many Possible Samples
Four possible random samples of 25 people:
Sample 1: Mean = 8.32 pounds, standard deviation = 4.74 pounds.
Sample 2: Mean = 6.76 pounds, standard deviation = 4.73 pounds.
Sample 3: Mean = 8.48 pounds, standard deviation = 5.27 pounds.
Sample 4: Mean = 7.16 pounds, standard deviation = 5.93 pounds.
Note:
• Each sample gave a different answer, which did not always
match the population mean of 8 pounds.
• Although we cannot determine whether one sample mean will
accurately reflect the population mean, statisticians have
determined what to expect for most possible sample means.
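Recall an earlier example
• Simple random samples of size n = 2 are drawn from the population
1, 2, 3, 4. Under sampling with replacement there are 4^2 = 16 possible
samples. All of them are listed here.

All possible samples of size n = 2 (16 in total):

First              Second observation
observation        1      2      3      4
1                  1,1    1,2    1,3    1,4
2                  2,1    2,2    2,3    2,4
3                  3,1    3,2    3,3    3,4
4                  4,1    4,2    4,3    4,4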
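Sampling distribution of the sample mean
(worked example)
• Compute the mean of each sample, as in the table here, and give the
sampling distribution of the sample mean.

Means of the 16 samples (x̄):

First              Second observation
observation        1      2      3      4
1                  1.0    1.5    2.0    2.5
2                  1.5    2.0    2.5    3.0
3                  2.0    2.5    3.0    3.5
4                  2.5    3.0    3.5    4.0

[Figure: bar chart of the sampling distribution of the sample mean; horizontal axis x̄ from 1.0 to 4.0, vertical axis P(x̄) from 0 to 0.3]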
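Comparing the distribution of the sample mean with the population distribution
(worked example)
[Figure: left panel, population distribution over x = 1, 2, 3, 4; right panel, sampling distribution over x̄ = 1.0, 1.5, ..., 4.0; vertical axes P(x) from 0 to 0.3]
Population: μ = 2.5, σ² = 1.25
Sampling distribution: μ_x̄ = 2.5, σ²_x̄ = 0.625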
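The population and sampling-distribution values are linked by the rule that, for sampling with replacement, the variance of the sample mean is σ²/n. A quick check with the numbers of this example (n = 2), written out as a worked equation:

```latex
\mu = \frac{1+2+3+4}{4} = 2.5, \qquad
\sigma^2 = \frac{(1-2.5)^2+(2-2.5)^2+(3-2.5)^2+(4-2.5)^2}{4} = \frac{5}{4} = 1.25

\mu_{\bar{x}} = \mu = 2.5, \qquad
\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} = \frac{1.25}{2} = 0.625
```

The sample mean is centered on the population mean but varies only half as much, in variance terms, as a single observation.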
The Normal Curve Approximation Rule for
Sample Means p304
Let μ = mean for population of interest.
Let σ = standard deviation for population of interest.
Let x̄ = sample mean.
If numerous random samples of the same size n are taken, the
distribution of possible values of x̄ is approximately a normal
curve distribution with
• Mean = μ
• Standard deviation = s.d.(x̄) = \( \sigma/\sqrt{n} \)
This approximate distribution is the sampling distribution of x̄.
The Normal Curve Approximation Rule for
Sample Means
Normal Approximation Rule can be applied in two situations:
Situation 1: The population of measurements of interest is
bell-shaped and a random sample of any size is measured.
Situation 2: The population of measurements of interest is not
bell-shaped but a large random sample is measured.
Note: Difficult to get a Random Sample? Researchers usually
willing to use Rule as long as they have a representative sample
with no obvious sources of confounding or bias.
Examples for which Rule Applies
• Average Weight Loss: to estimate average weight loss;
weight assumed bell-shaped; population = all current
and potential clients.
• Average Age At Death: to estimate average age at which
left-handed adults (over 50) die; ages at death not bell-shaped,
so need n ≥ 30; population = all left-handed
people who live to be at least 50.
• Average Student Income: to estimate mean monthly
income of students at university who work; incomes not
bell-shaped and outliers likely, so need large random
sample of students; population = all students at
university who work.
Example 9.4
Hypothetical Mean
Weight Loss
Suppose the distribution of weight loss is approximately
N(8 pounds, 5 pounds) and we will take a random sample
of n = 25 clients. The Rule states that the sample mean weight loss
will have a normal distribution with
mean = μ = 8 pounds and
s.d.(x̄) = \( \sigma/\sqrt{n} = 5/\sqrt{25} = 1 \) pound
[Figure: histogram of the sample means resulting
from simulating this situation 400 times]
Empirical Rule:
It is almost certain that
the sample mean will be
between 5 and 11 pounds.
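The simulation described in this example can be sketched in a few lines. This is an illustration under the stated N(8, 5) assumption, not the code behind the textbook histogram; the seed is arbitrary.

```python
import random
import statistics

rng = random.Random(9)            # arbitrary seed for reproducibility
mu, sigma, n, reps = 8.0, 5.0, 25, 400

# Each repetition: draw 25 weight losses from N(8, 5) and record the mean.
sample_means = [
    statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
    for _ in range(reps)
]

center = statistics.fmean(sample_means)   # should be near 8
spread = statistics.stdev(sample_means)   # should be near 5/sqrt(25) = 1
share_inside = sum(5 <= m <= 11 for m in sample_means) / reps
print(round(center, 2), round(spread, 2), share_inside)
```

Nearly all 400 simulated means should land between 5 and 11 pounds, which is what the Empirical Rule predicts for mean 8 and standard deviation 1.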
Note 1: Standard Error of the Mean p305
In practice, the population standard deviation σ is rarely known,
so we cannot compute the standard deviation of x̄,
s.d.(x̄) = \( \sigma/\sqrt{n} \).
In practice, we take only one random sample, so we have only
the sample mean x̄ and the sample standard deviation s.
Replacing σ with s in the standard deviation expression gives
us an estimate that is called the standard error of x̄:
s.e.(x̄) = \( s/\sqrt{n} \).
For a sample of n = 25 weight losses, the standard deviation is s
= 4.74 pounds. So the standard error of the mean is
\( 4.74/\sqrt{25} = 0.948 \) pounds.
Note 2: Increasing the Size of the Sample p305
Suppose we take n = 100 people instead of just 25.
The standard deviation of the mean would be
s.d.(x̄) = \( \sigma/\sqrt{n} = 5/\sqrt{100} = 0.5 \) pounds
• For samples of n = 25,
sample means are likely to
range between 8 ± 3 pounds
=> 5 to 11 pounds.
• For samples of n = 100,
sample means are likely to
range only between 8 ± 1.5
pounds => 6.5 to 9.5 pounds.
Larger samples tend to result in more accurate estimates
of population values than smaller samples. p306
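The effect of sample size follows directly from the formula σ/√n. The short sketch below tabulates it for the weight-loss example; the function name `sd_of_mean` and the extra sample size 400 are assumptions added for illustration.

```python
from math import sqrt

def sd_of_mean(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / sqrt(n)

sigma, mu = 5.0, 8.0   # population values from the weight-loss example
for n in (25, 100, 400):   # 400 is a hypothetical extra sample size
    sd = sd_of_mean(sigma, n)
    print(n, sd, (mu - 3 * sd, mu + 3 * sd))
```

Quadrupling the sample size halves the standard deviation of the mean, so precision improves with the square root of n rather than with n itself.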
Note 3: Sampling for a Long, Long Time:
The Law of Large Numbers p306
• The law of large numbers guarantees that
the sample mean will eventually get “close” to
the population mean, no matter how small a
difference you use to define “close”.
• The law of large numbers is also relied on by
insurance companies and similar businesses to give them
peace of mind about what will happen to their
average profit (or loss) in the long run.
LLN = peace of mind to casinos, insurance companies.
• Eventually, after enough gamblers or
customers,
the mean net profit will be close to the
theoretical mean.
• Price to pay = must have enough $ on hand to
pay the occasional winner or claimant.
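The law of large numbers can be watched in action by tracking a running mean. The sketch below uses rolls of a fair die (population mean 3.5) as a stand-in for gamblers or customers; the die, the seed, and the checkpoints are all assumptions chosen for illustration.

```python
import random

rng = random.Random(1)   # arbitrary seed
checkpoints = {10, 100, 1_000, 10_000, 100_000}
running = {}
total = 0
for i in range(1, 100_001):
    total += rng.randint(1, 6)      # one roll of a fair die, population mean 3.5
    if i in checkpoints:
        running[i] = total / i      # sample mean after i rolls

for k in sorted(running):
    print(k, round(running[k], 3))
```

Early running means can sit noticeably away from 3.5, while the mean after 100,000 rolls should be very close to it.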
9.4 What to Expect in Other
Situations: CLT p309
The Central Limit Theorem states that if n is
sufficiently large, the sample means of random
samples from a population with mean μ and finite
standard deviation σ are approximately normally
distributed with mean μ and standard deviation \( \sigma/\sqrt{n} \).
Technical Note:
The mean and standard deviation given in the CLT
hold for any sample size; it is only the “approximately
normal” shape that requires n to be sufficiently large.
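A simulation with a clearly non-normal population shows what the theorem claims. The sketch below assumes an exponential population with mean 1 and standard deviation 1; the population, sample size, and number of repetitions are illustrative choices, not from the textbook.

```python
import random
import statistics

rng = random.Random(4)        # arbitrary seed
n, reps = 50, 2000            # hypothetical sample size and number of repetitions

# Exponential population with rate 1: mean 1, s.d. 1, strongly right-skewed.
means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

center = statistics.fmean(means)     # CLT: near mu = 1
spread = statistics.stdev(means)     # CLT: near sigma/sqrt(n), about 0.141
# Share of sample means within 2 standard deviations of mu; near 0.95 if roughly normal.
within_2sd = sum(abs(m - 1.0) <= 2 / n ** 0.5 for m in means) / reps
print(round(center, 3), round(spread, 3), within_2sd)
```

Even though individual observations are heavily skewed, about 95% of the sample means fall within two standard deviations of μ, as a normal curve would suggest.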
[Figure: illustration of the central limit theorem]
P310
• You may notice that the CLT is nothing more
than a restatement of the rule for sample means,
except that in ……
Example 9.5 California Decco Winnings
California Decco lottery game: mean amount
lost per ticket over millions of tickets sold is μ =
$0.35; standard deviation σ = $29.67 => large
variability in possible amounts won/lost, from a net
win of $4999 to a net loss of $1.
Suppose a store sells 100,000 tickets in a year.
What are the possible values for the average
loss per ticket?
CLT => distribution of possible sample mean
loss per ticket is approximately normal with
mean (loss) = μ = $0.35 and
s.d.(x̄) = \( \sigma/\sqrt{n} = 29.67/\sqrt{100000} \approx 0.09 \) dollars
Empirical Rule: The mean loss is almost surely between
$0.08 and $0.62 => total loss for the 100,000 tickets is
likely between $8,000 and $62,000!
There are better ways to invest $100,000.
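The arithmetic behind this example can be checked directly. The sketch below follows the slide in rounding the standard deviation of the mean to $0.09 before applying the three-standard-deviation range; variable names are illustrative.

```python
from math import sqrt

mu, sigma, n = 0.35, 29.67, 100_000
sd_mean = sigma / sqrt(n)                     # about 0.094, rounded to 0.09 on the slide

# Empirical Rule range, using the slide's rounded value of 0.09.
low, high = mu - 3 * 0.09, mu + 3 * 0.09      # about 0.08 and 0.62 dollars per ticket
total_low, total_high = round(low * n), round(high * n)
print(round(sd_mean, 3), round(low, 2), round(high, 2))
print(total_low, total_high)
```

A per-ticket standard deviation of almost $30 shrinks to under ten cents for the average over 100,000 tickets, which is why the total loss is so predictable.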
9.5 Sampling Distribution
for Any Statistic p311
Every statistic has a sampling distribution,
but the appropriate distribution may not always
be normal, or even approximately bell-shaped.
Construct an approximate sampling distribution
for a statistic by actually taking repeated samples
of the same size from a population and constructing
a relative frequency histogram for the values of the
statistic over the many samples.
Example 9.6
Winning the Lottery
by Betting on Birthdays
Pennsylvania Cash 5 lottery game:
Select 5 numbers from integers 1 to 39.
Grand prize won if match all 5 numbers.
One strategy = 5 numbers bet correspond to birth days
of month for 5 family members => no chance to win if
highest number drawn is 32 to 39.
Statistic of interest = H = highest of five integers
randomly drawn without replacement from 1 to 39.
e.g. if numbers selected are 3, 12, 22, 36, 37 then H = 37.
Example 9.6
Winning the Lottery
by Betting on Birthdays (cont)
Summarized below: value of H for 1560 games.
Highest number over 31 occurred in 72% of the games.
Most common value of H = 39 in 13.5% of games.
• Notice that the shape of the probability
distribution for H is not symmetric and is skewed
to the left. This example illustrates that not all
sample statistics have bell-shaped sampling
distributions.
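For this statistic the exact sampling distribution can also be computed by counting, which gives a benchmark for the observed frequencies. The sketch below assumes all sets of 5 numbers from 1 to 39 are equally likely; the function name `prob_highest` is an illustrative choice.

```python
from math import comb

TOTAL = comb(39, 5)   # number of equally likely draws of 5 numbers from 1..39

def prob_highest(h):
    """P(H = h): h is drawn and the other 4 numbers come from 1..h-1."""
    return comb(h - 1, 4) / TOTAL

dist = {h: prob_highest(h) for h in range(5, 40)}
p_over_31 = sum(p for h, p in dist.items() if h > 31)
print(round(p_over_31, 3))     # about 0.705
print(round(dist[39], 3))      # about 0.128
```

The theoretical values, about 70.5% for H over 31 and about 12.8% for H = 39, are close to the 72% and 13.5% observed in the 1560 games, and the probabilities rise steadily with h, which is the left skew noted in the text.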
Note: Sampling distributions in
common use
• Normal distribution
• Chi-square distribution
• Student’s t-distribution
• F-distribution
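Chi-square distribution
(1) Concept
Let x_1, x_2, ..., x_n be independent and identically distributed random
variables, each following the standard normal distribution, x_i ~ N(0, 1).
Then the distribution of the random variable
\( \chi^2 = \sum_{i=1}^{n} x_i^2 \)
is called the chi-square distribution with n degrees of freedom, written χ²(n).
Its density is
\( f(x) = \begin{cases} \dfrac{1}{2^{n/2}\,\Gamma(n/2)}\, x^{n/2-1} e^{-x/2}, & x > 0 \\ 0, & x \le 0 \end{cases} \)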
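(2) Quantile (critical value)
If \( P(\chi^2(n) > k) = \alpha \), or equivalently \( P(\chi^2(n) \le k) = 1 - \alpha \),
then k is called the α quantile of χ²(n), written \( k \triangleq \chi^2_\alpha(n) \).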
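(3) Properties
• If X follows χ²(n), then the mean is E(X) = n and the variance is
D(X) = 2n.
• The χ² distribution is additive: if X1 and X2 are independent with
X1 ~ χ²(n1) and X2 ~ χ²(n2),
then (X1 + X2) ~ χ²(n1 + n2).
• As n → ∞, the χ² distribution approaches a normal distribution; that is,
χ²(n) is approximately N(n, 2n).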
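The mean and variance properties can be checked by building chi-square values from their definition. The sketch below is an illustration with an assumed n = 4 and 20,000 repetitions; the seed is arbitrary.

```python
import random
import statistics

rng = random.Random(7)     # arbitrary seed
n, reps = 4, 20_000        # hypothetical degrees of freedom and repetitions

# Each value is a sum of n squared independent N(0, 1) draws.
values = [sum(rng.gauss(0, 1) ** 2 for _ in range(n)) for _ in range(reps)]

sim_mean = statistics.fmean(values)        # theory: n = 4
sim_var = statistics.variance(values)      # theory: 2n = 8
print(round(sim_mean, 2), round(sim_var, 2))
```

Changing `n` and rerunning shows the simulated mean and variance tracking n and 2n.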
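(4) Shape of the χ² distribution
[Diagram: from a population with mean μ and standard deviation σ, draw simple random samples of size n; compute the sample variance S²; compute the chi-square value χ² = (n-1)S²/σ²; collect all the χ² values.]
[Figure: sampling distributions for different sample sizes, with curves labelled n = 1, n = 4, n = 10, n = 20; horizontal axis χ²]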
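Student’s t-distribution
(1) Concept
Let the random variables X and Y be independent, with X ~ N(0, 1) and
Y ~ χ²(n). Then the random variable
\( t = \dfrac{X}{\sqrt{Y/n}} \)
is said to follow the t-distribution with n degrees of freedom, written t(n).
Its density is
\( f(t) = \dfrac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \left(1 + \dfrac{t^2}{n}\right)^{-\frac{n+1}{2}}, \quad -\infty < t < \infty \)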
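(2) Quantile (critical value)
If P(t(n) > k) = α, then k is called the α quantile of the t-distribution, written k ≜ t_α(n).
(3) Shape of the t-distribution
[Figure: left panel, “Comparison of the t-distribution and the normal distribution”, showing a normal curve and a t curve; right panel, “t-distributions with different degrees of freedom”, showing the standard normal curve (Z) together with t (df = 13) and t (df = 5)]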
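(4) Properties
① The t-distribution has mean E(t) = 0 and variance
V(t) = n/(n-2) for n > 2.
② The t density is symmetric about t = 0, so t_{1-α}(n) = -t_α(n).
③ As n → ∞, the t-distribution approaches the standard normal distribution.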
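F-distribution
(1) Concept
Let X and Y be independent random variables following χ² distributions with
n1 and n2 degrees of freedom, respectively. Then
\( F = \dfrac{X/n_1}{Y/n_2} \)
follows the F-distribution with first degrees of freedom n1 and second
degrees of freedom n2, written F ~ F(n1, n2). Its density is
\( f(x) = \begin{cases} \dfrac{\Gamma\left(\frac{n_1+n_2}{2}\right)}{\Gamma\left(\frac{n_1}{2}\right)\Gamma\left(\frac{n_2}{2}\right)} \left(\dfrac{n_1}{n_2}\right)^{n_1/2} x^{n_1/2-1} \left(1+\dfrac{n_1}{n_2}x\right)^{-\frac{n_1+n_2}{2}}, & x > 0 \\ 0, & x \le 0 \end{cases} \)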
(2) Quantile (critical value)
If P[F(n1, n2) > K] = α, then K is called the α quantile of the F-distribution,
written K ≜ F_α(n1, n2).
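(3) Properties of the F-distribution
① The statistic F is a positive number.
② The F density curve is positively skewed; its tail approaches the
horizontal axis asymptotically as x goes to infinity.
③ The F-distribution is a continuous probability distribution; different
combinations of degrees of freedom give different F curves.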
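9.6 Standardized Statistics p313
If conditions are met, these standardized
statistics have, approximately, a standard
normal distribution N(0,1).
\( z = \dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}} \sim N(0,1) \)
\( z = \dfrac{\hat{p}-p}{\sqrt{p(1-p)/n}} \sim N(0,1) \)
\( z = \dfrac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}} \sim N(0,1) \)
Related statistics and their distributions:
\( t = \dfrac{\bar{x}-\mu}{s/\sqrt{n}} \sim t(n-1) \)
\( \chi^2 = \dfrac{(n-1)s^2}{\sigma^2} \sim \chi^2(n-1) \)
\( F = \dfrac{ESS/k}{RSS/(n-k-1)} \sim F(k,\ n-k-1) \)
\( t = \dfrac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{s_p\sqrt{1/n_1+1/n_2}} \sim t(n_1+n_2-2) \)
\( t = \dfrac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{s_1^2/n_1+s_2^2/n_2}} \sim t(v) \)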
Example 9.7 p314
Unpopular TV Shows
Networks cancel shows with low ratings. Ratings are based on a
random sample of households, using the sample proportion p̂
watching the show as an estimate of the population proportion p. If p <
0.20, the show will be cancelled.
Suppose in a random sample of 1600 households, 288 are
watching (for a proportion of 288/1600 = 0.18). Is it likely
to see p̂ = 0.18 even if p were 0.20 (or higher)?
\( z = \dfrac{\hat{p}-p}{\sqrt{p(1-p)/n}} = \dfrac{0.18-0.20}{\sqrt{0.20(1-0.20)/1600}} = -2.00 \)
The sample proportion of 0.18 is about
2 standard deviations below the mean of 0.20.
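The standardized score for a sample proportion is a one-line calculation. The sketch below wraps it in a small function (the name `z_for_proportion` is an assumption for illustration) and applies it to the TV ratings numbers.

```python
from math import sqrt

def z_for_proportion(p_hat, p, n):
    """Standardized score of a sample proportion under population proportion p."""
    return (p_hat - p) / sqrt(p * (1 - p) / n)

z = z_for_proportion(288 / 1600, 0.20, 1600)
print(round(z, 2))   # -2.0
```

Note that the denominator uses the hypothesized p = 0.20, not the observed p̂, because the question asks what would be expected if p really were 0.20.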
9.7 Student’s t-Distribution:
Replacing σ with s p314
Dilemma: we generally don’t know σ. Using s we have:
\( t = \dfrac{\bar{x}-\mu}{\text{s.e.}(\bar{x})} = \dfrac{\bar{x}-\mu}{s/\sqrt{n}} = \dfrac{\sqrt{n}\,(\bar{x}-\mu)}{s} \)
If the sample size n is small,
this standardized statistic will
not have a N(0,1) distribution
but rather a t-distribution with
n – 1 degrees of freedom (df).
More on t-distributions and tables of probability
areas in Chapters 12-13.
Example 9.8
Standardized Mean Weights
Claim: mean weight loss is μ = 8 pounds.
A sample of n = 25 people gave a sample mean weight loss of x̄
= 8.32 pounds and a sample standard deviation of s = 4.74
pounds.
Is the sample mean of 8.32 pounds reasonable to expect if μ
= 8 pounds?
\( t = \dfrac{\bar{x}-\mu}{s/\sqrt{n}} = \dfrac{8.32-8}{4.74/\sqrt{25}} = 0.34 \)
The sample mean of 8.32 is only about one-third
of a standard error above 8, which is consistent
with a population mean weight loss of 8 pounds.
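The same calculation in code, as a minimal sketch; the function name `t_statistic` is an illustrative assumption.

```python
from math import sqrt

def t_statistic(x_bar, mu, s, n):
    """Standardized t-statistic: (sample mean - claimed mean) / standard error."""
    return (x_bar - mu) / (s / sqrt(n))

t = t_statistic(8.32, 8, 4.74, 25)
print(round(t, 2))   # 0.34
```

With n = 25 this statistic would be compared with a t-distribution with 24 degrees of freedom rather than with N(0,1).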
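Note: Is the population normal? What is the sample size? Is σ² known?
• Large sample: ① the variance σ² is unknown and the population is
normally distributed; ② the variance σ² is known, in which case the normal
distribution can be used as an approximation even if the population is not
normal:
\( z = \dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}} \sim N(0,1) \)
• Small sample (n < 30): the population is normally distributed and the
variance σ² is unknown:
\( t = \dfrac{\bar{x}-\mu}{s/\sqrt{n}} \sim t(n-1) \)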
Example
• State the nature of each of the following sampling
distributions:
• (1) a sample of 40 taken from a normal population
with a standard deviation of 5
• (2) a sample of 25 with a standard deviation of 2
taken from a normal population
• (3) a sample of 40 taken from a non-normal
population with a standard deviation of 2
• (4) a sample of 20 with a standard deviation of 8
taken from a non-normal population
9.8 Statistical Inference
• Confidence Intervals: uses sample data
to provide an interval of values that the
researcher is confident covers the true value
for the population.
• Hypothesis Testing or Significance Testing:
uses sample data to attempt to reject the
hypothesis that nothing interesting is
happening, i.e. to reject the notion that chance
alone can explain the sample results.
Case Study 9.1
Do Americans Really Vote
When They Say They Do?
Election of 1994:
• Time Magazine Poll: n = 800 adults (two days after
election), 56% reported that they had voted.
• Info from Committee for the Study of the American
Electorate: only 39% of American adults had voted.
If p = 0.39 then sample proportions for samples of size
n = 800 should vary approximately normally with
mean = p = 0.39 and
s.d.(p̂) = \( \sqrt{p(1-p)/n} = \sqrt{0.39(1-0.39)/800} = 0.017 \)
If respondents were telling the truth, the sample percent
should be no higher than 39% + 3(1.7%) = 44.1%,
nowhere near the reported percentage of 56%.
If 39% of the population voted, the standardized score
for the reported value of 56% is
\( z = \dfrac{0.56-0.39}{0.017} = 10.0 \)
It is virtually impossible to obtain a standardized score of
10.
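The numbers in this case study can be reproduced as follows. This is a sketch using unrounded intermediate values, so the standardized score comes out slightly below the slide's 10.0, which uses the rounded 0.017.

```python
from math import sqrt

p, n, reported = 0.39, 800, 0.56
sd = sqrt(p * (1 - p) / n)          # about 0.0172
upper = p + 3 * sd                  # about 0.442, largest plausible sample proportion
z = (reported - p) / sd             # about 9.9 (10.0 with sd rounded to 0.017)
print(round(sd, 4), round(upper, 3), round(z, 1))
```

Either way the reported 56% lies roughly ten standard deviations above 39%, far outside anything sampling variation alone would produce.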
Key Terms
• Statistic
• Parameter
• Sampling distribution
• Sample proportion
• Normal curve approximation rule for sample proportions
• Sampling distribution of p̂ (p-hat)
• Standard deviation of p̂
• Standard error of p̂
• Normal curve approximation rule for sample means
• Sampling distribution of x̄ (x-bar)
• Sampling distribution of the mean
• Standard deviation of the sample mean
• Standard error of the mean
• Law of large numbers
• Central limit theorem
• Standardized statistic
• Standardized z-statistic
• Student’s t-distribution
• t-distribution
• Degrees of freedom (df)
• Standardized t-statistic
• Statistical inference
• Confidence interval
• Hypothesis testing
• Significance testing
• Statistical significance
CH8-9 Homework
• Q1. The number of arrivals at a bakery has a
Poisson distribution with a mean of 12 per hour.
• (1) What is the probability of 4 customers
arriving in a half-hour interval?
• (2) What is the probability of between 2 and 5
customers (inclusive) arriving in a 15-minute
interval?
Q2
• A certain college claims that 80% of its
graduates get a job after graduation. Of the next
19 graduates, what is the probability of
• (1) at least five graduates getting a job?
• (2) between 3 and 10 (inclusive) getting a job?
• (3) no more than 4 getting a job?
Q3
• A recently established building society rejects
40% of its home loan applications. Of the next 50 applications
for home loans, what is the approximate
probability of
• (1) fewer than 15 loans being rejected?
• (2) up to 24 loans being rejected?
Q4
• The lifetime of a certain brand of tyres is
approximately normally distributed with a mean
of 60000 kilometers and a standard deviation of
2000 km.
• (1) What proportion of tyres will last more than
67000 km?
• (2) How many kilometers would you expect the
weakest 15% of the tyres to last?
Q5
• Decide the nature of the sampling distribution for each of the following:
• (1) a sample of 10 with a mean of 12 and a standard
deviation of 5 taken from a skewed population
• (2) a sample of 39 with a mean of 60 and a standard
deviation of 10 taken from a skewed population
• (3) a sample of 15 taken from a normal population with a
mean of 5 and a standard deviation of 1
• (4) a sample of 10 with a mean of 16 and a standard
deviation of 2 taken from a normal population
Q6
• Ken Hooky Fried Chicken buys all its chickens
from Chook Supplies Pty Ltd. Chook Supplies
produces chickens with a mean weight of 1.5
kilograms and a standard deviation of 0.2
kilograms. The weights of the chickens are known to
be normally distributed. A sample of 100
chickens is taken.
• (1) What is the probability of the sample mean
being above 1.6 kg?
• (2) What is the probability of the sample mean
being between 1.35 and 1.45 kg?
• (3) If the sample is taken from a population of
1000 chickens, what are the revised answers to
parts (1) and (2)?
Q7
• P319 9.14