Part V - Chance Variability
Dr. Joseph Brennan
Math 148, BU
Law of Averages
In Chapter 13 we discussed the Kerrich coin-tossing experiment.
Kerrich was a South African mathematician interned in Nazi-occupied Denmark during World War II. He passed the time by flipping a coin 10,000 times, faithfully recording the results.
Law of Averages
Law of Averages: If an experiment is independently repeated a
large number of times, the percentage of occurrences of a specific event E
will be close to the theoretical probability of the event occurring, but off by
some amount, called the chance error.
Law of Averages
As the coin toss was repeated, the percentage of heads approaches its
theoretical expectation: 50%.
Law of Averages
Caution
The Law of Averages is commonly misunderstood as the Gambler's
Fallacy:
"By some magic everything will balance out. With a run of 10 heads
a tail is becoming more likely."
This is false. After a run of 10 heads the probability of tossing a tail
is still 50%!
Law of Averages
In fact, the number of heads above half grew steadily as the
experiment proceeded. A gambler betting on tails and hoping for balance
would have been devastated, as tails appeared about 134 fewer times than
heads after 10,000 tosses.
Law of Averages
In our coin-flipping experiment, the number of heads will be around half
the number of tosses, plus or minus the chance error.
As the number of tosses goes up, the chance error gets larger in absolute
terms.
However, when viewed relatively, the chance error as a percentage
decreases.
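This absolute-versus-relative behavior of the chance error can be seen in a quick simulation. The following is an illustrative Python sketch, not part of the original slides; the function name chance_error and the fixed seed are mine:

```python
import random

def chance_error(n, seed=148):
    """Toss a fair coin n times; return the chance error in absolute terms
    (|heads - n/2|) and as a percentage (|heads/n - 1/2| * 100)."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return abs(heads - n / 2), abs(heads / n - 0.5) * 100
```

Trying n = 100, 10,000, and 1,000,000 typically shows the absolute error growing while the percentage error shrinks.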
Sample Spaces
Recall that a sample space S lists all the possible outcomes of a study.
Example (3 coins):
We can record an outcome as a string of heads and tails, such as HHT.
The corresponding sample space is
S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}.
It is often more convenient to deal with outcomes as numbers, rather than
as verbal statements.
Suppose we are interested in the number of heads. Let X denote the
number of heads in 3 tosses.
For instance, if the outcome is HHT, then X = 2.
The possible values of X are 0, 1, 2, and 3. For every outcome from S,
X will take a particular value:
Outcome   HHH  HHT  HTH  THH  TTH  THT  HTT  TTT
X          3    2    2    2    1    1    1    0
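The table above can be generated mechanically. A minimal Python sketch (the names S and X follow the slide's notation):

```python
from itertools import product

# Build the sample space S for three coin flips and tabulate X, the number
# of heads, for each outcome.
S = ["".join(flips) for flips in product("HT", repeat=3)]
X = {outcome: outcome.count("H") for outcome in S}
```

For instance, X["HHT"] is 2, matching the table.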
Random Variable
Random Variable: An unknown quantity subject to random change. Often a
random variable is an unknown numerical result of a study.
A random variable has a numerical sample space in which each outcome has
an assigned probability. The assigned probabilities are not necessarily
equal:
The quantity X in the previous Example is a random variable because its
value is unknown unless the tossing experiment is performed.
Definition: A random variable is an unknown numerical result of a study.
Mathematically, a random variable is a function which assigns a
numerical value to each outcome in a sample space S.
Example (3 coins)
We have two different sample spaces for our 3 coin experiment:
S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}.
S ∗ = {0, 1, 2, 3}
The sample space S describes 8 equally likely outcomes for our coin flips,
while the sample space S ∗ describes 4 outcomes that are not equally likely.
Recall that S ∗ represents the values of the random variable X, the number
of heads resulting from three coin flips.
P(X = 0) = P(TTT) = (1/2) · (1/2) · (1/2) = 1/8
P(X = 1) = P(HTT or TTH or THT) = 3/8
P(X = 2) = 3/8
P(X = 3) = 1/8
S ∗ does not contain information about the order of heads and tails.
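These probabilities can be checked by counting outcomes. A small sketch using exact fractions (the names outcomes and dist are mine):

```python
from itertools import product
from fractions import Fraction

# Probability of each value of X by counting the equally likely outcomes in S.
outcomes = list(product("HT", repeat=3))
dist = {k: Fraction(sum(o.count("H") == k for o in outcomes), len(outcomes))
        for k in range(4)}
```

Exact fractions avoid any floating-point rounding and make the 1/8, 3/8, 3/8, 1/8 pattern explicit.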
Discrete and Continuous Random Variables
Discrete Random Variables:
A discrete random variable has a number of possible values which
can be listed. Mathematically we say the number of possible values
are countable.
Variable X in Example (3 coins) is discrete. Simple actions are
discrete: rolling dice, flipping coins, dealing cards, drawing names from
a hat, spinning a wheel, . . .
Continuous Random Variables:
A continuous random variable takes values in an interval of numbers.
It is impossible to list or count all the possible values of a continuous
random variable. Mathematically we say the number of possible
values are uncountable.
For the data on heights of people, the average height x̄ is a continuous
random variable which takes on values from some interval, say, [0, 200]
(in inches).
Probability Distributions
Any random variable X , discrete or continuous, can be described with
A probability distribution.
A mean and standard deviation.
The probability distribution of a random variable X is defined by
specifying the possible values of X and their probabilities.
For discrete random variables the probability distribution is given by
the probability table and is represented graphically as the
probability histogram.
For continuous random variables the probability distribution is given
by the probability density function and is represented graphically by
the density curve.
Recall that we discussed density curves in Part II.
The Mean of a Random Variable X
In Part II (Descriptive Statistics) we discussed the mean and standard
deviation, x̄ and s, of data sets to measure the center and spread of the
observations. Similar definitions exist for random variables:
The mean of the random variable X , denoted µ, measures the centrality
of the probability distribution.
The mean µ is computed from the probability distribution of X as a
weighted average of the possible values of X with weights being the
probabilities of these values.
The Expected Value
The mean µ of a random variable X is often called the expected value of
X . It means that the observed value of a random variable is expected to
be around its expected value; the difference is the chance error. In other
words,
observed value of X = µ + chance error
We never expect a random variable X to be exactly equal to its
expected value µ.
The likely size of the chance error can be determined by the standard
deviation, denoted σ.
The standard deviation σ measures the distribution’s spread and is a
quantity which is computed from the probability distribution of X .
Random Variable X and Population
A population of interest is often characterized by the random variable X .
Example:
Suppose we are interested in the distribution of American heights. The
random variable X (height) describes the population (US people).
The distribution of X is called the population distribution, and the
distribution parameters, µ and σ, are the population parameters.
Population parameters are fixed constants which are usually unknown and
need to be estimated.
A sample (data set) should be viewed as values (realizations) of the
random variable X drawn from the probability distribution.
The sample mean x̄ and standard deviation s estimate the unknown
population mean µ and standard deviation σ.
Discrete Random Variables
The distribution of a discrete random variable X is summarized in the
distribution table:
Value of X    x1   x2   x3   . . .   xk
Probability   p1   p2   p3   . . .   pk
The symbols xi represent the distinct possible values of X and pi is the
probability associated to xi .
p1 + p2 + . . . + pk = 1
(or 100%)
This is due to all possible values of X being listed in the sample space
S = {x1 , x2 , . . . , xk }.
The events X = xi and X = xj, i ≠ j, are disjoint since the random
variable X cannot take two distinct values at the same time.
Example (Fish)
A resort on a lake claims that the distribution of the number of fish X in
the daily catch of an experienced fisherman is given below.
x          0     1     2     3     4     5     6     7
P(X = x)   0.02  0.08  0.10  0.18  0.25  0.20  0.15  0.02

Find the following:
(a) P(X ≥ 5) = 0.37
(b) P(2 < X < 5) = 0.43
(c) y such that P(X ≤ y) = 0.2: y = 2
(d) y such that P(X > y) = 0.37: y = 4
(e) P(X ≠ 5) = 1 − 0.20 = 0.80
(f) P(X < 2 or X = 6) = 0.25
(g) P(X < 2 and X > 4) = 0
(h) P(X = 9) = 0
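These answers can be verified mechanically. A minimal sketch, with the claimed table entered as a Python dict (the names pmf and prob are mine):

```python
# The resort's claimed distribution of the daily catch X.
pmf = {0: 0.02, 1: 0.08, 2: 0.10, 3: 0.18, 4: 0.25, 5: 0.20, 6: 0.15, 7: 0.02}

def prob(event):
    """P(event) for an event given as a predicate on the value of X."""
    return sum(p for x, p in pmf.items() if event(x))
```

For example, prob(lambda x: x >= 5) sums the last three table entries, giving 0.37 as in part (a).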
Probability Histograms
The graph of the probability distribution of a discrete random variable X is
called the probability histogram.
There are k bars, where k is the number of possible values of X.
The i-th bar is centered at xi and has unit width and height pi.
The areas of the bars display the assignment of probabilities to the
possible values of X.
Example (3 coins) The distribution table for X , the number of heads
after 3 coin flips, is given below:
X      0    1    2    3
P(X)   1/8  3/8  3/8  1/8
Example (3 coins): The Probability Histogram.
The probability histogram for the 3 coins example is shown below.
Probability Histograms and Data Histograms
Do not confuse the probability histogram and the data (empirical)
histogram!
The probability histogram is a theoretical histogram which shows
the probabilities of possible outcomes.
Each bar on the probability histogram shows the probability of a certain
outcome.
The data histogram is an empirical histogram which shows the
distribution of observed outcomes.
Each bar on the data histogram represents the observed frequency of
that outcome.
As the probability is a long run frequency, we should think of the
probability histograms as idealized pictures of the results of very many
trials.
Example (Two Dice)
Two dice are rolled. Find the distribution of the total and plot its
probability histogram.
Solution: Let X denote the sum on the two dice. There are 11 possible
values of X .
Value of X    2     3     4     5     6     7     8     9     10    11    12
Probability   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
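The table can be reproduced by enumerating all 36 equally likely rolls. A sketch (the names counts and dist are mine):

```python
from collections import Counter
from fractions import Fraction

# Distribution of the sum of two dice, by enumerating all 36 ordered rolls.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
dist = {total: Fraction(c, 36) for total, c in counts.items()}
```

The counts 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1 out of 36 match the table row for row.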
Example (Two dice)
A computer simulated throwing a pair of dice, and the experiment was
repeated 100 times, 1000 times and then 10, 000 times. The empirical
histograms for the sums are plotted below:
We can see that the empirical histogram converges (gets closer and closer)
to the probability histogram as the number of repetitions increases.
Discrete Random Variable: µ and σ
Mean: The mean µ of a discrete random variable is found by
multiplying each possible value by its probability and adding together all
the products:
µ = x1 p1 + x2 p2 + . . . + xk pk = Σ xi pi, with the sum running over i = 1, . . . , k.
Standard Deviation: The standard deviation σ of a discrete
random variable is found with the aid of µ:
σ = √( (x1 − µ)² p1 + (x2 − µ)² p2 + . . . + (xk − µ)² pk ) = √( Σ (xi − µ)² pi )
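These two formulas translate directly into code. A sketch, assuming the distribution is given as a {value: probability} dict (the function names mean and sd are mine):

```python
from math import sqrt

def mean(dist):
    """mu = sum of x_i * p_i over a {value: probability} table."""
    return sum(x * p for x, p in dist.items())

def sd(dist):
    """sigma = sqrt(sum of (x_i - mu)^2 * p_i)."""
    mu = mean(dist)
    return sqrt(sum((x - mu) ** 2 * p for x, p in dist.items()))
```

Applied to the two-dice table, these return 7 and about 2.415, the values computed on the next slide.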
Example (Two dice): µ and σ
Two dice are rolled. The distribution table for this random event:
Value of X    2     3     4     5     6     7     8     9     10    11    12
Probability   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
The mean:
µ = 2 · (1/36) + 3 · (2/36) + 4 · (3/36) + . . . + 12 · (1/36) = 7.
This shouldn't be too much of a surprise, as we've seen in class that the
mean for rolling one die is 3.5.
The standard deviation:
σ = √( (2 − 7)² · (1/36) + (3 − 7)² · (2/36) + . . . + (12 − 7)² · (1/36) ) ≈ 2.415.
Interpretation: If the experiment is repeated many times and the
average of the outcomes, x̄, is computed, it is expected to be close to 7. An
interpretation of the standard deviation is not so clear.
Box Models
Many statistical questions can be framed in terms of drawing tickets from
a box.
Box Model: A model framing a statistical question as drawing tickets
(with or without replacement) from a box. The tickets are to be labeled
with numerical values linked to a random variable.
Example: Suppose we are to flip one fair coin, we would be able to model
the possible outcomes in terms of drawing from a box:
There are two tickets in the box. The first is labeled 1 and the second
is labeled 0.
Flipping a head is equivalent to drawing a 1 from the box and a tail is
equivalent to drawing a 0.
If we were to flip the coin multiple times, we would be drawing
multiple tickets from the box. As coin flips are independent, we draw
from the box with replacement.
Box Models
A Box Model is a version of a Distribution Table for a random variable.
They allow one to simplify a question to an easily visualized experiment (a
common theme in mathematics... a large number of questions can be
framed as manipulating objects found in a box).
A box model should be used when a question requires classifying and
counting. If you are interested in a certain subset of values for a random
variable, label the tickets corresponding to your interests as 1 and the
remaining tickets as 0.
Example: If you are interested in the occurrence of 3 or 4 when rolling a
die, your box model contains six tickets: the two tickets corresponding to 3
and 4 are labeled 1, and the remaining four tickets are labeled 0.
Box Models
When we wish to describe the expected value and standard deviation
for a box model, we use the formulas for discrete random variables, but we
have a simpler way to visualize these ideas.
The expected value of a random variable is the average of the tickets
occupying the box model.
The standard deviation of a random variable is the standard deviation
of the tickets.
The Sum of n Independent Outcomes
Since individual outcomes of an experiment are values of a random
variable X , the sum of multiple outcomes will also be a random variable.
What are the mean and standard deviation of this variable?
We will use the following notation:
n is the number of repetitions of an experiment.
µ and σ are the mean and standard deviation of the random variable
X which describes a single outcome of an experiment.
In terms of a box model, n is the number of tickets that we will draw from
our box. Because we want independent events, drawing from the box is done
with replacement.
The Sum of n Independent Outcomes
When the same experiment is repeated independently n times, the
following is true for the sum of outcomes:
The expected value of the sum of n independent outcomes of an
experiment:
n · µ
The standard error of the sum of n independent outcomes of an
experiment:
√n · σ
The second part of the above rule is called the Square Root Law.
Note that the above rule is true for any sequence of independent random
variables, discrete or continuous!
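The Square Root Law can be checked by simulation. A sketch, taking repeated die rolls as the experiment (the function name simulated_se and the trial count are mine; a single die has σ = √(35/12) ≈ 1.71, computed from the formulas above):

```python
import random
from math import sqrt

def simulated_se(n, trials=20_000, seed=1):
    """Estimate the standard error of the sum of n die rolls by simulation.
    The Square Root Law predicts sqrt(n) * sigma, with sigma = sqrt(35/12)."""
    rng = random.Random(seed)
    sums = [sum(rng.randint(1, 6) for _ in range(n)) for _ in range(trials)]
    m = sum(sums) / trials
    return sqrt(sum((s - m) ** 2 for s in sums) / trials)
```

For n = 25 the simulated value comes out close to √25 · √(35/12) ≈ 8.54.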
Example (Test)
A test has 20 multiple choice questions. Each question has 5 possible
answers, one of which is correct. A correct answer is worth 5 points, so
the total possible score is 100. A student answers all questions by guessing
at random. What is the expected value and standard deviation of their
total score?
Solution: Let X be the number of points earned on one question. Then X
is a random variable which has the following distribution:
Value of X    0    5
Probability   4/5  1/5
Example (Test)
Value of X    0    5
Probability   4/5  1/5

The mean of X is
µ = 0 · (4/5) + 5 · (1/5) = 1.
The standard deviation of X is
σ = √( (0 − 1)² · (4/5) + (5 − 1)² · (1/5) ) = 2.
We are interested in the mean and standard error of the sum of scores
from 20 questions. The questions are independent. We have:
The mean of the sum = 20 · 1 = 20 points,
The standard error of the sum = √20 · 2 ≈ 8.94 points.
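The whole computation fits in a few lines. A sketch mirroring the steps above (variable names are mine):

```python
from math import sqrt

# Score on one guessed question: 0 points with chance 4/5, 5 points with chance 1/5.
dist = {0: 4 / 5, 5: 1 / 5}
mu = sum(x * p for x, p in dist.items())
sigma = sqrt(sum((x - mu) ** 2 * p for x, p in dist.items()))

n = 20
expected_total = n * mu        # 20 points
se_total = sqrt(n) * sigma     # about 8.94 points
```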
A Computational Trick!
When there are just two numbers, x1 and x2 , in the distribution of X the
distribution’s standard deviation, σ, can be computed by using the
following short-cut formula:
σ = |x1 − x2| · √(p1 p2),
where pi is the probability of xi.
Example (Test): The standard deviation for the distribution of points
earned by guessing on one question can be easily found as
σ = |0 − 5| · √( (4/5) · (1/5) ) = 5 · (2/5) = 2,
which coincides with what we found before.
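The short-cut can be checked against the definition. A sketch (both function names are mine):

```python
from math import sqrt

def sd_definition(x1, x2, p1):
    """sigma from the definition, for a distribution with two values."""
    p2 = 1 - p1
    mu = x1 * p1 + x2 * p2
    return sqrt((x1 - mu) ** 2 * p1 + (x2 - mu) ** 2 * p2)

def sd_shortcut(x1, x2, p1):
    """sigma = |x1 - x2| * sqrt(p1 * p2), the two-value short-cut."""
    return abs(x1 - x2) * sqrt(p1 * (1 - p1))
```

For the test example, sd_shortcut(0, 5, 0.8) gives 2, agreeing with the definition.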
A Computational Trick!
This trick will be used often when we are interested in classifying and
counting. These problems are framed as a box model with tickets being
either 0 or 1.
σ = √( (fraction of 1's) × (fraction of 0's) )
For the die box model above (two tickets labeled 1, four labeled 0):
σ = √( (2/6) × (4/6) ) ≈ 0.47
Standard Error
An observed value differs from the expected value by the chance error.
The likely size of the chance error is given by the standard error.
The sum of the points earned from randomly selecting answers on our 20
question test is expected to be 20 give or take the standard error of 8.94
points.
The Binomial Setting
1. There is a fixed number n of repeated trials.
2. The trials are independent. In other words, the outcome of any
particular trial is not influenced by previous outcomes.
3. The outcome of every trial falls into one of just two categories, which
for convenience we call success and failure.
4. The probability of a success, call it p, is the same for each trial.
5. It is the total number of successes that is of interest, not their order
of occurrence.
NOTE: The Binomial Setting can be framed as a box model with only 1’s
and 0’s where draws are performed with replacement.
Binomial Setting
The binomial setting is appropriate under the sampling WITH replacement
scheme. When sampling WITHOUT replacement, removing objects from
the population changes the probability of success for the next trial and
introduces dependence between the trials.
However, when the population is large enough:
Removing a few items from it doesn’t change the proportion of
successes and failures significantly.
Successive trials are nearly independent.
Conclusion: We can apply the binomial setting to sampling-without-replacement
problems when the population is large.
Binomial Coefficients
The number of ways in which exactly k successes can occur in the n trials
of a binomial experiment can be found as
(n choose k) = n! / (k! (n − k)!),
where n! = 1 · 2 · 3 · . . . · n. The exclamation mark is read "factorial" and
(n choose k) is read "n choose k".
Example: Students are given a list of nine books and told that they will
be examined on the contents of five of them. How many combinations of
five books are possible?
(9 choose 5) = 9! / (5! 4!) = 126.
There are 126 possible combinations of 5 books out of 9 books.
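In code, binomial coefficients are available directly. A short sketch using Python's math.comb (the variable names are mine):

```python
from math import comb, factorial

# "9 choose 5": the number of ways to pick 5 books from a list of 9.
ways = comb(9, 5)
# The same count from the factorial formula n! / (k! * (n - k)!).
ways_from_factorials = factorial(9) // (factorial(5) * factorial(4))
```

Both expressions give 126, matching the hand computation.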
The Binomial Distribution
Let X denote the number of successes under the binomial setting. Then X
is a random variable which may take values 0, 1, 2, 3, ..., n. In particular,
X = 0 means no successes in n trials; only failures were observed.
X = n means the outcomes of all n trials are successes.
X = 5 means 5 successes in n trials.
It turns out that X has a special discrete distribution which is called the
binomial distribution. The probabilities of values of X are computed as
P(X = k) = (n choose k) p^k (1 − p)^(n−k),   k = 0, 1, 2, . . . , n.   (1)
So the binomial distribution is a probability distribution of a random
variable X which has 2 parameters: p (probability of success) and n (the
number of trials).
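Formula (1) can be evaluated directly. A sketch (the function name binom_pmf is mine):

```python
from math import comb

def binom_pmf(k, n, p):
    """Formula (1): P(X = k) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)
```

For the 3-coins example, binom_pmf(2, 3, 0.5) returns 0.375, i.e. 3/8.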
Binomial Mean and Standard Deviation
Let X be a binomial random variable with parameters n (number of trials)
and p (probability of success in each trial). Then the mean and standard
deviation of X are
µ = np,
σ = √( np(1 − p) ).
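A sketch of these two formulas (the function name binom_mean_sd is mine):

```python
from math import sqrt

def binom_mean_sd(n, p):
    """Mean np and standard deviation sqrt(np(1 - p)) of a binomial count."""
    return n * p, sqrt(n * p * (1 - p))
```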
Example (Heart Attack)
The Helsinki Heart Study asked whether the anticholesterol drug
gemfibrozil reduces heart attacks. The Helsinki study planned to give
gemfibrozil to about 2000 men aged 40 to 55 and a placebo to another
2000. The probability of a heart attack during the five-year period of the
study for men this age is about 0.04. What are the mean and standard
deviation of the number of heart attacks that will be observed in one
group if the treatment does not change this probability?
Example (Heart Attack)
Solution:
There are 2000 independent observations, each having probability p = 0.04
of a heart attack. The count X of heart attacks has binomial distribution.
µ = np = 2000 · 0.04 = 80,
σ = √( np(1 − p) ) = √( 2000 · 0.04 · (1 − 0.04) ) ≈ 8.76.
In fact, there were 84 heart attacks among the 2035 men actually assigned
to the placebo, quite close to the mean. The gemfibrozil group of 2046
men suffered only 56 heart attacks. This is evidence that the drug does
reduce the chance of a heart attack.
Example (Light Bulbs)
For a lot of 1,000,000 light bulbs the probability of a defective bulb is 0.01.
What is the probability that there are 20,000 defective bulbs in a lot?
Solution : There are n = 1, 000, 000 bulbs (trials). The probability of a
defect (success) for each bulb is p = 0.01. Let X be a number of defective
bulbs out of n = 1, 000, 000. Then X has a binomial distribution. The
expected value of X is
µ = np = 1,000,000 · 0.01 = 10,000.
The standard deviation of X is
σ = √( np(1 − p) ) = √( 1,000,000 · 0.01 · 0.99 ) = √9900 ≈ 99.5.
We want to compute the probability that X = 20, 000.
Example (Light Bulbs)
We have
P(X = 20,000) = (1,000,000 choose 20,000) · 0.01^20,000 · (1 − 0.01)^(1,000,000 − 20,000)
= [ 1,000,000! / (20,000! · 980,000!) ] · 0.01^20,000 · 0.99^980,000.
If you try to compute 1,000,000!, it may crash your computer! How can
we compute the desired probability then?
We may approximate it using the normal approximation to the binomial
distribution.
Normal Approximation to the Binomial Distribution
Consider the probability histograms for binomial distributions with
different values of n and p.
Normal Approximation to the Binomial Distribution
We can see that some of the probability histograms are bell-shaped. This
suggests that the binomial distribution may be approximated by the
normal distribution for certain combinations of n and p.
Normal Approximation to the Binomial Distribution
In particular, observe the following:
For a fixed p, the larger the sample size n, the better the normal
approximation to the binomial distribution.
For a fixed n, the closer p to 0.5, the better the normal approximation
to the binomial distribution.
NORMAL APPROXIMATION for BINOMIAL COUNTS
Let X be a random variable which has a binomial distribution with
parameters n and p. When n is large, the distribution of X is
approximately normal.
X is approximately normal with mean np and standard deviation
√( np(1 − p) ).
As a rule, we will use this approximation for values of n and p that satisfy
np ≥ 10 and n(1 − p) ≥ 10.
Normal Approximation to the Binomial Distribution
A few remarks are in order :
The above normal approximation is easy to remember because it says
that X is approximately normal with its usual mean and standard
deviation.
The true distribution of X is binomial, not normal. The normal
distribution is just a good approximation of the binomial probability
histogram when the conditions in the rule are satisfied.
The normal approximation to the binomial distribution consists in
replacing the actual probability histogram with the normal curve
before computing areas.
Example (College Commute)
A recent survey on a college campus revealed that 40% of the students
live at home and commute to college. If a random sample of 320 students
is questioned, what is the probability of finding at least 130 students who
live at home?
Solution : Let X be a count of students who live at home in a sample of
size n = 320. Then X has binomial distribution with parameters n = 320
and p = 0.4. We need to compute
P(X ≥ 130) = P(X = 130) + P(X = 131) + . . . + P(X = 320)
= (320 choose 130) 0.4^130 0.6^190 + (320 choose 131) 0.4^131 0.6^189 + . . . + (320 choose 320) 0.4^320 0.6^0.
The above computation is cumbersome. Can we use the normal approximation to
the binomial distribution to compute P(X ≥ 130)?
Example (College Commute)
Check the conditions of the rule:
np = 320 · 0.4 = 128,
n(1 − p) = 320 · (1 − 0.4) = 192.
Since both np and n(1 − p) are greater than 10, we can use the normal
approximation to the binomial distribution.
√( np(1 − p) ) = √( 320 · 0.4 · 0.6 ) = √76.8 ≈ 8.76.
Then X is approximately normal with mean np = 128 and
SD = √( np(1 − p) ) ≈ 8.76.
How good is the normal approximation to the binomial distribution?
Example (College Commute)
The figure below displays the probability histogram of the binomial
distribution (bar graph) with the density curve of the approximating
normal distribution superimposed.
Example (College Commute)
Both distributions have the same mean and standard deviation, and both
the area under the histogram and the area under the curve are 1. The
normal curve fits the histogram very well.
The normal approximation to the probability of at least 130 students is
the area under the normal curve to the right of
z = (130 − np) / √( np(1 − p) ) = (130 − 128) / 8.76 ≈ 0.228.
Rounding z to 0.25 for the normal table,
P(X ≥ 130) ≈ P(Z ≥ 0.25) = (100% − 19.74%) / 2 = 40.13%   (or 0.4013).
Example (College Commute)
The actual binomial probability that there are at least 130 students who
live at home can be computed:
P(X ≥ 130) = P(X = 130) + P(X = 131) + . . . + P(X = 320) = 0.4306.
The above probability is the area under the binomial probability
histogram to the right of the value x = 130.
Note that the actual and approximate probabilities are quite close. The
normal approximation works well in this case.
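Both numbers can be reproduced numerically. A sketch (variable names are mine); note that using the unrounded z ≈ 0.228 gives about 0.41, slightly different from the 0.4013 obtained after rounding z to 0.25 for the table:

```python
from math import comb, erf, sqrt

n, p = 320, 0.4

# Exact binomial tail: P(X >= 130) summed term by term.
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(130, n + 1))

# Normal approximation with the unrounded z = (130 - np) / sqrt(np(1 - p)).
mu, sigma = n * p, sqrt(n * p * (1 - p))
z = (130 - mu) / sigma
approx = 0.5 * (1 - erf(z / sqrt(2)))  # P(Z >= z)
```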
Example (Light Bulbs)
Recall that for a lot of 1,000,000 bulbs the probability of a defective bulb
is 0.01. We want to find the probability that there are 20,000 defective
bulbs in the lot.
We justified that X , a count of defective bulbs in a lot, is a binomial
random variable with parameters p = 0.01 and n = 1, 000, 000. We also
concluded that we will not handle the direct binomial computation
P(X = 20,000) = (1,000,000 choose 20,000) · 0.01^20,000 · (1 − 0.01)^(1,000,000 − 20,000),
since computing factorials of large numbers is simply not feasible!
The probability histogram for the binomial distribution with parameters
p = 0.01 and n = 1, 000, 000 has 1,000,001 bars centered at the values 0,
1, 2, ..., 1000000.
Example (Light Bulbs)
The chance that X = 20, 000 is the area of the bar over 20,000. We want
to use the normal distribution to approximate the area of this rectangle.
The base of this rectangle goes from 19,999.5 to 20,000.5 on the scale of
the number of defective bulbs. In standard units the base of the rectangle
goes from z1 = (19,999.5 − 10,000)/99.5 ≈ 100.497 to
z2 = (20,000.5 − 10,000)/99.5 ≈ 100.508. Then
P(X = 20,000) ≈ P(100.497 ≤ Z ≤ 100.508) ≈ 0.
There is almost no chance that the lot will contain EXACTLY 20,000
defective light bulbs. We can expect the normal approximation to be quite
accurate in this example since
np = 1,000,000 · 0.01 = 10,000 > 10 and n(1 − p) = 1,000,000 · 0.99 = 990,000 > 10.
Continuous Random Variables
Continuous random variables take values in intervals on the number line.
Examples of continuous random variables:
Weight;
Height;
Volume.
The probability distributions of continuous random variables are given by
probability density functions p(x) which are displayed graphically as
density curves. Most density curves are smooth curves without sharp
edges.
Probabilities of Events for Continuous Distributions
More about Continuous Random Variables
For any continuous distribution:
The total area under the density curve is 1.
The probability density is a non-negative function: p(x) ≥ 0.
The probability of any event is the area under the density curve
and above the values of X that make up the event.
The probability that X, having a continuous distribution, takes any
particular value x is zero:
P(X = x) = 0.
Explanation: X takes infinitely many values, so the probability that X
takes any one particular value is 0.
A continuous random variable assigns probabilities to intervals of
outcomes rather than to individual outcomes.
Important Continuous Distributions
There are many important continuous distributions including:
Normal distribution,
Chi-square distribution (often written χ2 distribution),
t-distribution,
F -distribution,
Exponential distribution,
Uniform distribution,
Weibull distribution.
In this unit we will discuss just the normal distribution. This course will
also discuss the Chi-square distribution and the t-distribution.
The Normal Distribution
We have used the standard normal curve for computations of chances or
percents of observations many times in the past.
Figure: The standard normal curve.
Parameters of the Normal Distribution
Many random variables, such as height, weight, reaction time to a
medication, scores on the IQ test, have distributions of the bell-shaped
type which can be reasonably approximated by a normal curve.
The normal distribution should be viewed as a convenient model for
many random variables.
There is a whole family of normal distributions, not just the standard normal
distribution with µ = 0 and σ = 1 that appears in the normal table.
Different members of a normal family have different values of the
parameters, µ and σ. All normal distributions have the same overall bell
shape. Parameters µ and σ transform the bell as follows:
The value of µ determines the centering of the distribution. Changing
µ merely translates the curve to the right or the left.
The value of σ determines the spread of the bell. Larger values of σ
correspond to greater spread of the curve.
The Normal Density
We express the fact that the random variable X has the normal
distribution with parameters µ and σ in the following way:
X ∼ N(µ, σ²).
The parameters of the normal distribution play the following role:
µ is the mean of the distribution (of X ),
σ is the standard deviation of the distribution (of X ).
Special case: The normal table gives the probabilities P(−∞ < Z < z),
where Z ∼ N(0, 1).
The Role of σ
Several Normal Distributions
Several normal distributions with different means and standard deviations
are shown on the plot below.
Normal Distribution: Range of Possible Values
Even though the normal density curve is defined on the entire real line,
the probability that X will fall outside of the interval µ ± 3σ is very
small (0.27%).
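This 0.27% figure is easy to verify numerically. The sketch below uses only Python's standard library, evaluating the standard normal CDF via math.erf:

```python
import math

def normal_cdf(z):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Probability that a normal variable falls outside mu +/- 3 sigma;
# in standard units this is P(|Z| > 3), regardless of mu and sigma.
outside = 2 * (1 - normal_cdf(3))
print(round(100 * outside, 2))  # 0.27 (percent)
```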
The Standard Normal Distribution
Let X ∼ N(µ, σ²).
The new random variable Z = (X − µ)/σ has the standard normal distribution;
that is, its mean is 0 and its standard deviation is 1.
In practice, this means that all probability computations for normal
distributions may be performed using just the standard normal distribution:
P(x1 < X < x2) = P(z1 < Z < z2)
where z1 and z2 are computed by
z1 = (x1 − µ)/σ and z2 = (x2 − µ)/σ.
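As an illustrative sketch of this standardization rule in Python (the values µ = 100, σ = 15, and the interval (85, 115) are made-up examples, not from the slides):

```python
import math

def normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_prob(x1, x2, mu, sigma):
    """P(x1 < X < x2) for X ~ N(mu, sigma^2), by standardizing."""
    z1 = (x1 - mu) / sigma
    z2 = (x2 - mu) / sigma
    return normal_cdf(z2) - normal_cdf(z1)

# Example: X ~ N(100, 15^2); P(85 < X < 115) = P(-1 < Z < 1)
print(round(normal_prob(85, 115, 100, 15), 4))  # 0.6827, the familiar 68%
```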
Universal Use of the Standard Normal Distribution.
The relationship between the areas involved is shown below:
Thus, we only need the standard normal table to compute probabilities
of events for ALL normal distributions.
Percentages of Observations and Probabilities.
In Chapter 5 we used the normal curve to approximate the data's
histograms. We computed the area under the normal curve to approximate
the percentage of observations falling into the corresponding interval of
the data's histogram.
Now we are using the normal curve to compute the probability that a
normally distributed random variable X will take values from a particular
interval.
Caution
When we are in the Binomial Setting, we can use our rule of thumb to
determine whether the distribution is well approximated by a normal curve.
This holds specifically because we are summing the draws from our box.
It is not true in general that all operations associated with drawing from a
box will eventually normalize.
Product Histograms
If we look at the probability histograms of the product of several single-die
rolls, we do not tend towards a nice bell shape:
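A quick simulation sketch (using Python's random module; the choice of 5 rolls and 100,000 trials is arbitrary) illustrates why: products of die rolls are strongly right-skewed, with a mean far above the median, so no bell shape emerges:

```python
import random
random.seed(1)

n_rolls, trials = 5, 100_000
prods = []
for _ in range(trials):
    p = 1
    for _ in range(n_rolls):
        p *= random.randint(1, 6)   # one fair die roll
    prods.append(p)

prods.sort()
mean = sum(prods) / trials
median = prods[trials // 2]
# Strong right skew: the mean (near 3.5**5 = 525) sits far above
# the median, so the product histogram cannot be bell-shaped.
print(mean, median)
```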
The Central Limit Theorem (CLT)
The Central Limit Theorem: When drawing at random with
replacement from a box, the probability histogram for the sum will
approximately follow the normal curve, even if the contents of the box do
not. The larger the number of draws, the better the normal approximation.
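A minimal simulation sketch of the CLT, using a deliberately lopsided box of 0s and 1s (the box contents, 100 draws, and 20,000 trials are illustrative choices, not from the slides):

```python
import random, math
random.seed(42)

box = [0, 0, 0, 1]                 # a lopsided box, nothing like a bell
mu = sum(box) / len(box)           # box mean: 0.25
sigma = math.sqrt(sum((x - mu) ** 2 for x in box) / len(box))  # box SD

n, trials = 100, 20_000
sums = [sum(random.choice(box) for _ in range(n)) for _ in range(trials)]

# CLT prediction: the sums follow roughly N(n*mu, n*sigma^2), so
# roughly 68% of them should land within one SD of n*mu.
lo = n * mu - math.sqrt(n) * sigma
hi = n * mu + math.sqrt(n) * sigma
frac = sum(lo < s < hi for s in sums) / trials
print(round(frac, 2))
```

Because the sums are whole numbers, the simulated fraction lands a little above the continuous 68% figure, but it is in the right ballpark even though the box itself is far from normal.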
The Central Limit Theorem (CLT)
As a rule of thumb, the sample size n should be at least 30 (n ≥ 30)
before the normal approximation can be used.
For symmetric population distributions, the distribution of x̄ is usually
normal-like even for n as small as 10.
For very skewed population distributions, larger values of n may be
needed to overcome the skewness.
The Central Limit Theorem (CLT) at Work
Distribution of a Sum
The following result is a consequence of the CLT.
Suppose that:
We repeat the same experiment n times,
The outcomes of the repeated experiments are independent,
Every outcome (described by the random variable X) has mean µ and
standard deviation σ.
When n is large enough (n ≥ 30), the distribution of the SUM OF
OUTCOMES x1 + x2 + . . . + xn is approximately
N(nµ, nσ²),
which means a normal distribution with mean nµ and SD √n · σ.
If the distribution of X is normal, the above result holds exactly.
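The result above can be captured in a tiny helper function (the die example values µ = 3.5 and σ ≈ 1.71 per roll are just for illustration):

```python
import math

def sum_params(n, mu, sigma):
    """Mean and SD of the sum of n independent outcomes, each with
    mean mu and SD sigma: (n*mu, sqrt(n)*sigma)."""
    return n * mu, math.sqrt(n) * sigma

# Example: total of 30 fair-die rolls (mu = 3.5, sigma ~ 1.71 per roll)
print(sum_params(30, 3.5, 1.7078))
```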
Example (Airline Passengers)
In response to the increasing weight of airline passengers, the Federal
Aviation Administration in 2003 told airlines to assume that passengers
average 190 pounds in summer, including clothing and carry-on baggage.
But passengers vary! A reasonable standard deviation is 35 pounds.
Assume that the weights of airline passengers are normally distributed.
Question: A commuter plane carries 19 passengers. What is the
probability that the total weight of the passengers exceeds 4000 pounds?
Solution: We have n = 19 passengers. The mean weight of a passenger is
µ = 190, and the standard deviation is σ = 35. Let x1 , x2 , . . . , x19 denote
the passengers’ weights.
Example (Airline Passengers)
We want to find the probability that the sum of weights,
x1 + x2 + . . . + x19 ,
exceeds 4000 pounds.
Since the distribution of X (weight of an airline passenger) is normal, the
distribution of the sum is normal with:
mean = 19 · 190 = 3610 and SD = √19 · 35 ≈ 152.56.
Computing the z-score for 4000:
z = (4000 − 3610)/152.56 ≈ 2.56.
We have:
P(x1 + x2 + . . . + x19 > 4000) = P(Z > 2.56) = 100% − 99.48% = 0.52%.
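The arithmetic in this example can be double-checked with a short script (again using math.erf for the standard normal CDF):

```python
import math

def normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, mu, sigma = 19, 190, 35
mean = n * mu                  # 3610
sd = math.sqrt(n) * sigma      # sqrt(19) * 35, about 152.56
z = (4000 - mean) / sd         # about 2.56
prob = 1 - normal_cdf(z)       # about 0.52%
print(round(sd, 2), round(z, 2), round(100 * prob, 2))
```

Rounding z to 2.56 before using the normal table reproduces the slide's 0.52%; carrying full precision through the CDF gives essentially the same answer.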