Normal Distribution

Probability Distributions
(where statistics meets probability)
Recall: The histogram
- histograms are graphs that are created when measured data are sorted into groups, or 'class intervals' ('bin widths')
- the width of each bar is known as the 'class width' – a poorly chosen class width may result in a histogram that does not effectively summarise the distribution; at least 5 bars are usually needed so that a representative display is achieved
- histograms can come in many shapes:
(i) bimodal – a distribution that has two peaks
(ii) uniform distribution – the height of each bar is roughly equal
(iii) skewed to the left or right – the direction of the skew is the direction of the longer tail (the mean is pulled in that direction)
(iv) mound-shaped distribution – symmetrical about a line passing through the interval with the greatest frequency
As you can see, histograms come in many shapes and sizes, but a histogram that is symmetrical
and bell-shaped is commonly known as a normal distribution (well… actually it is a discrete
approximation of a continuous curve).
In statistics we are often interested in situations that result from elements of chance (such as rolling two dice). We look at random variables. The assignment of probabilities across the range of possible values of a random variable is called a probability distribution. A few are of particular interest to us: the binomial distribution (discrete), the Poisson distribution (discrete) and the normal distribution (continuous). Sadly, the hypergeometric distribution (discrete) is not in the curriculum, but I might mention it if I have time.
Exploration: Fill in the table below for the sum of two dice.

Sum |  1   2   3   4   5   6
----+-----------------------
 1  |
 2  |
 3  |
 4  |
 5  |
 6  |
Plot the relative frequency* on the graph below:

[Blank grid for the plot: Relative Frequency (vertical axis) against Observations (Sum) (horizontal axis)]

* the frequency of a value or group of values expressed as a fraction or percent of the whole data set.
What is the mode?
What is the median?
What is the mean?
Add all the relative frequencies – what was the result?
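If you want to check your table, here is a short Python sketch (my addition, not part of the original worksheet) that builds the same relative-frequency distribution by enumerating all 36 outcomes and then reports the mode, median, mean and the total of the relative frequencies.

from fractions import Fraction
from statistics import mean, median, mode

# Enumerate all 36 equally likely outcomes for two dice
sums = [d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7)]

# Relative frequency of each possible sum
rel_freq = {s: Fraction(sums.count(s), 36) for s in range(2, 13)}
for s, rf in rel_freq.items():
    print(f"Sum {s:2d}: relative frequency {rf}")

print("Mode:  ", mode(sums))      # 7
print("Median:", median(sums))    # 7
print("Mean:  ", mean(sums))      # 7
print("Total of relative frequencies:", sum(rel_freq.values()))  # 1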
Discrete Probability Distribution
If you add up the probabilities of all possible outcomes of a discrete random variable X, you must get a total of 1. The set of these probabilities is called a discrete probability distribution. If the discrete random variable can assume any of the values x_1, x_2, x_3, ..., x_n, then

\sum_{i=1}^{n} P(X = x_i) = 1
In the case above, you get:

\sum_{i=1}^{n} P(X = x_i) = P(2) + P(3) + P(4) + P(5) + P(6) + P(7) + P(8) + P(9) + P(10) + P(11) + P(12)
= \frac{1}{36} + \frac{2}{36} + \frac{3}{36} + \frac{4}{36} + \frac{5}{36} + \frac{6}{36} + \frac{5}{36} + \frac{4}{36} + \frac{3}{36} + \frac{2}{36} + \frac{1}{36}
= 1
Another way to look at it is to make a table:

x        |  2     3     4     5     6     7     8     9    10    11    12
P(X = x) | 1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Sum = 36/36 = 1
Example: A discrete random variable X has the following probability distribution:

x        |  0    1      2          3
P(X = x) |  p   2p   1 - 2p²   2p - 3p²

Find
(a) the value of p;
(b) P(X = 2);
(c) P(X ≥ 2);
(d) P(X = 2 | X ≥ 2).
Example: Urn A contains 5 red and 3 blue marbles; urn B contains 2 red and 4 blue marbles. A
marble is selected from each urn and the colour noted. Let X represent the number of red marbles
selected. Tabulate the probability distribution for X.
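A short Python sketch (my addition, not from the notes) that tabulates this distribution by enumerating the four colour combinations; the urn contents are taken from the example above.

from fractions import Fraction

# Urn A: 5 red, 3 blue; Urn B: 2 red, 4 blue (from the example above)
p_red_A = Fraction(5, 8)
p_red_B = Fraction(2, 6)

dist = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0)}
for red_A in (0, 1):          # 1 = red drawn from urn A, 0 = blue
    for red_B in (0, 1):      # 1 = red drawn from urn B, 0 = blue
        prob = (p_red_A if red_A else 1 - p_red_A) * (p_red_B if red_B else 1 - p_red_B)
        dist[red_A + red_B] += prob

for x, p in dist.items():
    print(f"P(X = {x}) = {p}")
print("Sum of probabilities:", sum(dist.values()))   # should be 1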
The Expected Value (mean for probabilities)
The expected value of the random variable is a measure of the central tendency of X. It is a
weighted average or a long term average. It can be written as
E(X) = \mu = \sum_{i=1}^{n} x_i P(X = x_i)
These are the kind of ‘averages’ gamblers, stock analysts, brokers, and actuaries look at.
Example: Consider the probability distribution of the number of heads that appear in 3 tosses of
a coin.
x   P(X = x)   x·P(X = x)
0    0.125       0.000
1    0.375       0.375
2    0.375       0.750
3    0.125       0.375

\sum_{x=0}^{3} x P(X = x) = 1.5

The expected number of heads when 3 coins are tossed is 1.5.
Let’s play a game with two dice.
First, let’s look at the mean of the distribution.
x          |   2     3     4     5     6     7     8     9    10    11    12
P(X = x)   |  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
x·P(X = x) |  2/36  6/36 12/36 20/36 30/36 42/36 40/36 36/36 30/36 22/36 12/36

\sum_{x=2}^{12} x P(X = x) = 7
Nothing special there since it is what I expected.
Now let’s make things a little more interesting: You bet $1.00 to play. If you get a seven or
eleven, you win $3. Otherwise I keep the $1. What is the mean return of this game?
x          |   3      -1
P(X = x)   |  8/36   28/36
x·P(X = x) | 24/36  -28/36

\sum x P(X = x) = \frac{24}{36} - \frac{28}{36} = -\frac{4}{36} \approx -0.11
This means that in the long run, you would lose $0.11 for every dollar you put into the game.
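As a check on the game above, here is a small Python sketch (my addition, not part of the notes) that computes the expected return directly from the 36 outcomes, using the same payoffs: +$3 on a 7 or 11, -$1 otherwise.

from fractions import Fraction

# Payoff: win $3 on a sum of 7 or 11, lose the $1 stake otherwise
def payoff(total):
    return 3 if total in (7, 11) else -1

expected = Fraction(0)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        expected += Fraction(1, 36) * payoff(d1 + d2)

print(expected)           # -1/9
print(float(expected))    # about -0.11 lost per game in the long run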
Explore: Discuss how this is used in various fields.
Explore: Try making up your own game and find the long term value (mean).
Explore: Look at the long term value for a lottery ticket.
The Measure of Spread (standard deviation)
The variance provides a measure of the spread (variability) of the results. It can be written as

Var(X) = \sigma^2 = E\left((X - \mu)^2\right) = \sum_{i=1}^{n} (x_i - \mu)^2 P(X = x_i)

or

Var(X) = \sigma^2 = E(X^2) - \mu^2

A clearer measure of spread is the standard deviation, which has the same units as the values. It is simply the square root of the variance and can be written as

Sd(X) = \sigma = \sqrt{E(X^2) - \mu^2}
Example: Consider the following probability distribution.
x        |  2    3    4
P(X = x) | 0.3  0.5  0.2
Find (i) the mean, (ii) the variance and (iii) the standard deviation.
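A small Python sketch (my addition) for checking answers to problems like this one: it computes \mu = \sum x P(X=x), Var(X) = E(X^2) - \mu^2 and the standard deviation for any discrete table.

from math import sqrt

def summarise(dist):
    """dist is a list of (x, P(X=x)) pairs."""
    mu = sum(x * p for x, p in dist)
    var = sum(x**2 * p for x, p in dist) - mu**2   # Var(X) = E(X^2) - mu^2
    return mu, var, sqrt(var)

# Table from the example above
dist = [(2, 0.3), (3, 0.5), (4, 0.2)]
mu, var, sd = summarise(dist)
print(f"mean = {mu}, variance = {var:.2f}, sd = {sd:.3f}")   # mean 2.9, variance 0.49, sd 0.7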
Binomial Probability Distribution
This is a special type of a discrete probability distribution. A lot of the probabilities that we will
look at have to do with a ‘success’ or ‘failure’ (either-or) situation which is repeated many times
and each trial is independent. These are called Bernoulli trials. Let’s say that the probability of
success is p and that of failure is q=1-p. We are interested in looking at r successes in n trials
(which gives us n-r failures). This yields a binomial probability distribution!!!
Binomial? You mean… Yup, the values of the probabilities are successive terms of the binomial expansion of (p + q)^n. Since p + q = 1, the expansion must be equal to 1 (of course, since it contains all possible outcomes).
The probability of obtaining r successes in n independent trials is

P(n, r, p) = \binom{n}{r} p^r (1 - p)^{n-r} \quad \text{for } 0 \le r \le n
Look at the tree diagram to convince yourself.
We can write B(n, p) as a shorthand to say that the distribution is binomial with n trials and p the
probability of success.
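The formula above translates directly into code. A minimal Python sketch (my own helper, not from the notes) for B(n, p):

from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p): C(n, r) * p^r * (1-p)^(n-r)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# e.g. probability of exactly 2 sixes in 5 rolls of a fair die
print(binomial_pmf(2, 5, 1/6))   # about 0.161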
The Expected Value (mean), the Variance and the Standard Deviation of a Binomial
distribution
It can be easily argued that, for a binomial distribution, the expected value is E(X) = \mu = np. Really? Well, think about it: the expected value of one trial is E(X) = 1 \cdot p + 0 \cdot (1 - p) = p. The mean of n independent trials is hence E(X) = np, but let's prove it.
E(X) = \sum_{x=0}^{n} x P(X = x) = \sum_{x=0}^{n} x \binom{n}{x} p^x q^{n-x}

When x = 0 the term is 0, therefore

E(X) = \sum_{x=1}^{n} x \binom{n}{x} p^x q^{n-x}
     = \sum_{x=1}^{n} x \frac{n!}{x!(n-x)!} p^x q^{n-x}
     = \sum_{x=1}^{n} x \frac{n(n-1)!}{x(x-1)!(n-x)!} p^x q^{n-x}
     = \sum_{x=1}^{n} \frac{(n-1)!}{(x-1)!(n-x)!} \, np \, p^{x-1} q^{n-x}
     = np \sum_{x=1}^{n} \frac{(n-1)!}{(x-1)!\,[(n-1)-(x-1)]!} p^{x-1} q^{(n-1)-(x-1)}
     = np \sum_{x-1=0}^{n-1} \binom{n-1}{x-1} p^{x-1} q^{(n-1)-(x-1)}

The sum describes the terms of another binomial distribution, whose number of trials is n-1 and whose index is x-1. Therefore:

\sum_{x-1=0}^{n-1} \binom{n-1}{x-1} p^{x-1} q^{(n-1)-(x-1)} = (p+q)^{n-1} = 1

Hence

E(X) = np(1)

E(X) = np
The variance and standard deviation of a binomial distribution are less obvious but also yield straightforward formulas: Var(X) = \sigma^2 = np(1-p) and Sd(X) = \sigma = \sqrt{np(1-p)}.
The variance of one trial is given by

\sigma^2 = (1-p)^2 \cdot p + (0-p)^2 \cdot (1-p) = (1-p)^2 p + p^2 (1-p) = p(1-p)\left[(1-p) + p\right] = p(1-p)

The variance of n independent trials is hence \sigma^2 = np(1-p).
Let’s prove that:

Var(X) = \sum_{x=0}^{n} (x - \mu)^2 P(X = x)
       = \sum_{x=0}^{n} (x - np)^2 \binom{n}{x} p^x q^{n-x}
       = \sum_{x=0}^{n} \left( x^2 - 2npx + (np)^2 \right) \binom{n}{x} p^x q^{n-x}
       = \sum_{x=0}^{n} x^2 \binom{n}{x} p^x q^{n-x} - 2np \sum_{x=0}^{n} x \binom{n}{x} p^x q^{n-x} + n^2 p^2 \sum_{x=0}^{n} \binom{n}{x} p^x q^{n-x}
       = \sum_{x=0}^{n} \left( x(x-1) + x \right) \binom{n}{x} p^x q^{n-x} - 2np(np) + n^2 p^2 (1)
       = \sum_{x=0}^{n} x(x-1) \binom{n}{x} p^x q^{n-x} + \sum_{x=0}^{n} x \binom{n}{x} p^x q^{n-x} - 2n^2 p^2 + n^2 p^2
       = \sum_{x=0}^{n} x(x-1) \binom{n}{x} p^x q^{n-x} + np - n^2 p^2

When x = 0 and x = 1 each respective term is 0, therefore

Var(X) = \sum_{x=2}^{n} x(x-1) \frac{n!}{x!(n-x)!} p^x q^{n-x} + np - n^2 p^2
       = \sum_{x=2}^{n} x(x-1) \frac{n(n-1)(n-2)!}{x(x-1)(x-2)!(n-x)!} p^x q^{n-x} + np - n^2 p^2
       = \sum_{x=2}^{n} n(n-1) \frac{(n-2)!}{(x-2)!(n-x)!} p^x q^{n-x} + np - n^2 p^2
       = n(n-1) \sum_{x=2}^{n} \binom{n-2}{x-2} p^2 p^{x-2} q^{n-x} + np - n^2 p^2
       = n(n-1) p^2 \sum_{x-2=0}^{n-2} \binom{n-2}{x-2} p^{x-2} q^{(n-2)-(x-2)} + np - n^2 p^2

The sum describes the terms of another binomial distribution, whose number of trials is n-2 and whose index is x-2. Therefore:

\sum_{x-2=0}^{n-2} \binom{n-2}{x-2} p^{x-2} q^{(n-2)-(x-2)} = (p+q)^{n-2} = 1

Hence

Var(X) = n(n-1) p^2 (1) + np - n^2 p^2
       = n^2 p^2 - np^2 + np - n^2 p^2
       = np(1 - p)
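If the algebra above feels heavy, a quick numerical check (my addition) confirms both results by summing the binomial probabilities directly for some chosen n and p:

from math import comb

n, p = 12, 0.3   # arbitrary test values
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

mean = sum(x * pmf[x] for x in range(n + 1))
var = sum(x**2 * pmf[x] for x in range(n + 1)) - mean**2

print(mean, n * p)            # both approximately 3.6
print(var, n * p * (1 - p))   # both approximately 2.52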
Example: A die is tossed 180 times. Find the mean and standard deviation of the random
variable representing the total number of sixes obtained.
Example: How many throws of two dice are required to ensure that the probability of obtaining at least one “double six” is greater than 0.95? (Note: there are two ways to do this. Using the equation above yields a situation that requires you to guess and check. You can also solve 1 - P(no double six in n throws) > 0.95.)
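The “guess and check” can be handed to a loop. A short Python sketch (my addition) using the second approach, 1 - P(no double six in n throws) > 0.95:

# P(double six on one throw of two dice) = 1/36
p_no = 35 / 36

n = 1
while 1 - p_no**n <= 0.95:
    n += 1
print(n)   # smallest number of throws needed (107)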
Example: Five percent of a large consignment of fruit is inedible. Find the probability that in a random selection of 10 pieces of fruit from this consignment, exactly two pieces are inedible. (Note: these are not really independent events, but the difference will be negligible for a very large consignment.) What are the mean and standard deviation in this case?
Example: A manufacturer finds that 30% of the items produced from one of the assembly lines
are defective. During a floor inspection, the manufacturer selects 6 items from this assembly line.
Find the probability that the manufacturer finds (a) two defectives (b) at least two defectives.
The Poisson Distribution
The Poisson distribution (not named for the fish) models situations where there is a minimum number of ‘successes’ but no maximum number of ‘successes’; the probability of a very large number of successes does, however, fall off rapidly. It is used to determine the probability of obtaining a certain number of successes in a certain time (or space) interval. Examples of situations where this is applicable:
- the number of phone calls per hour
- the number of misprints on a page of a book
- the number of fish caught in a day
- the number of car accidents on a given road per month
It is defined by the equation

P(X = x) = \frac{m^x e^{-m}}{x!}, \quad x = 0, 1, 2, 3, 4, \ldots

where m is the parameter (the mean number of occurrences per interval).
Conditions for a Poisson distribution:
- the average number of occurrences (m) is constant for each time interval;
- the possibility of more than one occurrence in a given small interval is very small (the probability of an occurrence in a given interval is small… about 10% or less);
- the numbers of occurrences in separate intervals are independent of each other.
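The Poisson probability function above translates into a short Python helper (my addition, not from the notes) that can be reused in the examples below:

from math import exp, factorial

def poisson_pmf(x, m):
    """P(X = x) for X ~ Po(m): m^x * e^(-m) / x!."""
    return m**x * exp(-m) / factorial(x)

# e.g. probability of exactly 2 phone calls in an hour if the mean is 3 per hour
print(poisson_pmf(2, 3))   # about 0.224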
Exploration
Use a graphing calculator to do the following exploration.
- Find the mean and the standard deviation of the table below, which represents the number of errors, X, and their frequency.

X         |  0   1   2   3   4   5   6   7   8   9  10
Frequency |  3  11  16  18  15   9   5   1   1   0   1

- Make a table of X and probability P(X = x), using the mean as m in the Poisson model, for x = 0 to 10.

x        |  0   1   2   3   4   5   6   7   8   9  10
P(X = x) |

Find the mean and standard deviation of this model.
What do you notice?
If the sum continued to infinity, what would the mean and standard deviation be?
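If you do not have a graphing calculator handy, a Python sketch along these lines (my addition) does the same computation: the observed mean and standard deviation of the frequency table as tabulated above, followed by the mean and standard deviation of the Poisson model built from that mean, truncated at x = 10.

from math import exp, factorial, sqrt

# Frequency table of the number of errors (from the exploration above)
freq = {0: 3, 1: 11, 2: 16, 3: 18, 4: 15, 5: 9, 6: 5, 7: 1, 8: 1, 9: 0, 10: 1}

n = sum(freq.values())
mean_obs = sum(x * f for x, f in freq.items()) / n
var_obs = sum(x**2 * f for x, f in freq.items()) / n - mean_obs**2
print("data:  mean =", round(mean_obs, 3), " sd =", round(sqrt(var_obs), 3))

# Poisson model with m = observed mean, tabulated for x = 0 to 10
m = mean_obs
probs = {x: m**x * exp(-m) / factorial(x) for x in range(11)}
mean_mod = sum(x * p for x, p in probs.items())
var_mod = sum(x**2 * p for x, p in probs.items()) - mean_mod**2
print("model: mean =", round(mean_mod, 3), " sd =", round(sqrt(var_mod), 3))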
The Expected Value (mean) and Standard Deviation of a Poisson Distribution
These are quite easy: the mean is m and the variance is also m, hence the standard deviation is \sqrt{m}.
We can write Po(m) to represent a Poisson distribution of mean m.
Proof
E(X) = \sum_{x=0}^{\infty} x P(X = x) = \sum_{x=0}^{\infty} x \frac{m^x e^{-m}}{x!}

Since x \frac{m^x e^{-m}}{x!} = 0 when x = 0,

E(X) = \sum_{x=1}^{\infty} x \frac{m^x e^{-m}}{x!}
     = \sum_{x=1}^{\infty} x \frac{m \cdot m^{x-1} e^{-m}}{x(x-1)!}
     = m \sum_{x=1}^{\infty} \frac{m^{x-1} e^{-m}}{(x-1)!}

The sum describes the terms of another Poisson distribution whose index starts at x - 1 = 0, so

\sum_{x-1=0}^{\infty} \frac{m^{x-1} e^{-m}}{(x-1)!} = 1

Therefore:

E(X) = m
Var(X) = E(X^2) - [E(X)]^2 = E(X^2) - m^2

We need E(X^2):

E(X^2) = \sum_{x=0}^{\infty} x^2 P(X = x)
       = \sum_{x=0}^{\infty} x(x - 1 + 1) \frac{m^x e^{-m}}{x!}
       = \sum_{x=0}^{\infty} x(x-1) \frac{m^x e^{-m}}{x!} + \sum_{x=0}^{\infty} x \frac{m^x e^{-m}}{x!}
       = \sum_{x=0}^{\infty} x(x-1) \frac{m^x e^{-m}}{x!} + m

Since x(x-1) \frac{m^x e^{-m}}{x!} = 0 for x = 0 and x = 1,

E(X^2) = \sum_{x=2}^{\infty} x(x-1) \frac{m^x e^{-m}}{x!} + m
       = \sum_{x=2}^{\infty} \frac{m^2 m^{x-2} e^{-m}}{(x-2)!} + m
       = m^2 \sum_{x=2}^{\infty} \frac{m^{x-2} e^{-m}}{(x-2)!} + m

The sum describes the terms of another Poisson distribution whose index starts at x - 2 = 0, so it equals 1. Therefore:

E(X^2) = m^2 + m

Var(X) = E(X^2) - [E(X)]^2
       = m^2 + m - m^2

Var(X) = m
Example: One gram of a radioactive substance is positioned so that each emission of an alpha
particle will flash on a screen. The emissions over 500 periods of 10 second duration are given in
the following table:
Number/Period |   0    1    2    3    4    5    6    7
Frequency     |  91  156  132   75   33    9    3    1

(a) Find the mean of the distribution.
(b) Fit a Poisson model to the data and compare the actual data to that of this model.
(c) Find the standard deviation of the distribution. How close is it to \sqrt{m}, where m is the mean found in (a)?
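For part (b), a sketch of the comparison in Python (my addition): the observed frequencies come from the table above, and the expected frequencies are 500 × P(X = x) under a Poisson model with m equal to the observed mean.

from math import exp, factorial

observed = {0: 91, 1: 156, 2: 132, 3: 75, 4: 33, 5: 9, 6: 3, 7: 1}   # 500 periods
n = sum(observed.values())
m = sum(x * f for x, f in observed.items()) / n   # sample mean, about 1.69

for x, f in observed.items():
    expected = n * m**x * exp(-m) / factorial(x)
    print(f"{x}: observed {f:3d}, Poisson model {expected:6.1f}")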
Example: Top Cars rents cars to tourists. They have four cars, which are hired out on a daily basis. The number of requests per day is distributed according to a Poisson model with a mean of 3. Determine the probability that:
(a) none of the cars are rented;
(b) at least 3 of the cars are rented;
(c) some requests will be refused;
(d) all are hired out, given that at least two are.
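Here is how the four parts could be set up in Python (my addition), reading the requests as X ~ Po(3) with only four cars available, so refusals occur when X exceeds 4:

from math import exp, factorial

def po(x, m=3.0):
    """P(X = x) for X ~ Po(m)."""
    return m**x * exp(-m) / factorial(x)

def p_at_most(k, m=3.0):
    """P(X <= k) for X ~ Po(m)."""
    return sum(po(x, m) for x in range(k + 1))

print("(a) P(no cars rented)     =", po(0))                 # P(X = 0)
print("(b) P(at least 3 rented)  =", 1 - p_at_most(2))      # P(X >= 3)
print("(c) P(some refused)       =", 1 - p_at_most(4))      # refusals need X >= 5
print("(d) P(all 4 out | >= 2)   =",
      (1 - p_at_most(3)) / (1 - p_at_most(1)))              # P(X >= 4) / P(X >= 2)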
EXTRA: Hypergeometric Distribution
The hypergeometric distribution is another special type of discrete distribution. We select a sample from a population without replacement, and each item is classified as either a ‘success’ or a ‘failure’.
For a population of size N, which is known to contain D ‘defective’ items, if we select a random sample of size n from this population (and do so without replacement), and define the random variable X to be the number of defectives observed in the sample of size n, then we say that X has a hypergeometric distribution. We have

P(X = x) = \frac{\binom{D}{x}\binom{N-D}{n-x}}{\binom{N}{n}}, \quad x = 0, 1, 2, \ldots, n
Example: Of the 15 light bulbs in a box, 5 are defective. Four bulbs are selected at random from
the box. Let the random variable X represent the number of defective bulbs selected. Construct a
table to represent this distribution and show that the sum of the probabilities is 1.
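A Python sketch of this table (my addition), using the hypergeometric formula with N = 15, D = 5 and n = 4:

from fractions import Fraction
from math import comb

N, D, n = 15, 5, 4   # 15 bulbs, 5 defective, sample of 4

total = Fraction(0)
for x in range(n + 1):
    p = Fraction(comb(D, x) * comb(N - D, n - x), comb(N, n))
    total += p
    print(f"P(X = {x}) = {p}")
print("Sum =", total)   # 1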
Example: A sports committee at the local hospital consists of 5 members. A new committee is to
be elected, from 5 women and 4 men. What is the probability that the committee will consist of 3
women?
Binomial or Hypergeometric?
An accounting population consists of 2000 line items of which 10% are incorrectly stated.
Find the probability that no more than 2 incorrectly stated accounts will be found in a sample
of size 10.
Let X = number of incorrectly stated accounts.
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)

With Replacement Sampling (binomial)

P(X = 0) = 0.9^{10} \approx 0.3487
P(X = 1) = \binom{10}{1}(0.1)^1(0.9)^9 \approx 0.3874
P(X = 2) = \binom{10}{2}(0.1)^2(0.9)^8 \approx 0.1937

P(X ≤ 2) ≈ 0.9298
Without Replacement Sampling (hypergeometric)

P(X = 0) = \frac{\binom{1800}{10}}{\binom{2000}{10}} \approx 0.3476
P(X = 1) = \frac{\binom{200}{1}\binom{1800}{9}}{\binom{2000}{10}} \approx 0.3881
P(X = 2) = \frac{\binom{200}{2}\binom{1800}{8}}{\binom{2000}{10}} \approx 0.1939

P(X ≤ 2) ≈ 0.9296
Binomial or Hypergeometric?
What is the probability of getting no more than 2 misstated accounts in a simple random sample of size 10 drawn without replacement from an accounting population which has a 10% misstatement rate?

           Pop size 20   Pop size 200   Pop size 2000   Binomial approx.
P(X = 0)      .2368          .3398          .3476            .3487
P(X = 1)      .5263          .3974          .3881            .3874
P(X = 2)      .2368          .1975          .1939            .1937
P(X ≤ 2)      .9999          .9347          .9296            .9298
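The table above can be reproduced with a few lines of Python (my addition), which makes the point nicely: as the population grows with the misstatement rate held at 10%, the hypergeometric probabilities approach the binomial ones.

from math import comb

def hyper(x, N, D, n):
    """Hypergeometric P(X = x): sample of n from N items containing D defectives."""
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

def binom(x, n, p):
    """Binomial P(X = x) for B(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, rate = 10, 0.10
for N in (20, 200, 2000):
    D = int(N * rate)
    probs = [hyper(x, N, D, n) for x in range(3)]
    print(N, [round(p, 4) for p in probs], "P(X<=2) =", round(sum(probs), 4))

print("Binomial approx:", [round(binom(x, n, rate), 4) for x in range(3)],
      "P(X<=2) =", round(sum(binom(x, n, rate) for x in range(3)), 4))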