Probability for Computer Scientists
This material is provided for the educational use
of students in CSE2400 at FIT. No further use
or reproduction is permitted.
Copyright G.A.Marin, 2008,
All rights reserved.
Permutations and Combinations
Suppose that we have n objects O1 , O2 ,..., On .
A permutation of order k is an "ordered" selection of k of these for 1 ≤ k ≤ n.
A combination of order k is an "unordered" selection of k of these.
Common notation: $P(n,k)$ or $P_k^n = n(n-1)\cdots(n-k+1) = n^{\underline{k}}$, and
$C(n,k) = C_k^n = \binom{n}{k} = \dfrac{n^{\underline{k}}}{k!}$.
Example: Given the 5 letters a,b,c,d,e how many ways can we list 3 of the 5
when order is important?
Answer: $P_3^5 = 5^{\underline{3}} = 5 \cdot 4 \cdot 3 = 60$.
Note that each choice of 3 letters (such as a,c,e) results in 6 different results:
ace, aec, cae, cea, eac, eca...
Example: Given the 5 letters above how many ways can we choose 3 of the 5
when order is NOT important?
Answer: $\binom{5}{3} = \dfrac{5^{\underline{3}}}{3!} = \dfrac{5 \cdot 4 \cdot 3}{3 \cdot 2 \cdot 1} = 10$. In this case we have the 60 that result when we care about order
divided by 6 (the number of orderings of 3 fixed letters).
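A quick way to check these counts is to compute the falling factorial and binomial coefficient directly. A minimal sketch in Python (the function name `falling` is our own choice, not from the slides):

```python
from math import comb, perm

def falling(n, k):
    """n to the k falling: n * (n-1) * ... * (n-k+1)."""
    result = 1
    for i in range(k):
        result *= n - i
    return result

# Ordered selections of 3 from 5 letters: P(5,3) = 60
print(falling(5, 3), perm(5, 3))   # 60 60
# Unordered selections: C(5,3) = 60 / 3! = 10
print(comb(5, 3))                  # 10
```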
Definition
$n^{\underline{k}} = n \times (n-1) \times \cdots \times (n-k+1)$ for any positive integer $n$ and for integers $k$ such that $1 \le k \le n$. This symbol is pronounced "n to the k falling."
Examples: $6^{\underline{3}} = 6 \times 5 \times 4 = 120$.
$3^{\underline{6}}$ is not defined (for our purposes, since it would require $k \le n$).
$5^{\underline{5}} = 5!$
Again: $P_k^n = n^{\underline{k}}$ and $\binom{n}{k} = \dfrac{n^{\underline{k}}}{k!}$.
Permutations of Multiple Types
The number of permutations of $n = n_1 + n_2 + \cdots + n_r$ objects of which $n_1$ are of
one type, $n_2$ are of a second type, ..., and $n_r$ are of an $r$th type is
$\dfrac{n!}{n_1!\, n_2! \cdots n_r!}.$
Example: Suppose we have 2 red buttons, 3 white buttons, and 4 blue buttons.
How many different orderings (permutations) are there?
Answer: $\dfrac{9!}{2!\,3!\,4!} = 1260.$
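As a sanity check, this multinomial count is easy to compute from factorials; a minimal sketch (the helper name `multinomial` is ours):

```python
from math import factorial

def multinomial(*counts):
    """Permutations of a multiset with the given counts per type."""
    n = sum(counts)
    result = factorial(n)
    for c in counts:
        result //= factorial(c)
    return result

print(multinomial(2, 3, 4))  # 1260 orderings of 2 red, 3 white, 4 blue buttons
```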
Try These
There are 12 marbles in an urn. 8 are white and 4 are red. The white marbles
are numbered w1,w2,...,w8 and the red ones are numbered r1,r2,r3,r4.
For (a) - (d): Without looking into the urn you draw out 5 marbles.
(a) How many unique choices can you get if order matters? $12^{\underline{5}} = 95{,}040$
(b) How many unique choices can you get if order does not matter? $\binom{12}{5} = 792$
(c) How many ways can you choose 3 white marbles and 2 red marbles if order matters? You will fill 5 "slots" by drawing. First determine which two slots (positions) will be occupied by the 2 red marbles: $\binom{5}{2} = 10$. Next multiply by orderings of 3 white and 2 red: $10 \cdot 8^{\underline{3}} \cdot 4^{\underline{2}} = 40{,}320$.
(d) How many ways can you choose 3 white marbles and 2 red marbles if order does not matter? $\binom{8}{3}\binom{4}{2} = 336$
(e) How many marbles must you draw to be sure of getting two red ones? 10
Complex Combinations
How many ways are there to create a “full house” (3-of-a-kind plus a pair) using a standard deck of 52 playing cards?
$\binom{13}{1}\binom{4}{3}\binom{12}{1}\binom{4}{2} = 13 \cdot 4 \cdot 12 \cdot 6 = 3{,}744.$
(choose denomination) × (choose 3 of 4 of given denomination) × (choose one of the remaining denominations) × (choose 2 of 4 of this second denomination).
This follows from the multiplication principle (Theorem 2.3.1 in text).
Try these…
Suppose $\binom{n}{11} = \binom{n}{7}$. What is $n$?
Suppose $\binom{18}{r} = \binom{18}{r-2}$. What is $r$?
Examples*
Consider a machining operation in which a piece of sheet metal needs two identical diameter holes drilled and two identical size notches cut. We denote a drilling operation as d and a notching operation as n. In determining a schedule for a machine shop, we might be interested in the number of different possible sequences of the four operations. The number of possible sequences for two drilling operations and two notching operations is
$\dfrac{4!}{2!\,2!} = 6.$
The six sequences are easily summarized: ddnn, dndn, dnnd, nddn, ndnd, nndd.
*Applied Statistics and Probability for Engineers, Douglas C. Montgomery,
George C. Runger, John Wiley & Sons, Inc. 2006
Example*
A printed circuit board has eight different locations in which a component can be placed. If five identical components are to be placed on the board, how many different designs are possible?
Each design is a subset of the eight locations that are to contain the components. The number of possible designs is, therefore,
$\binom{8}{5} = \binom{8}{3} = \dfrac{8^{\underline{3}}}{3!} = \dfrac{8 \cdot 7 \cdot 6}{3 \cdot 2 \cdot 1} = 56.$
*Applied Statistics and Probability for Engineers, Douglas C. Montgomery,
George C. Runger, John Wiley & Sons, Inc. 2006
Sample Space
Definition: The totality of the possible
outcomes of a random experiment is called
the Sample Space, Ω.
Finite
Outcome from one roll of one die $\Rightarrow \Omega = \{1,2,3,4,5,6\}$.
Countable
The number of attempts until a message is transmitted successfully, when the probability of success on any one attempt is $p$, $\Rightarrow \Omega = \{1,2,3,4,5,6,\ldots\} = \mathbb{Z}^{+}$.
Continuous
The time (in seconds) until a lightbulb burns out $\Rightarrow \Omega = \{t \in \mathbb{R} : t \ge 0\}$, where $\mathbb{R}$ is the set of all real numbers.
(We begin with the discrete cases.)
Events
Definition: An event is a collection of points from the sample space. Example: the result of one throw of a die is odd.
We use sets to describe events.
From the die example let the set of "even" outcomes be E = {2, 4, 6} .
Let the set of "odd" outcomes be O = {1,3,5} .
If $\Omega$ is finite or countable, then a "simple" event is an event that contains only one point from the sample space.
For the die example the simple events are $S_1 = \{1\}, S_2 = \{2\}, \ldots, S_6 = \{6\}$.
Suppose we toss a coin until the first Head appears. What are the simple events?
Unless stated otherwise, ALL SUBSETS of a sample space
are included as possible events. (Generally we will not be
interested in most of these, and many events will have
probability zero.)
Describe the sample space and events
Each of 3 machine parts is classified as either above or below spec.
At least one part is below spec.
An order for an automobile can specify either an automatic or standard transmission, premium or standard stereo, V6 or V8 engine, leather or cloth interior, and colors: red, blue, black, green, white.
Orders have premium stereo, leather interior, and a V8 engine.
Describe: sample space and events
The number of hours of normal use of a lightbulb.
Lightbulbs that last between 1500 and 1800 hours.
The individual weights of automobiles crossing a
bridge measured in tons to nearest hundredth of a
ton.
Autos crossing that weigh more than 3,000 pounds.
A message is transmitted repeatedly until
transmission is successful.
Those messages transmitted 3 or fewer times.
Operations on Events
Because the sample space is a set, Ω, and any event is a subset A ⊂ Ω, we
form new events from existing events by using the usual set theory operations.
$A \cap B$ ⇒ Both $A$ and $B$ occur.
$A \cup B$ ⇒ At least one of $A$ or $B$ occurs.
$\overline{A}$ ⇒ $A$ does not occur.
$S \cap \overline{A} = S - A$ ⇒ $S$ occurs and $A$ does not occur.
$\emptyset$ ⇒ the empty set (a set that contains no elements).
$A \cap B = \emptyset$ ⇒ $A$ and $B$ are "mutually exclusive."
$A \subset B$ ⇒ Every element of $A$ is an element of $B$; that is, if $A$ occurs, $B$ occurs.
Review Venn diagrams (in text).
Example
Four bits are transmitted over a digital communications channel. Each bit is
either distorted or received without distortion. Let Ai denote the event that
the ith bit is distorted, i = 1, 2,3, 4.
(a) Describe the sample space.
(b) What is the event $A_1$?
(c) What is the event $A_1 \cup A_2$?
(d) What is the event $A_1 \cap A_2$?
(e) What is the event $A_1'$?
Venn Diagrams
Identify the following events (for the events $A$, $B$, $C$ shown in the Venn diagram):
(a) $A'$
(b) $A \cap B$
(c) $(A \cap B) \cup C$
(d) $(B \cup C)'$
(e) $(A \cap B)' \cup C$
Mutually Exclusive &
Collectively Exhaustive
A collection of events $A_1, A_2, \ldots$ is said to be mutually exclusive if
$A_i \cap A_j = \begin{cases} \emptyset & \text{if } i \ne j \\ A_i & \text{if } i = j. \end{cases}$
A collection of events is collectively exhaustive if $\bigcup_i A_i = \Omega$.
A collection of events forms a partition of $\Omega$ if they are mutually exclusive and collectively exhaustive. A collection of mutually exclusive events forms a partition of an event $E$ if $\bigcup_i A_i = E$.
Partition of $\Omega$
The sets $A_1, A_2, \ldots, A_{n-1}, A_n$ are "events." No two of them intersect (mutually exclusive) and their union covers the entire sample space.
Probability measure
We use a probability measure to represent
the relative likelihood that a random event
will occur.
The probability of an event A is denoted
P( A).
Axioms:
A1. For every event $A$, $P(A) \ge 0$.
A2. $P(\Omega) = 1$.
A3. If $A$ and $B$ are mutually exclusive, then $P(A \cup B) = P(A) + P(B)$.
A4. If the events $A_1, A_2, \ldots$ are mutually exclusive, then
$P\left[\bigcup_{n=1}^{\infty} A_n\right] = \sum_{n=1}^{\infty} P(A_n).$
Theorem:
Given a sample space, Ω, a "well-defined" collection of events,
F, and a probability measure, P, defined on these events then
the following hold:
(a) $P(\emptyset) = 0.$
(b) $P[A] = 1 - P[\overline{A}], \ \forall A \in F.$
(c) $P[A \cup B] = P[A] + P[B] - P[A \cap B], \ \forall A, B \in F.$
(d) $A \subset B \Rightarrow P[A] \le P[B], \ \forall A, B \in F.$
You must “know” these and be able to use them to solve
problems. Don’t worry about proving them.
Applying the Theorem
We roll 1 die and obtain one of the numbers 1 through 6 with equal probability.
(a) What is the probability that we obtain a 7?
The event we want is ∅; thus, the probability is 0.
(b) What is the probability that we do NOT get a 1?
The event we want is $\{1\}'$ or $\Omega \setminus \{1\}$, and $P[\{1\}'] = 1 - P[\{1\}] = \frac{5}{6}$.
(c) What is the probability that we get a 1 or a 3?
$P[\{1\} \cup \{3\}] = P[\{1\}] + P[\{3\}] = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}.$
(d) If $E = \{1,4,5,6\}$ and $E \subset G$, what might the event $G$ be?
$G = \{1,2,4,5,6\}$, $G = \{1,3,4,5,6\}$, $G = \{1,2,3,4,5,6\}$, or $G = E$.
Note that in all of these cases $P[E] \le P[G]$.
Assigning Discrete Probabilities
When there are exactly $n$ possible outcomes of an experiment, $x_1, x_2, \ldots, x_n$, then the assigned probabilities, $p(x_i), i = 1,2,\ldots,n$, must satisfy the following:
(1) $0 \le p(x_i) \le 1, \ i = 1,2,\ldots,n.$
(2) $\sum_{i=1}^{n} p(x_i) = 1.$
If all of the outcomes have equal probability, then each $p(x_i) = \frac{1}{n}$; thus, the probability of any particular outcome on the roll of a fair die is $\frac{1}{6}$.
Suppose, however, we have a biased die for which a 4 is 3 times more likely than any other outcome. This implies that $p(1) = p(2) = p(3) = p(5) = p(6) = a$ (for example) and $p(4) = 3a$.
It follows that $8a = 1 \Rightarrow a = \frac{1}{8}$. Thus, $p(i) = \frac{1}{8}$ for $i \ne 4$, and $p(4) = \frac{3}{8}$.
What is the probability of a “full house”?
In discrete problems we interpret probability as a ratio:
$\dfrac{\text{number of successful outcomes}}{\text{total number of outcomes}} = \dfrac{\text{successes}}{\text{successes} + \text{failures}}.$
In this case the number of successful outcomes is the number of ways to get a full house (3,744). The total number of outcomes is:
$\binom{52}{5} = \dfrac{52^{\underline{5}}}{5!} = \dfrac{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48}{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 2{,}598{,}960.$
Thus, the probability of getting a full house is $\dfrac{3{,}744}{2{,}598{,}960} = 0.00144.$
This is an example of a hypergeometric distribution; we'll study this soon.
A full house happens about once in every 694 hands! This is why
people invented wild cards.
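To make the counting concrete, here is a minimal Python check of both the count and the probability (a sketch; `comb` is from the standard library):

```python
from math import comb

full_houses = comb(13, 1) * comb(4, 3) * comb(12, 1) * comb(4, 2)
hands = comb(52, 5)
print(full_houses, hands, full_houses / hands)
# 3744 2598960 0.00144... (about 1 in 694 hands)
```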
Conditional Probability
The conditional probability of $A$ given that $B$ has occurred is
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}, \ \text{provided } P(B) \ne 0.$
Q1: What is the probability of obtaining
a total of 8 when rolling two dice?
Q2: Suppose you roll two dice that you
cannot see. Someone tells you that the
sum is greater than 6. What is the
probability that the sum is 8?
Dice Problem
Let A be the event of getting 8 on the roll of two dice. Let B be the event
that the sum of the two dice is greater than 6. The first question is Find P( A).
Here is the sample space:
(1,1) (1,2 )(1,3)(1, 4 )(1,5 ) (1, 6 )
(2,1) ( 2,2 )( 2,3)( 2, 4 )( 2,5 )( 2, 6 )
(3,1) ( 3,2 )( 3,3)( 3, 4 )( 3,5 )( 3, 6 )
(4,1) ( 4,2 )( 4,3)( 4, 4 )( 4,5 )( 4, 6 )
(5,1) ( 5,2 )( 5,3)( 5, 4 )( 5,5 )( 5, 6 )
(6,1) ( 6,2 )( 6,3)( 6, 4 )( 6,5 )( 6, 6 )
The outcomes with sum = 8 are (2,6), (3,5), (4,4), (5,3), (6,2); thus
$P(A) = \frac{5}{36},$
because we are assuming that each outcome pair has the same probability, 1/36.
Dice Problem (conditional)
Here we roll the dice and learn that the sum is greater than 6. Let B
represent the event that the sum is greater than 6. With this
knowledge the sample space becomes the following:
(1, 6 )
( 2,5)( 2, 6 )
( 3, 4 )( 3,5)( 3, 6 )
( 4,3)( 4, 4 )( 4,5)( 4, 6 )
( 5,2 )( 5,3)( 5, 4 )( 5,5)( 5, 6 )
(6,1) ( 6,2 )( 6,3)( 6, 4 )( 6,5 )( 6, 6 )
It follows that $P(A \mid B) = \frac{5}{21}$.
Alternatively, by definition of conditional probability, we have
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{P(A)}{P(B)}$ because $A \cap B = A$. (Note well!)
Furthermore,
$\dfrac{P(A)}{P(B)} = \dfrac{5/36}{21/36} = \dfrac{5}{21}.$ So…the definition makes sense!
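A brute-force enumeration over the 36 equally likely outcomes confirms both answers; a minimal sketch:

```python
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely pairs
A = [o for o in outcomes if sum(o) == 8]          # sum equals 8
B = [o for o in outcomes if sum(o) > 6]           # sum greater than 6
print(len(A) / len(outcomes))                     # P(A)   = 5/36
print(len([o for o in A if o in B]) / len(B))     # P(A|B) = 5/21
```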
Try this.
A university has 600 freshmen, 500 sophomores, and 400 juniors.
80 of the freshmen, 60 of the sophomores, and 50 of the juniors
are Computer Science majors. For this problem assume there are
NO seniors.
What is the probability that a student, selected at random, is
a freshman or a CS major (or both)?
If a student is a CS major, what is the probability he/she is a
sophomore?
Use these steps to solve previous slide
1. What is the sample space?
2. What are the events (subsets) of interest?
3. What are the probabilities of the events of interest?
4. What is the answer to the problem?
Alternate Form
We have seen that the conditional probability of event $A$ given that event $B$ has occurred is $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, provided $P(B) \ne 0$.
Clearly this implies that $P(A \cap B) = P(A \mid B)\,P(B)$. This is referred to as the "multiplication rule," and holds even when $P(B) = 0$. Notice that we could also write $P(A \cap B) = P(B \mid A)\,P(A)$. Both these equations always hold for any two events. But there is a special case where the conditional probabilities above are not needed.
Note: memorize these conditional probability equations TODAY. They are
extremely important.
Independent Events
Two events $A$ and $B$ are independent iff $P(A \cap B) = P(A)\,P(B)$.
Example (dice)
Q1: If one die is rolled twice, is the probability of getting a 3 on the first roll independent of the probability of getting a 3 on the second roll?
Q2: If one die is rolled twice, is the probability that their sum is greater than 5 independent of the probability that the first roll produces a 1?
Dice sample spaces (Q1)
The sample space associated with one roll of a die: $\Omega_1 = \{1,2,3,4,5,6\}$.
Unless otherwise stated we assume the die is fair so that the probability of any one of the simple events is $\frac{1}{6}$. The sample space associated with two rolls of one die (or with one roll of a pair of dice):
(1,1) (1,2 )(1,3)(1, 4 )(1,5 )(1, 6 )
(2,1) ( 2,2 )( 2,3)( 2, 4 )( 2,5 )( 2, 6 )
(3,1) ( 3,2 )( 3,3)( 3, 4 ) ( 3, 5 )( 3, 6 )
(4,1) ( 4,2 )( 4,3)( 4, 4 )( 4,5 )( 4, 6 )
(5,1) ( 5,2 )( 5,3)( 5, 4 )( 5,5 )( 5, 6 )
(6,1) ( 6,2 )( 6,3)( 6, 4 )( 6,5 )( 6, 6 )
Clearly $P(3,3) = \frac{1}{36}$, and $P(3 \text{ on first roll}) = P(3 \text{ on second roll}) = \frac{1}{6}$.
Because $\frac{1}{6} \times \frac{1}{6} = \frac{1}{36}$, the two events are independent.
Dice: Q2
The probability of getting a 1 on the first die is $\frac{1}{6}$. Let $G5$ be the event that the sum of the two dice is greater than 5 and $F1$ be the event that the first roll produces a 1. The sample space is (the outcomes in $F1 \cap G5$ are (1,5) and (1,6)):
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
$P[F1 \cap G5] = \frac{2}{36} = \frac{1}{18}.$
$P[G5] = \frac{26}{36} = \frac{13}{18}.$
$P[F1] = \frac{1}{6}.$
$P[F1 \cap G5] = \frac{1}{18} \ne P[F1]\,P[G5] = \frac{1}{6} \times \frac{13}{18} = \frac{13}{108}.$
Thus, these two events are NOT independent.
Practice Quiz 1 – Explain your work as you have been taught in class.
1. A university has 600 freshmen, 500 sophomores, and 400 juniors. 80 of the freshmen, 60 of the sophomores, and 50 of the juniors are Computer Science majors. For this problem assume there are NO seniors. If a student is a CS major, what is the probability that he/she is a Junior?
2. Evaluate $\sum_{i=3}^{\infty} \left(\frac{1}{4}\right)^i$.
3. What is the probability of drawing 2 pairs in a draw of 5 cards from a standard deck of 52 cards? (A pair is two cards of the same denomination – such as two aces, two sixes, or two kings.)
Multiplication and Total Probability
Rules*
Multiplication Rule
*This slide from Applied Statistics and Probability for Engineers, 3rd Ed., by Douglas C. Montgomery and George C. Runger, John Wiley & Sons, Inc., 2006
Multiplication and Total Probability
Rules*
*This slide from Applied Statistics and Probability for Engineers, 3rd Ed., by Douglas C. Montgomery and George C. Runger, John Wiley & Sons, Inc., 2006
Multiplication and Total Probability
Rules*
Total Probability Rule
Partitioning an event into two
mutually exclusive subsets.
Partitioning an event into several
mutually exclusive subsets.
*This slide from Applied Statistics and Probability for Engineers, 3rd Ed., by Douglas C. Montgomery and George C. Runger, John Wiley & Sons, Inc., 2006
Problem 2-97a
A batch of 25 injection-molded parts
contains 5 that have suffered excessive
shrinkage. If two parts are selected at
random, and without replacement, what is
the probability that the second part
selected is one with excessive shrinkage?
S = {pairs (f,s) of first-selected, second-selected taken from 25 total with 5 defects}
SD = {second selected (no replacement) is a defect}
FD = {first selected is a defect}
FN = {first selected is not a defect}.
Problem Solution
We seek $P[SD] = P[SD \cap FD] + P[SD \cap FN].$
This becomes $P[SD \mid FN]\,P[FN] + P[SD \mid FD]\,P[FD]$
$= \frac{5}{24} \cdot \frac{4}{5} + \frac{4}{24} \cdot \frac{5}{25} = \frac{1}{6} + \frac{1}{30} = \frac{1}{5} = 0.2.$
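A quick exact check with fractions (a sketch using the standard library):

```python
from fractions import Fraction as F

p_fd = F(5, 25)                             # first selected is a defect
p_fn = 1 - p_fd                             # first selected is not a defect
p_sd = F(4, 24) * p_fd + F(5, 24) * p_fn    # total probability rule
print(p_sd)                                 # 1/5
```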
Multiplication and Total Probability
Rules*
Total Probability Rule (multiple events)
*This slide from Applied Statistics and Probability for Engineers, 3rd Ed., by Douglas C. Montgomery and George C. Runger, John Wiley & Sons, Inc., 2006
Total Probability Example
A semiconductor manufacturer has the following data regarding
the effect of contaminants on the probability that chips fail.
Level of Contamination    Probability of Failure
High                      0.1
Medium                    0.01
Low                       0.001
In a particular production run 20% of the chips have high-level, 30%
have medium-level, and 50% have low-level contamination. What is
the probability that one of the resulting chips fails?
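By the total probability rule, P(fail) = 0.1(0.2) + 0.01(0.3) + 0.001(0.5). A minimal numeric check:

```python
levels = {"high":   (0.20, 0.100),
          "medium": (0.30, 0.010),
          "low":    (0.50, 0.001)}   # (P(level), P(fail | level))

p_fail = sum(p_level * p_fail_given for p_level, p_fail_given in levels.values())
print(p_fail)  # 0.0235
```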
Bernoulli Trials
“Consider an experiment that has two possible outcomes,
success and failure. Let the probability of success be p and
the probability of failure be q where p+q=1. Now consider
the compound experiment consisting of a sequence of n
independent repetitions of this experiment. Such a
sequence is known as a sequence of Bernoulli Trials.”
The probability of obtaining exactly $k$ successes in a sequence of $n$ Bernoulli trials is the binomial probability
$p(k) = \binom{n}{k} p^k q^{n-k}.$
Note that the sum of the probabilities $\sum_k p(k) = 1$. Thus they are said to form a probability distribution.
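The binomial pmf is easy to compute directly; a minimal sketch (helper name ours):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n Bernoulli trials with success prob p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# e.g. 10 coin tosses: the probabilities sum to 1
print(sum(binom_pmf(k, 10, 0.5) for k in range(11)))  # 1.0
print(binom_pmf(5, 10, 0.5))                          # 0.24609375
```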
Probability Distribution
When we take a discrete or countable sample space $\Omega = \{s_1, s_2, \ldots\}$ and assign probabilities to each of the possible simple events: $P(\{s_1\}) = p_1, P(\{s_2\}) = p_2, \ldots$, we have created a probability distribution. (Think that you have "distributed" all of the probability over all possible events.) As an example, if I toss a coin one time then $P(H) = \frac{1}{2}$ and $P(T) = \frac{1}{2}$ represents a probability distribution.
The single coin toss distribution also is an example of a Bernoulli trial because it has only two possible outcomes (generally called "success" or "failure").
Binomial Probability Distribution
The binomial probabilities are defined by $p(k) = \binom{n}{k} p^k q^{n-k}$, where $p$ is the probability of success and $q$ is the probability of failure in $n$ Bernoulli trials. Suppose we toss a coin 10 times and we want the total number of heads. Then $p = P[H]$, $q = P[T]$, $n = 10$. Using the above formula we obtain the probabilities:
[Figure: bar chart of the binomial pmf for n = 10, p = 0.5; probabilities for k = 0,…,10, peaking at about 0.25 at k = 5.]
Regarding Parameters
Notice that the binomial distribution is completely defined by the formula for its probabilities, $p(k) = \binom{n}{k} p^k q^{n-k}$, and by its "parameters" $p$ and $n$. The binomial probability equation never changes, so we regard a binomial distribution as being defined by its parameters. This is typical of all probability distributions (using their own parameters, of course).
One of the problems we often face in statistics is estimating the parameters after collecting data that we know (or believe) comes from a particular probability distribution (such as the $p$ and $n$ for the binomial). Alternatively, we may choose to estimate "statistics" such as mean and variance that are functions of these parameters. We'll get to this after we consider random variables and the continuous sample space.
Example (from Trivedi*)
Consider a binary communication channel transmitting coded words of n bits
each. Assume that the probability of successful transmission of a single bit is
p and that the probability of an error is q = 1 − p. Assume also that the code
is capable of correcting up to e errors, where e ≥ 0. If we assume that the
transmission of successive bits is independent, then the probability of successful word transmission is:
$P_w = P[e \text{ or fewer errors in } n \text{ trials}] = \sum_{i=0}^{e} \binom{n}{i} q^i p^{n-i}.$
Notice that a "success" for the Binomial distribution here means getting an error, which has probability $q$.
*Probability and Statistics with Reliability, Queuing and Computer Science
Applications, 2nd Ed, Kishor S. Trivedi, J. Wiley & Sons, NY 2002
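A direct computation of $P_w$ is a one-liner; the sketch below uses illustrative parameter values ($n = 31$, $q = 0.01$, $e = 2$ are our assumptions, not from the slides):

```python
from math import comb

def p_word_success(n, q, e):
    """Probability a coded n-bit word is received successfully
    when up to e bit errors can be corrected (bit-error prob q)."""
    return sum(comb(n, i) * q**i * (1 - q)**(n - i) for i in range(e + 1))

print(p_word_success(n=31, q=0.01, e=2))  # illustrative values
```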
Example
A communications network is being shared by 100 workstations. Time is
divided into intervals that are 100 ms long. One and only one workstation
may transmit during one of these time intervals. When a workstation is
ready to transmit, it will wait until the beginning of the next 100ms time
interval before attempting to transmit. If more than one workstation is ready
at that moment, a collision occurs; and each of the k ready workstations waits
a random amount of time before trying again. If k = 1, then transmission is
successful. Suppose the probability of a workstation being ready to transmit
is p. Show how probability of collision varies as p varies between 0 and 0.1.
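With 100 independent workstations each ready with probability p, the number ready in an interval is binomial with n = 100, so P(collision) = P(2 or more ready). A minimal sketch of the requested sweep:

```python
def p_collision(p, n=100):
    """P(2 or more of n stations are ready) for ready-probability p."""
    p_none = (1 - p) ** n
    p_one = n * p * (1 - p) ** (n - 1)
    return 1 - p_none - p_one

for p in [0.0, 0.02, 0.04, 0.06, 0.08, 0.10]:
    print(f"p={p:.2f}  P(collision)={p_collision(p):.4f}")
```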
Practice Quiz 2
A partial deck of playing cards (fewer than 52 cards)
contains some spades, hearts, diamonds, and clubs (NOT 13
of each “suit”). If a card is drawn at random, then the
probability that it is a spade is 0.2. We write this as
P[Spade]=0.2. Similarly, P[Heart]=0.3, P[Diamond]=0.25,
P[Club]=0.25. Each of the 4 suits has some number of
“face” cards (King, Queen, Jack). If the drawn card is a
spade, the probability is 0.25 that it is a face card. If it is
a heart, the probability is 0.25 that it is a face card. If it
is a diamond, the probability is 0.2 that it is a face card.
If it is a club, the probability is 0.1 that it is a face card.
1. What is the probability that the randomly drawn card is a face card?
2. What is the probability that the card is a Heart and a face card?
3. If the card is a face card, what is the probability that it is a spade?
Discrete Random Variables
G. A. Marin
Review of “function”
Defn: A function is a set of ordered pairs such that no two pairs have the same
first element (unless they also have the same second element).
Example: $g = \{(1,2), (3,5), (5,12)\}$ defines a function, $g$, whose "domain" consists of the real numbers 1, 3, 5 and whose "range" consists of the numbers 2, 5, 12. All functions are said to "map" values in their domain to values in their range.
Example: f ( x) = x 2 + 5. Here a function is defined using a formula.
This actually implies that the function is $f = \{(x, x^2 + 5) : x \text{ is a real number}\}$.
Notice the following:
(a) The function has a "name." Here that name is f .
(b) The implied domain of the function includes all real numbers, x, that
can be plugged into the formula. In this case that includes all real no's.
(c) Every number x in the domain (all reals) is "mapped" to the number
x 2 +5. Thus f (1) = 6, f (−5) = 30, f (π ) = π 2 +5.
(d) Sometimes we write this as 1 → 6, -5 → 30, π → π 2 +5.
(e) The range of f is { x : x ≥ 5} .
Random Variable
Definition: A random variable
X on a
sample space Ω is a function that assigns a
real number x to each sample point s ∈ Ω.
The inverse image of a value $x$ is the set of all points in $\Omega$ that the random variable $X$ maps to the value $x$. It is denoted $A_x = \{s \in \Omega \mid X(s) = x\}$.
Ω discrete ⇒ X discrete
Ω continuous ⇒ X continuous
Random Variable
$X: \Omega = \{s_1, s_2, s_3, \ldots\} \to \mathbb{R} = (-\infty, \infty)$
We write $X(s) = x$, where $s \in \Omega$ and $x \in \mathbb{R}$.
We define $A_x$ as the set of all points in $\Omega$ that "map" into the value $x \in \mathbb{R}$. Sometimes we write $A_x = X^{-1}(x)$ and state that $A_x$ is the "inverse image" of the value $x$ under the random variable $X$. For discrete random variables, we then define the probability of the value $x$ to equal $P[A_x]$:
$p_X(x) = P[A_x].$
OR we may be given a discrete (continuous later) random variable, a description
of the values it can produce and the probability of each value. For example,
For k = 1, 2,..., n, P [ X = k ] = pk . In this case we need not know what the
underlying experiment really is.
The role of a random variable
Experiment 1: Roll 1 fair die and determine the outcome.
Experiment 2: Spin an arrow that lands with equal probability on one of
the numbers 1 through 6.
Experiment 3: You have 6 cards numbered 1 through 6. Shuffle them
and draw one at random. Replace the card and reshuffle to repeat.
Notice that we’d represent the sample space of
each of these as {1,2,3,4,5,6} usually without
drawing dice or arrows or cards, but the sample
spaces really include dice, arrows, cards.
For each probability distribution let $p_i = \frac{1}{6}$ for $i = 1,2,\ldots,6$.
The importance of the random variable is that it lets us deal with such an experimental setup without thinking dice, arrows, or cards. We say: "Let $X$ be a random variable that takes on the discrete values 1,2,3,4,5,6. Its probability mass function is given as: the probability that $X = i$ is 1/6." We write this as
$p_X(i) = \frac{1}{6} \ \text{ for } i = 1,2,\ldots,6.$
Probability Mass Function
If X is a discrete random variable, then its probability mass function (pmf)
is given by:
$p_X(x) = P(X = x) = P(A_x) = \sum_{s \in A_x} P(s).$
The pmf satisfies the following properties:
(p1) $0 \le p_X(x) \le 1$ for all values $x$ such that $P[X = x]$ is defined.
(p2) If $X$ is a discrete random variable then $\sum_i p_X(x_i) = 1$, where the set $\{x_1, x_2, \ldots\}$ includes all real numbers $x$ such that $p_X(x) \ne 0$.
Note: you cannot define a pmf without first defining a random
variable. You can, however, define a probability distribution directly
on a sample space with no random variable defined.
Discrete RV Example 1
Let the sample space $\Omega$ represent all possible outcomes of a roll of one die; thus, $\Omega = \{1,2,3,4,5,6\}$. We define the random variable $X$ on this sample space as follows:
$X(i) = \begin{cases} 1 & \text{if } i = 1,2 \\ 0 & \text{if } i = 3,4,5,6. \end{cases}$
Because the probability of rolling a 1 or 2 is $\frac{1}{3}$, we define $X$'s probability mass function as
$p_X(i) = \begin{cases} \frac{1}{3} & \text{if } i = 1 \\ \frac{2}{3} & \text{if } i = 0. \end{cases}$
$X$ has a Bernoulli distribution. Alternatively, we could just write that $p_X(1) = \frac{1}{3}$ and $p_X(0) = \frac{2}{3}$, or we could define the pmf using a table:
pmf of X
Value  Prob
0      2/3
1      1/3
Discrete RV Example 2
A die is tossed until the occurrence of the first 6. Let the random variable
X = k if the first 6 occurs on the kth roll for integer k > 0. What is the
probability mass function (pmf) for X ?
In order for the first 6 to occur on the 5th toss, for example, we must have the event AAAA6 occur, where A means any result other than 6. Clearly, these represent a sequence of 5 Bernoulli trials where success = 6 and failure = 1 through 5. Each trial is independent; thus, the probability of this particular result is
$\left(\tfrac{5}{6}\right)^4 \left(\tfrac{1}{6}\right) = \tfrac{625}{7776} = 0.08.$
Similarly, the probability of the first 6 on the $k$th roll is $\left(\tfrac{5}{6}\right)^{k-1}\left(\tfrac{1}{6}\right)$. This defines the pmf,
$p_X(k) = \left(\tfrac{5}{6}\right)^{k-1}\left(\tfrac{1}{6}\right).$
This is a particular instance of the geometric distribution.
Useful “die” illustrations
(1) Roll a die once and the probability of getting any one number (choose one of six) is 1/6. The uniform distribution for $X$: $p_X(k) = \frac{1}{6}, \ k = 1,2,\ldots,6$.
(2) Roll a die $n$ times and count the number of times, $k$, that you get, say, a 2. This is $\binom{n}{k}\left(\tfrac{1}{6}\right)^k\left(\tfrac{5}{6}\right)^{n-k}$. The binomial distribution for $X$: $p_X(k) = \binom{n}{k}\left(\tfrac{1}{6}\right)^k\left(\tfrac{5}{6}\right)^{n-k}, \ k = 0,1,\ldots,n$.
(3) Roll a die once, twice, ... until you get, say, a 2 for the first time. Suppose that the first time you get the 2 is on the $k$th roll. The probability of this is $\left(\tfrac{5}{6}\right)^{k-1}\tfrac{1}{6}$. The geometric distribution for $X$: $p_X(k) = \left(\tfrac{5}{6}\right)^{k-1}\tfrac{1}{6}, \ k = 1,2,\ldots$.
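These three pmfs are one-liners in code; a minimal sketch (helper names ours):

```python
from math import comb

uniform_pmf   = lambda k: 1 / 6                                       # k = 1..6
binomial_pmf  = lambda k, n: comb(n, k) * (1/6)**k * (5/6)**(n - k)   # k = 0..n
geometric_pmf = lambda k: (5/6)**(k - 1) * (1/6)                      # k = 1, 2, ...

print(binomial_pmf(2, 10))   # P(exactly two 2s in 10 rolls)
print(geometric_pmf(5))      # P(first 2 appears on roll 5)
```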
Probability of Sets & Intervals
For a discrete RV, X , and any set of real numbers, A, we can write:
P( X ∈ A) = ∑ pX ( xi ).
xi ∈A
If A = (a, b), we write: P( X ∈ A) = P(a<X <b).
If A = (a, b], we write: P( X ∈ A) = P(a<X ≤ b), etc.
For any real number $x$ the probability that the random variable $X$ takes a value in the interval $(-\infty, x]$ is especially important and is denoted as:
$F_X(x) = P(-\infty < X \le x) = P(X \le x) = \sum_{t \le x} p_X(t),$
where the last equality holds only for discrete RVs $X$. The function $F$ is called the cumulative distribution function (or just the distribution function) of $X$.
Simple cdf example
Let $X$ be a random variable with pmf given by:
$p_X(1) = \frac{1}{8}, \quad p_X(2) = \frac{2}{8}, \quad p_X(3) = \frac{5}{8}.$
Then the cdf, $F_X$, is given by:
$F_X(x) = \begin{cases} 0 & \text{for } x < 1 \\ \frac{1}{8} & \text{for } 1 \le x < 2 \\ \frac{3}{8} & \text{for } 2 \le x < 3 \\ 1 & \text{for } x \ge 3. \end{cases}$
NOTICE that F simply adds up the probability mass function's range values
as it gets to them (starting from -∞ and moving towards +∞). F starts at 0
(for a discrete random variable like this one) and adds up the probabilities
until it ends at 1. The function F is defined for ALL REAL NUMBERS.
(Its domain is all reals.) The range of F is always between 0 and 1. The
"meaning" of F is that "FX ( x) is the probability that X ≤ x." For a discrete
random variable the graph of FX is a step function.
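A cdf like this is easy to build from any discrete pmf; a minimal sketch (the helper name `make_cdf` is ours):

```python
def make_cdf(pmf):
    """Return a step-function cdf from a dict {value: probability}."""
    def F(x):
        return sum(p for v, p in pmf.items() if v <= x)
    return F

F = make_cdf({1: 1/8, 2: 2/8, 3: 5/8})
print(F(0.5), F(1), F(2.7), F(10))  # 0.0 0.125 0.375 1.0
```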
Cumulative Distribution Function Properties
Important: P(a < X ≤ b) = FX (b) − FX (a).
(F1) $0 \le F_X(x) \le 1$.
(F2) $F(x)$ is an increasing function of $x$.
(F3) $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1$.
(F4) For discrete $X$ that has positive probability only at the values $x_1, x_2, \ldots$, $F$ has a positive jump at $x_i$ equal to $p_X(x_i)$ and takes a constant value in the interval $[x_{i-1}, x_i)$. Thus, it graphs a step function.
Cumulative distribution functions of discrete RVs grow only by jumps, and
cumulative distribution functions of continuous RVs have no jumps. A RV
is said to be of mixed type if it has continuous intervals plus jumps.
Bernoulli Distribution
The RV $X$ is Bernoulli (or has a Bernoulli distribution) if its pmf is given by $p_0 = p_X(0) = q$ and $p_1 = p_X(1) = p$, where $p + q = 1$.
The corresponding cdf is given by:
$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ q & \text{for } 0 \le x < 1 \\ 1 & \text{for } x \ge 1. \end{cases}$
Example: Roll a die once. Let $X = 1$ if the result is 1 or 2. Let $X = 0$ otherwise. This is a Bernoulli trial with $p = 1/3$ and $q = 2/3$.
Bernoulli pmf
[Figure: bar chart of the Bernoulli pmf with p = 0.5; bars of height 0.5 at the values 0 and 1.]
Bernoulli cdf p=0.5
Write as: $F(x) = \begin{cases} 0 & \text{for } x < 0 \\ 0.5 & \text{for } 0 \le x < 1 \\ 1 & \text{otherwise.} \end{cases}$
Notice that the cdf is defined for all real numbers, x.
Discrete Uniform Distribution
Let $X$ be a random variable that can take any of $n$ values $x_1, x_2, \ldots, x_n$ with equal probability $\frac{1}{n}$. The RV $X$ is said to have a Discrete Uniform Distribution, and has pmf given by:
$p_X(x_i) = \begin{cases} \frac{1}{n} & \text{for } i = 1,2,\ldots,n \\ 0 & \text{otherwise.} \end{cases}$
If we let $X$ take on the integer values $1,2,\ldots,n$, then its distribution function is given by
$F_X(x) = \begin{cases} 0 & \text{for } x < 1 \\ \sum_{i=1}^{\lfloor x \rfloor} \frac{1}{n} = \frac{\lfloor x \rfloor}{n} & \text{for } 1 \le x \le n \\ 1 & \text{for } x > n. \end{cases}$
Discrete Uniform pmf n=10
[Figure: bar chart of the discrete uniform pmf with n = 10; each value 1,…,10 has probability 0.1.]
Discrete Uniform cdf n=10
$F_X(x) = \begin{cases} 0 & \text{for } x < 1 \\ \sum_{i=1}^{\lfloor x \rfloor} p_X(x_i) = \frac{\lfloor x \rfloor}{10} & \text{for } 1 \le x \le 10 \\ 1 & \text{otherwise.} \end{cases}$
Binomial Distribution
Let $Y_n$ denote the number of successes in $n$ Bernoulli trials. The pmf of $Y_n$ is given by:
$p_k = P(Y_n = k) = p_{Y_n}(k) = \begin{cases} \binom{n}{k} p^k (1-p)^{n-k} & \text{for } 0 \le k \le n, \ k \text{ an integer} \\ 0 & \text{otherwise.} \end{cases}$
The random variable $Y_n$ is said to have a binomial distribution if
$P[Y_n \le t] = F_{Y_n}(t) = \begin{cases} 0 & \text{for } t < 0 \\ \sum_{i=0}^{\lfloor t \rfloor} \binom{n}{i} p^i (1-p)^{n-i} & \text{for } 0 \le t \le n \\ 1 & \text{for } t > n. \end{cases}$
Example: Toss a coin 10 times and count the total number of heads. This is binomial with $p = 0.5$.
Probability Mass Function
[Figure: bar chart of the binomial pmf for n = 10, p = 0.5; peak ≈ 0.25 at k = 5.]
Binomial cdf n=10 p=0.5
$P[Y_n \le t] = F_{Y_n}(t) = \begin{cases} 0 & \text{for } t < 0 \\ \sum_{i=0}^{\lfloor t \rfloor} \binom{10}{i} p^i (1-p)^{10-i} & \text{for } 0 \le t \le 10 \\ 1 & \text{otherwise.} \end{cases}$
Geometric Distribution
Consider any arbitrary sequence of Bernoulli trials and let $Z$ be the number of trials up to and including the first success. $Z$ is said to have a geometric distribution with pmf given by
$p_Z(i) = q^{i-1} p \ \text{ for } i = 1,2,\ldots, \text{ where } p + q = 1.$
This is well-defined because $\sum_{i=1}^{\infty} p q^{i-1} = \frac{p}{1-q} = 1$.
The distribution function of $Z$ is given by
$F_Z(t) = \begin{cases} 0 & \text{for } t < 1 \\ \sum_{i=1}^{\lfloor t \rfloor} p(1-p)^{i-1} = 1 - (1-p)^{\lfloor t \rfloor} & \text{for } t \ge 1. \end{cases}$
Example: See the previous example concerning rolling 1 die until a 6 occurs.
Geometric pmf p=0.5
[Figure: bar chart of the geometric pmf with p = 0.5 at trials 1,…,10; probabilities 0.5, 0.25, 0.125, … .]
Geometric cdf p=0.5
$F_Z(t) = \begin{cases} 0 & \text{for } t < 1 \\ \sum_{i=1}^{\lfloor t \rfloor} p(1-p)^{i-1} = 1 - (1-p)^{\lfloor t \rfloor} & \text{for } t \ge 1. \end{cases}$
Poisson Distribution
A random variable, X t , has a Poisson Distribution with
parameter α >0 if its pmf is given by:
$P(X_t = k) = \dfrac{(\alpha t)^k e^{-\alpha t}}{k!} \ \text{ for } k = 0,1,\ldots \text{ and } t \ge 0.$ (A distinct RV for each $t$.)
NOTE: The Poisson is typically used to model the number of jobs arriving
during time t in a time-share system, the arrival of calls at a switchboard,
the arrival of messages at a terminal, etc. The parameter α is then interpreted
as an arrival rate "per unit time." That is, if t is in seconds, then α must be
the average arrivals per second. (In our text the parameter is given as λ .)
The cumulative distribution function is:
$F_{X_t}(x) = \begin{cases} 0 & \text{for } x < 0 \\ \sum_{k=0}^{\lfloor x \rfloor} \dfrac{(\alpha t)^k e^{-\alpha t}}{k!} & \text{for } x \ge 0. \end{cases}$
Notice that in mathematical notation α does not typically appear on
the left-hand side even though the function is unspecified without it.
Packet Arrival Example
[Figure: timeline of packet arrivals divided into unit intervals with counts $X_1, X_2, \ldots, X_N$.]
Each of the random variables $X_1, X_2, \ldots, X_N$ has a Poisson distribution; thus,
$P(X_i = k) = \dfrac{\alpha^k e^{-\alpha}}{k!} \ \text{ for } i = 1,2,\ldots,n.$
If $Y_t$ represents the total number of arrivals during any time $t$, then
$P(Y_t = k) = \dfrac{(\alpha t)^k e^{-\alpha t}}{k!},$ per the previous slide.
Poisson pmf (x,3,1)
[Figure: bar chart of the Poisson pmf with α = 3 (t = 1) for k = 0,…,10; peak probability ≈ 0.22 at k = 2 and 3.]
Poisson cdf α = 3 and t = 1.
Poisson Example
Connections arrive at a switch at a rate of 11 per ms. The arrival distribution is Poisson.
(a) What is the probability that exactly 11 calls arrive in one ms? (b) What is the probability that exactly 100 calls arrive in 10 ms? (c) What is the probability that the number of calls arriving in 2 ms is greater than 7 and less than or equal to 10?
Let $X_t$ be the random variable giving the number of arrivals during $t$ ms. We know that $X_t$ has a Poisson distribution, which implies that $P[X_t = k] = \dfrac{(\alpha t)^k e^{-\alpha t}}{k!}$. The arrival rate is $\alpha = 11$ per ms; thus, $P[X_t = k] = \dfrac{(11t)^k e^{-11t}}{k!}$ with $t$ in ms.
(a) The probability of exactly 11 arrivals in one ms is $P[X_1 = 11] = \dfrac{11^{11} e^{-11}}{11!} = 0.119$.
(b) The probability of 100 calls in 10 ms is $P[X_{10} = 100] = \dfrac{(11 \times 10)^{100} e^{-11 \times 10}}{100!} = 0.025$.
(c) $P[7 < X_2 \le 10] = \sum_{k=8}^{10} \dfrac{(11 \times 2)^k e^{-11 \times 2}}{k!} = \dfrac{22^8}{e^{22}}\left(\dfrac{1}{8!} + \dfrac{22}{9!} + \dfrac{22^2}{10!}\right) = \dfrac{22^8}{e^{22}}\left(\dfrac{794}{10!}\right) = 0.003.$
Summary for Discrete Random Variable, X
To define a pmf when $X$ takes integer values write the following:
"The pmf is $p_X(k) =$ an expression often involving $k$, such as $\binom{n}{k} p^k (1-p)^{n-k}$."
(This means the probability $X = k$.)
Be sure to specify all possible values of $k$, such as "for integers $k = 1,2,\ldots,n$." Be sure to use the values of other parameters ($p$, $n$, $\alpha$, ...) that are correct for this particular problem.
To define a cdf write the following:
"The cdf is: $F_X(x) = \begin{cases} 0 & \text{for } x < \text{whatever min} \\ \sum_{k=1}^{\lfloor x \rfloor} (\text{the expression for } p_X(k)) & \text{for } 1 \le x \le n \\ 1 & \text{for } x > n. \end{cases}$"
(This means the probability $X \le x$.)
Practice Quiz 3
A college student phones his girlfriend once each night for three nights. The probability that he reaches her is $\frac{1}{3}$ anytime he calls. Suppose that the random variable $X$ equals the number of nights (out of three) that he is able to reach her.
1. What is the pmf for $X$? What is the name of $X$'s distribution?
2. What is the cdf for X ?
3. Now suppose that this student will phone once each night until the first night
that he is able to reach his girlfriend. Let Y be a random variable that equals
the number of nights that it takes him to reach her for the first time. (For example,
Y = 2 if she doesn't answer the first night but does answer the second night.)
What is the pmf for Y ? What is the name of Y ' s distribution?
Suppose values of X are not integers.
You may have to list each possible value and its probability. For example, suppose that $X = -\frac{1}{2}$ with probability $\frac{1}{6}$, $X = 1$ with probability $\frac{1}{3}$, and $X = \pi$ with probability $\frac{1}{2}$.
You can define the pmf in a table (this means the probability $X = x$):
x      p_X(x)
-1/2   1/6
1      1/3
π      1/2
The cdf for this example…
$F_X(x) = \begin{cases} 0 & \text{for } x < -\frac{1}{2} \\ \frac{1}{6} & \text{for } -\frac{1}{2} \le x < 1 \\ \frac{1}{2} & \text{for } 1 \le x < \pi \\ 1 & \text{for } x \ge \pi. \end{cases}$
(This means the probability $X \le x$.)
Mean and Variance of a Discrete
Random Variable
Definition
Working formula: $\mathrm{Var}(X) = E(X^2) - E^2(X).$
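The definition itself appears as a figure in the original slides; it is presumably the standard one, $\mu = E(X) = \sum_x x\,p_X(x)$ and $\sigma^2 = \mathrm{Var}(X) = \sum_x (x-\mu)^2\,p_X(x)$. A minimal sketch computing both from a pmf dictionary (helper name ours):

```python
def mean_var(pmf):
    """Mean and variance of a discrete RV given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    ex2 = sum(x**2 * p for x, p in pmf.items())
    return mu, ex2 - mu**2          # working formula: E(X^2) - E^2(X)

print(mean_var({k: 1/6 for k in range(1, 7)}))  # fair die: (3.5, ~2.917)
```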
Mean and Variance of a Discrete
Random Variable
Figure 3-5 A probability distribution can be viewed as a loading
with the mean equal to the balance point. Parts (a) and (b)
illustrate equal means, but Part (a) illustrates a larger variance.
Mean and Variance of a Discrete
Random Variable
Figure 3-6 The probability distribution illustrated in Parts (a)
and (b) differ even though they have equal means and equal
variances.
Example 3-11
Properties of E(X) and Var(X)
If X and Y are discrete random variables and a and b are real numbers, then
* E (aX ) = aE ( X )
* E ( X + Y ) = E ( X ) + E (Y ) and E ( aX + bY ) = aE ( X ) + bE (Y )
* Var ( aX ) = a 2Var ( X )
* Var ( X + Y ) = Var ( X ) + Var (Y ), only when X and Y are independent.
and Var ( aX + bY ) = a 2Var ( X ) + b 2Var (Y ) , only when X and Y are independent.
Continuing the example from previous slide:
E (5 X ) = 5(12.5) = 62.5
Var (5 X ) = 25(1.85) = 46.25
Expected Value vs Average
Suppose we take 10 playing cards numbered Ace, 2,3,4,5,6,7,8,9,10 and arrange
them randomly, face-down, on a table. If we choose one at random (and then replace
and reshuffle), it is equally likely that we get any value between 1 and 10. If the
random variable V gives the value obtained, then V has a discrete uniform distribution
on the integers 1 through 10. If we were asked, "What is the average face value of these 10 cards?", we would compute $\frac{1+2+3+4+5+6+7+8+9+10}{10} = 5.5$. If we're asked "What is the expected value of $V$?", we find $1\left(\frac{1}{10}\right) + 2\left(\frac{1}{10}\right) + \cdots + 10\left(\frac{1}{10}\right) = 5.5$.
In fact, for any random variable with discrete uniform distribution (like our "die")
the expected value is the same as the average of all possible values. This is NOT
TRUE for other distributions.
If we actually perform the experiment by drawing 10 times, we are not likely to get
each value exactly once. For example, we might draw 1,5,2,6,8,9,7,2,3,6. The
average of these outcomes is 4.9 - NOT 5.5. The more times we repeat the draw,
the closer we are likely to get to the expected average of 5.5. So you might think
of the expected value of a random variable as the value expected from averaging
many outcomes.
Binomial Mean and Variance
The mean of a binomial distribution with parameter $p$ is
$E(X) = \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k} = \sum_{k=1}^{n} k \binom{n}{k} p^k (1-p)^{n-k}.$
Let $m = k - 1$ to get
$E(X) = \sum_{m=0}^{n-1} (m+1) \binom{n}{m+1} p^{m+1} (1-p)^{n-1-m} = np \sum_{m=0}^{n-1} \binom{n-1}{m} p^m (1-p)^{n-1-m} = np,$
since the last sum is the total probability of a binomial distribution, which is 1.
Similar work will show that $\mathrm{Var}(X) = np(1-p)$, but there are much easier ways to show this.
Exercise
The interactive computer system at Gnu
Glue has 20 communication lines to the
central computer system. The lines
operate independently and the probability
that any particular line is in use is 0.6.
What is the probability that 10 or more
lines are in use?
What is the expected number of lines in
use?
What is the standard deviation of lines in
use?
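The number of lines in use is binomial with n = 20, p = 0.6, so the mean is np = 12, the standard deviation is √(np(1−p)) ≈ 2.19, and P(10 or more) is a tail sum. A minimal sketch:

```python
from math import comb, sqrt

n, p = 20, 0.6
pmf = lambda k: comb(n, k) * p**k * (1 - p)**(n - k)

print(sum(pmf(k) for k in range(10, n + 1)))  # P(10 or more in use) ~0.87
print(n * p)                                  # mean: 12.0
print(sqrt(n * p * (1 - p)))                  # std dev: ~2.19
```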
Binomial Revisited
Recall that the Binomial RV X = X 1 + X 2 + ... + X n where each X i has a
Bernoulli distribution and is mutually independent with the others.
Because E ( X i ) = p, it follows trivially that E ( X ) = np. Because of
independence we can write that $\mathrm{Var}(X) = \sum_{i=1}^{n} \mathrm{Var}(X_i)$ also, and $\mathrm{Var}(X_i) = E(X_i^2) - E^2(X_i) = p - p^2 = p(1-p)$.
It follows simply that Var ( X ) = np (1 − p ).
Poisson Mean and Variance
The mean of a Poisson distribution with parameter $\lambda > 0$ is
$E(X) = \sum_{k=0}^{\infty} k \frac{\lambda^k e^{-\lambda}}{k!} = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda.$
Similarly,
$E(X^2) = \sum_{k=0}^{\infty} k^2 \frac{\lambda^k e^{-\lambda}}{k!} = \lambda e^{-\lambda} \sum_{k=1}^{\infty} k \frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda} \left[\sum_{k=1}^{\infty} (k-1) \frac{\lambda^{k-1}}{(k-1)!} + \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!}\right]$
$= \lambda^2 e^{-\lambda} \sum_{k=2}^{\infty} \frac{\lambda^{k-2}}{(k-2)!} + \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda^2 + \lambda.$
It follows that $\mathrm{Var}(X) = E(X^2) - E^2(X) = \lambda^2 + \lambda - \lambda^2 = \lambda.$
NOTE: The mean and variance of the Poisson random variable $X_t$ is $\lambda t$ (or $\alpha t$).
Exercise
Suppose it has been determined that the
number of inquiries that arrive per second
at the central computer system can be
described by a Poisson random variable
with an average rate of 10 messages per
second. What is the probability that no
inquiries arrive in a 1-second period?
What is the probability that 15 or fewer
inquiries arrive in a 1-second period? What
are the mean and variance of the number
of arrivals in 1 second?
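With rate λ = 10 per second, P(0 arrivals) = e^(−10), the 15-or-fewer probability is a partial sum, and the mean and variance are both 10. A minimal check:

```python
from math import exp, factorial

lam = 10  # average inquiries per second
pmf = lambda k: lam**k * exp(-lam) / factorial(k)

print(pmf(0))                            # P(no inquiries) = e^-10 ~4.5e-05
print(sum(pmf(k) for k in range(16)))    # P(15 or fewer)  ~0.95
print(lam, lam)                          # mean and variance are both 10
```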
Geometric Mean and Variance
∞
E ( X ) = ∑ kp (1 − p)
k =1
k −1
∞
= p ∑ k (1 − p ) k −1.
k =1
∞
∞
1
1
k −1
′
Write s ( p ) = ∑ (1 − p ) = . Then s ( p) = −∑ k (1 − p ) = − 2 .
p
p
k =0
k =1
k
⎛ 1 ⎞ 1
Thus, E ( X ) = p ⎜ 2 ⎟ = .
⎝p ⎠ p
Homework: Use similar technique to show that Var ( X ) =
(1 − p)
.
2
p
Discrete Uniform Distribution
Let $X$ be a random variable with a discrete uniform distribution on the integers $1,2,\ldots,n$. Then
$E(X) = \sum_{k=1}^{n} \frac{k}{n} = \frac{1}{n} \sum_{k=1}^{n} k = \frac{n(n+1)}{2n} = \frac{n+1}{2}.$
Similarly,
$E(X^2) = \sum_{k=1}^{n} \frac{k^2}{n} = \frac{1}{n} \sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6n} = \frac{(n+1)(2n+1)}{6}.$
Therefore,
$\mathrm{Var}(X) = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{2(n+1)(2n+1) - 3(n+1)^2}{12} = \frac{(n+1)(4n+2-3n-3)}{12} = \frac{(n+1)(n-1)}{12} = \frac{n^2-1}{12}.$
Example: Let $X$ be uniformly distributed on $1,2,\ldots,10$. Then $E(X) = \frac{11}{2}$ and $\mathrm{Var}(X) = \frac{99}{12} = \frac{33}{4}$.
Hypergeometric Distribution
Suppose that a set of $n$ objects includes $k$ objects of type 1 (successes?) and $n-k$ objects of type 0 (failures perhaps?). A sample of size $m$ is selected from the $n$ objects "without replacement," where $m \le n$ (and $k \le n$). Let $X$ be the random variable that denotes the number of type 1 objects in the sample. Then $X$ is said to be a hypergeometric random variable and its pmf is given by:
$p_X(i) = \begin{cases} \dfrac{\binom{k}{i}\binom{n-k}{m-i}}{\binom{n}{m}} & \text{for } i = \max\{0, m+k-n\} \text{ to } \min\{k, m\} \\ 0 & \text{otherwise.} \end{cases}$
Values of $i$ (examples):
(1) $n = 20$, $k = 5$ (type 1), $m = 3$ (sample size) $\Rightarrow i = 0,1,\ldots,3$
(2) Same as (1) but $m = 7 \Rightarrow i = 0,1,\ldots,5$
(3) Same as (1) but $m = 17 \Rightarrow i = 2,3,\ldots,5$.
Text problem 3-101
A company employs 800 men under the age of 55. Suppose that 30% carry a marker on the male chromosome that indicates an increased risk for high blood pressure.
(a) If 10 men in the company are tested for the marker in this chromosome, what is the probability that exactly 1 man has the marker?
Answer: Notice that this is certainly sampling without replacement. (We don't put the first man back into the pool before we draw the second one.) Let $X$ be the number of men that have the marker in a sample of size 10. $X$ is hypergeometric. Thus,
$p_X(1) = \dfrac{\binom{240}{1}\binom{560}{9}}{\binom{800}{10}} = 0.12.$
Text problem 3-101 Continued
(b) If 10 men are tested for the marker, what is the probability that more than 1 has the marker?
Answer: Out of 10 the number with the marker can be $0,1,2,\ldots,10$ (because a total of 240 have the marker). Thus,
$p_X(i) = \begin{cases} \dfrac{\binom{240}{i}\binom{560}{10-i}}{\binom{800}{10}} & i = 0,1,\ldots,10 \\ 0 & \text{otherwise.} \end{cases}$
The answer is either $\sum_{i=2}^{10} p_X(i)$ or $1 - \sum_{i=0}^{1} p_X(i) = 0.852.$
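A direct computation (a minimal sketch; helper name ours):

```python
from math import comb

def hyper_pmf(i, n=800, k=240, m=10):
    """P(i type-1 objects in a sample of m drawn without replacement)."""
    return comb(k, i) * comb(n - k, m - i) / comb(n, m)

print(hyper_pmf(1))                     # (a) ~0.12
print(1 - hyper_pmf(0) - hyper_pmf(1))  # (b) ~0.852
```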
Mean and Variance of Hypergeometric
If $X$ is a hypergeometric random variable with parameters $n$ (total objects), $k$ (number of type 1 objects), and $m$ (sample size), then $\mu = E(X) = mp$ and
$\sigma^2 = \mathrm{Var}(X) = mp(1-p)\left(\dfrac{n-m}{n-1}\right),$
where $p = \dfrac{k}{n}$ (the proportion of type 1 objects in the total).
Example: In the previous problem $E(X) = 10\left(\frac{240}{800}\right) = 3$ and
$\mathrm{Var}(X) = 10\left(\frac{240}{800}\right)\left(1 - \frac{240}{800}\right)\left(\frac{800-10}{799}\right) = 2.076.$
Continuous Random Variables and
Moments of Random Variables
G. A. Marin
For educational purposes only. No
further distribution authorized.
Continuous Cumulative “Distribution Function”
The CDF $F_X$ of a random variable $X$ is defined to be the function
$F_X(x) = P(X \le x), \quad -\infty < x < \infty.$
The subscript is dropped if there is no ambiguity.
A continuous random variable is characterized by a distribution function that is a continuous function of $x$ for all $x \in \mathbb{R}$. If the distribution function has a derivative at all except, possibly, a finite number of points, then the random variable is said to be absolutely continuous.
Example:
$F_X(x) = \begin{cases} 0, & x < 0 \\ x, & 0 \le x < 1 \\ 1, & x \ge 1. \end{cases}$
(This means the probability $X \le x$.)
Properties of CDF
* $0 \le F(x) \le 1, \ -\infty < x < \infty$
* $F(x)$ is an increasing function of $x$.
* $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1.$
Probability “Density Function”
For a continuous (differentiable) random variable $X$, $f(x) = \dfrac{dF(x)}{dx}$ is called the probability density function (pdf) of $X$. Thus, discrete random variables have a probability mass function and continuous random variables have a probability density function. The cumulative distribution function is used in both cases (or in "mixed" cases). We obtain the distribution function from the density function through integration:
$P(X \le x) = F(x) = \int_{-\infty}^{x} f(t)\,dt, \quad -\infty < x < \infty.$
(Values of the pdf mean NOTHING except through integration.)
The pdf satisfies the following properties:
(1) $f(x) \ge 0$ for all $x$.
(2) $\int_{-\infty}^{\infty} f(x)\,dx = 1.$
Note: in most of our problems the pdf will be defined "piecewise."
Example
The probability density function $f$ is given as:
$f(x) = \begin{cases} \frac{k}{x^2} & \text{for } x > 2 \\ 0 & \text{otherwise.} \end{cases}$
What is the value of $k$? What is the corresponding cdf?
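(For reference, requiring $\int_2^{\infty} k/x^2\,dx = 1$ gives $k = 2$, which matches the "Existence of E(X)" slide later; then $F(x) = 1 - 2/x$ for $x > 2$ and $0$ otherwise.) A symbolic check, as a sketch:

```python
import sympy as sp

x, t, k = sp.symbols("x t k", positive=True)
total = sp.integrate(k / t**2, (t, 2, sp.oo))   # must equal 1
print(sp.solve(sp.Eq(total, 1), k))             # [2]
print(sp.integrate(2 / t**2, (t, 2, x)))        # cdf for x > 2: 1 - 2/x
```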
Probabilities on Intervals and cdf
Suppose $X$ is a continuous RV with pdf given by $f$ and cdf given by $F$. This implies that for any real number $x$, $P[X \le x] = F(x) = \int_{-\infty}^{x} f(t)\,dt$. It also implies that
$P[a \le X \le b] = F(b) - F(a)$
$P[a < X \le b] = F(b) - F(a)$
$P[a \le X < b] = F(b) - F(a)$
$P[a < X < b] = F(b) - F(a).$
All 4 cases hold because, for a continuous random variable $X$, $P[X = a] = P[X = b] = 0$. That is, the probability of any particular value is zero in this case.
Note that it is traditional to write $F(x) = P[X \le x]$. If the distribution is continuous, it is also true that $F(x) = P[X < x]$.
Exponential Distribution
A random variable has an exponential distribution if for some $\lambda > 0$ its distribution function is given by:
$F(x) = \begin{cases} 1 - e^{-\lambda x}, & \text{if } 0 \le x < \infty \\ 0 & \text{otherwise.} \end{cases}$
It follows that its pdf is given by:
$f(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x \ge 0 \\ 0 & \text{otherwise.} \end{cases}$
Note that in most problems the parameter $\lambda$ represents a "rate," such as a rate of arrivals or a rate of failures.
Examples of use:
• Interarrival times at a communication switch
• Service times at a server
• Time to failure or repair of a component.
Exponential pdf λ = 2
$f(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x \ge 0 \\ 0 & \text{otherwise.} \end{cases}$
[Figure: plot of the pdf for λ = 2.] Note the values of the pdf mean "nothing" except through integration.
Exponential cdf
λ = 2
$F(x) = \begin{cases} 1 - e^{-\lambda x}, & \text{if } 0 \le x < \infty \\ 0 & \text{otherwise.} \end{cases}$
Class Problem
Suppose that we stand at a mile marker on I-4 and watch cars pass. We notice
that on the average 10 cars pass by us per minute and we're given that the time
lapse between two consecutive cars has an exponential distribution. If we begin
timing at the moment that one car passes by, what is the probability that we will
have to wait more than 20 secs for the next car to pass?
Note 20 sec = 1/3 min.
Answer: Let $W$ be waiting time in minutes. We seek $P\left[W > \frac{1}{3}\right] = 1 - F\left(\frac{1}{3}\right)$, where $F(t) = 1 - e^{-\lambda t}$ is the exponential cdf. The average "rate" is $\lambda = 10$; thus, the answer is
$P\left[W > \tfrac{1}{3}\right] = e^{-10/3} = 0.036.$
Simple Exercises
Use F in the previous problem to write:
Probability W<6
Probability W>6
Probability W<0
Probability W<-1
Probability 2<W<5
Probability W=1.
Memoryless Property
If $X$ has an exponential distribution, $x > 0$, and $t > 0$, then we know that
$P(X \le x) = \int_0^x \lambda e^{-\lambda y}\,dy \quad \text{and} \quad P(X \le t+x) = \int_0^{t+x} \lambda e^{-\lambda y}\,dy.$
Thus,
$P(X \le t+x \mid X > t) = \dfrac{P[(X \le t+x) \cap (X > t)]}{P(X > t)} = \dfrac{\int_t^{t+x} \lambda e^{-\lambda y}\,dy}{\int_t^{\infty} \lambda e^{-\lambda y}\,dy} = \dfrac{e^{-\lambda t}(1 - e^{-\lambda x})}{e^{-\lambda t}} = 1 - e^{-\lambda x} = P(X \le x).$
This is why we don't replace lightbulbs until they fail. (Would you like
it if waiting time at your doctor's office was exponentially distributed?)
Exponential/Poisson Relationship
Show that the time between adjacent arrivals of
a Poisson Process has an exponential distribution.
Hint: If $N_t$ denotes the number of arrivals during time $t$ and $N_t$ has a Poisson distribution, then the probability that the waiting time to the next event is greater than $t$ is $P[W > t]$, where $W$ is waiting time, and
$P[W > t] = P[N_t = 0].$
[Figure: timeline showing one interval with 0 arrivals during time t and another with 4 arrivals during time t.]
Properties of Gamma Function
On the next slide we introduce the gamma distributions, a family of distributions that includes the exponential distribution. That definition incorporates something called the gamma function, which is defined as
$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx, \quad \alpha > 0.$
Using integration by parts one can show $\Gamma(\alpha) = (\alpha-1)\Gamma(\alpha-1)$ for $\alpha > 1$. Because $\Gamma(1) = 1$, it follows that $\Gamma(n) = (n-1)\Gamma(n-1) = \cdots = (n-1)!$ when $n$ is a positive integer. Note also that $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$. Also, it is well known that
$\int_0^{\infty} x^{\alpha-1} e^{-\lambda x}\,dx = \dfrac{\Gamma(\alpha)}{\lambda^{\alpha}}, \quad \text{for } \alpha > 0 \text{ and } \lambda > 0.$
We shall refer to the last equation as the "gamma integration formula."
Example
Evaluate $\int_0^{\infty} x^2 e^{-4x}\,dx$.
Answer: Recall that $\int_0^{\infty} x^{\alpha-1} e^{-\lambda x}\,dx = \dfrac{\Gamma(\alpha)}{\lambda^{\alpha}}$, for $\lambda > 0$ and $\alpha > 0$.
In this case $\alpha - 1 = 2 \Rightarrow \alpha = 3$ while $\lambda = 4$. It follows that
$\int_0^{\infty} x^2 e^{-4x}\,dx = \dfrac{\Gamma(3)}{4^3} = \dfrac{2!}{64} = \dfrac{1}{32}.$
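A quick symbolic check of the gamma integration formula (sketch):

```python
import sympy as sp

x = sp.symbols("x", positive=True)
print(sp.integrate(x**2 * sp.exp(-4 * x), (x, 0, sp.oo)))  # 1/32
```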
Gamma Distribution
A random variable with pdf given by
$f(t) = \dfrac{\lambda^{\alpha} t^{\alpha-1} e^{-\lambda t}}{\Gamma(\alpha)}, \quad \alpha > 0, \ t > 0, \ \lambda > 0$
is said to have a Gamma distribution with parameters $\lambda$ and $\alpha$, and we write $X \sim \mathrm{GAM}(\lambda, \alpha)$.
The parameter $\alpha$ is called the shape parameter and the parameter $\lambda$ is called the scale parameter. For $\alpha = 1$ the gamma becomes identical to the exponential distribution. NOTE: If a sequence of random variables $X_1, X_2, \ldots, X_k$ are mutually independent and identically distributed as $\mathrm{GAM}(\lambda, \alpha)$, then their sum has a $\mathrm{GAM}(\lambda, k\alpha)$ distribution.
Gamma Density:
[Figure: gamma density $g(x, \lambda, \alpha)$; parameters (scale, shape).]
Practice Quiz 4
The pdf of a random variable $X$ is given by:
$f_X(x) = \begin{cases} 0 & \text{for } x < 0 \\ \frac{x^3}{64} & \text{for } 0 \le x \le 4 \\ 0 & \text{for } x > 4. \end{cases}$
Answer each of the following and EXPLAIN (show your work).
(a) What is the cdf of $X$? (Find it explicitly.)
(b) What is $P(X > 2)$?
(c) What is $P(X > 2 \mid X > 1)$?
BONUS: Use the gamma integration formula to evaluate the following integral:
$\int_0^{\infty} x^3 e^{-2x}\,dx$
Mean and Variance of a Continuous
Random Variable
Definition
Expected Value (continuous example)
Let
$f_X(x) = \begin{cases} 0 & \text{for } x < 0 \\ \frac{x^3}{64} & \text{for } 0 \le x \le 4 \\ 0 & \text{for } x > 4. \end{cases}$
By definition,
$E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_0^4 x \cdot \frac{x^3}{64}\,dx = \left.\frac{x^5}{5 \times 64}\right|_0^4 = \frac{1024}{5 \times 64} = \frac{16}{5}.$
$E(X^2) = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx = \int_0^4 x^2 \cdot \frac{x^3}{64}\,dx = \left.\frac{x^6}{6 \times 64}\right|_0^4 = \frac{4096}{6 \times 64} = \frac{32}{3}.$
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \frac{32}{3} - \left(\frac{16}{5}\right)^2 = 0.427.$
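A symbolic verification (sketch):

```python
import sympy as sp

x = sp.symbols("x")
f = x**3 / 64
EX = sp.integrate(x * f, (x, 0, 4))        # 16/5
EX2 = sp.integrate(x**2 * f, (x, 0, 4))    # 32/3
print(EX, EX2, EX2 - EX**2)                # 16/5 32/3 32/75 (~0.427)
```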
Existence of E(X)
Continuing a previous example: Let X be a random variable with pdf given by:
f(x) = 2/x² for x > 2, and f(x) = 0 otherwise.

Notice that E(X) = ∫₂^∞ x·(2/x²) dx = ∫₂^∞ (2/x) dx = 2 ln x |₂^∞ = ∞.
Thus, well-defined random variables may not have finite means. Similarly,
a random variable may have a finite mean and not have a finite 2nd moment
or a finite kth moment (to be defined).
Applied Statistics: Probability 1-119
Mean and Variance of a Continuous
Random Variable
Expected Value of a Function of a Continuous
Random Variable
Applied Statistics: Probability 1-120
Other expected values
Let X be a random variable with pdf given by f .
Let μ = E(X) = ∫_{−∞}^∞ x f(x) dx. (μ is called the mean of X.)

Then

(a) E(X²) = ∫_{−∞}^∞ x² f(x) dx.

(b) E(X² + 5X − 2) = ∫_{−∞}^∞ (x² + 5x − 2) f(x) dx.

(c) E(sin X) = ∫_{−∞}^∞ (sin x) f(x) dx.

(d) σ² = Var(X) = E(X − μ)² = ∫_{−∞}^∞ (x − μ)² f(x) dx. (This is the variance of X.)
Applied Statistics: Probability 1-121
Try these with/without Mathcad
Let X be a random variable with probability density f(x) = cos(x) for
0 ≤ x ≤ π/2, and f(x) = 0 otherwise (x < 0 or x > π/2).
(a) Find the cdf of X .
(b) Find E ( X ).
(c) Find E (sin X ).
Applied Statistics: Probability 1-122
Exponential Mean and Variance
Let X be an exponential random variable with parameter λ. Then E(X) = 1/λ and
Var(X) = 1/λ².

Proof. Recall the gamma integration formula: ∫₀^∞ x^(α−1) e^(−λx) dx = Γ(α)/λ^α.

Then E(X) = ∫_{−∞}^∞ x f_X(x) dx = ∫₀^∞ x λe^(−λx) dx = λ · Γ(2)/λ² = 1/λ.

Similarly, Var(X) = E(X²) − μ², and E(X²) = ∫₀^∞ x² λe^(−λx) dx = λ · Γ(3)/λ³ = 2/λ².

Thus, Var(X) = 2/λ² − 1/λ² = 1/λ².
Applied Statistics: Probability 1-123
Exponential Example
The lifetime of a particular brand of lightbulb is exponentially distributed.
The "mean time to failure" (MTTF) is 2000 hours. If the lightbulb is installed
at time t = 0, what is the probability that it fails at time t = 3000 hours? What
is the probability that it fails in fewer than 3000 hours?
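Since X is continuous, P(X = 3000) is exactly 0; the second question is the meaningful one. A sketch of the computation, using λ = 1/MTTF:

    import math

    mttf = 2000.0
    lam = 1 / mttf                           # exponential rate = 1/MTTF
    print(1 - math.exp(-lam * 3000))         # P(X < 3000) = 1 - e^(-1.5) ≈ 0.777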
Applied Statistics: Probability 1-124
Gamma Mean and Variance
Recall that the pdf of a Gamma random variable is given by:

f(t) = λ^α t^(α−1) e^(−λt) / Γ(α), α > 0 (shape), λ > 0 (scale), t > 0.

The expected value (mean) of a Gamma random variable is α/λ. The variance is α/λ².
Note that when α =1, the distribution is exponential.
Applied Statistics: Probability 1-125
Class Exercise
A gamma distribution has a mean of 1.5 and a variance of 0.75. Sketch the pdf.

We have α/λ = 1.5 and α/λ² = 0.75. First, solve for α and λ. From the first
equation we get α = 1.5λ. Substitute this in the second equation to get
1.5λ/λ² = 0.75. This gives λ = 2, which implies α = 3. This is all we need to
obtain pdf values.

Important: Make sure that you can find the scale and shape parameters from a
given mean and variance.

[Plot of the GAM(λ = 2, α = 3) pdf on 0 ≤ x ≤ 3; Scale = 2 and Shape = 3.]
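The parameters can be checked numerically (a sketch; note that SciPy parameterizes the gamma by shape a and scale = 1/λ, so GAM(λ = 2, α = 3) becomes a = 3, scale = 0.5):

    from scipy.stats import gamma

    alpha, lam = 3, 2
    g = gamma(a=alpha, scale=1/lam)          # SciPy scale = 1/lambda
    print(g.mean(), g.var())                 # 1.5 and 0.75, as required
    print(g.pdf(1.0))                        # density at the mode (alpha-1)/lam = 1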
Applied Statistics: Probability 1-126
Continuous Uniform Distribution
A continuous random variable X is said to have a uniform distribution
over the interval (a,b) if its density is given by:
f(x) = 1/(b − a) for a < x < b, and f(x) = 0 otherwise.

The distribution function is:

F(x) = 0 for x < a; F(x) = (x − a)/(b − a) for a ≤ x < b; F(x) = 1 for x ≥ b.
Applied Statistics: Probability 1-127
Continuous Uniform Density on (3,5)
Applied Statistics: Probability 1-128
Continuous Uniform cdf on (3,5)
Applied Statistics: Probability 1-129
Continuous Uniform Mean and Variance
Let X be a random variable having the continuous uniform distribution over the
interval (a, b). Its density is, thus, f(x) = 1/(b − a) for a < x < b, and 0 otherwise.

E(X) = ∫_a^b x·(1/(b − a)) dx = (1/2)(b² − a²)·(1/(b − a)) = (b + a)/2.

Similarly, E(X²) = (1/3)(b³ − a³)/(b − a). It follows that

Var(X) = (1/3)(b³ − a³)/(b − a) − ((b + a)/2)² = (b − a)²/12.

Recall that if X has a "discrete" uniform distribution on 1, 2, ..., n, then
E(X) = (n + 1)/2 and Var(X) = (n² − 1)/12.
Applied Statistics: Probability 1-130
Example:
Suppose that the time that it takes to drive from the Orlando airport to FIT is
uniformly distributed between one hour and one and a quarter hours.
(a) What is the mean driving time?
(b) What is the standard deviation of the driving time?
(c) What is the probability that it will take less than one hour and 5 minutes
to make the trip?
(d) 80% of the time the trip will take less than ______
minutes?
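A sketch of all four answers with SciPy (times in minutes, so X is uniform on (60, 75)):

    from scipy.stats import uniform

    U = uniform(loc=60, scale=15)            # uniform on (60, 75) minutes
    print(U.mean())                          # (a) 67.5 minutes
    print(U.std())                           # (b) 15/sqrt(12) ≈ 4.33 minutes
    print(U.cdf(65))                         # (c) P(trip < 65 min) = 1/3
    print(U.ppf(0.80))                       # (d) 80th percentile = 72 minutes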
Applied Statistics: Probability 1-131
Means and Variances (so far)
Distribution          E(X)         Var(X)
Bernoulli             p            p(1 − p)
Binomial              np           np(1 − p)
Geometric             1/p          (1 − p)/p²
Discrete Uniform      (n + 1)/2    (n² − 1)/12
Poisson               α (or λ)     α (or λ)
Exponential           1/λ          1/λ²
Gamma                 α/λ          α/λ²
Continuous Uniform    (a + b)/2    (b − a)²/12
Applied Statistics: Probability 1-132
Practice Quiz 5
1. Suppose that the driving time between Orlando and Melbourne is uniformly
distributed between one hour and one hour plus 15 minutes.
(a) What is the variance of the driving time?
(b) Eighty percent of all drivers make the trip in fewer than x minutes. What
is x ?
2. The random variable X has an exponential distribution. What is the pdf of X ?
What is the mean of X? What is the variance of X? (Just write down what
we know the mean and variance are. Do not derive them from the definition.)
3. The random variable X has a Binomial distribution and the total number of
trials is 30.
(a) If the probability of success on a single trial is 0.2, write the pmf for X .
(b) If E ( X ) = 2.1, then what is the probability of success on a single trial?
Applied Statistics: Probability 1-133
Normal Distribution
Definition
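For reference, the standard definition (consistent with the notation μ and σ² used in these slides) is: a random variable X with pdf

f(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)), −∞ < x < ∞,

is a normal random variable with parameters μ and σ², written X ~ N(μ, σ²). It can be shown that E(X) = μ and Var(X) = σ².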
Applied Statistics: Probability 1-134
Normal Distribution
Figure 4-10 Normal probability density functions for
selected values of the parameters μ and σ².
Applied Statistics: Probability 1-135
Normal Distribution
Some useful results concerning the normal distribution
Applied Statistics: Probability 1-136
Normal Distribution
Definition : Standard Normal
Applied Statistics: Probability 1-137
Normal Distribution
Example 4-11
Figure 4-13 Standard normal probability density function.
Applied Statistics: Probability 1-138
Standard Normal Table
Φ(z) = P(Z ≤ z); the row gives z to one decimal place and the column adds the second decimal.

z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0  0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1  0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2  0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3  0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4  0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5  0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6  0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7  0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8  0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9  0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0  0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1  0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2  0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3  0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4  0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5  0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6  0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7  0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8  0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9  0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0  0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1  0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2  0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3  0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4  0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5  0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6  0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7  0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8  0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9  0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0  0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
Applied Statistics: Probability 1-139
Normal Distribution
Standardizing
Applied Statistics: Probability 1-140
Normal Distribution
To Calculate Probability
Applied Statistics: Probability 1-141
Normal Distribution
Example 4-13
Applied Statistics: Probability 1-142
Normal Distribution
Example 4-14
Note: In Mathcad we simply compute the probability directly from the mean and
standard deviation. [Mathcad screenshot not reproduced.]
Applied Statistics: Probability 1-143
Normal Distribution
Example 4-14 (continued)
Applied Statistics: Probability 1-144
Normal Distribution
Example 4-14 (continued)
Figure 4-16 Determining the value of x to meet a
specified probability.
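A Python sketch of the same kind of computation; μ = 10 and σ = 2 are hypothetical stand-ins, not the numbers from Example 4-14:

    from scipy.stats import norm

    mu, sigma = 10, 2                        # hypothetical values, not Example 4-14's
    X = norm(loc=mu, scale=sigma)
    print(X.cdf(13))                         # P(X <= 13) = Phi((13-10)/2) ≈ 0.9332
    print(X.ppf(0.98))                       # x with P(X <= x) = 0.98, ≈ 14.11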
Applied Statistics: Probability 1-145
Recall Gamma Function
The gamma function (studied previously) is defined as:
Γ(α) = ∫₀^∞ x^(α−1) e^(−x) dx, α > 0.

Using integration by parts one can show Γ(α) = (α − 1)Γ(α − 1) for α > 1.
Because Γ(1) = 1, it follows that Γ(n) = (n − 1)Γ(n − 1) = ... = (n − 1)!
Note also that Γ(1/2) = √π. Also, it is well known that

∫₀^∞ x^(α−1) e^(−λx) dx = Γ(α)/λ^α.
We shall refer to the last equation as the "gamma integration formula."
Applied Statistics: Probability 1-146
Joint Probability Distributions
When two or more random variables naturally occur together or take values that
seem to be related, it is common to consider their joint probability distributions.
Suppose, for example, that X and Y are two random variables whose outcomes
we wish to consider jointly. In a manner similar to single random variables we
consider two cases.
(1) X and Y are discrete. Here we use a joint pmf and joint cdf (tbd).
(2) X and Y are continuous. Here we use a joint pdf and joint cdf (tbd).
Examples:
(1) The location of a ship is given by its latitude and longitude. Suppose you are
searching for a ship from its last known position. It is natural to consider the
joint distribution of ( X , Y ), the ship's probable position.
(2) Similarly an aircraft's position might be predicted using a joint distribution of
(X , Y , Z ), where ( X , Y ) is as above and Z is altitude.
Applied Statistics: Probability 1-147
Discrete Random Variables
For discrete random variables X 1 , X 2 ,..., X n their joint probability mass
function is given by:
p_X(x) = p_{X1,X2,...,Xn}(x1, x2, ..., xn) = P(X1 = x1, X2 = x2, ..., Xn = xn).

If only two random variables are involved, we usually denote them as X and Y
instead of X1 and X2, and we write:

p_{X,Y}(x, y) = P(X = x, Y = y).

Note that the author writes the latter as f_{X,Y}(x, y) = P(X = x, Y = y).
Applied Statistics: Probability 1-148
Simple example
The joint distribution of ( X , Y ) is given as follows:
p_{X,Y}(1,1) = 1/3,
p_{X,Y}(2,1) = 1/6 and p_{X,Y}(2,2) = 1/6,
p_{X,Y}(3,1) = 1/9, p_{X,Y}(3,2) = 1/9, and p_{X,Y}(3,3) = 1/9.
Do the values of X and Y seem to "influence" each other?
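One way to probe the question (a sketch; the pmf values are the ones above):

    from fractions import Fraction as F

    pmf = {(1,1): F(1,3),
           (2,1): F(1,6), (2,2): F(1,6),
           (3,1): F(1,9), (3,2): F(1,9), (3,3): F(1,9)}

    pX = {x: sum(p for (a,b), p in pmf.items() if a == x) for x in (1,2,3)}
    pY = {y: sum(p for (a,b), p in pmf.items() if b == y) for y in (1,2,3)}
    print(pX)        # {1: 1/3, 2: 1/3, 3: 1/3}
    print(pY)        # {1: 11/18, 2: 5/18, 3: 1/9}
    # Independence would need pmf[(x,y)] = pX[x]*pY[y] everywhere, but the
    # pair (1,2) has probability 0 while pX[1]*pY[2] = 5/54.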
Applied Statistics: Probability 1-149
Joint PMF and Independence
For discrete random variables X 1 , X 2 ,..., X n their joint probability mass
function is given by:
p_X(x) = p_{X1,X2,...,Xn}(x1, x2, ..., xn) = P(X1 = x1, X2 = x2, ..., Xn = xn).

The discrete random variables X1, X2, ..., Xn are said to be mutually
independent if their joint pmf can be written as:

p_X(x) = p_{X1}(x1) · p_{X2}(x2) ··· p_{Xn}(xn).
Applied Statistics: Probability 1-150
Problem:
Joint pmf for X1 and X2:

         X2 = 1    X2 = 2    X2 = 3
X1 = 1   1/12      1/6       1/12
X1 = 2   1/6       1/4       1/12
X1 = 3   1/12      1/12      0
Find:
(a) P [ X 1 X 2 is even ] (b) P [ X 1 is odd ] (c) P [ X 1 ≤ 1.5] (d) P [ X 2 is odd | X 1 is odd ]
Are X 1 and X 2 independent?
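A sketch that computes (a)–(d) exactly (the table entries are the ones above):

    from fractions import Fraction as F

    p = {(1,1): F(1,12), (1,2): F(1,6),  (1,3): F(1,12),
         (2,1): F(1,6),  (2,2): F(1,4),  (2,3): F(1,12),
         (3,1): F(1,12), (3,2): F(1,12), (3,3): F(0,1)}

    P = lambda event: sum(q for (x1,x2), q in p.items() if event(x1, x2))
    a = P(lambda x1, x2: (x1*x2) % 2 == 0)                 # (a) 3/4
    b = P(lambda x1, x2: x1 % 2 == 1)                      # (b) 1/2
    c = P(lambda x1, x2: x1 <= 1.5)                        # (c) 1/3
    d = P(lambda x1, x2: x1 % 2 == 1 and x2 % 2 == 1) / b  # (d) 1/2
    print(a, b, c, d)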
Applied Statistics: Probability 1-151
Marginal pmf
If X1, X2, ..., Xn are discrete random variables with joint pmf
p_{X1,X2,...,Xn}(x1, x2, ..., xn), then

p_{Xi}(xi) = Σ_{[Xi = xi]} p_{X1,X2,...,Xn}(x1, x2, ..., xn),

where the sum is over the points in the range of (X1, X2, ..., Xn) where
Xi = xi. The function p_{Xi}(xi) is called the marginal probability mass
function for Xi.

Example: Using the previous slide, the marginal pmf for X1 is given by:

p_{X1}(1) = 1/3, p_{X1}(2) = 1/2, p_{X1}(3) = 1/6.

The notation above is difficult; however, notice that, in the example, to get
p_{X1}(1) = 1/3 you just sum over all the (x1, x2) pair values that have 1 as
the value for X1.
Applied Statistics: Probability 1-152
Joint cdf for Discrete RVs
If X1, X2, ..., Xn are discrete random variables, then their joint cumulative
distribution function is given by:

F_{X1,X2,...,Xn}(x1, x2, ..., xn) = P(X1 ≤ x1, X2 ≤ x2, ..., Xn ≤ xn).

If there are only two random variables involved we usually write:

F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y).

Returning to the "Simple Example" we find, for example, F_{X,Y}(2,3) = 2/3.
Applied Statistics: Probability 1-153
Skip Section 5-1.3
We will not cover the material on
conditional probability distributions that is
in this section of the text.
Applied Statistics: Probability 1-154
Double Integral:

∫∫_R f(x,y) dA = ∫_a^b ∫_{m(x)}^{n(x)} f(x,y) dy dx   (iterated integral)

[Figure: the surface z = f(x,y) above the region R in the xy-plane, where R is
bounded by the curves y = m(x) and y = n(x) and the lines x = a and x = b.]
Applied Statistics: Probability 1-155
Example:

Evaluate the double integral ∫∫_R 2xy dA, where R is the region bounded by the
curve y = x² and the lines y = 0 and x = 2.

Answer: ∫₀² ∫₀^(x²) 2xy dy dx = ∫₀² x (∫₀^(x²) 2y dy) dx = ∫₀² x (y² |₀^(x²)) dx
= ∫₀² x·x⁴ dx = x⁶/6 |₀² = 32/3.

This computes the volume under the surface z = 2xy and above the region R.
[Plot of the region R under y = x² for 0 ≤ x ≤ 2.]
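The same computation, sketched symbolically with SymPy:

    import sympy as sp

    x, y = sp.symbols('x y')
    inner = sp.integrate(2*x*y, (y, 0, x**2))   # = x**5
    print(sp.integrate(inner, (x, 0, 2)))       # 32/3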
Applied Statistics: Probability 1-156
Another Evaluation Approach
∫₀² ∫₀^(x²) 2xy dy dx = ∫₀² x [∫₀^(x²) 2y dy] dx.

The inside integral is simply ∫₀^(x²) 2y dy = y² |₀^(x²) = x⁴.

Substitute this inside the brackets above to get ∫₀² x·x⁴ dx = x⁶/6 |₀² = 32/3.
Applied Statistics: Probability 1-157
Evaluate the double integral ∫∫_R (3/4)xy dx dy, where R is the region bounded
by y = x² and y = 2x.

[Plot of the curves y = x² and y = 2x for 0 ≤ x ≤ 3.]
Applied Statistics: Probability 1-158
Extra Practice
Evaluate the double integrals:

1. ∫∫_R 2x²y³ dx dy for R bounded by y = x, y = 2x, x = 1.

2. ∫∫_R x³y⁴ dx dy for R bounded by y = x³, y = 0, x = 1.

3. ∫∫_R y² dx dy for R bounded by y = x², x = 2 − y².

4. ∫∫_R xy dx dy for R the triangle with vertices (0, 0), (1, 1), (4, 1).

5. ∫∫_R x²y dx dy for R bounded by x = −1 − y², x = 1 + y², y = 0, y = −1.

6. ∫∫_R (3y² + 4) dx dy for R bounded by x = 1, x = 4, y = 1/x, y = −1/x.
Applied Statistics: Probability 1-159
Problem 3
The two curves are y1 = x² and y2 = √(2 − x). They intersect at (1, 1) and
(−1.353, 1.831). [Plot of the two curves.]

Iterated integral: ∫_{−1.353}^{1} ∫_{x²}^{√(2−x)} y² dy dx.
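A numerical check of the iterated integral (a sketch; brentq recovers the left intersection as the negative root of x⁴ + x − 2):

    import numpy as np
    from scipy.integrate import dblquad
    from scipy.optimize import brentq

    a = brentq(lambda x: x**4 + x - 2, -2, -1)     # ≈ -1.3532
    val, err = dblquad(lambda y, x: y**2,          # integrand y^2 (y argument first)
                       a, 1,                        # outer: x from a to 1
                       lambda x: x**2,              # inner lower limit: y = x^2
                       lambda x: np.sqrt(2 - x))    # inner upper limit: y = sqrt(2-x)
    print(val)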
Applied Statistics: Probability 1-160
Joint Distribution of Continuous RVs
The cumulative joint distribution of continuous random variables X and Y is
defined by FX ,Y ( x, y ) = P ( X ≤ x, Y ≤ y ), −∞ < x < ∞, −∞ < y < ∞.
Properties:
* 0 ≤ F ( x, y ) ≤ 1
* F ( x, y ) is monotone increasing in BOTH variables
* P(a < X ≤ b and c < Y ≤ d ) = F (b, d ) − F (a, d ) − F (b, c) + F (a, c)
Note:
lim_{y→∞} F_{X,Y}(x, y) = F_X(x) is called the marginal cumulative distribution of X.
lim_{x→∞} F_{X,Y}(x, y) = F_Y(y) is called the marginal cumulative distribution of Y.

Think of F_X(x) = P[X ≤ x] = P[X ≤ x ∩ Y < ∞] ≈ F_{X,Y}(x, ∞).
Also: The marginal cdf is defined in the same manner for discrete random variables.
Applied Statistics: Probability 1-161
Joint Probability Density
If X and Y are both continuous random variables, there is often a function f
such that F(x, y) = ∫_{−∞}^x ∫_{−∞}^y f(u, v) dv du. The function f is known as
the joint probability density function. Note that

P(a < X ≤ b, c < Y ≤ d) = ∫_a^b ∫_c^d f(x, y) dy dx.

Also, by definition of marginal distribution we know that

F_X(x) = ∫_{−∞}^x ∫_{−∞}^∞ f(u, v) dv du.

The marginal pdf of X is f_X(x) = ∫_{−∞}^∞ f(x, y) dy. Similarly, the marginal
pdf of Y is f_Y(y) = ∫_{−∞}^∞ f(x, y) dx.
Applied Statistics: Probability 1-162
Example (using Mathcad):
The random variables X and Y have the following joint pdf:

f(x, y) := (1/45)·(x + y)·(1 ≤ x ≤ 3)·(0 ≤ y ≤ 5)

(You need only define the non-zero portion of f to Mathcad.)

Checking: ∫₁³ ∫₀⁵ (1/45)(x + y) dy dx = 1.

The marginal cdf for X is FX(x) := ∫₁^x ∫₀⁵ (1/45)(u + v) dv du, or
FX(x) → (x − 1)(x + 6)/18 for 1 ≤ x ≤ 3.

The marginal cdf for Y is FY(y) := ∫₀^y ∫₁³ (1/45)(u + v) du dv, or
FY(y) → y(y + 4)/45 for 0 ≤ y ≤ 5.
Applied Statistics: Probability 1-163
Example: Joint Density of X,Y
Let f(x, y) = 1/6 for (x, y) ∈ A, and f(x, y) = 0 otherwise, where A is the
triangle bounded by y = 0, x = 5, and the line y = (3/4)(x − 1). [Figure: the
triangle A, with vertices (1, 0), (5, 0), and (5, 3).]

What is the probability that X ≤ 2 and Y ≤ 2?

P[X ≤ 2, Y ≤ 2] = ∫₁² ∫₀^((3/4)(x−1)) (1/6) dy dx = ∫₁² (1/6) y |₀^((3/4)(x−1)) dx
= ∫₁² (1/8)(x − 1) dx = (1/8)(x²/2 − x) |₁² = 1/16.

Also, the answer is the ratio of the area of the small triangle to the area of
A: (3/8)/6 = 3/48 = 1/16. This only works because f is constant above the area A.
Applied Statistics: Probability 1-164
Marginal Density of X
(previous example)
f_X(x) = ∫_{−∞}^∞ f_{X,Y}(x, y) dy, for 1 ≤ x ≤ 5. (The value of x is held
constant, and the value of y is "integrated out.")

= ∫₀^((3/4)(x−1)) (1/6) dy = (1/6) y |₀^((3/4)(x−1)) = (1/8)(x − 1).

Notice that ∫_{−∞}^∞ f_X(x) dx = ∫₁⁵ (1/8)(x − 1) dx = (1/8)(x²/2 − x) |₁⁵
= (1/8)[(25/2 − 5) − (1/2 − 1)] = 1,
as required for a pdf.
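Both results can be confirmed numerically (a sketch with SciPy, using the triangle's boundary line y = (3/4)(x − 1)):

    from scipy.integrate import quad, dblquad

    # P(X <= 2, Y <= 2): density 1/6, y from 0 up to the boundary line
    p, _ = dblquad(lambda y, x: 1/6, 1, 2,
                   lambda x: 0, lambda x: 0.75*(x - 1))
    print(p)                                   # 0.0625 = 1/16

    # the marginal density integrates to 1 over [1, 5]
    total, _ = quad(lambda x: (x - 1)/8, 1, 5)
    print(total)                               # 1.0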
Applied Statistics: Probability 1-165
Example:
The random variables X and Y have the joint density function

f(x, y) = xy·exp[−(x² + y²)/2] for x > 0 and y > 0, and f(x, y) = 0 otherwise.

Find F_Y(y), f_X(x), and F(1, 2).

Solution: We begin by finding f_X(x) = ∫₀^∞ f_{X,Y}(x, y) dy.

f_X(x) = ∫₀^∞ xy exp[−(x² + y²)/2] dy = x e^(−x²/2) ∫₀^∞ y e^(−y²/2) dy = x e^(−x²/2).

By symmetry, f_Y(y) = y e^(−y²/2), also. This is Y's density function, and we
can now find the cdf:

F_Y(y) = ∫₀^y t e^(−t²/2) dt = −e^(−t²/2) |₀^y = 1 − e^(−y²/2).
Applied Statistics: Probability 1-166
Finding F(1,2)…
Note that in the region of integration, x > 0 & y > 0, values of y do not
depend on values of x.

F(1, 2) ≡ F_{X,Y}(1, 2) = ∫₀¹ ∫₀² xy exp[−(x² + y²)/2] dy dx

= ∫₀¹ x e^(−x²/2) [∫₀² y e^(−y²/2) dy] dx = ∫₀¹ x e^(−x²/2) (−e^(−y²/2) |₀²) dx

= (1 − e^(−2)) ∫₀¹ x e^(−x²/2) dx = (1 − e^(−2)) (−e^(−x²/2) |₀¹)

= (1 − e^(−2))(1 − e^(−1/2)) ≈ 0.340.
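A numerical check of F(1, 2) (a sketch using SciPy's dblquad):

    import numpy as np
    from scipy.integrate import dblquad

    f = lambda y, x: x*y*np.exp(-0.5*(x**2 + y**2))
    val, err = dblquad(f, 0, 1, lambda x: 0, lambda x: 2)   # x in (0,1), y in (0,2)
    print(val)        # ≈ 0.3403 = (1 - e^-2)(1 - e^-0.5)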
Applied Statistics: Probability 1-167
Scatterplot of f(x, y) = xy exp[−(x² + y²)/2] for x > 0 and y > 0 (f = 0
otherwise). [Scatterplot figure.]
Applied Statistics: Probability 1-168
Lessons to Learn
F(1, 2) = P(X < 1, Y < 2).

Integrate a pdf to get a cdf: F_X(x) = ∫_{−∞}^x f_X(t) dt.

"Limit out the other variables of a joint cdf to get a marginal cdf":
lim_{y→∞} F_{X,Y}(x, y) = F_X(x).

Integrate out the other variables of the joint pdf to get a marginal pdf:
f_X(x) = ∫_{−∞}^∞ f_{X,Y}(x, y) dy.
Applied Statistics: Probability 1-169
Bivariate Normal pdf
Mathcad setup: μx := 0, σx := 1, μy := 0, σy := 1, ρ := 0 (a second surface is
shown with ρ = 0.6), and normalizing constant

c := 1/(2π·σx·σy·√(1 − ρ²)).

[Surface plots of the bivariate normal density.]
Applied Statistics: Probability 1-170
Independent RVs
Two random variables, X and Y , are independent if
FX ,Y ( x, y ) = FX ( x) FY ( y ) for − ∞ < x < ∞ and − ∞ < y < ∞.
If the corresponding density functions exist, this is equivalent to

f_{X,Y}(x, y) = f_X(x) f_Y(y) for −∞ < x < ∞ and −∞ < y < ∞.

EXAMPLE:

Let X and Y have joint pdf f(x, y) = 1/π for x² + y² ≤ 1, and f(x, y) = 0 otherwise.
Determine the marginal pdf's of X and Y . Are X and Y independent?
Applied Statistics: Probability 1-171
Solution (Independent RV’s)
We're given: f(x, y) = 1/π for x² + y² ≤ 1, and f(x, y) = 0 otherwise; thus,
the marginal density for X is

f_X(x) = ∫_{−∞}^∞ f(x, y) dy = ∫_{−√(1−x²)}^{√(1−x²)} (1/π) dy
= (1/π) y |_{−√(1−x²)}^{√(1−x²)} = (2/π)√(1 − x²), −1 ≤ x ≤ 1.

Because of the symmetry the marginal density of Y is
f_Y(y) = (2/π)√(1 − y²), −1 ≤ y ≤ 1.

To check for independence notice that f_{X,Y}(0, 0) = 1/π and
f_X(0) f_Y(0) = (2/π) × (2/π) = 4/π².

Thus, X and Y are NOT independent.
Applied Statistics: Probability 1-172
Keeping Sums in the Family
A sum of independent normals is normal.
A sum of independent Poissons is Poisson.
A sum of independent exponentials is
Erlang.
A sum of independent gammas is gamma.
A large sum of independent, identically
distributed RVs is approximately normal.
Applied Statistics: Probability 1-173
Linearity of Expectation
Suppose that X and Y are random variables and that a and b are any two
real numbers, then E (aX + bY ) = aE ( X ) + bE (Y ).
In the previous example find E(3X + 2Y).

Use this property to derive the working formula for variance: σ² = E(X²) − E²(X).

Also notice that Var(aX) = a²Var(X) and that
Var(aX + bY) = a²Var(X) + b²Var(Y) if X and Y are independent.
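The derivation the slide asks for is a one-line application of linearity:

σ² = E(X − μ)² = E(X² − 2μX + μ²) = E(X²) − 2μE(X) + μ² = E(X²) − 2μ² + μ² = E(X²) − E²(X).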
Applied Statistics: Probability 1-174