
Probability
The probability Pr(A) of an event A is a number in [0, 1] that represents how likely A is to occur.
The larger the value of Pr(A), the more likely the event is to occur.
Pr(A) = 1 means the event must occur.
Pr(A) = 0 means the event cannot occur.
Pr(A) > Pr(B) means A is more likely to occur than B.
Events A and B are called disjoint (mutually exclusive) if they cannot both occur simultaneously,
that is, if Pr(A and B) = 0. Equivalently, saying A and B are disjoint means
Pr(A or B) = Pr(A) + Pr(B).
Events A and B are called complementary if they are disjoint, but one of them must occur. Equivalently,
Pr(A) + Pr(B) = Pr(A or B) = 1.
A random variable is a quantity that assumes different values with certain probabilities.
In other words, X is a random variable if we can assign values to
Pr(X = x),
Pr(X ≤ x),
Pr(X < x),
Pr(X ≥ x),
Pr(X > x)
for every real number x.
The events X = x and X ≠ x are complementary: Pr(X = x) = 1 − Pr(X ≠ x).
The events X ≤ x and X > x are complementary: Pr(X ≤ x) = 1 − Pr(X > x).
The events X ≥ x and X < x are complementary: Pr(X ≥ x) = 1 − Pr(X < x).
Example. If we toss two fair coins, there are four possible outcomes:
• HH
• HT
• TH
• TT
where H is heads and T is tails.
Since the coins are fair and the tosses are independent (the outcome of one toss doesn’t affect the outcome of the other), each of the four outcomes has probability (1/2) · (1/2) = 1/4.
Let Y be the random variable defined by
Y = 0 if the outcome is HH,
    1 if the outcome is HT,
    2 if the outcome is TH,
    3 if the outcome is TT.
Then
Pr(Y = k) = 1/4 for k = 0, 1, 2, 3,
Pr(Y = −1) = 0,
Pr(Y = π) = 0,
Pr(Y > 2) = Pr(Y = 3) = 1/4,
Pr(Y ≤ 2) = Pr(Y = 0 or Y = 1 or Y = 2) = Pr(Y = 0) + Pr(Y = 1) + Pr(Y = 2) = 3/4,
Pr(Y ≤ 2) = 1 − Pr(Y > 2) = 1 − 1/4 = 3/4.
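The two-coin example can also be simulated directly. The following sketch (illustrative, not part of the notes) estimates the distribution of Y and checks it against the probabilities computed above:

```python
import random
from collections import Counter

random.seed(0)
n = 100_000

# Encode each two-coin outcome as Y, exactly as in the example:
# HH -> 0, HT -> 1, TH -> 2, TT -> 3.
value_of = {"HH": 0, "HT": 1, "TH": 2, "TT": 3}

counts = Counter(
    value_of[random.choice("HT") + random.choice("HT")] for _ in range(n)
)
probs = {k: counts[k] / n for k in range(4)}

# Each of the four outcomes has probability 1/4.
assert all(abs(p - 0.25) < 0.02 for p in probs.values())

# Pr(Y <= 2) = 3/4, which also equals 1 - Pr(Y > 2) = 1 - Pr(Y = 3).
p_le_2 = probs[0] + probs[1] + probs[2]
assert abs(p_le_2 - 0.75) < 0.02
assert abs(p_le_2 - (1 - probs[3])) < 1e-12
```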
The random variable Y is an example of a discrete random variable.
A random variable is called a discrete random variable if it assumes only finitely many or countably
many values, that is, if we can list the values it assumes as x1, x2, . . . .
Roughly, a continuous random variable is one that can assume a continuum of values. We will
give a precise definition below after stating some terminology, but here are some things that can be
modelled by a continuous random variable:
• the amount of rainfall in Vancouver next week
• the lifespan of a lightbulb
• the height of a randomly selected person in Canada
The cumulative distribution function (CDF) of a random variable X is the function
F (x) = Pr(X ≤ x).
A function F is a CDF of some random variable if and only if the following properties hold.
(1) F is right-continuous: lim_{x→c+} F(x) = F(c) for all real numbers c.
(2) F is non-decreasing: F(x) ≤ F(y) when x ≤ y, for all real numbers x, y.
(3) lim_{x→∞} F(x) = 1.
(4) lim_{x→−∞} F(x) = 0.
Properties (2), (3), and (4) imply 0 ≤ F (x) ≤ 1 for all real numbers x.
Example. Let a > 0 be a constant. Show that F(x) = 1/(1 + e^{−ax}) is a cumulative distribution function.
(1) Since 1 + e^{−ax} is continuous and never 0, F(x) is continuous and, therefore, right-continuous everywhere.
(2) Since
d/dx F(x) = d/dx (1 + e^{−ax})^{−1} = −(1 + e^{−ax})^{−2} (−a e^{−ax}) = a e^{−ax}/(1 + e^{−ax})^2 ≥ 0
for all x, F is non-decreasing.
(3) lim_{x→∞} F(x) = lim_{x→∞} 1/(1 + e^{−ax}) = 1/(1 + 0) = 1.
(4) lim_{x→−∞} F(x) = lim_{x→−∞} 1/(1 + e^{−ax}) = 0.
Therefore F(x) = 1/(1 + e^{−ax}) is a CDF.
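The four CDF properties can also be spot-checked numerically. A small sketch (with the arbitrary choice a = 2; any a > 0 behaves the same way):

```python
import math

a = 2.0  # arbitrary positive constant for the check

def F(x):
    # The logistic CDF from the example.
    return 1.0 / (1.0 + math.exp(-a * x))

# Non-decreasing on a grid of sample points.
xs = [i / 10 for i in range(-100, 101)]
assert all(F(xs[i]) <= F(xs[i + 1]) for i in range(len(xs) - 1))

# Limit behaviour: close to 0 far to the left, close to 1 far to the right.
assert F(-50.0) < 1e-9
assert F(50.0) > 1.0 - 1e-9
```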
Extra Example. The function F(x) = k arctan(x) + 1/2 is a cumulative distribution function. Find the value of k.
We must have lim_{x→∞} F(x) = 1. So
1 = lim_{x→∞} F(x) = lim_{x→∞} (k arctan(x) + 1/2) = k(π/2) + 1/2.
Solving for k yields k = 1/π.
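A quick numerical check that k = 1/π does give the required limits 0 and 1 (a sketch; large finite inputs stand in for the limits):

```python
import math

k = 1 / math.pi

def F(x):
    return k * math.atan(x) + 0.5

# arctan tends to +/- pi/2, so F tends to k*(pi/2) + 1/2 = 1
# as x -> infinity and to -k*(pi/2) + 1/2 = 0 as x -> -infinity.
assert abs(F(1e9) - 1.0) < 1e-6
assert abs(F(-1e9) - 0.0) < 1e-6
```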
A random variable is called a continuous random variable if its CDF is continuous.
If X is a continuous random variable, then
Pr(X ≤ x) = Pr(X < x),
Pr(X = x) = 0,
Pr(X ≥ x) = Pr(X > x),
Pr(a ≤ X ≤ b) = Pr(a < X ≤ b) = Pr(a ≤ X < b) = Pr(a < X < b).
for all real numbers x, a, b. We will prove Pr(X = x) = 0 below; all the other formulas follow
easily from it.
For a continuous random variable X whose cumulative distribution function F is differentiable, the
probability density function (PDF) of X is defined to be
f(x) = d/dx F(x),
and, moreover,
Pr(X ≤ x) = F(x) = ∫_{−∞}^{x} f(t) dt,
Pr(a ≤ X ≤ b) = F(b) − F(a) = ∫_{a}^{b} f(x) dx
for all real numbers x, a, b with a ≤ b.
A function f is a PDF of some random variable if and only if the following properties hold.
(1) f is non-negative: f(x) ≥ 0 for all real numbers x.
(2) ∫_{−∞}^{∞} f(x) dx = 1.
Example. Let
f(x) = kx(1 − x)^2 if 0 ≤ x ≤ 1,
       0           otherwise
be the probability density function of a random variable X.
(a) Find the value of the constant k.
(b) Find the probability that 1/2 ≤ X < 1.
(c) Find the cumulative distribution function of X.
Part (a):
We must have ∫_{−∞}^{∞} f(x) dx = 1. Therefore
1 = ∫_{−∞}^{∞} f(x) dx = ∫_{0}^{1} kx(1 − x)^2 dx
  = k ∫_{0}^{1} (x − 2x^2 + x^3) dx = k [x^2/2 − (2/3)x^3 + x^4/4]_{0}^{1}
  = k (1/2 − 2/3 + 1/4) = k/12.
So k = 12.
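Part (a) can be verified numerically. The sketch below approximates the integral of f with a midpoint Riemann sum and confirms that k = 12 makes it 1:

```python
def f(x, k=12.0):
    # The density from the example, with the value of k found in part (a).
    return k * x * (1 - x) ** 2 if 0 <= x <= 1 else 0.0

def midpoint_integral(g, a, b, n=100_000):
    # Midpoint Riemann sum approximation of the integral of g over [a, b].
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

total = midpoint_integral(f, 0.0, 1.0)
assert abs(total - 1.0) < 1e-6
```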
Part (b):
Pr(1/2 ≤ X < 1) = ∫_{1/2}^{1} f(x) dx = 12 ∫_{1/2}^{1} x(1 − x)^2 dx
= 12 ∫_{1/2}^{1} (x − 2x^2 + x^3) dx = 12 [x^2/2 − (2/3)x^3 + x^4/4]_{1/2}^{1}
= 12 [(1/2 − 2/3 + 1/4) − ((1/2)(1/2)^2 − (2/3)(1/2)^3 + (1/4)(1/2)^4)]
= 12 (1/12 − (1/8 − 1/12 + 1/64))
= 5/16.
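The value 5/16 from part (b) can be checked the same way, by integrating the density numerically over [1/2, 1] (a sketch using a midpoint Riemann sum):

```python
def f(x):
    # The density from the example, with k = 12.
    return 12.0 * x * (1 - x) ** 2 if 0 <= x <= 1 else 0.0

def midpoint_integral(g, a, b, n=100_000):
    # Midpoint Riemann sum approximation of the integral of g over [a, b].
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

p = midpoint_integral(f, 0.5, 1.0)
assert abs(p - 5 / 16) < 1e-6  # 5/16 = 0.3125
```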
Part (c):
For x < 0,
F(x) = Pr(X ≤ x) = ∫_{−∞}^{x} f(t) dt = ∫_{−∞}^{x} 0 dt = 0.
For 0 ≤ x ≤ 1,
F(x) = Pr(X ≤ x) = ∫_{−∞}^{x} f(t) dt = 12 ∫_{0}^{x} t(1 − t)^2 dt
= 12 ∫_{0}^{x} (t − 2t^2 + t^3) dt = 12 [t^2/2 − (2/3)t^3 + t^4/4]_{0}^{x}
= 12 (x^2/2 − (2/3)x^3 + x^4/4).
For x > 1,
F(x) = Pr(X ≤ x) = ∫_{−∞}^{x} f(t) dt
= ∫_{−∞}^{0} f(t) dt + ∫_{0}^{1} f(t) dt + ∫_{1}^{x} f(t) dt
= ∫_{−∞}^{0} 0 dt + 12 ∫_{0}^{1} t(1 − t)^2 dt + ∫_{1}^{x} 0 dt
= 12 ∫_{0}^{1} (t − 2t^2 + t^3) dt = 12 [t^2/2 − (2/3)t^3 + t^4/4]_{0}^{1}
= 12 (1/2 − 2/3 + 1/4) = 1.
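The closed-form CDF from part (c) can be compared against a numerical integral of the density at a few sample points (a sketch):

```python
def f(t):
    # The density from the example, with k = 12.
    return 12.0 * t * (1 - t) ** 2 if 0 <= t <= 1 else 0.0

def F_closed(x):
    # The piecewise CDF derived in part (c).
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return 12.0 * (x ** 2 / 2 - 2 * x ** 3 / 3 + x ** 4 / 4)

def midpoint_integral(g, a, b, n=100_000):
    # Midpoint Riemann sum approximation of the integral of g over [a, b].
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# The density vanishes below 0, so integrating from -1 suffices.
for x in (-0.5, 0.25, 0.5, 0.75, 1.0, 2.0):
    numeric = midpoint_integral(f, -1.0, x)
    assert abs(F_closed(x) - numeric) < 1e-5
```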
Part (b) again:
Remember that
Pr(a ≤ X ≤ b) = F(b) − F(a) = ∫_{a}^{b} f(x) dx,
where F is the CDF of X and f is the PDF of X. So if we had done part (c) before part (b), we
could have used that
F(x) = 0                                   if x < 0,
       12 (x^2/2 − (2/3)x^3 + x^4/4)       if 0 ≤ x ≤ 1,
       1                                   if x > 1
to compute
Pr(1/2 ≤ X < 1) = F(1) − F(1/2)
= 12 ((1/2)(1)^2 − (2/3)(1)^3 + (1/4)(1)^4) − 12 ((1/2)(1/2)^2 − (2/3)(1/2)^3 + (1/4)(1/2)^4)
= 5/16.
Example. Find the probability density function f(x) for a random variable with cumulative
distribution function
F(x) = 1 − e^{−x/2} if x ≥ 0,
       0            if x < 0.
For x > 0,
f(x) = d/dx F(x) = d/dx (1 − e^{−x/2}) = (1/2) e^{−x/2}.
For x < 0,
f(x) = d/dx F(x) = d/dx 0 = 0.
F(x) is not differentiable at 0, but it’s okay to have f(x) undefined at finitely many points because
this won’t affect the integral of f. In conclusion,
f(x) = (1/2) e^{−x/2} if x > 0,
       0              if x < 0.
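A sketch of a numerical sanity check for this example: away from 0, a central-difference approximation of F′(x) should agree with the f(x) found above.

```python
import math

def F(x):
    # The CDF from the example.
    return 1 - math.exp(-x / 2) if x >= 0 else 0.0

def f(x):
    # The PDF computed by differentiating F.
    return 0.5 * math.exp(-x / 2) if x > 0 else 0.0

# Central differences approximate the derivative wherever F is differentiable.
h = 1e-6
for x in (-1.0, 0.5, 1.0, 3.0, 10.0):
    deriv = (F(x + h) - F(x - h)) / (2 * h)
    assert abs(deriv - f(x)) < 1e-5
```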
Let X be a continuous random variable with probability density function f .
The expected value (or expectation or mean or average) of X is
E(X) = ∫_{−∞}^{∞} x f(x) dx.
The variance of X is
Var(X) = ∫_{−∞}^{∞} (x − E(X))^2 f(x) dx = E(X^2) − (E(X))^2.
The standard deviation of X is
σ(X) = √Var(X).
Here is a useful fact (see below for a proof):
E(X^2) = ∫_{−∞}^{∞} x^2 f(x) dx.
The expected value of a random variable is a measure of the center of its distribution. The variance
and standard deviation of a random variable are measures of the dispersion (or horizontal spread) of
its distribution.
Example. Find the standard deviation of the random variable X with probability density function
f(x) = 3/(x + 1)^4 if x ≥ 0,
       0           if x < 0.
First we find the expected value:
E(X) = ∫_{−∞}^{∞} x f(x) dx
= 3 ∫_{0}^{∞} x(x + 1)^{−4} dx
= 3 lim_{t→∞} ∫_{0}^{t} x(x + 1)^{−4} dx
= 3 lim_{t→∞} ∫_{1}^{t+1} (u − 1)u^{−4} du        (substituting u = x + 1)
= 3 lim_{t→∞} ∫_{1}^{t+1} (u^{−3} − u^{−4}) du
= 3 lim_{t→∞} [−(1/2)u^{−2} + (1/3)u^{−3}]_{1}^{t+1}
= 3 lim_{t→∞} (−(1/2)(t + 1)^{−2} + (1/3)(t + 1)^{−3} − (−1/2 + 1/3))
= 3 (−0 + 0 + 1/2 − 1/3)
= 1/2.
Then we compute
E(X^2) = ∫_{−∞}^{∞} x^2 f(x) dx
= 3 ∫_{0}^{∞} x^2 (x + 1)^{−4} dx
= 3 lim_{t→∞} ∫_{0}^{t} x^2 (x + 1)^{−4} dx
= 3 lim_{t→∞} ∫_{1}^{t+1} (u − 1)^2 u^{−4} du        (substituting u = x + 1)
= 3 lim_{t→∞} ∫_{1}^{t+1} (u^{−2} − 2u^{−3} + u^{−4}) du
= 3 lim_{t→∞} [−u^{−1} + u^{−2} − (1/3)u^{−3}]_{1}^{t+1}
= 3 lim_{t→∞} (−(t + 1)^{−1} + (t + 1)^{−2} − (1/3)(t + 1)^{−3} − (−1 + 1 − 1/3))
= 3 (−0 + 0 − 0 + 1/3)
= 1.
Therefore
Var(X) = E(X^2) − (E(X))^2 = 1 − (1/2)^2 = 3/4
and
σ(X) = √Var(X) = √3/2.
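The three values E(X) = 1/2, E(X²) = 1, and σ(X) = √3/2 can be checked numerically. The sketch below truncates the improper integrals at a large cutoff T; the neglected tails are of order 1/T, so the error stays well inside the tolerance.

```python
def f(x):
    # The density from the example.
    return 3.0 / (x + 1) ** 4 if x >= 0 else 0.0

def midpoint_integral(g, a, b, n=400_000):
    # Midpoint Riemann sum approximation of the integral of g over [a, b].
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

T = 10_000.0  # truncation point for the improper integrals
mean = midpoint_integral(lambda x: x * f(x), 0.0, T)
second_moment = midpoint_integral(lambda x: x * x * f(x), 0.0, T)
variance = second_moment - mean ** 2
sigma = variance ** 0.5

assert abs(mean - 0.5) < 1e-3
assert abs(second_moment - 1.0) < 1e-3
assert abs(sigma - 3 ** 0.5 / 2) < 1e-3
```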
Proofs of Two Facts
Fact 1. If X is a continuous random variable, then Pr(X = c) = 0 for all real numbers c.
If X has a probability density function, the proof is easy:
Pr(X = c) = Pr(c ≤ X ≤ c) = ∫_{c}^{c} f(x) dx = 0.
If X is a continuous random variable without a probability density function, we have to work harder.
We will need to recall the definition of a continuous random variable and a property of limits.
Recall: X is a continuous random variable if its CDF, F (x) = Pr(X ≤ x), is continuous.
Recall: If m ≤ g(x) for all x < c, then m ≤ lim_{x→c−} g(x).
Proof of Fact 1.
Step 1. For any x < c,
Pr(X ≤ c) = Pr(X ≤ x or x < X ≤ c) = Pr(X ≤ x) + Pr(x < X ≤ c)
and therefore
Pr(x < X ≤ c) = Pr(X ≤ c) − Pr(X ≤ x) = F (c) − F (x).
Step 2. For any x < c, if X = c, then x < X ≤ c. So X = c cannot be more likely than
x < X ≤ c. Therefore
Pr(X = c) ≤ Pr(x < X ≤ c).
Step 3.
Pr(X = c) = lim_{x→c−} Pr(X = c)
≤ lim_{x→c−} Pr(x < X ≤ c)
= lim_{x→c−} (F(c) − F(x))
= F(c) − lim_{x→c−} F(x)
= F(c) − F(c)
= 0.
Fact 2. If X is a continuous random variable with probability density function f, then
E(X^2) = ∫_{−∞}^{∞} x^2 f(x) dx.
Remark. Fact 2 is a special case of a more general formula called the law of the unconscious
statistician,
E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx,
which is valid whenever the integral on the right converges.
Proof of Fact 2. Let F and f be, respectively, the CDF and PDF of X. Let G and g be, respectively,
the CDF and PDF of the random variable X 2 .
If x < 0,
G(x) = Pr(X^2 ≤ x) = 0.
If x ≥ 0,
G(x) = Pr(X^2 ≤ x) = Pr(|X| ≤ √x) = Pr(−√x ≤ X ≤ √x) = F(√x) − F(−√x).
Now we compute g(x).
If x < 0,
g(x) = d/dx G(x) = d/dx 0 = 0.
If x > 0,
g(x) = d/dx G(x) = d/dx (F(√x) − F(−√x))
= F′(√x) d/dx(√x) − F′(−√x) d/dx(−√x)
= F′(√x)/(2√x) + F′(−√x)/(2√x)
= (1/(2√x)) (f(√x) + f(−√x)).
We don’t need to worry about g(x) at x = 0, since the value of g at one point won’t affect the
integral of g. We can now compute
E(X^2) = ∫_{−∞}^{∞} x g(x) dx
= ∫_{0}^{∞} (x/(2√x)) (f(√x) + f(−√x)) dx
= (1/2) ∫_{0}^{∞} √x (f(√x) + f(−√x)) dx
= lim_{t→∞} (1/2) ∫_{0}^{t} √x (f(√x) + f(−√x)) dx.
Making the substitution
u = √x, u^2 = x, 2u du = dx (x = 0 ⇒ u = 0, x = t ⇒ u = √t)
gives
E(X^2) = lim_{t→∞} ∫_{0}^{√t} u^2 (f(u) + f(−u)) du
= lim_{t→∞} (∫_{0}^{√t} u^2 f(u) du + ∫_{0}^{√t} u^2 f(−u) du).
In the second integral we make the substitution v = −u to get
E(X^2) = lim_{t→∞} (∫_{0}^{√t} u^2 f(u) du − ∫_{0}^{−√t} v^2 f(v) dv)
= lim_{t→∞} (∫_{0}^{√t} u^2 f(u) du + ∫_{−√t}^{0} v^2 f(v) dv)
= lim_{t→∞} ∫_{0}^{√t} u^2 f(u) du + lim_{t→∞} ∫_{−√t}^{0} v^2 f(v) dv
= ∫_{0}^{∞} x^2 f(x) dx + ∫_{−∞}^{0} x^2 f(x) dx
= ∫_{−∞}^{∞} x^2 f(x) dx.
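The law of the unconscious statistician can also be sanity-checked by Monte Carlo. The sketch below uses the CDF F(x) = 1 − e^{−x/2} from the earlier example (an exponential distribution with rate 1/2, for which E(X) = 2 and E(X²) = Var(X) + (E(X))² = 4 + 4 = 8) and compares the sample average of X² with that value:

```python
import random

random.seed(42)
n = 200_000

# Draw from the distribution with CDF 1 - e^{-x/2},
# i.e. an exponential distribution with rate 1/2.
samples = [random.expovariate(0.5) for _ in range(n)]

# E(g(X)) with g(x) = x^2, estimated as the average of g over the samples.
estimate = sum(x * x for x in samples) / n

assert abs(estimate - 8.0) < 0.5
```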