Probability

19th September 2016
lecture based on
Hogg – Tanis – Zimmerman: Probability and Statistical Inference (9th ed.)
Properties of Probability
Methods of Enumeration
Conditional Probability
Independent Events
Bayes’ Theorem
what statisticians do:
problem/situation that needs to be considered
(measurement problem)
collect data through observations
summarize the results
(descriptive statistics, graphical methods)
analyze the situation
(statistical inferences)
report with recommendations
statistics deals with the collection
and analysis of data
variability in results
statisticians try to find a pattern in data
and deal with errors
decision makers can decide
upon confidence in the analyzed data
variability is a fact of life
(variability is inherent)
decisions have to involve uncertainties
examples
• meteorology
• seismology
• medical research
• state legislature
(weather forecast)
(seismic hazard analysis)
(drug/vaccine testing)
(speed limit vs. no. of accidents)
• economy
(estimation of unemployment rate)
how to make sense of the observations?
sometimes the answers are obvious …
however, in many investigations
appropriate probability and statistical models
are needed
random experiments
experiments for which the outcome
cannot be predicted with certainty
although the specific outcome of a r.e. cannot be predicted
before the experiment is performed,
the collection of all possible outcomes is known (can be described)
outcome space S
the collection of all possible outcomes
event A
A is a part of the collection of outcomes in S, A ⊂ S
words set and event
are interchangeable
algebra of sets
∅	null or empty set
A ⊂ B	A is a subset of B
A ∪ B	union of A and B
A ∩ B	intersection of A and B
A'	complement of A (all elements in S that are not in A)
Venn diagrams
A1, A2, …, Ak are mutually exclusive events if
Ai ∩ Aj = ∅, i ≠ j
that is, A1, A2, …, Ak are disjoint sets
A1, A2, …, Ak are exhaustive events if
A1 ∪ A2 ∪ … ∪ Ak = S
if A1, A2, …, Ak are mutually exclusive and exhaustive events, then
Ai ∩ Aj = ∅, i ≠ j and A1 ∪ A2 ∪ … ∪ Ak = S
Commutative Laws
A ∪ B = B ∪ A
A ∩ B = B ∩ A
Associative Laws
(A ∪ B) ∪ C = A ∪ (B ∪ C)
(A ∩ B) ∩ C = A ∩ (B ∩ C)
Distributive Laws
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
De Morgan's Laws
(A ∪ B)' = A' ∩ B'
(A ∩ B)' = A' ∪ B'
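These identities can be checked directly with Python's built-in set type; the outcome space S and the events A and B below are small illustrative sets of our own, not from the lecture:

```python
# Check De Morgan's laws with Python sets (illustrative values).
S = set(range(1, 11))          # outcome space
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def complement(X):
    """X' = all elements of S that are not in X."""
    return S - X

# (A ∪ B)' = A' ∩ B'
assert complement(A | B) == complement(A) & complement(B)
# (A ∩ B)' = A' ∪ B'
assert complement(A & B) == complement(A) | complement(B)
print("De Morgan's laws hold for this example")
```

The `|`, `&`, and `-` operators on sets correspond directly to union, intersection, and set difference (which gives the complement relative to S).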
P(A)
probability of event A
(the chance of A occurring)
consider repeating the experiment n times
N(A)
number of times that event A occurred throughout n performances
the ratio N(A)/n is the relative frequency of event A in n repetitions
relative frequency is usually very unstable for small values of n
number p
number about which the relative frequency tends to stabilize
(number that the relative frequency will be close to in future performances)
although we cannot predict the outcome of a random experiment with certainty,
if we know p (for a large number of n),
we can predict fairly accurately the relative frequency of event A

p is called the probability of event A
P(A)
proportion of outcomes of a random experiment
that terminate in the event A
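The stabilization of N(A)/n can be seen in a short simulation; the experiment (one roll of a fair die, with A = "roll a six", so p = 1/6) and the seed are our own illustrative choices:

```python
import random

# Relative frequency N(A)/n of the event A = "roll a six" in n die rolls.
# For small n the ratio is unstable; for large n it settles near p = 1/6.
random.seed(0)  # fixed seed so the run is reproducible (an assumption)

def relative_frequency(n):
    hits = sum(1 for _ in range(n) if random.randint(1, 6) == 6)  # N(A)
    return hits / n                                               # N(A)/n

for n in (10, 100, 10_000):
    print(n, relative_frequency(n))
```

Each run prints ratios that wander for n = 10 but cluster near 0.1667 by n = 10 000.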
set function
a function such as P(A) that is evaluated for a set A
PROPERTIES
N(A)/n is always nonnegative
if A = S, the outcome of the experiment will always belong to S
N(A)/n = 1
if A and B are mutually exclusive events
N(A ∪ B)/n = N(A)/n + N(B)/n
DEFINITION
Probability
is a real-valued set function P that assigns, to each event A in the sample space S,
a number P(A) called the probability of the event A,
such that the following properties are satisfied
(a) P(A) ≥ 0
(b) P(S) = 1
(c) if A1, A2, …, Ak are events and Ai ∩ Aj = ∅, i ≠ j, then
P(A1 ∪ A2 ∪ … ∪ Ak) = P(A1) + P(A2) + … + P(Ak)
for each positive integer k
P(A1 ∪ A2 ∪ …) = P(A1) + P(A2) + …
for an infinite, but countable, number of events
THEOREMS
1. For each event A,
P(A) = 1 – P(A’).
2. P(∅) = 0.
3. If events A and B are such that A ⊂ B, then P(A) ≤ P(B).
4. For each event A, P(A) ≤ 1.
5. If A and B are any two events, then
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
6. If A, B, and C are any three events, then
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).
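Theorem 5 can be verified by exact counting in an equally likely sample space; the single die roll and the events A, B below are our own toy example:

```python
# Check P(A ∪ B) = P(A) + P(B) − P(A ∩ B) by counting outcomes
# of one fair die roll (illustrative events, not from the text).
S = set(range(1, 7))
A = {2, 4, 6}          # "outcome is even"
B = {4, 5, 6}          # "outcome is at least four"

def P(E):
    return len(E) / len(S)

lhs = P(A | B)                      # P(A ∪ B) = 4/6
rhs = P(A) + P(B) - P(A & B)        # 3/6 + 3/6 − 2/6
assert abs(lhs - rhs) < 1e-12
print(lhs)
```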
PROOF
1. For each event A,
P(A) = 1 – P(A’).
from Venn diagrams, we have
S = A ∪ A'
and
A ∩ A' = ∅
from properties (b) and (c) for probability, it follows that
1 = P(A) + P(A')
then
P(A) = 1 – P(A’)
PROOF
2. P(∅) = 0.
in the previous theorem P(A) = 1 − P(A'), take A = ∅ so that A' = S
then
P(∅) = 1 − P(S) = 1 − 1 = 0
PROOF
3. If events A and B are such that A ⊂ B, then P(A) ≤ P(B).
we have
B = A ∪ (B ∩ A') and A ∩ (B ∩ A') = ∅
from property (c) for probability, it follows that
P(B) = P(A) + P(B ∩ A') ≥ P(A)
because, from property (a) for probability, we have
P(B ∩ A') ≥ 0
PROOF
4. For each event A, P(A) ≤ 1.
since A ⊂ S, we have by theorem (3) and property (b) for probability
P(A) ≤ P(S) = 1
which gives the desired result
PROOF
5. If A and B are any two events, then
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
from Venn diagrams, the event A ∪ B can be represented as a union of mutually exclusive events,
A ∪ B = A ∪ (A' ∩ B)
from property (c) for probability
P(A ∪ B) = P(A) + P(A' ∩ B)
however
B = (A ∩ B) ∪ (A' ∩ B),
which is a union of mutually exclusive events
thus
P(B) = P(A ∩ B) + P(A' ∩ B)
solving the second equation for P(A' ∩ B) and substituting into the first gives the desired result
let S = {e1, e2, …, em},
ei is a possible outcome of the experiment
integer m is called the total number of ways
in which the random experiment can terminate
if each of these outcomes has the same probability of occurring,
we say that the m outcomes are equally likely
P (ei) = 1/𝑚 , i = 1, 2, … m
if the number of outcomes in an event A is h, then the integer h
is called the number of ways that are favorable to the event A
P(A) = h/m = N(A)/N(S)
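The formula P(A) = h/m for equally likely outcomes can be applied by direct enumeration; the two-dice experiment below is our own illustration:

```python
from itertools import product

# Equally likely outcomes: P(A) = h/m = N(A)/N(S).
# Example: roll two fair dice, A = "the sum is 7".
S = list(product(range(1, 7), repeat=2))   # m = 36 equally likely outcomes
A = [o for o in S if sum(o) == 7]          # h = 6 favorable outcomes

print(len(A), len(S), len(A) / len(S))     # 6 36 0.16666666666666666
```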
Properties of Probability
Methods of Enumeration
Conditional Probability
Independent Events
Bayes’ Theorem
Multiplication principle
Suppose that experiment Ei has ni possible outcomes, i = 1, 2, …, m,
after previous experiments have been performed.
Then the composite experiment E1 E2 … Em
that consists of performing E1, then E2 … and finally Em
has n1 n2 … nm possible outcomes.
Permutation
Suppose that n positions are to be filled with n different objects.
There are n choices for filling the first position, n - 1 for the second, …
and 1 choice for the last position.
By the multiplication principle, there are
n (n − 1) … (2)(1) = n!
possible arrangements.
Each of the n! arrangements (in a row) of n different objects
is called a permutation of n objects.
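The count n! can be cross-checked by brute-force enumeration for a small n; the four-object set below is our own choice:

```python
from itertools import permutations
from math import factorial

# Number of arrangements (in a row) of n different objects is n!.
objects = ["a", "b", "c", "d"]                 # n = 4, illustrative
n = len(objects)

arrangements = list(permutations(objects))     # enumerate every ordering
assert len(arrangements) == factorial(n)       # 24 = 4!
print(factorial(n))
```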
Permutation
If only r positions are to be filled with objects
selected from n different objects, r ≤ n,
then the number of possible ordered arrangements is
nPr = n (n − 1)(n − 2) … (n − r + 1)
Each of the nPr arrangements
is called a permutation of n objects taken r at a time.
nPr = [n (n − 1) … (n − r + 1)(n − r) … (3)(2)(1)] / [(n − r) … (3)(2)(1)] = n! / (n − r)!
a set contains n objects
consider the problem of drawing r objects from the set
the order in which the objects are drawn may/may not be important
a drawn object is/is not replaced before the next object is drawn
If r objects are selected from a set of n objects,
and if the order of selection is noted,
then the selected set of r objects is called an ordered sample of size r.
Sampling with replacement occurs when an object is selected
and then replaced before the next object is selected.
The number of possible ordered samples of size r
taken from a set of n objects is nr .
(multiplication principle)
sampling without replacement
occurs when an object is not replaced after it has been selected
By the multiplication principle, the number of possible ordered samples of size r
taken from a set of n objects without replacement is
n (n − 1) … (n − r + 1) = n! / (n − r)! = nPr
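Both counts of ordered samples can be checked by enumeration; the values n = 5, r = 3 are our own illustration:

```python
from itertools import permutations, product
from math import perm

# Ordered samples of size r from n objects:
#   with replacement:    n**r   (multiplication principle)
#   without replacement: nPr = n!/(n − r)!
n, r = 5, 3
objects = range(n)

with_repl = len(list(product(objects, repeat=r)))       # enumerate n**r samples
without_repl = len(list(permutations(objects, r)))      # enumerate nPr samples

assert with_repl == n**r          # 125
assert without_repl == perm(n, r) # 60
print(with_repl, without_repl)
```

`math.perm(n, r)` computes n!/(n − r)! directly (Python 3.8+).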
often the order of selection is not important
and interest centers only on the selected set of r objects
we are interested in a number of subsets of size r
that can be selected from a set of n different objects
each of the nCr unordered subsets is called
a combination of n objects taken r at a time, where
nCr = C(n, r) = n! / (r! (n − r)!)
the numbers C(n, r) are called binomial coefficients,
since they appear in the expansion of the binomial (a + b)^n:
(a + b)^n = Σ_{r=0}^{n} C(n, r) b^r a^(n−r)
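The binomial theorem can be checked numerically with `math.comb`; the values of a, b, and n below are arbitrary illustrative choices:

```python
from math import comb

# Numeric check of (a + b)**n = Σ C(n, r) * b**r * a**(n − r).
a, b, n = 2.0, 3.0, 5

expansion = sum(comb(n, r) * b**r * a**(n - r) for r in range(n + 1))
assert abs(expansion - (a + b)**n) < 1e-9
print(expansion)   # 3125.0, i.e. 5**5
```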
suppose that in a set of n objects,
n1 are similar, n2 are similar, … ns are similar,
where n1 + n2 + … + ns = n
the number of distinguishable permutations of the n objects is
(n; n1, n2, …, ns) = n! / (n1! n2! … ns!)
the coefficients are sometimes called multinomial coefficients,
since they appear in the expansion of (a1 + a2 + … + as)^n
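The multinomial count of distinguishable permutations can be verified by enumerating and de-duplicating; the word "AABBB" (n = 5, n1 = 2, n2 = 3) is our own example:

```python
from itertools import permutations
from math import factorial

# Distinguishable permutations of "AABBB": 5! / (2! 3!) = 10.
word = "AABBB"
counts = [word.count(c) for c in set(word)]   # group sizes n1, n2, …

formula = factorial(len(word))
for c in counts:
    formula //= factorial(c)                  # divide out repeated groups

brute = len(set(permutations(word)))          # enumerate, then de-duplicate
assert formula == brute == 10
print(formula)
```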
Properties of Probability
Methods of Enumeration
Conditional Probability
Independent Events
Bayes’ Theorem
we are interested only in those outcomes
which are elements of a subset B of the sample space S
we are confronted with the problem of defining
a probability set function with B as the “new” sample space
for a given event A we want to define P(A|B)
the probability of A, considering only those outcomes
of the random experiment that are elements of sample space B
P(A|B) = N(A ∩ B) / N(B) = [N(A ∩ B)/N(S)] / [N(B)/N(S)] = P(A ∩ B) / P(B)
DEFINITION
The conditional probability of an event A,
given that event B has occurred, is defined by
P(A|B) = P(A ∩ B) / P(B),
provided that P(B) > 0.
Conditional probability satisfies the axioms for a probability function, with P(B) > 0 :
1. P(A|B) ≥ 0
2. P(B|B) = 1
3. if A1, A2, …, Ak are mutually exclusive events, then
P(A1 ∪ A2 ∪ … ∪ Ak | B) = P(A1|B) + P(A2|B) + … + P(Ak|B)
for each positive integer k
P(A1 ∪ A2 ∪ … | B) = P(A1|B) + P(A2|B) + …
for an infinite, but countable, number of events
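The definition P(A|B) = P(A ∩ B)/P(B) can be computed exactly by counting; the single die roll and the events below are our own illustration:

```python
from fractions import Fraction

# Conditional probability by counting: one fair die roll,
# B = "outcome is even", A = "outcome is greater than 3".
S = set(range(1, 7))
A = {4, 5, 6}
B = {2, 4, 6}

def P(E):
    return Fraction(len(E), len(S))   # exact probability in an equally likely space

p_cond = P(A & B) / P(B)              # (2/6) / (3/6) = 2/3
assert p_cond == Fraction(2, 3)
print(p_cond)
```

Using `Fraction` keeps the arithmetic exact, mirroring the h/m counting argument.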
once we know the conditional probability P(B|A),
we can determine the probability of the intersection of two events
DEFINITION
The probability that two events, A and B, both occur
is given by the multiplication rule,
P(A ∩ B) = P(A) P(B|A),
provided P(A) > 0
or by
P(A ∩ B) = P(B) P(A|B),
provided P(B) > 0
the multiplication rule can be extended to three or more events
P(A ∩ B ∩ C) = P[(A ∩ B) ∩ C] = P(A ∩ B) P(C|A ∩ B)
P(A ∩ B) = P(A) P(B|A)
P(A ∩ B ∩ C) = P(A) P(B|A) P(C|A ∩ B)
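The extended multiplication rule can be applied to a standard counting problem; the card-drawing example below is our own illustration:

```python
from fractions import Fraction

# P(A ∩ B ∩ C) = P(A) P(B|A) P(C|A ∩ B).
# Draw three cards without replacement from a 52-card deck;
# A, B, C = "first / second / third card is a spade".
p = Fraction(13, 52) * Fraction(12, 51) * Fraction(11, 50)
assert p == Fraction(11, 850)
print(p)   # 11/850
```

Each factor is a conditional probability: after a spade is drawn, 12 of the remaining 51 cards are spades, and so on.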
Properties of Probability
Methods of Enumeration
Conditional Probability
Independent Events
Bayes’ Theorem
for certain pairs of events, the occurrence of one of them
may or may not change the probability of the occurrence of the other
DEFINITION
Events A and B are independent if and only if P(A ∩ B) = P(A) P(B).
Otherwise, A and B are called dependent events.
statistically independent
stochastically independent
independent in a probabilistic sense
THEOREM
If A and B are independent events,
then the following pairs of events are also independent:
A and B’
A’ and B
A’ and B’
DEFINITION
Events A, B and C are mutually independent
if and only if the following two conditions hold:
1. A, B, and C are pairwise independent, that is
P(A ∩ B) = P(A) P(B) , P(A ∩ C) = P(A) P(C) , P(B ∩ C) = P(B) P(C)
2. P(A ∩ B ∩ C) = P(A) P(B) P(C)
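Condition 2 does not follow from condition 1, which the classic two-dice counterexample (our own choice, not from the slides) demonstrates:

```python
from itertools import product
from fractions import Fraction

# Pairwise independence does not imply mutual independence.
# Two fair dice; A = "first die odd", B = "second die odd", C = "sum odd".
S = list(product(range(1, 7), repeat=2))

def P(pred):
    return Fraction(sum(1 for o in S if pred(o)), len(S))

A = lambda o: o[0] % 2 == 1
B = lambda o: o[1] % 2 == 1
C = lambda o: (o[0] + o[1]) % 2 == 1

def both(f, g):
    return lambda o: f(o) and g(o)

# pairwise independent: each pair satisfies P(X ∩ Y) = P(X) P(Y)
assert P(both(A, B)) == P(A) * P(B)
assert P(both(A, C)) == P(A) * P(C)
assert P(both(B, C)) == P(B) * P(C)
# but not mutually independent: two odd dice cannot give an odd sum
assert P(lambda o: A(o) and B(o) and C(o)) != P(A) * P(B) * P(C)
print("pairwise but not mutually independent")
```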
Properties of Probability
Methods of Enumeration
Conditional Probability
Independent Events
Bayes’ Theorem
let B1, B2, …, Bm constitute a partition of the sample space S
S = B1 ∪ B2 ∪ … ∪ Bm and Bi ∩ Bj = ∅, i ≠ j
events B1, B2, …, Bm are mutually exclusive and exhaustive
suppose that the prior probability of the event Bi is positive, P(Bi) > 0, i = 1, …, m
if A is an event, then A is the union of m mutually exclusive events
A = (B1 ∩ A) ∪ (B2 ∩ A) ∪ … ∪ (Bm ∩ A)
P(A) = Σ_{i=1}^{m} P(Bi ∩ A) = Σ_{i=1}^{m} P(Bi) P(A|Bi)
law of total probability
If P(A) > 0, then
P(Bk|A) = P(Bk ∩ A) / P(A), k = 1, 2, …, m
Bayes’ theorem
P(Bk|A) = P(Bk) P(A|Bk) / Σ_{i=1}^{m} P(Bi) P(A|Bi), k = 1, 2, …, m
the conditional probability P(Bk|A)
is often called the posterior probability of Bk
EXAMPLE
Compute the probability that building X collapses during the next earthquake in the region.
We do not know exactly if the next earthquake will be strong, medium or weak, but we estimated
the following probabilities:
P(strong) = 0.01, P(medium) = 0.1, P(weak) = 0.89
Additionally, structural engineers have performed analyses and estimated the following:
P(collapse|strong) = 0.9, P(collapse|medium) = 0.2, P(collapse|weak) = 0.01
law of total probability:
P(collapse) = P(collapse|strong) P(strong) + P(collapse|medium) P(medium) + P(collapse|weak) P(weak)
P(collapse) = 0.9 × 0.01 + 0.2 × 0.1 + 0.01 × 0.89 = 0.0379
the law of total probability allows us to break the problem into two parts (size of the earthquake, capacity of the building)
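The total-probability sum from this example can be written as a few lines of code (the dictionaries just restate the numbers given above):

```python
# Law of total probability for the collapse example:
# P(collapse) = Σ P(size) * P(collapse | size).
priors = {"strong": 0.01, "medium": 0.1, "weak": 0.89}
p_collapse_given = {"strong": 0.9, "medium": 0.2, "weak": 0.01}

p_collapse = sum(priors[s] * p_collapse_given[s] for s in priors)
print(round(p_collapse, 4))   # 0.0379
```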
EXAMPLE
Suppose that an earthquake occurred and building X collapsed.
Bayes' theorem:
P(strong|collapse) = P(collapse|strong) P(strong) / P(collapse) = (0.9 × 0.01) / 0.0379 ≈ 0.24
Bayes' theorem provides a valuable calculation approach for combining pieces of information
to compute a probability that may be difficult to determine directly
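The posterior computation from the example can be sketched as follows (the numbers restate the example above):

```python
# Bayes' theorem for the collapse example: the posterior probability
# that the earthquake was strong, given that building X collapsed.
priors = {"strong": 0.01, "medium": 0.1, "weak": 0.89}
likelihood = {"strong": 0.9, "medium": 0.2, "weak": 0.01}

# denominator: law of total probability, P(collapse) = 0.0379
p_collapse = sum(priors[s] * likelihood[s] for s in priors)

posterior_strong = priors["strong"] * likelihood["strong"] / p_collapse
print(round(posterior_strong, 2))   # 0.24
```

A rare event (strong earthquake, prior 0.01) becomes much more probable (≈ 0.24) once the strong evidence of a collapse is observed.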