Probability
Lotto
I am offered two lotto cards:
– Card 1: has numbers
– Card 2: has numbers
Which card should I take so that I have the
greatest chance of winning lotto?
Roulette
In the casino I wait at the roulette wheel until
I see a run of at least five reds in a row.
I then bet heavily on a black.
I am now more likely to win.
Coin Tossing
I am about to toss a coin 20 times.
What do you expect to happen?
Suppose that the first four tosses have been
heads and there are no tails so far. What do
you expect will have happened by the end of
the 20 tosses ?
Coin Tossing
• Option A
– Still expect to get 10 heads and 10 tails. Since
there are already 4 heads, now expect to get 6
heads from the remaining 16 tosses. In the next
few tosses, expect to get more tails than heads.
• Option B
– There are 16 tosses to go. For these 16 tosses I
expect 8 heads and 8 tails. Now expect to get 12
heads and 8 tails for the 20 throws.
TV Game Show
• In a TV game show, a car will be given away.
– 3 keys are put on the table, with only one of them
being the right key. The 3 finalists are given a
chance to choose one key and the one who
chooses the right key will take the car.
– If you were one of the finalists, would you prefer
to be the 1st, 2nd or last to choose a key?
Let’s Make a Deal Game Show
• You pick one of three doors
– two have booby prizes behind them
– one has lots of money behind it
• The game show host then shows you a
booby prize behind one of the other doors
• Then he asks you “Do you want to change
doors?”
– Should you??! (Does it matter??!)
• See the following website:
• http://www.stat.sc.edu/~west/javahtml/LetsMakeaDeal.html
Game Show Dilemma
Suppose you choose door A. In which case
Monty Hall will show you either door B or C
depending upon what is behind each.
No Switch Strategy ~ here is what happens
Result   A      B      C
Win      Car    Goat   Goat
Lose     Goat   Car    Goat
Lose     Goat   Goat   Car
P(WIN) = 1/3
Game Show Dilemma
Suppose you choose door A, but ultimately
switch. Again Monty Hall will show you either
door B or C depending upon what is behind each.
Switch Strategy ~ here is what happens
Result   A      B      C      What happens
Lose     Car    Goat   Goat   Monty will show either B or C. You switch to the one not shown and lose.
Win      Goat   Car    Goat   Monty will show door C, you switch to B and win.
Win      Goat   Goat   Car    Monty will show door B, you switch to C and win.
P(WIN) = 2/3 !!!!
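The two strategies can be checked by listing every case, as a minimal sketch. It assumes the car is equally likely to be behind each door and that, when Monty has two goat doors to choose from, he opens either with equal probability. The function name monty_win_probability is made up for this illustration.

```python
from fractions import Fraction

DOORS = ("A", "B", "C")

def monty_win_probability(switch, pick="A"):
    """Exact P(win), assuming the car is equally likely behind each door."""
    wins = Fraction(0)
    for car in DOORS:
        # Monty only opens a goat door that is not the contestant's pick.
        openable = [d for d in DOORS if d != pick and d != car]
        for opened in openable:
            # Assumption: when Monty has a choice, each goat door is equally likely.
            weight = Fraction(1, len(DOORS)) * Fraction(1, len(openable))
            if switch:
                # Switch to the one door that is neither picked nor opened.
                final = next(d for d in DOORS if d not in (pick, opened))
            else:
                final = pick
            if final == car:
                wins += weight
    return wins

no_switch = monty_win_probability(switch=False)
with_switch = monty_win_probability(switch=True)
print(no_switch, with_switch)  # 1/3 2/3
```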
Matching Birthdays
• In a room with 23 people what is the
probability that at least two of them will have
the same birthday?
• Answer: .5073 or 50.73% chance!!!!!
• How about 30?
• .7063 or 71% chance!
• How about 40?
• .8912 or 89% chance!
• How about 50?
• .9704 or 97% chance!
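These figures can be reproduced with a short calculation. The sketch below assumes 365 equally likely birthdays and ignores leap years. It computes the chance that all n birthdays differ and subtracts it from 1. The function name p_shared_birthday is chosen for this illustration.

```python
def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), equally likely days assumed."""
    p_all_different = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different

for n in (23, 30, 40, 50):
    print(n, round(p_shared_birthday(n), 4))
```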
Probability
What is Chapter 6 trying to do?
– Introduce us to basic ideas about probabilities:
• what they are and where they come from
• simple probability models (genetics)
• conditional probabilities
• independent events
• Bayes’ Rule
Teach us how to calculate probabilities:
• tables of counts and using properties of
probabilities such as independence.
Probability
I toss a fair coin (where fair means ‘equally likely outcomes’)
What are the possible outcomes?
Head and tail ~ This is called a “dichotomous experiment” because
it has only two possible outcomes. S = {H,T}.
What is the probability it will turn up heads?
1/2
I choose a patient at random and observe whether they are
successfully treated.
What are the possible outcomes?
“Success” and “Failure”
What is the probability of successful treatment?
?????
What factors influence this probability? ?????
What are Probabilities?
• A probability is a number between 0 & 1
that quantifies uncertainty.
• A probability of 0 identifies impossibility
• A probability of 1 identifies certainty
Where do probabilities come from?
• Probabilities from models:
The probability of getting a four when a fair die
is rolled is
1/6 (0.1667 or 16.7% chance)
Where do probabilities come from?
• Probabilities from data
or Empirical probabilities
What is the probability that a randomly selected
patient is successfully treated?
– In a clinical trial n = 67 patients are “randomly”
selected.
– 40 of these patients are successfully treated.
– The estimated probability that a randomly chosen
patient will have a successful outcome is
40/67 (0.597 or 59.7% chance)
Where do probabilities come from?
• Subjective Probabilities
– The probability that there will be another
outbreak of ebola in Africa within the next year is
0.1.
– The probability of rain in the next 24 hours is very
high. Perhaps the weather forecaster might say
there is a 70% chance of rain.
– A doctor may state your chance of successful
treatment.
Simple Probability Models
“The probability that an event A occurs”
is written in shorthand as P(A).
For equally likely outcomes, and a given
event A:
P(A) = (Number of outcomes in A) / (Total number of outcomes)
1. Heart Disease
In 1996, 6631 Minnesotans died from
coronary heart disease. The numbers of
deaths classified by age and gender are:
                 Sex
Age        Male   Female   Total
< 45         79       13      92
45 - 64     772      216     988
65 - 74    1081      499    1580
> 74       1795     2176    3971
Total      3727     2904    6631
1. Heart Disease
Let
A be the event of being under 45
B be the event of being male
C be the event of being over 64
1. Heart Disease
Find the probability that a randomly chosen
member of this population at the time of death
was:
a) under 45
P(A) = 92/6631 = 0.0139
Conditional Probability
• We wish to find the probability of an event
occurring given information about the occurrence
of another event. For example, what is the
probability of developing lung cancer given
that we know the person smoked a pack of
cigarettes a day for the past 30 years?
• Key words that indicate conditional
probability are:
“given that”, “of those”, “if …”, “assuming
that”
Conditional Probability
“The probability of event A occurring given
that event B has already occurred”
is written in shorthand as P(A|B)
Conditional Probability and Independence
P(A|B) = P(A and B) / P(B), P(B) > 0
Two events A and B are said to be independent if
P(A|B) = P(A)
and
P(B|A) = P(B)
i.e. knowing the occurrence of one of the events tells
you nothing about the occurrence of the other.
1. Heart Disease
Find the probability that a randomly chosen
member of this population at the time of death
was:
b) male given that the person was younger than
45.
P(B|A) = P(A and B)/P(A)
       = (79/6631)/(92/6631)
       = 79/92 = 0.8587
1. Heart Disease
Find the probability that a randomly chosen
member of this population at the time of death
was:
c) male and was over 64.
P(B and C) = (1081 + 1795)/6631= 2876/6631=.434
1. Heart Disease
Find the probability that a randomly chosen
member of this population at the time of death
was:
d) over 64 given they were female (not B).
P(C|not B) = (499+2176)/2904 = .9211
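The heart disease answers a) to d) all come from the same table of counts, so they can be recomputed in one place. The sketch below is illustrative only: the dictionary layout and the helper count are assumptions made for this example, while the counts are the ones in the table. It also compares P(B|A) with P(B) to apply the independence definition.

```python
from fractions import Fraction

# Deaths by age group and sex, copied from the table.
deaths = {
    "<45":   {"Male": 79,   "Female": 13},
    "45-64": {"Male": 772,  "Female": 216},
    "65-74": {"Male": 1081, "Female": 499},
    ">74":   {"Male": 1795, "Female": 2176},
}

def count(ages=None, sex=None):
    """Number of deaths matching the given age groups and/or sex."""
    total = 0
    for age, row in deaths.items():
        if ages is not None and age not in ages:
            continue
        for s, n_cell in row.items():
            if sex is None or s == sex:
                total += n_cell
    return total

n = count()                      # 6631
under_45 = {"<45"}               # event A
over_64 = {"65-74", ">74"}       # event C

p_a = Fraction(count(ages=under_45), n)                                  # a)
p_b = Fraction(count(sex="Male"), n)                                     # P(B)
p_b_given_a = Fraction(count(ages=under_45, sex="Male"),
                       count(ages=under_45))                             # b)
p_b_and_c = Fraction(count(ages=over_64, sex="Male"), n)                 # c)
p_c_given_not_b = Fraction(count(ages=over_64, sex="Female"),
                           count(sex="Female"))                          # d)

# A and B are independent only if P(B|A) equals P(B).
independent = (p_b_given_a == p_b)

print(float(p_a), float(p_b_given_a), float(p_b_and_c), float(p_c_given_not_b))
print(float(p_b), independent)
```

Here P(B|A) is about 0.859 while P(B) is about 0.562, so being male and being under 45 at death are not independent in this population.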
2. Hodgkin’s Disease
                 Response to Treatment
Type             None   Partial   Positive   Row Totals
LD                 44        10         18           72
LP                 12        18         74          104
MC                 58        54        154          266
NS                 12        16         68           96
Column Totals     126        98        314      n = 538
2. Hodgkin’s Disease
a) Had positive response to treatment
P(pos) = 314/538 = .584 or 58.4% chance
2. Hodgkin’s Disease
b) Had at least some response to treatment
P(par or pos) = (98 + 314)/538 = 412/538
= .766 or 76.6% chance
2. Hodgkin’s Disease
c) Had LP and positive response to treatment
P(LP and pos) = 74/538 = .138 or 13.8%
2. Hodgkin’s Disease
d) Had LP or NS as their histological type.
P(LP or NS) = (104 + 96)/538 = .372 or 37.2% chance
2. Hodgkin’s Disease
What conditional probabilities would be of interest?
EXAMPLES IN NOTES
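The Hodgkin’s disease answers a) to d) can be recomputed from the table of counts, as in the sketch below. The last line computes P(positive | type) for each histological type as one possible conditional probability of interest. That choice is an assumption made for illustration and is not necessarily the example used in the notes.

```python
# Response counts by histological type, copied from the table.
counts = {
    "LD": {"None": 44, "Partial": 10, "Positive": 18},
    "LP": {"None": 12, "Partial": 18, "Positive": 74},
    "MC": {"None": 58, "Partial": 54, "Positive": 154},
    "NS": {"None": 12, "Partial": 16, "Positive": 68},
}

n = sum(sum(row.values()) for row in counts.values())  # 538

# a) positive response
p_pos = sum(row["Positive"] for row in counts.values()) / n
# b) at least some response (partial or positive)
p_some = sum(row["Partial"] + row["Positive"] for row in counts.values()) / n
# c) LP and positive response
p_lp_and_pos = counts["LP"]["Positive"] / n
# d) LP or NS (the two types cannot occur together, so the counts add)
p_lp_or_ns = (sum(counts["LP"].values()) + sum(counts["NS"].values())) / n

# One possible conditional probability: P(positive | type).
p_pos_given_type = {t: row["Positive"] / sum(row.values())
                    for t, row in counts.items()}

print(round(p_pos, 3), round(p_some, 3), round(p_lp_and_pos, 3), round(p_lp_or_ns, 3))
print({t: round(p, 3) for t, p in p_pos_given_type.items()})
```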
3. Right Heart Catheterization and 30-day
Mortality (Conners, et al. 1996)
                    Died within 30 days?
Catheter?           YES      NO    Row Totals
RHC                 830    1354          2184
No RHC             1088    2463          3551
Column Totals      1918    3817          5735

RHC = patient had catheter put in
No RHC = patient did not have catheter
YES = Died within 30 days
NO = Survived 30 days
What is the probability that a heart patient in this study
died?
P(YES) = 1918 / 5735 = .3344 or 33.44%
3. Right Heart Catheterization and 30-day
Mortality (Conners, et al. 1996)
What is the probability that a heart patient had a right
heart catheter put in during treatment?
P(RHC) = 2184 / 5735 = .3808 or 38.08%
3. Right Heart Catheterization and 30-day
Mortality (Conners, et al. 1996)
What is the probability that a patient would die within 30
days given that they had a right heart catheter put in?
P(YES | RHC) = 830 / 2184 = .3800 or 38.00%
3. Right Heart Catheterization and 30-day
Mortality (Conners, et al. 1996)
What is the probability that a patient would die within 30 days
given that they did not have a right heart catheter put in?
P(YES | No RHC) = 1088 / 3551 = .3064 or 30.64%
3. Right Heart Catheterization and 30-day
Mortality (Conners, et al. 1996)
How many times more likely is a patient
who had a right heart catheter (Swan-Ganz line)
put in to die within 30 days than a patient
who did not have one put in?
P(YES | RHC) = .3800
P(YES | No RHC) = .3064
.3800/ .3064 = 1.24 times more likely. This is called
the relative risk or risk ratio (denoted RR).
Risk of death is 24% greater for those who had a
Swan-Ganz line put in.
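The relative risk is a ratio of two conditional probabilities, so it can be computed directly from the four counts. The helper relative_risk below is a hypothetical name used for this sketch, and the counts are taken from the RHC table.

```python
def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = events_exposed / n_exposed        # P(YES | RHC)
    risk_unexposed = events_unexposed / n_unexposed  # P(YES | No RHC)
    return risk_exposed / risk_unexposed

# 830 of 2184 RHC patients died; 1088 of 3551 No RHC patients died.
rr_rhc = relative_risk(830, 2184, 1088, 3551)
print(round(rr_rhc, 2))  # 1.24
```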
3. Right Heart Catheterization and 30-day
Mortality (Conners, et al. 1996)
The shading for 30-day
mortality is 1.24 times
higher for the RHC
group than for the No
RHC group (recall RR
= 1.24).
Patients having a
Swan-Ganz line put in
have 1.24 times higher
risk of death within 30 days of initial
treatment.
Building a Contingency Table from a Story
4. HIV Example
A European study on the transmission of the HIV
virus involved 470 heterosexual couples.
Originally only one of the partners in each couple
was infected with the virus. There were 293
couples that always used condoms. From this
group, 3 of the non-infected partners became
infected with the virus. Of the 177 couples who
did not always use a condom, 20 of the non-infected
partners became infected with the virus.
4. HIV Example
Let C be the event that the couple always
used condoms. (NC be the complement)
Let I be the event that the non-infected
partner became infected. (NI be the
complement)
                    Condom Usage
Infection Status      C     NC   Total
I
NI
Total
4. HIV Example
A European study on the transmission of the HIV
virus involved 470 heterosexual couples.
Originally only one of the partners in each couple
was infected with the virus. There were 293
couples that always used condoms. From this
group, 3 of the non-infected partners became
infected with the virus.
                    Condom Usage
Infection Status      C     NC   Total
I                     3
NI
Total               293            470
4. HIV Example
Of the 177 couples who did not always use a
condom, 20 of the non-infected partners became
infected with the virus.
                    Condom Usage
Infection Status      C     NC   Total
I                     3     20      23
NI                  290    157     447
Total               293    177     470
4. HIV Example
a) What proportion of the couples in this study
always used condoms?
P(C ) = 293/470 (= 0.623)
4. HIV Example
b) If a non-infected partner became infected,
what is the probability that he/she was one
of a couple that always used condoms?
P(C|I ) = 3/23 = 0.130
4. HIV Example
c) In what percentage of couples did the non-HIV
partner become infected amongst those
that did not always use condoms?
P(I|NC) = 20/177 = .113 or 11.3%
• Amongst those that did always use condoms?
P(I|C) = 3/293 = .0102 or 1.02%
• What is the relative risk of infection associated
with not always using a condom?
RR = P(I|NC) / P(I|C) = .113/.0102 = 11.08 times more
likely to become infected (about 11.04 using the
unrounded fractions).
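The whole HIV example, from story to relative risk, can be sketched in a few lines. Only the four numbers stated in the story are typed in, and the remaining cells are obtained by subtraction. The variable names are assumptions for this illustration. Exact fractions are used, so the relative risk comes out as about 11.04, while dividing the rounded values .113 and .0102 gives 11.08.

```python
from fractions import Fraction

# Numbers stated in the story.
couples = 470
c_total, c_infected = 293, 3      # always used condoms
nc_total, nc_infected = 177, 20   # did not always use condoms

# Fill in the remaining cells of the contingency table by subtraction.
table = {
    "C":  {"I": c_infected,  "NI": c_total - c_infected},
    "NC": {"I": nc_infected, "NI": nc_total - nc_infected},
}
infected_total = table["C"]["I"] + table["NC"]["I"]

p_c = Fraction(c_total, couples)                     # a) P(C)
p_c_given_i = Fraction(table["C"]["I"], infected_total)  # b) P(C|I)
p_i_given_nc = Fraction(table["NC"]["I"], nc_total)  # c) P(I|NC)
p_i_given_c = Fraction(table["C"]["I"], c_total)     #    P(I|C)
rr = p_i_given_nc / p_i_given_c                      # relative risk

print(table)
print(float(p_c), float(p_c_given_i), float(p_i_given_nc), float(p_i_given_c))
print(round(float(rr), 2))  # 11.04
```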
4. HIV Example
The percentage of
couples where the
non-HIV partner
became infected in
the non-condom
user group is 11
times higher than
that for condom
group.
Relative Risk (RR) and Odds Ratio (OR)
Example: Age at First Pregnancy and Cervical
Cancer
A case-control study was conducted to determine
whether there was increased risk of cervical cancer
amongst women who had their first child before age
25. A sample of 49 women with cervical cancer
was taken of which 42 had their first child before the
age of 25. From a sample of 317 “similar” women
without cervical cancer it was found that 203 of
them had their first child before age 25.
Q: Do these data suggest that having a child at or
before age 25 increases risk of cervical cancer?
Relative Risk (RR) and Odds Ratio (OR)
The ODDS for an event A are defined as
Odds for A = P(A) / (1 – P(A))
For example, suppose we roll a single die;
the odds for a 3 are:
Odds for 3 = P(3)/(1 – P(3)) = (1/6)/(1 – (1/6)) = 1/5
1 three for every 5 rolls that don’t result in a three.
(Odds for a 3 are 1:5 and odds against are 5:1)
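The conversion between probability and odds is a one-line formula in each direction. The sketch below uses exact fractions so the die example comes out as exactly 1/5. The function names odds and odds_to_probability are chosen for this illustration.

```python
from fractions import Fraction

def odds(p):
    """Odds for an event with probability p: p / (1 - p)."""
    return p / (1 - p)

def odds_to_probability(o):
    """Inverse conversion: probability of an event whose odds are o."""
    return o / (1 + o)

odds_three = odds(Fraction(1, 6))
print(odds_three)                        # 1/5
print(odds_to_probability(odds_three))   # 1/6
```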
Relative Risk (RR) and Odds Ratio (OR)
The Odds Ratio (OR) for a disease associated with
a risk factor is the ratio of the odds for disease for those with
the risk factor and the odds for disease for those without the risk
factor.

OR = [P(Disease|Risk Factor) / (1 – P(Disease|Risk Factor))]
     / [P(Disease|No Risk Factor) / (1 – P(Disease|No Risk Factor))]

Numerator: odds for disease amongst those with the risk factor present.
Denominator: odds for disease amongst those without the risk factor.

The Odds Ratio gives us the multiplicative increase in
odds associated with having the “risk factor”.
Relative Risk (RR) and Odds Ratio (OR)
                  Cervical Cancer
Age at 1st
Pregnancy         Case   Control   Row Totals
Age < 25            42       203          245
Age > 25             7       114          121
Column Totals       49       317      n = 366
a) Why can’t we calculate P(Cervical Cancer | Age < 25)?
Because the number of women with disease was fixed
in advance and therefore NOT RANDOM !
Relative Risk (RR) and Odds Ratio (OR)
b) What is P(risk factor|disease status) for each group?
P(Age < 25|Case) = 42/49 = .857 or 85.7%
P(Age < 25|Control) = 203/317 = .640 or 64.0%
Relative Risk (RR) and Odds Ratio (OR)
c) What are the odds for the risk factor amongst the cases?
Amongst the controls?
Odds for risk factor cases = .857/(1-.857) = 5.99
Odds for risk factor controls = .64/(1- .64) = 1.78
Relative Risk (RR) and Odds Ratio (OR)
d) What is the odds ratio for the risk factor associated
with being a case?
Odds Ratio (OR) = 5.99/1.78 = 3.37, the odds for having
1st child on or before age 25 are 3.37 times higher for
women who currently have cervical cancer versus those that
do not have cervical cancer.
Relative Risk (RR) and Odds Ratio (OR)
Odds Ratio
The ratio of
dark to light
shading is 3.37
times larger for
the cervical
cancer group
than it is for
the control
group.
Relative Risk (RR) and Odds Ratio (OR)
e) Even though it is inappropriate to do so
calculate P(disease|risk status).
P(case|Age<25) = 42/245 = .171 or 17.1%
P(case|Age>25) = 7/121 = .058 or 5.8%
Now calculate the odds for disease given
the risk factor status
Odds for Disease for 1st Preg. Age < 25
= .171/(1 - .171) = .207
Odds for Disease for 1st Preg. Age > 25
= .058/(1 - .058) = .061
Relative Risk (RR) and Odds Ratio (OR)
f) Finally calculate the odds ratio for
disease associated with 1st pregnancy age
< 25 years of age.
Odds Ratio = .207/.061 = 3.37 (using the unrounded odds 42/203 and 7/114)
This is exactly the same as the odds ratio for
having the risk factor (Age < 25) associated
with being in the cervical cancer group!!!!
Final Conclusion: Women who have their
first child at or before age 25 have 3.37 times
the odds of developing cervical cancer when
compared to women who had their first child
after the age of 25.
Relative Risk (RR) and Odds Ratio (OR)
                         Disease Status
Risk Factor Status       Case    Control
Risk Factor Present      a       b
Risk Factor Absent       c       d

OR = (a X d) / (b X c)
Much easier computational formula!!!
Relative Risk (RR) and Odds Ratio (OR)
When the disease is fairly rare, i.e. P(disease)
< .10 or 10%, then one can show that the
odds ratio and relative risk are similar.
OR is approximately equal to RR when
P(disease) < .10 or 10% chance.
In these cases we can use the phrase:
“… times more likely” when interpreting the
OR.
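One way to see why the rare-disease condition matters is to hold the relative risk fixed and vary the baseline risk. The sketch below uses made-up baseline risks of 1%, 5% and 25% with RR = 2. These values are assumptions chosen for illustration and are not data from the notes.

```python
from fractions import Fraction

def odds(p):
    return p / (1 - p)

def odds_ratio_for(baseline_risk, relative_risk):
    """OR implied by a given baseline risk and relative risk."""
    exposed_risk = baseline_risk * relative_risk
    return odds(exposed_risk) / odds(baseline_risk)

RR = 2  # hypothetical relative risk, held fixed
for baseline in (Fraction(1, 100), Fraction(5, 100), Fraction(25, 100)):
    print(float(baseline), round(float(odds_ratio_for(baseline, RR)), 3))
```

With a 1% baseline risk the OR is about 2.02, close to the RR of 2. With a 25% baseline risk it is 3.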
Relative Risk (RR) and Odds Ratio (OR)
Age at 1st
Pregnancy         Case      Control     Row Totals
Age < 25          a = 42    b = 203            245
Age > 25          c = 7     d = 114            121
Column Totals     49        317            n = 366
OR = (42 X 114)/(7 X 203) = 3.37
Because less than 10% of the population of women develop
cervical cancer we can say women who have their first child at or
before age 25 are 3.37 times more likely to develop cervical
cancer than women who have their first child after age 25.
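The three routes to the odds ratio in the cervical cancer example can be checked against each other with exact fractions: odds of the risk factor given disease status, odds of disease given risk factor status, and the cross-product shortcut. The function names in this sketch are assumptions made for illustration.

```python
from fractions import Fraction

def odds(p):
    return p / (1 - p)

def cross_product_or(a, b, c, d):
    """OR = (a x d) / (b x c) for a 2x2 table laid out as [[a, b], [c, d]]."""
    return Fraction(a * d, b * c)

# a, b = cases and controls with Age < 25; c, d = cases and controls with Age > 25.
a, b, c, d = 42, 203, 7, 114

# Route 1: odds of the risk factor among cases versus among controls.
or_exposure = odds(Fraction(a, a + c)) / odds(Fraction(b, b + d))
# Route 2: odds of being a case with versus without the risk factor.
or_disease = odds(Fraction(a, a + b)) / odds(Fraction(c, c + d))
# Route 3: cross-product shortcut.
or_shortcut = cross_product_or(a, b, c, d)

print(or_exposure, or_disease, or_shortcut)  # all equal 684/203
print(round(float(or_shortcut), 2))          # 3.37
```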
More About RR and OR
• The most commonly cited advantage of the RR over the OR
is that the former is the more natural interpretation. The
relative risk comes closer to what most people think of when
they compare the relative likelihood of events.
e.g. suppose there are two groups, one with a 25% chance of
mortality and the other with a 50% chance of mortality. Most
people would say that the latter group has it twice as bad.
But the odds ratio is 3, which seems too big.
RR = .50/.25 = 2.00
OR = [P(death|high mortality)/P(survive|high mortality)]
     / [P(death|low mortality)/P(survive|low mortality)]
   = [.50/(1 - .50)] / [.25/(1 - .25)] = 3.00
More About RR and OR
Even more extreme examples are possible. A
change from 25% to 75% mortality
represents a relative risk of 3, but an odds
ratio of 9. A change from 10% to 90%
mortality represents a relative risk of 9 but an
odds ratio of 81.
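The three mortality comparisons above can be verified with a short calculation. The helper rr_and_or is a hypothetical name for this sketch. It takes the lower and higher mortality probabilities and returns both measures.

```python
from fractions import Fraction

def rr_and_or(p_low, p_high):
    """Relative risk and odds ratio comparing p_high to p_low."""
    rr = p_high / p_low
    odds_ratio = (p_high / (1 - p_high)) / (p_low / (1 - p_low))
    return rr, odds_ratio

pairs = [
    (Fraction(25, 100), Fraction(50, 100)),
    (Fraction(25, 100), Fraction(75, 100)),
    (Fraction(10, 100), Fraction(90, 100)),
]
for p_low, p_high in pairs:
    rr, odds_ratio = rr_and_or(p_low, p_high)
    print(float(p_low), float(p_high), rr, odds_ratio)
```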
More About RR and OR
• OR’s arise as part of logistic regression
which we will study later in the course.
• Despite their pitfalls OR’s are really the only
option when case-control studies are used.
• Any study of risk needs to adjust for potential
confounding factors which is typically done
using logistic regression.