Chapter Three: Probability

3.1 Introduction
Probability is the basis for inferential statistics.
In this chapter you will
- be given a definition of probability.
- study probability as it relates to contingency tables and the normal curve.
- be introduced to risk ratios, odds ratios, sensitivity, specificity, and positive and negative predictive values.
3.2 A Definition of Probability
We define the probability of some occurrence $A$ as

$$P(A) = \frac{N_A}{N}$$

$P(\cdot)$ is read “The probability of ...” and $A$ represents any event of interest. $\bar{A}$ is the complement of $A$, or “not observing $A$.” $N_A$ is the number of events that meet the specified criterion and $N$ is the total number of events.

Note: this definition assumes equally likely events.
Example
Five marbles are placed in a cup. Three are red, two are white. If a marble
is randomly selected, what is the probability it is red?
$$P(A) = \frac{N_A}{N} = \frac{3}{5} = .60$$
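The marble calculation can be checked with a few lines of Python. This is only a minimal sketch: the helper name `probability` and the list `cup` are illustrative choices, not part of the slides, and the function assumes equally likely outcomes as the definition does.

```python
from fractions import Fraction

def probability(n_a: int, n: int) -> Fraction:
    """Return P(A) = N_A / N, assuming equally likely outcomes."""
    if n <= 0 or not 0 <= n_a <= n:
        raise ValueError("need n > 0 and 0 <= n_a <= n")
    return Fraction(n_a, n)

# The cup from the example: three red marbles, two white.
cup = ["red", "red", "red", "white", "white"]
p_red = probability(cup.count("red"), len(cup))
print(p_red, float(p_red))  # 3/5 0.6
```

Using `Fraction` keeps the result exact, which makes it easy to compare against the hand calculation.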
Some Properties of Probability
Several important properties of probability can be deduced from its
definition.
1. $P(A) \ge 0$. This follows because $N_A$ is a count and cannot, therefore, be less than zero.
2. $P(A) \le 1$. This follows because $N_A$ can never exceed $N$.
3. $P(A) + P(\bar{A}) = 1$ or $P(\bar{A}) = 1 - P(A)$. This follows because $N - N_A$ outcomes fail to meet the stated criterion $A$. Thus

$$P(\bar{A}) = \frac{N - N_A}{N} = 1 - \frac{N_A}{N} = 1 - P(A).$$
3.3 Contingency Tables
A contingency table is a convenient means of summarizing data.
A frequency contingency table summarizes the numbers of
observations in a data set that manifest some specified set of
characteristics.
A probability contingency table summarizes the proportions of
observations in a data set that manifest some specified set of
characteristics.
Frequency Tables
Frequency tables, such as the one represented here, show the numbers of
observations (persons, things etc.) that manifest some set of
characteristics.
             $D$    $\bar{D}$    Total
$S$           9         3          12
$\bar{S}$     2         6           8
Total        11         9          20
Frequency Tables (continued)
             $D$    $\bar{D}$    Total
$S$           9         3          12
$\bar{S}$     2         6           8
Total        11         9          20

Thus,
- The number of persons who smoke ($S$) and have the disease ($D$) is 9.
- The number who don’t smoke ($\bar{S}$) and have the disease ($D$) is 2.
- The number who smoke ($S$) and don’t have the disease ($\bar{D}$) is 3.
- The number who don’t smoke ($\bar{S}$) and don’t have the disease ($\bar{D}$) is 6.
Frequency Tables (continued)
             $D$    $\bar{D}$    Total
$S$           9         3          12
$\bar{S}$     2         6           8
Total        11         9          20

Logically, the values at the table margins give the total count for the indicated characteristics. Thus,
- 12 persons smoked.
- 8 did not smoke.
- 11 had the disease.
- 9 were disease free.
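The marginal totals can be reproduced by summing cells. The following is a sketch under assumed names: the dictionary `counts`, its string labels, and the helper `marginal` are illustrative and do not come from the slides.

```python
# Cell counts from the frequency table, keyed by (smoking, disease).
counts = {
    ("S", "D"): 9,
    ("S", "not D"): 3,
    ("not S", "D"): 2,
    ("not S", "not D"): 6,
}

def marginal(counts, axis, level):
    """Sum every cell whose label on the given axis equals level.

    axis 0 is the smoking label, axis 1 is the disease label.
    """
    return sum(n for key, n in counts.items() if key[axis] == level)

smokers = marginal(counts, 0, "S")
non_smokers = marginal(counts, 0, "not S")
diseased = marginal(counts, 1, "D")
disease_free = marginal(counts, 1, "not D")
total = sum(counts.values())
print(smokers, non_smokers, diseased, disease_free, total)  # 12 8 11 9 20
```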
Some Notation
The following notation is useful in studying probability as it relates to
contingency tables.
- The probability of observing an event $A$: $P(A)$
- The probability of observing an event $A$ and an event $B$: $P(AB)$
- The probability of observing an event $A$ or an event $B$: $P(A \cup B)$
- The probability of observing event $A$ given that you have observed event $B$: $P(A \mid B)$
Calculating Probabilities
             $D$    $\bar{D}$    Total
$S$           9         3          12
$\bar{S}$     2         6           8
Total        11         9          20

Given an observation is randomly drawn from the above table, we calculate
- The probability of selecting a person who does not smoke: $P(\bar{S}) = \frac{8}{20} = .40$
- The probability of selecting someone who has the disease: $P(D) = \frac{11}{20} = .55$
- The probability of selecting someone who smokes and does not have the disease: $P(S\bar{D}) = \frac{3}{20} = .15$
- The probability of selecting a non-smoker who does not have the disease: $P(\bar{S}\bar{D}) = \frac{6}{20} = .30$
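These four probabilities all follow the same pattern: count the observations that satisfy the event, then divide by $N$. A minimal sketch, assuming the hypothetical names `counts` and `prob` and an event written as a predicate on the two labels:

```python
from fractions import Fraction

# Cell counts from the frequency table, keyed by (smoking, disease).
counts = {
    ("S", "D"): 9,
    ("S", "not D"): 3,
    ("not S", "D"): 2,
    ("not S", "not D"): 6,
}
N = sum(counts.values())  # 20 observations in total

def prob(event):
    """P(event) = (observations satisfying event) / N.

    event is a predicate taking the smoking and disease labels of a cell.
    """
    return Fraction(sum(n for key, n in counts.items() if event(*key)), N)

p_not_s = prob(lambda s, d: s == "not S")
p_d = prob(lambda s, d: d == "D")
p_s_not_d = prob(lambda s, d: s == "S" and d == "not D")
p_not_s_not_d = prob(lambda s, d: s == "not S" and d == "not D")
print(float(p_not_s), float(p_d), float(p_s_not_d), float(p_not_s_not_d))
# 0.4 0.55 0.15 0.3
```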
Calculating Probabilities (continued)
             $D$    $\bar{D}$    Total
$S$           9         3          12
$\bar{S}$     2         6           8
Total        11         9          20

- The probability of selecting someone who smokes or is without disease:

$$P(S \cup \bar{D}) = \frac{9+3+6}{20} = \frac{18}{20} = .90$$

- The probability of selecting someone who has disease or is a non-smoker:

$$P(D \cup \bar{S}) = \frac{9+2+6}{20} = \frac{17}{20} = .85$$
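One way to see why exactly three cells are summed is to treat each event as a set of table cells and take the set union, so that no cell is counted twice. This is a sketch only; `counts`, `cells`, and `prob` are illustrative names that do not appear in the slides.

```python
from fractions import Fraction

# Cell counts from the frequency table, keyed by (smoking, disease).
counts = {
    ("S", "D"): 9,
    ("S", "not D"): 3,
    ("not S", "D"): 2,
    ("not S", "not D"): 6,
}
N = sum(counts.values())

def cells(axis, level):
    """Set of cell keys whose label on the given axis equals level.

    axis 0 is the smoking label, axis 1 is the disease label.
    """
    return {key for key in counts if key[axis] == level}

def prob(cell_set):
    """Probability of drawing an observation from any cell in cell_set."""
    return Fraction(sum(counts[key] for key in cell_set), N)

# The set union keeps each cell once, so no observation is double counted.
p_s_or_not_d = prob(cells(0, "S") | cells(1, "not D"))
p_d_or_not_s = prob(cells(1, "D") | cells(0, "not S"))
print(p_s_or_not_d, p_d_or_not_s)  # 9/10 17/20
```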
Calculating Probabilities (continued)
             $D$    $\bar{D}$    Total
$S$           9         3          12
$\bar{S}$     2         6           8
Total        11         9          20

- The probability of selecting someone with disease given the person selected is a smoker: $P(D \mid S) = \frac{9}{12} = .75$
- The probability of selecting someone who doesn’t smoke given they are disease free: $P(\bar{S} \mid \bar{D}) = \frac{6}{9} \approx .67$
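Conditioning amounts to shrinking the denominator from all $N$ observations to only those that satisfy the given event. A minimal sketch, assuming the hypothetical helper `conditional` and predicates on the cell key:

```python
from fractions import Fraction

# Cell counts from the frequency table, keyed by (smoking, disease).
counts = {
    ("S", "D"): 9,
    ("S", "not D"): 3,
    ("not S", "D"): 2,
    ("not S", "not D"): 6,
}

def conditional(event, given):
    """P(event | given): observations satisfying both, over those satisfying given."""
    both = sum(n for key, n in counts.items() if event(key) and given(key))
    base = sum(n for key, n in counts.items() if given(key))
    return Fraction(both, base)

p_d_given_s = conditional(lambda k: k[1] == "D", lambda k: k[0] == "S")
p_not_s_given_not_d = conditional(lambda k: k[0] == "not S",
                                  lambda k: k[1] == "not D")
print(p_d_given_s, p_not_s_given_not_d)  # 3/4 2/3
```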
Probability Tables
             $D$    $\bar{D}$    Total
$S$          .45       .15        .60
$\bar{S}$    .10       .30        .40
Total        .55       .45       1.00
A more common form of contingency table is obtained by dividing each
count in a frequency table by N in order to obtain probabilities. The
probability table shown here was constructed in this manner from the
frequency table shown previously.
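The conversion from a frequency table to a probability table is a single division per cell. The sketch below assumes the illustrative names `counts` and `probs`; it is not taken from the slides.

```python
# Cell counts from the frequency table, keyed by (smoking, disease).
counts = {
    ("S", "D"): 9,
    ("S", "not D"): 3,
    ("not S", "D"): 2,
    ("not S", "not D"): 6,
}
N = sum(counts.values())

# Divide every count by N to turn frequencies into probabilities.
probs = {key: n / N for key, n in counts.items()}

for key, p in probs.items():
    print(key, round(p, 2))
```

The four cell probabilities should sum to 1 (up to floating-point rounding), which is a quick sanity check on any probability table built this way.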
Probability Tables (continued)
             $B$              $\bar{B}$              Total
$A$          $P(AB)$          $P(A\bar{B})$          $P(A)$
$\bar{A}$    $P(\bar{A}B)$    $P(\bar{A}\bar{B})$    $P(\bar{A})$
Total        $P(B)$           $P(\bar{B})$

Given arbitrary variables $A$ and $B$, the cell and marginal entries depicted here represent the values in a probability contingency table.
Probability Tables (continued)
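             $B$              $\bar{B}$              Total
$A$          $P(AB)$          $P(A\bar{B})$          $P(A)$
$\bar{A}$    $P(\bar{A}B)$    $P(\bar{A}\bar{B})$    $P(\bar{A})$
Total        $P(B)$           $P(\bar{B})$

Probabilities of the form $P(A \cup B)$ or $P(\bar{A} \cup \bar{B})$, for example, would be obtained by summing the appropriate cell entries.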
Probability Tables (continued)
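             $B$              $\bar{B}$              Total
$A$          $P(AB)$          $P(A\bar{B})$          $P(A)$
$\bar{A}$    $P(\bar{A}B)$    $P(\bar{A}\bar{B})$    $P(\bar{A})$
Total        $P(B)$           $P(\bar{B})$

Thus

$$P(A \cup B) = P(AB) + P(A\bar{B}) + P(\bar{A}B)$$

and

$$P(\bar{A} \cup \bar{B}) = P(A\bar{B}) + P(\bar{A}B) + P(\bar{A}\bar{B})$$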
Probability Tables (continued)
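             $B$              $\bar{B}$              Total
$A$          $P(AB)$          $P(A\bar{B})$          $P(A)$
$\bar{A}$    $P(\bar{A}B)$    $P(\bar{A}\bar{B})$    $P(\bar{A})$
Total        $P(B)$           $P(\bar{B})$

Conditional probabilities are calculated in the same manner as was used with frequency tables. Thus, for example,

$$P(A \mid B) = \frac{P(AB)}{P(B)}$$

and

$$P(\bar{B} \mid A) = \frac{P(A\bar{B})}{P(A)}$$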
Independence
Two events $A$ and $B$ are said to be independent if

$$P(A \mid B) = P(A)$$

or equivalently if

$$P(AB) = P(A)\,P(B)$$
Independence (continued)
             $B$    $\bar{B}$    Total
$A$          .18       .42        .60
$\bar{A}$    .12       .28        .40
Total        .30       .70       1.00
Q: Are A and B independent?
A: Yes.
Q: How do you know?
A: Because $P(A \mid B) = \frac{.18}{.30} = .60 = P(A)$
or equivalently
$P(A)\,P(B) = (.60)(.30) = .18 = P(AB)$
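Both forms of the independence check can be run against the table. This is a sketch with assumed names (`joint`, `p_a`, `p_b`); because the cell values are stored as floating-point numbers, the comparison uses `math.isclose` rather than exact equality.

```python
import math

# Joint probabilities from the table, keyed by (A status, B status).
joint = {
    ("A", "B"): 0.18,
    ("A", "not B"): 0.42,
    ("not A", "B"): 0.12,
    ("not A", "not B"): 0.28,
}

# Marginals are row and column sums of the joint probabilities.
p_a = joint[("A", "B")] + joint[("A", "not B")]
p_b = joint[("A", "B")] + joint[("not A", "B")]

# Form 1: P(A | B) = P(A)
p_a_given_b = joint[("A", "B")] / p_b
# Form 2: P(AB) = P(A) P(B)
independent = math.isclose(joint[("A", "B")], p_a * p_b)

print(round(p_a_given_b, 2), round(p_a, 2), independent)  # 0.6 0.6 True
```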
Sensitivity
Sensitivity is the probability that a person with the disease will test
positive for that disease or
$$\text{Sensitivity} = P(+ \mid D)$$
Sensitivity (continued)
           $D$     $\bar{D}$    Total
$+$        .008      .011       .019
$-$        .001      .980       .981
Total      .009      .991      1.000

$$\text{Sensitivity} = P(+ \mid D) = \frac{.008}{.009} \approx .89$$
Specificity
Specificity is the probability that a person who does not have the disease
will test negative for the disease or
$$\text{Specificity} = P(- \mid \bar{D})$$
Specificity (continued)
           $D$     $\bar{D}$    Total
$+$        .008      .011       .019
$-$        .001      .980       .981
Total      .009      .991      1.000

$$\text{Specificity} = P(- \mid \bar{D}) = \frac{.980}{.991} \approx .99$$
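Sensitivity and specificity both condition on true disease status, so each divides a cell by a column total. A minimal sketch follows; the dictionary `joint` and the function names are illustrative assumptions, not part of the slides.

```python
# Joint probabilities from the table, keyed by (test result, disease status).
joint = {
    ("+", "D"): 0.008,
    ("+", "not D"): 0.011,
    ("-", "D"): 0.001,
    ("-", "not D"): 0.980,
}

def sensitivity(joint):
    """P(+ | D): the (+, D) cell divided by the marginal P(D)."""
    p_d = joint[("+", "D")] + joint[("-", "D")]
    return joint[("+", "D")] / p_d

def specificity(joint):
    """P(- | not D): the (-, not D) cell divided by the marginal P(not D)."""
    p_not_d = joint[("+", "not D")] + joint[("-", "not D")]
    return joint[("-", "not D")] / p_not_d

print(round(sensitivity(joint), 2), round(specificity(joint), 2))  # 0.89 0.99
```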
Positive Predictive Value
Positive predictive value is the probability that a person who tests
positive for a disease has that disease or
$$PPV = P(D \mid +)$$
PPV (continued)
           $D$     $\bar{D}$    Total
$+$        .008      .011       .019
$-$        .001      .980       .981
Total      .009      .991      1.000

$$PPV = P(D \mid +) = \frac{.008}{.019} \approx .42$$
Negative Predictive Value
Negative predictive value is the probability that a person who tests
negative for a disease does not have the disease or
$$NPV = P(\bar{D} \mid -)$$
NPV (continued)
           $D$     $\bar{D}$    Total
$+$        .008      .011       .019
$-$        .001      .980       .981
Total      .009      .991      1.000

$$NPV = P(\bar{D} \mid -) = \frac{.980}{.981} \approx .999$$
Prevalence
Prevalence is the probability of disease or
$$\text{Prevalence} = P(D)$$
Prevalence (continued)
           $D$     $\bar{D}$    Total
$+$        .008      .011       .019
$-$        .001      .980       .981
Total      .009      .991      1.000

$$\text{Prevalence} = P(D) = .009$$
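The predictive values condition on the test result instead of disease status, so they divide by row totals, while prevalence is simply the column total for disease. The sketch below uses assumed names (`joint`, `ppv`, `npv`, `prevalence`) that are not in the slides.

```python
# Joint probabilities from the table, keyed by (test result, disease status).
joint = {
    ("+", "D"): 0.008,
    ("+", "not D"): 0.011,
    ("-", "D"): 0.001,
    ("-", "not D"): 0.980,
}

def ppv(joint):
    """P(D | +): the (+, D) cell divided by the marginal P(+)."""
    return joint[("+", "D")] / (joint[("+", "D")] + joint[("+", "not D")])

def npv(joint):
    """P(not D | -): the (-, not D) cell divided by the marginal P(-)."""
    return joint[("-", "not D")] / (joint[("-", "D")] + joint[("-", "not D")])

def prevalence(joint):
    """P(D): the column total for disease."""
    return joint[("+", "D")] + joint[("-", "D")]

print(round(ppv(joint), 2), round(npv(joint), 3), round(prevalence(joint), 3))
# 0.42 0.999 0.009
```

Note how a test with sensitivity near .89 and specificity near .99 still yields a PPV of only about .42 here, because the prevalence in this table is so low.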
The Risk Ratio
The risk ratio (RR) is formed by dividing the probability of disease in
some group exposed to a potential risk factor by the probability of disease
in some group not so exposed or
$$RR = \frac{P(D \mid E)}{P(D \mid \bar{E})}$$
Risk Ratio (continued)
             $E$    $\bar{E}$    Total
$D$          .15       .10        .25
$\bar{D}$    .05       .70        .75
Total        .20       .80       1.00

$$RR = \frac{P(D \mid E)}{P(D \mid \bar{E})} = \frac{.15/.20}{.10/.80} = \frac{.750}{.125} = 6$$
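The risk ratio calculation can be sketched as two conditional probabilities followed by a division. The names `joint` and `risk` are illustrative assumptions; `Fraction` is used so the result comes out exactly rather than as a rounded float.

```python
from fractions import Fraction

# Joint probabilities from the table, keyed by (disease status, exposure).
joint = {
    ("D", "E"): Fraction("0.15"),
    ("D", "not E"): Fraction("0.10"),
    ("not D", "E"): Fraction("0.05"),
    ("not D", "not E"): Fraction("0.70"),
}

def risk(exposure):
    """P(D | exposure): the disease cell divided by the exposure column total."""
    p_exposure = joint[("D", exposure)] + joint[("not D", exposure)]
    return joint[("D", exposure)] / p_exposure

risk_ratio = risk("E") / risk("not E")
print(risk("E"), risk("not E"), risk_ratio)  # 3/4 1/8 6
```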
The Odds Ratio
The odds that an event will occur is the probability that the event will
occur divided by the probability that the event will not occur. Thus, the
odds of disease for some group exposed to a potential risk factor would be
$$\text{odds} = \frac{P(D \mid E)}{P(\bar{D} \mid E)}$$
The Odds Ratio (continued)
Likewise, the odds of disease for some group not exposed to some
potential risk factor would be
$$\text{odds} = \frac{P(D \mid \bar{E})}{P(\bar{D} \mid \bar{E})}$$
The Odds Ratio (continued)
The odds ratio (OR) is defined as the odds of disease for an exposed
group divided by the odds of disease for an unexposed group or
$$OR = \frac{P(D \mid E) \,/\, P(\bar{D} \mid E)}{P(D \mid \bar{E}) \,/\, P(\bar{D} \mid \bar{E})}$$
The Odds Ratio (continued)
which simplifies to
$$OR = \frac{P(D \mid E)\, P(\bar{D} \mid \bar{E})}{P(\bar{D} \mid E)\, P(D \mid \bar{E})}$$
Odds Ratio (continued)
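             $E$    $\bar{E}$    Total
$D$          .15       .10        .25
$\bar{D}$    .05       .70        .75
Total        .20       .80       1.00

$$OR = \frac{P(D \mid E)\, P(\bar{D} \mid \bar{E})}{P(\bar{D} \mid E)\, P(D \mid \bar{E})} = \frac{(.750)(.875)}{(.250)(.125)} = 21$$

This means the odds of disease in the exposed group are 21 times those in the unexposed group.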
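The odds ratio can be computed by following the definition step by step: find the odds in each group, then divide. This is a sketch under assumed names (`joint`, `odds_of_disease`); it uses the complement rule $P(\bar{D} \mid E) = 1 - P(D \mid E)$ for the denominator of each odds.

```python
from fractions import Fraction

# Joint probabilities from the table, keyed by (disease status, exposure).
joint = {
    ("D", "E"): Fraction("0.15"),
    ("D", "not E"): Fraction("0.10"),
    ("not D", "E"): Fraction("0.05"),
    ("not D", "not E"): Fraction("0.70"),
}

def odds_of_disease(exposure):
    """P(D | exposure) / P(not D | exposure)."""
    p_exposure = joint[("D", exposure)] + joint[("not D", exposure)]
    p_d = joint[("D", exposure)] / p_exposure
    return p_d / (1 - p_d)  # complement rule gives P(not D | exposure)

odds_ratio = odds_of_disease("E") / odds_of_disease("not E")
print(odds_of_disease("E"), odds_of_disease("not E"), odds_ratio)  # 3 1/7 21
```

With the same table the risk ratio is 6 while the odds ratio is 21, so the two measures should not be read interchangeably.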
Bayes' Rule
In its simplest form, Bayes' rule allows you to use $P(A \mid B)$ to find $P(B \mid A)$. The rule is expressed as

$$P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A \mid B)\, P(B) + P(A \mid \bar{B})\, P(\bar{B})}$$
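As an illustration, the rule can be applied to the diagnostic-test table used earlier in this section. The pairing is an assumption made here for the example, not something stated on the slide: $A$ is taken to be a positive test and $B$ to be disease, so the rule converts sensitivity into positive predictive value. The function name `bayes` and its parameter names are likewise hypothetical.

```python
from fractions import Fraction

def bayes(p_a_given_b, p_b, p_a_given_not_b):
    """Return P(B | A) from P(A | B), P(B), and P(A | not B)."""
    p_not_b = 1 - p_b  # complement rule
    numerator = p_a_given_b * p_b
    return numerator / (numerator + p_a_given_not_b * p_not_b)

# Assumed pairing for illustration: A = positive test, B = disease.
p_pos_given_d = Fraction(8, 9)          # sensitivity, .008 / .009
p_d = Fraction(9, 1000)                 # prevalence, .009
p_pos_given_not_d = Fraction(11, 991)   # .011 / .991

p_d_given_pos = bayes(p_pos_given_d, p_d, p_pos_given_not_d)
print(p_d_given_pos, round(float(p_d_given_pos), 2))  # 8/19 0.42
```

The result, 8/19 or about .42, matches the PPV read directly from the table as .008/.019.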