Chapter Three: Probability

3.1 Introduction

Probability is the basis for inferential statistics. In this chapter you will

- be given a definition of probability.
- study probability as it relates to contingency tables and the normal curve.
- be introduced to risk ratios, odds ratios, sensitivity, specificity, and positive and negative predictive values.

3.2 A Definition of Probability

We define the probability of some occurrence¹ $A$ as

$$P(A) = \frac{N_A}{N}$$

$P(\cdot)$ is read "The probability of ..." and $A$ represents any event of interest. $\bar{A}$ is the complement of $A$, or "not observing $A$." $N_A$ is the number of events that meet the specified criterion and $N$ is the total number of events.

¹ This definition assumes equally likely events.

Example

Five marbles are placed in a cup. Three are red, two are white. If a marble is randomly selected, what is the probability it is red?

$$P(A) = \frac{N_A}{N} = \frac{3}{5} = .60$$

Some Properties of Probability

Several important properties of probability can be deduced from its definition.

1. $P(A) \geq 0$. This follows because $N_A$ is a count and cannot, therefore, be less than zero.
2. $P(A) \leq 1$. This follows because $N_A$ can never exceed $N$.
3. $P(A) + P(\bar{A}) = 1$ or $P(\bar{A}) = 1 - P(A)$. This follows because $N - N_A$ outcomes fail to meet the stated criterion $A$. Thus $P(\bar{A}) = \frac{N - N_A}{N} = 1 - \frac{N_A}{N} = 1 - P(A)$.

3.3 Contingency Tables

A contingency table is a convenient means of summarizing data.

- A frequency contingency table summarizes the numbers of observations in a data set that manifest some specified set of characteristics.
- A probability contingency table summarizes the proportions of observations in a data set that manifest some specified set of characteristics.

Frequency Tables

Frequency tables, such as the one represented here, show the numbers of observations (persons, things, etc.) that manifest some set of characteristics.
             $S$    $\bar{S}$    Total
$D$           9        2          11
$\bar{D}$     3        6           9
Total        12        8

Frequency Tables (continued)

Thus,

- The number of persons who smoke ($S$) and have the disease ($D$) is 9.
- The number who don't smoke ($\bar{S}$) and have the disease ($D$) is 2.
- The number who smoke ($S$) and don't have the disease ($\bar{D}$) is 3.
- The number who don't smoke ($\bar{S}$) and don't have the disease ($\bar{D}$) is 6.

Frequency Tables (continued)

Logically, the values at the table margins give the total count for the indicated characteristics. Thus,

- 12 persons smoked.
- 8 did not smoke.
- 11 had disease.
- 9 were disease free.

Some Notation

The following notation is useful in studying probability as it relates to contingency tables.

- The probability of observing an event $A$: $P(A)$
- The probability of observing an event $A$ and an event $B$: $P(AB)$
- The probability of observing an event $A$ or an event $B$: $P(A \cup B)$
- The probability of observing event $A$ given that you have observed event $B$: $P(A \mid B)$

Calculating Probabilities

             $S$    $\bar{S}$    Total
$D$           9        2          11
$\bar{D}$     3        6           9
Total        12        8

Given that an observation is randomly drawn from the above table ($N = 20$), we calculate

- The probability of selecting a person who does not smoke: $P(\bar{S}) = \frac{8}{20} = .40$
- The probability of selecting someone who has the disease: $P(D) = \frac{11}{20} = .55$
- The probability of selecting someone who smokes and does not have the disease: $P(S\bar{D}) = \frac{3}{20} = .15$
- The probability of selecting a non-smoker who does not have the disease: $P(\bar{S}\bar{D}) = \frac{6}{20} = .30$

Calculating Probabilities (continued)

- The probability of selecting someone who smokes or is without disease: $P(S \cup \bar{D}) = \frac{9 + 3 + 6}{20} = \frac{18}{20} = .90$
- The probability of selecting someone who has disease or is a non-smoker: $P(D \cup \bar{S}) = \frac{9 + 2 + 6}{20} = \frac{17}{20} = .85$

Calculating Probabilities (continued)

- The probability of selecting someone with disease given the person
selected is a smoker: $P(D \mid S) = \frac{9}{12} = .75$
- The probability of selecting someone who doesn't smoke given they are disease free: $P(\bar{S} \mid \bar{D}) = \frac{6}{9} \approx .67$

Probability Tables

             $S$    $\bar{S}$    Total
$D$          .45      .10         .55
$\bar{D}$    .15      .30         .45
Total        .60      .40

A more common form of contingency table is obtained by dividing each count in a frequency table by $N$ in order to obtain probabilities. The probability table shown here was constructed in this manner from the frequency table shown previously.

Probability Tables (continued)

             $A$               $\bar{A}$               Total
$B$          $P(AB)$           $P(\bar{A}B)$           $P(B)$
$\bar{B}$    $P(A\bar{B})$     $P(\bar{A}\bar{B})$     $P(\bar{B})$
Total        $P(A)$            $P(\bar{A})$

Given arbitrary variables $A$ and $B$, the cell and marginal entries depicted here represent the values in a probability contingency table.

Probability Tables (continued)

Probabilities of the form $P(A \cup B)$ or $P(\bar{A} \cup \bar{B})$, for example, would be obtained by summing the appropriate cell entries. Thus

$$P(A \cup B) = P(AB) + P(A\bar{B}) + P(\bar{A}B)$$

and

$$P(\bar{A} \cup \bar{B}) = P(\bar{A}B) + P(A\bar{B}) + P(\bar{A}\bar{B})$$

Probability Tables (continued)

Conditional probabilities are calculated in the same manner as was used with frequency tables. Thus, for example,

$$P(A \mid B) = \frac{P(AB)}{P(B)}$$

and

$$P(\bar{B} \mid A) = \frac{P(A\bar{B})}{P(A)}$$

Independence

Two events $A$ and $B$ are said to be independent if

$$P(A \mid B) = P(A)$$

or equivalently if

$$P(AB) = P(A)\,P(B)$$

Independence (continued)

             $A$    $\bar{A}$    Total
$B$          .18      .12         .30
$\bar{B}$    .42      .28         .70
Total        .60      .40

Q: Are $A$ and $B$ independent?
A: Yes.
Q: How do you know?
A: Because $P(A \mid B) = P(A) = .60$ or equivalently $P(A)\,P(B) = P(AB) = .18$

Sensitivity

Sensitivity is the probability that a person with the disease will test positive for that disease or

$$\text{Sensitivity} = P(+ \mid D)$$

Sensitivity (continued)

             $+$     $-$     Total
$D$          .008    .001    .009
$\bar{D}$    .011    .980    .991
Total        .019    .981

$$\text{Sensitivity} = P(+ \mid D) = \frac{.008}{.009} \approx .89$$

Specificity

Specificity is the probability that a person who does not have the disease will test negative for the disease or

$$\text{Specificity} = P(- \mid \bar{D})$$

Specificity (continued)

$$\text{Specificity} = P(- \mid \bar{D}) = \frac{.980}{.991} \approx .99$$

Positive Predictive Value

Positive predictive value is the probability that a person who tests positive for a disease has that disease or

$$\text{PPV} = P(D \mid +)$$

PPV (continued)

$$\text{PPV} = P(D \mid +) = \frac{.008}{.019} \approx .42$$

Negative Predictive Value

Negative predictive value is the probability that a person who tests negative for a disease does not have the disease or

$$\text{NPV} = P(\bar{D} \mid -)$$

NPV (continued)

$$\text{NPV} = P(\bar{D} \mid -) = \frac{.980}{.981} \approx .999$$

Prevalence

Prevalence is the probability of disease or

$$\text{Prevalence} = P(D)$$

Prevalence (continued)

$$\text{Prevalence} = P(D) = .009$$

The Risk Ratio

The risk ratio (RR) is formed by dividing the probability of disease in some group exposed to a potential risk factor by the probability of disease in some group not so exposed or

$$\text{RR} = \frac{P(D \mid E)}{P(D \mid \bar{E})}$$

Risk Ratio (continued)

             $D$    $\bar{D}$    Total
$E$          .15      .05         .20
$\bar{E}$    .10      .70         .80
Total        .25      .75

$$\text{RR} = \frac{P(D \mid E)}{P(D \mid \bar{E})} = \frac{.750}{.125} = 6$$

The Odds Ratio

The odds that an
event will occur is the probability that the event will occur divided by the probability that the event will not occur. Thus, the odds of disease for some group exposed to a potential risk factor would be

$$\text{odds} = \frac{P(D \mid E)}{P(\bar{D} \mid E)}$$

The Odds Ratio (continued)

Likewise, the odds of disease for some group not exposed to some potential risk factor would be

$$\text{odds} = \frac{P(D \mid \bar{E})}{P(\bar{D} \mid \bar{E})}$$

The Odds Ratio (continued)

The odds ratio (OR) is defined as the odds of disease for an exposed group divided by the odds of disease for an unexposed group or

$$\text{OR} = \frac{P(D \mid E) \,/\, P(\bar{D} \mid E)}{P(D \mid \bar{E}) \,/\, P(\bar{D} \mid \bar{E})}$$

The Odds Ratio (continued)

which simplifies to

$$\text{OR} = \frac{P(D \mid E)\,P(\bar{D} \mid \bar{E})}{P(\bar{D} \mid E)\,P(D \mid \bar{E})}$$

Odds Ratio (continued)

             $D$    $\bar{D}$    Total
$E$          .15      .05         .20
$\bar{E}$    .10      .70         .80
Total        .25      .75

$$\text{OR} = \frac{P(D \mid E)\,P(\bar{D} \mid \bar{E})}{P(\bar{D} \mid E)\,P(D \mid \bar{E})} = \frac{(.750)(.875)}{(.250)(.125)} = 21$$

This means the odds of disease in the exposed group are 21 times the odds of disease in the unexposed group.

Bayes' Rule

In its simplest form, Bayes' rule allows you to use $P(A \mid B)$ to find $P(B \mid A)$. The rule is expressed as

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A \mid B)\,P(B) + P(A \mid \bar{B})\,P(\bar{B})}$$
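As a minimal sketch (not part of the original slides), Bayes' rule can be checked numerically against the diagnostic-test table used for sensitivity and PPV, taking $A$ to be a positive test and $B$ to be disease. The function name `bayes_rule` and its parameter names are hypothetical, chosen only for illustration.

```python
def bayes_rule(p_a_given_b, p_b, p_a_given_not_b):
    """Return P(B | A) from P(A | B), P(B), and P(A | not B).

    Hypothetical helper: a direct transcription of Bayes' rule
    in its two-event form.
    """
    p_not_b = 1.0 - p_b                        # P(not B) = 1 - P(B)
    numerator = p_a_given_b * p_b              # P(A | B) P(B)
    denominator = numerator + p_a_given_not_b * p_not_b
    return numerator / denominator


# Values taken from the diagnostic-test probability table.
sensitivity = 0.008 / 0.009           # P(+ | D)
prevalence = 0.009                    # P(D)
false_positive_rate = 0.011 / 0.991   # P(+ | not D)

# P(D | +), which should agree with the PPV read directly from the table.
ppv = bayes_rule(sensitivity, prevalence, false_positive_rate)
print(round(ppv, 2))  # prints 0.42
```

The result matches the PPV obtained directly from the table (.008 / .019), which illustrates that Bayes' rule recovers $P(D \mid +)$ from sensitivity, the false positive rate, and prevalence alone.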