A Probability Lesson for Any High School Math Course

Teaching Contemporary Mathematics
Julie Graves
January 2016

A group of 300 persons was asked to sort themselves just as you did. Here are their results.

              Liberal   Moderate   Conservative   Total
Under 35         50        10          10           70
35 to 50         20        70          40          130
Over 50          10        50          40          100
Total            80       130          90          300

Now think about selecting a person at random from this group of 300 individuals. "At random" implies that each individual has a 1 in 300 chance of being selected.

Using the column totals:

P(select liberal) = 80/300 = 4/15
P(select conservative) = 90/300 = 3/10

Using the row totals:

P(under 35) = 70/300 = 7/30
P(over 50) = 100/300 = 1/3

Using single cells:

P(select under 35 and liberal) = 50/300 = 1/6
P(select over 50 and conservative) = 40/300 = 2/15

Conditional probability

P(select under 35 given selected liberal) = 50/80 = 5/8
P(select under 35 given selected conservative) = 10/90 = 1/9
P(select over 50 given selected liberal) = 10/80 = 1/8
P(select 35 to 50 given selected moderate) = 70/130 = 7/13
P(select liberal given selected under 35) = 50/70 = 5/7

To recap, we have determined the following:

P(select under 35) = 7/30
P(select liberal) = 4/15
P(select under 35 given selected liberal) = 5/8
P(select liberal given selected under 35) = 5/7
P(select under 35 AND liberal) = 1/6

Notice that P(select under 35 AND liberal) ≠ P(select under 35) x P(select liberal):

1/6 ≠ 7/30 x 4/15

However,

P(select under 35 AND liberal) = P(select under 35) x P(select liberal given selected under 35)
1/6 = 7/30 x 5/7

P(select under 35 AND liberal) = P(select liberal) x P(select under 35 given selected liberal)
1/6 = 4/15 x 5/8

Sensitivity and Specificity

These two terms both describe the effectiveness of a test used to detect a disease, a trait, or the presence of a marker in the blood.

Sensitivity is the rate at which a test identifies a disease when the disease is present. Sensitivity is a conditional probability.

sensitivity = P(positive test result given that the disease is present)

Sensitivity is sometimes called the "true positive rate".

Specificity is the rate at which a test gives a correct negative result when the disease is not present.

specificity = P(negative test result given that the disease is not present)

Specificity is sometimes called the "true negative rate".

We would like a test to have as high a specificity as possible, and as high a sensitivity as possible. Unfortunately, in most cases there is a trade-off between specificity and sensitivity. Increasing one rate leads to a decrease in the other rate.

[Figures: Concept of sensitivity and specificity]

Lessons Learned

A test with sensitivity 0.92 will give a positive test result for 92% of patients who have the disease. This implies that 8% of people who have the disease get a negative (and thus incorrect) test result.

A test with specificity of 0.96 has a 96% chance of giving a negative test result to a patient who does not have the disease. The probability of a person who does not have the disease getting a positive (and thus incorrect) test result is 4%.

Suppose also that the disease we are testing for is fairly common. For example, it may be the case that 30% of the population we can test has the disease.

Note that to an individual, the prevalence of the disease may not be particularly important. Individuals are not usually interested in how prevalent any particular disease is.
Instead, they typically want to know whether they have the disease.

We will fill in the cells to show what we expect to happen when the test is administered to 5000 individuals.

1500 = 0.30 x 5000, so 1500 individuals have the disease and 3500 do not have the disease.

sensitivity = 0.92 = P(positive test result given that the disease is present), so 1380/1500 = 0.92.

specificity = 0.96 = P(negative test result given that the disease is not present), so 3360/3500 = 0.96.

The remaining cells follow by subtraction:

                           Test positive   Test negative   Total
Disease actually present        1380            120         1500
Disease not present              140           3360         3500
Total                           1520           3580         5000

Imagine selecting an individual at random from among the 5000 tested. Find each of the following:

P(select positive test AND disease present) =
P(select positive test AND disease not present) =
P(select positive test given select disease present) =
P(select disease present given select positive test) =
P(select disease not present given select positive test) =

From the table:

P(select positive test) = 1520/5000 ≈ 0.30
P(select positive test given select disease not present) = 140/3500 = 0.04
P(select positive test given select disease present) = 1380/1500 = 0.92
P(select disease present given select positive test) = 1380/1520 ≈ 0.91
P(select disease not present given select positive test) = 140/1520 ≈ 0.09

Among the 1520 individuals that test positive, 140 do not have the disease. This means the test gave these individuals a false positive result. The false positive rate for this test is 140/1520 ≈ 0.09.

Among the 3580 individuals that test negative, 120 do actually have the disease. These people received a false negative test result. The false negative rate for this test is 120/3580 ≈ 0.03.

We can carry out a comparable analysis for a test that has the same sensitivity (0.92) and the same specificity (0.96), but now we will test for a disease that is fairly rare. Suppose only 4% of the population has the disease we are testing for.

200 = 0.04 x 5000
184 = 0.92 x 200
4608 = 0.96 x 4800

                           Test positive   Test negative   Total
Disease actually present         184             16          200
Disease not present              192           4608         4800
Total                            376           4624         5000

4624 patients tested negative, and among these 16 actually had the disease, so the false negative rate is 16/4624 ≈ 0.0035.

376 patients tested positive, and of these 192 did not have the disease. This shows a false positive rate of 192/376 ≈ 0.51.

What happened to cause such dramatic changes in the false positive and false negative rates? The only difference between the tables was the prevalence of the disease, i.e. the probability that an individual in the population actually has the disease.

We want to understand how the prevalence of the disease influences the false positive and false negative rates for this test.

Let p represent the probability that a randomly selected individual has the disease or trait we are testing for. This number is our measure of the prevalence of the disease.

p = P(select an individual who has the disease)

If we use N to represent the population size, we can complete a two-way table. For now, we will continue to use the values 0.96 for specificity and 0.92 for sensitivity.

We will fill in the table cells to show what we expect to happen when the test is administered to N individuals.

                           Test positive   Test negative   Total
Disease actually present      0.92pN                         pN
Disease not present                          0.96(1-p)N     (1-p)N
Total                                                         N

We can fill in the other cells.
Filling in the remaining cells gives:

                           Test positive    Test negative    Total
Disease actually present      0.92pN           0.08pN          pN
Disease not present           0.04(1-p)N       0.96(1-p)N     (1-p)N
Total                                                           N

Algebra will help us find the column totals.

                           Test positive    Test negative     Total
Disease actually present      0.92pN           0.08pN           pN
Disease not present           0.04(1-p)N       0.96(1-p)N      (1-p)N
Total                        (0.88p+0.04)N    (-0.88p+0.96)N     N

To find the false positive rate, we need the probability that a patient who tests positive does not have the disease. That is, P(select disease not present given selected test positive).

The false positive rate is

\[ \frac{0.04(1-p)N}{(0.88p+0.04)N} = \frac{-0.04p+0.04}{0.88p+0.04} \]

To find the false negative rate, we determine this probability: P(select disease present given selected test negative).

The false negative rate is

\[ \frac{0.08pN}{(-0.88p+0.96)N} = \frac{0.08p}{-0.88p+0.96} \]

So the false positive rate and the false negative rate are each a function of the prevalence (p) of the disease in the population.

Questions

A test for malaria has a 95% true positive rate and a 98% true negative rate.

If 0.08% of residents of the US have malaria, what is the probability that an individual who tests negative actually has malaria?

If 45% of the population in Ghana has malaria, what is the false positive rate?

How high must the sensitivity of the test be to ensure that the false negative rate is below 25%?

What prevalence of disease results in a false positive rate of under 10%?

For what disease prevalence is the false negative rate higher than 50%?

We can study the function

\[ y = \frac{-0.04x+0.04}{0.88x+0.04} \]

to see how the false positive rate (y) varies as the prevalence of the disease (x) changes.

The function

\[ y = \frac{0.08x}{-0.88x+0.96} \]

represents the relationship between the false negative rate and the disease prevalence.

[Figure: False positive and false negative rates as functions of disease prevalence]

We can generalize even further.

p = prevalence of disease in population
F = specificity of test
E = sensitivity of test

                           Test positive        Test negative        Total
Disease actually present      EpN                 (1-E)pN              pN
Disease not present          (1-F)(1-p)N          F(1-p)N             (1-p)N
Total                        ((1-F)(1-p)+Ep)N    ((1-E)p+F(1-p))N       N

The false positive rate is

\[ \frac{(1-F)(1-p)}{(1-F)(1-p)+Ep} \]

The false negative rate is

\[ \frac{(1-E)p}{F(1-p)+(1-E)p} \]

We can think of any one of p, F, or E as the independent variable, with the other two as parameters.

Where might a problem like this fit in the math courses you teach?

Questions? [email protected]