Count Data

1. Estimating & testing proportions

Ten customers, 2 purchase a product. We estimate the probability p of purchase as p = 0.20 for all customers. Could p really be 0.50 in the population?

Binomial:
  n independent trials
  Each trial results in an event (Y=1) or a nonevent (Y=0)
  p = probability of an event, constant on all trials
  Mean of Y is p
  Variance is E{(Y-p)^2} = p(1-p)^2 + (1-p)(0-p)^2 = p(1-p)

S = sum of the Y's = observed number of events in n trials
  Pr{S=r} = n!/[r!(n-r)!] p^r (1-p)^(n-r), where n! = n(n-1)(n-2)...(1) and 0! = 1! = 1.

If p is known, Pr{S=r} is a probability function. If p is to be estimated and r is known, Pr{S=r} is a function of p, known as the likelihood function L(p). Its logarithm is ln(L(p)).

Example (r=2, n=10): L(p) = 45 p^2 (1-p)^8, maximum at p = 0.20.

Maximum Likelihood:
  Visually: plot L(p) versus p.
  Find the value of p that makes what we saw (2 events, 8 nonevents) most likely, i.e.
  Find p to maximize L(p) = 45 p^2 (1-p)^8, i.e.
  Find p to maximize the simpler p^2 (1-p)^8, i.e.
  Find p to maximize ln(L(p)) = 2 ln(p) + 8 ln(1-p), i.e.
  Find p to make 2/p - 8/(1-p) = 0 (chain rule), or...
  Find p to minimize -2 ln(L(p)) = -4 ln(p) - 16 ln(1-p).

Can't maximize analytically? Use a Gauss-Newton search.

Gauss-Newton to solve f(x) = 0:
  (1) Make a guess for x.
  (2) Iterate: x <- x - f(x)/f'(x).

Example: f(x) is our derivative, starting at x = 0.9.
[Figure: top curve is the derivative (right scale); bottom curve is -2 ln(L(x)) (left scale).]

10 Gauss-Newton steps:

Step    p         change    N2LL      Deriv     Deriv2
  1   0.90000   -0.09692   29.6495   155.556   1604.94
  2   0.80308   -0.18211   19.2630    76.269    418.80
  3   0.62096   -0.29383    9.8146    35.771    121.74
  4   0.32714   -0.15886    3.1956    11.552     72.72
  5   0.16828    0.02758    2.4633    -4.533    164.38
  6   0.19585    0.00408    2.3958    -0.527    129.02
  7   0.19993    0.00007    2.3947    -0.008    125.06
  8   0.20000    0.00000    2.3947    -0.000    125.00
  9   0.20000    0.00000    2.3947    -0.000    125.00
 10   0.20000    0.00000    2.3947     0.000    125.00

Run Logistic_A.sas demo.
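The demo itself is Logistic_A.sas; as a language-neutral sketch, the iteration in the table can be reproduced in a few lines of Python. The function and derivatives are exactly the ones derived above; N2LL includes the ln(45) binomial-coefficient constant so the column values match.

```python
import math

def n2ll(p):
    # -2 ln L(p) for r = 2 events in n = 10 trials, including ln(45)
    return -2.0 * (math.log(45) + 2 * math.log(p) + 8 * math.log(1 - p))

def deriv(p):    # d/dp of -2 ln L(p): -4/p + 16/(1-p)
    return -4.0 / p + 16.0 / (1.0 - p)

def deriv2(p):   # second derivative: 4/p^2 + 16/(1-p)^2
    return 4.0 / p ** 2 + 16.0 / (1.0 - p) ** 2

p = 0.9                                  # starting guess, as in the table
for step in range(1, 11):
    change = -deriv(p) / deriv2(p)       # Gauss-Newton step on the derivative
    print(f"{step:3d} {p:.5f} {change:9.5f} {n2ll(p):9.4f}"
          f" {deriv(p):9.3f} {deriv2(p):9.2f}")
    p += change
# p converges to the maximum likelihood estimate 0.20
```

The printed rows reproduce the table above, and the quadratic convergence (the change roughly squares each step near the solution) is visible from step 5 on.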
2. Contingency Tables

Observed:
               Coupon   No Coupon
Purchase         86        24
No Purchase      14        76

Expected (under H0: no coupon effect), each cell = (row total x column total)/n, e.g. 110 x 100/200 = 55:
               Coupon   No Coupon
Purchase         55        55
No Purchase      45        45

Pearson chi-square (k = 1 df): χ² = sum over all cells of (O - E)^2/E = 77.65.
Compare to the chi-square (1 df) 95% critical value, 1.96^2 = 3.84: significant.

Likelihood:
               Coupon       No Coupon
Purchase       p1^86        p2^24
No Purchase    (1-p1)^14    (1-p2)^76

Likelihood is C p1^86 (1-p1)^14 p2^24 (1-p2)^76.
Max at p1 = 0.86, p2 = 0.24.
Max ln(L) is 86 ln(.86) + ... + 76 ln(.76) = -95.6043.

Under H0: p1 = p2: max at p1 = p2 = 0.55, (1-p1) = (1-p2) = 0.45.
Max ln(L) is 86 ln(.55) + ... + 76 ln(.45) = -137.628.

Likelihood ratio χ² test (change in -2 ln(L) values) = 2(137.628 - 95.6043) = 84.0468.
Close to, but not the same as, the Pearson chi-square (77.65).

See Logistic_A.sas demo, last part.

3. Logistic Regression

X = food storage temperature (degrees C)
Y = 1 if spoilage after 2 months, 0 otherwise

X: -14  -8  -9  -6   2   3   8   9  10  16
Y:   0   0   0   1   0   0   1   1   1   1

Regress Y on X? Problem: predicted probabilities > 1 or < 0.

Idea: convert p to a logit.
  Logit = ln(p/(1-p)) = ln(odds)
  Model: Logit = β0 + β1 X
  p = exp(Logit)/(1 + exp(Logit)) = exp(β0 + β1 X)/(1 + exp(β0 + β1 X))

So... use exp(β0 + β1 X)/(1 + exp(β0 + β1 X)) for p in the likelihood function (you know X), then find the betas that maximize this function. Equivalently, minimize -2 ln(likelihood).

Any betas whose -2 ln(likelihood) differs from that of the maximum likelihood betas by an amount exceeding the chi-square 95% point would be rejected in a 5% hypothesis test. Therefore, if we truncate our plot at the right point, we cut off the rejected set of betas and have an approximate 95% confidence region for the pair of betas.

Run demo: Logistic_B.sas. Intercept = -0.2878, slope = 0.2083.

Pairs: one 0 and one 1.
  Concordant: the actual 1 has a higher predicted probability than the actual 0 (the 1 is to the right of the 0 when the slope > 0).
  Discordant: the actual 0 has a higher predicted probability of being a 1 than the actual 1 does.
We have five 0's and five 1's, so 5 x 5 = 25 pairs.
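These pairs can be counted directly. A minimal Python sketch: since the slope is positive, the predicted probability is increasing in X, so comparing X values is enough to decide concordance.

```python
X = [-14, -8, -9, -6, 2, 3, 8, 9, 10, 16]
Y = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

ones  = [x for x, y in zip(X, Y) if y == 1]   # X values of the actual 1's
zeros = [x for x, y in zip(X, Y) if y == 0]   # X values of the actual 0's

# With a positive slope, a pair is concordant exactly when the 1 has the larger X.
nc   = sum(x1 > x0 for x1 in ones for x0 in zeros)    # concordant pairs
nd   = sum(x1 < x0 for x1 in ones for x0 in zeros)    # discordant pairs
tied = sum(x1 == x0 for x1 in ones for x0 in zeros)   # tied pairs
t = len(ones) * len(zeros)                            # all 0-1 pairs

print(nc, nd, tied, t)               # 23 2 0 25
print(100 * nc / t, 100 * nd / t)    # 92.0 8.0
```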
Two of those 25 pairs (circled in the demo plot) are discordant, and there are no ties, so 23/25 = 92% are concordant.

proc logistic data=logistic;
   model spoiled(event="1") = temperature / itprint ctable pprob=0.5;

Percent Concordant   92.0      Somers' D   0.840
Percent Discordant    8.0      Gamma       0.840
Percent Tied          0.0      Tau-a       0.467
Pairs                25        c           0.920

Prior probability 0.5: classify any point with predicted probability above 0.5 as a 1, the others as 0. You will have some misclassifications.

Classification Table:

 Prob    ----Correct----   ---Incorrect---            Percentages
 Level   Event  NonEvent   Event  NonEvent   Correct  Sensitivity  Specificity  False POS  False NEG
 0.500     4       3         2       1         70.0      80.0         60.0        33.3       25.0

Split point at the X with -0.2878 + 0.2083 X = 0, i.e. X = 1.38 (why? because that is where the predicted probability equals 0.5).
  4 correct events (at X = 8, 9, 10, 16).
  3 correct nonevents (at X = -14, -9, -8).
  2 incorrect events (at X = 2, 3).
  1 incorrect nonevent (at X = -6).

Sensitivity: probability of calling an event an event. There are 1 + 4 = 5 actual events and we predicted 4 of them, so 4/5 = 80%.
Specificity: probability of calling an actual nonevent a nonevent. There are 3 + 2 = 5 actual nonevents, of which we predicted 3, so 3/5 = 60%.
(Denominators = numbers of actuals.)
False positives: we predicted 2 + 4 = 6 events but were wrong twice, so 2/6 = 33.3%.
False negatives: we predicted 4 nonevents but were wrong once, so 1/4 = 25%.
(Denominators = numbers of predictions.)

Odds Ratio:
  Old Logit = β0 + β1 X = ln(odds at X)
  New Logit = β0 + β1 (X+1) = ln(odds at X+1)
  New Logit - Old Logit = β1 = ln(new odds) - ln(old odds) = ln((new odds)/(old odds)) = ln(odds ratio)
so... odds ratio = exp(β1).

e^β = 1 + β + β^2/2! + β^3/3! + ...
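The classification counts, the four percentages, and the odds-ratio arithmetic can all be checked with a short Python sketch. The intercept and slope are the Logistic_B.sas estimates quoted above, taken as given.

```python
import math

X = [-14, -8, -9, -6, 2, 3, 8, 9, 10, 16]
Y = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
b0, b1 = -0.2878, 0.2083          # fitted intercept and slope from the demo

# predicted probability > 0.5 exactly when the logit b0 + b1*x > 0
pred = [1 if b0 + b1 * x > 0 else 0 for x in X]

tp = sum(p == 1 and y == 1 for p, y in zip(pred, Y))   # correct events
tn = sum(p == 0 and y == 0 for p, y in zip(pred, Y))   # correct nonevents
fp = sum(p == 1 and y == 0 for p, y in zip(pred, Y))   # incorrect events
fn = sum(p == 0 and y == 1 for p, y in zip(pred, Y))   # incorrect nonevents
print(tp, tn, fp, fn)                                  # 4 3 2 1

print(100 * tp / (tp + fn))             # sensitivity 80.0 (denominator = actual events)
print(100 * tn / (tn + fp))             # specificity 60.0 (denominator = actual nonevents)
print(round(100 * fp / (tp + fp), 1))   # false POS 33.3   (denominator = predicted events)
print(100 * fn / (tn + fn))             # false NEG 25.0   (denominator = predicted nonevents)

# odds ratio for a one-degree increase in temperature
print(round(math.exp(b1), 4))           # 1.2316, close to 1 + 0.2083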
(Taylor expansion) e^β is approximately 1 + β when β is small.

Other Stats (source: SAS online help). The following are all rank-based correlation statistics for assessing the predictive ability of a model. Let nc = # concordant (23), nd = # discordant (2), N = # points (10), t = # pairs with different responses (25):

  c (area under the ROC curve)  = (nc + (1/2)(# ties))/t
  Somers' D                     = (nc - nd)/t
  Goodman-Kruskal Gamma         = (nc - nd)/(nc + nd)
  Kendall's Tau-a               = (nc - nd)/((1/2) N(N-1))

Percent Concordant   92.0      Somers' D   0.840
Percent Discordant    8.0      Gamma       0.840
Percent Tied          0.0      Tau-a       0.467
Pairs                25        c           0.920

Appendix: Details

Exactly what is the food example likelihood function?

X: -14  -8  -9  -6   2   3   8   9  10  16
Y:   0   0   0   1   0   0   1   1   1   1

L = (1-p1)(1-p2)(1-p3)(p4)(1-p5)(1-p6)(p7)(p8)(p9)(p10)

where each pi = exp(β0 + β1 Xi)/(1 + exp(β0 + β1 Xi)), so

L = [1 - exp(β0 - 14β1)/(1 + exp(β0 - 14β1))]
  x [1 - exp(β0 - 8β1)/(1 + exp(β0 - 8β1))]
  x [1 - exp(β0 - 9β1)/(1 + exp(β0 - 9β1))]
  x [exp(β0 - 6β1)/(1 + exp(β0 - 6β1))]
  x ...
  x [exp(β0 + 16β1)/(1 + exp(β0 + 16β1))]

This is a function, L(β0, β1), of β0 and β1.

Recall: exp(X) is just another way of writing e^X.

Algebra: if Logit = L = ln(p/(1-p)), then e^L = p/(1-p), e^L - p e^L = p, e^L = p(1 + e^L), and so p = e^L/(1 + e^L) = 1/(e^(-L) + 1) = 1/(1 + e^(-L)).
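This likelihood can be maximized numerically. Below is a hedged Python sketch using Newton-Raphson on the pair (β0, β1); PROC LOGISTIC's own iterations (Fisher scoring) are similar in spirit, but this is an illustration, not SAS's exact algorithm. It should land close to the intercept and slope reported by the demo.

```python
import math

X = [-14, -8, -9, -6, 2, 3, 8, 9, 10, 16]
Y = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

def n2ll(b0, b1):
    # -2 ln L(b0, b1): the objective being minimized
    tot = 0.0
    for x, y in zip(X, Y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        tot += math.log(p) if y == 1 else math.log(1.0 - p)
    return -2.0 * tot

b0, b1 = 0.0, 0.0
for _ in range(25):
    # gradient and Hessian of -ln L (the factor of 2 cancels in the step)
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(X, Y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        g0 += p - y
        g1 += (p - y) * x
        w = p * (1.0 - p)
        h00 += w; h01 += w * x; h11 += w * x * x
    det = h00 * h11 - h01 * h01
    b0 -= ( h11 * g0 - h01 * g1) / det   # Newton step: solve H d = g
    b1 -= (-h01 * g0 + h00 * g1) / det

print(round(b0, 4), round(b1, 4))   # close to the reported -0.2878 and 0.2083
```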