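Linear Probability Model (LPM)

[Figure: LPM regression line; vertical axis from 0 to 1.]
Points on the regression line are predicted probabilities of Y for each value of X

Analysis of case/control studies
• Case/control studies are designed to consider observed genotype as the random variable, and compare its distribution between cases and controls
• The analysis and interpretation is easier if we consider disease status (case vs. control) as a random outcome, predicted by genotype
• These models lend themselves to analysis via logistic regression

LPM problems
• Predicted probabilities can be >1 or <0
• The error terms vary based on the size of X
• The errors are not normally distributed since Y takes on only two values

Logistic Function

[Figure: logistic curve; vertical axis from 0 to 1.]
Points on the regression line are predicted probabilities of Y for each value of X

$$p(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$

Logistic Regression
• Response: Presence/Absence of a characteristic or disease
• Predictor: Numeric variable observed for each case
• Model: p(x) = Prob. of presence at predictor level x [sometimes π(x)]

$$p(x_i) = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}$$

• $\beta_1 = 0$: P(Presence) is the same at each level of x
• $\beta_1 > 0$: P(Presence) increases as x increases
• $\beta_1 < 0$: P(Presence) decreases as x increases

Logistic Regression – Statistical Details

Data:

$$Y_i \in \{0, 1\}, \quad i = 1, \ldots, n$$

Model for $y_i$:

$$f(y_i) = p^{y_i} (1 - p)^{1 - y_i}, \quad i = 1, \ldots, n, \qquad p = P(Y_i = 1)$$

$$p(x_i) = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}$$

$$\ln\!\left(\frac{p(x_i)}{1 - p(x_i)}\right) = \beta_0 + \beta_1 x_i$$

Allelic Odds Ratio - revisited

            Disease
Allele      Yes       No
A           πA        1-πA
a           πa        1-πa

$$OR_A = \frac{\pi_A / (1 - \pi_A)}{\pi_a / (1 - \pi_a)}$$

πA represents the probability that an allele/chromosome drawn at random from the A alleles/chroms is from an individual with disease (a case subject)

Logistic Regression & Allelic Odds Ratios
• We can fit a logistic regression model predicting the origin (case or control) of an allele using x, an indicator variable for allele: (0,1) = (A, a)
• Testing Ho: β=0 is equivalent to testing if the OR=1

Genotype Odds Ratios - revisited

            Disease
Genotype    Yes       No
A/A         πA/A      1-πA/A
A/a         πA/a      1-πA/a
a/a         πa/a      1-πa/a

$$OR_{AA} = \frac{\pi_{AA} / (1 - \pi_{AA})}{\pi_{aa} / (1 - \pi_{aa})} \qquad OR_{Aa} = \frac{\pi_{Aa} / (1 - \pi_{Aa})}{\pi_{aa} / (1 - \pi_{aa})}$$

Logistic Regression & Genotype Odds Ratios
• For the Genotype OR, we need 2 indicator variables, x1 & x2, to represent the genotype categories (i.e., 2 d.f.).
• The hypothesis of no association is tested with Ho: β1=β2=0

Multiple Logistic Regression
• Extension to more than one predictor variable (numeric or dummy variables).
• With p predictors, the model is:

$$p(x_1, \ldots, x_p) = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}$$

• Adjusted Odds ratio for raising $x_i$ by 1 unit, holding other predictors constant: $OR_i = e^{\beta_i}$
• Inferences on $\beta_i$ and $OR_i$ are conducted as in the case of a single predictor

95% Confidence Interval for Odds Ratio
• Construct a 95% CI for β:

$$\hat\beta \pm 1.96\, SE_{\hat\beta} \;\equiv\; \left(\hat\beta - 1.96\, SE_{\hat\beta},\ \hat\beta + 1.96\, SE_{\hat\beta}\right)$$

• Exponentiate both endpoints of the CI for β:

$$\left(e^{\hat\beta - 1.96\, SE_{\hat\beta}},\ e^{\hat\beta + 1.96\, SE_{\hat\beta}}\right)$$

1-parameter (no-intercept) Model

Likelihood:

$$L(p) = \prod_{i=1}^{n} p^{y_i} (1 - p)^{1 - y_i} = \prod_{i=1}^{n} \exp\!\left\{ y_i \ln p + (1 - y_i) \ln(1 - p) \right\} = \prod_{i=1}^{n} \exp\!\left\{ y_i \ln\!\left(\frac{p}{1 - p}\right) + \ln(1 - p) \right\}$$

1-parameter (no-intercept) Model

Reparameterize the Likelihood function using:

$$\ln\!\left(\frac{p}{1 - p}\right) = \beta x_i$$

$$L(\beta) = \prod_{i=1}^{n} \exp\!\left\{ y_i \beta x_i + \ln\!\left(1 - \frac{\exp(\beta x_i)}{1 + \exp(\beta x_i)}\right) \right\} = \prod_{i=1}^{n} \exp\!\left\{ y_i \beta x_i + \ln\!\left(\frac{1}{1 + \exp(\beta x_i)}\right) \right\}$$

Log Likelihood:

$$l(\beta) = \sum_{i=1}^{n} \left[ y_i \beta x_i - \ln\!\left(1 + \exp(\beta x_i)\right) \right]$$

$$\frac{dl}{d\beta} = 0 \;\rightarrow\; \text{Solve for the MLE of } \beta$$

2-parameter Model (with Intercept)

Log Likelihood:

$$l(\beta_0, \beta_1) = \sum_{i=1}^{n} \left[ y_i (\beta_0 + \beta_1 x_i) - \ln\!\left(1 + \exp(\beta_0 + \beta_1 x_i)\right) \right]$$

Solve for:

$$\frac{\partial l}{\partial \beta_0} = 0, \qquad \frac{\partial l}{\partial \beta_1} = 0$$

$$f(\beta_0, \beta_1) = \begin{pmatrix} f_1(\beta_0, \beta_1) \\ f_2(\beta_0, \beta_1) \end{pmatrix} = \begin{pmatrix} \partial l / \partial \beta_0 \\ \partial l / \partial \beta_1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

$$\text{Hessian Matrix} = f'(\beta_0, \beta_1) = \begin{pmatrix} \partial f_1 / \partial \beta_0 & \partial f_1 / \partial \beta_1 \\ \partial f_2 / \partial \beta_0 & \partial f_2 / \partial \beta_1 \end{pmatrix}$$

Newton-Raphson iteration:

$$\begin{pmatrix} \beta_0^{n+1} \\ \beta_1^{n+1} \end{pmatrix} = \begin{pmatrix} \beta_0^{n} \\ \beta_1^{n} \end{pmatrix} - \left[ f'(\beta^{n}) \right]^{-1} f(\beta^{n})$$

where, with every term evaluated at $(\beta_0, \beta_1) = (\beta_0^{n}, \beta_1^{n})$,

$$f'(\beta^{n}) = \begin{pmatrix} -\sum_{i=1}^{n} \dfrac{\exp(\beta_0 + \beta_1 x_i)}{\left(1 + \exp(\beta_0 + \beta_1 x_i)\right)^2} & -\sum_{i=1}^{n} \dfrac{x_i \exp(\beta_0 + \beta_1 x_i)}{\left(1 + \exp(\beta_0 + \beta_1 x_i)\right)^2} \\ -\sum_{i=1}^{n} \dfrac{x_i \exp(\beta_0 + \beta_1 x_i)}{\left(1 + \exp(\beta_0 + \beta_1 x_i)\right)^2} & -\sum_{i=1}^{n} \dfrac{x_i^2 \exp(\beta_0 + \beta_1 x_i)}{\left(1 + \exp(\beta_0 + \beta_1 x_i)\right)^2} \end{pmatrix}$$

$$f(\beta^{n}) = \begin{pmatrix} \sum_{i=1}^{n} \left[ y_i - \dfrac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)} \right] \\ \sum_{i=1}^{n} \left[ y_i x_i - \dfrac{x_i \exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)} \right] \end{pmatrix}$$

In general, Newton-Raphson's method can be expanded to k possible parameters. As a result, the Hessian matrix is always of dimension k×k and all other vectors are of dimension k×1.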
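
The Newton-Raphson update for the 2-parameter model can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's own code: the function names, the starting values (0, 0), the stopping rule, and the allele counts are all hypothetical. The standard error is taken from the inverse of the negative Hessian at the estimate, a standard choice that the slides do not spell out.

```python
import math


def score_and_hessian(b0, b1, x, y):
    """Return f = (dl/db0, dl/db1) and the Hessian entries of the log likelihood."""
    # p_i = exp(b0 + b1*x_i) / (1 + exp(b0 + b1*x_i)), written in an equivalent form
    p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    f1 = sum(yi - pi for yi, pi in zip(y, p))
    f2 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
    # p_i * (1 - p_i) equals exp(eta_i) / (1 + exp(eta_i))^2
    w = [pi * (1.0 - pi) for pi in p]
    h00 = -sum(w)
    h01 = -sum(xi * wi for xi, wi in zip(x, w))
    h11 = -sum(xi * xi * wi for xi, wi in zip(x, w))
    return (f1, f2), (h00, h01, h11)


def fit_logistic_newton(x, y, max_iter=25, tol=1e-10):
    """Newton-Raphson for the 2-parameter logistic model; returns (b0, b1, se_b1)."""
    b0, b1 = 0.0, 0.0  # assumed starting values
    for _ in range(max_iter):
        (f1, f2), (h00, h01, h11) = score_and_hessian(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # step = [f'(beta)]^{-1} f(beta), using the closed-form 2x2 inverse
        s0 = (h11 * f1 - h01 * f2) / det
        s1 = (-h01 * f1 + h00 * f2) / det
        b0, b1 = b0 - s0, b1 - s1
        if max(abs(s0), abs(s1)) < tol:
            break
    # Assumption: Var(b1) is the (2,2) entry of the inverse negative Hessian
    _, (h00, h01, h11) = score_and_hessian(b0, b1, x, y)
    se_b1 = math.sqrt(-h00 / (h00 * h11 - h01 * h01))
    return b0, b1, se_b1


# Hypothetical allele counts: x = 0 -> 10 case, 20 control; x = 1 -> 20 case, 10 control
x = [0] * 30 + [1] * 30
y = [1] * 10 + [0] * 20 + [1] * 20 + [0] * 10

b0, b1, se_b1 = fit_logistic_newton(x, y)
odds_ratio = math.exp(b1)
ci = (math.exp(b1 - 1.96 * se_b1), math.exp(b1 + 1.96 * se_b1))
print(odds_ratio, ci)
```

With a single 0/1 predictor the model is saturated, so the fitted odds ratio should match the one computed directly from the 2×2 table of counts, here (20/10)/(10/20) = 4. The confidence interval is obtained by exponentiating both endpoints of the interval for β, as on the 95% CI slide.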