
Analysis of case/control studies
• Case/control studies are designed to consider observed genotype as the random variable, and compare its distribution between cases and controls
• Analysis and interpretation are easier if we consider disease status (case vs. control) as a random outcome, predicted by genotype
• These models lend themselves to analysis via logistic regression

Linear Probability Model (LPM)
[Figure: LPM regression line; vertical axis runs from 0 to 1]
Points on the regression line are predicted probabilities of Y for each value of X
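As a rough illustration of an LPM fit, the sketch below regresses a 0/1 outcome on a single predictor by ordinary least squares and prints the fitted "probabilities". The data and the helper name `fit_lpm` are made up for this example; with these values the fitted line drops below 0 at the smallest x and rises above 1 at the largest x.

```python
# Toy data (hypothetical): binary outcome y observed at predictor values x.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_lpm(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_lpm(x, y)

# Points on the regression line, read as predicted probabilities of Y = 1.
fitted = [b0 + b1 * xi for xi in x]
for xi, pi in zip(x, fitted):
    print(f"x = {xi}: predicted P(Y = 1) = {pi:.3f}")
```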
LPM problems
• Predicted probabilities can be >1 or <0
• The variance of the error terms depends on the value of X (heteroscedasticity)
• The errors are not normally distributed since Y takes on only two values

Logistic Function
[Figure: logistic regression curve; vertical axis runs from 0 to 1]
Points on the regression line are predicted probabilities of Y for each value of X

$$p(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$
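A minimal sketch of the logistic function, evaluated at a few predictor values. The coefficient values are hypothetical and chosen only for illustration. Whatever the value of x, the result stays strictly between 0 and 1, and it equals 0.5 where β0 + β1·x = 0.

```python
import math


def logistic(x, beta0, beta1):
    """p(x) = e^(b0 + b1*x) / (1 + e^(b0 + b1*x)), written to avoid overflow."""
    z = beta0 + beta1 * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


beta0, beta1 = -3.0, 1.0  # hypothetical coefficients

for x in (-10, 0, 3, 6, 20):
    print(f"x = {x:>3}: p(x) = {logistic(x, beta0, beta1):.6f}")
```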
Logistic Regression
• Response: Presence/Absence of a characteristic or disease
• Predictor: Numeric variable observed for each case
• Model: p(x) = Prob. of presence at predictor level x [sometimes written π(x)]

$$p(x_i) = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}$$

• $\beta_1 = 0$ ⇒ P(Presence) is the same at each level of x
• $\beta_1 > 0$ ⇒ P(Presence) increases as x increases
• $\beta_1 < 0$ ⇒ P(Presence) decreases as x increases

Logistic Regression – Statistical Details
Data: $Y_i \in \{0, 1\}$, $i = 1, \ldots, n$
Model for $y_i$: $f(y_i) = p^{y_i} (1 - p)^{1 - y_i}$, $i = 1, \ldots, n$, where $p = P(Y_i = 1)$

$$p(x_i) = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}$$

$$\ln\left(\frac{p(x_i)}{1 - p(x_i)}\right) = \beta_0 + \beta_1 x_i$$
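The sketch below checks two properties of the model numerically: the log odds ln(p/(1 - p)) are linear in x, and the sign of the slope decides whether P(Presence) rises, falls, or stays flat. All coefficient values and predictor levels are hypothetical, and the function names are chosen for this example only.

```python
import math


def p_of_x(x, beta0, beta1):
    """Logistic model: probability of presence at predictor level x."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))


def logit(p):
    """Log odds ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def bernoulli_pmf(y, p):
    """f(y) = p^y * (1 - p)^(1 - y) for y in {0, 1}."""
    return p ** y * (1.0 - p) ** (1 - y)


beta0 = -1.0                     # hypothetical intercept
x_levels = [0.0, 1.0, 2.0, 3.0]  # hypothetical predictor levels

for beta1 in (-0.5, 0.0, 0.5):   # negative, zero, and positive slope
    probs = [p_of_x(x, beta0, beta1) for x in x_levels]
    log_odds = [logit(p) for p in probs]
    print(f"beta1 = {beta1:+.1f}",
          "p(x):", [round(p, 3) for p in probs],
          "log odds:", [round(v, 3) for v in log_odds])
```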
Allelic Odds Ratio - revisited

              Disease
Allele        Yes       No
A             πA        1-πA
a             πa        1-πa

$$OR_A = \frac{\pi_A / (1 - \pi_A)}{\pi_a / (1 - \pi_a)}$$

πA represents the probability that an allele/chromosome drawn at random from the A alleles/chromosomes is from an individual with disease (a case subject)

Logistic Regression & Allelic Odds Ratios
• We can fit a logistic regression model predicting the origin (case or control) of an allele using x, an indicator variable for allele: (0,1) = (A, a)
• Testing Ho: β=0 (the coefficient of x) is equivalent to testing if the OR=1
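To make the link between the allele table and the regression coefficient concrete, here is a small sketch with made-up allele counts. It relies on a standard fact not spelled out above: with a single 0/1 predictor the fitted probabilities equal the observed proportions, so the coefficients can be written down directly. Under the coding x = 0 for A and x = 1 for a, the coefficient of x is the log odds of a relative to A, so OR_A as defined in the table equals exp(-β).

```python
import math

# Hypothetical allele counts (not from the source).
cases_A, controls_A = 60, 40
cases_a, controls_a = 40, 60


def odds(p):
    return p / (1.0 - p)


# Probability that a randomly drawn A (or a) allele comes from a case.
pi_A = cases_A / (cases_A + controls_A)
pi_a = cases_a / (cases_a + controls_a)

# Allelic odds ratio as defined in the table: A relative to a.
or_A = odds(pi_A) / odds(pi_a)

# Logistic regression with indicator x (0 = A, 1 = a):
# the intercept is the log odds for A, the slope is the change in log odds for a.
beta0 = math.log(odds(pi_A))
beta1 = math.log(odds(pi_a)) - beta0

print(f"OR_A       = {or_A:.4f}")
print(f"exp(-beta) = {math.exp(-beta1):.4f}")
print(f"beta = 0 exactly when OR_A = 1: beta = {beta1:.4f}")
```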
Genotype Odds Ratios - revisited

              Disease
Genotype      Yes       No
A/A           πA/A      1-πA/A
A/a           πA/a      1-πA/a
a/a           πa/a      1-πa/a

$$OR_{AA} = \frac{\pi_{AA} / (1 - \pi_{AA})}{\pi_{aa} / (1 - \pi_{aa})} \qquad OR_{Aa} = \frac{\pi_{Aa} / (1 - \pi_{Aa})}{\pi_{aa} / (1 - \pi_{aa})}$$

Logistic Regression & Genotype Odds Ratios
• For the Genotype OR, we need 2 indicator variables, x1 & x2, to represent the genotype categories (i.e., 2 d.f.).
• The hypothesis of no association is tested with Ho: β1=β2=0

95% Confidence Interval for Odds Ratio
• Construct a 95% CI for β:

$$\hat{\beta} \pm 1.96\, SE_{\hat{\beta}} \;\Rightarrow\; \left( \hat{\beta} - 1.96\, SE_{\hat{\beta}},\ \hat{\beta} + 1.96\, SE_{\hat{\beta}} \right)$$

• Exponentiate both endpoints of the CI for β:

$$\left( e^{\hat{\beta} - 1.96\, SE_{\hat{\beta}}},\ e^{\hat{\beta} + 1.96\, SE_{\hat{\beta}}} \right)$$

Multiple Logistic Regression
• Extension to more than one predictor variable (numeric or dummy variables).
• With p predictors, the model is:

$$p(x_1, \ldots, x_p) = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}$$

• Adjusted Odds ratio for raising xi by 1 unit, holding other predictors constant:

$$OR_i = e^{\beta_i}$$

• Inferences on βi and ORi are conducted as in the case of a single predictor
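The sketch below ties the genotype odds ratios, the two-indicator coding, and the confidence interval together, using made-up genotype counts with a/a as the reference category. Two assumptions are not in the source: with indicators x1 (A/A) and x2 (A/a) the model is saturated, so each coefficient estimate is the log of the corresponding table odds ratio; and the standard error is taken from the usual large-sample (Woolf) formula for a log odds ratio.

```python
import math

# Hypothetical genotype counts as (cases, controls); not from the source.
counts = {"A/A": (30, 20), "A/a": (50, 50), "a/a": (20, 30)}
ref_cases, ref_controls = counts["a/a"]  # a/a is the reference genotype


def wald_ci(beta_hat, se, z=1.96):
    """Exponentiate both endpoints of beta_hat +/- z * SE to get a CI for the OR."""
    return math.exp(beta_hat - z * se), math.exp(beta_hat + z * se)


results = {}
for genotype in ("A/A", "A/a"):
    cases, controls = counts[genotype]
    # Coefficient of this genotype's indicator = log odds ratio versus a/a.
    beta_hat = math.log((cases / controls) / (ref_cases / ref_controls))
    # Assumed large-sample SE of a log odds ratio from the four cell counts.
    se = math.sqrt(1 / cases + 1 / controls + 1 / ref_cases + 1 / ref_controls)
    lower, upper = wald_ci(beta_hat, se)
    results[genotype] = (math.exp(beta_hat), lower, upper)
    print(f"{genotype} vs a/a: OR = {math.exp(beta_hat):.3f}, "
          f"95% CI = ({lower:.3f}, {upper:.3f})")
```

Because the interval is built on the log scale and then exponentiated, it is not symmetric around the odds ratio itself.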
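1-parameter (no-intercept) Model
Likelihood:

$$L(p) = \prod_{i=1}^{n} p^{y_i} (1 - p)^{1 - y_i} = \prod_{i=1}^{n} \exp\left[ y_i \ln p + (1 - y_i) \ln(1 - p) \right] = \prod_{i=1}^{n} \exp\left[ y_i \ln\left(\frac{p}{1 - p}\right) + \ln(1 - p) \right]$$

Reparameterize the Likelihood function using:

$$\ln\left(\frac{p}{1 - p}\right) = \beta x_i \;\Rightarrow\; 1 - p = \frac{1}{1 + \exp(\beta x_i)}$$

$$L(\beta) = \prod_{i=1}^{n} \exp\left[ y_i \beta x_i + \ln\left(\frac{1}{1 + \exp(\beta x_i)}\right) \right] = \prod_{i=1}^{n} \exp\left[ y_i \beta x_i - \ln\left(1 + \exp(\beta x_i)\right) \right]$$

Log Likelihood:

$$l(\beta) = \sum_{i=1}^{n} \left[ y_i \beta x_i - \ln\left(1 + \exp(\beta x_i)\right) \right]$$

$$\frac{dl}{d\beta} = 0 \;\rightarrow\; \text{Solve for the MLE of } \beta$$

2-parameter Model (with Intercept)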
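Log Likelihood:

$$l(\beta_0, \beta_1) = \sum_{i=1}^{n} \left[ y_i (\beta_0 + \beta_1 x_i) - \ln\left(1 + \exp(\beta_0 + \beta_1 x_i)\right) \right]$$

Solve for:

$$\frac{\partial l}{\partial \beta_0} = 0, \qquad \frac{\partial l}{\partial \beta_1} = 0$$

$$f(\beta_0, \beta_1) = \begin{pmatrix} f_1(\beta_0, \beta_1) \\ f_2(\beta_0, \beta_1) \end{pmatrix} = \begin{pmatrix} \partial l / \partial \beta_0 \\ \partial l / \partial \beta_1 \end{pmatrix} = 0$$

$$f_1 = \sum_{i=1}^{n} \left[ y_i - \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)} \right] = 0$$

$$f_2 = \sum_{i=1}^{n} \left[ y_i x_i - \frac{x_i \exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)} \right] = 0$$

Hessian Matrix:

$$f'(\beta_0, \beta_1) = \begin{pmatrix} \partial f_1 / \partial \beta_0 & \partial f_1 / \partial \beta_1 \\ \partial f_2 / \partial \beta_0 & \partial f_2 / \partial \beta_1 \end{pmatrix} = \begin{pmatrix} -\sum_{i=1}^{n} \dfrac{\exp(\beta_0 + \beta_1 x_i)}{\left(1 + \exp(\beta_0 + \beta_1 x_i)\right)^2} & -\sum_{i=1}^{n} \dfrac{x_i \exp(\beta_0 + \beta_1 x_i)}{\left(1 + \exp(\beta_0 + \beta_1 x_i)\right)^2} \\ -\sum_{i=1}^{n} \dfrac{x_i \exp(\beta_0 + \beta_1 x_i)}{\left(1 + \exp(\beta_0 + \beta_1 x_i)\right)^2} & -\sum_{i=1}^{n} \dfrac{x_i^2 \exp(\beta_0 + \beta_1 x_i)}{\left(1 + \exp(\beta_0 + \beta_1 x_i)\right)^2} \end{pmatrix}$$

Newton-Raphson iteration:

$$\hat{\beta}^{(n+1)} = \hat{\beta}^{(n)} - \left[ f'\left(\hat{\beta}^{(n)}\right) \right]^{-1} f\left(\hat{\beta}^{(n)}\right), \qquad \hat{\beta}^{(n)} = \begin{pmatrix} \hat{\beta}_0^{(n)} \\ \hat{\beta}_1^{(n)} \end{pmatrix}$$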
In general, the Newton-Raphson method extends to k parameters. The Hessian matrix is then of dimension k x k, and the parameter and score vectors are of dimension k x 1.
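A minimal sketch of the Newton-Raphson iteration for the 2-parameter model, using only the standard library. The data, starting values, tolerance, and the function name `newton_raphson_logistic` are assumptions made for this example. The code uses the identity exp(z)/(1 + exp(z))^2 = p(1 - p) for the Hessian entries and inverts the 2 x 2 Hessian in closed form.

```python
import math


def newton_raphson_logistic(x, y, tol=1e-10, max_iter=50):
    """Maximize l(b0, b1) by repeating beta_new = beta_old - H^{-1} f."""
    b0, b1 = 0.0, 0.0  # assumed starting values
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

        # Score vector f = (dl/db0, dl/db1).
        f1 = sum(yi - pi for yi, pi in zip(y, p))
        f2 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))

        # Hessian entries; w_i = p_i * (1 - p_i) = exp(z_i) / (1 + exp(z_i))^2.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = -sum(w)
        h01 = -sum(xi * wi for xi, wi in zip(x, w))
        h11 = -sum(xi * xi * wi for xi, wi in zip(x, w))

        # step = H^{-1} f, using the closed-form inverse of a 2 x 2 matrix.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * f1 - h01 * f2) / det
        step1 = (-h01 * f1 + h00 * f2) / det

        b0 -= step0
        b1 -= step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1


# Hypothetical data in which cases and controls overlap in x,
# so a finite maximum likelihood estimate exists.
x_data = [0, 1, 2, 3, 4, 5, 6, 7]
y_data = [0, 0, 1, 0, 1, 0, 1, 1]

b0_hat, b1_hat = newton_raphson_logistic(x_data, y_data)
print(f"beta0_hat = {b0_hat:.4f}, beta1_hat = {b1_hat:.4f}")
```

With k parameters the same loop applies, but the closed-form 2 x 2 inverse would be replaced by a general linear solve.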