Now we will begin our discussion of logistic regression. Just as for linear regression, we need to look at each of the three basic cases for predictors:

• One binary predictor
• One multi-level predictor
• One continuous predictor

Before that, we need to introduce this new model.

We will use the WCGS data, so here again we have given the SAS code to set up the formats for the Yes/No variables ARCUS and CHD69 and for our age groups, AGEC.

As in our discussion of contingency table methods, our outcome will be the presence or absence of coronary heart disease by the end of the 10-year study. This variable is coded as 1 for YES and 0 for NO. Recall that in this study, the outcome probabilities or risks represent the incidence proportion of CHD over the entire duration of the study (approximately ten years).

There are clear limitations to contingency table methods. They are less intuitive when more than two categorical variables are involved, and we cannot directly study the association between a binary outcome and a continuous predictor. Logistic regression generalizes contingency table methods for binary outcomes and handles both of these issues easily. In simple logistic regression, only one predictor is involved; the predictor can be either categorical or continuous. Logistic regression models one of the two binary outcome values as the EVENT, so it is VERY important to be certain the software you are using is predicting the probability you desire. If you are getting the opposite answer from what you expect, this is one possible reason.

Before we define the model, we will discuss a few mathematical details and the reasons behind our model.

• We will see that in the logistic regression model, the left side of our model will be the LOG-ODDS of the outcome EVENT of interest for a given set of X's.
• This means we will need to recall or learn a few algebra rules regarding LOGs and EXPONENTIALS. Remember that in statistics we use LOG to represent LOG base e, the natural logarithm.
• Logarithms return powers, that is, exponents. The first rule says that if we take the LOG of e raised to the power A, the result is just the exponent: LOG(e^A) = A. So the LOG of a number returns the power of e that produces that number.
• That detail is less important than simply knowing that LOG and e cancel in the way seen in the first rule. If that makes sense to you, the next rule will be easier to believe.
• The second rule says that the LOG of the quantity A DIVIDED by B is the LOG of A MINUS the LOG of B: LOG(A/B) = LOG(A) - LOG(B). You will need to use this rule whether you believe it or not, but here is why it holds.
• Recall the algebra rule for dividing two numbers with like bases, given here for the base e: e^p divided by e^q simplifies by subtracting the exponents, giving e^(p-q).
• This is why the second LOG rule is correct. To find the LOG of A/B, we can rewrite A and B as powers of e and simplify by subtracting those powers. Taking the LOG of the result returns the power of e that produces A minus the power of e that produces B, which is LOG(A) - LOG(B).
• We will see these rules in use as we work with logistic regression models. We will minimize the algebra required, but some will be necessary. A quick numeric check of both rules is given in the sketch below.
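To see the two LOG rules in action, here is a minimal SAS data step that checks them numerically. This is a sketch for illustration only; the values of A and B are arbitrary, and SAS's LOG and EXP functions are the natural log and exponential.

data logrules;
   a = 2.5;                      /* arbitrary positive numbers */
   b = 1.2;
   rule1 = log(exp(a));          /* Rule 1: LOG(e^A) = A, so rule1 is 2.5  */
   lhs2  = log(a / b);           /* Rule 2, left side:  LOG(A/B)           */
   rhs2  = log(a) - log(b);      /* Rule 2, right side: LOG(A) - LOG(B)    */
   put rule1= lhs2= rhs2=;       /* lhs2 and rhs2 print the same value     */
run;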
When we introduce the logistic model, you will probably feel it is more complex than you would like, but there is a good reason. This image shows some possible models for how probabilities change with the values of a predictor X.

If we were to try to model the probability with a straight line (top left graph), there are a number of possible problems:

• The outcome in our linear model was continuous; here the outcome is 0 or 1.
• Predicted probabilities that result from a linear model can be negative or greater than 1. Neither is possible for a probability, so it would not make much sense to use a model that can produce impossible results.
• Another issue is that the way the probability changes is usually not linearly related to the covariate.

The top right graph is an exponential or log-linear model. The log of the outcome risk is linear in this model.

• This model can handle a variety of exponential shapes.
• Predicted probabilities are now constrained to be larger than zero, which is great, but
• Predicted probabilities can still be larger than 1, which is not good.

On the bottom left we have a step function.

• This is constrained to be between 0 and 1 and is what results from categorizing the predictor and estimating the risk directly in each group.
• This is a good and flexible method and will be equivalent to using a multi-level categorical predictor in a logistic regression model, but we will not model the probabilities directly; instead we will model the log-odds.
• This method does depend on the categories chosen but does have advantages over treating the predictor as continuous, as we will see. However, this type of model would not be a good basis for a general approach, as it does not handle continuous predictors directly.

On the bottom right we have the logistic function, which will be our choice. The basic function is e^x divided by the quantity 1 + e^x.

• We can see that this function is constrained to be between 0 and 1.
• It allows a smooth curve to represent the probability, which is non-linear in the way that many risks work: for low values of x there is low risk, then at a certain point the risk increases, and eventually it levels out.
• The shape is flexible and can also go in the reverse direction.

Here are a few graphs showing some of the possible variations in logistic functions. The change can be very slow, as in the first graph, or more rapid, occurring over a small range of x values. The location of the change may also be shifted along the x-axis as needed. This produces a very flexible model which legitimately constrains the resulting probabilities to be between zero and one. It also has the important benefit that a specific transformation is linear in X, which will allow us to partly reuse our knowledge from linear regression when conducting logistic regression.

Using the logistic model does imply the strong assumption that the logistic function represents the probability function accurately.

• To avoid having to carry a lot of exponents, we will use the notation exp(A) to represent e to the power A. Using this notation, we can write our logistic model for the probabilities as
• P(X) = exp(beta_0 + beta_1 X) / (1 + exp(beta_0 + beta_1 X))
• Notice the denominator is simply 1 plus the numerator, which can be helpful when performing by-hand calculations.
• We won't go through the algebra, but it can be shown that this is equivalent to the equation below, called the logit function. I rarely use the far left notation, logit(P(X)); instead I use the center expression, the LOG of the quantity P(X) divided by (1 - P(X)):
• logit(P(X)) = LOG( P(X) / (1 - P(X)) ) = beta_0 + beta_1 X
• P(X) divided by (1 - P(X)) is the odds of the event, and thus the left side is the LOG-ODDS of the event.
• The useful part is the right side, which says that the LOG-ODDS is LINEAR in X. The right side is similar to our linear regression model, except that here we do not have an ERROR TERM, even in our theoretical model.
• We will see that the exponentiated regression coefficients have a nice interpretation as odds ratios; the regression coefficients themselves are LOG ODDS RATIOS.
• Notice that this model also says that the odds of the event occurring are exp(beta_0 + beta_1 X). We will see that this is a multiplicative risk model. The sketch below shows how to move between the log-odds, odds, and probability scales.
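As a small illustration of these relationships, the following SAS data step computes the log-odds, odds, and probability over a range of X values. This is a sketch only; the intercept and slope are made-up values, not estimates from the WCGS data.

data logit_demo;
   b0 = -4.0;                          /* hypothetical intercept            */
   b1 =  0.07;                         /* hypothetical slope                */
   do x = 30 to 70 by 10;
      logodds = b0 + b1*x;             /* linear in X on the log-odds scale */
      odds    = exp(logodds);          /* odds = exp(beta_0 + beta_1 X)     */
      p       = odds / (1 + odds);     /* logistic function: P(X)           */
      check   = log(p / (1 - p));      /* logit of P(X) recovers logodds    */
      output;
   end;
run;

proc print data=logit_demo; run;

On every row, check equals logodds, confirming that the logit transformation undoes the logistic function.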
The logistic model makes the following assumptions about the outcome Y:

• The outcome Y follows a Bernoulli distribution. (A Bernoulli distribution is the same as a Binomial distribution with one trial, i.e., n = 1.)
• The mean of Y is given by the logistic function.
• The observations are independent.

Notice that we can still use the notation E[Y|X], since the mean of a binary 0/1 variable is the probability that 1 occurs.

Compared to the linear regression model:

• There is no assumption of constant variance of Y over the range of X values. In fact, the variance of Y given X has a specific form implied by the Bernoulli assumption: it is P(X) times (1 - P(X)), and so is clearly not constant over the range of X values.
• Also, the random aspect is not included as an additive term in the regression equation; however, it is still an integral part of estimation and inference.

Now we will go through an example for each of the three possible types of predictors and discuss how to interpret the results.
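Before those examples, here is a minimal sketch of how such a model might be fit in SAS, assuming a WCGS data set named wcgs with CHD69 coded 0/1 and a continuous AGE variable; this is illustrative, not the course's exact code. The EVENT= option addresses the earlier warning about making sure the software predicts the probability you intend.

proc logistic data=wcgs;
   model chd69(event='1') = age;    /* model P(CHD69 = 1), the CHD event:       */
run;                                /* log-odds of CHD = beta_0 + beta_1 * AGE  */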