Now we will begin our discussion of logistic regression. Just as for linear regression,
we need to look at each of the three basic cases for predictors:
• One binary predictor
• One multi-level predictor
• One continuous predictor
Before that, we need to introduce this new model.
We will use the WCGS data, so here again we have given the SAS code to set up the formats for the Yes/No variables ARCUS and CHD69 and for our age groups, AGEC.
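As a reminder, the format setup looks something like the sketch below. The format names, the AGEC cutpoints, and the step that attaches the formats are assumptions for illustration; your own setup code may differ.

  proc format;
    value yesno 0 = 'No'
                1 = 'Yes';
    value agecf 0 = '35-40'   /* hypothetical AGEC groupings */
                1 = '41-45'
                2 = '46-50'
                3 = '51-55'
                4 = '56-60';
  run;

  data wcgs;
    set wcgs;   /* assumes the WCGS data have already been read in */
    format chd69 yesno. arcus yesno. agec agecf.;
  run;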
As in our discussion of contingency table methods, our outcome will be the presence
or absence of coronary heart disease by the end of the 10-year study.
This variable is coded as 1 for YES and 0 for NO.
Recall that in this study, the outcome probabilities or risks represent the incidence
proportion of CHD over the entire duration of the study (approximately ten years).
There are clear limitations to contingency table methods. They are less intuitive when
more than two categorical variables are involved, and we cannot directly study the
association between a binary outcome and a continuous predictor.
Logistic regression generalizes contingency table methods for binary outcomes and
will handle these two issues easily.
In simple logistic regression, only one predictor is involved. The predictor can be either
categorical or continuous.
Logistic regression will model one of the binary outcome values as the EVENT, so it is
VERY important to be certain the software you are using is predicting the probability
you desire. If you are getting the opposite answer from what you expect, this is one
possible reason.
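As a hedged illustration, SAS PROC LOGISTIC lets you name the event level explicitly; the sketch below (with AGE as a stand-in predictor, not necessarily the course's exact model) makes CHD69 = 1 the modeled EVENT rather than CHD69 = 0.

  proc logistic data=wcgs;
    model chd69(event='1') = age;   /* event='1' makes CHD present the EVENT */
  run;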
Before we define the model, we will discuss a few mathematical details and the
reasoning behind it.
• We will see that in the logistic regression model the left side of our model will be the
LOG-ODDS of the outcome EVENT of interest for a given set of X’s.
• This means we will need to recall or learn these algebra rules regarding LOGs and
EXPONENTIALS. Remember that in statistics we use LOG to represent LOG base
e.
• Logarithms return powers (exponents). For example, the first rule says that if we
take the LOG of e to the A power, the result is just the exponent, A. So the LOG of a
number returns the power of e that produces that number.
• That detail matters mainly so you know that LOG and e cancel in the way shown in
the first rule. If that makes sense to you, the next rule will be easier to believe.
• It says that the LOG of the quantity A DIVIDED by B is the LOG of A MINUS the LOG
of B. Again, you can use this rule whether you believe it or not, but here is the reasoning.
• Do you recall the algebra rule about dividing two numbers with like bases? We have
given it here at the bottom for the base e. We have e to the p divided by e to the q.
When we divide two numbers with like bases we simplify by subtracting the
exponents to get e to the p-q.
• This is why the 2nd LOG rule is correct. To find the LOG of (A divided by B), we can
rewrite A and B as powers of e and then simplify by subtracting the exponents. Taking
the LOG of the result simply returns the power of e that produces A minus the power
of e that produces B, which is LOG(A) minus LOG(B).
• We will see these rules in use as we work with logistic regression models. We will
minimize the algebra required but some will be necessary.
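If you want to convince yourself numerically, the data step below checks both rules; the values A = 10 and B = 2 are arbitrary choices for illustration.

  data logrules;
    a = 10;
    b = 2;
    rule1  = log(exp(a));       /* log(e**A) returns the exponent: 10 */
    rule2  = log(a/b);          /* log(A/B) */
    check2 = log(a) - log(b);   /* equals rule2: log(A) - log(B) */
  run;

  proc print data=logrules;
  run;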
When we introduce the logistic model, you will probably feel it is more complex than
you would like, but there is a good reason. This image shows some possible models for
how probabilities change with the values of a predictor X.
If we were to try to model the probability with a straight line (top left graph), there are a
number of possible problems:
• The outcome in our linear model was continuous – here the outcome is 0 or 1.
• Predicted probabilities that result from a linear model can be negative or greater
than 1. Neither is possible for a probability, so it would not make much sense to
use a model that can produce impossible results.
• Another issue is that the probability usually does not change linearly with the
covariate.
The top right graph is an exponential or log-linear model. The log of outcome risk is
linear in this model.
• This model can handle a variety of exponential shapes.
• Predicted probabilities are now constrained to be larger than zero, which is great, but
• Predicted probabilities can still be larger than 1, which is not good.
On the bottom left we have a step function.
• This is constrained to be between 0 and 1 and is what results from categorizing the
predictor and estimating the risk directly in each group.
• This is a good and flexible method and is equivalent to using a multi-level
categorical predictor in a logistic regression model, but we will not model the
probabilities directly; instead we will model the log-odds.
• This method does depend on the categories chosen but does have advantages over
treating the predictor as continuous, as we will see. However, this type of model
would not be a good basis for a general approach, since it cannot handle continuous
predictors directly.
On the bottom right we have the logistic function, which will be our choice. The basic
function is e to the x divided by the quantity 1 + e to the x (see the sketch after this list).
• We can see that this function is constrained to be between 0 and 1.
• It allows a smooth curve to represent the probability, which is non-linear in the
way that many risks behave.
• For low values there is low risk; then, at a certain point, the risk increases before
eventually leveling off.
• The shape is flexible, and the curve can also go in the reverse direction.
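As a small sketch of the basic function just described, the step below evaluates e to the x over (1 + e to the x) on a grid of x values; plotting p against x traces the S-shaped curve (PROC SGPLOT is one option).

  data logistic_curve;
    do x = -6 to 6 by 0.1;
      p = exp(x) / (1 + exp(x));   /* basic logistic function: always between 0 and 1 */
      output;
    end;
  run;

  proc sgplot data=logistic_curve;
    series x=x y=p;
  run;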
Here are a few graphs showing some of the possible variations in logistic functions.
The change can be very slow, as in the first graph, or more rapid, occurring over a small
range of x values. The location of the change may also be shifted along the x-axis as
needed.
This produces a very flexible model that properly constrains the resulting
probabilities to be between zero and one.
It will also have the important benefit that a specific transformation of the model is
linear in X, which will allow us to partly reuse our knowledge from linear regression
when conducting logistic regression.
Using the logistic model does imply the strong assumption that the logistic function
represents the probability function accurately.
• To avoid having to carry a lot of exponents, we will use the notation exp(A) to
represent e to the A. Using this notation, we can write our logistic model for our
probabilities as
• P(X) = exp(beta_0 + beta_1 times X) divided by the quantity 1 + exp(beta_0 +
beta_1 times X)
• Notice the denominator is simply 1 + the numerator which can be helpful when
performing by-hand calculations.
• We won't go through the algebra, but it can be shown that this is equivalent to the
equation below, called the logit function. I rarely use the far-left notation but instead
use the center expression: LOG of the quantity P(X) divided by (1 - P(X)).
• P(X) divided by (1 - P(X)) is the odds of the event, and thus the left side is the
LOG-ODDS of the event.
• The useful part is the right side, which says that the LOG-ODDS is LINEAR in X.
This right side is similar to our linear regression model, except here we do not have
an ERROR TERM, even in our theoretical model.
• We will see that the exponentiated regression coefficients have the nice
interpretation as odds ratios. The regression coefficients themselves are LOG
ODDS RATIOS.
• Notice that this model also says that the odds of the event occurring are EXP(beta_0
+ beta_1 times X).
We will see that this is a multiplicative model: changes in X multiply the odds.
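To make the pieces concrete, the sketch below uses hypothetical coefficients beta_0 = -4 and beta_1 = 0.1 and computes the log-odds, the odds, and P(X) at several values of X. Notice how the log-odds are linear in X while P(X) stays between 0 and 1.

  data logit_demo;
    beta0 = -4;    /* hypothetical intercept */
    beta1 = 0.1;   /* hypothetical slope (a log odds ratio) */
    do x = 20 to 80 by 10;
      logodds = beta0 + beta1*x;     /* the log-odds are linear in X */
      odds    = exp(logodds);        /* odds = exp(beta0 + beta1*X) */
      p       = odds / (1 + odds);   /* denominator is 1 + the numerator */
      output;
    end;
  run;

  proc print data=logit_demo;
  run;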
The logistic model makes the following assumptions about the outcome Y.
• The outcome Y follows a Bernoulli distribution.
• A Bernoulli distribution is the same as a Binomial distribution with one trial
(where n = 1).
• The mean of Y is given by the logistic function.
• The observations are independent.
Notice that we can still use the notation E[Y|X] since the mean of a binary 0/1 variable
is the probability that 1 occurs.
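For example, since CHD69 is coded 0/1, its sample mean is just the observed proportion with CHD, which you can verify directly (a sketch; assumes the WCGS dataset from earlier).

  proc means data=wcgs mean;
    var chd69;   /* mean of a 0/1 variable = proportion of 1s = estimated P(CHD) */
  run;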
Compared to the linear regression model:
• There is no assumption of constant variance of Y over the range of X values. In fact,
the variance of Y given X has a specific form implied by the Bernoulli assumption: it
is P(X) times (1 - P(X)), so it is clearly not constant over the range of X values (a
short sketch follows this list).
• Also, the random aspect is not included as an additive error term in the regression
equation; however, it is still an integral part of estimation and inference.
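A short sketch of this variance function shows that it changes with P(X), peaking at P(X) = 0.5.

  data bern_var;
    do p = 0.05 to 0.95 by 0.05;
      var_y = p * (1 - p);   /* Bernoulli variance, largest at p = 0.5 */
      output;
    end;
  run;

  proc print data=bern_var;
  run;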
Now we will go through an example for each of the three possible types of predictors
and discuss how to interpret the results.