Nonlinear Regression

Probability and Statistics
Boris Gervits
Topics of Discussion
• Definition of NLR
• Picking a Regression Model
- Linear versus Nonlinear models
• Techniques
- Loss Functions
- Function Minimization Algorithm
- Regression methods
• Example
• NLR is a popular statistical tool used to
fit data to a model and to estimate the
relationship between independent and
dependent variables
• General form of NLR model:
Yi = F(xi, Ө) + ei,
where i = 1, …, n indexes the measurements, Yi are the
responses, xi is the vector (xi1, …, xik) of measurements
of k independent variables, Ө is the parameter vector
(Ө1, …, Өp), and the ei are random errors, usually assumed
to have mean 0 and constant variance.
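As an illustration of this general form, here is a minimal fit in Python, using SciPy's curve_fit as the NLR routine; the exponential model F, the data, and the noise level are all hypothetical choices, not taken from the slides:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical model F(x, theta): exponential decay with two parameters
def F(x, theta1, theta2):
    return theta1 * np.exp(-theta2 * x)

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 50)
true_theta = (2.0, 0.8)
# Responses Y_i = F(x_i, theta) + e_i, with errors of mean 0 and constant variance
y = F(x, *true_theta) + rng.normal(0, 0.05, size=x.size)

# curve_fit estimates the parameter vector theta from the data
theta_hat, cov = curve_fit(F, x, y, p0=(1.0, 1.0))
```

With well-behaved data like this, the estimated parameters land close to the true values used to generate the responses.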
Linear versus Nonlinear Models
Linear model
y = a + b1*x1 + b2*x2 + ... + bn*xn
Polynomial Regression
y = a + b1*x + b2*x^2
The nonlinearity of this model is expressed in the term x^2.
However, the nature of the model is linear: it is linear in the
parameters. We are interested in finding the best fit for the
parameters, and during estimation we simply treat x^2 as another
measured predictor.
Making nonlinear models linear
Since linear regression is simpler and more straightforward, it may
be preferable to nonlinear regression.
When can a nonlinear model be converted to a linear one?
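A short sketch of why polynomial regression is still linear: treating x^2 as just another column of the design matrix lets ordinary linear least squares recover the coefficients. The data and coefficients below are hypothetical, used only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 40)
# Hypothetical data from y = 1 + 2x + 3x^2 plus noise
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(0, 0.1, size=x.size)

# The model y = a + b1*x + b2*x^2 is linear in (a, b1, b2):
# build a design matrix whose columns are 1, x, and x^2
X = np.column_stack([np.ones_like(x), x, x**2])
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef
```

No iterative minimization is needed; the least-squares solution is obtained in one linear-algebra step.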
Example
Consider the relationship between
a human's age from birth (the x
variable) and his or her growth
rate (the y variable).
Growth = exp(-b*Age)
We can easily transform this into a
linear model:
log(Growth) = -b*Age
Can we really do this without affecting the quality of the fit?
• When we transformed the nonlinear model into a linear one, we forgot about
the random error in the dependent variable. The growth rate is, of course,
affected by many variables other than age, so we should expect
considerable fluctuation:
Growth = exp(-b*Age) + error
• Additive Error: Here, we assume that the error is independent of age; that
is, the error distribution is the same at any age. Because the error in the
original nonlinear equation is now additive, we cannot linearize this model
by taking the log of both sides.
• Multiplicative Error: However, the error variability is not likely to be
constant at all ages. There are greater fluctuations of the growth rate
during the earlier ages than the later ages, when growth eventually stops
anyway. Here is a more realistic model including error:
Growth = exp(-b*Age) * error
Now, if we take the log, the residual error becomes an additive term in the
linear equation:
log(Growth) = -b*Age + error
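The multiplicative-error model can indeed be fitted after the log transform. A sketch in Python with hypothetical data; the lognormal error is an assumption, chosen so that its logarithm has mean 0:

```python
import numpy as np

rng = np.random.default_rng(2)
b_true = 0.3
age = np.linspace(1, 20, 60)
# Multiplicative error: lognormal, so log(error) is additive with mean 0
growth = np.exp(-b_true * age) * rng.lognormal(0, 0.1, size=age.size)

# After taking logs the model is linear: log(growth) = -b*age + log(error)
log_g = np.log(growth)
# Least-squares slope through the origin: minimize sum(log_g + b*age)^2
b_hat = -np.sum(age * log_g) / np.sum(age**2)
```

The estimated decay rate recovers the value used to generate the data, because the transformed model's error really is additive with constant variance.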
NLR Techniques
Loss Functions
• A residual is the deviation of a particular point (observed
response) from the regression line (predicted response).
Residuals signify some loss in the accuracy of the
prediction. The goal of NLR is to minimize a loss
function.
• Least Squares is the most common loss function
– minimizing the sum of squared deviations of the observed values
for the dependent variable from those predicted by the model
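The least-squares loss can be written down directly; a minimal sketch in Python, with made-up observed and predicted values:

```python
import numpy as np

def least_squares_loss(observed, predicted):
    """Sum of squared residuals: deviations of observed from predicted."""
    residuals = observed - predicted
    return np.sum(residuals**2)

obs = np.array([1.0, 2.0, 3.0])
pred = np.array([1.1, 1.9, 3.2])
loss = least_squares_loss(obs, pred)   # 0.01 + 0.01 + 0.04 = 0.06
```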
• Other loss functions:
– Absolute deviations. This could be useful in order to de-emphasize
outliers: when the least squares function is used, a large squared
residual affects the regression coefficients more.
– Weighted Least Squares
The ordinary least squares technique assumes that the residual variance
is the same across all values of the independent variables. That is,
the error variance does not depend on the measurements. This
often fails to be the case.
Example: relationship between the projected cost of construction
project, and the actual cost. In this case, the absolute magnitude
of the error is proportional to the size of the project.
Here, it is appropriate to use weighted least squares technique.
The loss function would be:
Loss = (Obs - Pred)^2 * (1/Y^2)
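This weighted loss can be sketched as follows; the numbers are hypothetical, chosen so that each prediction is off by the same 10% and therefore contributes equally after weighting:

```python
import numpy as np

def weighted_ls_loss(obs, pred):
    """Weighted least squares: (Obs - Pred)^2 * (1/Y^2).
    Down-weights points whose absolute error scales with their magnitude."""
    return np.sum((obs - pred)**2 / obs**2)

obs = np.array([10.0, 100.0, 1000.0])
pred = np.array([11.0, 110.0, 1100.0])
# Each prediction is off by 10% of its observation,
# so each term contributes exactly 0.01 to the loss
loss = weighted_ls_loss(obs, pred)
```

Under ordinary least squares the third point would dominate the fit; the 1/Y^2 weights restore balance, matching the construction-cost example above.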
Function Minimization Algorithms
1. Start with an initial estimated value for each
parameter in the equation. Set a step size and
convergence criteria.
2. Generate the regression curve (compute the
predicted responses)
3. Calculate the loss function
4. Adjust the parameters to minimize loss
function (more on this)
5. Repeat steps 2-4 until adjustments satisfy
convergence criteria
6. Report the best-fit result
Nonlinear Regression Methods
(step 4 from above)
• When fitting just one parameter, no special
algorithm is needed. You can always use the
brute force method:
– Calculate the loss function for the initial value
– Move a step (assuming you picked the direction
correctly), calculate the loss function again, and compare
with the previous result
– If it's better, discard the previous result and keep
moving in the same direction
– If it's worse, we've gone too far. Check the
convergence criteria; if they are not satisfied, reduce the
step and move in the opposite direction
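The brute-force procedure above can be sketched directly in Python; the one-parameter exponential model, starting value, and step size are hypothetical:

```python
import numpy as np

def brute_force_fit(x, y, model, p0, step=0.5, tol=1e-8, max_iter=10000):
    """One-parameter brute-force search: keep stepping while the loss
    improves; on a worse result, shrink the step and reverse direction."""
    loss = lambda p: np.sum((y - model(x, p))**2)
    p, best = p0, loss(p0)
    direction = 1.0
    for _ in range(max_iter):
        candidate = p + direction * step
        cand_loss = loss(candidate)
        if cand_loss < best:            # better: keep moving this way
            p, best = candidate, cand_loss
        else:                           # worse: we've gone too far
            if step < tol:              # convergence criterion satisfied
                break
            step /= 2.0                 # reduce the step...
            direction = -direction      # ...and move the other way
    return p

model = lambda x, b: np.exp(-b * x)
x = np.linspace(0.0, 4.0, 30)
y = model(x, 0.7)                       # noise-free target with b = 0.7
b_hat = brute_force_fit(x, y, model, p0=0.0)
```

Because the step is halved at every reversal, the search zig-zags into the minimum; on this noise-free example it recovers b to high precision.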
• If there is more than one parameter to fit, the number of possible
combinations of values is infinite. We need an efficient way to find
the best fit.
• Some methods calculate first- and/or second-order derivatives to
identify the slope of the loss function and the slope of that slope,
i.e., how fast and in which direction the slope is changing at any
point.
• Others recalculate how much the sum of squares changes when the
values of the parameters are changed slightly.
• Mathematica uses the following methods:
- Gradient (steepest descent) method
- Newton method
- Quasi-Newton
- Levenberg-Marquardt
• The simplex method takes n + 1 initial values, where n is the number
of parameters to fit. It is less likely to be trapped in a local
minimum, but does not compute standard errors or confidence intervals.
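For a concrete comparison (using Python/SciPy here rather than Mathematica), a quasi-Newton method and the simplex (Nelder-Mead) method can be applied to the same sum-of-squares loss; the exponential model and data are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = np.linspace(0, 5, 50)
y = 2.0 * np.exp(-0.8 * x) + rng.normal(0, 0.02, size=x.size)

def sse(params):
    """Sum-of-squares loss for the two-parameter exponential model."""
    a, b = params
    return np.sum((y - a * np.exp(-b * x))**2)

# Quasi-Newton (BFGS): uses gradient information (numerical here)
fit_qn = minimize(sse, x0=[1.0, 1.0], method="BFGS")
# Simplex (Nelder-Mead): derivative-free, works from n + 1 = 3 points
fit_simplex = minimize(sse, x0=[1.0, 1.0], method="Nelder-Mead")
```

Both reach essentially the same minimum on this smooth loss; the derivative-based method typically needs fewer function evaluations, while the simplex needs no derivatives at all.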
Example: Fitting Dose-Response
Curves
• Dose-response curves can
describe the results of
many kinds of experiments
• The X axis plots the concentration
of a drug or hormone
• The Y axis plots the response
• Examples of meanings of Y:
- change in heart rate
- contraction of a muscle
- secretion of a hormone
- membrane potential and
ion channels
Extract from lecture 2:
• Equation for a dose-response curve:
Y = Bottom + (Top - Bottom) / (1 + (EC50/[Drug])^HillSlope)
• Bottom - baseline response
• Top - maximum response
• EC50 - drug concentration that provokes a halfway response
• HillSlope - the slope of the curve
• [Drug] - drug dosage or concentration (independent variable)
Note: EC50 may not be the same as the concentration that provokes a 50%
absolute response; it is the concentration giving a response halfway
between Bottom and Top.
• A more commonly used equation, written in terms of X = log[Drug]:
Y = Bottom + (Top - Bottom) / (1 + 10^((LogEC50 - X)*HillSlope))
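A sketch of fitting a dose-response curve in Python, assuming the standard four-parameter logistic form with the Bottom, Top, EC50, and HillSlope parameters defined above; the data, noise level, and initial values are hypothetical:

```python
import numpy as np
from scipy.optimize import curve_fit

def dose_response(x, bottom, top, log_ec50, hill_slope):
    # Four-parameter logistic with X = log10(dose): response runs from
    # bottom to top, halfway at EC50, steepness set by hill_slope
    return bottom + (top - bottom) / (1 + 10**((log_ec50 - x) * hill_slope))

rng = np.random.default_rng(4)
log_dose = np.linspace(-9, -3, 40)      # hypothetical log10 concentrations
true = (0.0, 100.0, -6.0, 1.0)          # bottom, top, logEC50, hill slope
response = dose_response(log_dose, *true) + rng.normal(0, 2, size=log_dose.size)

p0 = (0.0, 90.0, -5.0, 1.0)             # initial values near the data
params, cov = curve_fit(dose_response, log_dose, response, p0=p0)
bottom, top, log_ec50, hill_slope = params
```

Fitting LogEC50 rather than EC50 keeps the parameter on the same (log) scale as the X axis, which usually makes the fit better behaved.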
Fitting Data with Nonlinear Regression
• Choose your model
• Decide which parameters to fit and which to constrain
• Choose a weighting scheme, if appropriate
• Choose initial values
• Perform the curve fit and interpret the best-fit parameter
values
- Does the curve go near your data?
- Are the best-fit parameter values acceptable? What are
the standard errors and 95% confidence intervals?
• Check the parameters’ correlation matrix
• Could the fit be a local minimum?
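The standard errors, rough confidence intervals, and correlation matrix mentioned above can all be derived from the covariance matrix that the fit returns. A sketch in Python/SciPy with a hypothetical exponential model; the plus-or-minus 1.96 SE intervals are a normal-approximation assumption:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * np.exp(-b * x)

rng = np.random.default_rng(5)
x = np.linspace(0, 5, 60)
y = model(x, 2.0, 0.8) + rng.normal(0, 0.05, size=x.size)

params, cov = curve_fit(model, x, y, p0=(1.0, 1.0))

# Standard errors are the square roots of the covariance diagonal
std_err = np.sqrt(np.diag(cov))
# Rough 95% confidence intervals (normal approximation)
ci_low, ci_high = params - 1.96 * std_err, params + 1.96 * std_err
# Correlation matrix: covariance scaled by the standard errors
corr = cov / np.outer(std_err, std_err)
```

Off-diagonal correlations near plus or minus 1 are a warning sign that two parameters are nearly redundant, which connects to the troubleshooting discussion that follows.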
Troubleshooting Bad Fits
Poorly-defined parameters
• Problem: The response was measured at 5 concentrations between the bottom and
the top; the data was normalized to return response from 0% to 100%. NLR shows
very wide confidence intervals.
• Diagnosis: not enough data; Top and Bottom cannot be determined precisely.
• Solution: Since the data were normalized to run from 0 to 100, constrain
Bottom = 0 and Top = 100 so that NLR does not have to estimate these
parameters.
Bad Initial Values
• Problem: We feed a bunch of data to an NLR model and get back an error message
“Does not converge”. The data and the equation look valid. What could be wrong?
• Diagnosis: try plotting the curve generated by the initial values together
with the data. If it's clearly off, no wonder the fit can't converge.
• Solution: Change the initial values
Redundant Parameters
• Problem: Suppose you came up with the model Y = b1*e^(-(b2 + b3)*b4). A possible
error message: "Bad model". However, the model seems to be correct.
• Diagnosis: The model is ambiguous: b2 and b3 cannot be separately determined
from the data, since only their sum (b2 + b3) affects the predictions.
• Solution: Determine the value of either b2 or b3 from another experiment.
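The bad-initial-values diagnosis above can be partly automated: before fitting, evaluate the model at the initial guesses and check whether the resulting curve is anywhere near the data. The model, threshold, and helper name below are hypothetical:

```python
import numpy as np

def model(x, a, b):
    return a * np.exp(-b * x)

x = np.linspace(0, 5, 50)
y = 2.0 * np.exp(-0.8 * x)        # the data to be fitted

def initial_values_look_ok(p0, x, y):
    """Crude diagnostic: does the curve from the initial values land in
    roughly the same range as the data? If not, the fit may not converge."""
    y0 = model(x, *p0)
    data_span = y.max() - y.min()
    return np.max(np.abs(y0 - y)) < 5 * data_span

ok = initial_values_look_ok((1.0, 1.0), x, y)        # plausible guess
bad = initial_values_look_ok((500.0, -2.0), x, y)    # clearly off
```

This is no substitute for actually plotting the initial curve against the data, but it catches guesses that are off by orders of magnitude.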