Nonlinear relationships

Mathematics 241
Nonlinear models
November 15
Goals for the day:
1. Words: transformation, non-linear least squares
2. R: lm, nls
3. Big idea: There are two ways to fit a non-linear relationship
Transformations
If x is a quantitative variable, it is often useful to re-express or transform x by applying a function x′ = f(x). In
statistics, we do this for one of two reasons:
1. to transform a variable with a skewed distribution to one with a distribution that is more symmetric.
Example: use the counties dataset and construct a histogram of the logs of the populations of the counties.
2. to transform a nonlinear relationship between two variables x and y into a relationship that is closer to linear.
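As a rough illustration of the first reason, the sketch below uses simulated log-normal values as a stand-in for county populations. It does not use the counties dataset, and the helper skewness is a hypothetical name defined here only for the example.

```r
# Simulated stand-in for county populations (NOT the counties dataset):
# log-normal draws are strongly right-skewed, much like population counts.
set.seed(241)
pop <- rlnorm(500, meanlog = 10, sdlog = 1.5)

# Sample skewness: third central moment scaled by the cubed standard deviation.
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_raw <- skewness(pop)       # large and positive for the raw values
skew_log <- skewness(log(pop))  # close to zero after taking logs
c(raw = skew_raw, log = skew_log)
```

Calling hist(pop) and hist(log(pop)) should show the same contrast graphically.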
Tukey’s ladder of powers and Tukey’s bulge
The ladder of powers and the “bulge” rule for using it are due to Tukey. The following diagram is from Devore and
Farnum, Applied Statistics for Engineers and Scientists.
Erosion
In very arid parts of the world, soil erosion is a significant problem and it is greatly affected by wind. The dataframe
soil at /home/stob/ has measurements taken in a certain sandy plain in India. The wind velocity in miles per
hour (wind) and the amount of soil erosion in kilograms per day (erosion) were recorded. We would like a model
that predicts erosion from wind velocity. Find a transformation of one or both of the variables that linearizes the
relationship. To do this, you can start by plotting one variable against the other. When a linear relationship is
found, use lm and then examine the residuals.
Hints on using xyplot.
The following will plot various transformed variables:
> xyplot(y~x,data=d)
> xyplot(log(y)~x,data=d)
> xyplot(y^2~x,data=d)
> xyplot(y~log(x),data=d)
However the following does not work:
> xyplot(y~x^2,data=d)
This is because in formula notation, arithmetic operators on the right-hand side have special meanings. To repair
that, it is necessary to wrap the expression in the I function, which tells R to treat it as ordinary arithmetic:
> xyplot(y~I(x^2),data=d)
After plotting a transformation and seeing an approximately linear relationship, we can use lm with the very same
notation to find the slope and intercept of the transformed relationship.
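The same point about I carries over to lm. The sketch below uses a small simulated dataframe d (made up for this example, not one of the course datasets) to show that, in an lm formula, x^2 without I is read as a formula operator and collapses to x.

```r
# Simulated data with a quadratic relationship (illustrative only).
set.seed(1)
d <- data.frame(x = seq(0, 5, length.out = 30))
d$y <- 1 + 2 * d$x^2 + rnorm(30, sd = 0.5)

# Without I(), ^ is a formula operator, so this model is the same as y ~ x.
fit_plain <- lm(y ~ x^2, data = d)

# With I(), x^2 is computed arithmetically and used as the predictor.
fit_I <- lm(y ~ I(x^2), data = d)

names(coef(fit_plain))  # "(Intercept)" "x"
names(coef(fit_I))      # "(Intercept)" "I(x^2)"
```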
Write an equation that linearizes the relationship of erosion to wind velocity. That is, write an equation of the form
y′ = β̂0 + β̂1 x′ where x′ and y′ are transformations of x and y. What is R² for your relationship? (Hint: you can
find R² from summary(l) where l is the result of lm.)
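The workflow can be sketched on simulated data. The dataframe d below follows a made-up power law and is not the soil data, so the transformation that works here is not necessarily the one that works for erosion and wind.

```r
# Simulated power-law data with multiplicative noise (illustrative only).
set.seed(241)
d <- data.frame(x = seq(1, 20, length.out = 40))
d$y <- 2 * d$x^1.5 * exp(rnorm(40, sd = 0.1))

# A power law becomes linear after taking logs of both variables.
l <- lm(log(y) ~ log(x), data = d)

coef(l)                       # intercept and slope on the transformed scale
r2 <- summary(l)$r.squared    # R-squared of the transformed fit
res <- resid(l)               # residuals, to be checked for leftover pattern
r2
```

Plotting res against log(d$x) is one way to examine the residuals: a curved pattern would suggest trying a different rung of the ladder.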
Michaelis-Menten Equation
Michaelis and Menten developed an equation that relates the rate of a chemical reaction to the concentration of a
substrate. The equation is
rate = Vmax · conc / (K + conc)
Here Vmax and K are constants that characterize the particular reaction.
The R dataframe Puromycin has the data on the reaction rate (rate) of a particular enzyme (in counts/min/min)
as a function of the concentration conc of the substrate (in ppm). There are two different cases (state) as the cells
were either treated with Puromycin or not. Of course the goal of the experiment was to see how the constants Vmax
and K change when the cells are treated.
In this case, we know the (nonlinear) relationship between y and x so we need to do algebra to find the linearization.
Simplifying the names of the variables and the constants, we have
y = b0 · x / (b1 + x)
Find the appropriate transformations of x and y so that this relationship is of the form
y′ = β0 + β1 x′
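One possible route, offered as a sketch rather than the only answer, is to take reciprocals of both sides (the double-reciprocal form). The symbols x', y', \beta_0, \beta_1 follow the notation above.

```latex
\begin{align*}
y &= \frac{b_0\, x}{b_1 + x} \\
\frac{1}{y} &= \frac{b_1 + x}{b_0\, x}
             = \frac{1}{b_0} + \frac{b_1}{b_0}\cdot\frac{1}{x} \\
y' &= \frac{1}{y}, \qquad x' = \frac{1}{x}, \qquad
\beta_0 = \frac{1}{b_0}, \qquad \beta_1 = \frac{b_1}{b_0}
\end{align*}
```

Under this choice, the original constants are recovered from a fitted line as b0 = 1/β0 and b1 = β1/β0.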
Since we have two different states, we need to know how to use lm with only a subset of the data. Notice the subset
we need in each case is defined by the categorical variable state. The following does a linear model (inappropriate
here of course) for the subset of observations in which the state is treated.
> lm(rate~conc,data=Puromycin,subset=(state=='treated'))
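As a quick check of what subset does, the sketch below compares it with filtering the rows by hand. It uses the built-in Puromycin dataframe; the names fit_subset, fit_filtered, and treated are chosen here for illustration.

```r
# Fit using the subset argument: only rows with state == "treated" are used.
fit_subset <- lm(rate ~ conc, data = Puromycin, subset = (state == "treated"))

# Equivalent approach: filter the dataframe first, then fit.
treated <- Puromycin[Puromycin$state == "treated", ]
fit_filtered <- lm(rate ~ conc, data = treated)

# Both routes should give the same coefficients and use the same rows.
same_fit <- isTRUE(all.equal(coef(fit_subset), coef(fit_filtered)))
n_treated <- nobs(fit_subset)
same_fit
```

Replacing 'treated' with 'untreated' selects the other group.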
Write the Michaelis-Menten relationship predicted by these data for both the treated and untreated case.
Treated: b0 = ____  b1 = ____
Untreated: b0 = ____  b1 = ____
Non-linear least squares
Instead of linearizing the relationship above, we could attempt to find b0 and b1 directly by minimizing the sum of
squared residuals. This involves a non-linear optimization, which is implemented in the R function nls.
For example:
> nls(y~b0*sin(x)+b1*cos(x), start=list(b0=1,b1=2),data=d)
will fit the function y = b0 sin x + b1 cos x to the data in dataframe d using a starting guess of b0 = 1 and b1 = 2. A
starting guess is necessary for a nonlinear equation solver.
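To see the call run end to end, the sketch below builds a simulated dataframe d from known values b0 = 3 and b1 = 0.5 (chosen arbitrarily for this example) and checks that nls lands near them from the starting guess above.

```r
# Simulated data from y = 3 sin x + 0.5 cos x plus noise (illustrative only).
set.seed(241)
d <- data.frame(x = seq(0, 2 * pi, length.out = 50))
d$y <- 3 * sin(d$x) + 0.5 * cos(d$x) + rnorm(50, sd = 0.1)

# Fit by non-linear least squares, starting from b0 = 1 and b1 = 2.
fit <- nls(y ~ b0 * sin(x) + b1 * cos(x), start = list(b0 = 1, b1 = 2), data = d)

est <- coef(fit)   # estimates should be close to 3 and 0.5
est
```

The same pattern applies to other model formulas: write the formula in terms of the parameters and supply a start list with one entry per parameter.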
Write the Michaelis-Menten relationship predicted by these data for both the treated and untreated case using nls.
Treated: b0 = ____  b1 = ____
Untreated: b0 = ____  b1 = ____
Pollution
It is reasonable to think that the closer to the road a plot of land is, the greater the concentration of lead in that
soil. The dataframe pollution at /home/stob has data on measurements taken next to a certain interstate highway
in Iowa. The variables are the distance from the highway in meters (dist) and the lead content of the soil in ppm
(lead). We would like to develop a model that predicts lead content from the distance. Write a model that does
this.
