Mathematics 241
Nonlinear models
November 15

Goals for the day:

1. Words: transformation, non-linear least squares
2. R: lm, nls
3. Big idea: There are two ways to fit a non-linear relationship

Transformations

If x is a quantitative variable, it is often useful to re-express or transform x by applying a function $x' = f(x)$. In statistics, we do this for one of two reasons:

1. to transform a variable with a skewed distribution into one whose distribution is more symmetric. Example: use the counties dataset and construct a histogram of the logs of the populations of the counties.
2. to transform a nonlinear relationship between two variables x and y into a relationship that is closer to linear.

Tukey’s ladder of powers and Tukey’s bulge

The ladder of powers and the “bulge” rule for using it are due to Tukey. The following diagram is from Devore and Farnum, Applied Statistics for Engineers and Scientists.

[Diagram: Tukey’s ladder of powers and bulge rule]

Erosion

In very arid parts of the world, soil erosion is a significant problem, and it is greatly affected by wind. The dataframe soil at /home/stob/ has measurements taken in a certain sandy plain in India. The wind velocity in miles per hour (wind) and the amount of soil erosion in kilograms per day (erosion) were recorded. We would like a model that predicts erosion from wind velocity.

Find a transformation of one or both of the variables that linearizes the relationship. To do this, you can start by plotting one variable against the other. When a linear relationship is found, use lm and then examine the residuals.

Hints on using xyplot. The following will plot various transformed variables:

> xyplot(y~x,data=d)
> xyplot(log(y)~x,data=d)
> xyplot(y^2~x,data=d)
> xyplot(y~log(x),data=d)

However, the following does not work:

> xyplot(y~x^2,data=d)

This is because in formula notation, arithmetic operators on the right-hand side have special meanings.
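A minimal sketch of that special meaning, using lm and terms from base R rather than xyplot. The toy dataframe d and the names labels_power, fit_plain, fit_power and same_fit are made up for illustration. In a model formula, x^2 is read as a formula operation and reduces to the single term x, so lm fits the same straight line as for y~x:

```r
# Hypothetical toy data, for illustration only.
d <- data.frame(x = 1:10)
d$y <- d$x^2

# In a model formula, ^ means "cross the terms up to this order",
# so x^2 collapses to the single term x.
labels_power <- attr(terms(y ~ x^2), "term.labels")
print(labels_power)

# Consequently lm fits the same straight line for both formulas.
fit_plain <- lm(y ~ x, data = d)
fit_power <- lm(y ~ x^2, data = d)
same_fit <- isTRUE(all.equal(coef(fit_plain), coef(fit_power)))
print(same_fit)
```

The only term label is x and the two sets of coefficients match, which shows that no squared variable was ever created.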
To get the arithmetic meaning of x^2 inside a formula, it is necessary to use the I operator:

> xyplot(y~I(x^2),data=d)

After plotting a transformation and seeing an approximately linear relationship, we can use lm with the very same notation to find the slope and intercept of the transformed relationship.

Write an equation that linearizes the relationship of erosion to wind velocity. That is, write an equation of the form

$$y' = \hat\beta_0 + \hat\beta_1 x'$$

where $x'$ and $y'$ are transformations of x and y.

What is $R^2$ for your relationship? (Hint: you can find $R^2$ from summary(l), where l is the result of lm.)

Michaelis-Menten Equation

Michaelis and Menten developed an equation that relates the rate of a chemical reaction to the concentration of a substrate. The equation is

$$\text{rate} = \frac{V_{\max} \cdot \text{conc}}{K + \text{conc}}$$

Here $V_{\max}$ and $K$ are constants that characterize the particular reaction. The R dataframe Puromycin has data on the reaction rate (rate) of a particular enzyme (in counts/min/min) as a function of the concentration (conc) of the substrate (in ppm). There are two different cases (state), as the cells were either treated with Puromycin or not. The goal of the experiment was to see how the constants $V_{\max}$ and $K$ change when the cells are treated.

In this case, we know the (nonlinear) relationship between y and x, so we need to do algebra to find the linearization. Simplifying the names of the variables and the constants, we have

$$y = \frac{b_0 x}{b_1 + x}$$

Find the appropriate transformations of x and y so that this relationship is of the form

$$y' = \beta_0 + \beta_1 x'$$

Since we have two different states, we need to know how to use lm with only a subset of the data. Notice that the subset we need in each case is defined by the categorical variable state. The following fits a linear model (inappropriate here, of course) to the subset of observations in which the state is treated.

> lm(rate~conc,data=Puromycin,subset=(state=='treated'))

Write the Michaelis-Menten relationship predicted by these data for both the treated and untreated case.
Treated:    b0 = ______    b1 = ______
Untreated:  b0 = ______    b1 = ______

Non-linear least squares

Instead of linearizing the relationship above, we could attempt to find b0 and b1 directly by minimizing the sum of squares of the residuals. This involves a non-linear optimization, which is implemented in the R function nls. For example:

> nls(y~b0*sin(x)+b1*cos(x), start=list(b0=1,b1=2),data=d)

will fit the function $y = b_0 \sin x + b_1 \cos x$ to the data in dataframe d using a starting guess of b0 = 1 and b1 = 2. A starting guess is necessary because nls searches for the minimum iteratively, beginning from the values given.

Write the Michaelis-Menten relationship predicted by these data for both the treated and untreated case using nls.

Treated:    b0 = ______    b1 = ______
Untreated:  b0 = ______    b1 = ______

Pollution

It is reasonable to think that the closer a plot of land is to the road, the greater the concentration of lead in its soil. The dataframe pollution at /home/stob has data on measurements taken next to a certain interstate highway in Iowa. The variables are the distance from the highway in meters (dist) and the lead content of the soil in ppm (lead). We would like to develop a model that predicts lead content from the distance. Write a model that does this.
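The big idea of the day, two ways to fit a nonlinear relationship, can be sketched on simulated data. Everything specific here is an assumption made for illustration: the exponential form y = b0 exp(b1 x), the dataframe d, and the simulation values 100 and -0.05. None of it comes from the soil or pollution dataframes, and it is not meant as the answer to any exercise above.

```r
# Simulated data only: these values are NOT real measurements.
set.seed(241)
d <- data.frame(x = seq(1, 50, length.out = 40))
d$y <- 100 * exp(-0.05 * d$x) * exp(rnorm(40, sd = 0.05))

# Way 1: transform y, then fit a straight line with lm.
# Taking logs of y = b0 * exp(b1 * x) gives log(y) = log(b0) + b1 * x.
l <- lm(log(y) ~ x, data = d)
b0_lm <- exp(coef(l)[["(Intercept)"]])  # back-transform the intercept
b1_lm <- coef(l)[["x"]]

# Way 2: fit the nonlinear form directly with nls,
# using the lm estimates as the starting guess.
n <- nls(y ~ b0 * exp(b1 * x), data = d,
         start = list(b0 = b0_lm, b1 = b1_lm))

print(c(b0 = b0_lm, b1 = b1_lm))
print(coef(n))
```

Both fits should land close to the values used in the simulation. They need not agree exactly, because lm minimizes squared residuals on the log scale while nls minimizes them on the original scale. Using the linearized estimates as the start for nls is one convenient way to obtain a starting guess.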