ST430: Introduction to Regression Analysis, Chapter 5, Sections 5.1-5.6
Luo Xiao
October 14, 2015

Model Building

Model building means finding a model that will:
    provide a good fit to a set of data;
    give good estimates of $E(Y \mid X_1, X_2, \dots, X_k)$;
    give good predictions of $Y$.

In some situations, such a model may not exist! Finding the best (or least bad) model may still provide insights into the problem at hand.

Using background information

In most situations, you can use knowledge of the context to guide model-building.

Example: summertime daily peak electricity demand and temperature (Example 5.2 in textbook):
    $Y$: peak load (unit: megawatts);
    $X$: temperature (unit: degrees Fahrenheit);
    scatter plot shown next.

The temperature-dependent component of demand is largely for air conditioning.

[Figure: scatter plot of electricity data, LOAD against TEMP]

"Cooling degree days":

$$\text{degree days} = \begin{cases} T - T_0 & \text{if } T > T_0, \\ 0 & \text{if } T \le T_0, \end{cases}$$

where $T_0$ is the "base temperature".

Simple model:

$$E(Y) = \text{temperature-independent load} + \text{degree day load} = \beta_0 + \beta_1 \max(T - T_0, 0).$$

Try $T_0 = 80$. R code (output in "output1.txt"):

    T0 = 80  # base temperature to try
    fit = lm(LOAD ~ pmax(TEMP - T0, 0), data = POWERLOADS)
    summary(fit)

[Figure: model fit with T0 = 80, LOAD against TEMP]

What is the best $T_0$?

We search over a range of candidate values of $T_0$ and choose the one that gives the highest adjusted R-squared, $R_a^2$.

[Figure: adjusted R-squared against T0]
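The slides do not show the code behind the search over $T_0$, so the following is only a minimal sketch of how such a grid search might look. It assumes simulated data in place of POWERLOADS (the values below are made up for illustration and are not the textbook data), and the names `sim`, `T0_grid` and `adj_r2` are hypothetical.

```r
# Simulated stand-in for POWERLOADS (hypothetical values, NOT the textbook data)
set.seed(1)
TEMP <- round(runif(60, 68, 102))
LOAD <- 100 + 3.5 * pmax(TEMP - 82, 0) + rnorm(60, sd = 4)
sim  <- data.frame(TEMP = TEMP, LOAD = LOAD)

# Candidate base temperatures, kept inside the observed temperature range
T0_grid <- seq(70, 95, by = 1)

# Adjusted R-squared of the degree-day model for a single value of T0
adj_r2 <- function(T0, data) {
  fit <- lm(LOAD ~ pmax(TEMP - T0, 0), data = data)
  summary(fit)$adj.r.squared
}

# Evaluate every candidate and keep the one with the highest adjusted R-squared
r2      <- sapply(T0_grid, adj_r2, data = sim)
best_T0 <- T0_grid[which.max(r2)]
best_T0

# Profile of adjusted R-squared over the grid
plot(T0_grid, r2, type = "b", xlab = "T0", ylab = "Adjusted R-squared")
```

Because every candidate model has the same number of parameters, maximizing adjusted R-squared here is equivalent to minimizing the residual sum of squares. Note also that the standard errors reported by `summary()` for the chosen fit treat $T_0$ as known rather than estimated.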
[Figure: model fit with T0 = 82, LOAD against TEMP]

A different, smoother model (still linear in the parameters):

$$E(Y) = \beta_0 + \beta_1 T + \beta_2 T^2$$

R code for fitting the model (output in "output2.txt"):

    fit = lm(LOAD ~ TEMP + I(TEMP^2), data = POWERLOADS)

Note the improvement in $R_a^2$.

[Figure: model fit with the quadratic model, LOAD against TEMP]

Models with two (or more) quantitative variables

First order model:

$$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2.$$

Second order models:

    Interaction model:
    $$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2;$$

    Complete second order model:
    $$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \beta_4 X_1^2 + \beta_5 X_2^2.$$

Which to use?

Consider the response surface: a graph of $E(Y)$ against $X_1$ and $X_2$.
    For the first order model, the response surface is a plane.
    For second order models, the response surface is quadratic; it may be bowl-shaped, or an inverted bowl, or have a saddle point.

Note that

$$X_1 X_2 = \frac{(X_1 + X_2)^2 - (X_1 - X_2)^2}{4},$$

so the response surface for the interaction model is always saddle-shaped (provided $\beta_3 \neq 0$).

Product quality data

Many products are produced using chemicals. The quality of a product depends on the temperature and pressure at which the chemical reactions happen.
Example 5.3 in textbook:
    $Y$: quality (percentage);
    $X_1$: temperature;
    $X_2$: pressure.

Visualization:
    scatter plots (shown next);
    3-dimensional plots (see R code in file "3d_rgl.R").

[Figure: scatter plots of QUALITY against TEMP and of QUALITY against PRESSURE]

Model fit with the complete 2nd order model

Model:

$$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \beta_4 X_1^2 + \beta_5 X_2^2$$

R code:

    # The path below is cut off on the slide; set your own data directory instead.
    # setwd("/Users/xiaoyuesixi/Dropbox/teaching/2015Fall/R_datasets
    load("PRODQUAL.Rdata")  # load in data
    # TEMP*PRESSURE expands to TEMP + PRESSURE + TEMP:PRESSURE
    fit = lm(QUALITY ~ TEMP*PRESSURE + I(TEMP^2) + I(PRESSURE^2),
             data = PRODQUAL)
    summary(fit)

Fitted response surface

R code:

    # Grid of (TEMP, PRESSURE) values covering the observed ranges
    X1 <- seq(80, 100, length.out = 40)
    X2 <- seq(50, 60, length.out = 40)
    X <- expand.grid(TEMP = X1, PRESSURE = X2)
    yhat <- predict(fit, newdata = X)  # fitted E(Y) on the grid
    library(rgl)  # only works in R, not in RStudio
    plot3d(X$TEMP, X$PRESSURE, yhat, pch = 20, col = "blue")

Coded variables

Some variables can be represented on different scales, e.g., temperature in degrees Celsius or Fahrenheit.

Suppose some response $Y$ is modeled as a linear function of temperature:

$$E(Y) = \beta_0 + \beta_1 X,$$

with $X$ = temperature in degrees Fahrenheit.

If $X^*$ = temperature in degrees Celsius, then $X = 32 + 1.8 X^*$. So

$$E(Y) = \beta_0 + \beta_1 (32 + 1.8 X^*) = (\beta_0 + 32 \beta_1) + (1.8 \beta_1) X^* = \beta_0^* + \beta_1^* X^*,$$

where $\beta_0^* = \beta_0 + 32 \beta_1$ and $\beta_1^* = 1.8 \beta_1$.

So if $Y$ is linearly related to $X$, then it is also linearly related to $X^*$, with different coefficients $\beta_0^*$ and $\beta_1^*$. Furthermore, the significance of the effect of $X$ remains the same.

We sometimes code variables to make an equation more easily interpreted.
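The Fahrenheit-to-Celsius argument above can be checked numerically. The sketch below is an illustration only: the data are simulated (hypothetical values), and the names `temp_f`, `temp_c`, `fit_f` and `fit_c` are not from the slides.

```r
# Simulated data (hypothetical): a response that is linear in Fahrenheit temperature
set.seed(2)
temp_f <- runif(50, 60, 100)
y      <- 10 + 0.5 * temp_f + rnorm(50, sd = 2)
temp_c <- (temp_f - 32) / 1.8   # equivalently, temp_f = 32 + 1.8 * temp_c

fit_f <- lm(y ~ temp_f)   # X  = degrees Fahrenheit
fit_c <- lm(y ~ temp_c)   # X* = degrees Celsius

b  <- coef(fit_f)   # estimates of beta0,  beta1
bs <- coef(fit_c)   # estimates of beta0*, beta1*

# Coefficients transform as beta0* = beta0 + 32 * beta1 and beta1* = 1.8 * beta1
all.equal(unname(bs[1]), unname(b[1] + 32 * b[2]))
all.equal(unname(bs[2]), unname(1.8 * b[2]))

# The t statistic for the slope, and hence its significance, is unchanged
t_f <- summary(fit_f)$coefficients["temp_f", "t value"]
t_c <- summary(fit_c)$coefficients["temp_c", "t value"]
all.equal(t_f, t_c)
```

The same holds for any linear change of units: the fitted values and the slope's t statistic are unchanged, and only the scale of the coefficients differs.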
When a variable takes only two distinct values, we often code them as $-1$ and $+1$. E.g., if $X$ is temperature with levels 80°F and 100°F, and $X^* = (X - 90)/10$, then $X^* = -1$ when $X = 80$, and $X^* = 1$ when $X = 100$.

A variable with three levels can similarly be coded as $-1$, $0$, and $+1$, provided the three levels are equally spaced.

The interpretation of the corresponding coefficient $\beta^*$ is, as always, the change in $E(Y)$ when $X^*$ changes by 1, with all other variables fixed. But with a variable coded like this, a change of 1 in $X^*$ means moving, say, from the midpoint value to the high value. The corresponding change in $E(Y)$ is often called the effect of the variable.

When a variable takes more than two or three values, it is sometimes standardized:

$$X_i^* = u_i = \frac{X_i - \bar{X}}{s_x}.$$

All coefficients are then in the units of $Y$, so they can be compared numerically. If $Y$ is also standardized, the coefficients are dimensionless. These are called standardized regression coefficients, and are widely used in some fields.

Despite what the text says, standardization has no effect on computational errors with modern algorithms.
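As a rough illustration of standardization, the sketch below fits the same model on raw and standardized predictors. It assumes simulated data (hypothetical values); the names `x1`, `x2`, `u1`, `u2`, `fit_raw`, `fit_std` and `fit_beta` are illustrative and do not come from the slides.

```r
# Simulated data (hypothetical): two predictors on very different scales
set.seed(3)
n  <- 100
x1 <- rnorm(n, mean = 50, sd = 10)
x2 <- rnorm(n, mean = 0.5, sd = 0.1)
y  <- 5 + 0.2 * x1 + 30 * x2 + rnorm(n)

fit_raw <- lm(y ~ x1 + x2)

# scale() subtracts the sample mean and divides by the sample standard deviation
u1 <- as.numeric(scale(x1))
u2 <- as.numeric(scale(x2))
fit_std <- lm(y ~ u1 + u2)   # slopes are now in the units of y

# A standardized slope equals the raw slope times the predictor's standard deviation
all.equal(unname(coef(fit_std)["u1"]), unname(coef(fit_raw)["x1"] * sd(x1)))
all.equal(unname(coef(fit_std)["u2"]), unname(coef(fit_raw)["x2"] * sd(x2)))

# Standardizing y as well gives dimensionless (standardized regression) coefficients
fit_beta <- lm(as.numeric(scale(y)) ~ u1 + u2)
coef(fit_beta)
```

The raw slopes (about 0.2 and 30 in this simulation) are not directly comparable because the predictors are on different scales; the standardized slopes are, since each is the change in $E(Y)$ per one standard deviation of its predictor.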