ST430: Introduction to Regression Analysis, Ch3, Sec 3.6-3.8

Luo Xiao
September 2, 2015

Simple Linear Regression

Acknowledgement

1. This template for R Markdown is adapted from the LaTeX template designed by Professor Peter Bloomfield.
2. Some notes are also adapted from Professor Peter Bloomfield’s lecture notes.

Linear regression in R: Advertising and Sales example

    x = c(1,2,3,4,5)   # x is a vector of 5 scalars
    y = c(1,1,2,2,4)   # y is a vector of 5 scalars
    fit = lm(y~x)      # lm() is the linear regression function
    summary(fit)       # summary of the linear regression

Try the code yourself to get the R output!

Confidence interval in R

Use R function: confint()

R output:

    ##                   2.5 %   97.5 %
    ## (Intercept) -2.12112485 1.921125
    ## x            0.09060793 1.309392

Compare the confidence interval and the hypothesis test

Note that a two-sided test at the 5% level rejects H0 (that a coefficient equals 0) if and only if the corresponding 95% confidence interval does not include 0. For example, the interval for the slope of x above, (0.0906, 1.3094), excludes 0, so the hypothesis that the slope is 0 is rejected at the 5% level. The interval for the intercept includes 0, so the hypothesis that the intercept is 0 is not rejected.

ANOVA table

Use R function: anova()

R output:

    ## Analysis of Variance Table
    ##
    ## Response: y
    ##           Df Sum Sq Mean Sq F value  Pr(>F)
    ## x          1    4.9  4.9000  13.364 0.03535 *
    ## Residuals  3    1.1  0.3667
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

How to read data in R?
Standard data format for R: "xx.Rdata" or "xx.RData"

Two simple ways:

1. Open the file directly in R or RStudio.
2. Load the data with the R function load("filename"). Note that the data file must be in the working directory.

Other file formats such as ".txt" and ".xlsx" can also be read into R using special R functions.

Read data in R: example

    setwd("~/Dropbox/teaching/2015Fall/R_datasets/Exercises&Exampl
    load("ADSALES.Rdata")
    ADSALES   # name of the loaded data

    ##   ADVEXP_X SALES_Y
    ## 1        1       1
    ## 2        2       1
    ## 3        3       2
    ## 4        4       2
    ## 5        5       4

The loaded data is a data frame.

What is a data frame in R?

- a matrix-like structure
- each column may have a name
- each column can be of a different type (numeric/factor/logical/character)

Example: a data frame in R

    # vectors of different types
    name = c("John","Ann","Tom")      # character
    sex = c("Male","Female","Male")   # character
    height = c(5.9,5.3,5.7)           # numeric
    # create a data frame here
    data = data.frame(name=name, sex=sex, height=height)
    names(data)   # display column names of the data frame
    data          # display the data frame

    ## [1] "name"   "sex"    "height"

    ##   name    sex height
    ## 1 John   Male    5.9
    ## 2  Ann Female    5.3
    ## 3  Tom   Male    5.7

Working with data frames in R

    setwd("~/Dropbox/teaching/2015Fall/R_datasets/Exercises&Exampl
    load("ADSALES.Rdata")
    x = ADSALES$ADVEXP_X   # x is the advertising expenditure
    y = ADSALES$SALES_Y    # y is the sales revenue

Useful measures of linear regression

- Coefficient of correlation
- Coefficient of determination
Coefficient of correlation

The regression equation

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

shows the linear relationship between $X$ and $Y$.

The correlation coefficient $r$ shows the strength of that relationship.

$r$ always lies between -1 and +1.

Calculate $r$ directly as

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}.$$

Note that

$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}}.$$

Hence, calculate $r$ from $\hat{\beta}_1$ as

$$r = \hat{\beta}_1 \times \sqrt{\frac{SS_{xx}}{SS_{yy}}}.$$

Note that $r$ always has the same sign as $\hat{\beta}_1$.

Calculation of correlation in R

Use R function: cor()

    x = c(1,2,3,4,5)   # x is a vector of 5 scalars
    y = c(1,1,2,2,4)   # y is a vector of 5 scalars
    cor(x,y)           # cor() calculates correlation

    ## [1] 0.9036961

Correlation and causation

Not the same thing!

A 1999 article in the journal Nature found “a strong association between myopia and night-time ambient light exposure during sleep in children before they reach two years of age”.

The article noted that no causal link was established, but continued “it seems prudent that infants and young children sleep at night without artificial lighting in the bedroom”.

Much anguish for parents of myopic children!

Later studies found that myopic parents tend to leave the light on, and also tend to have myopic children. One study, in particular, found that “the proportion of myopic children in those subjected to a range of nursery-lighting conditions is remarkably uniform”.
This suggests that the association observed in the first study resulted from parental behavior and inheritance, not from a causal effect of night-time lighting.

The moral: “Correlation does not imply causation”.

Coefficient of determination

The coefficient of determination $R^2$ also measures the strength of the relationship between $x$ and $y$.

With only one independent variable, $R^2 = r^2$.

When we have more than one independent variable, $R^2$ measures the strength of the relationship of $y$ to all of them. The correlation coefficient $r$ is always defined for a pair of individual variables.

We interpret $R^2$ as the fraction of the variance of $y$ that is “explained” by the regression.

The definition is

$$R^2 = \frac{SS_{yy} - SSE}{SS_{yy}} = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2},$$

where $SSE = \sum (y_i - \hat{y}_i)^2$ is the sum of squared errors.

If the regression is strong, we expect $\hat{y}_i$ to be a good predictor of $y_i$, so $SSE$ is much smaller than $SS_{yy}$, whence the ratio $SSE / SS_{yy}$ is small and $R^2$ is close to 1.

Conversely, if the regression is weak, $\hat{y}_i$ is not much better than $\bar{y}$ as a predictor of $y_i$, so the ratio is close to 1 and $R^2$ is close to 0.

Find $R^2$ in R output

    x = c(1,2,3,4,5)   # x is a vector of 5 scalars
    y = c(1,1,2,2,4)   # y is a vector of 5 scalars
    fit = lm(y~x)      # lm() is the linear regression function
    summary(fit)       # summary of the linear regression

Try the above code in R! The value of $R^2$ is reported as "Multiple R-squared" in the summary output.
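The formulas for the correlation coefficient and the coefficient of determination in these notes can also be checked by hand. The following is a minimal sketch using the advertising and sales data; the variable names (SSxx, SSyy, SSxy, b0, b1, y_hat, SSE, r_direct, r_from_slope, R2) are chosen here for illustration and do not appear in the original slides.

```r
# Advertising and sales example from the slides
x = c(1, 2, 3, 4, 5)
y = c(1, 1, 2, 2, 4)

# Sums of squares and cross-products (illustrative names)
SSxx = sum((x - mean(x))^2)
SSyy = sum((y - mean(y))^2)
SSxy = sum((x - mean(x)) * (y - mean(y)))

# Least squares estimates of slope and intercept
b1 = SSxy / SSxx
b0 = mean(y) - b1 * mean(x)

# Correlation: directly, and from the slope estimate
r_direct = SSxy / sqrt(SSxx * SSyy)
r_from_slope = b1 * sqrt(SSxx / SSyy)

# Coefficient of determination from its definition
y_hat = b0 + b1 * x
SSE = sum((y - y_hat)^2)
R2 = 1 - SSE / SSyy

# Compare with the built-in functions
fit = lm(y ~ x)
print(c(r_direct = r_direct, r_from_slope = r_from_slope, cor = cor(x, y)))
print(c(R2 = R2, r_squared = r_direct^2, lm_R2 = summary(fit)$r.squared))
```

For these data SSxx = 10, SSyy = 6 and SSxy = 7, so both versions of r agree with the cor() output 0.9036961, and R2 equals r_direct^2 = 49/60, about 0.817, which should match the value that summary(fit) reports.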