STA302/1001 Week 1

Regression Models - Introduction
• In regression models, two types of variables are studied: a dependent variable Y, also called the response variable, which is modeled as random; and an independent variable X, also called the predictor or explanatory variable, which is sometimes modeled as random and sometimes takes a fixed value for each observation.
• In regression models we are fitting a statistical model to data.
• We generally use regression to predict the value of one variable given the values of others.

Simple Linear Regression - Introduction
• Simple linear regression studies the relationship between a quantitative response variable Y and a single explanatory variable X.
• Idea of a statistical model: Actual observed value of Y = …
• Box (a well-known statistician) claimed: "All models are wrong, some are useful." 'Useful' means that they describe the data well and can be used for predictions and inferences.
• Recall: parameters are constants in a statistical model which we usually don't know but will use data to estimate.

Simple Linear Regression Models
• The statistical model for simple linear regression is a straight-line model of the form Y = β0 + β1X, where …
• For particular points, Yi = β0 + β1Xi + εi, i = 1, ..., n.
• We expect that different values of X will produce different mean responses. In particular, for each value of X, the possible values of Y follow a distribution whose mean is β0 + β1X.
• Formally it means that ….

Estimation – Least Squares Method
• Estimates of the unknown parameters β0 and β1 based on our observed data are usually denoted by b0 and b1.
• For each observed value xi of X, the fitted value of Y is ŷi = b0 + b1xi. This is the equation of a straight line.
• The deviations from the line in the vertical direction are the errors in the prediction of Y and are called "residuals". They are defined as ei = yi − ŷi.
• The estimates b0 and b1 are found by the method of least squares, which is based on minimizing the sum of squared residuals.
• Note: the least-squares estimates are found without making any statistical assumptions about the data.

Derivation of Least-Squares Estimates
• Let RSS = Σ (yi − b0 − b1xi)², where the sum runs over i = 1, …, n.
• We want to find b0 and b1 that minimize RSS.
• Use calculus….

Properties of Fitted Line
• Σ ei = 0
• Σ ei² is a minimum
• Σ ei xi = 0
• Σ ŷi = Σ yi
• Σ ei ŷi = 0
• Note: you need to know how to prove the above properties.
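The least-squares fit and the fitted-line properties above can be checked numerically. The following is a minimal Python sketch, not part of the original slides: it uses a small invented data set and the standard closed-form solutions that come out of the calculus step (b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)², b0 = ȳ − b1 x̄), and assumes NumPy is available.

```python
import numpy as np

# Small made-up data set, purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

# Closed-form least-squares estimates (the result of the calculus step):
# b1 = Sxy / Sxx, b0 = ybar - b1 * xbar
xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar

# Fitted values and residuals
y_hat = b0 + b1 * x   # ŷi = b0 + b1*xi
e = y - y_hat         # ei = yi − ŷi

# Properties of the fitted line (all should be zero up to rounding error)
print("b0 =", b0, " b1 =", b1)
print("sum of residuals      :", e.sum())                 # Σ ei = 0
print("sum of ei * xi        :", (e * x).sum())           # Σ ei xi = 0
print("sum ŷi minus sum yi   :", y_hat.sum() - y.sum())   # Σ ŷi = Σ yi
print("sum of ei * ŷi        :", (e * y_hat).sum())       # Σ ei ŷi = 0
```

The remaining property, that Σ ei² is a minimum, is exactly the statement that no other choice of intercept and slope gives a smaller RSS for these data.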
Statistical Assumptions for SLR
• Recall, the simple linear regression model is Yi = β0 + β1Xi + εi, where i = 1, …, n.
• The assumptions for the simple linear regression model are:
  1) E(εi) = 0
  2) Var(εi) = σ²
  3) the εi's are uncorrelated.
• These assumptions are also called the Gauss-Markov conditions.
• The above assumptions can be stated in terms of the Y's…

Gauss-Markov Theorem
• The least-squares estimates are BLUE (Best Linear Unbiased Estimators).
• The least-squares estimates are linear in the y's…
• Of all possible linear, unbiased estimators of β0 and β1, the least-squares estimates have the smallest variance.

Properties of Least Squares Estimates
• Estimate of β0 and β1 – functions of the data that can be calculated numerically for a given data set.
• Estimator of β0 and β1 – functions of the underlying random variables.
• Recall: the least-squares estimators are…
• Claim: The least-squares estimators are unbiased estimators of β0 and β1.
• Proof:…

Estimation of Error Term Variance σ²
• The variance σ² of the error terms εi needs to be estimated to obtain an indication of the variability of the probability distribution of Y.
• Further, a variety of inferences concerning the regression function and the prediction of Y require an estimate of σ².
• Recall, for a random variable Z, the estimates of the mean and variance of Z based on n realizations of Z are….
• Similarly, the estimate of σ² is s² = (1 / (n − 2)) Σ ei², with the sum over i = 1, …, n.
• s² is called the MSE (Mean Square Error); it is an unbiased estimator of σ² (proof later on; a small simulation illustrating the unbiasedness of b1 and of s² appears at the end of these notes).

Normal Error Regression Model
• In order to make inferences we need one more assumption about the εi's.
• We assume that the εi's have a Normal distribution, that is, εi ~ N(0, σ²).
• The Normality assumption implies that the errors εi are independent (since they are uncorrelated).
• Under the Normality assumption on the errors, the least-squares estimates of β0 and β1 are equivalent to their maximum likelihood estimators.
• This gives the estimators the additional nice properties of MLEs: they are consistent, sufficient, and MVUE.
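The equivalence between least squares and maximum likelihood under the Normality assumption is stated but not derived on the slides; the following is a standard sketch of the argument, written in LaTeX.

```latex
% Sketch: least squares = MLE under independent normal errors.
% With eps_i ~ N(0, sigma^2) independent, the likelihood of (beta_0, beta_1, sigma^2) is
\begin{align*}
L(\beta_0, \beta_1, \sigma^2)
  &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left\{ -\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2} \right\}, \\
\log L(\beta_0, \beta_1, \sigma^2)
  &= -\frac{n}{2}\log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 .
\end{align*}
% For any fixed sigma^2, maximizing log L over (beta_0, beta_1) is the same as
% minimizing RSS = sum_i (y_i - beta_0 - beta_1 x_i)^2, so the MLEs of beta_0 and
% beta_1 coincide with the least-squares estimates b_0 and b_1.
% (Note: the MLE of sigma^2 is RSS/n, which differs from the unbiased s^2 = RSS/(n-2).)
```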
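Finally, the unbiasedness claims above (E(b1) = β1 and E(s²) = σ²) can be illustrated by simulation. This Python sketch is not from the slides; the "true" parameter values and the design points are chosen arbitrarily for illustration, and the errors are drawn from the Normal error regression model.

```python
import numpy as np

rng = np.random.default_rng(302)

# Arbitrary "true" parameter values, chosen only for this illustration
beta0, beta1, sigma = 1.0, 2.0, 1.5
x = np.linspace(0, 10, 25)   # fixed design points
n = len(x)

b1_draws, s2_draws = [], []
for _ in range(5000):
    # Generate responses from Yi = beta0 + beta1*Xi + eps_i with Normal errors
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)

    # Least-squares estimates
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()

    # Residuals and the MSE estimate of sigma^2
    e = y - (b0 + b1 * x)
    s2 = np.sum(e ** 2) / (n - 2)

    b1_draws.append(b1)
    s2_draws.append(s2)

# Averages over many replications should be close to the true values,
# illustrating E(b1) = beta1 and E(s^2) = sigma^2.
print("mean of b1 estimates :", np.mean(b1_draws), " (true beta1 =", beta1, ")")
print("mean of s2 estimates :", np.mean(s2_draws), " (true sigma^2 =", sigma**2, ")")
```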