: Pertemua 19 Regresi Linier 1 Outline Materi : Koefisien korelasi dan determinasi Persamaan regresi Regresi dan peramalan 2 Simple Correllation and Linear Regression • Types of Regression Models • Determining the Simple Linear Regression Equation • Measures of Variation • Assumptions of Regression and Correlation • Residual Analysis • Measuring Autocorrelation • Inferences about the Slope 3 Simple Correlation and… • Correlation - Measuring the Strength (continued) of the Association • Estimation of Mean Values and Prediction of Individual Values • Pitfalls in Regression and Ethical Issues 4 Purpose of Regression Analysis • Regression Analysis is Used Primarily to Model Causality and Provide Prediction – Predict the values of a dependent (response) variable based on values of at least one independent (explanatory) variable – Explain the effect of the independent variables on the dependent variable 5 Types of Regression Models Positive Linear Relationship Negative Linear Relationship Relationship NOT Linear No Relationship 6 Simple Linear Regression Model • Relationship between Variables is Described by a Linear Function • The Change of One Variable Causes the Other Variable to Change • A Dependency of One Variable on the Other 7 Simple Linear Regression Model (continued) Population regression line is a straight line that describes the dependence of the average value (conditional mean) of one variable on the other Population Slope Coefficient Population Y Intercept Dependent (Response) Variable Random Error Yi X i i Population Regression Y |X Line (Conditional Mean) Independent (Explanatory) Variable 8 Simple Linear Regression Model (continued) Y (Observed Value of Y) = Yi X i i i = Random Error Y | X X i (Conditional Mean) X Observed Value of Y 9 Linear Regression Equation Sample regression line provides an estimate of the population regression line as well as a predicted value of Y Sample Y Intercept Yi b0 b1 X i ei Ŷ b0 b1 X Sample Slope Coefficient Residual Simple Regression Equation (Fitted Regression Line, Predicted Value) 10 Linear Regression Equation • (continued) b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared residuals n i 1 • • b0 Yi Yˆi e 2 n i 1 2 i provides an estimate of b1 provides an estimate of 11 Linear Regression Equation (continued) Yi b0 b1 X i ei Y ei Yi X i i b1 i Y | X X i b0 Observed Value Yˆi b0 b1 X i X 12 Interpretation of the Slope and Intercept • E Y | X 0 is the average value of Y when the value of X is zero change in E Y | X 1 change in X • measures the change in the average value of Y as a result of a one-unit change in X 13 Interpretation of the Slope and Intercept (continued) • b Eˆ Y | X 0 is the estimated average value of Y when the value of X is zero change in Eˆ Y | X • b1 is the estimated change in X change in the average value of Y as a result of a one-unit change in X 14 Simple Linear Regression: Example You wish to examine the linear dependency of the annual sales of produce stores on their sizes in square footage. Sample data for 7 stores were obtained. Find the equation of the straight line that fits the data best. Store Square Feet Annual Sales ($1000) 1 2 3 4 5 6 7 1,726 1,542 2,816 5,555 1,292 2,208 1,313 3,681 3,395 6,653 9,543 3,318 5,563 3,760 15 Scatter Diagram: Example Annua l Sa le s ($000) 12000 10000 8000 6000 4000 2000 0 0 1000 2000 3000 4000 5000 6000 S q u a re F e e t Excel Output 16 Simple Linear Regression Equation: Example Yˆi b0 b1 X i 1636.415 1.487 X i From Excel Printout: C o e ffi c i e n ts I n te r c e p t 1 6 3 6 .4 1 4 7 2 6 X V a ria b le 1 1 .4 8 6 6 3 3 6 5 7 17 Annua l Sa le s ($000) Graph of the Simple Linear Regression Equation: Example 12000 10000 8000 6000 4000 2000 0 0 1000 2000 3000 4000 5000 6000 S q u a re F e e t 18 Interpretation of Results: Example Yˆi 1636.415 1.487 X i The slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units. The equation estimates that for each increase of 1 square foot in the size of the store, the expected annual sales are predicted to increase by $1487. 19 Simple Linear Regression in PHStat • In Excel, use PHStat | Regression | Simple Linear Regression … • Excel Spreadsheet of Regression Sales on Footage 20 Measures of Variation: The Sum of Squares (continued) • SST = Total Sum of Squares – Measures the variation of the Yi values around their mean, Y • SSR = Regression Sum of Squares – Explained variation attributable to the relationship between X and Y • SSE = Error Sum of Squares – Variation attributable to factors other than the relationship between X and Y 21 The Coefficient of Determination • SSR Regression Sum of Squares r SST Total Sum of Squares 2 • Measures the proportion of variation in Y that is explained by the independent variable X in the regression model 22 Venn Diagrams and Explanatory Power of Regression r 2 Sales Sizes SSR SSR SSE 23 Coefficients of Determination (r 2) and Correlation (r) Y r2 = 1, r = +1 ^ Y =b i 0 Y r2 = 1, r = -1 ^=b +b X Y i 0 1 i + b1 X i X Y r2 = .81,r = +0.9 X Y ^=b +b X Y i 0 1 i X r2 = 0, r = 0 ^=b +b X Y i 0 1 i X 24 Standard Error of Estimate n • SYX SSE n2 i 1 Y Yˆi 2 n2 • Measures the standard deviation (variation) of the Y values around the regression equation 25 Measures of Variation: Produce Store Example Excel Output for Produce Stores R e g r e ssi o n S ta ti sti c s M u lt ip le R R S q u a re 0 .9 4 1 9 8 1 2 9 A d ju s t e d R S q u a re 0 .9 3 0 3 7 7 5 4 S t a n d a rd E rro r 6 1 1 .7 5 1 5 1 7 O b s e r va t i o n s r2 = .94 0 .9 7 0 5 5 7 2 n 7 94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage. Syx 26 Linear Regression Assumptions • Normality – Y values are normally distributed for each X – Probability distribution of error is normal • Homoscedasticity (Constant Variance) • Independence of Errors 27 Consequences of Violation of the Assumptions • Violation of the Assumptions – Non-normality (error not normally distributed) – Heteroscedasticity (variance not constant) • Usually happens in cross-sectional data – Autocorrelation (errors are not independent) • Usually happens in time-series data • Consequences of Any Violation of the Assumptions – Predictions and estimations obtained from the sample regression line will not be accurate – Hypothesis testing results will not be reliable • It is Important to Verify the Assumptions 28 Variation of Errors Around the Regression Line f(e) • Y values are normally distributed around the regression line. • For each X value, the “spread” or variance around the regression line is the same. Y X2 X1 X Sample Regression Line 29 Purpose of Correlation Analysis (continued) • Sample Correlation Coefficient r is an Estimate of and is Used to Measure the Strength of the Linear Relationship in the Sample Observations n r X i 1 n X i 1 i i X Yi Y X 2 n Y Y 2 i i 1 30 Features of and r • Unit Free • Range between -1 and 1 • The Closer to -1, the Stronger the Negative Linear Relationship • The Closer to 1, the Stronger the Positive Linear Relationship • The Closer to 0, the Weaker the Linear Relationship 31 Pitfalls of Regression Analysis • Lacking an Awareness of the Assumptions Underlining Least-Squares Regression • Not Knowing How to Evaluate the Assumptions • Not Knowing What the Alternatives to Least-Squares Regression are if a Particular Assumption is Violated • Using a Regression Model Without Knowledge of the Subject Matter 32 Strategy for Avoiding the Pitfalls of Regression • Start with a scatter plot of X on Y to observe possible relationship • Perform residual analysis to check the assumptions • Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to uncover possible non-normality 33 Strategy for Avoiding the Pitfalls of Regression (continued) • If there is violation of any assumption, use alternative methods (e.g., least absolute deviation regression or least median of squares regression) to least-squares regression or alternative least-squares models (e.g., curvilinear or multiple regression) • If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals 34 Chapter Summary • Introduced Types of Regression Models • Discussed Determining the Simple Linear Regression Equation • Described Measures of Variation • Addressed Assumptions of Regression and Correlation • Discussed Residual Analysis • Addressed Measuring Autocorrelation 35 Chapter Summary (continued) • Described Inference about the Slope • Discussed Correlation - Measuring the Strength of the Association • Addressed Estimation of Mean Values and Prediction of Individual Values • Discussed Pitfalls in Regression and Ethical Issues 36
© Copyright 2026 Paperzz