Chapter 3: Linear Regression
[email protected]

1. The Meaning of Regression

Regression examines the relationships between dependent and independent variables.
It estimates the population mean of the dependent variable on the basis of given values of the independent variables.
Ex: What is the relationship between the quantity of a good and its price?
Ex: Given a particular level of income, what will the level of consumption be?
Regression also tests hypotheses.
Ex: Regarding the precise relationship between consumption and income, it tests how much consumption rises when income rises.

2. Regression Example

Assume a country whose total population consists of 60 families.
Let us examine the relationship between income and consumption.
Some families have the same income. Divide the families into groups by weekly income ($100, $120, $140, etc.).

Within each group, take the range of family consumption expenditures as given.
Suppose 6 families have an income of $100, and their expenditures are $65, $70, $74, $80, $85 and $88.
Denote income by X and expenditure by Y.
Within each category we then have a distribution of Y conditional on a given value of X.

Compute the conditional mean of each distribution: $E(Y \mid X = X_i)$.
How do we obtain $E(Y \mid X = X_i)$? Multiply each Y value by its conditional probability (1/6) and sum the products.
• For our example this value is 77.
We can plot these conditional distributions for each income level.

The population regression line connects the conditional means of the dependent variable for the fixed values of the explanatory variable.
Formally: $E(Y \mid X_i)$.
This population regression function (PRF) describes how the mean of Y changes with X.
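The conditional mean for the $100 income group can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are assumptions made here, and the six expenditure values are the ones quoted in the example.

```python
from fractions import Fraction

# Weekly expenditures (Y) of the six families whose weekly income (X) is $100,
# as listed in the example.
expenditures = [65, 70, 74, 80, 85, 88]

# Within the group every family is equally likely, so P(Y = y | X = 100) = 1/6.
prob = Fraction(1, len(expenditures))

# E(Y | X = 100) is the probability-weighted sum of the Y values.
conditional_mean = sum(y * prob for y in expenditures)

print(conditional_mean)  # 77
```

Repeating the same calculation for every income group gives the points that the population regression line connects.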
What form does this function take?
There are many possibilities, but assume it is a linear function:
$E(Y \mid X_i) = \beta_1 + \beta_2 X_i$
$\beta_1$ and $\beta_2$ are the regression coefficients (the intercept and the slope).
The slope shows how much Y changes for a given change in X.
We estimate $\beta_1$ and $\beta_2$ from actual observations on X and Y.

3. Linearity

Linearity can be in the variables or in the parameters.
Linearity in the variables: the conditional expectation of Y is a linear function of X.
The regression is a straight line and the slope is constant.
The function cannot contain squared, square-root or interaction terms in the variables, since these would make the slope vary.

We are concerned with linearity in the parameters.
The parameters are raised only to the first power.
The variables may or may not enter linearly.

Linearity in the parameters: the conditional expectation of Y is a linear function of the parameters. The Xs may or may not be linear.
$E(Y \mid X_i) = \beta_1 + \beta_2 X_i$ is linear in the parameters.
$E(Y \mid X_i) = \beta_1 + \sqrt{\beta_2}\, X_i$ is not linear in the parameters.
A function is linear in the parameters if the $\beta$s appear only to the first power and are not multiplied or divided by other parameters.

4. Stochastic Error

Individual values can lie above or below the conditional mean.
Specifically, $u_i = Y_i - E(Y \mid X_i)$, where $u_i$ is the deviation of an individual value from the conditional mean.
Rearranging: $Y_i = E(Y \mid X_i) + u_i$
$u_i$ is the stochastic error term. It is a random disturbance. Without it the model would be deterministic.
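As a small illustration of the stochastic error term, the sketch below computes the deviation of each family's spending from the conditional mean for the $100 income group. The data are the six expenditure values from the regression example; the variable names are assumptions made here.

```python
# Expenditures (Y) of the six families with weekly income X = 100.
expenditures = [65, 70, 74, 80, 85, 88]

# Conditional mean E(Y | X = 100).
conditional_mean = sum(expenditures) / len(expenditures)

# Stochastic error of each family: u_i = Y_i - E(Y | X_i).
errors = [y - conditional_mean for y in expenditures]

print(errors)       # [-12.0, -7.0, -3.0, 3.0, 8.0, 11.0]
print(sum(errors))  # 0.0
```

Some families lie below the conditional mean and some above it, and within the group the deviations cancel out.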
Stochastic Error Example
Assume that family consumption is linearly related to income, plus a disturbance term. Some examples:
A family that spends $65 can be written as:
$Y_i = 65 = \beta_1 + \beta_2(100) + u_i$
A family that spends $75:
$Y_i = 75 = \beta_1 + \beta_2(100) + u_i$

The model has a deterministic part and a stochastic part.
An econometric model expresses the relationship between consumption and income.
The systematic part is determined by variables such as price, education, etc.
The relationship is not exact: it is subject to individual variation, and this variation is captured by u.

Expected Value of u
$Y_i = E(Y \mid X_i) + u_i$
Take conditional expectations of both sides:
$E(Y_i \mid X_i) = E[E(Y \mid X_i)] + E(u_i \mid X_i)$
$E(Y_i \mid X_i) = E(Y \mid X_i) + E(u_i \mid X_i)$
The expected value of a constant is the constant itself, and once $X_i$ is fixed, $E(Y \mid X_i)$ is a constant.
So $E(u_i \mid X_i) = 0$: the conditional mean of $u_i$ is zero.

What Does the Error Term Capture?
Omitted variables:
Other variables that may affect consumption are not included in the model. A correctly specified model would include them.
We may omit a variable because we do not know its economic relationship with the dependent variable.
We may not have data on it.
Random events such as bad weather or strikes occur irregularly.

Measurement error in the dependent variable:
In Friedman's model of consumption, permanent consumption is a function of permanent income.
These quantities are not observable, so we have to use proxies such as current consumption and current income.
The error term then represents and captures this measurement error.

Randomness of human behavior:
People do not act in exactly the same way even in the same circumstances.
The error term captures this randomness.

5. Sample Regression Function

If we have the whole population, we can determine the regression line by taking conditional means.
In practice, we usually have only a sample.
Suppose we took a sample from the population.
We cannot estimate the population regression line exactly, because of sampling fluctuations.

Our sample regression line can be denoted:
$\hat{Y}_i = b_1 + b_2 X_i$
$\hat{Y}_i$ is the estimator of $E(Y \mid X_i)$, the conditional mean.
$b_1$ is the estimator of $B_1$ and $b_2$ is the estimator of $B_2$, the population intercept and slope ($\beta_1$ and $\beta_2$).

In stochastic form:
$Y_i = b_1 + b_2 X_i + e_i$
where $e_i$ is the sample residual, an estimate of $u_i$.
We can have several independent variables; this is multivariate regression. For example, consumption may depend on the interest rate as well as on income.

6. Ordinary Least Squares

OLS Regression
We estimate the PRF by the method of ordinary least squares (OLS).
The PRF is not directly observable, so we estimate it from the sample regression function (SRF):
PRF: $Y_i = B_1 + B_2 X_i + u_i$
SRF: $Y_i = b_1 + b_2 X_i + e_i$
We can rewrite the SRF as
$e_i = \text{actual } Y_i - \text{predicted } Y_i$
$e_i = Y_i - b_1 - b_2 X_i$

We determine the SRF in such a manner that it is a good fit: we make the sum of squared residuals as small as possible.
$\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2$
By squaring, we give more weight to larger residuals.

The residuals are a function of the betas.
Choosing different values for the betas gives different values for the sum of squared residuals.
We choose the beta values that minimize this sum. These are the least-squares estimators.
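The idea that different candidate coefficients give different sums of squared residuals can be sketched as follows. The five (X, Y) pairs are hypothetical numbers invented for illustration, not data from the slides, and the closed-form expressions used for b1 and b2 are the standard two-variable OLS solution.

```python
# Hypothetical sample of weekly income (X) and consumption (Y); illustrative only.
X = [80, 100, 120, 140, 160]
Y = [70, 65, 90, 95, 110]


def ssr(b1, b2):
    """Sum of squared residuals for the candidate line Y-hat = b1 + b2 * X."""
    return sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))


n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Standard closed-form OLS solution for the two-variable model.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
s_xx = sum((x - x_bar) ** 2 for x in X)
b2 = s_xy / s_xx
b1 = y_bar - b2 * x_bar

best = ssr(b1, b2)
print(f"b1 = {b1:.2f}, b2 = {b2:.2f}, SSR = {best:.2f}")

# Moving either coefficient away from the OLS values increases the SSR.
for d1, d2 in [(1, 0), (-1, 0), (0, 0.05), (0, -0.05), (2, -0.02)]:
    assert ssr(b1 + d1, b2 + d2) > best
```

Any other line through these points has a larger sum of squared residuals than the least-squares line, which is what "least squares" means.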
Normal Equations
The least squares estimates are derived in the following manner. Least squares minimizes ESS, here the error sum of squares (the sum of squared residuals, where $\hat{u}_i$ is the residual $e_i$, and $\hat{\beta}_1$, $\hat{\beta}_2$ are the estimators $b_1$, $b_2$):
$ESS = \sum \hat{u}_i^2 = \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)^2$
Partially differentiate ESS with respect to $\hat{\beta}_1$:
$\dfrac{\partial ESS}{\partial \hat{\beta}_1} = \dfrac{\partial \sum \hat{u}_i^2}{\partial \hat{\beta}_1} = 2 \sum \hat{u}_i \dfrac{\partial \hat{u}_i}{\partial \hat{\beta}_1} = 2 \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)(-1)$
Partially differentiate ESS with respect to $\hat{\beta}_2$:
$\dfrac{\partial ESS}{\partial \hat{\beta}_2} = \dfrac{\partial \sum \hat{u}_i^2}{\partial \hat{\beta}_2} = 2 \sum \hat{u}_i \dfrac{\partial \hat{u}_i}{\partial \hat{\beta}_2} = 2 \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)(-X_i)$
Set the resulting equations to zero and solve them:
$\sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i) = 0$
$\sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)(X_i) = 0$
Simplifying yields:
$\sum Y_i = n \hat{\beta}_1 + \hat{\beta}_2 \sum X_i$
$\sum Y_i X_i = \hat{\beta}_1 \sum X_i + \hat{\beta}_2 \sum X_i^2$
Solving both simultaneously yields:
$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}$
$\hat{\beta}_2 = \dfrac{\sum Y_i X_i - n \bar{X} \bar{Y}}{\sum X_i^2 - n \bar{X}^2} = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$

8. Assumptions of Classical Linear Regression Model

Assumptions
We use the model $Y = B_1 + B_2 X + u$, in which Y depends on X and u.
The X values are fixed and the u values are random. Thus the Y values are random too.
Assumptions about u are very important.
The assumptions are made to ensure that the OLS estimators are BLUE.

Linearity Assumption
The regression model is linear in the parameters and the error term: $Y = B_1 + B_2 X + u$.
It is not necessarily linear in the variables.
We can still apply OLS to models that are nonlinear in the variables.

Specification Assumption
Assume the regression model is correctly specified: all relevant variables are included (no specification bias).
Otherwise, specification error results.

Expected Value of Error
The expected value of the error term is zero: $E(u_i) = 0$.
Its mean value is 0, conditional on the Xs.
We add a stochastic error term to equations to explain individual variation.
We assume the error term is drawn from a distribution whose mean is zero.
In practice the mean is forced to be zero by the intercept term, which absorbs any difference from zero.
The intercept represents the fixed portion of Y that cannot be explained by the independent variables.
The error term is the random portion.

No Correlation with Error
The explanatory variables are uncorrelated with the error term.
There is zero covariance between the disturbance $u_i$ and the explanatory variable $X_i$: $Cov(X_i, u_i) = 0$.
Alternatively, X and u have separate influences on Y.

Suppose the error term and X are positively correlated.
The estimated coefficient would then be higher than it should be, because the variation in Y caused by u is attributed to X.

The consumption function violates this assumption.
An increase in C leads to an increase in income, which leads to a further increase in C.
So the error term in the consumption function and income move together.
If this assumption does not hold, we need simultaneous-equation estimation.

Constant Variance of Error
The variance of each $u_i$ is the same given a value of $X_i$: $var(u_i) = \sigma^2$, a constant (homoscedasticity).
Ex: the variance of consumption is the same at all levels of income.
Alternative: the variance of the error term changes (heteroscedasticity).
Ex: the variance of consumption increases as income increases.

No Correlation Across Error Terms
There is no correlation between any two error terms. The covariance between the u's is zero:
$Cov(u_i, u_j) = 0$ for $i \neq j$
A violation often shows up in time series as serial correlation.
A random shock in one period that affects the error term may persist and affect subsequent error terms.
Ex: a positive error in one period is associated with a positive error in another.

No Perfect Linear Function Among Variables
No explanatory variable is a perfect linear function of other explanatory variables.
Multicollinearity occurs when variables move together.
Ex: explaining home purchases and including both real and nominal interest rates for a time period in which inflation was constant.

9. Properties of OLS Estimators

OLS Properties
1) Linear: $b_1$ and $b_2$ are linear functions of Y, in the fitted line $\hat{Y} = b_1 + b_2 X$.
2) Unbiased: $E(b_1) = B_1$ and $E(b_2) = B_2$.
In repeated sampling, the expected values of $b_1$ and $b_2$ will coincide with their true values $B_1$ and $B_2$.
3) Minimum variance:
$var(b_1)$ is less than the variance of any other unbiased linear estimator of $B_1$.
$var(b_2)$ is less than the variance of any other unbiased linear estimator of $B_2$.

BLUE Estimator
Given the assumptions of the classical linear regression model (CLRM), the OLS estimators have minimum variance in the class of unbiased linear estimators.
They are BLUE: best linear unbiased estimators.

10. Variances and Standard Errors of OLS Estimators

Variances and Standard Errors
Remember the OLS estimators are:
$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}$
$\hat{\beta}_2 = \dfrac{\sum x_i y_i}{\sum x_i^2} = \dfrac{\sum x_i y_i}{S_{XX}}$
where lower-case $x_i$ and $y_i$ are deviations from the sample means and $S_{XX} = \sum x_i^2$.
The estimators will vary across samples:
$var(\hat{\beta}_1) = \dfrac{\sigma^2 \sum X_i^2}{n S_{XX}}$
$se(\hat{\beta}_1) = \sqrt{var(\hat{\beta}_1)}$
The variance and standard error of $\hat{\beta}_2$:
$var(\hat{\beta}_2) = \dfrac{\sigma^2}{S_{XX}}$
$se(\hat{\beta}_2) = \sqrt{var(\hat{\beta}_2)}$

$\sigma^2$ is the variance of the error term, assumed constant for each u (homoscedasticity).
If we know $\sigma^2$, we can compute all these terms.
If we do not know it, we use its estimator.
The estimator of $\sigma^2$ is $\hat{\sigma}^2 = \dfrac{\sum e_i^2}{n - 2}$.
$\sum e_i^2$ is the RSS, $\sum (Y_i - \hat{Y}_i)^2$, the sum of squared differences between actual and predicted Y.
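The variance and standard-error formulas above can be sketched in plain Python. The five data points are hypothetical values invented for illustration, and the variable names are assumptions made here; only the formulas come from the text.

```python
import math

# Hypothetical sample; illustrative only, not data from the slides.
X = [80, 100, 120, 140, 160]
Y = [70, 65, 90, 95, 110]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Deviation sums and OLS estimates.
s_xx = sum((x - x_bar) ** 2 for x in X)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
b2 = s_xy / s_xx
b1 = y_bar - b2 * x_bar

# Residual sum of squares and the estimator of the error variance, RSS / (n - 2).
rss = sum((y - (b1 + b2 * x)) ** 2 for x, y in zip(X, Y))
sigma2_hat = rss / (n - 2)

# Variances and standard errors of the two estimators.
var_b1 = sigma2_hat * sum(x ** 2 for x in X) / (n * s_xx)
var_b2 = sigma2_hat / s_xx
se_b1 = math.sqrt(var_b1)
se_b2 = math.sqrt(var_b2)

print(f"sigma2_hat = {sigma2_hat:.3f}")
print(f"se(b1) = {se_b1:.3f}, se(b2) = {se_b2:.4f}")
```

Because the true error variance is unknown in practice, the sketch plugs the estimate RSS/(n - 2) into both variance formulas.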
Degrees of Freedom
$n - 2$ is the degrees of freedom for error: the number of independent observations.
To get e, we have to compute predicted Y.
To compute predicted Y, we must first obtain $b_1$ and $b_2$, so we lose 2 df.

Standard Error of Estimate
The square root of $\hat{\sigma}^2$ is $\hat{\sigma}$. This is called the standard error of the estimate (the standard deviation of the Y values about the regression line).
It is used as a measure of goodness of fit of the estimated regression line.

Example
Estimated regression line:
$\hat{Y} = 24.47 + 0.509 X$
se = (6.41) (0.036)
t = (3.813) (14.243)
How do we get these figures?
$\hat{\sigma}^2 = \dfrac{\sum e_i^2}{n - 2} = \dfrac{337.3}{8} = 42.16$
$var(\hat{\beta}_1) = \dfrac{\hat{\sigma}^2 \sum X_i^2}{n S_{XX}} = 42.16 \times \dfrac{322{,}000}{10(33{,}000)} = 41.14$
$se(\hat{\beta}_1) = \sqrt{var(\hat{\beta}_1)} = 6.41$
$var(\hat{\beta}_2) = \dfrac{\hat{\sigma}^2}{S_{XX}} = \dfrac{42.16}{33{,}000} = 0.0013$
$se(\hat{\beta}_2) = \sqrt{var(\hat{\beta}_2)} = 0.036$

The estimated slope coefficient is 0.509 and its standard error (standard deviation) is 0.036.
This is a measure of how much $\hat{\beta}_2$ varies from sample to sample.
We can say that our computed $\hat{\beta}_2$ lies within a certain number of standard deviations of the true $B_2$.

11. Hypothesis Testing

Hypothesis Testing
Set up the null hypothesis that the parameter value is not different from zero:
$H_0: B_2 = 0$
What does this mean? Income has no effect on spending.
So we set up this null hypothesis and see whether it can be rejected.

In Problem 5.3, $\hat{\beta}_2 = 1.25$.
This is different from zero, but it is derived from just one sample.
If we took another sample we might get +0.509, and in a third sample we might get 0.
In other words, how do we know that this estimate is significantly different from zero?
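The standard errors and t ratios reported in the estimated-regression example can be reproduced from the summary quantities quoted there (RSS = 337.3, n = 10, sum of squared X = 322,000, S_XX = 33,000). The sketch below is only an arithmetic check; the variable names are assumptions made here.

```python
import math

# Summary quantities quoted in the example.
rss = 337.3            # sum of squared residuals
n = 10                 # sample size, so n - 2 = 8 degrees of freedom
sum_x_sq = 322_000     # sum of X_i squared
s_xx = 33_000          # sum of squared deviations of X from its mean
b1, b2 = 24.47, 0.509  # estimated intercept and slope

# Estimator of the error variance.
sigma2_hat = rss / (n - 2)

# Variances and standard errors of the estimators.
var_b1 = sigma2_hat * sum_x_sq / (n * s_xx)
var_b2 = sigma2_hat / s_xx
se_b1 = math.sqrt(var_b1)
se_b2 = math.sqrt(var_b2)

# t ratios under the null hypothesis that each coefficient is zero.
t_b1 = b1 / se_b1
t_b2 = b2 / se_b2

print(f"sigma2_hat = {sigma2_hat:.2f}")
print(f"var(b1) = {var_b1:.2f}, se(b1) = {se_b1:.2f}")
print(f"var(b2) = {var_b2:.4f}, se(b2) = {se_b2:.3f}")
print(f"t(b1) = {t_b1:.2f}, t(b2) = {t_b2:.2f}")
```

The t ratios computed this way agree with the reported 3.813 and 14.243 up to rounding of the inputs. A slope estimate that is many standard errors away from zero is the kind of evidence the hypothesis test formalizes.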
$\hat{\beta}_2 \sim N(B_2, \sigma_{\hat{\beta}_2}^2)$
We can test either by the confidence interval approach or by the test of significance approach.
$\hat{\beta}_2$ follows the normal distribution with mean and variance as above:
$Z = \dfrac{\hat{\beta}_2 - B_2}{\sigma / \sqrt{\sum x_i^2}} \sim N(0, 1)$

However, we do not know the true variance $\sigma^2$.
We can estimate $\sigma^2$. Then we have:
$\dfrac{\hat{\beta}_2 - B_2}{\hat{\sigma} / \sqrt{\sum x_i^2}} \sim t_{n-2}$, where $\hat{\sigma}^2 = \dfrac{\sum e_i^2}{n - 2}$
More generally, the test statistic is $(\hat{\beta}_2 - B_2) / se(\hat{\beta}_2)$.

Problem 5.3 Example
$\hat{\beta}_2 / se(\hat{\beta}_2) = 1.25 / 0.039 = 31.793 \sim t_{n-2}$
At the 95% level with 7 df, the critical value is t = 2.365, so we reject the null.
We could also do a one-tail test: set up the alternative hypothesis that $B_2 > 0$.
We also reject the null, since the critical value is t = 1.895 for the one-tailed test.

Most of the time, we assume a null that the parameter value is 0.
There are occasions where we may want to set up a different null hypothesis.
In the Fisher example, we set up the hypothesis that $B_2 = 1$.
So now $(1.25 - 1) / se = 0.25 / 0.039 = 6.4$.
So it is significant.

Confidence Interval Approach
$P(-2.365 \le t \le 2.365) = 0.95$
$P\left(-2.365 \le \dfrac{\hat{\beta}_2 - B_2}{\hat{\sigma} / \sqrt{\sum x_i^2}} \le 2.365\right) = 0.95$
$P\left(\hat{\beta}_2 - 2.365 \dfrac{\hat{\sigma}}{\sqrt{\sum x_i^2}} \le B_2 \le \hat{\beta}_2 + 2.365 \dfrac{\hat{\sigma}}{\sqrt{\sum x_i^2}}\right) = 0.95$
$P\left(\hat{\beta}_2 - 2.365\, se(\hat{\beta}_2) \le B_2 \le \hat{\beta}_2 + 2.365\, se(\hat{\beta}_2)\right) = 0.95$
$P(1.25 - 2.365(0.039) \le B_2 \le 1.25 + 2.365(0.039)) = 0.95$
$P(1.158 \le B_2 \le 1.342) = 0.95$
$B_2 = 0$ and $B_2 = 1$ do not lie in this interval.

12. Coefficient of Determination, $R^2$

Coefficient of Determination
The coefficient of determination, $R^2$, measures the overall goodness of fit of the regression line.
$e_i = Y_i - \hat{Y}_i$
$Y_i = \hat{Y}_i + e_i$
Alternatively,
$(Y_i - \bar{Y}) = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$
variation in Y from its mean = variation in Y explained by X (around its mean) + unexplained variation

Using lower-case letters to represent deviations from means:
$y_i = \hat{y}_i + e_i$
Now square and sum (the cross-product term $\sum \hat{y}_i e_i$ is zero under OLS):
$\sum y_i^2 = \sum \hat{y}_i^2 + \sum e_i^2$
$TSS = ESS + RSS$
The total variation in the observed Y values about their mean is partitioned into two parts, one attributable to the regression line (ESS, the explained sum of squares) and the other to random forces (RSS, the residual sum of squares).

If the regression line fits the data well, ESS should be much larger than RSS.
The coefficient of determination is $R^2 = ESS / TSS$.
It measures the proportion or percentage of the total variation in Y explained by the regression model.

Correlation Coefficient
In the two-variable model, the correlation coefficient is the square root of $R^2$, taking the sign of the slope coefficient.
The correlation coefficient measures the strength of the relationship between two variables.
However, in a multivariate context, R has little meaning.

13. Forecasting

Forecasting
Suppose we want to predict out of sample and know the relation between the CPI and the S&P (Problem 5.2).
We have data to 1989 and want to predict 1990 stock prices.
We expect inflation in 1990 to be 10%, so the CPI is 124 + 12.4 = 136.4.
$\hat{Y} = -195.08 + 3.82\, CPI$
The estimated Y for 1990 is $-195.08 + 3.82(136.4) = 325.97$.

There will be some error in this forecast: the prediction error.
It has quite a complicated formula.
This error increases as we move further away from the sample mean.
Hence, we cannot forecast very far out of sample with a great deal of certainty.
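The point forecast for 1990 can be reproduced in a few lines. The coefficients, the 1989 CPI level and the assumed 10% inflation rate are the figures quoted in the forecasting example; the variable names are assumptions made here.

```python
# Estimated regression of the S&P index on the CPI (coefficients from the example).
intercept = -195.08
slope = 3.82

# CPI of 124 in the last sample year and an assumed 10% inflation for 1990.
cpi_1989 = 124
expected_inflation = 0.10
cpi_1990 = cpi_1989 * (1 + expected_inflation)

# Point forecast: plug the projected CPI into the estimated regression line.
forecast_1990 = intercept + slope * cpi_1990

print(round(cpi_1990, 1))       # 136.4
print(round(forecast_1990, 2))  # 325.97
```

This is only the point prediction; it says nothing about the size of the prediction error, which grows as the projected CPI moves away from the sample mean.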