FIN 30210: Managerial Economics
Statistical Analysis

Part I: Probability

The Cubs have a 12% chance of winning the World Series this year.

Here are the odds for Blackjack... remember, what happens in Vegas stays in Vegas.

Probability is about having the truth.

“Patriots have no need for probability, win coin flip at impossible clip” (Nov. 4th, 2015)

“Belichick has also been extremely lucky. The Pats have won the coin toss 19 of the last 25 times, according to the Boston Globe's Jim McBride.”

So, what are the odds that the Patriots can win at least 19 out of 25 flips? To answer this, we need a probability distribution. For a coin toss, we have the following:

| Outcome | Probability |
|---|---|
| Head | 1/2 |
| Tail | 1/2 |

Side note: in 50 Super Bowl coin tosses, heads came up 24 times (48%) and tails 26 times (52%).

So, suppose that we wanted the odds that the Patriots got 19 wins in a row.

• Probability(A and B) = Probability(A) × Probability(B) (for independent events)

So, we want the probability of 19 wins:

$$P(W \text{ and } W \text{ and } \dots \text{ and } W) = (.5)(.5)\cdots(.5) = (.5)^{19} = 0.00000191$$

(.000191% - 1 in 523,560)

The odds of dying from an asteroid collision with Earth in the next 100 years are 1 in 500,000.

This isn't really what we want, though: getting 19 wins in a row is only one of many ways to get 19 out of 25.

What are the odds that the Patriots get 24 out of 25 wins?

• Probability(A and B) = Probability(A) × Probability(B) (for independent events)
• Probability(A or B) = Probability(A) + Probability(B) (for mutually exclusive events)

There are LOTS of ways to get exactly 24 out of 25 wins.

One way would be a loss followed by 24 wins (LWWWWWWWWWWWWWWWWWWWWWWWW):

$$P = (.50)^{25} = 0.0000000298$$ (.00000298%)

Another way would be a win, a loss, and then 23 wins (WLWWWWWWWWWWWWWWWWWWWWWWW):

$$P = (.50)^{25} = 0.0000000298$$ (.00000298%)

In fact, there are 25 ways to get 24 out of 25 wins, so the answer would be

$$P = 25\,(.50)^{25} = 0.00000075$$ (.000075% - 1 in 1.3 million)

The odds of becoming a movie star are 1 in 1.5 million.

The probability of a given number of wins out of a certain number of tries is given by a binomial distribution: k successes in n tries, where the probability of success is p.

$$P(k, n, p) = \frac{n!}{k!\,(n-k)!}\; p^{k} (1-p)^{n-k}$$

Note: for 24 out of 25 wins, $P(24, 25, .5) = 25\,(.5)^{25}$.

So, the probability that the Patriots get EXACTLY 19 out of 25 wins would be

$$P(19, 25, .5) = \frac{25!}{19!\,(25-19)!}\,(.5)^{19}(.5)^{25-19} = .0052$$ (.52% - 1 in 192)

So, the probability that the Patriots get AT LEAST 19 out of 25 wins would be

$$P = \sum_{i=19}^{25} \frac{25!}{i!\,(25-i)!}\,(.5)^{i}(.5)^{25-i} = .0073$$ (.73% - 1 in 137)

The odds of Notre Dame winning the national title in football this year are 1 in 40.

[Figure: the binomial distribution for 25 tosses, probability (%) against number of wins (0 to 25). Annotations: odds of 12 or less = 50%; 31%; odds of 19 or more = .73%.]

On the other side of the proverbial coin is losing the toss a lot. In 2011, the Cleveland Browns lost 11 in a row.
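The coin-toss figures in this section, including streaks like the one just mentioned, can be checked numerically. The following is a minimal sketch, assuming a fair coin and independent tosses; the function names `binom_pmf` and `binom_at_least` are illustrative and not part of the original slides.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent tries."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_at_least(k, n, p):
    """Probability of k or more successes in n independent tries."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

# Quantities discussed in the slides (fair coin assumed, p = 0.5).
streak_19 = 0.5**19                        # 19 wins in a row
exactly_24 = binom_pmf(24, 25, 0.5)        # 25 ways, each (.5)^25
exactly_19 = binom_pmf(19, 25, 0.5)
at_least_19 = binom_at_least(19, 25, 0.5)
streak_11 = 0.5**11                        # an 11-toss losing streak

for label, value in [
    ("19 in a row", streak_19),
    ("exactly 24 of 25", exactly_24),
    ("exactly 19 of 25", exactly_19),
    ("at least 19 of 25", at_least_19),
    ("11 losses in a row", streak_11),
]:
    print(f"{label}: {value:.10f} (about 1 in {1 / value:,.0f})")
```

Summing the probability mass over every count from 19 to 25 is what turns "exactly 19" into "at least 19"; the "1 in N" figures are just reciprocals of the probabilities.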
$$P = (.50)^{11} = 0.00049$$ (.049% - 1 in 2,040)

The odds of fatally slipping in the shower are 1 in 2,500.

In 2012, the Carolina Panthers lost 12 in a row.

$$P = (.50)^{12} = 0.00024$$ (.024% - 1 in 4,166)

The odds of getting a hole in one in golf are 1 in 5,000.

What are your odds of winning at craps? Easiest bet: playing the Pass Line.

The Game of Craps – Playing the Pass Line
• If you roll a 2, 3, or 12, you lose (“crap out”)
• If you roll a 7 or 11, you win
• If you roll a 4, 5, 6, 8, 9, or 10, the rolled number becomes the “point”
• If you roll the point, you win; if you roll a seven before rolling the point, you lose
• The Pass Line pays even odds

For a single die, each number from 1 to 6 comes up with probability 1/6. For two dice:

| Total on 2 dice | Combinations | Probability | Percentage |
|---|---|---|---|
| 2 | 1+1 | 1/36 | 3% |
| 3 | 1+2, 2+1 | 2/36 | 6% |
| 4 | 1+3, 2+2, 3+1 | 3/36 | 8% |
| 5 | 1+4, 2+3, 3+2, 4+1 | 4/36 | 11% |
| 6 | 1+5, 2+4, 3+3, 4+2, 5+1 | 5/36 | 14% |
| 7 | 1+6, 2+5, 3+4, 4+3, 5+2, 6+1 | 6/36 | 17% |
| 8 | 2+6, 3+5, 4+4, 5+3, 6+2 | 5/36 | 14% |
| 9 | 3+6, 4+5, 5+4, 6+3 | 4/36 | 11% |
| 10 | 4+6, 5+5, 6+4 | 3/36 | 8% |
| 11 | 5+6, 6+5 | 2/36 | 6% |
| 12 | 6+6 | 1/36 | 3% |

Come Out
• Win (7 or 11) = 8/36 ≈ 22%
• Lose (2, 3, or 12) = 4/36 ≈ 11%
• Roll again (a point is set) = 24/36 ≈ 67%
  • 4 or 10 = 6/36 ≈ 17%
  • 5 or 9 = 8/36 ≈ 22%
  • 6 or 8 = 10/36 ≈ 28%

[Figure: probability (%) of each total from 2 to 12 on the come-out roll, grouped as “Craps” (2, 3: 9%), “Point” (4, 5, 6: 33%), “Win” (7: 17%), “Point” (8, 9, 10: 33%), “Win” (11: 6%), and “Craps” (12: 3%).]

About two-thirds of the time (67%), you have a “point” to make.

| Come-out roll | Next roll: Win | Next roll: Lose | Next roll: Roll again |
|---|---|---|---|
| 4 or 10 | 8% | 17% | 75% |
| 5 or 9 | 11% | 17% | 72% |
| 6 or 8 | 14% | 17% | 69% |

So, what's the probability that you win with a pass bet?
| Way to win | Probability |
|---|---|
| Roll a 7 | 6/36 |
| Roll an 11 | 2/36 |
| Roll a 4 and then roll another 4 before rolling a 7 | (3/36) × Pr(4 before a 7) |
| Roll a 5 and then roll another 5 before rolling a 7 | (4/36) × Pr(5 before a 7) |
| Roll a 6 and then roll another 6 before rolling a 7 | (5/36) × Pr(6 before a 7) |
| Roll an 8 and then roll another 8 before rolling a 7 | (5/36) × Pr(8 before a 7) |
| Roll a 9 and then roll another 9 before rolling a 7 | (4/36) × Pr(9 before a 7) |
| Roll a 10 and then roll another 10 before rolling a 7 | (3/36) × Pr(10 before a 7) |

The "before a 7" probabilities are a bit tricky.

What's the probability that you roll a 4 before you roll a 7?

| Total on 2 dice | Probability |
|---|---|
| 4 | 3/36 |
| 7 | 6/36 |
| All other #s | 27/36 |

A useful bit of math (valid for |x| < 1):

$$1 + x + x^{2} + x^{3} + \dots = \frac{1}{1-x}$$

You can roll a 4 right away; or roll something other than a 4 or 7 and then roll a 4; or roll something other than a 4 or 7 twice and then roll a 4; or three times and then a 4; and so on:

$$P(4 \text{ before } 7) = \frac{3}{36} + \frac{27}{36}\cdot\frac{3}{36} + \left(\frac{27}{36}\right)^{2}\frac{3}{36} + \left(\frac{27}{36}\right)^{3}\frac{3}{36} + \dots$$

$$= \frac{3}{36}\left[1 + \frac{27}{36} + \left(\frac{27}{36}\right)^{2} + \left(\frac{27}{36}\right)^{3} + \dots\right] = \frac{3}{36}\cdot\frac{1}{1-\frac{27}{36}} = \frac{3}{36}\cdot\frac{36}{9} = \frac{3}{9}$$

So, what's the probability that you win with a pass bet?

| Event | Probability |
|---|---|
| 4 before a 7 | 3/9 |
| 5 before a 7 | 4/10 |
| 6 before a 7 | 5/11 |
| 8 before a 7 | 5/11 |
| 9 before a 7 | 4/10 |
| 10 before a 7 | 3/9 |

| Way to win | Probability | Percentage (approx.) |
|---|---|---|
| Roll a 7 | 6/36 | 16.67% |
| Roll an 11 | 2/36 | 5.56% |
| Roll a 4 and then roll another 4 before rolling a 7 | (3/36) × (3/9) = 9/324 | 2.78% |
| Roll a 5 and then roll another 5 before rolling a 7 | (4/36) × (4/10) = 16/360 | 4.44% |
| Roll a 6 and then roll another 6 before rolling a 7 | (5/36) × (5/11) = 25/396 | 6.31% |
| Roll an 8 and then roll another 8 before rolling a 7 | (5/36) × (5/11) = 25/396 | 6.31% |
| Roll a 9 and then roll another 9 before rolling a 7 | (4/36) × (4/10) = 16/360 | 4.44% |
| Roll a 10 and then roll another 10 before rolling a 7 | (3/36) × (3/9) = 9/324 | 2.78% |
| Total | 244/495 | 49.3% |

Playing the Pass Line

Win (49.3%) − Loss (50.7%) = −1.4%. This is known as the “House Edge”.

“The Gambler's Ruin”: a gambler playing a negative expected value game will eventually go broke with probability one!

| Event | Probability (approx.) |
|---|---|
| Pass Line Win (7 or 11 on the come-out) | 22.2% |
| Pass Line Loss (2, 3, or 12 on the come-out) | 11.1% |
| 4 or 10 Win | 5.6% |
| 4 or 10 Loss | 11.1% |
| 5 or 9 Win | 8.9% |
| 5 or 9 Loss | 13.3% |
| 6 or 8 Win | 12.6% |
| 6 or 8 Loss | 15.2% |

Expected value measures the average outcome over a large number of attempts, given the probabilities of each outcome:

$$EV = \sum_{i} \Pr(x_i)\, x_i$$

For a $1 Pass Bet

| Event | Probability | Total Bet | Payout | Expected Payout | Expected Total Bet |
|---|---|---|---|---|---|
| Pass Line Win | 22.22% | $1 | $1 | .2222 | .2222 |
| Pass Line Loss | 11.11% | $1 | -$1 | -.1111 | .1111 |
| 4 or 10 Win | 5.56% | $1 | $1 | .0556 | .0556 |
| 4 or 10 Loss | 11.11% | $1 | -$1 | -.1111 | .1111 |
| 5 or 9 Win | 8.89% | $1 | $1 | .0889 | .0889 |
| 5 or 9 Loss | 13.33% | $1 | -$1 | -.1333 | .1333 |
| 6 or 8 Win | 12.63% | $1 | $1 | .1263 | .1263 |
| 6 or 8 Loss | 15.15% | $1 | -$1 | -.1515 | .1515 |
| Total | 100% | | | -.0141 | 1.00 |

Expected percentage loss (House Edge):

$$\frac{\$.0141}{\$1} \times 100 = 1.41\%$$

Suppose that the first roll is a 4. I can now make an additional bet: that a 4 is rolled before a 7. This is called “playing the odds”.

| Point | Probability of point before a 7 | Odds bet payout |
|---|---|---|
| 4 or 10 | 3/9 | 2 to 1 |
| 5 or 9 | 4/10 | 3 to 2 |
| 6 or 8 | 5/11 | 6 to 5 |

The house pays odds equal to the true odds, so the house edge on this additional bet is ZERO! This is the only fair bet in Vegas!
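The pass-line arithmetic above can be reproduced with exact fractions. This is a minimal sketch under the slides' assumptions (two fair dice, even-money pass line, true-odds payouts on the odds bet); the helper names are illustrative. It uses the shortcut that only a roll of the point or a 7 ends the round, which gives the same answer as summing the geometric series.

```python
from fractions import Fraction

# Number of ways to roll each total with two fair dice (1 way for 2, 6 ways for 7, ...).
WAYS = {total: 6 - abs(total - 7) for total in range(2, 13)}

def p_roll(total):
    """Probability of a given total on one roll of two dice."""
    return Fraction(WAYS[total], 36)

def p_point_before_seven(point):
    """Only the point or a 7 ends the round, so condition on those outcomes."""
    return Fraction(WAYS[point], WAYS[point] + WAYS[7])

def pass_line_win_probability():
    win = p_roll(7) + p_roll(11)              # immediate win on the come-out
    for point in (4, 5, 6, 8, 9, 10):         # set a point, then make it
        win += p_roll(point) * p_point_before_seven(point)
    return win

p_win = pass_line_win_probability()
house_edge = (1 - p_win) - p_win              # expected loss per $1 at even money

# Odds bet on a point of 4: pays 2 to 1, wins with probability 3/9.
p4 = p_point_before_seven(4)
odds_bet_ev = p4 * 2 - (1 - p4) * 1

print("P(win pass line) =", p_win, "=", float(p_win))
print("House edge       =", house_edge, "=", float(house_edge))
print("EV of odds bet on a 4 =", odds_bet_ev)
```

Working in `Fraction` rather than floats keeps the 244/495 total exact, so the 1.41% edge is not an artifact of rounding the individual rows.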
Suppose that you can bet twice your initial bet on the odds.

Whatever your initial Pass/Don't Pass wager, you can up your bet on a point as follows:
• You can bet 2X your initial bet if your point is 4 or 10 (pays 2 to 1)
• You can bet 2X your initial bet if your point is 5 or 9 (pays 3 to 2)
• You can bet 2X your initial bet if your point is 6 or 8 (pays 6 to 5)

For a $1 Initial Bet – Playing Pass w/ 2x odds

| Event | Probability | Total Bet | Payout | Expected Payout | Expected Bet |
|---|---|---|---|---|---|
| Pass Line Win | 22.22% | $1 | $1 | .222 | .222 |
| Pass Line Loss | 11.11% | $1 | -$1 | -.111 | .111 |
| 4 or 10 Win (pays 2-1) | 5.56% | $3 | $5 | .278 | .167 |
| 4 or 10 Loss | 11.11% | $3 | -$3 | -.333 | .333 |
| 5 or 9 Win (pays 3-2) | 8.89% | $3 | $4 | .356 | .267 |
| 5 or 9 Loss | 13.33% | $3 | -$3 | -.400 | .400 |
| 6 or 8 Win (pays 6-5) | 12.63% | $3 | $3.40 | .429 | .379 |
| 6 or 8 Loss | 15.15% | $3 | -$3 | -.455 | .455 |
| Total | 100% | | | -.0141 | 2.33 |

Expected percentage loss (House Edge):

$$\frac{\$.0141}{\$2.33} \times 100 = .605\%$$

The expected loss is the same, but your overall bet is bigger, so the percentage loss is smaller!

A common betting system for casino craps is the “3-4-5” system.

Whatever your initial Pass/Don't Pass wager, you can up your bet on a point as follows:
• You can bet 3X your initial bet if your point is 4 or 10 (pays 2 to 1)
• You can bet 4X your initial bet if your point is 5 or 9 (pays 3 to 2)
• You can bet 5X your initial bet if your point is 6 or 8 (pays 6 to 5)

For a $1 Initial Bet – Playing Pass w/ 3-4-5 odds

| Event | Probability | Total Bet | Payout | Expected Payout | Expected Bet |
|---|---|---|---|---|---|
| Pass Line Win | 22.22% | $1 | $1 | .222 | .222 |
| Pass Line Loss | 11.11% | $1 | -$1 | -.111 | .111 |
| 4 or 10 Win (pays 2-1) | 5.56% | $4 | $7 | .389 | .222 |
| 4 or 10 Loss | 11.11% | $4 | -$4 | -.444 | .444 |
| 5 or 9 Win (pays 3-2) | 8.89% | $5 | $7 | .622 | .444 |
| 5 or 9 Loss | 13.33% | $5 | -$5 | -.667 | .667 |
| 6 or 8 Win (pays 6-5) | 12.63% | $6 | $7 | .884 | .758 |
| 6 or 8 Loss | 15.15% | $6 | -$6 | -.909 | .909 |
| Total | 100% | | | -.0141 | 3.77 |

Expected percentage loss (House Edge):

$$\frac{\$.0141}{\$3.77} \times 100 = .374\%$$

The bigger the multiple allowed, the smaller the house edge!
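The effect of the odds multiple on the house edge can be sketched in a few lines. The sketch assumes, as the tables above do, that the odds bet itself has zero expected value, so the expected loss stays at the flat bet's 7/495 per $1 while the expected amount wagered grows. The function name and the dictionary layout are illustrative.

```python
from fractions import Fraction

# Expected loss per $1 flat pass-line bet (the 1.41% edge, as an exact fraction).
FLAT_EXPECTED_LOSS = Fraction(7, 495)

# Probability that each point is established on the come-out roll.
P_POINT = {4: Fraction(3, 36), 5: Fraction(4, 36), 6: Fraction(5, 36),
           8: Fraction(5, 36), 9: Fraction(4, 36), 10: Fraction(3, 36)}

def house_edge_with_odds(odds_multiple):
    """House edge as expected loss divided by expected total amount bet.

    odds_multiple maps each point to the odds bet taken, as a multiple
    of a $1 flat bet. The odds bet pays true odds, so it adds to the
    amount wagered without changing the expected loss.
    """
    expected_bet = 1 + sum(P_POINT[pt] * odds_multiple[pt] for pt in P_POINT)
    return FLAT_EXPECTED_LOSS / expected_bet

no_odds = house_edge_with_odds({pt: 0 for pt in P_POINT})
double_odds = house_edge_with_odds({pt: 2 for pt in P_POINT})
three_four_five = house_edge_with_odds({4: 3, 10: 3, 5: 4, 9: 4, 6: 5, 8: 5})

for label, edge in [("no odds", no_odds), ("2x odds", double_odds),
                    ("3-4-5x odds", three_four_five)]:
    print(f"{label}: {float(edge) * 100:.3f}%")
```

Dividing by the expected total bet, rather than by the $1 flat bet, is the design choice that makes the edge shrink: the dollars lost per round do not change, only the base they are measured against.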
House Edge for other craps bets

| Bet | House Edge |
|---|---|
| Pass/Come | 1.41% |
| Don't Pass/Don't Come | 1.36% |
| Pass/Come (2X odds) | .606% |
| Don't Pass/Don't Come (2X odds) | .466% |
| Place 6 and 8 | 1.52% |
| Place 5 and 9 | 4.00% |
| Place 4 and 10 | 6.67% |
| Buy 6 and 8 | 4.76% |
| Buy 5 and 9 | 4.76% |
| Buy 4 and 10 | 4.76% |
| Lay 6 or 8 | 4.00% |
| Lay 5 or 9 | 3.23% |
| Lay 4 or 10 | 2.44% |
| Field Bet | 5.56% |
| Any Craps | 11.11% |
| 6 or 8 Hard way | 9.09% |
| 4 or 10 Hard way | 11.10% |
| 11 or 3 | 11.10% |
| 2 or 12 | 13.90% |
| Any 7 | 16.70% |

Here's a comparison of casino edges on other games.

Other Games

| Game | House Edge (w/ proper play) |
|---|---|
| Blackjack | 0.5% |
| Video Poker | 0.5% - 5% |
| Baccarat | 1.06% |
| Roulette | 5.5% |
| Slot Machines | 0% - 17% |
| Progressive Slots | 5% - 17% |
| Keno | 25%+ |
| Typical State Lottery | 50%+ |

Craps: House Edge when you Take the Odds

| Table Odds Taken | Pass Line | Don't Pass |
|---|---|---|
| 0x | 1.41% | 1.36% |
| 1x | 0.848% | 0.682% |
| 2x | 0.606% | 0.455% |
| 3x | 0.471% | 0.341% |
| 3-4-5x | 0.374% | 0.273% |
| 5x | 0.326% | 0.227% |
| 10x | 0.184% | 0.124% |
| 20x | 0.099% | 0.065% |
| 100x | 0.021% | 0.014% |

What are the odds that it will be 80 degrees tomorrow in South Bend?

As with the first two examples, this involves a probability distribution. Just as with the coin flip or the dice roll, we can imagine a “truth” out there governing South Bend temperatures. This “truth”, again, is in the form of a probability distribution.
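As a sketch of how such a distribution answers a temperature question, suppose (purely for illustration) that temperature is normally distributed with a mean of 60 degrees and a standard deviation of 15. The distribution family, both parameter values, and the helper `normal_cdf` are assumptions made for this example.

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution with the given mean and sd."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

# Illustrative assumptions, not estimates: mean 60 degrees, sd 15 degrees.
MEAN, SD = 60.0, 15.0

p_above_80 = 1 - normal_cdf(80, MEAN, SD)
p_within_1sd = normal_cdf(MEAN + SD, MEAN, SD) - normal_cdf(MEAN - SD, MEAN, SD)
p_within_2sd = normal_cdf(MEAN + 2 * SD, MEAN, SD) - normal_cdf(MEAN - 2 * SD, MEAN, SD)

print(f"P(temp > 80)     = {p_above_80:.4f}")
print(f"P(within 1 s.d.) = {p_within_1sd:.4f}")
print(f"P(within 2 s.d.) = {p_within_2sd:.4f}")
```

For a continuous distribution the probability of hitting exactly 80 degrees is zero, so the useful questions are about ranges, such as "80 or warmer" or "between 45 and 75".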
[Figure: a normal distribution, probability against x = temperature, with μ = mean and σ = standard deviation; the horizontal axis runs from μ − 3σ to μ + 3σ.]

We can use the normal distribution to get the probability that the temperature lies within various ranges (μ = mean, σ = standard deviation):

| Range | Probability |
|---|---|
| Below μ − 3σ | 0.2% |
| μ − 3σ to μ − 2σ | 2.3% |
| μ − 2σ to μ − σ | 13.5% |
| μ − σ to μ | 34% |
| μ to μ + σ | 34% |
| μ + σ to μ + 2σ | 13.5% |
| μ + 2σ to μ + 3σ | 2.3% |
| Above μ + 3σ | 0.2% |

So 68% of the probability lies within one standard deviation of the mean, 95% within two, and 99.6% within three.

So, for example, with μ = 60 and σ = 15:

| Temperature Range | Probability |
|---|---|
| < 15 | 0.2% |
| 15 – 30 | 2.3% |
| 30 – 45 | 13.5% |
| 45 – 60 | 34% |
| 60 – 75 | 34% |
| 75 – 90 | 13.5% |
| 90 – 105 | 2.3% |
| > 105 | 0.2% |

Conditional distributions give us probabilities conditional on some observable information.

What is the probability that the temperature in South Bend is greater than 15 degrees?
• Unconditional (μ = 60, σ = 15): 99.8%
• Conditional on February (μ = 10, σ = 5): 16%

Part II: Statistics

Statistics is about finding the truth.

Law of large numbers: In statistics, as the number of identically distributed, randomly generated variables increases, their sample mean (average) approaches their theoretical mean. The law of large numbers was first proved by the Swiss mathematician Jakob Bernoulli (1655 - 1705).

As the number of data points increases:

$$\frac{\#\text{ of Heads}}{\text{Total Flips}} \;\rightarrow\; \Pr(\text{Heads}) = \frac{1}{2}$$

Sample mean and sample variance:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad\qquad s^{2} = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^{2}$$

As the number of data points increases, $\bar{x}$ approaches the population mean μ and $s^{2}$ approaches the population variance σ². For a die, for example, the population mean is

$$\mu = \frac{1+2+3+4+5+6}{6} = 3.5$$

Average Monthly Temperatures in Indiana from 1894 - 2016

We have average monthly temperatures (1894 – 2016) for 36 locations across Indiana. This is what we would call a “cross sectional” dataset (multiple observations at a single point in time).

[Figures: histograms of frequency (%) against temperature (20 to 85 degrees) for the full sample and for each conditioning set below.]

Suppose that we condition on “Northern Indiana” or “Southern Indiana”. I could also condition on month(s) of the year, or individual months of the year, or individual months of the year and locations.

| Sample | Average | Std. Dev. | High | Low |
|---|---|---|---|---|
| Indiana (all observations) | 50.7 | 16.3 | 78.1 | 22.1 |
| “Northern Indiana” | 47.9 | 16.8 | 72.6 | 22.1 |
| “Southern Indiana” | 53.1 | 15.7 | 78.1 | 24.2 |
| January - March | 32.7 | 6.5 | 47.5 | 22.1 |
| June - August | 70.8 | 2.6 | 78.8 | 65.1 |
| February | 32.7 | 3.5 | 37.7 | 23.6 |
| July | 72.6 | 1.9 | 78.1 | 69.0 |
| Northern Indiana in February | 25.9 | 1.4 | 27.4 | 23.6 |
| Northern Indiana in July | 70.8 | 1.1 | 72.6 | 69.0 |

For Indiana, $\bar{x} = 50.7$ and $s = 16.3$:

“I'm 95% sure that the temperature for September will be between 18 and 83 degrees”

[Figure: normal distribution with temperature axis marked at 1.8, 18.1, 34.4, 50.7, 67, 83.3, and 99.6; bands of 68%, 95%, and 99.6%.]

So, for example, for Northern Indiana in September, with sample statistics $\bar{x} = 61.9$ and $s = 1.1$:

“I'm 95% sure that the temperature for September will be between 60 and 64 degrees”

[Figure: normal distribution with temperature axis marked at 58.6, 59.7, 60.8, 61.9, 63, 64.1, and 65.2; bands of 68%, 95%, and 99.6%.]

Regressions are about estimating conditional distributions.

Linear regressions make several key assumptions:
• Linear Relationship
• Multivariate Normality
• No or Little Multicollinearity
• No Auto-correlation
• Homoscedasticity

$$y_i = \alpha + \beta x_i + \varepsilon_i$$

• $y_i$ is the explained variable
• $x$ is the independent variable, distributed $N(\mu_x, \sigma_x^{2})$
• α and β are the parameters to be estimated
• ε is the error term, distributed $N(0, \sigma^{2})$, with $Cov(\varepsilon, X) = 0$

Conditional distribution of y given $x_i$:

$$E\left[y \mid x_i\right] = \alpha + \beta x_i \qquad\qquad Var\left(y \mid x_i\right) = \sigma^{2}$$

The OLS (Ordinary Least Squares) method estimates the parameters alpha and beta by minimizing the sum of squared errors:

$$\min_{\hat{\alpha},\,\hat{\beta}} \sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2} \qquad\text{where}\qquad \hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$$

Estimated coefficients:

$$\hat{\beta} = \frac{\sum_{i=1}^{N}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^{2}} \qquad\qquad \hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}$$

We also have a set of error terms:

$$\hat{\varepsilon}_i = y_i - \hat{\alpha} - \hat{\beta} x_i \qquad\qquad E(\varepsilon) = 0 \qquad Var(\varepsilon) = \sigma^{2}$$

These errors are a sampling of the population of errors. Each regression gives us a sample of the distribution of errors (not the entire population of errors). Therefore, the estimated coefficients are not the true coefficients; rather, they are draws from a sampling distribution centered on the true parameter values:

$$E\left[\hat{\alpha}\right] = \alpha \qquad Var\left(\hat{\alpha}\right) = \sigma^{2}\left[\frac{1}{N} + \frac{\mu_x^{2}}{N\sigma_x^{2}}\right] \qquad\qquad E\left[\hat{\beta}\right] = \beta \qquad Var\left(\hat{\beta}\right) = \frac{\sigma^{2}}{N\sigma_x^{2}}$$

A few important things regarding these parameter estimates:

The estimated parameters are drawn from a distribution with a mean equal to the true parameter value, so we are not making biased predictions!
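The claim that the estimates are centered on the true parameter can be illustrated with a small simulation. This is a sketch only: the true values (α = 1, β = 2, σ = 1), the standard normal regressor, the sample sizes, and the seed are all arbitrary choices made for illustration.

```python
import random

def ols(x, y):
    """Closed-form OLS estimates (alpha_hat, beta_hat) for y = alpha + beta*x + e."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    beta_hat = sxy / sxx
    return ybar - beta_hat * xbar, beta_hat

def simulate_betas(n_obs, n_samples, alpha=1.0, beta=2.0, sigma=1.0, seed=0):
    """Draw many samples from a known model and collect the slope estimates."""
    rng = random.Random(seed)
    betas = []
    for _ in range(n_samples):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        y = [alpha + beta * xi + rng.gauss(0, sigma) for xi in x]
        betas.append(ols(x, y)[1])
    return betas

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

small = simulate_betas(n_obs=20, n_samples=1000)
large = simulate_betas(n_obs=200, n_samples=1000)

print(f"N = 20:  mean of slope estimates = {mean(small):.3f}, variance = {variance(small):.4f}")
print(f"N = 200: mean of slope estimates = {mean(large):.3f}, variance = {variance(large):.4f}")
```

Both sets of estimates should average out near the true slope of 2, while the spread of the estimates should be much tighter with 200 observations per sample than with 20.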
$$E\left[\hat{\alpha}\right] = \alpha \qquad\qquad E\left[\hat{\beta}\right] = \beta$$

$$Var\left(\hat{\beta}\right) = \frac{\sigma^{2}}{N\sigma_x^{2}} \qquad\qquad Var\left(\hat{\alpha}\right) = \sigma^{2}\left[\frac{1}{N} + \frac{\mu_x^{2}}{N\sigma_x^{2}}\right]$$

The parameters $\sigma^{2}$, $\mu_x$, and $\sigma_x^{2}$ are unknown, so we need to estimate them from the data.

1) The variance of the parameter estimates is smaller (the estimates are more precise) when the variance of x is large.
2) As the number of observations gets large, the variance approaches zero: we learn the truth! This is the law of large numbers at work.

Sample estimates approach the population parameters as the number of observations gets big:

$$\hat{\sigma}^{2} = \frac{\sum_{i=1}^{N}\hat{\varepsilon}_i^{2}}{N-2} \rightarrow \sigma^{2} \qquad\qquad \bar{x} \rightarrow \mu_x \qquad\qquad Var(x) \rightarrow \sigma_x^{2}$$

$$\widehat{Var}\left(\hat{\alpha}\right) = \hat{\sigma}^{2}\left[\frac{1}{N} + \frac{\bar{x}^{2}}{(N-1)\,Var(x)}\right] \rightarrow 0 \qquad\qquad \widehat{Var}\left(\hat{\beta}\right) = \frac{\hat{\sigma}^{2}}{(N-1)\,Var(x)} \rightarrow 0$$

We also have some additional “diagnostics” to check the performance of the regression:

$$TSS = RSS + SSE$$

• Total Sum of Squares, $TSS = \sum_{i=1}^{N}\left(y_i - \bar{y}\right)^{2}$: the total variation in the data we are trying to explain
• Regression Sum of Squares, $RSS = \sum_{i=1}^{N}\left(\hat{y}_i - \bar{\hat{y}}\right)^{2}$: the total variation in the data we have actually explained
• Sum of Squared Residuals, $SSE = \sum_{i=1}^{N}\hat{\varepsilon}_i^{2}$: the total variation in the data left unexplained

R Squared of the regression, the percentage of the variation of Y explained by the regression:

$$R^{2} = \frac{RSS}{TSS}$$

Standard Error of the regression, the average error of our estimates:

$$SE = \sqrt{\frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2}}{N-2}}$$

If we would like to make a forecast using our regression data, we need to calculate the conditional distribution:

$$E\left[y_0 \mid x_0\right] = \hat{\alpha} + \hat{\beta} x_0 \qquad\qquad Var\left(y_0 \mid x_0\right) = \hat{\sigma}^{2}\left[1 + \frac{1}{N} + \frac{\left(x_0 - \bar{x}\right)^{2}}{(N-1)\,Var(x)}\right]$$

1) Note that since our estimates are unbiased, our forecasts will also be unbiased!
2) As our sample size gets bigger, the variance of our forecasts goes down (our forecasts get more precise).
3) If the variance of X is big, we get better forecasts.

A forecast is only as good as the error attached to it!
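The estimation and forecasting formulas above fit in a short script. The sketch below applies them to the fifteen cricket observations (chirps per second, temperature) used in the worked example that follows in these notes; the function names `fit_ols` and `forecast` and the returned dictionary are illustrative choices, not part of the slides.

```python
from math import sqrt

# Striped ground cricket data from the worked example in these notes.
chirps = [20.0, 16.0, 19.8, 18.4, 17.1, 15.5, 14.7, 17.1,
          15.4, 16.2, 15.0, 17.2, 16.0, 17.0, 14.4]
temps = [88.6, 71.6, 93.3, 84.3, 80.6, 75.2, 69.7, 82.0,
         69.4, 83.3, 79.6, 82.6, 80.6, 83.5, 76.3]

def fit_ols(x, y):
    """Closed-form OLS for y = alpha + beta*x, plus the usual diagnostics."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)            # equals (N-1) * Var(x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    sse = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - ybar) ** 2 for yi in y)
    sigma2 = sse / (n - 2)                             # estimated error variance
    return {"alpha": alpha, "beta": beta, "sigma2": sigma2, "n": n,
            "xbar": xbar, "sxx": sxx, "r2": 1 - sse / tss,
            "se_beta": sqrt(sigma2 / sxx),
            "se_alpha": sqrt(sigma2 * (1 / n + xbar ** 2 / sxx))}

def forecast(fit, x0):
    """Conditional mean and standard deviation of y at x0."""
    mean = fit["alpha"] + fit["beta"] * x0
    var = fit["sigma2"] * (1 + 1 / fit["n"] + (x0 - fit["xbar"]) ** 2 / fit["sxx"])
    return mean, sqrt(var)

fit = fit_ols(temps, chirps)
y_hat, y_sd = forecast(fit, 85.0)
print(f"alpha = {fit['alpha']:.3f}, beta = {fit['beta']:.4f}, R^2 = {fit['r2']:.3f}")
print(f"forecast at 85F: {y_hat:.2f} +/- {2 * y_sd:.2f} (two standard deviations)")
```

Because the forecast variance grows with the squared distance of `x0` from the sample mean, calling `forecast` at temperatures far from 80 degrees gives visibly wider bands than near the middle of the data.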
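[Figure: the fitted line $\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$ with a 95% confidence interval of $\hat{y}_i \pm 2\hat{\sigma}$ around the forecast $E\left[y_0 \mid x_0\right] = \hat{\alpha} + \hat{\beta} x_0$.]

We always get the best forecast at the sample average $\bar{x}$:

$$Var\left(y_0 \mid x_0\right) = \hat{\sigma}^{2}\left[1 + \frac{1}{N} + \frac{\left(x_0 - \bar{x}\right)^{2}}{(N-1)\,Var(x)}\right]$$

Example: Does the striped ground cricket chirp differently at different temperatures?

$$\text{Chirps} = \alpha + \beta\,\text{Temp.}$$

| Chirps/Sec. | Temperature (F) |
|---|---|
| 20.0 | 88.6 |
| 16.0 | 71.6 |
| 19.8 | 93.3 |
| 18.4 | 84.3 |
| 17.1 | 80.6 |
| 15.5 | 75.2 |
| 14.7 | 69.7 |
| 17.1 | 82.0 |
| 15.4 | 69.4 |
| 16.2 | 83.3 |
| 15.0 | 79.6 |
| 17.2 | 82.6 |
| 16.0 | 80.6 |
| 17.0 | 83.5 |
| 14.4 | 76.3 |

[Figure: scatter plot of chirps per second (10 to 22) against temperature (65 to 95 F).]

With y = chirps per second and x = temperature (F):

| y | x | y − ȳ | x − x̄ | (y − ȳ)(x − x̄) | (x − x̄)² |
|---|---|---|---|---|---|
| 20.0 | 88.6 | 3.3 | 8.6 | 28.6 | 73.3 |
| 16.0 | 71.6 | -0.7 | -8.4 | 5.5 | 71.2 |
| 19.8 | 93.3 | 3.1 | 13.3 | 41.7 | 175.8 |
| 18.4 | 84.3 | 1.7 | 4.3 | 7.4 | 18.1 |
| 17.1 | 80.6 | 0.4 | 0.6 | 0.3 | 0.3 |
| 15.5 | 75.2 | -1.2 | -4.8 | 5.6 | 23.4 |
| 14.7 | 69.7 | -2.0 | -10.3 | 20.2 | 106.9 |
| 17.1 | 82.0 | 0.4 | 2.0 | 0.9 | 3.8 |
| 15.4 | 69.4 | -1.3 | -10.6 | 13.3 | 113.2 |
| 16.2 | 83.3 | -0.5 | 3.3 | -1.5 | 10.6 |
| 15.0 | 79.6 | -1.7 | -0.4 | 0.7 | 0.2 |
| 17.2 | 82.6 | 0.5 | 2.6 | 1.4 | 6.6 |
| 16.0 | 80.6 | -0.7 | 0.6 | -0.4 | 0.3 |
| 17.0 | 83.5 | 0.3 | 3.5 | 1.2 | 12.0 |
| 14.4 | 76.3 | -2.3 | -3.7 | 8.4 | 14.0 |
| Average = 16.7 | Average = 80.0 | | | Sum = 133.5 | Sum = 629.8 |

$$\hat{\beta} = \frac{\sum_{i=1}^{N}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^{2}} = \frac{133.5}{629.8} = .21 \qquad\qquad \hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x} = 16.7 - .21\,(80) = -.309$$

(The intercept is computed from the unrounded values.)

| Chirps Per Sec | Temperature (F) | Predicted | Error | Squared Error |
|---|---|---|---|---|
| 20.0 | 88.6 | 18.5 | -1.5 | 2.3 |
| 16.0 | 71.6 | 14.9 | -1.1 | 1.3 |
| 19.8 | 93.3 | 19.5 | -0.3 | 0.1 |
| 18.4 | 84.3 | 17.6 | -0.8 | 0.7 |
| 17.1 | 80.6 | 16.8 | -0.3 | 0.1 |
| 15.5 | 75.2 | 15.6 | 0.1 | 0.0 |
| 14.7 | 69.7 | 14.5 | -0.2 | 0.1 |
| 17.1 | 82.0 | 17.1 | 0.0 | 0.0 |
| 15.4 | 69.4 | 14.4 | -1.0 | 1.0 |
| 16.2 | 83.3 | 17.3 | 1.1 | 1.3 |
| 15.0 | 79.6 | 16.6 | 1.6 | 2.4 |
| 17.2 | 82.6 | 17.2 | 0.0 | 0.0 |
| 16.0 | 80.6 | 16.8 | 0.8 | 0.6 |
| 17.0 | 83.5 | 17.4 | 0.4 | 0.1 |
| 14.4 | 76.3 | 15.9 | 1.5 | 2.1 |
| Average = 16.7, Variance = 2.89 | Average = 80.0, Variance = 44.98 | | | Sum = 12.3 |

$$\hat{\sigma}^{2} = \frac{\sum_{i=1}^{N}\hat{\varepsilon}_i^{2}}{N-2} = \frac{12.3}{13} = .946$$

$$\widehat{Var}\left(\hat{\beta}\right) = \frac{\hat{\sigma}^{2}}{(N-1)\,Var(x)} = \frac{.946}{14\,(44.98)} = .00149 \qquad\qquad SE\left(\hat{\beta}\right) = \sqrt{.00149} = .0387$$

$$\widehat{Var}\left(\hat{\alpha}\right) = \hat{\sigma}^{2}\left[\frac{1}{N} + \frac{\bar{x}^{2}}{(N-1)\,Var(x)}\right] = .946\left[\frac{1}{15} + \frac{80^{2}}{14\,(44.98)}\right] = 9.66 \qquad\qquad SE\left(\hat{\alpha}\right) = \sqrt{9.66} = 3.109$$

| Actual y | Predicted ŷ | (y − ȳ)² | (ŷ − mean of ŷ)² | (y − ŷ)² |
|---|---|---|---|---|
| 20.0 | 18.5 | 11.2 | 3.3 | 2.3 |
| 16.0 | 14.9 | 0.4 | 3.2 | 1.3 |
| 19.8 | 19.5 | 9.9 | 7.9 | 0.1 |
| 18.4 | 17.6 | 3.1 | 0.8 | 0.7 |
| 17.1 | 16.8 | 0.2 | 0.0 | 0.1 |
| 15.5 | 15.6 | 1.3 | 1.1 | 0.0 |
| 14.7 | 14.5 | 3.8 | 4.8 | 0.1 |
| 17.1 | 17.1 | 0.2 | 0.2 | 0.0 |
| 15.4 | 14.4 | 1.6 | 5.1 | 1.0 |
| 16.2 | 17.3 | 0.2 | 0.5 | 1.3 |
| 15.0 | 16.6 | 2.7 | 0.0 | 2.4 |
| 17.2 | 17.2 | 0.3 | 0.3 | 0.0 |
| 16.0 | 16.8 | 0.4 | 0.0 | 0.6 |
| 17.0 | 17.4 | 0.1 | 0.5 | 0.1 |
| 14.4 | 15.9 | 5.1 | 0.6 | 2.1 |
| Average = 16.7 | Average = 16.7 | Sum = 40.6 (TSS) | Sum = 28.3 (RSS) | Sum = 12.3 (SSE) |

R Squared of the regression:

$$R^{2} = \frac{RSS}{TSS} = \frac{28.3}{40.6} = .70$$

Standard Error of the regression:

$$SE = \sqrt{\frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2}}{N-2}} = \sqrt{\frac{12.3}{13}} = .97$$

So, let's calculate a prediction for the number of chirps at a temperature of 85 degrees, using Chirps = −.309 + .211 Temp.:

$$E\left[y_0 \mid x_0\right] = \hat{\alpha} + \hat{\beta} x_0 = -.309 + .211\,(85) = 17.7$$

$$Var\left(y \mid 85\right) = \hat{\sigma}^{2}\left[1 + \frac{1}{N} + \frac{\left(x_0 - \bar{x}\right)^{2}}{(N-1)\,Var(x)}\right] = .946\left[1 + \frac{1}{15} + \frac{(85 - 80)^{2}}{14\,(44.98)}\right] = 1.05$$

$$SD\left(y \mid 85\right) = \sqrt{1.05} = 1.02$$

95% Confidence Interval: 17.7 +/- 2(1.02) = [15.7, 19.7]

[Figure: the cricket scatter plot with the fitted line and the forecast of 17.7 chirps/sec. at 85 F.]

Here are the results of the regression done in Excel.

$$\text{Mult } R = \sqrt{R^{2}} = \sqrt{.70} = .84 \qquad \text{Adj } R^{2} = 1 - \left(1 - R^{2}\right)\frac{N-1}{N-p-1} = 1 - (1 - .70)\,\frac{15-1}{15-1-1} = .67 \qquad F = \frac{RSS}{MSE} = \frac{28.29}{.94} = 29.97$$

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.84 |
| R Square | 0.70 |
| Adjusted R Square | 0.67 |
| Standard Error | 0.97 |
| Observations | 15 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 28.29 | 28.29 | 29.97 | 0.00 |
| Residual | 13 | 12.27 | 0.94 | | |
| Total | 14 | 40.56 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | -0.31 | 3.11 | -0.10 | 0.92 | -7.02 | 6.41 |
| Temperature (F) | 0.21 | 0.04 | 5.47 | 0.00 | 0.13 | 0.30 |

How does taking LSD affect your performance on a math test?

$$\text{Score} = \alpha + \beta\,\text{Conc.}$$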
[Figure: scatter plot of test score (0 to 90) against tissue concentration (1 to 7).]

Data source: “Correlation of performance test scores with ‘tissue concentration’ of Lysergic Acid Diethylamide in human subjects”, John Wagner, George Aghajanian, and Oscar Bing, March 22, 1968.

How does taking LSD affect your performance on a math test?

$$\text{Score} = 90.3 - 9.6\,\text{Conc.}$$

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.963 |
| R Square | 0.927 |
| Adjusted R Square | 0.912 |
| Standard Error | 5.710 |
| Observations | 7 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 2056.44358 | 2056.44358 | 63.06387 | 0.00051 |
| Residual | 5 | 163.0445059 | 32.60890117 | | |
| Total | 6 | 2219.488086 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 90.29465244 | 5.647752669 | 15.98771365 | 1.74E-05 | 75.77664 | 104.8127 |
| Tissue Conc (x) | -9.565531392 | 1.204533226 | -7.941276492 | 0.00051 | -12.6619 | -6.46918 |

What would your predicted score be with a concentration of 4?

Using Score = 90.3 − 9.6 Conc., with Mean x = 4.3 and Var(x) = 3.7:

$$E\left[y_0 \mid x_0\right] = 90.3 - 9.6\,(4) = 51.9$$

$$Var\left(y \mid 4\right) = 32.6\left[1 + \frac{1}{7} + \frac{(4 - 4.3)^{2}}{6\,(3.7)}\right] = 37.4 \qquad\qquad SD\left(y \mid 4\right) = \sqrt{37.4} = 6.1$$

95% Confidence Interval: 51.9 +/- 12.2 = [39.7, 64.1]

What about the possibility of a non-linear relationship between LSD usage and math performance?

$$\text{Score} = \alpha + \beta\,\text{Conc.} \qquad\qquad \beta = \frac{\Delta\,\text{Score}}{\Delta\,\text{Conc.}}$$

Beta measures the unit change in test score per unit change in LSD concentration.

VS

$$\text{Score} = \alpha\, e^{\beta\,\text{Conc.}} \qquad\qquad \beta = \frac{\%\Delta\,\text{Score}}{\Delta\,\text{Conc.}}$$

Beta measures the percentage change in test score per unit change in LSD concentration.

Both functional forms indicate a negative relationship, but one is linear while the other is non-linear.

[Figure: the two curves, Score = α + β Conc. and Score = α e^(β Conc.), plotted for concentrations from 0 to 7 and scores from 0 to 120.]

I can estimate this nonlinear relationship through a transformation of variables.

$$\text{Score} = \alpha\, e^{\beta\,\text{Conc.}}$$

Take the natural log of both sides:

$$\ln\left(\text{Score}\right) = \ln\left(\alpha\, e^{\beta\,\text{Conc.}}\right)$$

A little math here:

$$\ln\left(\text{Score}\right) = \ln\alpha + \beta\,\text{Conc.}$$

Define a new constant:

$$\ln\left(\text{Score}\right) = \text{Constant} + \beta\,\text{Conc.}$$

How does taking LSD affect your performance on a math test?

$$\ln\left(\text{Score}\right) = 4.64 - .19\,\text{Conc.}$$
SUMMARY OUTPUT ln 4.64 Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 0.961663013 0.924795751 0.909754901 0.114937569 7 e4.64 103.5 Score 103.5.e.19Conc ANOVA df Regression Residual Total Intercept Tissue Conc (x) 1 5 6 Coefficients 4.648346245 -0.19010758 SS 0.812264483 0.066053224 0.878317707 Standard Error 0.113676212 0.024244471 MS 0.812264483 0.013210645 Significan F ce F 61.4856 0.000541 Lower Upper Lower Upper t Stat P-value 95% 95% 95.0% 95.0% 40.89110775 1.65E-07 4.356132 4.94056 4.356132 4.94056 -7.841275714 0.000541 -0.25243 -0.12779 -0.25243 -0.12779 What would your predicted score be with a concentration of 4? ln Score 4.64 .19 Conc. 90 Mean x = 4.3 Var(x) = 3.7 80 E ln y / 4 4.64 .194 3.88 e 3.88 48.4 1 4 4.32 Var ln y / 4 .0131 .015 7 6 3 . 7 SDln y / 4 .015 .122 Test Score 70 60 50 40 30 20 12.2% 10 0 95% Confidence Interval 48.4 +/- 24.4% 1 2 3 4 Tissue Concentration 60.2, 36.6 5 6 7 A linear regression can capture several different non-linear relationships by transforming the variables! Functional Form y x y e x y ln x e y x e Regression Equation y x ln y x y ln ln x ln y ln ln x Interpretation A one unit change in X causes a Beta units change in Y A one unit change in X causes a Beta percent change in Y A one percent change in X causes a Beta units change in Y A one percent change in X causes a Beta percent change in Y “Northern Indiana” 7 Sample Statistics • Average = 47.9 • Std. Dev. = 16.8 6 Frequency (%) 5 4 3 2 1 0 20 25 30 35 40 45 50 55 60 65 70 75 80 85 Indiana 4.5 Sample Statistics • Average = 50.7 • Std. Dev. = 16.3 4 Frequency (%) 3.5 3 2.5 2 1.5 1 0.5 0 20 25 30 35 40 45 50 55 60 65 70 75 80 85 I could accomplish the same thing with a “temperature dummy” Distribution for Northern Indiana Temperature DN E Temp / 1 51.8 3.9 47.9 1, if North DN 0, if Not 2 1 1 .28 Var Temp / 1 263.11 265.3 432 431 . 
20 SDTemp / 1 265.3 16.3 SUMMARY OUTPUT Regression Statistics Multiple R 0.107015815 R Square 0.011452385 Adjusted R Square 0.009153437 Standard Error 16.22029062 Observations Average Temperature for “Not Northern Indiana” Mean x = .28 Var(x) = .20 Temperature 51.8 3.9 DN 432 ANOVA df Regression Residual 1 430 SS 1310.641907 113132.0659 Total 431 114442.7079 Coefficients 51.79651334 Standard Error 0.918293128 t Stat 56.40520632 P-value 8.0168E-201 Lower 95% 49.99161169 Upper 95% 53.60141498 -3.888803317 1.742338706 -2.231944514 0.026133096 -7.313363408 -0.464243226 Intercept North MS 1310.641907 263.0978278 F 4.981576314 Significance F 0.026133096 “Southern Indiana” 7 Frequency (%) 6 5 Sample Statistics • Average = 53.1 • Std. Dev. = 15.7 4 3 2 1 0 20 25 30 35 40 45 50 55 60 65 70 75 80 Indiana 4.5 4 Sample Statistics • Average = 50.7 • Std. Dev. = 16.3 3.5 Frequency (%) 85 3 2.5 2 1.5 1 0.5 0 20 25 30 35 40 45 50 55 60 65 70 75 80 85 Suppose that I repeat the process for Southern Indiana Distribution for Northern Indiana Temperature DS E Temp /1 49.3 3.8 53.1 1, if South DS 0, if Not 2 1 .39 1 Var Temp /1 262.7 1 264.2 432 431 .24 SD Temp /1 264.2 16.3 Mean x = .39 Var(x) = .24 SUMMARY OUTPUT Regression Statistics Multiple R 0.114475818 R Square 0.013104713 Adjusted R Square 0.010809608 Standard Error 16.20672908 Observations 432 Average Temperature for “Not Southern Indiana” Temperature 49.3 3.8 DS ANOVA df Regression Residual Total Intercept South 1 430 431 SS 1499.738841 112942.969 114442.7079 MS 1499.738841 262.6580675 F 5.709852566 Significance F 0.017299906 Coefficients 49.22994863 3.822021173 Standard Error 0.997455223 1.59948673 t Stat 49.35554748 2.389529779 P-value 3.1797E-179 0.017299906 Lower 95% 47.26945418 0.678236097 Upper 95% 51.19044307 6.96580625 Temperature N DN S DS 1, if North DN 0, if Not 1, if South DS 0, if Not Northern Indiana E Temp / DN 1 50.3 2.7 47.6 Southern Indiana E Temp / DS 1 50.3 2.9 53.2 SUMMARY OUTPUT Average Temperature for 
$$\text{Temperature} = 50.3 - 2.7\,D_N + 2.9\,D_S$$

(The intercept, 50.3, is the average for Central Indiana.)

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.134093809 |
| R Square | 0.017981149 |
| Adjusted R Square | 0.013402973 |
| Standard Error | 16.18547051 |
| Observations | 432 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 2057.811439 | 1028.905719 | 3.927578951 | 0.020403391 |
| Residual | 429 | 112384.8964 | 261.9694555 | | |
| Total | 431 | 114442.7079 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 50.34616241 | 1.255855211 | 40.0891456 | 3.6261E-147 | 47.87776753 | 52.81455728 |
| North | -2.728522571 | 1.869422778 | -1.459553507 | 0.145144669 | -6.402890097 | 0.945844954 |
| South | 2.900701864 | 1.717587401 | 1.68882344 | 0.09197992 | -0.475231843 | 6.27663557 |

Be mindful of what hypothesis you are testing!

Northern Dummy Only (t-stats in parentheses)

$$\text{Temperature} = 51.8 - 3.9\,D_N \qquad (-2.2)$$

Hypothesis: Northern Indiana has a different average temperature than the rest of the state. Significant.

Southern Dummy Only (t-stats in parentheses)

$$\text{Temperature} = 49.3 + 3.8\,D_S \qquad (2.4)$$

Hypothesis: Southern Indiana has a different average temperature than the rest of the state. Significant.

Northern and Southern Dummy (t-stats in parentheses)

$$\text{Temperature} = 50.3 - 2.7\,D_N + 2.9\,D_S \qquad (-1.5),\ (1.7)$$

Hypothesis: Northern and Southern Indiana have a different average temperature than Central Indiana. Not Significant!!!

Suppose I put in dummies for all three regions…

$$\text{Temperature} = \alpha + \beta_N D_N + \beta_C D_C + \beta_S D_S, \qquad D_C = \begin{cases} 1, & \text{if Central} \\ 0, & \text{if not} \end{cases}$$

There is no other region, so we know that

$$D_N + D_C + D_S = 1$$

One of our assumptions is violated! The three dummies always sum to one, so they are perfectly collinear with the constant term.

• Linear Relationship
• Multivariate Normality
• No or Little Multicollinearity
• No Auto-correlation
• Homoscedasticity

Example: The Famous 2000 Election

Al Gore (Democrat), Pat Buchanan (Reform Party), George W. Bush (Republican), Ralph Nader (Green Party)

The Case of Palm Beach County

Overall State Results

| Candidate | Vote Total | Percentage |
|---|---|---|
| George W. Bush | 2,909,815 | 49.039 |
| Al Gore | 2,909,578 | 49.035 |
| Ralph Nader | 96,844 | 1.633 |
| Pat Buchanan | 17,358 | .293 |
| Total | 5,933,595 | 100.000 |

Palm Beach County Results

| Candidate | Vote Total | Percentage |
|---|---|---|
| George W. Bush | 152,954 | 35.44 |
| Al Gore | 269,696 | 62.48 |
| Ralph Nader | 5,564 | 1.29 |
| Pat Buchanan | 3,407 | .79 |
| Total | 431,621 | 100.00 |

Did Pat Buchanan REALLY get 3,407 votes in Palm Beach County?

The Strategy: Use available data on demographics from the counties in Florida (omitting Palm Beach County) to estimate a relationship between demographics and Pat Buchanan's vote total.

$$B = F(D)$$

Pat Buchanan's votes ($B$) "are a function of" observable demographics ($D$).

| Demographic | Statewide Average | Palm Beach |
|---|---|---|
| % Black | 15.9% | 14.4% |
| % Hispanic | 6.3% | 9.8% |
| % Over 65 yrs. | 16.9% | 23.7% |
| % College Degree | 13.9% | 22.1% |
| Income (in thousands) | 26.188 | 33.518 |

Using Palm Beach demographics, forecast Pat Buchanan's vote total for Palm Beach:

$$B_{PB} = F(D_{PB})$$

Turns out, the best fitting regression was as follows:

$$\ln(P) = \alpha + \beta_1 B + \beta_2 A_{65} + \beta_3 H + \beta_4 C + \beta_5 I, \qquad P = \frac{\text{Buchanan Votes}}{\text{Total Votes}} \times 100$$

| Variable | Coefficient | Standard Error | t-statistic |
|---|---|---|---|
| Intercept | 2.146 | .396 | 5.48 |
| Black (%) | -.0132 | .0057 | -2.88 |
| Age 65 (%) | -.0415 | .0057 | -5.93 |
| Hispanic (%) | -.0349 | .0050 | -6.08 |
| College (%) | -.0193 | .0068 | -1.99 |
| Income (000s) | -.0658 | .00113 | -4.58 |

R Squared = .73

Plugging in the Palm Beach demographics (% Black 14.4, % Over 65 23.7, % Hispanic 9.8, % College Degree 22.1, Income 33.518):

$$\ln(P) = 2.146 - .0132B - .0415A_{65} - .0349H - .0193C - .0658I$$

$$E[\ln P \mid D] = -2.004, \qquad P = e^{-2.004} = .134\%, \qquad .00134 \times 431{,}621 = 578$$

This would be our prediction for Pat Buchanan's vote total!

+/- 2 Standard Deviation Confidence Interval

$$Var(\ln P \mid D) = .065, \qquad SD(\ln P \mid D) = \sqrt{.065} = .2556 \ (25.56\%)$$

$$\text{Upper: } .134\,(1.5112) = .2025\%, \qquad .002025 \times 431{,}621 = 874$$

$$\text{Lower: } .134\,(0.4888) = .065\%, \qquad .00065 \times 431{,}621 = 280$$

[Figure: distribution of the predicted vote total, centered at 578 with a +/- 2 standard deviation band from 280 to 874; the actual 3,407 votes lie far in the right tail.]

The actual total is 7 standard deviations from the mean: odds of 1 in 390,882,215,445. For comparison:

| Event | Odds |
|---|---|
| Win the Powerball | 1 in 292,000,000 |
| Struck by Lightning | 1 in 960,000 |
| Crushed by a Vending Machine | 1 in 112,000,000 |
| Becoming a Movie Star | 1 in 1,505,000 |
| Having Identical Quadruplets | 1 in 15,000,000 |

Speaking of elections, who will win this year's election?

Let's ask Ray Fair (Yale University)….he should have a pretty good idea!

The Fair Presidential Election Model

$$V = \alpha + \beta_1\, G \cdot I + \beta_2\, P \cdot I + \beta_3\, Z \cdot I + \beta_4\, DPER + \beta_5\, DUR + \beta_6\, I$$

where $V$ is the Democratic share of the two party presidential vote.

| Variable | Description | Coefficient (t-statistic) |
|---|---|---|
| Const | Constant | 47.75 (79.15) |
| G | Average annual growth in real per capita GDP (first three quarters of election year) | .667 (5.79) |
| P | Average annual growth in GDP deflator (for first 15 quarters of the current administration) | -.690 (-2.34) |
| Z | # of quarters of the current administration in which annual real GDP per capita growth exceeds 3.2% | .968 (4.03) |
| DPER | 1 if Democratic incumbent is running again, -1 if Republican incumbent is running again, otherwise 0 | 3.01 (2.14) |
| DUR | 1 (-1) if Democrat (Republican) has been in office for 2 terms, 0 if either party has been in for 1 term | -3.80 (-3.10) |
| I | 1 if a Democrat is the incumbent, -1 if a Republican is the incumbent | -1.56 (-0.71) |

R Squared = .912

Predictors for the 2016 Presidential Election

| Description | Value |
|---|---|
| Average annual growth in real per capita GDP (first three quarters of election year) | .87% (Estimated) |
| Average annual growth in GDP deflator (for first 15 quarters of the current administration) | 1.28% |
| # of quarters of the current administration in which annual real GDP per capita growth exceeds 3.2% | 3 quarters out of 15 |
| 1 if Democratic incumbent is running again, -1 if Republican incumbent is running again, otherwise 0 | 0 (No) |
| 1 (-1) if Democrat (Republican) has been in office for 2 terms, 0 if either party has been in for 1 term | 1 (2 Terms) |
| Democrat incumbent | 1 (Yes) |

Since 1908, the Fair Model has correctly predicted 23 out of 27 elections (85% Success Rate). He predicted every election between 1908 and 1960 correctly!!

| Election Year | Candidates | Predicted Democrat | Predicted Republican | Actual Democrat | Actual Republican |
|---|---|---|---|---|---|
| 1960 | Kennedy (D) vs. Nixon (R) | 51.3 | 48.7 | 50.1 | 49.9 |
| 1964 | Johnson (D) vs. Goldwater (R) | 55.3 | 44.7 | 61.3 | 38.7 |
| 1968 | Humphrey (D) vs. Nixon (R) | 49.0 | 51.0 | 49.6 | 50.4 |
| 1972 | McGovern (D) vs. Nixon (R) | 39.9 | 60.1 | 38.2 | 61.8 |
| 1976 | Carter (D) vs. Ford (R) | 49.2 | 50.8 | 51.1 | 48.9 |
| 1980 | Carter (D) vs. Reagan (R) | 46.6 | 53.4 | 44.7 | 55.3 |
| 1984 | Mondale (D) vs. Reagan (R) | 42.8 | 57.2 | 40.1 | 59.9 |
| 1988 | Dukakis (D) vs. Bush (R) | 45.0 | 55.0 | 46.0 | 54.0 |
| 1992 | Clinton (D) vs. Bush (R) | 48.8 | 51.2 | 53.6 | 46.4 |
| 1996 | Clinton (D) vs. Dole (R) | 53.2 | 46.8 | 54.7 | 45.3 |
| 2000 | Gore (D) vs. Bush (R) | 49.3 | 50.7 | 50.3 | 49.7 |
| 2004 | Kerry (D) vs. Bush (R) | 45.5 | 54.5 | 48.8 | 51.2 |
| 2008 | Obama (D) vs. McCain (R) | 55.2 | 44.8 | 53.7 | 46.3 |
| 2012 | Obama (D) vs. Romney (R) | 49.0 | 51.0 | 51.3 | 48.7 |

And the winner is…..
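The 2016 forecast is just the estimated equation evaluated at the predictor values listed above. A minimal sketch of that arithmetic follows; the coefficients and inputs are the slide values, while the function and variable names (`fair_vote_share`, `COEFFS`, `v_2016`) are made up here for illustration.

```python
# Coefficients as reported on the Fair model slide (names are illustrative).
COEFFS = {
    "const": 47.75,
    "G": 0.667,     # growth in real per capita GDP, interacted with I
    "P": -0.690,    # growth in GDP deflator, interacted with I
    "Z": 0.968,     # number of strong-growth quarters, interacted with I
    "DPER": 3.01,
    "DUR": -3.80,
    "I": -1.56,
}


def fair_vote_share(G, P, Z, DPER, DUR, I, coeffs=COEFFS):
    """Democratic share of the two party vote implied by the equation for V."""
    return (
        coeffs["const"]
        + coeffs["G"] * G * I
        + coeffs["P"] * P * I
        + coeffs["Z"] * Z * I
        + coeffs["DPER"] * DPER
        + coeffs["DUR"] * DUR
        + coeffs["I"] * I
    )


# 2016 predictor values from the slide: G = .87, P = 1.28, Z = 3,
# DPER = 0, DUR = 1, I = 1 (Democratic incumbent party).
v_2016 = fair_vote_share(G=0.87, P=1.28, Z=3, DPER=0, DUR=1, I=1)
print(round(v_2016, 1))        # Democratic share
print(round(100 - v_2016, 1))  # Republican share
```

Because G, P and Z enter only through their product with I, flipping I to -1 reverses the sign of all three economic terms, which is how the same equation serves both Democratic and Republican incumbencies.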
Predicted Democratic share: 45.0%. Predicted Republican share: 55.0%. (Prediction error: 3%.)

Congratulations to President Elect Trump!

Cross Sectional Regressions vs. Time Series Regressions

A cross sectional regression focuses on variation across locations (or other factors) at a single point in time.

A time series regression focuses entirely on variation across time (ignoring variation across location or other factors).

Luckily, all the tools from the cross-sectional analysis carry over to time series analysis.

$$y_t = \alpha + \beta t + \varepsilon_t, \qquad \hat{y} = \hat{\alpha} + \hat{\beta} t$$

Here $t$ is a time indicator ($t = 0, 1, 2, 3, 4, \ldots$).

All the properties of the estimates are the same!

$$E[\hat{\alpha}] = \alpha, \qquad E[\hat{\beta}] = \beta$$

$$Var(\hat{\alpha}) = \frac{\sigma^2 \sum t^2}{N \sum (t - \bar{t})^2}, \qquad Var(\hat{\beta}) = \frac{\sigma^2}{\sum (t - \bar{t})^2}$$

We can forecast just as we did before as well.

$$\hat{y}_0 = \hat{\alpha} + \hat{\beta} t_0, \qquad Var(y_0 \mid t_0) = \hat{\sigma}^2 \left[ 1 + \frac{1}{N} + \frac{(t_0 - \bar{t})^2}{(N-1)\,Var(t)} \right]$$

The further out into the future you try to predict, the bigger your errors get! The longer your sample period is, the better you do!!

Example: South Bend Daily High Temperature (2013 – 2014)

[Figure: histogram of daily high temperature, frequency (%) by temperature from 0 to 105 degrees.]

Sample Statistics
• Average = 57.8
• Std. Dev. = 21.9
• Median = 60.1
• Mode = 84
• High = 97
• Low = 1.2

Here's the time series representation for temperature in South Bend.

[Figure: daily high temperature in South Bend, 1/2013 to 10/2014.]

$$\text{Temp} = \alpha + \beta_1 t$$

Daily observations; t = 0 is 1/1/2013.

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.09 |
| R Square | 0.01 |
| Adjusted R Square | 0.01 |
| Standard Error | 21.82 |
| Observations | 730.00 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1.00 | 2934.51 | 2934.51 | 6.16 | 0.01 |
| Residual | 728.00 | 346550.24 | 476.03 | | |
| Total | 729.00 | 349484.76 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 54.36 | 1.61 | 33.69 | 0.00 | 51.20 | 57.53 |
| Time | 0.01 | 0.00 | 2.48 | 0.01 | 0.00 | 0.02 |

$$\text{Temp} = 54.4 + .01t$$

Temperature rises by .01 degrees per day (3.65 degrees per year)!!! Global Warming! Somebody call Al Gore!!!

Obviously, we have work to do!
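The trend regression above can be reproduced with the usual closed-form least squares formulas, using the day index as the only regressor. The sketch below is one way to do it in plain Python; the function name `fit_trend` and the five-observation series are hypothetical stand-ins, not the South Bend data.

```python
def fit_trend(y):
    """Fit y_t = alpha + beta * t by ordinary least squares, with t = 0, 1, ..., N-1.

    Returns (alpha_hat, beta_hat, sigma2_hat, se_beta), where sigma2_hat is the
    residual variance (SS residual / (N - 2)) and se_beta is the standard
    error of the slope.
    """
    n = len(y)
    t = range(n)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    beta = sxy / sxx
    alpha = y_bar - beta * t_bar
    residuals = [yi - (alpha + beta * ti) for ti, yi in zip(t, y)]
    sigma2 = sum(e * e for e in residuals) / (n - 2)
    se_beta = (sigma2 / sxx) ** 0.5
    return alpha, beta, sigma2, se_beta


# Hypothetical five-day series, only to show the call.
alpha_hat, beta_hat, sigma2_hat, se_beta = fit_trend([50, 53, 51, 56, 58])
print(alpha_hat, beta_hat, beta_hat / se_beta)  # intercept, slope, t-stat of slope
```

The ratio of the slope to its standard error is the t Stat reported in the Time row of a regression printout.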
[Figure: daily high temperature in South Bend, 1/2013 to 10/2014, with the fitted trend line Temp = 54.4 + .01t.]

My sample is from 1/1/2013 to 12/31/2014. Suppose that I want to predict the temperature for my birthday this year.

| Date | Time |
|---|---|
| 1/1/2013 | 0 |
| 12/31/2014 | 729 |
| 9/28/2016 | 1366 |

$$E[\text{Temp} \mid t = 1{,}366] = 54.36 + .01\,(1366) = 68.0$$

$$Var(\text{Temp} \mid t = 1{,}366) = 476 \left[ 1 + \frac{1}{730} + \frac{(1{,}366 - 365)^2}{729\,(44{,}469)} \right] = 491.3$$

$$SD(\text{Temp} \mid t = 1{,}366) = 22.1$$

[Figure: temperature against time, in sample (1/1/2013, t = 0, to 12/31/2014, t = 729) and out of sample (9/28/2016, t = 1,366); forecast of 68 with a +/- 2 SD band from 23.8 to 112.2.]

There is obviously a regular pattern here!!!

[Figure: daily high temperature in South Bend, 1/2013 to 10/2014, with quarters Q1 to Q4 marked.]

Let's use some quarterly dummies (dummies for quarters 1, 2, 3):

$$\text{Temp} = \alpha + \beta_1 t + \beta_2 D_1 + \beta_3 D_2 + \beta_4 D_3$$

Daily observations; t = 0 is 1/1/2013. The intercept is the temperature in the 4th quarter.

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.83 |
| R Square | 0.69 |
| Adjusted R Square | 0.69 |
| Standard Error | 12.28 |
| Observations | 730.00 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4.00 | 240117.82 | 60029.45 | 397.94 | 0.00 |
| Residual | 725.00 | 109366.94 | 150.85 | | |
| Total | 729.00 | 349484.76 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 49.21 | 1.53 | 32.13 | 0.00 | 46.20 | 52.22 |
| Time | -0.00368 | 0.00 | -1.49 | 0.14 | -0.01 | 0.00 |
| Q1 | -14.75 | 1.45 | -10.14 | 0.00 | -17.60 | -11.89 |
| Q2 | 22.76 | 1.36 | 16.72 | 0.00 | 20.09 | 25.44 |
| Q3 | 31.43 | 1.30 | 24.17 | 0.00 | 28.88 | 33.99 |

$$\text{Temp} = 49.2 - .004t - 14.8\,D_1 + 22.8\,D_2 + 31.4\,D_3$$

Looks like global warming is just a myth after all!

This looks a lot better!

[Figure: daily high temperature in South Bend, 1/2013 to 10/2014, with the fitted values from the quarterly dummy regression.]

My sample is from 1/1/2013 to 12/31/2014. Suppose that I want to predict the temperature for my birthday this year (9/28/2016, t = 1,366, which falls in the third quarter).

$$E[\text{Temp} \mid t = 1{,}366] = 49.21 - .003\,(1366) + 31.43 = 76.5$$

$$Var(\text{Temp} \mid t = 1{,}366) = 150.8 \left[ 1 + \frac{1}{730} + \frac{(1{,}366 - 365)^2}{729\,(44{,}469)} \right] = 155.7$$

$$SD(\text{Temp} \mid t = 1{,}366) = 12.4$$
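Both forecast variances above come from the same formula: the residual variance scaled up by a factor that grows with the distance between the forecast date and the middle of the sample. A small sketch under stated assumptions: the function name `forecast_variance` is hypothetical, the time index is assumed to run 0, 1, ..., N-1, and the residual variances (476.03 and 150.85) are the MS Residual values from the two printouts.

```python
import statistics


def forecast_variance(sigma2, n, t0):
    """Variance of a forecast at time t0 from a trend regression on t = 0..n-1.

    Implements sigma2 * [1 + 1/n + (t0 - t_bar)^2 / ((n - 1) * Var(t))],
    where Var(t) is the sample variance of the time index.
    """
    t = range(n)
    t_bar = statistics.mean(t)       # 364.5 for n = 730 (the slides round to 365)
    var_t = statistics.variance(t)   # sample variance, about 44,469 for n = 730
    return sigma2 * (1 + 1 / n + (t0 - t_bar) ** 2 / ((n - 1) * var_t))


var_trend = forecast_variance(476.03, 730, 1366)    # trend-only model
var_dummies = forecast_variance(150.85, 730, 1366)  # trend plus quarterly dummies
print(var_trend, var_trend ** 0.5)
print(var_dummies, var_dummies ** 0.5)

# Forecasting inside the sample is less uncertain than forecasting far outside it.
print(forecast_variance(476.03, 730, 365) < var_trend)
```

The results differ from the slide figures only in the last digit, because the slides round the sample mean of t to 365 and truncate the final values.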
[Figure: temperature against time, in sample (1/1/2013, t = 0, to 12/31/2014, t = 729) and out of sample (9/28/2016, t = 1,366), with +/- 2 SD bands: forecast of 76.5 (51.7 to 101.3); average of Q3 and Q4, 60.8 (36 to 85.6); fourth quarter, 45 (20.27 to 69.9).]

Just as with cross sectional analysis, I can capture non-linear relationships by a transformation of the data.

Linear growth (unit change per unit time):

$$y = \alpha + \beta_1 t$$

Exponential growth (percentage change per unit time):

$$y = e^{\alpha + \beta t}$$

Take logs on both sides:

$$\ln y = \alpha + \beta t$$

$$\ln(\text{Temp}) = \alpha + \beta_1 t$$

Daily observations; t = 0 is 1/1/2013.

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.09 |
| R Square | 0.01 |
| Adjusted R Square | 0.01 |
| Standard Error | 0.49 |
| Observations | 730.00 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1.00 | 1.37 | 1.37 | 5.76 | 0.02 |
| Residual | 728.00 | 172.58 | 0.24 | | |
| Total | 729.00 | 173.95 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 3.89 | 0.04 | 107.95 | 0.00 | 3.82 | 3.96 |
| Time | 0.00021 | 0.00 | 2.40 | 0.02 | 0.00 | 0.00 |

$$\ln(\text{Temp}) = 3.89 + .00021t$$

Temperature rises by .021 percent per day (7.6% per year)!!! Global Warming! Somebody call Al Gore!!!

Let's use some quarterly dummies (dummies for quarters 1, 2, 3):

$$\ln(\text{Temp}) = \alpha + \beta_1 t + \beta_2 D_1 + \beta_3 D_2 + \beta_4 D_3$$

Daily observations; t = 0 is 1/1/2013. The intercept is the (log) temperature in the 4th quarter.

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.76 |
| R Square | 0.58 |
| Adjusted R Square | 0.57 |
| Standard Error | 0.32 |
| Observations | 730.00 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4.00 | 100.34 | 25.09 | 247.08 | 0.00 |
| Residual | 725.00 | 73.61 | 0.10 | | |
| Total | 729.00 | 173.95 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% |
|---|---|---|---|---|---|
| Intercept | 3.87 | 0.04 | 97.37 | 0.00 | 3.79 |
| Time | -0.0001359 | 0.00 | -2.13 | 0.03 | 0.00 |
| Q1 | -0.41 | 0.04 | -10.79 | 0.00 | -0.48 |
| Q2 | 0.42 | 0.04 | 11.75 | 0.00 | 0.35 |
| Q3 | 0.55 | 0.03 | 16.42 | 0.00 | 0.49 |

$$\ln(\text{Temp}) = 3.87 - .00014t - .41\,D_1 + .42\,D_2 + .55\,D_3$$

The second quarter (Apr-June) is roughly 42% warmer than the 4th quarter (Oct-Dec).

Let's compare the forecasts for September 28, 2016 (t = 1,366).

No Dummies

$$\ln(\text{Temp}) = 3.89 + .00021t$$

$$E[\ln \text{Temp} \mid t = 1{,}366] = 3.89 + .00021\,(1366) = 4.18, \qquad e^{4.18} = 65.4 \ \text{(best guess)}$$

$$Var(\ln \text{Temp} \mid t = 1{,}366) = .24 \left[ 1 + \frac{1}{730} + \frac{(1{,}366 - 365)^2}{729\,(44{,}469)} \right] = .25, \qquad SD = .50 \ (50\%)$$

Prediction: I'm 95% sure the temperature will be between 32.7 and 130.8 degrees.

Quarterly Dummies

$$\ln(\text{Temp}) = 3.87 - .00014t - .41\,D_1 + .42\,D_2 + .55\,D_3$$

The slide reports $E[\ln \text{Temp} \mid t = 1{,}366] = 4.61$ and a best guess of $e^{4.61} = 100.5$, which corresponds to $3.87 + .00014\,(1366) + .55$, that is, to adding the time term. With the estimated coefficient of $-.00014$ the log forecast is $3.87 - .00014\,(1366) + .55 \approx 4.23$, or roughly 69 degrees.

$$Var(\ln \text{Temp} \mid t = 1{,}366) = .10 \left[ 1 + \frac{1}{730} + \frac{(1{,}366 - 365)^2}{729\,(44{,}469)} \right] = .103, \qquad SD = .32 \ (32\%)$$

Prediction (as reported, based on the 100.5 best guess): I'm 95% sure the temperature will be between 36.1 and 166.2 degrees.

The Moral of the Story… The exponential growth model is much more sensitive to parameter changes than the linear model!!

[Figure: two panels, Linear and Exponential, showing growth paths over t = 0 to 18 for growth parameters .20, .25 and .30; the linear paths stay below 12 while the exponential paths reach the hundreds.]

[Figure: Gas Price: US Regular All Formulations, dollars per gallon, 1990 to 2014, with exponential trend; labeled value $2.37.]

Source: U.S.
Energy Information Administration

Let's assume exponential growth:

$$\text{Price} = e^{\alpha + \beta_1 t + \beta_2 D_1 + \beta_3 D_2 + \beta_4 D_3}$$

$$\ln(\text{Price}) = \alpha + \beta_1 t + \beta_2 D_1 + \beta_3 D_2 + \beta_4 D_3$$

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.8929 |
| R Square | 0.7973 |
| Adjusted R Square | 0.7946 |
| Standard Error | 0.2073 |
| Observations | 310 |

ANOVA

| | df | SS | MS |
|---|---|---|---|
| Regression | 4 | 51.5481 | 12.8870 |
| Residual | 305 | 13.1081 | 0.0430 |
| Total | 309 | 64.6562 | |

| | Coefficients | Standard Error | t Stat |
|---|---|---|---|
| Intercept | -0.1378 | 0.0308 | -4.4674 |
| Time | 0.0045 | 0.0001 | 34.4261 |
| Q1 | -0.0314 | 0.0332 | -0.9452 |
| Q2 | 0.0568 | 0.0332 | 1.7103 |
| Q3 | 0.0692 | 0.0334 | 2.0711 |

$$\ln(\text{Price}) = -.14 + .0045t - .0314\,D_1 + .0568\,D_2 + .0692\,D_3$$

Gas prices increase (on average) .45% per month (5.4% per year).

Gas prices are (on average) 6.9% higher in the 3rd quarter (July-Sept) than they are in the 4th quarter (Oct-Dec).

Using the regression to seasonally adjust the data

Subtract the estimated seasonal effects from the log price:

$$\ln \tilde{P} = \ln(\text{Price}) - \beta_2 D_1 - \beta_3 D_2 - \beta_4 D_3 = \alpha + \beta_1 t + \varepsilon$$

$$\tilde{P} = e^{\ln \tilde{P}} \qquad \text{(Seasonally Adjusted Price)}$$

Just to check, suppose that I run a regression with my seasonally adjusted price…

$$\ln \tilde{P} = \alpha + \beta_1 t + \beta_2 D_1 + \beta_3 D_2 + \beta_4 D_3$$

| | Coefficients | Standard Error | t Stat |
|---|---|---|---|
| Intercept | -0.1378 | 0.0308 | -4.4674 |
| Time | 0.0045 | 0.0001 | 34.4261 |
| Q1 | 0.0000 | 0.0332 | 0.0000 |
| Q2 | 0.0000 | 0.0332 | 0.0000 |
| Q3 | 0.0000 | 0.0334 | 0.0000 |

Let's look at the residuals for a moment…

$$\varepsilon = \ln(\text{Price}) - \left( \alpha + \beta_1 t + \beta_2 D_1 + \beta_3 D_2 + \beta_4 D_3 \right) = \ln P - \ln \hat{P}$$

The residual is the percentage difference between the actual price and the predicted price.

[Figure: regression residuals, 1990 to 2014, ranging from about -0.80 to 0.60.]

What could cause a deviation of gas prices from trend?

[Figure: regression residuals, 1990 to 2014, with three recessions marked.]

It could be because of changes in demand (i.e. the business cycle).

[Figure: regression residuals (left axis, -0.80 to 0.60) and the price of oil (right axis, 0 to 140), 1990 to 2014, with three recessions marked.]

Or, it could be because of changes in supply (i.e. oil).
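The seasonal adjustment step described above amounts to subtracting the estimated quarterly effect from the log price and exponentiating. A minimal sketch, assuming monthly observations and the usual calendar quarters (January to March is Q1, and so on); the names `SEASONAL_EFFECT` and `seasonally_adjust` are hypothetical, and the coefficients are the Q1, Q2 and Q3 estimates from the gas price regression.

```python
import math

# Estimated quarterly effects on ln(Price); the 4th quarter is the base
# category, so its effect is zero by construction.
SEASONAL_EFFECT = {1: -0.0314, 2: 0.0568, 3: 0.0692, 4: 0.0}


def seasonally_adjust(price, month):
    """Remove the estimated quarterly effect from a price observed in `month` (1-12)."""
    quarter = (month - 1) // 3 + 1
    log_adjusted = math.log(price) - SEASONAL_EFFECT[quarter]
    return math.exp(log_adjusted)


# A $3.00 price observed in August (Q3) is adjusted downward,
# the same price in February (Q1) is adjusted upward,
# and a November (Q4) price is left unchanged.
print(seasonally_adjust(3.00, 8))
print(seasonally_adjust(3.00, 2))
print(seasonally_adjust(3.00, 11))
```

Regressing the log of the adjusted series on time and the same dummies should then return dummy coefficients of zero, which is the check shown in the table above.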