Forecasting Techniques: Single Equation Regressions

Su, Chapter 10, section III
Regression Models
• Represent functional relationships between
economic variables
• Usually estimated by OLS techniques
• General Form
Yt = b0 + b1X1t + b2X2t + … + bkXkt + ut
Yt: Dependent Variable
Xit's: Explanatory Variables
bi's: Parameters
ut: Stochastic Term
Regression: Forecasting Ability
• Depends on the structure of the regression
equation, including
– Degrees of Freedom: Should be > 30
– Statistical Significance and sign of parameters
– High Goodness of Fit
• Low Standard Error of Estimate
• High R^2
Forecasting with Regression Models
• Depends on choice of X’s, which is
generally guided by economic theory
– Example: According to the IS/LM model, what
variables would be useful for forecasting GDP?
• Generally speaking, more data should be
preferred
Some Useful Concepts I
• Ex Post Forecast: Extrapolation goes
beyond sample period but not into future
– Example: Sample period for regression is 1970-1997, forecast through 2000
• Ex Ante Forecast: Extrapolation extends
into future
– Example: Sample period is 1990:1-2001:1,
forecast through 2002:1
Some Useful Concepts II
• Predictive power of a regression model depends
on its lag structure
• Conditional Forecasts: Some contemporaneous
explanatory variables appear on RHS
– Must also predict values for these contemporaneous
explanatory variables
• Unconditional Forecasts: Only lagged explanatory
variables appear on RHS
Some Useful Concepts III
• Point Forecast: Predicts a single number
– Example: The Dow will be 1100 on July 1
• Interval Forecast: Shows a numerical
interval in which the actual value can be
expected to fall
– Example: The Dow will be between 1000 and
2000 on July 1 with 99% probability
Example: Automobile Sales
• Want to replicate the regression results in
section 4
• Use the regression data analysis tool to
replicate the results on page 348
• Model: Yt = a + bXt + ut
• Y: Automobile Sales X: New Car Price
• Linear Demand Curve
Demand for New Cars
[Figure: new car price (vertical axis) plotted against sales (horizontal axis)]
Procedure
• Step 1: Copy the sales and price data to a
new worksheet
• Step 2: Start the regression data analysis
tool
• Specify correct ranges
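Outside a spreadsheet, the same simple regression can be sketched in a few lines of Python. This is a minimal illustration using the closed-form OLS formulas for one explanatory variable; the function name `ols_simple` and the sample numbers are hypothetical and are not the car data from the text.

```python
def ols_simple(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = y_bar - b * x_bar
    return a, b

# Hypothetical price index and sales figures, for illustration only.
price = [60.0, 70.0, 80.0, 90.0, 100.0]
sales = [9000.0, 8500.0, 8200.0, 7400.0, 7100.0]
a, b = ols_simple(price, sales)
print(f"intercept = {a:.2f}, slope = {b:.2f}")
```

A negative slope here corresponds to a downward-sloping linear demand curve.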
Regression Output
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.59
R Square              0.35
Adjusted R Square     0.31
Standard Error        1013.7142
Observations          20

ANOVA
              df    SS              MS
Regression     1     9794932.261    9794932
Residual      18    18497098.29     1027617
Total         19    28292030.55

               Coefficients    Standard Error    t Stat
Intercept      10200.23        887.95            11.49
X Variable 1   -30.2750        9.8062            -3.09
Interpreting Regression Results
• Yt = 10,200.23 - 30.275Xt    (10.20)
– Parameter on X: -30.275
– t-statistic: -3.09
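As a cross-check, the headline statistics can be recomputed from the sums of squares and standard errors reported in the regression output. The sketch below assumes the usual definitions (R^2 = SS regression / SS total, standard error = square root of SS residual over residual degrees of freedom, t = coefficient / standard error); the variable names are illustrative.

```python
import math

# Values copied from the regression output.
ss_regression = 9794932.261
ss_residual = 18497098.29
ss_total = 28292030.55
n_obs = 20
k_regressors = 1  # one explanatory variable (price)

df_residual = n_obs - k_regressors - 1  # 18, as in the ANOVA table

r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual
see = math.sqrt(ss_residual / df_residual)  # standard error of estimate

# t-statistic = coefficient / its standard error
t_intercept = 10200.23 / 887.95
t_slope = -30.2750 / 9.8062

print(round(r_squared, 2), round(adj_r_squared, 2), round(see, 2))
print(round(t_intercept, 2), round(t_slope, 2))
```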
Ex Post Point Forecasts
• To make an ex post forecast for 1991,
simply plug the actual value of the price
index for 1991 into (10.20) - Put in D22
• Yt = 10,200.23 - 30.275(125.3) = 6,406.77
• Note that ex post forecasts can be done for
any year in the period for which data are
available
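The same plug-in calculation can be written as a small function. This is a sketch only; `point_forecast` is a hypothetical name, and the coefficients are the rounded estimates from (10.20).

```python
A_HAT = 10200.23   # estimated intercept from (10.20)
B_HAT = -30.275    # estimated slope on the new car price index

def point_forecast(price_index):
    """Ex post point forecast of sales for a given actual price index."""
    return A_HAT + B_HAT * price_index

forecast_1991 = point_forecast(125.3)  # 125.3 is the 1991 price index
print(round(forecast_1991, 2))
```

Calling `point_forecast` with the price index of any other year in the data gives that year's ex post forecast.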
Evaluation of Ex Post Forecasts
• Can also evaluate forecasts within sample
• Copy the formula from D22 into D21
• Where in the regression output can you find
this number?
• Fill in the rest of column D with the Ex Post
Forecasts and plot the actual sales and the
Ex Post forecasts
Actual Sales and Ex Post Forecast
[Figure: actual sales and ex post forecast by year, 1971-1991]
Summary Statistics
• Already know how to calculate, but in this
case the regression function has already
done some of the heavy lifting
• We saw where the Ex Post forecasts could
be found, what about the forecast errors?
Residuals and Forecast Errors
• In the terminology of econometrics, ex post
forecast errors are called residuals
• The OLS estimator is designed to minimize the
sum of the squared residuals, so OLS
estimates also minimize MSE and RMSE
• To find the sum of squared residuals, look on
the ANOVA table in the row labeled Residual
under the column labeled SS; dividing it by the
number of observations gives MSE
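A short sketch of how the ANOVA entries relate to the error measures, assuming MSE is defined as the average squared forecast error over all n observations. Note that the ANOVA "MS" entry divides by the residual degrees of freedom instead, which is what the regression's reported Standard Error is based on.

```python
import math

ss_residual = 18497098.29  # "Residual" row, "SS" column of the ANOVA table
n_obs = 20

# Forecast-evaluation convention (assumed here): divide by n.
mse = ss_residual / n_obs
rmse = math.sqrt(mse)

# Regression convention: divide by residual degrees of freedom (n - 2).
ms_residual = ss_residual / (n_obs - 2)
see = math.sqrt(ms_residual)

print(round(mse, 2), round(rmse, 2), round(see, 2))
```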
Ex Ante Point Forecasts
• To generate these, must forecast X, as these
forecasts are conditional on unknown future
values (must pretend that the present is
1991 in this case)
• How should X be forecast?
Ex Ante Point Forecasts: Example
• Step 1: Extend the time column to 1994
• Step 2: Calculate the forecasted X’s using
the same change naïve forecasting model in
column C
• Step 3: Using the formula from above,
calculate the Ex Ante forecasts for 1992-1994 and chart them
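The three steps can be sketched as follows. The 1991 price index (125.3) comes from the text; the 1990 value used here is a hypothetical placeholder, and the function names are illustrative.

```python
def same_change_path(history, steps):
    """Extend a series by repeating its last observed change."""
    last_change = history[-1] - history[-2]
    return [history[-1] + last_change * h for h in range(1, steps + 1)]

def sales_forecast(price_index):
    # Estimated equation (10.20) from the regression output.
    return 10200.23 - 30.275 * price_index

# 120.0 for 1990 is a HYPOTHETICAL placeholder; 125.3 is the 1991 value.
price_history = [120.0, 125.3]
future_prices = same_change_path(price_history, steps=3)  # 1992-1994
ex_ante = [sales_forecast(p) for p in future_prices]

for year, p, y in zip(range(1992, 1995), future_prices, ex_ante):
    print(year, round(p, 1), round(y, 2))
```

Because the forecast is conditional on the forecasted X's, any error in the price path carries straight through to the sales forecast.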
Ex Post and Ex Ante Forecasts
[Figure: ex post and ex ante forecasts by year, 1971-1993]
Interval Forecasts
• Instead of a line, can also display the range
in which the forecast values will probably
fall
• These are called interval forecasts and are
based on the variance of the regression
• Based on (10.18)
Interval Forecasts: Example
• Must calculate the average of X and the
sum of squared deviations of X from its average, where x = X - average(X)
• First term of (10.18) is just ex ante forecast
• t0.025 is just a value from a table in a
statistics book
• se has already been calculated by the
regression program
• Text has wrong numbers
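Equation (10.18) is not reproduced in these notes. The sketch below assumes it has the standard form for a simple-regression forecast interval: forecast ± t0.025 × se × sqrt(1 + 1/n + (Xf - average(X))^2 / sum of x^2). The sample data, coefficients and se are hypothetical.

```python
import math

def forecast_interval(x_f, x_sample, a_hat, b_hat, se, t_crit):
    """Return (lower, point, upper) for a forecast at X = x_f."""
    n = len(x_sample)
    x_bar = sum(x_sample) / n
    sum_x_sq = sum((x - x_bar) ** 2 for x in x_sample)  # sum of x^2
    y_f = a_hat + b_hat * x_f                           # point forecast
    half_width = t_crit * se * math.sqrt(
        1 + 1 / n + (x_f - x_bar) ** 2 / sum_x_sq
    )
    return y_f - half_width, y_f, y_f + half_width

# HYPOTHETICAL sample and estimates, for illustration only.
x_sample = [60.0, 70.0, 80.0, 90.0, 100.0]
# t_crit is the two-tailed 5% value for n - 2 = 3 degrees of freedom.
# (For the car regression with 18 degrees of freedom it would be 2.101.)
low, mid, high = forecast_interval(
    x_f=110.0, x_sample=x_sample,
    a_hat=11960.0, b_hat=-49.0, se=150.0, t_crit=3.182,
)
print(round(low, 1), round(mid, 1), round(high, 1))
```

The interval widens as the forecasted X moves away from the sample average, which is why ex ante intervals fan out over time.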
Forecast Interval
[Figure: ex post and ex ante forecasts with forecast interval by year, 1971-1993]
Autoregressive Models
• Even though they use sophisticated
statistical techniques, these models are
extrapolations
• The explanatory variables (X’s) are lagged
values of the dependent variable
• Assumes that the time path of a variable is
self-generating
• Also called the “Chain Principle”
AR Models: Functional Forms
• General:
Xt = f(Xt-1, Xt-2, Xt-3, ..., b1, b2, b3, ..., ut)
– ut : residual term, captures random components
– Must specify form and lag length
• Linear form, lag length k
Xt = b0 + b1Xt-1 + b2Xt-2 + … + bkXt-k + ut
Note that both No Change and Same Change
naïve forecasts are special cases of this
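To see why, set b0 = 0 and b1 = 1 for No Change, or b0 = 0, b1 = 2, b2 = -1 for Same Change. A minimal sketch, using the first three population values from the exercise at the end of these notes; `ar_forecast` is an illustrative name.

```python
def ar_forecast(history, coefs):
    """One-step forecast from X_t = b0 + b1*X_{t-1} + ... + bk*X_{t-k}."""
    b0, lags = coefs[0], coefs[1:]
    # history[-1] is X_{t-1}, history[-2] is X_{t-2}, and so on.
    return b0 + sum(b * history[-j] for j, b in enumerate(lags, start=1))

NO_CHANGE = [0.0, 1.0]          # X_t = X_{t-1}
SAME_CHANGE = [0.0, 2.0, -1.0]  # X_t = X_{t-1} + (X_{t-1} - X_{t-2})

series = [147.20, 149.77, 152.27]  # population, 1948-1950
no_change = ar_forecast(series, NO_CHANGE)
same_change = ar_forecast(series, SAME_CHANGE)
print(round(no_change, 2), round(same_change, 2))
```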
AR Models: Determining Lag Length
• The general form has an infinite number of
parameters, but we never have this much
data - model must be restricted to be used
• Assume that the impact of distant lags Xt-j
is trivial and insignificant
• Rule of thumb: don’t use a k > 4 because of
econometric problems
Dummy Variables
• Requires no additional economic data
• Was discussed in chapter 2
• Two Types:
– Trend
– Seasonal / annual
Dummy Variables: Trends
• Uses a time variable T (=1,2,3,…) and
extrapolates X along its time path
Linear: Xt = a + bTt
Exponential: Xt = e^(a + bTt)
Reciprocal: Xt = 1/[a + bTt]
Parabolic: Xt = b0 + b1Tt + b2Tt^2
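The four trend forms can be written as plain functions of T, which makes extrapolation a matter of plugging in future values of T. The parameter values below are hypothetical, chosen only to show the shapes.

```python
import math

def linear(t, a, b):
    return a + b * t

def exponential(t, a, b):
    return math.exp(a + b * t)

def reciprocal(t, a, b):
    return 1.0 / (a + b * t)

def parabolic(t, b0, b1, b2):
    return b0 + b1 * t + b2 * t ** 2

# HYPOTHETICAL parameters; T = 1, 2, 3, ... as on the slide.
for t in range(1, 6):
    print(
        t,
        round(linear(t, 100.0, 2.0), 2),
        round(exponential(t, 4.6, 0.02), 2),
        round(reciprocal(t, 0.01, -0.0001), 2),
        round(parabolic(t, 100.0, 2.0, 0.1), 2),
    )
```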
Dummy Variables: Seasonal
• These are “Intercept shifters” - they allow
the intercept term b0 to vary systematically
• Single Equation Model with Quarterly
Dummies:
Yt = g1Q1 + g2Q2 + g3Q3 + g4Q4 + b1X1t + … + bkXkt + ut
• Can also use monthly dummies if Y is
monthly
• Get a different forecast for each quarter
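A minimal sketch of how the quarterly dummies are built and used. Because all four dummies are included, the equation has no separate constant b0; each g acts as that quarter's intercept. The coefficient values and function names are hypothetical, and a single X is used for brevity.

```python
def quarterly_dummies(quarters):
    """Map quarter numbers (1-4) to rows [Q1, Q2, Q3, Q4] of 0/1 values."""
    return [[1 if q == j else 0 for j in (1, 2, 3, 4)] for q in quarters]

def seasonal_forecast(quarter, x_value, g, b):
    """Forecast with quarter-specific intercepts g[0..3] and slope b."""
    return g[quarter - 1] + b * x_value

rows = quarterly_dummies([1, 2, 3, 4, 1])
print(rows)

# HYPOTHETICAL estimates: one intercept per quarter, one slope.
g = [100.0, 110.0, 95.0, 120.0]
b = 2.0
forecasts = [seasonal_forecast(q, 10.0, g, b) for q in (1, 2, 3, 4)]
print(forecasts)
```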
Other Dummy Variables
• Dummy variables can be useful tools in
forecasting
• Recall from the earlier section that the
single equation forecast for new car sales
was high for 1991 because it was a
recessionary year
• Can use a dummy variable for recessions to
improve this forecast
Example: Recession Dummy
• Model: Yt = a + bXt + gDR + ut
• Y: Automobile Sales X: New Car Price
• DR: Recession Dummy, = 1 in years with
troughs
• Add new sheet to spreadsheet, copy Year,
New Car Sales, New Car Price
• Look at Table 7.1, p. 236 to create dummy
Empirical Results
Yt = 10,699.87 - 31.66Xt - 1893.29DR
(571.918) (6.233) (360.237)
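Plugging the 1991 price index (125.3) and DR = 1 into the estimated equation gives the recession-adjusted forecast. The sketch assumes the figures in parentheses are standard errors, so that t-statistics are coefficient / standard error. With the rounded coefficients shown, the forecast comes out within a unit of the 4839.97 reported in the comparison table.

```python
def forecast_with_dummy(price_index, recession):
    """Sales forecast from the estimated recession-dummy equation."""
    d_r = 1 if recession else 0
    return 10699.87 - 31.66 * price_index - 1893.29 * d_r

f_1991 = forecast_with_dummy(125.3, recession=True)

# Assumed: parenthesized values are standard errors.
t_price = -31.66 / 6.233
t_dummy = -1893.29 / 360.237

print(round(f_1991, 2), round(t_price, 2), round(t_dummy, 2))
```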
Forecast with Recession Dummy
[Figure: actual sales and forecast with recession dummy by year, 1971-1991]
Forecast Comparison
            No Dummy          Dummy
1991 F      6406.77           4839.97
SEE         1013.71           643.83
SSR         18,497,098.29     7,046,970
R^2         0.35              0.75
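The SEE and R^2 entries follow from each model's SSR. A sketch, assuming both regressions use the same 20 observations and the total sum of squares (28292030.55) from the original ANOVA table; small differences from the table reflect rounding of the reported SSR.

```python
import math

ss_total = 28292030.55  # total sum of squares from the ANOVA table
n_obs = 20              # assumed the same for both regressions

models = {
    "No Dummy": {"ssr": 18497098.29, "k": 1},  # price only
    "Dummy": {"ssr": 7046970.0, "k": 2},       # price + recession dummy
}

stats = {}
for name, m in models.items():
    df = n_obs - m["k"] - 1  # residual degrees of freedom
    stats[name] = {
        "r2": 1 - m["ssr"] / ss_total,
        "see": math.sqrt(m["ssr"] / df),
    }
    print(name, round(stats[name]["r2"], 2), round(stats[name]["see"], 2))
```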
Exercise: AR Models
• Data: U.S. Population 1948-1990
• Available in a text file on Web page
(tab2-1.txt)
• Step 1: Read file into Excel
Exercise: Creating Lag Variables
• Best way is with formulas, although could
copy as well
• Population data are in column 2
• Step 2: Label columns 3-6 “Lag1”, “Lag2”,
“Lag3” and “Lag4”
• What value goes in C3? D4? E5? F6?
Year    Pop       Lag1      Lag2      Lag3      Lag4
1948    147.20
1949    149.77    147.20
1950    152.27              147.20
1951    154.88                        147.20
1952    157.55                                  147.20
• C3 is the Lag1 value for 1949, which is the
actual population in 1948 - population
lagged one year
• D4 is the Lag2 value for 1950, which is the
actual population in 1948 - population
lagged two years
• Step 3: Fill in rest of lags using formulas
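The lag columns can also be built programmatically. A minimal sketch using the five population values shown above; `make_lag` is an illustrative name, and `None` marks cells that have no lagged value.

```python
def make_lag(values, k):
    """Return the series lagged k periods, padded with None at the start."""
    return [None] * k + values[:-k]

years = [1948, 1949, 1950, 1951, 1952]
pop = [147.20, 149.77, 152.27, 154.88, 157.55]

lags = {f"Lag{k}": make_lag(pop, k) for k in range(1, 5)}

for i, year in enumerate(years):
    print(year, pop[i], [lags[f"Lag{k}"][i] for k in range(1, 5)])
```

In the spreadsheet, the equivalent is a formula that points k rows up in the population column.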
Exercise: AR Regressions
• Step 4: Replicate the regression results on
page 352. Note: Watch sample period
• Step 5: Calculate Ex Post forecasts for the
sample period and RMSE for each method
– Which has the lowest RMSE?
• Step 6: Calculate Ex Ante population
forecasts through 2025 and compare to
Table 10.4
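Steps 4 to 6 can be sketched for the simplest case, one lag. This is an illustration only: it fits an AR model with k = 1 by closed-form OLS on the five population values shown earlier, not on the full 1948-1990 file, so the numbers will not match page 352. The function names are hypothetical.

```python
import math

def fit_ar1(series):
    """OLS estimates (b0, b1) for X_t = b0 + b1*X_{t-1} + u_t."""
    x = series[:-1]  # lagged values
    y = series[1:]   # current values (sample loses the first observation)
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def ex_post_rmse(series, b0, b1):
    """RMSE of one-step ex post forecasts over the estimation sample."""
    errors = [y - (b0 + b1 * x) for x, y in zip(series[:-1], series[1:])]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def ex_ante_path(last_value, b0, b1, steps):
    """Iterate the fitted equation forward, feeding forecasts back in."""
    path = []
    for _ in range(steps):
        last_value = b0 + b1 * last_value
        path.append(last_value)
    return path

pop = [147.20, 149.77, 152.27, 154.88, 157.55]  # 1948-1952 only
b0, b1 = fit_ar1(pop)
rmse = ex_post_rmse(pop, b0, b1)
path = ex_ante_path(pop[-1], b0, b1, steps=3)
print(round(b0, 3), round(b1, 3), round(rmse, 4))
print([round(p, 2) for p in path])
```

Each additional lag shortens the usable sample by one observation, which is why the sample period needs watching when comparing models.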
Exercise: Trend Forecasting
• Step 1: Create trend and trend squared
variables in the spreadsheet
• Step 2: Replicate the three regression
results shown on page 354
• Step 3: Calculate a 100 year ahead Ex Ante
forecast of U.S. population using each, and
chart the time paths
• How accurate are these forecasts?