overhead - 02 Regression Forecasting

Simetar Download
• Download three files at:
– http://ge.tt/3w2lhlr
• Before installing Simetar do the following:
– Download and install the latest Service Pack for
Office 2003 or 2007 if using these systems
– Simetar does not run on Excel 2010 yet; install Excel 2007
– If you have a Mac, you must have Office and use Parallels to run Excel
• Double click the Simetar.MSI file and use
the license code provided in class to install
Simetar on your computer
Multiple Regression Forecasts
• Materials for this lecture
– Demo: Lecture 2 Multiple Regression.XLS
– Read Chapter 15, pages 8-9
– Read all of Chapter 16’s Section 13
Define Data Patterns
• A time series is a chronological sequence of observations
for a particular variable over fixed intervals of time
– Daily
– Weekly
– Monthly
– Quarterly
– Annual
• Six patterns for time series data (the data we work with are time series data because they are generated over time):
– Trend
– Cycle
– Seasonal variability
– Structural variability
– Irregular variability
– Black Swans
Patterns in Data Series
[Figure: example Trend, Seasonal, Cycle, and Mixed data series; horizontal axes in periods, months, and years]
Trend Variation
• Trend is a general up or down movement in the
values of a time series over the historical period
of observation
• Most economic data contains at least one trend
– Increasing, decreasing or flat trends
• Trend represents long-term growth or decay
• Trends are caused by strong underlying forces, such as:
– Technological changes
– Changes in tastes and preferences
– Changes in income and population
– Market competition
– Inflation and deflation
– Policy changes
Cyclical Variation
• Cycle is a recurring up and down movement
around a trend
• Cycles last 2 to 20 years from peak to peak
– The business cycle is the best known
– Agriculture examples are cattle and hog cycles
• Two components to a cycle
– Expansion
– Contraction
• Cycles are caused by:
– Changes in tastes and preferences
– Economic activity (inflation and deflation)
– Production cycles (animals)
– Moon and sunspots?
• Cycles vary in length and magnitude
Seasonal Variation
• Periodic (cyclical) patterns in a time series that complete
the cycle within a year
• Caused by
– Weather, seasons of the year
– Production/marketing patterns
– Customs and holidays
• Agricultural production causes seasonal variation of prices
• Holidays cause retail demand and sales to vary on a seasonal pattern
– Thanksgiving: turkey
– St. Patrick’s Day: corned beef and green beer
– Easter: ham
– Winter: cheese
– Summer: ice cream and gasoline sales
– Holidays and high temperatures: crushed ice
Structural Variation
• Variables you want to forecast are often
dependent on other variables
Qt. Demand = f( Own Price, Competing Price,
Income, Population, Season, Tastes &
Preferences, Trend, etc.)
Y = a + b (Time)
• Structural models will explain most
structural variation in a data series
– Even when we build structural models, the
forecast is not perfect
– A residual remains as the unexplained portion
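A minimal sketch of the simplest structural model above, Y = a + b (Time), fitted by OLS in Python. The series is hypothetical and Time is assumed to run 1, 2, ..., T; the residuals returned are the unexplained portion that remains after fitting.

```python
def fit_trend(y):
    """OLS estimates (a, b) for Y = a + b*Time, with Time = 1, 2, ..., T."""
    T = len(y)
    time = range(1, T + 1)
    t_bar = sum(time) / T
    y_bar = sum(y) / T
    sxy = sum((t - t_bar) * (yi - y_bar) for t, yi in zip(time, y))
    sxx = sum((t - t_bar) ** 2 for t in time)
    b = sxy / sxx
    a = y_bar - b * t_bar
    return a, b


def trend_residuals(y, a, b):
    """Residuals e_t = Y_t - (a + b*t): the portion the trend leaves unexplained."""
    return [yi - (a + b * t) for t, yi in enumerate(y, start=1)]


# Hypothetical annual observations, for illustration only
y = [10.2, 11.1, 12.3, 12.9, 14.2, 14.8, 16.1]
a, b = fit_trend(y)
e = trend_residuals(y, a, b)
print(f"Y-hat = {a:.3f} + {b:.3f} * Time")
print("residuals:", [round(v, 3) for v in e])
```

With an intercept in the model, the OLS residuals sum to zero, so whatever pattern remains in them (seasonal, cyclical, or irregular) is what later models must explain or simulate.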
Irregular Variation
• Erratic movements in time series that
follow no recognizable regular pattern
– Random, white noise, or stochastic movements
• Risk is this non-systematic variability in the
residuals
• We use Monte Carlo simulation to incorporate this risk
into our probabilistic forecasts
– We recognize that risks cannot be forecast
– Incorporate risks into probabilistic forecasts
– Provide forecasts with confidence intervals
Black Swans (BSs)
• BSs are low-probability events
– An outlier “outside realm of reasonable expectations”
– Carries an extreme impact
– Human nature causes us to concoct explanations for them after the fact
• Black swans are an example of uncertainty
– Uncertainty is generated by unknown probability
distributions
– Risk is generated by known distributions
• The recent recession was a Black Swan
– A depression is a Black Swan
– Dramatic increases in grain prices in 2006 and 2007
– Dramatic increase in cotton prices in 2010
Multiple Regression Forecasts
• A structural model of the forecast variable is used
when suggested by:
– Economic theory
– Knowledge of the industry
– Relationship to other variables
– Development of an economic model
• Examples of forecasting:
– Planted acres: input sales businesses need this
– Demand for a product: sales and production
– Price of corn or cattle: feedlots, grain mills, etc.
– Govt. payments: Congressional Budget Office
– Exports or trade flows: international ag. business
Multiple Regression Forecasts
• Structural model
Y = a + b1 X1 + b2 X2 + b3 X3 + b4 X4 + e
where the Xi’s are exogenous variables that explain the
variation of Y over the historical period
• Estimate parameters (a, bi’s, and SEPe)
using multiple regression (or OLS)
– OLS is preferred because it minimizes the sum
of squared residuals
– This is the same as reducing the risk on Ŷ as
much as possible, i.e., minimizing the risk for
your forecast
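The estimation step can be sketched in plain Python by solving the OLS normal equations. This is an illustration, not Simetar's implementation: the data are hypothetical, and the residual standard deviation sqrt(SSE / (T - p)), with p counting the intercept, is used here as a stand-in for SEPe (Simetar's reported SEP may be computed differently).

```python
import math


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs[i])] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ols(X, y):
    """OLS estimates [a, b1, ..., bk] from the normal equations (Z'Z) beta = Z'y."""
    Z = [[1.0] + [float(v) for v in row] for row in X]  # intercept column first
    p = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(ZtZ, Zty)


def predict(beta, row):
    """Deterministic component a + sum(bi * Xi) for one observation."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def residual_sd(X, y, beta):
    """sqrt(SSE / (T - p)), with p = number of estimated parameters (incl. intercept)."""
    e = [yi - predict(beta, row) for row, yi in zip(X, y)]
    return math.sqrt(sum(v * v for v in e) / (len(y) - len(beta)))


# Hypothetical data: two Xs per observation plus small fixed disturbances
X = [[1, 5], [2, 3], [3, 6], [4, 2], [5, 7], [6, 4], [7, 8], [8, 1]]
u = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, -0.05, 0.05]
y = [2.0 + 1.5 * x1 - 0.8 * x2 + ui for (x1, x2), ui in zip(X, u)]

beta = ols(X, y)
sep = residual_sd(X, y, beta)
print("a, b1, b2 =", [round(v, 3) for v in beta], " SEP =", round(sep, 3))
```

Solving the normal equations directly is fine for a handful of Xs; for larger or nearly collinear designs a QR-based least-squares routine is numerically safer.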
Multiple Regression Model
PltAc t = f(Price t-1, PltAc t-1, IdleAcre t, X t)
HarvAc t = f(PltAc t)
Yield t = f(Price t, Yield t-1)
Prod t = Yield t * HarvAc t
Supply t = Prod t + EndStock t-1
Price t = a + b Supply t
Domestic D t = f(Price t, Income/Pop t, Z t)
Export D t = f(Price t, Y t)
EndStock t = Supply t - Domestic D t - Export D t
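The identities in this model (Prod, Supply, End Stock) are plain arithmetic once the behavioral equations have been solved. The sketch below assumes hypothetical values for harvested acres, yield, beginning stocks, and the two demands; in the full model, price and the demands are determined jointly with Supply t, which this sketch does not attempt.

```python
def balance_sheet(harv_ac, yld, end_stock_prev, domestic_d, export_d):
    """Accounting identities from the model; every input is treated as already known."""
    prod = yld * harv_ac                          # Prod t = Yield t * HarvAc t
    supply = prod + end_stock_prev                # Supply t = Prod t + EndStock t-1
    end_stock = supply - domestic_d - export_d    # EndStock t = Supply t - Domestic D t - Export D t
    return prod, supply, end_stock


# Hypothetical values: million acres, bushels per acre, million bushels
prod, supply, end_stock = balance_sheet(
    harv_ac=45.0, yld=44.0, end_stock_prev=800.0, domestic_d=1150.0, export_d=900.0
)
print(f"Prod = {prod:.1f}, Supply = {supply:.1f}, EndStock = {end_stock:.1f}")
```

With these inputs the sketch prints Prod = 1980.0, Supply = 2780.0, EndStock = 730.0.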
Steps to Build Multiple Regression Models
• Plot the Y variable in search of: trend, seasonal, cyclical
and irregular variation
• Plot Y vs. each X to see the structural relationship and how
X may explain Y; calculate the correlation coefficient between Y and each X
• Hypothesize the model equation(s) with all likely Xs to
explain the Y, based on knowledge of model & theory
• For forecasting wheat production, the model is:
Plt Ac t = f(E(Price t), Plt Ac t-1, E(PthCrop t), Trend, Yield t-1)
Harvested Ac t = a + b Plt Ac t
Yield t = a + b T t
Prod t = Harvested Ac t * Yield t
• Estimate and re-estimate the model
• Make the deterministic forecast
• Make the forecast stochastic for a probabilistic forecast
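A sketch of chaining the wheat production equations into a deterministic forecast. The coefficients HARV_COEF and YIELD_COEF and the planted-acre input are hypothetical placeholders, not estimates from the lecture workbook; in practice planted acres would come from the estimated Plt Ac t equation.

```python
# Hypothetical estimated coefficients (a, b); NOT results from the lecture workbook
HARV_COEF = (-2.0, 0.85)   # Harvested Ac t = a + b * Plt Ac t
YIELD_COEF = (30.0, 0.4)   # Yield t = a + b * T t


def forecast_production(plt_ac, t):
    """Chain the equations: planted -> harvested acres, trend yield, production."""
    a_h, b_h = HARV_COEF
    a_y, b_y = YIELD_COEF
    harv_ac = a_h + b_h * plt_ac
    yld = a_y + b_y * t
    prod = harv_ac * yld          # Prod t = Harvested Ac t * Yield t
    return harv_ac, yld, prod


# Planted acres is a hypothetical input here
harv_ac, yld, prod = forecast_production(plt_ac=56.0, t=35)
print(f"Harvested = {harv_ac:.1f}, Yield = {yld:.1f}, Prod = {prod:.1f}")
```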
US Planted Wheat Acreage Model
Plt Ac t = f(E(Price t), Yield t-1, CRP t, Years t)
• Statistically significant betas for Trend
(years variable) and Price
• Leave CRP in the model because it is needed for policy
analysis and it has the correct sign
• Use Trend (years) rather than Yield t-1, because Trend
masks the effects of Yield
Multiple Regression Forecasts
• Specify alternative values for X and forecast
the Deterministic Component
• Multiply Betas by their respective X’s
– Forecast Acres for alternative Prices and CRP
– Lagged Yield and Year are constant in scenarios
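Forecasting the deterministic component for alternative scenarios amounts to multiplying each beta by its X and summing. The betas below are hypothetical, and the Years variable is assumed to be a trend index; lagged yield and year are held constant while price and CRP vary.

```python
# Hypothetical betas for Plt Ac t = a + b1*E(Price t) + b2*Yield t-1 + b3*CRP t + b4*Years t
BETAS = {"a": 50.0, "price": 3.0, "yield_lag": 0.05, "crp": -0.4, "years": -0.2}


def deterministic_acres(price, yield_lag, crp, years, b=BETAS):
    """Multiply each beta by its X and add the intercept."""
    return (b["a"] + b["price"] * price + b["yield_lag"] * yield_lag
            + b["crp"] * crp + b["years"] * years)


YIELD_LAG, YEARS = 45.0, 30        # held constant across scenarios (hypothetical)
for crp in (25.0, 30.0, 35.0):     # alternative CRP acres
    for price in (5.0, 6.0, 7.0):  # alternative expected prices
        acres = deterministic_acres(price, YIELD_LAG, crp, YEARS)
        print(f"CRP {crp:4.1f}  Price {price:.2f}  ->  Plt Ac {acres:6.2f}")
```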
Multiple Regression Forecasts
• The probabilistic forecast uses ŶT+i and the SEP (or standard
deviation) of the residuals, assuming a normal distribution for the residuals
ỸT+i = ŶT+i + NORM(0, SEPT)
or
ỸT+i = NORM(ŶT+i , SEPT)
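A minimal Monte Carlo sketch of the two equivalent forms above, using Python's random.gauss for the normal draws. The values of Ŷ and SEP are hypothetical; with the same seed, CPython's random.gauss(mu, sigma) shifts and scales the same standard normal draw, so both functions return the same values.

```python
import random
import statistics


def stochastic_forecast(y_hat, sep, n=10_000, seed=42):
    """Draws of Ỹ = Ŷ + NORM(0, SEP); normal residuals are the lecture's assumption."""
    rng = random.Random(seed)
    return [y_hat + rng.gauss(0.0, sep) for _ in range(n)]


def stochastic_forecast_alt(y_hat, sep, n=10_000, seed=42):
    """Equivalent form Ỹ = NORM(Ŷ, SEP)."""
    rng = random.Random(seed)
    return [rng.gauss(y_hat, sep) for _ in range(n)]


y_hat, sep = 52.25, 2.5      # hypothetical deterministic forecast and SEP
draws = stochastic_forecast(y_hat, sep)
alt = stochastic_forecast_alt(y_hat, sep)
print(f"mean {statistics.fmean(draws):.2f}  std dev {statistics.stdev(draws):.2f}")
```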
Multiple Regression Forecasts
• Present the probabilistic forecast as a probability density
function (PDF), with the 95% confidence interval shown as
bars about the mean
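A sketch of computing the 95% interval two ways: from the percentiles of simulated draws, and analytically from the normal assumption with statistics.NormalDist. The forecast and SEP values are hypothetical.

```python
import random
from statistics import NormalDist, quantiles

y_hat, sep = 52.25, 2.5                     # hypothetical forecast and SEP
rng = random.Random(7)
draws = [rng.gauss(y_hat, sep) for _ in range(20_000)]

# Empirical interval: 2.5th and 97.5th percentiles of the simulated forecast
cuts = quantiles(draws, n=40)               # 39 cut points: 2.5%, 5.0%, ..., 97.5%
lower_sim, upper_sim = cuts[0], cuts[-1]

# Analytic interval implied by the normal assumption: about Ŷ ± 1.96 * SEP
dist = NormalDist(mu=y_hat, sigma=sep)
lower, upper = dist.inv_cdf(0.025), dist.inv_cdf(0.975)

print(f"simulated 95% interval: [{lower_sim:.2f}, {upper_sim:.2f}]")
print(f"analytic  95% interval: [{lower:.2f}, {upper:.2f}]")
```

With these inputs the analytic interval is about [47.35, 57.15], and the simulated interval should land close to it.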
Growth Forecasts
• Some data display a growth pattern
• Easy to forecast with multiple regression
• Add a T² variable to capture the growth or
decay of the Y variable
• Growth functions
– Quadratic: Ŷ = a + b1 T + b2 T²
– Double log: Log(Ŷ) = a + b1 Log(T)
– Single log: Log(Ŷ) = a + b1 T
• See the Decay Function worksheet for several
examples of handling this problem
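The single- and double-log forms are ordinary one-X regressions on transformed data, so they can be sketched with a simple OLS fit. Natural logs and T = 1, ..., n are assumptions, and the series is hypothetical; the quadratic form is a multiple regression with T and T² as the two Xs.

```python
import math


def simple_ols(x, y):
    """Intercept and slope of a one-X OLS regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b


def fit_single_log(y):
    """Log(Y) = a + b1*T with T = 1..n (natural logs assumed)."""
    T = range(1, len(y) + 1)
    return simple_ols(list(T), [math.log(v) for v in y])


def fit_double_log(y):
    """Log(Y) = a + b1*Log(T) with T = 1..n (natural logs assumed)."""
    T = range(1, len(y) + 1)
    return simple_ols([math.log(t) for t in T], [math.log(v) for v in y])


# Hypothetical growth series: Y = 5 * exp(0.1 * T)
y = [5.0 * math.exp(0.1 * t) for t in range(1, 11)]
a, b1 = fit_single_log(y)
forecast_t12 = math.exp(a + b1 * 12)   # back-transform the log forecast for T = 12
print(f"a = {a:.4f}, b1 = {b1:.4f}, forecast at T = 12: {forecast_t12:.3f}")
```

Because the model is fitted in logs, exp() of the log forecast estimates the median of Y rather than its mean when the log residuals are normal.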
Multiple Regression Forecasts
• Single log form: Log(Yt) = b0 + b1 T
• Double log form: Log(Yt) = b0 + b1 Log(T)
Decay Function Forecasts
• Some data display a decay pattern
• Forecast them with multiple regression
• Add an X variable to capture the growth or
decay of the forecast variable
• Decay function
Ŷ = a + b1(1/T) + b2(1/T²)
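A sketch of estimating the decay function by OLS, treating 1/T and 1/T² as the two Xs. The data and disturbances are hypothetical, T is assumed to run 1, ..., n, and the small Gaussian-elimination solver is included only to keep the example self-contained.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs[i])] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_decay(y):
    """OLS estimates [a, b1, b2] for Ŷ = a + b1*(1/T) + b2*(1/T²), T = 1..n."""
    Z = [[1.0, 1.0 / t, 1.0 / t ** 2] for t in range(1, len(y) + 1)]
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(3)] for i in range(3)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(3)]
    return solve(ZtZ, Zty)


def forecast_decay(beta, t):
    a, b1, b2 = beta
    return a + b1 / t + b2 / t ** 2


# Hypothetical decaying series with small fixed disturbances
u = [0.3, -0.2, 0.1, -0.1, 0.2, -0.3, 0.0, 0.1, -0.1, 0.05, -0.05, 0.0]
y = [10.0 + 20.0 / t + 5.0 / t ** 2 + ui for t, ui in zip(range(1, 13), u)]
beta = fit_decay(y)
print("a, b1, b2 =", [round(v, 3) for v in beta])
print("forecast for T = 15:", round(forecast_decay(beta, 15), 3))
```

As T grows, the 1/T and 1/T² terms shrink and the forecast levels off toward a.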
Forecasting Growth or Decay Patterns
• Here is the regression result for estimating a decay
function
Ŷt = a + b1 (1/Tt)
or
Ŷt = a + b1 (1/Tt) + b2 (1/Tt²)
[Figure: Observed and Predicted Values for KOV, with observed and predicted values and the upper and lower 95% confidence and prediction intervals]
Multiple Regression Forecasts
• Examine a structural regression model that
contains Trend and an X variable
• If Ŷ = a + b1 T + b2 Xt does not explain all of the
variability, seasonal or cyclical variability may be
present; if so, we need to remove its effect
Goodness of Fit Measures
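• Models with high R² may not forecast well
– If you add enough Xs, you can get a high R²
$$R^2 = 1 - \frac{\sum_{t=1}^{T} e_t^2}{\sum_{t=1}^{T} (Y_t - \bar{Y})^2}$$
– R-Bar² is preferred because it adjusts for the number of Xs
• Selecting the model with the highest R² is the same as selecting
the one with the minimum Mean Squared Error (for the same Y series,
since the denominator of R² does not change)
$$\mathrm{MSE} = \frac{\sum_{t=1}^{T} e_t^2}{T}$$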
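A minimal sketch of both measures, using hypothetical observed and fitted values. Because R² = 1 - T * MSE / Σ(Yt - Ȳ)² and that denominator is fixed for a given Y series, ranking models by highest R² and by lowest MSE gives the same order.

```python
def sse(y, y_hat):
    """Sum of squared residuals."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))


def r_squared(y, y_hat):
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse(y, y_hat) / sst


def mse(y, y_hat):
    return sse(y, y_hat) / len(y)


y = [1.0, 2.0, 3.0, 4.0]          # hypothetical observations
y_hat = [1.1, 1.9, 3.2, 3.8]      # hypothetical fitted values
print(f"R2 = {r_squared(y, y_hat):.3f}, MSE = {mse(y, y_hat):.3f}")  # R2 = 0.980, MSE = 0.025
```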
Goodness of Fit Measures
• R-Bar² takes into account the effect of adding Xs
$$\bar{R}^2 = 1 - \frac{s^2}{\sum_{t=1}^{T} (Y_t - \bar{Y})^2 / (T - 1)}$$
where s² is the unbiased estimator of the variance of the
regression residuals
$$s^2 = \left(\frac{T}{T-k}\right) \frac{\sum_{t=1}^{T} e_t^2}{T}$$
and k represents the number of estimated parameters in the model (the Xs plus the intercept)
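The same hypothetical values give s² and R-Bar². Here k is taken to count every estimated parameter, intercept included (an assumption about how k is counted); with k = 1 the adjustment vanishes and R-Bar² equals R².

```python
def s_squared(y, y_hat, k):
    """s² = (T / (T - k)) * (SSE / T); k = estimated parameters incl. intercept (assumed)."""
    T = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return (T / (T - k)) * (sse / T)


def r_bar_squared(y, y_hat, k):
    T = len(y)
    y_bar = sum(y) / T
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - s_squared(y, y_hat, k) / (sst / (T - 1))


y = [1.0, 2.0, 3.0, 4.0]          # hypothetical observations
y_hat = [1.1, 1.9, 3.2, 3.8]      # hypothetical fitted values
print(f"s2 = {s_squared(y, y_hat, k=2):.3f}, R-bar2 = {r_bar_squared(y, y_hat, k=2):.3f}")  # s2 = 0.050, R-bar2 = 0.970
```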
Goodness of Fit Measures
• Akaike Information Criterion (AIC)
$$\mathrm{AIC} = e^{2k/T} \, \frac{\sum_{t=1}^{T} e_t^2}{T}$$
• Schwarz Information Criterion (SIC)
$$\mathrm{SIC} = T^{k/T} \, \frac{\sum_{t=1}^{T} e_t^2}{T}$$
[Figure: penalty factor (vertical axis, 0 to 3.5) versus k/T (horizontal axis, .05 to .25) for s², AIC, and SIC, for T = 100 and k from 1 to 25]
• The SIC imposes the greatest penalty for just adding Xs.
• The AIC imposes the second-greatest penalty, s² a smaller one, and R² none at all.
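A sketch that computes the penalty factors multiplying SSE/T in s², the AIC, and the SIC for T = 100 and several values of k, the same setup as the penalty-factor plot. The factor T/(T - k) for s² follows from its formula; the printed rows let you check that the SIC penalty is largest and the s² penalty smallest.

```python
import math


def penalty_factors(k, T):
    """Factors multiplying SSE/T in s², AIC, and SIC."""
    return T / (T - k), math.exp(2 * k / T), T ** (k / T)


def aic(sse, T, k):
    return math.exp(2 * k / T) * sse / T


def sic(sse, T, k):
    return T ** (k / T) * sse / T


T = 100
for k in (1, 5, 10, 15, 20, 25):
    s2_f, aic_f, sic_f = penalty_factors(k, T)
    print(f"k/T = {k / T:.2f}  s2 {s2_f:.3f}  AIC {aic_f:.3f}  SIC {sic_f:.3f}")
```

At k/T = 0.25 the factors are about 1.33 (s²), 1.65 (AIC), and 3.16 (SIC).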
Goodness of Fit Measures
• Summary of goodness of fit measures
– SIC, AIC, and s² are sensitive to both k and T
– The s² penalty is small and rises slowly as k/T increases
– The AIC and SIC penalties rise faster as k/T increases
– SIC is the most sensitive to increases in k/T
Goodness of Fit Measures
• MSE works best for determining the best model for “in-sample”
forecasting
• R² does not penalize adding Xs (increasing k)
• R-Bar² is based on s², so it provides some penalty
as k increases
• AIC is better than R², but SIC results in the most
parsimonious models (fewest k’s)
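To see the measures disagree, the sketch below scores three hypothetical models fitted to the same T = 30 observations, using made-up SSE values and counting k as estimated parameters including the intercept. For every measure here, smaller is better.

```python
import math


def criteria(sse, T, k):
    """Goodness-of-fit measures from SSE, sample size T, and k estimated parameters."""
    mse = sse / T
    return {
        "MSE": mse,
        "s2": sse / (T - k),
        "AIC": math.exp(2 * k / T) * mse,
        "SIC": T ** (k / T) * mse,
    }


# Hypothetical candidate models fitted to the same T = 30 observations:
# (description, SSE, k = estimated parameters including the intercept)
T = 30
models = [
    ("Trend only", 120.0, 2),
    ("Trend + Price", 95.0, 3),
    ("Trend + Price + 4 more Xs", 88.0, 7),
]

scores = {name: criteria(sse, T, k) for name, sse, k in models}
best = {m: min(scores, key=lambda name: scores[name][m]) for m in ("MSE", "s2", "AIC", "SIC")}
for measure, name in best.items():
    print(f"lowest {measure}: {name}")
```

With these numbers, MSE favors the largest model, while s², AIC, and SIC all pick "Trend + Price".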