Multivariate component calculated from residuals

Materials for Lecture 17
• Chapter 7 – Study this closely
• Chapter 16 Sections 3.9.1-3.9.7 and 4.3
• Lecture 17 Multivariate Empirical Dist.xlsx
• Lecture 17 Multivariate Normal Dist.xlsx
Multivariate Probability Distributions
• Multivariate (MV) Distribution --Two or
more random variables that are correlated
– Can be MV Normal
– Or MV EMP
– Or MV Beta
– Or MV (Normal for X1 and EMP for X2)
• With univariate distributions we have many
distributions (one for each random
variable)
Parameter Estimation for MV Dist.
• Data were generated contemporaneously
– Output observed each year or month,
– Prices observed each year for related commodities
• Corn and sorghum used interchangeably for animal
feed so prices are related
• Steer and heifer prices are related
• Fed steer price and Feeder steer prices are related
– Supply and demand forces affect prices similarly, bear
market or bull market; prices move together
• Prices for tech stocks move together
• Prices for an industry or sector’s stocks move together
Different MV Distributions
• Multivariate Normal distribution – MVN
• Multivariate Empirical – MVE
• Multivariate Mixed where each variable is
distributed differently, such as
– X ~ Uniform
– Y ~ Normal
– Z ~ Empirical
– R ~ Beta
– S ~ Gamma
Sim MV Distribution as Independent
• If correlation is ignored when random
variables are correlated, results are biased
• If Z̃ = Ỹ1 + Ỹ2 OR Z̃ = Ỹ1 * Ỹ2 and the
model is simulated without correlation, so
ρ1,2 = 0
– But the true ρ1,2 > 0, then the model will
understate the risk for Z̃
– But the true ρ1,2 < 0, then the model will
overstate the risk for Z̃
• If Z̃ = Ỹ1 * Ỹ2
– The mean of Z̃ is biased as well
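The bias described on this slide can be checked with a small simulation. The sketch below is only an illustration: the means (10), standard deviations (2), and the true correlation (0.8) are assumed values, not figures from the lecture. It simulates Ỹ1 and Ỹ2 once as independent and once as correlated, then compares the risk of the sum and the product.

```python
import math
import random
import statistics

def simulate_sum_and_product(rho, n=20000, seed=1):
    """Simulate Z = Y1 + Y2 and Z = Y1 * Y2 for bivariate normal Y1, Y2.

    Assumed (illustrative) parameters: both means 10, both std devs 2.
    """
    rng = random.Random(seed)
    mu1, mu2, s1, s2 = 10.0, 10.0, 2.0, 2.0
    sums, products = [], []
    for _ in range(n):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        # Correlate the second deviate with the first (2x2 Cholesky factor).
        c2 = rho * e1 + math.sqrt(1.0 - rho ** 2) * e2
        y1 = mu1 + s1 * e1
        y2 = mu2 + s2 * c2
        sums.append(y1 + y2)
        products.append(y1 * y2)
    return {
        "sd_sum": statistics.stdev(sums),
        "sd_product": statistics.stdev(products),
        "mean_product": statistics.fmean(products),
    }

independent = simulate_sum_and_product(rho=0.0)  # correlation ignored
correlated = simulate_sum_and_product(rho=0.8)   # assumed true correlation

for key in ("sd_sum", "sd_product", "mean_product"):
    print(key, round(independent[key], 2), round(correlated[key], 2))
```

With a positive true correlation, the independent run should report a smaller standard deviation for both Z definitions and a lower mean for the product, which is the understatement of risk and the biased mean noted above.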
Parameters for a MVN Distribution
• Deterministic component
– Ŷij -- a vector of means or predicted values for the period i to
simulate all of the j variables, for example:
Ŷij = ĉ0 + ĉ1 X1 + ĉ2 X2
• Stochastic component
– êij -- a matrix of residuals from the predicted or mean values for
each (j) of the M random variables
êij = Yij – Ŷij and the Std Dev of the residuals σêj
• Multivariate component calculated from residuals
– Covariance matrix (Σ) for all M random variables in the distribution
MxM covariance matrix (in the general case use correlation matrix)
– Estimate the covariance (or correlation) matrix using residuals
about the forecast (or the deterministic component)
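\[
\Sigma =
\begin{bmatrix}
\sigma^2_{11} & \sigma_{12} & \sigma_{13} & \sigma_{14} \\
 & \sigma^2_{22} & \sigma_{23} & \sigma_{24} \\
 & & \sigma^2_{33} & \sigma_{34} \\
 & & & \sigma^2_{44}
\end{bmatrix}
\quad \text{OR} \quad
P =
\begin{bmatrix}
1 & \rho_{12} & \rho_{13} & \rho_{14} \\
 & 1 & \rho_{23} & \rho_{24} \\
 & & 1 & \rho_{34} \\
 & & & 1
\end{bmatrix}
\]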
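As a rough illustration of the multivariate component, the sketch below computes residuals about the deterministic component and then the covariance and correlation matrices from those residuals. The observed and predicted values are made-up numbers for two variables, and the function names are hypothetical helpers, not Simetar functions.

```python
import math

def covariance_matrix(series_list):
    """Sample covariance matrix (divisor n - 1) for equal-length series."""
    n = len(series_list[0])
    m = len(series_list)
    means = [sum(series) / n for series in series_list]
    return [
        [
            sum(
                (series_list[j][i] - means[j]) * (series_list[k][i] - means[k])
                for i in range(n)
            ) / (n - 1)
            for k in range(m)
        ]
        for j in range(m)
    ]

def correlation_matrix(cov):
    """Convert a covariance matrix to a correlation matrix."""
    m = len(cov)
    return [
        [cov[j][k] / math.sqrt(cov[j][j] * cov[k][k]) for k in range(m)]
        for j in range(m)
    ]

# Hypothetical observed values (Y) and predicted values (Y-hat) for two variables.
y1 = [100.0, 104.0, 109.0, 111.0, 118.0]
y1_hat = [101.0, 105.0, 108.0, 112.0, 116.0]
y2 = [50.0, 51.0, 57.0, 57.0, 62.0]
y2_hat = [51.0, 52.0, 56.0, 57.0, 61.0]

# Stochastic component: residuals about the deterministic component.
residuals = [
    [y - y_hat for y, y_hat in zip(y1, y1_hat)],
    [y - y_hat for y, y_hat in zip(y2, y2_hat)],
]

sigma = covariance_matrix(residuals)  # M x M covariance matrix
rho = correlation_matrix(sigma)       # M x M correlation matrix
```

The key point is that both matrices are estimated from the residuals, not from the raw data, so any trend captured by the deterministic component does not inflate the estimated risk.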
3 Variable MVN Distribution
• Deterministic component for three random variables
– Ĉi = a + b1Ci-1
– Ŵi = a + b1Ti + b2 Wi-1
– Ŝi = a + b1Ti
• Stochastic component
– êCi = Ci – Ĉi
– êWi = Wi – Ŵi
– êSi = Si – Ŝi
• Multivariate component calculated from the residuals
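\[
\Sigma =
\begin{bmatrix}
\sigma^2_{cc} & \sigma_{cw} & \sigma_{cs} \\
 & \sigma^2_{ww} & \sigma_{ws} \\
 & & \sigma^2_{ss}
\end{bmatrix}
\]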
Simulating MVN in Simetar
• One Step procedure for a 4 variable MVN
Highlight 4 cells if the distribution is for 4 variables, type
=MVNORM( 4x1Means Vector, 4x4 Covariance Matrix)
=MVNORM( A1:A4 , B1:E4)
Control Shift Enter
where:
the 4 means or forecasted values are in column A
rows 1-4,
covariance matrix is in columns B-E and rows 1-4
• If you use the historical means, the MVN will
validate perfectly, but only forecasts (simulates)
the future if the data are stationary.
• If you use forecasts rather than means, the
validation test fails for the mean vector.
– The CV will differ inversely from the historical CV as the
means increase or decrease relative to history
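For readers who want to see what a one step MVN draw involves outside of Excel, here is a minimal sketch using the standard textbook approach: factor the covariance matrix (Cholesky) and add the factored, independent standard normal deviates to the means. This is not a description of how Simetar implements =MVNORM(); the means, covariance matrix, and function names are assumptions for illustration.

```python
import math
import random

def cholesky_lower(matrix):
    """Return lower-triangular L with L * L' = matrix (must be positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower

def draw_mvn(means, lower, rng):
    """One correlated draw: means + L * z, with z independent standard normals."""
    z = [rng.gauss(0.0, 1.0) for _ in means]
    return [
        means[i] + sum(lower[i][k] * z[k] for k in range(i + 1))
        for i in range(len(means))
    ]

# Hypothetical inputs: a 3x1 vector of means (or forecasts) and a 3x3 covariance matrix.
means = [10.0, 20.0, 30.0]
cov = [[4.0, 2.0, 1.0],
       [2.0, 9.0, 3.0],
       [1.0, 3.0, 16.0]]

rng = random.Random(42)
lower = cholesky_lower(cov)
draws = [draw_mvn(means, lower, rng) for _ in range(20000)]
```

Replacing `means` with forecasted values shifts the simulated distribution while leaving the covariance unchanged, which is why the coefficient of variation moves inversely to the means.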
Example of Mean vs. Y-Hat Problem for Validation
[Figure: line chart of X, X-Bar, and Y-Hat for observations 1-23; vertical axis from 0 to 180]
Simulating MVN in Simetar
• Two Step procedure for a 4 variable MVN
Highlight 4 cells if the distribution is for 4
variables, and type
=CUSD (Location of Correlation Matrix)
Control Shift Enter
=CUSD (B1:E4) for a 4x4 correlation matrix in cells
B1:E4
Next use the individual CUSDs to calculate the
random values, using Simetar NORM function:
For Ỹ1 = NORM( Mean1 , σ1 , CUSD1 )
For Ỹ2 = NORM( Mean2 , σ2 , CUSD2 )
For Ỹ3 = NORM( Mean3 , σ3 , CUSD3 )
For Ỹ4 = NORM( Mean4 , σ4 , CUSD4 )
• Use Two Step if you want more control of the
process
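The two step logic can be sketched in Python as follows. The source does not say how Simetar computes CUSDs internally, so this sketch assumes a common approach: generate correlated standard normal deviates from the factored correlation matrix, then convert each to a uniform deviate with the standard normal CDF. The second step mirrors NORM(Mean, σ, CUSD) by applying the inverse normal CDF. All parameter values and function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def cholesky_lower(matrix):
    """Return lower-triangular L with L * L' = matrix (must be positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower

def correlated_uniforms(lower, rng):
    """Step 1: correlated uniform deviates (assumed Gaussian-copula approach)."""
    z = [rng.gauss(0.0, 1.0) for _ in lower]
    correlated_normals = [
        sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(len(lower))
    ]
    return [STD_NORMAL.cdf(value) for value in correlated_normals]

def norm_from_uniform(mean, std_dev, u):
    """Step 2: inverse-transform a uniform deviate into a normal random value."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
    return mean + std_dev * STD_NORMAL.inv_cdf(u)

# Hypothetical parameters for a 2 variable distribution.
corr = [[1.0, 0.8], [0.8, 1.0]]
means = [100.0, 50.0]
std_devs = [10.0, 5.0]

rng = random.Random(7)
lower = cholesky_lower(corr)
simulated = []
for _ in range(10000):
    u = correlated_uniforms(lower, rng)
    simulated.append(
        [norm_from_uniform(m, s, ui) for m, s, ui in zip(means, std_devs, u)]
    )
```

Separating the two steps is what gives the extra control: the same uniform deviates can be fed into a different inverse distribution for each variable.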
MVN Distribution Validation
• Demonstrate MVN for a distribution with 3 variables
• One step procedure in line 63
• Means in row 55 and covariance matrix in B58:D60
• Validation test shows the random variables maintained historical
covariance
Two Step MVN Distribution
Review Steps for MVN
• Develop parameters
– Calculate averages (and standard deviations used for two
step procedure)
– Calculate Covariance matrix
– Calculate Correlation matrix (Used for Two Step procedure
and for validation of One Step procedure)
• One Step MVN procedure is easier
• Use Two Step MVN procedure for more
control of the process
• Validate simulated MVN values vs. historical
series
– If you use different means than in history, the validation test
for means vector WILL fail
Parameters for MV Empirical
• Step I Deterministic component for three random variables
– Ĉi = a + b1Ci-1
– Ŵi = a + b1Ti + b2 Wi-1
– Ŝi = a + b1Ti
• Step II Stochastic component calculated from residuals
– êCi = Ci – Ĉi
– êWi = Wi – Ŵi
– êSi = Si – Ŝi
• Step III Calculate the stochastic empirical distribution’s parameters
– SCi = Sorted (êCi / Ĉi)
– SWi = Sorted (êWi / Ŵi)
– SSi = Sorted (êSi / Ŝi)
• Step IV Multivariate component is a correlation matrix calculated using
unsorted residuals in Step II
\[
P =
\begin{bmatrix}
1 & \rho_{\hat{e}_C,\hat{e}_W} & \rho_{\hat{e}_C,\hat{e}_S} \\
 & 1 & \rho_{\hat{e}_W,\hat{e}_S} \\
 & & 1
\end{bmatrix}
\]
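Steps II through IV can be sketched for two of the variables as follows. The actual and predicted values are invented for illustration, and the helper names are hypothetical. Note that the sorted fractional deviations define each empirical distribution, while the correlation is computed from the unsorted residuals so that the pairing of observations across variables is preserved.

```python
import math

def residuals(actual, predicted):
    """Step II: residuals about the deterministic component."""
    return [a - p for a, p in zip(actual, predicted)]

def sorted_fractional_deviations(actual, predicted):
    """Step III: residuals as a fraction of the prediction, sorted ascending."""
    return sorted((a - p) / p for a, p in zip(actual, predicted))

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical history (C, W) and Step I predictions (C-hat, W-hat).
c_actual = [100.0, 110.0, 95.0, 120.0, 105.0]
c_hat = [102.0, 106.0, 100.0, 112.0, 108.0]
w_actual = [50.0, 56.0, 47.0, 60.0, 52.0]
w_hat = [51.0, 54.0, 50.0, 57.0, 53.0]

s_c = sorted_fractional_deviations(c_actual, c_hat)  # SCi
s_w = sorted_fractional_deviations(w_actual, w_hat)  # SWi

# Step IV: correlation from the UNSORTED residuals.
rho_cw = pearson(residuals(c_actual, c_hat), residuals(w_actual, w_hat))
```

Sorting before computing the correlation would destroy the year-by-year pairing and overstate the correlation, which is why Step IV goes back to the Step II residuals.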
Simulating MVE in Simetar
• One Step procedure for a 4 variable MVE
Highlight 4 cells if the distribution is for 4 variables, then type
=MVEMP( Location Actual Data ,,,, Location Y-Hats, Option)
Option = 0 use actual data
Option = 1 use Percent deviations from Mean
Option = 2 use Percent deviations from Trend
Option = 3 use Differences from Mean
End this function with Control Shift Enter
=MVEMP(B5:D14 ,,,, G7:I6, 2)
Where the 10 observations for the 3 random variables are in rows 5-14 of
columns B-D, and the variables are simulated as percent deviations from trend (Option = 2)
Two Step MVE
• Two Step procedure for a 4 variable MVE
Highlight 4 cells if the distribution has 4 random variables, type
=CUSD( Location of Correlation Matrix) Control Shift Enter
=CUSD( A12:A15)
Next use the CUSDs to calculate the random values
(Mean here could also be Ŷ)
For Ỹ1 = Mean1 *(1+ Empirical(S1, F(Si) , CUSD1) )
For Ỹ2 = Mean2 * (1 + Empirical(S2, F(Si) , CUSD2) )
For Ỹ3 = Mean3 * (1 + Empirical(S3, F(Si) , CUSD3) )
For Ỹ4 = Mean4 * (1 + Empirical(S4, F(Si) , CUSD4) )
• Use Two Step if you want more control of the
process
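The second step of the two step MVE procedure is an inverse transform on the empirical distribution. The sketch below assumes the cumulative probabilities F(Si) are equally spaced from 0 to 1 across the sorted deviates and that values between points are linearly interpolated; Simetar's own probability convention may differ. The sorted deviates, means, and CUSD values are hypothetical, and the CUSDs are taken as given rather than generated.

```python
from bisect import bisect_right

def empirical_inverse(sorted_devs, probs, u):
    """Linearly interpolate the inverse empirical CDF at probability u."""
    if u <= probs[0]:
        return sorted_devs[0]
    if u >= probs[-1]:
        return sorted_devs[-1]
    k = bisect_right(probs, u)  # probs[k - 1] <= u < probs[k]
    weight = (u - probs[k - 1]) / (probs[k] - probs[k - 1])
    return sorted_devs[k - 1] + weight * (sorted_devs[k] - sorted_devs[k - 1])

def simulate_value(mean, sorted_devs, probs, u):
    """Mean * (1 + Empirical(S, F(S), CUSD)); mean could also be a forecast."""
    return mean * (1.0 + empirical_inverse(sorted_devs, probs, u))

# Hypothetical sorted fractional deviations for two variables.
devs_1 = [-0.10, -0.04, 0.00, 0.05, 0.12]
devs_2 = [-0.08, -0.02, 0.01, 0.03, 0.06]
probs = [0.0, 0.25, 0.5, 0.75, 1.0]  # assumed F(Si), equally spaced

cusd = [0.625, 0.10]    # hypothetical correlated uniform deviates
means = [200.0, 80.0]   # hypothetical means (or Y-hats)

y1 = simulate_value(means[0], devs_1, probs, cusd[0])
y2 = simulate_value(means[1], devs_2, probs, cusd[1])
```

Because the deviates are fractions of the mean or forecast, the simulated values scale with whatever mean is supplied, which keeps the relative risk constant as forecasts change.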
Parameter Estimation for MVE
Simulate a MVE Distribution
If Cannot Factor Correl Matrix
• When the Matrix is not Positive Semi Definite
use “Always Calculate” Option
Correlation matrix
          Yield 1   Yield 2   Yield 3
Yield 1   1         0.96      0.9
Yield 2             1         0.263039
Yield 3                       1

Factored matrix: =MSQRT($C$4:$E$6)
          Yield 1   Yield 2   Yield 3
Yield 1   #VALUE!   #VALUE!   #VALUE!
Yield 2   #VALUE!   #VALUE!   #VALUE!
Yield 3   #VALUE!   #VALUE!   #VALUE!

Factored Matrix with the Always Calculate Option Turned On: =MSQRT($C$4:$E$6,,,,TRUE)
          Yield 1   Yield 2   Yield 3
Yield 1   -0.372    0.749664  0.9
Yield 2   0         0.964785  0.263039
Yield 3   0         0         1
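One way to see why the Yield correlation matrix cannot be factored is to attempt a standard Cholesky factorization and watch for a non-positive pivot. The sketch below does this in plain Python. It tests for positive definiteness, which is slightly stricter than positive semi-definiteness, and it does not reproduce Simetar's "Always Calculate" behavior. The second matrix is a made-up valid correlation matrix included for contrast.

```python
import math

def cholesky_or_none(matrix):
    """Return the lower Cholesky factor, or None if a pivot is not positive."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    return None  # square root of a non-positive number: cannot factor
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower

# Correlation matrix from the slide (Yield 1, Yield 2, Yield 3).
yield_corr = [[1.0, 0.96, 0.9],
              [0.96, 1.0, 0.263039],
              [0.9, 0.263039, 1.0]]

# Hypothetical valid correlation matrix for comparison.
valid_corr = [[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.2],
              [0.3, 0.2, 1.0]]

bad_factor = cholesky_or_none(yield_corr)   # None: matrix cannot be factored
good_factor = cholesky_or_none(valid_corr)  # a 3x3 lower-triangular factor
```

The intuition is that Yield 1 cannot be correlated 0.96 with Yield 2 and 0.9 with Yield 3 while Yield 2 and Yield 3 are correlated only 0.26 with each other; the three coefficients are mutually inconsistent.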
If Cannot Get CUSDs or CSNDs
• When the Matrix is not positive semi definite
then you cannot calculate CUSDs or CSNDs,
and the One Step functions fail
• In that case use “Always Calculate” Option
                           Yield 1    Yield 2    Yield 3
CUSDs with a bad matrix    #VALUE!    #VALUE!    #VALUE!    =CUSD($C$4:$E$6)
CSNDs with a bad matrix    #VALUE!    #VALUE!    #VALUE!    =CSND(C4:E6)

Use the Always Calculate Option
                           Yield 1    Yield 2    Yield 3
CUSD                       0.963801   0.968456   0.69574    =CUSD(C4:E6,,,,TRUE)
CSND                       -0.67859   0.055371   -0.99369   =CSND(C4:E6,,,TRUE)
MV Mixed Distributions
• What if you need to simulate a MV
distribution made up of variables that are not
all Normal or all Empirical? For example:
– X is ~ Normal
– Y is ~ Beta
– T is ~ Gamma
– Z is ~ Empirical
• Develop parameters for each variable
• Estimate the correlation matrix for the
random variables in the distribution
MV Mixed Distributions
• Simulate a vector of Correlated Uniform
Standard Deviates using =CUSD() function
=CUSD( correlation matrix ) is an array
function so highlight the number of cells
that matches the number of variables in
the distribution
• Use the CUSDi values in the appropriate
Simetar functions for each random variable
=NORM(Mean, Std Dev, CUSD1)
=BETAINV(CUSD2, Alpha, Beta)
=GAMMAINV(CUSD3, P1, P2)
=Mean*(1+EMP(Si, F(Si), CUSD4))
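A mixed MV distribution can be sketched in Python by feeding one vector of correlated uniform deviates into a different inverse distribution per variable. The standard library has no inverse Beta or Gamma CDF, so this sketch substitutes an Exponential marginal (a Gamma with shape 1, which has a closed-form inverse) next to Normal and Empirical marginals; with SciPy installed, `scipy.stats.beta.ppf` and `scipy.stats.gamma.ppf` would play the role of BETAINV and GAMMAINV. The correlated uniforms are generated with an assumed Gaussian-copula approach, and all parameters are hypothetical.

```python
import math
import random
from bisect import bisect_right
from statistics import NormalDist

STD_NORMAL = NormalDist()

def cholesky_lower(matrix):
    """Return lower-triangular L with L * L' = matrix (must be positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower

def correlated_uniforms(lower, rng):
    """Correlated uniform deviates, clamped strictly inside (0, 1)."""
    z = [rng.gauss(0.0, 1.0) for _ in lower]
    normals = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(len(lower))]
    return [min(max(STD_NORMAL.cdf(v), 1e-12), 1.0 - 1e-12) for v in normals]

def normal_inv(u, mean, std_dev):
    return mean + std_dev * STD_NORMAL.inv_cdf(u)

def exponential_inv(u, scale):
    """Inverse CDF of an Exponential (Gamma with shape 1) distribution."""
    return -scale * math.log(1.0 - u)

def empirical_inv(u, sorted_devs, probs):
    """Linearly interpolated inverse empirical CDF."""
    if u <= probs[0]:
        return sorted_devs[0]
    if u >= probs[-1]:
        return sorted_devs[-1]
    k = bisect_right(probs, u)
    weight = (u - probs[k - 1]) / (probs[k] - probs[k - 1])
    return sorted_devs[k - 1] + weight * (sorted_devs[k] - sorted_devs[k - 1])

# Hypothetical correlation matrix and marginal parameters.
corr = [[1.0, 0.8, 0.5],
        [0.8, 1.0, 0.3],
        [0.5, 0.3, 1.0]]
sorted_devs = [-0.10, -0.04, 0.00, 0.05, 0.12]
probs = [0.0, 0.25, 0.5, 0.75, 1.0]

rng = random.Random(11)
lower = cholesky_lower(corr)
draws = []
for _ in range(5000):
    u1, u2, u3 = correlated_uniforms(lower, rng)
    x = normal_inv(u1, 100.0, 10.0)                        # X ~ Normal
    t = exponential_inv(u2, 5.0)                           # T ~ Exponential (stand-in)
    z = 200.0 * (1.0 + empirical_inv(u3, sorted_devs, probs))  # Z ~ Empirical
    draws.append((x, t, z))
```

Because the dependence is carried entirely by the uniform deviates, each marginal can be swapped out without touching the correlation step.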
Validation of MV Distributions
• Simulate the model and specify the random variables as
the KOVs then test the simulated random values
• Perform the following tests
– Use the Compare Two Series Tab in HoHi to:
• Test means for the historical series or the forecasted means vs. the
simulated means
• Test means and covariance for historical series vs. simulated
– Use the Check Correlation Tab to test the correlation matrix used
as input for the MV model vs. the implied correlation in the
simulated random variables
• Null hypothesis (Ho) is:
Simulated correlationij = Historical correlation coefficientij
• Critical t statistic is 1.98 for 100 iterations; if the Null hypothesis is true
the calculated t statistics will not exceed 1.98
• Use caution on means tests if your forecasted Ŷ is
different from the historical Ῡ