USING RETURN LEVEL IN A STATISTICAL
MODEL FOR THE JOINT DISTRIBUTION
OF THE EXTREME VALUES
OF EQUITIES
by
MARK LARRY LABOVITZ
AB, The George Washington University, 1971
MS, The Pennsylvania State University, 1976
MA, The Pennsylvania State University, 1977
PhD, The Pennsylvania State University, 1978
ApSc, The George Washington University, 1981
MBA, The University Of Pennsylvania, 1989
MS, Regis University, 2003
MS, The University Of Colorado, 2008
A thesis submitted to the
University of Colorado at Denver/Health Sciences Center
in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Department of Mathematical and Statistical Sciences
2009
© 2009 by Mark Larry Labovitz
All rights reserved.
This thesis for the Doctor of Philosophy
degree by
Mark Larry Labovitz
has been approved
by
_______________________________________
Stephan R. Sain
_______________________________________
Peter G. Bryant
_______________________________________
Daniel S. Cooley
______________________________________
Michael S. Jacobson
______________________________________
Craig J. Johns
______________________________________
Weldon A. Lodwick
_________________________
Date
Labovitz, Mark L. (Ph.D., Mathematical and Statistical Sciences)
Using Return Level In A Statistical Model For The Joint Distribution
Of The Extreme Values Of Equities
Thesis directed by Thesis Advisor Stephan R. Sain
ABSTRACT
Taking risk for the sake of financial reward is a long-accepted reality of investing;
however, it has been shown that investing using statistically more precise models of
risk is associated with higher expected and realized returns. Two additional
observations guide this research. Firstly, if financial returns are viewed as a
frequency distribution, it is the downside, or left-hand tail, of returns which most
concerns investors. Secondly, for most investors the joint risk inherent in a
collection of securities held (a portfolio) is of greater interest than the risk
associated with any given single security. Thus the concern is the joint behavior of
downside returns from a portfolio.
In this research, the following hypotheses were examined:
1) Generalized Extreme Value (GEV) distributions are suitable for describing
the probabilistic nature of the “downside” return behavior of financial
securities.
2) The three parameters of the GEV distribution, rather than being constant,
are functions of financial indicators, and as such are time-varying.
3) Multivariate or joint extreme behavior of securities can be described and
modeled as the joint behavior of time-varying return values, instead of a
more commonly used dependence function, such as a copula.
4) Functions of the return values can be used to improve the characterization
of risk and thereby the returns from portfolio construction.
For 3,000 equities randomly selected from those publicly traded on exchanges,
daily performance measures were collected from January 2000 to August 2007.
Weekly block minima of the returns were computed for each equity series. Time-varying
GEVs were fitted using 44 financial covariates (down-selected from 139).
Ninety-five percent of the time-varying models showed significant improvement over the static
model. Return values from these distributions were modeled satisfactorily as a
Gaussian process, using both fixed and random effects representing important
ancillary factors such as market capitalization, business sector and stock exchange.
Finally, when the error or nugget variance is used in the description of risk for
portfolio formation, the financial portfolios so formed outperformed conventionally
used models by greater than 300% over the time frame of the study.
This abstract accurately represents the content of the candidate’s thesis. I
recommend its publication.
Signed ________________________________
Stephan R. Sain
Acknowledgements
When a fifty-something attempts a doctoral program, the threads of support are
widely cast and numerous. It is no exception for this author. Firstly, the author
would like to acknowledge the support of the faculty members in the Department of
Mathematical and Statistical Sciences at the University of Colorado at Denver. I
would like to call out three of the faculty (and former faculty) in particular: Dr.
Craig Johns, with whom I spent long hours talking about statistics; Dr. Richard
Lundgren, who was always there with a kind word of encouragement; and Dr.
Stephan Sain, my advisor, without whose enthusiasm and insights I could not have
succeeded in completing this research. The author wishes to thank his colleagues
at Lipper and Thomson Reuters, in particular the Lipper COO Eric Almquist (who
actively made certain I had everything I needed), the late Jed McKnight, my first
boss at Lipper, an incredibly knowledgeable and supportive soul, Barb Durland,
editor par-excellence, Hank Turowski and Jeff Kenyon who offered wise criticisms
and last but not least my colleague, my boss and my mentor in finance Andrew
Clark. To my family: thanks to my wife Susan for the love, time and support you
gave me; this is my last degree, you now have it in writing. Thanks to my children
Leah and Edward, who listened to me with semi-patience whenever I would
babble on about a new discovery and would respond with an encouraging
"whatever." I remember my late mother, Florence Labovitz; while a poor woman,
she gave me the most important gifts: faith in people and causes, championing the
underdog, and never forgetting where you came from. My mother was of the greatest
generation and I know that I am unlikely to meet another of her courage and grit in
my lifetime. Finally to my uncle Erwin Newman, an excellent mathematician, a
tutor, and a very gentle man, who died much too young and whom I think about
often. For these two and others un-named who have departed and left their mark
on me:
“…even when they are gone, the departed are with us, moving us to live
as, in their higher moments, they themselves wished to live. We
remember them now; they live in our hearts; they are an abiding
blessing.” (From a Jewish meditation on the departed)
TABLE OF CONTENTS
Figures ..................................................................................................... xii
Tables ...................................................................................................... xv
Chapters
1. Background And Literature Review .................................................................... 1
1.1 Overview Of The Chapter ................................................................................ 1
1.2 Introduction....................................................................................................... 4
1.3 Financial Risk And Normality .......................................................................... 4
1.4 Models Of Risk-Reward ................................................................................... 6
1.4.1 Mean Variance Optimization ......................................................................... 7
1.4.2 VaR And CVaR ........................................................................................... 10
1.4.3 Generalized Capital Asset
Pricing Model (G-CAPM) ........................................................................... 12
1.5 Modeling Risk As It Affects Reward .............................................. 15
1.6 Extremes And The Central Limit Theorem .................................... 17
1.7 Order Statistics ............................................................................... 21
1.8 Statistics Of Extremes..................................................................... 22
1.8.1 Existence Of Extreme Value Distributions.................................................. 23
1.8.2 Extreme Value Distributions ....................................................................... 27
1.8.3 Unification Through GEV Distributions
And Extreme Maximas ................................................................................ 29
1.8.4. Extreme Minima .......................................................................................... 31
1.9 Extreme Value Generation Models ................................................................ 33
1.9.1 Block Maxima Model .................................................................................. 33
1.9.2 Peaks Over Thresholds Model ..................................................................... 33
1.9.3 Poisson Generalized Pareto Model .............................................................. 35
1.9.4 Relation Between Extrema Models ............................................................. 36
1.10 Parameter Estimation ...................................................................................... 38
1.11 Departures From Independence ...................................................................... 41
1.11.1 Threshold Clustering ................................................................................... 42
1.11.2 Serial Correlation Effects ............................................................................ 42
1.12 Return Values ................................................................................................. 47
1.13 Multivariate Extreme Distributions ................................................................ 48
1.13.1 Dependence Functions And Copulas ........................................................... 51
1.14 Organization Of Remainder Of Dissertation .................................................. 53
2. Statement Of The Problem And Outline
Of The Proposed Research ................................................................................ 54
2.1 Overview Of The Chapter .............................................................. 54
2.2 Research Threads ............................................................................ 56
2.3 Ingest, Clean, And Reformat Data .................................................. 62
2.4 Fitting Time-Varying GEVs ........................................................... 64
2.5 Computing Return Values And Developing A
Dependence Function Model .......................................................... 66
2.6 Portfolio And Innovation Processing.............................................................. 68
3. Data, Data Analysis Plan, And Pre-Analysis
Data Manipulations ............................................................................................ 69
3.1 Overview Of The Chapter .............................................................................. 69
3.2 Classification Of Data Types .......................................................................... 72
3.3 Performance Data ........................................................................................... 73
3.4 Ancillary Data ................................................................................................. 73
3.5 Market Capitalization ..................................................................................... 78
3.6 Equity Liquidity .............................................................................................. 83
3.7 Covariates ....................................................................................................... 89
3.7.1 Sub-Selecting Covariates ............................................................................. 91
3.8 A Couple Of Final Words On Data Organization .......................................... 98
4. Fitting Time-Varying GEVs ............................................................................ 100
4.1 Overview Of The Chapter ............................................................................ 100
4.2 GEV Reprise ................................................................................................. 106
4.3 Non-Time-Varying Model ............................................................................ 108
4.4 Block Size And Distribution ......................................................................... 112
4.5 The Time-Varying Model ............................................................................. 125
4.5.1 Matrix Form Of Relationships Between Time-Varying Covariates And GEV Parameters ................................................ 127
4.6 Examining Covariates ................................................................................... 128
4.6.1 Time’s Arrow And Periodic Covariates .................................................... 128
4.6.2 Financial Markets And Economic Covariates ........................................... 133
4.7 The Full Fitting Of Time-Varying GEVs ..................................................... 137
4.7.1 Stepwise Model ......................................................................................... 137
4.8 Analyzing The Covariate Models ................................................................. 142
5. Estimating Time-Varying Return Levels
And Modeling Return Levels Jointly .............................................................. 144
5.1 Overview Of The Chapter ............................................................................ 144
5.2 A Brief Recap To This Point ........................................................................ 149
5.3 Computing Return Value Levels .................................................................. 149
5.4 The Variance Of Return Values ................................................................... 151
5.5 Multivariate Model ....................................................................................... 154
5.6 Model Building ............................................................................................. 158
5.6.1 Fixed-Effect Models .................................................................................. 158
5.7 Examining Factors As Sources Of Variability ............................................. 179
5.8 Further Modeling .......................................................................................... 183
5.8.1 Step 1–Further Fixed-Effect Modeling ...................................................... 183
5.8.2 Step 2–Fitting Gaussian Processes Using
Maximum-Likelihood Estimation ............................................................. 186
5.9 The Selected Model, In Detail ...................................................................... 196
6. Consequences Of The Research For Portfolio
Formation And Innovation .............................................................................. 201
6.1 Overview Of The Chapter ............................................................................ 201
6.2 Tasks To Be Performed In The Chapter ....................................................... 205
6.3 Test Data Sets ............................................................................................... 205
6.4 Model Validation .......................................................................................... 206
6.4.1 Expectations Of The Test Set .................................................................... 208
6.4.2 Variability And Distribution
In The Test Data Set .................................................................................. 212
6.5 Application Of Model Results
To Portfolio Formation ................................................................................. 214
6.5.1 Construction Of Efficient Frontiers ........................................................... 221
6.5.2 Applying The MVO Weights .................................................................... 227
6.6 Predicting The Best Covariance Structure .................................................... 233
7. Summary And Conclusions, Along With
Thoughts On The Current Research ................................................................ 239
7.1 Overview Of The Chapter ............................................................................ 239
7.2 Summary ....................................................................................................... 240
7.3 Conclusions................................................................................................... 249
7.4 Unique Aspects Of The Research ................................................................. 250
7.5 Future Research ............................................................................................ 252
Appendix
A. List of Countries, Stock Exchanges, Industries
And Sectors Used In This Research ................................................................ 255
B. Detailed Analyses Of The Covariate Models .................................................. 259
Bibliography
.................................................................................................... 278
Sources Of Data .................................................................................................... 287
LIST OF FIGURES
Figure
1.1  Example Of Efficient Frontier Built Upon Mean Variance Optimization ........................ 8
1.2  Probability Density Function Of Exponential Distribution With Rate λ = 1 ................... 17
1.3  Normal Density Function Superimposed On Histogram Of 10,000 Means Of Sample Size 50 Drawn From An Exp(1) Distribution ........ 19
1.4  Normal Density Function Superimposed On Histogram Of 10,000 Maxima Of Sample Size 50 Drawn From An Exp(1) Distribution ....... 19
1.5  Extreme-Value Distributions Depicted On The Same Plot ...................................... 28
1.6  Relationships Between Return Values ........................................................ 49
2.1  High-Level Data Flow Diagram Depicting The Organization Of Research Elements In This Thesis ....... 61
3.1  Log-Log Plot Of Market Caps Versus Rank For 51,590 Equity Series By Continent Of Domicile, Overlaid By Market Caps As Defined In Table 3.4 ....... 80
3.2  Notched Box And Whisker Plot Of Logged Market Capitalization Of 51,590 Equity Series ....... 81
3.3  Log-Log Plot Of Market Caps Versus Rank For 15,528 Equity Series By Continent Of Domicile, Overlaid By Market Caps As Defined In Table 3.4 ....... 82
3.4  Notched Box And Whisker Plot Of Logged Market Capitalization Of 15,528 Equity Series ....... 83
4.1  Plot Of Estimated Medians Of GEV Parameters Over Different Time Units Expressed In Weeks ....... 118
4.2  Estimates Of GEV Median Means And Standard Deviations Over Different Time Units Expressed In Weeks ....... 119
4.3  Estimates Of GEV Median Skewness And Kurtosis Statistics Over Different Time Units Expressed In Weeks ....... 120
4.4  Histograms Of Extremes On Different Time Scales ............................................ 121
5.1  Histograms Of Log Return Values For Selected Years For P[X ≥ x] ≤ 0.019, Censored At A Value Of 2,000 Percent ....... 161
5.2  Histograms Of Log Return Values For Selected Years For P[X ≥ x] ≤ 0.019, Censored At A Value Of 1,000 Percent ....... 162
5.3  Histograms Of Log Return Values For Selected Years For P[X ≥ x] ≤ 0.019, Censored At A Value Of 500 Percent ....... 163
5.4  Notched Box-Plots Of 52-Week Logged Return Value Aggregated By Continent For (A) Year 2001 And (B) Year 2007 ....... 166
5.5  Notched Box-Plots Of 52-Week Logged Return Value Aggregated By Sector For (A) Year 2001 And (B) Year 2007 ....... 167
5.6  Notched Box-Plots Of 52-Week Logged Return Value Aggregated By Market Cap For (A) Year 2001 And (B) Year 2007 ....... 168
5.7  Notched Box-Plots Of 52-Week Logged Return Value Aggregated By Exchange For (A) Year 2001 And (B) Year 2007 ....... 169
5.8  QQNorm Plots Of Standardized Residuals From Three-Factor Fixed-Effects Model For Years 2001 To 2007 ....... 175
5.9  Scatter Plot Of Standardized Residuals Versus Fitted Values From Three-Factor Fixed-Effects Model For 2001 ....... 178
5.10 Plot Of Mean And +/- Two Standard Deviations Of Logarithm Of Market Cap Versus Logarithm Of Mean Return Value ....... 180
5.11 Plot Of Year Versus Logarithm Of Mean Return Value ......................................... 181
5.12 Plot Of Market Capitalization Versus Standardized Residuals From Model Composed Of Earlier Three Factors Augmented By Market Capitalization As A Continuous Predictor ....... 182
5.13 Selected QQNorm Plots Of Standardized Residuals From Various Market Capitalization For 2001 Coming From The Selected Model ....... 195
6.1  Plot Of Estimated Mean Response For The Test Dataset, Using Coefficients Based On The Training Set Fixed-Effects Model Versus The Logged 52-Week Return Response ....... 209
6.2  Mean Response And Observation Values Within 95% Prediction Interval Envelopes, Using Individually Derived And Scheffé-Adjusted Prediction Intervals ....... 211
6.3  Graphs Depicting The Results In The Formation Of Efficient Frontiers For The Sigma Covariance Structure, Rebalance Year 2002 ....... 224
6.4  Graphs Depicting The Results In The Formation Of Efficient Frontiers For The Sigma, 26.071 Weeks Covariance Structure, Rebalance Year 2004 ....... 225
6.5  Graphs Depicting The Results In The Formation Of Efficient Frontiers For The Sigma, 256.42 Weeks Covariance Structure, Rebalance Year 2006 ....... 226
6.6  Cumulative Return Plot (VAMI) Over The Years Indicated For An Invest-And-Hold Strategy Under The Set Of Covariance Structures Described In The Text ....... 230
6.7  Returns Normalized By Standard Deviation For An Invest-And-Hold Strategy Under The Set Of Covariance Structures Described In The Text ....... 231
6.8  Cumulative Return Plot (VAMI) Over The Years Indicated For An Annual-Rebalance Strategy For Maximum, Minimum, And Base Covariance Structures For Risk ....... 232
6.9  Plot Of Daily Returns From S&P 500 Versus Similar Measure Of The VIX For The Period From 2000-2007 ....... 234
LIST OF TABLES
Table
1.1  Salient statistics for the two samples ...................................................... 20
3.1  Equity-based time series used in present research ........................................... 73
3.2  Description of ancillary dataset used in the equity sample design ........................... 74
3.3  Reducing the number of securities by processing stage ....................................... 76
3.4  Market capitalization classes and their cut points .......................................... 79
3.5  Analysis of deviance table from stepwise logistic regression for weekly extreme-value series ....... 86
3.6  Analysis of deviance table from stepwise logistic regression for monthly extreme-value series ....... 86
3.7  Cross-tabs from the assignment of weekly extreme-value series to classes .................... 87
3.8  Cross-tabs from the assignment of monthly extreme-value series to classes ................... 87
3.9  Globally recognized benchmarks and indices .................................................. 90
3.10 Number of covariates satisfying the high-grading criteria as a function of percentage variation explained and threshold value ....... 96
3.11 Distribution of variation in loadings between covariates and factors for the factor structure arising from the selected high-grading results described in the accompanying text ....... 97
3.12 Covariates selected for use in estimating GEV parameters .................................... 98
4.1  Correlations among GEV parameters for weekly and monthly data series ........................ 114
4.2  Tabulation of results from the empirical (bootstrap) analysis of maxima of different time units and estimation of related GEV parameters and statistics ....... 117
4.3  Description of candidate periodic behaviors ................................................. 128
4.4  Number of weekly extremal series failing to reject the null hypothesis of level stationarity under the KPSS test ....... 130
4.5  Results from the permutation test from power spectra examining hypothesized salient periods ....... 131
4.6  Results from white-noise test from power spectrum examining hypothesized salient periods ....... 132
4.7  Correlation matrix from the covariate series created by crossing aggregation and time frame factors ....... 135
4.8  Percentage representation of the number of models in which each of the covariate classes appears ....... 136
4.9  Tally of stopping steps for the equities in training set (stopping rules used are discussed in the text above) ....... 141
5.1  Results from stepwise model construction for fixed-effects model by year .................... 172
5.2  Coefficients of determination (R²) for the annually computed three fixed-effects model described in the text above ....... 173
5.3  ANOVA arising from Type III SoS for fixed-effect models containing additional discrete and continuous factors/predictors ....... 185
5.4  Selected results from maximum-likelihood computations under assumptions and structure described in the previous text ....... 189
5.5  Results from repeated performance of the Jarque-Bera test of normality by variance grouping and by model for first pair of treatments of nuggets ....... 193
5.6  Results from repeated performance of the Jarque-Bera test of normality by variance grouping and by model for second pair of treatments of nuggets ....... 194
5.7  Example of cell mean contrasts for discrete market capitalization predictor ................. 197
5.8  Structure of the columns of the design matrix X ............................................. 197
6.1  Identifiers/indices and names of 98 securities in test data set ............................. 207
6.2  P-values from KS test of the null hypothesis that the residuals from the fixed-effects model run on both the training and test sets have the same distribution ....... 213
6.3  Rebalance strategy yielding highest returns under the set of covariance structures described in the text for an annual rebalance over the years indicated ....... 229
6.4  Correlation between returns on VIX and returns on Sharpe portfolios for the years indicated ....... 236
A.1  Countries used in this research ............................................................. 255
A.2  Stock exchanges used in this research ....................................................... 256
A.3  Industries used in this research ............................................................ 257
A.4  Sectors used in this research ............................................................... 258
B.1  Covariates grouped as per factors developed using a principal-components extraction and a varimax rotation, groups (factors) contain more than one covariate ....... 261
B.2  Covariates grouped as per factors developed using a principal-components extraction and a varimax rotation, groups (factors) contain only one covariate ....... 262
B.3  Overall tally of covariates entering models, broken down by parameters and broad themes ....... 263
B.4  Tally of covariates entering models, broken down by parameters and time lag, not including contemporaneous observations ....... 264
B.5  Tally of covariates entering models broken down by parameters versus aggregate function and covariate groupings ....... 265
B.6  Tally of covariates entering models, broken down by parameters versus covariate group/factor ....... 267
B.7  Results from chi-square test for the presence of association between the values of the stated factor and the dependent variable ....... 270
B.8  Directional representation of residuals from contingency table analyses between the Market Value factor and significant dependent variables as listed in Table B.7 ....... 272
B.9  Directional representation of residuals from contingency table analyses between the Trade Region factor and significant dependent variables as listed in Table B.7 ....... 273
B.10 Directional representation of residuals from contingency table analyses between the Exchange Region factor and significant dependent variables as listed in Table B.7 ....... 274
B.11 Directional representation of residuals from contingency table analyses between the Sector factor and significant dependent variables as listed in Table B.7 ....... 275
B.12 Q-Mode factor analysis: cases represented by dependent variable minimum aggregation (Min.ex) and properties represented by the data factor sector (Sect) ....... 276
B.13 Results from Q-Mode factor analysis ......................................................... 277
1. Background And Literature Review

1.1 Overview Of The Chapter
In Chapter 1, the material first motivates the domain for the research by introducing
and discussing risk in a financial context. Put succinctly, what most rational
investors want from their investments is to maximize reward and minimize risk.
In most "fair" financial markets, risk and expected reward are directly related:
as expected reward increases, so does risk. The concept of reward, defined by
return on investments, is generally well understood; it is the concept of risk
where the issues lie. Firstly, the risk appetite of individuals is difficult to connect
directly to the most commonly used measure of risk, the standard deviation, and the
exactness of the relationship between standard deviation and the likelihood of loss
is really only understood in the context of the probability distribution of returns. In
this regard it is shown that the assumption of a Gaussian distribution of returns
made by early investigators is not borne out by later research. In fact, departures from
the Gaussian form are such that investors who are driven by concerns regarding the
likelihood of losing various amounts of wealth (and many
investors are motivated by this downside behavior, as it is called) are poorly served
by assuming the probability in the tails of the Gaussian. So the focus becomes the
behavior of extreme returns, along with a discussion examining various means of
describing the distribution of extremes and showing the deficiencies in a number of
models and commonly used statistical "tricks" or machinery.
The bulk of the chapter is spent on a review of the development of extreme value
theory and the pertinent literature. The material includes Extreme Value Theory
(EVT) and its distributions, some of the key theorems and the conditions under which the
distributions exist, and the unification of the form of the three EVT distributions
under the rubric of Generalized Extreme Value (GEV) distributions. Related topics
are discussed which cover the models for identifying and defining extreme values,
including the block maxima model and the point-process approaches, the peaks-over-threshold
(POT) model and the Poisson-generalized Pareto (P-GPD) model. The
relationships between the three models and the associated distributions are also
reviewed. These sections are followed by a discussion of the methods and issues
involved in the estimation of the GEV parameters.
Other topics commonly associated with the modeling of extremes are reviewed or
at least acknowledged. The impact of relaxing the initial assumption of identically
and independently distributed behavior of the extreme random variables is briefly
visited. In this setting, topics such as clustering of extreme values and
forms of dependence, both serial and spatial, are highlighted along with the
relevant literature.
The final two topics of Chapter 1 are of special relevance to the
dissertation. These are the concept of return values and the construction of
multivariate extreme distributions. Return values are the quantile values defined,
for a given distribution and a return period of N time units (in this case weeks), such
that the probability that the value of an extreme random variable from the distribution
will exceed the quantile (return) value is equal to 1/N.
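To make this definition concrete, the short sketch below computes an N-week return level for a GEV distribution; it is illustrative only, and the parameter values (µ, σ, ξ) are hypothetical rather than estimates from this research.

```python
# Illustrative sketch: the N-unit return level z_N of a GEV(mu, sigma, xi),
# i.e. the quantile satisfying G(z_N) = 1 - 1/N, exceeded with probability 1/N.
import math

def gev_return_level(mu, sigma, xi, N):
    """Return level for return period N (N > 1); sign conventions for minima vary."""
    y = -math.log(1.0 - 1.0 / N)            # y_p with p = 1/N
    if abs(xi) < 1e-9:                      # Gumbel limit as xi -> 0
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))

# Hypothetical parameter values, not fitted values from the thesis
print(gev_return_level(mu=-0.02, sigma=0.01, xi=0.2, N=52))   # 52-week return level
```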
The discussion of multivariate extreme value distributions is focused on
dependence functions, or functions for "tying" together a set of univariate extreme
value distributions into a multivariate distribution. Of the dependence functions
presently favored, the most popular is the copula. The copula is a multivariate
distribution with unit (0,1) support and dimensionality equal to the dimensionality
of the desired multivariate distribution, defined such that each of its marginal
distributions is uniform. The problem with copulas is not in defining them but in
showing or knowing that the target joint distribution is the correct one for the
circumstance. The thrust of this research, as will be spelled out in later chapters, is
to use return values as a central element in the formation of a model to create
multivariate extreme value distributions.
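As a small, self-contained illustration of the copula idea (a sketch only; this is not the modeling approach adopted in this research, and all parameter values are invented), the code below uses a Gaussian copula to tie two GEV marginals into a joint sample:

```python
# Sketch of a dependence function: a Gaussian copula tying two univariate GEV
# margins into a bivariate sample.  All parameters here are hypothetical.
import numpy as np
from scipy.stats import norm, genextreme, kendalltau

rng = np.random.default_rng(1)
rho = 0.6                                           # copula correlation
cov = [[1.0, rho], [rho, 1.0]]

z = rng.multivariate_normal([0.0, 0.0], cov, size=5_000)
u = norm.cdf(z)                                     # uniform(0, 1) margins: the copula

# Map each uniform margin through a GEV quantile function.
# Note: scipy's shape c corresponds to -xi in the Coles (2001) parameterization.
x1 = genextreme.ppf(u[:, 0], c=-0.1, loc=0.0, scale=1.0)
x2 = genextreme.ppf(u[:, 1], c=-0.2, loc=0.5, scale=0.8)

tau, _ = kendalltau(x1, x2)                         # dependence carried by the copula
print(round(tau, 3))
```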
1.2 Introduction
No one likes to lose money. One way to avoid that is for investors to place their
money in risk-free investment vehicles (for example, U.S. Treasury bills).
However, the average return from such investments is often much less than desired
by investors. To raise the expected return, investors must be prepared to take some
risk; that is, they must be willing to chance losing some money. The questions for
investors become: What measure of risk should they be using, that is,
how should risk be quantified? What do statements involving this risk quantity mean in a
financial context? How do these measures and statements of risk help guide the
investment selection process? As the dissertation advances, these questions will be
explored more fully and will lead to the development of a model to provide some
additional guidance.
1.3 Financial Risk And Normality
Taking a statistical perspective, let us assume we observe a random variable
(RV) X, measurable on a standard Borel σ-algebra, which represents the return
performance of a financial security over some given period. For this random
variable, risk can be equated with uncertainty, that is, with the distribution of
non-zero probability density. The extent to which we can characterize the
distribution of non-zero probability density is the extent to which we can
characterize risk. Let F be an arbitrary, unknown probability distribution function
of X. Assuming F has a finite second moment, we know from the central limit
theorem (CLT) (Casella and Berger [2001]) that appropriately transformed sums
and means of X converge in distribution to the normal distribution with mean 0
and variance 1 [N(0,1)]. Noting that categorizing risk under a normality
assumption largely amounts to defining functions proportional to the standard
deviation, Markowitz (1952) suggested that variance (and functions of variance)
is a "sufficient" description of risk. This assumption by Markowitz is key to his
portfolio¹ theory, commonly called mean-variance portfolio theory or
mean-variance optimization (MVO). Using the CLT to treat the distribution of
expected values as normal, a portfolio formed as a linear combination of
asymptotically normal random variables is itself asymptotically normal. Any level
of risk/uncertainty can then be estimated for any value or quantile of the portfolio
random variable because such computations are tractable under a normality
assumption. The use of the normality assumption and mean-variance portfolio
theory was formalized further in the development of the capital asset pricing model
(CAPM) and the concept of systematic risk embodied in the financial coefficient
called β² (Sharpe [1964] and Lintner [1965]).

¹ For the purposes of this dissertation, a portfolio is defined as a set of financial securities owned by
one entity. These may include equity or ownership in a business, bonds or debt which pays interest
on money loaned, collective investment mutual funds, exchange-traded funds, unit investment trusts
and derivative securities (securities built upon equities and first-placed debt). In this dissertation, the
investigator will focus on, or restrict the discussion to, collections of equities as portfolios.

² Beta is a parameter which represents how changes in the returns of the larger financial market as a
whole relate to changes in the return of an individual security or group of securities, that is, a
portfolio.
While Costa et al. (2005) applied the multivariate normal in the process of
evaluating risks in the joint distribution of portfolios, there is a substantial body of
financial journal literature examining the skewness and leptokurtosity (fat-tailedness) of the distribution of financial returns (Affleck-Graves and McDonald
[1989], Campbell et al. [1997], Dufour et al. [2003], Szego [2002], Jondeau and
Rockinger [2003], and Tokat et al. [2003]). The nearly unanimous conclusion is
that financial returns are not normally distributed. So, if distributions of returns are
not normal and the investor is interested in protecting against/computing the risk
associated with the extreme downside of portfolio returns, does the CLT help here?
1.4 Models Of Risk-Reward
In reviewing some of the major models for capturing risk, it should be evident that not
all levels of risk are of equal concern. Downside extreme returns are of far greater
concern than either returns which are centrally located in the distribution or upside
extreme returns. With regard to the upside extreme returns, while there are at
present few complaints by investors, because they made more money than expected,
it is the author's view that such complaints are just a matter of time. As methods of portfolio
construction get more sophisticated, investors will demand that portfolio analysts
consider both tail extremes in portfolio construction.
1.4.1 Mean Variance Optimization
As stated earlier, one of the earliest models of the modern portfolio theory era,
Mean Variance Optimization (MVO), treats portfolio construction or asset
allocation as a balance between return and risk, where expected return is defined as
the arithmetic mean and risk is limited to the second (central) moment or variance.
This is frequently represented as return penalized by risk in the following formula:

w^T µ − λ w^T Σ w        (1.1)

where: w is a weight vector describing the allocation of the p assets to the portfolio,
with ∑_{i=1}^{p} w_i = 1 and (often) w_i ≥ 0,
µ is the expected return vector of the assets in the portfolio, frequently
estimated by the arithmetic mean,
Σ is the covariance matrix, frequently estimated by the maximum
likelihood estimator, and
λ is the risk aversion parameter.
The formula above is the objective function which is to be maximized, subject to a
set of constraints. The use of the MVO is best suited to situations when the
underlying asset returns are distributed normally. One of the most common
visualization tools of MVO is the efficient frontier, a depiction of which is given in
Figure 1.1.
Figure 1.1 Example Of Efficient Frontier Built Upon Mean Variance Optimization. (Plot elements: randomly generated feasible solutions, the efficient frontier, the risk-free rate, and the line representing the CAPM.)
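A minimal numerical sketch of this construction is given below; the expected returns and covariance matrix are made up for the example (they are not data from this research), and the frontier is traced by maximizing (1.1) over a sweep of risk-aversion values λ.

```python
# Sketch: tracing a mean-variance efficient frontier by repeatedly maximizing
# w'mu - lambda * w'Sigma w (eqn. 1.1) subject to sum(w) = 1 and w >= 0.
import numpy as np
from scipy.optimize import minimize

def mvo_weights(mu, Sigma, risk_aversion):
    """Solve eqn. (1.1) for a single value of the risk-aversion parameter."""
    p = len(mu)
    objective = lambda w: -(w @ mu - risk_aversion * w @ Sigma @ w)
    constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]   # budget constraint
    bounds = [(0.0, 1.0)] * p                                          # long-only weights
    res = minimize(objective, np.full(p, 1.0 / p), method="SLSQP",
                   bounds=bounds, constraints=constraints)
    return res.x

# Hypothetical three-asset universe (not thesis data)
mu = np.array([0.06, 0.09, 0.12])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])

for lam in (0.5, 1, 2, 4, 8, 16):                    # sweep risk aversion
    w = mvo_weights(mu, Sigma, lam)
    risk, ret = np.sqrt(w @ Sigma @ w), w @ mu       # one frontier point (sigma, mu)
    print(f"lambda={lam:<4} w={np.round(w, 3)} risk={risk:.3f} return={ret:.3f}")
```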
The Efficient Frontier (EF) is generated by repeated solutions of the MVO
optimization over the range of values of the standard deviation which the object
portfolio can take on. It represents the maximum expected return which the
portfolio can achieve for a given standard deviation, i.e., risk. Another important
element here is the Capital Asset Pricing Line, which is depicted as the dashed line
in Figure 1.1. We assume the existence of a risk-free investment such that the
investor may lend or borrow at the risk-free rate. All other investments under
consideration have returns which are not known a priori with certainty, and are
collectively called the risky portfolio. The components of the risky portfolio can
have only non-negative weightings (unlike the risk-free asset). We assume that all
other budgetary and existence constraints remain the same. Then the investor can
realize any combination of the risk-free asset and the risky portfolio existing in
quadrant 1 or 4 of the plane (µ, σ) and falling on the line µ = µ0 + ξσ, with µ
being the return of the investment portfolio, µ0 being the risk-free rate of return,
σ being the risk measure (in this case the standard deviation), and ξ ≥ 0 being the
slope, which varies as a function of the efficient frontier and the universe of
securities used. From Markowitz (1959),
we find

σ² = (1 − w_0)² σ_π²

and

µ = µ_0 + ((µ_π − µ_0) / σ_π) σ        (1.2)

or, more generally,

µ = µ_0 + ((µ_π − µ_0) / λ_{α,π}^{1/α}) λ_α^{1/α}

where (in addition to previous definitions):
w_0 is the proportion of wealth in the risk-free investment,
µ_π is the return on the risky portfolio,
σ_π², σ_π are the variance and standard deviation, respectively, of the risky portfolio, and
λ_α^{1/α}, λ_{α,π}^{1/α} are risk measures for the portfolio and the risky portion of the
portfolio, respectively, of order α, defined such that λ_α^{1/α} = σ when α = 2.
What makes these results particularly interesting is that, under the same
assumptions, we can realize the Generalized Capital Asset Pricing Line or Model
(G-CAPM) in the same format by substituting the nth root of the cumulant of nth
order for the standard deviation (see eqn. 1.2) (Malevergne and Sornette [2001]).
1.4.2 VaR And CVaR
The focus of MVO upon downside returns is not explicit, but only indirect, through
the dispersion of the deviates as graphically depicted in the dispersion
ellipsoid. The Value at Risk (VaR) and the Conditional Value at Risk (CVaR)
measures are directly associated with the downside distribution of returns. VaR is,
however, like MVO, univariate in character, in that it is a measure made at the
portfolio level and not at the level of the individual components making up the
portfolio. It also performs best when the distribution of returns is unimodal. If we
select a (low) probability, typically in the 0.01 to 0.10 range (call this
probability α), then VaR is the smallest quantile value x such that Pr[X ≤ x] = α.
Suppose X is a random variable describing the distribution of returns from a
portfolio and we choose α = 0.05. Then the VaR is the return quantity x satisfying
the conditions described, which means that a return value of x or less will occur
randomly 5 times in 100 returns. Given the distribution of asset returns and a
designated level of risk that an investor is willing to accept (that is, the tail
probability), one can compute the VaR for any portfolio composed of these assets.
CVaR, or conditional value at risk, is a measure which answers the question: Given
the desired tail risk, what expected value of the measure (i.e. return) would an
investor see given that the return was less than the VaR value? Consequently,
CVaR is also called expected shortfall or expected tail loss. CVaR is computed
using established conditional probability identities.
E[Tail Loss] = E[X | X ≤ x] = ∫_{−∞}^{x} t f_X(t) dt / ∫_{−∞}^{x} f_X(t) dt        (1.3)

where:
E[·] is the expected value operator,
X is a random variable defined on the distribution of returns,
x is the VaR value, and
f_X(t) is the function describing the distribution of probability density for
portfolio returns.
A figure depicting the VaR and CVaR elements is given in Figure 1.6, in
conjunction with a discussion of return levels for extreme value distributions.
Rockafellar and Uryasev (2002) developed a linear optimization which identifies
the components and weights that probabilistically minimize CVaR and VaR for a
given α and probability density or probability mass function.
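For concreteness, the following sketch estimates VaR and CVaR empirically from a sample of returns; the returns are simulated, and the routine is illustrative rather than the estimator used later in the thesis.

```python
# Sketch: empirical VaR and CVaR (expected shortfall) at tail probability alpha.
import numpy as np

def var_cvar(returns, alpha=0.05):
    """VaR = alpha-quantile of returns; CVaR = mean return at or below the VaR
    (the empirical counterpart of eqn. 1.3)."""
    r = np.asarray(returns)
    var = np.quantile(r, alpha)          # smallest x with Pr[X <= x] ~ alpha
    cvar = r[r <= var].mean()            # expected shortfall / expected tail loss
    return var, cvar

rng = np.random.default_rng(0)
rets = rng.standard_t(df=4, size=10_000) * 0.01     # hypothetical heavy-tailed returns
print(var_cvar(rets, alpha=0.05))
```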
1.4.3 Generalized Capital Asset Pricing Model (G-CAPM)
Let us assume that the distribution of returns being observed is leptokurtic, and let us
leave out of consideration distributions that are asymmetric or skewed. As stated
earlier, leptokurtic behavior means that, in comparison to the normal distribution,
the leptokurtic distribution of returns has too much density in the central part of the
distribution, too little density in the shoulders and too much density in the tails
(similar to the behavior demonstrated by the t-distribution with few degrees of
freedom). This means:
1. If the investor believes the distribution of returns for a given asset or
portfolio is normal while it is indeed leptokurtic, uses the sample of
returns to estimate the distribution's parameters (namely the mean and
variance), and sets a quantile value for some tail probability, the quantile
will be wrong. Even worse, the quantile value will be wrong in a
non-conservative fashion, that is, more density (meaning more probability) will
be in the tail and tail events will be more likely.
2. While the first two cumulants for a normal distribution are equal to the mean and
variance and the cumulants beyond the second are zero, the same is not true
for leptokurtic distributions.
Malevergne and Sornette (2002) describe the impact of moments higher than the
second moment in terms of the higher-degree cumulants. In the work cited,
cumulants up to the 8th degree are examined. Malevergne and Sornette (M&S) pose
the risk in the marginal distribution of assets as being made up of the non-zero
cumulants of higher order. For a symmetric leptokurtic distribution, the odd
cumulants are zero. In examining higher-order cumulants, we learn that the
cumulants are polynomials of order equal to the cumulant order, so greater amounts
of density in the tail of the distribution add considerably more risk than is captured
by limiting the estimate of risk to the second cumulant, or variance.
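For intuition only (a sketch using sample k-statistics as cumulant estimates, which is not the estimation machinery of Malevergne and Sornette), the code below contrasts the first four cumulants of a Gaussian return sample with those of a heavy-tailed one; the non-zero fourth cumulant is exactly the tail risk that the variance alone misses.

```python
# Sketch: low-order sample cumulants (k-statistics) for a normal sample versus
# a heavy-tailed sample.  scipy.stats.kstat supports cumulant orders 1-4.
import numpy as np
from scipy.stats import kstat

rng = np.random.default_rng(7)
normal_rets = rng.normal(0.0, 0.02, size=50_000)
heavy_rets = rng.standard_t(df=5, size=50_000) * 0.02   # fat-tailed toy returns

for name, x in (("normal", normal_rets), ("heavy-tailed", heavy_rets)):
    cumulants = [kstat(x, n) for n in (1, 2, 3, 4)]
    print(name, [f"{c:.2e}" for c in cumulants])
# The 4th cumulant is ~0 for the normal sample but clearly positive for the
# heavy-tailed sample: extra (tail) risk beyond what the variance captures.
```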
M&S show two important results for heavy-tailed distributions when risk is expressed
by cumulants of up to order eight. These are:
1. As indicated in the discussion above on MVO, the results describing the
Capital Asset Pricing Model can be generalized to include higher-order
cumulants.
2. While the efficient frontier constructed using lower-order cumulants
dominates the efficient frontier constructed from higher-order cumulants
taken to the root 1/n, where n is the order of the cumulant, the return for a
given "true" level of risk as approximated by higher-order cumulants
increases with the order of the cumulant. Hence, the results support the
findings of Clark and Labovitz (2006), wherein higher returns are available
with increasingly complex computation of risk.
Having established the results for a marginal or univariate distribution, the question
becomes one of estimating cumulants for portfolios or random variables which are
weighted sums of univariate random variables. Malevergne and Sornette (2001),
using the transformation theorem and a normal copula, define an approach for
going from an arbitrary marginal distribution to the characteristic or moment
generating function of the joint distribution. Here we show the form for the
moment generating function of the bivariate generalized Weibull distribution, given
by:
P̂_S(k) = (1 / (2π √(1 − ρ²))) ∫_{−∞}^{∞} ∫_{−∞}^{∞} dy_1 dy_2 exp[ −(1/2) y^T V^{−1} y + ik ( χ_1 w_1 sgn(y_1) |y_1/√2|^{q_1} + χ_2 w_2 sgn(y_2) |y_2/√2|^{q_2} ) ]        (1.4)
The moments (and hence the cumulants, which are functions of the moments) of this
distribution are defined by

M_n = ∑_{p=0}^{n} C(n, p) w_1^p w_2^{n−p} γ_{q_1 q_2}(n, p) ...

where C(n, p) denotes the binomial coefficient.
This method turns out to be somewhat intractable for large universes of candidate
asset classes and, as discussed below, the use of copulas has some issues of its own.
1.5 Modeling Risk As It Affects Reward
Is there any advantage to pursuing such a line of inquiry, i.e., can various
performance/risk measures produce systematically different results and “better”
results? Or, as Clark and Labovitz (2006) put it, is money being left on the table?
Clark and Labovitz (2006) used both equity and bond fund universes to examine
the question "Do portfolios formed using different risk/performance metrics and
models yield systematically different levels of performance?" The authors
examined portfolio formation under three risk-reward models:
1. Single-Index CAPM (Capital Asset Pricing Model): Assumes a one-factor
return-generating process, typically a function of overall market
return, plus a firm-specific innovation (from Sharpe, 1974).
2. Multi-Factor Model: Fama and French added two factors, (i) small caps
and (ii) stocks with a high book-value-to-price ratio (customarily called
"value" stocks; their opposites are called "growth" stocks), to CAPM to
reflect a portfolio's exposure to these two classes. Carhart added a
momentum factor to the Fama/French model, and this four-factor model
is what the authors used (from Day et al., 2001).
3. Generalized CAPM: G-CAPM invokes higher moments (mainly
kurtosis) to describe risk, because returns tend to be more peaked and
fat-tailed than assumed in either CAPM or multi-factor models.
Minimizing only the variance of the portfolio return will overweight such an
asset, which is wrongly perceived as having little risk due to its small
variance, or waist (from Malevergne and Sornette, 2001).
The authors answered with a rousing YES the question as to whether, based upon
the risk measure used in portfolio formation, the investor was leaving money on the
table. The authors found:
• Single-Index CAPM increases gross annual returns by 70-80bp (basis points).
• Multi-factor models increase gross annual returns by 220bp.
• Generalized CAPM, on average, increases gross annual returns by 300bp or more.
Just as interesting was the ordering of the results: the greater the detail or
complexity with which risk is modeled, the greater the increase in return which
was realized. The authors concluded that modeling risk more accurately is indeed a
valuable pursuit in improving return in portfolio formation.
1.6 Extremes And The Central Limit Theorem
Let us look at extremes and some common statistical machinery. Firstly, can the
Central Limit Theorem be of use in working with extremes? Let us assume the
returns, rather than forming a nearly symmetric unimodal distribution, appear to
behave far more like an exponential distribution with λ = 1. Figure 1.2 depicts a
probability density function of this form. To form the following two figures,
samples of size 50 were drawn repeatedly from this exponential distribution, and
the sample means and sample maxima were computed.
Figure 1.2 Probability Density Function Of Exponential
Distribution With Rate λ =1.
Figure 1.3 is the resulting distribution of means, while Figure 1.4 is the resulting
empirical distribution of maxima. The line plot on each figure is the normal
distribution with parameters estimated from the sample. Despite the highly
non-normal shape of the original distribution, the means, as suggested by the CLT,
are nicely approximated by the normal distribution in Figure 1.3 and Table 1.1. This
is not so for the sample distribution of maxima. The Jarque-Bera (JB) test has as its
null hypothesis that the data are from a normal distribution. The statistic is a function of
the excess estimated skewness and kurtosis in the data. Under the null hypothesis
of normality, the JB statistic is asymptotically distributed as χ² with 2 degrees of
freedom. While the statistics seem to indicate that the number of observations
forming a mean is not yet large enough for the means to be distributed normally
(recall that the results are asymptotic), the means are clearly trending towards a
normal distribution much more than the maximum values.
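The comparison can be reproduced with a few lines of code; the sketch below (an assumed re-creation of the experiment, not the author's original script) draws 10,000 samples of size 50 from Exp(1) and applies the Jarque-Bera test to the resulting means and maxima.

```python
# Sketch: means vs. maxima of Exp(1) samples, checked for normality.
import numpy as np
from scipy.stats import jarque_bera

rng = np.random.default_rng(42)
samples = rng.exponential(scale=1.0, size=(10_000, 50))

means = samples.mean(axis=1)     # CLT: approximately normal
maxima = samples.max(axis=1)     # block maxima: skewed, heavy right tail

for name, x in (("means", means), ("maxima", maxima)):
    stat, pval = jarque_bera(x)
    print(f"{name:>6}: JB statistic = {stat:10.1f}, p-value = {pval:.3g}")
# The JB statistic for the maxima is orders of magnitude larger than for the
# means, echoing the conclusion that the normal approximation fails for maxima.
```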
The conclusions appear to be:
• The CLT and the normal distribution are not appropriate descriptions of the
distribution of the maximum values.
• The maxima are asymmetric and leptokurtic (fat-tailed), i.e., they have
too much density in the center and, more importantly, in the tails.
Figure 1.3 Normal Density Function Superimposed On
Histogram Of 10,000 Means Of Sample Size
50 Drawn From An Exp(1) Distribution.
Figure 1.4 Normal Density Function Superimposed
On Histogram Of 10,000 Maxima Of
Sample Size 50 Drawn From An Exp(1)
Distribution.
Table 1.1 Salient statistics for the two samples. [Table legend: Nobs = number of observations; Mean = arithmetic mean; Median = median; Var = variance; Std = standard deviation; Min = minimum value; Max = maximum value; Iqr = interquartile difference; Skew = skewness; Kurtosis = kurtosis; Jarque-Bera = Jarque-Bera test of normality. For this example, under assumptions of normality and the CLT, the following values should be observed: Mean = 1.0, Skewness = 0.0, Kurtosis (this definition) = 3.0.]
1.7 Order Statistics
Another approach, or piece of statistical machinery, commonly taught to statisticians
is the field of order statistics. Let X_1, X_2, …, X_n be an identically and independently
distributed (i.i.d.) random sample of size n from the cumulative distribution function
(cdf) F. We can define Y_1, Y_2, …, Y_n as the order statistics of the {X_i}, where
Y_n = max({X_i}). If F is known, it can be shown that the density of the i-th order
statistic Y_i is

f_{Y_i}(x) = [ n! / ((n − i)! (i − 1)!) ] F(x)^{i−1} (1 − F(x))^{n−i} f_X(x)        (1.5)

where f_X is the probability density function.
However, the selection of the probability model (the choice of F) is often based upon
heuristics (the world view of the researcher), and the estimation of parameter
values is sensitive to model selection. But there is little theory, excepting some
special cases, suggesting the magnitude of the potential bias/variance, which can be
extremely large, nor are there asymptotic results associated with the situation (the
circumstances do not improve with increased sample size). Carrying on from a
classical statistical perspective, we have:
P(Y_n ≤ x) = P(X_1 ≤ x, X_2 ≤ x, …, X_n ≤ x)
           = P(X_1 ≤ x) × P(X_2 ≤ x) × … × P(X_n ≤ x)   [by independence]
           = [F_X(x)]^n   [by previous definitions]        (1.6)
However, [F_X(x)]^n → 0 as n → ∞ for all x < x_sup, where x_sup = inf{x : P(X ≤ x) = 1}.
That is, the distribution of Y_n, the max({X_i}), converges to a point mass at x_sup.
So the distribution function of the maximum order statistic of X is degenerate as
n → ∞, which is not a particularly useful result in evaluating the distribution of
extreme values.
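A quick numerical check of this degeneracy (an illustrative sketch, with F taken to be the Exp(1) distribution) shows [F_X(x)]^n collapsing toward zero for any fixed x as n grows:

```python
# Sketch: P(Y_n <= x) = F(x)^n for the maximum of n i.i.d. Exp(1) variables.
# For any fixed x below the upper end point, F(x)^n -> 0 as n grows, so the
# unnormalized maximum has a degenerate limiting distribution.
import math

def F_exp(x):                        # Exp(1) cdf
    return 1.0 - math.exp(-x)

x = 3.0
for n in (10, 100, 1_000, 10_000):
    print(n, F_exp(x) ** n)          # shrinks toward 0 as n increases
```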
1.8 Statistics Of Extremes
The development of the statistics of extreme values has a long history with respect
to the modern theory of statistics, with the original development being attributed to
Fisher and Tippett (1928). It was placed on a much stronger theoretical foundation
by Gnedenko (1943) and picked up speed in the latter half of the twentieth century,
starting with the work of Gumbel (1958). The theory has a number of threads or
approaches, depending on the nature of the uncertainty under examination. Starting
with univariate i.i.d. random variables, the theory progresses through multivariate
p-dimensional random vectors possessing dependence (the covariance of X_i and X_j,
notated cov(X_i, X_j), is ≠ 0 for at least one pair i, j ∈ {1, 2, …, p}), to stationary
serially correlated processes (where, for example, in weakly stationary processes
E[X_t] = E[X_{t+l}], E[X_t²] = E[X_{t+l}²] and cov(X_t, X_{t+l}) = cov(X_{t+d}, X_{t+d+l}),
with E[·] the expected value operator, regardless of the values of l and d, in both
univariate and multivariate environments), and concludes with the extreme-value
statistics of nonstationary processes.
1.8.1 Existence Of Extreme Value Distributions
We start the discussion with univariate i.i.d. random variables. Much of the early
work on the subject of the statistics of extreme values, including the early work of
Gumbel (1958), tried to develop the theory via the classical theory of sums of
random variables and CLTs. The alternative development will be reviewed here,
drawing heavily on the texts of Coles (2001) and Beirlant et al. (2004).
Let X be a random variable, and let the support of X be [a, b] ⊆ ℝ̄ (where ℝ̄ denotes
the augmented reals, notated for simplicity as ℝ). Let F_X(x) be the cdf of X,
otherwise notated as F(x) or just F, where F(x) = Pr(X ≤ x) ∈ [0, 1]. Similarly,
let us assume f_X(x) exists and is the probability density function (pdf) for X, such
that dF/dx = f_X, or F_X(x) = ∫_{−∞}^{x} f_X(t) dt.
Let X1, X 2 ,…, X n be a random sample of size n from the cdf F . We can then
define Yn = max({ X i }) .
To develop useful extreme-value limit laws, we need to find linear renormalization
sequences of constants {a_n > 0} and {b_n}, with a_n, b_n ∈ ℝ for all n, such that

P[ (Y_n − b_n)/a_n ≤ x ] → G(x) as n → ∞,

where G is a nondegenerate distribution function. Then G belongs to one of three
families of distribution functions. In fact, these assumptions form the hypotheses
for an important theorem concerning limits for the probability distribution of
extreme values:

Theorem 1: If there exist sequences of constants {a_n > 0} and {b_n}, with
a_n, b_n ∈ ℝ for all n, such that

P[ (Y_n − b_n)/a_n ≤ x ] → G(x) as n → ∞,        (1.7)

where G is a nondegenerate distribution function, then G(x) belongs to one of three
distribution families:

Type I:   G(x) = exp{ −exp( −[(x − b)/a] ) },   −∞ < x < ∞

Type II:  G(x) = 0 for x ≤ b;   G(x) = exp( −[(x − b)/a]^{−α} ) for x > b

Type III: G(x) = exp( −[−((x − b)/a)]^{α} ) for x < b;   G(x) = 1 for x ≥ b        (1.8)

where −∞ < b < ∞, a > 0, and, for Types II and III, α > 0.
The proof of Theorem 1 can be found in Leadbetter et al. (1983). An informal proof is provided by Coles (2001). The sufficient condition (if the distribution is an extreme-value distribution [EVD]³, then it is max-stable) is a bit of algebra for each distribution. The necessary condition (if the distribution is max-stable, then it is an EVD) relies upon some results from functional analysis.
Definition 1: Let H_{Y_n}(r) be the distribution function of Y_n = max({X_i}) for i = 1, 2, …, n, where the X_i are independent random variables, and let H_X(r) be the distribution function of the independent X_i. Then H_X is said to be max-stable iff (if and only if) for all n > 1, n ∈ ℕ, there exist constants A_n > 0 and B_n such that

H_{Y_n}(A_n r + B_n) = H_X(r)  [is identical in law].    (1.9)
Under max-stability, the distribution of the independent random variable is the
same (identical in distribution or law) as the distributions of the extreme values.
That is, if H^n is the distribution function of Y_n, the maximum order statistic of {X_i}, and these independent variables each have the identical distribution function H, then max-stability is a property that may be satisfied by these distributions.
³ Throughout the discussion below the investigator will, as appears to be common usage, refer to the non-unified set of distributions as the EVD and to the unified form of the theory (discussed below), which describes all three EVDs under one distributional form, as the GEV or generalized extreme-value distribution.
The property is that a sample of maxima yields an identical distribution—up to a
change of scale and location. The connection with the extreme-value limit laws is
made by the following theorem:
Theorem 2: A distribution is max-stable if and only if it is a generalized extreme-value distribution (EVD).
Proof (informal; for a formal proof see Leadbetter et al. [1983]): Let us define Y_{nk} to be the maximum of a sequence of n·k X_i's, or the maximum of k maxima Y_{n,i}, i = 1, 2, …, k, with each Y_{n,i} the maximum of the i-th set of n independent X_i's, that is, Y_{n,i} = max({X_1, X_2, …, X_n}_i). Assume from the hypothesis that the limiting distribution of (Y_n − B_n)/A_n is H. By Theorem 1, as n gets larger, P{(Y_n − B_n)/A_n ≤ r} ≈ H(r). Therefore, for any k > 0, P{(Y_{nk} − B_{nk})/A_{nk} ≤ r} ≈ H(r). However, since Y_{nk} by independence is the maximum of k i.i.d. copies of Y_n, we have P{Y_{nk} ≤ y} = [P{Y_n ≤ y}]^k for any y. By simple manipulation:

P(Y_{nk} ≤ r) ≈ H((r − B_{nk})/A_{nk})   and   P(Y_{nk} ≤ r) ≈ [H((r − B_n)/A_n)]^k.    (1.10)

Consequently, H and H^k are identical distributions, up to a change in location and scale. Therefore, H is max-stable and from the EVD family of distributions.
1.8.2 Extreme Value Distributions
Theorem 1 is called the extreme-value theorem (EVT); the three distributions are
commonly called the extreme-value distributions (EVDs), and the setup model is
called the block-maxima model (detailed for comparison purposes below). The
beauty of this model is that its result holds true regardless of F or the distribution of
the original observations. Types I, II, and III are commonly known as the Gumbel, Fréchet, and Weibull distributions. All three have location and scale parameters (b and a), and the Fréchet and Weibull distributions have a shape parameter (α). The formulation of the Weibull in the EVT covers the left tail (a finite or bounded upper value); for reliability or failure studies the Weibull is reformulated to have positive support.
Figure 1.5 Extreme-Value Distributions Depicted On The Same Plot. (Annotation: Gumbel is unlimited, Fréchet has a lower limit, while Weibull has an upper limit. Legend: W = Weibull, F = Fréchet, G = Gumbel.)
The tail behavior depicted in Figure 1.5 gives more precise distinction to the three
families. The Weibull distribution has a bounded upper tail and an infinite lower
tail. The Gumbel distribution is infinite in both tails, with an exponential rate of
decay. The Fréchet distribution has a bounded lower tail but an infinite upper tail
that decays at a rate governed by a polynomial function. As such, for equal location
and scale parameters there is more density in the upper tail of the Fréchet than in
the upper tail of the Gumbel (Beirlant et al. [2004]).
1.8.3 Unification Through GEV Distributions And Extreme Maxima
Although Theorem 1 is a fairly astonishing result and has been cited by others as
the CLT for extreme values, there exists a commonly occurring problem of more
than one model being feasible. First, one must decide which of the three models is
most appropriate for the circumstances and the data prevailing at the time the
decision is made. As noted previously, our result suggests a distinction in tail
behavior. Secondly, after the model has been selected the researcher has the issue
of parameter estimation, which is an art in itself.
Alternative formulations of Theorem 1 are provided by von Mises (1954) and
Jenkinson (1955). These formulations combine the Gumbel, Fréchet, and Weibull
distribution families to form a single unified family of distributions. Commonly called the GEV distribution, it has the form:

G(x; µ, σ, ξ) = exp{−[1 + ξ((x − µ)/σ)]_+^(−1/ξ)}    (1.11)

defined on {x : 1 + ξ(x − µ)/σ > 0}, where [z]_+ = max(0, z), −∞ < µ < ∞, σ > 0, and −∞ < ξ < ∞. The GEV has three parameters: a location parameter (µ), a scale parameter (σ), and a shape parameter (ξ). The shape parameter ξ governs the tail behavior of the component distribution family: as we shall see below, ξ > 0, ξ < 0, and ξ → 0 correspond to the Fréchet, Weibull, and Gumbel distributions (the equivalents of Types II, III, and I), respectively. We can use these inequalities to establish the relationship between the EVT and GEV representations.
For the Weibull distribution, let ξ < 0 and σ > 0, with the distribution defined on the set {x : 1 + ξ(x − µ)/σ > 0}. Writing ξ = −|ξ| and α = −1/ξ = 1/|ξ| > 0:

G(x) = exp{−[1 + ξ(x − µ)/σ]_+^(−1/ξ)}
     = exp{−[(σ − |ξ|(x − µ))/σ]_+^(α)}
     = exp{−[−|ξ|(x − µ − σ/|ξ|)/σ]_+^(α)}
     = exp{−[−(x − (µ + σ/|ξ|))/(σ/|ξ|)]_+^(α)}
     = exp{−[−(x − b)/a]^(α)} for x < b, a > 0, α > 0,    (1.12)

with b = µ + σ/|ξ| and a = σ/|ξ|.

For the Fréchet distribution, let ξ > 0 and σ > 0, with the distribution defined on the set {x : 1 + ξ(x − µ)/σ > 0}. Writing −1/ξ = −α with α = 1/ξ > 0:

G(x) = exp{−[1 + ξ(x − µ)/σ]_+^(−1/ξ)}
     = exp{−[(σ + ξ(x − µ))/σ]_+^(−α)}
     = exp{−[ξ(x − µ + σ/ξ)/σ]_+^(−α)}
     = exp{−[(x − (µ − σ/ξ))/(σ/ξ)]_+^(−α)}
     = exp{−[(x − b)/a]^(−α)} for x > b, a > 0, α > 0,    (1.13)

with b = µ − σ/ξ and a = σ/ξ.
The Type I or Gumbel family of distributions arises in the limit ξ → 0 and can readily be seen as a consequence of the definition e = lim_{n→∞}(1 + 1/n)^n, whereby we have

G(x) = exp{−exp[−((x − µ)/σ)]}.    (1.14)
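As a concrete and purely illustrative check of Equation 1.11 and of the role of the shape parameter, the short Python sketch below evaluates the GEV cdf directly and compares it with scipy.stats.genextreme; note that SciPy's shape parameter c is the negative of the ξ used here, a convention difference rather than a property of the thesis model.

# Minimal sketch (not from the thesis): the GEV cdf of Eq. 1.11, with the Gumbel
# limit of Eq. 1.14, checked against scipy.stats.genextreme (whose shape c = -xi).
import numpy as np
from scipy.stats import genextreme

def gev_cdf(x, mu, sigma, xi):
    x = np.asarray(x, dtype=float)
    if abs(xi) < 1e-12:                                   # Gumbel limit, Eq. 1.14
        return np.exp(-np.exp(-(x - mu) / sigma))
    z = np.maximum(1.0 + xi * (x - mu) / sigma, 0.0)      # the [.]_+ operator
    with np.errstate(divide="ignore"):
        return np.exp(-z ** (-1.0 / xi))

x = np.linspace(-3.0, 6.0, 7)
for xi in (0.3, -0.3, 0.0):                               # Frechet-, Weibull-, Gumbel-type shapes
    assert np.allclose(gev_cdf(x, 0.0, 1.0, xi),
                       genextreme.cdf(x, c=-xi, loc=0.0, scale=1.0))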
1.8.4. Extreme Minima
The discussion up to this point reflects the extreme values of maxima. How does the theory change if there is interest in the minimum extrema rather than the maximum extrema? Let W_i = −X_i; then the W_i are i.i.d. Let Ỹ_n = max({W_i}), so that min({X_i}) = −Ỹ_n. Then, by the EVT and other intermediate results, we have:

P(min({X_i}) ≤ x) = P(−Ỹ_n ≤ x) = P(Ỹ_n ≥ −x) = 1 − P(Ỹ_n ≤ −x)
    → 1 − exp(−[1 + ξ((−x − µ)/σ)]_+^(−1/ξ)), as n → ∞
    = 1 − exp(−[1 − ξ((x − µ*)/σ)]_+^(−1/ξ)),    (1.15)

where µ* = −µ.
The minimum for the Gumbel is derived by once again taking the limit ξ → 0. Using the definition of e as a limit, we have:

G(x; µ, σ) = 1 − exp{−exp[(x + µ)/σ]}.    (1.16)
Now, we have described the mathematics of both maximum and minimum extrema.
Even though the research proposed here deals with minima, we provide the
following definition: Let a loss be a positive number (and a gain a negative
number), e.g., a loss of 2% is defined as 0.02 and a gain of 3.2% is defined as
minus 0.032. So, large losses are positive. In the following research we examine
maxima rather than minima. From a purely mathematical perspective the analysis
of the probability structure of the minimum of a set of random variables { X i } is
the same as the maximum of the transformed set of RVs {- X i }. This definition
was motivated by the researcher’s initial work, wherein he found that a number of
computations more easily decompose for maxima than for minima. Consequently,
in the remainder of this presentation readers should assume that all references to
maxima or minima in the context of data are after the above transformation.
1.9 Extreme Value Generation Models
1.9.1 Block Maxima Model
There are other representations of the uncertainty of extreme values. We may motivate those results by describing mechanisms by which the extrema may be generated. The block-maxima model is an extrema-generation mechanism associated with the family of GEV distributions. Let X_1, X_2, …, X_l be a set of independent random variables. This set is grouped or blocked into blocks of length n (n ≪ l). Without loss of generality we can assume there are m such blocks, and we may generate a set of maximum values from the maximum of each block, defined as Y_{n,1}, Y_{n,2}, …, Y_{n,m}. It is this set {Y_{n,i}} that, under the hypothesis of Theorem 1, follows a GEV distribution. In the present context blocks may be trading weeks (5 days), trading months (~21 days), or trading years (~252 days).
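To make the block construction concrete, the illustrative sketch below (simulated data; the variable names are assumptions for the example, while the 5-day block mirrors the trading week mentioned above) converts daily returns to losses, per the convention of Section 1.8.4, and extracts weekly block maxima of the losses.

# Illustrative sketch: weekly block maxima of losses from a simulated daily return series.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.bdate_range("2000-01-03", periods=250)              # stand-in trading calendar
daily_returns = pd.Series(rng.normal(0.0005, 0.02, len(dates)), index=dates)

losses = -daily_returns                                        # losses positive, gains negative
block_id = np.arange(len(losses)) // 5                         # consecutive 5-trading-day blocks
weekly_block_maxima = losses.groupby(block_id).max()           # Y_{n,i}: one maximum per block
print(weekly_block_maxima.head())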
1.9.2 Peaks Over Thresholds Model
There are two other models of extreme-value behavior commonly used. Instead of
defining a set of n random variables and the maximum of the set, define a high
threshold value u (which yields a similar selection challenge). Called “exceedance
over thresholds” (Smith [2003]) or “peaks over thresholds” (POTs), the process
works as follows:
Fix u, then define the conditional distribution of the random variable X, given that X > u. From previous results, for n large enough we have:

F^n(x) ≈ G(x; µ, σ, ξ) = exp{−[1 + ξ(x − µ)/σ]_+^(−1/ξ)}
n log F(x) ≈ −[1 + ξ(x − µ)/σ]_+^(−1/ξ)

Using the well-known expansion log(1 − s) ≈ −s for small s (here s = 1 − F(x), which is small for large x):

log(F(x)) ≈ −(1 − F(x)).

By substitution we have:

1 − F(u) ≈ n^(−1)[1 + ξ(u − µ)/σ]_+^(−1/ξ) for large u.

Define the RV Y = X − u; then from the previous equation we have:

1 − F(u + y) ≈ n^(−1)[1 + ξ(u + y − µ)/σ]_+^(−1/ξ) for y > 0.

From the definition of conditional probability,

P(Y ≤ y | Y > 0) = [F_X(u + y) − F_X(u)] / [1 − F_X(u)].    (1.17)
Hence,

1 − P(Y ≤ y | Y > 0) = P(X > u + y | X > u) = [1 − F_X(u + y)] / [1 − F_X(u)]
    = [1 + ξ(u + y − µ)/σ]_+^(−1/ξ) / [1 + ξ(u − µ)/σ]_+^(−1/ξ)
    = [(1 + ξ(u + y − µ)/σ) / (1 + ξ(u − µ)/σ)]_+^(−1/ξ)
    = [1 + ξy/σ*]_+^(−1/ξ), where σ* = σ + ξ(u − µ),    (1.18)

which is the survival function of the Generalized Pareto Distribution (GPD) with scale σ* and shape ξ. The results producing the GPD are after Pickands (1975).
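The threshold construction can be illustrated numerically; the sketch below (simulated heavy-tailed losses, not the thesis data) selects exceedances of a high threshold and fits a GPD to them with scipy.stats.genpareto, whose shape parameter corresponds to the ξ above.

# Illustrative sketch: a peaks-over-threshold fit of the GPD to simulated losses.
import numpy as np
from scipy.stats import genpareto, t as student_t

rng = np.random.default_rng(1)
losses = 0.01 * student_t.rvs(df=4, size=5000, random_state=rng)   # heavy-tailed stand-in series

u = np.quantile(losses, 0.95)              # a high threshold (a tuning choice; see Section 1.9.4)
exceedances = losses[losses > u] - u       # Y = X - u for X > u

xi_hat, _, sigma_hat = genpareto.fit(exceedances, floc=0)   # floc=0: excesses start at the threshold
print(f"u = {u:.4f}, xi = {xi_hat:.3f}, sigma* = {sigma_hat:.4f}")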
1.9.3 Poisson Generalized Pareto Model
A third model is called the Poisson-Generalized Pareto Distribution or P-GPD. This result is after Coles (2001) and Smith (1989). Let X_1, X_2, …, X_n be a random sample of i.i.d. RVs, and suppose there exist sequences {a_n}, {b_n} such that the GEV model holds; then let Y_{i,n} = (X_i − b_n)/a_n, i = 1, 2, …, n. Let N_n denote the point process defined on ℝ² as

N_n = {(i/(n + 1), Y_{i,n})}.    (1.19)

The first element rescales process time to (0, 1), and the second element gives a form to the extremes. Form a region A in ℝ², where A = [0, 1] × [u, ∞] for some very large value of u. Then the probability that each point of N_n falls in A is (from a previous result) given by:

p = P[Y_{i,n} > u] ≈ n^(−1)[1 + ξ(u − µ)/σ]_+^(−1/ξ).    (1.20)
Because the X_i are mutually independent, the number of points of N_n in A is distributed according to a binomial probability law with parameters n, p. Using standard results from mathematical statistics, a binomially distributed RV converges to a limiting Poisson distribution as n → ∞ and p → 0 with np held fixed; that is, the limiting distribution of N_n on A is Pois(Λ(A)), where Λ is the rate of the Poisson, a function of the region A in which the points are located. We have from the previous result:

Λ(A) = [1 + ξ(u − µ)/σ]_+^(−1/ξ).    (1.21)

Because of the defined homogeneity of the process in time, for any region A′ = [t_1, t_2] × [u, ∞] composed of any interval of time [t_1, t_2] ⊂ [0, 1], the limiting distribution of N(A′) is Pois(Λ(A′)) with

Λ(A′) = (t_2 − t_1)[1 + ξ(u − µ)/σ]_+^(−1/ξ).    (1.22)
The resulting process is called by Smith (2003) a P-GPD model.
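As a tiny numeric illustration of Equations 1.21 and 1.22 (parameter values chosen arbitrarily, not estimated from the thesis data), the sketch below computes the Poisson rate over a sub-rectangle A′ and the implied probability of seeing no exceedances there.

# Illustrative arithmetic for Eq. 1.22: Poisson rate over A' = [t1, t2] x [u, inf).
import numpy as np

def poisson_rate(t1, t2, u, mu, sigma, xi):
    return (t2 - t1) * max(1.0 + xi * (u - mu) / sigma, 0.0) ** (-1.0 / xi)

lam = poisson_rate(t1=0.0, t2=0.5, u=0.05, mu=0.02, sigma=0.01, xi=0.2)
print(lam, np.exp(-lam))        # expected exceedance count over A' and P(no exceedance in A')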
1.9.4 Relation Between Extrema Models
All three of the models (the block-maxima, the threshold model, and the P-GPD) are related. Let us demonstrate these relationships, starting each proof from the P-GPD. For the block-maxima model we have the following. Let M_n = max{X_1, X_2, …, X_n} and N_n = {(i/(n + 1), (X_i − b_n)/a_n)}. Thus, the event {(M_n − b_n)/a_n ≤ z} is equivalent to N_n(A_z) = 0, where A_z = (0, 1) × (z, ∞), and

P[(M_n − b_n)/a_n ≤ z] = P[N_n(A_z) = 0]
    → P[N(A_z) = 0] = exp{−Λ(A_z)}
    = exp{−[1 + ξ(z − µ)/σ]_+^(−1/ξ)}.    (1.23)
For an exceedance-over-threshold model, based upon the earlier discussion related to homogeneity over the Cartesian space for the point process, we can factor the value of Λ over A_z by coordinates. So, let Λ(A_z) = Λ_1([t_1, t_2]) · Λ_2([z, ∞]), where Λ_1([t_1, t_2]) = (t_2 − t_1) and Λ_2([z, ∞]) = [1 + ξ(z − µ)/σ]_+^(−1/ξ). Consequently,

P[(X_i − b_n)/a_n > z | (X_i − b_n)/a_n > u] = Λ_2([z, ∞]) / Λ_2([u, ∞])
    = [1 + ξ(z − µ)/σ]_+^(−1/ξ) / [1 + ξ(u − µ)/σ]_+^(−1/ξ)
    = [(1 + ξ(z − µ)/σ) / (1 + ξ(u − µ)/σ)]_+^(−1/ξ)
    = [1 + ξ(z − u)/σ*]_+^(−1/ξ), where σ* = σ + ξ(u − µ).    (1.24)
One important question for each of these models is the size of each of the blocks,
i.e., the number of observations in a block, the length of a block, or the value of
u —the threshold for discerning the extreme-value region. In these contexts the
issue can be thought of as a bias-variance trade-off. In the block-maxima model, if the blocks are too short (too small a value of n), the limiting GEV approximation will fit poorly and the parameter estimates will be biased. On the other hand, too large an n leaves too few block maxima, inflating the variance of the estimates. One way of overcoming
this for the researcher is to turn to the subject matter domain under examination. In
the analysis of financial data there are natural trading periods such as a week, a
month, or a quarter, often examined over horizons of three, five, or ten years
(subject of course to limitations on the availability of data, which may dictate the
use of shorter time frames). Given the relationships among the models described above, we might intuit that setting u too low or too high has an analogous impact on parameter estimation, and that turns out to be the case: too low a value of u results in biased estimators, while too high a value of u leaves too few exceedances and inflates the variance of the estimates. A series of statistical diagnostics for threshold values is
found in Davison (1984), Smith (1984), Davison and Smith (1990), and Coles
(2001).
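One widely used diagnostic of this kind, described by Coles (2001), is the mean residual life (mean excess) plot: above a threshold for which the GPD holds, the mean excess is approximately linear in u. The sketch below (simulated data) computes the quantities that would be plotted.

# Sketch of a mean residual life diagnostic for threshold choice.
import numpy as np

def mean_residual_life(x, thresholds):
    x = np.asarray(x, dtype=float)
    return np.array([np.mean(x[x > u] - u) for u in thresholds])   # E[X - u | X > u] at each u

rng = np.random.default_rng(5)
x = 0.01 * rng.standard_t(df=4, size=5000)                         # stand-in loss series
us = np.quantile(x, np.linspace(0.80, 0.99, 20))                   # candidate thresholds
print(np.column_stack([us, mean_residual_life(x, us)]))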
1.10 Parameter Estimation
With respect to parameter estimation, although method-of-moments techniques (MME; see Hosking et al. [1985]), methods based on order statistics (Dijk and de Haan [1992]), and Bayesian methods have been used in the literature, likelihood techniques are most commonly used because of their greater adaptivity to, and utility in, complex model-building circumstances. This does not mean there are not examples in which other estimation methods perform a credible if not superior job. For example, according to Madsen et al. (1997), the MME quantile estimators have smaller root mean square error when the true value of the shape parameter is within a narrow range around zero. Examples of the use of the Bayesian approach may be found in Stephenson and Tawn (2004) and Cooley et al. (2006a, 2006b).
Maximum-likelihood estimators (MLEs) of the parameters of GEV distributions were considered by Prescott and Walden (1980, 1983) and Smith (1985). However, because the end points of the GEV distributions are functions of the parameters being estimated, the model conditions violate the common regularity conditions underlying theorems that establish the standard asymptotic properties of MLEs (Casella and Berger [2001]). Smith (1985) produced the following results:
• If ξ > −0.5, the MLE_GEV (defined as the maximum-likelihood estimators of the GEV parameters) possess the common asymptotic properties.
• If −1 < ξ < −0.5, MLE_GEV estimators may be computed, but they do not have regular or standard asymptotic properties.
• If ξ < −1, MLE_GEV estimators may not be obtained.
If {Z_i}_{i=1}^{m} is an i.i.d. random sample having a GEV distribution, the MLE_GEV are defined through the use of the log-likelihood as follows:

l(µ, σ, ξ) = −m log(σ) − (1 + 1/ξ) Σ_{i=1}^{m} log[1 + ξ((z_i − µ)/σ)] − Σ_{i=1}^{m} [1 + ξ((z_i − µ)/σ)]^(−1/ξ),    (1.25)

given that 1 + ξ((z_i − µ)/σ) > 0 for all i = 1, 2, …, m.
If ξ = 0, we compute the log-likelihood of the Gumbel distribution, the limit of the GEV under this assumption:

l(µ, σ) = −m log(σ) − Σ_{i=1}^{m} ((z_i − µ)/σ) − Σ_{i=1}^{m} exp[−((z_i − µ)/σ)].    (1.26)
Under the assumptions listed earlier in the paragraph, the approximate distribution of (µ̂, σ̂, ξ̂) is multivariate normal, with mean (µ, σ, ξ) and covariance equal to the inverse of the observed (computed) information matrix evaluated at the maximum-likelihood estimate (Coles [2001]). In the process of optimization under Newton-type methods this is equal to the inverse of the Hessian matrix evaluated at a local maximum of the log-likelihood function or a local minimum of the negative log-likelihood function.
Care must be exercised when estimating GEV parameters with MLE for small
sample sizes, e.g., sample sizes less than 50. In such circumstances the MLE may
be unstable and can give unrealistic estimates for the shape parameter ( ξ ) (see
Hosking and Wallis [1997], Coles and Dixon [1999], and Martins and Stedinger
[2000, 2001]). On the other hand, MLEs allow covariate information to be more
readily incorporated into parameter estimates. Furthermore, obtaining estimates of
standard error for parameter estimates using MLEs is easier compared to most
alternative methods (Gilleland and Katz [2006]).
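For illustration only, the log-likelihood of Equation 1.25 (with the Gumbel limit of Equation 1.26) can be coded directly and maximized numerically; the sketch below does so on simulated data and is not the fitting machinery used later in the thesis.

# Sketch: direct numerical maximization of the GEV log-likelihood of Eqs. 1.25-1.26.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genextreme

def gev_neg_log_lik(params, z):
    mu, log_sigma, xi = params
    sigma = np.exp(log_sigma)                       # log link enforces sigma > 0
    m = z.size
    if abs(xi) < 1e-8:                              # Gumbel limit, Eq. 1.26
        t = (z - mu) / sigma
        return m * np.log(sigma) + t.sum() + np.exp(-t).sum()
    w = 1.0 + xi * (z - mu) / sigma
    if np.any(w <= 0):                              # support constraint of Eq. 1.25 violated
        return np.inf
    return (m * np.log(sigma)
            + (1.0 + 1.0 / xi) * np.log(w).sum()
            + (w ** (-1.0 / xi)).sum())

rng = np.random.default_rng(3)
z = genextreme.rvs(c=-0.2, loc=0.0, scale=1.0, size=200, random_state=rng)   # sample with xi = 0.2

res = minimize(gev_neg_log_lik, x0=np.array([z.mean(), np.log(z.std()), 0.1]),
               args=(z,), method="Nelder-Mead")
mu_hat, sigma_hat, xi_hat = res.x[0], np.exp(res.x[1]), res.x[2]
print(mu_hat, sigma_hat, xi_hat)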
1.11 Departures From Independence
It should be clear from the previous discussion that the univariate EVT is
reasonably well developed. However, there are many other threads of active
research. For example, extreme-value research often starts with the assumption that the underlying RVs are i.i.d., that is, independently and identically distributed. Another important active area of research relates to problems involving the joint extreme behavior of uncertain phenomena.
With respect to the issue of independence, an important assumption underlying many of the previous results is stochastic independence, that is, cov(X_t, X_{t−j}) = 0.⁴
Realistically, financial data are serially correlated and maxima occur in clusters.
Two related issues arise in the literature: (1) How are investigators to decide when extremes, particularly those defined as peaks above a threshold, represent new activity or are just a clustering phenomenon? (2) How do we go about making adjustments to the theory to take into account the presence of autocorrelation or dependency structures among the observations, including seasonal dependence?

⁴ t is an arbitrary value of the index sequence used as a reference for other values of the index in the sequence. In this context it refers to discrete points in time.
1.11.1 Threshold Clustering
In the presence of substantial serial correlation and under a POTs model it is likely
exceedances over a threshold will cluster. A number of methods for dealing with
this behavior have been developed and presented in the literature. The element
many of the solutions have in common is the creation of a definition of a cluster
and then the application of the POTs method to the peak value, or a function of the peak values, within the cluster. Solutions for defining and handling clusters, often while simultaneously estimating u (the appropriate value of the extremes threshold), are found in Leadbetter et al. (1989), Davison and Smith (1990), Smith and Weissman (1994), Walshaw (1994), and Smith (2003).
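One simple and common clustering rule, offered here only as an illustration of the general idea (it is not necessarily the scheme adopted in the references above), is a runs rule: a new cluster starts whenever the gap between consecutive exceedance times is at least r observations, and only each cluster's peak is retained for the POT fit.

# Sketch of a runs-style declustering of threshold exceedances.
import numpy as np

def decluster_peaks(x, u, r=5):
    x = np.asarray(x, dtype=float)
    exceed_idx = np.flatnonzero(x > u)
    if exceed_idx.size == 0:
        return np.array([])
    new_cluster = np.diff(exceed_idx) >= r                     # gap of >= r observations starts a new cluster
    cluster_id = np.concatenate([[0], np.cumsum(new_cluster)])
    return np.array([x[exceed_idx[cluster_id == k]].max()      # keep only each cluster's peak
                     for k in range(int(cluster_id[-1]) + 1)])

rng = np.random.default_rng(6)
eps = rng.standard_t(df=5, size=2000)
series = np.empty_like(eps)
series[0] = eps[0]
for t in range(1, len(eps)):                                   # AR(1)-type noise, so exceedances cluster
    series[t] = 0.6 * series[t - 1] + eps[t]
print(decluster_peaks(series, u=np.quantile(series, 0.97), r=5))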
1.11.2 Serial Correlation Effects
Within series or processes, dependency structure raises issues in several forms.
These include the previously cited issue of clustering. Beyond clustering, we face a
variety of analysis concerns arising from the requirements of stationarity and
independence. As is the case for many temporally ordered sequences, strict stationarity, the property of consistency in all of the moments without respect to the starting position of the sample, is a concern for extreme-value analysis. Strong stationarity requires that the stochastic properties of the series are homogeneous through time. This means that the joint distribution of X_t, X_{t+c_1}, X_{t+c_2}, …, X_{t+c_n} is the same as that of X_{t+l}, X_{t+c_1+l}, X_{t+c_2+l}, …, X_{t+c_n+l}, where l is a constant and all the subscripts resolve to an existing index value. In the event a probability distribution exists, stationarity implies the moments of the two distributions will be identical, suggesting the probability distribution will be the same. Many theorems assuming
strong stationarity also work (asymptotically) under the assumption of what is
known as “weak stationarity,” wherein it is assumed that only the first two
moments (the mean and the variance) are not changing or are homogeneous over
time. So, as is often the case, the researcher in an extreme-value analysis may be satisfied with weak stationarity; a nonstationary series can often be made (at least weakly) stationary. Lack of weak stationarity in extreme-value analysis has been handled in the literature by common methods, including the use of inhomogeneous models; detrending (differencing, prewhitening, curve fitting, and filtering); and corrections for nonconstant variance (logging and weighting) (Chatfield [1996]).
The presence in the data of general serial or temporal dependence clearly impacts
the results of the GEV model as presented. Of course, there are an infinite number
of forms of dependence a series can take on. However, there are fundamentally two
ways to adapt an analysis in which the researcher assumes independence when there appears to be a lack thereof. One is to adjust the hypotheses of a given theorem or analysis approach, as well as its consequences and results, to account for the lack of independence. In the case of serial dependence, for example, a researcher may need to change the structure of the variance-covariance matrix, indexing its elements by time step, and then adjust the associated theorem or application accordingly.
A second approach is to take advantage of a consequence of stationary processes, namely that under a stationary process the strength of the dependence decreases, both mathematically and empirically, as the time between the random variables increases. This leads, for example, to approaches that attempt to create near-independence by selecting observations at some distance apart from each other. Coles (2001) calls this satisfying the D(u_n) condition, which holds if for all i_1 < … < i_p < j_1 < … < j_q with j_1 − i_p > l,

|P(X_{i_1} ≤ u_n, …, X_{i_p} ≤ u_n, X_{j_1} ≤ u_n, …, X_{j_q} ≤ u_n) − P(X_{i_1} ≤ u_n, …, X_{i_p} ≤ u_n)·P(X_{j_1} ≤ u_n, …, X_{j_q} ≤ u_n)| ≤ α(n, l),    (1.27)
where α(n, l_n) → 0 for some sequence l_n such that l_n/n → 0 as n → ∞. The idea is that under the D(u_n) condition the difference between the true joint distribution and the approximate joint distribution approaches zero sufficiently fast (the quantity within the absolute value in the above inequality gets sufficiently close to zero) that it has no effect on the limit of the GEV. One means of effecting this result is to take
samples at a sufficiently large distance apart from one another. Of course, such an
approach cuts the number of observations available for an analysis and therefore
may be impractical in many cases. Finally, serial dependence is often found in the form of seasonal dependence. Seasonal dependence of order d is such that Cov(X_t, X_{t+d}) ≠ 0 for all t. In both environmental and financial data, seasonal dependencies are often present, driven by the time of year or by a larger-scale business cycle. Extending the extreme
formulations (block maximum, POTs and P-GPD) to incorporate seasonal
dependence can be accomplished by a number of methods, some of which are
commonly used in the context of any type of serial correlation, including:
1. Removal of the seasonal effects, either through fitting trend models to the
data, perhaps using periodic functions, or by seasonally differencing the data.
2. Creation and fitting of inhomogeneous P-GPD process models, with the inhomogeneity defined by allowing the value of the Poisson rate Λ [Eqn. 1.22] (and potentially other model parameters) to vary and be estimated separately by season.
3. Expanding the model by allowing either expected values or the
deterministic portion of the model or the error structure to be adjusted by
the presence of covariates.
We also find some of these techniques useful in the environment of modeling
multivariate extrema, and as a consequence we revisit these topics later in the
thesis.
It should be clear from the previous discussion that stationarity and dependence are
constructs that will be important elements in later results. In particular, the
researcher calls attention now to a complication that arises later in the research, in the guise of analyzing covariation, and which involves stationarity and independence: the decomposition of variation over a random field into temporal and spatial components. There is clearly a temporal component to financial returns, provoking the issues of both stationarity and independence discussed above. Amongst other purposes, the use of a Gaussian Process in the model for multivariate return values (Section 5.5) was chosen as a method to deal with these issues. Additionally, there are spatial dimensions, defined later in the research, to which financial returns will be indexed. The addition of
spatial dimensions is very common in the study of extremes, especially among
studies of extrema that focus on climatology or weather; for examples see Schlather
and Tawn (2002, 2003), Heffernan and Tawn (2004), and Gilleland and Nychka
(2005). Of course, for natural resources spatial dimensions are more intuitive,
commonly understood geographic ones, e.g., latitude, longitude, and state plane
feet/coordinates, while the spatial dimensions for finance are considerably more
esoteric. Regardless of the definition of spatial coordinates, many of the techniques
used are the same.
1.12 Return Values
Extreme values of differing phenomena can be placed on a more comparable scale
through the introduction of estimates of extreme quantiles. If we invert the GEV
distribution function G( x ) , we obtain estimates of the extreme quantiles from the
following equations:
x_p = µ − (σ/ξ)[1 − (−log(1 − p))^(−ξ)],  ξ ≠ 0
x_p = µ − σ log(−log(1 − p)),  ξ = 0    (1.28)

where p is the probability such that G(x_p) = 1 − p. Therefore, we may define x_p as
the return level associated with the 1 / p return period. Another way of thinking
about this result is as follows: it is expected that the quantile x p will be exceeded
on average once every (1 / p ) base periods (defined by the interval used to group
the data and compute extrema, e.g., weeks, months, quarters, years, etc.).
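A direct implementation of Equation 1.28 may help fix ideas; the parameter values in the example below are arbitrary and purely illustrative.

# Sketch: return level x_p for return period 1/p from GEV parameters (Eq. 1.28).
import numpy as np

def return_level(p, mu, sigma, xi):
    if abs(xi) < 1e-12:                                     # Gumbel case
        return mu - sigma * np.log(-np.log(1.0 - p))
    return mu - (sigma / xi) * (1.0 - (-np.log(1.0 - p)) ** (-xi))

# Weekly blocks, ~52-week (one-year) return period => p = 1/52.
print(return_level(p=1 / 52, mu=0.02, sigma=0.01, xi=0.15))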
One of the useful aspects (at least for the research pursued here) of working with
the return value, return level, and return period is the importance of the analog of
this construct in the financial world. Recent definitions of financial risk have
moved beyond simple measures such as variance. Value at risk (VaR), popularized
in the 1990s, is a means of looking more closely at tail behavior of financial
performance and defining portfolios more directly in terms of confidence
(probability) of the magnitude of a loss. To define VaR (McNeil et al. [2005]) let
α be a confidence value with α ∈ (0, 1). Fix α and fix a period m; then VaR_α is the smallest x such that the probability of X exceeding x is less than or equal to (1 − α), or more precisely:

VaR_α = inf{x ∈ ℝ : P(X > x) ≤ (1 − α)}, α ∈ (0, 1).    (1.29)
So, VaR is a quantile value of a loss-probability distribution, which is defined by a
period of time, a confidence value (typically 0.90, 0.95, or 0.99), and a loss
distribution (where gains may be considered to be negative losses). Figure 1.6
identifies some key relationships between the concept of the return value and VaR. Conditional VaR (CVaR), also known as expected shortfall, is the expected value of the random variable X | X ≤ x, where x is VaR_α. These relationships between VaR and the return value will be exploited later in this research.
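As a small numerical illustration (simulated data, not the thesis sample), the sketch below computes an empirical VaR and CVaR on the loss scale of Equation 1.29, where losses are positive; on the return scale the same quantities sit in the lower tail, as depicted in Figure 1.6.

# Sketch: empirical VaR and CVaR (expected shortfall) of a simulated loss sample.
import numpy as np

def var_cvar(losses, alpha=0.95):
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, alpha)          # empirical analogue of Eq. 1.29 on the loss scale
    cvar = losses[losses >= var].mean()       # mean loss, given the loss is at least the VaR
    return var, cvar

rng = np.random.default_rng(4)
sample = 0.01 * rng.standard_t(df=4, size=10_000)
print(var_cvar(sample, alpha=0.99))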
1.13 Multivariate Extreme Distributions
A final (but very important) research thread to overview in this Chapter is the
transitions or changes in theory that occur when moving from a univariate model of
extremes to a bivariate or multivariate model of the stochastic nature of extremes.
Moving from univariate to multivariate models might be described as moving from
the probabilistic behavior of individual random variables or random vectors made
up of one random variable to random vectors made up of two (bivariate) or more
(multivariate) random variables. The probability laws describing the distribution of
density or probability of two or more random variables simultaneously (if the
random variables composing the random vector are continuous) are known as joint
probability density or probability distribution functions. There are discrete-random-variable analogues, although some of the terminology differs from the continuous case; however, discrete random variables will not be used in this research.
Figure 1.6 Relationships Between Return Values. (Annotations in the figure: CVaR ∝ ∫_{−∞}^{x} t·f(t) dt; probability of the return value = 1 − confidence level of VaR = 1/return period; return value = Value at Risk (VaR).)
While there are some well-known results in statistics of specific probability laws or
specific probability families that readily define the formation of joint probabilities
from marginal or univariate random variables, there are no such results that
provide generalizations on the formation of joint GEV distributions from
distributions of lower dimensions. Suggesting and applying a method to allow the
researcher to join the uncertainty in the extreme behavior, in this case of financial
equities, will be one of the key efforts in this research. While much of the previous
research has dealt with the extremes of a single random variable within univariate
circumstances, researchers such as Smith (1989) and Davison and Smith (1990)
have developed regression models within an extreme-value context, allowing
analysts to relate response variables to covariates. In point of fact there are some
important results for the formation of bivariate probability distributions of
extremes. Let’s look at an example (after Cebrián et al. [2003]) for random vectors
( X1, X 2 ) . The distributions of these random vectors are said to conform to an EVD
with unit exponential margins if:
P[X1 > y1 ] = exp(-y1 ) and P[X 2 > y 2 ] = exp(-y 2 ), for y1 , y 2 > 0,
(1.30)
and the joint survivor function, defined as

F(x_1, x_2) = P(X_1 > x_1, X_2 > x_2),    (1.31)

possesses the scaling property, i.e., max-stability (while we initially defined max-stability as a univariate specification, it may be generalized to include multivariate environments [Smith et al., 1990]):

F^n(y_1, y_2) = F(n·y_1, n·y_2), for any y_1, y_2 and for all n ≥ 1, n ∈ ℕ.    (1.32)

By a bivariate extension of Theorem 2, F^n(y_1, y_2) in the limit is a bivariate EVD.
1.13.1 Dependence Functions And Copulas
An important element in forming a joint distribution is called a dependence function. If the above survivor function can be expressed as:

F(y_1, y_2) = exp{−(y_1 + y_2) A[y_2/(y_1 + y_2)]}, for any y_1, y_2 > 0,    (1.33)

where

A(w) = ∫_0^1 max[(1 − w)q, w(1 − q)] dL(q)

for some positive finite measure L on [0, 1], then the function A is called the dependence function. A random bivariate vector follows an EVD if and only if F(y_1, y_2) can be expressed as in the two equations above (Tawn [1988]).
But results like these are few and far between for problems of higher dimensionality. An important set of dependence functions is the extreme-value copula. In the bivariate context we have:

C(u_1, u_2) = exp{[log(u_1) + log(u_2)] · A[log(u_2)/(log(u_1) + log(u_2))]},    (1.34)

where A is a dependence function (defined earlier), 0 < u_1, u_2 < 1, and C is the extreme-value copula (Nelsen [1999]).
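To illustrate the construction in Equation 1.34 (and only as a familiar textbook example, not the dependence model used in this thesis), the sketch below builds the extreme-value copula from the logistic, or Gumbel-Hougaard, dependence function A(w) = [(1 − w)^θ + w^θ]^(1/θ) with θ ≥ 1, and checks it against the closed-form Gumbel copula.

# Sketch: an extreme-value copula (Eq. 1.34) built from the logistic dependence function.
import numpy as np

def A_logistic(w, theta):
    return ((1.0 - w) ** theta + w ** theta) ** (1.0 / theta)   # A(w); theta = 1 gives independence

def ev_copula(u1, u2, theta):
    s = np.log(u1) + np.log(u2)
    w = np.log(u2) / s
    return np.exp(s * A_logistic(w, theta))                     # Eq. 1.34 with A = A_logistic

u1, u2, theta = 0.7, 0.4, 2.0
gumbel = np.exp(-(((-np.log(u1)) ** theta + (-np.log(u2)) ** theta) ** (1.0 / theta)))
assert np.isclose(ev_copula(u1, u2, theta), gumbel)             # matches the Gumbel copula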
In the multivariate context only some of the bivariate results extend (Tawn [1990]).
For example, the application of such results to spatial-multivariate analyses is
problematic. “…. In the analysis of environmental extreme value data, there is a
need for models of dependence between extremes from different sources: for
example at various sea ports, or at various points of the river” (Tawn [1988]).
Further and in a more general context, according to Tawn (1990), there is not a
natural or consequential parametric model of the dependence behavior between two
(and greater) extreme-value marginal distributions. Despite growth in the research
over the past 20 years, “copulas are well studied in the bivariate case, the higher-dimensional case still offers several open issues and it is far from clear how to construct copulas which sufficiently capture the characteristics of financial returns” (Fischer et al. [2009]).
The conclusion we may draw is that there are no unique or generally agreed
approaches for combining the uncertainty from two or more extreme RVs, and
there are not a lot of generally applicable results for developing joint distributions
for three or more RVs. In fact, Tawn’s quote provided above could just as well be
applied to the analysis of extrema for securities in general and equities in particular.
Therefore, an important focus for the present research, which will be detailed in the
next and subsequent Chapters, is the effort to recast the problem to characterize the
joint uncertainty in the tails of the distributions as the behavior of the joint model of
the RV as related to a specified set of return levels and return periods. We can
thereby move away from the problem of forming the joint pdf of the RVs analyzed
as extreme values or otherwise.
1.14 Organization Of Remainder Of Dissertation
The organization of the remainder of this thesis is as follows:
Chapter 2: Statement of the Problem and Outline of the Proposed Research
Chapter 3: Data, Data Analysis Plan, and Pre-Analysis Data Manipulations
Chapter 4: Fitting Time-Varying GEVs
Chapter 5: Estimating Time-Varying Return Levels and Modeling Return Levels
Jointly
Chapter 6: Consequences of the Research for Portfolio Formation and Innovation
Chapter 7: Summary and Conclusions, Along With Thoughts on the Current
Research
2. Statement Of The Problem And Outline Of The Proposed Research
2.1 Overview Of The Chapter
In Chapter 2, the problem being examined is laid out, as well as the technological and statistical framework in which the research will be conducted. These include the goals of the research, the process and steps by which the research will be conducted, and the domain universe and consequent data sets to which the research will be applied.
The research will examine monetary returns, in particular extreme monetary returns of financial equities, because, as described in Chapter 1, Section 1.4, downside or negative returns are at the heart of portfolio design. Additionally, because a portfolio is typically composed of multiple securities (multiple equities in the present circumstance), it is the joint or multivariate behavior of such monetary returns, rather than the univariate behavior, which is of most interest.
To this end the research exploits extreme value theory to describe the random
behavior of downside returns. Many disciplines which make use of GEV theory
find that static or non-time varying models (models in which the distributional
parameters do not change) make sense or are at least sufficient for the dimensions
of time and space over which the research is being conducted. However, financial
markets, especially at the end of the last decade and the beginning of this one, are
very dynamic. Therefore, a key hypothesis in this research is that 1) financial and
market conditions have a significant impact on the behavior of extreme monetary
returns and 2) a time varying GEV model, in which the parameters are functions of
the financial markets and the economy, is more appropriate.
As per issues raised in Chapter 1, Section 1.13.1, the researcher has chosen to use
return values as the basic element for a joint model or multivariate model of
downside behavior. Return values are a means for quantifying and selecting the
investor’s desired level of risk and uncertainty. A joint behavior model will be
developed through the use of a Gaussian Process model (see Chapter 5, Section
5.5).
This Chapter carefully enumerates and introduces both the processing steps and the data types to be employed in the research. It concludes with a section on how the research may be used in the domain of finance and financial research.
2.2 Research Threads
Myriad threads can be derived from the earlier research reviewed in Chapter 1.
Pertinent to the present research, these include:
1. Treating univariate processes as members of the generalized extreme-value/Poisson-Generalized Pareto Distribution (GEV/P-GPD) family of distributions. There are a number of formalisms for modeling an extremal environment that in turn lead to models describing the behavior of extreme-value random variables. These models and distributions have both clear conceptual and mathematical relationships to one another.
2. The choice between estimating the extreme as a block maximum or as a value greater than a threshold, and the bias-variance trade-off inherent in the estimation process.
3. The multi-faceted impact of serial correlation and the lack of stochastic
independence—one of the very common consequences of which occurs in
the cluster definition mechanism, wherein under a “peaks-over-thresholds”
(POTs) model the exceedances are defined as belonging to clusters of
exceedances. These clusters are a function of presumed positive
autocorrelation—that is, if a value exceeds the threshold, it is more likely
that the value at the next time step will exceed the threshold than if the
previous value is lower than the threshold. An exemplary solution is
provided in Smith et al. (2006).
4. The handling of nonstationarity as inhomogeneity in the base distributions
through the use of covariates to describe the form of the inhomogeneity in
terms of changes in the parameters—in other words, moving as much of the
variation as possible into the model of the expected value of the parameters.
5. Issues in the definition of multivariate models as a combination of a
component-wise univariate extreme-value marginal distribution, combined
with a dependence function or copula being used to functionally connect the
independent univariate distributions to the joint probability density function.
6. The definition of return level and its interpretation as the means of relating
the distribution of extremes to the behavior of phenomena under study
within the domain of interest. This result is parlayed into the connection
between return level and the financial concept of value at risk (VaR).
7. The increasingly popular incorporation of spatial covariation/correlation
structures as a means of relating extreme-value point processes. For
example, in the field of environmental or climatological studies the
collection of observations at geographically distributed stations or spatially
diverse locations provides the spatial network, with latitude, longitude, and
altitude (or some related triple) forming the dimensions.
Using many of the research threads described above and in Chapter 1 as prologue,
the goal of the present research is to develop a model to examine the joint behavior of a universe of financial securities. By joint behavior we mean describing, in a stochastic sense, extreme returns (to be defined later) over a large universe of securities, using extreme-value theory as well as market variables (also to be
defined later) and covariation structures from each. The model and its elements will
then be used as an aid to gain insight into and perhaps perform a number of
important finance functions such as risk estimation, portfolio definition, and
designing and forecasting returns. Therefore, it is hoped this model will provide
both explanatory power to aid in understanding the variation in returns as well as
predictive power to aid in the formation of portfolios.
In developing the detailed discussion of the proposed research the researcher has
identified the following goals or threads to be examined in the research. This is not
meant to be a procedural list; procedures will be enumerated in this Chapter and
subsequent chapters. Also, the details underlying these threads will be exposed
elsewhere as appropriate. The goals of this research are:
1. Define joint extreme-value behavior of securities through the behavior of
return levels.
2. Use this joint behavior and the introduction of covariate-driven parameters
to define a model of return levels in space-selected market dimensions.
3. Adapt this model to derive portfolios of target return-level behavior, using
VaR:return-level relationships.
4. Investigate the propagation of extreme values by introducing extreme
innovations into portions of the market.
We frame answers to these questions as well as provide guidance on how to
generate an understanding of the phenomena of securities’ returns in the process of
laying out the model. The model will be derived as an analogue for the financial
domain of a model proposed by Smith et al. (2006) for the climatological domain,
specifically the distribution of extreme precipitation events over the contiguous
U.S. The current research is aimed at providing a solution to the problem of
modeling extrema in financial returns that moves away from the formalisms that
make use of dependence structures and copulas and the non-unique formulations that are part and parcel of such approaches.
Figure 2.1 depicts a high-level data-flow diagram of the overall processing steps to
be conducted in this thesis. The processing ovals take in data from outside sources
as well as data in the form of derived results from earlier processing steps, along
with control data (not shown), and generate results and data to be used as input for
later processing steps. Clearly, we could “drill down” into each processing oval to
discover additional processing steps, along with considerable detail. For each of the
processing ovals this drill-down is articulated in the summary in this Chapter but
effected in detail in the named Chapter associated with each processing oval. It is in
the performance of these processing steps that the results necessary to accomplish
the research threads and produce the accompanying results will be generated.
Figure 2.1 High-Level Data Flow Diagram Depicting The Organization Of Research Elements In This Thesis.
2.3 Ingest, Clean, And Reformat Data
Three major groupings of data will be used in this thesis:
1. Equity⁵ performance data, including:
   • Equity identifier data
   • Share price data (U.S. dollars)
   • Total return data (returns adjusted for corporate actions, dividends, capital distributions, stock splits, etc.)
   • Market capitalization net of Treasury shares
   • Volume or number of shares traded
2. Covariate data used to create time-varying estimates of GEV parameters:
   • Time's arrow
   • Risk-free interest rate (vars. [various] countries)
   • Rates (vars. countries and maturities)
   • Commercial paper rates (vars. countries)
   • Salient interest-rate spreads
   • Consumer spending measures
   • Inflation
   • Unemployment rate
   • Trigonometric functions a la Smith et al. (2006)
   • Recognized indices in all major world markets
   • Volatility measures
3. Ancillary data used to segment equity data and return values, in particular:
   • Sector
   • Market capitalization
   • Continent and country
   • Exchange on which the equity is traded
   • Geographies representing issuing corporation headquarters and stock exchange trading

⁵ An equity is a financial security which provides the owner a portion (often a very small portion) of the ownership of the business it is issued against. Shares of common stock are an example of an equity. The price of the equity traded on the public market is the key characteristic of equity used in this research. The price is a (poorly understood) function of traders' expectations of the success of the firm over a range of time frames.
The emphases of this processing step will be identification of data sources, cleaning
of the data (i.e., missing data processing, filtering time series based upon
completeness), initial transformation or manipulation of the data to create derived
series, and reformatting data for further processing. This effort will be described in
Chapter 3.
2.4 Fitting Time-Varying GEVs
Recall from Chapter 1, Section 1.8.3, that the extreme-value theorem (EVT) can be
reformulated as a generalized extreme-value (GEV) distribution (reproduced here
for convenience):
G(x; µ, σ, ξ) = exp{−[1 + ξ((x − µ)/σ)]_+^(−1/ξ)},    (2.1)

defined on the set {x : 1 + ξ(x − µ)/σ > 0}, where [z]_+ = max(0, z), −∞ < µ < ∞, σ > 0, and −∞ < ξ < ∞. Also recall that there are equivalences
between the extrema-generation models described in Chapter 1, Section 1.9.4. In a
preliminary analysis it was concluded that the block-maxima model was superior, both from a domain vantage point (focus upon trading periods such as weeks and months is a common framework used by traders in thinking about their trading activities) and because of the success, in the earlier analysis, of parameter estimation using the block-maximum-related form of the GEV. To deal with the
likely occurrence of nonstationarity in securities’ returns we generalize Equation
2.1 to allow for time-dependent inhomogeneity in the parameters µ, σ and ξ ; we
reformulate Equation 2.1 as
For ξ ≠ 0:

G(x_t; µ_t, σ_t, ξ_t) = exp[−{1 + ξ_t((x_t − µ_t)/σ_t)}_+^(−1/ξ_t)],    (2.2)

where [z]_+ = max[0, 1 + ξ_t(x_t − µ_t)/σ_t], −∞ < µ_t < ∞, 0 < σ_t, and a_S < ξ_t < b_S, with a_S, b_S suggested by the limits offered by Smith (1985), as presented in Section 1.10.

For ξ = 0:

G(x_t; µ_t, σ_t) = exp{−exp[−((x_t − µ_t)/σ_t)]},    (2.3)

where −∞ < µ_t < ∞ and 0 < σ_t.
Let X1, X 2 ,…, X n be a time-ordered nonstationary set of extreme returns from a
security, such that X1 is the oldest random variable and X n is the youngest
random variable. Then, we assume each X t ~ GEV( µt ,σ t ,ξ t ). As a consequence
we postulate a model for an extrema-probability distribution (Equation 2.2 or 2.3)
that is inhomogeneous in the model parameters. The inhomogeneity of the
parameters of the distribution is in turn described as a function of covariates. As a
result of identifying the functional relationship between the covariates and the
parameters, changes over time in the covariates are used to describe changes in the
parameters over time.
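A minimal sketch of what such a covariate-driven fit looks like is given below; it uses a single hypothetical covariate z_t and a linear link for the location parameter only, whereas the actual models of Chapter 4 consider many covariates and all three parameters. SciPy's genextreme uses shape c = −ξ relative to Equation 2.2.

# Minimal sketch: time-varying GEV with mu_t = b0 + b1 * z_t (single illustrative covariate).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genextreme

def neg_log_lik(params, x, z):
    b0, b1, log_sigma, xi = params
    mu = b0 + b1 * z                              # time-varying location, as in Eq. 2.2
    sigma = np.exp(log_sigma)                     # log link keeps sigma_t > 0
    ll = genextreme.logpdf(x, c=-xi, loc=mu, scale=sigma)
    return -ll.sum() if np.all(np.isfinite(ll)) else np.inf

rng = np.random.default_rng(2)
n = 400
z = rng.normal(size=n)                            # hypothetical standardized covariate series
x = genextreme.rvs(c=-0.1, loc=0.02 + 0.01 * z, scale=0.01, size=n, random_state=rng)

fit = minimize(neg_log_lik, x0=np.array([x.mean(), 0.0, np.log(x.std()), 0.1]),
               args=(x, z), method="Nelder-Mead")
print(fit.x)                                      # estimates of (b0, b1, log sigma, xi)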
This parameterization allows each of the distribution parameters to be expressed in
terms of covariates and to find the subset of covariates that, along with the data,
maximizes the likelihood and yields good-quality measures for other model-fitting
diagnostics. While this research is detailed in Chapter 4, it is worth noting here that
the preliminary data examination provided some observations on model fitting that
directed the manner in which the larger piece of research was conducted. Within
the preliminary analysis the maximum likelihood search proved very sensitive and
catastrophically failed to converge when multiple (and what turned out to be)
nonsignificant parameters were entered into the model at the same time. To achieve
a parsimonious model and avoid the convergence problems observed at the outset
of the analysis, a step-wise strategy (forward and backward) was adopted.
2.5 Computing Return Values And Developing A Dependence Function Model
As per the material presented in Chapter 1, it is the researcher’s contention that it
will be more advantageous to work ultimately in terms of shortfall probability,
commonly called just “shortfall.” Shortfall is the probability that for a given
threshold and given probability law, the return will be lower than the threshold.
Shortfall probability has, as noted in Chapter 1, Section 1.12, a return period and a return level or quantile. As also pointed out in the same section, the shortfall probability is often used within the financial concept of VaR. In the present research this shortfall probability concept is translated into an examination of
the return period or return level and the return value. Because the return level is
created by computing the inverse of the GEV, the use of a time-varying GEV will
have an impact on the methods used to compute the time-varying return level
(detailed in Chapter 5, Section 5.3).
Having used covariates that reflect behavior in the market and the economy to
estimate parameters of the extreme-value distributions (EVDs), these parameter
estimates are used in turn to estimate the return values and the errors in their
estimates. In Chapter 5, Section 5.3, results specific to each security from the GEV
fittings are used to model return levels. In Chapter 5, Section 5.4 a method to
compute the variance-covariance of return values is described.
Following and using these results, the research examines an interesting and useful
question: are these return values functions of important physical and market
dimensions? Let the return values represent a new set of random variables, X_{Θ̂_i, N}, i = 1, 2, …, K (the number of securities), under the parameterization Θ̂_i for the N-th return level. The realizations of these return-value random variables are notated by x_{Θ̂_i, N}. In Chapter 5 (Sections 5.5 through 5.9) the modeling of these (return) values as functions of physical and market dimensions is conducted. Under such an outcome, an arbitrary but fixed vector of return values, perhaps a domain-meaningful vector, can be described as a function of a host of ancillary characteristics.
2.6 Portfolio And Innovation Processing
Chapter 6 returns the thesis more explicitly to the problem domain of finance and
portfolio design and evaluation. Armed with the results of Chapters 4 and 5,
Chapter 6 focuses on two areas of research:
1. What do these models have to say about the portfolio formation and
modification problem?
2. What can we learn from these models about how extreme values (innovations) propagate through the system of equities and the model developed from them?
Combining the various datasets and models produces a pool of results we may use
to describe the joint uncertainty of equities over a set of salient market dimensions.
From the expected-return-value model and the joint distribution of return levels, an
appropriately guided investor can select a portfolio that has a particular return level,
while minimizing the tail probability.
3. Data, Data Analysis Plan, And Pre-Analysis Data Manipulations
3.1 Overview Of The Chapter
The focus of Chapter 3 is upon the data which define the basic experimental unit of the research, that is, the publicly traded equity. The Chapter covers the types of data which define the properties of the publicly traded equity and the environment in which it "lives." In addition, the Chapter describes the universe of publicly traded equities examined in this research and the manipulations performed on the data prior to and in preparation for the fitting of time-varying Generalized Extreme Value distributions described in Chapter 4, Section 4.7.
The research involves data from publicly traded equities listed on all recognized stock exchanges worldwide for the period 2000 to 2007. Salient data concerning the markets and economy, which form much of the environment in which these equities live, were also of interest. Where appropriate, the data used were collected on a daily basis. The number of equities in the universe was initially 76,424; through various filters required to facilitate the analysis, the number was reduced to 12,185, of which a sample of 3,000 was randomly selected for model construction (sometimes referred to as the training set) and 100 were randomly selected for the test set.
Three groups of data were used in the research. These were:
1. Performance Data,
2. Covariate Data, and
3. Ancillary Data.
The data are further defined according to their constituent members or variables
and the values these variables may take on.
A number of processing steps were performed and reported upon in the Chapter. These include:
1. An analysis of the market value/capitalization data to determine reasonable cutoff values for a discrete market value classification at the global and continental levels.
2. The development and application of a liquidity-of-trade evaluation. Liquidity in the trading of securities is a latent or hidden property of the security: a highly liquid security is one which a trader can buy or sell readily at a fair price. The liquidity test developed here was applied to the equity universe, eliminating those equities deemed illiquid. The test has since found its way into data processing at the Thomson Reuters index generation business unit.
3. The reduction or sub-selection of the initially identified set of market and economic measures into a set of potential covariates for use in the fitting of time-varying GEV distributions. The covariates will be examined in Chapter 4, Section 4, for utility in modeling the time-varying GEV parameters. The method used in performing this analysis is based upon factor-analytic approaches and has since found its way into data processing at the Thomson Reuters index generation business unit for the generation of optimal indices⁶.

⁶ Optimal indices are portfolios of securities wherein the security weighting scheme is generated by analytical research. This approach stands in opposition to the approach of passive index generation, in which the weighting scheme is taken from a common, publicly known measure such as proportional market capitalization.
3.2 Classification Of Data Types
Figure 2.1 portrays the data analysis plan for the proposed research. It consists of
four major analysis stages, each of which contains myriad processing steps. In
performing this research the data used falls into one of three groups. These groups,
briefly described in Chapter 2, Section 2.3, are:
1. Performance Data
2. Covariate Data
3. Ancillary Data
The basic experimental unit for the performance data is an equity issued by a
publicly traded company. While it may be quite complex in structure, each share of
equity at a first approximation represents a portion (usually a small portion) of
ownership of the issuing company. The equity gives the holder a claim against the
firm owner’s equity. Since the owner’s equity is the (potentially) volatile
accumulation of undistributed profit, unlike with bonds or fixed income assets there
is no a priori fixed upper limit on the value of equity assets. Nor is there a lower
limit, short of the physical value of zero (bankruptcy) for equities. Since bond
holders are in front of equity holders in the case of dissolution of a company, the
probability of equity assets being zero is always greater than or equal to the
probability of bond assets being zero. Prices of, and in turn returns on, equities tend to move more rapidly and to reflect to a greater degree the goings-on in the
marketplace and economy. This behavior led to a preference for using equities in
the current analysis. Secondly, by restricting the analysis to common share equities
and avoiding other asset classes and securities, the model was likely simplified at
this initial stage of research.
3.3 Performance Data
All data source references are found in the Section Sources of Data, the last section
of the dissertation. The sources of the performance data were the following
systems: FactSet Research System (FactSet [2007]), Security Analysis and
Validation Modeling (StockVal, Reuters [2007]), Lipper Analytical New
Application (LANA, Lipper [2007]) and the Lipper security master file (Lipper
[2008]). Daily data for four performance time series were downloaded for the
period from January 2000 until the end of August 2007. These series, collectively
known hereafter as the performance or market series are given in Table 3.1.
Table 3.1 Equity-based time series used in present research.

Abbreviation/Name    Description
PR, Equity Price     Daily closing price of a common share provided in U.S. dollars.
TR, Total Return     Daily return of an equity adjusted to add back dividends and capital distributions.
MV, Market Value     Daily market value of the equity-issuing firm given in U.S. dollars.
VT, Trade Volume     Daily number of shares traded.

3.4
Ancillary Data
Other points-in-time attributes/elements previously identified as ancillary data were
also retrieved from the FactSet system and to a smaller degree from the Lipper
security master file (SMF). These included:
Table 3.2 Description of ancillary dataset used in the equity sample design.

Name              Symbol   Description
Continent         CN       The continent of security domicile. These include: F == Africa, A == Asia, E == Europe, N == North America, S == South America.
Country           CO       The country of security domicile. (See Appendix A for list.)
Exchange          EX       Exchange on which the security is traded. (See Appendix A for list.)
Exchange Region   ER       Region in which the exchange operates: Africa, Central Asia, Eastern Europe, Middle East, North America, Pacific Rim, South America, Western Europe.
Trade Region      TR       Trade region of the organization issuing the security: Africa, Central Asia, Eastern Europe, Middle East, North America, Pacific Rim, South America, Western Europe.
Industry          IN       Industry in which the issuing firm participates. (See Appendix A for list.)
Sector            SE       Sector in which the issuing firm participates. (See Appendix A for list.)
The data elements displayed in Table 3.2 were used in the design and selection of
the securities sample as described below.
In this data preparation portion of the research, reducing the number of securities7
included in the performance data set was desired. The goal of this data-reduction
effort was to create a sample from which could be discerned the pattern of change
in joint extreme-value behavior, particularly as a function of financial and market
factors embodied in the ancillary and covariate dataset8, but which was within the
researcher’s time and computer resource budget. Put more simply, the initial
discussion in this Chapter focuses upon reducing the number of securities in the
performance series to a much smaller number so the number of securities to be
analyzed is tractable, while at the same time maintaining much of the structure of
joint extreme-value behavior found in the larger set of securities. Table 3.3
provides counts associated with each of the funneling steps, which in turn are
described below.
7 The word securities, unless otherwise noted, in this dissertation means common shares or common equities.
8 The covariate dataset is discussed in much greater detail below.
Table 3.3 Reducing the number of securities by processing stage.

Processing Stage                                                 Count      Remaining
Original number of securities                                                  76,424
Having no return values over last four years                   (35,821)        40,603
Not having at least 6 years of return values, 2002 to 2007     (16,481)        24,122
Gaps of length longer than two                                   (1,270)       22,852
Missing a market cap value                                       (5,561)       17,291
Not having disclosed sectors                                     (1,763)       15,528
Useable series with respect to length, number of original
observations, market cap value and identified industry sector                  15,528
In a preliminary analysis the researcher decided that the block-maxima model using
a week or month as the block provided both a sufficient number of observations, in the context of six or seven years of data, and stable, nondegenerate estimates of the various extreme-value parameters. Consequently, the first three
processing thresholds were performed to filter the securities so that the securities
passing through these selection criteria possessed the required length and time
continuity of data. Additional reductions of 7,324 equity series were made because
in a sequential evaluation there were no nearly complete market-capitalization
values for 5,561 equity series, and there were no industry or sector attributes for
1,763 equity series.
Even though this processing reduced the number of securities by two-thirds, it was still considered impractical by the researcher and the thesis committee to process 15,528 securities. Although the target number was selected in a somewhat arbitrary fashion, various committee members suggested that if the methodology worked for a universe as large as 15,528, it would work for a representative but smaller universe; the number of 3,000 was adopted. The need for a representative sample suggested a stratified sample. Four factors were selected for stratification. The first was the near-term market capitalization, commonly defined as the total value of a company's outstanding shares, calculated by multiplying the price of a common share of an equity by the number of shares outstanding in the marketplace (Investopedia [2007]). The other three factors used to stratify the samples in this analysis were the continent on which the security is domiciled, the sector classification under which the firm offering the security has its primary product or service offering, and the trade region in which the firm against which the security is positioned operates. The list of continents used in this analysis is given in Table 3.2, and the sectors are listed in Appendix A. While drilling down to a more granular level such as country and industry was a consideration, after examining the distribution of securities over this extended classification, the resulting schema appeared too sparsely covered. Trying to build models using this schema could result in biased estimates of model parameters, so the schema was pared back to the one described earlier.
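To make the stratification concrete, the following is a minimal sketch of proportional-allocation stratified sampling in Python. The DataFrame `universe` and the column names `cap_class`, `continent`, `sector`, and `trade_region` are hypothetical stand-ins; this is not the dissertation's actual sampling code.

```python
import pandas as pd

def stratified_sample(universe: pd.DataFrame, strata_cols, n_total: int, seed: int = 42) -> pd.DataFrame:
    """Draw a proportional-allocation stratified sample of roughly n_total rows."""
    n_universe = len(universe)
    pieces = []
    for _, stratum in universe.groupby(strata_cols):
        # Allocate the stratum's share of the sample in proportion to its size.
        n_k = max(1, round(n_total * len(stratum) / n_universe))
        pieces.append(stratum.sample(n=min(n_k, len(stratum)), random_state=seed))
    return pd.concat(pieces)

# Hypothetical usage: 3,000 equities stratified by capitalization class, continent,
# sector, and trade region (column names are assumptions, not the thesis's schema).
# sample = stratified_sample(universe, ["cap_class", "continent", "sector", "trade_region"], 3000)
```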
3.5
Market Capitalization
The market capitalization of the security-issuing entity was a daily measure
provided by FactSet (2007). The firms were grouped into six classes, according to
numerical market-caps: mega-capitalization (hereafter abbreviated in many
instances as cap), large-cap, mid-cap, small-cap, micro-cap, and nano-cap.
According to Answers.com (2007), the market-capitalization cut points for 2006
were:
• Mega-Cap: Market cap of $200 billion and greater
• Big-/Large-Cap: $10 billion to $200 billion
• Mid-Cap: $2 billion to $10 billion
• Small-Cap: $300 million to $2 billion
• Micro-Cap: $50 million to $300 million
• Nano-Cap: Under $50 million
Allowing for an estimated growth of approximately 4.9%, according to the
International Monetary Fund (2007), the cut points in Table 3.4 were used to divide
the firms into capitalization classes. Computing an average market capitalization
for the last quarter of a year of the investigatory period (roughly covering the
period June-August 2007), 51,590 members of the original universe were found to
have sufficient observations. Figure 3.1 is a log-log plot of the market-caps versus
ranks for these 51,590 equity series. The axes for this plot are log rank of market
capitalization versus log market capitalization in nominal dollars. Superimposed on
the plot are the capitalization cutoffs per Table 3.4. The mega-caps are clearly
dominated by securities traded on North American and European exchanges.
Table 3.4 Market capitalization classes and their cut points.

Capitalization     Upper Cut Point   Lower Cut Point
Mega Cap           $∞                $209.8B
Big/Large Cap      $209.8B           $10.49B
Mid Cap            $10.49B           $2.098B
Small Cap          $2.098B           $314.7M
Micro Cap          $314.7M           $52.45M
Nano Cap           $52.45M           $-∞

(B = billions, M = millions)
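The class assignment implied by Table 3.4 can be expressed compactly; the sketch below uses pandas with a hypothetical `market_cap` series and is offered only as an illustration of the cut points, not as the procedure used in the research.

```python
import numpy as np
import pandas as pd

# Cut points from Table 3.4 (the 2006 breaks grown by approximately 4.9%), in U.S. dollars.
CUTS = [-np.inf, 52.45e6, 314.7e6, 2.098e9, 10.49e9, 209.8e9, np.inf]
LABELS = ["Nano", "Micro", "Small", "Mid", "Big/Large", "Mega"]

def cap_class(market_cap: pd.Series) -> pd.Series:
    """Map a series of market capitalizations to the six classes of Table 3.4."""
    return pd.cut(market_cap, bins=CUTS, labels=LABELS)

# Example: cap_class(pd.Series([4.1e7, 5.0e8, 2.5e11])) -> [Nano, Small, Mega]
```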
Figure 3.2 is a notched box and whisker plot of logged market capitalization for the
same 51,590 equity series by the continent of the exchange on which the security is
traded. While the extreme values for each of the continents vary considerably, the
distributions for each continent are fairly closely centered as measured by the
sample medians. In fact, for Africa, Asia, Europe, and North America the notches appear to align, and the data fail to reject the null hypothesis that the medians are equal (p-value = 0.0931) (Tukey [1977]).
Figure 3.1 Log-Log Plot Of Market Caps Versus Rank For 51,590 Equity Series By Continent Of Domicile, Overlaid By Market Caps As Defined In Table 3.4.
Figure 3.2 Notched Box And Whisker Plot Of Logged Market
Capitalization Of 51,590 Equity Series.
When the same plots are created for the reduced universe of 15,528, we can see some changes to the rankings. In Figure 3.3, the log-log plot of the market caps versus rank for the reduced universe, the North American equities dominate the top spots with respect to market cap, displacing the European equities that lead the rankings in the larger universe. The median capitalization levels compared across the two universes are very close, and the medians across the continents for the reduced universe (Figure 3.4), while they show a bit more variation (p-value = 0.0279) than the larger universe, are still reasonably close. It is the researcher's
conjecture that the reduction in the universe from the hypothetical universe to the
available universe does not strongly bias the market-cap distribution.
Figure 3.3 Log-Log Plot Of Market Caps Versus Rank For 15,528 Equity Series By Continent Of Domicile, Overlaid By Market Caps As Defined In Table 3.4.
Figure 3.4 Notched Box And Whisker Plot Of Logged
Market Capitalization Of 15,528 Equity Series
3.6
Equity Liquidity
Examination of the return series shows that a value of 0 for the daily return was not uncommon. According to FactSet staff (personal interview, November 5, 2007), these are accurate observations and are not indicative of missing or poor-quality observations. What does a 0 value for a daily return mean? Since a daily return is defined as:
$$ R_t = \frac{P_t - P_{t-1}}{P_{t-1}} \qquad (3.1) $$

where: R_t is the total return for time t
P_t is the price adjusted for corporate action at time t
Therefore, a return value of 0 implies P_t = P_{t-1}. When a substantial number of the daily values are 0, it is most likely that little or no trading of the security has taken place. Such a circumstance is known as illiquidity. In the event
of illiquidity “an asset or security cannot be converted into cash very quickly (or
near prevailing market prices)” (Financial.Dictionary [2008]). A market-clearing
price for a trade is not easily or typically as rapidly found for an illiquid equity as
for a liquid one. The price movement of the illiquid equity is “sticky” and not as
regularized; also, the response of the return to the market factor is likely to be much
more idiosyncratic. In the context of this research a major consequence of such a
result would be that the fitting of a generalized extreme-value (GEV) model would
be problematic because, as is described in the next Chapter, the method of
parameter-fitting used in this research is a highly nonlinear, iterative search. The
manifestation of this lack of liquidity would be the failure to converge to
reasonable maximum-likelihood estimator (MLE) GEV parameters (as defined by
Smith [1985] and reported on in Chapter 1, Section 1.10) or to converge at all.
The GEV model-fitting activity will be developed in detail in Chapter 4. However,
to look at the relationship between the distribution of zeros in the extreme-value
series (liquidity) and the ability to estimate MLE-property-confirming estimates,
non-time-varying estimates (once again discussed in Chapter 4) of the GEV
location, scale, and shape parameters for weekly and monthly extreme-value series
were computed using the block-maxima model for all 15,528 equities resulting
from the filtering detailed in Table 3.3. If the equities did not converge or did
converge but did not meet the criteria laid out by Smith (1985), reproduced here for
convenience, then an indicator variable describing the quality of fit for the equity was assigned the value of 0. On the other hand, a successful convergence meeting the
Smith criteria was assigned a 1:
• If ξ > -0.5, MLEGEV estimators possess common asymptotic properties.
• If -1 < ξ < -0.5, MLEGEV estimators may be computed, but they do not have regular or standard asymptotic properties.
• If ξ < -1, MLEGEV estimators may not be obtained.
Additionally, a number of independent (predictor) variables were created from the
analysis of the extreme weekly and monthly series. These were:
• Largest number of consecutive zeros (Max.Gap)
• The percentage of zeros in the extreme-value series (Portion.Zero)
• The number of “zero gaps,” namely, the number of sets of consecutive zeros separated by sets of one or more non-zeros (Num.Gaps)
A zero implies no change in price. The dependent random variable describing GEV model quality was modeled as a function of these zero-based summaries, using a logistic regression model (Neter et al. [1996]).
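The following sketch illustrates the two pieces just described: computing the three zero-based predictors from a single extreme-value series, and regressing a 0/1 convergence indicator on them with a logistic model. It uses statsmodels; the data objects (`features`, `converged`) are hypothetical, and this is a minimal sketch rather than the code used in the research.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def zero_run_features(x: np.ndarray) -> dict:
    """Compute the liquidity predictors described in the text from one extreme-value series."""
    is_zero = (x == 0).astype(int)
    portion_zero = is_zero.mean()
    # Lengths of consecutive runs of zeros ("zero gaps").
    runs, current = [], 0
    for z in is_zero:
        if z:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return {"Portion.Zero": portion_zero,
            "Num.Gaps": len(runs),
            "Max.Gap": max(runs) if runs else 0}

# `features` is a DataFrame with one row per equity containing the predictors above,
# and `converged` is the 0/1 indicator of a successful MLE fit meeting the Smith criteria.
def fit_liquidity_model(features: pd.DataFrame, converged: pd.Series):
    X = sm.add_constant(features[["Portion.Zero", "Num.Gaps", "Max.Gap"]])
    return sm.Logit(converged, X).fit(disp=0)
```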
Table 3.5 Analysis of deviance table from stepwise logistic regression for weekly extreme-value series.

Source                      Df   Deviance   Resid.Df   Resid.Dev   P(Q>=q)
Intercept                   NA         NA     15,527   13,889.15        NA
Portion.Zero                 1   7,747.07     15,526    6,142.08         0
Num.Gaps                     1     458.05     15,525    5,684.03         0
Portion.Zero X Num.Gaps      1     125.55     15,524    5,558.48         0
Max.Gap                      1      60.02     15,523    5,498.46    <0.001
Portion.Zero X Max.Gap       1      82.96     15,522    5,415.50         0
Table 3.6 Analysis of deviance table from stepwise logistic regression for monthly extreme-value series.

Source          Df   Deviance   Resid.Df   Resid.Dev   P(Q>=q)
Intercept       NA         NA     15,527   10,349.14        NA
Portion.Zero     1   9,170.26     15,526    1,178.88         0
Max.Gap          1      61.59     15,525    1,117.29    <0.001
Table 3.7 Cross-tabs from the assignment of weekly extreme-value series to classes.

                 Pred.Failed   Pred.Converged   Total Row
Obs.Failed             2,014              599       2,613
Obs.Converged            730           12,185      12,915
Total Column           2,744           12,784      15,528
Table 3.8 Cross-tabs from the assignment of monthly extreme-value series to classes.

                 Pred.Failed   Pred.Converged   Total Row
Obs.Failed             1,487              117       1,604
Obs.Converged             69           13,855      13,924
Total Column           1,556           13,972      15,528
The predictor selection method was a stepwise process using both forward and backward steps. The results (not shown here) exhibited a strong inverse relationship between convergence and the liquidity measures. Tables 3.5 and 3.6 are analysis of deviance (AOD) tables from the logistic regression for weekly and monthly extreme-value series, respectively. For both series the coefficients of the predictors are significant at small values of p. Also, for both series the portion of zeros found in a series played a leading role in convergence: more zeros means the GEV is less likely to converge.
Tables 3.7 and 3.8 are cross-tabs relating the observed convergence and failure to
converge (denoted as Obs.Converged and Obs.Failed, respectively) to the predicted
convergence and failure to converge (denoted as Pred.Converged and Pred.Failed,
respectively). The predicted classification was created by thresholding the predicted
probability at 0.5. This value was chosen by default because there was no
information to suggest reducing the uncertainty in a particular direction.
Furthermore, when a series of threshold values on either side of 0.5 were tested it
was the misclassified cells which changed the most, swapping values in a direction
dependent on the direction of the movement of the threshold away from 0.5. The
correctly classified cells remained fairly constant under this examination. When the
monthly and weekly datasets were examined separately, greater liquidity was seen in the monthly series, with 13,855 liquid series versus 12,185 for the weekly series. Further, all securities found in the weekly liquid set were also in the monthly liquid set. (This result supports the view that increasing the number of zeros in the set increases illiquidity, which in turn produces a degenerate extreme-value distribution, certainly under the block-maxima method.)
Note that every monthly series having a maximum of zero over a given month
resulted in four or five zeros in the weekly series. Not only were the illiquid series
problematic for extreme-value parameter estimation, their existence contradicted
the consequences of the efficient market hypothesis (Fama [1965]). Therefore,
other idiosyncratic behavior could have been expected from them, particularly
when modeling joint distributions of uncertainty. The useable universe of equities
was reduced for further study to the 12,185 securities observed and predicted to
converge in the cross-tabs of the liquidity analysis of the weekly extreme-value
series.
3.7
Covariates
The remainder of this Chapter is dominated by a discussion of the preparatory data
analysis and selection of the covariates to be used in the time-varying estimates of
the GEV parameters. This latter analysis will be discussed in detail in Chapter 4.
All sources of covariate data are detailed in the Section Sources Of Data, the last
section of the dissertation.
An initial set of 139 covariate series were collected. Each series consisted of daily
observations for the period from December 31, 1999, until August 31, 2007. (The
sources of these data series are noted in the Section entitled Sources Of Data.) The
types of data found in this initial set were:
Table 3.9 Globally recognized benchmarks and indices.

Index/Benchmark               Country
Dow Jones Indus. Avg          USA
S&P 500 Index                 USA
Nasdaq Composite Index        USA
S&P/TSX Composite Index       Canada
Mexico Bolsa Index            Mexico
Brazil Bovespa Stock Index    Brazil
DJ Euro Stoxx 50              EU
FTSE 100 Index                UK
CAC 40 Index                  France
DAX Index                     Germany
IBEX 35 Index                 Spain
S&P/MIB Index                 Italy
Amsterdam Exchanges Index     Netherlands
OMX Stockholm 30 Index        Sweden
Swiss Market Index            Switzerland
Nikkei 225                    Japan
Hang Seng Index               Hong Kong
S&P/ASX 200 Index             Australia
The series included various-term interest rates and selected other metrics from the
seven global central banks. The central banks were:
• Bank of Canada
• Bank of England
• Bank of Japan
• European Central Bank
• Swiss National Bank
• The Reserve Bank of Australia
• The U.S. Federal Reserve Bank
Other covariates in the initial set were:
• Volatility measure of the Chicago Board Options Exchange (CBOE)
• Lehman Brothers fixed income indices (various)
• Dollar futures indices of the New York Board of Trade (NYBOT)
• U.S. mortgage rates (various maturities)
• Rates for Moody's AAA and BBB corporate paper
• Russell equity indices (various)
• Interest rate swap rates (various maturities)
3.7.1 Sub-Selecting Covariates
While this set of covariates was very rich, it was also too large to use in its entirety.
It was also undoubtedly the case that many of the covariates possessed moderate to
strong correlation with other members of the set. So, the task became one of
retaining much of the covariation within the set of covariates, while at the same
time reducing the dimensionality of the set. One procedure for doing this was to use
a factor-analytic procedure to form factor scores and use these scores as the
reduced set of covariates (Johnson and Wichern [2002]). However, the problem with factor scores is that they leave something to be desired in terms of interpretation; that is, unless we can give a compelling interpretation to a factor, we lose understanding of what is driving the parameter from a phenomenological or domain perspective.
In another context the researcher developed a patent-pending method (Labovitz et
al. [2007]) using a factor-analytic model within an iterative framework to extract a
set of securities that reduces the number of securities under consideration but
retains the diversification protection found in the universe of securities. Clearly, it
is important to retain the identity of the securities because the investor must select
from the securities to form a portfolio. The method was adjusted to perform in the
present circumstances, so that in an analogous fashion, while the dominating
patterns of variation are retained in the set of covariates, the identities of the
covariates’ terms are also retained to gain greater domain insights.
As a preprocessing step those covariates not already presented in terms of returns
or rates were converted to decimal returns using Equation 3.1, with index values
taking the place of price as required. The covariates then were transformed to
decimal representation from a percentage representation where necessary. Finally,
the value 1 was added to each covariate value, and the resulting quantities were
logged. (The logarithm transformation is commonly used with returns as a
variance-stabilizing transform as well as a transform to drive the sample closer to a
normal or Gaussian form.) The modified “high-grading” procedure was applied to
the data:
Start with the formation of a data matrix, wherein there are n objects (e.g., days) and p properties (e.g., covariates). The data can then be depicted in a matrix of n rows and p columns. Consistent with matrix notation, call the matrix X and let x_{i,j} be the entry at the intersection of the ith row (where i = 1, 2, ..., n) and the jth column (where j = 1, 2, ..., p), which corresponds to the measurement of the jth property on the ith object.

The Factor-Analytic (FA) model (results from Johnson and Wichern [2002]) hypothesizes that X is linearly dependent upon unobservable (often called latent) random variables, known as factors. For a detailed description of the FA model, the reader is referred to Johnson and Wichern [2002]. It is sufficient to say that the FA model operates on a positive semi-definite matrix, commonly the covariance or correlation matrix, which is a function of X and is commonly denoted Σ or Ρ.
To compute the factors, a principal components extraction (PCE) method (Johnson and Wichern [2002]) was performed. The PCE is based upon several theorems from linear algebra. The underlying assumption is that the matrix being operated upon is symmetric and real; therefore, by the spectral decomposition theorem for the reals, it has an orthonormal basis of eigenvectors and a full complement of eigenvalues (Lay [2005]). Further, a corollary of that theorem is that such a matrix can be written as a sum of terms formed from its eigenvalues and eigenvectors.

In the present circumstance we have Ρ, a standardized data covariance matrix (that is, a correlation matrix), and by the definition of distance under a Euclidean metric Ρ is positive definite, so all the eigenvalues are positive. From other results of linear algebra it can be shown that, for the FA model operating on a correlation matrix, the proportion of the variance explained by the model is a function of the eigenvalues, while the “loadings” (the correlations between the factors and the properties, or columns, of the data matrix X, or more appropriately its standardized form) are a function of the eigenvectors. However, the loadings are determined only up to an orthogonal matrix. From linear algebra we know that an orthogonal matrix corresponds to an orthogonal transformation, which in turn corresponds to a rigid rotation or reflection of the coordinate system. The objective of applying such a transformation is to maximize the loading of properties on one factor (i.e., approaching 1.0 or -1.0) and to minimize the loading of these properties on all other factors (i.e., approaching 0). The reason to seek multiple properties with high, similarly signed loadings on the same factor is that such properties are varying together, and the use of any one of the properties from this set would bring the pattern of variation occurring in the set into the analysis. On the other hand, properties with high, oppositely signed loadings move in inverse directions to one another. Finally, properties possessing high absolute-value loadings, but on different factors, tend to move independently of one another. It should be noted that the above discussion primarily holds in the context of linear or near-linear relationships among properties.
In selecting covariates to use in the examination of GEV, the researcher wanted to
find a set of covariates that might be common to “driving” changes in the extreme
values of many securities, not one unique to one or a very few securities. Therefore,
the researcher, based on his experience in using factor analysis for this purpose and
in concert with the variant of the high-grading process referred to above, sought
covariates possessing commonly shared patterns of variation among the covariates
as the covariates most likely to achieve the objective. Of course, this raised the issue of how to distinguish the common patterns from the unique ones. While there are a number of rules of thumb for identifying the factors representing this distinction between common and unique sources of variation, a rule based on changes in the rate at which covariates made it through the high-grading filter was chosen.
Table 3.10 Number of covariates satisfying the high-grading criteria as a function of percentage variation explained and threshold value.

                 Number of Covariates Retained Above Threshold*
                         Percentage of Variation
Threshold         90%      95%      98%      99%
1 and -1           17       19       22       22
2 and -2           29       29
3 and -3           29       32       33       33
4 and -4           36       36
5 and -5           35       36       37       37

* “x and -y” means the x covariates with the largest positive values above the threshold and the y covariates with the smallest negative values below the negative threshold.
From examination of Table 3.10, a minimum set of 33 covariates, high-graded from the factors explaining 98% of the total variation, was selected. In selecting this set of covariates, up to three positively and three negatively correlated covariates per factor were permitted, and each covariate had an absolute value of correlation (loading) greater than or equal to 0.75. This selection came from 30 factors with a distribution of variability as described in Table 3.11.
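The following is a minimal sketch of loading-based screening in the spirit of the description above: principal components extraction of loadings from the correlation matrix, a varimax rotation, and retention of up to three positively and three negatively loaded covariates per factor at a 0.75 threshold. It is only an illustration under those assumptions, not the patent-pending high-grading procedure itself.

```python
import numpy as np
import pandas as pd

def varimax(loadings: np.ndarray, n_iter: int = 50, tol: float = 1e-6) -> np.ndarray:
    """Plain varimax rotation of a p x k loading matrix."""
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(n_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - (1.0 / p) * L @ np.diag(np.diag(L.T @ L))))
        R = u @ vt
        d_new = s.sum()
        if d_new < d * (1 + tol):
            break
        d = d_new
    return loadings @ R

def high_grade(X: pd.DataFrame, n_factors: int, threshold: float = 0.75, per_sign: int = 3) -> set:
    """Select covariates whose rotated loadings exceed the threshold, keeping at most
    `per_sign` positively and `per_sign` negatively loaded covariates per factor."""
    corr = np.corrcoef(X.values, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)
    order = np.argsort(eigval)[::-1][:n_factors]
    loadings = eigvec[:, order] * np.sqrt(eigval[order])   # PCE loadings
    rotated = varimax(loadings)
    keep = set()
    for j in range(n_factors):
        col = pd.Series(rotated[:, j], index=X.columns)
        keep.update(col[col >= threshold].nlargest(per_sign).index)
        keep.update(col[col <= -threshold].nsmallest(per_sign).index)
    return keep
```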
Table 3.11 Distribution of variation in loadings between covariates and factors for the factor structure arising from the selected high-grading results described in the accompanying text. (Legend: SS == Sum of Squares; Var == Variability Explained)

Source           Factor 1   Factor 2   Factor 3   Factor 4   Factor 5   Factor 6
SS loadings       34.9437    19.8385     8.6784     9.0731     2.6177     2.7229
Proportion Var     0.3236     0.1837     0.0804     0.0840     0.0242     0.0252
Cumulative Var     0.3236     0.5072     0.5876     0.6716     0.6958     0.7211

Source           Factor 7   Factor 8   Factor 9   Factor 10  Factor 11  Factor 12
SS loadings        3.7574     2.1740     2.1032     1.7823     2.1048     1.3589
Proportion Var     0.0348     0.0201     0.0195     0.0165     0.0195     0.0126
Cumulative Var     0.7558     0.7760     0.7955     0.8120     0.8314     0.8440

Source           Factor 13  Factor 14  Factor 15  Factor 16  Factor 17  Factor 18
SS loadings        2.1748     1.1614     1.0068     1.0030     0.9635     0.8131
Proportion Var     0.0201     0.0108     0.0093     0.0093     0.0089     0.0075
Cumulative Var     0.8642     0.8749     0.8842     0.8935     0.9024     0.9100

Source           Factor 19  Factor 20  Factor 21  Factor 22  Factor 23  Factor 24
SS loadings        0.7427     1.2001     0.8963     0.5725     0.7112     0.4840
Proportion Var     0.0069     0.0111     0.0083     0.0053     0.0066     0.0045
Cumulative Var     0.9169     0.9280     0.9363     0.9416     0.9481     0.9526

Source           Factor 25  Factor 26  Factor 27  Factor 28  Factor 29  Factor 30
SS loadings        0.5762     0.4300     0.7609     0.3629     0.5654     0.3431
Proportion Var     0.0053     0.0040     0.0070     0.0034     0.0052     0.0032
Cumulative Var     0.9580     0.9619     0.9690     0.9724     0.9776     0.9808
Table 3.12 lists the covariates used in the analyses presented in Chapter 4. The covariates selected clearly have representatives, in many cases multiple representatives, from each of the data domains described at the beginning of this section. Based upon conversations with domain experts, the minimum of 33 was augmented by 11, increasing the number of covariates used in further analysis to 44.
Table 3.12 Covariates selected for use in estimating GEV parameters.

Covariates Selected For Analysis
1 Year Swap Rate
10 Year Constant Maturity (CM)
10 Year Swap Rate
15 Year Mortgage
Spread: 15 Year Mortgage Less 7 Year CM
Spread: 30 Year Mortgage Less 10 Year CM
3 Month Euro Dollar Return
Australian Bank accepted Bills 180 days
Australian Bank accepted Bills 30 days
Australian Treasury Bonds 2 years
Austria ATX (Austrian Traded Index ATX)
Austria WBI (Wiener Börse Index) Benchmarked
BOE (Bank of England) IUDVCDA (Daily Sterling certificates of deposit interest rate - 3 months)
Brazil Bovespa
Canadian Rate V39072 (Prime corporate paper rate - 1 Month)
CBOE (Chicago Board Options Exchange) U.S. Market Volatility
China Shanghai SE Composite
DJ Ind Average Price (P) Index (IX)
Dollar Futures Index NYBOT
Euro STOXX 50
Federal Funds Rate
France CAC 40
FTSE 100 P IX
Spread: 3M ED - 3M TB
Hang Seng Hong Kong
Japan Nikkei Average 225 Benchmarked
KOSPI IX
Lehman Muni Sec Total Return (TR) Inv
Libor Six Month
Libor Three Month
Moody aaa
Moody bbb
NASDAQ 100 P IX
NASDAQ Composite Index
Russell 1000
Russell 2000
Russell 3000
Russell Mid Cap
S P 1500 Supercomp TR IX
Spread: Invest Grade-5 Year CM
US Interest Rate 10 Years
US Interest Rate 20 Years
US Interest Rate 6 Months
VXO (old VIX)

3.8
A Couple Of Final Words On Data Organization
The researcher used large datasets in this research and generated additional large datasets; many of the individual datasets were in excess of 200 megabytes. In order to maintain “control” of a project with datasets of this size, several conventions were observed, including:

1. Organization of different types of analyses in separate databases or separate directory structures. Many of these directories observed a similar subdirectory structure:
• Input data
• Code
• Comments
• Output results

2. A highly non-normalized data structure was adopted. This resulted in records that were very redundant in the repetition of data elements, even data items that did not vary from record to record. However, the investigator could pick up any random record and know exactly to what it referred, without having any additional context.

3. The researcher incorporated into each piece of code a means of automating the construction of a “meaningful” file name for each file. This allowed the file contents to be understood in a detailed fashion just from reading the file name. Taken together, the directory structure, the record format, and the file naming meant that no ancillary information was required to navigate the data side of the research.
4.
Fitting Time-Varying GEVs

4.1
Overview Of The Chapter
In Chapter 4, the data preparation and pre-processing described in Chapter 3 is
leveraged into one of the central analyses of the dissertation, the fitting of time
varying Generalized Extreme Value (GEV) distributions. Estimation of the
parameters required by the fitting is performed using a maximum likelihood
approach.
In preparation for performing this analysis, two important preliminaries are put forward. Firstly, after some experimentation, a fitting strategy was adopted: use a Nelder-Mead (gradient-free) search (Nelder and Mead [1965]) in combination with the quasi-Newton BFGS (Broyden-Fletcher-Goldfarb-Shanno) method (Nocedal and Wright [1999]). It was discovered that for these block maxima data, Nelder-Mead fitting on its own had some significant deficiencies, amongst them slow convergence or even a failure to converge within the parametrically set number of iterations. Using the estimates of parameters generated by Nelder-Mead as the initial estimates for BFGS improved the optimization. The result was an overall substantial improvement in the value of the log-likelihood (actually the negative log-likelihood) and a higher percentage of series whose parameter fitting converged.
Secondly, up to this point in the research block sizes of a week and month were
used side-by-side in the analyses. It was desired to carry only one of these data sets
forward and the preference was to use the week block maxima data set primarily
because it possessed the greater number of observations. Using both simulation
and theoretical approaches, it was concluded that the distribution of weekly block
maxima data was reasonably represented by the GEV distribution. Furthermore,
the parameters of the GEV fitted by the weekly block maxima data sets changed
appropriately in comparison to the parameters of GEVs fitted from data sets
generated by block maxima for other units of time. The analyses showed that the
distribution of block maxima whose lengths are multiples of the weekly block
maxima series are GEV distributions, if the weekly block maxima series are GEV
and the parameters of these larger block sizes are functions of the weekly block size
parameters. Additionally the shape parameter ξ remains fairly constant in moving
from the weekly block data set to blocks of longer time periods, another indication
that the weekly block maxima series are carrying information similar to that
contained in the larger blocks. So the analysis moved forward with a week as the
basic unit of time for the analysis, and a model is set out describing the time-varying parameters of the distribution as a linear function of a yet-to-be-determined subset of the proposed set of economic and financial covariates.
Prior to fitting the overall covariate set, an analysis was conducted of the utility of
such covariates as a constant, a trend over time (commonly called time’s arrow)
and periodic functions of various meaningful cycles. The latter were introduced as sets of sine and cosine functions. A sizeable random sample of the block-maxima equity series (approximately 2,500 liquid block-maxima series) was examined for periodic behavior by analysis of their power spectra. The significance
of the estimated coefficients of the sine and cosine functions was evaluated using
both a permutation test and a test due to Jenkins and Watts (Jenkins and Watts
[1968]). The conclusion was that such periodic functions demonstrated no
explanatory power once a constant and trend were accounted for. Therefore from
these analyses only the constant and trend terms were carried forward as candidate
covariates.
The number of financial and economic series covariates had been substantially
reduced in Chapter 3, Section 3.7, from 139 to 44. However, these were daily series while the returns were weekly; further, there is ample evidence in finance of lead/lag relationships amongst financial and economic measures. Because no guidance existed, each of the 44 daily financial covariate series was converted, for purposes of the analysis, into nine weekly series by crossing three levels of an aggregation
factor (maximum value, minimum value and median value) with three levels of a
time leading or offset factor (two weeks lead, one week lead and time coincident).
After fitting distributions to a randomly selected set of 200 equity series and
examining the covariates which entered the model, it was decided that the median
aggregation level could be dropped from the larger analysis. This reduced the number of series per covariate from nine to six and therefore yielded a total of 266 candidates (44 x 6 + trend + constant) with which to linearly model each of the GEV parameters. Additionally, a constant term was forced into the model of each parameter in order to create an “embedded model strategy” and thereby facilitate testing of the improvement which occurred with each term added to the model. It also should be clear that a constant-only model (for the three parameters) is equivalent to fitting a non-time-varying GEV.
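As an illustration of the aggregation-by-lead crossing just described, the sketch below converts one daily covariate series into the six retained weekly versions (weekly maximum and minimum at leads of zero, one, and two weeks). It assumes a pandas Series with a DatetimeIndex; the weekly resampling rule and the naming scheme are assumptions of the sketch, not the dissertation's conventions.

```python
import pandas as pd

def weekly_covariate_versions(daily: pd.Series, name: str) -> pd.DataFrame:
    """Cross aggregation (weekly max, weekly min) with lead (0, 1, 2 weeks)
    to produce six candidate series from one daily covariate series."""
    weekly_max = daily.resample("W").max()
    weekly_min = daily.resample("W").min()
    out = {}
    for agg_name, series in [("max", weekly_max), ("min", weekly_min)]:
        for lead in (0, 1, 2):
            # shift(lead) aligns the value observed `lead` weeks earlier with the current week.
            out[f"{name}_{agg_name}_lead{lead}"] = series.shift(lead)
    return pd.DataFrame(out)

# Hypothetical usage: versions = weekly_covariate_versions(libor_3m_daily, "Libor3M")
```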
A stepwise modeling approach allowing a forward and backwards step (candidate
introduction and elimination) was used. On the step after the constants were fitted,
265 combinations of predictor variables were tested one by one against each parameter to see which maximally improved the model (thereby effectively testing 795 covariate-parameter combinations). The following step fitted and evaluated 794 combinations, and so forth. Entry, exit, and stopping rules were based upon BIC (Bayes Information Criterion) and a Likelihood Ratio Test (taking advantage of model embedding). The fitting of the covariate models and estimation of distribution parameters was accomplished by maximizing the likelihood (in fact minimizing the negative log-likelihood) using the optimization strategy outlined above. The 3,000 weekly block-maxima training series and the 100 weekly block-maxima test series were fitted individually; the computation was very expensive and required the use of approximately 90 processors for 10 days.
The Chapter concludes with a brief analysis of the results from the parameter
fitting, with a very extensive analysis found in Appendix B. While they are not part
of the central thread of the dissertation, these results are important to examine
because they do provide a domain sanity check upon the parameter estimation
effort, which is central. Further, the results do suggest new lines of inquiry which in future research can be articulated as hypotheses concerning impacts upon distribution parameters and their moments. Some of the salient results of the analysis of parameter fitting include:
1. The models for the distribution parameters had on average approximately
seven covariates in total.
2. 95% of the time-varying models showed significant improvement over the
static model.
3. Financial and economic covariates were as or more important compared to
constants or trend.
4. Of the financial/economic covariates, the plurality were used in the estimation of μ̂, followed by ξ̂ and σ̂.
5. Time-contemporaneous covariates accounted for slightly less than half of the covariates, compared to time-lagged covariates (46.6% against 53.4%).
6. Covariates associated with lags of one or two weeks were equally divided, suggesting, from an overall perspective, the existence of a lead-lag relationship but not a strong differentiation by time frame, at least for the first two weeks.
7. The minimum values of the covariate in the block more frequently entered
the model than the maximum values of the covariate in the block -- 62.1%
versus 37.9%, respectively.
8. Distinct groupings of covariates exist with respect to when and in what form (aggregation) they entered models; for example, global market indices behave similarly.
9. When the results were observed from the perspective of market
capitalization, model complexity (in terms of the number of components,
the number of leading components and the number of distributional
parameters which were modeled by covariates beyond the constant) tended
to increase with greater capitalization.
10. Africa, Eastern Europe, the Middle East, South America, and to a lesser
extent North America can—at the gross level of this analysis—be modeled
(in the sense described above) by less complex models.
11. The Pacific Rim and Western Europe, in the same sense as the previous,
tend to be more complex models.
An important take-away is that in the end, the value of fitting the TV GEVs is, as
developed throughout Chapters 5 and 6, to compute the return values from which
are generated surrogates for the “risk levels” securities introduce into a portfolio.
4.2
GEV Reprise
While the topic and ultimate focus of the present Chapter is the fitting of time-varying generalized extreme-value (GEV) distributions, it should be clear that the value of time-varying GEVs is fundamentally in their contrast to non-time-varying GEVs. The distinction between these two models lies in the nature of the function, both its form and the parameters it contains, which relates the covariates to the GEV parameters. Recall from Chapter 1, Section 1.8.3, that we are using a GEV of the form:
$$
G(x;\mu,\sigma,\xi) = \exp\left\{-\left[1+\xi\left(\frac{x-\mu}{\sigma}\right)\right]_+^{-1/\xi}\right\} \qquad (4.1)
$$

defined on { x : 1 + ξ(x - μ)/σ > 0 }, where [z]_+ = max(0, z), -∞ < μ < ∞, σ > 0, and -∞ < ξ < ∞. The GEV has three parameters: a location parameter (μ), a scale parameter (σ), and a shape parameter (ξ). We shall refer to this form as the non-time-varying form. The parameters in this distribution are hypothesized to remain constant for all values of the random variable X.
Recall that in Chapter 2, Section 2.4, we introduced the notion of the time-varying distribution

$$
G(x_t;\mu_t,\sigma_t,\xi_t) = \exp\left\{-\left[1+\xi_t\left(\frac{x_t-\mu_t}{\sigma_t}\right)\right]_+^{-1/\xi_t}\right\} \qquad (4.2)
$$

where [z]_+ = max(0, z), so that the bracketed term is evaluated only where 1 + ξ_t(x_t - μ_t)/σ_t > 0, and -∞ < μ_t < ∞, σ_t > 0, -∞ < ξ_t < ∞. The model being suggested is that the GEV distribution is not a single distribution but a set of (albeit related) distributions whose parameters change with the random variable X_t, a random sequence indexed by time (t). Further, as discussed in more detail later in this Chapter (Chapter 4, Section 4.7), we hypothesize and fit the time-varying parameters μ_t, σ_t, and ξ_t as a linear function of the subset of covariates
identified in Chapter 3, Section 3.7. Also discussed in detail are functions relating
the parameters and covariates. These functions are of the form:
pt , j = f (Ct ) = θ 0, j + θ1, j Ct ,1 + θ 2, j Ct ,2 + … + θ m, j Ct ,m
(4.3)
where: pt , j is a parameter of the time-varying GEV— µt ,σ t , and ξt at time t
Ct is a time-varying vector of covariates
θ i , j is the coefficient multiplying the ith covariate (i = 1, 2,..., m ) associated
with the jth parameter
Ct ,i is the value of the ith covariate at time t
While we have started out treating non-time-varying models (NTVM) and time-varying models (TVM) as distinct from each other, it should be apparent from examining the above linear function that the two models are indeed related. If we let p_{t,j} = θ_{0,j}, we see that the parameter p_{t,j} = p_j is not dependent on time, i.e., it is an NTVM. In fact, the NTVM is an embedded model with respect to the TVM, and fitting it along with TVMs generates fit statistics that allow us to evaluate the improved fit (if any) that the TVM provides over the NTVM. Therefore, this Chapter continues (Chapter 4, Section 4.3) with an analysis of the NTVM results.
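A small sketch of Equation (4.3), and of the embedding it implies, is given below. The function and argument names are illustrative only, and modeling σ_t on the log scale (to keep it positive) is an assumption of the sketch rather than a statement about the dissertation's parameterization.

```python
import numpy as np

def tv_gev_parameters(theta_mu, theta_sigma, theta_xi, C):
    """Equation (4.3) applied to each parameter: p_t = theta_0 + theta_1*C_t1 + ... + theta_m*C_tm.

    C is a T x m matrix of covariates; each theta_* is a length-(m+1) vector whose
    first entry is the constant. Setting the covariate coefficients to zero (or m = 0)
    recovers the embedded non-time-varying model."""
    design = np.column_stack([np.ones(len(C)), C])   # prepend the constant term
    mu_t = design @ theta_mu
    log_sigma_t = design @ theta_sigma               # sigma modeled on the log scale (sketch assumption)
    xi_t = design @ theta_xi
    return mu_t, np.exp(log_sigma_t), xi_t
```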
4.3
Non-Time-Varying Model
As indicated in Chapter 3, GEV parameters were successfully estimated for 12,185 and 13,855 equity series of weekly and monthly minimum extremes, respectively. These series represented those that passed through the liquidity analysis described in Chapter 3, Section 3.6.

Maximum likelihood was used to estimate the parameters. The likelihood estimation procedure used the data to estimate μ, σ, and ξ such that the negative log of the likelihood was minimized. From the cumulative distribution function (cdf) of the GEV given above, the probability density function is computed as:
$$
f(x_i;\mu,\sigma,\xi) = \frac{1}{\sigma}\left[1+\xi\left(\frac{x_i-\mu}{\sigma}\right)\right]^{-1/\xi-1}\exp\left\{-\left[1+\xi\left(\frac{x_i-\mu}{\sigma}\right)\right]^{-1/\xi}\right\} \qquad (4.4)
$$

where 1 + ξ(x_i - μ)/σ > 0, -∞ < μ < ∞, σ > 0, and -∞ < ξ < ∞.
In turn, the likelihood function is computed as:

$$
L(\mu,\sigma,\xi;x) = \prod_{i=1}^{n} \frac{1}{\sigma}\left[1+\xi\left(\frac{x_i-\mu}{\sigma}\right)\right]^{-1/\xi-1}\exp\left\{-\left[1+\xi\left(\frac{x_i-\mu}{\sigma}\right)\right]^{-1/\xi}\right\} \qquad (4.5)
$$
Taking the log of the likelihood yields:

$$
\ell(\mu,\sigma,\xi) = -n\log(\sigma) - \left(1+\frac{1}{\xi}\right)\sum_{i=1}^{n}\log\left[1+\xi\left(\frac{x_i-\mu}{\sigma}\right)\right] - \sum_{i=1}^{n}\left[1+\xi\left(\frac{x_i-\mu}{\sigma}\right)\right]^{-1/\xi} \qquad (4.6)
$$
The objective is to minimize the negative of this value, subject to the constraints enumerated above. There are two additional points to note in this formulation:

1. The constraint 1 + ξ(x - μ)/σ > 0 is enforced by checking every candidate three-tuple (μ̂, σ̂, ξ̂) against each x in the sample, that is, 1 + ξ̂(x_i - μ̂)/σ̂ > 0 for i = 1, 2, ..., n (the number of weekly maxima).

2. The candidate values for σ̂ are chosen by exponentiating an underlying value taken from a search region, which ranges over an interval contained in ℝ; in other words, we are optimizing over log(σ̂).
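The following sketch implements the negative of Equation (4.6) with the two points above built in (the support check and the log(σ) parameterization). It is an illustration in Python rather than the code used in the research; the Gumbel branch for ξ near zero is an added numerical safeguard.

```python
import numpy as np

def gev_negloglik(params, x):
    """Negative log-likelihood of Equation (4.6); params = (mu, log_sigma, xi).

    The scale is passed on the log scale (point 2 above), and the support
    constraint 1 + xi*(x - mu)/sigma > 0 is enforced by returning +inf
    for any candidate that violates it (point 1 above)."""
    mu, log_sigma, xi = params
    sigma = np.exp(log_sigma)
    z = 1.0 + xi * (x - mu) / sigma
    if np.any(z <= 0):
        return np.inf
    n = len(x)
    if abs(xi) < 1e-8:                      # Gumbel limit as xi -> 0
        t = (x - mu) / sigma
        return n * log_sigma + np.sum(t) + np.sum(np.exp(-t))
    return (n * log_sigma
            + (1.0 + 1.0 / xi) * np.sum(np.log(z))
            + np.sum(z ** (-1.0 / xi)))
```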
There are no closed-form estimates for the parameters, so an iterative search
strategy must be adopted. The search strategy selected by the author was suggested
by Sain (Sain, S.R., personal interview, December, 2007) and consists of two parts:
The first part uses a Nelder-Mead search algorithm (Nelder and Mead [1965]) to
enter the general region of the solution. The Nelder-Mead method, also called the
downhill simplex or the amoeba method, is commonly used in nonlinear searches
for which a gradient cannot be computed or for which it is difficult or undesirable to compute a gradient. Central to the Nelder-Mead method is the object called a simplex: a polytope of p + 1 vertices in p dimensions, where p is the number of parameters over which the search is being conducted. In each iteration the Nelder-Mead simplex expands and contracts, depending on the nature and degree of
improvement found for the value of the function being optimized. Nelder-Mead
seems to operate better when the function being searched over is steep. The
algorithm can be slow to converge, and in many instances in the preliminary
research effort the algorithm did not converge within the set number of iterations.
These appeared to be occasions when the algorithm either encountered very large
and relatively flat regions or was moving along long ridges in the space. In many
such cases the algorithm failed.
This behavior was mitigated by cutting the number of iterations allowed and increasing the completion threshold tolerance level. Of course, this left open the question of whether or not the Nelder-Mead solutions could be improved further. To this end several features were added to the search procedure. Chief among these was taking the results from the Nelder-Mead processing and using them as the initial starting point in an instantiation of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method.
BFGS is another method for solving nonlinear optimization problems (Nocedal
and Wright [1999]). BFGS is a member of the family of hill-climbing methods that
uses a Newton-like method of optimization to identify a stationary point of the
objective function. Stationary points are those where the gradient of the function is
equal to zero. To this end we assume that, locally, the objective function can be
approximated as a quadratic in the region around the optimum value. Using results
from calculus, we can apply the first and second derivatives to find the stationary
point (Nocedal and Wright [1999]).
In point of fact, BFGS is called a quasi-Newton method because the matrix of second derivatives of the objective function with respect to the parameters, also called the Hessian, is not computed in its entirety but is approximated by successive additions, to the initial guess for the Hessian, of inner products of derivatives along the directions of greatest descent. The BFGS algorithm is expensive in terms of time; to speed it up, the user is asked to provide gradients of the objective function up through the second derivative, which in the present circumstance required the derivation and then repeated computation of the three gradient equations for the first derivative and the nine gradient equations for the second derivative. Since we have an approximation of the Hessian coming out of the BFGS method, we can use the approximation of the Hessian at convergence to compute the Hessian inverse, which is the covariance matrix of the parameter estimates. Under the asymptotics of the maximum-likelihood estimator (MLE) procedure the estimates of the parameters (μ̂, σ̂, ξ̂) are distributed as multivariate normal, with mean equal to (μ, σ, ξ) and covariance equal to the just-described inverse of the Hessian matrix (Nocedal and Wright [1999]).
The combined approach of using Nelder-Mead/BFGS was augmented with the use
of a number of random starts. There were up to 15 random starts and the values
were chosen randomly from the range of values recommended by Smith (1985). An
estimation of the parameters was considered to have converged if 75% or more of
the random starts had converged, and the estimates used were those associated with
the run yielding the best maximum value of the likelihood. In the results observed
the combined procedure worked far better than the results of using Nelder-Mead
alone. For the equity extreme-value data sets that realized convergence of 75% or
more of the random starts, 100% convergence was frequently observed. Finally, the
algorithm was tested against several test sets found in the literature, and the
coefficients generated compared extremely favorably with the published results.
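A condensed sketch of the combined strategy (Nelder-Mead to enter the region of the optimum, BFGS started from the Nelder-Mead solution, and several random starts with the best converged fit retained) is shown below. It leans on scipy, whose genextreme distribution uses the shape convention c = -ξ; the starting-value recipe is an assumption of the sketch, not the Smith (1985) ranges used in the research.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genextreme

def negloglik(params, x):
    """Negative GEV log-likelihood via scipy; note scipy's shape c equals -xi."""
    mu, log_sigma, xi = params
    logpdf = genextreme.logpdf(x, c=-xi, loc=mu, scale=np.exp(log_sigma))
    if not np.all(np.isfinite(logpdf)):
        return np.inf                      # candidate violates the support constraint
    return -np.sum(logpdf)

def fit_gev(x, n_starts=15, seed=0):
    """Nelder-Mead to locate the region of the optimum, then BFGS from that point;
    repeat from several random starts and keep the best converged fit."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        start = np.array([x.mean() + rng.normal(scale=0.1 * x.std()),
                          np.log(x.std()) + rng.normal(scale=0.2),
                          rng.uniform(-0.2, 0.2)])
        nm = minimize(negloglik, start, args=(x,), method="Nelder-Mead",
                      options={"maxiter": 500, "xatol": 1e-4, "fatol": 1e-4})
        bf = minimize(negloglik, nm.x, args=(x,), method="BFGS")
        if bf.success and np.isfinite(bf.fun) and (best is None or bf.fun < best.fun):
            best = bf
    if best is None:
        raise RuntimeError("no random start converged")
    mu, log_sigma, xi = best.x
    # best.hess_inv approximates the covariance of (mu, log sigma, xi) under MLE asymptotics.
    return mu, np.exp(log_sigma), xi, best
```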
4.4
Block Size And Distribution
Carrying on from our initial discussion at the start of this section, the 12,185
weekly extreme-value return equity series were a proper subset of the 13,855
monthly extreme-value return equity series. (In future discussions the qualifiers
extreme-value and equity will be dropped unless the context is confusing; the
reader is to assume the qualifiers apply unless otherwise stated.) As was shown in
Chapter 3, Section 3.6 such a result makes sense in light of the fact that the
parameters of a return series do not tend to converge in the presence of zeros (i.e.,
no price change), and for every zero in the monthly series there will be four zeros
in the weekly series. Further, the presence of the proper subset suggests we might
look for other changes in the form of the distributions as a function of block size.
The purpose of this section is to examine whether or not the distributions of block
maxima of different block sizes (one week, four weeks [a month], 26 weeks [a half
year], etc.) are related. That is, if the set of random variables of the base block size
of one week is a GEV, then are maxima of the time units of greater length (which
can be constructed as a sample of the base block size) also distributed as GEVs,
and how do the parameters change? If this is indeed the case then general results
developed for one week are also likely to be applicable to larger blocks, and the selection of a one-week block maximum is less arbitrary from an analysis perspective.
To commence this analysis, we first examined the correlations between the GEV
parameters for the weekly return data and compared these to the correlations among
the GEV parameters for the monthly return data. These correlations are given in
Table 4.1. A likelihood ratio test fails to reject the hypothesis that these correlation
matrices are the same under any commonly observed p-value.
Table 4.1 Correlations among GEV parameters for weekly and monthly data series.

Weekly        Mu        Sigma     Xi
Mu        1.0000       0.9160   -0.2100
Sigma     0.9160       1.0000   -0.1144
Xi       -0.2100      -0.1144    1.0000

Monthly       Mu        Sigma     Xi
Mu        1.0000       0.9193   -0.3671
Sigma     0.9193       1.0000   -0.2961
Xi       -0.3671      -0.2961    1.0000
We expect the GEV parameters to change as we move between the weekly and
monthly data sets. With respect to location, recall that we have structured the data
so that the maximums are in actuality the minimums. Because monthly is the
maximum of four weeks, the monthly location parameter should be equal to or
greater than the weekly location parameter. For the scale parameter the high
positive correlation with location and the researcher’s intuition suggest that taking
the maximum of approximately four weeks would introduce more variability. But
care needs to be taken inasmuch as the location and scale parameters are not the mean and variance of the GEV distribution. There is little insight into the manner in which ξ would systematically change with changes in time units. The examination of the GEV parameter estimates from the 12,185 weekly and 13,855 monthly distributions showed that the location and scale parameters increased significantly, and the shape parameter to a much lesser degree, in moving from the weekly to the monthly time frame. This question was examined using both empirical and theoretical methods to see whether the analysis was sensible.
The empirical examination involved generating or simulating a large number of sets of a predetermined (block) size, formed from weekly extrema using a sample of the 12,185 liquid equity series. A randomly selected sample of 100 of these weekly block-maxima equity series was extracted. A bootstrap procedure (Efron and Tibshirani [1993]) was used to create (simulate) multiple samples for each of the designated block sizes from the weekly maximum series, generating a sample of 50 observations at each block size. These 50 observations were used to estimate the parameters (μ̂, σ̂, ξ̂) of a GEV for the time unit designated. The bootstrap was performed 1,000 times for each of the series for each of the desired time units. The convenience here is that each of the time units examined is defined as a multiple of the basic week-length block. For example, from sets of four weekly maxima an approximate monthly maximum was obtained, from 13 weekly maxima an approximate quarterly maximum was obtained, and so forth. This analysis resulted in 100,000 block-maxima samples of size 50 for each of the time units. From each of the estimated sets of μ̂, σ̂, and ξ̂ the mean, variance, skewness, and kurtosis were estimated. These estimates were computed for months (4 weeks), quarters (13 weeks), half years (26 weeks), and one, three, five, and ten years (52, 156, 260, and 520 weeks, respectively). Table 4.2 and Figures 4.1 through 4.4 depict the median values of the parameters and moments by time unit; a discussion follows the graphics.
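A scaled-down sketch of this bootstrap is given below: resample k weekly maxima, take their maximum to form one k-week block maximum, build a sample of 50 such values, fit a GEV, and repeat, summarizing the medians of the fitted parameters. Replication counts are reduced for illustration, and scipy's genextreme (shape c = -ξ) stands in for the fitting machinery actually used in the research.

```python
import numpy as np
from scipy.stats import genextreme

def block_maxima_bootstrap(weekly_max, k, n_obs=50, n_boot=200, seed=0):
    """Simulate GEV parameter estimates for blocks of k weeks from one weekly
    block-maxima series (a scaled-down version of the analysis in the text)."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_boot):
        # Each k-week block maximum is the max of k resampled weekly maxima.
        draws = rng.choice(weekly_max, size=(n_obs, k), replace=True).max(axis=1)
        c, loc, scale = genextreme.fit(draws)
        estimates.append((loc, scale, -c))   # convert scipy's c back to xi = -c
    mu, sigma, xi = np.median(np.array(estimates), axis=0)
    return mu, sigma, xi

# Hypothetical usage: block_maxima_bootstrap(weekly_max, k=4) approximates the
# monthly-block behavior summarized for one series in Table 4.2.
```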
Table 4.2 Tabulation of results from the empirical (bootstrap) analysis of maxima of different time units and estimation of related GEV parameters and statistics.

Number of      Equivalent    Parameters                    Moments
Weekly Units   Time          Mu      Sigma   Xi       Mean    Variance  Skewness  Excess Kurtosis
1              Week          0.0284  0.0287  0.1135   0.0485  0.0019    2.0548    12.5326
4              Month         0.0712  0.0336  0.1094   0.0947  0.0026    2.0088    12.0159
13             Quarter       0.1136  0.0384  0.1116   0.1405  0.0034    2.0338    12.2947
26             Six-Months    0.1416  0.0415  0.1140   0.1708  0.0040    2.0602    12.5946
52             One Year      0.1713  0.0449  0.1142   0.2029  0.0047    2.0627    12.6228
156            Three Years   0.2233  0.0508  0.1172   0.2593  0.0061    2.0968    13.0227
260            Five Years    0.2507  0.0538  0.1139   0.2886  0.0068    2.0591    12.5823
520            Ten Years     0.2896  0.0582  0.1138   0.3305  0.0079    2.0583    12.5723
Figure 4.1 Plot Of Estimated Medians Of GEV Parameters Over Different Time Units Expressed In Weeks (Abscissa: 1 = Week, 4 = Month, 13 = Quarter, 26 = Half Year, 52 = 1 Year, 156 = 3 Years, 260 = 5 Years, 520 = 10 Years).

Figure 4.2 Estimates Of GEV Median Means And Standard Deviations Over Different Time Units Expressed In Weeks (Abscissa: 1 = Week, 4 = Month, 13 = Quarter, 26 = Half Year, 52 = 1 Year, 156 = 3 Years, 260 = 5 Years, 520 = 10 Years).

Figure 4.3 Estimates Of GEV Median Skewness And Kurtosis Statistics Over Different Time Units Expressed In Weeks (Abscissa: 1 = Week, 4 = Month, 13 = Quarter, 26 = Half Year, 52 = 1 Year, 156 = 3 Years, 260 = 5 Years, 520 = 10 Years).

Figure 4.4 Histograms Of Extremes On Different Time Scales (Legend: M = Month, Q = Quarter, H = Half Year, 1 = One Year, 3 = Three Years, 5 = Five Years, T = Ten Years).
There appears to be a systematic, increasing relationship between the length of time
unit and the location parameter, but it is unclear if this relationship extends to the
scale parameter and clearly does not extend to the shape parameter estimated from
the bootstrap analysis. The shape parameter, after an initial decrease, increases
slightly, then returns to its starting level. It remains fundamentally unchanging over
the time frames. Computing the moments from the simulation yields a similar
result, namely: the mean increases with the length of the time unit, and the variance (standard deviation), if it increases at all, asymptotes quite rapidly. However, the estimates of skewness and excess kurtosis do not display the same behavior. The behavior of excess kurtosis over the time units is very similar to the plot of ξ̂.
The changes in observed values of ξˆ may be a function of the ancillary variables,
such as the industries, the exchanges, the market cap, etc. However, this hypothesis
was not examined here.
We may also look at the issue from the perspective of order statistics. Let
X ∼ GEV(μ, σ, ξ). From Casella and Berger (2001) we have, for the ith order statistic of a random sample of size n,

$$
f_{X_{(i)}}(x_{(i)}) = \frac{n!}{(n-i)!\,(i-1)!}\,F_X^{\,i-1}(x)\,\bigl[1-F_X(x)\bigr]^{\,n-i}\,f_X(x) \qquad (4.7)
$$
$$
F_X(x) = \exp\left\{-\left[1+\xi\left(\frac{x-\mu}{\sigma}\right)\right]_+^{-1/\xi}\right\}
\qquad\text{and}\qquad
f_X(x) = \frac{1}{\sigma}\left[1+\xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi-1}\exp\left\{-\left[1+\xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\} \qquad (4.8)
$$

where { x : 1 + ξ(x - μ)/σ > 0 }, -∞ < μ < ∞, σ > 0, and -∞ < ξ < ∞.
So, for the largest-order statistic of a random sample of size n,

$$
f_{X_{(n)}}(x_{(n)}) = n\,F_X^{\,n-1}(x)\,f_X(x)
= n\left(\exp\left\{-\left[1+\xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}\right)^{n-1}
\exp\left\{-\left[1+\xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}
\frac{1}{\sigma}\left[1+\xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi-1}
$$

$$
\Rightarrow\quad F_{X_{(n)}}(x_{(n)}) = \left(\exp\left\{-\left[1+\xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}\right)^{n}
= \exp\left\{-n\left[1+\xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\} \qquad (4.9)
$$
Since we found that 0 < ξ < 1 in this analysis, let c_n^{-1/\xi} = n, which implies
c_n = n^{-\xi}, a positive number in the range 0.25 < c_n < 1. Taking n inside the
exponent and continuing from above:

F_{X_{(n)}}(x_{(n)}) = \exp\{-[\,c_n + \xi(\tfrac{c_n x - c_n\mu}{\sigma})\,]^{-1/\xi}\}
 = \exp\{-[\,(1 - d_n) + \xi(\tfrac{c_n x - c_n\mu}{\sigma})\,]^{-1/\xi}\},  letting c_n = (1 - d_n),
 = \exp\{-[\,1 + \xi(\tfrac{c_n x - c_n\mu - d_n\sigma/\xi}{\sigma})\,]^{-1/\xi}\}
 = \exp\{-[\,1 + \xi(\tfrac{c_n x - (c_n\mu + d_n\sigma/\xi)}{\sigma})\,]^{-1/\xi}\}
 = \exp\{-[\,1 + \xi(\tfrac{c_n\xi x - (c_n\xi\mu + d_n\sigma)}{\xi\sigma})\,]^{-1/\xi}\}
 = \exp\{-[\,1 + \xi\,\tfrac{c_n}{\sigma}\big(x - \tfrac{c_n\xi\mu + d_n\sigma}{c_n\xi}\big)\,]^{-1/\xi}\}
 = \exp\Big\{-\Big[\,1 + \xi\,\Big(\frac{x - \frac{c_n\xi\mu + d_n\sigma}{c_n\xi}}{\sigma/c_n}\Big)\Big]^{-1/\xi}\Big\},

which is of the form defined as

F_{X_{(n)}}(x_{(n)}) = \exp\{-[\,1 + \xi_n(\tfrac{x - \mu_n}{\sigma_n})\,]^{-1/\xi_n}\}.        (4.10)
We have shown that the maximum order statistic of a size-four random sample
from a random variable distributed as a GEV is itself distributed as a GEV. Since
there is nothing in this development which restricts the result to any sample size,
the result is applicable to the maximum of a random sample of any number of
weeks. So if a weekly block maximum is distributed as a GEV, then the maximum
of a random sample of n weekly block maxima is also distributed as a GEV with
location µ_n = (c_n ξ µ + d_n σ)/(c_n ξ) = µ + d_n σ/(c_n ξ), scale σ_n = σ/c_n and shape ξ_n = ξ,
where c_n and d_n are functions of n and ξ. Actually, the max-stable definition and
Theorem 2, Chapter 1, Section 1.8.1, would guarantee the above result, but
conducting the proof also yielded the forms of parameters as a function of the base
time frame. The earlier simulation shows how the parameters tended to behave over
the range of lengths of blocks for the universe which is being analyzed. In the
further analyses only the weekly block maximum will be used.
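The derived relationship can be checked by simulation. The following is a minimal sketch in base R (the language used elsewhere in this research); the weekly GEV parameter values are illustrative assumptions, not estimates from the study data. It draws weekly GEV maxima, takes the maximum of each four-week block, and compares empirical quantiles with those implied by the derived parameters µ_n, σ_n and ξ_n.

```r
## Simulation check of max-stability (base R only, illustrative parameters).
set.seed(42)

rgev <- function(m, mu, sigma, xi) {          # inverse-cdf sampler, xi != 0
  u <- runif(m)
  mu + sigma * ((-log(u))^(-xi) - 1) / xi
}
qgev <- function(p, mu, sigma, xi) {          # GEV quantile function, xi != 0
  mu + sigma * ((-log(p))^(-xi) - 1) / xi
}

mu <- 0.05; sigma <- 0.04; xi <- 0.2          # illustrative weekly GEV parameters
n  <- 4                                       # block of four weeks ("monthly")

weekly  <- matrix(rgev(n * 50000, mu, sigma, xi), ncol = n)
monthly <- apply(weekly, 1, max)              # maximum of each 4-week block

# implied parameters from the derivation (c_n = n^-xi, d_n = 1 - c_n)
mu_n    <- mu + sigma * (n^xi - 1) / xi
sigma_n <- sigma * n^xi

p <- c(0.1, 0.25, 0.5, 0.75, 0.9, 0.99)
cbind(empirical   = quantile(monthly, p),
      theoretical = qgev(p, mu_n, sigma_n, xi))   # the two columns should agree closely
```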
4.5
The Time-Varying Model
In approaching this modeling, an issue which loomed large was the dimensionality,
or number of covariates, to be used in fitting the GEV parameter estimates. To this
end, a strategy was adopted of breaking the analysis into multiple steps consisting
of subsets of covariates, finding the "best" covariates in each step, combining the
"best" covariates of earlier steps in further analyses, and repeating until the
covariate set was "whittled down."
The data set on which the time-varying modeling was performed was a sample of
3,000 weekly extreme-value series covering the period from January 3, 2000 (first
business day in 2000), until August 31, 2007 (last business day in August 2007).
This period represented 1,927 trading days and 400 trading weeks. The series
selected formed a stratified random sample built using the proportions of the five
ancillary factors (identified in Chapter 3, Section 3.4) found in the 12,185 series
created from the data preparation activities. The sample size was consistent with
that suggested by the dissertation committee and represented 3,000/12,185 or
nearly 25% (24.6%) of the available sample. Weekly data were chosen because of
the results of the previous section, because they produced the largest set of
observations of any time unit, and because, as found in a preliminary analysis, the
GEV models based on a weekly block-maximum framework converged readily with
reasonable results. As seen in the previous section there is a strong relationship
between weekly models and models with larger time units. While it is a reasonable
hypothesis that changes in time scale have an effect on the behavior of financial
market phenomenology, examination of monthly and longer extreme-value series
is left for later study and perhaps for other researchers.
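As an illustration of the block construction itself, the following base-R sketch forms weekly block extremes from a daily return series; the calendar handling is a crude proxy (weekends removed, no holiday calendar) and the returns are simulated placeholders rather than the study data.

```r
## Forming weekly block extremes from daily returns (illustrative data).
set.seed(1)
dates   <- seq(as.Date("2000-01-03"), as.Date("2007-08-31"), by = "day")
dates   <- dates[!weekdays(dates) %in% c("Saturday", "Sunday")]   # crude trading-day proxy
returns <- rnorm(length(dates), 0, 0.02)                          # placeholder daily returns

week_id   <- format(dates, "%Y-%U")               # calendar-week label
block_min <- tapply(returns, week_id, min)        # weekly block minima (downside extremes)
block_max <- tapply(-returns, week_id, max)       # equivalently, maxima of negated returns
length(block_min)                                 # roughly 400 weekly observations
```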
4.5.1
Matrix Form Of Relationships Between Time-Varying Covariates And GEV Parameters
In the present research the matrix form of the relationship between the GEV
parameters and the covariates is given by:
\mu_{t\times 1} = X_{t\times p}\,\theta^{(1)}_{p\times 1}
\sigma_{t\times 1} = \exp(Z_{t\times q}\,\theta^{(2)}_{q\times 1})        (4.11)
\xi_{t\times 1} = W_{t\times r}\,\theta^{(3)}_{r\times 1}

where X, Z and W are matrices of time-indexed covariate values containing p, q and r
covariates, respectively; \theta^{(1)}_{p\times 1}, \theta^{(2)}_{q\times 1} and \theta^{(3)}_{r\times 1} are coefficient vectors such that
(\Theta_{S\times 1})^T = [(\theta^{(1)})^T, (\theta^{(2)})^T, (\theta^{(3)})^T]; and S = p + q + r.
The time-varying log-likelihood function is defined for the extrema random
variable Yt as follows:
For \xi_t \neq 0\ \forall t:

\log(L(y_t; \mu_t, \sigma_t, \xi_t)) = -1\cdot\Big[\sum_{t=1}^{T}\log(\sigma_t) + \sum_{t=1}^{T}\big(1 + \xi_t(y_t - \mu_t)/\sigma_t\big)^{-1/\xi_t} + \sum_{t=1}^{T}(1 + 1/\xi_t)\,\log\big(1 + \xi_t(y_t - \mu_t)/\sigma_t\big)\Big]        (4.12)

For \xi_t = 0\ \forall t:

\log(L(y_t; \mu_t, \sigma_t)) = -1\cdot\Big[\sum_{t=1}^{T}\log(\sigma_t) + \sum_{t=1}^{T}(y_t - \mu_t)/\sigma_t + \sum_{t=1}^{T}\exp\big(-(y_t - \mu_t)/\sigma_t\big)\Big]        (4.13)

for \Theta^{(1)}, \Theta^{(2)} and \Theta^{(3)} such that 1 + \xi_t(y_t - \mu_t)/\sigma_t > 0\ \forall t.
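The following is a minimal base-R sketch of how the likelihood in (4.11)-(4.12) can be maximized with optim(). The covariate matrices X, Z and W, the coefficient values, and the simulated extremes are all illustrative assumptions, and the penalty used here to enforce the GEV support is a crude stand-in for the feasibility checks used in the actual fitting.

```r
## Time-varying GEV fit: mu_t = X theta1, log sigma_t = Z theta2, xi_t = W theta3.
negloglik <- function(theta, y, X, Z, W) {
  p <- ncol(X); q <- ncol(Z); r <- ncol(W)
  mu    <- drop(X %*% theta[1:p])
  sigma <- drop(exp(Z %*% theta[(p + 1):(p + q)]))
  xi    <- drop(W %*% theta[(p + q + 1):(p + q + r)])
  z <- 1 + xi * (y - mu) / sigma
  if (any(z <= 0)) return(1e10)                  # outside the GEV support: penalize
  sum(log(sigma) + (1 + 1 / xi) * log(z) + z^(-1 / xi))
}

set.seed(7)
T <- 400
X <- cbind(1, rnorm(T)); Z <- cbind(1, rnorm(T)); W <- matrix(1, T, 1)
theta_true <- c(0.05, 0.01, log(0.04), 0.1, 0.2)
mu <- drop(X %*% theta_true[1:2]); sigma <- drop(exp(Z %*% theta_true[3:4])); xi <- 0.2
u  <- runif(T)
y  <- mu + sigma * ((-log(u))^(-xi) - 1) / xi    # GEV draws with time-varying parameters

fit <- optim(c(0, 0, log(sd(y)), 0, 0.1), negloglik, y = y, X = X, Z = Z, W = W,
             method = "Nelder-Mead", control = list(maxit = 5000), hessian = TRUE)
fit$par                                          # compare with theta_true
```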
4.6
Examining Covariates
4.6.1
Time's Arrow And Periodic Covariates
One hypothesized set of covariates is a set built solely upon a time index. The
behaviors to be examined included a level constant, a non-cycling time-based trend
(also commonly called time’s arrow), and a set of periodic patterns. These are
described in Table 4.3.
Table 4.3 Description of candidate periodic behaviors.
Period (in weeks)*    Description
104.36                Half cycle per year (bi-annual)
52.18                 One cycle per year (annual)
26.09                 Two cycles per year (six months)
13.04                 Four cycles per year (quarterly)
4.35                  Approximately 12 cycles per year (monthly)
2.00                  Two week cycles
* Based upon 52.18 weeks per year
-128-
An initial evaluation, in which GEV parameters were estimated for 17 equities, was
conducted using covariates consisting of the constant and trend as well as a sine and
a cosine covariate for each candidate period listed in Table 4.3. The result, under
the goal of posing general models, was that only the constant term was significant
in a high percentage of the equities. The next most commonly significant covariate
was the trend covariate (albeit with a very small coefficient), which could have been
confused with long-period periodic behavior. However, there were only 17 equities
in that initial analysis, and since the present dataset was much larger and covered
the population in a far more complete fashion, we reexamined these time-based
covariates, at least in a preliminary fashion.
Using a long-established and well-defined decomposition of time series, each equity
data set (of weekly maxima) was first examined for the presence of a trend or
constant. To this end a KPSS test was run. Named after Kwiatkowski, Phillips,
Schmidt, and Shin (1992), KPSS is a test with a null hypothesis of level or trend
stationarity against the alternative of a unit root or nonstationarity. While we
assumed the use of a constant in any covariate model, we desired: (1) to see if a
trend was likely present and (2) to see whether the extrema could be made level
stationary by simple differencing. The level-stationary series would be used to test
the fit of the periodic elements.
To perform the test 2,500 equity series of the 12,185 weekly test series were
randomly sampled. As in the initial study of 17 equities the first 18 months of data
were set aside to reduce transient effects of the processing as well as of missing
values. Natural logarithms of the raw extrema series were taken as well as first
differences of both the logged and raw series. These four derived data sets from the
2,500 series were subject to a KPSS test with the null hypothesis of level
stationarity. Table 4.4 provides the results in terms of p-values > 0.01 (fail to reject
null) and p-values < 0.01 (reject null).
Table 4.4 Number of weekly extremal series failing to reject the null
hypothesis of level stationarity under the KPSS test.
             Raw Data   Diff'd Raw Data   Logged Data   Diff'd Logged Data
Tally           730          2,500            730             2,500

(Null hypothesis: the series is level stationary.)
It is clear from Table 4.4 that extremal series are overwhelmingly first-difference
level stationary, implying there is likely a trend component (or very long periodic
element, which we treat as a trend). As witnessed in the initial study, the coefficient
of any long term trend is very small and may be confounded with the estimate of
the constant.
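A hedged sketch of this screening step is given below; it assumes the tseries package for kpss.test() and uses a simulated, shifted-positive series in place of an actual weekly extreme series (the study logged the raw extrema directly).

```r
## KPSS screening of one weekly extreme series (illustrative data).
library(tseries)

set.seed(3)
x  <- cumsum(rnorm(400, 0.001, 0.02)) + 0.05     # placeholder weekly extreme series
xl <- log(x - min(x) + 0.01)                     # shift keeps the placeholder positive

p_raw         <- kpss.test(x,        null = "Level")$p.value
p_raw_diff    <- kpss.test(diff(x),  null = "Level")$p.value
p_logged      <- kpss.test(xl,       null = "Level")$p.value
p_logged_diff <- kpss.test(diff(xl), null = "Level")$p.value

# "fails to reject level stationarity" at the 0.01 level, as tallied in Table 4.4
c(raw = p_raw, raw_diff = p_raw_diff, logged = p_logged, logged_diff = p_logged_diff) > 0.01
```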
Using the differenced series, power spectra were computed for each series and tests
were performed to get a sense of which sine and cosine terms were significant over
the samples. Table 4.5 provides the results from a permutation test of the power
spectra.
Table 4.5 Results from the permutation test from power spectra examining
hypothesized salient periods.
Period               Every Two   Every    Every Six   Every     Every    Every Two
                     Years       Year     Months      Quarter   Month    Weeks
Proportion-Raw       0           0        0           0.0004    0.2580   0.3144
Proportion-Logged    0           0        0           0         0.0648   0.0808
Both logged and raw differenced series were examined. The values in the table
represent the proportion of the 2,500 series for which the power spectrum value
exceeded the 0.95 permutation critical value. Periods under examination greater
than six months exhibited no power, while those less than a quarter of a year
(especially for the raw differenced series) possessed increasing power, such that
nearly one-third of the differenced raw series demonstrated significant power for a
period of every two weeks.
An alternative way of looking at significant power spectrum elements is to compare
the observed spectrum to that of a white noise process (Jenkins and Watts [1968]).
A white noise process is one for which the terms of the series are IID N (0,1).
Under this assumption the power spectrum values are distributed as χ²(df = 2). Power
spectra were computed for a standardized version of the time series. The ordinates
of the power spectrum were compared with the 95% critical value (χ²(df = 2, 1−α = 0.95) =
5.99). Table 4.6 tallies, by specific preselected periodic elements, the proportion of
the 2,500 differenced raw and differenced logged series that exceeded the test value
and thereby rejected the null hypothesis.
Table 4.6 Results from white-noise test from power spectrum examining
hypothesized salient periods.
Period               Every Two   Every    Every Six   Every     Every    Every Two
                     Years       Year     Months      Quarter   Month    Weeks
Proportion-Raw       0           0        0           0         0.0064   0.0372
Proportion-Logged    0           0        0           0         0.0020   0.0096
Similar to but more restrictive than the permutation results, only the monthly and
short period showed any significant periodic elements, and these were minuscule in
number. At best, 3.7%*2,500 = 93 series were observed.
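The following base-R sketch illustrates the white-noise comparison: periodogram ordinates of a standardized series, rescaled by their mean, are compared with the chi-square (2 df) 95% critical value at the hypothesized periods. The series is a simulated placeholder, and the rescaling is one simple way of putting the ordinates on the chi-square scale; it is not necessarily the exact computation used in the study.

```r
## White-noise check of periodogram ordinates at pre-selected periods (illustrative).
set.seed(11)
xs <- rnorm(400)                                 # placeholder standardized differenced series

pg   <- spec.pgram(xs, taper = 0, detrend = FALSE, plot = FALSE)
ords <- 2 * pg$spec / mean(pg$spec)              # approx. chi-square(2) under white noise

crit    <- qchisq(0.95, df = 2)                  # 5.99
periods <- c(104.36, 52.18, 26.09, 13.04, 4.35, 2.00)
idx     <- sapply(1 / periods, function(f) which.min(abs(pg$freq - f)))
data.frame(period = periods, ordinate = ords[idx], significant = ords[idx] > crit)
```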
The conclusion reached with respect to the time-based covariates was that, while a
constant and a trend term may be used in the parameter fitting, none of the other
initially hypothesized periodic covariates would be candidate predictors in this research.
4.6.2
Financial Markets And Economic Covariates
4.6.2.1
A Preliminary Fitting
The set of financial covariates was next examined. The processing of the selected
covariates for use in the model involved several data manipulations. First, daily
returns were computed for the covariates. This involved a difference of present
day minus previous day for the interest rate covariates; for the index covariates this
difference was divided by the previous day's value. The daily return values from
this computation were used in the analysis and were not logged.
The raw returns were grouped into the same weekly timeframe as the equity returns
and aggregated. Since there was no real guidance in the literature as how to
aggregate the observations for use in a covariate model of this sort, three aggregates
were computed for each week—the minimum, the maximum, and the median—to
cover the likely range of sources of influence on the extremal equity events. Also, it
was unclear whether the impact of the covariate upon the maximum-likelihood estimates
would be week-coincident, week-leading, or week-lagging. Lagging, or using
covariates from a timeframe after the extremal value, was ruled out of the analysis
as being of no great practical value. However, the coincident covariate values, as
well as those from one and two weeks prior, were used. This meant that each covariate
series could be placed in the model as nine series (three aggregates by three
timeframes).
The correlations among these nine variables were computed and are provided in
Table 4.7. While there are some moderate-strength correlations in the table, it was
not felt that the results warranted eliminating any of the covariates at this juncture
due to the potential presence of strong collinearity.
Table 4.7  Correlation matrix from the covariate series created by crossing aggregation and timeframe factors.
Legend: RX.Y, where X = 0 for time coincident, 1 = one week prior, 2 = two weeks prior; Y = ME for median, PM for maximum, and NM for minimum.

          R0.ME    R0.NM    R0.PM    R1.ME    R1.NM    R1.PM    R2.ME    R2.NM    R2.PM
R0.ME    1.0000  -0.1969   0.1679  -0.1039  -0.0017  -0.0948  -0.0111  -0.0454  -0.0286
R0.NM   -0.1969   1.0000   0.6518   0.0133   0.6772   0.6926  -0.0440   0.7062   0.6369
R0.PM    0.1679   0.6518   1.0000  -0.0702   0.6432   0.5857   0.0247   0.6477   0.6189
R1.ME   -0.1039   0.0133  -0.0702   1.0000  -0.1942   0.1547  -0.0802  -0.0130  -0.0842
R1.NM   -0.0017   0.6772   0.6432  -0.1942   1.0000   0.6571   0.0219   0.6812   0.6938
R1.PM   -0.0948   0.6926   0.5857   0.1547   0.6571   1.0000  -0.0554   0.6429   0.5868
R2.ME   -0.0111  -0.0440   0.0247  -0.0802   0.0219  -0.0554   1.0000  -0.1877   0.1715
R2.NM   -0.0454   0.7062   0.6477  -0.0130   0.6812   0.6429  -0.1877   1.0000   0.6445
R2.PM   -0.0286   0.6369   0.6189  -0.0842   0.6938   0.5868   0.1715   0.6445   1.0000
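For concreteness, the following base-R sketch builds the nine derived series for a single covariate (minimum, maximum and median aggregates at lags of zero, one and two weeks) and computes their correlation matrix; the daily return series is a simulated placeholder, not a study covariate.

```r
## Nine derived series for one covariate and their correlation matrix (illustrative).
set.seed(5)
daily <- data.frame(week = rep(1:400, each = 5), ret = rnorm(2000, 0, 0.01))

agg <- data.frame(
  NM = tapply(daily$ret, daily$week, min),     # weekly minimum
  PM = tapply(daily$ret, daily$week, max),     # weekly maximum
  ME = tapply(daily$ret, daily$week, median))  # weekly median

lagk <- function(v, k) c(rep(NA, k), head(v, -k))   # shift a weekly series back k weeks
nine <- data.frame(
  R0.ME = agg$ME,          R0.NM = agg$NM,          R0.PM = agg$PM,
  R1.ME = lagk(agg$ME, 1), R1.NM = lagk(agg$NM, 1), R1.PM = lagk(agg$PM, 1),
  R2.ME = lagk(agg$ME, 2), R2.NM = lagk(agg$NM, 2), R2.PM = lagk(agg$PM, 2))

round(cor(nine, use = "complete.obs"), 4)      # analogue of Table 4.7 for one covariate
```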
A small sample of 200 equity series was randomly selected, and GEV parameters
were fitted using the step-wise methodology. This effort was designed to provide
guidance for the decision to remove covariate types from the larger analysis. A tally
was computed of the number of models by the occurrence of each of the following
five types of covariates: constants, trends, median aggregations (ME), maximum
aggregations (PM), and minimum aggregations (NM). Table 4.8 depicts these
tallies presented as percentages. As an example, the constant term was found in
100% of the 200 models, while a median term was in 3.5% of the models. In fact,
each of the three constant terms (one each for µˆ,σˆ,ξˆ ) appeared in well over 90%
of the models.
Table 4.8 Percentage representation of the number of models in
which each of the covariate classes appears.
Constant   Trend    ME      PM     NM
100%       41.00%   3.50%   100%   100%
From this examination the model-generation strategy was modified to modestly
reduce the effort in two ways: All models were started from a baseline model of the
three constants, equivalent as discussed above to fitting a NTVM, and the median
aggregation covariate was dropped from further analyses.
4.7
The Full Fitting Of Time-Varying GEVs
4.7.1
Stepwise Model
After the adjustment described above, as well as after other adjustments specific to
individual covariates, there were a total of 266 covariates to be examined against
each of the three parameters under the maximum-likelihood model previously
covered. A forward stepwise approach was adopted within the maximum-likelihood
model. In this approach each covariate was examined for feasibility
(meeting the conditions of the GEV) as well as for forming a linear function for
each GEV parameter. The space of coefficients for the covariates was searched
using the Nelder-Mead and BFGS algorithms previously specified. The covariate
and associated coefficient which produced the largest maximized likelihood were
retained. The process was then repeated, and the covariate generating the largest
maximized likelihood, given that the first-step covariate was in the model, was
retained. This process was repeated until a stopping rule for the process was
triggered. (The stopping rules used will be discussed later in this section.) To
provide a baseline equivalent to the non-time-varying GEV (as described earlier)
and in response to results of the 200 series analyses, the initial stepwise model
contained an estimated coefficient of a constant series (of 1s) for each of the GEV
parameters. While a separate model was fitted to each series, up to ten additional
steps of fitting one covariate at a time were allowed. The limitation of ten steps was
the result of a preference or philosophy which runs throughout the analysis, namely
that—subject to variation associated with the ancillary factors (described in Chapter
3, Section 3.4)— general results are more interesting than results that are
idiosyncratic to a specific equity series.
With the inclusion of the constant term in the first step, in the second step there
remained 265 covariates to be examined against the three parameters. Because each
covariate could be incorporated into the model for each GEV parameter separately,
this meant that the analysis needed to examine 3× 265 = 795 covariates for the
second step, (2 × 265) + 264 = 794 for the third step, and so forth. At each step a
complete nonlinear optimization was applied. Additionally, since the
maximum-likelihood model was highly nonlinear and the GEV parameters were
not perfectly correlated, no guidance could be found in the literature or via personal
communications that would support a priori elimination of large numbers of
covariates from later steps based upon the results of earlier steps.
Consequently, analysis of each equity took a considerable amount of time,
depending upon the number of steps required. For example, using the computer
available, a ten-step analysis would take three to four hours. To complete the
analysis of the approximately 3,000 equity training sets and the 100 equity testing
sets, over 90 processors (from various sources) were used over 10 days of
processing.
Previously we alluded to stopping rules for the stepwise processing. In the present
analysis two stopping rules were used in combination: a likelihood ratio test (LRT)
and the Bayesian Information Criterion (BIC). The LRT—a consequence of the
Neyman-Pearson Lemma (Casella and Berger [2001])—is the ratio of the
likelihood (maximum likelihood) of an embedded model over the likelihood of a
full or expanded model. In other words, all the estimators or values of estimators in
the reduced model can be found in the full model and then some. Writing the
likelihood as L(Θ | x), the null hypothesis is that the parameters of the model are
contained in a set θ₀ ⊂ Θ, while the alternative hypothesis is that the parameters are
contained in θ₁, defined such that θ₀ ⊂ θ₁ ⊆ Θ. Therefore, the numerator
is the likelihood under H₀, and the denominator is the likelihood under H₁. Under this
construction L(θ₀ | x) ≤ L(θ₁ | x). Therefore, the value of the ratio varies between
0 and 1, with values closer to 0 leading to rejection of H₀. In the present context
the LRT was conducted on maximum likelihood, computed from maximizing the
likelihood for Step s versus that for Step s-1; in other words, examining if the
additional covariate was significantly increasing the maximum-likelihood value. If
the remaining set of covariates, as represented by the covariate that maximizes the
likelihood for the step, did not significantly increase the value of the maximum
likelihood over the previous step, then the search was stopped and the set of
covariates established in the previous step was used. A test of significance arose, as
follows: if we let λ = L₀/L₁, then as λ → 0 the transformation
Λ = −2 log(λ) → ∞, and as the number of samples n → ∞, Λ̂ converges in distribution
to a χ² random variable with degrees of freedom equal to the dimensionality of θ₁
minus the dimensionality of θ₀ (Casella and Berger [2001]), which in our tests was 1.
The BIC, also sometimes called the Schwarz Information Criterion (SIC) (Schwarz
[1978]), is an asymptotic result derived under the assumption that the data
distribution is in the exponential family. BIC is of the following form:
BIC = 2 log(L ) − p × log(n )
(4.14)
where: L is the maximized value of the model’s likelihood
p is the number of covariates in the model
n is the number of observations
The likelihood term increases as covariates are added, but the BIC penalizes each
additional covariate. In this way the BIC "biases" toward fewer rather than more
covariates in the model. Ideally, a retained covariate adds more to the likelihood
than is subtracted by the penalty for the additional parameter. The stopping rule
was to select the step (and therefore the covariates) prior to the step at which the
BIC decreased in response to a greater number of covariates without a
commensurate increase in likelihood.
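A small illustrative sketch of the two stopping rules, written as R helpers, is given below; the likelihood values are invented numbers, and the functions merely mirror the form of the calculations described above rather than the study code itself.

```r
## Helpers for the two stopping rules (illustrative, assuming maximized log-likelihoods).
lrt_continue <- function(logLik_prev, logLik_new, df = 1, alpha = 0.05) {
  lambda_stat <- 2 * (logLik_new - logLik_prev)       # -2 log(lambda)
  pchisq(lambda_stat, df = df, lower.tail = FALSE) < alpha
}

bic_value <- function(logLik, p, n) {
  2 * logLik - p * log(n)                             # form (4.14): larger is better
}

# made-up numbers: step s-1 versus step s on n = 400 weekly observations
n <- 400
step_prev <- list(logLik = 512.3, p = 3)
step_new  <- list(logLik = 518.9, p = 4)
lrt_continue(step_prev$logLik, step_new$logLik)                       # TRUE: keep stepping
bic_value(step_new$logLik, step_new$p, n) > bic_value(step_prev$logLik, step_prev$p, n)
```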
As mentioned earlier both rules were used in concert to determine when to stop the
stepwise analysis. Table 4.9 is a tally of the stopping steps for the training set
equities.
Table 4.9 Tally of stopping steps for the equities in training set (stopping rules
used are discussed in the text above).
                        Constants   4 Coeff.   5 Coeff.   6 Coeff.   7 Coeff.   8 Coeff.
                        Only
Count                       5          497        554        577        343        295
Proportion                0.002       0.166      0.185      0.192      0.114      0.098
Cumulative Proportion     0.002       0.167      0.352      0.544      0.658      0.756

                        9 Coeff.   10 Coeff.   11 Coeff.   12 Coeff.   >12 Coeff.   Total
Count                      204         143         115         137         132      3,002
Proportion               0.068       0.048       0.038       0.046       0.044      1.000
Cumulative Proportion    0.824       0.872       0.910       0.956       1.000
Despite the limitation of a total of 11 steps placed on the processing, more than
95% of the equities required models of 10 steps or fewer (12 covariates or less).
For the roughly 4.5% of the models requiring more steps, and hence more covariates,
in the interest of generality the modeling of these equities was restricted to the 13
covariates selected by the process. On average the models for the distribution
parameters had approximately 7.0 covariates (6.991) in total.
4.8
Analyzing The Covariate Models
A detailed analysis of the results from the fitting of time-varying GEV distributions
is found in Appendix B. Having obtained the parameter estimates (and their
standard errors) needed as inputs later in the analysis, and other than as a sanity
check on the estimation process, analysis of the covariate models in and of
themselves was not and is not a central research thread of this dissertation. The
fitting does, however, suggest hypotheses which the researcher believes are worth
examining in follow-up research efforts; these are noted in Chapter 7, Section 7.5.
Therefore, this examination is "stubbed out" at this juncture.
Some of the salient results of the analysis of the parameter fitting (supporting
exhibits in Appendix B) include:
1. The models for the distribution parameters had on average approximately
seven covariates in total.
2. 95% of the time-varying models showed significant improvement over the
static model.
3. Financial and economic covariates were as important as, or more important
than, the constants or trend.
4. Of the financial/economic covariates, the plurality were used in the
estimation for µ̂, followed by ξ̂ and σ̂.
5. Time-contemporaneous covariates accounted for slightly less than half of the
covariates relative to time-lagged covariates (46.6% versus 53.4%).
6. Covariates associated with lags of one or two weeks were equally divided,
suggesting that, from an overall perspective, a lead-lag relationship exists but
that there is no strong differentiation by time frame over at least the first two
weeks.
7. The minimum values of the covariate in the block more frequently entered
the model than the maximum values of the covariate in the block -- 62.1%
versus 37.9%, respectively.
8. Distinct groupings of covariates exist with respect to when and in what form
(aggregation) they entered models; for example, the global market indices
behave similarly.
9. When the results were observed from the perspective of market
capitalization, model complexity (in terms of the number of components,
the number of leading components and the number of distributional
parameters which were modeled by covariates beyond the constant) tended
to increase with greater capitalization.
10. Africa, Eastern Europe, the Middle East, South America, and to a lesser
extent North America can—at the gross level of this analysis—be modeled
(in the sense described above) by less complex models.
11. The Pacific Rim and Western Europe, in the same sense as the previous item,
tend to require more complex models.
Whether or not the results from fitting these models will provide succinct insights
into which financial covariates ultimately have an effect upon the GEV parameters
and in turn upon risk, is beyond this research. Nevertheless this effort is “a means
to an end,” which is the estimation of the time-varying return values. We turn to
this analysis in the next Chapter (Chapter 5).
5.
Estimating Time-Varying Return Levels And Modeling Return Levels Jointly
5.1
Overview Of The Chapter
Overview Of The Chapter
In Chapter 5 the time-varying GEV distributions which were fitted in Chapter 4,
Section 4.7, are put to work, or at least their parameter estimates are. The fitting of
the distributions was a means to an end, and that end, as has been cited in earlier
Chapters, is to use the parameter estimates to estimate time-varying return values.
The return values in turn are used, in association with the ancillary variables first
introduced in detail in Chapter 3, Section 3.4, to build a model of the joint
distribution of extreme return behavior. Chapter 5 follows along this collection of
threads.
The meaning of return values and the associated terms, such as return levels and
return periods, is the same in the time-varying circumstance as in the static or
non-time-varying analysis. The return value is simply the quantile value (so it is
measured in return units) selected such that the probability of a return greater than
the return value equals 1/(return period). The return period is defined in terms of
the time units of the analysis, in this case weeks. Another useful interpretation of the
return value within the present domain is that the investor will witness a return of
the magnitude of the return value or greater on average once every return period.
Estimation of return values for a time varying analysis is very similar to that for a
non-time varying model. The differences arise out of the presence of a sequence of
values for each of the distribution parameters rather than a single value, which
necessitates that the return value be solved by iteration.
The variance of the return value is a quantity which may be thought of as the
time-varying measurement error for the return value; it is alternatively denoted the
time-varying nugget-type variance,⁹ or simply the nugget, in this discussion. A function
of this variance turns out to be used in describing the error variance of the joint
distribution of return values and, as shall be described in Chapter 6, Section 6.5, the
nugget variation proves useful in further defining the risk matrix. The computation
of the nugget variance is presented in this Chapter (Chapter 5) and is shown to be a
function of the partial derivatives of the return value with respect to the model
covariates and of the inverse Hessian (the inverse of the matrix of second partial
derivatives) of the covariate coefficient estimates at the solution of the return value.

⁹ This quantity is called a nugget-type variance because it is time varying, driven by
financial covariates which are time indexed, rather than by time's arrow or time-periodic
covariates as are used in similar TV climatic models or in the conventional nugget
variances associated with NTV models. However, as used in the financial domain, a TV
return value (the basis for the TV nugget) will only be used for a fixed period of time,
then discarded and re-estimated, as will the nugget. So for these short periods of time the
nugget may be thought of as approximately fixed.
A Gaussian Process (GP) model is postulated as the form of the joint distribution
of return values. To this end, a vector of return values, defined as a random
sample of random variables, is expressed as a random vector distributed
as a GP with a mean defined as a function of the ancillary variables and a variance
matrix which is broken into two pieces. One piece is based solely upon variation
associated with some or all of the ancillary factors that is not captured within the fixed
effects of the expected values. A second variance matrix is defined as a function of
the nugget variation. In the present model both matrices are assumed to be
diagonal and uncorrelated with one another.
The remainder of the Chapter (Chapter 5) is devoted to describing efforts aimed at
fitting this model. The fitting is accomplished in stages and steps within stages.
But first an examination of the return value data suggests the need for transforming
and trimming the data in order to better meet distributional assumptions. An
examination of the transformed and trimmed data suggests that these manipulations
have not materially changed the behavior of the sample, the sample size or the
relationship between categories within factors even over the time of the study, but
they have improved the distributional behavior as was sought.
After data manipulations, the first stage is to fit the expected values of the GP as a
set of fixed effects developed from a subset of the ancillary factors in categorical
forms (nominal or ordinal scale) and represented in the model as a set of cell means
contrasts. This first-stage model results in the inclusion of fixed effects for sectors,
stock exchanges and market capitalizations. However, examination of the model
diagnostics suggests the need to add factors in quantitative form as well as the
need to deal with the presence of heteroscedasticity.
Following on from these initial results, in a second stage of model building,
predictors are added to the function estimating the expected values. These include
discounted market capitalizations and a market capitalization by year interaction.
Each of the new predictors as well as the members of the legacy set proved to be
significant under examination using a Type III Sums-of-Squares. With a Type III
Sums-of-Squares each predictor is alternatively tested for significance assuming all
the other predictors are in the model.
Using the estimated values of the predictors from this latest analysis of the
expected value portion of the model as a starting point, an extensive examination of
alternative models for each of the two postulated sources of variability is
conducted. All of the models examined embed the expected value parameters and
the variance parameters into the parameters of a multivariate normal. Because of
assumptions, previously articulated, regarding the form and relatedness of the
variance matrices, this multivariate normal distribution decomposes into a set of
sums making parameter estimation using a maximum likelihood approach fairly
straightforward and computationally very tractable. With respect to the two
variance matrices, for the ancillary-factors-variation matrix (denoted in the text as
H ), variances which are a function of year by market capitalization are
hypothesized and estimated. This results in 35 variances. For the portion of the
variation which is a function of the nugget variation (denoted in the text as Σ ), a
nugget multiplier based upon the sector of which the equity is a member is
hypothesized and estimated. Consequently 20 such multipliers are estimated.
Therefore, the final model contains 103 parameters associated with the expected
value predictors (many of which are used for the cell means contrasts, particularly
those for the stock exchanges) and 55 parameters associated with defining the
variance matrices. The selected model contains in total 158 parameters estimated
using in excess of 20,000 observations.
Diagnostics presented at the end of this Chapter (Chapter 5) and model validation
performed at the beginning of Chapter 6 (see Chapter 6, Section 6.4) suggest that the
model is satisfactory.
5.2
A Brief Recap To This Point
In the previous Chapter (Chapter 4) the parameters of the extreme-value
distributions were modeled as a function of a set of financial/market covariates. In
this Chapter (Chapter 5) the researcher will create and examine a multivariate
model composed of fixed and random elements, with the purpose of creating a
model of the joint behavior of equity returns.
First, some of the important model elements will be described prior to combining them
to form the model. The discussion begins with a description of the time-varying
(TV) return value.
5.3
Computing Return Value Levels
As the reader may recall from the description in Chapter 1 (Section 1.12), we may
invert the generalized extreme-value (GEV) cumulative distribution function (cdf) and
thereby obtain a quantity x_p, defined as the return level associated with the 1/p
return period. A more accessible definition of the return level states that the
quantile x_p will be exceeded on average once every 1/p base periods (in this
context, weeks), where p = Pr[X ≥ x_p]. For example, if we let p = 0.001, then 1/p
(also notated as N) = 1,000, and if we use the results of Chapter 1, Section 1.12 to
find a value x_0.001 such that Pr[X ≥ x_0.001] = 0.001, then we may conclude that on
average X will be greater than or equal to x_0.001 once every 1,000 weeks.

However, the computation provided in Chapter 1, Section 1.12, is based on a
non-time-varying (NTV) model. Smith, et al. (2006) suggested an approach for
estimating return values for an arbitrary but fixed return period of N weeks for the
TV models examined in the previous Chapter (Chapter 4).¹⁰ We need to find the
quantile value xΘ,N , such that
\prod_{t=1}^{T} \exp\Big[-\big\{1 + \xi_t\big(\tfrac{x_{\Theta,N} - \mu_t}{\sigma_t}\big)\big\}_{+}^{-1/\xi_t}\Big] = 1 - \frac{1}{N}        (5.1)

Or, by taking logs, we have

\sum_{t=1}^{T} \Big[1 + \xi_t\big(\tfrac{x_{\Theta,N} - \mu_t}{\sigma_t}\big)\Big]_{+}^{-1/\xi_t} = -\log\Big(1 - \frac{1}{N}\Big)        (5.2)
In this model the return value x_{Θ,N} is arrived at by iteration. For the present
circumstance a bisection method is used. Because of the straightforward inverse
relationship between N and p, x can be indexed by either as long as the index is
clear. The parameter vector Θ is included to remind the reader that x is a function
of the underlying covariates through the values of the TV parameters µ_t, σ_t and ξ_t,
defined previously, as was the non-negative function [z]_+.

¹⁰ The use in this research of the equations for TV return value estimation taken from Smith (2006)
requires a comment similar to that made in the footnote at the end of Chapter 4. The covariates in
Smith's application of the model are time varying but deterministic, e.g. time's arrow and periodic
functions. In this research the covariates are stochastic. However, the use of periodic rebalancing in
the financial domain means that return levels and functions of returns will be re-estimated
relatively frequently, and they may be thought of and used as fixed for relatively short periods of time.
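A minimal base-R sketch of the bisection step follows; the fitted parameter paths µ_t, σ_t and ξ_t are illustrative stand-ins for the estimates produced in Chapter 4, and the bracketing interval is a simple heuristic rather than the exact scheme used in the study.

```r
## Solving (5.2) for the time-varying return value by bisection (illustrative inputs).
set.seed(9)
T       <- 400
mu_t    <- 0.05 + 0.01 * rnorm(T)
sigma_t <- exp(-3 + 0.1 * rnorm(T))
xi_t    <- rep(0.2, T)
N       <- 52                                   # one-year return period, in weeks

g <- function(x) {                              # LHS of (5.2) minus RHS; decreasing in x
  z <- pmax(1 + xi_t * (x - mu_t) / sigma_t, 0)
  sum(z^(-1 / xi_t)) - (-log(1 - 1 / N))
}

lo <- max(mu_t); hi <- max(mu_t) + 100 * max(sigma_t)   # heuristic bracket with g(lo) > 0 > g(hi)
for (i in 1:200) {                              # plain bisection
  mid <- (lo + hi) / 2
  if (g(mid) > 0) lo <- mid else hi <- mid
}
x_N <- (lo + hi) / 2                            # estimated N-week return value
x_N
```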
5.4
The Variance Of Return Values
The estimate of the variance of the return value is one of the random elements in the
model.¹¹ To compute this we need to bring together a couple of results. Let h be
defined as

h(x_{\Theta,N}; \Theta) = \sum_{t=1}^{T} \{1 + \xi_t(\tfrac{x_{\Theta,N} - \mu_t}{\sigma_t})\}_{+}^{-1/\xi_t}.

At the solution for h the total derivative with respect to each component \theta_j of \Theta is
zero. By the chain rule,

\frac{dh(x_{\Theta,N}; \Theta)}{d\theta_j} = \frac{\partial h(x_{\Theta,N}; \Theta)}{\partial x_{\Theta,N}}\,\frac{\partial x_{\Theta,N}}{\partial \theta_j} + \frac{\partial h(x_{\Theta,N}; \Theta)}{\partial \theta_j} = 0

\Rightarrow\ \frac{\partial x_{\Theta,N}}{\partial \theta_j} = -\,\frac{\partial h(x_{\Theta,N}; \Theta)/\partial \theta_j}{\partial h(x_{\Theta,N}; \Theta)/\partial x_{\Theta,N}}        (5.3)

¹¹ Variances estimated on an annual basis.
Let us set this result aside for a moment, set out some additional notation, and
recall an important result from mathematical statistics. As per the results from the
TV GEV and the associated notation, let Θ̂_k be the maximum-likelihood estimate of
Θ_k — the S-length vector of covariate coefficients used to estimate the GEV
parameters µ_t, σ_t and ξ_t (estimates notated µ̂_t, σ̂_t and ξ̂_t) for security k, coming
from the analysis presented in Chapter 4, Section 4.7. Let θ_{i,k}, i = 1, 2, …, S, be the
individual elements or coefficients of the vector Θ_k, with θ̂_{i,k} notating the
maximum-likelihood estimator. Let the gradient of the log-likelihood function be
denoted ∇log(L(x; Θ_k)) ≡ ∇ℓ(Θ_k) ≡ ∇ℓ_k, which is equal to 0 when the function is
maximized, and ∇ℓ(Θ̂_k) ≡ ∇ℓ̂_k when Θ_k is replaced by Θ̂_k in the function. The
matrix of second partial derivatives of the log-likelihood function is the Hessian
matrix¹² and is notated ∇²ℓ_k, and ∇²ℓ̂_k when once again Θ_k is replaced by Θ̂_k. Finally,
let us recall that, by theorem, maximum-likelihood estimators are asymptotically
unbiased.

¹² The Hessian is, as indicated above, defined as the matrix of second partial derivatives of the
log-likelihood function. It also has two other properties used here: 1) the negative of its expectation
is the information matrix, and 2) the inverse of the negative Hessian (when positive definite) yields
estimates of the variances of the parameters being estimated.
The result from mathematical statistics—the reader should recall—is the delta
method (Casella and Berger [2001]). Let \phi_n be a sequence of RVs, where \mu and
\sigma^2 are finitely valued and

\sqrt{n}(\phi_n - \mu) \xrightarrow{D} N(0, \sigma^2).

Then, for some function g satisfying the property that g'(\mu) exists and is non-zero
valued,

\sqrt{n}(g(\phi_n) - g(\mu)) \xrightarrow{D} N(0, \sigma^2 [g'(\mu)]^2).

In the present context we find h taking the role of g, and \Theta_k (\hat{\Theta}_k) and its
constituents \theta_{i,k} (\hat{\theta}_{i,k}) substituting for \phi.

Applying a Taylor expansion of \nabla\ell_k about \Theta_k, we have

\hat{\Theta}_k - \Theta_k \approx -(\nabla^2\ell_k)^{-1}\,\nabla\ell_k.

Recognizing E[\hat{\Theta}_k] = \Theta_k, and by the definition of covariance, we have

Cov(\hat{\Theta}_k, \hat{\Theta}_j) \approx (\nabla^2\hat{\ell}_k)^{-1}\,Cov(\nabla\ell_k, \nabla\ell_j)\,(\nabla^2\hat{\ell}_j)^{-1},

where the inverse Hessian evaluated at the maximum-likelihood estimates has been
substituted for the inverse Hessian. Recalling \nabla\hat{\ell}_k = 0, we have

Cov(\nabla\ell_k, \nabla\ell_j) \approx \nabla\hat{\ell}_k\,(\nabla\hat{\ell}_j)^T,

where (\,\cdot\,)^T is defined as the transpose. Using another Taylor expansion, we have

x_{\hat{\Theta}_k,N} - x_{\Theta_k,N} \approx \sum_{j=1}^{S} \frac{\partial x_{\Theta_k,N}}{\partial \theta_j}\,(\hat{\theta}_{j,k} - \theta_{j,k}).

Recalling once again that E[x_{\hat{\Theta}_k,N}] = x_{\Theta_k,N} and using our previous results and the
delta method, we have

Cov(x_{\hat{\Theta}_k,N}, x_{\hat{\Theta}_j,N}) \approx \Big(\frac{\partial x_{\hat{\Theta}_k,N}}{\partial \hat{\Theta}_k}\Big)^T Cov(\hat{\Theta}_k, \hat{\Theta}_j)\,\Big(\frac{\partial x_{\hat{\Theta}_j,N}}{\partial \hat{\Theta}_j}\Big),

and if k = j, this becomes

Var(x_{\hat{\Theta}_k,N}) \approx \Big(\frac{\partial x_{\hat{\Theta}_k,N}}{\partial \hat{\Theta}_k}\Big)^T Var(\hat{\Theta}_k)\,\Big(\frac{\partial x_{\hat{\Theta}_k,N}}{\partial \hat{\Theta}_k}\Big).

Substituting from the earlier result (5.3), we have the desired quantity, namely

Var(x_{\hat{\Theta}_k,N}) \approx \Big(-\frac{\partial h(x_{\Theta,N};\Theta)/\partial \theta_i}{\partial h(x_{\Theta,N};\Theta)/\partial x_{\Theta,N}}\Big)^T Var(\hat{\Theta}_k)\,\Big(-\frac{\partial h(x_{\Theta,N};\Theta)/\partial \theta_i}{\partial h(x_{\Theta,N};\Theta)/\partial x_{\Theta,N}}\Big),        (5.4)

where i = 1, 2, …, S. The results presented here will be used to compute one of the
variance elements, namely the nugget variance, in the analysis below.
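The following hedged sketch shows how (5.3) and (5.4) can be evaluated numerically: finite-difference gradients of h with respect to the coefficients and to x, combined with the inverse Hessian returned by optim(), give an approximate variance for the return value. It reuses the illustrative objects (negloglik, fit, X, Z, W) from the earlier fitting sketch in Chapter 4, Section 4.5.1, and the value assigned to x_N is a placeholder for the value solved by bisection in Section 5.3.

```r
## Approximate nugget variance via (5.3)-(5.4), using numerical derivatives.
h_fun <- function(x, theta, X, Z, W) {
  p <- ncol(X); q <- ncol(Z); r <- ncol(W)
  mu    <- drop(X %*% theta[1:p])
  sigma <- drop(exp(Z %*% theta[(p + 1):(p + q)]))
  xi    <- drop(W %*% theta[(p + q + 1):(p + q + r)])
  sum(pmax(1 + xi * (x - mu) / sigma, 0)^(-1 / xi))
}

num_grad <- function(f, v, eps = 1e-6)           # simple central finite differences
  sapply(seq_along(v), function(j) {
    e <- rep(0, length(v)); e[j] <- eps
    (f(v + e) - f(v - e)) / (2 * eps)
  })

x_N <- 0.30                                      # placeholder; in practice the bisection solution
dh_dtheta <- num_grad(function(th) h_fun(x_N, th, X, Z, W), fit$par)
dh_dx     <- num_grad(function(x)  h_fun(x, fit$par, X, Z, W), x_N)

dx_dtheta <- -dh_dtheta / dh_dx                  # equation (5.3)
V_theta   <- solve(fit$hessian)                  # inverse Hessian of the negative log-likelihood
nugget    <- drop(t(dx_dtheta) %*% V_theta %*% dx_dtheta)   # equation (5.4)
nugget
```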
5.5
Multivariate Model
In this section the multivariate model is detailed. The multivariate model will be
used in further research to describe the joint behavior of the return values. This
model is posed as an alternative means of describing the joint downside tail
behavior of equities. It is proposed in place of identifying other
dependence functions and using said dependence function to form a joint
distribution of the univariate GEVs. The issues with the latter approach were
detailed earlier in this research, in Chapter 1, Section 1.13.1.
The discussion starts with a review of the existing hypothesis followed by a
definition of the model in broad subject matter (finance) terms, before proceeding
on to a formal statistical definition. Let us define the following set of random
variables—the return values X Θk ,N for security k —as a function of the GEV
parameters µt ,σ t and ξ t , which themselves are postulated to be a function of a set
of financial/market covariates scaled by a set of coefficients contained in an S
length vector Θ and for a specified return level N . (In the previous Chapter
(Chapter 4) the development of a maximum-likelihood estimator Θ̂k is described,
where k = 1, 2, …, K, with K the number of securities¹³ [see Chapter 4, Section 4.5.1].)
Earlier in this Chapter (Chapter 5, Sections 5.3 and 5.4) a method for estimating
X_{Θ̂_k,N} and Var(X_{Θ̂_k,N}) at an arbitrary return level is described and implemented for
each security in the sample. So, one useful question is: can these return values be
modeled as functions of salient market dimensions?

¹³ Please recall that, because the only type of financial security being studied in this research is
common stock, the words equity and security within the context of this research should be
understood to mean common stock, unless explicitly stated otherwise.
In building this model from a stochastic perspective, let

R ∼ GP(Yβ, Η + Σ),        (5.5)

where GP stands for Gaussian process,¹⁴ R is a random vector of extreme-value
return values for a given return level, Y is the design matrix of values of predictors
of the expected values of the return values (fixed effects), β is the vector of
coefficients multiplying the predictor values, H is the covariance matrix of the
predictors (random effects), and Σ is the covariance matrix of equity return values
(random effects).

In the present research we have R as a linear model composed of both fixed effects
that comprise the mean, Yβ, and random effects that enter the model through the
variance/covariance structure. Therefore,

R = Yβ + h + ε,        (5.6)

where R and Yβ are defined as previously, h is a random vector of innovations with
variance Σ_h, and ε is a random vector of innovations with variance Σ_ε.

¹⁴ A Gaussian process is a stochastic process that generates random samples over time or space,
such that any finite linear combination of the random samples one takes will be normally distributed.

There are clearly two parts to the model. The fixed effects are defined as those
predictors such that the values the predictors take on are the only ones at which
(typically a limited number) the predictor occurs or the only ones (once again
typically limited) in which the analyst is interested. In other words, the values of
the predictor for a fixed effect do not represent a sample of the values or levels
drawn from say an infinitely large set of such values, but rather represent the
population or population of interest of values. The consequence is that if some or
all of the levels of a fixed-effect predictor are explaining all the variation observed
in the response variable, the response is deterministic or provides a fixed-quantity
change from level to level of the fixed effect. The fixed effects are typically
associated with the expected value of the response. There is no uncertainty or
variability associated with the fixed effects.
A random effect, in contrast, behaves differently. The levels or values of
the random effect observed in the sample represent a small number of the total
levels that exist in the population. Further, the values of the random effect, as the
name implies, are chosen or occur at random. This random behavior is typically
characterized or modeled by some probability law. We will return to a discussion of the
broader Gaussian process later in this Chapter (Chapter 5, Sections 5.6 through
5.9).
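A toy base-R sketch of the structure in (5.5)-(5.6) follows: return values are simulated as fixed effects plus two independent random components, and the fixed effects are then recovered with lm(). The factor levels, coefficient values and variances are invented, and treatment contrasts are used instead of the cell-means contrasts employed later, purely for brevity.

```r
## Toy version of R = Y beta + h + eps, then fixed effects recovered with lm().
set.seed(21)
K      <- 2000
sector <- factor(sample(paste0("S", 1:5), K, replace = TRUE))
cap    <- factor(sample(c("micro", "small", "mid", "large", "mega"), K, replace = TRUE))

Y    <- model.matrix(~ sector + cap)             # design matrix of fixed effects
beta <- c(-1.5, 0.2, -0.1, 0.3, 0.05, 0.4, 0.1, -0.2, 0.15)
h    <- rnorm(K, 0, 0.10)                        # random effect, variance from H
eps  <- rnorm(K, 0, 0.05 * runif(K, 0.5, 2))     # nugget-driven error, variance from Sigma
R    <- drop(Y %*% beta) + h + eps               # as in (5.6)

fit_fe <- lm(R ~ sector + cap)
coef(fit_fe)                                     # compare with beta
```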
5.6
Model Building
In identifying the form of the model the researcher first examines the adequacy of
fit of potential fixed effects before looking at potential random effects, since the
latter tend to be more complex, more flexible in the form of their implementation,
and therefore more useful in the analysis for adjusting the model for a variety of
deficiencies.
5.6.1
Fixed-Effect Models
Initially introduced in Chapter 2, (Section 2.3) as the ancillary dataset, seven
factors were examined—each quantified in a discrete form—for their utility in
describing variation in the return levels. The factors once again were:
1. Continent of the security’s domicile
2. Country of the security’s domicile
3. Stock exchange of the security
4. Geographic trade area of the underlying issuing company (geographic
location of its headquarters)
5. Exchange area in which the security is traded (geographic location of the stock exchange)
6. Sector of the issuing company
7. Discretized form of the market value
(Readers are referred to Chapter 3, Section 3.4, to reacquaint themselves with
detailed descriptions of these potential predictors.)
In an initial analysis the variation in quantile value for return level of 52 weeks (a
quantile value exceeded on average once every 52 weeks) over approximately
3,000 securities was modeled as a function of the seven ancillary data factors. The
results indicated that the factors were remarkably lacking in any substantive
discriminatory power with respect to the values of the return level.
In the subsequent analyses major modifications were made to the data and the data
analysis approach. Firstly, the analyses were broken up by year, i.e., a separate
model was developed for each year from 2001-2007. A year return period, along
with breaking up the data on an annual basis, was chosen because it aligns with the
common advice in the financial industry that investors should review and rework
their financial portfolio no more frequently than once a year. After separating the
data by year, a histogram of the return values was examined and it was decided to
perform a Box-Cox analysis (Box and Cox [1964]) in search of a transformation
that would yield a distribution of return values that was more symmetric, unimodal,
and more closely approximated a normal distribution. (Transformations with this
goal are commonly performed so that the data more closely conform to the
underlying assumptions of the analytical steps.) The Box-Cox analysis suggested
that a log transformation could be applied to the data.
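A hedged sketch of the Box-Cox step is shown below, assuming MASS::boxcox() and a simulated positive return-value vector in place of the study data; an estimated λ near zero is what supports the log transformation.

```r
## Box-Cox check on a vector of (illustrative) positive return values.
library(MASS)

set.seed(13)
rv <- exp(rnorm(3000, -1.2, 0.8))                # placeholder positive return values

bc <- boxcox(rv ~ 1, lambda = seq(-1, 1, 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat                                       # a value near 0 supports the log transform
hist(log(rv), breaks = 50, main = "Log return values")
```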
Histograms of the data after transformation are provided in Figures 5.1, 5.2 and 5.3; the
data clearly conform more closely to the desired properties after transformation.
These figures are, as the captions state, histograms of logged return levels from
the TV GEVs for a return period of one year (approximately 52 weeks). In
examining the distributions under the log transformation, the effect upon the
distributions of trimming the set of return values at selected large values of return
was also examined. Discussion of the results of trimming continues in the text
below Figure 5.3.
[Figure: four histogram panels of logged return values ('Quantile, P ≥ 0.019' versus frequency) for the years 2001-2004; each panel notes the number of censored observations deleted (at most one per panel at this censoring level).]
Figure 5.1 Histograms Of Log Return Values For Selected Years For P[X ≥ x] ≤ 0.019, Censored At A Value Of 2,000 Percent. The Return Value Will Occur On Average No More Than Once In 52 Weeks. Number Of Deleted Observations Are Provided.
[Figure: four histogram panels of logged return values ('Quantile, P ≥ 0.019' versus frequency) for the years 2001-2004; each panel notes the number of censored observations deleted (two to four per panel at this censoring level).]
Figure 5.2 Histograms Of Log Return Values For Selected Years For P[X ≥ x] ≤ 0.019, Censored At A Value Of 1,000 Percent. The Return Value Will Occur On Average No More Than Once In 52 Weeks. Number Of Deleted Observations Are Provided.
[Figure: four histogram panels of logged return values ('Quantile, P ≥ 0.019' versus frequency) for the years 2001-2004; each panel notes the number of censored observations deleted (roughly 18 to 25 per panel at this censoring level).]
Figure 5.3 Histograms Of Log Return Values For Selected Years For P[X ≥ x] ≤ 0.019, Censored At A Value Of 500 Percent. The Return Value Will Occur On Average No More Than Once In 52 Weeks. Number Of Deleted Observations Are Provided.
Data-censored histograms of the log return values were examined. Censoring was
performed upon the weekly return values at arithmetic values of 5, 10, and 20 (that
is, weekly return quantiles of 5 = 500%, 10 = 1,000%, and 20 = 2,000%). Further,
an examination was made of notched box-plots¹⁵ of each of the ancillary factors for
each year. (Selected box-plots from this set are provided in Figures 5.4 through
5.7.) The examination of the box-plots suggests that the geometry of the
relationships for return values, when differentiated by the ancillary factor levels,
remained similar year over year. The change between years for a box-plot within an
ancillary factor was dominated by an overall translation of the positions of the
boxes. For those factors which possess a large number of levels, namely Sectors
(Figure 5.5) and Exchanges (Figure 5.7), labels on the abscissas were removed, as
not all could be legibly printed and the scale is nominal. However, the order of the
boxes is the same across the years, and the point is that the similarity of the patterns
is fairly compelling.

Under these limits the box-plot relationships remained the same; even at a censor of
5, very few values were deleted (no more than about two dozen), with most of the
deleted values coming from the North American (predominantly U.S.) micro-cap
class. Since the return values were based upon the TV fits, it was felt that a
censoring value of 500% was a more realistic upper value from a financial
perspective, clearly improving distributional understanding while not affecting
relationships within the ancillary factors.

¹⁵ The notched box plot is a graphical representation of the data distribution with a confidence
interval for the median provided by the notch (Tukey, 1977).
[Figure: notched box-plots of log return value (P = 0.02) by continent (Africa, Asia, Eur, N_Amer, S_Amer); panel (A) 2001, panel (B) 2007.]
Figure 5.4 Notched Box-Plots Of 52-Week Logged Return Value Aggregated By Continent For (A) Year 2001 And (B) Year 2007. Data Censored At Return Value Of 500%.
[Figure: notched box-plots of log return value (P = 0.02) by sector (Commercial Services, Energy Minerals, Process Industries, Utilities, Miscellaneous, Transportation, among others); panel (A) 2001, panel (B) 2007.]
Figure 5.5 Notched Box-Plots Of 52-Week Logged Return Value Aggregated By Sector For (A) Year 2001 And (B) Year 2007. Data Censored At Return Value Of 500%.
[Figure: notched box-plots of log return value (P = 0.02) by market-value class (large, mega, micro, mid, small); panel (A) 2001, panel (B) 2007.]
Figure 5.6 Notched Box-Plots Of 52-Week Logged Return Value Aggregated By Market Cap For (A) Year 2001 And (B) Year 2007. Data Censored At Return Value Of 500%. The Mega-Cap Plot Is A Result Of The Confidence Interval Going Beyond The Quartile.
[Figure: notched box-plots of log return value (P = 0.02) by exchange (AMEX, Cairo, Frankfurt, Lagos, Lima, Nairobi, Nicosia, OTC, SEAQ, Tunis, XETRA, among others); panel (A) 2001, panel (B) 2007.]
Figure 5.7 Notched Box-Plots Of 52-Week Logged Return Value Aggregated By Exchange For (A) Year 2001 And (B) Year 2007. Data Censored At Return Value Of 500%.
Using the data just described, a stepwise analysis of the seven ancillary factors was
performed, and the Bayesian Information Criterion (BIC) (previously described in
Chapter 4, Section 4.7) was used as the stopping value. The analysis was performed
using the R lm function (R Development Core Team, 2009). The design matrix
was formed using cell means contrasts. Table 5.1 displays the steps in this analysis.
This analysis was iterative with each step including the most explanatory factor(s)
from earlier steps. An additional factor entered the model according to whether or
not the BIC decreased over the previous step. As can be seen in the Table the
modeling process was concluded for all years for additional main effects at the
conclusion of the third iteration or step. An additional iteration examining selected
first order interactions was conducted, but yielded no additions to the model.
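The following sketch outlines the kind of computation involved, using lm() and step() with the BIC penalty k = log(n); the data frame is simulated, the factor levels are invented, and default treatment contrasts are used rather than the cell-means contrasts of the actual analysis, so it illustrates the mechanics only.

```r
## Stepwise fixed-effects selection with a BIC penalty (illustrative data).
set.seed(17)
n   <- 3000
dat <- data.frame(
  Exchange      = factor(sample(paste0("EX", 1:20), n, replace = TRUE)),
  Sector        = factor(sample(paste0("SEC", 1:15), n, replace = TRUE)),
  F.Markt.Value = factor(sample(c("micro", "small", "mid", "large", "mega"),
                                n, replace = TRUE)),
  Continent     = factor(sample(paste0("C", 1:5), n, replace = TRUE)))
cap_eff <- c(micro = 0.4, small = 0.2, mid = 0, large = -0.2, mega = -0.4)
dat$lrv <- -1 + unname(cap_eff[as.character(dat$F.Markt.Value)]) +
  0.02 * as.integer(dat$Sector) + rnorm(n, 0, 0.3)     # simulated logged return values

null_fit <- lm(lrv ~ 1, data = dat)
sel <- step(null_fit, scope = ~ Exchange + Sector + F.Markt.Value + Continent,
            direction = "forward", k = log(n), trace = 0)   # k = log(n) gives the BIC penalty
formula(sel)                     # factors retained under the BIC penalty
summary(sel)$adj.r.squared
```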
For the main fitted effects three of the factors appeared to be significant: exchange,
sector, and market value (in discretized form, denoted F.Markt.Value). In a
second experiment exchange was removed from the analysis and the model was
re-run. In this result, along with F.Markt.Value and sector, country entered the model.
The substitution of country for exchange is interesting. The results of the analysis
suggest that two factors [sector and market value] carry the economic and
market-trend information into the return values, while the exchange or country represents
the impact of regulatory, legal, social, and governmental effects. The model clearly
has explanatory power. In fact, the degree of linear explanatory power is very
strong, as witnessed in Table 5.2, wherein the coefficient of determination R² is
quite high.
In examining the quality of the fit of this fixed-effect model, it was postulated when
the model was defined, that the residuals or errors—if the model is appropriate—
will follow a normal distribution, with a mean of zero and a single variance over all
return levels, denoted σ_ε², or as commonly written, N(0, σ_ε²). From elementary
mathematical statistics,

ε̂_{i,j,k} = (r_{i,j,k} − r̂_{i,j,k}),    ε̂_{i,j,k}/s_ε ∼ N(0, 1),        (5.7)

where i, j, k are factor indices and s_ε² is an estimate of σ_ε².
Table 5.1  Results from stepwise model construction for the fixed-effects model, by year.
           The heading of each subtable gives the effect(s) already in the model; the
           tabulated entries are the BIC values obtained when each remaining candidate
           effect is added.

Step 0, In: None
Candidate          2001      2002      2003      2004      2005      2006      2007
Continent       -1171.4   -1131.5   -1060.6    -962.1    -754.3    -848.0    -838.8
Country         -1871.6   -1850.2   -1789.1   -1697.6   -1488.1   -1568.7   -1538.9
Exchange        -1947.9   -2004.4   -2109.3   -2062.5   -1839.8   -1875.7   -1941.8
Exch.Region     -1155.3   -1121.9   -1056.2    -963.6    -759.3    -851.3    -839.4
Trade.Region    -1154.9   -1118.3   -1055.0    -965.1    -760.1    -851.9    -839.9
Sector          -1362.0   -1306.8   -1233.0   -1120.2    -882.1    -976.2    -947.2
F.Markt.Value   -1933.4   -1967.5   -2083.6   -2098.2   -1864.7   -1931.3   -1833.7

Step 1, In: Exchange
Candidate          2001      2002      2003      2004      2005      2006      2007
Continent       -1917.3   -1974.0   -2078.2   -2030.9   -1807.9   -1844.2   -1910.0
Country         -1731.2   -1811.3   -1920.8   -1879.1   -1654.6   -1687.8   -1741.2
Exch.Region     -1917.7   -1980.4   -2081.5   -2026.9   -1801.2   -1836.1   -1905.6
Trade.Region    -1947.9   -2004.4   -2109.3   -2062.5   -1839.8   -1875.7   -1941.8
Sector          -2053.3   -2125.7   -2225.9   -2157.7   -1922.4   -1927.9   -2002.3
F.Markt.Value   -2222.4   -2368.4   -2589.9   -2635.3   -2417.3   -2415.6   -2423.8

Step 1, In: F.Markt.Value
Candidate          2001      2002      2003      2004      2005      2006      2007
Continent       -1981.3   -2047.3   -2142.4   -2151.4   -1923.1   -1955.2   -1893.7
Country         -1675.8   -1782.3   -1900.4   -1924.5   -1705.0   -1726.0   -1643.4
Exchange        -2222.4   -2368.4   -2589.9   -2635.3   -2417.3   -2415.6   -2423.8
Exch.Region     -1964.4   -2040.3   -2143.0   -2161.8   -1939.2   -1970.3   -1901.5
Trade.Region    -1958.0   -2032.9   -2133.0   -2149.4   -1925.7   -1961.1   -1891.1
Sector          -2081.9   -2100.1   -2207.9   -2209.3   -1953.7   -1979.3   -1899.0

Step 2, In: Exchange and F.Markt.Value
Candidate          2001      2002      2003      2004      2005      2006      2007
Continent       -2191.7   -2338.7   -2562.7   -2612.1   -2395.9   -2391.1   -2400.7
Country         -2002.6   -2166.9   -2396.7   -2451.6   -2233.9   -2227.3   -2225.5
Exch.Region     -2176.7   -2325.4   -2547.9   -2594.9   -2378.5   -2372.5   -2383.2
Trade.Region    -2222.4   -2368.4   -2589.9   -2635.3   -2417.3   -2415.6   -2423.8
Sector          -2325.7   -2485.8   -2708.9   -2736.6   -2507.6   -2463.4   -2480.0

Step 3, In: Exchange, F.Markt.Value and Sector (remaining main effects)
Candidate          2001      2002      2003      2004      2005      2006      2007
Continent       -2294.1   -2454.8   -2679.2   -2709.4   -2481.7   -2435.1   -2452.6
Country         -2103.3   -2287.0   -2517.3   -2553.4   -2324.6   -2275.2   -2281.7
Exch.Region     -2277.9   -2439.8   -2662.9   -2691.7   -2464.5   -2416.6   -2435.0
Trade.Region    -2325.7   -2485.8   -2708.9   -2736.6   -2507.6   -2463.4   -2480.0

Step 3, In: Exchange, F.Markt.Value and Sector (selected first-order interactions)
Candidate                    2001      2002      2003      2004      2005      2006      2007
Exchange X F.Markt.Value  -1634.8   -1791.5   -2032.0   -2059.2   -1821.1   -1759.2   -1762.7
Sector X F.Markt.Value    -1934.8   -2109.1   -2319.4   -2342.1   -2107.4   -2064.0   -2076.5
Exchange X Sector           334.0     147.1    -158.3    -214.5       6.3      73.8      98.8
Table 5.2
Coefficients of determination (R²) for the annually computed three-factor fixed-effects model described in the text above.

Statistic   2001   2002   2003   2004   2005   2006   2007
R²          0.831  0.844  0.874  0.889  0.890  0.886  0.899
R² Adj      0.826  0.839  0.870  0.885  0.887  0.882  0.895
“Standardized” residuals $\hat\varepsilon_s = \hat\varepsilon_{i,j,k}/s_\varepsilon$ were created. These observations were evaluated against the Gaussian assumption as the means of examining the quality of the fit.
QQnorm plots of the standardized residuals, as well as a plot of the standardized
residuals versus the fitted values arising from this model, are provided in Figures
5.8 and 5.9. The plot of the standardized residuals versus the fitted values displays
no particular systematic departure from a random scatter (Figure 5.9). However, as
we can see in the QQnorm plots (Figures 5.8A-G) there are substantial and
systematic departures at the upper end of all of the plots. These departures indicate
that in each of the distributions there are too many observations at the positive end,
versus what the theoretical distribution suggests there should be. The positive
skewness observed in the distributions depicted in Figures 5.1 to 5.3 is manifested in the standardized residuals. Further, a Jarque-Bera test of the standardized residuals plotted in Figure 5.7 rejects the null hypothesis of normality at a p-value $< 2.2 \times 10^{-6}$. It was concluded that the postulated model is deficient.
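For readers wishing to reproduce this kind of residual check, the following is a minimal R sketch of the diagnostics described above (QQnorm plot and Jarque-Bera test); the object names (`rl`, `fit_3factor`) are hypothetical placeholders and the snippet is not the code used in the research.

```r
library(tseries)                                 # provides jarque.bera.test()

# Hypothetical data frame 'rl' with the logged return value and the three factors
fit_3factor <- lm(log_rl ~ Exchange + Sector + F.Markt.Value, data = rl)

s_eps   <- summary(fit_3factor)$sigma            # single residual standard deviation
std_res <- residuals(fit_3factor) / s_eps        # "standardized" residuals

qqnorm(std_res, main = "QQPlot Based On OLS Standardized Residuals")
qqline(std_res)                                  # reference line for normal quantiles

jarque.bera.test(std_res)                        # test of the normality hypothesis
```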
Figure 5.8 A-G QQnorm Plots Of Standardized Residuals From Three-Factor Fixed-Effects Model For Years 2001 To 2007.
[Panels A-G: one QQ plot per year, plotting sample quantiles based upon the standardized OLS residuals against theoretical normal quantiles.]
Figure 5.9 Scatter Plot Of Standardized Residuals Versus Fitted Values From
Three-Factor Fixed-Effects Model For 2001. Factors Entered Into
Design Matrix As Mean Contrasts.
In summary, the issue associated with these results is that, if we assume a Gaussian
model, the residuals—particularly at the tails of the QQnorm plot—are deficient
with respect to a normal (0,1) distribution.
5.7 Examining Factors As Sources Of Variability
Diagnostics from the previous section suggest that both the three-factor mean and the single variance are inadequate explanations for the behavior of the return levels. The next several figures (Figures 5.10 to 5.12) depict some results from the examination of the explanatory power of additional predictors and the stability of the variance over the entire dataset.
Figures 5.10 and 5.12 suggest that market capitalization and year should be entered as continuous variables, because there appear to be at least linear elements which are not being captured in the model. Moreover, the variance is not constant over the entire residual dataset: the residuals do not appear to be homoscedastic over market capitalization and year.
Figure 5.10 Plot Of Mean And +/- Two Standard Deviations Of Logarithm Of
Market Cap Versus Logarithm Of Mean Return Value.
Figure 5.11 Plot Of Year Versus Logarithm Of Mean Return Value.
Figure 5.12 Plot Of Market Capitalization Versus Standardized Residuals
From Model Composed Of Earlier Three Factors Augmented By
Market Capitalization As A Continuous Predictor.
5.8 Further Modeling
A two-step strategy was adopted for additional model construction. These steps
were:
1. Construct a more complete fixed-effect model, going beyond the
incorporation of factors in the form of cell-mean contrasts to include
continuous factor forms for quantitative factors.
2. Use this fixed-effect model as a starting point and complement it with
hypothesized error structures to estimate the values of the parameters, using
maximum-likelihood estimators within a proposed Gaussian process
framework.
5.8.1 Step 1 – Further Fixed-Effect Modeling
A series of additional fixed-effect models was examined. In these models, more factors in continuous form, as well as selected interaction terms, were incorporated. Added to the model were a discrete form of year using cell-means contrasts, a quantitative form of year using the value of the year, a discounted quantitative market cap, and a year-by-market-cap interaction.
Table 5.3 is an ANOVA based upon Type III or marginal sums of squares (SoS), meaning each source of variation (SOV) is treated in turn as if it were entering the model last. This means that the variation associated with each SOV is being tested under an assumption that all of the other variables are in the model. If the SOV represents variation that, when tested against its error, is significantly greater as evaluated by the p-value, the table values yield greater confidence that the result is not due to the order of placement of the SOV in the model.
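A Type III analysis of this kind can be obtained in R with the car package; the following is an illustrative sketch under hypothetical variable names, not the code used to produce Table 5.3.

```r
library(car)                                   # provides Anova() with a 'type' argument

# Sum-to-zero contrasts are customary when Type III tests are wanted
options(contrasts = c("contr.sum", "contr.poly"))

# 'rl' and the column names are hypothetical stand-ins for the study data
fit_full <- lm(log_rl ~ F.Markt.Value + Exchange + Sector + factor(year)
                        + mktcap_dc + year:mktcap_dc,
               data = rl)

# Each source of variation is tested as if it entered the model last,
# i.e. with all other terms already present (Type III sums of squares)
Anova(fit_full, type = 3)
```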
Table 5.3 ANOVA arising from Type III SoS for fixed-effect models containing additional discrete and continuous factors/predictors. Response: 52-week return value.

Source of Variation                                Df      Sum Sq     Mean Sq    F value     Pr(>F)
Discrete Mkt Cap                                   5       45,670.00  9,134.00   27,713.59   < 2.2E-16
Discrete Exchange                                  68      2,851.00   42.00      127.23      < 2.2E-16
Discrete Sector                                    19      554.00     29.00      88.54       < 2.2E-16
Discrete Year                                      6       446.00     74.00      225.51      < 2.2E-16
Discounted Continuous Mkt Cap                      1       6.00       6.00       17.59       2.75E-05
Continuous Year X Discounted Continuous Mkt Cap    1       5.00       5.00       14.19       1.66E-04
Residuals                                          20,599  6,789      0.33
So the further analysis, based upon the deficiencies of the earlier fixed-effects models, incorporated a number of continuous forms of predictors. The sources of variation indicated in Table 5.3 are the predictors to be used in the final model for estimating the expected value of the return level. It is understood that the potential for some over-fitting exists within this step, but the next step will be used to sort out additional issues of parameter significance.
5.8.2 Step 2 – Fitting Gaussian Processes Using Maximum-Likelihood Estimation
The Gaussian process has been explained in an earlier section (Chapter 5, Section
5.5). Let us recall that the model for the random vector R defined as the 52-week
return values is given by
$$R = Y\beta + c(h, \varepsilon) \qquad (5.8)$$
where: Y is the matrix of predictor values, commonly called the design matrix.
β is the vector of coefficients.
h is an innovation that is due to a factor effect or effects; it is distributed as
N(0, H ) and H is assumed to be a diagonal matrix.
ε is an innovation that is due to measurement error, commonly called a
“nugget effect” (whose computation was addressed earlier in this Chapter, Section 5.4); it is distributed as N(0, Σ), and Σ is
assumed to be a diagonal matrix.
h, ε are assumed to be uncorrelated.
c(h, ε ) or g(H , Σ ) are functions (assumed to be simple functions of the
innovations and the variance, for example a simple sum).
$R \sim N(Y\beta,\ g(H, \Sigma))$; so, in this situation the likelihood function is
$$L\big(\beta, g(H,\Sigma) \mid r\big) = (2\pi)^{-p/2}\,\big|g(H,\Sigma)\big|^{-1/2}\exp\!\Big\{-\tfrac{1}{2}\,(r - Y\beta)^T g(H,\Sigma)^{-1}(r - Y\beta)\Big\} \qquad (5.9)$$
Maximum-likelihood estimation under a set of assumptions concerning the fixed
effects16 and structure of the uncertainty elements was used to estimate the
parameters ( β , H and Σ ). To generate the estimates, the commonly used approach
of maximizing the logarithm of the likelihood function (or more accurately in this
analysis, minimizing the negative of the log-likelihood function) was performed.
BIC and AIC were used as aids in selecting the particular model. (Table 5.4 tallies
some of the salient model details.) Several different factors or combinations of
factors were examined in building H. These included year (Year), market
capitalization (Mkt Cap), and year by market capitalization. The use of a specific
combination of factors, say year by market capitalization, meant that instead of the
one variance there were 35 variances postulated, one for every combination of year
by market capitalization. With respect to bringing in the annual nugget variance,
five alternatives were examined for the form of Σ' (functions of Σ , the diagonal
matrix of nugget variances) as means of bringing this element into the model:
1. Use no nugget at all (Models 1-3), that is, g(H, Σ′) is defined as H + Σ′ and Σ′ is defined as 0.
16
With respect to estimating the full model, the fixed-effects factors and their starting values for β̂ are defined from the analysis which produced Table 5.4.
2. Estimate a single multiplier for all the nuggets (Model 4), that is, g(H, Σ′) is defined as H + Σ′ and Σ′ is defined as aΣ.
3. Use the simple addition of the estimated nugget to the variance from the grouping structure (Model 5), that is, g(H, Σ′) is defined as H + Σ′ and Σ′ is defined as Σ.
4. Multiply the estimated nugget by the variance from the grouping structure (Model 6), that is, g(H, Σ′) is defined as H × Σ′ and Σ′ is defined as Σ.
5. Estimate a multiplier for each nugget that changes based upon the sector classification of the equity (Model 7), that is, g(H, Σ′) is defined as H + Σ′ and Σ′ is defined as aᵀΣ, where an individual value of a is estimated for each sector.
The results provided in Table 5.4 support the use of the year-by-market-capitalization variance grouping with no nugget or, secondly, the year-by-market-capitalization grouping with sector multipliers.
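To make Section 5.8.2 concrete, the following is a minimal R sketch of how such a maximum-likelihood fit could be set up when g(H, Σ′) is diagonal, using the Model 7 form (group variances plus sector-multiplied nuggets). The object names are hypothetical and the snippet is illustrative only, not the code used in the research.

```r
# 'Y' (design matrix), 'r' (return values), 'grp' and 'sector' (factors) and
# 'nug' (per-equity nugget variances) are hypothetical placeholders.
neg_loglik <- function(par, r, Y, grp, nug, sector) {
  p      <- ncol(Y)
  beta   <- par[1:p]
  sig2_g <- exp(par[p + seq_len(nlevels(grp))])                     # group variances (H)
  a_sec  <- exp(par[p + nlevels(grp) + seq_len(nlevels(sector))])   # sector multipliers

  v   <- sig2_g[grp] + a_sec[sector] * nug        # diagonal of g(H, Sigma')
  res <- r - as.vector(Y %*% beta)
  -sum(dnorm(res, mean = 0, sd = sqrt(v), log = TRUE))
}

# start <- c(coef(lm(r ~ Y - 1)), rep(0, nlevels(grp) + nlevels(sector)))
# fit   <- optim(start, neg_loglik, r = r, Y = Y, grp = grp, nug = nug,
#                sector = sector, method = "BFGS")
```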
Table 5.4 Selected results from maximum-likelihood computations under the assumptions and structure described in the previous text.

Model Number  Factor Grouping for H  Model for Sigma'            Num Parameters  Num Grps in H  BIC
1             Year                   None                        110             7              -99794.52
2             Mkt Cap                None                        108             5                2927.712
3             Year X Mkt Cap         None                        138             35               3954.242
4             Year X Mkt Cap         Nug Add Single Multiplier   139             35               3275.923
5             Year X Mkt Cap         Nug Add No Multiplier       138             35               1439.36
6             Year X Mkt Cap         NugXVar                     138             35             -24327.02
7             Year X Mkt Cap         Nug Add Sector Multipliers  158             35               3394.734
Table 5.5 and Figure 5.13 provide some additional insights. “Standardized” residuals were computed for each of the models in Table 5.4. Under the assumptions laid out above, these residuals were of the form
$$(r - \hat r)^T \big[g(\hat H, \hat\Sigma')\big]^{-1/2}$$
where: $\hat r = Y\hat\beta$.
$\hat H$ is a diagonal matrix with entries $\hat s^2_{H_i}$, where $i = 1, 2, \ldots$, number of groupings in the variance structure (models 3 and 7).
$\hat\Sigma'$ is a diagonal matrix with each diagonal entry being $\hat a_j \hat n_k$, where, depending on the model, $j$ may be 1 or run over $1, 2, \ldots$, number of sectors, and $\hat n_k$ is the nugget variance estimated for each equity.
From examination of Tables 5.5 and 5.6, it was concluded that—of all the models
examined—the model including a sector-adjusted nugget multiplier most closely
met the underlying structural hypothesis of normality17. The conclusion was drawn
based upon the number of non-significant JB tests (i.e. the normality hypothesis) by
model. Given the results from Table 5.4, where models 3 and 7 were preferred, and from Tables 5.5 and 5.6, where model 7 was preferred, model 7 will be used in further analyses. Even though this model has the largest number of
parameters, the additional parameters were allocated to the model elements which
went into defining the variation in the model, H and Σ' . Further the allocation of
parameters in this manner does have some domain rationality. The factors of time
17
All models had in excess of 20,000 observations, and the critical p-values of the tests were
adjusted using an (admittedly conservative) Bonferroni adjustment (J. Neter, et al. [1996]), so that
the family of test α level was set at 0.05.
and market capitalization did show up as potential sources of heteroscedastic
variation in the earlier model deficiency analysis. The addition of nugget
multipliers which are a function of sector may be reasonable under the hypothesis
that different sectors possess different volatilities.
Finally, Figure 5.13 provides a visualization of the behavior of the standardized
residuals from the model. The Figure contains QQnorm plots of the standardized
residuals of various market capitalizations (mega-caps excluded for reasons of
sample size) for the selected model, which is model 7. While there clearly are
some data departures from the theoretical values (represented by the straight line),
the model is closing in on a normal-looking set of residuals (supported by results
from the Jarque-Bera test).
In summary, the final model contains 103 parameters associated with the expected
value predictors (many of which are used for the cell means contrasts, particularly
those for the stock exchanges) and 55 parameters associated with defining the
variance matrices. Therefore, the selected model contains in total 158 parameters
estimated using in excess of 20,000 observations.
In the next Chapter (Chapter 6) the results developed in this and earlier chapters
will be incorporated into a focus upon three subjects:
1. Validating this Chapter’s (Chapter 5) results using a holdout sample
2. Using the model to perform portfolio optimization
3. Augmenting the model to perform some forecasting
Table 5.5
Results from repeated performance of the Jarque-Bera test of normality by variance grouping and by model for the first pair of treatments of nuggets. Bonferroni adjustment for controlling the family of test α error by model. Labels define the filter values for subsetting observations from the sample by year.market-cap into groups which were then tested for normality.
Table 5.6
Results from repeated performance of the Jarque-Bera test of
normality by variance grouping and by model for second pair of
treatments of nuggets. Bonferroni adjustment for controlling family
of test α error by model. Labels define the filter values for subsetting observations from the sample by year.market-cap into
groups which were then tested for normality.
Figure 5.13 Selected QQnorm Plots Of Standardized Residuals From Various Market Capitalizations For 2001, Coming From The Selected Model.
[Panels: Micro Caps, Small Caps, Mid Caps, Large Caps.]
5.9 The Selected Model, In Detail
In this section, the model selected in the previous sections is more formally detailed. Recall that the model formalism is given by eqns. 5.5 and 5.8:
$$R \sim GP\big(Y\beta,\ g(H, \Sigma')\big)$$
and
$$R = Y\beta + c(h, \varepsilon')$$
where:
GP stands for Gaussian process.
R is a random vector of extreme value return values for a given return level.
Y is the design matrix of values of predictors of the expected values of the return values.
β is the vector of coefficients multiplying the predictor values.
H is the covariance matrix of the factor (random-effect) innovations.
Σ′ is the covariance matrix of the nugget innovations of the equity return values. Recall that the initial matrix of nugget variances is denoted Σ; the nugget covariance augmented by sector multipliers is denoted Σ′.
h is an innovation that is due to a factor effect or effects; it is distributed as N(0, H) and H is assumed to be a diagonal matrix.
ε, ε′ are innovations that are due to measurement error, commonly called a “nugget effect”; they are distributed as N(0, Σ) and N(0, Σ′) respectively, and Σ, Σ′ are assumed to be diagonal matrices.
h, ε and h, ε′ are assumed to be uncorrelated.
c(h, ε′) or g(H, Σ′) are functions (assumed to be simple functions of the innovations and the variance, for example a simple sum).
In this analysis and for the selected model R is a vector of just under 21,000
(20,678) logged return values with a return period of approximately one year
(2,954 equities for seven years). The X matrix is a design matrix consisting of
20,678 rows and 103 columns. The discrete or qualitative predictors are
represented in the design matrix as cell mean contrasts. For example, the discrete
form of the market capitalization predictor is coded as per Table 5.7. The columns
of the X matrix are divided up per Table 5.8.
Table 5.7 Example of cell mean contrasts for the discrete market capitalization predictor.

Level   Micro  Small  Mid  Large  Mega
Micro     1      0     0     0      0
Small     0      1     0     0      0
Mid       0      0     1     0      0
Large     0      0     0     1      0
Mega      0      0     0     0      1
Table 5.8 Structure of the columns of the design matrix X.

Predictor Name                                              Number of Columns   Type of Predictor
Market Capitalization                                       5                   Qualitative
Stock Exchange                                              69                  Qualitative
Sector                                                      20                  Qualitative
Year                                                        7                   Qualitative
Continuous Market Capitalization                            1                   Quantitative
(Continuous Market Capitalization) X (Quantitative Year)    1                   Quantitative
β is a vector of length 103 of coefficients which are the “weightings” of the individual terms of the design matrix. These values were estimated using the maximum likelihood process described in Chapter 5, Section 5.8.2. When an individual row of the design matrix is multiplied by β̂ (the maximum likelihood estimate of β) using the commonly observed rules of linear algebra, the result is the expected value for the associated entry of R.
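As an illustration of the cell-mean coding and of the fitted-value computation just described, the following is a minimal R sketch; the object names (`mkt_cap`, `x_h`, `beta_hat`) are hypothetical and the snippet is not the code used in the research.

```r
# Cell-mean (one-indicator-per-level) coding of a discrete predictor, as in Table 5.7
mkt_cap <- factor(c("Micro", "Small", "Mid", "Large", "Mega"),
                  levels = c("Micro", "Small", "Mid", "Large", "Mega"))

# '~ 0 + mkt_cap' suppresses the intercept, giving one 0/1 column per level
X_cap <- model.matrix(~ 0 + mkt_cap)
X_cap                                   # a 5 x 5 indicator block, one row per level

# With a full design row x_h (length 103) and the ML estimate beta_hat,
# the expected value of the corresponding entry of R is the inner product:
# y_hat_h <- as.numeric(x_h %*% beta_hat)
```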
For the random portion of the model there are, as outlined above, two innovation
vectors (denoted h and ε ' ) each of length 20,678 which are assumed to be
uncorrelated. Both h and ε ' are hypothesized to be generated by normal
distributions, each having a mean of zero and variance-covariance matrices denoted
as H and Σ ' , respectively. H and Σ ' are diagonal matrices of size 20,678× 20,678
and as per similar assumptions for h and ε ' , H and Σ ' are uncorrelated.
A considerable set of analyses, as detailed once again in Chapter 5, Section 5.8.2,
was performed. In these analyses a range of alternatives was examined. For the
selected model H is posited as
$$H = \begin{pmatrix} \sigma_{i,j} & 0 & \cdots & 0 \\ 0 & \sigma_{i,j} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{i,j} \end{pmatrix}$$
where: $\sigma_{i,j}$ is the variance associated with the random effects from the combination of the $i$th market capitalization class ($i = 1, 2, \ldots, 5$) and the $j$th year ($j = 1, 2, \ldots, 7$).
This results in 35 distinct estimates of variance for this effect. Clearly the matrix is
likely to have numerous duplications of each of the values.
On the other hand for the selected model, Σ ' is structured as follows:
$$\Sigma' = \begin{pmatrix} a_1\sigma_k & 0 & \cdots & 0 \\ 0 & a_2\sigma_k & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_N\sigma_k \end{pmatrix}$$
where: $a_m$ is the estimated nugget variance computed as per Chapter 5, Section 5.4, $m = 1, 2, \ldots, 20{,}678$ (so $N = 20{,}678$).
$\sigma_k$ is a multiplier of the nugget variance and the variance of the random effect due to industry sector, $k = 1, 2, \ldots, 20$.
Once again there are likely to be numerous duplications of each of the estimated
multipliers within Σ ' .
The postulated version of the selected model was fitted as the parameters of a multivariate normal distribution, using the maximum likelihood process described in Chapter 5, Section 5.8.2. In this model the variance-covariance matrices are hypothesized as diagonal and uncorrelated, and they possess multiple duplicates of each of the individual quantities to be estimated. This modeling assumption simplified the estimation process enormously.
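To make the simplification explicit, write $v_m$ for the $m$th diagonal entry of $g(H, \Sigma')$ and $n$ for the length of $R$ (here 20,678). The log-likelihood then reduces to a sum of univariate normal terms; this is simply a restatement of eq. (5.9) under the diagonal assumption, included here to show the computational consequence:

$$\log L(\beta, v \mid r) \;=\; -\frac{n}{2}\log(2\pi) \;-\; \frac{1}{2}\sum_{m=1}^{n}\log v_m \;-\; \frac{1}{2}\sum_{m=1}^{n}\frac{\big(r_m - (Y\beta)_m\big)^2}{v_m},$$

so no $n \times n$ matrix inversion or determinant is required; only the $n$ diagonal entries $v_m$ enter the computation.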
6. Consequences Of The Research For Portfolio Formation And Innovation

6.1 Overview Of The Chapter
In this Chapter (Chapter 6), the effort is to examine and apply the Gaussian Process
model and/or its components developed by the conclusion of Chapter 5 (see
Section 5.9). The examination occurs principally in three thrusts. Firstly, the
model is applied to the test data first introduced in Chapter 3, Section 3.6.
Secondly, the nugget variation portion of the model is used to augment the
commonly estimated MVO risk model in an effort to more precisely describe the
risk and in turn construct portfolios which yield increased return as suggested by
earlier research (Clark and Labovitz [2006]). Thirdly, because the second research
thread “requires” the investor to make a choice of which nugget to use (more
clearly stated as which return level is appropriate), the rudiments of a guidance
mechanism in this regard are suggested.
The test data consists of 100 equity series randomly selected from the 12,185 liquid
series available. They were subjected to the same processing steps as was the
larger training set of 3,000 series, and return values with a 52 week return level
were extracted. Coefficient values developed from the training set for model 7
selected in Chapter 5 (Section 5.8.2) were applied to the test set predictors to
estimate the expected returns. These results show that the expectations of these
new values are well within appropriate confidence intervals. Having used the test
data set to validate the fixed effects portion of the model, an examination is made
of the data’s variability. The empirical distributions of both the training and test
sets were compared by year and found to be the same for reasonable p-values.
Portfolio creation/optimization is at the heart of modern portfolio theory. As
described in Chapter 1, Section 1.4, the commonality in such models is that increases in reward are associated with increases in risk. But investing using more accurate and more fully detailed descriptions of risk is associated with higher returns. The nugget variance structure described in Chapter 5, Section 5.4, was computed by year for seven return levels covering from just over one week out to six years. These were augmented with off-diagonal elements and combined, using simple addition, with the commonly used second-central-moment variances and covariances (denoted Sigma). These risk structures, also called covariance structures, were alternatively used in the commonly applied portfolio optimization
model MVO (see Chapter 1, Section 1.4.1). The returns from the use of these
various covariance structures on a well defined portfolio known as the Sharpe
portfolio (Sharpe [1966]) were examined using two rebalancing schemes as a back
test of the strategy (see footnote 14 for definition of back test). The results for both
rebalancing schemes were that the covariance structures of greater than 26 weeks
outperformed MVO Sigma, with respect to returns, while the covariance structures
of less than 26 weeks generally underperformed MVO Sigma. The outperformance
was between 100 percent and nearly 400 percent over a period of seven years
depending on the rebalancing scheme used.
This second research thread created a new open research question. Not all of the
covariance structures, even those of length greater than 26 weeks produced the
same level of outperformance of Sigma, nor was one covariance structure dominant
(with respect to MVO return) over all the others during the study time frame.
Therefore, an investor had a choice at rebalance time of which covariance structure
to use in portfolio optimization. Clearly the choice was very likely to make a
difference in the returns enjoyed by the investor. In this third research thread
examined within the Chapter (Chapter 6, Section 6.6) a time-leading or look ahead
signal will be sought as guide with respect to which covariance structure to select at
rebalance time. Based upon a long established inverse linear relationship between
returns for the S&P500 and returns from a volatility measure known as the VIX (le
Roux [2007]), the VIX was examined for its potential as such a signal. The result
was that the VIX tended to pick out the best covariance structure to use on a time
coincident basis. This meant that the correlation between returns of the VIX and
returns from the Sharpe portfolios were most negative for the covariance structure
which yielded the greatest outperformance for the year. But the VIX is known as a 30-day look-ahead signal, and this study was rebalancing only once a year. Using the VIX as a one-year look-ahead signal yielded mixed results, but the findings of this research are promising and do suggest directions for obtaining a look-ahead signal.
6.2 Tasks To Be Performed In The Chapter
In this Chapter (Chapter 6) the discussion will focus upon three research threads,
representing completion of the substantive portion of the dissertation:
1. Performing a model validation for the expected-value portion of the model
presented in Chapter 5, Section 5.8.1.
2. Using this model in portfolio construction and comparing the results with
those from a commonly used method.
3. Suggesting a model extension to incorporate a forecasting mechanism.
The test dataset described in the next section was used to perform some of these
research efforts.
6.3 Test Data Sets
As described initially in Chapter 3, Section 3.6, an independent test sample
consisting of 100 securities was created from the overall set of 12,185 securities.
These securities were put through a computational process identical to that for the
training set. During this process two securities were lost because of an inability to
generate an invertible Hessian matrix as part of the optimization. Therefore, the test
set was reduced to 98 securities; the names and identifiers/indices are presented in
Table 6.1.
6.4 Model Validation
As stated earlier, the test data were processed through the same set of procedures as
was the training set. In this section both the expected values or expected responses
and the residuals from the fixed-effects model are examined. The prepared test
dataset was run through the fixed-effects model, using the coefficients estimated
from the training set as created by the selected model set forth in Chapter 5, Section
5.8.2. Therefore, a design matrix for the seven years over the 98 securities in the
test dataset was created, yielding 686 rows. If each row of this design matrix is
designated X h , h = 1, 2,..., 686 , the expected response associated with each of these
row vectors is computed as
$$E[Y_h] = X_h \beta_{\mathrm{train}} \qquad (6.1)$$
Using the estimates of $\beta_{\mathrm{train}}$, the estimated value of the expected response is given by
$$\hat Y_h = X_h b_{\mathrm{train}} \qquad (6.2)$$
where: $E[\hat Y_h] = E[X_h b_{\mathrm{train}}] = X_h \beta_{\mathrm{train}} = E[Y_h]$.
Table 6.1 Identifiers/indices and names of 98 securities in test data set.
Symbol/Ticker
Security Name
*ACD
Accord Financial Corp.
*DPM
*DYG
*EH
*GBE
*HEM
*LNK
*MCO.H
*MEQ
*MLI.H
*NWF.U
Dundee Precious Metals Inc.
Dynasty Gold Corp.
easyhome Ltd.
Grand Banks Energy Corp.
Hemisphere GPS Inc.
ClubLink Corp.
MCO Capital Inc.
Mainstreet Equity Corp.
Millstreet Industries Inc.
North West Company Fund
*SNC
*TCA*X
*TN
001750
026938
029450
SNC-Lavalin Group Inc.
TRANSCANADA CORP.
True North Corp.
F&C Global Smaller Cos. PLC
DA Group PLC
Edinburgh Dragon Trust PLC
Gartmore European Investment
Trust PLC
Hansa Trust PLC
Seed Co. Ltd.
Taihei Kogyo Co. Ltd.
Kameda Seika Co. Ltd.
Internix Inc.
Bull-Dog Sauce Co. Ltd.
Ariake Japan Co. Ltd.
Japan Wool Textile Co. Ltd.
Tokai Senko K.K.
Bouygues S.A.
Autostrada Torino-Milano S.p.A.
Okura Industrial Co. Ltd.
Immobiliere Hoteliere S.A.
Lisgrafica Impressao Artes
Graficas S.A.
Nel Lines
Tohpe Corp.
Corticeira Amorim SGPS
S.BSB.A
Dromeas S.A.
MOL Hungarian Oil and Gas Plc
D.H. Cyprotels Public Ltd.
Thrace Plastics Co. S.A.
FORTIS INV MGMT FORTIS
OBAM
Valeo S.A.
Duran Duboi
Fresenius Medical Care pfd
Consolidated Minerals Ltd.
Lynas Corp. Ltd.
ChemGenex Pharmaceuticals
Ltd.
Downer EDI Ltd.
Bytes Technology Group Ltd.
Analog Devices Inc.
052688
078797
1739
1819
2220
2657
2804
2815
3201
3577
400212
406398
4221
442383
454444
455959
4614
465773
465873
474249
474346
487891
490084
493757
500079
516007
611292
612117
627363
646557
691406
ADI
Symbol/Ticker
Security Name
AEM
Agnico-Eagle Mines Ltd.
Law Enforcement Associates
AID
Corp.
B16RK7
Wo Kee Hong (Holdings) Ltd.
B17MN5
nTorino Corp. Inc.
B17Q6Z
Dongwon Metal Co. Ltd.
B1G9ZL
See Corp. Ltd.
B1YBRK
CEMIG-Cia Ener Minas Ger
BHE
Benchmark Electronics Inc.
BPOP
Popular Inc.
BRBI
Blue River Bancshares Inc.
BTRNQ
Biotransplant Inc.
BrightStar Information
BTSR
Technology Group Inc.
CAR
Avis Budget Group Inc.
CENX
Century Aluminum Co.
CHKE
Cherokee Inc.
CHMS
China Mobility Solutions Inc.
CTV
CommScope Inc.
DELL
DTIIQ
EDEN
ENGY
EQY
FCZA
FITB
FL
FMC
FRX
HE
HNR
KNGS
LIV
Dell Inc.
DT Industries Inc.
EDEN Bioscience Corp.
Enviro-Energy Corp.
Equity One Inc.
First Citizens Banc Corp.
Fifth Third Bancorp
Foot Locker Inc.
FMC Corp.
Forest Laboratories Inc.
Hawaiian Electric Industries Inc.
Harvest Natural Resources Inc.
Kingsley Coach Inc.
Samaritan Pharmaceuticals
PKTR
PPS
PRA
Packeteer Inc.
Post Properties Inc.
ProAssurance Corp.
PTEO
QCRH
RGMI
RSG
RVI
Proteo Inc.
QCR Holdings Inc.
RG America Inc.
Republic Services Inc.
Retail Ventures Inc.
SATC
SPA
SSCC
STT
SUNW
TTES
Satcon Technology Corp.
Sparton Corp.
Smurfit-Stone Container Corp.
State Street Corp.
Sun Microsystems Inc.
T-3 Energy Services Inc.
VSTY
WMK
XCHC
ZHNE
Varsity Group Inc.
Weis Markets Inc.
The X-Change Corp.
Zhone Technologies Inc.
Figure 6.1 is a plot of $\hat Y_h$ versus the observed values of the log of the 52-week return level. The plot suggests the observed values follow the estimated response very well, using the coefficients estimated from the training set. By eye, there
appear to be small changes in variance over the estimated mean response value.
Such a result would be consistent with the findings from the analysis of the training
dataset.
6.4.1 Expectations Of The Test Set
In a related analysis with these data, prediction intervals were computed for the test data. In this analysis the data are treated as new observations; in that case the parameters or coefficients used to estimate the mean response were the ones estimated using the training set. The results, per Neter et al. (1996), were developed from the estimate of the variance-covariance matrix of the coefficients, which is given by
$$s^2(b) = MSE\,(X^T X)^{-1} \qquad (6.3)$$
where: $MSE$ is the mean squared error for the training data model.
$X$ is the design matrix of the training set.
Figure 6.1 Plot Of Estimated Mean Response For The Test Dataset, Using
Coefficients Based On The Training Set Fixed-Effects Model
Versus The Logged 52-Week Return Response. Red line is y = x .
Using some elementary mathematical statistics, the estimated variance of the new
response value led to the following result:
$$s^2(\hat Y_h) = X_h\, s^2(b)\, X_h^T \qquad (6.4)$$
with all quantities as previously defined. Finally, the prediction interval of a new
observation was given by
$$s^2(\mathrm{pred}) = MSE + s^2(\hat Y_h) \qquad (6.5)$$
In order to control the α error in setting the (1−α) prediction interval for a large number of new observations, the Scheffé prediction limits for g different levels of $X_h$ (Neter et al. [1996]) were used, which are given by
$$\hat Y_h \pm S \times s(\mathrm{pred}) \qquad (6.6)$$
where: $S^2 = g \times F(1-\alpha;\ g,\ n-p)$.
$n$ is the number of observations in the training set.
$p$ is the number of parameters or coefficients used in the training set model.
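The prediction-limit calculation of eqns. (6.3) to (6.6) can be sketched in R as follows; `fit_train` (an lm() fit on the training set) and `newdata` (the test-set predictors) are hypothetical names, and this is an illustration rather than the code used in the research.

```r
# 95% Scheffe-adjusted prediction limits for g new observations (alpha = 0.05)
pred   <- predict(fit_train, newdata = newdata, se.fit = TRUE)
mse    <- summary(fit_train)$sigma^2        # MSE of the training model
s2pred <- mse + pred$se.fit^2               # eq. (6.5): s^2(pred) = MSE + s^2(Y_hat_h)

g <- nrow(newdata)                          # number of new X_h levels
n <- length(residuals(fit_train))           # training observations
p <- length(coef(fit_train))                # parameters in the training model
S <- sqrt(g * qf(0.95, g, n - p))           # Scheffe multiplier: S^2 = g F(1-alpha; g, n-p)

lower <- pred$fit - S * sqrt(s2pred)        # eq. (6.6)
upper <- pred$fit + S * sqrt(s2pred)
```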
Figure 6.2 is a plot of the estimated mean responses (Predicted) versus the observed logged 52-week return values (Obs), the same data as in Figure 6.1.
[Figure 6.2 plot: Log Return 52 Weeks versus Plotting Index; legend entries: Scheffé, Individual, Obs, Predicted.]
Figure 6.2 Mean Response And Observation Values Within 95% Prediction
Interval Envelopes, Using Individually Derived And Scheffé-Adjusted Prediction Intervals. Plot Elements Are Denoted In
Legend Located Within Graph.
The blue lines are the 95% individually derived prediction intervals, and the green lines are the 95% Scheffé-adjusted prediction intervals using g = 98. Note that not only do the observations correspond well to their expectations, but the variation is well within the 95% individual prediction intervals and entirely within the 95% Scheffé-adjusted prediction intervals. This result is interpreted to mean that all the expected value structure captured by the model developed in Chapter 5, Section 5.8.2 is applicable to the test set of data. It is believed that the fixed-effects model, which defines the mean response, is thereby validated.
6.4.2 Variability And Distribution In The Test Data Set
But what about the overall remaining structure in terms of the distributions? The
two sets of residuals from the extended fixed-effects models of the training set and
those from the test set were compared for distributional similarity on a year-by-year
basis as well as over the entire dataset. The test used was the Kolmogorov-Smirnov
(KS) two-sample test, which has the following structure:
•
H0 : The two samples come from a common distribution.
•
Ha : The two samples do not come from a common distribution.
•
Test Statistic: The KS two-sample test statistic.
The KS test (NIST [2003]) is based on the empirical cumulative distribution functions (ECDF), each of which is defined as follows: let $y_1, y_2, \ldots, y_N$ be a set of $N$ observations; then
$$E(i) = \hat P[Y \le y_i] = \mathrm{card}(y_i)/N \qquad (6.7)$$
where: $\mathrm{card}(y_i)$ is the number of points less than $y_i$.
Then, for the two samples, the test statistic is defined as
$$D = \max_{1 \le i \le N} \big| E_1(i) - E_2(i) \big| \qquad (6.8)$$
H0 is rejected and Ha is accepted if D exceeds a tabled value. In the present circumstance a Gaussian importance sampler (Durham, G., personal interview, October 2008) was used to generate the weights to select a sample from the residuals of the training set with which to perform the KS test. The results of the KS test for each year and over the entire analysis period are given in Table 6.2.
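A minimal R sketch of this comparison is given below; `res_train`, `res_test` and the importance-sampling weights `w` are hypothetical placeholders, and the weighted subsampling stands in for the Gaussian importance sampler mentioned above.

```r
# Two-sample Kolmogorov-Smirnov comparison of training- and test-set residuals
set.seed(1)
res_train_sub <- sample(res_train, size = length(res_test),
                        replace = FALSE, prob = w)   # weighted subsample of training residuals
ks.test(res_train_sub, res_test)                     # two-sample KS test (eq. 6.8)
```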
Table 6.2 P-values from the KS test of the null hypothesis that the residuals from the fixed-effects model run on both the training and test sets have the same distribution.

Year          P-Value
2001          0.6901
2002          0.1469
2003          0.0732
2004          0.3559
2005          0.5699
2006          0.4565
2007          0.5699
All Periods   0.3685
From this test there is no evidence to reject the null hypothesis for any of the study
years or over the entire analysis study period. That is, there is no evidence to
suggest the distributions were unequal.
The results from the analysis of the test data suggest that a factor model has been created which allows a researcher or practitioner to examine the joint distribution of the tail behavior of equity returns over a variety of combinations of return levels under a multivariate lognormal assumption. Using this model, the practitioner can set the probability or return level and estimate joint probabilities over vectors of quantiles as desired.
6.5 Application Of Model Results To Portfolio Formation
As a final line of inquiry in this dissertation an examination was performed
concerning whether or not the elements of the model just presented could be used
to create and improve portfolios. Because there are many ways in which results
from the return model could possibly be used in construction, this analysis focuses
on performing a comparison with a well-recognized alternative. (In Chapter 7,
Section 7.5, other research threads arising from the model and which may be of use
to the financial community are discussed.)
The question which is focused upon is: Could these results be used to improve the
performance of portfolios (defined Chapter 1, footnote 1 within Section 1.3)
generated using common mean-variance optimization (MVO)? MVO is defined in
Chapter 1, Section 1.4.1. Could performance be improved while maintaining or
reducing the proportional amount of assumed risk? Specifically, could uncertainty
captured within the “nugget” variation associated with specified extreme-return
levels be used to more clearly define the risk in optimizing portfolios and to
produce better performance in the out-of-sample years?
At an overview level the procedure was as follows:
• Using previous results and methods for computing the time-varying return values and their variance given in Chapter 5 (Sections 5.3 and 5.4), compute the time-varying “nugget” variances for each equity by year for the following return levels: slightly greater than one week (1.0526 weeks), two weeks, four weeks, a half year, one year, three years, and six years. The probabilities of exceeding the quantiles associated with these return values were 0.95, 0.50, 0.25, 0.0383, 0.0192, 0.0064, and 0.0032, respectively.
• Estimate the annual correlations from the return data for the equities used in the portfolio optimization and use these to compute the remainder of the covariances, the off-diagonal elements.
• Combine these “extreme” covariance matrices with a conventionally computed or base covariance. For the present research, combining these two matrices was a rather naïve element-by-element summation. (Chapter 7, Section 7.5 will suggest much more sophisticated approaches for combining the uncertainty structures.) This uncertainty structure was computed on an annual basis. For this analysis the following eight covariance structures were examined:
1. Conventional covariance, that is the matrix of estimated central
second moments of security returns on the diagonal and
estimated covariance of security returns off diagonal (denoted in
the following discussion as Sigma)
2. Sigma + 1.0526-week return level covariance matrix (denoted in
the following discussion as Sigma, 1.0526 Wks)
3. Sigma + 2-week return level covariance matrix (denoted in the
following discussion as Sigma, 2.0 Wks)
4. Sigma + 4-week return level covariance matrix (denoted in the
following discussion as Sigma, 4.0 Wks)
5. Sigma + 6-month return level covariance matrix (denoted Sigma,
26.071 Wks)
6. Sigma + one-year return level covariance matrix (denoted in the
following discussion as Sigma, 52.142 Wks)
7. Sigma + three-year return level covariance matrix (denoted in
the following discussion as Sigma, 156.42 Wks)
8. Sigma + six-year return level covariance (denoted in the
following discussion as Sigma, 312.85 Wks)
• Use a high-grading step (Labovitz [2007]) to reduce the number of equities going into the final portfolio step from approximately 3,000 to an average of 200 on an annual basis. The set of post-high-grading securities is herein referred to as the candidates, and the group collectively is referred to as the candidate pool.
• Apply quadratic optimization techniques to generate an efficient frontier of about 600 portfolios. The efficient frontier is discussed and defined in Chapter 1, Section 1.4.1.
The optimization was of the form
$$\max_a \Big[ a^T \mu_r - \lambda\, a^T \big(\Sigma + \Sigma_{\varepsilon_{RL}}\big) a \Big]
\quad \text{s.t.} \quad a_i \ge 0,\ i = 1, 2, \ldots, N_s; \qquad \sum_{i=1}^{N_s} a_i = 1
\qquad (6.9)$$
where: $a$ is a vector of weights of length equal to the number of securities $N_s$; all weights must be greater than or equal to 0, and the sum of the weights must equal 1.
$\mu_r$ is a vector of expected returns for the securities in the candidate pool, which is to be estimated by $\hat r$.
$\lambda$ is a scalar used to capture risk aversion; it enters the present analysis through setting a target for the smallest and largest portfolio risk as well as an increment of additional risk for each portfolio.
$\Sigma, \Sigma_{\varepsilon_{RL}}$ are, respectively, the conventional covariance matrix and the covariance matrix for a specific return level (listed above).
Although it is not indexed to avoid notation overload, each of the quantities was estimated on a year-by-year basis. (An illustrative sketch of this optimization step is given after the bulleted list below.)
• Create portfolios lying along the efficient frontier for each combined covariance structure for each year (2001-2007), using within each year the same candidate pool across the optimization runs for each covariance structure.
• Assume the invest-and-hold or annual-rebalancing strategies for each of the eight risk models.
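As referenced above, the quadratic program in eq. (6.9) can be illustrated with the quadprog package; `mu` (expected returns, estimated by r-hat), `Sigma_tot` (Sigma plus a return-level covariance) and `lambda` are hypothetical inputs, and this sketch is not the Wuertz (2009) code actually modified for the research.

```r
library(quadprog)

# One point on the efficient frontier of eq. (6.9) for a given risk aversion lambda
efficient_weights <- function(mu, Sigma_tot, lambda) {
  n    <- length(mu)
  Dmat <- 2 * lambda * Sigma_tot          # quadratic (risk) term
  dvec <- mu                              # linear (reward) term
  Amat <- cbind(rep(1, n), diag(n))       # constraints: sum(a) = 1, a_i >= 0
  bvec <- c(1, rep(0, n))
  solve.QP(Dmat, dvec, Amat, bvec, meq = 1)$solution
}

# Sweeping lambda (or a target return) over a grid traces out the frontier.
```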
The strategy described is one commonly used in back-testing18 financial models. In
this strategy in-sample and out-of-sample components are differentiated. In certain
ways these elements are analogous to the training and test-set structure more
familiar to statisticians, but they frequently more explicitly incorporate time into
the analysis. The in-sample/out-of-sample (IS/OS) strategy works in the following
manner: Any time unit for which the weight sets (the a vectors defined in eq. (6.9)) are
estimated is considered an in-sample element, and the process of building a weight
set is called rebalancing. The reader should note that construction of a weight set
has two consequences. Firstly, the non-zero weights (the positive weights as per the
18
Back testing is a procedure commonly used in financial model testing wherein a strategy developed at the present time, such as the portfolio formation/optimization procedure being explored here, is tested by simulation using historical data, as if it had been applied at that time. Data and decisions going into the model test are restricted to those known at the investment horizon.
constraints) define which candidate securities are actually in the portfolio. The
securities that make the portfolio are called the constituents. Collectively, the
constituent list or the portfolio differentiates this set from the candidate pool.
Secondly, the weights define how much (what proportion of the portfolio) is
invested in each security. So, for example, if the total value of the portfolio at time
t is given by Mt and under the present asset allocation (weighting) scheme,
security j in the portfolio has a weight of a j , then a j Mt is the value security j
holds in the portfolio.
Any time unit in which the weights are applied to the returns realized during that time unit, but whose returns are not used to estimate the weights, is considered an out-of-sample element or test element. The manner in which the
weights are applied to an OS element is to multiply each weight times its respective
daily return and compound the daily returns. Of course, out-of-sample elements can
become in-sample elements. In fact, this is precisely the analysis approach adopted
for back testing the strategies developed herein and described below. The analysis
strategy used would be some variant on the following theme: Develop a set of portfolios (and weights) for each uncertainty model for year t and then apply the weights unchanged to the returns for year t+1, yielding the performance for each portfolio at the end of year t+1. Repeat this process by developing a new set of portfolios and weighting schemes based upon returns from year t+1 and then applying them to year t+2.
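The walk-forward application of the weights can be sketched as a simple compounding loop; `weights_by_year` and `daily_ret` are hypothetical containers for the rebalance-year weight vectors and the following year's daily returns, and the snippet is illustrative only.

```r
vami <- 1                                   # value of $1 invested (VAMI)
for (t in seq_along(weights_by_year)) {
  w    <- weights_by_year[[t]]              # weights estimated in year t (in-sample)
  R    <- daily_ret[[t]]                    # daily returns realized in year t+1 (out-of-sample)
  pr   <- as.vector(R %*% w)                # daily portfolio returns
  vami <- vami * prod(1 + pr)               # compound through year t+1
}
vami
```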
This strategy of alternating building the portfolio and estimating the performance
has the following two important aspects: Firstly, it mimics a commonly
recommended strategy involving investing and reevaluating the investment
portfolio. This approach is the opposite of investment timing19, a questionable
strategy not recommended for the vast majority of investors, who are of the “invest and
forget” mindset. The strategy also mimics the real-life practice of rebalancing a
portfolio, namely changing the securities mixture (in this case equities) to achieve
good performance in the marketplace in the short-term until the next rebalance. It
makes the assumption that in the short-term, “What is past is prologue,”
(Shakespeare [2004]); in other words what has happened in the recent term will
continue to happen in the near short term. Secondly, this approach as practiced
places investors in the position of experiencing precisely what would have
happened had they invested at that time in the manner described. No information
available at a time later than the investment time horizon was used in constructing
the portfolios.
19
A strategy for investing which tries to create and use signals regarding the price movements of
markets and specific securities typically over the short term (sometimes as short as intraday) and to
trade based upon those signals. This is as opposed to investing for the long term and historic long
term market gains.
6.5.1 Construction Of Efficient Frontiers
As the reader may recall from Chapter 1, Section 1.4.1, the efficient frontier—a
concept initially developed by Markowitz (1952)—may be defined in one of two
ways:
1. Select a set of volatilities (risks) that are positive and strictly monotone
increasing. For each of the risks identify the portfolio with the highest
expected reward (return) among all portfolios possessing that level of
volatility.
2. Select a set of expected rewards (returns) that is strictly monotone
increasing and for each of the levels of reward identify the portfolio with
the lowest risk among all portfolios possessing that level of expected
reward.
The two definitions are equivalent and the set of portfolios satisfying each
definition forms the efficient frontier, which is typically depicted graphically (for
example, see Figure 1.1) in two-dimensional space with volatility/risk on the
abscissa and reward/performance on the ordinate20. The set of portfolios lying
along the efficient frontier is also known as the optimal portfolios, inasmuch as in
20
It should be apparent to the reader that since the form of the objective is quadratic in the weights
for which the investigator is seeking a solution, two solutions may possibly exist. In such instances
the efficient frontier consists of the solution that possesses the maximum reward of the two
solutions.
the absence of borrowing money and taking a short position21 for a set of securities
in the portfolio, the efficient frontier divides the space of feasible solutions
(portfolios) from the space of infeasible solutions (portfolios). The latter is the
space in which the combination of risk and reward does not exist in any portfolio.
To create the efficient frontier the R functions from Wuertz (2009) were modified.
The efficient frontier was created by defining a range of the minimum and
maximum returns and an increment, such that 600 portfolios were computed. This is a point which bears highlighting: an efficient frontier represents an infinite number of portfolios (obviously finite in estimation) formed from the same set of securities. The difference from portfolio to portfolio is the weighting of the securities, that is, the proportions of each security forming a specific portfolio.
In further processing, the code removed any portfolio representing second solutions
(see footnote on previous page) and any portfolio near the origin affecting the
convexity of the efficient frontier. This reduced the number of portfolios from 600
to a number no less than 518 in all cases. Figures 6.3 through 6.5 provide examples
21
A short position trade is one in which the seller sells a security s/he does not own. In this trade
the seller sells to the buyer a quantity of a specific security, at a stated price to be delivered on a
specific date in the future. To meet regulatory requirements the seller must borrow and hold the
designated quantity of the security. The seller makes money if the security falls in value before the
delivery date, because the buyer will pay a higher than market price for the security. The seller
delivers the borrowed securities and with the proceeds goes into the market and purchases
replacement securities to satisfy the lending party, then pockets the difference as profit. Of course
the seller loses money if the price of the security goes up.
of the efficient frontiers over various years and covariance structures. The first
image in each set is the raw solution to quadratic optimization. The second image,
labeled “Upper Quad Optim,” is a plot of the quadratic optimization solution with
second solutions removed. In the third plot any “starting” solutions that do not form
part of the convex solution was removed. The straight line in the third plot is an
estimate of the capital-asset-pricing model (defined Chapter 1, Section 1.4.1) for a
risk-free return rate of 0.0. The point of tangency between the straight line and the
convex efficient frontier is also known as the Sharpe portfolio (Sharpe [1966]),
defined as
$$\max_{i \in EF}\Big( \big(E[\mathrm{Reward}_i] - \mathrm{RiskFree}\big) / \mathrm{UnitRisk}_i \Big) \;=\; \max_{i \in EF}\big( (\mu_{r_i} - r_{RF}) / \sigma_i \big)$$
Figure 6.3 Graphs Depicting The Results In The Formation Of Efficient Frontiers For The Sigma Covariance Structure, Rebalance Year 2002. The Line Represents The Capital Asset Pricing Model (CAPM); The Red X, The Point Of Tangency Between The CAPM Line And The Efficient Frontier, Is The Sharpe Portfolio. Other Data Provided Include The Number Of Portfolios Forming The Efficient Frontier, The Risk-Free Rate Of Return, And An Indication If The Covariance Matrix Was Factored.
[Panels: "Quad Optim for 2002, Risk==Sigma" (600 portfolios), "Upper Quad Optim for 2002, Risk==Sigma" (518 portfolios), and "Convx Frntr for 2002, Risk==Sigma" (518 portfolios); each plots Mean against Standard Deviation, Factorized = FALSE, Risk Free Rate = 0.]
Figure 6.4 Graphs Depicting The Results In The Formation Of Efficient Frontiers For The Sigma, 26.071 Weeks Covariance Structure, Rebalance Year 2004. The Line Represents The Capital Asset Pricing Model (CAPM); The Red X, The Point Of Tangency Between The CAPM Line And The Efficient Frontier, Is The Sharpe Portfolio. Other Data Provided Include The Number Of Portfolios Forming The Efficient Frontier, The Risk-Free Rate Of Return, And An Indication If The Covariance Matrix Was Factored.
[Panels: "Quad Optim for 2004, Risk==Sigma, 26.071 Wks" (600 portfolios), "Upper Quad Optim for 2004, Risk==Sigma, 26.071 Wks" (589 portfolios), and "Convx Frntr for 2004, Risk==Sigma, 26.071 Wks" (589 portfolios); each plots Mean against Standard Deviation, Factorized = FALSE, Risk Free Rate = 0.]
Figure 6.5 Graphs Depicting The Results In The Formation Of Efficient Frontiers For The Sigma, 156.42 Weeks Covariance Structure, Rebalance Year 2006. The Line Represents The Capital Asset Pricing Model (CAPM); The Red X, The Point Of Tangency Between The CAPM Line And The Efficient Frontier, Is The Sharpe Portfolio. Other Data Provided Include The Number Of Portfolios Forming The Efficient Frontier, The Risk-Free Rate Of Return, And An Indication If The Covariance Matrix Was Factored.
[Panels: "Quad Optim for 2006, Risk==Sigma, 156.42 Wks" (600 portfolios), "Upper Quad Optim for 2006, Risk==Sigma, 156.42 Wks" (548 portfolios), and "Convx Frntr for 2006, Risk==Sigma, 156.42 Wks" (548 portfolios); each plots Mean against Standard Deviation, Factorized = FALSE, Risk Free Rate = 0.]
where: the argument of the maximum function is known as the Sharpe ratio.
$\mu_{r_i}$ is the expected value of return from portfolio $i$.
$EF$ is the efficient frontier.
$r_{RF}$ is the risk-free rate of return, set to zero in this analysis.
$\sigma_i$ is the risk for portfolio $i$.
So, the Sharpe portfolio is the portfolio that provides the greatest return per unit of risk.
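Selecting the Sharpe portfolio from an estimated frontier amounts to maximizing this ratio; in R, with hypothetical vectors `frontier_mu` and `frontier_sd` of portfolio rewards and risks and a zero risk-free rate, the step is:

```r
sharpe      <- (frontier_mu - 0) / frontier_sd   # Sharpe ratio for each frontier portfolio
sharpe_port <- which.max(sharpe)                 # index of the Sharpe portfolio
```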
6.5.2 Applying The MVO Weights
The MVO weights generated as described above were applied. Figure 6.6 depicts
an invest-and-hold strategy. Using the same candidate pool, Sharpe portfolios for a
2001 rebalance were identified for each of the covariance structures. Each of the
2001 Sharpe portfolios were held over the entire investment horizon of this
research. Using this invest-and-hold strategy for the portfolios, the compounded the
portfolio returns over the seven-year investment horizon was computed. All of the
covariance structures with return periods greater than or equal to 26.071
weeks ≈ 0.5 year yielded investments better than or equal to the base covariance
structure (Sigma). On the other hand, all of the covariance structures representing
return periods less than 0.5 year did worse than the base covariance structure.
To look at the entire sweep of the portfolios (all the portfolios on an efficient frontier, including the Sharpe portfolio), the researcher computed for each covariance structure the average return over the entire set of portfolios for each of the out-of-sample years using the invest-and-hold strategy and then “adjusted” (divided) the average by the standard deviation. The result is plotted in Figure 6.7.
In this plot the greater the value on the ordinate the greater reward per unit of risk.
Once again, the adjusted means for portfolios with return levels ≥ 0.5 year exceed
the base covariance model adjusted means by a large amount. Over time, as
expected without an interim rebalance, the advantage of the large return means
declines.
While investing and holding is an investment strategy recommended by some
professionals, more professionals recommend periodically rebalancing or
reviewing a financial portfolio. A year is a common period between rebalances.
Figure 6.8 depicts the results of investing in an alternative strategy to invest and
hold. For this strategy the Sharpe portfolio was “reinvested” each year in that
Sharpe portfolio which yielded the best reward over all of the covariance structures,
the worst reward over all of the covariance structures and in the Sharpe portfolio
built upon the base covariance structure. The results are considerably different. The
best reinvestment strategy produced a return of just under $35 for every dollar invested ($34.70); the worst strategy produced just under $9 for every dollar invested; and the base covariance structure produced just over $12 ($12.40) for every dollar invested. Of course, none of these returns was adjusted for transaction costs. But there is a clear advantage to choosing the best strategy. Table 6.3 shows the best strategy for each year; here, the best strategy means the covariance structure which produced the highest return for the Sharpe portfolio.
Table 6.3 Rebalance strategy yielding the highest returns under the set of covariance structures described in the text, for an annual rebalance over the years indicated.

Year   Best Covariance Structure
2002   Sigma, 156.42 Wks
2003   Sigma, 312.85 Wks
2004   Sigma, 26.071 Wks
2005   Sigma, 156.42 Wks
2006   Sigma
2007   Sigma, 156.42 Wks
Figure 6.6 Cumulative Return Plot (VAMI) Over The Years Indicated For An Invest-And-Hold Strategy Under The Set Of Covariance Structures Described In The Text.
[Out-of-sample years 2002-2007 on the abscissa, cumulative return on the ordinate; legend series 1-8 correspond to Sigma; Sigma, 1.0526 Wks; Sigma, 2.0 Wks; Sigma, 4.0 Wks; Sigma, 26.071 Wks; Sigma, 52.142 Wks; Sigma, 156.42 Wks; and Sigma, 312.85 Wks.]
Figure 6.7 Returns Normalized By Standard Deviation For An Invest-And-Hold Strategy Under The Set Of Covariance Structures Described In The Text. (Normalized returns of the 2001-rebalance portfolios by out-of-sample year, 2002-2007; same covariance-structure legend as Figure 6.6.)
Figure 6.8 Cumulative Return Plot (VAMI) Over The Years Indicated For An Annual-Rebalance Strategy For Maximum, Minimum, And Base Covariance Structures For Risk. (Cumulative return (VAMI) of the Sharpe portfolio by first out-of-sample year, 2002-2007; legend: Maximum 1 Year Rtn; Sigma Only 1 Year Rtn; Minimum 1 Year Rtn.)
The observation to notice from Table 6.3 is that the majority of the best
reinvest strategies come, as in the other graphics, from the covariance
structures with return periods of 0.5 year or greater.
The obvious rub in the material just discussed is whether there are any signals which
can be used before the fact to suggest which covariance structure to use. In the next
section one such signal is explored.
6.6 Predicting The Best Covariance Structure
A long-observed result has been the high negative correlation between the returns
of the S&P 500 Index and the VIX (le Roux [2007]), where the VIX is the Chicago
Board Options Exchange’s (CBOE) volatility index, an estimate of implied volatility
30 days in advance. Figure 6.9 depicts a plot of daily returns from the VIX
versus daily returns from the S&P 500 for the period from January 3, 2000, to
December 31, 2007. The correlation is -0.75.
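The correlation itself is a straightforward computation; the sketch below is illustrative only, with simulated level series standing in for the actual daily VIX and S&P 500 closes:

```python
import numpy as np

def daily_log_returns(levels):
    """Daily log returns of a level series (e.g., the VIX or the S&P 500 Index)."""
    return np.diff(np.log(np.asarray(levels, dtype=float)))

# Simulated stand-ins for the daily closes, January 2000 through December 2007.
rng = np.random.default_rng(2)
sp500 = 1200.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 2000)))
vix = 20.0 * np.exp(np.cumsum(rng.normal(0.0, 0.05, 2000)))

corr = np.corrcoef(daily_log_returns(sp500), daily_log_returns(vix))[0, 1]
print(corr)  # the actual series over 2000-2007 give a correlation of roughly -0.75
```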
Figure 6.9 Plot Of Daily Returns From S&P 500 Versus Similar Measure Of The VIX For The Period From 2000-2007. (Abscissa: S&P 500 daily return; ordinate: log of the return on the VIX.)
The VIX is computed using the implied volatilities of a wide range of S&P 500
Index puts and calls. The VIX, a widely observed measure of market risk, is meant
to be forward looking. The VIX is often referred to as the "investor fear gauge"
(Investopedia [2009]).
Correlations between the daily VIX and out-of-sample daily returns for each of the
Sharpe portfolios developed from each of the covariance structures were looked at;
these results are given in Table 6.4. The following observations are evident: Firstly,
the VIX is built upon S&P 500 options and therefore is likely to be much more
sensitive to the behavior of large-cap (capitalization) and mega-cap securities. The
portfolios generated by the investigator came from a candidate pool that included
micro-, small-, mid-, large-, and mega-cap firms. The entries in Table 6.4 are
correlations; the bolded and underscored values are the best periodic rebalance
strategies. Despite the basics of VIX construction, there appears to be a good
alignment between the maximum negative value and the best strategy, especially if
the observer considers just covariance structures with return periods greater than or
equal to 0.5 year, i.e. the covariance structures based on return periods of greater
than or equal to approximately 26 weeks. In Table 6.4 these are the covariance
structures in the lower half of the table.
Table 6.4 Correlation between returns on VIX and returns on Sharpe portfolios
for the years indicated. Bolded and underscored values in columns
represent the covariance structure providing the best returns for an
annual rebalance strategy.
To Year               2002       2003       2004       2005       2006       2007
Sigma               -0.1797    -0.1817    -0.1358    -0.0193     0.0019    -0.1025
Sigma, 1.0526 Wks   -0.2300    -0.1927    -0.4042    -0.0270     0.0037    -0.1137
Sigma, 2 Wks        -0.2270    -0.1528    -0.4008    -0.1476     0.0149    -0.1689
Sigma, 4 Wks        -0.4095    -0.1472    -0.3946    -0.1790     0.0212    -0.1781
Sigma, 26.071 Wks   -0.1904    -0.2560    -0.3911    -0.1629     0.0231    -0.1651
Sigma, 52.142 Wks   -0.1985    -0.2502    -0.4091    -0.1712     0.0222    -0.1495
Sigma, 156.42 Wks   -0.2221    -0.2341    -0.1003    -0.1962     0.0198    -0.1650
Sigma, 312.85 Wks   -0.2155    -0.2589    -0.4105    -0.2088     0.0202    -0.1002
While this appears to be the beginning of a signal, it is still contemporaneous with
the returns. Therefore, an open question is whether there exists a signal available in
advance of the investment decision that an investor can observe at the time the
annual rebalance decision is to be made. The VIX returns from the preceding year
were examined as a signal to guide selection of a covariance structure in the
following year. The results were mixed, with about half of the investment decisions
(counting by year) suggested by the signal being the correct one or the second best
choice; the other half was not as predictive. This is not totally a surprise—the VIX
is advertised as an about-a-month-look-ahead estimate of implied volatility. It
would be a bit surprising if it behaved as a year-ahead signal. However, the results
observed lead to the belief that the search for such an “early-warning” signal is a
worthwhile effort and is likely to bear fruit. Research threads aimed at further
exploration for this look-ahead signal are discussed in the section on future
research (see Chapter 7, Section 7.5). Regardless of whether the research continues
to examine the utility of the VIX or pursues another measure as a substitute for or
for use in combination with the VIX, the investigator would:
1. Look for an appropriate weighting scheme for the annual VIX series, giving
more weight to the most current estimate of the VIX (a minimal sketch of such a weighting follows this list).
2. Reconsider the rebalance period to be more frequent (for example,
quarterly) and use the signal series of appropriate length.
3. Pursue some combination of Approaches 1 and 2.
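As an illustration of the first approach, the sketch below applies an exponentially decaying weight to a trailing year of weekly VIX observations; the half-life and the simulated series are hypothetical choices for the illustration, not values taken from this research:

```python
import numpy as np

def weighted_vix_signal(weekly_vix, half_life_weeks=13.0):
    """Exponentially weighted average of a trailing window of weekly VIX levels,
    giving more weight to the most recent observations."""
    vix = np.asarray(weekly_vix, dtype=float)
    ages = np.arange(len(vix))[::-1]              # 0 for the most recent week
    weights = 0.5 ** (ages / half_life_weeks)     # weights halve every half_life_weeks
    return np.sum(weights * vix) / np.sum(weights)

# Hypothetical trailing 52-week VIX series observed at the rebalance date.
rng = np.random.default_rng(3)
trailing_vix = 20.0 + 0.1 * np.cumsum(rng.normal(0.0, 3.0, 52))
print(weighted_vix_signal(trailing_vix))
```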
An interesting question is why there is poorer behavior of the covariance structures with
return periods of less than 0.5 year. A simply stated hypothesis is that the nugget
covariances for the return period are adding no additional information over that
provided by the base covariance in terms of describing the risk in the portfolio.
There is some support for this interpretation. The standard deviations of the base
covariance are very similar to those nuggets out to about four weeks. So, in fact,
adding this shorter-return-period covariance is somehow “misdirecting” or
“subtracting” from the ability of the base covariance to describe risk.
In this Chapter (Chapter 6, Section 6.4) the researcher has provided validation for a
model with a potentially wide set of applications, including the ability to estimate
the joint probability of any given set of equities for an arbitrary set of return periods
(return values). The Chapter (Chapter 6, Section 6.5) concluded with a specific
application of elements computed for a model of the common and important
problem of portfolio formation. It was shown that, in a context realistically
mimicking an investing environment, material outperformance of a commonly used
model could be had by an investor with the introduction of the covariance structure
associated with longer return periods. This is true for investors with hold/forget
strategies as well as periodic-rebalance strategies. Finally, an initial exploration was
undertaken looking for a signal to predict which covariance structure an investor
should select to get the most performance out of a periodic-rebalance strategy.
In the next Chapter (Chapter 7) the researcher summarizes the research presented
herein, the unique aspects of the dissertation, and future research that could offer
very ample opportunities for understanding portfolio risk/return.
7. Summary And Conclusions, Along With Thoughts On The Current Research
7.1 Overview Of The Chapter
In this, the final Chapter of the dissertation, there are four sections which tie together
the research, speak to its unique and innovative aspects, and lay out some
extensions to the research performed as well as some additional, new research
threads.
The Chapter (Chapter 7, Section 7.2) commences with a summary of the research
performed. This is followed by a section (Chapter 7, Section 7.3) which takes the
summaries and articulates them in terms of the conclusions which they yielded.
Behind the summary and conclusions and perhaps leading to them is a set of unique
and novel features which underpin the research (Chapter 7, Section 7.4). These are
enumerated in the third section of the Chapter. The final section (Chapter 7,
Section 7.5) brings together and organizes the next steps which will extend the
presented research into areas which the researcher believes are promising.
7.2 Summary
It is clear that the characterization of risk is important in the design of financial
investment strategies. A host of earlier investigators has shown that returns on
equities are not normally distributed and that this departure from normality can
mean that estimates of tail-risk (the probability density in the distributions’ tails)
are greater—potentially far greater—than would be computed under the normal-distribution assumption. Many researchers have in fact concluded that the return on
equities is in general leptokurtic (too much density in the center and in the tails and
too little in the shoulders) and negatively skewed. Therefore, commonly occurring
levels of returns or returns in the center of the distribution may appear Gaussian. It
is the returns in the tails that are of most concern to investors, and in fact
understanding tail behavior is perhaps the most important (legal) financial
intelligence component relating to whether money will be made or lost. Sadly, an
understanding of tail behavior is not widespread among investors (especially in the
U.S.), and—to be fair—there are still a lot of unknowns. The purpose of the
research in this paper is to study at least some of these unknowns in a new and
novel fashion.
The starting perspective for this research—based on a host of other researchers’
prior analyses—was that better characterizations of risk do matter in terms of
improving returns and that the characterization of risk that produces consistently
better returns is an important goal. The research focused upon worldwide equity
performance because of the desire to make general statements about
performance/risk over a broad range of geographies and types of businesses. In so
doing, the research looked at a very important set of securities and a large asset
class—in fact, a fundamental asset class with direct links to the business of
investing (equities are not complicated by all the overlying factors that
impact derivatives or collective investments).
For the initial research it was decided not to include other fundamental asset classes
such as fixed income or bonds, since they could create unwarranted complications
for the research. Daily performance data for 76,424 common stocks from 95 stock
exchanges, covering the period January 2000 to August 2007, were gathered (see
Chapter 3, Section 3.3). Along with the performance data a set of ancillary data
(some value statistics) for firms and exchanges was also collected. Finally, a
substantial number of economic and market time series were gathered. These
covered the same timeframe as the performance data and, taken together,
represented a global view of the behavior of markets and economies. The equity
performance series (in particular, the return series) were filtered for completeness
of series, length of series, dead equities, and possession of liquidity. This latter
property was determined with an approach developed by the researcher that is now
being used by a major financial data reporting/dissemination organization. The
number of performance series meeting the filtering criteria was 12,185. Of these,
3,000 were sampled, using a stratified random sample (with proportions based on the
12,185 equities) to mirror the composition of the larger set.
As indicated earlier, the research started from the perspective that it is worthwhile
to examine the behavior of the tails. While there are results (cited earlier in the
document) that take a proposed distribution for returns and examine the tails or
the maxima (minima), this research examined the distribution of tail values directly,
using an area of active research devoted to such studies—extreme-value theory.
The challenge, however, was not the examination of tail behavior as a set of
univariate series but the combination of the univariate distributions into a multivariate
series, since there does not appear to exist a method for combining the univariate
series that is reliable, applicable across a range of data domains, and supported by
good diagnostic techniques that can be readily explained to business (nontechnical)
managers. (Copulas have over the past decade been the mechanism of choice for
quite a few analysts. However, this mechanism quite often does not meet the
criteria just laid out.) So, another means of forming a joint distribution was
proposed. This involved converting the problem from a joint distribution of
generalized extreme-values (GEVs) into a model of the distribution of return values
estimated from the GEVs. (This model may be defined as a Gaussian process, which
has the advantage that the return values, after a log transform, follow a multivariate
Gaussian (lognormal) distribution with a mean dependent upon the ancillary data
characteristics of the equities and a variance that is a function of the estimated GEV
parameters.)
In setting up the analysis the researcher, based upon several preliminary studies,
decided to use a block maximum method (with a block being a trading week) to
create a series of extreme minima for each of the equities. In an earlier experiment
which examined block maxima for one month and six month blocks as well as
peaks-over-threshold (POT) mechanisms, it was found that the estimated
parameters for the block maximum model yielded cleaner and more rapid
convergence than POTs or Poisson-Generalized Pareto Distributions (P-GPDs).
Further examination of block maxima-based models showed that the parameters of
distributions of blocks of different lengths are functionally related; a one-week
block was selected because it yields the maximum number of observations
with which to estimate parameters and provides an easier interpretation from
a domain perspective. For ease of computation the analysis was changed from an
analysis of extreme minima to an analysis of extreme maxima by multiplying the
series by -1.
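A minimal sketch of this preprocessing step (illustrative only; the daily return series is a simulated placeholder) is given below: the daily returns are blocked by trading week, the block minimum is taken, and the sign is flipped so that block-maximum machinery applies.

```python
import numpy as np
import pandas as pd

def negated_weekly_minima(daily_returns):
    """Weekly block minima of a daily return series, multiplied by -1 so that the
    resulting series can be treated with standard block-maximum (GEV) methods.

    daily_returns : pandas Series of daily returns indexed by trading date.
    """
    weekly_min = daily_returns.resample("W").min()   # block minimum per calendar week
    return -weekly_min.dropna()

# Hypothetical daily return series for one equity.
dates = pd.bdate_range("2000-01-03", periods=500)
rng = np.random.default_rng(4)
returns = pd.Series(rng.normal(0.0005, 0.02, len(dates)), index=dates)
print(negated_weekly_minima(returns).head())
```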
From the review of GEV models, those parameterized by only one value for each
of the distribution parameters (μ, σ, and ξ) are labeled as static or non-time-varying
GEVs. Given the nature of the subject under study, there was ample reason to
believe that a time-varying set of parameters might be a better fit to the data. To
this end, a model was adopted wherein the GEV parameters were estimated as
linear functions of covariates. The covariates examined included simple linear
trends (time’s arrow), constants, periodic (trigonometric) functions, and an
algorithmically selected subset (44 series) of the economic and market time series
alluded to earlier in the summary. The trigonometric functions proved to yield no
significant coefficients. The constant-value estimates for the parameters were
significant in the overwhelming percentage of samples of the maxima series. Using
a constant only was equivalent to a static model, so a constant was assumed to be
an element of the linear covariate function for all samples. This strategy had the added
advantage of creating an embedded model structure by looking at the efficiency of
models containing economic and financial covariates. Examining the utility of
these economic and financial covariates proved far more exhausting. Uncertainty
about the manner in which each covariate was to enter the model led to the application
of time lags and aggregation factors to covariates, so each covariate resulted in 9
values for each week, or 396 (9*44) in all. Some preliminary examinations allowed
the number to be cut down to 264. A linear relationship between covariates and
parameters was assumed, and parameter estimation was set within a maximum-likelihood estimator (MLE) framework.
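The sketch below shows the shape of such a fit for a single equity, with the location parameter written as a linear function of one covariate and the scale and shape held constant. It is a simplified illustration of the general setup (the research allowed covariates in all three parameters), and the data and starting values are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def gev_negloglik(params, x, covariate):
    """Negative log-likelihood of a GEV whose location varies with a covariate:
    mu_t = b0 + b1 * covariate_t, with constant scale (sigma) and shape (xi)."""
    b0, b1, log_sigma, xi = params
    mu = b0 + b1 * covariate
    sigma = np.exp(log_sigma)                   # keep the scale positive
    z = (x - mu) / sigma
    t = 1.0 + xi * z
    if np.any(t <= 0.0):                        # observation outside the GEV support
        return np.inf
    return np.sum(np.log(sigma) + (1.0 + 1.0 / xi) * np.log(t) + t ** (-1.0 / xi))

# Hypothetical negated weekly block minima and one standardized covariate.
rng = np.random.default_rng(5)
covariate = rng.normal(size=400)
x = 0.02 + 0.005 * covariate + 0.01 * rng.gumbel(size=400)

start = np.array([0.02, 0.0, np.log(0.01), 0.1])
fit = minimize(gev_negloglik, start, args=(x, covariate), method="Nelder-Mead")
print(fit.x)   # estimates of b0, b1, log(sigma), xi
```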
A (forward and backward) step-wise model was used to check over the 266
covariates for each of the three parameters (798 potential candidate covariates). A
combination of Nelder/Mead and Broyden-Fletcher-Goldfarb-Shanno (BFGS)
methods was used to search over the parameter space. Stopping rules were provided by
examination of the Bayesian Information Criterion (BIC) and likelihood-ratio tests.
Models were estimated for the 3,000 extreme series, using 90 processors for a
period of 10 days. The result was a linear model for each of the parameters for each
sample of block maxima datasets. The average number of covariates found in the
linear models was just under seven. A number of patterns were noted in the results
(not examined further in this research, since they did not bear directly upon the
major dissertation propositions, but they do represent a research thread to examine
in the future).
Having obtained estimates of the time-varying parameters for the GEVs, these
estimates were used, along with a domain-meaningful set of return periods, to obtain
return values, using a time-varying return level computation. So, for each of the
predetermined return periods (articulated in weeks), for each of the analysis
years from 2001 to 2007, and for each security in the training set, a time-varying return
value z_ij was computed, defined such that

P[ Z_i > z_ij ] = 1 / N_j                                                  (7.1)

where Z_i is the return variable describing the maxima returns for equity i,
following the time-varying EVD described above; z_ij is the arbitrary but fixed
return value for equity i and return period j; and N_j is the return period in weeks.
The return value was arrived at by using a search algorithm as described earlier.
Finally, these return value results, along with other functions of the estimated time-varying parameters and assumptions about the behavior of the solution space in the
neighborhood of the selected solution, were used to compute the measurement or
nugget variance for each of the equities for each year by return period.
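For a fixed set of GEV parameters the return value has a closed form (Coles [2001]); the sketch below evaluates it for a given return period. In the time-varying case treated in this research the parameters change from week to week, which is why a search procedure was used there; the parameter values below are hypothetical.

```python
import numpy as np

def gev_return_value(mu, sigma, xi, return_period_weeks):
    """Return value z with exceedance probability 1/N under GEV(mu, sigma, xi),
    i.e. P[Z > z] = 1/N as in equation (7.1); mu, sigma, xi may be arrays."""
    n = np.asarray(return_period_weeks, dtype=float)
    y = -np.log(1.0 - 1.0 / n)                       # y = -log(1 - p) with p = 1/N
    xi = np.asarray(xi, dtype=float)
    gumbel_limit = mu - sigma * np.log(y)            # limiting form as xi -> 0
    general = mu + (sigma / np.where(xi == 0.0, 1.0, xi)) * (y ** (-xi) - 1.0)
    return np.where(np.isclose(xi, 0.0), gumbel_limit, general)

# A 26.071-week (about half-year) return value for hypothetical parameter values.
print(gev_return_value(mu=0.03, sigma=0.012, xi=0.15, return_period_weeks=26.071))
```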
Using these results and the ancillary data set, a multivariate model to describe the
behavior of any given set of return levels was defined. The model was adapted into
the form of a Gaussian process. The mean of the process comprised fixed effects
formed by market capitalization, stock exchange, and market sector. These fixed
effects were selected as the result of a step-wise (forward and backward) modeling
analysis performed over the entire set of ancillary variables, both as main effects
and with some interaction. BIC was used as the test statistic and selection criterion.
Two random effects were identified and incorporated in the form of covariance
structures (assumed to be independent of one another): a covariance structure
based upon market capitalization and time, and the nugget variance. The
model diagnostics run on “studentized” residuals showed no discernible deficiency
in the model against a normal-distribution hypothesis. The model was also applied
to a holdout sample of 100 equities. The coefficients generated by the training set
were those used in the test-set model. In other words, the coefficients were not re-estimated for the test-set model. Once again, the studentized residuals failed to
reject either the normal-distribution hypothesis or the hypothesis that the
distributions of the training and test sets were identical.
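A minimal sketch of the form of this model is given below: the log return values are treated as multivariate normal with a mean from fixed effects and a covariance built from a random-effect block plus a diagonal nugget. The design matrix, grouping, and variance values are hypothetical placeholders, not the fitted model from this research.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(6)
n = 6                                                       # a handful of equities

# Fixed effects: an intercept plus one dummy ancillary factor (placeholder design).
X = np.column_stack([np.ones(n), rng.integers(0, 2, n).astype(float)])
beta = np.array([-3.0, 0.4])                                # hypothetical coefficients
mean = X @ beta

# Random effect: a shared component within (say) a market-capitalization group,
# plus a diagonal nugget (per-equity measurement variance from the GEV fits).
group = rng.integers(0, 2, n)
K = 0.05 * (group[:, None] == group[None, :])               # block covariance
nugget = np.diag(rng.uniform(0.01, 0.03, n))
cov = K + nugget

# Simulate log return values from the model and evaluate their log-likelihood.
log_return_values = rng.multivariate_normal(mean, cov)
print(multivariate_normal(mean, cov).logpdf(log_return_values))
```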
This model provides an alternative means of describing the joint uncertainty,
particularly in the downside tails, of equity return distributions. It is believed the
model and its components form the basis for a number of lines of inquiry and
applications. Future research to which this model might be applied is discussed in a
later section of this Chapter (Chapter 7, Section 7.5). This dissertation concludes
with an examination of portfolio construction and what, if anything, these results
add to the performance of this activity.
Construction of portfolios is an important activity in the financial domain; it may
even be argued that constructing portfolios and trading strategies is central to the
financial investment function. The methodologies for constructing portfolios are
abundant; however, in this research it was examined whether the connected concepts
of time-varying GEVs, return values, and the nugget variation can add something
to the description of the tail variation and, in turn, to the definition of financial
portfolios. To this end the nugget variances
of a series of financially meaningful return periods for approximately 3,000
equities for each year from 2001-2007 were computed. These nugget covariances
were combined with the standard or base covariance. In this research that
combination was a rather naïve addition operation that certainly could be expanded
upon, as will be described in the section covering further research. Nonetheless, the
set of covariances was used to form portfolios by means of mean-variance
optimization (MVO)—long considered the gold standard of optimization for this
application.
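The sketch below shows the essential computation under the simple additive combination used here: the nugget covariance is added to the base covariance, and an unconstrained maximum-Sharpe (tangency) portfolio is formed from the closed form w ∝ Σ⁻¹(μ − r_f). The actual MVO runs imposed constraints not reproduced in this illustration, and all inputs below are hypothetical.

```python
import numpy as np

def sharpe_portfolio_weights(expected_returns, base_cov, nugget_cov, risk_free=0.0):
    """Unconstrained tangency (maximum-Sharpe) weights under the combined risk
    description base_cov + nugget_cov (the naive additive combination)."""
    cov = base_cov + nugget_cov
    raw = np.linalg.solve(cov, expected_returns - risk_free)
    return raw / raw.sum()                     # scale the weights to sum to one

# Hypothetical inputs for four equities.
rng = np.random.default_rng(7)
mu = np.array([0.08, 0.06, 0.10, 0.07])
A = rng.normal(size=(4, 4))
base = A @ A.T / 10.0 + 0.02 * np.eye(4)       # a positive-definite base covariance
nugget = np.diag([0.010, 0.020, 0.015, 0.010]) # return-period nugget variances
print(sharpe_portfolio_weights(mu, base, nugget, risk_free=0.03))
```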
Upward of 600 portfolios for each combination of year- and return period-based
covariance were formed on the same set of equities, which were reduced in number
using a variant of the high-grading methodology described earlier (see Chapter 3,
Section 3.7.1). For purposes of comparison MVOs also were run using the base
covariance structure. The output of the analysis was an example of using well-known financial management strategies: invest and hold—wherein an investor
invests in a portfolio and leaves the portfolio untouched over a long investment
horizon, and annual rebalancing—wherein the investor reexamines and modifies
the portfolio on an annual basis. An “in-sample/out-of-sample” analysis
construction was utilized, so that the overall analysis was equivalent to an actual
investment strategy and behavior. At each timeframe the computations used no
information that would have been considered as occurring in advance of the
timeframe. For each of these analysis methods the Sharpe portfolio was used for
the comparisons. The Sharpe portfolio is well-defined, is a commonly used
benchmark, and provides a well-known point of reference. With respect to the
invest-and-hold approach, all the longer-return-period strategies
outperformed the base strategy. The annual-rebalancing strategy generated even
greater outperformance for the same return period models. The annual-rebalancing
strategy required a look-ahead element (utilization of a forward-looking market
covariate) to be fully effective. In this case some success was realized using the
measure of volatility commonly known as the VIX. (This is an area the researcher
believes will yield clearly improved results with further research.)
7.3 Conclusions
Many of the conclusions arising from this research flow directly from the
Summary. The research has established a platform for inquiry into and further
characterization of the uncertainty of financial returns. This platform is built upon
the previously unused timbers of time-varying GEV distributions and the related
species of time-varying return values. The platform provides a means for
incorporating into the modeling process a substantial portion of the variability and
value of globally recognized financial measures. In this regard the GEV parameters
tend to be a function of time-leading global equity indices, followed by well-observed longer-term sovereign and investment-grade corporate/nongovernment
debt interest rates. The return values generated from the time-varying GEVs are
able to be modeled jointly as a Gaussian Process using fixed and random elements
based on industrial sector, exchange, and market capitalization. These results from
a finance perspective could be interpreted as a new factor model that conforms to
the results of earlier investigators, while adding new dimensions in terms of the
specific forms and the manner in which the model is structured. The good-quality
diagnostics generated from this model suggest the model should be investigated as
an alternative to using copulas to understand joint equity behavior, particularly with
respect to tail uncertainty of returns in general and the joint probability of extreme
returns in particular. Finally, the platform has provided a launching pad for a new
means of describing return uncertainty that could be incorporated into the
formation of financial portfolios, leading to realization of better overall
performance.
7.4 Unique Aspects Of The Research
The research described herein has a number of novel and unique aspects:
1. Breadth of the data, inasmuch as the dataset used global equities on
a heretofore unexamined scale.
2. Use of return values as the dependency function for building a joint
distribution of tail behavior of equity returns.
3. Development and subsequent patenting of an application to deal with the
problem of the number of properties being greater than the number of
samples.
4. Application in finance of a complete time-varying extreme-value approach,
including fitting model parameters using covariates and estimation of time-varying return values. (Such an approach is much more appropriate than the
static or non-time-varying approaches used previously.)
5. Breadth of financial covariates examined as potential driving variables to
describe the behavior of the extreme-value parameters.
6. Development of a new liquidity filter now being used in the industry to
filter out illiquid equities from sets of candidate equities in constructing
portfolios.
7. Development of a new portfolio generation mechanism using the Omega
statistic.
8. Creation of a new factor model for describing joint tail behavior of equities.
9. A new portfolio construction method (to augment a well-known existing
technique), using an estimate of risk that captures behavior in a more
detailed manner.
10. Introduction of a signal to help the investor choose among a set of return
period-driven covariance structures.
7.5 Future Research
In earlier Sections of this Chapter (Chapter 7, Sections 7.2 and 7.3) the reader was
provided with a sense that the line of inquiry presented herein has great potential
for very substantial expansion. In fact, the researcher believes what has really been
developed in this research is the skeleton or support structure on which a number of
potentially profitable inquiries may be built. In this section some major threads of
possible future research based on the present study will be enumerated with brief
explanations:
1. Parameters of the EVDs and the covariates will undoubtedly need to be
refined, if not revisited in very substantial proportion. This activity will lead
to at least two threads of inquiry: first, an improvement in the efficiency of
examining the covariate space and second, confirming and deepening the
understanding of which covariates are driving the parameters. In fact, the
value of adding latents (latent variables) such as business cycles,
geopolitical regimes, and stochastic volatility to the model should be
examined.
2. Combining covariance structures in the present research was accomplished
through simple addition of the covariances. Other methods worth
investigating include incorporating the optimization-weighted version of
the covariances and using more than two covariances or moments other than
just the second moment to capture behavior in tail uncertainty.
3. The researcher initially sought to expand the factor model to include factors
that have stronger spatial or temporal elements. An examination of the
impact of temporal and spatial covariance (the latter in a more general sense
than just geography) was conducted. However, the results of this
examination (not reported on herein) were either nonexistent (for some
factors examined) or confusing. It is the researcher’s view that more careful
delineation of factors, including more selective use of greater factor
granularity, as well as the addition of other conditioning variables, including
latent effects such as stochastic volatility, are amongst the directions for
extending the model into the realm of spatial extreme-value models.
4. The value to the investor of a signal suggesting the return-maximizing
covariance structure to select was demonstrated. This signal would be used
by an investor as part of a periodic rebalance strategy. The research needs
to be extended to improve upon the signal and its look-ahead capability.
5. At the outset of this research suggestions were made to the investigator that
he might try to establish models to express the manner in which a financial
storm ripples through the global financial system. (In climatology this is
often defined in terms of a distinction between models that describe how
weather propagates versus models that describe climatological
relationships. It also may be thought of as the model impacts of introducing
shorter-term innovations versus long-term expected behavior.) The
researcher did not achieve the goal of offering a “financial weather” model,
but he did develop tools and insights (unreported herein), leading him to
believe that a rudimentary form of the model is feasible, along with a
direction in which to pursue it.
6. Finally, if these models are to be used ultimately as part of a financial
system to make decisions about the direction of the research, much of the
modeling and models will need to be made more efficient and easier to
modify than the set of models on which this research rests.
The researcher believes the material presented herein represents the basis for a set
of research threads that will last a minimum of the next five to seven years and will
ultimately yield a greater understanding of the character of risk in designing
financial investment strategies.
Appendix A. List of Countries, Stock Exchanges, Industries
And Sectors Used In This Research
Table A.1 Countries used in this research.
Country
Argentina
Australia
Austria
Bahrain
Belgium
Belize
Benin
Bermuda
Botswana
Brazil
British Virgin Islands
Canada
Cayman Islands
Chile
China
Colombia
Croatia
Cyprus
Czech Republic
Denmark
Ecuador
Egypt
Estonia
Finland
France
Gabon
Germany
Greece
Guernsey
Hong Kong
Hungary
Iceland
Country
India
Indonesia
Ireland
Isle Of Man
Israel
Italy
Ivory Coast
Jamaica
Japan
Jersey
Jordan
Kenya
Latvia
Lebanon
Liberia
Lithuania
Luxembourg
Malaysia
Malta
Mauritius
Mexico
Monaco
Morocco
Namibia
Netherlands
Netherlands Antilles
New Zealand
Nigeria
Norway
Oman
Pakistan
Panama
Country
Papua New Guinea
Peru
Philippines
Poland
Portugal
Puerto Rico
Romania
Russia
Saudi Arabia
Scotland
Senegal
Singapore
Slovak Republic
Slovenia
South Africa
South Korea
Spain
Sri Lanka
Sweden
Switzerland
Taiwan
Thailand
Trinidad And Tobago
Tunisia
Turkey
United Arab Emirates
United Kingdom
United States
Venezuela
Zambia
Zimbabwe
Table A.2 Stock exchanges used in this research.
Exchange
AMEX
Amman
ASX National
Athens
Bangkok
Bangkok Alien
Berlin
Berne
Bogota
Bombay
Bratislava
Bucharest
Budapest
Buenos Aires
Cairo
Canadian Venture
Exchange
Caracas
Casablanca
Cats
Colombo
Copenhagen
Dubai Financial
Market
Dusseldorf
Euronext Belgium
Euronext France
Euronext
Netherlands
Euronext Portugal
Frankfurt
Fukuoka
Gaberone
Granville
Exchange
Hamburg
Harare
Helsinki
Hong Kong
Irish
Italy Continuous
Jakarta
JASDAQ
Johannesburg
Kingston
Exchange
Kuala Lumpur
Lagos
Lima
Ljubijana
London
Lusaka
Osaka
Oslo
OTC
Port Louis
Port of Spain
Prague
Reykjavik
Riga
Riyadh
Russian Trading
System
Santiago
Sao Paulo
Sapporo
SEAQ
Seoul
Shanghai
Luxembourg
Madrid
Manila
Mexico City
Singapore
Stuttgart
Swiss Virt-X
SWX Swiss Exchange
Munich
Muscat
Tel Aviv
Tokyo
Nagoya
Nairobi
NASDAQ Natl Market
Toronto
Tunis
Vienna
Nicosia
Vilnius
NYSE
NYSE Arca
OFEX
OMX Exchanges
Warsaw
Wellington
Windhoek
XETRA
Zagreb
Table A.3 Industries used in this research.
Industry
Advertising/Mrketng
Services
Aerospace & Defense
Industry
Construction Materials
Industry
Forest Products
Industry
Medical Specialties
Industry
Publishing: Newspapers
Consumer Sundries
Gas Distributors
Medical/Nursing Services
Pulp & Paper
Agricultural
Commodities/Milling
Air Freight/Couriers
Containers/Packaging
Home Furnishings
Metal Fabrication
Railroads
Contract Drilling
Home Improvement Chains
Miscellaneous
Real Estate Development
Airlines
Data Processing Services
Homebuilding
Alternative Power
Generation
Aluminum
Department Stores
Real Estate Investment
Trusts
Recreational Products
Discount Stores
Hospital/Nursing
Management
Hotels/Resorts/ Cruiselines
Miscellaneous Commercial
Services
Miscellaneous
Manufacturing
Motor Vehicles
Apparel/Footwear
Drugstore Chains
Household/Personal Care
Movies/Entertainment
Restaurants
Apparel/Footwear Retail
Electric Utilities
Industrial Conglomerates
Multi-Line Insurance
Savings Banks
Auto Parts: OEM
Electrical Products
Industrial Machinery
Office Equipment/Supplies
Semiconductors
Automotive Aftermarket
Electronic Components
Industrial Specialties
Oil & Gas Pipelines
Beverages: Alcoholic
Electronic
Equipment/Instruments
Electronic Production
Equipment
Electronics Distributors
Information Technology
Services
Insurance Brokers/Services
Oil & Gas Production
Services to the Health
Industry
Specialty Insurance
Oil Refining/ Marketing
Specialty Stores
Integrated Oil
Internet Retail
Building Products
Electronics/Appliance
Stores
Electronics/Appliances
Oilfield
Services/Equipment
Other Consumer Services
Internet Software/Services
Other Consumer Specialties
Specialty
Telecommunications
Specialty
Telecommunications
Steel
Cable/Satellite TV
Engineering & Construction
Investment Banks/Brokers
Other Metals/Minerals
Casinos/Gaming
Environmental Services
Investment Managers
Other Transportation
Telecommunications
Equipment
Textiles
Catalog/Specialty
Distribution
Chemicals: Agricultural
Chemicals: Major
Diversified
Chemicals: Specialty
Finance/Rental/Leasing
Packaged Software
Tobacco
Financial Conglomerates
Financial
Publishing/Services
Food Distributors
Investment Trusts/Mutual
Funds
Life/Health Insurance
Major Banks
Personnel Services
Pharmaceuticals: Generic
Tools & Hardware
Trucking
Major Telecommunications
Pharmaceuticals: Major
Coal
Food Retail
Managed Health Care
Pharmaceuticals: Other
Trucks/Construction/Farm
Machinery
Unknown
Commercial Printing/Forms
Food: Major Diversified
Marine Shipping
Precious Metals
Water Utilities
Computer Communications
Food: Meat/Fish/Dairy
Media Conglomerates
Property/Casualty Insurance
Wholesale Distributors
Computer Peripherals
Food: Specialty/Candy
Medical Distributors
Publishing:
Books/Magazines
Wireless
Telecommunications
Beverages: Non-Alcoholic
Biotechnology
Broadcasting
Computer Processing
Hardware
Regional Banks
Table A.4 Sectors used in this research.
Factset Sectors
Commercial Services
Communications
Consumer Durables
Consumer NonDurables
Consumer Services
Distribution Services
Electronic
Technology
Energy Minerals
Finance
Health Services
Health Technology
Industrial Services
Miscellaneous
Non-Energy
Minerals
Process Industries
Producer
Manufacturing
Retail Trade
Technology Services
Transportation
Utilities
Appendix B. Detailed Analyses Of The Covariate Models
To analyze results wherein 795 covariates could have been chosen, the investigator
first remapped the results to a series of simple binary values. These included the
presence or absence of a covariate in the model for each equity having the
following attributes:
1. Was in the function for μ̂, σ̂, or ξ̂
2. Was contemporaneous or lagged, given 1
3. Was one-week or two-week lagged, given 1
4. Was maximum or minimum, given 1
Additional indices were built upon factor analyses of the covariates at the level of
the individual covariate. The factor analyses of contemporaneous covariates yielded
distinct groupings (see Tables B.1 and B.2). Using these groupings as a guide, two
sets of indices were created for each time aggregation: one set was based upon
simple membership or non-membership in a multi-covariate factor analytic
grouping. The second set was a more complex mapping in which multi-covariate
groups were individually coded, while single-covariate groups were combined
under one code.
Tables B.1-B.5 summarize the overall distribution of coefficients from the time-varying GEV analysis. Highlights from these tables include:
1. There was an average of 6.99 covariates per model (this included the three
constants, one each for µ , σ and ξ forced into each model). Of these, about
half were financial/economic covariates rather than constants or trends.
2. Of the financial/economic covariates, the plurality were used in the
estimation of μ̂, followed by ξ̂ and σ̂.
3. Time-contemporaneous covariates were less than half of the covariates
compared to time-lagged covariates (46.6% against 53.4%). Events
occurring in the previous two weeks had slightly more influence on
parameters than events occurring in the same week.
4. Covariates associated with lags of one or two weeks were equally divided,
suggesting, from an overall perspective, the existence of a lead-lag
relationship but no strong differentiation by time frame over at least the
first two weeks.
5. The minimum values of the covariate in the block (NM) more frequently
entered the model than the maximum values of the covariate in the block
(PM), 62.1% versus 37.9%, respectively.
6. Covariates associated with a multi-covariate group entered the model more
frequently than independent covariates (those forming their own factor).
Table B.1 Covariates grouped as per factors developed using a principalcomponents extraction and a varimax rotation, groups (factors)
contain more than one covariate. (More complete covariate names
[where appropriate] may be found in Table 3.12; NM = minimum
value in time block, PM = maximum value in time block.)
Group 1
Group 2
DJ Ind Average P IX
NM
DJ Ind Average P IX
PM
Group 3
Spread: 30 Yr Mtg-10 Yr
(CM)
NASDAQ 100 P IX
S P 1500 Supercomp TR
IX
NM
NASDAQ 100 P IX
PM
And 10 Year Swap Rate
PM
NM
S P 1500 Supercomp TR IX
PM
US Interest 10 Yr
PM
NM
Russell 1000
NM
Russell 1000
PM
US Interest 20 Yr
PM
Russell 2000
NM
Russell 2000
PM
Moody aaa
PM
Russell 3000
NM
Russell 3000
PM
Moody bbb
PM
Russell Mid Cap
NM
Russell Mid Cap
PM
Euro STOXX 50
NM
NASDAQ Composite Index
PM
France CAC 40
NASDAQ Composite
Index
NM
NM
Group 4
Libor Three Months
NM
Group 5
US Interest 10 Yr
NM
Libor Six Months
NM
US Interest 20 Yr
NM
Moody aaa
Moody bbb
Spread 30 Yr Mtg-10 Yr
(CM)
NM
NM
Group 7
Group 6
10 Yr (CM)
Spread 15 Yr Mtg-7 Yr
(CM)
Spread Invest Grade-5Yr
(CM)
NM
PM
PM
PM
Group 8
Group 9
Austria ATX
NM
Spread 15 Yr Mtg-7 Yr (CM)
NM
Libor Three Months
PM
Austria WBI Benchmarked
NM
Invest Grade-5 Yr CM
10 Yr (CM)
NM
PM
Libor Six Months
PM
Group 12
Aust Bank accepted Bills
180 day
Aust Treasury Bonds 2
years
NM
Group 10
Group 11
Austria ATX
PM
15 Yr Mtg
NM
Austria WBI Benchmarked
PM
15 Yr Mtg
PM
Group 13
Aust Bank accepted Bills
180 day
Aust Treasury Bonds 2
years
Group 14
PM
Euro STOXX 50
PM
PM
France CAC 40
PM
NM
Table B.2 Covariates grouped as per factors developed using a principalcomponents extraction and a varimax rotation, groups (factors) contain
only one covariate. (More complete covariate names [where
appropriate] may be found in Table 3.12; NM = minimum value in
time block, PM = maximum value in time.)
K OSPI IX
C hina Shanghai SE
C om posite
V XO
Japan N ikkei Average 225
Benchm arked
U S Interest 6 m o
D lr Fut Ind
D lr Fut Ind
3M Euro D ollars Fut
Ungrouped Covariates
Brazil Bovespa
NM
NM
NM
PM
CBO E U S M arket Volatility
Fed_Fund_Rate
NM
NM
PM
PM
PM
NM
PM
KOSPI IX
Hang Seng Hong K ong
One Yr Interest Rate Sw ap
And 10 Y ear Swap Rate
Hang Seng Hong K ong
PM
PM
PM
NM
NM
C an V39072
Fed_Fund_R ate
PM
PM
PM
NM
BO E IU DV CDA
A ust Bank accepted Bills
30 day
Brazil Bovespa
PM
CBO E U S M arket Volatility
Lehm an M uni Sec Tr Inv
Aust Bank accepted Bills 30
day
NM
PM
NM
NM
V XO
NM
BO E IU DV CDA
C an V39072
C hina Shanghai SE
C om posite
Japan N ikkei Average 225
Benchm arked
Lehm an M uni Sec Tr Inv
NM
NM
One Yr Interest Rate Sw ap
3M Euro Dollars Fut
Spread 3M TB - 3M Euro
Dlrs
Spread 3M TB - 3M Euro
Dlrs
FTSE 100 P IX
PM
NM
PM
FTSE 100 P IX
PM
NM
PM
US Interest 6 m o
NM
PM
NM
Table B.3 Overall tally of covariates entering models, broken down by
parameters and broad themes.
Name                                     All Parms (Exclude Constants)   Contemporaneous No Trend (NT)   Lags NT    Trends
Mu Count                                 4,377                           2,369                           1,793      215
Mu Average/Equity                        1.458                           0.789                           0.597      0.072
Propor'n of Col                          0.373                           0.474                           0.313      0.209
Sigma Count                              3,283                           1,230                           1,379      674
Sigma Average/Equity                     1.094                           0.410                           0.459      0.225
Propor'n of Col                          0.279                           0.246                           0.241      0.655
Xi Count                                 4,089                           1,401                           2,548      140
Xi Average/Equity                        1.362                           0.467                           0.849      0.047
Propor'n of Col                          0.348                           0.280                           0.445      0.136
Column Total                             11,749                          5,000                           5,720      1,029
Average Over All Equities                3.669                           1.562                           1.786      0.321
Proportion All (exclude constant)                                        0.426                           0.487      0.088
Proportion All (exclude constant, trend)                                 0.466                           0.534
Table B.4 Tally of covariates entering models, broken down
by parameters and time lag, not including
contemporaneous observations.
Name                                Lag 1 NT    Lag 2 NT
Mu Count                            944         849
Mu Average/Equity                   0.314       0.283
Propor'n of Col                     0.330       0.297
Sigma Count                         698         681
Sigma Average/Equity                0.233       0.227
Propor'n of Col                     0.244       0.238
Xi Count                            1,215       1,333
Xi Average/Equity                   0.405       0.444
Propor'n of Col                     0.425       0.466
Column Total                        2,857       2,863
Average Over All Equities           0.892       0.894
Proportion All (exclude constant)   0.499       0.501
Table B.5 Tally of covariates entering models broken down by parameters
versus aggregate function and covariate groupings. (Legend: PM
= maximum, NM = minimum; NG = covariate is a singleton in
its factor, GR = covariate highly loads on a factor possessing
more than one covariate; NT means that trend covariates were
not included in the counts)
Name                                       PM NT     NM NT     Total NG   Total GR
Mu Count                                   1,259     2,903     1,821      2,341
Mu Average/Equity                          0.419     0.967     0.607      0.780
Propor'n of Col                            0.310     0.436     0.398      0.381
Sigma Count                                1,035     1,574     1,072      1,537
Sigma Average/Equity                       0.345     0.524     0.357      0.512
Propor'n of Col                            0.255     0.236     0.234      0.250
Xi Count                                   1,768     2,181     1,686      2,263
Xi Average/Equity                          0.589     0.727     0.562      0.754
Propor'n of Col                            0.435     0.328     0.368      0.369
Column Total                               4,062     6,658     4,579      6,141
Proportion All (exclude constant, trend)   0.379     0.621     0.427      0.573
Table B.6 provides a breakdown of the model covariates by group affiliation,
provided that the covariate is a member of a group. Note that the last line of each
subsection of the table indicates in descending order the rank of the group in terms
of the amount of variation explained by the factor forming the group. There appears
to be a very strong relationship between the number of covariates from a specific
group and the variation explained by the group. This suggests to the investigator
that there is strong explanatory power in the model being postulated.
Table B.6 Tally of covariates entering models, broken down by parameters versus
covariate group/factor. (Counts are shown with the proportion of column in
parentheses; Proportion All excludes constant and trend covariates.)

Group    Mu Count (Prop)   Sigma Count (Prop)   Xi Count (Prop)   Column Total   Proportion All   SS Covariates Position
GR 1     461 (0.379)       317 (0.261)          438 (0.360)       1,216          0.198            1
GR 2     354 (0.374)       224 (0.237)          368 (0.389)       946            0.154            2
GR 3     280 (0.374)       175 (0.234)          293 (0.392)       748            0.122            3
GR 4     80 (0.332)        74 (0.307)           87 (0.361)        241            0.039            5
GR 5     228 (0.380)       141 (0.235)          231 (0.385)       600            0.098            4
GR 6     111 (0.331)       90 (0.269)           134 (0.400)       335            0.055            7
GR 7     87 (0.399)        45 (0.206)           86 (0.394)        218            0.035            9
GR 8     124 (0.366)       87 (0.257)           128 (0.378)       339            0.055            6
GR 9     89 (0.408)        46 (0.211)           83 (0.381)        218            0.035            10
GR 10    87 (0.377)        55 (0.238)           89 (0.385)        231            0.038            11
GR 11    106 (0.383)       83 (0.300)           88 (0.318)        277            0.045            8
GR 12    100 (0.370)       66 (0.244)           104 (0.385)       270            0.044            12
GR 13    88 (0.345)        73 (0.286)           94 (0.369)        255            0.042            13
GR 14    87 (0.352)        61 (0.247)           99 (0.401)        247            0.040            38
An additional analysis was performed to examine the relationship between the
time-varying models and the factors formed from the ancillary variables described
in Chapter 3, namely market value (MV), sector (Sect), trade region (Tr.reg),
exchange region (Reg), and exchange (Ex). The analysis was set up in terms of
factors formed from the values of the ancillary variables versus what the author
calls dependent variables. The dependent variables were formed for certain
modeling characteristics by the summations of the tallies created from the
presence/absence data of the model covariates just described. Summations of the
tallies were formed for characteristics, creating the dependent variables:
• Contemporaneous covariates (Con_NT)
• Time-lagged covariates (Lags_NT)
• Covariates that were members of a group of size greater than 1 (Grp)
• Covariates that were members of a group of size = 1 (NoG)
• Maximum extreme value of the covariate (Max.ex)
• Minimum extreme value of the covariate (Min.ex)
The eight values of these dependent variables (beyond constant and trend terms)
were:
1. “,,,” = no covariates
2. “m,,” = µ covariate(s) only (meaning covariates present only in
functions of µ )
3. “,s,” = σ covariate(s) only
4. “,,x” = ξ covariate(s) only
5. “m,s,” = µ and σ covariates only
6. “m,,x” = µ and ξ covariates only
7. “,s,x” = σ and ξ covariates only
8. “m,s,x” = µ , σ and ξ covariates
A cross-tabulation was created and a contingency-table analysis was performed.
The examination tested for the presence of an association between the values of a
specific factor and a dependent variable. The null hypothesis for each of these tests
was the lack of presence of an association, i.e., that the factor and dependent were
independent of one another. This would indicate that the observed tallies in the
cells were not significantly different than expected or the expected value was
formed by the sample size adjustment of the product of the marginal probabilities.
The alternative hypothesis was that the factor and the dependent variable were not
independent. Of course, a rejection of the null hypothesis did not imply the specific
form of the association or, even more interesting, that the association was
meaningful in the present context. (A further examination of the results is needed
for that determination.) A Bonferroni adjustment was made to the 30 chi-square
tests to control the overall α error level. Table B.7 reports on what the investigator
interpreted as the significant results of the testing.
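A minimal sketch of one such test is shown below: a hypothetical cross-tabulation of a factor against the eight dependent-variable categories is tested for independence with a chi-square statistic, judged against a Bonferroni-adjusted significance level. The counts are placeholders, not tallies from this research.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical cross-tabulation: factor levels (rows) by the eight dependent categories.
table = np.array([
    [40, 12,  9,  7, 15,  6, 20, 25],
    [22, 10, 11,  9, 18,  8, 30, 42],
    [15,  8, 10, 11, 17,  9, 33, 47],
])

alpha_per_test = 0.05 / 30                      # Bonferroni adjustment over the 30 tests
stat, p_value, dof, expected = chi2_contingency(table)
print(p_value, p_value < alpha_per_test)        # reject independence at the adjusted level?
```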
Table B.7 Results from chi-square test for the presence of association
between the values of the stated factor and the dependent
variable.
Factor    Dependent   P(X>stat)
Ex        Con_NT      0.0005
Ex        Grp         0.0005
Ex        Min.ex      0.0005
Ex        NoG         0.0005
MV        Con_NT      0.0005
MV        Grp         0.0005
MV        Min.ex      0.0005
MV        NoG         0.0005
Reg       Con_NT      0.0005
Reg       Grp         0.0010
Reg       Min.ex      0.0005
Sect      Con_NT      0.0010
Sect      Min.ex      0.0005
Tr.reg    Con_NT      0.0005
Tr.reg    Grp         0.0005
Tr.reg    Min.ex      0.0005
It is of interest that the significant dependent variables are Con_NT, Grp, Min.ex,
and at a lesser frequency NoG. Neither Lag_NT nor Max.ex appear as significant
dependent variables.
Tables B.8-B.11 were computed by creating a directional mapping of the residuals
or differences between the expected cell counts under the independence
assumption and the observed cell counts. A negative residual count meant that the
expected count was greater than the observed count, and vice versa. With
directional mapping the residuals were mapped to -1, 0, or +1 based on cut-off
values (a minimal sketch of such a mapping follows the list below). From examining these tables the investigator suggests the following:
•
Micro-cap firms tend to be modeled by less complex models; models for
small-, mid-, and large-cap firms tend toward greater complexity. (Since
only one mega-cap firm was in the sample, it was dropped from the
analysis.)
•
The models associated with trade region and exchange region were very
similar in their gross patterns of covariate usage, and the results suggest we
should work on investigating the possibility of dropping one or the other
from further analysis.
•
Africa, Eastern Europe, the Middle East, South America, and to a lesser
extent North America can—at the gross level of this analysis—be modeled
by less complex models.
•
The Pacific Rim and Western Europe, in the same sense as the previous,
tend to be more complex models.
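The sketch below illustrates one way such a directional mapping can be computed; the use of Pearson residuals and a cut-off of 2.0 are assumptions for the illustration, not the exact rule used in this research.

```python
import numpy as np

def directional_residual_map(observed, cutoff=2.0):
    """Map contingency-table cells to -1, 0, or +1. Expected counts are computed
    under independence; Pearson residuals beyond +/- cutoff map to +1 or -1,
    and all other cells map to 0."""
    observed = np.asarray(observed, dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    expected = row_totals @ col_totals / observed.sum()
    residuals = (observed - expected) / np.sqrt(expected)
    return np.sign(residuals) * (np.abs(residuals) > cutoff)

print(directional_residual_map([[30, 5, 10], [10, 20, 15]]))
```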
Table B.8 Directional representation of residuals from contingency table
analyses between the Market Value factor and significant dependent
variables as listed in Table B.7.
large
micro
mid
small
,,
0
+
-
,,x
+
-
Group Membership
,s,
,s,x
m,,
0
0
0
0
0
0
+
0
0
+
large
micro
mid
small
,,
+
-
,,x
+
0
No Group Membership
,s,
,s,x
m,,
0
0
0
0
0
0
0
+
0
0
+
m,,x
0
0
0
m,s,
+
+
+
m,s,x
+
+
+
large
micro
mid
small
,,
+
-
,,x
+
-
Contemporaneous
,s,
,s,x
m,,
0
+
+
+
0
0
+
m,,x
+
+
+
m,s,
+
+
+
m,s,x
+
+
+
large
micro
mid
small
,,
+
-
,,x
+
-
Minimum Aggregation
,s,
,s,x
m,,
0
+
+
0
+
+
m,,x
0
0
0
0
m,s,
+
+
+
m,s,x
+
+
+
m,,x
0
0
+
0
m,s,
+
+
+
m,s,x
+
+
+
Table B.9 Directional representation of residuals from contingency table analyses
between the Trade Region factor and significant dependent variables as
listed in Table B.7.
Africa
Central Asia
Eastern Europe
Middle East
North America
Pacific Rim
South America
Western Europe
Contemporaneous Timeframe
,,
,,x
,s,
,s,x
m,,
+
0
0
+
0
0
+
+
0
0
0
+
0
0
+
0
0
+
0
0
0
0
+
0
0
0
0
0
0
0
+
+
+
-
m,,x
0
0
0
0
0
0
m,s,
0
0
+
0
+
m,s,x
0
0
+
+
Africa
Central Asia
Eastern Europe
Middle East
North America
Pacific Rim
South America
Western Europe
,,
0
0
0
+
0
+
0
0
Grouped Covariates
,,x
,s,
,s,x
0
0
+
+
0
0
0
+
+
0
0
0
0
0
0
0
+
0
+
0
0
m,,
0
0
+
+
0
-
m,,x
0
0
0
+
0
0
m,s,
0
0
0
0
+
m,s,x
0
0
+
0
+
Africa
Central Asia
Eastern Europe
Middle East
North America
Pacific Rim
South America
Western Europe
,,
0
0
0
0
0
+
0
0
Minimum Aggregation
,,x
,s,
,s,x
m,,
+
+
0
+
+
0
+
0
+
0
+
0
0
+
0
0
0
+
0
0
+
+
0
0
+
0
-
m,,x
0
0
0
+
0
-
m,s,
0
+
0
+
m,s,x
0
0
0
+
0
+
Table B.10 Directional representation of residuals from contingency table
analyses between the Exchange Region factor and significant
dependent variables as listed in Table B.7.
Africa
Central Asia
Eastern Europe
Middle East
North America
Pacific Rim
South America
Western Europe
Africa
Central Asia
Eastern Europe
Middle East
North America
Pacific Rim
South America
Western Europe
Africa
Central Asia
Eastern Europe
Middle East
North America
Pacific Rim
South America
Western Europe
,,
+
0
+
0
0
0
0
-
Contemporaneous Timeframe
,,x
,s,
,s,x
m,,
0
0
+
0
+
+
0
0
+
0
0
0
0
+
0
0
0
+
0
0
0
0
+
+
+
-
m,,x
0
0
0
0
0
0
0
m,s,
0
0
+
0
+
m,s,x
0
0
+
+
,,
0
0
0
0
0
+
0
0
Grouped Covariates
,,x
,s,
,s,x
0
0
+
0
0
0
0
+
+
0
0
+
0
0
0
0
+
0
+
0
0
m,,
0
0
0
+
0
-
m,,x
0
0
0
+
0
-
m,s,
0
0
0
0
0
+
m,s,x
0
0
+
0
+
Minimum Aggregation
,,x
,s,
,s,x
m,,
+
+
0
+
+
0
+
0
+
0
0
+
0
+
0
0
0
+
0
0
0
+
+
0
0
+
0
-
m,,x
0
0
0
+
-
m,s,
0
+
0
+
m,s,x
0
0
0
0
+
,,
0
0
0
+
0
+
0
0
Table B.11 Directional representation of residuals from contingency table analyses
between the Sector factor and significant dependent variables as listed
in Table B.7.
Contemporaneous
Commercial Services
Communications
Consumer Durables
Consumer Non-Durables
Consumer Services
Distribution Services
Electronic Technology
Energy Minerals
Finance
Health Services
Health Technology
Industrial Services
Miscellaneous
Non-Energy Minerals
Process Industries
Producer Manufacturing
Retail Trade
Technology Services
Transportation
Utilities
,,
0
0
0
0
0
0
0
0
0
+
0
0
+
0
0
0
0
0
0
Commercial Services
Communications
Consumer Durables
Consumer Non-Durables
Consumer Services
Distribution Services
Electronic Technology
Energy Minerals
Finance
Health Services
Health Technology
Industrial Services
Miscellaneous
Non-Energy Minerals
Process Industries
Producer Manufacturing
Retail Trade
Technology Services
Transportation
Utilities
,,
+
+
0
0
0
0
0
0
+
0
+
+
0
0
,,x
+
0
0
0
0
0
+
0
0
0
0
0
+
0
0
+
0
,s,
0
0
0
0
0
0
+
0
0
+
+
0
0
0
0
0
,s,x
0
+
0
+
0
+
+
0
0
0
0
0
0
0
+
0
0
0
m,,
0
0
+
+
0
0
0
0
0
0
0
+
0
0
0
0
0
m,,x
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
m,s,
0
0
0
+
0
0
+
0
0
+
+
0
0
0
0
+
m,s,x
0
+
0
+
0
0
0
0
+
0
0
+
+
+
m,,
0
+
+
+
0
0
0
0
0
0
+
+
+
0
0
m,,x
0
+
+
+
+
0
+
+
0
+
-
m,s,
+
0
0
0
+
0
+
0
0
+
+
+
+
0
m,s,x
0
+
0
0
+
+
+
+
+
+
Minimum Aggregation
,,x
+
0
+
0
+
+
0
0
+
+
0
-
,s,
0
0
+
+
0
0
0
+
0
0
+
0
0
+
0
0
,s,x
0
+
0
+
+
+
+
+
+
+
0
0
+
0
0
0
Since the sector tables contain more data than the other tables, to gain additional
insights a Q-mode factor analysis (Rummel [1970]) was performed, the results of
which are presented in Table B.12. The two results, while built on fundamentally
the same data, represent somewhat different structures and different degrees of
sensitivity. These two results, as well as the others, will be used to guide the
construction of the time/space models in the next section (see Table B.13).
Table B.12 Q-Mode factor analysis—cases represented by dependent variable
minimum aggregation (Min.ex) and properties represented by the data
factor sector (Sect).
Fact.1.names
Fact.2.names
Industrial
Services
1
Utilities
1
Miscellaneous
1
Non-Energy
Minerals
Consumer
Durables
Producer
Manufacturing
Consumer
Services
Process
Industries
Fact.3.names
Fact.4.names
1
Technology
Services
-1
-1
Communications
-1
Energy
Minerals
Electronic
Technology
1
-1
-1
-1
-1
Fact.5.names
Fact.6.names
Transportation
1
Commercial
Services
-1
Health
Technology
Distribution
Services
Fact.7.names
-1
Health Services
-1
Fact.8.names
-1
Retail Trade
-1
Table B.13
Results from Q-Mode factor analysis. Results appended to
directional representation of residuals from contingency table
between the minimum aggregation dependent variable (Min.ex) and
the sector data factor (Sect) (NSL = No substantive load, meaning
that the sector did not load highly on any rotated factor.)
Sector
Industrial Services
Miscellaneous
Utilities
Consumer Durables
Non-Energy Minerals
Producer Manufacturing
Consumer Services
Process Industries
Communications
Technology Services
Energy Minerals
Electronic Technology
Transportation
Commercial Services
Distribution Services
Health Technology
Health Services
Retail Trade
Consumer Non-Durables
Finance
,,
0
0
0
+
0
+
+
0
0
0
+
0
0
+
-
,,x
0
+
0
+
0
+
+
0
0
+
+
,s,
+
0
0
0
0
0
+
+
0
+
0
0
+
0
0
0
,s,x
0
0
0
+
0
+
0
+
0
0
+
+
+
+
+
+
m,,
0
+
+
+
+
+
+
0
0
0
0
0
0
0
0
m,,x m,s, m,s,x Facr Sign
+
+
1
1
+
+
1
1
0
+
1
1
+
1
-1
+
1
-1
+
2
1
+
0
2
-1
+
2
-1
0
3
-1
3
-1
0
0
4
1
+
+
0
4
-1
+
+
5
1
0
+
5
-1
+
0
6
-1
+
0
6
-1
0
+
7
-1
0
+
8
1
0
+
NSL
+
0
NSL
Bibliography
Affleck-Graves, J., & McDonald, B. (1989). Nonnormalities and tests of asset
pricing theories. The Journal of Finance, 44 (4), 889-908.
Answers.com. (2007). Business and finance: Market capitalization. Retrieved
November 12, 2007, from the Answers.com website:
http://www.answers.com/topic/market-capitalization?cat=biz-fin
Beirlant, J., Goegebeur, Y., Segers J., & Teugels, J. (2004). Statistics of extremes:
Theory and applications. West Sussex, England: John Wiley & Sons, Ltd.
Box, G.E.P., & Cox, D.R. (1964). An analysis of transformations (with
discussion). Journal of the Royal Statistical Society, Series B, 26 (2), 211-252.
Campbell, J.Y., Lo, A.W. & MacKinlay, A.C. (1997). The econometrics of
financial markets. Princeton, New Jersey: Princeton University Press.
Casella, G. & Berger, R.L. (2001). Statistical inference. (2nd ed.). Pacific Grove,
CA: Duxbury Press.
Cebrián, A.C., Denuit, M., & Lambert, P. (2003). Analysis of bivariate tail
dependence using extreme value copulas: An application to the SOA medical
large claims database. Belgian Actuarial Bulletin, 3 (1), 33-41.
Chatfield, C. (1996). The analysis of time series: An introduction. (5th ed.).
London: Chapman and Hall.
Clark, A., & Labovitz, M. (2006, October). Securities selection and portfolio
optimization: Is money being left on the table? Paper presented at the
Conference on Financial Engineering and Applications (FEA 2006).
Cambridge, MA.
Coles, S.G. (2001). An introduction to statistical modeling of extreme values.
London: Springer-Verlag.
Coles, S.G., &. Dixon, M.J. (1999). Likelihood-based inference for extreme value
models. Extremes, 2 (1), 5-23.
Cooley, D., Nychka, D., & Naveau, P. (2006a). Bayesian spatial modeling of
extreme precipitation return levels. Retrieved January 7, 2008 from the
Colorado State University website:
http://www.stat.colostate.edu/~cooleyd/Papers/frRev.pdf
Cooley, D., Naveau, P., & Jomelli, V. (2006b). A Bayesian Hierarchical Extreme
Value Model for Lichenometry. Environmetrics, 17 (6), 555-574.
Costa M., Cavaliere, G., & Iezzi, S. (2005). The role of the normal distribution in
financial markets. Proceedings of the Meeting of the Classification and Data
Analysis Group (CLADAG) of the Italian Statistical Society (pp. 343-350).
Bologna, Italy: University of Bologna.
Davison, A.C. (1984). Modelling excesses over high thresholds, with an
application. In J. Tiago de Oliveira (Ed.), Statistical extremes and applications
(pp. 461-482). Dordrecht, The Netherlands: Reidel.
Davison, A.C., & Smith, R.L. (1990). Models for exceedances over high
thresholds. Journal of the Royal Statistical Society, Series B, 52 (3), 393-442.
Day, T.E., Wang, Y., & Xu, Y. (2001). Investigating underperformance by mutual
fund portfolios. Retrieved March 26, 2006 from the University of Texas at
Dallas website: http://www.utdallas.edu/~yexiaoxu/Mfd.pdf
Diggle, P. J., & Ribeiro, P.J. (2007). Model-based geostatistics. New York:
Springer Series in Statistics.
Dijk, V., & de Haan, L. (1992). On the estimation of the exceedance probability of a
high level. In P.K. Sen & I.A. Salama (Eds.), Order statistics and nonparametrics:
Theory and applications (pp. 79-92). Amsterdam: Elsevier.
Dufour, J.M., Khalaf, L., & Beaulieu, M.C. (2003). Exact skewness-kurtosis
tests for multivariate normality and goodness-of-fit in multivariate
regressions with application to asset pricing models. Oxford Bulletin of
Economics and Statistics, 65 (s1), 891-906.
Efron, B., & Tibshirani, R. (1993). An introduction to the bootstrap. London:
Chapman and Hall.
Fama, E.F. (1965a). Random walks in stock market prices. Financial Analysts
Journal, 21 (5), 55-59.
Fama, E.F. (1965b). The behavior of stock-market prices. Journal of Business, 38
(1), 34-105.
Financial Dictionary. (2008). The definition of illiquidity. Retrieved February 2,
2008 from the Free Dictionary website:
http://financial-dictionary.thefreedictionary.com/Illiquid
Fischer, M., Köck, C., Schlüter, S., & Weigert, F. (2009). An empirical analysis of
multivariate copula models. Quantitative Finance, 9 (7), 839-854.
Fisher, R.A., & Tippett, L.H.C. (1928). Limiting forms of the frequency
distribution of the largest or smallest member of a sample. Proceedings of
the Cambridge Philosophical Society, 24 (2), 180-190.
Gilleland E., & Katz, R.W. (2006). Analyzing seasonal to interannual extreme
weather and climate variability with the extremes toolkit. Poster session
presented at the 18th Conference on Climate Variability and Change, 86th
Annual Meeting of the American Meteorological Society (AMS), Atlanta,
GA.
Gilleland, E., & Nychka, D. (2005). Statistical models for monitoring and
regulating ground-level ozone. Environmetrics, 16 (5), 535-546.
Gnedenko, B. (1943). Sur la distribution limite du terme maximum d'une
série aléatoire. The Annals of Mathematics, Series 2, 44 (3), 423-453.
Gumbel E.J. (1958). Statistics of extremes. New York: Columbia University Press.
Heffernan, J.E., & Tawn, J.A. (2004). A conditional approach for multivariate
extreme values. Journal of the Royal Statistical Society, Series B, 66 (3), 497-534.
Hosking, J.R.M., Wallis, J.R., & Wood, E.F. (1985). Estimation of the generalized
extreme-value distribution by the method of probability-weighted moments.
Technometrics, 27 (3), 251-261.
Hosking, J.R.M. & Wallis, J.R. (1997). Regional frequency analysis: An approach
based on L-moments. Cambridge: Cambridge University Press.
International Monetary Fund. (2007). Chapter 1: Global Prospects and Policy
Issues. Retrieved October 28, 2007 from the International Monetary Fund
website: http://www.imf.org/external/pubs/ft/weo/2007/01/pdf/c1.pdf
Investopedia. (2007). Market Capitalization. Retrieved December 8, 2007 from
the Investopedia website:
http://www.investopedia.com/terms/m/marketcapitalization.asp
Investopedia. (2009). VIX - CBOE Volatility Index. Retrieved May 19, 2009 from
the Investopedia website: http://www.investopedia.com/terms/v/vix.asp
Jenkins, G.M. & Watts, D.G. (1968). Spectral analysis and its applications. San
Francisco: Holden-Day.
Jenkinson, A.F. (1955). The frequency distribution of the annual maximum or
minimum values of meteorological events. Quarterly Journal of the Royal
Meteorological Society, 81, 158-172.
Johnson, R.A. & Wichern, D.W. (2002). Applied multivariate statistical analysis.
(5th ed.). Upper Saddle River, New Jersey: Prentice Hall.
Jondeau, E., & Rockinger, M. (2003). Testing for differences in the tails of
stock-market returns. Journal of Empirical Finance, 10 (5), 559-581.
Kwiatkowski, D., Phillips, P.C.B., Schmidt, P., & Shin, Y. (1992). Testing the
Null Hypothesis of Stationarity against the Alternative of a Unit Root.
Journal of Econometrics, 54, 159-178.
Labovitz, M.L., Turowski, H., & Kenyon, J.D. (2007). Lipper Research Series:
High-Grading Data: Retaining Variation, Reducing Dimensionality.
Retrieved November 4, 2007 from the Lipper website:
http://www.lipperweb.com/research/fundIndustryOverview.asp
Lay, D.C. (2005). Linear Algebra and Its Applications. (3rd ed.). Boston: Addison
Wesley.
Leadbetter, M.R., Lindgren, G. & Rootzén, H. (1983). Extremes and related
properties of random sequences and processes. New York: Springer-Verlag.
Leadbetter, M.R., Weissman, I., De Haan, L., & Rootzén, H. (1989). On clustering
of high values in stationary series. Paper presented at the International
Meeting on Statistical Climatology. Rotorua, New Zealand.
Ledford, A.W., & Tawn, J.A. (1996). Statistics for near independence in
multivariate extreme values. Biometrika, 83 (1), 169-187.
Le Roux, M. (2007). A long-term model of the dynamics of the S&P500 implied
volatility surface. North American Actuarial Journal, 11 (4), 61-75.
Lintner, J. (1965). The valuation of risk assets and the selection of risky
investments in stock portfolios and capital budgets. The Review of Economics
and Statistics, 47 (1), 13-37.
Madsen, H., Rasmussen, P.F., & Rosbjerg, D. (1997). Comparison of annual
maximum series and partial duration series methods for modeling extreme
hydrologic events: 1. At-site modeling. Water Resources Research, 33 (4),
747–758.
Malevergne, Y., & Sornette, D. (2001). General framework for a portfolio theory
with non-gaussian risks and non-linear correlations. Retrieved March 21,
2005 from the GloriaMundi.org website:
http://www.gloriamundi.org/ShowTracking.asp?ResourceID=453055857
Malevergne, Y., & Sornette, D. (2002). Multi-moments method for portfolio
management: Generalized capital asset pricing model in homogeneous and
heterogeneous markets. Retrieved March 21, 2005 from the Cornell
University Library website:
http://arxiv.org/PS_cache/cond-mat/pdf/0207/0207475v1.pdf
Markowitz, H.M. (1952). Portfolio selection. Journal of Finance, 7 (1), 77-91.
Markowitz, H.M. (1959). Portfolio selection: Efficient diversification of
investments. New York: John Wiley and Sons.
Martins, E.S., & Stedinger, J.R. (2000). Generalized maximum-likelihood
generalized extreme value quantile estimators for hydrologic data. Water
Resources Research, 36 (3), 737-744.
Martins, E.S., & Stedinger, J.R. (2001). Generalized maximum-likelihood
Pareto-Poisson estimators for partial duration series. Water Resources
Research, 37 (10), 2551-2557.
McNeil, A., Frey R., & Embrechts, P. (2005). Quantitative risk management:
Concepts, techniques and tools. Princeton, NJ: Princeton University Press.
Nelder, J.A., & Mead, R. (1965). A simplex method for function minimization.
Computer Journal, 7, 308-313.
Nelsen, R.B. (1999). An introduction to copulas. New York: Springer-Verlag.
Neter, J., Kutner, M.H., Wasserman, W., & Nachtsheim, C.J. (1996). Applied
linear statistical models. (4th ed.). Chicago: McGraw-Hill/Irwin.
NIST (National Institute of Standards and Technology). (2003). Kolmogorov-
Smirnov two sample. Retrieved October 11, 2008 from the National
Institute of Standards and Technology website:
http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/ks2samp.htm
Nocedal, J., & Wright, S.J. (1999). Numerical optimization. New York: Springer-Verlag.
Pickands, J. (1975). Statistical inference using extreme order statistics. Annals of
Statistics, 3 (1), 119-131.
Prescott, P., & Walden, A.T. (1980). Maximum likelihood estimation of the
parameters of the generalized extreme-value distribution. Biometrika, 67 (3),
723-724.
Prescott, P., & Walden, A.T. (1983). Maximum likelihood estimation of the
parameters of the three parameter generalized extreme-value distribution
from censored samples. Journal of Statistical Computation and Simulation,
16 (3), 241-250.
R Development Core Team. (2009). The R Stats package, R: A Language and
Environment for Statistical Computing. R Foundation for Statistical
Computing. Vienna, Austria.
Rockafellar, R.T., & Uryasev, S. (2002). Conditional value-at-risk for general loss
distributions. Journal of Banking and Finance, 26 (7), 1443–1471.
Rummel, R. J. (1970). Applied factor analysis. Evanston, Illinois: Northwestern
University Press.
Sain, S. (2004). MATH 6026, Topics in Probability & Statistics: Spatial Data
Analysis. Lecture Series, University of Colorado at Denver. Denver, CO.
Schlather, M., & Tawn, J.A. (2002). Inequalities for the extremal coefficients of
multivariate extreme value distributions. Extremes, 5 (1), 87-102.
Schlather, M., & Tawn, J.A. (2003). A dependence measure for multivariate and
spatial extreme values: Properties and inference. Biometrika, 90 (1), 139–156.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6
(2), 461-464.
Shakespeare, W. (2004). The tempest. Washington D.C.: Simon & Schuster.
(Original work published 1623).
Sharpe, W.F. (1964). Capital asset prices - A theory of market equilibrium under
conditions of risk. Journal of Finance, 19 (3), 425-442.
Sharpe, W. F. (1966). Mutual fund performance. Journal of Business, 39 (1, Part 2:
Supplement on Security Prices ), 119-138.
Sharpe W.F. (1974). Imputing expected returns from portfolio composition.
Journal of Financial and Quantitative Analysis, 9 (3), 463-472.
Smith, R.L. (1984). Threshold methods for sample extremes. In J. Tiago de
Oliveira (Ed.), Statistical extremes and applications (pp. 621-638).
Dordrecht, The Netherlands: Reidel.
Smith, R. L. (1985). Maximum likelihood estimation in a class of nonregular cases.
Biometrika, 72 (1), 67-90.
Smith, R.L. (1989). Extreme value analysis of environmental time series: An
example based on ozone data. Statistical Science, 4 (4), 367-393.
Smith, R.L., Tawn, J.A., & Yuen, H.-K. (1990). Statistics of multivariate extremes.
International Statistical Review, 58 (1), 47-58.
Smith, R.L. (2003). Statistics of Extremes, With Applications in Environment,
Insurance and Finance. Unpublished manuscript. Retrieved December 14,
2006 from the University of North Carolina website:
http://www.stat.unc.edu/postscript/rs/semstatrls.ps
Smith, R.L., & Weissman, I. (1994). Estimating the extremal index. Journal of the
Royal Statistical Society, Series B (Methodological), 56 (3), 515-528.
Smith, R.L., Grady, A.M., & Hegerl, G.C. (2006). Trend in extreme precipitation
levels over the contiguous United States. Unpublished manuscript.
Stephenson, A., & Tawn, J.A. (2004). Bayesian inference for extremes:
Accounting for the three extremal types. Extremes, 7 (4), 291-307.
Szego, G. (2002). Measures of risk. Journal of Banking and Finance, 26 (7), 1253-1272.
Tawn, J.A. (1988). Bivariate extreme value theory: models and estimation.
Biometrika, 75 (3), 397-415.
Tawn, J.A. (1990). Modelling multivariate extreme value distributions.
Biometrika, 77 (2), 245-253.
Tobler, W.R. (1970). A computer model simulation of urban growth in the Detroit
region. Economic Geography, 46 (2), 234-240.
Tokat, Y., Rachev, S.T., & Schwartz, E.S. (2003). The stable non-Gaussian asset
allocation: A comparison with the classical Gaussian approach. Journal of
Economic Dynamics and Control, 27 (6), 937-969.
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical
learning: Data mining, inference and prediction. New York: Springer.
Tukey, J.W. (1977). Exploratory data analysis. New York: Addison-Wesley.
von Mises, R. (1954). La distribution de la plus grande de n valeurs. In
Selected papers (Vol. II, pp. 271-294). Providence, RI: American
Mathematical Society.
Walshaw, D. (1994). Getting the most from your extreme wind data: A step-by-step
guide. Journal of Research of the National Institute of Standards and
Technology, 99 (4), 399-411.
Wuertz, D. (2009). Markowitz portfolio, R-port. Retrieved April 2009 from the
R-forge website:
http://r-forge.r-project.org/plugins/scmsvn/viewcvs.php/*checkout*/pkg/fPortfolio/R/B2-MarkowitzPortfolio.R?rev=1&root=rmetrics
Sources Of Data
Source of Consumer Price Index Data
Consumer Price Index. 2008. U.S. Department of Labor, Bureau of Labor
Statistics, Bureau of Labor Statistics Data. First retrieved January 2008 from
http://data.bls.gov/PDQ/servlet/SurveyOutputServlet.
Source of Economic Time Series
http://www.fxstreet.com/fundamental/economic-time-series. First retrieved January
2008.
Source of Equity Performance and Ancillary Data
FactSet. 2007. On Line Editor. FactSet Research Systems Inc. Norwalk, CT. Data
retrieved from FactSet Data Content during fourth calendar quarter 2007.
Lipper. 2007. Lipper Analytical New Application (LANA). Lipper, A Thomson
Reuters Co. Denver, CO. Data retrieved from LANA during fourth calendar
quarter 2007.
Lipper. 2008. Lipper Security Master File. A Thomson Reuters Co. Denver, CO.
Data retrieved from Security Master File during first calendar quarter 2008.
Reuters. 2007. Security Analysis and Validation Modeling (StockVal). A Thomson
Reuters Co. Phoenix, AZ. Data retrieved from StockVal during fourth
calendar quarter 2007.
Source of Interest Rate Data
Central Bank                     Source
Bank of Canada                   http://www.bankofcanada.ca/en/rates/interest-look.html
Bank of England                  http://www.bankofengland.co.uk/statistics/index.htm
Bank of Japan                    http://www.boj.or.jp/en/type/stat/dlong/index.htm
European Central Bank            http://sdw.ecb.europa.eu/browse.do?node=bbn131&
Swiss National Bank              http://www.snb.ch/en/iabout/stat/id/statdata
The Reserve Bank of Australia    http://www.rba.gov.au/Statistics/interest_rates_yields.html
U.S. Federal Reserve             http://www.federalreserve.gov/RELEASES/
All series first retrieved December 2007.
Source of Exchange Index Data
Index/Benchmark              Country       Source
Dow Jones Indus. Avg         USA           Lipper Analytical New Application (LANA), Lipper
S&P 500 Index                USA           Lipper Analytical New Application (LANA), Lipper
Nasdaq Composite Index       USA           http://www.bloomberg.com/apps/quote?ticker=CCMP:IND
S&P/Tsx Composite Index      Canada        http://www.bloomberg.com/apps/quote?ticker=SPTSX:IND
Mexico Bolsa Index           Mexico        http://www.bloomberg.com/apps/quote?ticker=MEXBOL:IND
Brazil Bovespa Stock Index   Brazil        http://www.bloomberg.com/apps/quote?ticker=IBOV:IND
DJ Euro Stoxx 50             EU            http://www.bloomberg.com/apps/quote?ticker=SX5E:IND
FTSE 100 Index               UK            http://www.bloomberg.com/apps/quote?ticker=UKX:IND
CAC 40 Index                 France        http://www.bloomberg.com/apps/quote?ticker=CAC:IND
DAX Index                    Germany       http://www.bloomberg.com/apps/quote?ticker=DAX:IND
IBEX 35 Index                Spain         http://www.bloomberg.com/apps/quote?ticker=IBEX:IND
S&P/MIB Index                Italy         http://www.bloomberg.com/apps/quote?ticker=SPMIB:IND
Amsterdam Exchanges Index    Netherlands   http://www.bloomberg.com/apps/quote?ticker=AEX:IND
OMX Stockholm 30 Index       Sweden        http://www.bloomberg.com/apps/quote?ticker=OMX:IND
Swiss Market Index           Switzerland   http://www.bloomberg.com/apps/quote?ticker=SMI:IND
Nikkei 225                   Japan         http://www.bloomberg.com/apps/quote?ticker=NKY:IND
Hang Seng Index              Hong Kong     http://www.bloomberg.com/apps/quote?ticker=HSI:IND
S&P/ASX 200 Index            Australia     http://www.bloomberg.com/apps/quote?ticker=AS51:IND
All data sets first retrieved December 2007.
Source of Other Data Sets Called Out In Chapter 3, Section 3.3

Data Set                                                 Source
Chicago Board Options Exchange (CBOE)                    www.cboe.com
Lehman Brothers fixed income indices (various)           Lipper Analytical New Application (LANA), Lipper
Dollar futures indices of the New York Board of Trade    www.nybot.com
  (NYBOT)
U.S. mortgage rates (various maturities)                 http://mortgage-x.com/general/indexes/default.asp
Rates for Moody's AAA and BBB corporate paper            FactSet, FactSet Research Systems Inc
Russell equity indices (various)                         Lipper Analytical New Application (LANA), Lipper
Interest rate swap rates (various maturities)            FactSet, FactSet Research Systems Inc
All data sets first retrieved December 2007.