Risk measures

Faculty of Science
Department of Applied Mathematics, Computer Science and Statistics
Utility based risk measures
Jasmine Maes
Promotor: Prof. Dr. D. Vyncke
Supervisor: H. Gudmundsson
Thesis submitted to obtain the academic degree of Master of Science: Applied
Mathematics
Academic year 2015–2016
Acknowledgements
First of all I would like to thank my supervisor Mr. Gudmundsson for letting
me come by his office whenever I felt like I needed it, for coming up with good
ideas and for supporting me throughout the thesis. I also would like to thank
my promotor prof. Vyncke for giving me advice when I asked for it, while still
allowing me a lot of freedom. Last but not least I would like to thank my friends
for listening to all my complaints when things didn’t go as planned and when I got
stuck, and my parents for their financial support during my education.
The author gives permission to make this master thesis available for consultation
and to copy parts of this master thesis for personal use. In the case of any other use,
the limitations of the copyright have to be respected, in particular with regard to
the obligation to state expressly the source when quoting results from this master
dissertation.
Ghent, 1 June 2016.
Contents
Preface
1 Mathematical representation of risk
1.1 Definitions and properties
1.2 The acceptance set of a risk measure
1.3 The penalty function
1.4 Robust representation of convex risk measures
2 An introduction to decision theory
2.1 The axioms of von Neumann-Morgenstern
2.2 Risk and utility
2.3 Certainty equivalents
2.3.1 The ordinary certainty equivalent
2.3.2 The optimised certainty equivalent
2.3.3 The u-Mean certainty equivalent
2.4 The exponential utility function
2.5 Stochastic dominance
3 Value at Risk and Expected shortfall
3.1 Value at Risk
3.1.1 General properties
3.1.2 Consistency with expected utility maximisation
3.2 Expected shortfall
3.2.1 General properties
3.2.2 Consistency with expected utility maximisation
4 Utility based risk measures
4.1 Utility based shortfall risk measures
4.2 Divergence risk measures
4.2.1 Construction and representation
4.2.2 The coherence of divergence risk measures
4.2.3 Examples
4.3 The ordinary certainty equivalent as risk measure
5 Utility functions
5.1 The power utility functions
5.2 The exponential utility functions
5.3 The polynomial utility functions
5.4 The SAHARA utility functions
5.5 The κ-utility functions
Conclusion
A Dutch summary
B Additional computations
B.1 Computations regarding the SAHARA utility class
B.1.1 Computation of the utility function
B.1.2 Computation of the divergence function
B.2 Computations regarding the κ-utility class
B.2.1 Determining the asymptotic behaviour
B.2.2 Computation of the divergence function
Preface
When choosing between different investment opportunities it is tempting to select
the one which offers the highest expected return. However, this strategy would
ignore the risk associated with that investment. Generally speaking we have that
the larger the expected return of an investment, the larger the risk associated with
it. Taking into account the risk of a particular investment is not only necessary to
pick the best investment, but also to set up capital requirements. These capital
requirements should create a buffer for potential losses of the investments.
But how do we describe and measure this risk? We could of course try to describe
the cumulative distribution or the density function of the investment. Although
this would give us a lot of information about the risk involved, it could still be
very difficult to compare different investment opportunities in terms of risk. But
a more important problem is that it gives us too much information in some sense.
Therefore it would be useful to summarize the distribution of the investment into a
number, which represents the risk. These numbers can then be used to determine
the necessary capital requirements. More formally if the stochastic variable X
models the returns of an investment, then a risk measure is a mapping ρ : X ↦
ρ(X) such that ρ(X) ∈ R.
Because a stochastic variable can be viewed as a function, a risk measure can be
interpreted as a functional. We could therefore study risk measures by looking
at them as purely mathematical objects. Using techniques and ideas from mathematical analysis we could then analyse properties of these functionals. This is
exactly what we will do in the first chapter of this thesis.
Studying risk measures only from a purely mathematical point of view has the
downside that it ignores the intuition behind it. The attitude towards risky alternatives is a subjective matter determined by personal preferences. These personal
preferences can be represented using so called utility functions. Utility functions
are commonly used in economics to model how people make decisions under uncertainty. In the second chapter we will therefore introduce this decision theoretic
framework and explain the necessary concepts of economics.
Armed with both a strong mathematical and economic framework we will then
apply these concepts to two commonly used risk measures in industry, Value at
Risk and Expected Shortfall. This analysis will be the subject of the third chapter.
The fourth chapter combines the axiomatic approach from the first chapter and
the economic ideas from the second chapter and describes different ways in which
utility functions can be used to construct risk measures. We will introduce utility
based shortfall risk measures and divergence risk measures. Using ideas from
mathematical optimisation we will link different utility based risk measures and
discuss different representations of these risk measures.
After this discussion the question arises which utility function we should use to
construct these utility based risk measures. Because utility functions represent
personal preferences we do not believe that there is a straightforward answer to this
question. However, the properties of the utility function used in the risk measures
do affect this risk measure. The last chapter takes a closer look at different classes
of utility functions which appear in the literature and assesses their properties in the
context of utility based risk measures.
1 Mathematical representation of risk
In this chapter we will look at risk measures from a solely mathematical point of
view. We will define what a risk measure is, and what properties it should have.
We will take a closer look at the concepts of the acceptance set and the penalty
function. Finally we will introduce the robust representation of a risk measure.
The content of this chapter is largely based on the theorems found in [11].
1.1 Definitions and properties
Consider a probability space (Ω, F, P), where Ω represents the set of all possible
scenarios, F is a σ-algebra on Ω and P is a probability measure. The
future value of a financial position is uncertain and can be represented by a stochastic
variable X. This is a function from the set of all possible scenarios to the real
numbers, X : Ω → R.
Let X denote a given linear space of functions X : Ω → R including the constants.
A risk measure ρ is a mapping ρ : X → R. Our goal is to define ρ in such a way
that it can quantify the risk of a market position X, such that it can serve as a
measure to determine the capital requirement of X, that is, the amount of capital which, when invested in a risk-free manner, makes the position acceptable.
Using this interpretation of ρ(X), we would like to have a risk measure that has
some desirable properties.
First of all, if the value of the portfolio X is smaller than the value of the portfolio
Y almost surely, then it is logical that more money is needed to make
the position X acceptable than to make the position Y acceptable. This
property is called monotonicity.
Property 1. (Monotonicity) If X ≤ Y , then ρ(X) ≥ ρ(Y ).
Secondly, it is logical to assume that to make the position X + m acceptable, where
m is a risk-free amount, we need ρ(X) − m. This is precisely the amount
ρ(X) needed to make the position X acceptable, reduced by the risk-free amount m we
already had. This property is called translation invariance or cash invariance.
Property 2. (Translation invariance) If m ∈ R, then ρ(X + m) = ρ(X) − m.
Definition 1.1. A mapping ρ : X → R is called a monetary risk measure if ρ
satisfies both monotonicity and translation invariance.
Here we would like to point out that some authors define a monetary risk measure such
that ρ can also take the values +∞ and −∞. In that case they impose the additional
property that ρ(0) is finite, or even the normalization ρ(0) = 0.
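As a minimal sketch of definition 1.1, assume a finite scenario set Ω, so that a position X is simply a list of payoffs, one per scenario. The worst-case functional ρ(X) = −min X used below is an illustrative choice that is not introduced in the text, and the helper names are hypothetical. The code checks monotonicity and translation invariance on one example.

```python
# Sketch: a position X on a finite scenario set is a list of payoffs.
# worst_case_risk is an illustrative monetary risk measure (an assumption,
# not taken from the text): rho(X) = -min over scenarios of X.

def worst_case_risk(position):
    """Capital needed so that the worst scenario is no longer a loss."""
    return -min(position)

def shift(position, cash):
    """Add a risk-free amount `cash` to every scenario."""
    return [x + cash for x in position]

X = [-2.0, 1.0, 3.0]
Y = [-1.0, 1.5, 3.0]   # Y >= X in every scenario

# Monotonicity: X <= Y implies rho(X) >= rho(Y).
monotone = worst_case_risk(X) >= worst_case_risk(Y)

# Translation invariance: rho(X + m) = rho(X) - m.
m = 0.5
cash_invariant = worst_case_risk(shift(X, m)) == worst_case_risk(X) - m

print(monotone, cash_invariant)
```

A single example does not prove either property; it only illustrates what the two axioms require of a concrete functional.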
In [11] we found the following lemma.
Lemma 1.1. Any monetary risk measure ρ is Lipschitz continuous with respect
to the supremum norm ‖·‖, i.e. for all X, Y ∈ X we have
|ρ(X) − ρ(Y)| ≤ ‖X − Y‖. (1.1)
Proof. We have that
X − Y ≤ sup_{ω∈Ω} |X(ω) − Y(ω)| = ‖X − Y‖,
hence X ≤ Y + ‖X − Y‖. Using monotonicity we find that ρ(X) ≥ ρ(Y + ‖X − Y‖).
Using translation invariance we get ρ(X) ≥ ρ(Y) − ‖X − Y‖, or equivalently
ρ(Y) − ρ(X) ≤ ‖X − Y‖. We also have that
Y − X ≤ sup_{ω∈Ω} |Y(ω) − X(ω)| = ‖X − Y‖.
Again using monotonicity and translation invariance we find that ρ(Y) ≥ ρ(X) −
‖X − Y‖, so that ρ(X) − ρ(Y) ≤ ‖X − Y‖. Combining both inequalities leads
us to conclude that |ρ(X) − ρ(Y)| ≤ ‖X − Y‖.
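The Lipschitz bound of lemma 1.1 can be checked numerically. The sketch below again assumes a finite scenario set and uses the worst-case functional ρ(X) = −min X as an illustrative monetary risk measure; both are assumptions made for the example, and the function names are hypothetical.

```python
import random

def worst_case_risk(position):
    """Illustrative monetary risk measure: rho(X) = -min X."""
    return -min(position)

def sup_norm(position):
    """Supremum norm on a finite scenario set."""
    return max(abs(x) for x in position)

def lipschitz_gap(X, Y):
    """Return ||X - Y|| - |rho(X) - rho(Y)|, which lemma 1.1 says is >= 0."""
    diff = [x - y for x, y in zip(X, Y)]
    return sup_norm(diff) - abs(worst_case_risk(X) - worst_case_risk(Y))

random.seed(0)
gaps = []
for _ in range(1000):
    X = [random.uniform(-5, 5) for _ in range(6)]
    Y = [random.uniform(-5, 5) for _ in range(6)]
    gaps.append(lipschitz_gap(X, Y))

# The smallest gap over all random pairs should not be negative.
print(min(gaps) >= -1e-12)
```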
An important subclass of monetary risk measures are the convex risk measures.
These risk measures have the extra property of being convex.
Property 3. (Convexity) ρ(λX + (1 − λ)Y ) ≤ λρ(X) + (1 − λ)ρ(Y ), for 0 ≤ λ ≤ 1.
We will prove in lemma 1.2 that for monetary risk measures this property is equivalent to the property of quasi convexity.
Property 4. (Quasi convexity) ρ(λX + (1 − λ)Y ) ≤ max (ρ(X), ρ(Y )), for 0 ≤ λ ≤ 1.
Definition 1.2. A convex risk measure is a monetary risk measure satisfying the
convexity property.
We can easily interpret the property of quasi convexity. Consider an investor who
can invest his resources in such a way that he obtains X, or in another way so that
he obtains Y . If he spends only a fraction λ of his resources on the first investment
strategy and the rest on the second, he will obtain λX + (1 − λ)Y . This diversification
strategy will give him a risk of ρ(λX + (1 − λ)Y ). The property of quasi convexity
then states that the risk of this diversified portfolio cannot be greater than the
risk of the riskiest investment strategy.
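The diversification argument can be illustrated with a small numerical sketch. It assumes a finite scenario set with known probabilities and uses the entropic risk measure ρ(X) = (1/θ) log E[exp(−θX)], a standard example of a convex risk measure that is not introduced at this point of the text. The probabilities, positions and parameter θ are hypothetical.

```python
import math

def entropic_risk(position, probs, theta=1.0):
    """rho(X) = (1/theta) * log E[exp(-theta * X)] on a finite scenario set."""
    expectation = sum(p * math.exp(-theta * x) for p, x in zip(probs, position))
    return math.log(expectation) / theta

def mix(X, Y, lam):
    """Diversified position lam * X + (1 - lam) * Y, scenario by scenario."""
    return [lam * x + (1 - lam) * y for x, y in zip(X, Y)]

probs = [0.25, 0.5, 0.25]      # hypothetical scenario probabilities
X = [-3.0, 0.0, 4.0]
Y = [2.0, -1.0, 0.5]

rho_X = entropic_risk(X, probs)
rho_Y = entropic_risk(Y, probs)

for lam in [0.0, 0.25, 0.5, 0.75, 1.0]:
    rho_mix = entropic_risk(mix(X, Y, lam), probs)
    convex_bound = lam * rho_X + (1 - lam) * rho_Y
    # Convexity (property 3) and quasi convexity (property 4).
    assert rho_mix <= convex_bound + 1e-12
    assert rho_mix <= max(rho_X, rho_Y) + 1e-12
    print(lam, round(rho_mix, 4), round(convex_bound, 4))
```

For every tested λ the risk of the mixed position stays below both the weighted average and the maximum of the two individual risks.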
In [11, p 178] we find the following statement which we will prove in this thesis.
Lemma 1.2. A monetary risk measure is convex if and only if it is quasi convex.
Proof. First consider a risk measure satisfying convexity, hence ρ(λX +(1−λ)Y ) ≤
λρ(X) + (1 − λ)ρ(Y ), for 0 ≤ λ ≤ 1. Without loss of generality we can assume
that ρ(X) ≥ ρ(Y ) and hence max (ρ(X), ρ(Y )) = ρ(X). We find that
ρ(λX + (1 − λ)Y ) ≤ λρ(X) + (1 − λ)ρ(Y )
≤ λρ(X) + (1 − λ)ρ(X)
= ρ(X)
= max (ρ(X), ρ(Y ))
From which we can conclude that convexity of a monetary risk measure implies
quasi convexity.
Now consider a monetary risk measure satisfying quasi convexity. For all X, Y ∈ X
we can define X′ := X + ρ(X) and Y′ := Y + ρ(Y). Then it is clear that X′, Y′ ∈ X.
By translation invariance ρ(X′) = ρ(Y′) = 0, so that max (ρ(X′), ρ(Y′)) = ρ(Y′). Because ρ is quasi
convex we have that ρ(λX′ + (1 − λ)Y′) ≤ ρ(Y′). Rewriting this expression
in terms of X and Y we find that ρ(λX + λρ(X) + (1 − λ)Y + (1 − λ)ρ(Y)) ≤
ρ(Y + ρ(Y)). Now using the fact that ρ satisfies translation invariance we have
that
ρ(λX + (1 − λ)Y) − λρ(X) − (1 − λ)ρ(Y) ≤ ρ(Y + ρ(Y)) = ρ(Y) − ρ(Y) = 0.
We can conclude that ρ(λX + (1 − λ)Y) ≤ λρ(X) + (1 − λ)ρ(Y) for all X, Y ∈ X,
i.e. ρ is a convex risk measure.
We can define a special subclass of convex risk measures using the notion of positive homogeneity. Consider an investor who invests his wealth using an investment
strategy that replicates X, with an associated risk ρ(X). If he only invests a fraction λ of his wealth in the same investment strategy he will obtain λX, with an
associated risk of ρ(λX). If this risk equals the proportional risk of the initial
investment, we say that the risk measure satisfies the property of positive homogeneity.
Property 5. (Positive homogeneity) If λ ≥ 0, then ρ(λX) = λρ(X).
Definition 1.3. A coherent risk measure is a convex risk measure satisfying positive homogeneity.
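Not every convex risk measure is coherent. The sketch below compares two illustrative functionals on a hypothetical two-scenario space: the worst-case functional ρ(X) = −min X, which is positively homogeneous, and the entropic risk measure ρ(X) = (1/θ) log E[exp(−θX)], which is convex but not positively homogeneous. Both examples and all names are assumptions made for illustration.

```python
import math

def worst_case_risk(position):
    """rho(X) = -min X."""
    return -min(position)

def entropic_risk(position, probs, theta=1.0):
    """rho(X) = (1/theta) * log E[exp(-theta * X)]."""
    expectation = sum(p * math.exp(-theta * x) for p, x in zip(probs, position))
    return math.log(expectation) / theta

probs = [0.5, 0.5]     # hypothetical fair two-scenario model
X = [-1.0, 1.0]
lam = 2.0
scaled = [lam * x for x in X]

# Worst case: rho(lam * X) equals lam * rho(X).
wc_homogeneous = worst_case_risk(scaled) == lam * worst_case_risk(X)

# Entropic: doubling the position more than doubles the risk here.
ent_scaled = entropic_risk(scaled, probs)
ent_linear = lam * entropic_risk(X, probs)

print(wc_homogeneous, ent_scaled > ent_linear)
```

In this example the entropic risk of the doubled position is log cosh(2), which exceeds twice log cosh(1), so property 5 fails for this functional.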
Coherent risk measures can also be defined by using the sub-additivity property.
If a risk measure is sub-additive, you can decentralize the task of managing the
risk of different positions. Consider an investor who has invested his wealth in
a contingent claim X + Y. If the risk measure is sub-additive, the risk ρ(X + Y) of this position will never be
greater than ρ(X) + ρ(Y).
Property 6. (Sub-additivity) ρ(X + Y ) ≤ ρ(X) + ρ(Y )
It is stated in [11] that a coherent risk measure is a monetary risk measure satisfying positive homogeneity and sub-additivity. We now prove this equivalent
definition.
Lemma 1.3. For a monetary risk measure that satisfies positive homogeneity, the
convexity property is equivalent to the sub-additivity property.
Proof. First assume ρ is sub-additive, X, Y ∈ X , and 0 ≤ λ ≤ 1. We find that:
ρ(λX + (1 − λ)Y ) ≤ ρ(λX) + ρ((1 − λ)Y ) = λρ(X) + (1 − λ)ρ(Y ).
The first inequality follows from the fact that ρ is sub-additive, the second equality
uses the assumption that ρ satisfies positive homogeneity. Note that λX ∈ X and
(1 − λ)Y ∈ X because of the assumed linearity of X .
Now assume ρ is convex. Then for a fixed λ, 0 < λ < 1, define X′ := (1/λ)X and
Y′ := (1/(1 − λ))Y. Notice that X′, Y′ ∈ X. It follows from the convexity property and
the positive homogeneity that
ρ(X + Y) = ρ(λX′ + (1 − λ)Y′) ≤ λρ(X′) + (1 − λ)ρ(Y′) = ρ(λX′) + ρ((1 − λ)Y′)
= ρ(X) + ρ(Y).
This proves that ρ satisfies the sub-additive property.
So far we have defined a coherent risk measure as a risk measure which satisfies
the following four properties:
1. (Monotonicity) If X ≤ Y , then ρ(X) ≥ ρ(Y ).
2. (Translation invariance) If m ∈ R, then ρ(X + m) = ρ(X) − m.
3. (Positive homogeneity) If λ ≥ 0, then ρ(λX) = λρ(X).
4. (Convexity) ρ(λX + (1 − λ)Y ) ≤ λρ(X) + (1 − λ)ρ(Y ), for 0 ≤ λ ≤ 1.
Here the convexity property can be replaced by the sub-additivity property. However, some authors, such as [1] and [2], use a positivity axiom instead of the monotonicity
axiom.
Property 7. (Positivity) X ≥ 0 ⇒ ρ(X) ≤ 0.
In general positivity and monotonicity are not equivalent. However it turns out
that when a risk measure satisfies the positive homogeneity property and the subadditivity property they are. The reason for using the positivity property instead
of monotonicity property, is that the positivity property is often easier to prove.
Lemma 1.4. If a risk measure is translation invariant, sub-additive and positive
homogeneous, then it is positive if and only if it is monotone.
Proof. First suppose the risk measure is positive homogeneous, translation invariant, sub-additive and positive. Because of positivity we have that
(X − Y) ≥ 0 ⇒ ρ(X − Y) ≤ 0. (1.2)
Using the sub-additivity property we find that
ρ(X) = ρ(X − Y + Y) ≤ ρ(X − Y) + ρ(Y). (1.3)
Combining equation 1.2 and equation 1.3 we find that
X ≥ Y ⇒ ρ(X) ≤ ρ(Y). (1.4)
We conclude that the risk measure is monotone. Now suppose the risk measure is
positive homogeneous, translation invariant, sub-additive and monotone; we need
to show that it is positive. Using monotonicity we find that
X ≥ 0 ⇒ ρ(X) ≤ ρ(0). (1.5)
Using positive homogeneity we find that for all λ > 0,
ρ(0) = ρ(λ0) = λρ(0). (1.6)
Because this is true for all λ > 0 we can conclude that ρ(0) = 0. Using equation
1.5 we can conclude that
X ≥ 0 ⇒ ρ(X) ≤ 0. (1.7)
This proves positivity.
Remark 1.1. Using lemma 1.3 and lemma 1.4 we see that a risk measure is
coherent if and only if it satisfies the following properties for X, Y ∈ X .
1. (Positivity) X ≥ 0 ⇒ ρ(X) ≤ 0.
2. (Sub-additivity) ρ(X + Y ) ≤ ρ(X) + ρ(Y ).
3. (Positive homogeneity) ∀λ > 0 ρ(λX) = λρ(X).
4. (Translation invariance) ∀m ∈ R ρ(X + m) = ρ(X) − m.
1.2 The acceptance set of a risk measure
In the previous section we interpreted ρ(X) as the amount of capital which, if
invested in a risk-free manner, makes the position X acceptable. In this section
we will define the acceptance set of a risk measure. This is the set of all positions
which do not require surplus capital. We will also demonstrate the relationship
between the properties of the risk measure and the corresponding acceptance set.
Definition 1.4. The acceptance set induced by a monetary risk measure ρ is defined by
Aρ := {X ∈ X |ρ(X) ≤ 0}.
(1.8)
The following theorem was taken from [11] and proves that there is a clear connection between the properties of a monetary risk measure and the associated
acceptance set. We have worked out the proof.
Theorem 1.1. If ρ is a monetary risk measure with acceptance set A := Aρ, then
1. A is non-empty.
2. A is closed in X with respect to the supremum norm ‖·‖.
3. inf{m ∈ R|m ∈ A} > −∞.
4. X ∈ A, Y ∈ X , Y ≥ X ⇒ Y ∈ A.
5. ρ can be recovered from A:
ρ(X) = inf{m ∈ R|m + X ∈ A}. (1.9)
6. If ρ is a convex risk measure, then A is a convex set.
7. If ρ is positively homogeneous, then A is a cone. In particular, if ρ is a
coherent risk measure, A is a convex cone.
Proof.
1. Consider m = ρ(0), then m ∈ X because X contains the constants. We will prove that m ∈ A. By translation invariance, m ∈
A ⇔ ρ(m) ≤ 0 ⇔ ρ(0) − m ≤ 0 ⇔ ρ(0) ≤ m, which holds for m = ρ(0).
2. Consider a sequence Xn ∈ A such that Xn → X with respect to the supremum norm ‖·‖. We need to prove that
X ∈ A. Suppose ρ(X) > 0. Since ρ(Xn) ≤ 0 for all n, we have |ρ(Xn) − ρ(X)| ≥ ρ(X) =: c > 0, and using
lemma 1.1 we find that ‖Xn − X‖ ≥ |ρ(Xn) − ρ(X)| ≥ c > 0 for all n. If Xn
converges to X in the supremum norm the left-hand side goes to 0. This
gives us a contradiction. Hence ρ(X) ≤ 0, and therefore X ∈ A.
3. For every constant m ∈ R we have: m ∈ A ⇔ ρ(m) ≤ 0 ⇔ ρ(0) − m ≤ 0 ⇔ ρ(0) ≤ m.
Hence ρ(0) is a lower bound for all m ∈ A. This concludes the proof since
we supposed ρ(0) is finite for a monetary risk measure.
4. We know that X ∈ A ⇒ ρ(X) ≤ 0 and using monotonicity Y ≥ X ⇒
ρ(Y) ≤ ρ(X). Combining those two facts we find that ρ(Y) ≤ ρ(X) ≤ 0.
Finally we can conclude that Y ∈ A.
5. Notice that inf{m ∈ R|m + X ∈ A} = inf{m ∈ R|ρ(m + X) ≤ 0} = inf{m ∈ R|ρ(X) ≤ m} = ρ(X).
6. We need to prove that ∀X, Y ∈ A and ∀λ ∈ [0, 1] we have that λX +
(1 − λ)Y ∈ A. It is sufficient to prove that ρ(λX + (1 − λ)Y) ≤ 0. Since
X, Y ∈ A, and λ ∈ [0, 1] we have λρ(X) ≤ 0 and (1 − λ)ρ(Y) ≤ 0. Since ρ
is convex we have ρ(λX + (1 − λ)Y) ≤ λρ(X) + (1 − λ)ρ(Y) ≤ 0. This is
what we needed to prove.
7. To prove that A is a cone it is sufficient to prove that ∀X ∈ A and ∀λ ≥ 0,
we have that λX ∈ A. Because ρ is positively homogeneous we have that
X ∈ A ⇒ ρ(X) ≤ 0 ⇒ ρ(λX) = λρ(X) ≤ 0 ⇒ λX ∈ A. This
proves that A is a cone. From the above proofs it follows directly that if ρ
is a coherent risk measure then A is a convex cone.
In definition 1.4 we defined for each monetary risk measure the associated acceptance set.
We can also do the opposite and define for each acceptance set an associated risk
measure.
Definition 1.5. ρA (X) := inf{m ∈ R|m + X ∈ A}
This is a very intuitive definition for a risk measure. If X is a financial position,
then ρA (X) is the minimal amount of money needed to make the position X
acceptable. The next theorem will show that the properties of the acceptance set are
linked to the properties of the associated risk measure. This theorem was found
in [11] and we have worked out the proof.
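Definition 1.5 can be turned into a small numerical procedure. The sketch below assumes a finite scenario set and an acceptance set given as a membership test that is preserved when cash is added (property 4 of theorem 1.1); it then approximates inf{m ∈ R | m + X ∈ A} by bisection. The function names, the search bounds and the example acceptance set are hypothetical.

```python
def risk_from_acceptance_set(position, is_acceptable, lo=-1e6, hi=1e6, tol=1e-9):
    """Approximate inf{m : m + X in A} by bisection.

    Assumes is_acceptable(m + X) is False at m = lo and True at m = hi, and
    that acceptability is preserved when cash is added (property 4).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_acceptable([x + mid for x in position]):
            hi = mid      # mid is enough capital, try less
        else:
            lo = mid      # mid is not enough capital
    return hi

# Hypothetical acceptance set: positions that never lose more than 1.
def never_loses_more_than_one(position):
    return min(position) >= -1.0

X = [-4.0, 0.5, 2.0]
capital = risk_from_acceptance_set(X, never_loses_more_than_one)
print(round(capital, 6))
```

For this acceptance set the exact value is −1 − min X, so the bisection result for the example position is close to 3.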
Theorem 1.2. If A is a non-empty subset of X such that properties 3 and 4 from
theorem 1.1 are both satisfied, then the functional ρA has the following properties:
1. ρA is a monetary risk measure.
2. If A is a convex set, then ρA is a convex risk measure.
3. If A is a cone, then ρA is positively homogeneous. In particular if A is a
convex cone then ρA is a coherent risk measure.
4. A ⊆ AρA , and A = AρA if and only if A is ‖·‖-closed in X .
Proof.
1. To prove that ρA is a monetary risk measure, we need to check that
ρA(X) is finite for all X ∈ X and that ρA satisfies monotonicity and translation invariance.
• (translation invariance) We need to prove that for X ∈ X and m ∈ R
ρA(X + m) = ρA(X) − m. This follows almost immediately from the
properties of the infimum, since ρA(X) − m = inf{l ∈ R|l + X ∈
A} − m = inf{l ∈ R|l + X + m ∈ A} = ρA(X + m).
• (monotonicity) X ≤ Y ⇒ m + X ≤ m + Y ∀m ∈ R. By property 4 of theorem 1.1, m + X ∈ A then implies m + Y ∈ A, so that
inf{m ∈ R|m + X ∈ A} ≥ inf{m ∈ R|m + Y ∈ A}. Using the definition
of ρA we conclude that X ≤ Y ⇒ ρA(X) ≥ ρA(Y).
• (ρA(X) is finite) Since A ≠ ∅, we can find a Y ∈ A. Fix this Y and
let X ∈ X. From the assumptions on the space X we have that X and Y are
both bounded, hence there exists an m ∈ R such that m + X ≥ Y.
Using that Y ∈ A, monotonicity and the translation invariance we find
0 ≥ ρA(Y) ≥ ρA(X + m) = ρA(X) − m. We conclude that ∀X ∈ X
ρA(X) ≤ m < +∞. Because we have assumed property 3 from theorem
1.1, we have ρA(0) > −∞. We need to prove that ρA(X) > −∞ ∀X ∈
X. Take m′ ∈ R such that X + m′ ≤ 0. Using translation invariance and
monotonicity we find that ρA(X + m′) = ρA(X) − m′ ≥ ρA(0) > −∞.
From this we can conclude that for an arbitrary X ∈ X, ρA(X) > −∞.
2. We need to prove that if A is convex, then ∀X1, X2 ∈ X and ∀λ ∈ [0, 1]
ρA(λX1 + (1 − λ)X2) ≤ λρA(X1) + (1 − λ)ρA(X2). Because of translation
invariance we find ∀i ∈ {1, 2} ρA(Xi + ρA(Xi)) = ρA(Xi) − ρA(Xi) = 0, hence
ρA(Xi) + Xi ∈ A. Because A is a convex set we have λ(ρA(X1) + X1) + (1 −
λ)(ρA(X2) + X2) ∈ A. Using this we find that
0 ≥ ρA(λ(ρA(X1) + X1) + (1 − λ)(ρA(X2) + X2))
= ρA(λX1 + (1 − λ)X2) − λρA(X1) − (1 − λ)ρA(X2).
From this we can conclude that ∀X1, X2 ∈ X, λ ∈ [0, 1], ρA(λX1 + (1 − λ)X2) ≤
λρA(X1) + (1 − λ)ρA(X2), which is precisely what we needed to prove.
3. If A is a cone we need to prove that ∀X ∈ X and ∀λ ≥ 0, ρA(λX) = λρA(X).
We first prove ρA(λX) ≤ λρA(X). We know that since ρA(X) + X ∈ A and A
is a cone that λ(ρA(X) + X) ∈ A. Hence we have 0 ≥ ρA(λ(ρA(X) + X)) =
ρA(λX) − λρA(X). This proves that ρA(λX) ≤ λρA(X). To prove the
opposite inequality take λ > 0 and m such that m < ρA(X). Then m + X ∉ A, which
also implies that λm + λX ∉ A, because A is a cone. Hence λm ≤
ρA(λX). Since this holds for every m < ρA(X), we find that
λρA(X) ≤ ρA(λX). Finally we can conclude that λρA(X) = ρA(λX).
4. First we’ll prove the inclusion A ⊆ AρA. For this take an X ∈ A; then it is
clear that ρA(X) = inf{m ∈ R|m + X ∈ A} ≤ 0 and therefore X ∈ AρA.
Secondly from part 2 of theorem 1.1 we know that if A = AρA, then A is
‖·‖-closed in X. Finally assume that A is ‖·‖-closed in X. We need to
prove that AρA ⊆ A, hence we need to prove that X ∈ AρA ⇒ X ∈ A. This
is equivalent to X ∉ A ⇒ X ∉ AρA. Take an X ∉ A; it is sufficient to
prove that ρA(X) > 0. To prove this, take m > ‖X‖. Since A is
‖·‖-closed in X, X \ A is ‖·‖-open in X. Because X ∈ X \ A we can find
a λ ∈ (0, 1) such that λm + (1 − λ)X ∉ A. Therefore we have
0 ≤ ρA(λm + (1 − λ)X) = ρA((1 − λ)X) − λm.
Because ρA is a monetary risk measure we can apply lemma 1.1. We find
that
|ρA((1 − λ)X) − ρA(X)| ≤ ‖X − λX − X‖ = λ‖X‖.
Using the two inequalities which we have obtained above, we can conclude
that
ρA(X) ≥ ρA((1 − λ)X) − λ‖X‖ ≥ λ(m − ‖X‖) > 0.
This is precisely what we needed to prove.
We have connected the concepts of monetary risk measures, convex risk measures
and coherent risk measures to the concept of the acceptance set. The acceptance set
contains all acceptable financial positions. But what is an acceptable position?
This is subjective and can depend on the risk-aversion of the portfolio-holder, or
it could depend on the regulations of a supervisory agency.
1.3 The penalty function
In 1921 the economist Frank Knight formulated a distinction between risk and
uncertainty. Risk only applies to situations where, although we do not know the
outcome of an event, we can accurately assign a probability measure to the different outcomes. This situation might occur when tossing a fair coin. Although you
do not know if the coin will land heads up or not, you know (with certainty) that
this will happen with probability 1/2.
Uncertainty in Knight’s work is different. It applies to situations in which we do
not have all the information to accurately assign a probability measure to the different outcomes. This type of uncertainty, named after Knight, is called Knightian
uncertainty. Knightian uncertainty is very common in real world situations. Consider for example the future return of a stock. The return of the stock is uncertain
and we cannot accurately assign a probability measure to the different returns.
From historical returns of the stock we could estimate such a probability measure.
But would this be the correct probability measure? Obviously not.
In this section we consider the case of Knightian uncertainty where we have a
measurable space (Ω, F) but without a fixed probability measure assigned to this
space. Let X be the space of all bounded measurable functions on (Ω, F) endowed
with the supremum-norm k · k. It is straightforward to show that X is a Banach
space. Let M1 := M1 (Ω, F) be the set of all probability measures on (Ω, F) and
denote by M1,f the set of all functions Q : F → [0, 1] which are normalized, i.e.
Q(Ω) = 1, and which are finitely additive. It is clear that M1 ⊂ M1,f and that the
elements of M1,f are not necessarily probability measures since it is not guaranteed that they satisfy σ-additivity. In what follows we use the notation EQ[X]
with Q ∈ M1,f for ∫ X dQ, where the integral is understood to be a Lebesgue
integral.
Definition 1.6. A penalty function for ρ on M1,f is a functional α : M1,f →
R ∪ {+∞} such that
inf_{Q∈M1,f} α(Q) ∈ R. (1.10)
Penalty functions are strongly linked to convex risk measures. Each penalty function defines a convex risk measure and convex risk measures can be represented
by using a penalty function. We will prove this in the next two theorems.
Theorem 1.3. The functional
ρ(X) := sup_{Q∈M1,f} (EQ[−X] − α(Q)) (1.11)
defines a convex risk measure on X, such that ρ(0) = − inf_{Q∈M1,f} α(Q).
Proof. For each Q ∈ M1,f we define for all X in X the functional ρQ (X) :=
(EQ [−X] − α (Q)). We will first show that ρQ satisfies monotonicity and translation invariance. Monotonicity follows from
X ≤ Y ⇒ −X ≥ −Y
⇒ EQ [−X] ≥ EQ [−Y ]
⇒ (EQ [−X] − α(Q)) ≥ (EQ [−Y ] − α(Q))
⇒ ρQ (X) ≥ ρQ (Y ).
To prove that ρQ satisfies translation invariance take X ∈ X and m ∈ R. We have
that
ρQ(m + X) = EQ[−(X + m)] − α(Q) = EQ[−X] − α(Q) − m = ρQ(X) − m,
where we have used that Q is normalized. We also want to prove that the
functional ρQ is convex. From lemma 1.2 we know that it is sufficient
to prove that ∀X, Y ∈ X and ∀λ ∈ [0, 1] we have that ρQ(λX + (1 − λ)Y) ≤
max (ρQ(X), ρQ(Y)). We can assume without loss of generality that EQ[−X] ≤
EQ[−Y], then ρQ(X) ≤ ρQ(Y) and therefore max (ρQ(X), ρQ(Y)) = ρQ(Y). We
also have that
ρQ(λX + (1 − λ)Y) = EQ[−(λX + (1 − λ)Y)] − α(Q)
= λEQ[−X] + (1 − λ)EQ[−Y] − α(Q)
≤ λEQ[−Y] + (1 − λ)EQ[−Y] − α(Q)
= EQ[−Y] − α(Q)
= ρQ(Y) = max (ρQ(X), ρQ(Y)).
The properties monotonicity, translation invariance and convexity are satisfied for
all Q ∈ M1,f . Hence we have that the functional defined by 1.11 also satisfies these
properties since they are preserved when taking the supremum over all Q ∈ M1,f .
Because of the definition of a penalty function and the fact that X ∈ X is bounded,
we have that ρ(X) only takes finite values. We can conclude that ρ(X) is a convex
risk measure. The fact that ρ(0) = − inf_{Q∈M1,f} α(Q) follows immediately from the
properties of supremum and infimum.
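The representation (1.11) is easy to evaluate when only finitely many models carry a finite penalty. The sketch below assumes a three-scenario space and a hypothetical list of pairs (Q, α(Q)); all other Q are treated as having penalty +∞, so they do not contribute to the supremum. The models, penalties and function names are illustrative assumptions.

```python
def expected_loss(position, measure):
    """E_Q[-X] for a probability vector Q on a finite scenario set."""
    return sum(q * (-x) for q, x in zip(measure, position))

def penalised_risk(position, models):
    """rho(X) = max over (Q, alpha) of E_Q[-X] - alpha, as in (1.11)."""
    return max(expected_loss(position, q) - alpha for q, alpha in models)

# Hypothetical candidate models Q with penalties alpha(Q); all other Q are
# treated as having penalty +infinity and therefore drop out of the supremum.
models = [
    ([0.5, 0.25, 0.25], 0.0),   # reference model, no penalty
    ([0.75, 0.25, 0.0], 0.5),   # pessimistic model, moderately penalised
    ([1.0, 0.0, 0.0], 2.0),     # worst-case model, heavily penalised
]

X = [-4.0, 0.0, 4.0]
zero = [0.0, 0.0, 0.0]

print(penalised_risk(X, models))     # risk of X
print(penalised_risk(zero, models))  # equals minus the smallest penalty
```

The penalty acts as a measure of how seriously each model is taken: a pessimistic model only determines the risk if its expected loss exceeds that of the other models by more than its penalty.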
The next theorem shows that we can represent each convex risk measure using
a penalty function. The proof of this theorem is not easy and uses results from
functional analysis. For the convenience of the reader we state these results without proof.
Theorem 1.4. (Separating hyperplane theorem) In a topological vector space E,
any two disjoint convex sets B and C, one of which has an interior point, can be
separated by a non-zero continuous linear functional l on E, i.e.,
l(x) ≤ l(y) ∀x ∈ C, ∀y ∈ B. (1.12)
Proof. Without proof, see [11, p.508].
Theorem 1.5. (Riesz representation theorem) There is a one-to-one correspondence between the set of functions Q ∈ M1,f and linear continuous functionals l
on X such that l(1) = 1 and l(X) ≥ 0 for X ≥ 0. The correspondence is defined
by l(X) = EQ[X] = ∫ X dQ, ∀X ∈ X.
Proof. Without proof, see [11, p.506].
Theorem 1.6. Any convex risk measure ρ on X is of the form
ρ(X) = max_{Q∈M1,f} (EQ[−X] − αmin(Q)), X ∈ X, (1.13)
where the penalty function αmin is given by
αmin(Q) := sup_{X∈Aρ} EQ[−X] for Q ∈ M1,f. (1.14)
Moreover, αmin is the minimal penalty function which represents ρ, i.e., any penalty
function α for which 1.11 holds satisfies α(Q) ≥ αmin(Q) for all Q ∈ M1,f.
Proof. Step 1: We will prove that
ρ(X) ≥ sup_{Q∈M1,f} (EQ[−X] − αmin(Q)), ∀X ∈ X. (1.15)
Let X′ := ρ(X) + X, then because of the translation invariance property we have
that ρ(X′) = ρ(ρ(X) + X) = ρ(X) − ρ(X) = 0. Hence X′ ∈ Aρ. Because of the definition of αmin(Q) and X′ ∈ Aρ, we have that ∀Q ∈ M1,f αmin(Q) ≥ EQ[−X′] =
EQ[−X] − ρ(X). This leads us to conclude that ρ(X) ≥ sup_{Q∈M1,f} (EQ[−X] − αmin(Q)),
which is what we wanted to prove.
Step 2: For a given X we will construct a QX ∈ M1,f such that
ρ(X) ≤ EQX [−X] − αmin (QX ).
(1.16)
In combination with 1.15 from the first part, this will prove 1.13. It is sufficient
to prove this for X ∈ X with ρ(X) = 0. Because if ρ(X) = m, then ρ(X + m) = 0
and we have that ρ(X) − m = ρ(X + m) ≤ EQX [−(X + m)] − αmin (QX ) =
EQX [−X] − αmin (QX ) − m. We can also assume without loss of generality that
ρ(0) = 0.
Consider the set
B := {Y ∈ X |ρ(Y ) < 0}.
(1.17)
It is clear that B is non-empty. We’ll prove that B is open in X. To prove this
it is sufficient to prove that X \ B is closed in X. Take a sequence Xn ∈ X \ B,
i.e. ρ(Xn) ≥ 0 for all n, such that Xn → Z. Because of lemma 1.1 we have that
ρ is Lipschitz continuous with respect to the supremum norm and ρ(Xn) → ρ(Z).
We find that ρ(Z) ≥ 0, i.e. Z ∈ X \ B. The set B is also convex because if
we take Y1, Y2 ∈ B and λ ∈ [0, 1] then because of the convexity of ρ we have
ρ(λY1 + (1 − λ)Y2) ≤ λρ(Y1) + (1 − λ)ρ(Y2) < 0, hence λY1 + (1 − λ)Y2 ∈ B. Since
ρ(X) = 0 we have X ∉ B, and since a singleton is a convex set we can apply theorem 1.4 to find a non-zero
continuous linear functional l on X such that
l(X) ≤ inf_{Y∈B} l(Y) =: b. (1.18)
To construct QX we'll use 1.5. For this we first need to prove that Y ≥ 0 ⇒
l(Y) ≥ 0. Take Y ≥ 0. For all λ > 0 we have λY ≥ 0, so by monotonicity and ρ(0) = 0 we have ρ(λY) ≤ 0. Furthermore, by translation invariance we find
ρ(1 + λY) = ρ(λY) − 1 < 0. Hence (1 + λY) ∈ B for all λ > 0. By 1.18 and the
linearity of l we have that l(X) ≤ l(1) + λl(Y) for all λ > 0. If l(Y) < 0 we could,
by choosing λ large enough, make sure that l(1) + λl(Y) < l(X), a contradiction.
We conclude that l(Y) ≥ 0 if Y ≥ 0.
Now we will prove that l(1) > 0. Since l is a non-zero continuous linear functional,
there exists a Y ∈ X such that l(Y) ≠ 0 and also l(−Y) = −l(Y) ≠ 0. Replacing Y by −Y
if necessary, we may assume l(Y) > 0. Write Y = Y⁺ − Y⁻ with Y⁺ ≥ 0 and
Y⁻ ≥ 0, so that 0 < l(Y) = l(Y⁺) − l(Y⁻). This representation of Y is not unique, and because of the linearity of l we
can pick Y with l(Y) > 0 and a representation of this Y such that ‖Y⁺‖ < 1. Since l(Y⁻) ≥ 0 by
the positivity of l, we have l(Y⁺) ≥ l(Y) > 0. Moreover 1 − Y⁺ ≥ 0, so the positivity of l gives l(1 − Y⁺) ≥ 0.
Using linearity we find that l(1) = l(Y⁺) + l(1 − Y⁺) > 0.
Now we can use 1.5 to find a QX in M1,f such that
\[ E_{Q_X}[Y] = \frac{l(Y)}{l(1)} \quad \forall Y \in \mathcal{X}. \tag{1.19} \]
It is clear from the definitions of B and Aρ that B ⊂ Aρ. Therefore we have
\[ \alpha_{\min}(Q_X) = \sup_{Y \in A_\rho} E_{Q_X}[-Y] \ge \sup_{Y \in B} E_{Q_X}[-Y] = \sup_{Y \in B} \frac{-l(Y)}{l(1)} = -\inf_{Y \in B} \frac{l(Y)}{l(1)} = \frac{-b}{l(1)}. \tag{1.20} \]
We know that Y + ε ∈ B for any Y ∈ Aρ and all ε > 0. Therefore we can conclude,
using the epsilon characterisation of the supremum, that the above inequality is
an equality, hence αmin(QX) = −b/l(1). Using the assumption that ρ(X) = 0 and the
fact that l(X) ≤ b, we can conclude that
\[ E_{Q_X}[-X] - \alpha_{\min}(Q_X) = \frac{1}{l(1)} \left( b - l(X) \right) \ge 0 = \rho(X). \tag{1.21} \]
This is what we needed to prove.
The only part which is left to prove is the fact that αmin is the minimal penalty
function which represents ρ. Let α be an arbitrary penalty function which represents
ρ. Then we need to prove that α(Q) ≥ αmin(Q) for all Q ∈ M1,f. We have that
ρ(X) ≥ EQ[−X] − α(Q) for all X ∈ X and Q ∈ M1,f. Therefore, using ρ(X) ≤ 0 for X ∈ Aρ in the last step,
\[ \alpha(Q) \ge \sup_{X \in \mathcal{X}} \left( E_Q[-X] - \rho(X) \right) \ge \sup_{X \in A_\rho} \left( E_Q[-X] - \rho(X) \right) \ge \alpha_{\min}(Q). \]
This concludes the proof.
We have learned that each convex risk measure can be represented using a penalty
function. Because coherent risk measures are by definition convex, this is also true
for coherent risk measures. We now show that the penalty function of a coherent
risk measure has some interesting properties and that the representation 1.13
can be further specified.
Theorem 1.7. The minimal penalty function αmin of a coherent risk measure
ρ takes only the values 0 and +∞. In particular, a coherent risk measure can be
represented by
\[ \rho(X) = \max_{Q \in \mathcal{Q}_{\max}} E_Q[-X], \tag{1.22} \]
where Qmax is defined as
\[ \mathcal{Q}_{\max} := \{ Q \in \mathcal{M}_{1,f} \mid \alpha_{\min}(Q) = 0 \}. \tag{1.23} \]
Proof. We know from theorem 1.1 that Aρ is a convex cone. Hence for all X ∈ Aρ and all λ > 0,
λX ∈ Aρ. Using theorem 1.6 we know that
\[ \alpha_{\min}(Q) = \sup_{X \in A_\rho} E_Q[-X] = \sup_{\lambda X \in A_\rho} E_Q[-\lambda X] = \lambda \sup_{\lambda X \in A_\rho} E_Q[-X] = \lambda \alpha_{\min}(Q). \]
Because this equation must hold for all Q ∈ M1,f and for all λ > 0, we have that
αmin (Q) = 0 or αmin (Q) = +∞. It is now clear that 1.22 holds.
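To make representation 1.22 concrete, the following minimal sketch evaluates a worst-case expected loss on a sample space with three states. The list SCENARIOS is a hypothetical stand-in for the set Qmax, and the positions x and y are illustrative numbers, none of which come from the thesis. The sketch only checks numerically that such a maximum is positively homogeneous, subadditive and translation invariant.

```python
# A position X is a list of payoffs, one per state of a three-state sample space.
# SCENARIOS is a hypothetical finite stand-in for the set Q_max in (1.22).
SCENARIOS = [
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [1 / 3, 1 / 3, 1 / 3],
]


def expectation(q, x):
    """E_Q[X] on a finite sample space."""
    return sum(qi * xi for qi, xi in zip(q, x))


def rho(x):
    """Worst-case expected loss: the maximum over Q of E_Q[-X]."""
    return max(expectation(q, [-xi for xi in x]) for q in SCENARIOS)


x = [4.0, -1.0, -3.0]   # illustrative position
y = [-2.0, 0.5, 1.0]    # illustrative position
lam = 2.5

# Positive homogeneity: rho(lam * X) = lam * rho(X) for lam > 0.
positive_homogeneity_gap = abs(rho([lam * xi for xi in x]) - lam * rho(x))
# Subadditivity: rho(X + Y) <= rho(X) + rho(Y).
subadditivity_holds = rho([a + b for a, b in zip(x, y)]) <= rho(x) + rho(y) + 1e-12
# Translation invariance: rho(X + 1) = rho(X) - 1.
translation_gap = abs(rho([xi + 1.0 for xi in x]) - (rho(x) - 1.0))
```

Because the penalty is either 0 or +∞, only the choice of the scenario set matters here; no model inside the set is weighted more than another.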
We would like to remind the reader that in the representation 1.13 of a convex risk
measure the Q is not necessarily a probability measure. In the next section we will
impose some extra conditions with respect to the space X and the continuity of ρ
to obtain an analogous representation in which Q is indeed a probability measure.
1.4 Robust representation of convex risk measures
In the previous section we considered the situation in which there was no probability measure fixed to the space (Ω, F). In this section we fix a probability measure
P on the space (Ω, F) and let X = L∞ := L∞ (Ω, F, P ). Theorem 1.6 gave us a
representation for any convex risk measure. In this section we will only consider
risk measures ρ such that
if X = Y P-almost surely, then ρ(X) = ρ(Y ). (1.24)
We introduce the notion of absolute continuity.
Definition 1.7. Q ∈ M1,f is absolutely continuous with respect to P ∈ M1,f on
the σ-algebra F, and we write Q ≪ P, if for all A ∈ F
P (A) = 0 ⇒ Q(A) = 0.
(1.25)
Notice that if P and Q are probability measures then this definition reduces to the
definition of absolute continuity of two probability measures.
Lemma 1.5. Let ρ be a convex risk measure that satisfies 1.24 and which is
represented by a penalty function α as in 1.11. Then α(Q) = +∞ for any Q ∈
M1,f (Ω, F) which is not absolutely continuous to P .
Proof. Take Q ∈ M1,f (Ω, F) such that Q is not absolutely continuous with respect
to P. Then there exists an A ∈ F such that
Q(A) > 0 and P (A) = 0. (1.26)
Take any X ∈ Aρ and define Xn := X − n I_A, where I_A is the indicator function of A:
\[ I_A(\omega) = \begin{cases} 1, & \text{if } \omega \in A \\ 0, & \text{if } \omega \notin A. \end{cases} \]
Because P (A) = 0, A is a null set of P. Since X and Xn only differ on a null set
of P and we have assumed that ρ satisfies 1.24, we have that ρ(Xn) = ρ(X) ≤ 0 for all
n, so that Xn ∈ Aρ. Using theorem 1.6 we have that
α(Q) ≥ αmin (Q) ≥ EQ [−Xn ] = EQ [−X + n I_A ] = EQ [−X] + nQ(A). (1.27)
Because Q(A) > 0 we have that EQ [−X] + nQ(A) → +∞ if n → +∞. We can
conclude that α(Q) = +∞.
Now let M1 := M1 (Ω, F, P ) denote the set of all probability measures which
are absolutely continuous with respect to P. From theorem 1.6 we know that each
convex risk measure can be represented by a minimal penalty function αmin, but
in that representation the maximum is taken over all Q ∈ M1,f. The following
theorem characterizes a class of convex risk measures for which Q is indeed a
probability measure and αmin is a penalty function concentrated on M1 (P ). The
proof of this theorem is very technical and is outside the scope of this thesis.
Theorem 1.8. Suppose ρ : L∞ → R is a convex risk measure, then the following
conditions are equivalent:
1. ρ satisfies the following Fatou property: for any bounded sequence (Xn ) which
converges P-a.s. to some X,
\[ \rho(X) \le \liminf_{n \uparrow \infty} \rho(X_n). \]
2. ρ can be represented by the restriction of the minimal penalty function αmin
to the set M1 (P ):
\[ \rho(X) = \sup_{Q \in \mathcal{M}_1(P)} \left( E_Q[-X] - \alpha_{\min}(Q) \right), \quad X \in L^\infty. \tag{1.28} \]
Proof. Without proof; the proof can be found in [11].
Instead of proving this theorem we’ll try to give an intuition behind the seemingly
technical formula 1.28. Consider the situation where you have some subjective
belief P . Consider also the set of all other probabilistic models M1 (P ) which have
the property that, for an event A, if under your subjective belief it is impossible
for A to happen, then under other probabilistic models from M1 (P ), A cannot
happen.
For a fixed probabilistic model Q we can interpret EQ [X] as the expected value
of the portfolio under this probabilistic model. Using the interpretation of a risk measure as a capital requirement, we can interpret ρ(X) = EQ [−X] as the risk-free
capital you should hold so that your total expected wealth, which consists of the
portfolio and the risk-free capital, equals zero. If your portfolio has a positive
expected value under Q, then the position X is acceptable. Hence ρ(X) ≤ 0 and
X ∈ Aρ .
But which probability measure Q should we pick in the definition of ρ? Instead
of focusing on a specific probabilistic model, we could consider all plausible probabilistic models M1 (P ). We could define ρ as the capital requirement needed in
the worst-case scenario of all these probabilistic models M1 (P ) to make sure our
total expected wealth is always at least zero, i.e.
\[ \rho(X) = \sup_{Q \in \mathcal{M}_1(P)} E_Q[-X]. \tag{1.29} \]
But we did have some beliefs about the probabilistic model. So we would like
to give more importance to probabilistic models which are "more similar" to P
than to the models which deviate a lot from P. This is where the penalty function
comes in. If we let α(Q) be such that it assigns higher values to probabilistic models
Q which deviate a lot from our model P, then they have a smaller influence on
the supremum. Now the question is: how do we measure the similarity between
two probability measures? One way of doing this is using the notion of relative
entropy, or Kullback-Leibler divergence.
Definition 1.8. The Kullback-Leibler divergence or relative entropy of a probability measure Q which is absolutely continuous with respect to a probability measure
P is defined as
\[ KL(Q|P) = E_Q\left[ \ln \frac{dQ}{dP} \right] = \int \frac{dQ}{dP} \ln \frac{dQ}{dP} \, dP, \tag{1.30} \]
where dQ/dP is the Radon-Nikodým derivative of Q with respect to P.
If we want to use relative entropy as a penalty function, it is important to check
that it takes a minimal value for Q = P . We want to penalize the probabilistic
model P the least. This is what we prove in the following lemma.
Lemma 1.6. For all Q ∈ M1 (P ) we have KL(Q|P ) ≥ 0. Furthermore we have
KL(P |P ) = 0.
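Proof. Let f (x) = x ln(x), with the convention f (0) = 0. Then f is a convex function on [0, +∞). By definition, with all expectations taken under P, we have
that
\[
\begin{aligned}
KL(Q|P) &= E\left[ \frac{dQ}{dP} \ln \frac{dQ}{dP} \right] \\
&= E\left[ f\left( \frac{dQ}{dP} \right) \right] \\
&\ge f\left( E\left[ \frac{dQ}{dP} \right] \right) \\
&= E\left[ \frac{dQ}{dP} \right] \ln E\left[ \frac{dQ}{dP} \right] \\
&= 1 \ln(1) = 0,
\end{aligned}
\]
where the inequality follows from Jensen's inequality. It is clear that
\[ KL(P|P) = E\left[ \frac{dP}{dP} \ln \frac{dP}{dP} \right] = E[1 \ln(1)] = 0. \]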
Using the Kullback-Leibler entropy as penalty function we get the following risk measure:
\[ \rho(X) = \sup_{Q \in \mathcal{M}_1(P)} \left( E_Q[-X] - KL(Q|P) \right). \tag{1.31} \]
This risk measure is known as an entropic risk measure. We’ll study this risk
measure in more detail in chapters four and five.
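As an illustration of 1.31, the sketch below evaluates the penalised expected loss on a sample space with three states and compares it with the value ln E_P[exp(−X)], which is the known closed form of this supremum (it is attained at the measure with density proportional to exp(−X)). The reference model p, the payoffs x and all function names are illustrative assumptions, not taken from the thesis; the grid search is only a crude check of the supremum.

```python
import math


def kl_divergence(q, p):
    """KL(Q|P) = sum_i q_i ln(q_i / p_i), with the convention 0 ln 0 = 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def penalised_expected_loss(q, p, x):
    """E_Q[-X] - KL(Q|P) for one candidate model Q."""
    return sum(-qi * xi for qi, xi in zip(q, x)) - kl_divergence(q, p)


def gibbs_measure(p, x):
    """Candidate maximiser: q_i proportional to p_i * exp(-x_i)."""
    weights = [pi * math.exp(-xi) for pi, xi in zip(p, x)]
    total = sum(weights)
    return [w / total for w in weights]


p = [0.5, 0.3, 0.2]      # reference model P (illustrative)
x = [1.0, -0.5, -2.0]    # payoffs of the position X (illustrative)

# Closed form of the supremum in (1.31): ln E_P[exp(-X)].
closed_form = math.log(sum(pi * math.exp(-xi) for pi, xi in zip(p, x)))

# Value of the objective at the candidate maximiser.
q_star = gibbs_measure(p, x)
value_at_q_star = penalised_expected_loss(q_star, p, x)

# Crude grid search over all probability vectors on three states.
steps = 200
grid_best = max(
    penalised_expected_loss([i / steps, j / steps, (steps - i - j) / steps], p, x)
    for i in range(steps + 1)
    for j in range(steps + 1 - i)
)
```

The grid value stays below the closed form and approaches it as the grid is refined, which is what one expects from a supremum that is attained at an interior point.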
We would like to point out that each risk measure that is defined as
\[ \rho(X) = \sup_{Q \in \mathcal{M}_1(P)} \left( E_Q[-X] - \alpha_{\min}(Q) \right), \quad X \in L^\infty, \tag{1.32} \]
is a convex risk measure. To see this it is sufficient to notice that the proof of
theorem 1.11 still works if we take the supremum over all Q ∈ M1 (P ) instead of
over all Q ∈ M1,f . The representation of a convex risk measure in the form of
1.28 is often called the robust representation of a convex risk measure. This name
refers to the fact that we don't pick a fixed probability measure Q to calculate the
risk, but consider all possible scenarios.
Sometimes the supremum in the representation 1.28 is actually a maximum.
Theorem 1.9. Let ρ be a convex risk measure on X. If ρ is continuous
from below, which means that for all sequences Xn :
Xn ↗ X pointwise on Ω ⇒ ρ(Xn ) ↘ ρ(X),
then the supremum in 1.28 is a maximum and we have that
\[ \rho(X) = \max_{Q \in \mathcal{M}_1(P)} \left( E_Q[-X] - \alpha_{\min}(Q) \right), \quad X \in \mathcal{X}. \tag{1.33} \]
Proof. Without proof, see [11, p192].
2 An introduction to decision theory
2.1 The axioms of von Neumann-Morgenstern
When dealing with risk, individual preferences matter. In the first chapter we
defined the acceptance set of a risk measure. The acceptance set contained all
acceptable positions X ∈ X . But we never gave a clear explanation of what an
acceptable position is. This is because whether or not you find a specific position
acceptable depends on your individual preferences and therefore your risk aversion.
In this section we repeat some basic notions from expected utility theory. A
central question in this discussion is "How does a rational investor choose between
different portfolios?" This choice is risky, because the returns of the portfolios
are uncertain and can be modelled using stochastic variables. The attitude of the
investor towards risk can be studied using expected utility theory. Crucial to this
theory is the concept of a preference order over a set of lotteries L.
Definition 2.1. A lottery L is defined as a probability measure over a set of
outcomes, called the outcome space.
In our case the outcome space will be the real axis. These are all the possible
net payoffs of the portfolio, X. The different probability distributions of the net
payoffs of the different portfolios are given by the lotteries.
Definition 2.2. A preference order on the set of lotteries L is defined as a binary
relation ≽ with the following two axioms:
Axiom 1. (Completeness)
∀L1 , L2 ∈ L:
L1 ≽ L2 or L2 ≽ L1 .
Axiom 2. (Transitivity)
∀L1 , L2 , L3 ∈ L:
L1 ≽ L2 and L2 ≽ L3 ⇒ L1 ≽ L3 .
Sometimes a preference order has a numerical representation.
Definition 2.3. A numerical representation of a preference order ≽ is a function
U : L → R such that
L1 ≽ L2 ⇔ U (L1 ) ≥ U (L2 ). (2.1)
A numerical representation is called affine if for all L1 , L2 ∈ L and α ∈ [0, 1],
U (αL1 + (1 − α)L2 ) = αU (L1 ) + (1 − α)U (L2 ). (2.2)
To be sure that there exists a numerical representation and that it is affine, we have to
impose two extra axioms.
Axiom 3. (Independence)
∀L1 , L2 , L3 ∈ L and α ∈ (0, 1], we have that
L1 ≽ L2 ⇒ αL1 + (1 − α)L3 ≽ αL2 + (1 − α)L3 .
Using the concept of a compound lottery, we can give an interpretation of the
independence axiom. The compound lottery is represented by the distribution
αL1 + (1 − α)L3 , and can be interpreted as a two-step procedure where first a
choice is made between lottery L1 and lottery L3 , with probabilities α and 1 − α
respectively, and then the chosen lottery is played. The axiom of independence
states that if we prefer lottery L1 to lottery L2 , we must also prefer the compound
lottery αL1 + (1 − α)L3 to αL2 + (1 − α)L3 .
Axiom 4. (Continuity) ∀L1 , L2 , L3 ∈ L, the following sets are closed:
{α ∈ [0, 1] | αL1 + (1 − α)L2 ≽ L3 } ⊂ [0, 1]
{α ∈ [0, 1] | L3 ≽ αL1 + (1 − α)L2 } ⊂ [0, 1]
Theorem 2.1. If ≽ is a preference order that satisfies the axiom of independence
and the axiom of continuity, then there exists an affine numerical representation U
of ≽. This U is not unique, but it is unique up to a positive affine transformation. This
means that any other affine numerical representation Ũ of ≽ is of the form Ũ = aU + b,
with a > 0 and b ∈ R.
Proof. Without proof, see e.g. [11, p. 58].
Sometimes this numerical affine representation has a special form, called the von
Neumann-Morgenstern representation.
Definition 2.4. A numerical representation of a preference order ≽ is a von
Neumann-Morgenstern representation if it is of the form
\[ U(L) = \int u(x) \, L(dx) \quad \forall L \in \mathcal{L}, \tag{2.3} \]
where u is a real function of the outcomes. We will call this function u the utility
function.
In the case that the outcome space is not finite it is generally not guaranteed that
the numerical representation will be of the von Neumann-Morgenstern form. But
if there is a von Neumann-Morgenstern representation then both U and u are only
unique up to affine transformation.
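For lotteries with finitely many outcomes the integral in 2.3 is a finite sum, which makes the two properties above easy to check numerically. The sketch below is only an illustration: the lotteries, the utility function and the constants 3 and 7 are hypothetical choices. It verifies that the representation is affine in the sense of 2.2 and that the transformed utility 3u + 7 ranks the two lotteries in the same way.

```python
import math


def expected_utility(lottery, u):
    """U(L) = sum over outcomes x of u(x) * L({x}) for a finite lottery."""
    return sum(prob * u(x) for x, prob in lottery.items())


def mix(alpha, l1, l2):
    """Compound lottery alpha * L1 + (1 - alpha) * L2."""
    outcomes = set(l1) | set(l2)
    return {x: alpha * l1.get(x, 0.0) + (1 - alpha) * l2.get(x, 0.0) for x in outcomes}


def u(x):
    """Illustrative increasing and concave utility function."""
    return 1.0 - math.exp(-0.1 * x)


def u_tilde(x):
    """Positive affine transformation of u (a = 3, b = 7)."""
    return 3.0 * u(x) + 7.0


lottery_1 = {-10: 0.2, 0: 0.3, 20: 0.5}   # outcome -> probability
lottery_2 = {5: 1.0}                      # a sure payoff of 5
alpha = 0.4

mixed = mix(alpha, lottery_1, lottery_2)

# Affinity (2.2): U(alpha L1 + (1 - alpha) L2) = alpha U(L1) + (1 - alpha) U(L2).
affine_gap = abs(
    expected_utility(mixed, u)
    - (alpha * expected_utility(lottery_1, u) + (1 - alpha) * expected_utility(lottery_2, u))
)

# A positive affine transformation of u induces the same ranking.
same_ranking = (
    (expected_utility(lottery_1, u) >= expected_utility(lottery_2, u))
    == (expected_utility(lottery_1, u_tilde) >= expected_utility(lottery_2, u_tilde))
)
```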
For an interpretation of the von Neumann-Morgenstern representation and to understand why it is useful, consider a fixed preference
relation ≽ which has a von
Neumann-Morgenstern representation U (L) = ∫ u(x) L(dx). In our context the
lottery L can be interpreted as the probability distribution that characterizes the
returns of our investment, modelled by a stochastic variable X. We will assume
that we can make a loss or a profit, such that the outcome space is the whole real
line. Taking the integral over our outcome space R gives us that U (L) = E [u(X)].
In the expected utility framework a rational investor with utility function u will
rank different portfolios based on their expected utility:
X1 ≽ X2 ⇔ E [u(X1 )] ≥ E [u(X2 )] . (2.4)
2.2 Risk and utility
In this section we will only consider investors whose preference order admits a
von Neumann-Morgenstern utility representation. The utility function u of such
an investor reveals his attitude towards risk.
Definition 2.5. We will call a preference order ≽ (strictly) risk averse if and only
if u is (strictly) concave.
This definition should not come as a surprise. From Jensen’s inequality we know
that if u is a concave function we have that
u (E [X]) ≥ E [u (X)] .
(2.5)
If u is strictly concave and X is not almost surely constant, the inequality is strict. In the expected utility context,
Jensen's inequality states that when a rational risk averse investor has to choose
between taking a gamble X or getting a certain amount equal to the expected
payoff of the gamble E [X], he will prefer the certain amount. This is because
his utility for taking the certain amount, u (E [X]), is at least as high as the expected
utility he receives when he takes the gamble, i.e. E [u (X)] .
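The sign of the difference u(E[X]) − E[u(X)] can be checked directly on a small gamble. The sketch below is a minimal illustration; the gamble and the three utility functions are hypothetical choices, one concave, one linear and one convex.

```python
import math

# Illustrative gamble: three possible net payoffs with their probabilities.
outcomes = [-50.0, 0.0, 100.0]
probs = [0.3, 0.3, 0.4]


def expect(f):
    """E[f(X)] for the discrete gamble above."""
    return sum(p * f(x) for x, p in zip(outcomes, probs))


mean = expect(lambda x: x)


def concave_u(x):
    return 1.0 - math.exp(-0.01 * x)


def linear_u(x):
    return 2.0 * x + 1.0


def convex_u(x):
    return math.exp(0.01 * x)


def jensen_gap(u):
    """u(E[X]) - E[u(X)]: positive means the sure amount E[X] is preferred to the gamble."""
    return u(mean) - expect(u)
```

With these choices `jensen_gap(concave_u)` is positive, `jensen_gap(linear_u)` vanishes up to rounding and `jensen_gap(convex_u)` is negative, in line with inequality 2.5.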
Similarly, if the utility function of an investor is convex this means the investor is
risk loving. And if the utility function of an investor is a linear function it means
that the investor is risk neutral.
This gives us another way to look at utility functions in the context of financial
mathematics. Concave utility functions can be viewed as a way to make risky
investments less valuable. Conceptually it is comparable to the discount factor
used to make future payoffs less valuable.
If the utility function u is twice differentiable, we could analyse the concavity of a
utility function using the second derivative of this function. However, two remarks
must be made about this approach. Firstly, the second derivative is a local measure
and will reflect the local risk aversion. Secondly, it is impossible to compare the
risk aversion of two utility functions using only the second derivative. This second
problem is a direct consequence of the fact that the von Neumann-Morgenstern
utility representation is only unique up to an affine transformation. The utility
functions u1 (x) = −x² and u2 (x) = −2x² have as second derivatives respectively
−2 and −4. But both utility functions can represent the same preference order.
One way to deal with this problem is to use the Arrow-Pratt coefficient of absolute
risk aversion.
Definition 2.6. The Arrow-Pratt coefficient of absolute risk aversion of a twice
differentiable utility function u is defined as
\[ r_A(x) = -\frac{u''(x)}{u'(x)}. \tag{2.6} \]
We will use the notation r_A^u(x) if we want to specify the used utility function u.
By dividing u''(x) by u'(x) we have made sure that all positive affine transformations of a
utility function u have the same coefficient of absolute risk aversion. The minus
sign makes sure that positive values of the coefficient of absolute risk aversion reflect
a risk averse investor.
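The invariance under positive affine transformations can be verified numerically. The sketch below approximates the coefficient with central finite differences; the step size h, the utility functions and the constants 3 and 7 are illustrative assumptions, and the approximation requires u'(x) > 0 at the point of evaluation.

```python
import math


def arrow_pratt(u, x, h=1e-3):
    """Approximate r_A(x) = -u''(x) / u'(x) with central finite differences."""
    first = (u(x + h) - u(x - h)) / (2 * h)
    second = (u(x + h) - 2 * u(x) + u(x - h)) / (h * h)
    return -second / first


a = 0.5  # illustrative level of risk aversion


def u(x):
    """Exponential utility: its coefficient of absolute risk aversion is the constant a."""
    return (1.0 - math.exp(-a * x)) / a


def u_affine(x):
    """Positive affine transformation of u."""
    return 3.0 * u(x) + 7.0


def log_u(x):
    """Logarithmic utility on x > 0: its coefficient is 1 / x."""
    return math.log(x)
```

Here `arrow_pratt(u, 1.0)` and `arrow_pratt(u_affine, 1.0)` agree up to discretisation error, even though the second derivatives of the two functions differ by a factor 3.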
For an interpretation of the numerical value of this coefficient consider
\[ r_A(x) = -\frac{u''(x)}{u'(x)} = -\frac{du'/dx}{u'} = -\frac{du'/u'}{dx}. \tag{2.7} \]
Here u'(x) is called the marginal utility. The marginal utility measures the
increase of utility per unit of increase in payoff of the portfolio. It would be rational
to assume non-saturation, that is u'(x) ≥ 0, which means that the utility is non-decreasing when the payoff of the portfolio increases. The Arrow-Pratt coefficient
of absolute risk aversion can be interpreted as the percentage decrease in marginal
utility per unit of increase in net payoffs of the portfolio. E.g. if rA = 0.01 this
means that in the neighbourhood of x the investor's marginal utility is decreasing
at the rate of 1% per unit of increase in the net payoff. As a little remark we
would like to point out that if the net payoff of the portfolio is expressed in euro, then
the unit of rA (x) is 1/euro. However, generally the units of the Arrow-Pratt measure
of absolute risk aversion are omitted, and we'll do the same.
Given the functional form of the Arrow-Pratt measure of risk aversion, you could
find a utility function satisfying it by solving the following second order linear differential
equation:
\[ u''(x) + r_A(x) u'(x) = 0. \tag{2.8} \]
Solving this equation is fairly straightforward and we'll do this in the next theorem.
Theorem 2.2. The solutions to equation 2.8 are given by
\[ u(x) = \int_1^x C_1 \exp\left( \int_1^\eta -r_A(\zeta) \, d\zeta \right) d\eta + C_2, \tag{2.9} \]
where C1 and C2 are two constants and rA (x) is the Arrow-Pratt coefficient of
absolute risk aversion.
Proof. Using the substitution v(x) = u'(x) in equation 2.8, we find
\[ v'(x) = -r_A(x) v(x). \]
Rewriting this (for v(x) > 0) we find
\[ \frac{v'(x)}{v(x)} = -r_A(x) \iff \left( \ln v(x) \right)' = -r_A(x). \]
Integrating both sides we get
\[ \ln(v(\eta)) = \int_1^\eta -r_A(\zeta) \, d\zeta + C. \]
This gives us
\[ v(\eta) = C_1 \exp\left( \int_1^\eta -r_A(\zeta) \, d\zeta \right). \]
Since v(x) = u'(x) we find that
\[ u(x) = \int_1^x C_1 \exp\left( \int_1^\eta -r_A(\zeta) \, d\zeta \right) d\eta + C_2. \]
2.3 Certainty equivalents
Certainty equivalents will play a crucial role in this thesis, since they are strongly
linked to risk-measures. In this section we will take a look at three different
certainty equivalents. We will consider the ordinary certainty equivalent, the optimised certainty equivalent and the certainty equivalent resulting from the zero-utility principle.
2.3.1 The ordinary certainty equivalent
Definition 2.7. The (ordinary) certainty equivalent CE_u(X) of a stochastic variable X with distribution L is the amount of money for which an individual is
indifferent between X and the certain amount CE_u(X). This means
\[ u\left( CE_u(X) \right) = \int u(x) \, L(dx) = E[u(X)]. \tag{2.10} \]
Sometimes we will use the notation CE_u^L(X) if we explicitly want to specify the
distribution L of X.
If an investor is risk averse we have that u (E [X]) ≥ E [u (X)]. This means that
\[ CE_u(X) = u^{-1}\left( E[u(X)] \right) \le E[X]. \tag{2.11} \]
This coincides with our intuition about risk aversion. When faced with the choice
between a gamble X and a risk-free amount CE_u(X), it is possible that a risk
averse investor will choose the certain amount even if it is less than the expected
payoff of the gamble. So far we have seen different ways to assess the attitude of
an investor towards risk. The following theorem links these different concepts.
Theorem 2.3. Given two investors with utility functions u1 and u2 respectively,
then the following statements are equivalent.
1. r_A^{u2}(x) ≥ r_A^{u1}(x) for every x.
2. There exists an increasing concave function φ(·) such that u2 (x) = φ(u1 (x))
at all x. This means that u2 can be seen as a concave transformation of u1 .
3. CE_{u2}^L(X) ≤ CE_{u1}^L(X) for all L.
4. Whenever the second investor with utility function u2 finds a lottery L at
least as good as a risk-free outcome x̄, then the first investor with utility
function u1 also finds the lottery L at least as good as x̄. So
∫ u2 (x) L(dx) ≥ u2 (x̄) ⇒ ∫ u1 (x) L(dx) ≥ u1 (x̄) for all L and x̄.
Proof. Without proof, see [20, p191].
All expressions in theorem 2.3 reflect the fact that the second investor is more risk
averse than the first investor. Remember that in expected utility theory, a rational
investor is able to rank different portfolios using their expected utility. It is easy
to see that if the investor has an increasing utility function, this ranking can also
be obtained when the investor uses his ordinary certainty equivalent, because:
\[ X_1 \succeq X_2 \iff E[u(X_1)] \ge E[u(X_2)] \iff u^{-1}\left( E[u(X_1)] \right) \ge u^{-1}\left( E[u(X_2)] \right). \tag{2.12} \]
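The ordinary certainty equivalent can be computed without an explicit formula for the inverse of u. The sketch below solves u(c) = E[u(X)] by bisection, assuming u is continuous and strictly increasing; the two portfolios and the utility function are illustrative choices. It then checks inequality 2.11 and the equivalence of the two rankings in 2.12.

```python
import math


def expected_utility(outcomes, probs, u):
    """E[u(X)] for a discrete stochastic variable X."""
    return sum(p * u(x) for x, p in zip(outcomes, probs))


def certainty_equivalent(outcomes, probs, u, tol=1e-10):
    """Solve u(c) = E[u(X)] by bisection; assumes u is continuous and strictly increasing."""
    target = expected_utility(outcomes, probs, u)
    lo, hi = min(outcomes), max(outcomes)  # the solution lies between the extreme payoffs
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if u(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def u(x):
    """Illustrative increasing and concave utility function."""
    return 1.0 - math.exp(-0.02 * x)


outcomes_1, probs_1 = [-20.0, 10.0, 60.0], [0.2, 0.5, 0.3]   # portfolio X1
outcomes_2, probs_2 = [0.0, 15.0], [0.5, 0.5]                # portfolio X2

eu_1 = expected_utility(outcomes_1, probs_1, u)
eu_2 = expected_utility(outcomes_2, probs_2, u)
ce_1 = certainty_equivalent(outcomes_1, probs_1, u)
ce_2 = certainty_equivalent(outcomes_2, probs_2, u)
mean_1 = sum(p * x for x, p in zip(outcomes_1, probs_1))

# Ranking by expected utility and ranking by certainty equivalent agree.
rankings_agree = (eu_1 >= eu_2) == (ce_1 >= ce_2)
```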
2.3.2 The optimised certainty equivalent
A certainty equivalent which will play an important role in the study of so-called
divergence risk measures is the optimised certainty equivalent.
Definition 2.8. The optimised certainty equivalent OCE_u(X) of a stochastic variable X of an investor with a concave utility function u is defined as
\[ OCE_u(X) = \sup_{\eta \in \mathbb{R}} \left( \eta + E[u(X - \eta)] \right). \tag{2.13} \]
Before we give the economic intuition behind this certainty equivalent, we will
take a closer look at definition 2.8. When we do this, we immediately notice
a problem. Consider u1 (x) and u2 (x) = u1 (x) + b with b ≠ 0. From the von
Neumann-Morgenstern utility theory we know that both utility functions represent
the same preferences. However, because of the linearity of the expected value we
have OCE_{u2}(X) = OCE_{u1}(X) + b ≠ OCE_{u1}(X). This means that OCE_u
is not invariant under an affine transformation of the utility function, which is
an undesirable property for a certainty equivalent and makes the interpretation
difficult.
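For completeness, the shift by b can be written out in one line from 2.13; nothing beyond the linearity of the expected value is used, and the constant b does not depend on η:

```latex
\[
\begin{aligned}
OCE_{u_2}(X) &= \sup_{\eta \in \mathbb{R}} \left( \eta + E[u_1(X - \eta) + b] \right) \\
             &= \sup_{\eta \in \mathbb{R}} \left( \eta + E[u_1(X - \eta)] \right) + b \\
             &= OCE_{u_1}(X) + b.
\end{aligned}
\]
```

As a simple instance, for the risk neutral utility u1(x) = x the objective η + E[X − η] equals E[X] for every η, so OCE_{u1}(X) = E[X], whereas u2(x) = x + b gives E[X] + b.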
The authors of [5], who introduced this certainty equivalent, only defined the optimised certainty equivalent for a limited class U0 of "normalised" utility functions.
Definition 2.9. Let u : R → [−∞, +∞) be a proper¹, closed, concave and non-decreasing utility function with effective domain dom u = {t ∈ R | u(t) > −∞} ≠ ∅.
Then u is contained in the class of normalised utility functions U0 if u satisfies
u(0) = 0 and 1 ∈ ∂u(0), where ∂u(·) is the subdifferential² map of u.
If u is differentiable at 0 then the two normalisation properties of definition 2.9
yield u(0) = 0 and u'(0) = 1. Since the utility functions in U0 are non-decreasing
and u(0) = 0, we have that for u ∈ U0 , u(x) ≥ 0 for all x ≥ 0. We also have that
for u ∈ U0 and for all x ∈ R, u(x) ≤ x because of the concavity of the utility
function and 1 ∈ ∂u(0).
We will now try to give an intuition behind the definition of the optimised certainty
equivalent. Let X denote the net payoff of our portfolio. Then E [u(X)] can be
interpreted as the sure present value of the net payoff of our portfolio. Now consider an
investor who can choose to receive a part η of the future net payoff of the portfolio
X, giving him a total present value of η + E [u(X − η)]. If the investor were to
optimise this choice, he would receive the maximum of η + E [u(X − η)] over η ∈ R. However, since it
is not always guaranteed that a maximum exists, the optimised certainty equivalent is
defined using a supremum. From [4] we have the following properties and proofs.
Theorem 2.4. For u ∈ U0 the optimised certainty equivalent has the following properties:
1. (Monotonicity) X ≤ Y ⇒ OCE_u(X) ≤ OCE_u(Y )
2. (Shift additive) For all c ∈ R we have OCE_u(X + c) = OCE_u(X) + c.
3. (Risk aversion) u(x) ≤ x for all x if and only if OCE_u(X) ≤ E [X] for all X.
4. (Concavity) For all stochastic variables X1 and X2 and all λ ∈ [0, 1] we have
OCE_u(λX1 + (1 − λ)X2 ) ≥ λ OCE_u(X1 ) + (1 − λ) OCE_u(X2 ).
Proof. The proofs of these properties follow from straightforward calculations, as
demonstrated below.
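1. (Monotonicity)
Because u is non-decreasing we have that
\[
\begin{aligned}
X \le Y &\Rightarrow X - \eta \le Y - \eta \\
&\Rightarrow E[u(X - \eta)] \le E[u(Y - \eta)] \\
&\Rightarrow \eta + E[u(X - \eta)] \le \eta + E[u(Y - \eta)] \\
&\Rightarrow \sup_{\eta \in \mathbb{R}} \left( \eta + E[u(X - \eta)] \right) \le \sup_{\eta \in \mathbb{R}} \left( \eta + E[u(Y - \eta)] \right) \\
&\Rightarrow OCE_u(X) \le OCE_u(Y).
\end{aligned}
\]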
¹ This means that u(·) is such that u(x) < +∞ for all x, and u(x) > −∞ for at least one x.
² For a concave function the subdifferential at x0 is defined as the set ∂u(x0 ) = {c ∈ R | u(x) − u(x0 ) ≤ c(x − x0 ) for all x ∈ R}. In this case, with u(0) = 0, 1 ∈ ∂u(0) ⇔ u(x) ≤ x for all x ∈ R.
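2. (Shift additive)
\[
\begin{aligned}
OCE_u(X + c) &= \sup_{\eta \in \mathbb{R}} \left( \eta + E[u(X + c - \eta)] \right) \\
&= \sup_{\eta \in \mathbb{R}} \left( \eta - c + E[u(X - (\eta - c))] \right) + c \\
&= \sup_{(\eta - c) \in \mathbb{R}} \left( \eta - c + E[u(X - (\eta - c))] \right) + c \\
&= OCE_u(X) + c.
\end{aligned}
\]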
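3. (Risk aversion)
First suppose u(x) ≤ x for all x. Then
\[
\begin{aligned}
OCE_u(X) &= \sup_{\eta \in \mathbb{R}} \left( \eta + E[u(X - \eta)] \right) \\
&\le \sup_{\eta \in \mathbb{R}} \left( \eta + E[X - \eta] \right) \\
&= \sup_{\eta \in \mathbb{R}} \left( \eta + E[X] - \eta \right) \\
&= \sup_{\eta \in \mathbb{R}} E[X] \\
&= E[X].
\end{aligned}
\]
Now suppose OCE_u(X) ≤ E [X] for all X. Then for all X we have
\[
\begin{aligned}
&\sup_{\eta \in \mathbb{R}} \left( \eta + E[u(X - \eta)] \right) \le E[X] \\
&\Rightarrow \eta + E[u(X - \eta)] \le E[X] \quad \forall \eta \in \mathbb{R} \\
&\Rightarrow E[u(X - \eta)] \le E[X - \eta] \quad \forall \eta \in \mathbb{R} \\
&\Rightarrow E[u(X)] \le E[X],
\end{aligned}
\]
where the last step takes η = 0. Since this is true for all X, it is true in particular for every constant X = x ∈ R. Hence we
can conclude that u(x) ≤ x.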
4. (Concavity) For all λ ∈ [0, 1], let Xλ = λX1 + (1 − λ)X2 . Because of the
concavity of u we have for all η1 , η2 ∈ R that
\[ E[u(\lambda X_1 + (1 - \lambda) X_2 - \lambda \eta_1 - (1 - \lambda) \eta_2)] \ge \lambda E[u(X_1 - \eta_1)] + (1 - \lambda) E[u(X_2 - \eta_2)]. \]
Notice that λη1 + (1 − λ)η2 ∈ R. Adding this to both sides we find that
\[
\begin{aligned}
&\lambda \eta_1 + (1 - \lambda) \eta_2 + E[u(\lambda X_1 + (1 - \lambda) X_2 - \lambda \eta_1 - (1 - \lambda) \eta_2)] \\
&\quad \ge \lambda \eta_1 + \lambda E[u(X_1 - \eta_1)] + (1 - \lambda) \eta_2 + (1 - \lambda) E[u(X_2 - \eta_2)].
\end{aligned}
\]
Because the mapping (η1 , η2 ) ↦ λη1 + (1 − λ)η2 defines a surjection from R² to R,
taking the supremum of both sides over η1 , η2 ∈ R gives
\[
\begin{aligned}
OCE_u(X_\lambda) &= \sup_{\eta_1, \eta_2 \in \mathbb{R}} \left\{ \lambda \eta_1 + (1 - \lambda) \eta_2 + E[u(X_\lambda - \lambda \eta_1 - (1 - \lambda) \eta_2)] \right\} \\
&\ge \sup_{\eta_1, \eta_2 \in \mathbb{R}} \left\{ \lambda \left( \eta_1 + E[u(X_1 - \eta_1)] \right) + (1 - \lambda) \left( \eta_2 + E[u(X_2 - \eta_2)] \right) \right\} \\
&= \lambda \sup_{\eta_1 \in \mathbb{R}} \left( \eta_1 + E[u(X_1 - \eta_1)] \right) + (1 - \lambda) \sup_{\eta_2 \in \mathbb{R}} \left( \eta_2 + E[u(X_2 - \eta_2)] \right) \\
&= \lambda \, OCE_u(X_1) + (1 - \lambda) \, OCE_u(X_2).
\end{aligned}
\]
This proves the concavity property of the optimised certainty equivalent.
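The four properties can also be observed numerically. The sketch below is a minimal illustration under stated assumptions: the utility function u is a hypothetical member of U0 (it is concave, non-decreasing, with u(0) = 0 and u'(0) = 1), the positions x and y are illustrative payoffs on four equally likely scenarios, and the supremum is located with a ternary search, which is valid here because the objective is concave in η and, for this particular u, is maximised between min(X) − 1 and max(X).

```python
def u(x):
    """Illustrative normalised utility: u(0) = 0, u'(0) = 1, concave and non-decreasing."""
    return x - 0.5 * x * x if x <= 1.0 else 0.5


def oce(payoffs, probs, iterations=200):
    """Approximate sup over eta of (eta + E[u(X - eta)]) by ternary search."""

    def objective(eta):
        return eta + sum(p * u(x - eta) for x, p in zip(payoffs, probs))

    # For this u the objective increases below min(X) - 1 and decreases above max(X).
    lo, hi = min(payoffs) - 1.0, max(payoffs)
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))


probs = [0.25, 0.25, 0.25, 0.25]   # four equally likely scenarios
x = [-1.0, 0.0, 0.5, 2.0]          # payoffs of X per scenario
y = [-0.5, 0.2, 1.0, 2.5]          # payoffs of Y per scenario, with Y >= X
c, lam = 3.0, 0.3

mean_x = sum(p * xi for xi, p in zip(x, probs))
shifted = [xi + c for xi in x]
mixture = [lam * xi + (1 - lam) * yi for xi, yi in zip(x, y)]

shift_gap = abs(oce(shifted, probs) - (oce(x, probs) + c))
monotone = oce(x, probs) <= oce(y, probs) + 1e-12
risk_averse = oce(x, probs) <= mean_x + 1e-12
concave = oce(mixture, probs) >= lam * oce(x, probs) + (1 - lam) * oce(y, probs) - 1e-9
```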
It is now natural to ask if the optimised certainty equivalent provides the same
ranking on the portfolios as the ordinary certainty equivalent, or the expected
utility criterion. It is stated in [5] that this will not always be the case. Theorem 2.6
links the optimised certainty equivalent and the ordinary certainty equivalent. In
this theorem we need to assume that the supremum in the definition of OCE_u(X)
is attained for an η ∈ R. This will be the case if the support of X is a closed
bounded interval. From [5] we have the following theorems and proof.
Theorem 2.5. If u ∈ U0 and if X is a stochastic variable with support a closed
bounded interval, then the supremum in the definition of the optimised certainty
equivalent is attained. I.e.
∃η ∈ R : OCEu (X) = η + E [u(X − η)] .
Proof. Without proof, see [5].
Theorem 2.6. If X and Y are stochastic variables with a compact support, then
OCEu (X) ≥ OCEu (Y ) ∀u ∈ U0 ⇔ CEu (X) ≥ CEu (Y ) ∀u ∈ U0 .
(2.14)
Proof. First assume that CE_u(X) ≥ CE_u(Y ) ∀u ∈ U0 . Using that u is non-decreasing we find that ∀u ∈ U0 and η ∈ R:
\[
\begin{aligned}
CE_u(X) \ge CE_u(Y) &\Rightarrow E[u(X)] \ge E[u(Y)] \\
&\Rightarrow \eta + E[u(X - \eta)] \ge \eta + E[u(Y - \eta)] \\
&\Rightarrow \sup_{\eta \in \mathbb{R}} \left( \eta + E[u(X - \eta)] \right) \ge \sup_{\eta \in \mathbb{R}} \left( \eta + E[u(Y - \eta)] \right) \\
&\Rightarrow OCE_u(X) \ge OCE_u(Y),
\end{aligned}
\]
where the first two implications follow from the fact that u is non-decreasing. Now
assume that OCE_u(X) ≥ OCE_u(Y ). Because X and Y have compact supports, the
supremum in the optimised certainty equivalents is attained. Hence for every u ∈ U0
there exist η_X^u and η_Y^u such that OCE_u(X) = η_X^u + E [u(X − η_X^u)]
and OCE_u(Y ) = η_Y^u + E [u(Y − η_Y^u)]. We have for any u ∈ U0 that
\[ OCE_u(X) = \eta_X^u + E[u(X - \eta_X^u)] \ge \eta_Y^u + E[u(Y - \eta_Y^u)] \ge \eta_X^u + E[u(Y - \eta_X^u)]. \]
We conclude that for any u ∈ U0 , E [u(X − η_X^u)] ≥ E [u(Y − η_X^u)], which implies
that E [u(X)] ≥ E [u(Y )]. Again using the fact that u is non-decreasing this implies
that CE_u(X) ≥ CE_u(Y ).
We would like to remark that we do not necessarily need the fact that both X and
Y have compact support. We only need to be sure that the supremum is attained
for all utility functions u.
2.3.3 The u-Mean certainty equivalent
Definition 2.10. The u-Mean certainty equivalent, Mu (X) of a stochastic variable
X is defined by the equation
E [u (X − Mu (X))] = 0.
(2.15)
Where u is a strictly increasing utility function. The equation 2.15 is known as
the principle of zero utility.
Notice that the u-Mean certainty equivalent also has the problem that it is not
invariant under an affine transformation of the utility function u. When u is
non-decreasing one can give a more general definition of the u-Mean certainty
equivalent,
Mu (X) = sup{m ∈ R|E [u(X − m)] ≥ 0}.
(2.16)
In the fourth chapter of this thesis we will derive a relation between the u-Mean
certainty equivalent and the optimised certainty equivalent. We will also show that
under reasonable assumptions, the zero-utility principle has a unique solution.
2.4 The exponential utility function
The exponential utility function will be an important utility function in this thesis,
since it is strongly connected with the concept of an entropic risk measure. In this
section I will therefore apply the concepts described above to the exponential
utility function.
The exponential utility function occurs when we model an investor with constant
absolute risk aversion. Let a, with a > 0, be the coefficient of absolute risk aversion
of a risk averse investor. Using theorem 2.2 we find that
\[ u(x) = \int_1^x C_1 \exp\left( \int_1^\eta -a \, d\zeta \right) d\eta + C_2. \]
Calculating these integrals we have that
\[ u(x) = \frac{-C_1}{a} \left( \exp(-a(x - 1)) - 1 \right) + C_2. \]
We will choose the constants C1 and C2 such that u ∈ U0 . The condition u'(0) = 1
gives us that C1 = exp (−a), and the condition u(0) = 0 then gives us that
C2 = (1/a)(1 − exp (−a)). Using these constants the utility function becomes
\[ u(x) = \frac{1 - \exp(-ax)}{a}. \tag{2.17} \]
We will take a look at the different kinds of certainty equivalents. It is stated in
[5] that for the exponential utility function 2.17 the ordinary certainty equivalent,
the optimised certainty equivalent and the u-Mean certainty equivalent coincide.
We will provide a proof of this statement in this thesis.
Theorem 2.7. If u(x) = (1 − exp(−ax))/a then
\[ CE_u(X) = OCE_u(X) = M_u(X) = -\frac{1}{a} \ln\left( E[\exp(-aX)] \right). \tag{2.18} \]
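Proof. Using the definitions from section 2.3, we'll compute all the different certainty equivalents.
• The ordinary certainty equivalent CE_u(X)
Notice that u⁻¹(x) = −ln(1 − ax)/a. Hence
\[
\begin{aligned}
CE_u(X) &= u^{-1}\left( E[u(X)] \right) \\
&= \frac{-\ln\left( 1 - a E\left[ \frac{1 - \exp(-aX)}{a} \right] \right)}{a} \\
&= \frac{-1}{a} \ln\left( 1 - E[1 - \exp(-aX)] \right) \\
&= \frac{-1}{a} \ln\left( E[\exp(-aX)] \right).
\end{aligned}
\]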
• The optimised certainty equivalent $OCE_u(X)$
Consider the function f(η) = η + E[u(X − η)]. For the optimised certainty
equivalent we are interested in the supremum of this function. If f(η) has a
maximum, this maximum will be equal to the supremum.
The first order condition gives
\[
\begin{aligned}
0 &= \frac{d}{d\eta} \left( \eta + E[u(X - \eta)] \right) \\
&= 1 + \frac{d}{d\eta} E\left[ \frac{1 - \exp(-a(X - \eta))}{a} \right] \\
&= 1 + \frac{d}{d\eta} \, \frac{1 - \exp(a\eta) E[\exp(-aX)]}{a} \\
&= 1 - \exp(a\eta) E[\exp(-aX)].
\end{aligned}
\]
Solving this last equation for η we find
\[ \eta^* = \frac{-\ln E[\exp(-aX)]}{a}. \]
To show that the function f(η) attains a maximum at $\eta^*$ it is sufficient to notice that for a > 0 the second derivative $f''(\eta) = -a \exp(a\eta) E[\exp(-aX)]$
is negative.
We can conclude that
\[
\begin{aligned}
OCE_u(X) &= \eta^* + E\left[ \frac{1 - \exp(-a(X - \eta^*))}{a} \right] \\
&= \frac{-\ln E[\exp(-aX)]}{a} + E\left[ \frac{1 - \exp\left( -aX - \ln E[\exp(-aX)] \right)}{a} \right] \\
&= \frac{-\ln E[\exp(-aX)]}{a} + \frac{1}{a} - \frac{1}{a} E\left[ \exp(-aX) \exp\left( -\ln E[\exp(-aX)] \right) \right] \\
&= \frac{-\ln E[\exp(-aX)]}{a} + \frac{1}{a} - \frac{1}{a} E\left[ \frac{\exp(-aX)}{E[\exp(-aX)]} \right] \\
&= \frac{-\ln E[\exp(-aX)]}{a}.
\end{aligned}
\]
• The u-Mean $M_u(X)$
We will show that $M_u(X) = \frac{-\ln E[\exp(-aX)]}{a}$ satisfies the principle of zero
utility.
\[
\begin{aligned}
E[u(X - M_u(X))] &= \frac{1}{a} E\left[ 1 - \exp\left( -aX - \ln E[\exp(-aX)] \right) \right] \\
&= \frac{1}{a} E\left[ 1 - \exp(-aX) \exp\left( -\ln E[\exp(-aX)] \right) \right] \\
&= \frac{1}{a} \left( 1 - \frac{E[\exp(-aX)]}{E[\exp(-aX)]} \right) \\
&= 0.
\end{aligned}
\]
Because u is a strictly increasing and continuous function, $M_u(X)$ is the
unique solution of the zero-utility principle.
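Theorem 2.7 can be checked numerically on a small example. The sketch below is only an illustration: the three-point payoff distribution, the value a = 0.5 and all function names are assumptions made for this example and do not come from the thesis. It evaluates the three certainty equivalents from their definitions and compares them with the closed form 2.18.

```python
import math

def exp_utility(x, a):
    """Exponential utility u(x) = (1 - exp(-a x)) / a from equation 2.17."""
    return (1.0 - math.exp(-a * x)) / a

def expectation(f, portfolio):
    """E[f(X)] for a payoff X given as a list of (outcome, probability) pairs."""
    return sum(p * f(x) for x, p in portfolio)

# Hypothetical payoff distribution and risk aversion, chosen only for illustration.
portfolio = [(-2.0, 0.2), (1.0, 0.5), (4.0, 0.3)]
a = 0.5

# Closed form of equation 2.18: -(1/a) ln E[exp(-a X)].
closed_form = -math.log(expectation(lambda x: math.exp(-a * x), portfolio)) / a

# Ordinary certainty equivalent: u^{-1}(E[u(X)]) with u^{-1}(y) = -ln(1 - a y) / a.
expected_utility = expectation(lambda x: exp_utility(x, a), portfolio)
ce = -math.log(1.0 - a * expected_utility) / a

# Optimised certainty equivalent: f(eta) = eta + E[u(X - eta)] evaluated at eta*.
def f(eta):
    return eta + expectation(lambda x: exp_utility(x - eta, a), portfolio)

oce = f(closed_form)

# u-Mean: E[u(X - m)] should vanish at m = closed_form (zero-utility principle).
zero_utility_residual = expectation(lambda x: exp_utility(x - closed_form, a), portfolio)

print(ce, oce, closed_form, zero_utility_residual)
```

Evaluating f slightly to the left and to the right of the closed-form value gives smaller values, in line with the second-order condition used in the proof.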
2.5 Stochastic dominance
Expected utility theory states that an investor with a utility function u would
prefer X1 to X2 if and only if E [u(X1 )] ≥ E [u(X2 )]. In this section we will take a
look at a situation where X1 is preferred to X2 , not only for one investor with a
specific utility function u, but for a whole class of investors with different utility
functions. This study can be done using the concept of stochastic dominance.
The main goal of this section is to introduce the necessary concepts and definitions
regarding stochastic dominance so that we can apply them later on in concrete
situations. The definitions and theorems in this section are taken from [29].
As always the stochastic variable X will be the net payoff of a portfolio, and can take
negative as well as positive values. We will assume that the distribution function of X is
given by F(x) and that its derivative exists, so that the probability density of X
is well defined. We will further assume that the utility function u is sufficiently
differentiable.
An important concept in the study of stochastic dominance is the n-th order
distribution of a stochastic variable X. The n-th order distribution can be defined
inductively as follows.
Definition 2.11. The n-th order distribution function, $F^{(n)}(x)$, of a stochastic
variable X is inductively defined by
\[ F^{(1)}(x) := F(x), \qquad F^{(n)}(x) := \int_{-\infty}^{x} F^{(n-1)}(u) \, du, \tag{2.19} \]
where F(x) is the cumulative distribution function of X.
Using the notion of n-th order distribution functions we can define the concept
of n-th order stochastic dominance.
Definition 2.12. If $X_1$ and $X_2$ are random variables, then $X_1$ dominates $X_2$ in
the sense of n-th order stochastic dominance, $X_1 \geq_{SD(n)} X_2$, if
\[ F_1^{(n)}(x) \leq F_2^{(n)}(x) \quad \forall x \in \mathbb{R}, \tag{2.20} \]
where $F_1^{(n)}(x)$ and $F_2^{(n)}(x)$ are the n-th order distribution functions of $X_1$ and $X_2$.
There is an important link between the concept of n-th order stochastic dominance
and expected utility maximisation. This link is given by the following theorem.
Theorem 2.8.
\[ X_1 \geq_{SD(n)} X_2 \Leftrightarrow E[u(X_1)] \geq E[u(X_2)] \tag{2.21} \]
for all utility functions u(x) for which $(-1)^k u^{(k)}(x) \leq 0$ for k ∈ {1, 2, 3, . . . , n} and for
all x (with at least one utility function satisfying the inequality).
When we take a closer look at the special case of second order stochastic dominance, we see that it has a clear and
easy economic interpretation.
Theorem 2.9.
\[ X_1 \geq_{SD(2)} X_2 \Leftrightarrow E[u(X_1)] \geq E[u(X_2)] \tag{2.22} \]
for all utility functions u(x) with $u'(x) \geq 0$ and $u''(x) \leq 0$ for all x, where there is
at least one utility function u(x) with the property that $u'(x) > 0$ and $u''(x) < 0$.
Second order dominance means that $X_1$ is ranked above $X_2$ if for all risk averse
($u''(x) \leq 0$) and non-saturated ($u'(x) \geq 0$) investors the expected utility of $X_1$ is
at least the expected utility of $X_2$.
Risk measures can induce an ordering on different portfolios: X is ranked below Y
whenever ρ(X) ≥ ρ(Y). When all non-saturated risk averse investors obtain the same
ordering using the von Neumann-Morgenstern criterion we will say that this risk measure is consistent with second order stochastic dominance. More generally we
have the following definition.
Definition 2.13. A risk measure ρ(X) is consistent with n-th order stochastic
dominance if
X1 ≥SD(n) X2 ⇒ ρ(X1 ) ≤ ρ(X2 ).
(2.23)
Using the definition of the n-th order distribution it is easy to see that one has
the following implication.
Theorem 2.10. X1 ≥SD(n) X2 ⇒ X1 ≥SD(n+1) X2 .
Proof. Without proof, see [29, Theorem 4].
Using definition 2.13 and theorem 2.10 we can conclude that the following theorem
holds.
Theorem 2.11. A risk measure consistent with (n + 1)-th order stochastic dominance is also consistent with n-th order stochastic dominance.
Proof. Without proof, see [29, Theorem 6].
3 Value at Risk and Expected shortfall
”Value at Risk is like an air bag
that works well all the time except
when you have an accident.”
David Einhorn
In this chapter we will take a closer look at two commonly used risk measures, Value
at Risk (VaR) and Expected shortfall (ES). We will use the concepts described
in chapters one and two to check whether these risk measures have desirable
mathematical properties as well as desirable decision theoretic properties.
3.1 Value at Risk
Value at Risk is perhaps the most famous risk measure in existence. Its definition
is based on the quantiles of a stochastic variable.
Definition 3.1. The lower α quantile of a stochastic variable X is defined by
\[ x_{(\alpha)} := q_{\alpha}(X) := \inf\{ x \in \mathbb{R} \mid P[X \leq x] \geq \alpha \}. \tag{3.1} \]
Definition 3.2. The upper α quantile of a stochastic variable X is defined by
\[ x^{(\alpha)} := q^{\alpha}(X) := \inf\{ x \in \mathbb{R} \mid P[X \leq x] > \alpha \}. \tag{3.2} \]
Because $\{ x \in \mathbb{R} \mid P[X \leq x] > \alpha \} \subset \{ x \in \mathbb{R} \mid P[X \leq x] \geq \alpha \}$ we have that $q_{\alpha}(X) \leq q^{\alpha}(X)$.
Equality does not hold in general. However in [1] it is stated that equality
holds iff P[X ≤ x] = α for at most one x.
We follow [1], [11] and [29] in defining VaRα as the smallest value such that the
probability of an absolute loss being at most this value is at least 1 − α. More
formally we have the following definition.
Definition 3.3. Fix α ∈ (0, 1). Then the Value at Risk of a portfolio, where the
net payoff is modelled by X, at a level α (see footnote 1) is defined as
\[ \mathrm{VaR}_{\alpha}(X) = -\inf\{ x \in \mathbb{R} \mid P[X \leq x] > \alpha \}. \tag{3.3} \]
We have illustrated the concept of Value at Risk in figure 3.1, where we have
assumed that X ∼ N(0, 1) and the total blue area equals α. In this case the upper
and lower quantile are the same.
Figure 3.1: Value at Risk. [Plot of the density of X; the shaded area to the left of −VaRα equals α.]
3.1.1 General properties
We will now apply properties of general risk measures from chapter one to analyse
the properties of Value at Risk. It is easy to check that Value at Risk is a monetary
risk measure.
Theorem 3.1. VaRα (X) is a monetary risk measure.
Proof. To prove that VaRα is a monetary risk measure, we need to prove that it
satisfies both monotonicity and translation invariance.
1. Monotonicity
Take X ≤ Y . We need to prove that for all α ∈ (0, 1) we have that
VaRα (X) ≥ VaRα (Y ). Notice that
VaRα (X) ≥ VaRα (Y ) ⇔ inf{x|P [X ≤ x] > α} ≤ inf{x|P [Y ≤ x] > α}.
To prove the inequality on the right side we will show that
{x ∈ R|P [Y − x ≤ 0] > α} ⊂ {x ∈ R|P [X − x ≤ 0] > α}.
(3.4)
From our assumption X ≤ Y we have that ∀x ∈ R, X − x ≤ Y − x. From
this it follows that P [X − x ≤ 0] ≥ P [Y − x ≤ 0].
Take an arbitrary x ∈ {x ∈ R|P [Y − x ≤ 0] > α}, then we have P [Y − x ≤ 0] >
α, from which it follows that P [X − x ≤ 0] ≥ P [Y − x ≤ 0] > α. We can
conclude that x ∈ {x ∈ R|P [X − x ≤ 0] > α}. This proves 3.4. Since the infimum
over a larger set is smaller, we conclude that if X ≤ Y then VaRα (X) ≥ VaRα (Y ).
Footnote 1: This means at a 100(1 − α) percent confidence level.
2. Translation invariance
We need to prove that VaRα (X + m) = VaRα (X) − m. This follows from a
straightforward calculation.
VaRα (X + m) = − inf{x ∈ R|P [X + m ≤ x] > α}
= − inf{x ∈ R|P [X ≤ x − m] > α}
= − inf{x + m ∈ R|P [X ≤ x] > α}
= − inf{x ∈ R|P [X ≤ x] > α} − m
= VaRα (X) − m.
This concludes the proof that VaRα (X) is a monetary risk measure.
It is now natural to ask whether Value at Risk is a convex risk measure. From
theorem 1.1 of chapter one, we know that if VaRα is a convex risk measure, then
the associated acceptance set should be convex as well. Unfortunately it turns out
that VaRα is not a convex risk measure. This is undesirable because it means that
VaRα can penalize more diversified portfolios. We will illustrate the fact that
VaRα is not a convex risk measure using a simplified example.
Let the risk-free rate be 0%. Consider a zero-coupon bond which costs 100, pays
out 101 and has a default probability of 0.0095. Denote the net payoff of an
investment in the bond with X. Then
\[ X = \begin{cases} -100, & \text{when the bond defaults,} \\ 1, & \text{otherwise.} \end{cases} \tag{3.5} \]
It is easy to see that the Value at Risk at the 99% confidence level is $\mathrm{VaR}_{0.01}(X) = -1$.
This is because the probability of default is below the 1% level at which we calculate
the Value at Risk. The default is considered too unlikely to be taken into account by
$\mathrm{VaR}_{0.01}$. Because $\mathrm{VaR}_{0.01}(X) = -1 \leq 0$, we have that X is an acceptable position,
i.e. $X \in \mathcal{A}_{\mathrm{VaR}_{0.01}}$. Now consider a second bond which has exactly the same default
risk, payoff and price as the first bond. If the net payoff of the investment in this
second bond is modelled by Y, then it is clear that $\mathrm{VaR}_{0.01}(Y) = -1$ and thus
$Y \in \mathcal{A}_{\mathrm{VaR}_{0.01}}$. We have that both X and Y are acceptable positions. Now assume
that the default of the first bond is independent of the default of the second bond.
If Value at Risk were a convex risk measure, then the more diversified portfolio
with payoff $P = \frac{1}{2}X + \frac{1}{2}Y$ should be an acceptable position as well. Using the
independence of X and Y we find that P has the following distribution.
\[
P = \begin{cases}
-100, & \text{when both bonds default, } p = (0.0095)^2, \\
-\frac{99}{2}, & \text{when precisely one bond defaults, } p = 2 \cdot 0.0095 (1 - 0.0095), \\
1, & \text{otherwise.}
\end{cases} \tag{3.6}
\]
The probability that at least one bond will default equals $(0.0095)^2 + 2 \cdot 0.0095(1 - 0.0095) = 0.0189$.
Since this exceeds 0.01 while the probability that both bonds default does not, the Value at Risk
of the diversified portfolio is $\mathrm{VaR}_{0.01}(P) = \frac{99}{2}$. The portfolio P is not an acceptable position.
The acceptance set $\mathcal{A}_{\mathrm{VaR}_{0.01}}$ is not convex.
Although we constructed a concrete example for α = 0.01, it is possible to construct such an example for all α ∈ (0, 1) by choosing the default probability of the
bonds small enough.
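The bond example can be reproduced with a few lines of code. The sketch below is an illustration only: the helper name `value_at_risk` is an assumption of this example, and the function implements definition 3.3 for a payoff with finitely many outcomes.

```python
def value_at_risk(outcomes, probs, alpha):
    """VaR_alpha(X) = -inf{x : P[X <= x] > alpha} for a discrete payoff X."""
    cumulative = 0.0
    for x, p in sorted(zip(outcomes, probs)):
        cumulative += p
        if cumulative > alpha:
            # First outcome at which the distribution function exceeds alpha.
            return -x
    raise ValueError("the probabilities must sum to more than alpha")

p_default = 0.0095
alpha = 0.01

# A single bond: net payoff -100 on default, 1 otherwise (equation 3.5).
bond_outcomes = [-100.0, 1.0]
bond_probs = [p_default, 1 - p_default]

# Equally weighted portfolio of two independent bonds (equation 3.6).
portfolio_outcomes = [-100.0, -49.5, 1.0]
portfolio_probs = [
    p_default ** 2,
    2 * p_default * (1 - p_default),
    (1 - p_default) ** 2,
]

var_bond = value_at_risk(bond_outcomes, bond_probs, alpha)
var_portfolio = value_at_risk(portfolio_outcomes, portfolio_probs, alpha)

print(var_bond)       # -1.0, the single bond is acceptable
print(var_portfolio)  # 49.5, the diversified portfolio is not
```

Both single bonds have a non-positive Value at Risk, while their average has a strictly positive one, which is the failure of convexity described above.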
In the previous example another important problem of Value at Risk became clear.
Value at Risk can ignore potentially very large losses. Consider again the example
of a bond which costs b, has a positive net return r and defaults with probability
p < α. Then VaRα (X) = −r ≤ 0 no matter the value of b. If the bond defaults
the payoff is −b, but because the default probability is too low, Value at Risk isn’t
affected by this, potentially very large, loss.
Figure 3.2: Density function of S1.
Figure 3.3: Standard normal density function.
Like all risk measures in this thesis, VaR tries to summarise the distribution of
a portfolio into one number which should reflect the level of risk. Hence it is inevitable that some information regarding the complete distribution of the stochastic variable is lost. It is however important to be aware of this information loss.
Because Value at Risk is defined as a quantile, it does not incorporate the
information about the shape of the left tail of the density function well. We will illustrate this with a theoretical example. Remember that the normal density is given
by
\[ f(x, \mu, \sigma^2) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right). \tag{3.7} \]
Now consider $S_1$ such that the net payoffs have the following density function
\[ g(x) = 0.99 f(x, 0, 1.002974) + 0.01 f(x, -8, 0.04). \tag{3.8} \]
This density function is plotted in figure 3.2. Next to this figure the density of the
standard normal is plotted. Apart from the spike which occurs around −8, both
density functions are very similar. In fact if we calculate the Value at Risk at a
95% confidence level of $S_1$ we find that $\mathrm{VaR}_{0.05}(S_1) = 1.64485$. If we now consider
$S_2$, for which the net payoffs have a standard normal distribution, then we find that
the Value at Risk at a 95% confidence level is the same, i.e. $\mathrm{VaR}_{0.05}(S_2) = 1.64485$.
Although the Value at Risk at the 95% confidence level is the same, the risk associated with both investments is definitely not. For $S_1$ there is a 0.5% probability
that the loss exceeds 8, while for $S_2$ this probability is negligible (see footnote 2).
This problem of Value at Risk is addressed by other risk measures such as Expected shortfall.
Footnote 2: $p = 6.22 \cdot 10^{-16}$.
3.1.2 Consistency with expected utility maximisation
In this subsection we will take a closer look at some of the results reported in [29]
regarding the consistency of Value at Risk with expected utility maximisation.
For this we will rely on the definitions and theorems introduced in chapter two
regarding stochastic dominance. In [29] it is stated that Value at Risk is consistent
with first order stochastic dominance. This should not be surprising, since Value
at Risk is defined as a quantile.
Theorem 3.2. VaR is consistent with first-order stochastic dominance. This
means
X1 ≥SD(1) X2 ⇒ VaRα (X1 ) ≤ VaRα (X2 ).
(3.9)
Proof. Without proof, follows from [19, Theorem 1'].
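Instead of copying the proof of this theorem, we will give an example which
illustrates the fact that X1 ≥SD(1) X2 is a very strong assumption. Consider two stocks
and let $X_1$ and $X_2$ denote the net payoffs of these stocks. Assume $X_1 \sim N(1, 1)$
and $X_2 \sim N(0, 1)$. It is known that the cumulative distribution function of a normal
distribution with mean µ and standard deviation σ is given by
\[ F(x) = \frac{1}{2} \left( 1 + \operatorname{erf}\left( \frac{x - \mu}{\sigma \sqrt{2}} \right) \right), \quad \text{where } \operatorname{erf}(x) := \frac{1}{\sqrt{\pi}} \int_{-x}^{x} \exp(-t^2) \, dt. \tag{3.10} \]
We have that
\[
\begin{aligned}
F_1(x) &= \frac{1}{2} \left( 1 + \operatorname{erf}\left( \frac{x - 1}{\sqrt{2}} \right) \right), \\
F_2(x) &= \frac{1}{2} \left( 1 + \operatorname{erf}\left( \frac{x}{\sqrt{2}} \right) \right).
\end{aligned}
\]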
Because the error function is an increasing function, we have that $F_1(x) \leq F_2(x)$
for all x ∈ R. By definition of first order stochastic dominance this implies that X1 ≥SD(1) X2. We have plotted these cumulative distributions in
figure 3.4 together with the line y = 0.05. By definition the x-coordinate of the intersection of this line with the cumulative distribution of $X_1$ equals
$-\mathrm{VaR}_{0.05}(X_1) = -0.6449$. Similarly we find that $-\mathrm{VaR}_{0.05}(X_2) = -1.6449$. We
have that $\mathrm{VaR}_{0.05}(X_1) \leq \mathrm{VaR}_{0.05}(X_2)$. From figure 3.4 it is clear that this inequality would hold for any α ∈ (0, 1).
Hence we have that
\[ X_1 \geq_{SD(1)} X_2 \Rightarrow \mathrm{VaR}_{\alpha}(X_1) \leq \mathrm{VaR}_{\alpha}(X_2). \tag{3.11} \]
Figure 3.4: First order stochastic dominance and VaR
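The numbers in this example are easy to reproduce. The following sketch uses `statistics.NormalDist` from the Python standard library; the helper name `normal_var`, the grid and the list of levels are assumptions made for this illustration.

```python
from statistics import NormalDist

def normal_var(mu, sigma, alpha):
    """VaR_alpha of a normally distributed payoff: minus the alpha quantile."""
    return -NormalDist(mu, sigma).inv_cdf(alpha)

x1 = NormalDist(1.0, 1.0)  # X1 ~ N(1, 1)
x2 = NormalDist(0.0, 1.0)  # X2 ~ N(0, 1)

# First order dominance: F1(x) <= F2(x), checked here on a finite grid only.
grid = [k / 10 for k in range(-60, 61)]
fsd_holds_on_grid = all(x1.cdf(x) <= x2.cdf(x) for x in grid)

var_x1 = normal_var(1.0, 1.0, 0.05)  # about 0.6449
var_x2 = normal_var(0.0, 1.0, 0.05)  # about 1.6449

# The ordering of the Value at Risk holds at every level that we try.
levels = [0.01, 0.05, 0.25, 0.5, 0.75, 0.99]
ordering_holds = all(
    normal_var(1.0, 1.0, a) <= normal_var(0.0, 1.0, a) for a in levels
)

print(fsd_holds_on_grid, var_x1, var_x2, ordering_holds)
```

A grid check is of course not a proof; it only illustrates the inequality that follows from the monotonicity of the error function.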
We realize the above example is a rather theoretical one. However it does illustrate
an important point. The condition that $F_1(x) \leq F_2(x)$ for all x ∈ R is very
restrictive. A less severe restriction would be that $F_1^{(2)}(x) \leq F_2^{(2)}(x)$ for all x ∈ R,
where $F^{(2)}$ denotes the second order distribution.
In [29] it is stated that Value at Risk is in general not consistent with second order
stochastic dominance. This means that in general we have that
\[ X_1 \geq_{SD(2)} X_2 \not\Rightarrow \mathrm{VaR}_{\alpha}(X_1) \leq \mathrm{VaR}_{\alpha}(X_2). \tag{3.12} \]
However in [29] we also find an important exception.
Theorem 3.4. VaR is consistent with second order stochastic dominance when
portfolios' profits and losses have an elliptical distribution (see footnote 3) with finite variance
and the same mean.
Proof. Without proof, see [29, Theorem 14].
Again we will not repeat the proof here, but we will construct an example to get
a better understanding of the concept of second order stochastic dominance and
the assumptions made in theorem 3.4.
Consider again two stocks such that their net payoffs are given by $X_1$ and $X_2$ respectively. Assume that $X_1 \sim N(\mu, \sigma_1^2)$ and $X_2 \sim N(\mu, \sigma_2^2)$, with $\sigma_1 < \sigma_2$, and
denote with $F_1^{(1)}$ and $F_2^{(1)}$ their cumulative distributions. It is known that a normal
distribution is an elliptical distribution.
Then for i = 1 and i = 2 we have that
\[ F_i^{(1)}(x) = \frac{1}{2} \left( 1 + \operatorname{erf}\left( \frac{x - \mu}{\sigma_i \sqrt{2}} \right) \right). \tag{3.13} \]
Footnote 3: An n-dimensional random vector $R = [R_1, R_2, \ldots, R_n]^T$ has an elliptical distribution if the
density function of R, denoted by f(R), can be written in terms of a function φ(·, n) as
\[ f(R, \theta, \Sigma) = \frac{1}{|\Sigma|^{1/2}} \, \varphi\left( (R - \theta)^T \Sigma^{-1} (R - \theta), n \right), \]
where Σ is an n × n positive definite matrix and θ is an n-dimensional column vector.
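By assumption we also have that
\[ \sigma_1 < \sigma_2 \Rightarrow \frac{1}{\sigma_1 \sqrt{2}} > \frac{1}{\sigma_2 \sqrt{2}}. \tag{3.14} \]
From this we can conclude that
\[
\begin{cases}
\frac{x - \mu}{\sigma_1 \sqrt{2}} > \frac{x - \mu}{\sigma_2 \sqrt{2}}, & \text{when } x > \mu, \\
\frac{x - \mu}{\sigma_1 \sqrt{2}} < \frac{x - \mu}{\sigma_2 \sqrt{2}}, & \text{when } x < \mu, \\
\frac{x - \mu}{\sigma_1 \sqrt{2}} = \frac{x - \mu}{\sigma_2 \sqrt{2}}, & \text{when } x = \mu.
\end{cases} \tag{3.15}
\]
Using the fact that the error function erf is an increasing function we find that
\[
\begin{cases}
F_1^{(1)}(x) > F_2^{(1)}(x), & \text{when } x > \mu, \\
F_1^{(1)}(x) < F_2^{(1)}(x), & \text{when } x < \mu, \\
F_1^{(1)}(x) = F_2^{(1)}(x), & \text{when } x = \mu.
\end{cases} \tag{3.16}
\]
From this we can conclude that the necessary condition for first
order stochastic dominance is not satisfied. Graphically this is illustrated in figure 3.5 in which
we have taken µ = 0, $\sigma_1 = 1$ and $\sigma_2 = 3$. Although the condition for first order
stochastic dominance is not fulfilled, the condition for second order stochastic
dominance is. The second order distributions are plotted in figure 3.6. We will
have that
\[ X_1 \sim N(\mu, \sigma_1^2) \text{ and } X_2 \sim N(\mu, \sigma_2^2), \text{ with } \sigma_1 < \sigma_2 \Rightarrow F_2^{(2)}(x) \geq F_1^{(2)}(x) \quad \forall x \in \mathbb{R}. \tag{3.17} \]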
Figure 3.5: First order distributions for X1 ∼ N (0, 1) and X2 ∼ N (0, 9).
Figure 3.6: Second order distributions for X1 ∼ N (0, 1) and X2 ∼ N (0, 9).
To see why 3.17 is true remember that for i = 1 and i = 2 we have by definition
that
\[ F_i^{(2)}(x) = \int_{-\infty}^{x} F_i^{(1)}(u) \, du. \tag{3.18} \]
Using equations 3.16 we see that $F_1^{(1)}$ and $F_2^{(1)}$ intersect each other at x = µ. Using
the assumption that both $X_1$ and $X_2$ have the same mean and the properties of the
normal distribution we see that this intersection happens at $F_1^{(1)}(\mu) = F_2^{(1)}(\mu) = 0.5$.
When calculating the second order distribution, we in fact calculate the area
under the first order distribution from −∞ to some point x.
We know from 3.16 that for x < µ we have that $F_2^{(1)}(x) > F_1^{(1)}(x)$. Hence when
calculating the second order distribution up to some x < µ, $F_2^{(2)}$ accumulates extra
area compared with $F_1^{(2)}$. This difference in area is labelled A in figure 3.5. We can also clearly see this
accumulating effect in figure 3.6 where the difference between $F_2^{(2)}$ and $F_1^{(2)}$ grows
until x = µ. In this same figure we also notice that after this point the difference
between $F_2^{(2)}$ and $F_1^{(2)}$ decreases again, but it never becomes negative. This effect
results from the fact that for x > µ we have that $F_1^{(1)}(x) > F_2^{(1)}(x)$, which implies
that the excess area between $F_2^{(1)}(x)$ and $F_1^{(1)}(x)$ is negative for x > µ. Using the
symmetry property of the normal distribution we see that
\[ \int_{-\infty}^{\mu} \left( F_2^{(1)}(x) - F_1^{(1)}(x) \right) dx = -\int_{\mu}^{+\infty} \left( F_2^{(1)}(x) - F_1^{(1)}(x) \right) dx. \tag{3.19} \]
Graphically this means that in figure 3.5 the areas A and B are the same, but they
have a different sign. Hence when integrating the first order distributions from
−∞ to some point x, one first accumulates the extra area A, and then loses part
of this excess area when x > µ. However it is impossible to lose more than the
already accumulated excess area because the absolute value of the area A is the
same as that of area B, a fact which follows from equation 3.19.
Hence we conclude that 3.17 holds and by definition of second order stochastic
dominance we have that
\[ X_1 \sim N(\mu, \sigma_1^2), \; X_2 \sim N(\mu, \sigma_2^2) \text{ with } \sigma_1 < \sigma_2 \Rightarrow X_1 \geq_{SD(2)} X_2. \tag{3.20} \]
This conclusion coincides with our intuition that the riskier of two portfolios $X_1$
and $X_2$, with the same expected return but with different variance, is the portfolio
with the largest variance.
Applying theorem 3.4 we find that
\[ X_1 \sim N(\mu, \sigma_1^2), \; X_2 \sim N(\mu, \sigma_2^2) \text{ with } \sigma_1 < \sigma_2 \Rightarrow \mathrm{VaR}_{\alpha}(X_1) \leq \mathrm{VaR}_{\alpha}(X_2). \tag{3.21} \]
We want to stress that in this previous example the fact that X1 and X2 had
the same mean, is a crucial assumption. If this constraint would not be fulfilled
it would not be guaranteed that Value at Risk is consistent with second order
stochastic dominance.
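The example with µ = 0, σ1 = 1 and σ2 = 3 can be checked numerically. The sketch below is an illustration under stated assumptions: the function name `second_order_cdf`, the grid and the level α = 0.05 are choices made here, and the second order distribution is evaluated with the standard closed form $\int_{-\infty}^{x} \Phi\left(\frac{t-\mu}{\sigma}\right) dt = (x-\mu)\Phi(z) + \sigma \varphi(z)$ with $z = (x-\mu)/\sigma$, which is not taken from the thesis but can be verified by differentiation.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def second_order_cdf(x, mu, sigma):
    """F^(2)(x): integral of the N(mu, sigma^2) distribution function up to x.

    Closed form (x - mu) * Phi(z) + sigma * phi(z) with z = (x - mu) / sigma.
    """
    z = (x - mu) / sigma
    return (x - mu) * STANDARD_NORMAL.cdf(z) + sigma * STANDARD_NORMAL.pdf(z)

mu, sigma1, sigma2 = 0.0, 1.0, 3.0
x1 = NormalDist(mu, sigma1)
x2 = NormalDist(mu, sigma2)
grid = [k / 10 for k in range(-100, 101)]

# First order dominance fails: F1 lies above F2 somewhere (for x > mu).
first_order_fails = any(x1.cdf(x) > x2.cdf(x) for x in grid)

# Second order dominance (3.17): F2^(2) >= F1^(2), checked on the grid only.
second_order_holds = all(
    second_order_cdf(x, mu, sigma2) >= second_order_cdf(x, mu, sigma1)
    for x in grid
)

# Value at Risk at the (assumed) level alpha = 0.05.
alpha = 0.05
var_x1 = -x1.inv_cdf(alpha)
var_x2 = -x2.inv_cdf(alpha)

print(first_order_fails, second_order_holds, var_x1, var_x2)
```

With these parameters the Value at Risk of the portfolio with the larger variance is three times that of the other portfolio at the 95% confidence level, because both quantiles scale with the standard deviation when µ = 0.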
We can conclude that although Value at Risk has an easy definition, it also has a
lot of shortcomings both from a mathematical point of view and from a decision
theoretic point of view.
3.2 Expected shortfall
In this section we will look at an improvement of Value at Risk called Expected
shortfall. The theorems and definitions used in this section are taken from [1].
Definition 3.4. Assume $E[X^-] < +\infty$. Then the expected shortfall at a level
α ∈ (0, 1) is defined as
\[ \mathrm{ES}_{\alpha}(X) = -\frac{1}{\alpha} \left( E\left[ X I(X \leq x_{(\alpha)}) \right] + x_{(\alpha)} \left( \alpha - P(X \leq x_{(\alpha)}) \right) \right), \tag{3.22} \]
where I(·) denotes the indicator function.
An interesting representation of expected shortfall is the integral representation.
Theorem 3.5. If X is a real valued random variable on the probability space
(Ω, F, P) with $E[X^-] < +\infty$ and α ∈ (0, 1) is fixed, then
\[ \mathrm{ES}_{\alpha}(X) = -\frac{1}{\alpha} \int_0^{\alpha} x_{(u)} \, du = -\frac{1}{\alpha} \int_0^{\alpha} q_u(X) \, du. \tag{3.23} \]
Proof. Without proof, see [1].
At this point we would like to mention that sometimes the definition
\[ \mathrm{TCE}_{\alpha}(X) := -E\left[ X \mid X \leq -\mathrm{VaR}_{\alpha}(X) \right] \tag{3.24} \]
is used as a synonym of expected shortfall. When the distribution of X is continuous, this definition is equivalent to the definition of expected shortfall given
by equation 3.22, see [1]. However in general the equality $\mathrm{TCE}_{\alpha}(X) = \mathrm{ES}_{\alpha}(X)$
does not hold. The risk measure defined in equation 3.24 is known as upper tail
conditional expectation. It is stated in [1] that it is not guaranteed that this
risk measure is coherent, because it sometimes lacks the sub-additivity property.
3.2.1 General properties
The most important property of expected shortfall is that it is a coherent risk
measure. This is an improvement upon Value at Risk since Value at Risk was
not even convex. To prove this we will use the alternative characterisation of a
coherent risk measure described in remark 1.1 of the first chapter. Here we find
that we need to show that for all X, Y and α ∈ (0, 1) we have that:
1. (Positivity) X ≥ 0 ⇒ ESα (X) ≤ 0
2. (Positive homogeneous) ∀λ > 0 ESα (λX) = λ ESα (X)
3. (Translation invariant) ∀m ∈ R ESα (X + m) = ESα (X) − m.
4. (Sub-additivity) ESα (X + Y ) ≤ ESα (X) + ESα (Y )
Of all these properties the sub-additivity property is the most difficult to prove.
We will work out the sub-additivity proof from [1]. For this we need to define
the following function.
\[
I^{\alpha}(X \leq x) :=
\begin{cases}
I(X \leq x), & \text{if } P(X = x) = 0, \\
I(X \leq x) + \frac{\alpha - P(X \leq x)}{P(X = x)} I(X = x), & \text{if } P(X = x) > 0.
\end{cases} \tag{3.25}
\]
In [1] we find the following lemma.
Lemma 3.1. We have the following properties.
1. $I^{\alpha}(X \leq x_{(\alpha)}) \in [0, 1]$
2. $E\left[ I^{\alpha}(X \leq x_{(\alpha)}) \right] = \alpha$
3. $\frac{1}{\alpha} E\left[ X I^{\alpha}(X \leq x_{(\alpha)}) \right] = -\mathrm{ES}_{\alpha}(X)$
We will use this lemma to prove the following lemma.
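Lemma 3.2.
\[
\begin{cases}
I^{\alpha}(Z \leq z_{(\alpha)}) - I^{\alpha}(X \leq x_{(\alpha)}) \geq 0, & \text{if } X > x_{(\alpha)}, \\
I^{\alpha}(Z \leq z_{(\alpha)}) - I^{\alpha}(X \leq x_{(\alpha)}) \leq 0, & \text{if } X < x_{(\alpha)}.
\end{cases} \tag{3.26}
\]
Proof. If $X > x_{(\alpha)}$ or if $X < x_{(\alpha)}$ then $I(X = x_{(\alpha)}) = 0$, so the correction term in definition
3.25 vanishes and we have that $I^{\alpha}(X \leq x_{(\alpha)}) = I(X \leq x_{(\alpha)})$. Hence we have
\[
\begin{cases}
I^{\alpha}(X \leq x_{(\alpha)}) = 0, & \text{if } X > x_{(\alpha)}, \\
I^{\alpha}(X \leq x_{(\alpha)}) = 1, & \text{if } X < x_{(\alpha)}.
\end{cases}
\]
From lemma 3.1 we have that $I^{\alpha}(Z \leq z_{(\alpha)}) \in [0, 1]$. Hence we can conclude that
\[
\begin{cases}
I^{\alpha}(Z \leq z_{(\alpha)}) - I^{\alpha}(X \leq x_{(\alpha)}) \geq 0, & \text{if } X > x_{(\alpha)}, \\
I^{\alpha}(Z \leq z_{(\alpha)}) - I^{\alpha}(X \leq x_{(\alpha)}) \leq 0, & \text{if } X < x_{(\alpha)},
\end{cases}
\]
which is what we needed to prove.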
Theorem 3.6. Expected shortfall is a coherent risk measure.
Proof.
1. (Positivity)
Take X ≥ 0, then for all α ∈ (0, 1) we have that $q_{\alpha}(X) \geq 0$. Using the
integral representation of expected shortfall we find that
\[ \mathrm{ES}_{\alpha}(X) = -\frac{1}{\alpha} \int_0^{\alpha} q_u(X) \, du \leq 0. \tag{3.27} \]
This proves the positivity property.
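2. (Positive homogeneity) Take λ > 0, then for all α ∈ (0, 1) we have that
\[
\begin{aligned}
q_{\alpha}(\lambda X) &= \inf\{ x \in \mathbb{R} \mid P(\lambda X \leq x) \geq \alpha \} \\
&= \inf\left\{ x \in \mathbb{R} \mid P\left( X \leq \frac{x}{\lambda} \right) \geq \alpha \right\} \\
&= \inf\{ \lambda x \in \mathbb{R} \mid P(X \leq x) \geq \alpha \} \\
&= \lambda \inf\{ x \in \mathbb{R} \mid P(X \leq x) \geq \alpha \} \\
&= \lambda q_{\alpha}(X).
\end{aligned}
\]
Using the integral representation of expected shortfall we get that
\[
\begin{aligned}
\mathrm{ES}_{\alpha}(\lambda X) &= -\frac{1}{\alpha} \int_0^{\alpha} q_u(\lambda X) \, du \\
&= -\frac{1}{\alpha} \int_0^{\alpha} \lambda q_u(X) \, du \\
&= \lambda \, \mathrm{ES}_{\alpha}(X).
\end{aligned}
\]
This proves positive homogeneity.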
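3. (Translation invariance)
Let m ∈ R, then we have for all α ∈ (0, 1)
\[
\begin{aligned}
q_{\alpha}(X + m) &= \inf\{ x \in \mathbb{R} \mid P(X + m \leq x) \geq \alpha \} \\
&= \inf\{ x \in \mathbb{R} \mid P(X \leq x - m) \geq \alpha \} \\
&= \inf\{ x + m \in \mathbb{R} \mid P(X \leq x) \geq \alpha \} \\
&= \inf\{ x \in \mathbb{R} \mid P(X \leq x) \geq \alpha \} + m \\
&= q_{\alpha}(X) + m.
\end{aligned}
\]
Using the integral representation of expected shortfall we have that
\[
\begin{aligned}
\mathrm{ES}_{\alpha}(X + m) &= -\frac{1}{\alpha} \int_0^{\alpha} q_u(X + m) \, du \\
&= -\frac{1}{\alpha} \int_0^{\alpha} \left( q_u(X) + m \right) du \\
&= -\frac{1}{\alpha} \int_0^{\alpha} q_u(X) \, du - \frac{m}{\alpha} \int_0^{\alpha} du \\
&= -\frac{1}{\alpha} \int_0^{\alpha} q_u(X) \, du - m \\
&= \mathrm{ES}_{\alpha}(X) - m.
\end{aligned}
\]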
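4. (Sub-additivity)
Take X and Y. We need to show that the following inequality holds.
\[ \mathrm{ES}_{\alpha}(X) + \mathrm{ES}_{\alpha}(Y) - \mathrm{ES}_{\alpha}(X + Y) \geq 0. \tag{3.28} \]
Let Z := X + Y and take α ∈ (0, 1). From lemma 3.1 we have that
$\alpha \, \mathrm{ES}_{\alpha}(X) = -E\left[ X I^{\alpha}(X \leq x_{(\alpha)}) \right]$. We find that
\[
\alpha \left( \mathrm{ES}_{\alpha}(X) + \mathrm{ES}_{\alpha}(Y) - \mathrm{ES}_{\alpha}(Z) \right)
= E\left[ Z I^{\alpha}(Z \leq z_{(\alpha)}) - X I^{\alpha}(X \leq x_{(\alpha)}) - Y I^{\alpha}(Y \leq y_{(\alpha)}) \right].
\]
Using the fact that Z = X + Y, we can rewrite this as
\[
E\left[ X \left( I^{\alpha}(Z \leq z_{(\alpha)}) - I^{\alpha}(X \leq x_{(\alpha)}) \right) \right]
+ E\left[ Y \left( I^{\alpha}(Z \leq z_{(\alpha)}) - I^{\alpha}(Y \leq y_{(\alpha)}) \right) \right]. \tag{3.29}
\]
Now we use lemma 3.2. When $X > x_{(\alpha)}$ the difference of the indicator functions is
non-negative and when $X < x_{(\alpha)}$ it is non-positive, so in both cases replacing the factor X
by $x_{(\alpha)}$ can only decrease the product. The same holds for Y. We obtain the following inequalities
\[
\begin{aligned}
E\left[ X \left( I^{\alpha}(Z \leq z_{(\alpha)}) - I^{\alpha}(X \leq x_{(\alpha)}) \right) \right] &\geq x_{(\alpha)} E\left[ I^{\alpha}(Z \leq z_{(\alpha)}) - I^{\alpha}(X \leq x_{(\alpha)}) \right], \\
E\left[ Y \left( I^{\alpha}(Z \leq z_{(\alpha)}) - I^{\alpha}(Y \leq y_{(\alpha)}) \right) \right] &\geq y_{(\alpha)} E\left[ I^{\alpha}(Z \leq z_{(\alpha)}) - I^{\alpha}(Y \leq y_{(\alpha)}) \right].
\end{aligned}
\]
Using the second property of lemma 3.1 we conclude that
\[
\begin{aligned}
& E\left[ X \left( I^{\alpha}(Z \leq z_{(\alpha)}) - I^{\alpha}(X \leq x_{(\alpha)}) \right) \right] + E\left[ Y \left( I^{\alpha}(Z \leq z_{(\alpha)}) - I^{\alpha}(Y \leq y_{(\alpha)}) \right) \right] \\
&\geq x_{(\alpha)} E\left[ I^{\alpha}(Z \leq z_{(\alpha)}) - I^{\alpha}(X \leq x_{(\alpha)}) \right] + y_{(\alpha)} E\left[ I^{\alpha}(Z \leq z_{(\alpha)}) - I^{\alpha}(Y \leq y_{(\alpha)}) \right] \\
&= x_{(\alpha)} (\alpha - \alpha) + y_{(\alpha)} (\alpha - \alpha) \\
&= 0.
\end{aligned}
\]
We conclude that expected shortfall satisfies the sub-additivity property.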
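To see sub-additivity at work, one can return to the two independent bonds from section 3.1.1, for which Value at Risk failed to reward diversification. The sketch below is an illustration only: the helper name `expected_shortfall` is an assumption of this example, and the function evaluates definition 3.4 for a payoff with finitely many outcomes, using the lower α quantile.

```python
def expected_shortfall(outcomes, probs, alpha):
    """ES_alpha(X) from definition 3.4 for a discrete payoff X."""
    atoms = sorted(zip(outcomes, probs))
    # Lower alpha quantile: smallest outcome x with P[X <= x] >= alpha.
    quantile = atoms[-1][0]
    cumulative = 0.0
    for x, p in atoms:
        cumulative += p
        if cumulative >= alpha:
            quantile = x
            break
    tail_mean = sum(p * x for x, p in atoms if x <= quantile)  # E[X I(X <= q)]
    tail_prob = sum(p for x, p in atoms if x <= quantile)      # P(X <= q)
    return -(tail_mean + quantile * (alpha - tail_prob)) / alpha

p_default = 0.0095
alpha = 0.01

# One bond (equation 3.5) and the sum X + Y of two independent bonds.
es_bond = expected_shortfall([-100.0, 1.0], [p_default, 1 - p_default], alpha)
es_sum = expected_shortfall(
    [-200.0, -99.0, 2.0],
    [p_default ** 2, 2 * p_default * (1 - p_default), (1 - p_default) ** 2],
    alpha,
)

print(es_bond)  # about 94.95
print(es_sum)   # about 99.91, well below es_bond + es_bond
```

In contrast with Value at Risk, the expected shortfall of a single bond already reflects the possible loss of 100, and the expected shortfall of the combined position stays below the sum of the individual values.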
3.2.2 Consistency with expected utility maximisation
We will now look at the consistency of expected shortfall with expected utility
maximisation. For this we will need a result from [19, Theorem 5'] where we find
the following theorem.
Theorem 3.7. Let $q_{\alpha}(X_1)$ and $q_{\alpha}(X_2)$ be quantiles of $X_1$ and $X_2$ respectively, then
the following expressions are equivalent
1. $X_1 \geq_{SD(2)} X_2$
2. $\int_0^{\alpha} q_u(X_1) \, du \geq \int_0^{\alpha} q_u(X_2) \, du$ for all α ∈ [0, 1] and a strict inequality holds for
some α.
Proof. Without proof, see [19, Theorem 5'].
In [29] we find following theorem with a proof based on theorem 3.7 and the integral
representation of expected shortfall.
Theorem 3.8. Expected shortfall is consistent with second-order stochastic dominance.
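Proof. By definition of consistency with second order stochastic dominance we need to show that
for all α ∈ (0, 1)
\[ X_1 \geq_{SD(2)} X_2 \Rightarrow \mathrm{ES}_{\alpha}(X_1) \leq \mathrm{ES}_{\alpha}(X_2). \tag{3.30} \]
From theorem 3.7 we have that
\[
\begin{aligned}
X_1 \geq_{SD(2)} X_2 &\Rightarrow \int_0^{\alpha} q_u(X_1) \, du \geq \int_0^{\alpha} q_u(X_2) \, du \\
&\Rightarrow -\frac{1}{\alpha} \int_0^{\alpha} q_u(X_1) \, du \leq -\frac{1}{\alpha} \int_0^{\alpha} q_u(X_2) \, du \\
&\Rightarrow \mathrm{ES}_{\alpha}(X_1) \leq \mathrm{ES}_{\alpha}(X_2).
\end{aligned}
\]
This concludes the proof.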
The fact that expected shortfall is consistent with second order stochastic dominance means that if all risk averse and non-saturated investors prefer X1 to X2 ,
then the expected shortfall of X1 is lower than the expected shortfall of X2 . This
theorem shows that expected shortfall is not only an improvement upon Value at
Risk from a mathematical point of view but also from an economic point of view.
We would like to point out that the condition that all risk averse investors prefer
X1 to X2 is a rather severe one. When this condition is not fulfilled consistency
with expected utility maximisation cannot be guaranteed.
To illustrate the severity of the assumption of second order stochastic dominance
we will give a numerical example. Consider two investors A and B with the following
utility functions.
\[ u_A(x) := \frac{1 - \exp(-0.02x)}{0.02} \tag{3.31} \]
\[ u_B(x) := 1 + x - \sqrt{1 + x^2} \tag{3.32} \]
Notice that both utility functions are increasing and concave. Consider two portfolios $X_1$ and $X_2$ such that their net payoffs are given by
\[
X_1 = \begin{cases} 2, & p = 0.99 \\ -25, & p = 0.0075 \\ -50, & p = 0.0025 \end{cases} \tag{3.33}
\]
and
\[
X_2 = \begin{cases} 5, & p = 0.55 \\ 2, & p = 0.44 \\ -75, & p = 0.01. \end{cases} \tag{3.34}
\]
We have calculated the expected utility for both investors as well as the expected
shortfall at a 0.01 level of both portfolios. These results are summarised in table
3.1. We notice that on the basis of expected utility investor A prefers X2 to
X1 while investor B prefers X1 to X2. We conclude that second order stochastic
dominance cannot order X2 and X1.
Table 3.1: Summary of results
                      X1       X2       Conclusion
Expected utility A    1.48     1.74     X2 ≻A X1
Expected utility B    0.14     −0.66    X1 ≻B X2
ES0.01                31.25    75       ES0.01 (X2 ) ≥ ES0.01 (X1 )
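The entries of table 3.1 can be recomputed directly from equations 3.31 to 3.34. The following sketch is an illustration; the function names are assumptions of this example and the expected shortfall is evaluated with definition 3.4 for discrete payoffs.

```python
import math

def u_a(x):
    """Utility of investor A, equation 3.31."""
    return (1.0 - math.exp(-0.02 * x)) / 0.02

def u_b(x):
    """Utility of investor B, equation 3.32."""
    return 1.0 + x - math.sqrt(1.0 + x * x)

def expected_utility(u, portfolio):
    """E[u(X)] for a payoff given as (outcome, probability) pairs."""
    return sum(p * u(x) for x, p in portfolio)

def expected_shortfall(portfolio, alpha):
    """ES_alpha(X) from definition 3.4 for a discrete payoff X."""
    atoms = sorted(portfolio)
    quantile = atoms[-1][0]
    cumulative = 0.0
    for x, p in atoms:
        cumulative += p
        if cumulative >= alpha:
            quantile = x
            break
    tail_mean = sum(p * x for x, p in atoms if x <= quantile)
    tail_prob = sum(p for x, p in atoms if x <= quantile)
    return -(tail_mean + quantile * (alpha - tail_prob)) / alpha

x1 = [(2.0, 0.99), (-25.0, 0.0075), (-50.0, 0.0025)]  # equation 3.33
x2 = [(5.0, 0.55), (2.0, 0.44), (-75.0, 0.01)]        # equation 3.34

eu_a = (expected_utility(u_a, x1), expected_utility(u_a, x2))
eu_b = (expected_utility(u_b, x1), expected_utility(u_b, x2))
es = (expected_shortfall(x1, 0.01), expected_shortfall(x2, 0.01))

print([round(v, 2) for v in eu_a])  # [1.48, 1.74]
print([round(v, 2) for v in eu_b])  # [0.14, -0.66]
print([round(v, 2) for v in es])    # [31.25, 75.0]
```

The two investors rank the portfolios in opposite order, so neither portfolio can dominate the other in the sense of second order stochastic dominance.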
Definition 3.5. Assume $E[X^-] < +\infty$. Then the conditional value at risk at level
α of X is defined as
\[ \mathrm{CVaR}_{\alpha}(X) = \inf_{s \in \mathbb{R}} \left\{ \frac{1}{\alpha} E\left[ (X - s)^- \right] - s \right\}. \tag{3.35} \]
In [1, Corollary 4.3] it is stated that under a mild integrability condition, expected
shortfall and conditional value at risk are the same object. More formally following
theorem is stated.
Theorem 3.9. Let X be a real integrable random variable on some probability
space (Ω, F, P ) and α ∈ (0, 1) be fixed. Then
ESα (X) = CVaRα (X)
(3.36)
Proof. Without proof, see [1, Corollary 4.3].
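For a payoff with finitely many outcomes the equality of theorem 3.9 can be checked by hand or with a short script. The sketch below is an illustration under stated assumptions: the function names are chosen here, and the infimum in definition 3.5 is evaluated only at the outcomes of X. This is enough because the objective is convex and piecewise linear in s with kinks at the outcomes, and it increases in both directions away from them when 0 < α < 1.

```python
def expected_shortfall(portfolio, alpha):
    """ES_alpha(X) from definition 3.4 for a discrete payoff X."""
    atoms = sorted(portfolio)
    quantile = atoms[-1][0]
    cumulative = 0.0
    for x, p in atoms:
        cumulative += p
        if cumulative >= alpha:
            quantile = x
            break
    tail_mean = sum(p * x for x, p in atoms if x <= quantile)
    tail_prob = sum(p for x, p in atoms if x <= quantile)
    return -(tail_mean + quantile * (alpha - tail_prob)) / alpha

def conditional_value_at_risk(portfolio, alpha):
    """CVaR_alpha(X) from definition 3.5, minimising over the outcomes of X."""
    def objective(s):
        # (1 / alpha) * E[(X - s)^-] - s, with y^- = max(0, -y).
        return sum(p * max(0.0, -(x - s)) for x, p in portfolio) / alpha - s
    return min(objective(x) for x, _ in portfolio)

alpha = 0.01
x1 = [(2.0, 0.99), (-25.0, 0.0075), (-50.0, 0.0025)]  # equation 3.33
bond = [(-100.0, 0.0095), (1.0, 0.9905)]              # equation 3.5

for portfolio in (x1, bond):
    print(expected_shortfall(portfolio, alpha),
          conditional_value_at_risk(portfolio, alpha))
```

For both payoffs the two numbers agree up to rounding error (31.25 for X1 and 94.95 for the bond).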
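It is interesting to notice that CVaRα can be rewritten in the form of an optimised
certainty equivalent.
\[
\begin{aligned}
\mathrm{CVaR}_{\alpha}(X) &= \inf_{s \in \mathbb{R}} \left\{ \frac{1}{\alpha} E\left[ (X - s)^- \right] - s \right\} \\
&= \inf_{s \in \mathbb{R}} \left\{ -\left( s - \frac{1}{\alpha} E\left[ (X - s)^- \right] \right) \right\} \\
&= -\sup_{s \in \mathbb{R}} \left\{ s + E\left[ \frac{-1}{\alpha} (X - s)^- \right] \right\} \\
&= -OCE_u(X),
\end{aligned}
\]
where $u(x) = \frac{-1}{\alpha} \max(0, -x)$. Notice that u(0) = 0, that u(x) is increasing and,
because 0 < α < 1, that $1 \in \partial u(0)$. Furthermore u is a concave function, since for λ ∈ [0, 1]
\[
\begin{aligned}
u(\lambda x + (1 - \lambda) y) &= \frac{-1}{\alpha} \max(0, -\lambda x - (1 - \lambda) y) \\
&\geq \frac{-1}{\alpha} \max(0, -\lambda x) + \frac{-1}{\alpha} \max(0, -(1 - \lambda) y) \\
&= \lambda \frac{-1}{\alpha} \max(0, -x) + (1 - \lambda) \frac{-1}{\alpha} \max(0, -y) \\
&= \lambda u(x) + (1 - \lambda) u(y).
\end{aligned}
\]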
To get a better understanding of this utility function we have plotted it in figure
3.7. We notice that, locally, the investor is risk neutral because the utility
function is a piecewise linear function.
The interpretation of expected shortfall as the optimised certainty equivalent of
an investor with the utility function $u(x) = \frac{-1}{\alpha} \max(0, -x)$ reveals a potential
criticism. When an investor with this utility function knows he will lose money,
he is indifferent between an uncertain loss X and a certain loss E [X]. And when
this investor knows he will gain money, his utility score does not depend upon the
amount he eventually gains.
Figure 3.7: Utility function for CVaRα with α = 0.05.
4 Utility based risk measures
In this chapter we will discuss how utility functions can be incorporated in financial
risk measures. The stochastic variable X will model, as always, the net payoffs
of the portfolio. We will again assume that X ∈ L∞ (Ω, F, P ) and that X can
take positive as well as negative values. When studying risk, we are especially
interested in the losses of our portfolio. Instead of using the utility function u to
study these losses we will use the associated loss function l defined as:
l(x) = −u(−x).
(4.1)
Non-saturated and risk averse investors are modelled using increasing and concave
utility functions. Using the relation 4.1 we can see that this implies that their
associated loss function will be increasing and convex.
4.1 Utility based shortfall risk measures
The first class of utility based risk measures that we will discuss was introduced
in [13] and [11]. These risk measures are called utility based shortfall risk and can
be constructed using the notion of a loss function. In this section we will explain
the construction of this utility based risk measure and show the link to the u-Mean
certainty equivalent.
Definition 4.1. A function l : R → R is called a loss function if it is increasing
and not identically constant.
Loss functions can induce risk measures in a natural way using the notion of an
acceptance set. Take x0 in the interior of l (R). We can now define the following
acceptance set:
A := {X ∈ L∞ (Ω, F, P ) |E [l(−X)] ≤ x0 }
= {X ∈ L∞ (Ω, F, P ) |E [−u(X)] ≤ x0 }
= {X ∈ L∞ (Ω, F, P ) |E [u(X)] ≥ −x0 }.
Hence a position X is acceptable if the expected utility of it is larger than a given
amount, or equivalently if the expected loss is smaller than a given amount. Using
this acceptance set we are able to define the risk measure associated with it.
ρA (X) : = inf{m ∈ R|m + X ∈ A}
= inf{m ∈ R|E [l(−X − m)] ≤ x0 }
= inf{m ∈ R|E [u(X + m)] ≥ −x0 }.
The risk measure defined above is called utility based shortfall risk and we will
denote it with SF lx0 (X). In [5, p. 473] a link between utility based shortfall risk
measures and u-Mean certainty equivalents is mentioned. We will derive this link
here.
SF lx0 (X) = inf{m ∈ R|E [l(−X − m)] ≤ x0 }
= − sup{−m ∈ R|E [l(−X − m)] ≤ x0 }
= − sup{m ∈ R|E [l(−X + m)] ≤ x0 }
= − sup{m ∈ R|E [u(X − m)] ≥ −x0 }
= − sup{m ∈ R|E [ũ(X − m)] ≥ 0}
= −Mũ (X)
Where ũ(x) = u(x) + x0 = −l(−x) + x0 . We conclude that
SF lx0 (X) = −Mũ (X),
with ũ(x) = u(x) + x0 = −l(−x) + x0 .
(4.2)
The value of x0 has an influence on the utility based shortfall risk. Suppose x1 ≥ x0
and take m ∈ {m ∈ R|E [u (X + m)]+x0 ≥ 0}. Then we have that E [u (X + m)]+
x1 ≥ E [u (X + m)]+x0 ≥ 0. Hence m ∈ {m ∈ R|E [u (X + m)]+x1 ≥ 0}. We can
conclude that {m ∈ R|E [u (X + m)]+x0 ≥ 0} ⊂ {m ∈ R|E [u (X + m)]+x1 ≥ 0}.
This implies that inf{m ∈ R|E [u (X + m)]+x0 ≥ 0} ≥ inf{m ∈ R|E [u (X + m)]+
x1 ≥ 0}. We conclude that
x1 ≥ x0 ⇒ SF lx1 (X) ≤ SF lx0 (X).
(4.3)
In [11, p247] it is stated without proof that SF lx0 (X) is a monetary risk measure
and that if l is a convex loss function this risk measure is convex. In this thesis
we’ll prove these claims.
Theorem 4.1. The utility based shortfall risk measure
SF lx0 (X) = inf{m ∈ R|E [l(−X − m)] ≤ x0 }
(4.4)
is a monetary risk measure.
Proof. To prove that SF lx0 (X) is a monetary risk measure we will prove that it
satisfies the monotonicity property and the translation invariance property.
1. (Monotonicity) Suppose that X ≤ Y and fix m ∈ R. Because l is increasing, we
have that
X ≤ Y ⇒ −X − m ≥ −Y − m
⇒ E [l(−X − m)] ≥ E [l(−Y − m)] .
Now take m ∈ {m ∈ R|E [l(−X − m)] ≤ x0 }, then we have that
E [l(−Y − m)] ≤ E [l(−X − m)] ≤ x0 .
From this we conclude that m ∈ {m ∈ R|E [l(−Y − m)] ≤ x0 }. Hence we find
that
{m ∈ R|E [l(−X − m)] ≤ x0 } ⊂ {m ∈ R|E [l(−Y − m)] ≤ x0 }.
We can conclude that
inf{m ∈ R|E [l(−X − m)] ≤ x0 } ≥ inf{m ∈ R|E [l(−Y − m)] ≤ x0 }.
This proves that if X ≤ Y then SF lx0 (X) ≥ SF lx0 (Y ).
2. (Translation invariance) For k ∈ R we have that
SF lx0 (X + k) = inf{m ∈ R|E [l(−X − k − m)] ≤ x0 }
= inf{m ∈ R|E [l(−X − (k + m))] ≤ x0 }
= inf{m − k ∈ R|E [l(−X − m)] ≤ x0 }
= inf{m ∈ R|E [l(−X − m)] ≤ x0 } − k
= SF lx0 (X) − k.
This proves the translation invariance property.
We now prove that utility based shortfall risk is a convex risk measure if l is a
convex loss function, or equivalently if u is a concave utility function.
Theorem 4.2. If the loss function l is convex, then the utility based shortfall risk
measure
SF lx0 (X) = inf{m ∈ R|E [l(−X − m)] ≤ x0 }
(4.5)
is a convex risk measure.
Proof. From theorem 1.1 from chapter 1 we know that it is sufficient to prove
that the acceptance set A = {X ∈ L∞ (Ω, F, P )|E [l(−X)] ≤ x0 } is convex. Take
X ∈ A, Y ∈ A and λ ∈ [0, 1] arbitrarily. We need to prove that λX +(1−λ)Y ∈ A.
Because the loss function l is convex, we have that E [l(−(λX + (1 − λ)Y ))] ≤
E [λl(−X) + (1 − λ)l(−Y )] = λE [l(−X)] + (1 − λ)E [l(−Y )]. Since X ∈ A and
Y ∈ A we have E [l(−X)] ≤ x0 and E [l(−Y )] ≤ x0 . We can conclude that
E [l(−(λX + (1 − λ)Y ))] ≤ λx0 + (1 − λ)x0 = x0 . This means that λX + (1 −
λ)Y ∈ A, which is what we needed to prove.
It is now natural to ask whether this utility based shortfall risk measure is a
coherent risk measure. Unfortunately this will not always be the case. We will
demonstrate this using a numerical example. Consider a bond which at time t = 0
costs 100, and will pay 105 at time t = 1. Assume a risk free interest rate of 2%, and
a default probability of the bond of 1%. Then the net payoff X of this investment
is −100/1.02 = −98.04 with probability 1% and 5/1.02 = 4.90 with probability 99%.
If we use the exponential utility function u(x) = 1 − exp(−x) and take x0 = 0,
then SF l0 (X) = − Mu (X) = ln E [exp(−X)], where the last equality follows
from theorem 2.7 in chapter 2. If this risk measure were coherent it would satisfy
the property of positive homogeneity, i.e. λ ln E [exp(−X)] = ln E [exp(−λX)], for
all λ > 0. However if we take λ = 2, we find that 2 ln E [exp(−X)] = 186.87
and ln E [exp(−2X)] = 191.47, which shows that this risk measure is not always
coherent.
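The two numbers in the bond example can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `entropic_risk` is a hypothetical choice, and the log-sum-exp shift is only there to keep the exponentials well scaled.

```python
import math

def entropic_risk(outcomes, probs):
    """Return ln E[exp(-X)] for a discrete X, using a log-sum-exp shift."""
    exponents = [math.log(p) - x for x, p in zip(outcomes, probs)]
    shift = max(exponents)
    return shift + math.log(sum(math.exp(e - shift) for e in exponents))

# Net payoff of the bond: default with probability 1%, repayment otherwise.
payoffs = [-100 / 1.02, 5 / 1.02]
probs = [0.01, 0.99]
lam = 2.0

scaled_risk = lam * entropic_risk(payoffs, probs)                  # 2 ln E[exp(-X)]
risk_of_scaled = entropic_risk([lam * x for x in payoffs], probs)  # ln E[exp(-2X)]
print(round(scaled_risk, 2), round(risk_of_scaled, 2))
```

Doubling the position increases the risk by more than a factor of two, which is exactly the failure of positive homogeneity described above.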
From [27, p.101] we have the following fact.
Lemma 4.1. A convex function l : R → R is continuous.
From [11, p.248] we have the following lemma. We will work out the proof of this
lemma.
Lemma 4.2. If l is a convex loss function, then the equation
E [l (−z − X)] = x0
(4.6)
has a unique solution z = SF lx0 (X).
Proof. Consider a sequence $(z_n)$ with $z_n \in \{z \in \mathbb{R} \mid E[l(-z - X)] = x_0\}$¹ such that $z_n \to SF^{l}_{x_0}(X) = \inf\{z \in \mathbb{R} \mid E[l(-z - X)] = x_0\}$ as $n \to +\infty$. Then
\[ \lim_{n\to+\infty} E[l(-z_n - X)] = x_0. \]
Because $X \in L^\infty(\Omega, \mathcal{F}, P)$, X is a bounded measurable function, which means that there exists an $M \in \mathbb{R}$ such that $|X(\omega)| \le M$ for all $\omega \in \Omega$. Because $l : \mathbb{R} \to \mathbb{R}$ is continuous and the convergent sequence $(z_n)$ is bounded, we have that
\[ \exists M' \in \mathbb{R},\ \forall \omega \in \Omega,\ \forall n \ge 0 : |l(-z_n - X(\omega))| \le M' < +\infty. \]
Using bounded convergence we have that
\[ E\Big[\lim_{n\to+\infty} l(-z_n - X)\Big] = x_0. \]
Using the fact that l is an increasing and convex function and thus continuous, we have that
\[ E\Big[l\Big(\lim_{n\to+\infty} (-z_n) - X\Big)\Big] = E\big[l\big(-SF^{l}_{x_0}(X) - X\big)\big] = x_0. \]
Hence $SF^{l}_{x_0}(X)$ is a solution to 4.6. To show that the solution is unique it is sufficient to notice that, if x0 is an interior point of the range of an increasing, convex and non-identically constant function l, then l is strictly increasing on $(l^{-1}(x_0) - \varepsilon, +\infty)$ for some $\varepsilon > 0$. Because any solution of 4.6 has to lie in this interval we have that the solution is
unique.
In [11, p.248] we found the following theorem and proof.
Theorem 4.3. The utility-based shortfall risk measure $SF^{l}_{x_0}(X)$ is continuous
from below. Hence $SF^{l}_{x_0}(X)$ can be represented in the form
\[ SF^{l}_{x_0}(X) = \max_{Q \in \mathcal{M}_1(P)} \big( E_Q(-X) - \alpha_{\min}(Q) \big). \tag{4.7} \]
Proof. If $SF^{l}_{x_0}(X)$ is continuous from below, representation 4.7 follows directly
from theorem 1.9 from the first chapter. Take a sequence $X_n \in L^\infty(\Omega, \mathcal{F}, P)$
such that $X_n \nearrow X$ pointwise. Then $SF^{l}_{x_0}(X_n) \searrow R \in \mathbb{R}$. We need to
show that $R = SF^{l}_{x_0}(X)$. Just as in the proof of lemma 4.2 above we can use bounded convergence to obtain that
\[ \lim_{n\to+\infty} E\big[l(-SF^{l}_{x_0}(X_n) - X_n)\big] = E\Big[\lim_{n\to+\infty} l(-SF^{l}_{x_0}(X_n) - X_n)\Big]. \]
Again using the fact that l is continuous we have
\[ E\Big[\lim_{n\to+\infty} l(-SF^{l}_{x_0}(X_n) - X_n)\Big] = E[l(-R - X)]. \]
Because $E[l(-SF^{l}_{x_0}(X_n) - X_n)] = x_0$ for all n, we have that $\lim_{n\to+\infty} E[l(-SF^{l}_{x_0}(X_n) - X_n)] = x_0$. This implies that
$E[l(-R - X)] = x_0$. Since we know that the only solution to equation 4.6 is
$SF^{l}_{x_0}(X)$, we have that $R = SF^{l}_{x_0}(X)$.
¹ We know this set is not empty because x0 is assumed to be an interior point of l(R).
Equation 4.2 states the link between the utility based shortfall
risk measure and the u-Mean certainty equivalent. We will work out the idea of
using strong Lagrangian duality proposed in [5] to derive a relation between the
optimised certainty equivalent and the u-Mean certainty equivalent. From [14, p60]
we have the following theorem concerning strong Lagrangian duality.
Theorem 4.4. Let X be a non-empty convex set in Rn , Let f : Rn → R and
g : Rn → Rm be convex and h : Rn → Rl be affine. Suppose that the following
constraint is satisfied: there exists an x̂ ∈ X such that g(x̂) < 0 and h(x̂) = 0 with
0 ∈ int(h(X)) where h(X) = {h(x)|x ∈ X}. Then
inf{f (x)|x ∈ X, g(x) ≤ 0, h(x) = 0} = sup{θ(u, v)|u ≥ 0}.
(4.8)
Where θ(u, v) = inf{f (x) + uT g(x) + v T h(x)|x ∈ X}. Furthermore if the infimum
is finite, then sup{θ(u, v)|u ≥ 0} is achieved at (û, v̂) with û ≥ 0. If the infimum
is achieved at x̂, then ûT g(x̂) = 0.
For λ > 0, denote with $OCE^{\lambda}_{u}(X) := \sup_{\eta \in \mathbb{R}} \big(\eta + \lambda E[u(X - \eta)]\big)$. By definition we
have that:
\[ SF^{l}_{x_0}(X) = \inf\{\eta \in \mathbb{R} \mid E[l(-X - \eta)] \le x_0\}. \]
Using the translation $\tilde{l}(x) = l(x) - x_0$, we have that
\[ SF^{\tilde{l}}_{0}(X) = \inf\{\eta \in \mathbb{R} \mid E[\tilde{l}(-X - \eta)] \le 0\}. \]
The Lagrange function for this problem is given by $L(\eta, \lambda) = \eta + \lambda E[\tilde{l}(-X - \eta)]$.
We want to apply the strong duality theorem. We have that f(η) = η is a convex
function. We also have that $g(\eta) := E[\tilde{l}(-X - \eta)]$ is a convex function. This
follows from the convexity of l. To formally prove this, take η1 , η2 ∈ R and
t ∈ [0, 1], then we have that
\[
\begin{aligned}
g(t\eta_1 + (1 - t)\eta_2) &= E\big[\tilde{l}(-X - t\eta_1 - (1 - t)\eta_2)\big] \\
&= E\big[\tilde{l}(-tX - (1 - t)X - t\eta_1 - (1 - t)\eta_2)\big] \\
&= E\big[\tilde{l}\big(t(-X - \eta_1) + (1 - t)(-X - \eta_2)\big)\big] \\
&\le E\big[t\,\tilde{l}(-X - \eta_1) + (1 - t)\,\tilde{l}(-X - \eta_2)\big] \\
&= t E\big[\tilde{l}(-X - \eta_1)\big] + (1 - t) E\big[\tilde{l}(-X - \eta_2)\big] \\
&= t g(\eta_1) + (1 - t) g(\eta_2).
\end{aligned}
\]
To apply strong Lagrangian duality we need to show that there exists an internal
solution. That is, we need to find an η̂ such that g(η̂) < 0. Because x0 is an
internal point of l(R), 0 is an internal point of $\tilde{l}(\mathbb{R})$. Using the same arguments
as with equation 4.6 we find that there exists an $\varepsilon > 0$ such that the equation
$E[\tilde{l}(-X - \eta)] = -\varepsilon$ has a solution η̂. From this we can conclude there exists an
η̂ such that $g(\hat{\eta}) = E[\tilde{l}(-X - \hat{\eta})] = -\varepsilon < 0$, which proves the existence of the
internal solution. Denote with ũ the utility function associated with $\tilde{l}$. Because
$\tilde{l}$ is a continuous and non-decreasing function, the restriction $E[\tilde{l}(-X - \eta)] \le 0$
will be binding. Hence we can assume λ > 0.
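We can now apply the strong duality theorem. Using that $\theta(\lambda) = \inf_{\eta \in \mathbb{R}} \big(\eta + \lambda E[\tilde{l}(-X - \eta)]\big)$ we find that
\[
\begin{aligned}
SF^{\tilde{l}}_{0}(X) &= \inf\{\eta \in \mathbb{R} \mid E[\tilde{l}(-X - \eta)] \le 0\} \\
&= \sup_{\lambda > 0} \inf_{\eta \in \mathbb{R}} \big(\eta + \lambda E[\tilde{l}(-X - \eta)]\big) \\
&= \sup_{\lambda > 0} \inf_{\eta \in \mathbb{R}} \big(\eta - \lambda E[\tilde{u}(X + \eta)]\big) \\
&= \sup_{\lambda > 0} \inf_{\eta \in \mathbb{R}} \big(-\eta - \lambda E[\tilde{u}(X - \eta)]\big) \\
&= \sup_{\lambda > 0} \Big(-\sup_{\eta \in \mathbb{R}} \big(\eta + \lambda E[\tilde{u}(X - \eta)]\big)\Big) \\
&= -\inf_{\lambda > 0} \sup_{\eta \in \mathbb{R}} \big(\eta + \lambda E[\tilde{u}(X - \eta)]\big) \\
&= -\inf_{\lambda > 0} OCE^{\lambda}_{\tilde{u}}(X).
\end{aligned}
\]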
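The identity between the shortfall risk and the infimum over λ of the scaled optimised certainty equivalents can be checked numerically in a simple case. The sketch below assumes the exponential utility ũ(x) = 1 − exp(−x) (so x0 = 0) and a hypothetical two-point position; the helper `ternary_argmax` and the search intervals are illustrative choices, relying on the inner objective being concave in η and on OCE^λ being convex in λ.

```python
import math

def ternary_argmax(f, lo, hi, iters=100):
    """Locate the maximiser of a concave function f on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    return 0.5 * (lo + hi)

# Hypothetical two-point position and exponential utility (assumptions).
outcomes = [-1.0, 2.0]
probs = [0.3, 0.7]

def u(x):
    return 1.0 - math.exp(-x)

def oce(lam):
    """OCE^lambda_u(X) = sup_eta (eta + lam * E[u(X - eta)])."""
    def g(eta):
        return eta + lam * sum(p * u(x - eta) for x, p in zip(outcomes, probs))
    return g(ternary_argmax(g, -20.0, 20.0))

# Minimise the convex map lam -> OCE^lambda by maximising its negative.
lam_star = ternary_argmax(lambda lam: -oce(lam), 0.05, 20.0)
inf_oce = oce(lam_star)

# Shortfall risk for the loss l(x) = exp(x) - 1 and x0 = 0 in closed form.
shortfall = math.log(sum(p * math.exp(-x) for x, p in zip(outcomes, probs)))
print(inf_oce, -shortfall, lam_star)
```

In this example the infimum is attained at λ = 1, so the optimised certainty equivalent and the u-Mean certainty equivalent coincide for the exponential utility.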
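We already showed that $SF^{\tilde{l}}_{0}(X) = -M_{\tilde{u}}(X)$. Hence we have that
\[ M_{\tilde{u}}(X) = \inf_{\lambda > 0} OCE^{\lambda}_{\tilde{u}}(X). \tag{4.9} \]
From this we conclude, by taking λ = 1, that
\[ M_{u}(X) \le OCE_{u}(X). \tag{4.10} \]
4.2 Divergence risk measures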
Apart from utility based shortfall risk measures there is another way to incorporate utility functions into risk measures. Although less obvious, divergence risk
measures are another example of utility based risk measures. The next section is
devoted to the study of this class of risk measures.
4.2.1 Construction and representation
Divergence risk measures are based on the robust representation of a convex risk
measure, something which we have discussed in the first chapter. The robust
representation of a risk measure has the following form
\[ \rho(X) = \sup_{Q \in \mathcal{M}_1(P)} \big( E_Q[-X] - \alpha(Q) \big). \tag{4.11} \]
In this representation we have taken some probabilistic models more seriously than
others using the penalty function α(Q). In divergence based risk measures this
penalty function will be the φ-divergence. We will make following assumptions on
the function φ:
1. φ : R → (−∞, +∞] is a proper² closed convex function.
2. φ is lower semicontinuous.
3. If the effective domain³ is denoted by dom φ, then 1 ∈ int(dom φ).
4. The minimum of φ is 0 which is attained at 1.
The class of functions for which these properties are satisfied will be denoted with
Φ. We will call the function φ a divergence function.
Definition 4.2. For φ ∈ Φ the φ-divergence of the probability measure Q with
respect to P is defined as
\[
I_\varphi(Q|P) =
\begin{cases}
\int_\Omega \varphi\left(\frac{dQ}{dP}\right) dP & \text{if } Q \in \mathcal{M}_1(P), \\
+\infty & \text{otherwise,}
\end{cases}
\tag{4.12}
\]
where $\frac{dQ}{dP}$ denotes the Radon-Nikodym derivative.
² Which means there exists an x ∈ R such that φ(x) < +∞.
³ The effective domain of the proper function is the set {x|φ(x) < +∞}.
Note that if the probability measure Q were not absolutely continuous with
respect to P then the Radon-Nikodym derivative would not be well defined. Using the φ-divergence as a penalty function we can define divergence based risk
measures.
Definition 4.3. The φ-divergence based risk measure is defined as
\[ D_\varphi(X) = \sup_{Q \in \mathcal{M}_1(P)} \big( E_Q[-X] - I_\varphi(Q|P) \big). \tag{4.13} \]
In what follows we will often use the Legendre transform. This transform is sometimes also called the Fenchel-Legendre transform.
Definition 4.4. The Legendre transform of a convex function l : R → R ∪ {+∞}
is defined as
\[ l^*(y) := \sup_{x \in \mathbb{R}} \big( yx - l(x) \big), \qquad y \in \mathbb{R}. \tag{4.14} \]
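To make the Legendre transform concrete, the following sketch evaluates it numerically for one particular divergence function. The choice φ(t) = t ln t − t + 1 (with φ(0) = 1 and φ(t) = +∞ for t < 0) is an assumption made for illustration, as are the helper name `legendre` and the search interval; for this φ the transform is known in closed form, φ*(y) = exp(y) − 1, so that u(x) = −φ*(−x) is the exponential utility 1 − exp(−x).

```python
import math

def phi(t):
    """Illustrative divergence function: t ln t - t + 1 on [0, +inf)."""
    if t < 0:
        return math.inf
    if t == 0:
        return 1.0
    return t * math.log(t) - t + 1.0

def legendre(f, y, lo, hi, iters=200):
    """Approximate sup_{x in [lo, hi]} (y*x - f(x)) for convex f by ternary search."""
    def g(x):
        return y * x - f(x)
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if g(a) < g(b):
            lo = a
        else:
            hi = b
    return g(0.5 * (lo + hi))

for y in [-2.0, -0.5, 0.0, 1.0, 2.0]:
    numeric = legendre(phi, y, 0.0, 50.0)
    closed = math.exp(y) - 1.0
    print(y, numeric, closed)
```

The search interval [0, 50] is wide enough here because the maximiser is t = exp(y); for other functions or larger |y| it would have to be adapted.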
At first sight it might not be clear why divergence risk measures are also utility
based risk measures. However, it turns out that divergence risk measures are in fact
negative optimised certainty equivalents. The negative of the optimised certainty
equivalent can be viewed as the dual optimisation problem of the divergence risk
measure, i.e. for u(x) = −φ∗ (−x) we have that
\[ \sup_{Q \in \mathcal{M}_1(P)} \big( E_Q[-X] - I_\varphi(Q|P) \big) = -\sup_{\eta \in \mathbb{R}} \big( \eta + E[u(X - \eta)] \big). \tag{4.15} \]
We want to make the remark that in the optimisation problem on the left hand
side of 4.15 we optimise over an infinite dimensional space, while on the right hand
side the optimisation happens over a finite dimensional space. In [5] the authors
use strong Lagrangian duality to obtain this link. However we are not convinced
that they checked all necessary assumptions to conclude that strong duality holds.
Therefore we have added the assumption that φ is a lower semicontinuous function,
an assumption also made in [11, p.256]. Using this extra assumption and the ideas
proposed in [5] we have reworked the proof of 4.15. Instead of using strong Lagrangian duality we will use the closely related concept of Fenchel duality to prove
this connection. Using this type of duality explains why the Fenchel-Legendre
transformation turns up in some of the equations. We will need the concept of the
core of a set.
Definition 4.5. If X is a normed space then the core of a set A ⊂ X is defined
by x ∈ core(A) if for each h ∈ {x ∈ X | ‖x‖ = 1} there exists a δ > 0 such that
x + th ∈ A for all 0 ≤ t ≤ δ.
Lemma 4.3. If A is a set then int(A) ⊂ core(A).
Proof. Without proof, see [9].
Our proof will be based on the following duality theorem regarding Fenchel duality
with equality constraints, taken from [7, Corollary 1.3].
Theorem 4.5. (Fenchel duality theorem for linear constraints) Let X and Y be
Banach spaces. Given any f : X → (−∞, +∞], any bounded linear map A : X → Y
and any element b ∈ Y, the following weak duality holds:
\[ \inf_{x \in X} \{ f(x) \mid Ax = b \} \ge \sup_{\mu \in Y^*} \{ \langle b, \mu \rangle - f^*(A^*\mu) \}. \tag{4.16} \]
If f is lower semicontinuous and b ∈ core(A dom f ), then we have equality, and
the supremum is attained if finite.
In [16] we found the following Fatou property.
Theorem 4.6. (Fatou property) Let g and $f_n$, for n ∈ N, be measurable functions such
that $f_n \ge g$ for all n and $\int g \, d\mu > -\infty$, then
\[ \liminf_{n \to \infty} \int f_n \, d\mu \ge \int \liminf_{n \to \infty} f_n \, d\mu. \tag{4.17} \]
We also have the following result.
Theorem 4.7. Let Ω be a σ-finite measure space, and $X := L^p(\Omega, \mathcal{F}, P)$, p ∈
[1, +∞]. Let g : R × Ω → (−∞, +∞] be a normal integrand, and define on X the
integral functional $I_g(x) := \int_\Omega g(x(\omega), \omega) \, dP(\omega)$. Then,
\[ \inf_{x \in X} \int_\Omega g(x(\omega), \omega) \, dP(\omega) = \int_\Omega \inf_{s \in \mathbb{R}} g(s, \omega) \, dP(\omega), \tag{4.18} \]
provided the left-hand side is finite. Moreover,
\[ \bar{x} \in \arg\min_{x \in X} I_g(x) \iff \bar{x}(\omega) \in \arg\min_{s \in \mathbb{R}} g(s, \omega) \ \text{a.e.} \tag{4.19} \]
Proof. Without proof, see [5, p20].
Theorem 4.8. Let f : R × Ω → (−∞, +∞]. If f (·, ω) is (convex) and closed for
almost all ω, and measurable in ω for each x such that dom f (·, ω) has a non-empty
interior for every ω, then f is a normal (convex) integrand.
Proof. Without proof, see [5].
Theorem 4.9. For all p ∈ [1, ∞] the spaces Lp (Ω, F, P ) are Banach spaces
Proof. Without proof, see [11, p207].
Before we prove equation 4.15 (theorem 4.10 below) we will prove some lemmas which will make the
final proof easier.
Lemma 4.4. The functional $B : L^1(\Omega, \mathcal{F}, P) \to \mathbb{R}$ defined by
$B(z) = \int_\Omega z(\omega) \, dP(\omega)$ is continuous and linear.
Proof. Take $z_1, z_2 \in L^1(\Omega, \mathcal{F}, P)$. We need to show that for all $\varepsilon > 0$ there exists a δ > 0 such
that if $\|z_1 - z_2\|_{L^1} < \delta$ then $|B(z_1) - B(z_2)| < \varepsilon$. We have that $\|z_1 - z_2\|_{L^1} = \int_\Omega |z_1(\omega) - z_2(\omega)| \, dP(\omega) < \delta$. We can now conclude that
\[ |B(z_1) - B(z_2)| = \left| \int_\Omega z_1(\omega) \, dP(\omega) - \int_\Omega z_2(\omega) \, dP(\omega) \right| \le \int_\Omega |z_1(\omega) - z_2(\omega)| \, dP(\omega) < \delta. \tag{4.20} \]
Hence for each $\varepsilon > 0$ we can pick $\delta = \varepsilon$.
The linearity of the functional B follows from the fact that the Lebesgue integral
is linear.
A standard result from functional analysis yields that a linear operator between
normed spaces is bounded if and only if it is a continuous linear operator. From
this we can conclude that B is a bounded functional.
Lemma 4.5. If A and B are sets such that A ⊂ B then int(A) ⊂ int(B)⁴.
Proof. Take a ∈ int(A), then there exists a neighbourhood U of a with U ⊂ A.
Because A ⊂ B we have U ⊂ B. Hence U is a neighbourhood of a in B. We
conclude that a ∈ int(B). Because a was chosen arbitrarily, we can conclude that
int(A) ⊂ int(B).
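Lemma 4.6. If $X \in L^\infty(\Omega, \mathcal{F}, P)$, then the function $g : L^1(\Omega, \mathcal{F}, P) \to \mathbb{R}$ defined
by $g(z) := \int_\Omega X(\omega) z(\omega) \, dP(\omega)$ is continuous.
Proof. Take $z_1, z_2 \in L^1(\Omega, \mathcal{F}, P)$ such that $\|z_1 - z_2\|_{L^1} < \delta$ for some δ > 0. This means
that $\int_\Omega |z_1(\omega) - z_2(\omega)| \, dP(\omega) < \delta$. Then we have that
\[
\begin{aligned}
|g(z_1) - g(z_2)| &= \left| \int_\Omega X(\omega) \big(z_1(\omega) - z_2(\omega)\big) \, dP(\omega) \right| \\
&\le \int_\Omega \big| X(\omega) \big(z_1(\omega) - z_2(\omega)\big) \big| \, dP(\omega) \\
&\le \sup_\omega |X(\omega)| \int_\Omega |z_1(\omega) - z_2(\omega)| \, dP(\omega) \\
&< \sup_\omega |X(\omega)| \, \delta.
\end{aligned}
\]
Because $X \in L^\infty(\Omega, \mathcal{F}, P)$ we have $\sup_\omega |X(\omega)| < +\infty$. Hence if we pick $\delta = \varepsilon / \sup_\omega |X(\omega)| > 0$, then $\|z_1 - z_2\|_{L^1} < \delta$ implies $|g(z_1) - g(z_2)| < \varepsilon$.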
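We can now prove the main theorem of this section.
Theorem 4.10. Let φ ∈ Φ and let $X \in L^\infty(\Omega, \mathcal{F}, P)$. Then
\[ \inf_{Q \in \mathcal{M}_1(P)} \big( E_Q[X] + I_\varphi(Q|P) \big) = \sup_{\eta \in \mathbb{R}} \big( \eta - E_P[\varphi^*(\eta - X)] \big). \tag{4.21} \]
Therefore with u(t) := −φ∗(−t), we have
\begin{align}
OCE_u(X) &= \inf_{Q \in \mathcal{M}_1(P)} \big( E_Q[X] + I_\varphi(Q|P) \big) \tag{4.22} \\
&= -\sup_{Q \in \mathcal{M}_1(P)} \big( E_Q[-X] - I_\varphi(Q|P) \big) \tag{4.23} \\
&= -D_\varphi(X). \tag{4.24}
\end{align}
⁴ int(A) denotes the interior of the set A.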
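Proof. Take φ in Φ. Let
\[ v := \inf_{Q \in \mathcal{M}_1(P)} \big( E_Q[X] + I_\varphi(Q|P) \big). \tag{4.25} \]
Now fix $Q \in \mathcal{M}_1(P)$. Then by definition of $\mathcal{M}_1(P)$, Q is absolutely continuous with respect to P. Using the Radon-Nikodym theorem we have that this is equivalent with the existence of a density $z(\omega) := \frac{dQ}{dP}(\omega)$. We have that z ≥ 0 a.e. and it is clear that
$\int_\Omega |z(\omega)| \, dP(\omega) = 1$. Hence we have that $z \in L^1(\Omega, \mathcal{F}, P)$. Therefore
\[
\begin{aligned}
v &= \inf_{Q \in \mathcal{M}_1(P)} \big( E_Q[X] + I_\varphi(Q|P) \big) \\
&= \inf_{Q \in \mathcal{M}_1(P)} \left( E_Q[X] + E_P\left[\varphi\left(\frac{dQ}{dP}\right)\right] \right) \\
&= \inf_{z \in L^1} \left\{ \int_\Omega \varphi(z(\omega)) \, dP(\omega) + \int_\Omega X(\omega) z(\omega) \, dP(\omega) \;\middle|\; \int_\Omega z(\omega) \, dP(\omega) = 1,\ z \ge 0 \text{ a.e.} \right\} \\
&= \inf_{z \in L^1} \left\{ \int_\Omega \varphi(z(\omega)) \, dP(\omega) + \int_\Omega X(\omega) z(\omega) \, dP(\omega) \;\middle|\; \int_\Omega z(\omega) \, dP(\omega) = 1 \right\}.
\end{aligned}
\]
The last equality follows from the fact that if z(ω) < 0 for a set S ⊂ Ω with
P (S) > 0 then z can not correspond to the Radon-Nikodym derivative of a certain probability
measure Q with respect to P. By definition of the φ-divergence we
have that $\int_\Omega \varphi(z(\omega)) \, dP(\omega) = +\infty$. Furthermore we always have that
\[ -\infty < \int_\Omega X(\omega) z(\omega) \, dP(\omega) \le \sup_{\omega \in \Omega} |X(\omega)| \int_\Omega |z(\omega)| \, dP(\omega) < +\infty. \]
From all this it follows that if z(ω) < 0 for a set S ⊂ Ω with P (S) > 0 then
$\int_\Omega \varphi(z(\omega)) \, dP(\omega) + \int_\Omega X(\omega) z(\omega) \, dP(\omega) = +\infty$. Therefore we can conclude that the last equality holds.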
We want to apply theorem 4.5 regarding Fenchel duality for linear constraints. In
the context of this theorem let $f : L^1(\Omega, \mathcal{F}, P) \to (-\infty, +\infty]$ be defined by
\[ f(z) := \int_\Omega \varphi(z(\omega)) \, dP(\omega) + \int_\Omega X(\omega) z(\omega) \, dP(\omega), \tag{4.26} \]
and let $A : L^1(\Omega, \mathcal{F}, P) \to \mathbb{R}$ be defined by
\[ A(z) := \int_\Omega z(\omega) \, dP(\omega). \tag{4.27} \]
Then A is linear because the Lebesgue integral is linear. It is bounded because it
is also a continuous functional, see lemma 4.4. Let b = 1 and note that R∗ = R.
We will now calculate
\[ d := \sup_{\mu \in \mathbb{R}} \big( \langle b, \mu \rangle - f^*(A^*\mu) \big). \tag{4.28} \]
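We have that
\[
\begin{aligned}
f^*(A^*\mu) &= \sup_{z \in L^1} \big( \langle A^*\mu, z \rangle - f(z) \big) \\
&= \sup_{z \in L^1} \big( \langle \mu, Az \rangle - f(z) \big) \\
&= \sup_{z \in L^1} \left( \mu \int_\Omega z(\omega) \, dP(\omega) - \int_\Omega \varphi(z(\omega)) \, dP(\omega) - \int_\Omega X(\omega) z(\omega) \, dP(\omega) \right) \\
&= \sup_{z \in L^1} \left( -\int_\Omega \varphi(z(\omega)) \, dP(\omega) + \int_\Omega (\mu - X(\omega)) z(\omega) \, dP(\omega) \right) \\
&= -\inf_{z \in L^1} \left( \int_\Omega \varphi(z(\omega)) \, dP(\omega) - \int_\Omega (\mu - X(\omega)) z(\omega) \, dP(\omega) \right) \\
&= -\inf_{z \in L^1} \int_\Omega \big( \varphi(z(\omega)) - (\mu - X(\omega)) z(\omega) \big) \, dP(\omega).
\end{aligned}
\]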
We now want to apply theorem 4.7. To be able to apply this theorem we first need
to check that I(s, ω) := φ(s) − (µ − X(ω))s is a normal integrand. For this we
can use theorem 4.8. I(s, ω) is convex and closed in s for almost all ω because φ is
convex and closed. Moreover, 1 ∈ int(dom I(·, ω)) for every ω, because 1 ∈ int(dom φ).
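We also need to prove that
\[ \inf_{z \in L^1} \int_\Omega \big( \varphi(z(\omega)) - (\mu - X(\omega)) z(\omega) \big) \, dP(\omega) \tag{4.29} \]
is finite. By assumption the minimum of φ is 0 which is attained at 1. Hence we
have for all z
\[ -\infty < \int_\Omega (\mu - X(\omega)) \, dP(\omega) \le \int_\Omega \big( \varphi(z(\omega)) - (\mu - X(\omega)) z(\omega) \big) \, dP(\omega). \tag{4.30} \]
The first strict inequality follows from the fact that $E_P[X]$ is finite. Hence
$\int_\Omega \big( \varphi(z(\omega)) - (\mu - X(\omega)) z(\omega) \big) \, dP(\omega)$ is bounded from below, which implies that
\[ -\infty < \inf_{z \in L^1} \int_\Omega \big( \varphi(z(\omega)) - (\mu - X(\omega)) z(\omega) \big) \, dP(\omega). \]
Because z = 1 is a possible solution we have
\[ \inf_{z \in L^1} \int_\Omega \big( \varphi(z(\omega)) - (\mu - X(\omega)) z(\omega) \big) \, dP(\omega) \le -\int_\Omega (\mu - X(\omega)) \, dP(\omega) < +\infty. \]
We can conclude that
\[ \inf_{z \in L^1} \int_\Omega \big( \varphi(z(\omega)) - (\mu - X(\omega)) z(\omega) \big) \, dP(\omega) \]
is finite and that we can apply theorem 4.7.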
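We find that
\[
\begin{aligned}
f^*(A^*\mu) &= -\int_\Omega \inf_{s \in \mathbb{R}} \big( \varphi(s) - (\mu - X(\omega)) s \big) \, dP(\omega) \\
&= \int_\Omega \sup_{s \in \mathbb{R}} \big( (\mu - X(\omega)) s - \varphi(s) \big) \, dP(\omega) \\
&= \int_\Omega \varphi^*(\mu - X(\omega)) \, dP(\omega).
\end{aligned}
\]
Using the fact that b = 1 we can conclude that
\[
\begin{aligned}
d &= \sup_{\mu \in \mathbb{R}} \big( \mu - f^*(A^*\mu) \big) \\
&= \sup_{\mu \in \mathbb{R}} \left( \mu - \int_\Omega \varphi^*(\mu - X(\omega)) \, dP(\omega) \right).
\end{aligned}
\]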
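Using theorem 4.5 we can conclude that we have weak duality which means that
\[
\inf_{z \in L^1} \left\{ \int_\Omega \varphi(z(\omega)) \, dP(\omega) + \int_\Omega X(\omega) z(\omega) \, dP(\omega) \;\middle|\; \int_\Omega z(\omega) \, dP(\omega) = 1 \right\}
\ge \sup_{\mu \in \mathbb{R}} \left( \mu - \int_\Omega \varphi^*(\mu - X(\omega)) \, dP(\omega) \right).
\]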
We now want to show that the equality holds. We need to show that 1 ∈
core(A dom f ); this will follow from the assumption that 1 ∈ int(dom φ). We
will first show that dom φ ∩ R ⊂ A dom f . Take w ∈ dom φ ∩ R. Then by definition
of the effective domain M := φ(w) < +∞ and we have that
$\int_\Omega \varphi(w) \, dP(\omega) + \int_\Omega X(\omega) w \, dP(\omega) = M + w E_P[X] < +\infty$. Hence w ∈ dom f . Because Aw = w
we have w ∈ A dom f . By assumption we have 1 ∈ int(dom φ ∩ R). Using lemma
4.5 we can conclude that 1 ∈ int(A dom f ). Using lemma 4.3 we conclude that
1 ∈ core(A dom f ).
We also need to show that f is lower semicontinuous, which means we need to
prove that
\[ \liminf_{z_n \to z_0} f(z_n) \ge f(z_0). \tag{4.31} \]
Denote with $h(z) := \int_\Omega \varphi(z(\omega)) \, dP(\omega)$ and with $g(z) := \int_\Omega X(\omega) z(\omega) \, dP(\omega)$. Then
f (z) = h(z) + g(z).
Using the superadditivity property of the limit inferior we find that
\[ \liminf_{z_n \to z_0} f(z_n) \ge \liminf_{z_n \to z_0} h(z_n) + \liminf_{z_n \to z_0} g(z_n). \]
We know from lemma 4.6 that g(z) is continuous. This implies that g(z) is lower
semicontinuous and we can conclude that $\liminf_{z_n \to z_0} g(z_n) \ge g(z_0)$.
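For each sequence $z_n \in L^1$ with $z_n \to z_0$, we can define a sequence $\varphi_n(\omega) := \varphi(z_n(\omega))$ and
$\varphi_0(\omega) := \varphi(z_0(\omega))$. Then
\[ \liminf_{z_n \to z_0} \int_\Omega \varphi(z_n(\omega)) \, dP(\omega) = \liminf_{n \to \infty} \int_\Omega \varphi_n(\omega) \, dP(\omega). \]
We have assumed that the minimum of φ is 0. Therefore we have that $\varphi_n(\omega) = \varphi(z_n(\omega)) \ge 0$. Because $\int_\Omega 0 \, dP(\omega) = 0$ and the $\varphi_n$ are measurable functions because φ
is a measurable function⁵, we can use Fatou's lemma.
We have that
\[
\begin{aligned}
\liminf_{n \to \infty} \int_\Omega \varphi_n(\omega) \, dP(\omega) &\ge \int_\Omega \liminf_{n \to \infty} \varphi_n(\omega) \, dP(\omega) \\
&= \int_\Omega \liminf_{z_n \to z_0} \varphi(z_n(\omega)) \, dP(\omega) \\
&\ge \int_\Omega \varphi(z_0(\omega)) \, dP(\omega) \\
&= h(z_0).
\end{aligned}
\]
Where in the last inequality we have used that φ is lower semicontinuous. We find
that
\[ \liminf_{z_n \to z_0} f(z_n) \ge \liminf_{z_n \to z_0} h(z_n) + \liminf_{z_n \to z_0} g(z_n) \ge h(z_0) + g(z_0) = f(z_0), \tag{4.32} \]
which proves the lower semicontinuity of f .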
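We can conclude that
\[
\inf_{z \in L^1} \left\{ \int_\Omega \varphi(z(\omega)) \, dP(\omega) + \int_\Omega X(\omega) z(\omega) \, dP(\omega) \;\middle|\; \int_\Omega z(\omega) \, dP(\omega) = 1 \right\}
= \sup_{\mu \in \mathbb{R}} \left( \mu - \int_\Omega \varphi^*(\mu - X(\omega)) \, dP(\omega) \right).
\tag{4.33}
\]
Which means that
\[ \inf_{Q \in \mathcal{M}_1(P)} \big( I_\varphi(Q|P) + E_Q[X] \big) = \sup_{\mu \in \mathbb{R}} \big( \mu - E_P[\varphi^*(\mu - X)] \big). \tag{4.34} \]
This concludes the proof.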
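On a finite probability space the duality of theorem 4.10 can be verified directly. The sketch below is an illustration under assumptions that are not part of the text: the divergence function is taken to be φ(t) = t ln t − t + 1 (so that I_φ is the relative entropy and φ*(y) = exp(y) − 1), the four-point space and all names are hypothetical, and the dual value and the minimising measure are taken from the closed forms −ln E_P[exp(−X)] and dQ/dP ∝ exp(−X) that hold for this particular φ.

```python
import math
import random

def primal_objective(q, p, x):
    """E_Q[X] + I_phi(Q|P) on a finite space, for phi(t) = t ln t - t + 1."""
    value = 0.0
    for qi, pi, xi in zip(q, p, x):
        z = qi / pi  # density dQ/dP at this atom
        phi_z = 1.0 if z == 0 else z * math.log(z) - z + 1.0
        value += qi * xi + pi * phi_z
    return value

# Hypothetical reference measure P and position X on four states.
p = [0.1, 0.2, 0.3, 0.4]
x = [-2.0, -0.5, 1.0, 3.0]

# Dual side: sup_eta (eta - E_P[exp(eta - X) - 1]) = -ln E_P[exp(-X)].
dual_value = -math.log(sum(pi * math.exp(-xi) for pi, xi in zip(p, x)))

# Candidate minimiser on the primal side: density proportional to exp(-X).
weights = [pi * math.exp(-xi) for pi, xi in zip(p, x)]
q_star = [w / sum(weights) for w in weights]
primal_value = primal_objective(q_star, p, x)

# Randomly drawn measures should never beat the dual value (weak duality).
random.seed(0)
def random_measure(n):
    raw = [random.random() for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]

best_random = min(primal_objective(random_measure(4), p, x) for _ in range(1000))
print(primal_value, dual_value, best_random)
```

The primal value at the exponentially tilted measure matches the dual value, while every randomly drawn Q gives a value at least as large.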
Using the relationship between optimised certainty equivalents and φ-divergence
risk measures and the relationship between u-Mean certainty equivalents and utility based shortfall risk measures we can derive the robust representation of a utility
based shortfall risk measure. This representation was found in [13]. Their proof
is rather technical and is outside the scope of this thesis. Therefore we will use
the proof suggested in [5] to obtain this result. For this we will first derive some
elementary properties of the Legendre transform. In [11] we found the following result.
Theorem 4.11. If φ is a proper convex function which is lower semicontinuous,
then φ∗∗ = φ, i.e.
\[ \varphi(t) = \sup_{x \in \mathbb{R}} \big( xt - \varphi^*(x) \big). \tag{4.35} \]
Proof. Without proof, see [11, p479].
⁵ Because it is lower semicontinuous.
From this result we can conclude that if φ is the divergence function which is linked
to the utility function u by u(t) = −φ∗ (−t), or equivalently to the loss function
by φ∗ (t) = l(t), then φ can be obtained by φ(t) = l∗ (t).
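Lemma 4.7. Let f be a convex function and let f ∗ denote its Legendre transform.
If λ > 0 then $(\lambda f)^*(t) = \lambda f^*\left(\frac{t}{\lambda}\right)$.
Proof.
\[ (\lambda f)^*(t) = \sup_{x \in \mathbb{R}} \big( xt - \lambda f(x) \big) = \lambda \sup_{x \in \mathbb{R}} \left( x \frac{t}{\lambda} - f(x) \right) = \lambda f^*\left(\frac{t}{\lambda}\right). \]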
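Lemma 4.8. Let l be a convex function and define $\tilde{l} := l - x_0$ with x0 ∈ R. Then
$\tilde{l}^* = l^* + x_0$.
Proof.
\[ \tilde{l}^*(t) = \sup_{x \in \mathbb{R}} \big( xt - \tilde{l}(x) \big) = \sup_{x \in \mathbb{R}} \big( xt - l(x) + x_0 \big) = \sup_{x \in \mathbb{R}} \big( xt - l(x) \big) + x_0 = l^*(t) + x_0. \]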
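Theorem 4.12. For any convex loss function l, the minimal penalty function in
the representation (4.7) is given by
\[ \alpha^{\min}(Q) = \inf_{\lambda > 0} \frac{1}{\lambda} \left( x_0 + E_P\left[ l^*\left( \lambda \frac{dQ}{dP} \right) \right] \right), \qquad Q \in \mathcal{M}_1(P). \tag{4.36} \]
In particular we have
\[ SF^{l}_{x_0}(X) = \max_{Q \in \mathcal{M}_1(P)} \left( E_Q(-X) - \inf_{\lambda > 0} \frac{1}{\lambda} \left( x_0 + E_P\left[ l^*\left( \lambda \frac{dQ}{dP} \right) \right] \right) \right), \qquad X \in L^\infty. \tag{4.37} \]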
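Proof. By definition we have that
\[
\begin{aligned}
SF^{l}_{x_0}(X) &= \inf\{\eta \in \mathbb{R} \mid E_P[l(-X - \eta)] \le x_0\} \\
&= \inf\{\eta \in \mathbb{R} \mid E_P[l(-X - \eta) - x_0] \le 0\} \\
&= \inf\{\eta \in \mathbb{R} \mid E_P[\tilde{l}(-X - \eta)] \le 0\} \\
&= SF^{\tilde{l}}_{0}(X),
\end{aligned}
\]
where $\tilde{l} = l - x_0$. Denote with ũ the associated utility function. From equation
4.2 we have that $SF^{l}_{x_0}(X) = -M_{\tilde{u}}(X)$. Using equation 4.9 and theorem 4.10 we
have that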
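\[
\begin{aligned}
M_{\tilde{u}}(X) &= \inf_{\lambda > 0} OCE^{\lambda}_{\tilde{u}}(X) \\
&= \inf_{\lambda > 0} \inf_{Q \in \mathcal{M}_1(P)} \big( E_Q[X] + I_{\tilde{\varphi}}(Q|P) \big) \\
&= \inf_{\lambda > 0} \inf_{Q \in \mathcal{M}_1(P)} \left( E_Q[X] + E_P\left[ \tilde{\varphi}\left( \frac{dQ}{dP} \right) \right] \right) \\
&= \inf_{\lambda > 0} \inf_{Q \in \mathcal{M}_1(P)} \left( E_Q[X] + E_P\left[ \lambda \tilde{l}^*\left( \frac{dQ}{\lambda \, dP} \right) \right] \right) \\
&= \inf_{\lambda > 0} \inf_{Q \in \mathcal{M}_1(P)} \left( E_Q[X] + E_P\left[ \lambda x_0 + \lambda l^*\left( \frac{dQ}{\lambda \, dP} \right) \right] \right) \\
&= \inf_{Q \in \mathcal{M}_1(P)} \left( E_Q[X] + \inf_{\lambda > 0} E_P\left[ \lambda x_0 + \lambda l^*\left( \frac{dQ}{\lambda \, dP} \right) \right] \right) \\
&= \inf_{Q \in \mathcal{M}_1(P)} \left( E_Q[X] + \inf_{\lambda > 0} \lambda E_P\left[ x_0 + l^*\left( \frac{dQ}{\lambda \, dP} \right) \right] \right) \\
&= \inf_{Q \in \mathcal{M}_1(P)} \left( E_Q[X] + \inf_{\lambda > 0} \frac{1}{\lambda} E_P\left[ x_0 + l^*\left( \lambda \frac{dQ}{dP} \right) \right] \right) \\
&= -\sup_{Q \in \mathcal{M}_1(P)} \left( E_Q[-X] - \inf_{\lambda > 0} \frac{1}{\lambda} E_P\left[ x_0 + l^*\left( \lambda \frac{dQ}{dP} \right) \right] \right).
\end{aligned}
\]
In the fourth equality we have used that $\tilde{\varphi}^*(t) = -\lambda \tilde{u}(-t)$ and hence $\tilde{\varphi}(t) = (\lambda \tilde{l})^*(t)$. Using lemma 4.7 we have that $(\lambda \tilde{l})^*(t) = \lambda \tilde{l}^*\left(\frac{t}{\lambda}\right)$. In the fifth equality we
have used lemma 4.8.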
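Using the relation that $SF^{l}_{x_0}(X) = -M_{\tilde{u}}(X)$ we can conclude that
\[ SF^{l}_{x_0}(X) = \sup_{Q \in \mathcal{M}_1(P)} \left( E_Q[-X] - \inf_{\lambda > 0} \frac{1}{\lambda} E_P\left[ x_0 + l^*\left( \lambda \frac{dQ}{dP} \right) \right] \right). \tag{4.38} \]
Which proves the theorem.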
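The penalty function of theorem 4.12 can be evaluated numerically in a case where a closed form is available for comparison. The sketch below assumes the exponential loss l(x) = exp(x) − 1, whose Legendre transform is l*(y) = y ln y − y + 1 for y ≥ 0; in that case a short calculation gives the relative entropy of Q with respect to P plus ln(1 + x0) as the value of the infimum. The finite probability space, the function names and the search interval for λ are hypothetical choices, and the ternary search relies on the objective being unimodal in λ for this particular loss.

```python
import math

def l_star(y):
    """Legendre transform of l(x) = exp(x) - 1 (assumed loss): y ln y - y + 1."""
    if y < 0:
        return math.inf
    if y == 0:
        return 1.0
    return y * math.log(y) - y + 1.0

def minimal_penalty(q, p, x0, lo=1e-3, hi=1e3, iters=300):
    """Approximate inf_{lam > 0} (1/lam) * (x0 + E_P[l*(lam * dQ/dP)])."""
    def objective(lam):
        expectation = sum(pi * l_star(lam * qi / pi) for qi, pi in zip(q, p))
        return (x0 + expectation) / lam
    for _ in range(iters):  # ternary search for the minimiser
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if objective(a) > objective(b):
            lo = a
        else:
            hi = b
    return objective(0.5 * (lo + hi))

# Hypothetical measures on three states and threshold x0 (assumptions).
p = [0.2, 0.3, 0.5]
q = [0.5, 0.3, 0.2]
x0 = 0.5

relative_entropy = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))
penalty_numeric = minimal_penalty(q, p, x0)
penalty_closed = relative_entropy + math.log(1.0 + x0)
print(penalty_numeric, penalty_closed)
```

For x0 = 0 the penalty reduces to the relative entropy itself, which is the familiar penalty function of the entropic risk measure.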
We have shown the relation of utility based risk measures with the certainty equivalents defined in chapter two. We have also discussed the relation between these
risk measures. We summarize the main results in table 4.1.
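Table 4.1: Summary of utility based risk measures

                         | $SF^{l}_{x_0}(X)$ | $D_\varphi(X)$
Certainty equivalent     | $-M_{\tilde{u}}(X)$ with $\tilde{u}(x) = -l(-x) + x_0$ | $-OCE_u(X)$ with $-u(-x) = \varphi^*(x)$
Utility representation   | $-\sup\{m \in \mathbb{R} \mid E[u(X - m)] \ge -x_0\}$ | $-\sup_{\eta \in \mathbb{R}} (\eta + E[u(X - \eta)])$
Penalty function         | $\inf_{\lambda > 0} \frac{1}{\lambda} E_P\left[x_0 + l^*\left(\lambda \frac{dQ}{dP}\right)\right]$ | $E_P\left[\varphi\left(\frac{dQ}{dP}\right)\right]$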
At this point we will take a closer look at the assumptions we have made. One
of these assumptions was that the utility functions we use are normalised. We
followed [5] by only considering the subset of utility functions which are nondecreasing and concave and for which u(0) = 0 and 1 ∈ ∂u(0). The authors of
[5] give no clear explanation why they chose this normalisation. However they
do state that they need this normalisation to be able to give a clear economic
interpretation of the optimised certainty equivalent. They interpret the optimised
certainty equivalent as a decision problem and use the utility function to ’discount’
an uncertain payoff. If X is an uncertain payoff, then E [u(X)] is the value of this
payoff. If you give the investor the possibility to consume a part η of this uncertain
income in advance, then he gets η + E [u(X − η)]. The investor tries to optimise
the decision on how much to consume in advance. Using this normalisation they
guarantee that u(x) ≤ x. As we have shown in the second chapter this is equivalent
with OCEu (X) ≤ E [X], a condition which reflects risk aversion. If the investor
consumes too much in advance then u(·) will penalise this. If on the other hand the
investor consumes too little in advance then this can be seen as a missed opportunity.
All his money (or more) is stuck in the uncertain payoff and since he is risk averse,
this would also be penalised by u(·). The investor's optimal allocation results
in the optimised certainty equivalent. Remember that under the von Neumann-Morgenstern axioms the utility function of an investor is only unique up to an
affine transformation. This has the undesirable effect that the same investor,
modelled by two different utility functions can have different optimised certainty
equivalents, because the optimised certainty equivalent is not invariant under an
affine transformation of the utility function u. This makes the optimised certainty
equivalent not a ’real’ certainty equivalent.
From an economic point of view the standardisation of the utility functions is
essential to give a clear interpretation to the optimised certainty equivalent, and
to bypass the problem that the optimised certainty equivalent is not invariant
under an affine transformation of the utility function.
From a mathematical point of view however, the dependence of the optimised certainty equivalent on the specific standardization of the utility function is not really
a problem but can be viewed as an opportunity. By wisely choosing a standardisation of the utility function, one can alter the optimised certainty equivalent,
and thus the risk measure. Now a new question arises: "What would be a good
standardisation of the utility function from a mathematical point of view?" To
answer this question we will need to further examine the connection between the
utility function and the divergence function.
In the robust representation of the divergence risk measure one can observe that the
divergence is a penalty function. A good penalty function would heavily penalise
models Q which deviate a lot from the fixed model P , while lightly penalising
models which are very close to P . Therefore it would be intuitive to assume that
the divergence penalises the model P the least. That is, φ(t) attains its minimum
for t = 1. Using that −u(−x) = φ∗ (x) we have that
\[ u(0) = -\sup_{t \in \mathbb{R}} \big( 0 \cdot t - \varphi(t) \big) = \inf_{t \in \mathbb{R}} \varphi(t). \]
Assuming the infimum is attained we can conclude that u(0) is the minimal
penalty given. In what follows we will assume that $u \in C^1$. We are interested in
what the condition $u'(0) = 1$ imposes on the divergence function φ.
Notice that $\varphi(1) = \sup_{x \in \mathbb{R}} \big( x + u(-x) \big)$. If $u'(0) = 1$, then the first-order condition $1 - u'(-x) = 0$ has
a solution x = 0. Because u is concave, the map $x \mapsto x + u(-x)$ is concave, so this stationary point is a maximiser. Therefore
φ(1) = u(0), which again states that the penalty given to P equals u(0).
Hence the standardisation u(0) = 0 and $u'(0) = 1$ implies that φ attains its
minimum at 1 and φ(1) = 0.
4.2.2 The coherence of divergence risk measures
We know that divergence risk measures are always convex. This follows easily
from the properties of the optimised certainty equivalent proven in theorem 2.4 of
the second chapter. However divergence risk measures are not always coherent.
In chapter one, theorem 1.7, we have seen that the penalty function of a coherent
risk measure can only take the values 0 or +∞. In the case of divergence risk
measures this would imply that the divergence is either 0 or +∞. This is a rather
restrictive condition. It is now natural to ask which utility functions give rise to
coherent divergence risk measures. This question was answered in [5]. For this the
authors considered the class of strongly risk averse utility functions $U_0^{<}$, i.e.
$u \in U_0^{<}$ if and only if $u \in U_0$ and $u(t) < t$ for all $t \ne 0$.
We will further assume that u is continuous. In [5, lemma 2.1] we find the following
lemma.
Lemma 4.9. Let u : R → [−∞, +∞] be a proper closed and concave function.
Then the right and left derivatives $u'_+$ and $u'_-$ exist as extended real numbers, and
1. for all a < t < b we have that $u'_+(a) \ge u'_-(t) \ge u'_+(t) \ge u'_-(b)$, and
2. the subdifferential is given by
\[ \partial u(t) = \{ s \in \mathbb{R} \mid u'_+(t) \le s \le u'_-(t) \}. \tag{4.39} \]
Denote with g(η) := η + E [u(X − η)], then it is stated in [5, proposition 2.1] that
\[ \eta^* \in \arg\max g(\eta) \iff E\big[u'_+(X - \eta^*)\big] \le 1 \le E\big[u'_-(X - \eta^*)\big]. \tag{4.40} \]
Here the authors have assumed that they can freely interchange the derivative
and the expectation operator. They claim this is the case when the one-sided
derivatives of u are continuous and when the associated expected values are finite.
Hence if $u \in C^1$ we have that
\[ \eta^* \in \arg\max g(\eta) \iff E\big[u'(X - \eta^*)\big] = 1. \tag{4.41} \]
In [5, Theorem 3.1] we find following theorem which characterises the utility functions for which the associated divergence risk measure is coherent.
Theorem 4.13. In the class $U_0^{<}$ of strongly risk-averse utility functions that are
finite valued, the divergence risk measure Dφ (X) = − OCEu (X) is a coherent risk
measure if and only if u is the piecewise linear function given by
\[
u(x) =
\begin{cases}
\gamma_2 x, & \text{if } x \le 0 \\
\gamma_1 x, & \text{if } x > 0
\end{cases}
\tag{4.42}
\]
for some $\gamma_2 > 1 > \gamma_1 \ge 0$.
Proof. The proof of this theorem was taken from [5, propositions 3.1, 3.2] and
consists of two parts, theorem 4.14 and 4.15.
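The positive homogeneity claimed for the piecewise linear utility can be observed numerically for a discrete position. The sketch below is only an illustration: the distribution, the values γ1 = 0.5 and γ2 = 2 and the function names are hypothetical. It uses the fact that for this u the objective η ↦ η + E[u(X − η)] is piecewise linear and concave with kinks only at the atoms of X, increasing for very small η and decreasing for very large η, so the supremum is attained at one of the atoms.

```python
def make_piecewise_utility(gamma1, gamma2):
    """u(x) = gamma2 * x for x <= 0 and gamma1 * x for x > 0."""
    def u(x):
        return gamma2 * x if x <= 0 else gamma1 * x
    return u

def oce(outcomes, probs, u):
    """sup_eta (eta + E[u(X - eta)]) for discrete X and piecewise linear u.

    The maximum is searched over the atoms of X only, which suffices for
    the piecewise linear utility above (an assumption of this sketch).
    """
    def objective(eta):
        return eta + sum(p * u(x - eta) for x, p in zip(outcomes, probs))
    return max(objective(eta) for eta in outcomes)

# Hypothetical position and utility parameters with gamma2 > 1 > gamma1 >= 0.
outcomes = [-3.0, -1.0, 0.5, 2.0]
probs = [0.1, 0.2, 0.3, 0.4]
u = make_piecewise_utility(gamma1=0.5, gamma2=2.0)

base = oce(outcomes, probs, u)
for lam in [0.5, 2.0, 7.3]:
    scaled = oce([lam * x for x in outcomes], probs, u)
    print(lam, scaled, lam * base)
```

Scaling the position by λ scales the optimised certainty equivalent by the same factor, in contrast with the exponential utility used in the bond example of section 4.1.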
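Theorem 4.14. Let $u \in U_0^{<}$. Then OCEu (X) is positively homogeneous for all
random variables X if and only if u is positively homogeneous.
Proof. First suppose that u is positively homogeneous. Then we need to show that
OCEu is positively homogeneous. Take λ > 0, we have that
\[
\begin{aligned}
OCE_u(\lambda X) &= \sup_{\eta \in \mathbb{R}} \big( \eta + E[u(\lambda X - \eta)] \big) \\
&= \sup_{\lambda\eta \in \mathbb{R}} \big( \lambda\eta + E[u(\lambda X - \lambda\eta)] \big) \\
&= \sup_{\lambda\eta \in \mathbb{R}} \big( \lambda\eta + \lambda E[u(X - \eta)] \big) \\
&= \lambda \sup_{\lambda\eta \in \mathbb{R}} \big( \eta + E[u(X - \eta)] \big) \\
&= \lambda \, OCE_u(X).
\end{aligned}
\]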
Which proves that the OCE is positive homogeneous. Take α > 0 > β, and
consider the random variable X such that P (X = α) = p and P (X = β) = 1 − p.
Now denote with
g(η) := η + pu(α − η) + (1 − p)u(β − η).
(4.43)
Then the optimised certainty equivalent is given by
OCEu (X) = sup (η + pu(α − η) + (1 − p)u(β − η)) = sup g(η).
η∈R
(4.44)
η∈R
Because u ∈ U0< we have that 1 ∈ ∂u(0). Hence by lemma 4.9 we have that
u0+ (0) ≤ 1 ≤ u0− (0).
Because α > 0 > β we can again apply lemma 4.9 such that
u0− (α) ≥ u0+ (α) ≥ u0− (0) ≥ 1 ≥ u0+ (0) ≥ u0− (β) ≥ u0+ (β) > 0.
We know from equation 4.40 that
η ∗ ∈ arg max(g(η)) ⇔ E u0− (X − η ∗ ) ≥ 1 ≥ E u0+ (X − η ∗ ) .
72
(4.45)
(4.46)
Hence 0 ∈ arg max(g(η)) if and only if
pu0− (α) + (1 − p)u0− (β) ≥ 1 ≥ pu0+ (α) + (1 − p)u0+ (β).
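From this we have that
\[
p\,u'_-(\alpha) + (1 - p)\,u'_-(\beta) \ge 1 \iff p \ge \frac{1 - u'_-(\beta)}{u'_-(\alpha) - u'_-(\beta)},
\]
and
\[
p\,u'_+(\alpha) + (1 - p)\,u'_+(\beta) \le 1 \iff p \le \frac{1 - u'_+(\beta)}{u'_+(\alpha) - u'_+(\beta)}.
\]
Hence
\[
0 \in \arg\max_{\eta} g(\eta) \iff \frac{1 - u'_-(\beta)}{u'_-(\alpha) - u'_-(\beta)} \le p \le \frac{1 - u'_+(\beta)}{u'_+(\alpha) - u'_+(\beta)}. \tag{4.47}
\]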
We will now check whether the right hand side of equation 4.47 is well defined.
Because $u'_-(\alpha) > 1 > u'_-(\beta) > 0$, and similarly $u'_+(\alpha) > 1 > u'_+(\beta) > 0$, we have
that
\[
0 < \frac{1 - u'_-(\beta)}{u'_-(\alpha) - u'_-(\beta)} < 1 \quad \text{and} \quad 0 < \frac{1 - u'_+(\beta)}{u'_+(\alpha) - u'_+(\beta)} < 1.
\]
We conclude that we always have that p ∈ (0, 1).
We also need to show that
\[
\frac{1 - u'_-(\beta)}{u'_-(\alpha) - u'_-(\beta)} \le \frac{1 - u'_+(\beta)}{u'_+(\alpha) - u'_+(\beta)}. \tag{4.48}
\]
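Since both denominators are positive, it suffices to prove that
\[
(1 - u'_+(\beta))(u'_-(\alpha) - u'_-(\beta)) - (1 - u'_-(\beta))(u'_+(\alpha) - u'_+(\beta)) \ge 0. \tag{4.49}
\]
We have that
\[
\begin{aligned}
&(1 - u'_+(\beta))(u'_-(\alpha) - u'_-(\beta)) - (1 - u'_-(\beta))(u'_+(\alpha) - u'_+(\beta)) \\
&\quad= u'_-(\alpha) - u'_-(\beta) - u'_-(\alpha)u'_+(\beta) - u'_+(\alpha) + u'_+(\beta) + u'_+(\alpha)u'_-(\beta) \\
&\quad= u'_-(\alpha)\left(1 - u'_+(\beta)\right) - u'_-(\beta) - u'_+(\alpha) + u'_+(\beta) + u'_+(\alpha)u'_-(\beta) \\
&\quad\ge u'_+(\alpha)\left(1 - u'_+(\beta)\right) - u'_-(\beta) - u'_+(\alpha) + u'_+(\beta) + u'_+(\alpha)u'_-(\beta) \\
&\quad= u'_+(\alpha)\left(1 - u'_+(\beta) - 1 + u'_-(\beta)\right) - u'_-(\beta) + u'_+(\beta) \\
&\quad= u'_+(\alpha)\left(u'_-(\beta) - u'_+(\beta)\right) - \left(u'_-(\beta) - u'_+(\beta)\right) \\
&\quad= \left(u'_-(\beta) - u'_+(\beta)\right)\left(u'_+(\alpha) - 1\right) \\
&\quad\ge 0.
\end{aligned}
\]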
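Here the first inequality uses the facts that $u'_-(\alpha) \ge u'_+(\alpha)$ and $1 - u'_+(\beta) \ge 0$, and
the last one uses that $u'_-(\beta) \ge u'_+(\beta)$ and $u'_+(\alpha) \ge 1$. We can
conclude that the expression on the right side of 4.47 is well defined. Now take p0
such that it satisfies 4.47. Then η* = 0 is an optimal solution and
$\mathrm{OCE}_u(X) = p_0 u(\alpha) + (1 - p_0) u(\beta)$. Take λ ∈ (0, 1). Because we do not necessarily know
that η* = 0 is an optimal solution for $\mathrm{OCE}_u(\lambda X)$, we have the following inequalities.
\[
\begin{aligned}
\mathrm{OCE}_u(\lambda X) &= \sup_{\eta \in \mathbb{R}} \left( \eta + E[u(\lambda X - \eta)] \right) \\
&\ge p_0 u(\lambda\alpha) + (1 - p_0) u(\lambda\beta) \\
&= p_0 u(\lambda\alpha + (1 - \lambda)0) + (1 - p_0) u(\lambda\beta + (1 - \lambda)0) \\
&\ge \lambda \left( p_0 u(\alpha) + (1 - p_0) u(\beta) \right) + (1 - \lambda)\left( p_0 u(0) + (1 - p_0) u(0) \right) \\
&= \lambda \left( p_0 u(\alpha) + (1 - p_0) u(\beta) \right) \\
&= \lambda\, \mathrm{OCE}_u(X).
\end{aligned}
\]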
Here we have used that u is a concave function such that u(0) = 0. Because
we assumed that the optimised certainty equivalent is positively homogeneous, all
inequalities must hold with equality. Hence we have that
\[
p_0 u(\lambda\alpha) + (1 - p_0) u(\lambda\beta) = \lambda p_0 u(\alpha) + \lambda(1 - p_0) u(\beta).
\]
We can rewrite this and find that
\[
p_0 \left( u(\lambda\alpha) - \lambda u(\alpha) \right) + (1 - p_0)\left( u(\lambda\beta) - \lambda u(\beta) \right) = 0. \tag{4.50}
\]
Because u is a concave utility function for which u(0) = 0, we have for all
x ∈ R and λ ∈ [0, 1] that
\[
u(\lambda x) = u(\lambda x + (1 - \lambda)0) \ge \lambda u(x) + (1 - \lambda)u(0) = \lambda u(x).
\]
Because p0 ∈ (0, 1) and u(λx) − λu(x) ≥ 0 for all x ∈ R, both
terms in the sum 4.50 are non-negative, and because they sum to zero they must
both be zero.
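Hence we can conclude that
\[
\begin{cases} u(\lambda\alpha) = \lambda u(\alpha), & \forall \alpha < 0 \\ u(\lambda\beta) = \lambda u(\beta), & \forall \beta > 0. \end{cases} \tag{4.51}
\]
We conclude that, because u(0) = 0,
\[
u(\lambda x) = \lambda u(x), \quad \forall \lambda \in [0, 1],\ \forall x \in \mathbb{R}. \tag{4.52}
\]
If λ > 1, then there exists a µ ∈ (0, 1) such that λ = 1/µ. We then have that
u(µx) = µu(x), which means that $u\left(\tfrac{1}{\lambda}x\right) = \tfrac{1}{\lambda}u(x)$. Because this holds for all
x ∈ R, this also holds for λx. We have that $u\left(\tfrac{1}{\lambda}\lambda x\right) = \tfrac{1}{\lambda}u(\lambda x)$. Hence
we can conclude that
\[
u(\lambda x) = \lambda u(x), \quad \forall \lambda > 0,\ \forall x \in \mathbb{R}. \tag{4.53}
\]
This means that u is positively homogeneous, which concludes our proof.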
The next theorem characterises the positively homogeneous utility functions from
$U_0^<$.
Theorem 4.15. Let $u \in U_0^<$ be a finite positively homogeneous utility function. Then
u is a piecewise linear function, i.e.
\[
u(x) = \begin{cases} \gamma_2 x, & x \le 0 \\ \gamma_1 x, & x > 0, \end{cases} \tag{4.54}
\]
where γ2 > 1 > γ1 ≥ 0.
The proof of this theorem is based on a lemma found in [25, corollary 13.2.1].
Lemma 4.10. Let f be any positively homogeneous convex function, which is not
identically +∞. Then cl(f) is the support function of a certain closed convex set
C, namely
\[
C := \{ y \mid \forall x,\ \langle x, y \rangle \le f(x) \}. \tag{4.55}
\]
Proof. Without proof, see [25, corollary 13.2.1].
Furthermore it is stated in [25, p. 51] that for proper convex functions the closedness property cl(f) = f is equivalent to lower semicontinuity.
Proof. (theorem 4.15) Denote l(x) := −u(−x). Then l is a positively homogeneous convex function, and l is also continuous because u is. Using lemma 4.10 we
know that l is the support function of a closed convex subset of R, i.e. an interval [γ1, γ2]
with γ1 ≤ γ2. And because [γ1, γ2] = {y | ∀x, ⟨x, y⟩ ≤ l(x)}, we have the following
representation for l.
\[
l(x) = -u(-x) = \sup_{\gamma_1 \le y \le \gamma_2} (xy). \tag{4.56}
\]
Hence we have that
\[
l(x) = \begin{cases} \gamma_1 x, & x \le 0 \\ \gamma_2 x, & x \ge 0. \end{cases} \tag{4.57}
\]
Then the utility function u is given by
\[
u(x) = \begin{cases} \gamma_2 x, & x \le 0 \\ \gamma_1 x, & x \ge 0. \end{cases} \tag{4.58}
\]
Because $u \in U_0^<$ we have that u(x) < x for x ≠ 0, which implies that γ2 > 1 > γ1.
Because u is non-decreasing we also know that γ1 ≥ 0.
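The positive homogeneity of the optimised certainty equivalent for a piecewise linear utility can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function names, the sample and the values γ1 = 0.5, γ2 = 2 and λ = 3 are illustrative assumptions. It uses the fact that, for an empirical distribution, η ↦ η + E[u(X − η)] is concave and piecewise linear with kinks at the sample points, so the supremum is attained at one of them.

```python
import numpy as np

def piecewise_linear_utility(x, gamma1, gamma2):
    """u(x) = gamma2 * x for x <= 0 and gamma1 * x for x > 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= 0.0, gamma2 * x, gamma1 * x)

def oce_piecewise_linear(sample, gamma1=0.5, gamma2=2.0):
    """Optimised certainty equivalent of the empirical distribution of `sample`.

    g(eta) = eta + mean(u(sample - eta)) is concave and piecewise linear with
    kinks at the sample points, so it is enough to evaluate g at those points.
    Requires gamma2 > 1 > gamma1 >= 0.
    """
    sample = np.asarray(sample, dtype=float)
    shifted = sample[None, :] - sample[:, None]   # row i holds X - eta_i
    g = sample + piecewise_linear_utility(shifted, gamma1, gamma2).mean(axis=1)
    return g.max()

rng = np.random.default_rng(0)
returns = rng.normal(0.25, 0.8, size=500)   # illustrative sample
lam = 3.0

oce_x = oce_piecewise_linear(returns)
oce_scaled = oce_piecewise_linear(lam * returns)
print(oce_scaled, lam * oce_x)   # the two numbers should agree up to rounding
```

Because u(t) ≤ t, the computed value can also be compared with the sample mean, which it should never exceed.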
4.2.3 Examples
In this section we will try to clarify the concept of divergence based risk measures
further by calculating the corresponding utility function of some known divergence
functions.
The χ²-divergence is given by $\phi(t) = (t - 1)^2$. We have that
\[
-u(-x) = \sup_{t \in \mathbb{R}} \left( xt - \phi(t) \right) = \sup_{t \in \mathbb{R}} \left( xt - (t - 1)^2 \right).
\]
The first order condition yields that x = 2(t − 1). The second order condition
yields that −2 < 0. From this we conclude that $xt - (t - 1)^2$ attains a maximum
for $t = \frac{x}{2} + 1$. Hence:
\[
-u(-x) = x\left(\frac{x}{2} + 1\right) - \left(\frac{x}{2}\right)^2 = \frac{x^2}{4} + x.
\]
We conclude that the corresponding utility function is given by $u(x) = -\frac{x^2}{4} + x$.
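The Hellinger divergence is given by $\phi(t) = \left(\sqrt{t} - 1\right)^2$. We have that
\[
-u(-x) = \sup_{t \in \mathbb{R}} \left( xt - \phi(t) \right) = \sup_{t \in \mathbb{R}} \left( xt - \left(\sqrt{t} - 1\right)^2 \right).
\]
The first order condition yields that $x = 1 - \frac{1}{\sqrt{t}}$, or $t = \frac{1}{(1 - x)^2}$. The second order
condition is satisfied because $\frac{-1}{2\sqrt{t^3}} < 0$. We find that
\[
\begin{aligned}
-u(-x) &= \frac{x}{(1 - x)^2} - \left(\frac{1}{1 - x} - 1\right)^2 \\
&= \frac{x}{(1 - x)^2} - \left(\frac{x}{1 - x}\right)^2 \\
&= \frac{x - x^2}{(1 - x)^2} \\
&= \frac{x}{1 - x}.
\end{aligned}
\]
We can conclude that $u(x) = \frac{x}{1 + x}$.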
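The two conjugates computed above can be cross-checked numerically. The sketch below approximates sup_t (xt − φ(t)) by a maximum over a fine grid of t ≥ 0 and compares it with the closed forms x²/4 + x and x/(1 − x). The grid, the helper names and the test points x are illustrative assumptions; the test points are chosen so that the maximiser lies inside the grid.

```python
import numpy as np

def conjugate_on_grid(phi, x, t_grid):
    """Approximate sup_t (x * t - phi(t)) by the maximum over a grid of t values."""
    return np.max(x * t_grid - phi(t_grid))

def chi2_divergence(t):
    return (t - 1.0) ** 2

def hellinger_divergence(t):
    return (np.sqrt(t) - 1.0) ** 2

t_grid = np.linspace(0.0, 50.0, 500_001)   # t >= 0 with step 1e-4

for x in (-0.5, 0.2, 0.5):
    chi2_numeric = conjugate_on_grid(chi2_divergence, x, t_grid)
    hellinger_numeric = conjugate_on_grid(hellinger_divergence, x, t_grid)
    # closed forms of -u(-x) derived in the text
    print(x, chi2_numeric, x**2 / 4 + x, hellinger_numeric, x / (1 - x))
```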
The reader might notice we have not included the Kullback-Leibler divergence. In
the next chapter we will show that the associated utility function is the exponential
utility function.
4.3 The ordinary certainty equivalent as risk measure
In previous sections we have found two ways to construct convex risk measures
using utility functions: utility based shortfall risk and divergence risk measures.
Using a duality theorem from mathematical optimisation we found that these utility based risk measures were the dual optimisation problems of negative certainty
equivalents.
The link with certainty equivalents is not surprising at all. Certainty equivalents
try to define an equivalent risk-free amount to an uncertain gamble. If this amount
is negative, this means you are willing to pay some amount to not have to incur
the risk of the gamble. This amount is then used as the risk measure.
It is now natural to ask whether we could use the ordinary certainty equivalent
as a risk measure. That is, would $\rho(X) = -\mathrm{CE}_u(X)$ be a good risk measure?
In the first chapter we defined some axioms which a "good" risk measure should
satisfy. First of all $-\mathrm{CE}_u(X)$ would need to be a monetary risk measure. For this
it needs to satisfy the translation property. This means we need to have
\[
\mathrm{CE}_u(X + m) = \mathrm{CE}_u(X) + m \quad \forall m \in \mathbb{R}. \tag{4.59}
\]
It turns out that the restriction 4.59 is a rather severe restriction on the possible
utility functions we can use. In what follows we will further assume that u is
strictly increasing and $u \in C^2$. Now define for all m ∈ R $u_m(x) := u(x + m)$, then
$u_m$ is also strictly increasing and $u_m \in C^2$. Because both u and $u_m$ are strictly
increasing the inverse functions $u^{-1}$ and $u_m^{-1}$ are well defined. For all m ∈ R we
have:
\[
\begin{aligned}
\mathrm{CE}_u(X + m) = \mathrm{CE}_u(X) + m &\iff u^{-1}\left(E[u(X + m)]\right) = u^{-1}\left(E[u(X)]\right) + m \\
&\Rightarrow E[u(X + m)] = u\left(u^{-1}\left(E[u(X)]\right) + m\right) \\
&\Rightarrow E[u_m(X)] = u_m\left(\mathrm{CE}_u(X)\right) \\
&\Rightarrow u_m^{-1}\left(E[u_m(X)]\right) = \mathrm{CE}_u(X) \\
&\Rightarrow \mathrm{CE}_{u_m}(X) = \mathrm{CE}_u(X).
\end{aligned}
\]
From theorem 2.3 of chapter 2 we know that for all m ∈ R
\[
\mathrm{CE}_{u_m}(X) = \mathrm{CE}_u(X) \iff r_A^{u_m}(x) = r_A^{u}(x) \quad \forall x \in \mathbb{R},
\]
where $r_A^{u}(x)$ denotes the Arrow-Pratt coefficient of absolute risk aversion. We
have that:
\[
r_A^{u_m}(x) = \frac{-u_m''(x)}{u_m'(x)} = \frac{-u''(x + m)}{u'(x + m)} = r_A^{u}(x + m).
\]
This means we need to have $r_A^{u}(x + m) = r_A^{u}(x)$ for all m ∈ R and x ∈ R. From
this we derive that the Arrow-Pratt coefficient of absolute risk aversion $r_A^{u}(x)$ is
independent of x. This implies that $r_A^{u}(x)$ is constant. Thus u is a linear or an
exponential utility function.
Because linear utility functions imply a risk neutral attitude they are not desirable
to construct a risk measure with. In theorem 2.7 of chapter two we have shown
that for the (normalised) exponential utility function all three different certainty
equivalents coincide.
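The role of constant absolute risk aversion in the translation property 4.59 can be illustrated with a small simulation. This is a sketch under stated assumptions: the uniform payoff sample, the shift m = 2, the risk aversion a = 1.5 and the comparison utility u(x) = ln(1 + x) (which has non-constant absolute risk aversion 1/(1 + x)) are illustrative choices, not taken from the text.

```python
import numpy as np

def ce_exponential(sample, a):
    """Certainty equivalent u^{-1}(E[u(X)]) for u(x) = (1 - exp(-a x)) / a."""
    return -np.log(np.mean(np.exp(-a * np.asarray(sample)))) / a

def ce_log(sample):
    """Certainty equivalent for u(x) = ln(1 + x), defined for x > -1."""
    return np.exp(np.mean(np.log1p(np.asarray(sample)))) - 1.0

rng = np.random.default_rng(1)
payoff = rng.uniform(0.0, 1.0, size=10_000)   # illustrative payoffs in [0, 1)
m = 2.0

# Gap CE(X + m) - (CE(X) + m): zero exactly when the translation property holds.
gap_exponential = ce_exponential(payoff + m, a=1.5) - (ce_exponential(payoff, a=1.5) + m)
gap_log = ce_log(payoff + m) - (ce_log(payoff) + m)
print(gap_exponential, gap_log)
```

The exponential gap should vanish up to rounding, while the logarithmic gap should be visibly different from zero.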
5 Utility functions
In this chapter we will take a closer look at some of the classes of utility functions
we encountered in the literature. We will discuss their general properties and
whether they are suitable to be used in utility based risk measures.
For each of the utility functions we will calculate the associated divergence function
using the Legendre transform. Furthermore we will also illustrate the effect of the
parameters that occur in both the utility based shortfall risk and the divergence
risk. For this we have simulated 10000 returns from a normal distribution with
mean 0.25 and standard deviation σ. One can think of the log-returns of a stock
which follows a Brownian motion with a drift of 0.25 and a volatility of σ. Because
the volatility of a stock is linked to the riskiness of this stock, we are interested
to see the effect of σ on the risk measure. We expect to see that higher values of
σ coincide with higher values of the risk measures.
Inspired by the results of these simulations we can state the following lemma, which
does not assume a specific distribution of the returns.
Lemma 5.1. Let $u_\alpha : \mathbb{R} \to \mathbb{R}$ be a class of utility functions with a parameter α.
If $u_{\alpha_1}(x) \ge u_{\alpha_2}(x)$ for all x ∈ R, then for X ∈ L∞(Ω, F, P) we have that
1. $D_{\phi_{\alpha_1}}(X) \le D_{\phi_{\alpha_2}}(X)$,
2. $SF^{l_{\alpha_1}}_{x_0}(X) \le SF^{l_{\alpha_2}}_{x_0}(X)$, for all x0 in the interior of $l_{\alpha_1}$ and $l_{\alpha_2}$,
where $\phi_{\alpha_1}$ and $\phi_{\alpha_2}$ denote the associated divergence functions of $u_{\alpha_1}$ and $u_{\alpha_2}$ respectively, and where $l_{\alpha_1}$ and $l_{\alpha_2}$ denote the associated loss functions.
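Proof.
1. Let η ∈ R and let X ∈ L∞(Ω, F, P). Then
\[
\begin{aligned}
& u_{\alpha_1}(x - \eta) \ge u_{\alpha_2}(x - \eta), \quad \forall x \in \mathbb{R} \\
\Rightarrow\ & E[u_{\alpha_1}(X - \eta)] \ge E[u_{\alpha_2}(X - \eta)] \\
\Rightarrow\ & \eta + E[u_{\alpha_1}(X - \eta)] \ge \eta + E[u_{\alpha_2}(X - \eta)] \\
\Rightarrow\ & \sup_{\eta \in \mathbb{R}} \left( \eta + E[u_{\alpha_1}(X - \eta)] \right) \ge \sup_{\eta \in \mathbb{R}} \left( \eta + E[u_{\alpha_2}(X - \eta)] \right) \\
\Rightarrow\ & -\mathrm{OCE}_{u_{\alpha_1}}(X) \le -\mathrm{OCE}_{u_{\alpha_2}}(X) \\
\Rightarrow\ & D_{\phi_{\alpha_1}}(X) \le D_{\phi_{\alpha_2}}(X).
\end{aligned}
\]
The last implication follows from theorem 4.10 and uses the assumption that
X ∈ L∞(Ω, F, P).
2. To prove the effect on the utility based shortfall risk measure take X ∈
L∞(Ω, F, P) and assume that $u_{\alpha_1}(x) \ge u_{\alpha_2}(x)$ for all x ∈ R. Then we have
that
\[
E[u_{\alpha_1}(X + m)] \ge E[u_{\alpha_2}(X + m)], \quad \forall m \in \mathbb{R}.
\]
To prove that $SF^{l_{\alpha_1}}_{x_0} \le SF^{l_{\alpha_2}}_{x_0}$ for all x0 in the interior of $l_{\alpha_1}$ and $l_{\alpha_2}$, we will
show that
\[
\{ m \in \mathbb{R} \mid E[u_{\alpha_2}(X + m)] \ge -x_0 \} \subset \{ m \in \mathbb{R} \mid E[u_{\alpha_1}(X + m)] \ge -x_0 \}. \tag{5.1}
\]
Take $m \in \{ m \in \mathbb{R} \mid E[u_{\alpha_2}(X + m)] \ge -x_0 \}$. We have that
$E[u_{\alpha_1}(X + m)] \ge E[u_{\alpha_2}(X + m)] \ge -x_0$.
Hence we can conclude that $m \in \{ m \in \mathbb{R} \mid E[u_{\alpha_1}(X + m)] \ge -x_0 \}$. This
proves the fact that
\[
\inf \{ m \in \mathbb{R} \mid E[u_{\alpha_1}(X + m)] \ge -x_0 \} \le \inf \{ m \in \mathbb{R} \mid E[u_{\alpha_2}(X + m)] \ge -x_0 \}. \tag{5.2}
\]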
The first class of utility functions we will study are the power utility functions,
of which we found a brief description in [15]. These utility functions belong to a
larger class of utility functions called the HARA class. The acronym HARA stands
for hyperbolic absolute risk aversion. A utility function belongs to the HARA class
if the Arrow-Pratt coefficient of absolute risk aversion is given by
\[
r_A(x) = \frac{1}{a + bx} \quad \forall x \in D, \tag{5.3}
\]
where b ≥ 0, and a > 0 if b = 0. The domain is D = R if b = 0. If b ≠ 0 then
$D = \left(\frac{-a}{b}, +\infty\right)$.
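5.1 The power utility functions
Assume that b > 0, then we can reconstruct the utility function using theorem 2.2
from the second chapter. First assume that b ≠ 1.
We have that
\[
\begin{aligned}
u(x) &= \int_1^x C_1 \exp\left( \int_1^\eta -r_A(\zeta)\, d\zeta \right) d\eta + C_2 \\
&= \int_1^x C_1 \exp\left( \int_1^\eta \frac{-1}{a + b\zeta}\, d\zeta \right) d\eta + C_2 \\
&= \int_1^x C_1 \exp\left( \ln\left( \left( \frac{a + b\eta}{a + b} \right)^{-\frac{1}{b}} \right) \right) d\eta + C_2 \\
&= C_1 \int_1^x \left( \frac{a + b\eta}{a + b} \right)^{-\frac{1}{b}} d\eta + C_2 \\
&= C_1 \frac{a + b}{b - 1} \left[ \left( \frac{a + bx}{a + b} \right)^{1 - \frac{1}{b}} - 1 \right] + C_2 \\
&= \frac{D}{b - 1} (a + bx)^{1 - \frac{1}{b}} + E,
\end{aligned}
\]
where D and E are integration constants. If b = 1 we have that $r_A(x) = \frac{1}{a + x}$.
Then the utility function is given by
\[
u(x) = D \ln(a + x) + E. \tag{5.4}
\]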
The utility function
\[
u(x) = \begin{cases} \frac{D}{b - 1} (a + bx)^{1 - \frac{1}{b}} + E, & b \neq 1 \\ D \ln(a + x) + E, & b = 1 \end{cases} \tag{5.5}
\]
is called the extended power utility. If we have that a > 0 then we can standardise
this utility function in the usual way such that u(0) = 0 and u'(0) = 1. To see this,
first suppose b ≠ 1. From the condition u'(0) = 1 we conclude that $D = a^{\frac{1}{b}}$, and
from u(0) = 0 we can conclude that $E = \frac{-D}{b - 1} a^{1 - \frac{1}{b}} = \frac{-a}{b - 1}$. Now consider the case
where b = 1. Then u(0) = 0 implies that E = −D ln(a) and u'(0) = 1 implies that
D = a. Therefore we can conclude that when a > 0 and b > 0 we have the following
standardised utility function.
\[
u(x) = \begin{cases} \frac{a^{\frac{1}{b}}}{b - 1} (a + bx)^{1 - \frac{1}{b}} - \frac{a}{b - 1}, & b \neq 1 \\ a \ln(a + x) - a \ln(a), & b = 1, \end{cases} \tag{5.6}
\]
where the domain is given by $D = \left(\frac{-a}{b}, +\infty\right)$. When a = 0 the extended power utility function becomes the narrow power utility function, which takes the following
form
\[
u(x) = \begin{cases} D \frac{x^{1 - \gamma}}{1 - \gamma} + E, & \gamma \neq 1 \\ D \ln(x) + E, & \gamma = 1. \end{cases} \tag{5.7}
\]
This utility function is defined for all x > 0. The parameter γ is known as the
coefficient of relative risk aversion $r_R$, which can be obtained using the following
formula
\[
r_R(x) = -x \frac{u''(x)}{u'(x)}. \tag{5.8}
\]
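The standardisation of the extended power utility can be checked numerically. The sketch below implements the standardised form 5.6 and uses central finite differences to confirm that u(0) = 0, u'(0) = 1 and that the absolute risk aversion equals 1/(a + bx). The function names, the step size h and the parameter pairs (a, b) are illustrative assumptions.

```python
import numpy as np

def extended_power_utility(x, a, b):
    """Standardised extended power utility (5.6); needs a > 0, b > 0, x > -a/b."""
    x = np.asarray(x, dtype=float)
    if b == 1:
        return a * np.log(a + x) - a * np.log(a)
    return a ** (1.0 / b) / (b - 1.0) * (a + b * x) ** (1.0 - 1.0 / b) - a / (b - 1.0)

def absolute_risk_aversion_fd(u, x, h=1e-4):
    """Estimate -u''(x) / u'(x) with central finite differences."""
    first = (u(x + h) - u(x - h)) / (2.0 * h)
    second = (u(x + h) - 2.0 * u(x) + u(x - h)) / h**2
    return -second / first

x0, h = 0.5, 1e-4
for a, b in [(2.0, 0.5), (1.0, 1.0), (3.0, 2.0)]:
    u = lambda x, a=a, b=b: extended_power_utility(x, a, b)
    slope_at_zero = (u(h) - u(-h)) / (2.0 * h)
    # columns: a, b, u(0), u'(0), finite-difference r_A(x0), 1 / (a + b x0)
    print(a, b, float(u(0.0)), float(slope_at_zero),
          float(absolute_risk_aversion_fd(u, x0)), 1.0 / (a + b * x0))
```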
It is clear that if a ≤ 0 the power utility cannot be standardised in the usual way.
However there is a bigger problem when using the power utility in the context of
utility based risk measures. The power utility is only defined for values greater
than −a/b. Throughout this thesis however we have looked at the stochastic variable
X which modelled the net payoffs of a portfolio. When trying to quantify the risk
of this portfolio we are especially interested in the potential losses of the portfolio.
Because we have to evaluate these potential losses with a utility function, it is
important that the utility function is defined for those negative values. This can
be problematic with power utility.
There are however functions in the HARA class which do not have this problem.
This occurs whenever b = 0 because then the coefficient of absolute risk aversion is
a constant. As we have seen in chapter two of this thesis the corresponding utility
function is the exponential utility function.
5.2 The exponential utility functions
The standardised exponential utility function is given by
\[
u(x) = \frac{1 - \exp(-ax)}{a}, \tag{5.9}
\]
where the parameter a denotes the coefficient of absolute risk aversion. In theorem
2.7 of the second chapter we have proven that for the exponential utility function,
all certainty equivalents coincide.
We will now derive the divergence function associated with this utility function:
\[
\phi(t) \equiv \sup_{x \in \mathbb{R}} \left( xt + u(-x) \right) = \sup_{x \in \mathbb{R}} \left( xt + \frac{1 - \exp(ax)}{a} \right).
\]
The first order condition yields that in a maximum $x = \frac{1}{a}\ln(t)$. The second order
condition for a maximum is fulfilled because for all x we have that −a exp(ax) < 0.
Hence we find that the divergence function is given by
\[
\phi(t) = \frac{t}{a}\ln(t) + \frac{1 - \exp(\ln(t))}{a} = \frac{t}{a}\ln(t) + \frac{1}{a} - \frac{t}{a}.
\]
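If Q ∈ M1(P), then the divergence associated with this is
\[
\begin{aligned}
I_\phi(Q|P) &= \int_\Omega \frac{1}{a} \left( \frac{dQ}{dP} \ln\left(\frac{dQ}{dP}\right) + 1 - \frac{dQ}{dP} \right) dP \\
&= \frac{1}{a} \int_\Omega \frac{dQ}{dP} \ln\left(\frac{dQ}{dP}\right) dP + \frac{1}{a} \int_\Omega dP - \frac{1}{a} \int_\Omega dQ \\
&= \frac{1}{a} \int_\Omega \frac{dQ}{dP} \ln\left(\frac{dQ}{dP}\right) dP \\
&= \frac{1}{a} KL(Q|P).
\end{aligned}
\]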
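On a finite sample space the identity above reduces to a sum, which makes it easy to check. The following is a minimal sketch; the probability vectors p and q and the value a = 2 are illustrative assumptions.

```python
import numpy as np

def phi_exponential(t, a):
    """Divergence function derived above: phi(t) = (t ln t - t + 1) / a for t > 0."""
    return (t * np.log(t) - t + 1.0) / a

def phi_divergence(q, p, a):
    """I_phi(Q|P) = sum_i p_i * phi(q_i / p_i) on a finite sample space."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    return np.sum(p * phi_exponential(q / p, a))

def kullback_leibler(q, p):
    """KL(Q|P) = sum_i q_i * ln(q_i / p_i)."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    return np.sum(q * np.log(q / p))

p = [0.5, 0.3, 0.2]   # reference measure P (illustrative)
q = [0.2, 0.5, 0.3]   # alternative measure Q (illustrative)
a = 2.0
print(phi_divergence(q, p, a), kullback_leibler(q, p) / a)   # should coincide
```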
We find that the divergence associated with the exponential utility function is the
Kullback-Leibler entropy. We have already encountered this entropy in the first
chapter where we used it as a penalty function. We note that the divergence risk
measure associated with it, i.e.
\[
\sup_{Q \in M_1(P)} \left( E_Q(-X) - \frac{1}{a} KL(Q|P) \right), \tag{5.10}
\]
is called the entropic risk measure. This definition of entropic risk measure was
given in [11, p 201] where we also find the definition in the form of a negative
optimised certainty equivalent. Using theorem 2.7 from the second chapter and
theorem 4.10 from the fourth chapter we can see that this entropic risk measure
has the following representation.
\[
ER_a(X) = \sup_{Q \in M_1(P)} \left( E_Q(-X) - \frac{1}{a} KL(Q|P) \right) = \frac{1}{a} \ln\left( E[\exp(-aX)] \right). \tag{5.11}
\]
To study the effect of the coefficient of absolute risk aversion on the entropic risk
measure, it is important to notice that if a1 ≤ a2 then $u_{a_1}(x) \ge u_{a_2}(x)$ for all
x ∈ R.
We will formally prove this fact by showing that $u_a(x)$ is a decreasing function of
a. We have that
\[
\frac{\partial u_a(x)}{\partial a} = \frac{\exp(-ax)(xa + 1) - 1}{a^2} \le 0,
\]
where the inequality follows from the fact that xa + 1 ≤ exp(ax). (This is a known
inequality which follows from the fact that xa + 1 is the tangent line to
exp(ax) in ax = 0, and exp(·) is a convex function.) Hence we
can use lemma 5.1 to conclude that the entropic risk measure is increasing in the
coefficient of absolute risk aversion, i.e.
\[
a_1 \le a_2 \Rightarrow ER_{a_1}(X) \le ER_{a_2}(X). \tag{5.12}
\]
To illustrate this relationship we have simulated several sets of 10000 returns each. These returns were generated from a normal distribution with mean 0.25
and different standard deviations σ. We calculated the entropic risk measure for
different values of absolute risk aversion and plotted our results. The results can
be found in figure 5.1. In this figure we can clearly observe that an increase of
absolute risk aversion corresponds to an increase in entropic risk. Furthermore we
also notice that a higher standard deviation corresponds to a higher risk.
Figure 5.1: Influence of the absolute risk aversion a on the exponential divergence risk
measure.
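A simulation of this kind can be sketched as follows. This is an illustration rather than the code used for the figure: the seed, the grid of risk aversions and the standard deviations are assumptions. For normally distributed returns the representation 5.11 gives the population value −µ + aσ²/2, which is printed next to the sample estimates for comparison.

```python
import numpy as np

def entropic_risk(sample, a):
    """Sample estimate of ER_a(X) = (1/a) * ln E[exp(-a X)]."""
    z = -a * np.asarray(sample, dtype=float)
    z_max = z.max()
    # log-mean-exp evaluated stably by factoring out the largest exponent
    return (z_max + np.log(np.mean(np.exp(z - z_max)))) / a

rng = np.random.default_rng(42)
mu, n = 0.25, 10_000
risk_aversions = np.array([0.5, 1.0, 2.0, 4.0])

for sigma in (0.2, 0.4, 0.8):
    returns = rng.normal(mu, sigma, size=n)
    estimates = np.array([entropic_risk(returns, a) for a in risk_aversions])
    closed_form = -mu + risk_aversions * sigma**2 / 2.0   # valid for normal returns only
    print(sigma, estimates, closed_form)
```

For a fixed sample the estimates are non-decreasing in a, in line with 5.12; the agreement with the closed form is only approximate because of sampling noise.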
We would like to point out that because of theorem 2.7 we know that $SF^{l}_{0}(X) =
ER_a(X)$. This means that for the exponential utility function the utility based
shortfall risk for x0 = 0 equals the entropic risk, i.e. the exponential divergence
risk. Therefore we will not study the effect of the absolute risk aversion on the
exponentially based shortfall risk. As we have deduced in the fourth chapter,
increasing the parameter x0 of a utility based shortfall risk measure always results
in a decrease of the risk, and this holds independently of the utility function.
We know from theorem 4.13 that this risk measure is not coherent because the utility
function is not a piecewise linear function. However there exists a coherent version
of this entropic risk measure which is described in detail in [3]. This risk measure
is called Entropic Value at Risk or EVaR. From [3, definition 3.1] we have the following definition.
Definition 5.1. Entropic Value at Risk at a (1 − α)100% confidence level is defined
as
\[
\mathrm{EVaR}_\alpha(X) := \inf_{z > 0} \frac{1}{z} \ln\left( \frac{M_X(z)}{\alpha} \right), \tag{5.13}
\]
where $M_X$ denotes the moment generating function of X.
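To make the infimum in definition 5.1 concrete, the sketch below evaluates it on a grid of z values for a variable whose moment generating function is that of a normal distribution, an assumption made purely for illustration. In that case the objective equals µ + σ²z/2 − ln(α)/z, whose minimum over z > 0 is µ + σ√(−2 ln α). The function name, the grid and the parameter values are hypothetical.

```python
import numpy as np

def evar_from_log_mgf(log_mgf, alpha, z_grid):
    """Approximate inf_{z>0} (1/z) * ln(M(z) / alpha) by a minimum over z_grid."""
    values = (log_mgf(z_grid) - np.log(alpha)) / z_grid
    return values.min()

mu, sigma, alpha = 0.25, 0.4, 0.05            # illustrative parameters

def normal_log_mgf(z):
    """ln M(z) for a N(mu, sigma^2) variable."""
    return mu * z + 0.5 * sigma**2 * z**2

z_grid = np.linspace(1e-3, 50.0, 500_000)      # the minimiser must lie inside this range

numeric = evar_from_log_mgf(normal_log_mgf, alpha, z_grid)
closed_form = mu + sigma * np.sqrt(-2.0 * np.log(alpha))
print(numeric, closed_form)
```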
We will not work out any details of this article as it is outside the scope of this
thesis. However we will report some key results from this paper and explain how
these results are closely linked to the entropic risk measure and divergence based
risk measures in general. This is interesting because these seemingly technical
theorems could be used to construct coherent alternatives to the divergence risk
measures discussed in chapter four.
In [3, Theorem 3.3] we find the following robust representation theorem regarding
Entropic Value at Risk.
Theorem 5.1. For X ∈ L∞(Ω, F, P) we have that
\[
\mathrm{EVaR}_\alpha(X) = \sup_{Q \in I} E_Q[-X] = \inf_{z > 0} \sup_{Q \in M_1(P)} \left( E_Q[X] - \frac{1}{z} KL(Q|P) - \frac{1}{z} \ln(\alpha) \right), \tag{5.14}
\]
where I = {Q ∈ M1(P) | KL(Q|P) ≤ −ln(α)}.
It is easy to see that any risk measure which has a robust representation in the
form of $\sup_{Q \in I} E_Q[-X]$, where I denotes a set of probability measures, is a coherent
risk measure. This follows from the properties of the expected value and the
supremum. When we compare this representation of Entropic Value at Risk to
equation 5.11 we can clearly see and interpret the different approach.
In the divergence based approach we considered all probability measures Q which
are absolutely continuous with respect to P. We then looked at the expected losses
under each of these probability measures Q, $E_Q[-X]$. Using the Kullback-Leibler
entropy we penalised these expected losses depending on how similar the probability measure Q was to P. We concluded our computation by taking the supremum
over all these penalised expected losses.
In the coherent approach however, we only consider the probability measures Q
which have a Kullback-Leibler distance with respect to P smaller than a given
amount. We then take the supremum over all the expected losses with respect to
those probability measures Q. No penalty functions are used here and every probability measure Q for which KL(Q|P ) ≤ − ln(α) is taken to be equally important.
This idea which is used to construct Entropic Value at Risk could be generalised
to all divergence risk measures using the definition of a φ-entropic risk measure
with divergence level β which we found in [3, definition 5.1].
Definition 5.2. Let φ be a convex function with φ(1) = 0, and β a non-negative
number. The φ-entropic risk measure with divergence level β is defined as
\[
ER_{\phi,\beta}(X) := \sup_{Q \in I} E_Q[-X], \tag{5.15}
\]
where $I := \{ Q \in M_1(P) \mid I_\phi(Q|P) \le \beta \}$.
This defines a class of coherent risk measures which shows a lot of similarities to
the divergence risk measures, which were defined as
\[
\sup_{Q \in M_1(P)} \left( E_Q[-X] - I_\phi(Q|P) \right).
\]
So far we have discussed the power utility and the exponential utility, both of
which are contained in the HARA class. These utility functions are commonly
used in economics. There exist however a lot of other classes of utility functions.
One such class is the class of the polynomial utility functions.
5.3 The polynomial utility functions
The following class of utility functions was found in [10].
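Definition 5.3. For γ > 1 with γ ∈ N the polynomial utility function is defined
as
\[
u(x) = \begin{cases} \frac{1 - (1 - x)^\gamma}{\gamma} & \text{if } x \le 1 \\ \frac{1}{\gamma} & \text{elsewhere.} \end{cases} \tag{5.16}
\]
We have plotted this utility function for different values of γ in figure 5.2.
The associated loss function is given by
\[
l(x) = \begin{cases} \frac{(1 + x)^\gamma - 1}{\gamma} & \text{if } x \ge -1 \\ \frac{-1}{\gamma} & \text{elsewhere.} \end{cases} \tag{5.17}
\]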
The first derivative is given by
\[
u'(x) = \begin{cases} (1 - x)^{\gamma - 1}, & x \le 1 \\ 0, & x > 1. \end{cases} \tag{5.18}
\]
The Arrow-Pratt measure of absolute risk aversion is not well defined for x ≥ 1.
For x < 1 we have that $r_A(x) = \frac{\gamma - 1}{1 - x} > 0$, which implies a risk averse attitude. Because
the utility function is constant for x ≥ 1 we can say that for x ≥ 1 the utility
function implies risk neutrality.
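We will now calculate the divergence function associated with this utility function.
\[
\phi(t) = \sup_{x \in \mathbb{R}} \left( xt - l(x) \right) = \max\left\{ \sup_{x \ge -1} \left( xt - \frac{(1 + x)^\gamma - 1}{\gamma} \right),\ \sup_{x < -1} \left( xt + \frac{1}{\gamma} \right) \right\}
\]
First consider the case that t < 0, then we have that $\sup_{x < -1} \left( xt + \frac{1}{\gamma} \right) = +\infty$. Hence
for t < 0 we have that φ(t) = +∞. Now assume that t ≥ 0. Then we have that
\[
\sup_{x < -1} \left( xt + \frac{1}{\gamma} \right) = -t + \frac{1}{\gamma}. \tag{5.19}
\]
We will now calculate $\sup_{x \ge -1} \left( xt - \frac{(1 + x)^\gamma - 1}{\gamma} \right)$. The first order condition
yields that $t - (1 + x)^{\gamma - 1} = 0$. Hence we have that $x = t^{\frac{1}{\gamma - 1}} - 1$. Notice that
the second order condition for a maximum is also fulfilled in this point. Hence we
have that
\[
\begin{aligned}
\sup_{x \ge -1} \left( xt - \frac{(1 + x)^\gamma - 1}{\gamma} \right) &= \left( t^{\frac{1}{\gamma - 1}} - 1 \right) t - \frac{1}{\gamma} \left( t^{\frac{\gamma}{\gamma - 1}} - 1 \right) \\
&= \frac{1}{\gamma} \left( \gamma t^{\frac{1}{\gamma - 1} + 1} - \gamma t - t^{\frac{\gamma}{\gamma - 1}} + 1 \right) \\
&= \frac{1}{\gamma} \left( \gamma t^{\frac{\gamma}{\gamma - 1}} - \gamma t - t^{\frac{\gamma}{\gamma - 1}} + 1 \right) \\
&= \frac{1}{\gamma} \left( (\gamma - 1) t^{\frac{\gamma}{\gamma - 1}} - \gamma t + 1 \right).
\end{aligned}
\]
Figure 5.2: Polynomial utility function for different values of γ.
Figure 5.3: Divergence function for different values of γ.
Now it is sufficient to notice that if t ≥ 0 then $\frac{\gamma - 1}{\gamma} t^{\frac{\gamma}{\gamma - 1}} \ge 0$. Hence we have that
\[
\frac{1}{\gamma} \left( (\gamma - 1) t^{\frac{\gamma}{\gamma - 1}} - \gamma t + 1 \right) \ge -t + \frac{1}{\gamma}. \tag{5.20}
\]
From this it follows that for t ≥ 0
\[
\phi(t) = \frac{1}{\gamma} \left( (\gamma - 1) t^{\frac{\gamma}{\gamma - 1}} - \gamma t + 1 \right). \tag{5.21}
\]
We conclude that the associated divergence is given by
\[
\phi(t) = \begin{cases} \frac{1}{\gamma} \left( (\gamma - 1) t^{\frac{\gamma}{\gamma - 1}} - \gamma t + 1 \right) & \text{if } t \ge 0 \\ +\infty & \text{elsewhere.} \end{cases} \tag{5.22}
\]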
We have plotted this divergence function for different values of γ in figure 5.3.
We will now study the effect of the parameter γ on both the polynomial utility
based shortfall risk and on the polynomial divergence risk. An excellent starting
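point for this is figure 5.2. This figure lets us suspect that, for a fixed return x, if
γ1 ≥ γ2 then $u_{\gamma_1}(x) \le u_{\gamma_2}(x)$. We can see this in the following way:
Assume that γ1 ≥ γ2. For x ≥ 1 we have that $\frac{1}{\gamma_1} \le \frac{1}{\gamma_2}$. Hence for x ≥ 1 we have
that $u_{\gamma_1}(x) \le u_{\gamma_2}(x)$. Now consider an arbitrary but fixed x < 1, so that 1 − x > 0.
For every y > 0 the map $\gamma \mapsto \frac{1 - y^\gamma}{\gamma}$ is non-increasing. Hence if γ1 ≥ γ2 we have that
\[
\frac{1 - (1 - x)^{\gamma_1}}{\gamma_1} \le \frac{1 - (1 - x)^{\gamma_2}}{\gamma_2}.
\]
From this we can conclude that $u_{\gamma_1}(x) \le u_{\gamma_2}(x)$ for all x ∈ R.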
We can use lemma 5.1 to conclude that both the polynomial divergence risk and
the polynomial utility based shortfall risk will increase when the parameter γ
increases. We have illustrated this relationship using a simulation. We generated
10000 returns from a normal distribution with mean 0.25. We did this for different
standard deviations. For each of these sets of returns we computed the divergence
risk and the utility based shortfall risk for different values of γ. We have listed the
results in table 5.1 and 5.2 respectively.
Table 5.1: Divergence risk of the polynomial utility for different values of γ and σ
            γ=2      γ=3      γ=4      γ=5      γ=6
σ = 0.2   -0.226   -0.206   -0.187   -0.167   -0.148
σ = 0.4   -0.166   -0.092   -0.022    0.045    0.107
σ = 0.6   -0.073    0.077    0.214    0.340    0.457
σ = 0.8    0.046    0.289    0.514    0.732    0.943
σ = 1.0    0.206    0.545    0.836    1.095    1.330
Table 5.2: Utility based shortfall risk of the polynomial utility with x0 = 0 for different
values of γ and σ
            γ=2      γ=3      γ=4      γ=5      γ=6
σ = 0.2   -0.226   -0.206   -0.186    0.167   -0.148
σ = 0.4   -0.163   -0.086   -0.015    0.052    0.115
σ = 0.6   -0.058    0.095    0.234    0.361    0.479
σ = 0.8    0.083    0.332    0.562    0.784    0.977
σ = 1.0    0.276    0.614    0.903    1.159    1.393
When looking at these tables we further notice that the larger the standard deviation, the larger both risk measures. This coincides with our intuition. When
we compare values across both tables, we notice that the divergence risk is never
larger than the utility based shortfall risk with x0 = 0. This should not be
surprising because it follows directly from the results which
were proven in the fourth chapter. In the fourth chapter we showed (see equation 4.2) that if x0 = 0
then $SF^{l}_{0}(X) = -M_u(X)$. We also obtained a general inequality which stated
that $M_u(X) \le \mathrm{OCE}_u(X)$. From those results we can conclude that
\[
SF^{l}_{0}(X) \ge D_\phi(X). \tag{5.23}
\]
A special case of the polynomial utility is the quadratic utility. This utility function
is obtained by taking γ = 2.
\[
u(x) = \begin{cases} -\frac{x^2}{2} + x & \text{if } x \le 1 \\ \frac{1}{2} & \text{elsewhere.} \end{cases} \tag{5.24}
\]
The divergence associated with the quadratic utility is
\[
\phi(t) = \begin{cases} \frac{1}{2}(t - 1)^2 & \text{if } t \ge 0 \\ +\infty & \text{elsewhere.} \end{cases} \tag{5.25}
\]
The divergence function $(t - 1)^2$ is called the χ²-divergence function. It is stated in
[5] that the optimised certainty equivalent associated with the quadratic utility
function of a stochastic variable X for which $x_{\max} \le 1 + E[X]$ is
\[
\mathrm{OCE}_u(X) = E[X] - \frac{1}{2}\mathrm{Var}(X). \tag{5.26}
\]
We will verify this claim. We have that u is a differentiable function and
\[
u'(x) = \begin{cases} 1 - x, & x \le 1 \\ 0, & x > 1. \end{cases} \tag{5.27}
\]
Now we will use equation 4.41, which characterises the optimal allocation η* of the
optimised certainty equivalent: η* = E[X] is optimal if and only if
\[
E\left[u'(X - E[X])\right] = 1.
\]
Because we know that $x_{\max} \le 1 + E[X]$, we have that X(ω) − E[X] ≤ 1 for all
ω ∈ Ω. Hence we have that
\[
E\left[u'(X - E[X])\right] = E\left[1 - X + E[X]\right] = 1 - E[X] + E[X] = 1.
\]
This proves that η* = E[X] is an optimal allocation. We can now conclude that
the optimised certainty equivalent is given by
\[
\begin{aligned}
\mathrm{OCE}_u(X) &= \eta^* + E\left[u(X - \eta^*)\right] \\
&= E[X] + E\left[u(X - E[X])\right] \\
&= E[X] + E\left[(X - E[X]) - \frac{1}{2}(X - E[X])^2\right] \\
&= E[X] + E[X] - E[X] - \frac{1}{2} E\left[(X - E[X])^2\right] \\
&= E[X] - \frac{1}{2}\mathrm{Var}(X).
\end{aligned}
\]
This proves the result.
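The computations behind tables 5.1 and 5.2 can be sketched along the following lines. This is an illustrative reconstruction, not the code used for the tables: the seed, the bisection scheme and the function names are assumptions. The divergence risk is obtained as −OCE_u(X), locating η* through the first order condition 4.41, and the shortfall risk with x0 = 0 as the smallest m for which E[u(X + m)] ≥ 0, both evaluated on the empirical distribution of the sample.

```python
import numpy as np

def poly_utility(x, gamma):
    """Polynomial utility (5.16): (1 - (1 - x)^gamma) / gamma for x <= 1, else 1 / gamma."""
    base = 1.0 - np.minimum(np.asarray(x, dtype=float), 1.0)
    return (1.0 - base**gamma) / gamma

def poly_marginal_utility(x, gamma):
    """Derivative (5.18): (1 - x)^(gamma - 1) for x <= 1, else 0."""
    base = 1.0 - np.minimum(np.asarray(x, dtype=float), 1.0)
    return base ** (gamma - 1)

def divergence_risk(sample, gamma, iterations=200):
    """-OCE_u(X): find eta* with mean(u'(X - eta*)) = 1 by bisection (cf. 4.41)."""
    lo, hi = sample.min() - 1.0, sample.max()   # mean(u') is 0 at lo and >= 1 at hi
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if poly_marginal_utility(sample - mid, gamma).mean() < 1.0:
            lo = mid
        else:
            hi = mid
    eta = 0.5 * (lo + hi)
    return -(eta + poly_utility(sample - eta, gamma).mean())

def shortfall_risk(sample, gamma, iterations=200):
    """inf{m : mean(u(X + m)) >= 0}: utility based shortfall risk with x0 = 0."""
    lo, hi = -sample.max(), -sample.min()       # mean(u(X + m)) <= 0 at lo, >= 0 at hi
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if poly_utility(sample + mid, gamma).mean() >= 0.0:
            hi = mid
        else:
            lo = mid
    return hi

rng = np.random.default_rng(7)
returns = rng.normal(0.25, 0.2, size=10_000)    # mean 0.25, sigma = 0.2
for gamma in range(2, 7):
    print(gamma, divergence_risk(returns, gamma), shortfall_risk(returns, gamma))
```

For γ = 2 and a sample with X − E[X] ≤ 1 the divergence risk should coincide with −(E[X] − Var(X)/2) computed from the sample moments, in line with 5.26, and in every column the shortfall risk should be at least the divergence risk, in line with 5.23.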
5.4 The SAHARA utility functions
When we took a closer look at the application of the power utility to risk measures,
we highlighted the problem that this utility function might not be defined for
large negative values. The class of SAHARA utility functions was introduced in
[8], to deal with the problem of the limited domain of certain HARA functions.
The acronym SAHARA stands for symmetric asymptotic hyperbolic absolute risk
aversion. Originally this class of utility functions was used for option pricing. In
this thesis we will take a closer look at the properties of this class in the context
of utility based risk measures. Just as in the HARA class, the SAHARA class is
also defined using the Arrow-Pratt measure of absolute risk aversion.
Definition 5.4. A utility function u with domain R belongs to the SAHARA class
if the absolute risk aversion $r_A(x) = \frac{-u''(x)}{u'(x)}$ is given by
\[
r_A(x) = \frac{a}{\sqrt{b^2 + (x - d)^2}} \tag{5.28}
\]
with a > 0, b > 0 and d ∈ R.
We have plotted this risk aversion for several values of the parameters a, b and d in
figure 5.4. We can see in these figures that the absolute risk aversion is a strictly
positive symmetric function which attains a maximum for x = d. It is easy to
prove these facts using equation 5.28. Furthermore we find that
\[
\lim_{x \to +\infty} r_A(x) = \lim_{x \to +\infty} \frac{a}{\sqrt{b^2 + (x - d)^2}} = 0, \tag{5.29}
\]
\[
\lim_{x \to -\infty} r_A(x) = \lim_{x \to -\infty} \frac{a}{\sqrt{b^2 + (x - d)^2}} = 0. \tag{5.30}
\]
This implies that an investor with the SAHARA utility has an almost risk neutral
attitude towards very large losses and very large gains. We will call the point d at
which the absolute risk aversion attains its maximum the threshold loss. (This is
different from the definition given in [8], because in our context the stochastic variable
X does not model the total wealth.) When
approached from above the absolute risk aversion is increasing. This implies that
an investor or financial institution will become increasingly risk averse and will try
to avoid falling below the threshold loss.
In the context of risk measures the parameter d could be used to model a loss which,
if exceeded, will cause the financial institution significant problems. Using this
interpretation it is not difficult to see why $\lim_{x \to -\infty} r_A(x) = 0$ is not an unreasonable
assumption. If the losses are so large that they ensure the bankruptcy of the financial
institution, there is no reason to be risk averse any more. Unlike in [8] we will not
assume that d = 0, because there is no reason to assume that the threshold loss
is 0. This makes the computation of the associated utility function tedious. To
ensure the readability of this section we will only report the results; the
calculations can be found in appendix B.
If the Arrow-Pratt measure of absolute risk aversion is given by equation 5.28 then
the associated utility function is given by
\[
u(x) = \begin{cases}
\dfrac{-C_1}{a^2 - 1} \left( \sqrt{b^2 + (x - d)^2} + (x - d) \right)^{-a} \left( a\sqrt{b^2 + (x - d)^2} + (x - d) \right) + C_2, & a \neq 1 \\[2ex]
\dfrac{C_1}{2} \left( \ln\left( \sqrt{b^2 + (x - d)^2} + (x - d) \right) - \dfrac{\left( \sqrt{b^2 + (x - d)^2} - (x - d) \right)^2}{2b^2} \right) + C_2, & a = 1
\end{cases} \tag{5.31}
\]
for some constants C1 and C2.
For all a > 0 the marginal utility is given by
\[
u'(x) = C_1 \left( \sqrt{b^2 + (x - d)^2} + (x - d) \right)^{-a}. \tag{5.32}
\]
Figure 5.4: Absolute risk aversion of SAHARA utility.
Figure 5.5: Absolute risk aversion for varying values of a with b = 1 and d = 2 fixed.
Figure 5.6: Absolute risk aversion for varying values of b with a = 0.5 and d = 2 fixed.
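We can determine the constants C1 and C2 such that the utility function is standardised and we have that u(0) = 0 and u'(0) = 1. We find that
\[
C_1 = \left( \sqrt{b^2 + d^2} - d \right)^{a}, \tag{5.33}
\]
and
\[
C_2 = \begin{cases}
\dfrac{C_1}{a^2 - 1} \left( \sqrt{b^2 + d^2} - d \right)^{-a} \left( a\sqrt{b^2 + d^2} - d \right), & a \neq 1 \\[2ex]
-\dfrac{C_1}{2} \left( \ln\left( \sqrt{b^2 + d^2} - d \right) - \dfrac{\left( \sqrt{b^2 + d^2} + d \right)^2}{2b^2} \right), & a = 1.
\end{cases} \tag{5.34}
\]
We have plotted the standardised SAHARA utility functions for different values
of a, b and d in figure 5.7.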
Figure 5.7: The SAHARA utility function.
Figure 5.8: b = 1 and d = 0.
Figure 5.9: a = 2 and b = 1.
Figure 5.10: a = 2 and d = 2.
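The formulas 5.31 to 5.34 can be checked against definition 5.4 numerically. The sketch below implements the standardised utility and verifies with central finite differences that u(0) = 0, u'(0) = 1 and that −u''(x)/u'(x) reproduces the absolute risk aversion 5.28. The function names, the step size and the parameter triples (a, b, d) are illustrative assumptions.

```python
import numpy as np

def sahara_constants(a, b, d):
    """C1 and C2 from (5.33) and (5.34), so that u(0) = 0 and u'(0) = 1."""
    s0 = np.sqrt(b**2 + d**2)
    c1 = (s0 - d) ** a
    if a == 1:
        c2 = -0.5 * c1 * (np.log(s0 - d) - (s0 + d) ** 2 / (2.0 * b**2))
    else:
        c2 = c1 / (a**2 - 1.0) * (s0 - d) ** (-a) * (a * s0 - d)
    return c1, c2

def sahara_utility(x, a, b, d):
    """Standardised SAHARA utility (5.31)."""
    c1, c2 = sahara_constants(a, b, d)
    y = np.asarray(x, dtype=float) - d
    s = np.sqrt(b**2 + y**2)
    if a == 1:
        return 0.5 * c1 * (np.log(s + y) - (s - y) ** 2 / (2.0 * b**2)) + c2
    return -c1 / (a**2 - 1.0) * (s + y) ** (-a) * (a * s + y) + c2

def sahara_risk_aversion(x, a, b, d):
    """Absolute risk aversion (5.28)."""
    return a / np.sqrt(b**2 + (x - d) ** 2)

h, x0 = 1e-4, 0.7
for a, b, d in [(2.0, 1.0, 0.0), (0.5, 1.0, 2.0), (1.0, 2.0, -1.0)]:
    u = lambda x, a=a, b=b, d=d: sahara_utility(x, a, b, d)
    slope_at_zero = (u(h) - u(-h)) / (2.0 * h)
    first = (u(x0 + h) - u(x0 - h)) / (2.0 * h)
    second = (u(x0 + h) - 2.0 * u(x0) + u(x0 - h)) / h**2
    # columns: parameters, u(0), u'(0), finite-difference r_A(x0), formula (5.28)
    print((a, b, d), float(u(0.0)), float(slope_at_zero),
          float(-second / first), sahara_risk_aversion(x0, a, b, d))
```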
The calculation of the associated divergence function can be found in appendix B.
We found that the associated divergence function is given by
\[
\phi(t) = \begin{cases}
\dfrac{t}{2} \left( \dfrac{b^2\, t^{\frac{1}{a}}\, C_1^{-\frac{1}{a}}}{1 + \frac{1}{a}} - \dfrac{t^{-\frac{1}{a}}\, C_1^{\frac{1}{a}}}{1 - \frac{1}{a}} - 2d \right) + C_2, & a \neq 1 \\[2ex]
\dfrac{C_1}{2} \ln\left( \dfrac{C_1}{t} \right) - t d - \dfrac{C_1}{2} + \dfrac{b^2 t^2}{4 C_1} + C_2, & a = 1,
\end{cases} \tag{5.35}
\]
where the constants C1 and C2 are given by 5.33 and 5.34 respectively. We have
plotted the divergence function in figure 5.11.
Figure 5.11: Divergence function of SAHARA utility with a = 2, b = 2 and d = −1.
The SAHARA class of utility functions is the most complicated class of utility
functions in this chapter.
Unlike in the case of the exponential utility functions and the polynomial utility
functions, we will only illustrate the effect of the parameters of the SAHARA
utility on the divergence risk measures in a concrete example. For the divergence
risk measures we worked with different sets of returns which we generated from a
normal distribution with mean 0.25 and different standard deviations. The effects of
the parameters a, b and d are plotted in figures 5.12, 5.14 and 5.16 respectively.
When we look at the effect of the parameter a on the SAHARA divergence risk we
see that in this case the divergence risk is increasing in the parameter a. Although
we do not provide a formal proof, we do not think this relationship is purely
coincidental. Figure 5.8 suggests that if $a_1 \geq a_2$ and all other
parameters are fixed, then $u_{a_1}(x) \leq u_{a_2}(x)$. Hence the relationship between the
parameter a and the divergence risk measure might be due to lemma 5.1.
The same relationship is observed between the parameter a and the utility based
shortfall risk. In figure 5.13 we have plotted both the divergence risk and the
utility based shortfall risk with $x_0 = 0$ of 10000 returns generated from a normal
distribution with mean 0.25 and standard deviation 0.8. Although both the divergence risk and the utility based shortfall risk were plotted, only one graph is
visible. This is not a mistake. It turns out that in this example both risk measures
yield very similar results, which makes it difficult to distinguish between them.
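The kind of computation behind such a comparison can be sketched as follows. This is only an illustration under stated assumptions: it uses the standardised SAHARA utility with a = 2, b = 2 and d = 0, a smaller simulated sample than in the text (1000 returns, fixed seed), and a plain ternary search and bisection instead of the optimisation packages used for the figures. All function names are hypothetical.

```python
import math
import random

A, B, D = 2.0, 2.0, 0.0                     # SAHARA parameters (a != 1 branch only)
_R = math.sqrt(B * B + D * D)
C1 = (_R - D) ** A                          # equation 5.33
C2 = C1 / (A * A - 1) * (_R - D) ** (-A) * (A * _R - D)   # equation 5.34, a != 1

def u(x):
    """Standardised SAHARA utility for a != 1 (equation 5.31)."""
    y = x - D
    s = math.sqrt(B * B + y * y)
    z = s + y if y >= 0 else B * B / (s - y)   # numerically stable form of s + y
    return -C1 / (A * A - 1) * z ** (-A) * (A * s + y) + C2

def mean_utility(returns, shift):
    """Sample estimate of E[u(X + shift)]."""
    return sum(u(x + shift) for x in returns) / len(returns)

def divergence_risk(returns, lo=-20.0, hi=20.0, iters=60):
    """-OCE_u(X) = -sup_eta (eta + E[u(X - eta)]); the objective is concave in eta."""
    f = lambda eta: eta + mean_utility(returns, -eta)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return -f(0.5 * (lo + hi))

def shortfall_risk(returns, lo=-20.0, hi=20.0, iters=60):
    """inf{m : E[u(X + m)] >= 0}, i.e. x0 = 0; E[u(X + m)] is increasing in m."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_utility(returns, mid) >= 0:
            hi = mid
        else:
            lo = mid
    return hi

random.seed(0)
returns = [random.gauss(0.25, 0.8) for _ in range(1000)]
print(divergence_risk(returns), shortfall_risk(returns))
```

The search brackets [-20, 20] are an assumption that is adequate for returns of this scale; for other data they would have to be widened.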
Figure 5.12: Effect of the parameter a on the SAHARA divergence risk with b = 2 and
d = 0.
Figure 5.13: Effect of parameter a on utility based shortfall risk with b = 2 and d = 0.
Using the same sets of returns as in the illustration of the effect of the parameter
a we have illustrated the effect of the parameter b on the divergence risk. For the
computations we have taken a = 2 and d = 0. The results are shown in figure 5.14.
Here we observe that an increase in the parameter b corresponds to a decrease in
the divergence risk measure. We also observe that a higher standard deviation
leads to a higher divergence risk. These results are not surprising when we look
at figure 5.9 where the effect of the parameter b on the SAHARA utility function
is shown.
To illustrate the effect on the utility based shortfall risk we have calculated this
risk measure on a set of 10000 returns generated from a normal distribution with
mean 0.25 and standard deviation 0.8. We took a = 2, d = 0 and x0 = 0. At
the same time we also calculated the divergence risk of the same set of returns.
The results are shown in figure 5.15. In this figure we do not only observe that
the SAHARA utility based shortfall risk is decreasing in the parameter b, but also
that the difference between this risk measure and the divergence risk measure is
very small.
Figure 5.14: Influence of the parameter b on the SAHARA divergence risk with a = 2
and d = 0.
Figure 5.15: Influence of the parameter b on the utility based shortfall risk with a = 2,
d = 0 and x0 = 0.
Until this point each parameter we looked at had a monotone effect on the risk
measure. However, if we look at figure 5.16 we notice a non-monotone relationship
between the parameter d and the associated divergence risk. This figure was
generated, as always, using different sets of returns, each taken from a normal
distribution with mean 0.25 and different standard deviations. For the calculations
we put a = 2 and b = 2. The same relationship is observed in figure 5.17. Here we
have computed both the SAHARA utility based shortfall risk and the SAHARA
divergence risk of a set of returns which was generated from a normal distribution
with mean 0.25 and standard deviation 0.8. For the computations we took a = 2,
b = 2 and $x_0 = 0$. Again we notice that there is almost no difference between
the utility based shortfall risk measure and the divergence risk measure. Notice
that the non-monotonicity is a consequence of the way that we standardised the
SAHARA utility function. If we had taken $C_1 = 1$ and $C_2 = 0$, for example, then
the increasing relationship between the parameter d and the divergence risk would
follow directly from the shift additivity of the optimised certainty equivalent. To
see this, denote with $u_{d_1}(x)$ the SAHARA utility function where $d = d_1$. Notice that
under the alternative standardisation $u_d(x) = u_0(x - d)$, therefore we have that
\[
\mathrm{OCE}_{u_d}(X) = \sup_{\eta \in \mathbb{R}}\left(\eta + E\left[u_d(X-\eta)\right]\right) = \sup_{\eta \in \mathbb{R}}\left(\eta + E\left[u_0(X-d-\eta)\right]\right) = \mathrm{OCE}_{u_0}(X) - d.
\]
Hence we would have that $D_{\phi_d}(X) = D_{\phi_0}(X) + d$. Using the standardisation
$C_1 = 1$ and $C_2 = 0$ would also cause an increasing relationship between the
parameter d and the utility based shortfall risk, a result which follows directly
from the translation invariance of utility based shortfall risk measures.
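The shift relation under the alternative standardisation can be illustrated numerically. The sketch below assumes the SAHARA utility with $C_1 = 1$, $C_2 = 0$, a = 2 and b = 2, a small simulated sample with a fixed seed, and a ternary search for the supremum; these choices and the function names are assumptions made for the illustration only.

```python
import math
import random

A, B = 2.0, 2.0   # a != 1 branch

def u(x, d):
    """SAHARA utility with C1 = 1 and C2 = 0, so that u_d(x) = u_0(x - d)."""
    y = x - d
    s = math.sqrt(B * B + y * y)
    z = s + y if y >= 0 else B * B / (s - y)   # numerically stable form of s + y
    return -(z ** (-A)) * (A * s + y) / (A * A - 1)

def oce(returns, d, lo=-30.0, hi=30.0, iters=80):
    """sup_eta (eta + E[u_d(X - eta)]) by ternary search (concave objective)."""
    f = lambda eta: eta + sum(u(x - eta, d) for x in returns) / len(returns)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f(0.5 * (lo + hi))

random.seed(1)
returns = [random.gauss(0.25, 0.8) for _ in range(500)]
base = -oce(returns, 0.0)
for d in (0.0, 0.5, 1.0):
    # Divergence risk for shift d, and its difference with the d = 0 case.
    print(d, -oce(returns, d), -oce(returns, d) - base)
```

The last printed column should reproduce d itself, in line with $D_{\phi_d}(X) = D_{\phi_0}(X) + d$.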
Figure 5.16: Influence of the parameter d on the SAHARA divergence risk with a = 2
and b = 2.
Figure 5.17: Influence of the parameter d on the different risk measures with a = 2,
b = 2 and x0 = 0. Based on 10000 returns generated from a normal
distribution with µ = 0.25 and σ = 0.8.
5.5 The κ-utility functions
In [15] we found the following class of utility functions, which we will call the κ-utility
functions. For each κ > 0
\[
u(x) = \frac{1}{\kappa}\left(1+\kappa x-\sqrt{1+\kappa^2x^2}\right) \tag{5.36}
\]
denotes a κ-utility function. Notice that this utility function is standardised such
that $u(0) = 0$ and $u'(0) = 1$ for all κ > 0. We have plotted this function for
several values of κ in figure 5.18. We will fix κ and calculate the first and second
derivatives with respect to x. We find that
\[
u'(x) = 1-\frac{\kappa x}{\sqrt{1+\kappa^2x^2}}. \tag{5.37}
\]
Because $\kappa x < \sqrt{1+\kappa^2x^2}$, we can conclude that $u'(x) > 0$ for all κ > 0 and for all x. This implies that
all utility functions from the class 5.36 are strictly increasing. For the second
derivative we find that
\[
u''(x) = \frac{-\kappa}{\left(\sqrt{1+\kappa^2x^2}\right)^{3}}. \tag{5.38}
\]
Because κ > 0 we have that $u''(x) < 0$ for all κ and x. This means all utility
functions from the class 5.36 are strictly concave everywhere. Now we can easily
deduce the Arrow-Pratt measure of absolute risk aversion.
\[
r_a(x) = -\frac{u''(x)}{u'(x)} = \frac{\kappa}{\left(1+\kappa^2x^2\right)\left(\sqrt{1+\kappa^2x^2}-\kappa x\right)} \tag{5.39}
\]
Figure 5.18: Utility function 5.36.
Figure 5.19: Absolute risk aversion.
Figure 5.20: Skew asymptote, κ = 2.
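The formulas for the κ-utility and its derivatives translate directly into code. The following is a minimal sketch; the function names and the evaluation points are illustrative assumptions.

```python
import math

def kappa_utility(x, kappa):
    """u(x) from equation 5.36."""
    return (1 + kappa * x - math.sqrt(1 + (kappa * x) ** 2)) / kappa

def kappa_utility_d1(x, kappa):
    """u'(x) from equation 5.37."""
    return 1 - kappa * x / math.sqrt(1 + (kappa * x) ** 2)

def kappa_utility_d2(x, kappa):
    """u''(x) from equation 5.38."""
    return -kappa / (1 + (kappa * x) ** 2) ** 1.5

def risk_aversion(x, kappa):
    """Arrow-Pratt absolute risk aversion, -u''(x) / u'(x)."""
    return -kappa_utility_d2(x, kappa) / kappa_utility_d1(x, kappa)

for kappa in (0.5, 1.0, 2.0):
    print(kappa, [round(risk_aversion(x, kappa), 4) for x in (-5.0, 0.0, 1.0, 5.0)])
```

At x = 0 the absolute risk aversion equals κ, and the values at x = ±5 show the decay towards zero in both tails.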
We have plotted this absolute risk aversion for several values of κ in figure 5.19.
Using this figure we can make several hypotheses about the class of κ-utility functions. We notice that when κ gets larger, the maximum of the absolute risk
aversion gets larger. In the neighbourhood of zero we have that the larger the
κ, the larger the absolute risk aversion. We also see that for all values of κ the
absolute risk aversion tends to zero in the left tail. The larger the κ, the faster this
happens. Hence in the left tail the agent tends to risk-neutrality. This will also
be the case for the right tail. We can see this in figure 5.20, where we have plotted
the same utility functions as in figure 5.18 but on a larger domain. We can see that
when x becomes larger, the utility functions tend to some horizontal asymptote.
In the same figure we see that when x gets smaller the utility function tends to
some skew asymptote y = ax + b for some a and b. The computations regarding
the asymptotic behaviour of the κ-utility can be found in appendix B. To improve
the readability of this text we will only discuss the results.
It turns out that the skew asymptote is given by $y = 2x + \frac{1}{\kappa}$. This is illustrated
in figure 5.20. The equation of the horizontal asymptote is given by $y = \frac{1}{\kappa}$. The
class of utility functions 5.36 is defined for all κ > 0. We will now discuss what
happens when κ tends to 0 and κ tends to +∞.
We have that
\[
\lim_{\kappa\to+\infty} u(x) = x-\sqrt{x^2}. \tag{5.40}
\]
From this we can conclude that
\[
\lim_{\kappa\to+\infty} u(x) =
\begin{cases}
0, & x \geq 0 \\
2x, & x < 0.
\end{cases}
\tag{5.41}
\]
We recognize the utility function used in CVaR for a confidence level α = 0.5.
The limit when κ tends to zero is given by
\[
\lim_{\kappa\to 0} u(x) = x, \tag{5.42}
\]
which gives us the utility function of a risk neutral investor.
The determination of the associated divergence function is a tedious task. All computations can be found in appendix B. We obtained that the divergence function
is given by
\[
\phi(t) =
\begin{cases}
\dfrac{-1}{\kappa}\sqrt{1-(t-1)^2}+\dfrac{1}{\kappa} & \text{if } 0 \leq t \leq 2 \\[1ex]
+\infty & \text{if } t > 2.
\end{cases}
\tag{5.43}
\]
In figure 5.21 we have plotted this divergence function.
Figure 5.21: The divergence function of the κ-utility for different values of κ.
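The closed form 5.43 can be compared with a direct numerical evaluation of the Legendre transform $\sup_x (xt - l(x))$ of the loss function $l(x) = -u(-x)$. The sketch below assumes κ = 2, a ternary search on a fixed bracket, and hypothetical function names.

```python
import math

def loss(x, kappa):
    """l(x) = -u(-x) for the kappa-utility."""
    return -(1 - kappa * x - math.sqrt(1 + (kappa * x) ** 2)) / kappa

def phi_closed(t, kappa):
    """Equation 5.43, stated for t >= 0 only."""
    if t > 2:
        return math.inf
    return (1 - math.sqrt(1 - (t - 1) ** 2)) / kappa

def phi_numeric(t, kappa, lo=-1e3, hi=1e3, iters=200):
    """sup_x (x t - l(x)) by ternary search; meaningful only for 0 < t < 2."""
    g = lambda x: x * t - loss(x, kappa)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return g(0.5 * (lo + hi))

kappa = 2.0
for t in (0.5, 1.0, 1.5):
    print(t, phi_closed(t, kappa), phi_numeric(t, kappa))

# For t > 2 the objective x t - l(x) keeps growing with x, so the supremum is +infinity.
print([2.5 * x - loss(x, kappa) for x in (10.0, 100.0, 1000.0)])
```

The last line illustrates why the divergence is infinite beyond t = 2: the objective increases without bound along x.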
To understand the effect of the parameter κ on both the divergence risk measure
and the utility based shortfall risk measure we will again use lemma 5.1. The plot of the κ-utility functions lets us
suspect that if $\kappa_1 \geq \kappa_2$ then $u_{\kappa_1}(x) \leq u_{\kappa_2}(x)$ for all x ∈ R.
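Assume that $\kappa_1 \geq \kappa_2 > 0$. Multiplying numerator and denominator by $1+\sqrt{1+\kappa^2x^2}$ and then dividing both by κ, we can write for all x ∈ R
\[
u_\kappa(x) = x+\frac{1-\sqrt{1+\kappa^2x^2}}{\kappa} = x-\frac{\kappa x^2}{1+\sqrt{1+\kappa^2x^2}} = x-\frac{x^2}{\frac{1}{\kappa}+\sqrt{\frac{1}{\kappa^2}+x^2}}.
\]
Because $\frac{1}{\kappa_1} \leq \frac{1}{\kappa_2}$ we have that
\begin{align*}
\frac{1}{\kappa_1}+\sqrt{\frac{1}{\kappa_1^2}+x^2} \leq \frac{1}{\kappa_2}+\sqrt{\frac{1}{\kappa_2^2}+x^2}
&\Rightarrow \frac{x^2}{\frac{1}{\kappa_1}+\sqrt{\frac{1}{\kappa_1^2}+x^2}} \geq \frac{x^2}{\frac{1}{\kappa_2}+\sqrt{\frac{1}{\kappa_2^2}+x^2}} \\
&\Rightarrow u_{\kappa_1}(x) \leq u_{\kappa_2}(x).
\end{align*}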
From this we can conclude that an increase in the parameter κ results in an increase
of the divergence risk and of the utility based shortfall risk.
This effect of the parameter κ on the divergence risk is illustrated in figure 5.22.
We have constructed this figure in the same fashion as before. That is, sets of 10000
returns were simulated from a normal distribution with mean 0.25 such that each
set had a different standard deviation.
Figure 5.22: Influence of κ-parameter on the divergence risk.
In figure 5.23 we have plotted the divergence risk and the utility based shortfall
risk with x0 = 0 of a set of returns generated from a normal distribution with mean
0.25 and standard deviation 0.8. This figure illustrates the fact that utility based
shortfall risk is increasing in the parameter κ. We can also observe that there
is a clear difference between the utility based shortfall risk with x0 = 0 and the
divergence risk, and that the difference between the two risk measures increases if
κ increases.
Figure 5.23: Comparison of the divergence risk and the utility based shortfall risk of
the κ-utility.
When looking at the divergence of the κ-utility we notice that for t > 2 the
divergence becomes +∞. That is, for values of t larger than the slope of the skew
asymptote the divergence is +∞. We have seen something similar when we looked
at the divergence associated with CVaR. In that case we have that the utility
function is given by $u(x) = -\frac{1}{\alpha}\max(0,-x)$ and $y = \frac{1}{\alpha}x$ is a skew asymptote for
x → −∞. The associated divergence was given by
\[
\phi(t) =
\begin{cases}
0 & \text{if } 0 \leq t \leq \frac{1}{\alpha} \\
+\infty & \text{if } t > \frac{1}{\alpha}.
\end{cases}
\tag{5.44}
\]
This turns out not to be a coincidence. Because the slope of the skew asymptote is
an upper bound of the slope of the utility function, or equivalently of the slope of the loss
function, the Legendre transform of the loss function becomes infinite for values
larger than this upper bound. This effect is illustrated in figure 5.24 where we
have used κ = 5. We have formalised this intuition in a lemma.
Lemma 5.2. Let u(x) be an increasing and concave utility function, such that
y = ax + b is a skew asymptote for x → −∞. Then the associated divergence
$\phi(t) = l^*(t)$, with $l(x) = -u(-x)$, equals +∞ for all t > a.
Proof. Because y = ax + b is a skew asymptote of u(x) when x tends to −∞, we
have that y = ax − b is a skew asymptote of l(x) = −u(−x) when x tends to +∞.
Because l(x) is a convex and increasing function, the slope of l(x) is increasing.
Hence a is an upper bound for the slope of the loss function. Because ax − b is a
skew asymptote when x tends to +∞ we have that
\[
\forall \varepsilon > 0,\ \exists \delta > 0 \text{ such that } x > \delta \Rightarrow |l(x)-(ax-b)| < \varepsilon.
\]
Hence there exists a γ such that $l(x) - ax + b < \gamma$ for all x > δ, because we can
fix such an ε and take γ = ε. We need to calculate $\sup_{x\in\mathbb{R}}(xt - l(x))$. We will show that
for t > a we have that xt − l(x) is unbounded. Suppose that xt − l(x) is bounded
from above for some t > a, then there exists an M ∈ R such that xt − l(x) ≤ M
for all x ∈ R. We have that for x > δ, l(x) < γ + ax − b. Hence we have that
xt ≤ l(x) + M < γ + ax − b + M. From this we have that ∀x > δ, x(t − a) < γ − b + M.
Because t > a we find that $x < \frac{\gamma-b+M}{t-a}$ for all x > δ, which gives a contradiction. Hence
xt − l(x) is unbounded and $\phi(t) = l^*(t) = \sup_{x\in\mathbb{R}}(xt - l(x)) = +\infty$.
Figure 5.24: Effect of skew asymptote on the divergence function.
Conclusion
In this master thesis we looked at different ways to incorporate utility functions
in risk measures. We focused on two classes of risk measures: utility based
shortfall risk measures and divergence risk measures. Both of these risk measures are convex, which means that they satisfy the properties of monotonicity,
translation invariance and convexity. However, they are generally not coherent because they lack the positive homogeneity property. Like all convex risk
measures, these risk measures have a robust representation of the following form
\[
\sup_{Q\in M_1(P)}\left(E_Q[-X]-\alpha(Q)\right),
\]
where α(·) is a penalty function. In the case of divergence risk measures this representation is often
used and the penalty function is
taken to be the φ-divergence $I_\phi(Q|P) = E\left[\phi\left(\frac{dQ}{dP}\right)\right]$, where the convex function φ
is called the divergence function. The effect of this divergence function is difficult
to analyse and to interpret. Fortunately the strong Fenchel duality theorem from
mathematical optimisation allowed us to reformulate the robust representation of
a divergence risk measure as a more comprehensible formula. We obtained that
each divergence risk measure could be interpreted as the negative of an optimised
certainty equivalent where the utility function u was linked to the divergence function φ through the Fenchel-Legendre transform. More formally we obtained that
if $\phi^*(x) = -u(-x)$ then
\[
D_\phi(X) = \sup_{Q\in M_1(P)}\left(E_Q[-X]-I_\phi(Q|P)\right) = -\sup_{\eta\in\mathbb{R}}\left(\eta+E\left[u(X-\eta)\right]\right) = -\mathrm{OCE}_u(X).
\]
The utility based shortfall risk measures were defined as the negative of the u-Mean
certainty equivalent. We had that for $l(x) = -u(-x) = -\tilde{u}(-x) + x_0$
\begin{align*}
SF^l_{x_0}(X) &= \inf\{m\in\mathbb{R} \mid E\left[l(-X-m)\right] \leq x_0\} \\
&= -\sup\{m\in\mathbb{R} \mid E\left[\tilde{u}(X-m)\right] \geq 0\} = -M_{\tilde{u}}(X).
\end{align*}
Both utility based risk measures have a representation as an optimisation problem
with regard to a utility function. These optimisation problems were linked using
strong Lagrangian duality. We obtained that $SF^l_0(X) \geq D_\phi(X)$.
Because both the divergence risk measure and the utility based shortfall risk measure can be interpreted as the negative of some certainty equivalent, we looked
into the possibility of using the negative of the ordinary certainty equivalent as a
risk measure. It turned out that to get a translation invariant risk measure, only
linear or exponential utility functions could be used. Since we also noted that
for the exponential utility function all certainty equivalents coincide, we did not
obtain any new interesting convex risk measures.
After reading this thesis, the reader might feel there remains an important unanswered question, namely: "Which utility function should be used in utility based
risk measures?" We do not give an answer to this question.
One of the reasons for not proposing a specific utility function is that utility functions model preferences and these preferences are subjective. Another important
reason is computability. In this thesis we did not look at how these risk measures could be efficiently computed. Although we did compute some utility based
risk measures in the last chapter, we did this by using packages for constrained
and unconstrained optimisation in Python. This computational aspect should be
taken into account when choosing a suitable utility function.
Although we did not put forward a specific utility function that should be used in
utility based risk measures, we did study some of them in the last chapter. There
we tried to illustrate how the parameters of the utility functions affect
both the utility based shortfall risk and the divergence risk. For each of the utility
functions we also computed the associated divergence function.
A Summary (translated from the Dutch)
To set capital requirements for financial institutions it is necessary to be able
to determine the risk of the portfolios of these institutions. This risk can be
determined by means of risk measures. In the first chapter we study these risk
measures from a mathematical point of view and formulate some properties that a
good risk measure should have. Using these properties we can construct a class of
convex risk measures. Within this class we then pay extra attention to the subclass
of coherent risk measures. After this we study the concept of acceptance sets. These
are sets containing all possible portfolios whose risk we consider acceptable. Using
these sets we can define risk measures in a simple way. We then introduce the
robust representation of convex risk measures and study the associated penalty
function.
Studying risk measures only from a mathematical point of view would completely
ignore the subjective nature of risk. What is too high a risk for one person is
acceptable for another. The second chapter therefore gives an introduction to
decision theory. In it we introduce the von Neumann-Morgenstern framework for
making decisions under uncertainty. We explain what utility functions are and how
they can model the different attitudes towards risk. We then pay attention to
different types of certainty equivalents: the ordinary certainty equivalent ($CE_u$),
the optimised certainty equivalent ($OCE_u$) and the u-Mean certainty equivalent
($M_u$) are discussed. We close this chapter with an introduction to the concept of
stochastic dominance.
Armed with both the mathematical concepts from the first chapter and the economic
concepts from the second chapter, we can now analyse concrete risk measures. This
is done in the third chapter, in which we apply these concepts to both Value at
Risk and Expected Shortfall. Here we note that Value at Risk, one of the most
widely used risk measures, exhibits some important shortcomings, both
mathematically and economically.
In the fourth chapter we go deeper into the main question of this thesis: "How can
utility functions be incorporated into risk measures in a good way?" We give two
possible answers to this question. First there are the utility based shortfall risk
measures ($SF^l_{x_0}$). These risk measures are constructed from acceptance sets.
These acceptance sets contain all portfolios whose expected utility exceeds a
certain bound. If we define the loss function l as l(x) = −u(−x) we have that
\[
SF^l_{x_0} = \inf\{m\in\mathbb{R} \mid E\left[u(X+m)\right] \geq -x_0\}.
\]
Here we noted that this formula can be rewritten using a u-Mean certainty
equivalent. We have that $SF^l_0 = -M_u(X)$.
A second type of risk measures in which utility functions are incorporated are the
so-called divergence risk measures ($D_\phi$). In contrast to the utility based
shortfall risk measures, these risk measures are not constructed from acceptance
sets; instead use is made of the robust representation of convex risk measures. We
have that
\[
D_\phi(X) = \sup_{Q \ll P}\left(E_Q[-X]-I_\phi(Q|P)\right).
\]
Characteristic of divergence risk measures is that the penalty function has the
form $I_\phi(Q|P) = E_P\left[\phi\left(\frac{dQ}{dP}\right)\right]$, where φ is a convex function¹ that is called the
divergence function. One of the best-known examples of a divergence risk measure is
entropic risk. Here one takes as divergence the Kullback-Leibler entropy,
$E_P\left[\frac{dQ}{dP}\ln\frac{dQ}{dP}\right]$. The interpretation of divergence risk measures is not easy if one
only has the corresponding robust representation at one's disposal. Fortunately
the strong Fenchel duality theorem offers a solution. If we take as utility function
$u(x) = -\phi^*(-x)$, where $\phi^*$ is the Fenchel-Legendre transform of the divergence
function, we find that
\[
D_\phi(X) = -\sup_{\eta\in\mathbb{R}}\left(\eta+E\left[u(X-\eta)\right]\right) = -\mathrm{OCE}_u(X).
\]
This representation is much easier to interpret than the robust representation.
For instance, entropic risk can be interpreted as the negative optimised certainty
equivalent of an individual with an exponential utility function.
Inspired by the fact that both divergence risk measures and utility based shortfall
risk measures can be interpreted as negative certainty equivalents, we wondered
whether the ordinary certainty equivalent would also give rise to a good risk
measure in this way. This idea turned out to be of little success, since in many
cases these risk measures did not have the desired mathematical properties.
Although entropic risk is a very well-known example of a divergence risk measure,
from an economic point of view there is little reason why we should use the
exponential utility function for the construction of risk measures. In the last
chapter we therefore studied different utility functions in the context of
risk measures. For each of these utility functions we computed the associated
divergence function and investigated the influence of the parameters on the
different risk measures.
¹ Which may also take the values +∞ and −∞, but is finite in at least one value.
B Additional computations
B.1 Computations regarding the SAHARA utility class
The class of SAHARA utility functions is defined using the following coefficient of
absolute risk aversion:
\[
r_A(x) = \frac{a}{\sqrt{b^2+(x-d)^2}} \tag{B.1}
\]
with a > 0, b > 0 and d ∈ R.
B.1.1 Computation of the utility function
Let $v(x) = u'(x)$, then we have that
\[
\frac{dv(x)}{v(x)} = \frac{-a}{\sqrt{b^2+(x-d)^2}}\,dx.
\]
Integrating both sides, with the substitution y = x − d, gives
\begin{align*}
\ln(v(x)) &= \int\frac{-a}{\sqrt{b^2+(x-d)^2}}\,dx \\
&= \int\frac{-a}{\sqrt{b^2+y^2}}\,dy \\
&= -a\ln\left(\sqrt{b^2+y^2}+y\right)+C \\
&= \ln\left(\sqrt{b^2+(x-d)^2}+(x-d)\right)^{-a}+C.
\end{align*}
We conclude that
\[
u'(x) = C_1\left(\sqrt{b^2+(x-d)^2}+(x-d)\right)^{-a} \tag{B.2}
\]
for some integration constant $C_1$. Then we have that
\[
u(x) = \int C_1\left(\sqrt{b^2+(x-d)^2}+(x-d)\right)^{-a}dx.
\]
Let y = (x − d), then we have that
\[
u(y) = C_1\int\left(\sqrt{b^2+y^2}+y\right)^{-a}dy.
\]
Now consider the following substitution:
\[
y = \frac{z^2-b^2}{2z}, \qquad dy = \frac{z^2+b^2}{2z^2}\,dz.
\]
Then we have that
\begin{align*}
u(z) &= C_1\int\left(\sqrt{b^2+\left(\frac{z^2-b^2}{2z}\right)^2}+\frac{z^2-b^2}{2z}\right)^{-a}\frac{z^2+b^2}{2z^2}\,dz \\
&= C_1\int\left(\sqrt{\frac{4b^2z^2+z^4-2z^2b^2+b^4}{4z^2}}+\frac{z^2-b^2}{2z}\right)^{-a}\frac{z^2+b^2}{2z^2}\,dz \\
&= C_1\int\left(\sqrt{\left(\frac{z^2+b^2}{2z}\right)^2}+\frac{z^2-b^2}{2z}\right)^{-a}\frac{z^2+b^2}{2z^2}\,dz \\
&= C_1\int\left(\frac{2z^2}{2z}\right)^{-a}\frac{z^2+b^2}{2z^2}\,dz \\
&= \frac{C_1}{2}\int z^{-a-2}\left(z^2+b^2\right)dz.
\end{align*}
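Now consider the case that a ≠ 1, then
\begin{align*}
u(z) &= \frac{C_1}{2}\int z^{-a-2}\left(z^2+b^2\right)dz \\
&= \frac{C_1}{2}\left(\frac{z^{-a+1}}{-a+1}+\frac{b^2z^{-a-1}}{-a-1}\right)+C_2 \\
&= \frac{-C_1z^{-a}}{2(a^2-1)}\left(z(a+1)+b^2z^{-1}(a-1)\right)+C_2.
\end{align*}
Using the substitution $y = \frac{z^2-b^2}{2z}$, or equivalently $z = \sqrt{b^2+y^2}+y$, we have that
\begin{align*}
u(y) &= \frac{-C_1\left(\sqrt{b^2+y^2}+y\right)^{-a}}{2(a^2-1)}\left(\left(\sqrt{b^2+y^2}+y\right)(a+1)+b^2\left(\sqrt{b^2+y^2}+y\right)^{-1}(a-1)\right)+C_2 \\
&= \frac{-C_1\left(\sqrt{b^2+y^2}+y\right)^{-a}}{2(a^2-1)}\left(\left(\sqrt{b^2+y^2}+y\right)(a+1)+\left(\sqrt{b^2+y^2}-y\right)(a-1)\right)+C_2 \\
&= \frac{-C_1\left(\sqrt{b^2+y^2}+y\right)^{-a}}{a^2-1}\left(a\sqrt{b^2+y^2}+y\right)+C_2.
\end{align*}
We conclude that for a ≠ 1 we have that
\[
u(x) = \frac{-C_1}{a^2-1}\left(\sqrt{b^2+(x-d)^2}+(x-d)\right)^{-a}\left(a\sqrt{b^2+(x-d)^2}+(x-d)\right)+C_2. \tag{B.3}
\]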
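Now consider the case that a = 1, then we have that
\begin{align*}
u(z) &= \int\left(\frac{C_1}{2}z^{-1}+\frac{C_1}{2}b^2z^{-3}\right)dz \\
&= \frac{C_1}{2}\left(\ln(z)-b^2\frac{1}{2z^2}\right)+C_2.
\end{align*}
Now using the substitution $z = \sqrt{b^2+y^2}+y$ we find that
\begin{align*}
u(y) &= \frac{C_1}{2}\left(\ln\left(\sqrt{b^2+y^2}+y\right)-b^2\frac{1}{2\left(\sqrt{b^2+y^2}+y\right)^2}\right)+C_2 \\
&= \frac{C_1}{2}\left(\ln\left(\sqrt{b^2+y^2}+y\right)-b^2\frac{\left(\sqrt{b^2+y^2}-y\right)^2}{2\left(\sqrt{b^2+y^2}+y\right)^2\left(\sqrt{b^2+y^2}-y\right)^2}\right)+C_2 \\
&= \frac{C_1}{2}\left(\ln\left(\sqrt{b^2+y^2}+y\right)-b^2\frac{\left(\sqrt{b^2+y^2}-y\right)^2}{2b^4}\right)+C_2 \\
&= \frac{C_1}{2}\left(\ln\left(\sqrt{b^2+y^2}+y\right)-\frac{\left(\sqrt{b^2+y^2}-y\right)^2}{2b^2}\right)+C_2.
\end{align*}
Using that y = x − d we conclude that if a = 1 we have that
\[
u(x) = \frac{C_1}{2}\left(\ln\left(\sqrt{b^2+(x-d)^2}+(x-d)\right)-\frac{\left(\sqrt{b^2+(x-d)^2}-(x-d)\right)^2}{2b^2}\right)+C_2. \tag{B.4}
\]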
Now we will determine the constants $C_1$ and $C_2$ such that $u(0) = 0$ and $u'(0) = 1$.
From B.2 we have that the condition $u'(0) = 1$ yields
\[
C_1 = \left(\sqrt{b^2+d^2}-d\right)^{a}.
\]
Notice that $C_1 > 0$. The condition $u(0) = 0$ yields that when a ≠ 1
\[
C_2 = \frac{C_1}{a^2-1}\left(\sqrt{b^2+d^2}-d\right)^{-a}\left(a\sqrt{b^2+d^2}-d\right).
\]
When a = 1 we have that
\[
C_2 = -\frac{C_1}{2}\left(\ln\left(\sqrt{b^2+d^2}-d\right)-\frac{\left(\sqrt{b^2+d^2}+d\right)^2}{2b^2}\right).
\]
B.1.2 Computation of the divergence function
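Now we will derive the divergence function associated with the standardised SAHARA utility.
First suppose that a ≠ 1, then the utility function is given by B.3. Denote with
\[
u_1(x) := \frac{-1}{a^2-1}\left(\sqrt{b^2+(x-d)^2}+(x-d)\right)^{-a}\left(a\sqrt{b^2+(x-d)^2}+(x-d)\right). \tag{B.5}
\]
Then $u(x) = C_1u_1(x) + C_2$. We will calculate the divergence function associated
with $u_1$ and denote this function $\phi_1$. There is a clear relation between $\phi_1$ and the
divergence function φ associated with u. Because $C_1 > 0$ we have that
\begin{align*}
\phi(t) &= \sup_{x\in\mathbb{R}}\left(xt+u(-x)\right) \\
&= \sup_{x\in\mathbb{R}}\left(u(x)-xt\right) \\
&= \sup_{x\in\mathbb{R}}\left(C_1u_1(x)+C_2-xt\right) \\
&= C_1\sup_{x\in\mathbb{R}}\left(u_1(x)-x\frac{t}{C_1}\right)+C_2 \\
&= C_1\phi_1\left(\frac{t}{C_1}\right)+C_2.
\end{align*}
We have that $\phi_1(t) = \sup_{x\in\mathbb{R}}(u_1(x) - xt)$. The first order condition gives that
$u_1'(x^*) = t$, where $x^*$ denotes the optimal value:
\begin{align*}
&\left(\sqrt{b^2+(x^*-d)^2}+(x^*-d)\right)^{-a} = t \\
\Leftrightarrow\ &(x^*-d) = \frac{1}{2}\left(t^{-\frac{1}{a}}-b^2t^{\frac{1}{a}}\right).
\end{align*}
Then we have that
\begin{align*}
u_1(x^*) &= \frac{-1}{a^2-1}\left(\sqrt{b^2+(x^*-d)^2}+(x^*-d)\right)^{-a}\left(a\sqrt{b^2+(x^*-d)^2}+(x^*-d)\right) \\
&= \frac{-t}{a^2-1}\left(a\sqrt{b^2+(x^*-d)^2}+(x^*-d)\right) \\
&= \frac{-t}{a^2-1}\left(a\sqrt{b^2+\left(\frac{t^{-\frac{1}{a}}}{2}-\frac{b^2t^{\frac{1}{a}}}{2}\right)^2}+\frac{t^{-\frac{1}{a}}}{2}-\frac{b^2t^{\frac{1}{a}}}{2}\right) \\
&= \frac{-t}{a^2-1}\left(a\sqrt{\frac{4b^2+t^{-\frac{2}{a}}-2b^2+b^4t^{\frac{2}{a}}}{4}}+\frac{t^{-\frac{1}{a}}}{2}-\frac{b^2t^{\frac{1}{a}}}{2}\right) \\
&= \frac{-t}{a^2-1}\left(a\sqrt{\left(\frac{t^{-\frac{1}{a}}+b^2t^{\frac{1}{a}}}{2}\right)^2}+\frac{t^{-\frac{1}{a}}}{2}-\frac{b^2t^{\frac{1}{a}}}{2}\right) \\
&= \frac{-t}{a^2-1}\left(a\,\frac{t^{-\frac{1}{a}}+b^2t^{\frac{1}{a}}}{2}+\frac{t^{-\frac{1}{a}}}{2}-\frac{b^2t^{\frac{1}{a}}}{2}\right) \\
&= \frac{-t}{2(a^2-1)}\left(t^{-\frac{1}{a}}(a+1)+b^2t^{\frac{1}{a}}(a-1)\right) \\
&= \frac{-t}{2}\left(\frac{t^{-\frac{1}{a}}}{a-1}+\frac{b^2t^{\frac{1}{a}}}{a+1}\right).
\end{align*}
We can conclude that
\begin{align*}
\phi_1(t) &= u_1(x^*)-x^*t \\
&= \frac{-t}{2}\left(\frac{t^{-\frac{1}{a}}}{a-1}+\frac{b^2t^{\frac{1}{a}}}{a+1}\right)-\frac{t}{2}\left(2d+t^{-\frac{1}{a}}-b^2t^{\frac{1}{a}}\right) \\
&= \frac{-t}{2}\left(t^{-\frac{1}{a}}\left(\frac{1}{a-1}+1\right)+b^2t^{\frac{1}{a}}\left(\frac{1}{a+1}-1\right)\right)-td \\
&= \frac{t}{2}\left(\frac{b^2t^{\frac{1}{a}}}{1+\frac{1}{a}}-\frac{t^{-\frac{1}{a}}}{1-\frac{1}{a}}-2d\right).
\end{align*}
Hence we have that
\[
\phi(t) = C_1\phi_1\left(\frac{t}{C_1}\right)+C_2. \tag{B.6}
\]
When a = 1 we have that the first order condition yields that
\[
t = \left(\sqrt{b^2+(x^*-d)^2}+(x^*-d)\right)^{-1}.
\]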
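From which we can conclude that
\begin{align*}
(x^*-d) &= \frac{t^{-1}}{2}-\frac{b^2t}{2}, \\
x^* &= d+\frac{t^{-1}}{2}-\frac{b^2t}{2}.
\end{align*}
Writing $u_1$ for the utility function B.4 with $C_1 = 1$ and $C_2 = 0$, we have that
\begin{align*}
u_1(x^*) &= \frac{1}{2}\left(\ln\left(\sqrt{b^2+(x^*-d)^2}+(x^*-d)\right)-\frac{\left(\sqrt{b^2+(x^*-d)^2}-(x^*-d)\right)^2}{2b^2}\right) \\
&= \frac{1}{2}\left(\ln(t^{-1})-\frac{1}{2b^2}\left(\sqrt{b^2+\frac{t^{-2}}{4}-\frac{b^2}{2}+\frac{b^4t^2}{4}}-\frac{t^{-1}}{2}+\frac{b^2t}{2}\right)^2\right) \\
&= \frac{1}{2}\left(\ln(t^{-1})-\frac{1}{2b^2}\left(\frac{t^{-1}}{2}+\frac{b^2t}{2}-\frac{t^{-1}}{2}+\frac{b^2t}{2}\right)^2\right) \\
&= \frac{1}{2}\left(\ln(t^{-1})-\frac{1}{2b^2}\left(b^2t\right)^2\right) \\
&= \frac{1}{2}\left(\ln(t^{-1})-\frac{b^2t^2}{2}\right).
\end{align*}
Then we have that
\begin{align*}
\phi_1(t) &= u_1(x^*)-tx^* \\
&= \frac{1}{2}\left(\ln(t^{-1})-\frac{b^2t^2}{2}\right)-t\left(d+\frac{t^{-1}}{2}-\frac{b^2t}{2}\right).
\end{align*}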
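The closed forms for $\phi_1$ can be checked against a direct numerical evaluation of $\sup_x (u_1(x) - xt)$. The sketch below takes $u_1$ to be the SAHARA utility with $C_1 = 1$ and $C_2 = 0$ (B.3 for a ≠ 1, B.4 for a = 1); for a = 1 the closed form obtained by substituting $x^*$ into B.4 simplifies to $-\frac{1}{2}\ln t + \frac{b^2t^2}{4} - td - \frac{1}{2}$. The parameter values, search bracket and function names are assumptions made for the illustration.

```python
import math

def sahara_u1(x, a, b, d):
    """SAHARA utility with C1 = 1 and C2 = 0 (B.3 for a != 1, B.4 for a = 1)."""
    y = x - d
    s = math.sqrt(b * b + y * y)
    z = s + y
    if a == 1:
        return 0.5 * (math.log(z) - (s - y) ** 2 / (2 * b * b))
    return -(z ** (-a)) * (a * s + y) / (a * a - 1)

def phi1_closed(t, a, b, d):
    """Closed form of sup_x (u1(x) - x t) for t > 0."""
    if a == 1:
        return -0.5 * math.log(t) + b * b * t * t / 4 - t * d - 0.5
    return 0.5 * t * (b * b * t ** (1 / a) / (1 + 1 / a)
                      - t ** (-1 / a) / (1 - 1 / a) - 2 * d)

def phi1_numeric(t, a, b, d, lo=-50.0, hi=50.0, iters=200):
    """Ternary search for the supremum; u1(x) - x t is concave in x."""
    g = lambda x: sahara_u1(x, a, b, d) - x * t
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return g(0.5 * (lo + hi))

# Hypothetical test points (t, a, b, d), one per branch.
for t, a, b, d in [(0.7, 2.0, 1.0, 0.5), (2.0, 1.0, 1.5, -0.3)]:
    print(t, a, phi1_closed(t, a, b, d), phi1_numeric(t, a, b, d))
```

The two printed values on each line should agree up to the tolerance of the search.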
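B.2 Computations regarding the κ-utility class
B.2.1 Determining the asymptotic behaviour
For κ > 0 the κ-utility is given by
\[
u(x) = \frac{1}{\kappa}\left(1+\kappa x-\sqrt{1+\kappa^2x^2}\right). \tag{B.7}
\]
We will now determine the equation corresponding to the skew asymptote of the
κ-utility. The skew asymptote is given by y = ax + b. Using the formulas of Cauchy
we find that
\begin{align*}
a &= \lim_{x\to-\infty}\frac{u(x)}{x} \\
&= \lim_{x\to-\infty}\frac{1}{\kappa}\left(\frac{1+\kappa x-\sqrt{1+\kappa^2x^2}}{x}\right) \\
&= 0+1+\lim_{x\to-\infty}\frac{-\sqrt{1+\kappa^2x^2}}{\kappa x} \\
&= 1+\lim_{x\to-\infty}\frac{-\sqrt{x^2}\sqrt{\frac{1}{x^2}+\kappa^2}}{\kappa x} \\
&= 1+\lim_{x\to-\infty}\frac{x\sqrt{\frac{1}{x^2}+\kappa^2}}{\kappa x} \\
&= 1+\lim_{x\to-\infty}\frac{\sqrt{\frac{1}{x^2}+\kappa^2}}{\kappa} \\
&= 1+\frac{\sqrt{\kappa^2}}{\kappa} = 2
\end{align*}
and
\begin{align*}
b &= \lim_{x\to-\infty}\left[u(x)-ax\right] \\
&= \lim_{x\to-\infty}\left[\frac{1}{\kappa}\left(1+\kappa x-\sqrt{1+\kappa^2x^2}\right)-2x\right] \\
&= \frac{1}{\kappa}+\lim_{x\to-\infty}\left[x-\frac{1}{\kappa}\sqrt{1+\kappa^2x^2}-2x\right] \\
&= \frac{1}{\kappa}+\lim_{x\to-\infty}\left[-x-\frac{1}{\kappa}\sqrt{1+\kappa^2x^2}\right] \\
&= \frac{1}{\kappa}+\lim_{x\to-\infty}\left[-x-\frac{\sqrt{x^2}}{\kappa}\sqrt{\frac{1}{x^2}+\kappa^2}\right] \\
&= \frac{1}{\kappa}+\lim_{x\to-\infty}\left[-x+\frac{x}{\kappa}\sqrt{\frac{1}{x^2}+\kappa^2}\right] \\
&= \frac{1}{\kappa}-\lim_{x\to-\infty}x\left[1-\frac{1}{\kappa}\sqrt{\frac{1}{x^2}+\kappa^2}\right] \\
&= \frac{1}{\kappa}-\lim_{x\to-\infty}\left[\frac{1-\frac{1}{\kappa}\sqrt{\frac{1}{x^2}+\kappa^2}}{\frac{1}{x}}\right] \\
&= \frac{1}{\kappa}-\lim_{z\to 0}\left[\frac{1-\frac{1}{\kappa}\sqrt{z^2+\kappa^2}}{z}\right] \\
&= \frac{1}{\kappa}-\lim_{z\to 0}\frac{-z}{\kappa\sqrt{z^2+\kappa^2}} = \frac{1}{\kappa}.
\end{align*}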
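We can conclude that the skew asymptote is given by $y = 2x + \frac{1}{\kappa}$. We will now
derive the equation of the horizontal asymptote.
\begin{align*}
\lim_{x\to+\infty}u(x) &= \lim_{x\to+\infty}\frac{1}{\kappa}\left(1+\kappa x-\sqrt{1+\kappa^2x^2}\right) \\
&= \frac{1}{\kappa}+\lim_{x\to+\infty}\frac{1}{\kappa}\left(\kappa x-\kappa x\sqrt{1+\frac{1}{\kappa^2x^2}}\right) \\
&= \frac{1}{\kappa}+\lim_{x\to+\infty}x\left(1-\sqrt{1+\frac{1}{\kappa^2x^2}}\right) \\
&= \frac{1}{\kappa}+\lim_{x\to+\infty}\frac{1-\sqrt{1+\frac{1}{\kappa^2x^2}}}{\frac{1}{x}} \\
&= \frac{1}{\kappa}+\lim_{x\to+\infty}\frac{\dfrac{1}{\kappa^2x^3\sqrt{1+\frac{1}{\kappa^2x^2}}}}{\dfrac{-1}{x^2}} \\
&= \frac{1}{\kappa}+\lim_{x\to+\infty}\frac{-1}{\kappa^2x\sqrt{1+\frac{1}{\kappa^2x^2}}} \\
&= \frac{1}{\kappa}.
\end{align*}
We conclude that if x tends to infinity the utility function tends to the horizontal
asymptote with equation $y = \frac{1}{\kappa}$.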
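Both asymptotes can be checked numerically by evaluating the gap between the κ-utility and the candidate lines far out in each tail. This is a minimal sketch with κ = 2 and hypothetical evaluation points.

```python
import math

def kappa_utility(x, kappa):
    """u(x) from equation B.7."""
    return (1 + kappa * x - math.sqrt(1 + (kappa * x) ** 2)) / kappa

kappa = 2.0
for x in (1e1, 1e2, 1e3, 1e4):
    gap_left = kappa_utility(-x, kappa) - (2 * (-x) + 1 / kappa)   # skew asymptote
    gap_right = kappa_utility(x, kappa) - 1 / kappa                # horizontal asymptote
    print(x, gap_left, gap_right)
```

Both gaps shrink towards zero as x grows, as expected for asymptotes.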
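The class of utility functions 5.36 is defined for all κ > 0. We will now show what
happens when κ tends to zero and κ tends to +∞.
\begin{align*}
\lim_{\kappa\to+\infty}u(x) &= \lim_{\kappa\to+\infty}\frac{1}{\kappa}\left(1+\kappa x-\sqrt{1+\kappa^2x^2}\right) \\
&= \lim_{\kappa\to+\infty}\left(\frac{1}{\kappa}+x-\sqrt{\frac{1}{\kappa^2}+x^2}\right) \\
&= x-\lim_{\kappa\to+\infty}\sqrt{\frac{1}{\kappa^2}+x^2} \\
&= x-\sqrt{x^2}.
\end{align*}
\begin{align*}
\lim_{\kappa\to 0}u(x) &= \lim_{\kappa\to 0}\frac{1}{\kappa}\left(1+\kappa x-\sqrt{1+\kappa^2x^2}\right) \\
&= \lim_{\kappa\to 0}\left(x-\frac{\kappa x^2}{\sqrt{1+\kappa^2x^2}}\right) \\
&= x.
\end{align*}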
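B.2.2 Computation of the divergence function
Fix some κ > 0, then the associated loss function is given by l(x) = −u(−x):
\[
l(x) = -\frac{1}{\kappa}\left(1-\kappa x-\sqrt{1+\kappa^2x^2}\right). \tag{B.8}
\]
The associated divergence function is given by $\phi(t) = l^*(t)$:
\[
\phi(t) = \sup_{x\in\mathbb{R}}\left(xt+\frac{1}{\kappa}\left(1-\kappa x-\sqrt{1+\kappa^2x^2}\right)\right). \tag{B.9}
\]
The first order condition yields that
\[
t-\frac{1}{\kappa}\left(\kappa+\frac{\kappa^2x}{\sqrt{1+\kappa^2x^2}}\right) = 0. \tag{B.10}
\]
We will only derive the divergence for t > 0. First assume that t ≥ 2. Then there
does not exist an x ∈ R such that equation B.10 holds. We can see this easily
when we rewrite this equation as
\[
t = 1+\frac{\kappa x}{\sqrt{1+\kappa^2x^2}}. \tag{B.11}
\]
Because $\kappa x < \sqrt{1+\kappa^2x^2}$ for all x ∈ R and κ > 0 we have that $\frac{\kappa x}{\sqrt{1+\kappa^2x^2}} < 1$.
Hence if equation B.11 would hold for some x and some κ, then t < 2. If t ≥ 2
then the first derivative $t-\frac{1}{\kappa}\left(\kappa+\frac{\kappa^2x}{\sqrt{1+\kappa^2x^2}}\right) > 0$. This implies the function
$xt+\frac{1}{\kappa}\left(1-\kappa x-\sqrt{1+\kappa^2x^2}\right)$ is increasing. To determine the supremum over all x
of this function we will study the limit when x tends to +∞. First consider the
case when t = 2. We have that
\begin{align*}
\lim_{x\to+\infty}\left(2x+\frac{1}{\kappa}\left(1-\kappa x-\sqrt{1+\kappa^2x^2}\right)\right) &= \frac{1}{\kappa}+\lim_{x\to+\infty}\left(x-\frac{1}{\kappa}\sqrt{1+\kappa^2x^2}\right) \\
&= \frac{1}{\kappa}+\lim_{x\to+\infty}x\left(1-\sqrt{1+\frac{1}{\kappa^2x^2}}\right) \\
&= \frac{1}{\kappa}+\lim_{x\to+\infty}\frac{1-\sqrt{1+\frac{1}{\kappa^2x^2}}}{\frac{1}{x}} \\
&= \frac{1}{\kappa}+\lim_{x\to+\infty}\frac{-1}{\kappa^2x\sqrt{1+\frac{1}{\kappa^2x^2}}} \\
&= \frac{1}{\kappa}.
\end{align*}
We conclude that
\[
\phi(2) = \frac{1}{\kappa}. \tag{B.12}
\]
Now consider the case where t > 2. Calculating the limit gives us
\begin{align*}
\lim_{x\to+\infty}\left(tx+\frac{1}{\kappa}\left(1-\kappa x-\sqrt{1+\kappa^2x^2}\right)\right) &= \frac{1}{\kappa}+\lim_{x\to+\infty}\left((t-1)x-\frac{1}{\kappa}\sqrt{1+\kappa^2x^2}\right) \\
&= \frac{1}{\kappa}+\lim_{x\to+\infty}x\left((t-1)-\sqrt{1+\frac{1}{\kappa^2x^2}}\right) \\
&= +\infty\cdot(t-1-1) \\
&= +\infty.
\end{align*}
We find that
\[
\phi(t) = +\infty, \qquad t > 2. \tag{B.13}
\]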
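When 0 < t < 2 equation B.11 can hold for some x. We will now determine this
x as a function of t. We find that
\begin{align*}
t = 1+\frac{\kappa x}{\sqrt{1+\kappa^2x^2}}
&\Rightarrow (t-1) = \frac{\kappa x}{\sqrt{1+\kappa^2x^2}} \\
&\Rightarrow (t-1)^2 = \frac{\kappa^2x^2}{1+\kappa^2x^2} \\
&\Rightarrow \frac{1}{(t-1)^2} = \frac{1+\kappa^2x^2}{\kappa^2x^2} \\
&\Rightarrow \frac{1}{(t-1)^2}-1 = \frac{1}{\kappa^2x^2} \\
&\Rightarrow \pm\sqrt{\frac{1}{(t-1)^2}-1} = \frac{1}{\kappa x} \\
&\Rightarrow x = \frac{\pm 1}{\kappa\sqrt{\frac{1}{(t-1)^2}-1}}.
\end{align*}
In what follows we will use the following notations:
\begin{align*}
x_+ &= \frac{1}{\kappa\sqrt{\frac{1}{(t-1)^2}-1}} = \frac{\sqrt{(t-1)^2}}{\kappa\sqrt{1-(t-1)^2}} \\
x_- &= \frac{-1}{\kappa\sqrt{\frac{1}{(t-1)^2}-1}} = \frac{-\sqrt{(t-1)^2}}{\kappa\sqrt{1-(t-1)^2}} \\
\phi^+(t) &= x_+t+\frac{1}{\kappa}\left(1-\kappa x_+-\sqrt{1+\kappa^2x_+^2}\right) \\
\phi^-(t) &= x_-t+\frac{1}{\kappa}\left(1-\kappa x_--\sqrt{1+\kappa^2x_-^2}\right).
\end{align*}
We remark that if t = 1 then $x_+ = x_- = 0$. Hence for t = 1 we have that
$\phi(1) = \phi^+(1) = \phi^-(1) = \frac{1}{\kappa}(1-1) = 0$.
The second order condition follows from a straightforward calculation:
\[
\frac{d}{dx}\left(t-\frac{1}{\kappa}\left(\kappa+\frac{\kappa^2x}{\sqrt{1+\kappa^2x^2}}\right)\right) = \frac{d}{dx}\left(\frac{-\kappa x}{\sqrt{1+\kappa^2x^2}}\right) = \frac{-\kappa}{\left(1+\kappa^2x^2\right)^{\frac{3}{2}}} < 0.
\]
To calculate the divergence we need to calculate
\[
\phi(t) = \max\left\{\phi^+(t),\phi^-(t)\right\}. \tag{B.14}
\]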
q
1
φ (t) = x+ t +
1 − κx+ − 1 + κ2 x2+
κ
s
!
p
p
(t − 1)2 t
κ (t − 1)2
(t − 1)2
1
= p
1− p
+
− 1+
(1 − (t − 1)2 )
κ 1 − (t − 1)2 κ
κ 1 − (t − 1)2
!
p
p
(t − 1)2 t
(t − 1)2
1
1
1− p
= p
+
−p
κ 1 − (t − 1)2 κ
1 − (t − 1)2
1 − (t − 1)2
!
p
p
(t − 1)2 t − (t − 1)2 − 1
1
1
p
+
=
κ
κ
1 − (t − 1)2
!
p
(t − 1)2 (t − 1) − 1
1
1
p
=
+
κ
κ
1 − (t − 1)2
p
We need to distinguish two
cases
when
t
>
1
we
have
that
(t − 1)2 = (t − 1)
p
and when t < 1 we have (t − 1)2 = −(t − 1).
Hence if t > 1 we have
!
2
(t − 1) − 1
1
1
p
+
φ+ (t) =
2
κ
κ
1 − (t − 1)
!
−1
1 − (t − 1)2
1
p
=
+
κ
κ
1 − (t − 1)2
p
1
−1
1 − (t − 1)2 + .
=
κ
κ
And if t < 1 we have
+
1
φ+ (t) =
κ
−(t − 1)2 − 1
p
1 − (t − 1)2
!
1
+ .
κ
We’ll now calculate φ− (t).
q
1
−
2
2
φ (t) = x− t +
1 − κx− − 1 + κ x−
κ
s
!
p
p
κ (t − 1)2
− (t − 1)2 t
1
(t − 1)2
1+ p
= p
+
− 1+
(1 − (t − 1)2 )
κ 1 − (t − 1)2 κ
κ 1 − (t − 1)2
!
p
p
(t − 1)2
− (t − 1)2 t
1
1
= p
+
1+ p
−p
κ 1 − (t − 1)2 κ
1 − (t − 1)2
1 − (t − 1)2
!
p
p
1 − (t − 1)2 t + (t − 1)2 − 1
1
p
=
+
κ
κ
1 − (t − 1)2
!
p
1
1 − (t − 1)2 (t − 1) − 1
p
=
+
κ
κ
1 − (t − 1)2
117
We again distinguish two cases. For t > 1 we find that
1
φ− (t) =
κ
−(t − 1)2 − 1
p
1 − (t − 1)2
!
1
+ .
κ
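and for $t < 1$ we have
\[
\begin{aligned}
\phi^-(t) &= \frac{1}{\kappa}\left(\frac{(t-1)^2 - 1}{\sqrt{1 - (t-1)^2}}\right) + \frac{1}{\kappa} \\
&= \frac{-1}{\kappa}\sqrt{1 - (t-1)^2} + \frac{1}{\kappa}.
\end{aligned}
\]
If $t > 1$ we have that $\phi^+(t) \geq \phi^-(t)$, and for $t < 1$ we find that $\phi^-(t) \geq \phi^+(t)$. We also have that
\[
\lim_{t \nearrow 2} \left(\frac{-1}{\kappa}\sqrt{1 - (t-1)^2} + \frac{1}{\kappa}\right) = \frac{1}{\kappa}.
\]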
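The ordering of $\phi^+$ and $\phi^-$ can be checked directly by subtracting the two general expressions obtained before the case distinction. The following worked step is a sketch, assuming $\kappa > 0$ and $0 < t < 2$:
\[
\phi^+(t) - \phi^-(t)
= \frac{1}{\kappa} \cdot \frac{\left(\sqrt{(t-1)^2}\,(t-1) - 1\right) - \left(-\sqrt{(t-1)^2}\,(t-1) - 1\right)}{\sqrt{1 - (t-1)^2}}
= \frac{2\sqrt{(t-1)^2}\,(t-1)}{\kappa \sqrt{1 - (t-1)^2}}.
\]
Under these assumptions the sign of the difference is the sign of $t - 1$: it is nonnegative for $t \geq 1$ and nonpositive for $t \leq 1$, which gives the two inequalities.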
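We can conclude that
\[
\phi(t) =
\begin{cases}
\dfrac{-1}{\kappa}\sqrt{1 - (t-1)^2} + \dfrac{1}{\kappa} & \text{if } 0 \leq t \leq 2, \\[2ex]
+\infty & \text{if } t > 2.
\end{cases}
\]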
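The closed form can be sanity-checked numerically. The sketch below is illustrative only and not part of the original derivation: it assumes $\kappa > 0$ and $0 < t < 2$, and the names objective, derivative, phi_numeric and phi_closed_form are hypothetical helpers introduced here. It maximises $x t + \frac{1}{\kappa}\left(1 - \kappa x - \sqrt{1 + \kappa^2 x^2}\right)$ over $x$ by bisection on the first derivative, which is strictly decreasing by the second order condition, and compares the result with the formula for $\phi(t)$.

```python
import math


def objective(x, t, kappa):
    """Value of x*t + (1/kappa) * (1 - kappa*x - sqrt(1 + kappa^2 * x^2))."""
    return x * t + (1.0 - kappa * x - math.sqrt(1.0 + (kappa * x) ** 2)) / kappa


def derivative(x, t, kappa):
    """First derivative of the objective with respect to x."""
    return t - 1.0 - kappa * x / math.sqrt(1.0 + (kappa * x) ** 2)


def phi_numeric(t, kappa, bound=1e6, iterations=200):
    """Maximise the objective by bisection on its decreasing derivative.

    Assumption: 0 < t < 2 and the maximiser lies inside [-bound, bound].
    """
    lo, hi = -bound, bound
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if derivative(mid, t, kappa) > 0.0:
            lo = mid  # objective still increasing, maximiser is to the right
        else:
            hi = mid  # objective no longer increasing, maximiser is to the left
    return objective(0.5 * (lo + hi), t, kappa)


def phi_closed_form(t, kappa):
    """Closed form -(1/kappa) * sqrt(1 - (t-1)^2) + 1/kappa for 0 <= t <= 2."""
    return (1.0 - math.sqrt(1.0 - (t - 1.0) ** 2)) / kappa


if __name__ == "__main__":
    for kappa in (0.5, 2.0):
        for t in (0.1, 0.5, 1.0, 1.5, 1.9):
            print(kappa, t, phi_numeric(t, kappa), phi_closed_form(t, kappa))
```

Bisection is used instead of a library optimiser only to keep the sketch free of dependencies. For $t > 2$ the derivative is positive for every $x$, so the objective is unbounded above, in line with the value $+\infty$.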
Bibliography
[1] C. Acerbi, D. Tasche, On the coherence of expected shortfall, Journal of Banking
and Finance, Vol. 26, Issue 7, 2002, 1487-1503.
[2] C. Acerbi, Spectral measures of risk: A coherent representation of subjective
risk aversion, Journal of Banking and Finance, Vol. 26, 2002, 1505-1518.
[3] A. Ahmadi-Javid, Entropic Value-at-Risk: A new coherent risk measure, J
Optim Theory Appl, Vol. 155,Issue 3, 2012, 1105-1123.
[4] A. Ben-Tal, A. Ben-Israel, A recourse certainty equivalent for decisions under
uncertainty., Annals of Operations Research, Vol. 30, Issue 1, 1991, 1-44.
[5] A. Ben-Tal,M. Teboulle, An old-new concept of convex risk measures: the optimised certainty equivalent., Mathematical Finance, Vol. 17, Issue 3, 2007,
449-476.
[6] J.M. Borwein, A.S. Lewis, Partially finite convex programming, Part I: Quasi
relative interiors and duality theory, Mathematical Programming, Vol. 57,
1992, 15-48.
[7] J.M. Borwein, D.R. Luke, Duality an convex programming, Handbook of Mathematical Methods in Imaging, edited by Scherzer and Otmar, Springer, 1992,
229-270.
[8] A. Chen, A. Pelsser, M. Vellekoop, Modelling non-monotone risk aversion using
SAHARA utility functions, Journal of Economic Theory, Vol. 146, Issue 5,
2011, 2075-2092.
[9] E.R. Csetnek, Overcoming the failure of classical generalized interior-point regularity conditions in convex optimisation., Logos verlag Berlin GmbH, 2010.
[10] S. Drapeau, M. Kupper, A. Papapantoleon, A Fourier Approach to the Computation of CV@R and Optimized Certainty Equivalents, Journal of Risk, Vol.
16, 2013, 3-29.
[11] H. Föllmer, A. Schied, Stochastic finance: An introduction in discrete time,
De Gruyter, 2010.
[12] H. Föllmer, A. Schied, Convex and coherent risk measures, unpublished paper.
[13] H. Föllmer, A. Schied Convex measures of risk and trading constraints., Finance and Stochastics, Vol. 6, Issue 4, 2002, 429-447.
119
[14] G. C. Goodwin, M. M. Seron, J. A. de Doná , Constrained Control and Estimation: An Optimisation Approach, Springer Science and Business Media,
2006.
[15] V. Henderson, D. Hobson, Utility indifference Pricing: an Overview, Volume
on Indifference Pricing, 2004.
[16] O. Hernandez-Lerma, J. B. Lasserre Markov Chains and Invariant Probabilities, Birkhäuser, 2012.
[17] V.Jose,R.Nau,R.Winkler, Scoring Rules, Generalized Entropy and Utility
maximisation, Operations Research, Vol. 56, Issue 5, 2008, 1146-1157.
[18] T. Knispel, H. Föllmer, Convex Risk Measures: Basic Facts, Law-invariance
and beyond, Asymptotics for Large Portfolios, Handbook of the Fundamentals
of Financial Decision Making: In 2 Parts, edited by L. MacLean, W. Ziemba,
World scientific, 2013, 507-555.
[19] H. Levy, Y. Kroll, Ordering Uncertain Options with Borrowing and Lending,
The Journal of Finance, Vol. 33, Issue 2, 1978, 553-574.
[20] A.Mas-Collell,M. Whinston,J. Green, Microeconomic theory, Oxford University Press, 1995.
[21] K. Martin, C.T. Ryan, M. Stern, The Slater Conundrum: Duality and Pricing
in Infinite Dimensional Optimization., SIAM. J. OPtim., Vol. 26, Issue 1, 2016,
111-138.
[22] R.Nau, R.Jose, R. Winkler, Duality between maximization of expected utility
and minimization of Relative entropy when probabilities are imprecise, Int.
Symp. on imprecise probability, 2009.
[23] J. Pontstein, Approaches in the theory of optimisation, Cambridge University
Press, 1980.
[24] R. Raskin, M. Cochran, Interpretations and transformations of scale for
the pratt-arrow absolute risk aversion coefficient: implications for generalized
stochastic dominance, Western Journal of Agricultural Economics Vol.11, Issue
2, 1986, 204-210.
[25] R.T. Rockafellar, Convex Analysis, Princeton University press, 1970.
[26] L. Rogers, D. Williams Diffusions, markov processes and martingales: Volume
1: Foundations, Cambridge University Press, 2000.
[27] W.Rudin, Principles of Mathematical Analysis , Third edition, Mc.Graw-Hill,
1964.
[28] Y. Syau, A note on convex functions,International J. Math. and Math. Sci.,
Vol.22, 1998, 525-534.
120
[29] Y.Yamai, T. Yoshiba, Comparative analyses of expected shortfall and value
at risk: expected utility maximisation and tail risk., Monetary and Economic
Studies, 2002, 95-116.
121