Further Topics in Mathematical Finance - Lesson 6

MTH6120 Further Topics in Mathematical Finance
Lesson 6
Contents
3 Valuation by expected utility
3.1 Expected utility
3.1.1 St. Petersburg Paradox
3.1.2 Cramer’s resolution of the St. Petersburg Paradox
3.1.3 Utility functions
3.1.4 How are utility functions used
3.2 The portfolio selection problem
3.2.1 Definition of the problem
3.2.2 Calculation of the expected utility
3.2.3 Calculating the variance and expectation of the payoff
3.2.4 Optimising the expected utility

3 Valuation by expected utility
We have seen that valuation by expected pay-off is very useful in pricing financial derivatives such as options. However, this expectation is essentially a mathematical “trick” used to translate the notion of no-arbitrage, i.e. a way of enforcing that all bets or tradeable assets are in some sense “compatible”. This fact is expressed by stating that the probability we use is the “risk-neutral probability” and not the “real world probability” (footnote 1).
One could also consider expectations in other aspects of finance, this time based on the normal, real world probabilities, as a way of measuring the expected success of a financial investment.
In this chapter we will explore how this can be done and an important new consideration it gives rise to: the notion of a utility function, a way of rescaling amounts of money to better reflect the value they have for an investor.
3.1 Expected utility
We will start by examining a well-known example that will serve as motivation for subsequent definitions.
Footnote 1: You might find this confusing the first time. Recall the situation where we defined probabilities from the odds of a dice game: p1 = 1/(1 + o1), . . . , p6 = 1/(1 + o6). Note that these “probabilities” have nothing to do with the physical qualities of the dice (they are not the real world probabilities); they are just a way of codifying the prices that bookmakers are making, and are called risk-neutral probabilities.
3.1.1 St. Petersburg Paradox
Nicholas Bernoulli proposed the following problem in 1713. Consider a bet whereby:
• You toss a coin. If you get heads you receive £1.
• Otherwise toss the coin again. If you get heads you receive £2.
• Otherwise toss the coin again. If you get heads you receive £4.
• Otherwise toss the coin again. If you get heads you receive £8.
• ...
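Written mathematically, the payoff is
\[
\text{Payoff} =
\begin{cases}
\pounds 1 & \text{if we get H. Probability} = \frac{1}{2} \\
\pounds 2 & \text{if we get TH. Probability} = \frac{1}{2^2} \\
\pounds 2^2 & \text{if we get TTH. Probability} = \frac{1}{2^3} \\
\quad\vdots & \\
\pounds 2^n & \text{if we get } \underbrace{\text{T}\cdots\text{T}}_{n}\text{H. Probability} = \frac{1}{2^{n+1}} \\
\quad\vdots &
\end{cases}
\]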
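Note that you always make money, so this bet is surely worth something. A way of estimating its value is to look at the expected payoff.
This is easy to calculate:
\[
\begin{aligned}
E(\text{Payoff}) &= \pounds 1\cdot\frac{1}{2} + \pounds 2\cdot\frac{1}{2^2} + \pounds 2^2\cdot\frac{1}{2^3} + \pounds 2^3\cdot\frac{1}{2^4} + \cdots \\
&= \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \cdots \\
&= \pounds\infty.
\end{aligned}
\tag{1}
\]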
This would suggest that this bet is worth £∞. Would you pay £100,000 for the privilege
of playing this game? Most people would not. Why?
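As a rough illustration of why the sum in (1) diverges, the following sketch computes the expected payoff when the game is stopped after a fixed number of tosses. The function name `truncated_expected_payoff` and the stopping rule are assumptions made for this example only; they are not part of the original problem.

```python
from fractions import Fraction


def truncated_expected_payoff(max_tosses: int) -> Fraction:
    """Expected payoff (in pounds) if the game stops after max_tosses tosses.

    The first head on toss k (k = 1, 2, ...) pays 2**(k-1) pounds and
    happens with probability 1 / 2**k, so every term contributes 1/2.
    """
    return sum(
        (Fraction(2 ** (k - 1), 2 ** k) for k in range(1, max_tosses + 1)),
        Fraction(0),
    )


# The truncated expectation grows without bound as more tosses are allowed.
for n in (10, 100, 1000):
    print(n, float(truncated_expected_payoff(n)))
```

Each extra toss allowed adds exactly £1/2 to the expectation, which is the term-by-term content of (1).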
3.1.2 Cramer’s resolution of the St. Petersburg Paradox
Gabriel Cramer (footnote 2) recognised that the problem with the calculation above was the inordinate contribution made by very large payoffs, such as £$2^{25}$ = £33,554,432, weighted with a very small probability, $1/2^{26} \approx 0.0000015\%$. In 1728 he made the following observation: a normal person will perceive in the same way gaining £$2^{24}$, £$2^{25}$, £$2^{26}$ or any larger quantity of cash (footnote 3).
Therefore, in order to evaluate the bet in the St. Petersburg Paradox we might as well replace all payoffs larger than £$2^{24}$ by £$2^{24}$.
Footnote 2: This is the same Cramer responsible for Cramer’s rule for solving linear equations using determinants.
Footnote 3: This is the Bill Gates effect, which states that Bill Gates would not improve his lifestyle in any way were he to receive $10 billion in addition to his current fortune.
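With this modified notion we estimate the price of the bet by modifying the calculation in (1) as follows:
\[
\begin{aligned}
\text{Modified expectation of Payoff}
&= \pounds 1\cdot\frac{1}{2} + \pounds 2\cdot\frac{1}{2^2} + \cdots + \pounds 2^{24}\cdot\frac{1}{2^{25}} + \pounds 2^{24}\cdot\frac{1}{2^{26}} + \pounds 2^{24}\cdot\frac{1}{2^{27}} + \cdots \\
&= \underbrace{\frac{1}{2} + \cdots + \frac{1}{2}}_{25\text{ times}} + \frac{2^{24}}{2^{26}}\left(1 + \frac{1}{2} + \frac{1}{2^2} + \cdots\right) \\
&= 12.5 + \frac{1}{4}\cdot 2 = \pounds 13,
\end{aligned}
\]
where we have evaluated $1 + \frac{1}{2} + \frac{1}{2^2} + \cdots$ using the geometric series formula: $r^a + r^{a+1} + \cdots = r^a/(1 - r)$ for $|r| < 1$.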
According to this calculation the bet in the St. Petersburg paradox is not worth more than
£13 to a normal person. This is in sharp contrast with our earlier evaluation of £∞ based on
the expectation of payoff.
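The capped calculation can be reproduced with exact arithmetic. The sketch below is only an illustration: the function name `capped_expected_payoff` is an assumption of this example, and the cap defaults to the £2^24 used in the text.

```python
from fractions import Fraction

CAP = 2 ** 24  # Cramer's cap on the payoff, as used in the text


def capped_expected_payoff(cap: int = CAP) -> Fraction:
    """Expected payoff when every payoff above `cap` is replaced by `cap`."""
    total = Fraction(0)
    k = 1  # toss on which the first head appears
    while 2 ** (k - 1) <= cap:
        # Uncapped term: payoff 2**(k-1) with probability 1/2**k.
        total += Fraction(2 ** (k - 1), 2 ** k)
        k += 1
    # Tail: payoff `cap` with probability 1/2**k + 1/2**(k+1) + ... = 1/2**(k-1).
    total += Fraction(cap, 2 ** (k - 1))
    return total


print(capped_expected_payoff())  # the modified expectation, in pounds
```

With the default cap the 25 uncapped terms contribute 12.5 and the geometric tail contributes 0.5, matching the hand calculation.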
Cramer’s idea is, in essence, to change the value we attribute to money to account for its utility. In the words of Cramer:
“Mathematicians estimate money in proportion to its quantity, and men of sense in proportion to the usage they make of it.”
3.1.3 Utility functions
After Cramer, Daniel Bernoulli (footnote 4) elaborated on the idea of utility attached to an amount of cash (see FYI section at the end). His ideas give rise to the following definition.
Definition 3.1. A utility function is an increasing function u(x) that encodes the value to an investor of having x units of cash.
A utility function normally looks like the one plotted in Figure 1. Typically, the larger x is, the more slowly u(x) increases. A function with this property is called concave.
Definition 3.2. A function f is called concave if its slope decreases, that is, if for all x in the domain of f and all ε > 0:
\[
\frac{f(x + \varepsilon) - f(x)}{\varepsilon} \le \frac{f(x) - f(x - \varepsilon)}{\varepsilon}.
\]
When f is twice differentiable this is equivalent to $f''(x) \le 0$ for all x.
Remark 3.3. Note that each investor will have a different utility function that encodes how
much satisfaction different amounts of cash can bring.
Footnote 4: Daniel Bernoulli was Nicholas Bernoulli’s cousin. The Bernoullis were a family of very prolific mathematicians.
[Figure 1: Sample utility function. Horizontal axis: x, with −£10, £1, £2, £100 and £101 marked; vertical axis: the corresponding values u(1), u(2), u(100) and u(101).]
Definition 3.4.
1. A utility function that is concave is called risk-averse.
2. A utility function that is linear is called risk-neutral.
3. A utility function that is convex is called risk-seeking.
Remarks 3.5.
1. By far most utility functions used are concave, i.e. risk-averse.
2. The reason why a concave utility function is called risk-averse is that, as in Figure 1, the additional value of one unit of cash, when we already have an amount, is diminished. Also, as the value of x decreases, the utility u(x) goes down more steeply, making losses weigh more heavily.
3. The reason why a linear utility function is called risk-neutral is that it is a simple rescaling of cash (e.g. converting dollars to cents) and does not weigh the risk of losses and gains differently.
4. The reason why a convex utility function is called risk-seeking is that it will minimise the negative value of losses and magnify the value of gains. One could say that this is the utility function implicitly used by gamblers who value very highly a slim chance of a large profit against likely losses.
Examples 3.6.
1. The function originally considered by Daniel Bernoulli is u(x) = log(x).
Note that this is a concave (i.e. risk-averse) utility function as $u''(x) = -1/x^2 \le 0$.
An advantage of this function is that it can be described in economic terms as follows: the additional utility gained from an extra unit of cash, ε, is inversely proportional to the money we already have, x. The mathematical justification for this is:
\[
u(x + \varepsilon) - u(x) \sim u'(x)\,\varepsilon = \frac{1}{x}\,\varepsilon
\]
where we have used the approximation $(u(x + \varepsilon) - u(x))/\varepsilon \sim u'(x)$ and the fact that $u'(x) = \frac{1}{x}$.
A problem with the utility function log(x) is that it cannot be used with negative values of x, which represent debt.
2. A utility function that is often used, as it applies to both positive and negative values of x, is
\[
u(x) = 1 - e^{-\beta x} \quad \text{for } \beta > 0.
\]
Note that $u'(x) = \beta e^{-\beta x}$ and so, as β > 0, this is positive and u is, as required, increasing. Also, $u''(x) = -\beta^2 e^{-\beta x} < 0$ so that u is a concave, i.e. risk-averse, utility function.
For example, for β = 1 we have:
• u(−10) ≈ −22,025, so the value of a £10 debt is greatly magnified.
• u(10) ≈ 0.999955 and u(11) ≈ 0.999983, which means the extra value of £1 when we already have £10 is very small.
3. Cramer’s explanation of the St. Petersburg paradox implicitly uses the following utility function
\[
u(x) =
\begin{cases}
x & \text{if } x \le 2^{24} \\
2^{24} & \text{if } x > 2^{24}.
\end{cases}
\]
Is this function concave?
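The numbers quoted for the exponential utility, and the slope condition of Definition 3.2, can be checked numerically. The sketch below is an illustration only: the function names and the grid-based concavity check are assumptions of this example, and a check on finitely many points is of course not a proof of concavity.

```python
import math


def log_utility(x: float) -> float:
    """Daniel Bernoulli's utility u(x) = log(x), defined for x > 0 only."""
    return math.log(x)


def exp_utility(x: float, beta: float = 1.0) -> float:
    """Exponential utility u(x) = 1 - exp(-beta * x), with beta > 0."""
    return 1.0 - math.exp(-beta * x)


def slopes_decrease(u, points, eps: float) -> bool:
    """Check the inequality of Definition 3.2 at the given points.

    Returns True if (u(x+eps) - u(x))/eps <= (u(x) - u(x-eps))/eps at every x.
    """
    return all(
        (u(x + eps) - u(x)) / eps <= (u(x) - u(x - eps)) / eps
        for x in points
    )


print(exp_utility(-10.0), exp_utility(10.0), exp_utility(11.0))
print(slopes_decrease(exp_utility, [-2.0, 0.0, 3.0], eps=0.1))
print(slopes_decrease(log_utility, [1.0, 2.0, 5.0], eps=0.1))
```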
3.1.4 How are utility functions used
Utility functions allow us to rescale money according to its financial impact. We will replace
the expectation of payoff
E (Payoff)
by the expectation of utility
E (u(Payoff))
and use this to measure the benefit of an investment.
Example 3.7. An investor has a capital of X and is considering investing a fraction α of it in a risky investment that will either double with probability p, or become zero with probability q = 1 − p. If the investor’s risk preferences are defined by the utility function u(x) = log(x), what fraction, α, of their capital should they invest?
If αX is the invested amount then (1 − α)X is the uninvested amount. The payoff is therefore:
\[
\text{Payoff} =
\begin{cases}
(1 - \alpha)X + 2\alpha X = (1 + \alpha)X & \text{with probability } p \\
(1 - \alpha)X + 0\cdot\alpha X = (1 - \alpha)X & \text{with probability } q = 1 - p.
\end{cases}
\]
According to the discussion above the optimal choice of investment is that which maximizes the expected utility
\[
E(u(\text{Payoff})).
\]
Note that, as the payoff depends on α, this will be a function of α:
\[
\begin{aligned}
f(\alpha) = E(u(\text{Payoff})) &= u((1 + \alpha)X)\,p + u((1 - \alpha)X)\,q \\
&= \log((1 + \alpha)X)\,p + \log((1 - \alpha)X)\,q \\
&= \log(X) + \log(1 + \alpha)\,p + \log(1 - \alpha)\,q.
\end{aligned}
\]
To maximize this we take the derivative of f(α):
\[
f'(\alpha) = \frac{1}{1 + \alpha}\,p + \frac{-1}{1 - \alpha}\,q = \frac{p(1 - \alpha) - q(1 + \alpha)}{1 - \alpha^2} = \frac{(p - q) - \alpha}{1 - \alpha^2}.
\]
This derivative will be zero when α = p − q. It is a maximum because
\[
f''(\alpha) = \frac{-(1 - \alpha^2) - ((p - q) - \alpha)(-2\alpha)}{(1 - \alpha^2)^2}.
\]
If we substitute α = p − q we get
\[
f''(p - q) = \frac{-(1 - (p - q)^2)}{(1 - (p - q)^2)^2} = \frac{-1}{1 - (p - q)^2}.
\]
This is negative because $|p - q| = |2p - 1| < 1$ whenever 0 < p < 1. Therefore the value α = p − q maximizes the expected utility function f(α).
Note that this result is reasonable: if p is large then the risky investment is likely to pay,
therefore the optimal invested fraction, α, should go up which is consistent with the formula
α = p − q.
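The result α = p − q of Example 3.7 can be checked by brute force. The sketch below searches a grid of fractions α in [0, 1) for the one with the largest expected log-utility; the function names, the grid resolution and the choice X = 1 (the capital only shifts f by the constant log(X)) are assumptions made for this illustration.

```python
import math


def expected_log_utility(alpha: float, p: float, capital: float = 1.0) -> float:
    """f(alpha) = p*log((1+alpha)X) + q*log((1-alpha)X) from Example 3.7."""
    q = 1.0 - p
    return p * math.log((1.0 + alpha) * capital) + q * math.log((1.0 - alpha) * capital)


def best_fraction_on_grid(p: float, steps: int = 10_000) -> float:
    """Grid search over alpha in [0, 1); alpha = 1 is excluded since log(0) is undefined."""
    grid = (i / steps for i in range(steps))
    return max(grid, key=lambda a: expected_log_utility(a, p))


for p in (0.6, 0.75, 0.9):
    print(p, best_fraction_on_grid(p), p - (1.0 - p))
```

Restricting the grid to [0, 1) means that for p < 1/2, where p − q is negative, the search returns 0 rather than the unconstrained stationary point.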
The following is a slightly more realistic example.
Example 3.8. Consider the same situation as described in the previous example but where the non-invested part of the capital, (1 − α)X, is deposited for the investment period, T, in a bank account yielding an interest rate r (footnote 5). The payoff is in this case
\[
\text{Payoff} =
\begin{cases}
(1 + rT)(1 - \alpha)X + 2\alpha X & \text{with probability } p \\
(1 + rT)(1 - \alpha)X + 0\cdot\alpha X = (1 + rT)(1 - \alpha)X & \text{with probability } q = 1 - p.
\end{cases}
\]
We apply the same logic as in the previous example. The optimal allocation of capital to the risky investment, α, is the one that maximizes the expected utility:
\[
f(\alpha) = E(u(\text{Payoff})).
\]
We calculate this:
\[
\begin{aligned}
f(\alpha) &= u((1 + rT)(1 - \alpha)X + 2\alpha X)\,p + u((1 + rT)(1 - \alpha)X)\,q \\
&= \log((1 + rT)(1 - \alpha)X + 2\alpha X)\,p + \log((1 + rT)(1 - \alpha)X)\,q \\
&= \log(X) + \log((1 + rT)(1 - \alpha) + 2\alpha)\,p + \log(1 - \alpha)\,q + \log(1 + rT)\,q \\
&= \log(X) + \log((1 + rT) + \alpha(1 - rT))\,p + \log(1 - \alpha)\,q + \log(1 + rT)\,q.
\end{aligned}
\]
To maximize this function we calculate the derivative:
\[
\begin{aligned}
f'(\alpha) &= \frac{1 - rT}{(1 + rT) + \alpha(1 - rT)}\,p + \frac{-1}{1 - \alpha}\,q \\
&= \frac{p(1 - rT)(1 - \alpha) - q\,((1 + rT) + \alpha(1 - rT))}{((1 + rT) + \alpha(1 - rT))(1 - \alpha)} \\
&= \frac{(p(1 - rT) - q(1 + rT)) - (p(1 - rT) + q(1 - rT))\,\alpha}{((1 + rT) + \alpha(1 - rT))(1 - \alpha)} \\
&= \frac{(p - q - rT) - (1 - rT)\,\alpha}{((1 + rT) + \alpha(1 - rT))(1 - \alpha)}.
\end{aligned}
\]
Footnote 5: Note that in real life all interest rates are annualised. This means that a deposit starting at $1 will, at the maturity of the deposit, be worth $(1 + rT), where T is the time to maturity in years. The crude non-annualised return rT is never used, as it is a very small number for short-dated deposits and a very large number for 30-year deposits and cannot therefore be compared with usual interest rate figures.
The derivative will be zero when
\[
\alpha = \frac{p - q - rT}{1 - rT} = \frac{1 - q - q - rT}{1 - rT} = 1 - \frac{2q}{1 - rT}.
\tag{2}
\]
We leave it to the reader to verify that this is indeed a maximum and to sanity check equation
(2):
1. The formula should reduce to the one obtained in the previous example if r = 0.
2. When interest rates, r, go up we would expect it to be less useful to invest in the risky
asset. We would therefore expect α in formula (2) to go down if r goes up.
3. As in the previous example we expect that if the probability of success, in the risky
investment, p, goes up, then the optimal amount to invest will go up. Check this.
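The sanity checks above can also be explored numerically. The following sketch evaluates formula (2) and compares it with a grid search over the expected log-utility of Example 3.8; the function names, the parameter values and the choice X = 1 are assumptions made for this illustration, and the numerical agreement does not replace the verification left to the reader.

```python
import math


def optimal_fraction(p: float, r: float, T: float) -> float:
    """Formula (2): alpha = 1 - 2q / (1 - rT), with q = 1 - p."""
    q = 1.0 - p
    return 1.0 - 2.0 * q / (1.0 - r * T)


def expected_log_utility_with_deposit(alpha: float, p: float, r: float, T: float) -> float:
    """f(alpha) from Example 3.8 with capital X = 1 (X only adds the constant log(X))."""
    q = 1.0 - p
    deposit = (1.0 + r * T) * (1.0 - alpha)  # value of the uninvested part at time T
    return p * math.log(deposit + 2.0 * alpha) + q * math.log(deposit)


def grid_argmax(p: float, r: float, T: float, steps: int = 10_000) -> float:
    """Grid search over alpha in [0, 1)."""
    grid = (i / steps for i in range(steps))
    return max(grid, key=lambda a: expected_log_utility_with_deposit(a, p, r, T))


print(optimal_fraction(0.7, 0.05, 1.0), grid_argmax(0.7, 0.05, 1.0))
print(optimal_fraction(0.6, 0.0, 1.0))  # r = 0: compare with p - q from Example 3.7
```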
3.2 The portfolio selection problem
In the examples in the last section we saw how the amount we allocate to a risky investment can be chosen to maximize expected utility.
This section addresses a similar problem in a more realistic framework.
3.2.1 Definition of the problem
Assumptions.
1. We assume that an investor has a capital of ω.
2. There are n investments from which they can choose. If we allocate part of our capital, ωi, to investment i, at the end of the investment period we will have ωi(1 + Ri). Schematically:
Start of investment period: $ ωi
End of investment period: $ ωi(1 + Ri)
Note that Ri is the return on the investment as defined in section 2.5.3 Returns of the asset (footnote 6), as:
\[
\frac{\text{value at end of investment period} - \text{value at start}}{\text{value at start}} = \frac{\omega_i(1 + R_i) - \omega_i}{\omega_i} = R_i
\]
Also note that the return Ri is unknown now. It is a random variable only revealed at the end of the investment period.
3. In order to make the calculations tractable we will further assume:
(a) The returns, Ri, are normal random variables with expectation ri and variance vi²: $R_i \sim N(r_i, v_i^2)$.
(b) The utility function relevant to the investor is $u(x) = 1 - e^{-\beta x}$.
Footnote 6: Note that the plain return was denoted R′ in that section as we wished to emphasize the log-return, denoted R there. In this section we will not use log-returns; all returns will be as described above.
Definition 3.9. The vector (ω1, . . . , ωn) is called a portfolio as it determines the holdings of each of the available investments. We will assume that the total amount invested equals the total capital:
\[
\omega_1 + \cdots + \omega_n = \omega.
\]
Remark 3.10. Note that the payoff of the investment will be (footnote 7):
\[
\text{Payoff} = \omega_1(1 + R_1) + \cdots + \omega_n(1 + R_n).
\]
This again will be a random variable only revealed at the end of the investment period. Note that as the returns Ri are normal (and, as we assume throughout, jointly normal), Payoff will also be a normal random variable.
The Portfolio Selection Problem is the following.
Problem (The Portfolio Selection Problem). Find a portfolio ω1, . . . , ωn such that the expected utility of the investment
\[
E(u(\text{Payoff}))
\tag{3}
\]
is maximum.
Note that the expression (3) is a function of the variables ω1, . . . , ωn and that these variables are constrained to satisfy ω1 + · · · + ωn = ω. This is an instance of what is commonly known as a constrained optimization problem, a large mathematical field with numerous applications in industry.
3.2.2 Calculation of the expected utility
Theorem 3.11. The objective function in the Portfolio Selection Problem, E(u(Payoff)), is equal to
\[
E(u(\text{Payoff})) = 1 - e^{-\beta E(\text{Payoff}) + \frac{1}{2}\beta^2 \operatorname{Var}(\text{Payoff})}
\tag{4}
\]
Proof. We need to calculate
\[
E(u(\text{Payoff})) = 1 - E\left(e^{-\beta\,\text{Payoff}}\right)
\tag{5}
\]
In general it is difficult to calculate $E\left(e^{-\beta\,\text{Payoff}}\right)$ for an arbitrary random variable Payoff. But, as remarked above, Payoff is a normal random variable. The result now follows from the following lemma applied to X = −β Payoff. The mean and variance of X are respectively:
\[
E(X) = -\beta E(\text{Payoff}), \qquad \operatorname{Var}(X) = \beta^2 \operatorname{Var}(\text{Payoff}).
\]
Substituting this in (6) we get
\[
E\left(e^{-\beta\,\text{Payoff}}\right) = e^{-\beta E(\text{Payoff}) + \frac{1}{2}\beta^2 \operatorname{Var}(\text{Payoff})}
\]
which when inserted in (5) yields the expression in the Theorem.
Footnote 7: Note that, unlike in Chapters 1 and 2 on arbitrage, we do not keep track of any debt required to fund entering into these investments. We could either assume that the initial capital, £ω, is given, or that one of the investments, say the first, ω1, reflects that debt. We will not deal with this in this section.
The following is essentially the same as Theorem 1.12 in Lesson 1.
Lemma 3.12. If X is a normal random variable with mean a and variance b, X ∼ N(a, b), then
\[
E\left(e^{X}\right) = e^{a + \frac{1}{2}b}
\tag{6}
\]
Proof. The proof is essentially the same as that of Theorem 1.12.
Recall, as in the proof of Theorem 1.12, that by Corollary 1.10 such a random variable X ∼ N(a, b) can be written as $X = a + \sqrt{b}\,Z$ with Z ∼ N(0, 1). As in the proof of Theorem 1.12 we write the expectation as an integral:
\[
E\left(e^{a + \sqrt{b}Z}\right) = \int_{-\infty}^{+\infty} e^{a + \sqrt{b}z}\,\frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}\,dz = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{(2a + 2\sqrt{b}z - z^2)/2}\,dz.
\]
We now use completion of squares:
\[
2a + 2\sqrt{b}\,z - z^2 = -(z - \sqrt{b})^2 + b + 2a
\]
which yields
\[
E\left(e^{a + \sqrt{b}Z}\right) = e^{a + b/2} \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{-(z - \sqrt{b})^2/2}\,dz.
\]
But the integral is 1, as the expression inside the integral is the probability density function of a $N(\sqrt{b}, 1)$ random variable. This proves the lemma.
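As a numerical cross-check of Lemma 3.12, the integral in the proof can be approximated directly. The sketch below uses a simple midpoint rule on a truncated interval; the function name, the truncation width and the number of steps are assumptions of this illustration, not part of the proof.

```python
import math


def expectation_exp_normal(a: float, b: float, half_width: float = 12.0, steps: int = 20_000) -> float:
    """Approximate E(exp(X)) for X = a + sqrt(b) * Z, Z ~ N(0, 1).

    Applies the midpoint rule to the integral of
    exp(a + sqrt(b) * z) * exp(-z**2 / 2) / sqrt(2 * pi) over [-half_width, half_width].
    """
    h = 2.0 * half_width / steps
    root_b = math.sqrt(b)
    total = 0.0
    for i in range(steps):
        z = -half_width + (i + 0.5) * h
        total += math.exp(a + root_b * z) * math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
    return total * h


a, b = 0.1, 0.25
print(expectation_exp_normal(a, b), math.exp(a + b / 2.0))
```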
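Corollary 3.13. The portfolio, ω1, . . . , ωn, that maximizes expected utility maximizes the function
\[
f(\omega_1, \ldots, \omega_n) = \beta E(\text{Payoff}) - \frac{1}{2}\beta^2 \operatorname{Var}(\text{Payoff})
\]
Proof. This is a consequence of the fact that
\[
1 - e^{-a} \le 1 - e^{-b} \iff e^{-a} \ge e^{-b} \iff a \le b.
\]
The statement of the corollary follows because if $\text{Payoff}_a$ (resp. $\text{Payoff}_b$) indicates the payoff corresponding to a portfolio $\omega_1^a, \ldots, \omega_n^a$ (resp. $\omega_1^b, \ldots, \omega_n^b$) we have:
\[
\begin{aligned}
& 1 - e^{-\beta E(\text{Payoff}_a) + \frac{1}{2}\beta^2 \operatorname{Var}(\text{Payoff}_a)} \le 1 - e^{-\beta E(\text{Payoff}_b) + \frac{1}{2}\beta^2 \operatorname{Var}(\text{Payoff}_b)} \\
\iff\ & \beta E(\text{Payoff}_a) - \frac{1}{2}\beta^2 \operatorname{Var}(\text{Payoff}_a) \le \beta E(\text{Payoff}_b) - \frac{1}{2}\beta^2 \operatorname{Var}(\text{Payoff}_b).
\end{aligned}
\]
According to this corollary we just need to focus on maximizing the function
\[
f(\omega_1, \ldots, \omega_n) = \beta E(\text{Payoff}) - \frac{1}{2}\beta^2 \operatorname{Var}(\text{Payoff})
\tag{7}
\]
subject to the constraint ω1 + · · · + ωn = ω.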
Remarks 3.14.
1. Note that the calculation in the proof of Theorem 3.11 above worked thanks to the assumptions that
(a) The returns are normal.
(b) The utility function is u(x) = 1 − exp(−βx).
Otherwise we would not be able to derive the results above.
2. Note that if the expectation of the payoff of a portfolio a is greater than the expectation of the payoff of a portfolio b whilst their variances are the same, then according to (4) the expected utility will be greater for portfolio a. This is consistent with our intuition that if a and b have the same variance and a has a higher expected payoff, then a should be preferred.
3. Likewise, if two portfolios a and b offer the same expected payoff but a has a lower variance, then (4) says the expected utility will be larger for portfolio a. This again corresponds with the intuition that if two investments offer the same average payoff we should prefer the one that is less risky (has smaller variance).
3.2.3 Calculating the variance and expectation of the payoff
We saw in the previous section that in order to find the portfolio that optimises expected utility
\[
E(u(\text{Payoff})) = 1 - e^{-\beta E(\text{Payoff}) + \frac{1}{2}\beta^2 \operatorname{Var}(\text{Payoff})}
\]
we need to maximize the function
\[
f(\omega_1, \ldots, \omega_n) = \beta E(\text{Payoff}) - \frac{1}{2}\beta^2 \operatorname{Var}(\text{Payoff}).
\]
In this section we will calculate this quantity.
Theorem 3.15. The expectation of Payoff is
\[
E(\text{Payoff}) = \omega_1(1 + r_1) + \cdots + \omega_n(1 + r_n)
\]
where ri is defined to be the expectation of Ri.
Proof. The proof is trivial:
\[
E(\text{Payoff}) = E\left(\sum_{i=1}^{n} (1 + R_i)\,\omega_i\right) = \sum_{i=1}^{n} (1 + E(R_i))\,\omega_i.
\]
Note that as the ωi are non-random they can be taken outside the expectations.
The next calculation requires us to use the notions of covariance and correlation. What follows is a reminder of these concepts.
Remarks 3.16. As homework, make sure that you prove any of the statements provided below without proof.
1. Recall that the covariance of two random variables X and Y is defined by:
\[
\operatorname{Cov}(X, Y) = E(XY) - E(X)E(Y).
\]
It is an easy exercise to see that this is equal to:
\[
E\big((X - E(X))(Y - E(Y))\big)
\]
Covariance measures whether the random variables X and Y move generally in the same direction (Cov > 0), in unrelated directions (Cov = 0), or in opposite directions (Cov < 0). To see this intuitively note that the covariance will be positive if whenever
(a) X ends up above its expectation E(X) then Y also ends up above its expectation E(Y); as then (X − E(X))(Y − E(Y)) > 0
(b) X ends up below its expectation E(X) then Y also ends up below its expectation E(Y); as then (X − E(X))(Y − E(Y)) is also positive.
Likewise, the covariance is negative if whenever
(a) X ends up above its expectation E(X) then Y ends up below its expectation E(Y); as then (X − E(X))(Y − E(Y)) < 0
(b) X ends up below its expectation E(X) then Y ends up above its expectation E(Y); as then (X − E(X))(Y − E(Y)) is also negative.
2. The covariance measures dependency between X and Y: if X and Y are independent then Cov(X, Y) = 0. The converse does not hold in general.
3. If X = Y we get Cov(X, X) = Var(X).
4. From the definition you can easily derive:
(a) Cov(X, Y) = Cov(Y, X)
(b) Cov(X1 + X2, Y) = Cov(X1, Y) + Cov(X2, Y)
(c) Cov(aX, Y) = a Cov(X, Y)
By the symmetry property (a) it is easy to derive properties like (b) and (c) for the second argument:
(b’) Cov(X, Y1 + Y2) = Cov(X, Y1) + Cov(X, Y2)
(c’) Cov(X, aY) = a Cov(X, Y)
From properties (a), (b), (c), (b’), and (c’) you can derive:
\[
\operatorname{Cov}(a_1X_1 + a_2X_2,\ b_1Y_1 + b_2Y_2) = a_1b_1\operatorname{Cov}(X_1, Y_1) + a_1b_2\operatorname{Cov}(X_1, Y_2) + a_2b_1\operatorname{Cov}(X_2, Y_1) + a_2b_2\operatorname{Cov}(X_2, Y_2).
\]
By writing a bit more you can conclude that in general:
\[
\operatorname{Cov}\left(\sum_{i=1}^{n} a_iX_i,\ \sum_{j=1}^{m} b_jY_j\right) = \sum_{i=1}^{n}\sum_{j=1}^{m} a_ib_j\operatorname{Cov}(X_i, Y_j)
\]
5. The properties mentioned above suggest that the covariance has similar properties to the scalar product. Likewise the variance has similar properties to the norm squared of a vector. In fact the covariance is a scalar product in some complicated infinite dimensional space of random variables. We will not cover this. It is just mentioned in case it helps you learn the properties of the covariance by analogy to the scalar product.
6. The size of the covariance depends on the sizes of X and Y. So if X, instead of measuring an amount in dollars, is measured in cents, then the covariance will be scaled up by 100. A number that is better scaled is the correlation:
\[
\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}.
\]
It can be seen that $\rho_{XY} \in [-1, 1]$ (footnote 8).
Footnote 8: In the analogy discussed above the correlation is the cosine of the angle between two vectors. Recall the formula for the cosine of the angle α between two vectors v and w: $\cos(\alpha) = \frac{\langle v, w\rangle}{\|v\|\,\|w\|}$.
7. If X = Y then the correlation is ρXY = +1 = +100%. The variables are perfectly correlated.
8. If X = −Y then the correlation is ρXY = −1 = −100%. The variables are perfectly anticorrelated.
We separate out as a lemma some key properties that we will use below:
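Lemma 3.17.
1. For a random variable X and real number a we have that:
\[
\operatorname{Var}(X + a) = \operatorname{Var}(X)
\tag{8}
\]
2. For random variables X, Y we have:
\[
\operatorname{Var}(X + Y) = \operatorname{Var}(X) + 2\operatorname{Cov}(X, Y) + \operatorname{Var}(Y)
\tag{9}
\]
3. For random variables X1, . . . , Xn we have:
\[
\operatorname{Var}(X_1 + \cdots + X_n) = \sum_{i=1}^{n}\sum_{j=1}^{n} \operatorname{Cov}(X_i, X_j)
\tag{10}
\]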
Remarks 3.18.
1. Note that in expression (10) the terms with i = j are just variances as
Cov(Xi , Xi ) = Var(Xi ). Note also that all other terms appear twice as Cov(Xi , Xj ) =
Cov(Xj , Xi ).
2. Formula (9) is the same as (10) for n = 2. It has just been added to aid your understanding of (10). Make sure you understand precisely how (9) is a special case of
(10).
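Proof.
1. The proof of (8) is standard but we will provide it for completeness. The proof consists of writing the definition and calculating:
\[
\begin{aligned}
\operatorname{Var}(X + a) &= E\left((X + a)^2\right) - (E(X + a))^2 \\
&= E\left(X^2 + 2aX + a^2\right) - (E(X) + a)^2 \\
&= E\left(X^2\right) + 2aE(X) + a^2 - \left(E(X)^2 + 2aE(X) + a^2\right) \\
&= E\left(X^2\right) - E(X)^2 = \operatorname{Var}(X)
\end{aligned}
\]
2. The proof of (9) is very similar to the proof of (10) and we will leave it as an exercise which will help you understand the proof below.
3. To prove (10) we just write the definition of variance and calculate:
\[
\begin{aligned}
\operatorname{Var}\left(\sum_{i=1}^{n} X_i\right) &= E\left(\left(\sum_{i=1}^{n} X_i\right)^2\right) - \left(E\left(\sum_{i=1}^{n} X_i\right)\right)^2 \\
&= E\left(\sum_{i=1}^{n}\sum_{j=1}^{n} X_iX_j\right) - \left(\sum_{i=1}^{n} E(X_i)\right)^2 \\
&= \sum_{i=1}^{n}\sum_{j=1}^{n} E(X_iX_j) - \sum_{i=1}^{n}\sum_{j=1}^{n} E(X_i)E(X_j) \\
&= \sum_{i=1}^{n}\sum_{j=1}^{n} \big(E(X_iX_j) - E(X_i)E(X_j)\big) = \sum_{i=1}^{n}\sum_{j=1}^{n} \operatorname{Cov}(X_i, X_j).
\end{aligned}
\]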
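Identity (10) also holds exactly for the empirical distribution of any data set, which gives a quick way to see it in action. The sketch below uses a small made-up data set (the numbers are hypothetical and chosen only for illustration) and population-style moments, i.e. dividing by the number of observations.

```python
def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Cov(X, Y) = E(XY) - E(X)E(Y) under the empirical distribution."""
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)


def var(xs):
    """Var(X) = Cov(X, X)."""
    return cov(xs, xs)


# Hypothetical observations of three variables X1, X2, X3 (one list per variable).
samples = [
    [1.0, 2.0, 4.0, 7.0],
    [0.5, -1.0, 3.0, 2.0],
    [2.0, 2.5, -1.0, 0.0],
]

# Observations of the sum X1 + X2 + X3.
total = [sum(values) for values in zip(*samples)]

lhs = var(total)                                               # Var(X1 + X2 + X3)
rhs = sum(cov(xi, xj) for xi in samples for xj in samples)     # double sum of covariances
print(lhs, rhs)
```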
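Theorem 3.19. The variance of payoff is
\[
\operatorname{Var}(\text{Payoff}) = \sum_{i=1}^{n}\sum_{j=1}^{n} \omega_i\omega_j v_i v_j \rho_{ij}
\]
where vi² is the variance of the ith return Ri and ρij is the correlation between the ith and the jth returns.
Proof. This is an easy consequence of the lemma:
\[
\begin{aligned}
\operatorname{Var}(\text{Payoff}) &= \operatorname{Var}\left(\sum_{i=1}^{n} \omega_i(1 + R_i)\right) = \operatorname{Var}\left(\sum_{i=1}^{n} \omega_iR_i\right) && \text{by (8)} \\
&= \sum_{i=1}^{n}\sum_{j=1}^{n} \operatorname{Cov}(\omega_iR_i, \omega_jR_j) && \text{by (10)} \\
&= \sum_{i=1}^{n}\sum_{j=1}^{n} \omega_i\omega_j \operatorname{Cov}(R_i, R_j) && \text{by Remarks 3.16.4 (c) and (c')} \\
&= \sum_{i=1}^{n}\sum_{j=1}^{n} \omega_i\omega_j v_i v_j \rho_{ij} && \text{from the definition of correlation, Remarks 3.16.6}
\end{aligned}
\]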
Remark 3.20. For example, in the case n = 2 we have
\[
\begin{aligned}
E(\text{Payoff}) &= (1 + r_1)\omega_1 + (1 + r_2)\omega_2 \\
\operatorname{Var}(\text{Payoff}) &= \omega_1\omega_1 v_1 v_1 \rho_{11} + \omega_1\omega_2 v_1 v_2 \rho_{12} + \omega_2\omega_1 v_2 v_1 \rho_{21} + \omega_2\omega_2 v_2 v_2 \rho_{22} \\
&= \omega_1^2 v_1^2 + 2\omega_1\omega_2 v_1 v_2 \rho_{12} + \omega_2^2 v_2^2
\end{aligned}
\tag{11}
\]
Note the simplifications in the expression for the variance arising from the fact that ρ11 = ρ22 = 1 (see Remark 3.16.7) and ρ12 = ρ21.
Conclusion. From the previous two theorems we see that in order to find the portfolio ω1, . . . , ωn which maximizes the expected utility we need to find the maximum of the following function
\[
f(\omega_1, \ldots, \omega_n) = \beta \sum_{i=1}^{n} \omega_i(1 + r_i) - \frac{1}{2}\beta^2 \sum_{i=1}^{n}\sum_{j=1}^{n} \omega_i\omega_j v_i v_j \rho_{ij}
\tag{12}
\]
where we presume that we know the variables r1, . . . , rn, v1, . . . , vn, ρij. Note this is a quadratic function in its variables ω1, . . . , ωn.
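The function (12) is straightforward to evaluate once the inputs are known. The following sketch is one possible way to code it; the function names and the numerical inputs (holdings, expected returns, volatilities, correlation and β) are hypothetical values chosen only to illustrate Theorems 3.15 and 3.19.

```python
import math


def expected_payoff(w, r):
    """Theorem 3.15: E(Payoff) = sum_i w_i * (1 + r_i)."""
    return sum(wi * (1.0 + ri) for wi, ri in zip(w, r))


def payoff_variance(w, v, rho):
    """Theorem 3.19: Var(Payoff) = sum_i sum_j w_i w_j v_i v_j rho_ij."""
    n = len(w)
    return sum(w[i] * w[j] * v[i] * v[j] * rho[i][j] for i in range(n) for j in range(n))


def objective(w, r, v, rho, beta):
    """Function (12): beta * E(Payoff) - 0.5 * beta**2 * Var(Payoff)."""
    return beta * expected_payoff(w, r) - 0.5 * beta ** 2 * payoff_variance(w, v, rho)


def expected_utility(w, r, v, rho, beta):
    """Formula (4): E(u(Payoff)) = 1 - exp(-objective)."""
    return 1.0 - math.exp(-objective(w, r, v, rho, beta))


# Hypothetical two-asset example.
w = [60.0, 40.0]                   # holdings, summing to a capital of 100
r = [0.05, 0.10]                   # expected returns
v = [0.10, 0.30]                   # standard deviations of the returns
rho = [[1.0, 0.2], [0.2, 1.0]]     # correlation matrix
beta = 0.01

print(expected_payoff(w, r), payoff_variance(w, v, rho), objective(w, r, v, rho, beta))
```

For n = 2 the double sum reduces to the three-term expression in (11), which is what the test values below were computed from by hand.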
3.2.4 Optimising the expected utility
We will maximize the function (12) in full detail in the case n = 2. In this case, by (11), the function to be maximized is
\[
f(\omega_1, \omega_2) = \beta\big((1 + r_1)\omega_1 + (1 + r_2)\omega_2\big) - \frac{1}{2}\beta^2\left(\omega_1^2 v_1^2 + 2\omega_1\omega_2 v_1 v_2 \rho_{12} + \omega_2^2 v_2^2\right).
\]
Bearing in mind that the sum of the investment amounts is constrained to be the total capital, ω1 + ω2 = ω, we see that we just need to maximize the one-variable quadratic function:
\[
g(\omega_1) = \beta\big((1 + r_1)\omega_1 + (1 + r_2)(\omega - \omega_1)\big) - \frac{1}{2}\beta^2\left(\omega_1^2 v_1^2 + 2\omega_1(\omega - \omega_1) v_1 v_2 \rho_{12} + (\omega - \omega_1)^2 v_2^2\right).
\]
But the maximum or minimum of a parabola ax² + bx + c is found (footnote 9) at $x = \frac{-b}{2a}$, so to find the maximum of the parabola above we just need to find the coefficient, a, of ω1² and the coefficient, b, of ω1 and then calculate $\frac{-b}{2a}$. By inspecting the function above we see that
\[
\begin{aligned}
a &= -\frac{1}{2}\beta^2\left(v_1^2 - 2v_1v_2\rho_{12} + v_2^2\right) \\
b &= \beta\big((1 + r_1) - (1 + r_2)\big) - \frac{1}{2}\beta^2\left(2\omega v_1v_2\rho_{12} - 2v_2^2\omega\right) \\
&= \beta(r_1 - r_2) + \beta^2\left(v_2^2 - v_1v_2\rho_{12}\right)\omega
\end{aligned}
\]
Therefore the maximum of the quadratic function above is located at
\[
\omega_1 = \frac{-b}{2a} = \frac{b}{-2a} = \frac{\beta(r_1 - r_2) + \beta^2\left(v_2^2 - v_1v_2\rho_{12}\right)\omega}{\beta^2\left(v_1^2 - 2v_1v_2\rho_{12} + v_2^2\right)} = \frac{\frac{1}{\beta}(r_1 - r_2) + \left(v_2^2 - v_1v_2\rho_{12}\right)\omega}{v_1^2 - 2v_1v_2\rho_{12} + v_2^2}.
\]
To check that the parabola g(ω1) actually has a maximum there and not a minimum, we just need to look at the coefficient of ω1², which we have called a, and check that it is negative (footnote 10). But, as ρ12 ≤ 1, we have that:
\[
a = -\frac{1}{2}\beta^2\left(v_1^2 - 2v_1v_2\rho_{12} + v_2^2\right) \le -\frac{1}{2}\beta^2\left(v_1^2 - 2v_1v_2 + v_2^2\right) = -\frac{1}{2}\beta^2(v_1 - v_2)^2 \le 0
\]
The inequality is strict, a < 0, except in degenerate cases such as ρ12 = 1 with v1 = v2, which we exclude.
We have therefore proved:
Theorem 3.21. In the case n = 2 the portfolio, (ω1, ω2), subject to the constraint ω1 + ω2 = ω, for which the expected utility is maximized is:
\[
\omega_1 = \frac{(r_1 - r_2)/\beta + \left(v_2^2 - v_1v_2\rho_{12}\right)\omega}{v_1^2 - 2v_1v_2\rho_{12} + v_2^2},
\qquad
\omega_2 = \frac{(r_2 - r_1)/\beta + \left(v_1^2 - v_1v_2\rho_{12}\right)\omega}{v_1^2 - 2v_1v_2\rho_{12} + v_2^2}.
\]
Proof. The only part that remains to be proved is the expression for ω2, but this follows from the constraint ω1 + ω2 = ω as
\[
\begin{aligned}
\omega_2 = \omega - \omega_1 &= \omega - \frac{(r_1 - r_2)/\beta + \left(v_2^2 - v_1v_2\rho_{12}\right)\omega}{v_1^2 - 2v_1v_2\rho_{12} + v_2^2} \\
&= \frac{\omega\left(v_1^2 - 2v_1v_2\rho_{12} + v_2^2\right) - (r_1 - r_2)/\beta - \left(v_2^2 - v_1v_2\rho_{12}\right)\omega}{v_1^2 - 2v_1v_2\rho_{12} + v_2^2} \\
&= \frac{(r_2 - r_1)/\beta + \left(v_1^2 - 2v_1v_2\rho_{12} + v_2^2\right)\omega - \left(v_2^2 - v_1v_2\rho_{12}\right)\omega}{v_1^2 - 2v_1v_2\rho_{12} + v_2^2} \\
&= \frac{(r_2 - r_1)/\beta + \left(v_1^2 - v_1v_2\rho_{12}\right)\omega}{v_1^2 - 2v_1v_2\rho_{12} + v_2^2}
\end{aligned}
\]
Footnote 9: Prove this!
Footnote 10: For a parabola $g(\omega_1) = a\omega_1^2 + b\omega_1 + c$ the second derivative is $g''(\omega_1) = 2a$. So, as stated above, to check that an extreme point is a maximum we just need to check that $g''(\omega_1) = 2a < 0$.
Remark 3.22.
1. As usual we need to sanity check the formulæ in the theorem to make sure they make sense:
(a) If the expected return from investment 1, r1, grows and nothing else changes, then from the formulæ above we see that ω1 grows and ω2 goes down. This makes intuitive sense as it means that investment 1 becomes a better investment. Likewise with r2 growing.
(b) Let us examine the case in which correlations are zero. Then the formulæ above simplify to:
\[
\omega_1 = \frac{(r_1 - r_2)/\beta + v_2^2\,\omega}{v_1^2 + v_2^2},
\qquad
\omega_2 = \frac{(r_2 - r_1)/\beta + v_1^2\,\omega}{v_1^2 + v_2^2}.
\]
Here it is easy to see that if v1 goes up and all other variables remain fixed, then ω1 will go down (provided the numerator of ω1 is positive). As ω1 + ω2 remains constant, ω2 will go up. This is consistent with the fact that asset 1 is becoming a riskier (more volatile) asset and therefore less preferred for investment.
2. Note that we have not imposed the constraints ω1 ≥ 0, ω2 ≥ 0. It can well happen that in the formulæ above either ω1 or ω2 ends up being negative. Addressing this would require additional analysis that we will not develop in this course.
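The formulæ of Theorem 3.21 and the sanity checks above can be tried out numerically. The sketch below is an illustration under assumed inputs: the function names and all parameter values are hypothetical, and the degenerate case in which the denominator vanishes is simply rejected.

```python
def optimal_two_asset_portfolio(r1, r2, v1, v2, rho12, beta, capital):
    """Theorem 3.21: holdings (w1, w2) maximizing expected utility for n = 2."""
    denom = v1 ** 2 - 2.0 * v1 * v2 * rho12 + v2 ** 2
    if denom <= 0.0:
        raise ValueError("degenerate case: the quadratic has no strict maximum")
    w1 = ((r1 - r2) / beta + (v2 ** 2 - v1 * v2 * rho12) * capital) / denom
    return w1, capital - w1


def restricted_objective(w1, r1, r2, v1, v2, rho12, beta, capital):
    """The one-variable function g(w1) obtained after substituting w2 = capital - w1."""
    w2 = capital - w1
    mean = (1.0 + r1) * w1 + (1.0 + r2) * w2
    variance = w1 ** 2 * v1 ** 2 + 2.0 * w1 * w2 * v1 * v2 * rho12 + w2 ** 2 * v2 ** 2
    return beta * mean - 0.5 * beta ** 2 * variance


# Hypothetical inputs: expected returns, volatilities, correlation, beta, capital.
params = dict(r1=0.05, r2=0.10, v1=0.10, v2=0.30, rho12=0.2, beta=0.01, capital=100.0)
w1, w2 = optimal_two_asset_portfolio(**params)
print(w1, w2)

# Moving away from w1 in either direction should not increase g.
print(restricted_objective(w1, **params) >= restricted_objective(w1 + 1.0, **params))
print(restricted_objective(w1, **params) >= restricted_objective(w1 - 1.0, **params))
```

Nothing in this sketch enforces ω1 ≥ 0 or ω2 ≥ 0, in line with the second remark above.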
Progress Check
1. What is the St. Petersburg paradox? Why is it a paradox?
2. How did Cramer resolve this paradox?
3. Consider a bet with the same payoff as in the St. Petersburg paradox but where the coin is biased and comes up heads with 75% probability.
4. What did Daniel Bernoulli invent in order to better understand the idea behind Cramer’s resolution of the St. Petersburg Paradox?
5. What is the Bill Gates effect?
6. What is a risk-averse utility function? Draw it. Why is it so called? What type of investor would use this type of utility function to make investment decisions?
7. What is a risk-seeking utility function? Draw it. Why is it so called? What type of investor would use this type of utility function to make investment decisions?
8. Is the function
\[
u(x) =
\begin{cases}
x & \text{if } x < 2^{25} \\
2^{25} & \text{if } x \ge 2^{25}
\end{cases}
\]
a utility function? Is it a risk-averse utility function? How does it relate to the explanations in the text?
It is suggested you discuss these questions with your colleagues to reinforce your understanding of the lesson. You should try to answer these questions without your lecture notes to check whether you have learnt the material.
9. What is a concave function?
10. Is the function f (x) = α(x − a)(x − b) concave? Draw its graph?
11. What would be the expected payoff of the bet in 3 above, if, following Cramer’s
ideas, you cap the maximum payoff at 220 ?
12. What is the payoff of a bet where you can gain £10,000 with probability 50% and
lose the same amount with the same probability. Should you take this bet if you are
a risk averse investor?
13. Is f (x) = 1 − exp(−βx) a risk averse utility function? Prove it. Plot this function.
14. What is the portfolio selection problem?
15. What is the payoff function in the portfolio selection problem?
16. Prove that with the assumptions of the portfolio selection problem we have
1 2
E (u (Payoff)) = 1 − exp −βE (Payoff) + β Var (Payoff)
2
17. In the formula above: Does the expected utility of the payoff, E (u (Payoff)), go up
or down if if the expected payoff goes up and the variance of the payoff stays the
same?
18. In the formula above: Does the expected utility of the payoff, E (u (Payoff)), go up
or down if if the variance of the payoff goes up and the expected payoff stays the
same?
19. Why is it necessary in the calculation for question 16 to assume that the returns are
normal and that the utility function is 1 − exp(−βx)?
20. Prove the unproven statements in Remark 3.16:
(a) Cov (X, Y ) = E (X − E (X)) E (Y − E (Y ))
(b) If X and Y are independent then Cov(X, Y ) = ρXY = 0.
(c) Var(X) = Cov(X, X).
(d) Cov(X, Y ) = Cov(Y, X).
(e) Cov(aX, Y ) = Cov(X, aY ) = aCov(X, aY ).
(f) Cov(X1 + X2 , Y ) = Cov(X1 , Y ) + Cov(X2 , Y ) and Cov(X, Y1 + Y2 ) =
Cov(X, Y1 ) + Cov(X, Y2 ).
(g) Cov(a1 X1 + a2 X2 , b1 Y1 + b2 Y2 ) = a1 b1 Cov(X1 , Y1 ) + a1 b2 Cov(X1 , Y2 ) +
a1 b1 Cov(X2 , Y1 ) + a2 b2 Cov(X2 , Y2 )
(h) If X = Y then ρXY = 1. If X = −Y then ρXY = −1.
Additionally prove the following easy facts:
(i) If X = a is a constant random variable (so not random), then Cov (X, Y ) = 0.
(j) If a, b, c, d ∈ R then Cov (aX + b, cY + d) = acCov (X, Y ).
(k) Var (aX + bY ) = a^2 Var(X) + 2abCov(X, Y ) + b^2 Var(Y ).
(l) Var (X + Y ) = Var(X) + 2Cov(X, Y ) + Var(Y ).
21. Write down the function f (ω1 , . . . , ωn ) that we need to optimise in order to find the
optimal portfolio under the assumptions made in the lesson. Explain what these
assumptions are.
22. Does the function f (ω1 , . . . , ωn ) increase or decrease if one of the expected returns
ri goes up? Why is this intuitively reasonable?
23. Write down the function f (ω1 , . . . , ωn ) that we need to optimise in the
portfolio selection problem in the special case in which all correlations ρij are zero
for i ≠ j (remember that ρii = 1 for all i). Does the function f (ω1 , . . . , ωn ) increase
or decrease if one of the volatilities vi goes up? Why is this intuitively reasonable?
24. Write down the function f (ω1 , ω2 ) that we need to optimise in order to find the
optimal portfolio in the case n = 2.
25. Prove that the maximum or minimum of a parabola f (x) = ax^2 + bx + c is located
at x = −b/(2a) and that it is a maximum if a < 0 and a minimum if a > 0. Show that
this value of x is the average of the two roots of the quadratic equation ax^2 + bx + c = 0.
Draw a picture that demonstrates why this is the case.
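Before attempting the proof in question 16, it can help to check the closed form numerically. The sketch below is not part of the lesson: it assumes a normally distributed payoff with hypothetical parameters mu, sigma and a hypothetical risk-aversion beta, integrates the utility 1 − exp(−βx) against the normal density with a simple trapezoid rule, and compares the result with the formula. The function names are illustrative only.

```python
from math import exp, pi, sqrt


def expected_utility_numeric(mu, sigma, beta, steps=20_000):
    """Trapezoid-rule estimate of E[1 - exp(-beta*X)] for X ~ N(mu, sigma^2)."""
    lo, hi = mu - 12 * sigma, mu + 12 * sigma  # tails beyond this are negligible
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        density = exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * (1 - exp(-beta * x)) * density
    return total * h


def expected_utility_formula(mu, sigma, beta):
    """Closed form from question 16: 1 - exp(-beta*E + beta^2 * Var / 2)."""
    return 1 - exp(-beta * mu + 0.5 * beta ** 2 * sigma ** 2)


# Hypothetical values chosen only for illustration.
mu, sigma, beta = 0.08, 0.20, 3.0
print(expected_utility_numeric(mu, sigma, beta))
print(expected_utility_formula(mu, sigma, beta))
```

Changing mu or sigma one at a time in this sketch is also a quick way to experiment with questions 17 and 18.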
More interesting facts
1. This is the first page of the 1738 article by Daniel Bernoulli Specimen theoriæ novæ
de mensura sortis (Exposition of a new theory on the measurement of risk ([DB])),
where the notion of utility function is first introduced.
2. You can find more information on the St. Petersburg paradox here:
https://en.wikipedia.org/wiki/St._Petersburg_paradox
3. In our lecture notes, following Gabriel Cramer, we assumed the bet in the St. Petersburg
paradox could not pay more than 2^25 because we assumed that more cash
would add no more utility. The amount M = 2^25 could also be interpreted as
the maximum cash that your counterparty could ever pay. We could find the
value of the St. Petersburg bet capped with different values of M . Here are some
values from Wikipedia:
Banker               Bankroll (M)        Expected payoff
Friendly game        $100                $7.5625
Millionaire          $1,000,000          $20.90734
Billionaire          $1,000,000,000      $30.86264
Bill Gates (2015)    $79,200,000,000     $37.15251
U.S. GDP (2007)      $13.8 trillion      $44.57
World GDP (2007)     $54.3 trillion      $46.54
Notice how much smaller the values of the capped bets are compared with the original
uncapped Bernoulli bet, whose expected payoff is infinite.
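The capped values in the table can be reproduced with a few lines of code. The sketch below assumes the convention used in the Wikipedia article (the bet pays 2^k if the first head appears on toss k, and the banker pays at most M); the version of the bet in the lecture notes may be normalised differently, so treat this as an illustration and not as the notes' own calculation. The function name is hypothetical.

```python
def capped_st_petersburg_value(bankroll: float) -> float:
    """Expected payoff when the banker can pay at most `bankroll` (>= 1).

    Assumed convention: the payoff is 2**k if the first head is on toss k.
    """
    # Largest k with 2**k <= bankroll: payoffs up to toss k are paid in full.
    full_rounds = int(bankroll).bit_length() - 1
    # Each fully paid toss contributes (1/2**k) * 2**k = 1 to the expectation.
    # All later outcomes pay the whole bankroll; their total probability
    # is 1/2**full_rounds.
    return full_rounds + bankroll / 2 ** full_rounds


bankrolls = {
    "Friendly game": 100,
    "Millionaire": 1_000_000,
    "Billionaire": 1_000_000_000,
    "Bill Gates (2015)": 79_200_000_000,
    "U.S. GDP (2007)": 13.8e12,
    "World GDP (2007)": 54.3e12,
}
for banker, m in bankrolls.items():
    print(f"{banker:<18} {capped_st_petersburg_value(m):.5f}")
```

The value grows only like the logarithm of M, which is why even a bankroll the size of world GDP gives an expected payoff below $50.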
This is not examinable content but will help your understanding of the course.
4. Portfolio selection is big business: it is the job of fund managers and pension funds.
These have some capital and need to carefully balance expected return and risk in
order to serve their customers. Portfolio allocation is the more sophisticated version
of the “60% in bonds and 40% in equity” rule that you might have heard of.
5. The riskiness of bonds is assessed by rating companies like Moody's, S&P and Fitch.
They attach a combination of letters to a bond, ranging from AAA (triple A, the lowest risk)
down to D (for institutions very likely to default; such bonds are called high-risk bonds
or junk bonds). See
https://en.wikipedia.org/wiki/Bond_credit_rating
for more information. Regulators often require portfolio managers in key investment
sectors, such as pension funds, to invest only in bonds with a rating above
a certain threshold, to prevent excessive risk being taken in search of a higher average
return.
6. In times of crisis portfolio allocation techniques can have unintended consequences.
If some asset becomes too volatile, fund managers will reduce investment in that
asset to limit their risk taking (and thus maximize their utility function). If a large
part of the investor community does the same then the ensuing selling activity might
further depress the valuation of an asset.
7. The covariance of random variables leads to a scalar product in certain infinite dimensional spaces of random variables. These spaces, called Hilbert spaces, are often used
in probability and analysis, and often allow one to show that complicated equations,
such as certain Partial Differential Equations, have solutions by using techniques of
Euclidean geometry such as orthogonal projections. The study of infinite dimensional
vector spaces used in analysis and probability is called Functional Analysis;
it includes notions such as Hilbert spaces, Banach spaces, Banach algebras, and
topological vector spaces. See https://en.wikipedia.org/wiki/Functional_analysis
for more information on this very interesting field.
8. As discussed above, given a degree 2 polynomial, P (x), the zero of its derivative, x′1 ,
is equal to the average of the two roots of P (x), x1 and x2 . Note that in order for
all these numbers to always make sense we work on the complex plane C.
For higher degree polynomials we have much more sophisticated phenomena.
For example, a degree 3 polynomial, P (x), has three complex roots, x1 , x2 and
x3 , whereas its derivative P ′(x) has two roots, x′1 and x′2 . How do the two points
x′1 and x′2 sit in the complex plane, C, in relation to the roots x1 , x2 and x3 ?
The Gauss-Lucas theorem (valid for any degree) states that the zeroes of the
derivative, x′1 and x′2 , sit inside the polygon (in this case a triangle) defined
in the complex plane by the zeroes of the polynomial, x1 , x2 and x3 . See
https://en.wikipedia.org/wiki/Gauss%E2%80%93Lucas_theorem.
A very surprising theorem of Marden states that in the case of degree 3, the points
x′1 and x′2 are the foci of the ellipse inscribed inside the triangle defined by x1 , x2
and x3 and tangent to this triangle at the midpoints of each side. See
https://en.wikipedia.org/wiki/Marden%27s_theorem for more information and a picture.
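Both statements in item 8 can be checked numerically for a particular cubic. The sketch below is an illustration under assumptions of our own: the three roots are hypothetical, and all function names are made up for this example. It computes the zeroes of the derivative, tests that they lie inside the triangle of roots (Gauss-Lucas), and then tests a necessary consequence of Marden's theorem: if the two points are the foci of an ellipse passing through the three side midpoints, then the sum of the distances from each midpoint to the two points must be the same. This is a consistency check, not a proof.

```python
import cmath


def critical_points_of_cubic(x1, x2, x3):
    """Zeroes of P'(x) for the monic cubic P(x) = (x - x1)(x - x2)(x - x3)."""
    s1 = x1 + x2 + x3                      # sum of the roots
    s2 = x1 * x2 + x1 * x3 + x2 * x3       # sum of pairwise products
    # P(x) = x^3 - s1 x^2 + s2 x - x1 x2 x3, so P'(x) = 3x^2 - 2 s1 x + s2.
    root_disc = cmath.sqrt(4 * s1 * s1 - 12 * s2)
    return (2 * s1 + root_disc) / 6, (2 * s1 - root_disc) / 6


def cross(o, a, b):
    """2-D cross product of the vectors o->a and o->b (points as complex)."""
    return ((a - o).conjugate() * (b - o)).imag


def inside_triangle(p, a, b, c, tol=1e-12):
    """True if p lies in the closed triangle with vertices a, b, c."""
    signs = (cross(a, b, p), cross(b, c, p), cross(c, a, p))
    return all(s >= -tol for s in signs) or all(s <= tol for s in signs)


def focal_distance_sums(f1, f2, a, b, c):
    """Sum of distances to f1 and f2 from each side midpoint of the triangle."""
    midpoints = ((a + b) / 2, (b + c) / 2, (c + a) / 2)
    return [abs(m - f1) + abs(m - f2) for m in midpoints]


# Hypothetical roots, chosen only for illustration.
roots = (0j, 4 + 0j, 1 + 3j)
f1, f2 = critical_points_of_cubic(*roots)
print(inside_triangle(f1, *roots), inside_triangle(f2, *roots))
print(focal_distance_sums(f1, f2, *roots))
```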
References
[IMF] Introduction to Mathematical Finance course notes, Queen Mary University, 2015/16.
[Ross] Sheldon Ross, An Elementary Introduction to Mathematical Finance, 3rd Edition, Cambridge University Press, 2011.
[DB] Daniel Bernoulli, Exposition of a new theory on the measurement of risk, English translation of
Specimen theoriæ novæ de mensura sortis, Papers Imp. Acad. Sci. St. Petersburg 5, 175–192.