Optimisation
Lecture 3 - Easter 2017
Michael Tehranchi
Lagrangian necessity
Consider the problem
minimise f(x) subject to g(x) = b, x ∈ X.
Let
L(x, λ) = f(x) + λᵀ(b − g(x))
be the Lagrangian. Notice that for any Lagrange multiplier λ ∈ Λ we have
inf_{x∈X, g(x)=b} f(x) = inf_{x∈X, g(x)=b} [f(x) + λᵀ(b − g(x))]
                       = inf_{x∈X, g(x)=b} L(x, λ)
                       ≥ inf_{x∈X} L(x, λ)
by the inclusion {x ∈ X : g(x) = b} ⊆ X. We will say that the Lagrangian method works if
there exists a Lagrange multiplier λ∗ such that there is equality, that is,
inf_{x∈X, g(x)=b} f(x) = inf_{x∈X} L(x, λ∗).
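To make this concrete, here is a small numerical sketch (a hypothetical example, not from these notes): take X = R², f(x) = x₁² + x₂², g(x) = x₁ + x₂ and b = 1. The constrained minimum is x₁ = x₂ = 1/2 with value 1/2, and the choice λ∗ = 1 achieves the equality above, as can be checked with SciPy (assuming NumPy and SciPy are available):

import numpy as np
from scipy.optimize import minimize

b = 1.0
f = lambda x: x[0] ** 2 + x[1] ** 2   # objective f(x) = x1^2 + x2^2
g = lambda x: x[0] + x[1]             # functional constraint g(x) = x1 + x2

# inf of f over {x in X : g(x) = b}
constrained = minimize(f, x0=np.zeros(2),
                       constraints={"type": "eq", "fun": lambda x: g(x) - b})

# inf over all of X of the Lagrangian L(x, lambda*) with lambda* = 1
lam = 1.0
L = lambda x: f(x) + lam * (b - g(x))
unconstrained = minimize(L, x0=np.zeros(2))

print(constrained.fun, unconstrained.fun)   # both approximately 0.5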
When does the Lagrangian method work? To answer this question, we need to define
some terms.
We say that a function ψ : Rᵐ → R has a supporting hyperplane at a point b ∈ Rᵐ if
there exists a λ ∈ Rᵐ such that
ψ(c) ≥ ψ(b) + λᵀ(c − b)
for all c ∈ Rᵐ. [Figure: a supporting hyperplane in the case m = 1.]
Now return to the optimisation problem at hand. Define a function ϕ on Rᵐ by
ϕ(c) = inf_{x∈X, g(x)=c} f(x).
The function ϕ is called the value function for the problem. Of course, we are really interested
in the case c = b, but we will now see that it is useful to let the right-hand side of our problem
vary.
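Continuing the hypothetical example above (again, not from these notes): for X = R², f(x) = x₁² + x₂² and g(x) = x₁ + x₂, the infimum subject to g(x) = c is attained at x₁ = x₂ = c/2, so ϕ(c) = c²/2. At b = 1 the affine function c ↦ ϕ(1) + 1·(c − 1) = c − 1/2 lies below ϕ everywhere, since c²/2 − (c − 1/2) = (c − 1)²/2 ≥ 0. Hence ϕ has a supporting hyperplane at b with slope 1, which is exactly the multiplier λ∗ = 1 found earlier.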
Theorem (Lagrangian necessity). The Lagrangian method works for the problem if and
only if the value function has a supporting hyperplane at b.
Proof: The Lagrangian method works iff there exists a λ such that
ϕ(b) = inf_{x∈X} [f(x) + λᵀ(b − g(x))].
The value function has a supporting hyperplane at b if and only if there exists a λ such that
ϕ(b) = inf_{c∈Rᵐ} [ϕ(c) + λᵀ(b − c)].
Therefore, the equivalence of the two hypotheses is proven by noting the equality
inf_{x∈X} [f(x) + λᵀ(b − g(x))] = inf_{c∈Rᵐ} inf_{x∈X, g(x)=c} [f(x) + λᵀ(c − g(x)) + λᵀ(b − c)]
                                = inf_{c∈Rᵐ} [ϕ(c) + λᵀ(b − c)],
where the second equality holds because on the set {x ∈ X : g(x) = c} the term λᵀ(c − g(x)) is zero and the inner infimum of f(x) is ϕ(c).
Shadow prices
We now consider another interpretation of Lagrange multipliers. We start with a little
result:
Theorem. Suppose that ψ : Rᵐ → R is differentiable and that for a fixed b ∈ Rᵐ there
exists a λ ∈ Rᵐ such that
ψ(c) ≥ ψ(b) + λᵀ(c − b)
for all c ∈ Rᵐ; that is, there is a supporting hyperplane at b. Then the gradient of ψ at b is
equal to λ, that is, ∂ψ/∂bᵢ = λᵢ for all i.
Proof. Fix a ∈ Rᵐ and ε > 0. By the supporting hyperplane assumption we have that
(ψ(b + εa) − ψ(b))/ε ≥ λᵀa.
Taking the limit as ε ↓ 0 and the assumption of differentiability yields the inequality
Σ_{i=1}^m (∂ψ/∂bᵢ) aᵢ ≥ Σ_{i=1}^m λᵢ aᵢ.
Since a was arbitrary, we could replace a with −a in the above inequality to conclude that
Σ_{i=1}^m (∂ψ/∂bᵢ) aᵢ ≤ Σ_{i=1}^m λᵢ aᵢ.
Hence there is equality. Since a was arbitrary, taking a to be the i-th standard basis vector gives
∂ψ/∂bᵢ = λᵢ for all i, as claimed.
A corollary of the above theorem and the proof of Lagrangian necessity is this:
If the Lagrangian method works and if the value function is differentiable, then the gradient
of the value function is the Lagrange multiplier.
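As a rough numerical check of this fact (reusing the hypothetical toy problem from the earlier sketch, which is not an example from the notes), one can estimate the gradient of the value function by a finite difference and compare it with the multiplier:

import numpy as np
from scipy.optimize import minimize

def phi(c):
    """Value function: inf of x1^2 + x2^2 over {x : x1 + x2 = c}, computed numerically."""
    res = minimize(lambda x: x[0] ** 2 + x[1] ** 2, x0=np.zeros(2),
                   constraints={"type": "eq",
                                "fun": lambda x: x[0] + x[1] - c})
    return res.fun

b, eps = 1.0, 1e-4
grad_estimate = (phi(b + eps) - phi(b - eps)) / (2 * eps)
print(grad_estimate)   # approximately 1.0, matching lambda* = 1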
We are now ready to give an economic interpretation of this fact. Consider a factory owner
who makes n different products out of m raw materials.
• She needs to choose an amount xᵢ of the i-th product to make, for each i = 1, . . . , n.
• Given a vector of amounts x = (x₁, . . . , xₙ)ᵀ of products to manufacture, the factory
requires the amount gⱼ(x) of the j-th raw material for j = 1, . . . , m.
• The amount of the j-th raw material available is bj .
• Only non-negative amounts of products can be produced.
• Given the amounts x of products, the profit earned is f (x).
The factory owner then tries to solve the problem to
maximise f(x) subject to g(x) ≤ b, x ≥ 0.
Now suppose that the factory owner is offered more raw material. How much should she
pay?
Let ϕ be the value function for the problem. It would be in the owner’s interest to buy
an amount ε = (ε₁, . . . , εₘ)ᵀ of raw materials if
ϕ(b + ε) − ϕ(b) ≥ cost of additional raw material.
When ε is small,
ϕ(b + ε) − ϕ(b) ≈ Σ_{j=1}^m (∂ϕ/∂bⱼ) εⱼ,
so the highest price (per unit) that the factory owner would be willing to pay for a small additional
amount of the j-th raw material is ∂ϕ/∂bⱼ. For this reason, the quantity ∂ϕ/∂bⱼ is called the
shadow price of the j-th raw material.
(Notice that if the j-th constraint is not tight, so that gⱼ(x∗) < bⱼ, then the factory owner does
not use all of the available j-th raw material. Hence its shadow price ∂ϕ/∂bⱼ is zero, since there
is no extra profit in acquiring a little more. On the other hand, the j-th slack variable zⱼ is
positive, and by complementary slackness we have λⱼ = 0. So the shadow price interpretation
makes sense in this case as well.)
We have now shown that if the Lagrangian method works and if the value function is
differentiable, then the Lagrange multipliers can be interpreted as the shadow prices of raw
materials.
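The following sketch illustrates this numerically on a hypothetical two-product, two-material factory (the data are invented for illustration, not taken from the notes); the shadow price of each raw material is estimated by perturbing its availability and re-solving the linear programme:

import numpy as np
from scipy.optimize import linprog

# Hypothetical data: profit f(x) = 3 x1 + 5 x2, material usage g(x) = A x,
# availability b = (4, 10), and only non-negative production is allowed.
A = np.array([[1.0, 0.0], [2.0, 1.0]])
b = np.array([4.0, 10.0])
c = np.array([3.0, 5.0])

def phi(b_vec):
    """Maximum profit given material availability b_vec (linprog minimises, so negate)."""
    res = linprog(-c, A_ub=A, b_ub=b_vec, bounds=[(0, None), (0, None)])
    return -res.fun

eps = 1e-6
shadow = [(phi(b + eps * np.eye(2)[j]) - phi(b)) / eps for j in range(2)]
print(shadow)   # estimated shadow prices of the two raw materials

In this instance the first raw material is not fully used at the optimum, so its estimated shadow price is zero, in line with the complementary-slackness remark above, while the second material is exhausted and has a positive shadow price.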
Sufficient conditions for the existence of a supporting hyperplane.
How can we check that the value function has a supporting hyperplane? We need to
define a few terms.
A subset C ⊆ Rn is convex if
x, y ∈ C implies θx + (1 − θ)y ∈ C for all 0 ≤ θ ≤ 1.
[Figure: an example of a convex set on the left, and a set that is not convex on the right.]
A function ψ : Rᵐ → R is convex if
ψ(θx + (1 − θ)y) ≤ θψ(x) + (1 − θ)ψ(y) for all x, y ∈ Rᵐ and 0 ≤ θ ≤ 1.
Exercise. Show that a function ψ : Rᵐ → R is convex if and only if the set
C = {(x, y) : ψ(x) ≤ y} ⊆ Rᵐ⁺¹
is convex. (The set C defined above is called the epigraph of ψ.) [Figure: the epigraph of a convex function.]
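As a small numerical illustration of these definitions (not a solution to the exercise; the test functions below are merely hypothetical examples), one can sample the convexity inequality at random points:

import numpy as np

rng = np.random.default_rng(0)

def violates_convexity(psi, m=2, trials=1000):
    """Search for points violating psi(theta x + (1-theta) y) <= theta psi(x) + (1-theta) psi(y)."""
    for _ in range(trials):
        x, y = rng.normal(size=m), rng.normal(size=m)
        theta = rng.uniform()
        lhs = psi(theta * x + (1 - theta) * y)
        rhs = theta * psi(x) + (1 - theta) * psi(y)
        if lhs > rhs + 1e-12:
            return True
    return False

print(violates_convexity(lambda x: np.dot(x, x)))    # False: ||x||^2 is convex
print(violates_convexity(lambda x: -np.dot(x, x)))   # True: -||x||^2 is not convex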
A useful and interesting (but not examinable) characterisation of convex functions is this:
Theorem. A function is convex if and only if it has a supporting hyperplane at each point.¹
Now, to answer the question posed at the start of this section: consider a minimisation
problem. The Lagrangian method works, in the sense that there exists a Lagrange multiplier
λ∗ for the problem, if the value function ϕ is convex. A problem on the example sheet is to
verify that if
(1) the set X is convex,
(2) the objective function f is convex, and
(3) the functional constraint is
• either g(x) = b and g is linear, or
• g(x) ≤ b and g is convex,
then ϕ is convex.
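Here is a rough numerical sketch of this claim on a hypothetical instance satisfying (1)-(3): a convex quadratic objective, X = R², and the convex (linear) inequality constraint x₁ + x₂ ≤ c; the value function computed on a grid is then checked for midpoint convexity:

import numpy as np
from scipy.optimize import minimize

def phi(c):
    """Value function of: minimise (x1 - 1)^2 + (x2 - 2)^2 subject to x1 + x2 <= c."""
    res = minimize(lambda x: (x[0] - 1) ** 2 + (x[1] - 2) ** 2, x0=np.zeros(2),
                   constraints={"type": "ineq", "fun": lambda x: c - x[0] - x[1]})
    return res.fun

cs = np.linspace(-2.0, 5.0, 15)
vals = np.array([phi(c) for c in cs])

# midpoint convexity on the grid: phi((c + c')/2) <= (phi(c) + phi(c'))/2
ok = all(phi((ci + cj) / 2) <= (vi + vj) / 2 + 1e-6
         for ci, vi in zip(cs, vals) for cj, vj in zip(cs, vals))
print(ok)   # True, up to numerical tolerance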
¹ One direction of the proof is easy. Suppose ψ has a supporting hyperplane at each point. Fix x, y ∈ Rᵐ
and 0 ≤ θ ≤ 1. Let b = θx + (1 − θ)y, and let λ ∈ Rᵐ be such that
ψ(c) ≥ ψ(b) + λᵀ(c − b)
for all c. Letting first c = x and then c = y in the above inequality yields
θψ(x) + (1 − θ)ψ(y) ≥ ψ(b) + λᵀ(θx + (1 − θ)y − b) = ψ(b),
showing that ψ is convex. (This might resemble the proof of Jensen's inequality you saw in IA Probability.)
The proof of the other direction is more involved in general, but is reasonably easy in the case where ψ is
twice-differentiable. Suppose ψ is convex and fix b. By the second-order mean value theorem, there exists a
0 ≤ θ ≤ 1 such that
ψ(c) = ψ(b) + λᵀ(c − b) + (1/2)(c − b)ᵀ Hψ(b∗) (c − b)
     ≥ ψ(b) + λᵀ(c − b),
where λ is the gradient of ψ at b, and Hψ(b∗) is the Hessian of ψ evaluated at the point b∗ = θb + (1 − θ)c. We
have used the fairly easy-to-prove fact that a twice-differentiable function is convex if and only if its Hessian
matrix is non-negative definite everywhere.