Basic Lagrange Multipliers
•
The Lagrange multiplier method is a mathematical technique for performing constrained optimization
of differentiable functions.
•
Recall unconstrained optimization of differentiable functions, in which we want to find the
extreme values (maximum or minimum values) of a differentiable function f(x, y).
•
In other words, we want to find the domain points (x, y) that yield the maximum or
minimum values (extrema) of the function f(x, y).
•
We determine the extrema of f by first finding the function's critical domain
points, which are points where the gradient (i.e., each of the partial derivatives) is zero.
•
These points may yield (local) maxima, (local) minima, or saddle points of the function f.
•
We then check the properties of the second derivatives, or simply inspect the function values,
to determine the function's extreme values.
•
In constrained optimization of differentiable functions, we still have a differentiable function
f(x, y) that we want to maximize or minimize.
•
But we have restrictions on the domain points that we can consider.
•
The set of allowed points is called the feasible region, and it is given by
a constraint function, typically formulated as g(x, y) = 0.
•
Example: consider an inverted paraboloid as the function to maximize, constrained by a set
of points defined by a line in the x-y plane.

[Figure: the surface to maximize, with the constraint curve drawn in the x-y plane. We consider
only the domain points that lie on the constraint curve, and read off the function's value at
each of those points.]
Example: Constrained Maximization
• Suppose you want to maximize f(x, y) = x + y, subject to the
constraint g(x, y) = x² + y² − 1 = 0. (See [1].)

[Figure: 1. The feasible set consists of points on the unit circle, plotted in the x-y plane.
2. The feasible set, also drawn dropped down by 6 (at z = −6) for visibility.
3. The function's value for each point of the feasible set.
4. The function's constrained maximum value: f(√2/2, √2/2) = √2.]
•
We can solve this constrained optimization problem using the method of substitution.
•
First, we write the formal expression for the constrained optimization (maximization):

    maximize f(x, y) = x + y   subject to   x² + y² = 1

•
Solution by substitution: on the upper half of the circle, y = √(1 − x²), so we maximize

    h(x) = x + √(1 − x²)

•
Now, set the 1st derivative to zero, and find the critical points:

    h′(x) = 1 − x / √(1 − x²) = 0   ⇒   √(1 − x²) = x   ⇒   x = √2/2

•
Substituting the critical point (x, y) = (√2/2, √2/2), as determined by inspection, into f(x, y):

    f(√2/2, √2/2) = √2
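The substitution solve above can be reproduced numerically. The following is an illustrative Python sketch (not part of the original deck), finding the root of h′(x) = 1 − x/√(1 − x²) by bisection:

```python
import math

def h_prime(x):
    # Derivative of h(x) = x + sqrt(1 - x^2)
    return 1.0 - x / math.sqrt(1.0 - x * x)

# Bisection on (0, 1): h'(0) = 1 > 0, and h'(x) -> -inf as x -> 1
lo, hi = 0.0, 1.0 - 1e-12
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if h_prime(mid) > 0:
        lo = mid
    else:
        hi = mid

x_star = 0.5 * (lo + hi)
y_star = math.sqrt(1.0 - x_star**2)
f_star = x_star + y_star
print(x_star, y_star, f_star)  # ≈ 0.7071, 0.7071, 1.4142
```

The root lands at x = √2/2 ≈ 0.7071, matching the inspection result on the slide.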
Lagrange Multiplier Method
Outline
• We now consider an alternative way to perform constrained optimization of
differentiable functions, called the Lagrange multiplier method, or just the
Lagrangian.
• We will give the basic formulation of the problem, and a procedure for solving
it using the basic Lagrange multiplier method.
• This will be followed by an intuitive derivation of the basic Lagrange
equations.
• In addition, an intuitive derivation of the generalized Lagrange multiplier
method will be given.
• Finally, the primal and dual forms of the Lagrange multiplier method will be
given. The primal and dual forms offer equivalent methods for performing
constrained optimization.
(Basic) Lagrange Multiplier Method
•
Consider a basic constrained optimization (maximization or minimization) problem:

    optimize f(x, y)   subject to   g(x, y) = 0

•
The formulation of the basic Lagrange multiplier method for this constrained optimization problem is:

    L(x, y, λ) = f(x, y) + λ g(x, y)

•
We set the partial derivatives of the Lagrangian L(x, y, λ) to zero, and then find the optimal values of
the variables x∗, y∗, λ∗ that maximize (or minimize) the function.
•
The λ is called the Lagrange multiplier.
•
Why does the basic Lagrange multiplier equation include the term λ g(x, y)?
•
We know from basic calculus that setting the function's derivatives to zero yields the function's critical
points, and then, from the critical points, we can determine the extrema of the function.
•
The constraint g(x, y) = 0 must also be satisfied, and setting the partial derivative ∂L/∂λ = 0
gives the constraint g(x, y) = 0, and so there is some intuition for including λ g(x, y)
in the basic Lagrange equation.
More Motivation for Including λ
• To obtain an intuitive appreciation for why the Lagrangian is formulated as
L(x, y, λ) = f(x, y) + λ g(x, y), consider a contour plot of the previous
optimization example:

[Figure: surface plot of f(x, y) = x + y over [−2, 2] × [−2, 2], alongside a contour plot of f.
The contour lines are the "level curves" of f; the corners (−2, 2, 0) and (2, −2, 0) lie on the
zero level curve.]
Gradient of f
•
Note that the gradient of f, ∇f = (∂f/∂x, ∂f/∂y) = (1, 1), is perpendicular to the level curves of f
and points in the direction of the maximum rate of change of the function f. Note that this direction
is given in the x-y plane.

[Figure: the gradient is parallel with the diagonal of the x-y axes.]
Contour of g
• Now consider the contour of g(x, y) = x² + y² and the constraint level curve
g(x, y) = x² + y² − 1 = 0:

[Figure: contour plot of x² + y², with the constraint level curve x² + y² = 1 highlighted.]
Gradient of g
•
The constraint level curve x² + y² = 1 is a level curve of the paraboloid z = x² + y²,
but plotted in the x-y plane.
•
The gradient of the constraint function, ∇g = (2x, 2y), is perpendicular to the level curve
x² + y² = 1 and points in the outward direction.
At Solution Points
•
Informally, notice that the slope of the tangent line of the contour of f (note: since the
contours of f = x + y are straight lines, the tangent line is the contour line itself) is
equal to the slope of the tangent line of the constraint level curve x² + y² = 1 at the
critical points (√2/2, √2/2) and (−√2/2, −√2/2).
•
Also, informally, note that at the intersection point of any other contour line of f and the
level curve x² + y² = 1, the slopes of their tangent lines appear to be different.

[Figure: contour lines of f crossing the constraint level curve x² + y² = 1; at the touching
point, the two slopes appear to be the same.]
close all;
d = linspace(-2,2,1000); [x,y] = meshgrid(d,d);
figure; hold all; grid on;
contour(x,y,x+y,50);            % level curves of f(x,y) = x + y
surf(x,y,x+y), shading interp;  % surface plot of f
theta = linspace(0,2*pi); [x1,y1] = pol2cart(theta,1.0);
plot3(x1,y1,x1+y1,'LineWidth',2,'Color','k');  % constraint curve lifted onto f
plot3(sqrt(2)/2,sqrt(2)/2,sqrt(2),'bo','markerfacecolor','b','markersize',6);  % constrained maximum
set(gca,'PlotBoxAspectRatio',[0.9221 1.0000 0.7518]);
set(gca,'GridAlpha',0.5,'GridLineStyle','--','FontSize',22);
set(gca,'ZLim',[-6 4],'XLim',[-2 2],'YLim',[-2 2]);
set(gca,'XTick',-2:1:2,'YTick',-2:1:2,'ZTick',-6:2:4);
colormap 'jet';
view(49,12) % adjust view so it is easy to see
Aside: Derivative Interpretation
"The slope of the tangent line of a curve gives the direction we should travel to stay on the curve."
•
Recall the definition of the derivative (evaluated at a point x):

    f′(x) = lim_{Δx → 0} [f(x + Δx) − f(x)] / Δx = rise / run

•
In the limit as Δx → 0 at a point x, the function must become linear at x; otherwise the
function would not be differentiable. This is because the only way the function would not be
linear as Δx → 0 would be if the function had a corner at x, and, if so, the function would
have no well-defined derivative at x, and would not be differentiable at x.
•
Starting from the point at which the derivative is taken, the slope of the tangent line
gives the rise and the run we should take to get to the next infinitesimally close point of the
function, and hence stay on the function.
Moving along the Tangent Line
• Imagine you are at one of the marked points on the constraint curve, and you make an
infinitesimally small step along the constraint curve x² + y² = 1, i.e., along the
tangent line of the curve at that point.
• Your movement will keep you on the constraint curve, since "the slope of the
tangent line of a curve gives the direction we should travel to stay on the curve."

[Figure: the constraint level curve x² + y² = 1 with the marked points.]
•
Your infinitesimally small movement will cause one of two possible outcomes:
– Move parallel with and along a level curve of f, or
– Cross over a level curve of f.
• For example: at the touching point shown, you move parallel with and along a level
curve of f; at the crossing point shown, you move across a level curve of f.
Consider an Intersection Point
Where the Tangents are Different
• In this case, a movement to the right will cross the level curve of f passing through the
point, and will touch another level curve with a larger value, which represents an increase
in f.
Cannot be an Extremum Point
• Since your movement touches another level curve of f with a larger value, while staying on
the constraint curve (i.e., your movement takes you to a valid point in the feasible region),
the point under consideration cannot be a maximum point, because the function's value at the
new point of intersection is greater.
In other words, starting from the point and
moving along the feasible region, one finds
another point in the feasible region that has
a greater function value. Therefore, the point is
not an extreme point.
Tangent Lines are Different at the Intersection Point
• Notice that the tangent line of the level curve of f at this point is different from the
tangent line of the constraint level curve x² + y² = 1 at the same point.
• Also, note that if you were to move along the tangent line of the level curve of
f, your movement would take you off the constraint level curve, and
hence take you out of the feasible region.
Conclusion Regarding the Intersection
Point
• In general, a point cannot be a critical point if the
slope of the tangent line of the constraint level
curve g(x, y) = 0 is different from the slope of the
tangent line of the objective level curve of f
at the intersection point.
If the slopes are different at an intersection point,
then that point cannot be an extremum.
Considering a Touching Point
Where the Tangents are Equal
• Previously, we considered a point where the tangent of the constraint
level curve and the tangent of a level curve of f were different.
• Next, consider where your movement takes you in the case of a touching
point, where the tangents are equal.
Considering a Touching Point
Where the Tangents are Equal
• Once again, consider moving along the tangent line of the constraint curve, but
now in more detail.
• Say you take an infinitesimally small step along the tangent
line of the constraint curve x² + y² = 1, at the touching point.
Considering a Touching Point
Where the Tangents are Equal
• This infinitesimally small step along the tangent line keeps you
on the constraint curve x² + y² = 1.
• Also, since the tangents are equal, you can move along the objective level curve
using the same step. This keeps you on the objective level curve, and does
not change the value of the function f.
A Touching Point is a Solution
When the Tangents are Equal
•
Therefore, a local extremum must occur where the tangents are equal.
•
In general, if the slopes of the tangents at a touching point of the constraint level curve
g(x, y) = 0 and an objective level curve of f are equal, then the touching point is a critical
point of the constrained extremum problem.
•
In other words:
– In a constrained maximization or minimization problem, we are constrained to finding an extremum
of f(x, y) considering only those points (x, y) that satisfy the constraint g(x, y) = 0.
– The extreme value occurs at a point where the constraint level curve is tangent to a level curve
of f: the objective level curve touches, but does not cross, the constraint level curve.
– At that point, the tangent of the constraint level curve is equal to the tangent of the
objective level curve. At this point the curves touch but do not cross.
– At this point, the extreme value of the function is the value of f there.
• Next, we use the fact that at a solution point the
slope of the tangent line of the constraint level curve
is equal to the slope of the tangent line of the
objective level curve, to derive the Lagrange
multiplier constrained optimization equation.
• We will show that the gradient of the objective
function ∇f can be written as a scalar multiple
of the gradient of the constraint function ∇g, i.e.,

    ∇f = λ ∇g
•
Known: the slope of the tangent line of the constraint level curve
is equal to the slope of the tangent line of the
objective level curve at the critical point.
Therefore their normal vectors are parallel.
•
Known: the gradient of the objective function ∇f is
perpendicular to the tangent line of the objective level curve
at the critical point. Therefore, the gradient ∇f
is the normal vector of the objective level curve
at the critical point.
•
Known: the gradient of the constraint function ∇g is
perpendicular to the tangent line of the constraint level curve at
the critical point. Therefore, the gradient ∇g is the normal
vector of the constraint level curve at the critical point.
•
This means the gradient of the objective function ∇f is
related to the gradient of the constraint function ∇g through
a scalar multiple λ.
•
Note: λ can be either positive or negative — negative for
anti-parallel gradients, positive for parallel gradients.
•
Therefore, we can write that:

    ∇f = λ ∇g
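For the running example f(x, y) = x + y with g(x, y) = x² + y² − 1, the parallel-gradient condition can be checked numerically. This is an illustrative Python sketch (not from the deck), verifying ∇f = λ∇g at the constrained maximum:

```python
import math

# Running example: f(x, y) = x + y, g(x, y) = x^2 + y^2 - 1
grad_f = lambda x, y: (1.0, 1.0)          # gradient of f
grad_g = lambda x, y: (2.0 * x, 2.0 * y)  # gradient of g

# Constrained maximum found earlier
x_s = y_s = math.sqrt(2) / 2

fx, fy = grad_f(x_s, y_s)
gx, gy = grad_g(x_s, y_s)

lam = fx / gx              # scalar multiple relating the gradients
print(lam)                 # 1/sqrt(2) ≈ 0.7071
print(abs(fy - lam * gy))  # ≈ 0: ∇f = λ∇g holds in both components
```

Both components agree with the same λ, as the geometric argument predicts.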
•
At a non-solution point, the tangent of the constraint curve is
not parallel to the tangent of the objective level curve.
•
At a solution point, the tangent of the constraint curve is
parallel to the tangent of the objective level curve.
•
At a solution point, since the tangents are parallel, the normal
of the constraint level curve and the normal of the objective
level curve are also parallel.
•
This means that at the solution point, the gradient of the
objective function is either parallel or anti-parallel to the
gradient of the constraint function.
•
This means ∇f is related to ∇g through a scalar multiple λ.
•
Note: λ can be either positive or negative — negative for
anti-parallel gradients, positive for parallel gradients.
•
Therefore, we can write that:

    ∇f = λ ∇g
Thought Experiment: Exhaustive
Search for Extrema
• Do the following for every point on the constraint level curve, i.e., for every
point in the feasible region:
– Imagine you are at a point p of the constraint level curve g(x, y) = 0.
– You take note of the value of the objective function f at point p.
– You take an infinitesimally small step along the curve to get to a point q, and so you
stay on the curve, i.e., you stay in the feasible region.
– You take note of the value of the objective function f at point q.
• If the value of f at point q is different from the value at
point p, then point p cannot be an extremum. You will note that the slopes of the
tangent lines of the level curve of f and the constraint curve at p are different.
• If the value of f at point q is the same as the value at
point p, then point p is a critical point (a candidate extremum). You will note that the
slopes of the tangent lines of the level curve of f and the constraint curve at p are the same.
Lagrange Optimization Equation
• The above can be written as:

    ∇f(x, y) − λ ∇g(x, y) = 0

• Undoing the differentiation and removing the "setting to zero" procedure:

    L(x, y, λ) = f(x, y) − λ g(x, y)

• Since λ can be either positive or negative, we can now write the
Lagrangian:

    L(x, y, λ) = f(x, y) + λ g(x, y)

• The λ is called the Lagrange multiplier.
Lagrange Minimization/Maximization
Procedure
•
We set the partial derivatives of the Lagrangian L(x, y, λ) = f(x, y) + λ g(x, y) to zero,
and then find the optimal values of the variables x∗, y∗, λ∗ that maximize (or minimize)
the function:

    ∂L/∂x = ∂f/∂x + λ ∂g/∂x = 0
    ∂L/∂y = ∂f/∂y + λ ∂g/∂y = 0
    ∂L/∂λ = g(x, y) = 0

•
Notice that the first two equations comprise the gradient equation derived earlier,
∇f = λ ∇g (with the sign absorbed into λ).
•
And the last equation extracts the constraint g(x, y) = 0.
(Basic) Lagrange Multiplier Method
•
Consider a basic constrained optimization (maximization or minimization) problem:

    optimize f(x, y)   subject to   g(x, y) = 0

•
The formulation of the basic Lagrange constrained optimization problem is:

    L(x, y, λ) = f(x, y) + λ g(x, y)

•
We set the partial derivatives of the Lagrangian L(x, y, λ) to zero, and then find the optimal values of
the variables x∗, y∗, λ∗ that maximize (or minimize) the function.
•
The λ is called the Lagrange multiplier.
Example Lagrange Computation
•
Maximize f(x, y) = x + y, subject to g(x, y) = x² + y² − 1 = 0.
•
The formulation of the basic Lagrange constrained optimization problem is:

    L(x, y, λ) = x + y + λ (x² + y² − 1)

•
Now, setting the partial derivatives to zero, and finding critical points:

    ∂L/∂x = 1 + 2λx = 0,   ∂L/∂y = 1 + 2λy = 0,   ∂L/∂λ = x² + y² − 1 = 0
    ⇒   x = y = −1/(2λ),   2x² = 1,   x = ±√2/2

•
Substituting the critical points (√2/2, √2/2) and (−√2/2, −√2/2) into f(x, y), we find:

    f(√2/2, √2/2) = √2 (the constrained maximum),   f(−√2/2, −√2/2) = −√2 (the constrained minimum)
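The critical points above can be checked by plugging them back into the partial-derivative equations. A small illustrative Python check (not from the deck):

```python
import math

def lagrangian_partials(x, y, lam):
    # Partials of L(x, y, λ) = x + y + λ(x² + y² − 1)
    dLdx = 1 + 2 * lam * x
    dLdy = 1 + 2 * lam * y
    dLdlam = x**2 + y**2 - 1
    return dLdx, dLdy, dLdlam

s = math.sqrt(2) / 2
# Maximum: (x, y, λ) = (√2/2, √2/2, −√2/2); minimum: (−√2/2, −√2/2, √2/2)
for x, y, lam in [(s, s, -s), (-s, -s, s)]:
    print(lagrangian_partials(x, y, lam))  # each component ≈ 0

print(s + s)  # constrained maximum f = √2 ≈ 1.4142
```

All three stationarity equations vanish at both candidates, confirming the hand computation.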
Lagrange with Multiple Constraints
•
Consider an optimization problem with multiple constraints g₁(x) = 0, …, gₘ(x) = 0:

    optimize f(x)   subject to   gᵢ(x) = 0,   i = 1, …, m

•
The set of points x at which all constraints gᵢ(x) = 0 intersect forms the feasible region.
•
In other words, the feasible region is the set of intersection points of the constraint
functions.
•
At a solution, ∇f may not be parallel (or anti-parallel) to any single ∇gᵢ; instead,
∇f lies in the span of the constraint gradients.
•
The Lagrangian therefore uses a linear combination of the constraint
functions. The gradient of a sum is equal to the sum of the gradients:

    L(x, λ₁, …, λₘ) = f(x) + Σᵢ λᵢ gᵢ(x),   ∇f = λ₁ ∇g₁ + ⋯ + λₘ ∇gₘ
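A concrete two-constraint instance can be checked by hand and in code. The problem below is a hypothetical example (not from the deck): minimize f = x² + y² + z² subject to g₁ = x + y + z − 1 = 0 and g₂ = x − 2y = 0, solved by substitution and then verified against ∇f = λ₁∇g₁ + λ₂∇g₂:

```python
from fractions import Fraction as F

# Hypothetical example: minimize f = x² + y² + z²
# subject to g1 = x + y + z − 1 = 0 and g2 = x − 2y = 0.
# Substitution: x = 2y, z = 1 − 3y  ⇒  f(y) = 4y² + y² + (1 − 3y)²,
# so f′(y) = 28y − 6 = 0  ⇒  y = 3/14.
y = F(3, 14); x = 2 * y; z = 1 - 3 * y

grad_f = (2 * x, 2 * y, 2 * z)
grad_g1 = (F(1), F(1), F(1))
grad_g2 = (F(1), F(-2), F(0))

# Solve ∇f = λ1·∇g1 + λ2·∇g2 from the z- and x-components ...
lam1 = grad_f[2] / grad_g1[2]  # z-component: λ1 = 2z
lam2 = grad_f[0] - lam1        # x-component: λ2 = 2x − λ1
# ... and confirm the y-component is consistent:
check = lam1 * grad_g1[1] + lam2 * grad_g2[1]
print(lam1, lam2, check == grad_f[1])  # 5/7 1/7 True
```

Note that ∇f is parallel to neither ∇g₁ nor ∇g₂ individually, but it is exactly the combination λ₁∇g₁ + λ₂∇g₂.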
Multiple Constraints Example

[Figure: two constraint planes whose intersection forms the feasible region; the objective's
values in the feasible region are shown, with the maximum constrained value marked at the
critical point.]
Lagrange with Inequality
Constraints
• Consider an optimization problem with an inequality constraint g(x, y) ≤ 0:

    maximize f(x, y)   subject to   g(x, y) ≤ 0

• The Lagrange multiplier formulation with an inequality constraint can be written as:

    L(x, y, μ) = f(x, y) − μ g(x, y),   μ ≥ 0

• Notice that the equation appears very similar to the Lagrange multiplier method
with equality constraints, except that the μ is constrained to be μ ≥ 0.
• Why should the multiplier with inequality constraints be limited to μ ≥ 0?
Lagrange with Inequality
Constraints
•
To show intuitively why this must be the case, first consider the possibilities:
1. No solution exists: the level curves of f and the feasible region g ≤ 0 do not touch or intersect.
2. A solution exists at the boundary of the feasible region, g = 0.
3. A solution exists inside the feasible region, g < 0.
•
In the interesting case where a solution exists, we will show there will be two cases:
a solution on the boundary (μ > 0), and a solution in the interior (μ = 0).
•
Consider a maximization example with two variables and one inequality constraint.
•
To maximize f(x, y) subject to g(x, y) ≤ 0, let us first look at the boundary of the region
allowed by the inequality, i.e., g(x, y) = 0.
Lagrange with Inequality
Constraints: Solution on the Boundary
•
Consider a sketch of the boundary curve g(x, y) = 0 and the level curve of f where it
touches the boundary.
•
We assume a solution exists at the touching point.
•
Then ∇f and ∇g must be parallel (not anti-parallel), and this point will give a maximum
(not a minimum) of f for the region g ≤ 0, because of the following argument:

[Figure: the feasible region g ≤ 0, with a level curve of f touching its boundary.]
Solution on the Boundary: Gradient
Points Away From the Feasible Region
•
At the point where the level curve of f and the boundary g = 0 touch, the gradient ∇f
would be pointing away from the feasible region, since:
•
The gradient always points in the direction of maximum increase of a function, and
•
The function f decreases as you move towards the inside of the feasible region.
Solution on the Boundary: ∇f and ∇g
Point in the Same Direction
• If ∇f and ∇g were pointing in opposite directions, then ∇f would be
pointing inwards towards the feasible region, meaning f would have
greater values inside the feasible region.
• We would then find points inside the feasible region with an increase
in f; since those points still satisfy g ≤ 0, we
must conclude that the point on the
boundary cannot be a critical point
(not a solution).
• This contradicts our initial assumption
that the boundary point is a solution.
Solution on the Boundary: ∇f and ∇g
Point in the Same Direction
• Therefore, ∇f and ∇g must be parallel and be pointing in the same
direction (i.e., not anti-parallel):

    ∇f = μ ∇g,   μ ≥ 0
Solution on the Boundary Gives a
Maximum of f
• The gradient ∇f indicates that the function f increases away from the
feasible region.
• If a solution exists on the boundary, then this solution must be a maximum.
Solution on the Boundary: Effective
and Binding Constraint
• In the case where a critical point exists on the boundary g(x, y) = 0, the
inequality constraint is said to be effective and is called a binding constraint,
and μ > 0.
Summary for a Solution on the Boundary
•
Consider a sketch of the boundary curve g(x, y) = 0 and the level curve of f where it touches the boundary.
•
We assume a solution exists at the touching point.
•
Then ∇f and ∇g must be parallel (not anti-parallel), and this point will give a maximum (not a minimum) of
f for the region g ≤ 0, because of the following argument:
–
At the point where the level curve of f and the boundary touch, the gradient ∇f would be pointing away
from the feasible region, since f decreases as you move towards the inside of the feasible region.
–
If ∇f and ∇g were pointing in opposite directions, then ∇f would be pointing inwards towards the
feasible region, meaning f would have greater values inside the feasible region.
–
We would then find points inside the feasible region with an increase in f; since those points still
satisfy g ≤ 0, we must conclude that the point on the boundary cannot be a critical point
(not a solution). This contradicts our initial assumption.
–
Therefore, ∇f and ∇g must be parallel and be pointing in the same direction (i.e., not anti-parallel).
This implies that at the critical point, ∇f = μ ∇g with μ > 0.
•
In the case where a critical point exists on the boundary g(x, y) = 0, the inequality constraint is said
to be effective and is called a binding constraint, and μ > 0.
Lagrange with Inequality Constraints:
Solution Within the Boundary
•
In the case where a critical point exists inside the feasible region, i.e., g(x, y) < 0, we
can consider any point within the feasible region to determine the extrema of f.
•
I.e., the problem is unconstrained, if we assume a solution exists within the feasible region.
•
In other words, the problem becomes an unconstrained optimization problem (i.e.,
optimization with no constraints).
•
In this case we say the constraint is not binding, or the constraint is ineffective.
•
The maximum is then found by looking for the unconstrained maximum of f, assuming
that we look only inside the feasible region.
•
In this case: μ = 0, and the condition reduces to ∇f = 0.
Lagrange with Inequality
Constraints: Summary
•
The Lagrange multiplier method with inequality constraints can be written as:

    L(x, y, μ) = f(x, y) − μ g(x, y),   μ ≥ 0,   μ g(x, y) = 0

•
If the extremum occurs at the boundary of the constraint, g(x, y) = 0 (the constraint is
binding and effective):

    ∇f = μ ∇g,   μ > 0

•
If the constraint is not binding and is ineffective, g(x, y) < 0, then the above reduces to:

    ∇f = 0,   μ = 0

•
I.e., unconstrained optimization. Argument: at this point we know there is a solution,
and that the solution does not exist at the boundary. Therefore, any critical point the
optimizer finds will be inside the feasible region. Finally, the optimizer is free to choose
any point (i.e., it is unconstrained) to find a solution.
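The two cases can be checked on small concrete problems. The objectives below are hypothetical (not from the deck), both using the constraint g(x, y) = x² + y² − 1 ≤ 0:

```python
# Constraint: g(x, y) = x² + y² − 1 ≤ 0 (the unit disc)
grad_g = lambda x, y: (2 * x, 2 * y)

# Case 1 (binding): maximize f1 = −(x − 2)² − y².
# The unconstrained maximum (2, 0) is infeasible, so the solution
# sits on the boundary, at (1, 0).
x1, y1 = 1, 0
grad_f1 = (-2 * (x1 - 2), -2 * y1)    # = (2, 0)
mu1 = grad_f1[0] / grad_g(x1, y1)[0]  # ∇f = μ∇g  ⇒  μ = 1
print(mu1)                            # 1.0 > 0: binding constraint

# Case 2 (not binding): maximize f2 = −x² − y².
# The unconstrained maximum (0, 0) lies strictly inside the disc,
# so μ = 0 and the condition reduces to ∇f = 0.
x2, y2 = 0, 0
grad_f2 = (-2 * x2, -2 * y2)
print(grad_f2, x2**2 + y2**2 - 1 < 0)  # (0, 0) True  (μ = 0)
```

Case 1 exhibits μ > 0 with g = 0, case 2 exhibits μ = 0 with g < 0; in both, μ·g = 0.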
• The Lagrange optimization method has two forms, one called the primal
optimization method and the other called the dual optimization method.
• In some applications it is more suitable to use the dual optimization method, as it
leads to a simpler and quicker solution, while in other applications the primal
method is better.
• In the following we show that, under certain conditions, the primal and dual
optimization methods are equivalent and lead to the exact same solution to an
optimization problem.
• As an example use of the primal and dual optimization methods being equivalent,
we can show that the condition μ ≥ 0 is also true for a minimization problem.
– Note that our intuitive verification for the condition μ ≥ 0 was based on the
assumption of a maximization problem.
Lagrange Optimization:
Basic Formulation
• Consider an optimization problem of the following form:

    minimize f(w)   subject to   hᵢ(w) = 0,   i = 1, …, l

• The basic Lagrange formulation (Lagrangian) for this problem is:

    L(w, β) = f(w) + Σᵢ βᵢ hᵢ(w)

• The βᵢ are called the Lagrange multipliers for equality constraints.
• We would then find and set L's partial derivatives to zero:

    ∂L/∂wᵢ = 0,   ∂L/∂βᵢ = 0

• Finally, solve for w and β, and then locate the minima.
Lagrange Optimization:
Generalized Formulation
• Consider the following, which is called the primal optimization problem:

    minimize f(w)   subject to   gᵢ(w) ≤ 0, i = 1, …, k;   hᵢ(w) = 0, i = 1, …, l

• The generalized Lagrangian is given by:

    L(w, α, β) = f(w) + Σᵢ αᵢ gᵢ(w) + Σᵢ βᵢ hᵢ(w)

• The αᵢ and βᵢ are called the Lagrange multipliers.
Deriving an Alternate Expression for
the Primal Optimization Problem
• We will now derive an alternative expression for the
primal optimization problem.
• We call this the "min max" expression for the
primal optimization problem.
Min Max Expression for the
Primal Optimization Problem
•
Consider the following quantity:

    θ_P(w) = max_{α, β : αᵢ ≥ 0} L(w, α, β)

•
If the choice of w violates any of the primal constraints (below), then θ_P(w) = ∞:

    gᵢ(w) ≤ 0,   hᵢ(w) = 0

•
For instance, if gᵢ(w) > 0, then αᵢ can be chosen as αᵢ → ∞ to maximize L, and
therefore θ_P(w) = ∞.
•
In addition, if hᵢ(w) ≠ 0, then βᵢ can be chosen as βᵢ → ±∞ (matching the sign of hᵢ(w))
to maximize L, and therefore θ_P(w) = ∞.
Min Max Expression for the
Primal Optimization Problem
•
Now, if the choice of w satisfies the primal constraints (below), then θ_P(w) = f(w):

    gᵢ(w) ≤ 0,   hᵢ(w) = 0

•
For instance, if hᵢ(w) = 0, then the value of βᵢ is irrelevant, since βᵢ hᵢ(w) = 0
irrespective of βᵢ.
•
In addition, if gᵢ(w) ≤ 0, then αᵢ will be chosen so that αᵢ gᵢ(w) = 0 to maximize L,
since αᵢ gᵢ(w) ≤ 0 whenever αᵢ ≥ 0.
•
Taken together,

    θ_P(w) = f(w)   when w satisfies the primal constraints.
Min Max Expression for the
Primal Optimization Problem
• Note also that if αᵢ were allowed to be negative, then, when gᵢ(w) < 0, αᵢ would be chosen as
αᵢ → −∞ to maximize L, in which case θ_P(w) = ∞, and so we would not
have a solution, even for good (feasible) w.
• This provides further reason for requiring that αᵢ ≥ 0.
Min Max Expression for the
Primal Optimization Problem
•
Therefore:

    θ_P(w) = f(w)  if w satisfies the primal constraints;   θ_P(w) = ∞  otherwise

•
Next, consider the minimization problem:

    min_w θ_P(w) = min_w [ max_{α, β : αᵢ ≥ 0} L(w, α, β) ]

•
This means that, after performing the maximization of L, which we found to be f(w) given
that the constraints are satisfied, we then minimize the resulting function by finding the
optimal value of w.
•
I.e., given that the constraints are satisfied:

    min_w θ_P(w) = min_w f(w)
Min Max Expression for the
Primal Optimization Problem
• In other words, the min max representation

    min_w max_{α, β : αᵢ ≥ 0} L(w, α, β)

is the same as our original primal optimization problem.
• Finally, define the optimal value of the min max expression as p∗:

    p∗ = min_w θ_P(w)

• We call this the value of the primal optimization problem.
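The behavior of θ_P can be demonstrated on a toy one-variable problem (hypothetical, not from the deck): minimize f(w) = w² subject to g(w) = 1 − w ≤ 0 (i.e., w ≥ 1), whose constrained minimum is w = 1:

```python
# Toy problem: minimize f(w) = w² subject to g(w) = 1 − w ≤ 0.
f = lambda w: w * w
g = lambda w: 1.0 - w

def theta_P(w, alphas=(0.0, 1.0, 10.0, 1e6)):
    # θ_P(w) = max over α ≥ 0 of L(w, α) = f(w) + α·g(w).
    # The max over a growing grid of α values approximates the true sup:
    # it equals f(w) when g(w) ≤ 0, and blows up when g(w) > 0.
    return max(f(w) + a * g(w) for a in alphas)

print(theta_P(1.0))  # 1.0  (feasible: θ_P = f)
print(theta_P(2.0))  # 4.0  (feasible: the α = 0 term wins)
print(theta_P(0.5))  # huge (infeasible: large α wins; the true sup is ∞)

# Minimizing θ_P over a grid of w recovers the constrained minimum w = 1:
ws = [i / 100.0 for i in range(-200, 301)]
w_best = min(ws, key=theta_P)
print(w_best)  # 1.0
```

Minimizing θ_P ignores the infeasible w (where θ_P is huge) and picks the constrained minimizer, exactly as the min max argument predicts.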
Deriving a Dual Expression for
the Primal Optimization Problem
• We will now derive the dual expression of the generalized
Lagrange optimization formulation.
• We call the dual expression the "max min" expression.
• We will relate the dual expression to the primal expression, and
hence show that the dual expression can also be used to express
the generalized Lagrange optimization formulation.
• Finally, we will show that under certain conditions, the dual
expression is equivalent to the primal expression, and thus either
of them can be used to solve an optimization problem posed as a
generalized Lagrange optimization formulation.
The Dual "Max Min" Expression
• Consider the quantity:

    θ_D(α, β) = min_w L(w, α, β)

• Note that, whereas in the definition of θ_P (below) we were optimizing (maximizing) with
respect to α and β, here (above) we are minimizing with respect to w:

    θ_P(w) = max_{α, β : αᵢ ≥ 0} L(w, α, β)
The Dual "Max Min" Expression
• Now, add a maximization term:

    max_{α, β : αᵢ ≥ 0} θ_D(α, β) = max_{α, β : αᵢ ≥ 0} [ min_w L(w, α, β) ]

• This is exactly the same as our primal problem (below), except that the order of the
"max" and the "min" are now exchanged:

    min_w [ max_{α, β : αᵢ ≥ 0} L(w, α, β) ]

• Finally, define the optimal value of the max min expression as d∗:

    d∗ = max_{α, β : αᵢ ≥ 0} θ_D(α, β)

• We call this the value of the dual optimization problem.
Primal and Dual Relationship
• It can be shown that (see the next two slides):

    d∗ ≤ p∗

• Furthermore, it can be shown that, under certain conditions:

    d∗ = p∗

• This means that under certain conditions, we can solve a
given optimization problem by using either the primal or dual
method, and we'll pick the most suitable one.
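The "max min ≤ min max" inequality behind d∗ ≤ p∗ can be seen concretely on a finite grid. An illustrative Python sketch (not from the deck), with L replaced by an arbitrary table of values:

```python
# An arbitrary function L(x, y) sampled on a finite grid,
# stored as rows indexed by x and columns indexed by y.
L = [
    [3, 1, 4],
    [1, 5, 9],
    [2, 6, 5],
]

# max over x of (min over y)  vs  min over y of (max over x)
max_min = max(min(row) for row in L)
min_max = min(max(col) for col in zip(*L))

print(max_min, min_max)  # 2 3
```

Here max min = 2 ≤ min max = 3; the inequality holds for any such table, which is the finite analogue of d∗ ≤ p∗.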
2-D Case
PROOF that max_x [min_y f(x, y)] ≤ min_y [max_x f(x, y)]:
• For any x and y, min_{y′} f(x, y′) ≤ f(x, y) ≤ max_{x′} f(x′, y).
• Since we can choose any x, we can take the max over x on the LHS:

    max_x [min_{y′} f(x, y′)] ≤ max_{x′} f(x′, y)   for every y

• On the RHS, since this holds for every y, it implies that we can choose the y that
minimizes the RHS:

    max_x [min_y f(x, y)] ≤ min_y [max_x f(x, y)]
• What does the following mean?

    min_y f(x, y)

• For each x, we find a y that minimizes the function f(x, y).
• This will generate many answers (i.e., many minima of f), one for
each value of x.
• Does the following make sense then?

    max_x [min_y f(x, y)]

• Yes, because for each value of x you choose, the term min_y f(x, y)
represents the minimum value of the function over all y; we then take
the largest of these minima.
Our Case
PROOF that d∗ ≤ p∗:
• For any w, and any α, β with αᵢ ≥ 0:

    θ_D(α, β) = min_{w′} L(w′, α, β) ≤ L(w, α, β) ≤ max_{α′, β′ : α′ᵢ ≥ 0} L(w, α′, β′) = θ_P(w)

• The inequality θ_D(α, β) ≤ θ_P(w) holds even with the restriction αᵢ ≥ 0 on both sides.
• On the left, this implies that we can choose the (α, β) combination that maximizes θ_D;
on the right, it implies that we can choose the w that minimizes θ_P:

    d∗ = max_{α, β : αᵢ ≥ 0} θ_D(α, β) ≤ min_w θ_P(w) = p∗
Recall: Lagrange Optimization
Generalized Formulation
• Consider the following, which is called the primal optimization problem:

    minimize f(w)   subject to   gᵢ(w) ≤ 0, i = 1, …, k;   hᵢ(w) = 0, i = 1, …, l

• The generalized Lagrangian is given by:

    L(w, α, β) = f(w) + Σᵢ αᵢ gᵢ(w) + Σᵢ βᵢ hᵢ(w)

• The αᵢ and βᵢ are called the Lagrange multipliers.
Conditions for Equality
(Given without Proof)
•
Let w∗ be the optimal domain point that optimizes the objective function, and α∗, β∗ the
optimal multipliers:

    p∗ = f(w∗)

•
Under certain assumptions, there must exist w∗, α∗, β∗ so that:

    p∗ = d∗ = L(w∗, α∗, β∗)

•
Once determined, w∗, α∗, β∗ will satisfy the Karush-Kuhn-Tucker (KKT) conditions,
which are as follows:

    ∂L/∂wᵢ (w∗, α∗, β∗) = 0,   i = 1, …, n
    ∂L/∂βᵢ (w∗, α∗, β∗) = 0,   i = 1, …, l
    αᵢ∗ gᵢ(w∗) = 0,   i = 1, …, k   (complementary slackness)
    gᵢ(w∗) ≤ 0,   i = 1, …, k
    αᵢ∗ ≥ 0,   i = 1, …, k
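The KKT conditions can be verified on a hypothetical one-dimensional example (not from the deck): minimize f(w) = (w − 2)² subject to g(w) = w − 1 ≤ 0, where the constraint binds at w∗ = 1:

```python
# Hypothetical 1-D example: minimize f(w) = (w − 2)² subject to g(w) = w − 1 ≤ 0.
# The unconstrained minimum w = 2 is infeasible, so the constraint binds at w* = 1.
# Lagrangian: L(w, α) = (w − 2)² + α(w − 1);  ∂L/∂w = 2(w − 2) + α.

w_star = 1.0
alpha_star = -2 * (w_star - 2)  # stationarity ∂L/∂w = 0 gives α* = 2

g_val = w_star - 1.0            # constraint value at w*

# Check each KKT condition:
stationarity = 2 * (w_star - 2) + alpha_star
print(stationarity == 0)        # True  (stationarity)
print(alpha_star >= 0)          # True  (dual feasibility)
print(g_val <= 0)               # True  (primal feasibility)
print(alpha_star * g_val == 0)  # True  (complementary slackness)
```

All four checks pass at (w∗, α∗) = (1, 2), so the candidate satisfies the KKT conditions.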
• The first two conditions (below) follow from the Lagrange optimization
procedure.
– We set the partial derivatives of L(w, α, β) to zero, and solve for the variables:

    ∂L/∂wᵢ = 0,   ∂L/∂βᵢ = 0

– Of course, when we solve for the variables and find the optimal values
w∗, α∗, β∗, they should satisfy these two KKT conditions.
• The last two conditions (below) follow from the initial
constraints of the problem.
– The gᵢ(w∗) ≤ 0 is the initial constraint, and αᵢ∗ ≥ 0 is the restriction on the
multipliers:

    gᵢ(w∗) ≤ 0,   αᵢ∗ ≥ 0

– Of course, when we solve for the variables and find
the optimal values w∗, α∗, they should satisfy these conditions.
•
The third condition (below) follows from the analysis of the primal optimization problem:

    αᵢ∗ gᵢ(w∗) = 0   (complementary slackness)

•
Recall that in the derivation of the primal optimization problem we wanted to perform the
maximization:

    max_{α, β : αᵢ ≥ 0} L(w, α, β) = max_{α, β : αᵢ ≥ 0} [ f(w) + Σᵢ αᵢ gᵢ(w) + Σᵢ βᵢ hᵢ(w) ]

•
For the above term to be maximal, subject to gᵢ(w) ≤ 0 and αᵢ ≥ 0, each term αᵢ gᵢ(w)
must be zero.
•
We note that when the constraint gᵢ is active and a solution exists at gᵢ(w∗) = 0, the
condition αᵢ gᵢ(w∗) = 0 is satisfied, and αᵢ is also free to vary, αᵢ ≥ 0.
•
When gᵢ is inactive and gᵢ(w∗) < 0, it is required that αᵢ = 0.
References
[1] J. Kitchin, "Matlab in Chemical Engineering at CMU." [Online]. Available:
http://matlab.cheme.cmu.edu/2011/12/24/using-lagrange-multipliers-in-optimization/.
[Accessed 20 February 2015].
[2] C. A. Jones. [Online]. Available:
http://www1.maths.leeds.ac.uk/~cajones/math2640/notes4.pdf. [Accessed 23 February 2015].
[3] D. Klein, "Dan Klein's Homepage." [Online]. Available:
www.cs.berkeley.edu/~klein/papers/lagrange-multipliers.pdf. [Accessed 23 February 2015].
[4] J. Beggs, "Introduction to the Lagrange Multiplier." [Online]. Available:
https://www.youtube.com/watch?v=RZz7c1oeHm4. [Accessed 2 March 2015].