Optimization Theory Background Info

Convex Function - three equivalent definitions (use whichever is easiest)
1) The chord connecting any two points on the function lies above the function:
   ∀ x1, x2 and ∀ λ ∈ (0,1), f(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2)
2) The line tangent to the function (i.e., the 1st order Taylor series approximation) lies below
   the function: f(x0) + f′(x0)(x − x0) ≤ f(x)
3) Based on the second definition, the remainder term of the 1st order Taylor series approximation
   is nonnegative: ½f″(x̂)(x − x0)² ≥ 0 ∀ x̂ ∈ (x0, x); it is sufficient that f″(x) ≥ 0 everywhere
Concave Function - the negative of a convex function is concave; flip the inequalities in the convex def'ns
Principal Submatrix - k x k submatrix of n x n matrix A formed by deleting n - k columns and
same respective rows
k-th Order Principal Minor - determinant of k x k principal submatrix
Leading Principal Submatrix (Ak) - k-th order principal submatrix created by deleting the
last n − k rows and last n − k columns from A; for example, for a 3 × 3 matrix, the leading
principal submatrices are:
  (a11),   [ a11 a12 ; a21 a22 ],   and   [ a11 a12 a13 ; a21 a22 a23 ; a31 a32 a33 ]
Leading Principal Minor - determinant of the leading principal submatrix
Derivative Shorthand - can rewrite ∂f(x)/∂xi as fi or fi' and ∂2f(x)/∂xi∂xj as fij or fij''
Quadratic Form - real-valued function of the form:
Q(x1, …, xn) = Σ_{i≤j} aij·xi·xj, which can be rewritten Q(x) = xᵀ·A·x (A a symmetric matrix)
Positive Definite - xᵀAx > 0 ∀ x ≠ 0; e.g., x1² + x2²
Positive Semidefinite - xᵀAx ≥ 0 ∀ x; e.g., (x1 + x2)²
Negative Definite - xᵀAx < 0 ∀ x ≠ 0; e.g., −x1² − x2²
Negative Semidefinite - xᵀAx ≤ 0 ∀ x; e.g., −(x1 + x2)²
Indefinite - xᵀAx > 0 for some x in Rⁿ and < 0 for some other x; e.g., x1² − x2²
Determining Definiteness -
(1) A is positive definite ⇔ all n leading principal minors are > 0
    OR check all eigenvalues > 0
(2) A is negative definite ⇔ all n leading principal minors alternate in sign:
    |A1| < 0, |A2| > 0, |A3| < 0, etc. (i.e., (−1)ᵏ|Ak| > 0)
    OR check all eigenvalues < 0
(3) A is positive (negative) semidefinite ⇔ same as (1) and (2) with ≥ and ≤;
    Note: you have to check ALL k-th order principal minors (n choose k of them!), not just the leading ones
    OR check all eigenvalues ≥ 0 (positive semidefinite) or ≤ 0 (negative semidefinite)
(4) If a k-th order leading principal minor is nonzero but the pattern doesn't fit any of the above,
    then A is indefinite
    OR it has mixed positive and negative eigenvalues
2 × 2 Matrix - negative definite: f11 < 0 & f11f22 − (f12)² > 0 (i.e., a11 < 0 & a11a22 − a12² > 0)
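The leading-principal-minor tests above are easy to mechanize. Below is a minimal sketch in pure Python (the helper names are mine, not from the notes) that classifies a small symmetric matrix; det() is a simple cofactor expansion, fine for the small matrices used here:

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += ((-1) ** j) * M[0][j] * det(minor)
    return total

def leading_principal_minors(A):
    """|A1|, |A2|, ..., |An| -- determinants of the leading principal submatrices."""
    n = len(A)
    return [det([row[:k] for row in A[:k]]) for k in range(1, n + 1)]

def classify(A):
    """Strict classification via leading principal minors (rules (1) and (2) above)."""
    minors = leading_principal_minors(A)
    if all(m > 0 for m in minors):
        return "positive definite"
    if all(((-1) ** (k + 1)) * m > 0 for k, m in enumerate(minors)):
        # minors alternate: |A1| < 0, |A2| > 0, |A3| < 0, ...
        return "negative definite"
    return "indefinite or semidefinite (check all principal minors / eigenvalues)"
```

For the semidefinite cases, remember the warning above: you would need every k-th order principal minor, not just the leading ones.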
Hessian - second derivative matrix; symmetric by Young's Theorem
H = f″(x) = [∂²f/∂xi∂xj] =
  [ f11(x)  f12(x)  …  f1n(x) ]
  [ f21(x)  f22(x)  …  f2n(x) ]
  [   ⋮       ⋮           ⋮   ]
  [ fn1(x)  fn2(x)  …  fnn(x) ]
Young's Theorem - the order of differentiation doesn't matter when the second partials are continuous; fij = fji
Bordered Hessian - second derivative matrix, bordered by the first derivatives in the first row and
first column:
BH =
  [ 0    f1    f2   …  fn  ]
  [ f1   f11   f12  …  f1n ]
  [ f2   f21   f22  …  f2n ]
  [ ⋮ ]
  [ fn   fn1   fn2  …  fnn ]
Quasiconcave - two definitions
1) The chord connecting two points in the domain lies on or above the level curve of the
   lower-valued point:
   ∀ x′ and x″ ∈ D and λ ∈ (0,1), f(λx′ + (1 − λ)x″) ≥ min[f(x′), f(x″)]
2) The set of all points on or above a level curve is convex:
   {x : f(x) ≥ f(x′)} ∀ x′ ∈ D is a convex set
[Figures: level curves in (x1, x2) space; the chord from x′ to x″ stays above the level curve
f(x1,x2) = f(x″), and the upper contour set above the level curve f(x1,x2) = f(x′) is a convex set]
Quasiconvex - reverse the inequalities above and change min to max
[Figures: level curves in (x1, x2) space; the chord from x′ to x″ stays below the level curve
f(x1,x2) = f(x″), and the lower contour set below the level curve f(x1,x2) = f(x′) is a convex set]
[Figure: a function that is NOT quasiconcave — the chord from x′ to x″ dips below the level curve
f(x1,x2) = f(x″)]
1 Variable Case (graphs don't generalize to multiple variables) -
Quasiconcave (quasiconvex) relates to a single-peaked (single-trough) function, so a local max
(min) is a global max (min)
Monotonic Functions - all monotonic functions (i.e., increasing or decreasing) of one variable
are both quasiconvex and quasiconcave
[Figures: for a single-peaked f, {x : f(x) ≥ f(x′)} is an interval — a convex set — so f is
quasiconcave; for a double-peaked f, {x : f(x) ≥ f(x′)} is not a convex set, so f is not
quasiconcave]
WARNING - quasiconcave is not the same as concave; quasiconcave is a statement about the level
curves in the domain (xⁿ space); concave is a statement about the function itself ((x, f(x)) space)
Theorem - any concave function is also quasiconcave (but quasiconcave does not imply concave)
  Proof: Definition of concave: f(λx1 + (1 − λ)x2) ≥ λf(x1) + (1 − λ)f(x2)
  The right side is simply a weighted average, so λf(x1) + (1 − λ)f(x2) ≥ min[f(x1), f(x2)]
  That means f(λx1 + (1 − λ)x2) ≥ min[f(x1), f(x2)]
  ∴ a concave function is quasiconcave
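The inequality chain in the proof is easy to spot-check numerically. A small sketch, assuming the example function f(x) = −(x − 1)², which is concave:

```python
def f(x):
    return -(x - 1) ** 2  # concave: f'' = -2 < 0 everywhere

def quasiconcave_inequality_holds(x1, x2, lam):
    """Does f(lam*x1 + (1-lam)*x2) >= min(f(x1), f(x2)) hold?"""
    mix = lam * x1 + (1 - lam) * x2
    return f(mix) >= min(f(x1), f(x2)) - 1e-12  # small tolerance for float error

# check the quasiconcavity inequality over a grid of point pairs and weights
checks = [quasiconcave_inequality_holds(x1, x2, lam)
          for x1 in (-3.0, 0.0, 2.5)
          for x2 in (-1.0, 1.0, 4.0)
          for lam in (0.1, 0.5, 0.9)]
```

Every combination satisfies the inequality, as the theorem predicts for a concave function.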
Satiation Point - level curves form concentric "rings" (don't have to be circles) approaching a
single point where the function is maximized or minimized (example in graph below)
Testing for Quasiconcave/Quasiconvex -
[Figure: level curves of f in (x1, x2) space; tangent lines along the level curve f(x1,x2) = c
flatten as x1 increases]
Look at the total differential: df = f1dx1 + f2dx2 = 0
If we change x1 but want to stay on the level curve, we have an implicit function x2 = h(x1)
That means dh/dx1 = dx2/dx1 = −f1/f2
Quasiconcavity (with f2 > 0) requires the level curve slope to flatten as x1 increases, i.e.,
f1/f2 is decreasing: d(f1/f2)/dx1 ≤ 0
Differentiate (quotient & chain rules):
  d[f1(x1,h(x1))/f2(x1,h(x1))]/dx1 = [(f11 + f12·dh/dx1)·f2 − f1·(f21 + f22·dh/dx1)] / f2²
Multiply out the terms, substitute dh/dx1 = −f1/f2, factor out 1/f2³ (this requires multiplying
some terms by f2), and use f12 = f21:
  = (1/f2³)(f11f2² − 2f12f1f2 + f22f1²) = (−1/f2³)·det[ 0 f1 f2 ; f1 f11 f12 ; f2 f12 f22 ]
So for this to be ≤ 0 (with f2 > 0), the determinant of the bordered Hessian must be ≥ 0
The same logic works for f2 < 0
Summary:
  f2 > 0: d(f1/f2)/dx1 ≤ 0 ⇔ |BH| ≥ 0
  f2 < 0: d(f1/f2)/dx1 ≥ 0 ⇔ |BH| ≥ 0
Note: this isn't about positive or negative definiteness (as we check for concave/convex);
we're only concerned with the sign of the determinant
Theorem - if |BH| > 0, the function is quasiconcave; if |BH| < 0, the function is quasiconvex
Theorem - any Cobb-Douglas function (i.e., xᵅyᵝ, for x, y > 0 and α, β > 0) is quasiconcave
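The |BH| test can be run numerically. A sketch, assuming central finite-difference derivatives (the step h = 1e-4 is my choice) and the Cobb-Douglas example f(x, y) = x^(1/4)·y^(1/4):

```python
def bordered_hessian_det(f, x, y, h=1e-4):
    """3x3 bordered-Hessian determinant of f at (x, y) via central differences."""
    f1 = (f(x + h, y) - f(x - h, y)) / (2 * h)
    f2 = (f(x, y + h) - f(x, y - h)) / (2 * h)
    f11 = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    f22 = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    f12 = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    # |BH| = det [[0, f1, f2], [f1, f11, f12], [f2, f12, f22]]
    #      = 2*f1*f2*f12 - f1**2 * f22 - f2**2 * f11
    return 2 * f1 * f2 * f12 - f1**2 * f22 - f2**2 * f11

cobb_douglas = lambda x, y: x**0.25 * y**0.25
bh_values = [bordered_hessian_det(cobb_douglas, x, y)
             for (x, y) in [(1.0, 1.0), (2.0, 5.0), (0.5, 3.0)]]
```

At each sampled point |BH| > 0, consistent with the theorem that Cobb-Douglas functions are quasiconcave.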
Multiple Variables -
Sufficient for Quasiconcave - (−1)ᵏ|BHk| > 0 ∀ k ≥ 2 (necessary condition: (−1)ᵏ|BHk| ≥ 0)
Sufficient for Quasiconvex - |BHk| < 0 ∀ k ≥ 2 (necessary condition: |BHk| ≤ 0)
  Note: no alternating signs for quasiconvex
Rules/Warnings -
(1) k = number of variables in BHk
(2) BHk is the k-th order principal submatrix of BH
(3) never delete the border row or column
(4) never use fewer than 2 variables (k ≥ 2); with 1 variable the test says nothing because
    |BH1| = 0·f11 − f1² = −f1² ≤ 0 regardless of the function
(5) start with the 3 × 3 matrix
(6) you can use leading principal minors if the inequalities are strict (> 0); if not strict,
    you must check all combinations of variables
General Rule - if you have an explicit function, use H (concave/convex) or |BH|
(quasiconcave/quasiconvex); if you don't have an explicit function, use the basic definitions
(as in the proof above that concave ⇒ quasiconcave)
Optimization Basics
Two Questions -
1. Does a solution exist?
2. Find/characterize the solution
Max vs. Min - Min f(x) is same as Max -f(x)
Local Max - point x* such that f(x) ≤ f(x*) ∀ x near x*
Strict Max - < instead of ≤
Global Max - find all local max/min and compare them or impose strong assumptions on
function to ensure only one min/max
Case 1 - Unconstrained with Single Variable
Necessary - f′(x*) = 0 (can also add concavity, i.e., f″(x*) ≤ 0)
Sufficient - f′(x*) = 0 and f″(x) < 0 ∀ x ≠ x* near x* (need "near x*" for special cases like −x⁴)
Get the results from the Taylor series:
1st Order - f(x) = f(x*) + f′(x̂)·(x − x*); for x* to be a max, the remainder term must be
  nonpositive; the difference (x − x*) can be either positive or negative depending on
  where x is relative to x*: f′(x̂) ≤ 0 if x ≥ x* and f′(x̂) ≥ 0 if x ≤ x*; therefore,
  f′(x*) = 0
2nd Order - f(x) = f(x*) + f′(x*)·(x − x*) + ½f″(x̂)·(x − x*)²; for x* to be a max, the
  remainder term must be nonpositive; the difference term is squared so it's positive; that
  means f″(x̂) ≤ 0 (strictly < 0 for the sufficient condition)
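A quick numerical check of both conditions, assuming the example f(x) = −(x − 2)², whose max is at x* = 2:

```python
def f(x):
    return -(x - 2) ** 2

def d1(f, x, h=1e-5):
    """Central first difference ~ f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(f, x, h=1e-4):
    """Central second difference ~ f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

foc = d1(f, 2.0)   # ~ 0  (first order condition)
soc = d2(f, 2.0)   # ~ -2 (second order condition, negative)
```

For f(x) = −x⁴ the same check at x* = 0 would give f″(x*) = 0, which is exactly why the sufficient condition asks for f″ < 0 *near* x*, not just at it.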
Case 2 - Constrained with Single Variable
Constraint - a ≤ x ≤ b; resolves the existence problem (theorems [we won't prove] say a
continuous function on a compact set [closed & bounded] attains a max and a min)
Inner Points - same conditions as the unconstrained case
Boundary Points - different for lower & upper bounds; in case (1) f″ doesn't matter, in case (2) it does:
Lower Bound - Necessary: (1) f′(a) < 0, or (2) f′(a) = 0 and f″(a) ≤ 0
  Sufficient: same, but change ≤ to < in the last part
Upper Bound - Necessary: (1) f′(b) > 0, or (2) f′(b) = 0 and f″(b) ≤ 0
  Sufficient: same, but change ≤ to < in the last part
Case 3 - Unconstrained with Many Variables
Necessary -
  1st Order: ∂f(x)/∂xi = 0 ∀ i = 1, 2, …, n (n equations & n unknowns)
  2nd Order: Hessian is negative semidefinite
Sufficient - same as necessary except the Hessian is negative definite
Two Variable Case - necessary conditions (for sufficient, make the inequalities strict):
  1st Order: ∂f(x1,x2)/∂x1 = ∂f(x1,x2)/∂x2 = 0
  2nd Order: f11 ≤ 0, f22 ≤ 0, and f11f22 − f12² ≥ 0 (i.e., H negative semidefinite)
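A tiny worked instance of the two-variable conditions, assuming the example f(x1, x2) = −x1² − x1·x2 − x2², whose only critical point is the origin:

```python
x1, x2 = 0.0, 0.0                      # candidate critical point

# first partials: f1 = -2*x1 - x2, f2 = -x1 - 2*x2
f1 = -2 * x1 - x2
f2 = -x1 - 2 * x2

# second partials are constant for this quadratic
f11, f12, f22 = -2.0, -1.0, -2.0

first_order_holds = (f1 == 0.0 and f2 == 0.0)
# strict 2x2 rule: f11 < 0 and f11*f22 - f12**2 > 0  =>  H negative definite
second_order_holds = (f11 < 0) and (f11 * f22 - f12**2 > 0)
```

Both conditions hold (f11·f22 − f12² = 4 − 1 = 3 > 0), so the origin is a local (here global) max.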
Case 4 - Constrained with Many Variables
Two Variable Case -
  Max f(x1,x2)
   x1,x2
  s.t. g(x1,x2) = 0 (i.e., a level curve of g)
Theory - use the constraint to generate an implicit function x2 = h(x1) and solve the unconstrained
problem of one variable: Max f(x1,h(x1)), or as a new function, Max F(x1)
1st Order - dF/dx1 = ∂f/∂x1 + (∂f/∂x2)·(dx2/dx1) = 0
  Note: from the constraint, (∂g/∂x1)dx1 + (∂g/∂x2)dx2 = 0, so dx2/dx1 = −(∂g/∂x1)/(∂g/∂x2)
Economic Interpretations -
(1) MB = MC: ∂f/∂x1 = (∂f/∂x2)·(∂g/∂x1)/(∂g/∂x2)
  The left side is the marginal benefit of x1; the right side is the marginal opportunity cost of
  increasing x1 — (∂g/∂x1)/(∂g/∂x2) captures how the constraint forces x2 to change when you
  change x1 (the physical opportunity cost of increasing x1)
(2) Slope of Objective = Slope of Constraint: (∂f/∂x1)/(∂f/∂x2) = (∂g/∂x1)/(∂g/∂x2)
  (slope of the level curve of the objective function = slope of the constraint)
(3) Same Benefit per Dollar Spent: (∂f/∂x1)/(∂g/∂x1) = (∂f/∂x2)/(∂g/∂x2)
  (marginal benefit per unit of marginal cost is equalized across x1 and x2)
2nd Order - this gets messy! We'll use the derivative shorthand (fij = ∂²f/∂xi∂xj)
Start with the 1st order condition: f1 + f2·dh/dx1 = f1 − f2·(g1/g2) = 0
Take the derivative of that, but realize these are functions of two variables, so rewrite it first:
  f1(x1,h(x1)) − f2(x1,h(x1))·g1(x1,h(x1))/g2(x1,h(x1)) = 0
And the derivative is (product, quotient & chain rules):
  f11 + f12·dh/dx1 − (f21 + f22·dh/dx1)·(g1/g2)
    − f2·[(g11 + g12·dh/dx1)·g2 − g1·(g21 + g22·dh/dx1)] / g2²
Now substitute dh/dx1 = −g1/g2 and recognize that f12 = f21 and g12 = g21:
  f11 − 2f12·g1/g2 + f22·g1²/g2² − (f2/g2³)·(g11g2² − 2g12g1g2 + g22g1²)
To simplify further, factor out 1/g2³:
  (1/g2³)[f11g2³ − 2f12g1g2² + f22g1²g2 − f2g11g2² + 2f2g1g2g12 − f2g1²g22]
Sufficient Condition:
  (1/g2³)[f11g2³ − 2f12g1g2² + f22g1²g2 − f2g11g2² + 2f2g1g2g12 − f2g1²g22] < 0
Necessary Condition: the same expression ≤ 0
We can either check the whole thing for the sufficient and necessary conditions, or we can check
the individual parts. For the latter, it helps to group more terms together:
  (1/g2³)[ g2·(f11g2² − 2f12g1g2 + f22g1²) + f2·(−g11g2² + 2g1g2g12 − g1²g22) ]
From the third interpretation of the first order condition (benefit per dollar spent), realize:
  f1/g1 = f2/g2 = λ … that means g1 = f1/λ and g2 = f2/λ
Plug these into the first set of parentheses above:
  (1/g2³)[ g2·(f11f2²/λ² − 2f12f1f2/λ² + f22f1²/λ²) + f2·(−g11g2² + 2g1g2g12 − g1²g22) ]
Factor out the λ² and absorb the g2³:
  (1/(g2²λ²))·(f11f2² − 2f12f1f2 + f22f1²) + (f2/g2³)·(−g11g2² + 2g1g2g12 − g1²g22)
Obviously, the terms in the parentheses are (up to sign) determinants of bordered Hessians
(just a little cynicism there). Using matrices, the sufficient condition for a max becomes:
  (1/(g2²λ²))·det[ 0 f1 f2 ; f1 f11 f12 ; f2 f12 f22 ]
    − (f2/g2³)·det[ 0 g1 g2 ; g1 g11 g12 ; g2 g12 g22 ] > 0
  (using det[0 f1 f2; f1 f11 f12; f2 f12 f22] = −(f11f2² − 2f12f1f2 + f22f1²), and similarly for g)
Weighted Difference - this is a weighted difference of determinants of bordered Hessians (BH);
with f2, g2 > 0 the weights are positive
|BHf| is a statement about the curvature of the level curves of the objective function f:
  Quasiconcave - |BHf| > 0
  Quasiconvex - |BHf| < 0
*** This only works for the 2 variable case ***
Six Cases - cases 1–3 are maxima; cases 4–6 are minima
[Figures: in each case, a level curve f(x1,x2) = c tangent to the constraint g(x1,x2) = 0]
(1) f quasiconcave & g quasiconvex
(2) f & g quasiconcave, 2nd order still holds
(3) f & g quasiconvex, 2nd order still holds
(4) f quasiconvex & g quasiconcave
(5) f & g quasiconcave, 2nd order doesn't hold
(6) f & g quasiconvex, 2nd order doesn't hold
Case 1 - f is quasiconcave & g is quasiconvex; this assures the second order condition holds
  and is a stronger sufficient condition than necessary; generally this is the condition we'll
  use because we won't be working with explicit functions
Case 2 - f & g are both quasiconcave, but the weighted difference is still > 0; if you move
  along the constraint g, you cannot reach a higher level curve of f, so the tangency point is
  a maximum
Case 3 - similar to 2 except f & g are quasiconvex
Case 4 - f is quasiconvex & g is quasiconcave; this violates the second order condition and
  makes the weighted difference < 0, so the tangency point is a minimum
Case 5 - f & g are both quasiconcave (as in case 2), but the weighted difference is < 0, so the
  tangency point is a minimum; you can move along the constraint g and reach higher level
  curves of f
Case 6 - f & g are both quasiconvex (as in case 3), but the weighted difference is < 0, so the
  tangency point is a minimum
Stronger 2nd Order - the 2nd order condition is satisfied if |BHf| > 0 (f quasiconcave) and
  |BHg| < 0 (g quasiconvex); this is Case 1 above; it's stronger (more restrictive) than the
  sufficient 2nd order condition listed above; this is what we'll usually assume when we get
  to consumer theory (for Cases 2 and 3, we need the explicit function to check the weighted
  difference)
Practice - use the Lagrangian function
Multiple Variables -
1st Order, Option 1: fj − fn·gj/gn = 0 ∀ j = 1, 2, …, n−1, with xn = h(x1, x2, …, xn−1)
1st Order, Option 2: f1/g1 = f2/g2 = … = fn/gn and g(x1, x2, …, xn) = 0
2nd Order: f is quasiconcave and g is quasiconvex
Optimization with Lagrangian Function
Typical problem (x are the decision variables; "s.t." = subject to; λ is a newly defined
variable, the Lagrange multiplier):
  Max f(x)
   x
  s.t. g(x) = 0 : λ
Lagrange Multiplier - mathematical term for the variable introduced to the optimization problem
  Economic Interpretation - shadow price; how much it would be worth to relax the constraint
  (i.e., how much you would be willing to pay for an additional unit of the resource)
Lagrangian Expression - L = f(x) − λg(x)
"Pricing Out the Constraint" - subtracting the constraint is standard notation, although you
could add it; subtracting ensures λ ≥ 0, which is good for error checking and better for the
interpretation as a shadow price; the proof that λ ≥ 0 is in the notes for the envelope theorem
Optimizing - note that Max L = Max f(x) because g(x) = 0
1st Order - treat λ as a decision variable just like the xi; results in (n + 1) equations and
(n + 1) unknowns:
  ∂L/∂xi = ∂f/∂xi − λ·∂g/∂xi = 0 ⇒ (∂f/∂xi)/(∂g/∂xi) = λ … same as we had before ("bang for the buck")
  ∂L/∂λ = −g = 0 … the constraint comes back as a 1st order condition; it is optimal to satisfy the constraint
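A minimal worked instance of the first-order system, assuming the example problem Max ln(x) + ln(y) s.t. g(x, y) = x + y − 10 = 0:

```python
# L = ln(x) + ln(y) - lam*(x + y - 10), so the first-order system is
#   dL/dx = 1/x - lam = 0,  dL/dy = 1/y - lam = 0,  dL/dlam = -(x + y - 10) = 0
# The first two equalize "bang for the buck" (x = y = 1/lam); the constraint
# then pins down x = y = 5 and lam = 1/5.
import math

x_star, y_star = 5.0, 5.0
lam = 1.0 / x_star

foc_x = 1.0 / x_star - lam            # should be exactly 0
foc_y = 1.0 / y_star - lam            # should be exactly 0
foc_lam = -(x_star + y_star - 10.0)   # should be exactly 0

# shadow-price reading: relaxing the constraint to x + y = 10 + eps should
# raise the optimal value by roughly lam*eps
eps = 1e-3
gain = 2 * math.log(5.0 + eps / 2) - 2 * math.log(5.0)
```

The computed gain from relaxing the constraint matches λ·eps to first order, illustrating the shadow-price interpretation above.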
2nd Order - look at the Hessian of L (an (n+1) × (n+1) matrix):
  H_L =
  [ ∂²L/∂λ²     ∂²L/∂λ∂x1    …  ∂²L/∂λ∂xn   ]
  [ ∂²L/∂x1∂λ   ∂²L/∂x1²     …  ∂²L/∂x1∂xn  ]
  [ ⋮ ]
  [ ∂²L/∂xn∂λ   ∂²L/∂xn∂x1   …  ∂²L/∂xn²    ]
Play with this to make it look like a bordered Hessian; first use:
  ∂L/∂λ = −g, so ∂²L/∂λ² = 0 and ∂²L/∂λ∂xi = ∂²L/∂xi∂λ = −gi = −fi/λ (from the 1st order condition)
  ∂L/∂xi = fi − λgi, so ∂²L/∂xi∂xj = fij − λgij
Use this info to rewrite the Hessian:
  H_L =
  [ 0       −f1/λ         …  −fn/λ        ]
  [ −f1/λ   f11 − λg11    …  f1n − λg1n   ]
  [ ⋮ ]
  [ −fn/λ   fn1 − λgn1    …  fnn − λgnn   ]
Note that multiplying a row or column by a scalar multiplies the determinant by that scalar, so
we can multiply row 1 and column 1 by −1 without changing the sign of the determinant, and
multiplying row 1 and column 1 by λ multiplies the determinant by λ², which is positive, so it
also doesn't change the sign; we now have:
  [ 0     f1           …  fn          ]
  [ f1    f11 − λg11   …  f1n − λg1n  ]
  [ ⋮ ]
  [ fn    fn1 − λgn1   …  fnn − λgnn  ]
The lower-right block is Hf − λHg, a weighted difference of Hessians, bordered by the fi
In order for a point that satisfies the first order conditions to be a maximum, this matrix must
satisfy the rules for quasiconcavity (i.e., (−1)ᵏ|Mk| > 0 ∀ k ≥ 2); a simpler check is to
require f to be quasiconcave and g to be quasiconvex
Multiple Constraints - with equality constraints we're limited to the number of constraints (m)
being less than the number of variables (n); there will be one Lagrange multiplier for each
constraint:
  Max f(x)
   x
  s.t. g¹(x) = 0 : λ1
       g²(x) = 0 : λ2
       …
       gᵐ(x) = 0 : λm
Lagrangian Expression - L = f(x) − Σ_{k=1}^{m} λk·gᵏ(x)
1st Order - there will be (n + m) first order conditions:
  ∂L/∂xi = fi − Σ_{k=1}^{m} λk·gᵏi(x) = 0 ∀ i = 1, 2, …, n
  (fi is the marginal benefit of xi; Σk λk·gᵏi is the marginal cost — through the constraints —
  of increasing xi)
  ∂L/∂λj = −gʲ(x) = 0 ∀ j = 1, 2, …, m
Note the new notation: gᵏi = ∂gᵏ(x)/∂xi and gᵏij = ∂²gᵏ(x)/∂xi∂xj
2nd Order - the sign convention to check quasiconcavity depends on m; "it's in Chang".
The bordered matrix is (m + n) × (m + n):
  [ 0 (m × m)          [gᵏi] (m × n)                      ]
  [ [gᵏi]ᵀ (n × m)     [fij − Σ_{k=1}^{m} λk·gᵏij] (n × n) ]
i.e., an m × m block of zeros, bordered by the first derivatives of the constraints, with the
lower-right n × n block fij − Σk λk·gᵏij
Inequality Constraints -
Mathematically the same as Case 4 (constrained with equalities); we won't solve it as an
equality, but will show that it's theoretically the same:
  Max f(x)
   x
  s.t. g¹(x) ≤ 0
       …
       gᵐ(x) ≤ 0
       xj ≥ 0, j = 1, …, n
Nonnegativity - x ≥ 0; usually included for modeling, to prevent nonsensical solutions
(e.g., consumption must be nonnegative)
Greater Than - a ≥ constraint can be written as ≤ if we multiply by −1: g(x) ≥ 0 is the same
as −g(x) ≤ 0
Slack Variable - introduce a variable to take up the "slack" and force the inequality to be an
equality; square it to ensure it's nonnegative without using an inequality; note if the
constraint is tight (i.e., g(x) = 0), then the slack is zero (s = 0):
  Max f(x)            Max f(x)
   x          ⇒        x,s
  s.t. g(x) ≤ 0       s.t. g(x) + s² = 0
Constraint Limit - with equality constraints we need at least as many variables as constraints
(n ≥ m), but notice the slacks add one variable for each constraint, so we don't have that problem
General Case - shown below; this case will be used to derive the Kuhn-Tucker conditions:
  Max f(x)
   x,u,v
  s.t. g1(x) + u1² = 0 : λ1
       …
       gm(x) + um² = 0 : λm
       xj − vj² = 0 : βj, j = 1, …, n
  L = f(x) − Σ_{i=1}^{m} λi·[gi(x) + ui²] + Σ_{j=1}^{n} βj·[xj − vj²]
1st Order - 3n + 2m of them! (x, β, v have n each; u, λ have m each)
Constraint Slack Variable - ∂L/∂ui = −2λi·ui = 0
Complementary Slackness - each constraint is associated with a slack variable (ui) and a
Lagrange multiplier (λi); at least one of these must be zero; this follows directly from the
first order condition above
Economic Interpretation - λi is the shadow price; if gi(x) < 0 (i.e., the slack ui > 0), the
constraint is not binding, so we wouldn't pay to relax it… the shadow price is zero; looking at
it the other way, if λi > 0 (i.e., we'd be willing to pay to relax the constraint), the
constraint must be binding (i.e., ui = 0 and gi(x) = 0… notice also that gi(x) = −∂L/∂λi)
Special Case - λi = ui = 0; the constraint is tight, but we don't gain by relaxing it; could be
because there's a satiation point or multiple constraints
[Figures: relaxing the constraint doesn't change the optimal solution; feasible region shown
before and after relaxing]
Rewrite 1st Order - there are several ways to write this, but this way is good because it's
mutually exclusive and exhaustive; note that it incorporates the first order condition for the
Lagrange multiplier λi:
  (a) if λi > 0: −∂L/∂λi = gi(x) = 0
  (b) if λi = 0: −∂L/∂λi = gi(x) ≤ 0
Nonnegativity Slack Variable - ∂L/∂vj = −2βj·vj = 0
Interpretation - same as the constraint slack variable above; either βj = 0 or vj = 0 (or
both); if βj = 0, the nonnegativity constraint is not tight (i.e., xj > 0); if this is the
optimal solution, we must have ∂L/∂xj = 0 or changing xj would improve the solution; if vj = 0,
the nonnegativity constraint is tight (xj = 0); for this to be an optimal solution we must have
∂L/∂xj ≤ 0; that is, we'd like to decrease xj to improve the solution, but we can't because we
hit the boundary of the nonnegativity constraint
Rewrite 1st Order - just like before, we can rewrite this to incorporate the other first order
conditions in a mutually exclusive and exhaustive way:
  (a) if xj > 0: ∂L/∂xj = ∂f/∂xj − Σ_{i=1}^{m} λi·∂gi/∂xj = 0
  (b) if xj = 0: ∂L/∂xj = ∂f/∂xj − Σ_{i=1}^{m} λi·∂gi/∂xj ≤ 0
Kuhn-Tucker Conditions - summarized in the two rewrites above; these m + n first order
conditions capture all 3n + 2m first order conditions; note that the slack variables and the
Lagrange multipliers associated with nonnegativity aren't needed
Combinations of Conditions - there are 2^(n+m) different ways to combine the K-T conditions;
each combination represents one of four possible types of solutions:
  (1) corner solution, (2) all constraints binding, (3) some constraint(s) not binding,
  (4) no constraint binding
Using Them with an Explicit Function -
(1) look at the problem to see if we can limit the possible solutions (e.g., x and y ≠ 0)
(2) write out the Lagrangian
(3) write out the Kuhn-Tucker conditions
(4) list all remaining cases… up to 2^(n+m) of them (e.g., 1(a), 2(b), 3(a))
(5) for each case, solve the n + m equations for the n + m unknowns; these are the critical
    (or stationary) points
(6) check the inequalities in the K-T conditions for each solution
(7) substitute each surviving solution into the objective function and pick the best one
Well Behaved - if you verify beforehand that the objective function is globally quasiconcave
and the constraints are globally quasiconvex, there will be only one solution, the global max;
that allows us to modify the steps above… choose the easiest cases to solve first; only one
case will satisfy the K-T conditions, so you can stop as soon as you find it
Using Them with a General Function - we usually assume the 2nd order conditions
No Nonnegativity - this replaces the lower box above with ∂L/∂xj = 0
Different Lower Bound - if constraint is xj ≥ bj, just replace 0 from the conditions above with
bj (i.e., xj > bj and xj = bj)
Constraint Qualification - K-T conditions by themselves aren't necessary for an optimal
solution; there are restrictions that must be placed on the constraint function and the
constraint set in order for the K-T conditions to be necessary; formally writing the constraint
qualifications isn't trivial (they're complicated), but here are two examples:
Constraint Function - can't have inflection point in constraint at boundary of the feasible
region (i.e., first derivative of constraint = 0); this gives the wrong signal and tricks the
problem into thinking the constraint is not binding when it is
Constraint Set - must have nonempty interior; empty set or single point for feasible region
will not work with K-T conditions
Slater's Condition - sufficient for constraint qualifications to hold
Another Sufficient Condition - having all linear constraints and a nonempty feasible region
is sufficient for constraint qualifications to hold (note, this is much more restrictive, but
lots easier to remember)
Specific Example - end of notes has a specific example of problem that doesn't satisfy
constraint qualification
Transformations - we can change some things and still get the same solution
Objective - a monotonic transformation preserves the solution: Max x^(1/4)y^(1/4) gives the
same (x, y) as Max (1/4)(ln x + ln y), although the objective value will obviously be different
Constraints - 2x + 2y − 14 ≤ 0 is the same as x + y − 7 ≤ 0; this could be important in
achieving constraint qualification
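The objective transformation is easy to sanity-check by brute force. A sketch, assuming a toy constraint x + y = 7 and a simple grid search (both the constraint and the grid spacing are my choices, not from the notes):

```python
import math

def argmax_on_constraint(obj, steps=7000):
    """Grid search for the maximizer of obj(x, 7 - x) with x in (0, 7)."""
    best_x, best_val = None, None
    for i in range(1, steps):
        x = 7.0 * i / steps
        y = 7.0 - x
        v = obj(x, y)
        if best_val is None or v > best_val:
            best_x, best_val = x, v
    return best_x

# same maximizer, different objective values
x_power = argmax_on_constraint(lambda x, y: x**0.25 * y**0.25)
x_log = argmax_on_constraint(lambda x, y: 0.25 * (math.log(x) + math.log(y)))
```

Both searches return the same maximizer (x = 7/2), even though the two objectives take different values there.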
Example (p. 21 of handout) -
  Max x^(1/4)·y^(1/4)
   x,y
  s.t. x² + y² − 25 ≤ 0 : λ1
       x + y − 7 ≤ 0    : λ2
       x ≥ 0, y ≥ 0
This has 4 pairs of K-T conditions, which translates into 2⁴ = 16 cases; we can eliminate
some by looking at the problem; it's obvious that there exist feasible solutions with x and
y ≠ 0 and f(x,y) ≥ f(0,0) = 0, ∴ we can ignore all cases with x or y = 0; another thing to
notice is that the objective is monotonic in both variables; that means at least one of the
constraints must be binding (i.e., we can't have λ1 = λ2 = 0); we're now down to 3 cases to
consider
2nd Order - based on the work below, the objective is quasiconcave and the constraints are
quasiconvex, so there is a unique global maximum
Constraint 2 - it's linear, which is both quasiconvex and quasiconcave; we're interested in
constraints being quasiconvex, so this is good
Constraint 1 - look at the determinant of its bordered Hessian:
  det[ 0 2x 2y ; 2x 2 0 ; 2y 0 2 ] = −8y² − 8x² = −8(x² + y²) < 0 ∴ quasiconvex
Objective - look at the determinant of its bordered Hessian (we don't really need to, because
it's a Cobb-Douglas function, which is quasiconcave, but this shows how to do it the hard way).
For f = x^(1/4)y^(1/4) (with x, y > 0):
  f1 = (1/4)x^(−3/4)y^(1/4)      f11 = −(3/16)x^(−7/4)y^(1/4)
  f2 = (1/4)x^(1/4)y^(−3/4)      f22 = −(3/16)x^(1/4)y^(−7/4)
  f12 = (1/16)x^(−3/4)y^(−3/4)
  |BH| = det[ 0, f1, f2 ; f1, f11, f12 ; f2, f12, f22 ]
Since all we're worried about is the sign, not the magnitude, we can multiply and divide rows
and columns by positive scalars to make this easier to work with. Multiply row 1 by 4 and rows
2 and 3 by 16, then multiply column 1 by 1/4:
  det[ 0,                x^(−3/4)y^(1/4),     x^(1/4)y^(−3/4)   ;
       x^(−3/4)y^(1/4),  −3x^(−7/4)y^(1/4),   x^(−3/4)y^(−3/4)  ;
       x^(1/4)y^(−3/4),  x^(−3/4)y^(−3/4),    −3x^(1/4)y^(−7/4) ]
Expanding along the first row:
  = −x^(−3/4)y^(1/4)·[−3x^(−1/2)y^(−3/2) − x^(−1/2)y^(−3/2)]
    + x^(1/4)y^(−3/4)·[x^(−3/2)y^(−1/2) + 3x^(−3/2)y^(−1/2)]
  = 4x^(−5/4)y^(−5/4) + 4x^(−5/4)y^(−5/4) = 8x^(−5/4)y^(−5/4) > 0 ∴ the objective is
quasiconcave
Lagrangian - L = x^(1/4)y^(1/4) − λ1(x² + y² − 25) − λ2(x + y − 7)
K-T Conditions -
(1a) x > 0:  ∂L/∂x = (1/4)x^(−3/4)y^(1/4) − 2λ1x − λ2 = 0
(1b) x = 0:  ∂L/∂x = (1/4)x^(−3/4)y^(1/4) − 2λ1x − λ2 ≤ 0 (we know we won't have this)
(2a) y > 0:  ∂L/∂y = (1/4)x^(1/4)y^(−3/4) − 2λ1y − λ2 = 0
(2b) y = 0:  ∂L/∂y = (1/4)x^(1/4)y^(−3/4) − 2λ1y − λ2 ≤ 0 (we know we won't have this)
(3a) λ1 > 0: −∂L/∂λ1 = x² + y² − 25 = 0
(3b) λ1 = 0: −∂L/∂λ1 = x² + y² − 25 ≤ 0
(4a) λ2 > 0: −∂L/∂λ2 = x + y − 7 = 0
(4b) λ2 = 0: −∂L/∂λ2 = x + y − 7 ≤ 0
Case 1 - 1a, 2a, 3a, 4b; solve the system:
  (i) (1/4)x^(−3/4)y^(1/4) − 2λ1x − λ2 = 0
  (ii) (1/4)x^(1/4)y^(−3/4) − 2λ1y − λ2 = 0
  (iii) x² + y² − 25 = 0
  (iv) λ2 = 0
Eqn (iv) simplifies (i) and (ii) because λ2 drops out; we can then rewrite (i) and (ii) as
  λ1 = (1/8)x^(−7/4)y^(1/4) = (1/8)x^(1/4)y^(−7/4)
which we can solve for x: x² = y², i.e., x = y
Substitute this into Eqn (iii): 2y² = 25, so y = x = 5/√2
Now we check the inequalities for this case:
  1 & 2 are obvious because x and y are both > 0
  Skip to 4 because we may not need to solve for λ1 if 4 doesn't hold:
  x + y = 10/√2 ≈ 7.07 > 7, which violates (4b) ∴ this is not a solution to the K-T conditions
Case 2 - 1a, 2a, 3b, 4a; solve the system:
  (i) (1/4)x^(−3/4)y^(1/4) − 2λ1x − λ2 = 0
  (ii) (1/4)x^(1/4)y^(−3/4) − 2λ1y − λ2 = 0
  (iii) λ1 = 0
  (iv) x + y − 7 = 0
Eqn (iii) simplifies (i) and (ii) because λ1 drops out; we can then rewrite (i) and (ii) as
  λ2 = (1/4)x^(−3/4)y^(1/4) = (1/4)x^(1/4)y^(−3/4)
which we can solve for x: x² = y², or simply x = y
Substitute that into Eqn (iv): 2y = 7, so x = y = 7/2
Now we check the inequalities for this case:
  1 & 2 are obvious because x and y are both > 0
  For 3 we have x² + y² = 49/4 + 49/4 = 24.5 ≤ 25 (satisfied)
  For 4 we solve for λ2 = (1/4)x^(−3/4)y^(1/4) = (1/4)(7/2)^(−3/4)(7/2)^(1/4) ≈ 0.134 > 0 (satisfied)
This solution satisfies all four inequalities, so it satisfies the K-T conditions and is the
optimal solution to the problem; we don't have to consider case 3 (1a, 2a, 3a, 4a), but if you
do the work you'll find a contradiction in the inequalities, as we did in case 1
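The Case 2 answer can be cross-checked by brute force. A sketch, assuming a simple 0.02-spaced grid over the feasible region (the grid resolution is my choice):

```python
def feasible(x, y):
    """Both inequality constraints plus nonnegativity."""
    return x >= 0 and y >= 0 and x * x + y * y <= 25 and x + y <= 7

def objective(x, y):
    return x**0.25 * y**0.25

best, best_val = (0.0, 0.0), 0.0
N = 350                                # grid: x, y = i/50 over [0, 7]
for i in range(N + 1):
    for j in range(N + 1):
        x, y = i / 50, j / 50
        if feasible(x, y):
            v = objective(x, y)
            if v > best_val:
                best, best_val = (x, y), v

# the multiplier on x + y - 7 at the optimum, from the K-T work above
lam2 = 0.25 * 3.5 ** -0.5              # = (1/4)(7/2)**(-1/2) ~ 0.134
```

The grid search lands exactly on (7/2, 7/2), agreeing with the K-T solution, and λ2 > 0 confirms the linear constraint is the binding one.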
Example (doesn't satisfy Constraint Qualification) -
  Max xy
   x,y
  s.t. −(1 − x − y)³ ≤ 0 : λ
       x ≥ 0, y ≥ 0
The objective is strictly quasiconcave. The constraint can be rewritten x + y ≤ 1, which is
linear, therefore quasiconvex. This should mean a solution to the K-T conditions will be
optimal. [Figure: feasible region below x + y = 1, with the level curve x·y = c tangent at the
optimal solution]
Lagrangian - L = xy + λ(1 − x − y)³
K-T Conditions -
(1a) x > 0: ∂L/∂x = y − 3λ(1 − x − y)² = 0
(1b) x = 0: ∂L/∂x = y − 3λ(1 − x − y)² ≤ 0
(2a) y > 0: ∂L/∂y = x − 3λ(1 − x − y)² = 0
(2b) y = 0: ∂L/∂y = x − 3λ(1 − x − y)² ≤ 0
(3a) λ > 0: −(1 − x − y)³ = 0
(3b) λ = 0: −(1 − x − y)³ ≤ 0
Optimal Solution - (1/2, 1/2)… but this doesn't satisfy the K-T conditions. Since x > 0,
condition 1a must hold, but y − 3λ(1 − x − y)² = 1/2 − 3λ(1 − 1/2 − 1/2)² = 1/2 ≠ 0. The same
holds for condition 2a.
Look at it another way. If 3a holds (λ > 0), then (1 − x − y) = 0; but then looking at 1a, we
must have y = 0 (or 1b, so x = 0), and looking at 2a, we must have x = 0 (or 2b, so y = 0).
Either way we end up with x = y = 0, and therefore −(1 − x − y)³ = −(1 − 0 − 0)³ = −1 ≠ 0.
The unique solution to the K-T conditions is (0,0) — case 1b, 2b, 3b — but this isn't the
optimal solution because (1/2, 1/2) is better.
Problem - the constraint function has an inflection point (its derivative vanishes) at the
boundary of the feasible region.