Optimization

Critical Points
In this section, we develop a method for finding the extrema (i.e., the maximum and minimum points) of a function of two variables. For reasons which
will soon be apparent, this method is called the second derivative test.
To begin with, we say that a function $f(x,y)$ has a local maximum at a
point $(p,q)$ if there is a circle centered at $(p,q)$ such that
\[ f(x,y) \le f(p,q) \]
for all $(x,y)$ in that circle. That is, $f(p,q)$ is the maximum height of some small
patch of the surface, although it may not be the maximum overall.
It follows that if $|h| < R$, where $R$ is the radius of that circle, then $f(p+h,q) \le f(p,q)$ and thus
\[ f(p+h,q) - f(p,q) \le 0 \]
Dividing by $h$ when $0 < h < R$ and letting $h$ approach 0 from the right yields
\[ f_x(p,q) = \lim_{h \to 0^+} \frac{f(p+h,q) - f(p,q)}{h} \le 0 \]
Conversely, dividing by $h$ when $-R < h < 0$ and letting $h$ approach 0 from the left yields
\[ f_x(p,q) = \lim_{h \to 0^-} \frac{f(p+h,q) - f(p,q)}{h} \ge 0 \]
Consequently, assuming the partial derivatives exist at $(p,q)$, it must follow that $f_x(p,q) = 0$. A similar argument shows that
$f_y(p,q) = 0$.
Similar results hold if $f(x,y)$ has a local minimum at a point $(p,q)$, since this
is equivalent to $-f(x,y)$ having a local maximum at $(p,q)$.
That is, the tangent plane to the graph of $f(x,y)$ is horizontal at a local
maximum or a local minimum.
Definition 8.1: The critical points of a function $f(x,y)$ are those
points $(p,q)$ for which $f_x(p,q) = 0$ and $f_y(p,q) = 0$.
By the discussion above, the local extrema of a differentiable function $f(x,y)$ must occur at its critical points.
EXAMPLE 1 Find the critical point(s) of
\[ f(x,y) = x^3 - 3xy + y^3 \]
Solution: The first partial derivatives are
\[ f_x(x,y) = 3x^2 - 3y, \qquad f_y(x,y) = -3x + 3y^2 \]
Setting $f_x$ and $f_y$ equal to zero leads to 2 simultaneous equations:
\[ 3x^2 - 3y = 0, \qquad -3x + 3y^2 = 0 \]
Simplifying leads to $y = x^2$ and $x = y^2$, which implies that $x = x^4$.
Since $x = x^4$ is the same as $x^4 - x = 0$, we obtain
\[ x\left(x^3 - 1\right) = 0 \]
\[ x\,(x - 1)\left(x^2 + x + 1\right) = 0 \]
Since $x^2 + x + 1$ has no real roots, this results in $x = 0$ and $x = 1$. Since $y = x^2$, we have $x = 0$
implies $y = 0$, while $x = 1$ implies $y = 1$. Thus, the critical points
are $(0,0)$ and $(1,1)$.
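The result of Example 1 can be double-checked numerically. The following is a minimal sketch, not part of the original text: the helper name `grad_f` is hypothetical, and it simply evaluates the two partial derivatives computed above at the candidate points.

```python
def grad_f(x, y):
    """Gradient of f(x, y) = x**3 - 3*x*y + y**3 from Example 1."""
    fx = 3 * x**2 - 3 * y      # partial derivative with respect to x
    fy = -3 * x + 3 * y**2     # partial derivative with respect to y
    return fx, fy

# Candidate critical points found by hand in Example 1.
for point in [(0, 0), (1, 1)]:
    print(point, grad_f(*point))   # both partials vanish at each point

# A point that is not critical, for contrast.
print((1, 0), grad_f(1, 0))
```

Both partial derivatives vanish at $(0,0)$ and $(1,1)$, while at $(1,0)$ they do not, which is consistent with the algebra above.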
Check your Reading: Which of the critical points in example 1 does not
correspond to a local extremum of $f(x,y)$?
The Second Derivative Test
Clearly, $f(x,y)$ has a local maximum at a critical point $(p,q)$ only if every
vertical slice of $z = f(x,y)$ through $(p,q)$ has a maximum at $(p,q)$.
Similarly, $f(x,y)$ has a local minimum at $(p,q)$ only if every vertical slice of
$z = f(x,y)$ through $(p,q)$ has a minimum at $(p,q)$.
However, it is possible for the surface to be concave up in one slice and concave
down in another slice.
If this is the case, then we say that $f(x,y)$ has a saddle at $(p,q)$.
To determine if we get a maximum, a minimum, or a saddle at a critical
point $(p,q)$, we consider the vertical slice $z(t) = f(p + mt, q + nt)$. Since
$x = p + mt$ and $y = q + nt$ implies that $x'(t) = m$ and $y'(t) = n$, the first
derivative of $z(t)$ is
\[ \frac{dz}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} = m f_x + n f_y \]
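Moreover, $m$ and $n$ constant implies that
\[
\begin{aligned}
z'' = \frac{d}{dt}\frac{dz}{dt} &= m\,\frac{df_x}{dt} + n\,\frac{df_y}{dt} \\
&= m\left(\frac{\partial f_x}{\partial x}\frac{dx}{dt} + \frac{\partial f_x}{\partial y}\frac{dy}{dt}\right) + n\left(\frac{\partial f_y}{\partial x}\frac{dx}{dt} + \frac{\partial f_y}{\partial y}\frac{dy}{dt}\right) \\
&= m\,(m f_{xx} + n f_{xy}) + n\,(m f_{yx} + n f_{yy})
\end{aligned}
\]
Expanding and using the equality of the mixed partials then yields
\[ z''(0) = m^2 f_{xx}(p,q) + 2mn\, f_{xy}(p,q) + n^2 f_{yy}(p,q) \tag{1} \]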
If $f_{xx}(p,q) = 0$ and $f_{xy}(p,q) \ne 0$, then (1) reduces to $z''(0) = n\,(2m f_{xy}(p,q) + n f_{yy}(p,q))$, so we can choose $m$ and $n$ so that $z''(0)$ is negative in
some slices and positive in others, thus implying that $z = f(x,y)$ has a saddle
at $(p,q)$. If $f_{xx}(p,q) \ne 0$, then completing the square in $m$ yields
\[ z''(0) = f_{xx}(p,q)\left(m + \frac{f_{xy}(p,q)}{f_{xx}(p,q)}\,n\right)^2 + \frac{D(p,q)}{f_{xx}(p,q)}\,n^2 \tag{2} \]
where $D = f_{xx} f_{yy} - (f_{xy})^2$ is called the discriminant of $f$ (i.e., expanding (2)
will result in (1)).
If $D(p,q) > 0$, then $z''(0)$ has the same sign as $f_{xx}(p,q)$ in all directions
$\mathbf{u} = \langle m, n \rangle$, thus implying a maximum if $f_{xx}(p,q) < 0$ and a minimum if
$f_{xx}(p,q) > 0$. However, if $D(p,q) < 0$, then choosing $m = 1$ and $n = 0$ yields
$z''(0) = f_{xx}(p,q)$, whereas choosing $m = -f_{xy}(p,q)/f_{xx}(p,q)$ and $n = 1$ yields $z''(0) = D(p,q)/f_{xx}(p,q)$, which has the opposite sign, thus
implying a saddle. These observations lead to the following theorem:
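The claim that expanding (2) gives back (1) can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the second partials are replaced by random numbers, and $f_{xx}$ is kept away from 0 so that (2) is defined.

```python
import random

def z2_expanded(m, n, fxx, fxy, fyy):
    """Equation (1): m^2 fxx + 2 m n fxy + n^2 fyy."""
    return m * m * fxx + 2 * m * n * fxy + n * n * fyy

def z2_completed(m, n, fxx, fxy, fyy):
    """Equation (2), the completed square; requires fxx != 0."""
    disc = fxx * fyy - fxy**2                 # the discriminant D
    return fxx * (m + fxy / fxx * n) ** 2 + disc / fxx * n * n

random.seed(0)
max_gap = 0.0
for _ in range(1000):
    m, n, fxy, fyy = (random.uniform(-2, 2) for _ in range(4))
    fxx = random.choice([-1, 1]) * random.uniform(0.5, 2)   # keep fxx away from 0
    gap = abs(z2_expanded(m, n, fxx, fxy, fyy) - z2_completed(m, n, fxx, fxy, fyy))
    max_gap = max(max_gap, gap)
print("largest difference between (1) and (2):", max_gap)

# With D < 0 (here fxx = 1, fxy = 2, fyy = 1, so D = -3) two slices disagree in sign.
print(z2_expanded(1, 0, 1, 2, 1), z2_expanded(-2, 1, 1, 2, 1))
```

The largest difference should be at the level of floating-point rounding, and the last line shows one slice with $z''(0) = f_{xx}$ and another with $z''(0) = D/f_{xx}$, which is the saddle argument in miniature.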
Second Derivative Test: If $(p,q)$ is a critical point of a function
$f(x,y)$ whose second derivatives exist and are continuous near $(p,q)$, then

Discriminant       2nd derivative          Result
$D(p,q) > 0$       $f_{xx}(p,q) > 0$       $f(x,y)$ has a local minimum at $(p,q)$
$D(p,q) > 0$       $f_{xx}(p,q) < 0$       $f(x,y)$ has a local maximum at $(p,q)$
$D(p,q) < 0$                               $f(x,y)$ has a saddle at $(p,q)$

However, if $D(p,q) = 0$, then no information about $f(x,y)$ is obtained.
EXAMPLE 2 Identify the extrema and saddle points of $f(x,y) = x^2 - y^2$.
Solution: Since $f_x = 2x$ and $f_y = -2y$, the only critical point
is $(0,0)$. However, $f_{xx} = 2$, $f_{yy} = -2$, and $f_{xy} = 0$, so that the
discriminant of $f$ is
\[ D = f_{xx} f_{yy} - (f_{xy})^2 = (2)(-2) - 0^2 = -4 < 0 \]
Thus, $f(x,y) = x^2 - y^2$ has a saddle at $(0,0)$.
EXAMPLE 3 Find the extrema and saddle points of $f(x,y) = x^3 - 3xy + y^3$.
Solution: In example 1, we showed that the critical points of $f$ are
$(0,0)$ and $(1,1)$. Since $f_x(x,y) = 3x^2 - 3y$ and $f_y(x,y) = -3x + 3y^2$,
the second derivatives of $f(x,y)$ are
\[ f_{xx} = 6x, \qquad f_{xy} = -3, \qquad f_{yy} = 6y \]
Thus, the discriminant is
\[ D(x,y) = (6x)(6y) - (-3)^2 = 36xy - 9 \]
At $(0,0)$, we have $D(0,0) = 0 - 9 = -9 < 0$. Thus, $f$ has a saddle
at $(0,0)$. At $(1,1)$, we have
\[ D(1,1) = 36 \cdot 1 \cdot 1 - 9 = 27 > 0 \]
Moreover, $f_{xx}(1,1) = 6 > 0$, so $f$ has a local minimum at $(1,1)$.
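The decision procedure of the second derivative test is mechanical enough to write down as a short function. This is a minimal sketch, assuming the second partials have already been evaluated at a critical point; the name `classify` and the returned labels are illustrative choices, not notation from the text.

```python
def classify(fxx, fxy, fyy):
    """Apply the second derivative test to second partials evaluated at a critical point."""
    disc = fxx * fyy - fxy**2            # discriminant D = fxx*fyy - fxy^2
    if disc > 0:
        # D > 0 forces fxx != 0, so the sign of fxx decides the case.
        return "local minimum" if fxx > 0 else "local maximum"
    if disc < 0:
        return "saddle"
    return "no information"              # D = 0: the test is inconclusive

# Example 2: f = x^2 - y^2 at (0, 0), where fxx = 2, fxy = 0, fyy = -2.
print(classify(2, 0, -2))

# Example 3: f = x^3 - 3xy + y^3, where fxx = 6x, fxy = -3, fyy = 6y.
for (x, y) in [(0, 0), (1, 1)]:
    print((x, y), classify(6 * x, -3, 6 * y))
```

Run on the values from Examples 2 and 3, the function reports a saddle at $(0,0)$ in both examples and a local minimum at $(1,1)$, matching the hand computations.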
EXAMPLE 4 Find the local extrema and saddle points of
\[ f(x,y) = x \sin(xy) \]
Solution: The first partial derivatives are
\[ f_x = \sin(xy) + xy\cos(xy), \qquad f_y = x^2 \cos(xy) \]
Setting $f_y = 0$ yields either $x = 0$ or $\cos(xy) = 0$, the latter of which
implies that
\[ xy = \frac{\pi}{2} + n\pi \]
for some integer $n$. At such points, we would have $f_x(x,y) = \sin(xy)$ either as
1 or $-1$ (but not 0). However, if $x = 0$, then
\[ f_x(0,y) = \sin(0) + 0 = 0 \]
which implies that both $f_x = 0$ and $f_y = 0$ at $(0,0)$, and in fact at every point $(0,y)$ on the $y$-axis (and nowhere
else). The second derivatives are
\[
\begin{aligned}
f_{xx} &= 2y\cos(xy) - xy^2 \sin(xy) \\
f_{xy} &= f_{yx} = 2x\cos(xy) - x^2 y \sin(xy) \\
f_{yy} &= -x^3 \sin(xy)
\end{aligned}
\]
and $f_{xx}(0,0) = f_{xy}(0,0) = f_{yy}(0,0) = 0$. Thus, the discriminant
is $D(0,0) = 0$, so the second derivative test provides no information
about the extrema or saddles of $f(x,y) = x\sin(xy)$ at $(0,0)$. At the other critical points $(0,y)$ we likewise have $f_{xy} = f_{yy} = 0$, so $D = 0$ there as well.
Although it appears that there is a saddle at $(0,0)$ in example 4, there is no way
of determining this using the second derivative test. Likewise, $g(x,y) = x^4 + y^4$
is positive everywhere except for $g(0,0) = 0$, so clearly $g(x,y)$ has a minimum
at $(0,0)$. But $g_{xx}(0,0) = g_{xy}(0,0) = g_{yy}(0,0) = 0$ implies that $D(0,0) = 0$,
so the minimum cannot be identified using the second derivative test.
Check your reading: Does p (x; y) = x4 y 4 have any local extrema that
can be identified using the second derivative test?
Linear Systems and Quadratic Extrema
Many applications involve quadratic functions, where a quadratic function is a
function that is a polynomial of total degree 2 in the variables $x$ and $y$. When a quadratic
function has a critical point, it must be the solution to a system of simultaneous
linear equations (also known as a linear system) of the form
\[
\begin{aligned}
ax + by &= r \\
cx + dy &= s
\end{aligned}
\]
One way of solving a linear system is to multiply the first equation by $-c$,
multiply the second by $a$, and then add the two equations to eliminate $x$:
\[
\begin{aligned}
-acx - bcy &= -rc \\
acx + ady &= sa \\
(ad - bc)\,y &= sa - rc
\end{aligned}
\]
After solving for $y$ (which requires $ad - bc \ne 0$), substitution can be used to determine $x$. Or any of a number
of other variations may be used instead.
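The elimination just described can be sketched in a few lines. This is an illustrative helper, not part of the original text: the name `solve_2x2` is hypothetical, the coefficients are assumed to be integers (or fractions), and exact arithmetic is used so that no rounding enters.

```python
from fractions import Fraction

def solve_2x2(a, b, r, c, d, s):
    """Solve a*x + b*y = r, c*x + d*y = s; returns None when ad - bc = 0."""
    det = a * d - b * c
    if det == 0:
        return None                       # no unique solution
    y = Fraction(s * a - r * c, det)      # from (ad - bc) y = sa - rc
    x = Fraction(r * d - s * b, det)      # the analogous elimination of y
    return x, y

# Illustrative system: 2x + y = 5 and x + 3y = 10.
x, y = solve_2x2(2, 1, 5, 1, 3, 10)
print(x, y)
```

For the illustrative system the solution is $x = 1$, $y = 3$, which can be confirmed by substituting back into both equations.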
EXAMPLE 5 Find the point(s) on the plane $z = x + y - 3$
that are closest to the origin.
Solution: To begin with, we let $f$ denote the square of the distance
from a point $(x,y,z)$ to the origin. Consequently,
\[ f = x^2 + y^2 + z^2 \]
Substituting $z = x + y - 3$ thus yields
\[ f(x,y) = x^2 + y^2 + (x + y - 3)^2 \]
Since $f_x = 4x + 2y - 6$ and $f_y = 2x + 4y - 6$, we must solve
\[ 4x + 2y = 6, \qquad 2x + 4y = 6 \]
Multiplying the second equation by $-2$ and adding yields
\[
\begin{aligned}
4x + 2y &= 6 \\
-4x - 8y &= -12 \\
0x - 6y &= -6
\end{aligned}
\]
so that $y = 1$. Similarly, we find that $x = 1$, so the critical point
is $(1,1)$. Moreover, $f_{xx} = 4$, $f_{xy} = 2$, and $f_{yy} = 4$, so that the
discriminant is
\[ D = f_{xx} f_{yy} - f_{xy}^2 = 16 - 4 = 12 > 0 \]
Since $f_{xx} = 4 > 0$, every "slice" is concave up and correspondingly, $f$ has a minimum at $(1,1)$. Substitution yields
\[ z = 1 + 1 - 3 = -1 \]
so that $(1,1,-1)$ is the point in the plane $z = x + y - 3$ that is closest
to the origin.
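As a sanity check on Example 5, one can compare the squared distance at the critical point with its values at nearby points. The sketch below is only a numerical spot check under an arbitrary choice of grid (steps of 0.1 within 2 units of the critical point); it does not replace the second derivative test.

```python
def f(x, y):
    """Squared distance from the point (x, y, x + y - 3) on the plane to the origin."""
    return x**2 + y**2 + (x + y - 3) ** 2

center = f(1, 1)                                   # value at the critical point (1, 1)
steps = [k / 10 for k in range(-20, 21)]           # offsets -2.0, -1.9, ..., 2.0
smallest_nearby = min(
    f(1 + dx, 1 + dy)
    for dx in steps
    for dy in steps
    if (dx, dy) != (0, 0)                          # skip the critical point itself
)
print(center, smallest_nearby)
```

Every sampled neighbor gives a larger squared distance than the value 3 at $(1,1)$, as the test predicts for a minimum.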
One of the most important applications in statistics is finding the equation of
the line that best fits a data set of the form
\[ (x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n) \]
where by best fit we mean the line which produces the least error. Specifically,
the $j$th error or residual in approximating the data set with the line $y = mx + b$
is
\[ \varepsilon_j = m x_j + b - y_j \]
Thus, $\varepsilon_j^2$ is the square of the vertical distance from the point to the line.
We then define the least squares line for the data set to be the line with the
slope $m$ and the $y$-intercept $b$ that minimizes the total squared error
\[ E(m,b) = \sum_{j=1}^{n} \left(m x_j + b - y_j\right)^2 \]
That is, the least squares line minimizes the sum of the squares of the residuals.
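As a sketch of what minimizing $E(m,b)$ involves for a general data set (this derivation is not spelled out in the text; it simply applies the critical point conditions $E_m = 0$ and $E_b = 0$ to the sum above):

```latex
\begin{aligned}
E_m &= \sum_{j=1}^{n} 2x_j\,(m x_j + b - y_j) = 0
  &&\Longrightarrow\quad m\sum_{j=1}^{n} x_j^2 + b\sum_{j=1}^{n} x_j = \sum_{j=1}^{n} x_j y_j \\
E_b &= \sum_{j=1}^{n} 2\,(m x_j + b - y_j) = 0
  &&\Longrightarrow\quad m\sum_{j=1}^{n} x_j + n\,b = \sum_{j=1}^{n} y_j
\end{aligned}
```

These are two linear equations in the unknowns $m$ and $b$, so the critical point of $E$ is the solution of a linear system of the kind discussed earlier in this section.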
EXAMPLE 6 Find the least squares line for the data set $(1,1)$, $(2,3)$, $(3,5)$,
and $(4,4)$.
Solution: To find $E(m,b)$, we calculate the squares of the residuals
for each of the data points and then compute their sum:
\[
\begin{aligned}
\varepsilon_1^2 &= (m \cdot 1 + b - 1)^2 = m^2 + 2mb - 2m + b^2 - 2b + 1 \\
\varepsilon_2^2 &= (m \cdot 2 + b - 3)^2 = 4m^2 + 4mb - 12m + b^2 - 6b + 9 \\
\varepsilon_3^2 &= (m \cdot 3 + b - 5)^2 = 9m^2 + 6mb - 30m + b^2 - 10b + 25 \\
\varepsilon_4^2 &= (m \cdot 4 + b - 4)^2 = 16m^2 + 8mb - 32m + b^2 - 8b + 16 \\
E(m,b) &= 30m^2 + 20mb - 76m + 4b^2 - 26b + 51
\end{aligned}
\]
The first partial derivatives of $E(m,b)$ are
\[ E_m(m,b) = 60m + 20b - 76 \quad\text{and}\quad E_b(m,b) = 20m + 8b - 26 \]
Thus, the critical points must satisfy
\[
\begin{aligned}
60m + 20b &= 76 \\
20m + 8b &= 26
\end{aligned}
\]
Multiplying the latter by $-3$ and adding yields
\[
\begin{aligned}
60m + 20b &= 76 \\
-60m - 24b &= -78 \\
0m - 4b &= -2
\end{aligned}
\]
Thus, $b = 0.5$ and likewise, we find that $m = 1.1$.
The second derivatives of $E(m,b)$ are
\[ E_{mm} = 60, \qquad E_{mb} = 20, \qquad E_{bb} = 8 \]
and as a result, the discriminant is
\[ D = 60 \cdot 8 - (20)^2 = 80 > 0 \]
which, together with $E_{mm} = 60 > 0$, implies that $E(m,b)$ has a minimum at $m = 1.1$ and $b = 0.5$.
Thus, the least squares line for the data set $(1,1)$, $(2,3)$, $(3,5)$, and
$(4,4)$ is $y = 1.1x + 0.5$.
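The computation in Example 6 can be automated for any small data set. The following is a minimal sketch rather than the method of the associated worksheet: the function name `least_squares_line` is hypothetical, and the closed-form expressions are the solution of the two linear equations obtained by setting $E_m = 0$ and $E_b = 0$ for a general data set. Integer data and exact fractions are assumed so that the result can be compared with the hand computation.

```python
from fractions import Fraction

def least_squares_line(points):
    """Return (m, b) minimizing sum((m*x + b - y)**2) over the data points."""
    n = len(points)
    sx = sum(x for x, _ in points)            # sum of x_j
    sy = sum(y for _, y in points)            # sum of y_j
    sxx = sum(x * x for x, _ in points)       # sum of x_j^2
    sxy = sum(x * y for x, y in points)       # sum of x_j * y_j
    det = n * sxx - sx * sx                   # zero only if all x-values coincide
    if det == 0:
        raise ValueError("x-values must not all be equal")
    m = Fraction(n * sxy - sx * sy, det)
    b = Fraction(sxx * sy - sx * sxy, det)
    return m, b

m, b = least_squares_line([(1, 1), (2, 3), (3, 5), (4, 4)])
print(float(m), float(b))   # 1.1 0.5
```

For the data of Example 6 this reproduces the slope $m = 1.1$ and intercept $b = 0.5$ found by hand.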
Typically, due to the size of the data sets involved, least squares problems
are not solved by hand. Correspondingly, least squares
problems are treated in greater depth and with more examples in the associated
Maple worksheet.
Check your reading: Why did we use the square of the distance instead of
the actual distance in example 5?
Positive Definite Matrices and the Hessian
The second derivative test can be generalized to any number of variables, but
to do so requires that we reinterpret our results in section 2 in terms of the
Hessian of $f(x,y)$.
To begin with, let us notice that we obtained (1) in the form
\[ z'' = m^2 f_{xx} + 2mn\, f_{xy} + n^2 f_{yy} \]
by simplifying it from
\[ z'' = m\,(m f_{xx} + n f_{xy}) + n\,(m f_{yx} + n f_{yy}) \tag{3} \]
However, (3) is the inner product of the vector $\mathbf{u} = \langle m, n \rangle$ with the Hessian
applied to $\mathbf{u}$ as a column matrix:
\[
H_f\,\mathbf{u} =
\begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix}
\begin{bmatrix} m \\ n \end{bmatrix}
=
\begin{bmatrix} m f_{xx} + n f_{xy} \\ m f_{yx} + n f_{yy} \end{bmatrix}
\]
That is, (1) in section 2 is in actuality given by
\[ z''(0) = \mathbf{u} \cdot H_f(p,q)\,\mathbf{u} \]
If $z''(0) > 0$ for every direction $\mathbf{u}$, then all the vertical slices are concave up at $(p,q)$ and thus
$f(x,y)$ has a local minimum at $(p,q)$. If $z''(0) < 0$ for every direction $\mathbf{u}$, then $f(x,y)$ has a local
maximum at $(p,q)$. But if $z''(0)$ is negative for some directions $\mathbf{u}$ and positive
for other choices of $\mathbf{u}$, then $f(x,y)$ has a saddle at $(p,q)$. This motivates the
following definition:
Definition 8.3: An $n \times n$ matrix $A$ is positive definite if
\[ \mathbf{u} \cdot A\mathbf{u} > 0 \]
for all $n$-dimensional vectors $\mathbf{u} \ne \mathbf{0}$. Correspondingly, if $-A$ is positive definite, then $A$ itself is said to be negative definite. If $\mathbf{u} \cdot A\mathbf{u}$
is negative for some vectors $\mathbf{u}$ and positive for others, then $A$ is not
definite.
The second derivative test then corresponds to the definiteness (or lack thereof)
of the Hessian of $f$ at a critical point $(p,q)$. Moreover, our discussion in section
2 that led to the definition of the discriminant can be restated as a theorem:
Theorem 8.4: Let $A$ be the symmetric matrix
\[ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \]
Then $A$ is positive definite if and only if $a_{11} > 0$ and $\det(A) > 0$.
Indeed, notice that the discriminant $D = f_{xx} f_{yy} - f_{xy}^2$ is the determinant of the
Hessian matrix. Moreover, because a $2 \times 2$ matrix satisfies $\det(A) = \det(-A)$
(which is not true for $3 \times 3$ matrices), a matrix $A$ is negative definite if $a_{11} < 0$
and $\det(A) > 0$, thus allowing us to restate the second derivative test:
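The $2 \times 2$ criterion can be compared against the definition of definiteness directly. The sketch below is illustrative only: both function names are hypothetical, the matrix is assumed symmetric, and sampling unit vectors gives numerical evidence about the sign of $\mathbf{u} \cdot A\mathbf{u}$ rather than a proof.

```python
import math

def definiteness_2x2(a11, a12, a22):
    """Classify the symmetric matrix [[a11, a12], [a12, a22]] using a11 and det(A)."""
    det = a11 * a22 - a12 * a12
    if det > 0:
        return "positive definite" if a11 > 0 else "negative definite"
    if det < 0:
        return "not definite"
    return "inconclusive"                 # det(A) = 0: neither case applies

def quadratic_form_range(a11, a12, a22, samples=720):
    """Smallest and largest value of u . A u over sampled unit vectors u = (cos t, sin t)."""
    values = []
    for k in range(samples):
        t = 2 * math.pi * k / samples
        m, n = math.cos(t), math.sin(t)
        values.append(a11 * m * m + 2 * a12 * m * n + a22 * n * n)
    return min(values), max(values)

for entries in [(2, 0, 6), (-2, 0, -6), (2, 0, -6)]:
    print(entries, definiteness_2x2(*entries), quadratic_form_range(*entries))
```

In each case the sign pattern of the sampled quadratic form agrees with the label: all positive, all negative, or mixed.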
Second Derivative Test for 2 Variables: If $(p,q)$ is a critical
point of a function $f(x,y)$ whose second derivatives exist and are continuous near $(p,q)$,
then

Discriminant       2nd derivative          Hessian $H_f(p,q)$      Result: $f(x,y)$ has a
$D(p,q) > 0$       $f_{xx}(p,q) > 0$       positive definite       local minimum at $(p,q)$
$D(p,q) > 0$       $f_{xx}(p,q) < 0$       negative definite       local maximum at $(p,q)$
$D(p,q) < 0$                               not definite            saddle at $(p,q)$

If $\det H_f(p,q) = 0$ (that is, $D(p,q) = 0$), then the Hessian says nothing about the extrema at $(p,q)$.
EXAMPLE 7 Use the second derivative test to find the extrema of
\[ f(x,y) = x^2 + y^3 - 3y \]
Solution: Since $f_x = 2x$ and $f_y = 3y^2 - 3$, setting $f_x = 0$ and
$f_y = 0$ yields
\[ 2x = 0 \quad\text{and}\quad 3y^2 = 3 \]
Thus, $x = 0$ and $y = \pm 1$, so that the critical points are $(0,1)$ and
$(0,-1)$. Since $f_{xx} = 2$, $f_{xy} = 0$, and $f_{yy} = 6y$, the Hessian matrix
is
\[ H_f = \begin{bmatrix} 2 & 0 \\ 0 & 6y \end{bmatrix} \]
At $(0,1)$, we have
\[ H_f(0,1) = \begin{bmatrix} 2 & 0 \\ 0 & 6 \end{bmatrix} \]
Since $2 > 0$ and $\det(H_f(0,1)) = 12 > 0$, the Hessian $H_f(0,1)$ is
positive definite. Thus, $f(x,y)$ has a minimum at $(0,1)$.
However, at $(0,-1)$, the Hessian matrix is
\[ H_f(0,-1) = \begin{bmatrix} 2 & 0 \\ 0 & -6 \end{bmatrix} \]
and $\det(H_f(0,-1)) = -12 < 0$. Thus, $H_f(0,-1)$ is not definite and
$f(x,y)$ has a saddle at $(0,-1)$.
The second derivative test for functions of 3 or more variables is essentially the
same as for 2 variables, except that there is no discriminant for functions of 3
or more variables.
Second Derivative Test for n Variables: If $\mathbf{p} = (p_1, \ldots, p_n)$ is a
critical point of a function $f(x_1, \ldots, x_n)$ that is well-approximated
by its quadratic approximation near $\mathbf{p}$, then

If $H_f(p_1, \ldots, p_n)$ is      Then $f(x_1, \ldots, x_n)$ has a
positive definite                  local minimum at $(p_1, \ldots, p_n)$
negative definite                  local maximum at $(p_1, \ldots, p_n)$
not definite                       saddle at $(p_1, \ldots, p_n)$
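The proof of the second derivative test in higher dimensions follows directly
from the form of the quadratic approximation. Specifically, at a critical point
$\mathbf{p} = (p_1, \ldots, p_n)$, the gradient $\nabla f(\mathbf{p}) = \mathbf{0}$, so that the quadratic approximation is of the form
\[ Q(\mathbf{x}) = f(\mathbf{p}) + \frac{1}{2}\,(\mathbf{x} - \mathbf{p}) \cdot H_f(\mathbf{p})\,(\mathbf{x} - \mathbf{p}) \]
If $H_f(\mathbf{p})$ is positive definite, then $Q(\mathbf{x}) > f(\mathbf{p})$ for all $\mathbf{x} \ne \mathbf{p}$ in some neighborhood
of $\mathbf{p}$, and $f(\mathbf{x}) \approx Q(\mathbf{x})$ then implies that
\[ f(\mathbf{x}) \ge f(\mathbf{p}) \]
on that neighborhood. Thus, $f$ has a local minimum at $\mathbf{p}$.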
However, determining if a matrix is positive definite becomes increasingly
difficult as the number of dimensions increases, as the next theorem illustrates:
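Theorem 8.5 (Sylvester's Criterion): Let $A$ be the $n \times n$ real symmetric
matrix
\[
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn}
\end{bmatrix}
\]
Then $A$ is positive definite if and only if $a_{11} > 0$, the determinant of
the upper $2 \times 2$ matrix satisfies
\[ \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} > 0 \]
the determinant of the upper $3 \times 3$ matrix satisfies
\[ \det \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} > 0 \]
and in general all the upper $j \times j$ matrices have a positive determinant
for all $j = 1, \ldots, n$.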
We further explore and also provide examples for the second derivative test for
functions of more than 2 variables in the associated Maple worksheet.
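Sylvester's criterion translates directly into a short computation. What follows is a minimal sketch under stated assumptions, not the approach of the Maple worksheet: the function names are hypothetical, the matrix is assumed symmetric with integer or fractional entries, and determinants are computed by exact Gaussian elimination.

```python
from fractions import Fraction

def determinant(matrix):
    """Determinant by Gaussian elimination with exact Fraction arithmetic."""
    a = [[Fraction(v) for v in row] for row in matrix]
    size = len(a)
    det = Fraction(1)
    for col in range(size):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                    # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det

def leading_minors(matrix):
    """Determinants of the upper-left j x j blocks for j = 1, ..., n."""
    return [determinant([row[:j] for row in matrix[:j]])
            for j in range(1, len(matrix) + 1)]

def is_positive_definite(matrix):
    """Sylvester's criterion for a symmetric real matrix."""
    return all(minor > 0 for minor in leading_minors(matrix))

A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
print(leading_minors(A), is_positive_definite(A))
```

For the illustrative matrix `A` the upper determinants are 2, 3, and 4, so the criterion reports that `A` is positive definite.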
Exercises:
Find the local extrema and saddle points of the following functions:
1. $f(x,y) = x^2 + 4y^2$
2. $f(x,y) = x^2 - 3y^2$
3. $f(x,y) = x^2 + xy + 3x + 2y$
4. $f(x,y) = y^2 + xy - 2x - 2y$
5. $f(x,y) = x^2 - 4xy + y^2 + 6y$
6. $f(x,y) = x^2 + 2xy - y^2 + 3x + 4$
7. $f(x,y) = 3x^2 + 6xy + 7y^2 - 2x + 4y$
8. $f(x,y) = 4x^2 - 6xy + 5y^2 - 20x + 26y$
9. $f(x,y) = x^3 - 3x^2 + y^2$
10. $f(x,y) = x^4 + y^4 - y^2$
11. $f(x,y) = x^3 + 3xy + y^3$
12. $f(x,y) = x^3 + 6xy + y^3$
13. $f(x,y) = 4xy - x^4 - y^4$
14. $f(x,y) = x^4 + y^4 + 4xy$
15. $f(x,y) = x^4 + 2x^2 y - 2y$
16. $f(x,y) = x^4 - 2x^2 y + 2y$
17. $f(x,y) = \sin(x) + \cos(y)$
18. $f(x,y) = x\ln(x) + y\ln(y)$
19. $f(x,y) = x\sin(y)$
20. $f(x,y) = e^{2x}\cos(y)$
Find the slope and y-intercept of the least squares line for each of the following
data sets:
21. $(1,1)$, $(2,2)$, $(3,3)$
22. $(1,72)$, $(2,97)$, $(3,83)$
23. $(-1,1.2)$, $(-2,2.3)$, $(-3,3.4)$
24. $(1,1)$, $(2,1)$, $(3,1)$
25. $(1,75)$, $(2,79)$, $(3,85)$, $(4,81)$
26. $(1,75)$, $(2,79)$, $(3,81)$, $(4,85)$
27. Find the point in the plane $z = x + 1$ which is closest to the origin.
(Hint: minimize the square of the distance from a point $(x, y, x+1)$ to the origin
$(0,0,0)$.)
28. Find the point in the plane $z = x + 2y + 3$ which is closest to the origin.
29. Find the point in the plane $z = x + y$ closest to the point $(2,2,1)$.
30. Find the point(s) on the surface $z = (x-1)^2 + y^2$ closest to the origin.
31. What dimensions of a rectangular box with a surface area of 64 in² lead
to a maximum volume?
32. What dimensions of a rectangular box with a volume of 64 in³ lead to
a minimum surface area?
33. Acme sporting goods collected the following set of data relating price
charged for a racket, x, to the number of rackets per week sold at that price.

x = price            $50    $55    $60    $65
y = weekly sales      18     15     10      6

Fit this data to a linear demand function y = mx + b.
34. Repeat exercise 33 given the data set

x = price            $50    $55    $60    $65
y = weekly sales      24     21     18     15
35. A house with width $x$, length $y$, and height $z$ is to have a roofline with
a height of 25 ft.
If the house is to have a total floor space of 2000 ft², what values of $x$, $y$, and $z$
minimize the sum of the area of the sides and the roof?
36. If the roof costs 3 times more than the sides of the house, then what
values of $x$, $y$, $z$ in problem 35 minimize the cost of the house?
37. Suppose that $\mathbf{r}(t) = \mathbf{p} + t\mathbf{u}$ and $\mathbf{L}(s) = \mathbf{q} + s\mathbf{w}$ where $\mathbf{p}$, $\mathbf{u}$, $\mathbf{q}$, and $\mathbf{w}$
are constant vectors (i.e., $\mathbf{r}(t)$ and $\mathbf{L}(s)$ are straight lines). Let $E(s,t)$
denote the total squared distance between $\mathbf{r}(t)$ and $\mathbf{L}(s)$. Does $E(s,t)$ always
have a minimum? What is significant about any extrema or saddle points of
$E(s,t)$ when the two lines do not intersect?
38. A "ray" travels in a line from the point $(4,2)$ to the x-axis, is reflected
in a straight line to the y-axis, and then is reflected again to the point $(2,3)$.
What is the shortest possible path for the ray to travel in this manner from
$(4,2)$ to $(2,3)$?
39. Discussion: Explain why every point on the unit circle is a critical
point of
\[ f(x,y) = \left(x^2 + y^2 - 1\right)^2 \]
Does $f(x,y)$ have all saddle points on the unit circle, or does it have all minima
on the unit circle? Explain.
40. Find the point(s) on the surface $z = 1 - x^2 - y^2$ closest to the origin.
Why are there infinitely many of them?
41. Write to Learn: Write a short essay in which you revisit section 2 but
instead assume that $f_{yy}(p,q)$ is nonzero and subsequently complete the square
in $n$. What does this version of the second derivative test look like?
42. *Write to Learn: The key to the proof of the second derivative test
is (2), which is
\[ z''(0) = f_{xx}(p,q)\left(m + \frac{f_{xy}(p,q)}{f_{xx}(p,q)}\,n\right)^2 + \frac{D(p,q)}{f_{xx}(p,q)}\,n^2 \]
Although we say there is "no information" when $D(p,q) = 0$, is that completely
correct? If $f_{xx}(p,q) > 0$ but $D(p,q) = 0$, then what does this imply about the
possibility of an extremum at $(p,q)$? How would we explore what happens for
the one choice of $m$ for which $z''(0) = 0$? Is this one case enough to derail the
entire theorem? Write a short essay addressing these possibilities.
43. *Suppose a house with a height of 10 feet at each corner is to have a
total floor space of 2000 ft², and suppose that ps is the cost per square foot of
the sides and that pr is the cost per square foot of the roof.
What width x, height y, and pitch of the roof minimize the cost of the house?
What shape should the house have if ps pr ? What should the dimensions of
the house be if ps = 2pr ?