September Math Course: Multivariate Calculus

Arina Nikandrova*

1 Functions

A function y = f(x), where x is either a scalar or a vector of several variables (x1, ..., xn), can be thought of as a "rule" which converts an input (denoted typically by x) into an output (denoted typically by y):

• y is a function of x if its graph can be drawn from left to right without "doubling back," i.e., only one value of y corresponds to each value of x.
• y is a continuous function of x if its graph can be drawn without removing your pencil from the page.
• y is a differentiable function of x if it is continuous and its graph contains no kinks.

In this part of the course we will focus on functions where the input consists of many variables. Such functions are common in economics.

Example 1. A consumer's utility is a function of all the goods he consumes. So if there are n goods, then his utility is a function of the quantities (x1, x2, ..., xn) he consumes. We represent this by writing u(x1, x2, ..., xn). A firm's production is a function of the quantities of all the inputs it uses. So, if (x1, x2, ..., xn) are the quantities of the inputs used by the firm and y is the level of output produced, then we have y = f(x1, x2, ..., xn), where f(·) is the production function.

* e-mail: [email protected]

[Figure 1: Slope of y = 2x]

2 First Order Derivative

2.1 First Order Derivative of Univariate Functions

Consider a function of one variable, f(x). If this function is differentiable at a given point x0, it has both a value (its "height"), y0 = f(x0), and a slope. The slope tells us the rate of change: how much y changes when x changes by a given amount.

Example 2. (Linear Function) The simplest function to consider is a linear function of the form y = ax + b. Start at any point (x0, y0) on the line and move along the line so that the x-coordinate increases by one unit. The corresponding change in the y-coordinate is called the slope of the line. The slope tells us the rate of change: how much y changes when x changes by a given amount. The defining characteristic of a line is that this rate of change is constant:

Δy/Δx = [a(x0 + 1) + b − (ax0 + b)] / [(x0 + 1) − x0] = a.

For non-linear functions the same change in x leads to different changes in y, depending on the starting point x0.

Example 3. Consider a quadratic function y = x^2. If we start at x0 = 1 and increase x by 1, then y changes by 3 (i.e., 4 − 1). If we start at x0 = 2, however, then increasing x by 1 changes y by 5 (i.e., 9 − 4). Thus the same change in x leads to different changes in y. Consequently, for non-linear functions we cannot define a global notion of the slope. However, it is possible to define a notion of the slope which is valid when the change in x is "small."

[Figure 2: Change in y = x^2 when x increases by 1 starting from x0 = 1 and x0 = 2]

Example 4. Consider the quadratic function y = x^2. The line y = 4x − 4 just touches the curve y = x^2 at the point (x, y) = (2, 4). This follows as, at x = 2, the curve gives y = 2^2 = 4 and the line gives y = 4 × 2 − 4 = 4. Such a line is called a tangent line. The tangent line has the property that it "looks the same as the function around the point at which it just touches the function." The tangent line shows the rate of change in y at a point for small changes in x. The slope of the tangent line at point x0 is called the derivative at the point x0. The derivative of a function f(x) at the point x0 is denoted by f′(x0).
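To connect the tangent-line picture with computation, here is a minimal numerical sketch (standard-library Python only) that approximates the derivative of f(x) = x^2 at x0 = 2 by the slope of a short secant line; the step size h and the helper name numerical_derivative are illustrative choices, not part of the notes themselves.

```python
def numerical_derivative(f, x0, h=1e-6):
    """Approximate f'(x0) by the slope of a short secant line."""
    return (f(x0 + h) - f(x0)) / h

f = lambda x: x ** 2

# The slope of the tangent to y = x^2 at x0 = 2 should be close to 4.
print(numerical_derivative(f, 2.0))  # approximately 4.0
```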
The total differential of f(x) at x0 represents the principal part of the change in the function y = f(x) with respect to changes in x and is defined by the following:

dy = f′(x0) dx.

The total differential is a way of understanding the local rate of change of the function f(x) around the point x0. That is, it is an algebraic way of denoting the slope of a function (hence the alternative notations for a derivative: y′, dy/dx, df/dx).

Example 5. (Production Costs) Imagine that y = c(x) represents the costs of production in £ and x the quantity produced by a firm. The derivative of c(·) at a given x0 tells us how costs change in response to a change in quantity, provided the change is small. For example, if we know that the derivative at x0 = 2 is 4, this tells us that if the quantity produced changes by a small amount dx, then the impact on cost is given approximately by the total differential dy = 4 dx. Economists have a special name for the derivative of the cost function: it is called marginal cost.

[Figure 3: Tangent of y = x^2 at the point (x, y) = (2, 4).]

[Figure 4: The total differential of f(x) at x0 represents the main part of the change in f(x) with respect to any – not necessarily small – changes in x.]

Since the rate of change along a curve is changing constantly, the derivative has to be computed separately at each possible value of x. The derivative is thus a local phenomenon: it tells us something about the rate of change in the neighborhood of a point, but it gives no information about the rate of change globally.

Example 6. The information that the derivative of y = x^2 (i.e., dy = 2x dx) at x = 2 is 4 tells us that the rate of change in y is 4 when x is "close" to 2. It does not give any information about the rate of change at x = 10, and so on.

Formally, the derivative can be thought of as a separate function of x, a slope or derivative function given by:

f′(x) ≡ lim_{h→0} [f(x + h) − f(x)] / h.

Given a function y = f(x), the derivative function simply associates to every x the slope of the tangent line at x. Typically, when we talk about the derivative, we mean the derivative as a function. So when we want to talk about the value of the derivative at a point x0, we shall mention it by saying "the derivative at x0 is ... ."

2.2 Rules of Differentiation

• Differentiation is linear: For any functions f and g and any real numbers a and b, the derivative of the function h(x) = a f(x) + b g(x) with respect to x is h′(x) = a f′(x) + b g′(x).
• Power function rule: The derivative of the power function h(x) = x^n is h′(x) = n x^(n−1). Special cases include:
  – Constant rule: if f is the constant function f(x) = c, for any number c, then for all x, f′(x) = 0.
  – If f(x) = x, then f′(x) = 1.
  These special cases imply that the derivative of an affine function is constant, i.e., if f(x) = ax + b, then f′(x) = a. This makes sense, as shifting a function doesn't change its slope, and so additive constants disappear.
• The product rule: For any functions f and g, the derivative of the function h(x) = f(x) g(x) with respect to x is h′(x) = f′(x) g(x) + f(x) g′(x).
• Quotient rule: The derivative of the function h(x) = f(x)/g(x), where g(x) ≠ 0, is:

  h′(x) = [f′(x) g(x) − f(x) g′(x)] / g(x)^2.

• The chain rule: The derivative of a composite function h(x) = f(g(x)) with respect to x is h′(x) = f′(g(x)) g′(x).
• The inverse function rule: If the function f has an inverse function g, meaning that g(f(x)) = x and f(g(y)) = y, then

  g′(y) = 1 / f′(g(y)).

• The basic rules for differentiating exponential and logarithmic functions:
  – The derivative of f(x) = e^x is f′(x) = e^x, where e ≈ 2.71828 is Euler's number.
  – The derivative of f(x) = ln x is f′(x) = 1/x, where ln is the natural logarithm with base e ≈ 2.71828.
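As a quick symbolic check of these rules, here is a small sketch using sympy (an assumed tool; the notes themselves do not prescribe any software). It verifies the product rule and the chain rule on illustrative functions of my own choosing.

```python
import sympy as sp

x = sp.symbols('x')
f = x**3
g = sp.exp(2*x)

# Product rule: d(fg)/dx = f'g + fg'
assert sp.simplify(sp.diff(f*g, x) - (sp.diff(f, x)*g + f*sp.diff(g, x))) == 0

# Chain rule for h(x) = (2x^2 + 1)^5: h'(x) = 5(2x^2 + 1)^4 * 4x
h = (2*x**2 + 1)**5
assert sp.simplify(sp.diff(h, x) - 5*(2*x**2 + 1)**4 * 4*x) == 0

print(sp.diff(f*g, x))  # 3*x**2*exp(2*x) + 2*x**3*exp(2*x)
```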
Intuition for the Chain Rule: Let demand be a function of price, q(p) = a − bp, and let price vary with time, so that p(t) = t^2. Then demand is a composite function that also depends on time, q(p(t)). How does demand vary with time?

• What is dq/dp? Omit the influence of t and imagine that we can vary p directly; then dq/dp = −b.
• What is dp/dt? By the power rule, dp/dt = 2t.
• Overall, we need to consider the chain reaction as a change in t leads to a change in p, in turn changing q:

  d q(p(t))/dt = (dq/dp)(dp/dt) = q′(p) p′(t) = −2bt,

  where dp/dt is the small change in p brought about by a small change in t, and dq/dp is the small change in q brought about by a small change in p.

To verify the validity, note that by substituting p(t) into q(p), we get quantity as a function of time, q(t) = a − bt^2, and q′(t) = −2bt.

Intuition for the Inverse Function Rule: If y = f(x) is a strictly monotonic (or 1:1) function, its inverse, x = f^(−1)(y), is also a function. Formally:

f^(−1)(y) = {x : y = f(x)}.

Thus the inverse is a function if to each value of y corresponds only one value of x; e.g., a parabola on an unrestricted domain is ruled out (why?).

Example 7. The inverse of y = f(x) = ax + b is the function g(y) = (y − b)/a. The inverse of y = f(x) = x^2, where x > 0, is the function g(y) = √y. We can think of these two functions as inverses: if we take x as the input, apply f to it and then pass this output through the function g, we get back x. Computationally, we just express x from the equation y = f(x) to obtain x = f^(−1)(y) ≡ g(y).

The derivatives of inverse functions are related to each other. If we apply the chain rule to both sides of x = g(f(x)), where g(·) ≡ f^(−1)(·):

g′(y) = 1 / f′(x).

However, for the above display to make sense we need to express x in terms of y on the RHS.

Example 8. If y = f(x) = x^2, where x > 0, then the derivative of its inverse is

g′(y) = 1/f′(x) = 1/(2x) = 1/(2√y),

where the last equality follows as, by the definition of the inverse, x = f^(−1)(y) = √y.

2.3 First Order Derivative of Multivariate Functions

We have considered functions of a single variable until now. Most economic problems involve more than one variable, so consider a function y = f(x1, x2, ..., xn).

Partial Derivatives. The partial derivative of f with respect to xi is the derivative of f with respect to xi treating all other variables as constants and is denoted by ∂f/∂xi or fi:

∂f/∂xi (x1, x2, ..., xn) ≡ lim_{h→0} [f(x1, ..., xi−1, xi + h, xi+1, ..., xn) − f(x1, x2, ..., xn)] / h.

In order to calculate partial derivatives, we can apply the usual rules of differentiation.

[Figure 5: Cobb-Douglas production function f(K, L) = K^0.5 L^0.5. (a) 3D graph; (b) cross section when L = 1.]

Example 9. Consider a Cobb-Douglas production function f(K, L) = K^α L^β, where K > 0 is capital input, L > 0 is labour input and 1 > α, β > 0 are some constants. Then,

∂f/∂K = α K^(α−1) L^β > 0
∂f/∂L = β K^α L^(β−1) > 0.

So, for a given labour input, more capital raises output and, for a given capital input, more labour raises output.
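A sketch of this computation in sympy (again an assumed tool; the symbol names mirror the example):

```python
import sympy as sp

K, L, alpha, beta = sp.symbols('K L alpha beta', positive=True)
f = K**alpha * L**beta

# Partial derivatives: treat the other variable as a constant.
print(sp.diff(f, K))  # K**alpha*L**beta*alpha/K, i.e. alpha*K**(alpha-1)*L**beta
print(sp.diff(f, L))  # K**alpha*L**beta*beta/L,  i.e. beta*K**alpha*L**(beta-1)
```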
Mathematically, the partial derivative of f with respect to xi tells us the rate of change when only the variable xi is allowed to change. Economically, the partial derivatives give us useful information:

• With a production function, the partial derivative with respect to the input xi tells us the marginal productivity of that factor, or the rate at which additional output can be produced by increasing xi, holding other factors constant.
• With a utility function, the partial derivative with respect to good xi tells us the rate at which the consumer's well-being increases when she consumes additional amounts of xi, holding constant her consumption of other goods, i.e., the marginal utility of that good.

Total Differentials. Partial derivatives are multivariate extensions of derivatives; total differentials are multivariate extensions of differentials. For functions of more than one independent variable, y = f(x1, x2, ..., xn), the partial differential of y with respect to any one of the variables xi is the principal part of the change in y resulting from a change dxi in that one variable. The partial differential is therefore (∂y/∂xi) dxi, involving the partial derivative of y with respect to xi. The sum of the partial differentials with respect to all of the independent variables is the total differential

dy = (∂y/∂x1) dx1 + · · · + (∂y/∂xn) dxn,

which is the principal part of the change in y resulting from changes in all independent variables.

To gain some intuition about total differentials,¹ suppose there are two variables and consider the plane y = a0 + a1 x1 + a2 x2. How does the function behave when we change x1 and x2? Clearly, if dx1 and dx2 are the amounts by which we change x1 and x2, we have dy = a1 dx1 + a2 dx2. Note furthermore that the partials are ∂y/∂x1 = a1 and ∂y/∂x2 = a2. We can then write the total change in y as:

dy = (∂y/∂x1) dx1 + (∂y/∂x2) dx2.

Rewriting this in matrix notation:

dy = [∂y/∂x1  ∂y/∂x2] [dx1; dx2].

In the case of the plane, the vector of all partial derivatives is given by [∂y/∂x1  ∂y/∂x2] = [a1  a2]. This vector tells us the rates of change in the directions x1 and x2.

¹ Recall that we motivated the notion of a derivative by saying that it was the slope of the line which "looked like the function around the point x0." When we have n variables, the natural notion of a "line" is given by the following linear function:

y = a0 + a1 x1 + a2 x2 + ... + an xn.   (1)

In general, the function (1) is referred to as the equation of a plane (it certainly is the equation of a plane when there are two variables, x1 and x2).

[Figure 6: Function f(x, y) = −x^2 − y^2 and its derivative. (a) The function (blue surface) and the tangent plane at the point (4, 5) (red surface). The tangent plane, given by z = −8(x − 4) − 10(y − 5) − 41, looks like f(x, y) = −x^2 − y^2 around (4, 5). The derivative of f(x, y) = −x^2 − y^2 gives the slopes in the two directions of the tangent plane. (b) Cross-section when y = 5: the slope of the red line represents the partial derivative of f(x, y) = −x^2 − y^2 with respect to x at the point (4, 5).]
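The tangent-plane idea in Figure 6 can be checked numerically. The following sketch (standard-library Python only; the step sizes are illustrative) compares the true change in f(x, y) = −x^2 − y^2 near (4, 5) with the total differential.

```python
def f(x, y):
    return -x**2 - y**2

# Partial derivatives of f at (4, 5): f_x = -2x = -8, f_y = -2y = -10.
fx, fy = -8.0, -10.0

dx, dy = 0.01, -0.02
exact_change = f(4 + dx, 5 + dy) - f(4, 5)
differential = fx * dx + fy * dy  # principal part of the change

print(exact_change, differential)  # close for small dx, dy
```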
Now consider a more general two-variable function, y = f(x1, x2). With a general function, the idea is to find a plane which looks locally like the function around the point (x1, x2). Since the partial derivatives give the rates of change in x1 and x2, it makes sense to pick the appropriate plane which passes through the point (x1, x2) and has slopes ∂y/∂x1 and ∂y/∂x2 in the two directions. The derivative of the function f(x1, x2) at (x1, x2) is simply the vector [∂y/∂x1  ∂y/∂x2], where the partial derivatives are evaluated at the point (x1, x2). We can interpret the derivative as the slopes in the two directions of the plane which looks "like the function" around the point (x1, x2).

For a general function of n variables, y = f(x1, x2, ..., xn), the derivative of f at the point (x1, x2, ..., xn) is the vector of partial derivatives [∂y/∂x1 ... ∂y/∂xn]. This vector defines a linear map, which is the best linear approximation of the function f near the point (x1, x2, ..., xn). This linear map is thus the generalization of the usual notion of derivative.

Example 10. For the function f(K, L) = K^α L^β, the vector of partial derivatives is

[∂f/∂K  ∂f/∂L] = [α K^(α−1) L^β   β K^α L^(β−1)].

Then the total differential of f is:

df = [α K^(α−1) L^β   β K^α L^(β−1)] [dK; dL].

Total Derivatives. While the partial derivative of f with respect to xi treats all other arguments of f as constants, the total derivative of f acknowledges that other arguments of f may also vary with xi due to some postulated relationship. Finding the total derivative relies on the chain rule.

Definition. Consider a function f(x, y, z, t), where x, y, and z depend on t. Then, the chain rule is given by:

df/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt) + (∂f/∂z)(dz/dt) + ∂f/∂t.

In particular, notice that

df/dt ≠ ∂f/∂t,

as t has a direct effect on f, given by ∂f/∂t, and an indirect effect through its effect on x, y and z.

Example 11. Consider a function

y = 3x − w^2,  where  x = 2w^2 + w + 4.

Here w has a direct effect on y, given by ∂y/∂w, and an indirect effect through its effect on x. Hence, the total derivative of y with respect to w is

dy/dw = (∂y/∂x)(dx/dw) + ∂y/∂w = 3(4w + 1) − 2w.

Note that unless w = −1/4,

dy/dw ≠ ∂y/∂w.

A more complicated example:

Example 12. Consider a function y = f(x1, x2, w), where x1 = g(w) and x2 = h(w). Here w has a direct effect on y, given by ∂y/∂w, and an indirect effect through its effect on x1 and x2. Hence, the total derivative of y with respect to w is

dy/dw = (∂y/∂x1)(dx1/dw) + (∂y/∂x2)(dx2/dw) + ∂y/∂w
      = f1(x1, x2, w) g′(w) + f2(x1, x2, w) h′(w) + f3(x1, x2, w).

Problem 1. Consider the function

z = (x^2 y − 10x − 1) / t^3,

where x = e^(1−y) and t = 3y.

1. Find the partial derivative of z with respect to y.
2. Find the total derivative of z with respect to y, dz/dy.

3 Unconstrained Optimization

3.1 Univariate Case

We will consider the following maximization problem

max_x f(x)

or minimization problem

min_x f(x).

First Order Conditions: Necessary Conditions for Local Extrema. If a differentiable function f(x) reaches its maximum or minimum at a point x*, then f′(x*) = 0. To see this, consider the total differential: dy = f′(x*) dx. If the function reaches a maximum or minimum at x*, then it must be impossible to increase or decrease the value of the function by small changes in x. However, if f′(x*) ≠ 0, then it is always possible to make y larger or smaller by appropriate (small) changes in x. Therefore, we must have f′(x*) = 0 at a maximum or a minimum.

Any point satisfying the condition f′(x*) = 0 may be referred to as a stationary point; when a point satisfying f′(x*) = 0 is a minimum or a maximum, it is referred to as a critical value or extremum.
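As a minimal illustration of finding stationary points (sympy assumed; the function f(x) = x^3 − 3x is my own illustrative choice), the following solves f′(x) = 0; classifying the resulting points is the subject of the second order conditions below.

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = x**3 - 3*x

# Stationary points: solutions of f'(x) = 0.
stationary = sp.solve(sp.diff(f, x), x)
print(stationary)  # [-1, 1]
```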
We need to distinguish between local (or relative) extrema and global extrema. Figure 7a illustrates the difference, which is also explained in Definition 1.

Definition 1. A point x* is called a global maximum of the function f(x) if f(x*) ≥ f(x) for all x in the domain of f. A point x* is called a local maximum of the function f(x) if there is a "small interval" centered at x* such that f(x*) ≥ f(x) for all x in this small interval. A point x* is called a global minimum of the function f(x) if f(x*) ≤ f(x) for all x in the domain of f. A point x* is called a local minimum of the function f(x) if there is a "small interval" centered at x* such that f(x*) ≤ f(x) for all x in this small interval.

The condition f′(x*) = 0 at a maximum or minimum is valid only if x* is in the "interior" of the domain of the function. This is because the argument for showing that f′(x*) = 0 is a necessary condition for x* to be a maximum or a minimum relies on the ability to make small changes in x around x*. However, at a "boundary point" we cannot make certain changes. For instance, if the function is defined for all x in the interval [a, b], then at a, we can only increase x, while at b, we can only decrease x. Hence, it is possible that the maximum (or minimum) occurs at a or b and yet this boundary point does not satisfy the necessary condition for maximization (or minimization). For example, in Figure 7a the global minimum of a function defined for x ∈ [0, 6] occurs at the point x = 0 and the global maximum occurs at the point x = 6, neither of which satisfies the first order condition f′(x*) = 0.

The condition f′(x*) = 0 is called a necessary condition because it cannot guarantee that x* is indeed a maximum or minimum. It is entirely possible that f′(x*) = 0 but x* is neither a maximum nor a minimum.

Example 13. Consider the function f(x) = (x + 2)^3 + 5. Note that f′(−2) = 0, but the point x = −2 is neither a maximum nor a minimum (see Figure 7b).

[Figure 7: The first order condition f′(x) = 0 is a necessary, but not sufficient, condition for local minima and maxima. (a) A function f(x) defined for x ∈ [0, 6]: each point where f′(x) = 0 corresponds to either a local minimum or a local maximum, but the condition f′(x) = 0 does not identify the global minimum or maximum. Moreover, the condition f′(x) = 0 on its own does not distinguish a local maximum from a local minimum. (b) The point where f′(x) = 0 is a point of inflection.]

Second Order Conditions: Sufficient Conditions for Local Extrema. The condition f′(x*) = 0 on its own does not distinguish local maxima from local minima. To tell whether a point x* is a local maximum or a local minimum, we need to look at the sign of the function f′(x) in the immediate neighborhood of x*, where the neighborhood is defined as points immediately to the left and immediately to the right of x*:
• The point x = x* is a local maximum if, in the neighborhood of x*, f′(x) is positive for x < x* and negative for x > x*;
• The point x = x* is a local minimum if, in the neighborhood of x*, f′(x) is negative for x < x* and positive for x > x*;
• The point x = x* is neither a local maximum nor a local minimum if, in the neighborhood of x*, f′(x) does not change sign.

[Figure 8: The second order conditions, i.e., the conditions on the sign of f″(x), are sufficient for determining local minima and maxima. (a) f(x) = −(x − 3)^2 + 4: the point x = 3 is a maximum, as f′(x) is decreasing (changes sign from positive to negative) in the neighborhood of x = 3. (b) f(x) = (x − 3)^2 + 4: the point x = 3 is a minimum, as f′(x) is increasing (changes sign from negative to positive) in the neighborhood of x = 3.]

An equivalent way to express the above conditions is to say that

• the point x = x* is a local maximum if, in the neighborhood of x*, f′(x) is a decreasing function;
• the point x = x* is a local minimum if, in the neighborhood of x*, f′(x) is an increasing function;
• the point x = x* is neither a local maximum nor a local minimum if, in the neighborhood of x*, f′(x) is neither increasing nor decreasing.

This last set of conditions can be expressed more succinctly in terms of second order derivatives, but it requires a few new definitions. Recall that the function

f′(x) ≡ lim_{h→0} [f(x + h) − f(x)] / h

is the first derivative of the function f. The first derivative indicates whether a function is increasing or decreasing. A function f(x) is weakly decreasing at a point x if f′(x) ≤ 0; a function f(x) is weakly increasing at a point x if f′(x) ≥ 0. If the inequalities are strict, then the function is strictly decreasing or strictly increasing.

Since the derivative itself is a function, we can take its derivative. This is called the second derivative and denoted d²f/dx² or f″(x). Formally,

d²f/dx² = d/dx (df/dx).

The second derivative indicates whether the first derivative of a function is increasing or decreasing, thereby describing the curvature of the function.

Definition 2. A function f(x) is called concave if f″(x) ≤ 0 at all points of its domain; a function f(x) is called convex if f″(x) ≥ 0 at all points of its domain. If the inequalities are strict, then the function is called strictly concave or strictly convex.

Example 14. The function f(x) = x^2 is convex on its domain; the function g(x) = ln x is concave on the domain x > 0.

A function may be neither concave nor convex on its entire domain.

Example 15. Consider f(x) = −2x^3/3 + 10x^2 + 5 defined for x ≥ 0. In this case, f″(x) = −4x + 20 and thus:

• for 0 < x ≤ 5, f″(x) ≥ 0 and the function is convex;
• for x > 5, f″(x) < 0 and the function is concave.

Definition 3. A function f(x) is called concave at x* if f″(x*) ≤ 0; a function f(x) is called convex at x* if f″(x*) ≥ 0.

Recall Figure 7b, where f′(−2) = 0, but the point x = −2 is neither a maximum nor a minimum. This point is called an inflection point.

Definition 4. A point where a function changes its curvature is called an inflection point.

[Figure 9: An example of (a) a strictly convex and (b) a strictly concave function. (a) f(x) = x^2 is convex for all x ∈ (−∞, ∞); (b) g(x) = ln x is concave for all x ∈ (0, ∞).]

[Figure 10: Function f(x) = −2x^3/3 + 10x^2 + 5: the point x = 5 is an inflection point, where the function changes its curvature from convex (for 0 < x < 5) to concave (for x > 5).]

As an aside, note that since the second derivative is also a function, we can also take its derivative. This is called the third derivative and denoted f‴(x) to indicate that this function is found by three successive operations of differentiation, starting with the function f. One can continue this process, but we will typically not go beyond the second derivative.
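A small sketch (sympy assumed) that computes successive derivatives and locates where the curvature of the Example 15 function changes:

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = -sp.Rational(2, 3)*x**3 + 10*x**2 + 5

f1 = sp.diff(f, x)      # first derivative: -2*x**2 + 20*x
f2 = sp.diff(f, x, 2)   # second derivative: 20 - 4*x

# Inflection point: where the second derivative changes sign.
print(sp.solve(f2, x))  # [5]
```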
Example 16. Suppose that f(x) = x^5. Then f′(x) = 5x^4, f″(x) = 20x^3 and f‴(x) = 60x^2.

The observation that the second derivative indicates whether the first derivative of a function is increasing or decreasing leads to the following set of sufficient conditions for identifying maxima and minima:

• If f′(x*) = 0 and f″(x*) < 0, then x* is a local maximum of f(x);
• If f′(x*) = 0 and f″(x*) > 0, then x* is a local minimum of f(x).

These conditions only identify a local maximum or minimum, not a global maximum or minimum. However, the local maxima of a function that is concave on its entire domain are also global maxima. Similarly, the local minima of globally convex functions are also global minima. That is:

• If f′(x*) = 0 and f″(x) < 0 for all x in the domain of f, then x* is a global maximum of f(x);
• If f′(x*) = 0 and f″(x) > 0 for all x in the domain of f, then x* is a global minimum of f(x).

The function depicted in Figure 8a is strictly concave on its entire domain, and thus the point x = 3 is a global maximum; the function depicted in Figure 8b is strictly convex on its entire domain, and thus the point x = 3 is a global minimum.

The sufficient conditions for local extrema require f″(x*) ≠ 0. When f″(x*) = 0, the point x* can be a minimum, a maximum, or neither of the two. In this case we need to use an N-th derivative test:

• If f′(x*) = 0, f″(x*) = 0, ..., f^(N−1)(x*) = 0 and f^(N)(x*) < 0, where N is even, then the point x* is a maximum;
• If f′(x*) = 0, f″(x*) = 0, ..., f^(N−1)(x*) = 0 and f^(N)(x*) > 0, where N is even, then the point x* is a minimum;
• If f′(x*) = 0, f″(x*) = 0, ..., f^(N−1)(x*) = 0 and f^(N)(x*) ≠ 0, where N is odd, then the point x* is a point of inflection.

Solved Examples

Example 17. Suppose the monopolist's profit function is given by

Π(q) = pq − c(q) = (100 − q) q − q^2.

The monopolist aims to maximize profit and thus solves:

max_q (100 − q) q − q^2.

From the necessary first order condition it follows that

Π′(q) = 100 − 4q = 0.

So q* = 25 is a candidate for a maximum. To check that this indeed is the maximum, we need to check the second order conditions for optimization. The second derivative Π″(q) = −4 < 0 for all q and, in particular, for q = 25. Hence q* = 25 is a global maximum.

Another economic example:

Example 18. Suppose that a firm minimizes its average cost, which is defined for q > 0 and is given by:

C(q) = 100/q + q.

Then the first-order condition implies:

C′(q) = −100/q^2 + 1 = 0.

Therefore, q* = 10 (negative output is not allowed). Since C″(q) = 200/q^3 > 0 for all q > 0, q* = 10 is a global minimum.
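Both solved examples can be replicated in a few lines of sympy (assumed tool):

```python
import sympy as sp

q = sp.symbols('q', positive=True)

# Example 17: monopolist's profit.
profit = (100 - q)*q - q**2
q_star = sp.solve(sp.diff(profit, q), q)[0]
print(q_star, sp.diff(profit, q, 2))  # 25, -4 (second derivative < 0: maximum)

# Example 18: average cost.
cost = 100/q + q
q_min = sp.solve(sp.diff(cost, q), q)[0]
print(q_min, sp.diff(cost, q, 2).subs(q, q_min))  # 10, 1/5 > 0: minimum
```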
3.2 Multivariate Case

Consider the general maximization problem:

max_{x1,...,xn} f(x1, x2, ..., xn).

The first order conditions for maximization require the first order differential to be zero at the optimal point. That is, a vector of small changes (dx1, dx2, ..., dxn) should not change the value of the function. We thus have

df = (∂f/∂x1) dx1 + · · · + (∂f/∂xn) dxn = 0.

This is satisfied for all such changes if

∂f/∂x1 = 0, ∂f/∂x2 = 0, ..., ∂f/∂xn = 0.

These conditions are necessary conditions, and they must also hold for minimization problems. As in the single variable case, we are really after maxima and minima. The first order conditions alone cannot distinguish between local maxima and local minima. Likewise, the first order conditions cannot identify whether a candidate solution is a local or a global extremum. We thus need second order conditions to help us. For a point to be a (local) maximum, we must have d²f < 0 for any vector of (small) changes (dx1, dx2, ..., dxn); that is, f needs to be a (locally) strictly concave function. Similarly, for a point to be a (local) minimum, we must have d²f > 0 for any vector of (small) changes (dx1, dx2, ..., dxn); that is, f needs to be a (locally) strictly convex function.²

Definition 5. A point (x1*, x2*, ..., xn*) is a local maximum if for all i

∂f/∂xi |_{x1 = x1*, ..., xn = xn*} = 0

and the function f(x1, x2, ..., xn) is concave at (x1*, x2*, ..., xn*). A point (x1*, x2*, ..., xn*) is a local minimum if for all i

∂f/∂xi |_{x1 = x1*, ..., xn = xn*} = 0

and the function f(x1, x2, ..., xn) is convex at (x1*, x2*, ..., xn*). A point (x1*, x2*, ..., xn*) is a global maximum if for all i

∂f/∂xi |_{x1 = x1*, ..., xn = xn*} = 0

and the function f(x1, x2, ..., xn) is concave for all (x1, x2, ..., xn) (see Figure 11a). A point (x1*, x2*, ..., xn*) is a global minimum if for all i

∂f/∂xi |_{x1 = x1*, ..., xn = xn*} = 0

and the function f(x1, x2, ..., xn) is convex for all (x1, x2, ..., xn) (see Figure 11b). If at a point (x1*, x2*, ..., xn*) where all these partial derivatives are zero the function f is neither convex nor concave, then the point (x1*, x2*, ..., xn*) is a saddle point (see Figure 12).

² In the definition above, the notation ∂f/∂xi |_{x1 = x1*, ..., xn = xn*} should be understood as the partial derivative of f with respect to xi evaluated at the point x* = (x1*, x2*, ..., xn*).

Now we need tools for identifying whether a multivariate function is concave or convex.

Higher-Order Derivatives of Multivariate Functions. A single variable function f(x) is strictly concave if f″(x) < 0 and is strictly convex if f″(x) > 0. Notice that in the single-variable case, the second-order total differential is:

d²y = f″(x) (dx)².

Hence, we can (equivalently) define a function of one variable to be strictly concave if d²y < 0 and strictly convex if d²y > 0. The advantage of writing it in this way is that we can extend this definition to functions of many variables. A multivariate function f(x1, x2, ..., xn) is strictly concave if d²y < 0 and strictly convex if d²y > 0. This imposes certain restrictions on its second-order partial derivatives.

Second-order partial derivatives. Given a function f(x1, x2, ..., xn), the second-order derivative ∂²f/∂xi∂xj is the partial derivative of ∂f/∂xi with respect to xj. The above may suggest that the order in which the derivatives are taken matters and that the partial derivative of ∂f/∂xi with respect to xj is different from the partial derivative of ∂f/∂xj with respect to xi. While this can happen, it turns out that if the function f(x1, x2, ..., xn) is well-behaved, then the order of differentiation does not matter. This result is called Young's Theorem. We will be dealing with well-behaved functions for which Young's Theorem holds.
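A quick symbolic check of Young's Theorem for the Cobb-Douglas function used throughout (sympy assumed):

```python
import sympy as sp

K, L, alpha, beta = sp.symbols('K L alpha beta', positive=True)
f = K**alpha * L**beta

# Mixed partials taken in both orders agree (Young's Theorem).
f_KL = sp.diff(f, K, L)
f_LK = sp.diff(f, L, K)
print(sp.simplify(f_KL - f_LK))  # 0
```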
[Figure 11: An example of (a) a strictly concave and (b) a strictly convex function of two variables. (a) f(x1, x2) = −2x1^2 − 2x2^2: the function is concave for all (x1, x2), hence the point (x1, x2) = (0, 0) is a global maximum. (b) f(x1, x2) = 2x1^2 + 2x2^2: the function is convex for all (x1, x2), hence the point (x1, x2) = (0, 0) is a global minimum.]

[Figure 12: f(x1, x2) = 2x1^2 − 2x2^2: the function is convex in the direction of x1 and concave in the direction of x2, hence the point (x1, x2) = (0, 0) is a saddle point.]

Example 19. Consider a Cobb-Douglas production function f(K, L) = K^α L^β, where K > 0 is capital input, L > 0 is labor input and 1 > α, β > 0 are some constants. For this function we can evaluate the second-order partial derivative ∂²f/∂L∂K in two different ways. First, since

∂f/∂L = β K^α L^(β−1),

taking the partial derivative of this with respect to K, we get

∂²f/∂L∂K = ∂/∂K (∂f/∂L) = αβ K^(α−1) L^(β−1).

Alternatively, since

∂f/∂K = α K^(α−1) L^β,

∂²f/∂K∂L = ∂/∂L (∂f/∂K) = αβ K^(α−1) L^(β−1).

This illustrates Young's Theorem: no matter in which order we differentiate, we get the same answer. Note that if K > 0 and L > 0,

∂²f/∂K∂L > 0,

which means that the marginal productivity of labor (capital) increases as we add more capital (labor). At the same time, since α < 1 and K > 0, L > 0,

∂²f/∂K² = α(α − 1) K^(α−2) L^β < 0.

This means that the marginal productivity of capital decreases as we add more capital.

Concavity and convexity of a multivariate function. We say a function is concave if d²y ≤ 0 for all x and convex if d²y ≥ 0 for all x. If the function satisfies the stronger condition d²y < 0 for all x, then it is strictly concave. Analogously, if d²y > 0 for all x, it is strictly convex.

Consider a two-variable function, y = f(x1, x2). Its differential is:

dy = (∂y/∂x1) dx1 + (∂y/∂x2) dx2,

which again can be viewed as a function of x1 and x2. Taking a differential, we obtain

d(dy) = [(∂²y/∂x1²) dx1 + (∂²y/∂x2∂x1) dx2] dx1 + [(∂²y/∂x1∂x2) dx1 + (∂²y/∂x2²) dx2] dx2,

which after collecting terms yields:

d²y = (∂²y/∂x1²) (dx1)² + 2 (∂²y/∂x1∂x2) dx1 dx2 + (∂²y/∂x2²) (dx2)².

Thus, the second-order total differential depends on the second-order partial derivatives of f(x1, x2). For a general function y = f(x1, x2, ..., xn), one can use a similar procedure to get the formula for the second-order total differential. This is a little more complicated, but it can be written compactly as follows:

d²y = Σ_{i=1}^{n} Σ_{j=1}^{n} (∂²y/∂xi∂xj) dxi dxj.

As things stand, it is not clear how to go about verifying that the second-order total differential of a function of n variables is never positive or never negative. However, notice that we can write the second order differential of a function of two variables,

d²y = (∂²y/∂x1²) (dx1)² + 2 (∂²y/∂x1∂x2) dx1 dx2 + (∂²y/∂x2²) (dx2)²,

in matrix form in the following way:

d²y = [dx1  dx2] [ ∂²y/∂x1²     ∂²y/∂x1∂x2 ]
                 [ ∂²y/∂x2∂x1   ∂²y/∂x2²   ] [dx1; dx2].

The matrix of second-order partial derivatives

H ≡ [ ∂²y/∂x1²     ∂²y/∂x1∂x2 ]
    [ ∂²y/∂x2∂x1   ∂²y/∂x2²   ]

is called the Hessian matrix, which is symmetric by Young's Theorem. For a general function y = f(x1, x2, ..., xn),

d²y = [dx1  dx2  ...  dxn] [ ∂²y/∂x1²     ∂²y/∂x1∂x2   ...   ∂²y/∂x1∂xn ]
                           [ ∂²y/∂x2∂x1   ∂²y/∂x2²     ...   ∂²y/∂x2∂xn ]
                           [ ...          ...          ...   ...        ]
                           [ ∂²y/∂xn∂x1   ∂²y/∂xn∂x2   ...   ∂²y/∂xn²   ] [dx1; dx2; ...; dxn],

and thus

H ≡ [ ∂²y/∂x1²     ∂²y/∂x1∂x2   ...   ∂²y/∂x1∂xn ]
    [ ∂²y/∂x2∂x1   ∂²y/∂x2²     ...   ∂²y/∂x2∂xn ]
    [ ...          ...          ...   ...        ]
    [ ∂²y/∂xn∂x1   ∂²y/∂xn∂x2   ...   ∂²y/∂xn²   ].
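Hessians are easy to build symbolically; a sketch with sympy (assumed tool), here for the saddle-point function of Figure 12:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = 2*x1**2 - 2*x2**2

H = sp.hessian(f, (x1, x2))
print(H)  # Matrix([[4, 0], [0, -4]])
```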
Now it is clear that to determine whether a multivariate function is concave or convex, we need to know the sign of d²y, i.e., we are interested in the sign of the quadratic form

d²y = dx′ H dx,

where dx′ is (1 × n), H is (n × n) and dx is (n × 1), so that d²y is a scalar. For a given symmetric matrix H and for any x ∈ Rⁿ, five situations may arise:

Definition 6. An (n × n) matrix H is:

• positive definite if x′Hx > 0 for any (n × 1) vector x ∈ Rⁿ, x ≠ 0ₙ (note that x ≠ 0ₙ means that at least one element of x is not equal to 0);
• positive semidefinite if x′Hx ≥ 0 for any (n × 1) vector x ∈ Rⁿ, x ≠ 0ₙ;
• negative definite if x′Hx < 0 for any (n × 1) vector x ∈ Rⁿ, x ≠ 0ₙ;
• negative semidefinite if x′Hx ≤ 0 for any (n × 1) vector x ∈ Rⁿ, x ≠ 0ₙ;
• indefinite if x′Hx > 0 for at least one vector x ≠ 0ₙ and x′Hx < 0 for at least one vector x ≠ 0ₙ.

From the discussion above, if the Hessian is negative definite for all (x1, ..., xn), the function is strictly concave. If the Hessian is positive definite for all (x1, ..., xn), the function is strictly convex. So to determine whether a function is concave or convex, we need to be able to determine whether the Hessian matrix is negative definite or positive definite. We can classify a symmetric matrix H in one of the above categories using either the eigenvalue test or the principal minor test.

Eigenvalue Test. The quadratic form x′Hx is:

• positive (semi)definite if and only if all the eigenvalues of H are strictly positive (non-negative);
• negative (semi)definite if and only if all the eigenvalues of H are strictly negative (non-positive).

Example 20. Consider the matrix

A = [ 1  4  6 ]
    [ 4  2  1 ]
    [ 6  1  6 ].

The characteristic equation is

det(A − λI) = (1 − λ)(λ² − 8λ + 11) − 4(18 − 4λ) + 6(6λ − 8) = 0.

This equation of order three with no obvious factorization seems difficult to solve!
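Numerically, however, the eigenvalue test is straightforward; a sketch with numpy (an assumed dependency):

```python
import numpy as np

A = np.array([[1, 4, 6],
              [4, 2, 1],
              [6, 1, 6]])

eigenvalues = np.linalg.eigvalsh(A)  # eigvalsh: eigenvalues of a symmetric matrix
print(eigenvalues)  # mixed signs, so A is indefinite
```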
Principal Minor Test

Definition 7. Let H be an n × n matrix. An i-th order principal minor of H is the determinant of a submatrix of H obtained by deleting n − i rows and the n − i columns with the same indices. The i-th (order) leading principal minor of H is the determinant of the submatrix obtained from H by deleting the last n − i rows and columns.

Example 21. Let A be a 3 × 3 matrix

A = [ a11  a12  a13 ]
    [ a21  a22  a23 ]
    [ a31  a32  a33 ].

Principal Minors. There is one third order principal minor of A: det(A). There are three second order principal minors:

• det [a11 a12; a21 a22], where the submatrix in the minor's calculation is obtained by deleting the third row and third column of A;
• det [a11 a13; a31 a33], where the submatrix in the minor's calculation is obtained by deleting the second row and second column of A;
• det [a22 a23; a32 a33], where the submatrix in the minor's calculation is obtained by deleting the first row and first column of A.

There are also three first order principal minors: a11, formed by deleting the last two rows and columns; a22, formed by deleting the first and last rows and columns; and a33, formed by deleting the first two rows and columns.

Leading Principal Minors. The i-th leading principal minor is the determinant of the submatrix obtained from A by deleting all columns and all rows after the i-th. Thus

first l.p.m. = a11,
second l.p.m. = det [a11 a12; a21 a22],
third l.p.m. = det [a11 a12 a13; a21 a22 a23; a31 a32 a33].

Principal Minor Test:

• The quadratic form x′Hx is positive definite if and only if all leading principal minors of H are positive.
• The quadratic form x′Hx is negative definite if and only if the leading principal minors of H alternate in sign, the first being negative (i.e., the first is negative, the second is positive, the third is negative and so on; that is, the i-th order leading principal minor has the sign of (−1)^i).
• The quadratic form x′Hx is positive semidefinite if and only if every principal minor of H is ≥ 0.
• The quadratic form x′Hx is negative semidefinite if and only if every principal minor of H of odd order is ≤ 0 and every principal minor of even order is ≥ 0.

Note that in the first two cases, it is enough to check the inequality for all the leading principal minors (i.e., for 1 ≤ i ≤ n). In the last two cases, we must check all principal minors (i.e., for each i with 1 ≤ i ≤ n and for each of the (n choose i) principal minors of order i).

Example 22. The matrix

[ 1  1 ]
[ 1  4 ]

is positive definite. The matrix

[ −1   1 ]
[  1  −4 ]

is negative definite. The matrix

[ −1  1 ]
[  1  4 ]

is neither positive definite nor negative definite. The matrix

[ 1  4  6 ]
[ 4  2  1 ]
[ 6  1  6 ]

is indefinite.

In the case of a function of two variables, y = f(x1, x2):

• d²y is positive definite (and thus the function is convex) if

  ∂²y/∂x1² > 0  and  |H| = (∂²y/∂x1²)(∂²y/∂x2²) − (∂²y/∂x1∂x2)² > 0;

• d²y is negative definite (and thus the function is concave) if

  ∂²y/∂x1² < 0  and  |H| = (∂²y/∂x1²)(∂²y/∂x2²) − (∂²y/∂x1∂x2)² > 0.

Note that the condition |H| > 0 implies that ∂²y/∂x1² and ∂²y/∂x2² must have the same sign, for both positive definite and negative definite H.

Conditions for a stationary point of y = f(x1, x2):

FOC (all cases): ∂y/∂x1 = 0 and ∂y/∂x2 = 0.

SOC:
• Maximum: ∂²y/∂x1² < 0, ∂²y/∂x2² < 0, and (∂²y/∂x1²)(∂²y/∂x2²) − (∂²y/∂x1∂x2)² > 0;
• Minimum: ∂²y/∂x1² > 0, ∂²y/∂x2² > 0, and (∂²y/∂x1²)(∂²y/∂x2²) − (∂²y/∂x1∂x2)² > 0;
• Saddle point: (∂²y/∂x1²)(∂²y/∂x2²) − (∂²y/∂x1∂x2)² < 0.

If (∂²y/∂x1²)(∂²y/∂x2²) − (∂²y/∂x1∂x2)² = 0, the test fails and we need to check the other principal minors to determine whether the stationary point is a maximum, a minimum or a saddle point.
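A sketch of this classification applied to the saddle example f(x1, x2) = 2x1^2 − 2x2^2 (sympy assumed):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = 2*x1**2 - 2*x2**2

# FOC: both partials vanish at (0, 0).
print(sp.solve([sp.diff(f, x1), sp.diff(f, x2)], [x1, x2]))  # {x1: 0, x2: 0}

# SOC: sign of the Hessian determinant at the stationary point.
H = sp.hessian(f, (x1, x2))
print(H.det())  # -16 < 0, so (0, 0) is a saddle point
```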
Extended Example: Firm's Profit Maximization

Suppose a firm can sell its output at p per unit and that its production function is given by y = A K^α L^β. What combination of capital and labor should the firm use so as to maximize profits, assuming that capital costs r per unit and labor w per unit? The firm's profits are given by revenue minus costs:

π̃(K, L) = p A K^α L^β − rK − wL.

The firm aims to maximize profits, i.e., it solves the following unconstrained optimization problem with multiple variables:

max_{K,L} π̃(K, L) = max_{K,L} p A K^α L^β − rK − wL.

We can use the first order conditions to obtain potential candidates for the optimum. The first order conditions (FOC) are:

∂π̃/∂K = α p A K^(α−1) L^β − r = 0
∂π̃/∂L = β p A K^α L^(β−1) − w = 0.

At a point where the FOC are satisfied, the objective function attains a maximum only if it is a concave function. In the multivariate setting a function is strictly concave if the matrix of its second order derivatives, the Hessian, is negative definite. In this problem the Hessian is given by

H = [ ∂²π̃/∂K²    ∂²π̃/∂K∂L ]
    [ ∂²π̃/∂K∂L   ∂²π̃/∂L²  ]
  = [ α(α−1) Ap K^(α−2) L^β       αβ Ap K^(α−1) L^(β−1)  ]
    [ αβ Ap K^(α−1) L^(β−1)       β(β−1) Ap K^α L^(β−2)  ].

To verify whether the matrix is negative definite, one can look at the leading principal minors and check whether they alternate in sign, with odd order principal minors being negative and even order principal minors being positive. In this problem this requirement reduces to the following set of inequalities:

α(α−1) Ap K^(α−2) L^β < 0
β(β−1) Ap K^α L^(β−2) < 0
det(H) > 0.

Note that

det(H) = (∂²π̃/∂K²)(∂²π̃/∂L²) − (∂²π̃/∂K∂L)² = [αβ(α−1)(β−1) − α²β²] (Ap K^(α−1) L^(β−1))².

Thus the SOC are given by

α(α−1) Ap K^(α−2) L^β < 0
β(β−1) Ap K^α L^(β−2) < 0
[αβ(α−1)(β−1) − α²β²] (Ap K^(α−1) L^(β−1))² > 0.

The SOC inequalities are satisfied if

α − 1 < 0
β − 1 < 0
αβ(α−1)(β−1) − α²β² > 0,

where the last inequality is satisfied if α + β < 1 (this follows after expanding the product, simplifying and remembering that α > 0 and β > 0).

4 Constrained Optimization

Until now, we have considered unconstrained problems. Usually, economic agents face natural constraints.

Example 23. (Consumer's Problem) Suppose that a consumer has a utility function U(x1, x2) = x1^(1/2) x2^(1/2), the price of x1 is p1, the price of x2 is p2 and the consumer has income m. How much of the two goods should the consumer purchase to maximize her utility?

In producer theory we are frequently interested in the following minimization problem:

Example 24. (Firm's Problem) Suppose that a firm's production function is given by f(K, L) = K^(1/3) L^(2/3), the price of capital is r and the price of labor is w. What is the least-cost way for the firm to produce Q units of output?

Both of the above problems have a common mathematical structure:

max_{x1,...,xn} f(x1, x2, ..., xn) subject to g(x1, x2, ..., xn) = 0.

We say that f(x1, x2, ..., xn) is the objective function, g(x1, x2, ..., xn) = 0 is the constraint and x1, x2, ..., xn are the choice variables. We are interested in finding a solution to this problem,

x* = (x1*, x2*, ..., xn*).

The value function for this problem is derived by substituting x* into the objective function to obtain f(x1*, x2*, ..., xn*). It is also possible that instead of maximizing f(x1, x2, ..., xn) we could be minimizing f(x1, x2, ..., xn).

Example 25. (Example 23 continued) The utility maximization problem can be written as:

max_{x1,x2} x1^(1/2) x2^(1/2) subject to p1 x1 + p2 x2 = m.

The solution to the problem is a Marshallian demand as a function of prices and income, i.e., x1* = x1(p1, p2, m) and x2* = x2(p1, p2, m), while the objective function evaluated at the optimum is an indirect utility function: v(p1, p2, m) = (x1*)^(1/2) (x2*)^(1/2).

Similarly,

Example 26. (Example 24 continued) The firm's cost minimization problem can be stated as:

min_{K,L} rK + wL s.t. Q = K^(1/3) L^(2/3).

The solution to the problem is a conditional input demand as a function of r, w and Q, i.e., K^c = K(r, w, Q) and L^c = L(r, w, Q), while the objective function evaluated at the optimum is a cost function that gives the cost of producing the required level of output Q: c(r, w, Q) = r K^c + w L^c.

4.1 Direct Substitution

When the constraint(s) are equalities, we can convert the problem from a constrained optimization to an unconstrained optimization problem by substituting for some of the variables.

Example 27. (Example 23 continued) In the consumer's utility maximization problem, p1 x1 + p2 x2 = m. Hence,

x1 = m/p1 − (p2/p1) x2.
Substituting this into the objective function, we solve

max_{x2} [m/p1 − (p2/p1) x2]^(1/2) x2^(1/2).

This is a function of just x2, and we can now maximize this function with respect to x2. By incorporating the constraint into the objective function, we have transformed the constrained optimization problem into an unconstrained optimization problem, which we know how to solve. The first order condition gives:

(1/2) x2^(−1/2) [m/p1 − (p2/p1) x2]^(1/2) − (1/2)(p2/p1) [m/p1 − (p2/p1) x2]^(−1/2) x2^(1/2) = 0.

Solving for x2:

m/p1 − (p2/p1) x2 = (p2/p1) x2
⟹ x2 = m/(2p2)
⟹ x1 = m/p1 − (p2/p1) x2 = m/(2p1).

The firm's problem can be solved similarly.

4.2 The Lagrangian Approach

The substitution technique has serious limitations:

• In some cases, we cannot use substitution easily: for instance, suppose the constraint is x^4 + 5x^3 y + y^2 x + x^6 + 5 = 0. Here, it is not possible to solve this equation to get x as a function of y or vice versa.
• Moreover, in many cases, the economic constraints are written in the form g(x1, x2, ..., xn) ≥ 0 or g(x1, x2, ..., xn) ≤ 0. While the Lagrangian technique can be modified to take care of such cases, the substitution technique cannot be modified, or can be modified only with some difficulty.

Given a problem

max_{x1,...,xn} f(x1, x2, ..., xn) subject to g(x1, x2, ..., xn) = 0,

write down the Lagrangian function

L(x1, x2, ..., xn, λ) = f(x1, x2, ..., xn) + λ g(x1, x2, ..., xn).

Note that the Lagrangian is a function of n + 1 variables: (x1, x2, ..., xn, λ). We then look for the stationary points of the Lagrangian, that is, points where all the partial derivatives of the Lagrangian are zero. Using a Lagrangian, we get n + 1 first order conditions:

∂L/∂xi = 0, (i = 1, ..., n)
∂L/∂λ = 0.

Solving these equations will give us candidate solutions for the constrained optimization problem. Candidate solutions still need to be checked using the second-order conditions.

Example 28. (Example 23 continued) In the consumer's utility maximization problem:

L(x1, x2, λ) = x1^(1/2) x2^(1/2) + λ(m − p1 x1 − p2 x2).

The first order conditions are given by:

∂L/∂x1 = (1/2) x1^(−1/2) x2^(1/2) − λ p1 = 0
∂L/∂x2 = (1/2) x2^(−1/2) x1^(1/2) − λ p2 = 0
∂L/∂λ = m − p1 x1 − p2 x2 = 0.

Interpretation of the FOC: If we divide the first two conditions, we get that

MRS12 = U1/U2 = p1/p2.

This says that at the optimum point, the slope of the indifference curve must be equal to the slope of the budget line. To solve the problem, note that from the first two conditions it follows that

(1/(2p1)) x1^(−1/2) x2^(1/2) = λ = (1/(2p2)) x2^(−1/2) x1^(1/2),

or

x2 = (p1/p2) x1.   (2)

Substituting this into the budget constraint yields:

x1 = m/(2p1).

Substituting x1 back into (2) and solving for x2 yields:

x2 = m/(2p2).

The firm's problem can be solved similarly:

Example 29. (Example 24 continued) The Lagrangian for the firm's problem is:

L = rK + wL − λ(K^(1/3) L^(2/3) − Q).

First order conditions:

∂L/∂K = r − (λ/3) K^(−2/3) L^(2/3) = 0   (3)
∂L/∂L = w − (2λ/3) K^(1/3) L^(−1/3) = 0   (4)
∂L/∂λ = Q − K^(1/3) L^(2/3) = 0   (5)

Taking the ratio of (3) and (4) one obtains:

r/w = L/(2K).   (6)

Substituting for K in (5):

Q = (w/(2r))^(1/3) L.

From here, expressing L, it follows that:

L^c = (2r/w)^(1/3) Q.

Substituting L^c into the ratio of first order conditions (6) to express K in terms of parameters, one obtains:

K^c = (w/(2r))^(2/3) Q.

Note that the technique has been identical for both maximization and minimization problems. This means that the first order conditions identified so far are only necessary conditions, not sufficient conditions. We shall look at sufficient, or second order, conditions later.
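A sketch of solving the consumer's first order conditions symbolically (sympy assumed; the concrete prices and income are illustrative values of my own choosing):

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', positive=True)
p1, p2, m = 1, 2, 8  # illustrative values

U = sp.sqrt(x1) * sp.sqrt(x2)
L = U + lam * (m - p1*x1 - p2*x2)

foc = [sp.diff(L, v) for v in (x1, x2, lam)]
print(sp.solve(foc, [x1, x2, lam], dict=True))
# expect x1 = m/(2*p1) = 4 and x2 = m/(2*p2) = 2
```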
The Lagrangian approach amounts to searching for points where:

• The constraint is satisfied.
• The constraint and the level curve of the objective function are tangent to one another.

If we have more than two variables, then the same intuition can be extended. For instance, with three variables, the Lagrangian conditions will say:

• The rate of substitution between any two variables along the objective function must equal the rate of substitution along the constraint.
• The optimum point must be on the constraint.

Intuition for the Lagrangian Method. Consider the simplest case of the maximization of a function of two variables subject to one constraint:

max_{x1,x2} f(x1, x2) subject to g(x1, x2) = 0.

Suppose that the point x* = (x1*, x2*) is a constrained maximum. Therefore any small feasible change in x from this point, that is, a small movement along the constraint, should not be able to improve the value of the objective function. We represent small changes in x = (x1, x2)ᵀ by the differential notation dx = (dx1, dx2)ᵀ. Then the first-order necessary conditions may be stated as follows:

f_{x1} dx1 + f_{x2} dx2 = 0.   (7)

However, a feasible change in x does not change the value of the constraint. That is, the constraint g(x1, x2) = 0 implies that

g_{x1} dx1 + g_{x2} dx2 = 0,   (8)

and so dx1 and dx2 are no longer both arbitrary. We can take, e.g., dx1 as arbitrary, but then dx2 must be chosen to satisfy (8). Taking the ratio of (7) and (8), it is clear that at the optimum

f_{x1}/g_{x1} = f_{x2}/g_{x2} ≡ λ.

The Lagrange-multiplier method yields the same first-order necessary condition, and the Lagrange multiplier λ makes sure that both (7) and (8) are simultaneously satisfied.

Economic Interpretation of the Lagrange Multiplier. Note that we did not compute λ in either the consumer's problem or the firm's problem. This is because our interest is in the values of x1 and x2 (or K and L). However, in some instances it is useful to compute λ: it has an economic interpretation as the shadow price of the constraint. Suppose we have the problem

max_{x1,...,xn} f(x1, x2, ..., xn) subject to g(x1, x2, ..., xn) = 0.

Suppose we now relax this constraint: instead of requiring g(x1, x2, ..., xn) = 0, we require g(x1, x2, ..., xn) = δ, where δ is a small positive number. Clearly, since the constraint has been changed, the value of the objective function must change. The question is: by how much? The answer to this question is given by λ. For this reason, λ is referred to as the shadow price of the constraint. It tells us the rate at which the objective function increases if the constraint is relaxed by a small amount.

Example 30. (Example 23 continued) In the consumer's utility maximization problem, we can compute

λ = (1/(2p1)) x1^(−1/2) x2^(1/2)
  = (1/(2p1)) (m/(2p1))^(−1/2) (m/(2p2))^(1/2)
  = 1/(2 (p1 p2)^(1/2)).

Thus, the shadow price of the constraint tells us that if we give a small amount of additional income to the consumer, then his utility will go up at the rate

λ = 1/(2 (p1 p2)^(1/2)).

Thus λ represents the marginal utility of income.
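The shadow-price interpretation is easy to check numerically. This sketch (standard-library Python; the prices and income are illustrative) compares the utility gain from a small increase in income with λ · dm, using the closed-form demands x1 = m/(2p1) and x2 = m/(2p2):

```python
p1, p2 = 1.0, 2.0

def max_utility(m):
    """Utility at the optimal bundle x1 = m/(2 p1), x2 = m/(2 p2)."""
    return (m / (2*p1))**0.5 * (m / (2*p2))**0.5

m, dm = 8.0, 0.01
lam = 1 / (2 * (p1*p2)**0.5)  # shadow price from Example 30

print(max_utility(m + dm) - max_utility(m))  # ~0.0035355
print(lam * dm)                              #  0.0035355...
```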
4.3 Second Order Conditions

As with the unconstrained case, we need to check the second-order conditions to ensure we have an optimum. As before, the second-order sufficient condition for a maximum is d²f < 0 and for a minimum is d²f > 0. However, because of the constraint, it is no longer sufficient to look at the Hessian of f to verify these conditions.

Suppose we have a two-variable constrained optimization problem

max_{x1,x2} f(x1, x2) or min_{x1,x2} f(x1, x2) subject to g(x1, x2) = 0.

The second order conditions for this problem differ slightly from the usual conditions because of the constraint g(x1, x2) = 0, which implies that dx1 and dx2 must be chosen to satisfy (8). Thus the second-order sufficient condition for a maximum is that d²f < 0 subject to (8), and the second-order sufficient condition for a minimum is that d²f > 0 subject to (8). In practice, to check the second-order sufficient conditions we need to compute the bordered Hessian matrix of the Lagrangian at the critical point that we want to check. The Lagrangian of the two-variable constrained optimization problem is

L(x1, x2, λ) = f(x1, x2) + λ g(x1, x2).

The bordered Hessian is the "usual Hessian," bordered by the derivatives of the constraint with respect to the endogenous variables, here x1 and x2. That is,

H^B = [ 0    g1   g2  ]
      [ g1   L11  L12 ]
      [ g2   L21  L22 ].

The second order conditions state:

• If (x1*, x2*, λ*) corresponds to a constrained maximum, then the determinant of H^B evaluated at (x1*, x2*, λ*) must be positive.
• If (x1*, x2*, λ*) corresponds to a constrained minimum, then the determinant of H^B evaluated at (x1*, x2*, λ*) must be negative.

Example 31. (Example 23 continued) In the consumer's utility maximization problem,

H^B = [ 0     p1                            p2                          ]
      [ p1    −(1/4) x1^(−3/2) x2^(1/2)     (1/4) x1^(−1/2) x2^(−1/2)   ]
      [ p2    (1/4) x1^(−1/2) x2^(−1/2)     −(1/4) x1^(1/2) x2^(−3/2)   ].

In a general n-variable problem with m (m < n) constraints, a point (x*, λ*) that satisfies the first-order conditions is

• a local maximum if the last (n − m) leading principal minors of H^B alternate in sign, beginning with that of (−1)^(m+1);
• a local minimum if the last (n − m) leading principal minors of H^B are of the same sign as (−1)^m.

In both cases, H^B must be evaluated at (x*, λ*).

There are also some global results for equality-constrained problems:

• If f(x1, ..., xn) is concave and all constraints are linear in (x1, ..., xn), then a solution to the constrained maximization problem is a global maximum.
• If f(x1, ..., xn) is convex and all constraints are linear in (x1, ..., xn), then a solution to the constrained minimization problem is a global minimum.

5 The Envelope Theorem

We are interested in studying how the value function of an optimization problem changes when one of the parameters of the problem changes. A very powerful tool for such investigations is the envelope theorem.

5.1 The Envelope Theorem for Unconstrained Optimization

Suppose we have the unconstrained optimization problem

max_{x1,x2} f(x1, x2; α),

where α is some exogenous parameter. Suppose that (x1*(α), x2*(α)) solves this optimization problem. Note that the solution will depend upon α. The value function for this problem is derived by substituting (x1*(α), x2*(α)) into the objective function:

V(α) = f(x1*(α), x2*(α); α).

Notice that the value function is a function of the parameter α. Notice also that the value function depends on α in two different ways:

1. Direct dependence.
2. Indirect dependence through x1*(α) and x2*(α).

We are interested in knowing how the value function changes when α changes. When we differentiate the value function, we get:

dV/dα = (∂f/∂x1)(∂x1*/∂α) + (∂f/∂x2)(∂x2*/∂α) + ∂f/∂α,

where the partial derivatives of f are evaluated at the solution (x1*(α), x2*(α)).
Now note that at the optimum (assuming we have an interior solution), it must be the case that

∂f/∂x1 |_{x1 = x1*(α), x2 = x2*(α)} = 0  and  ∂f/∂x2 |_{x1 = x1*(α), x2 = x2*(α)} = 0.

Hence, the first two terms drop out and we have

dV/dα = ∂f/∂α,

where the partial derivative is evaluated at the point (x1*(α), x2*(α)). This result, which is called the Envelope Theorem, says in words: the total derivative of the value function with respect to the parameter α is the same as the partial derivative of the objective function evaluated at the optimal point.

Example 32. Consider the unconstrained problem:

max_{x1,x2} 4x1 + αx2 − x1² − x2² + x1 x2.

The first order conditions:

4 − 2x1 + x2 = 0
α − 2x2 + x1 = 0.

Solving:

x1* = (8 + α)/3
x2* = (2α + 4)/3.

(We also need to check the second order conditions.) Substituting x1* and x2* into the objective function, we obtain the value function:

V(α) = 4 (8 + α)/3 + α (2α + 4)/3 − (8 + α)²/9 − (2α + 4)²/9 + (8 + α)(2α + 4)/9.

By the Envelope Theorem:

dV/dα = x2* = (2α + 4)/3.
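This claim can be verified directly with sympy (assumed tool): differentiate the value function and compare it with x2*(α).

```python
import sympy as sp

x1, x2, a = sp.symbols('x1 x2 alpha', real=True)
f = 4*x1 + a*x2 - x1**2 - x2**2 + x1*x2

sol = sp.solve([sp.diff(f, x1), sp.diff(f, x2)], [x1, x2])
V = f.subs(sol)  # value function V(alpha)

# Envelope theorem: dV/d(alpha) equals x2* evaluated at the optimum.
print(sp.simplify(sp.diff(V, a) - sol[x2]))  # 0
```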
5.2 The Envelope Theorem for Constrained Optimization

Now consider the constrained case. We can proceed essentially as before. Consider the problem

max_{x1,x2} f(x1, x2; α)  subject to  g(x1, x2; α) = 0.

The Lagrangian for this problem is

L(x1, x2, λ; α) = f(x1, x2; α) + λ g(x1, x2; α).

Suppose that (x1*(α), x2*(α), λ*(α)) solves the constrained optimization problem. The value function for this problem is defined as:

V(α) = f(x1*(α), x2*(α); α).

Since g(x1*(α), x2*(α); α) = 0, we can also write the value function as:

V(α) = f(x1*(α), x2*(α); α) + λ*(α) g(x1*(α), x2*(α); α).

Differentiating with respect to α:

dV/dα = (∂f/∂x1)(∂x1*/∂α) + (∂f/∂x2)(∂x2*/∂α) + ∂f/∂α
        + (dλ*/dα) g(x1*(α), x2*(α); α)
        + λ*(α) [ (∂g/∂x1)(∂x1*/∂α) + (∂g/∂x2)(∂x2*/∂α) + ∂g/∂α ],

where again all partial derivatives are evaluated at the solution (x1*(α), x2*(α), λ*(α)). This can be written as

dV/dα = [∂f/∂x1 + λ* (∂g/∂x1)] (∂x1*/∂α) + [∂f/∂x2 + λ* (∂g/∂x2)] (∂x2*/∂α)
        + (dλ*/dα) g(x1*(α), x2*(α); α) + ∂f/∂α + λ*(α) (∂g/∂α).

Note that the first two terms on the right-hand side drop out because (x1*, x2*, λ*) must satisfy the necessary conditions for constrained optimization. The third term drops out because g(x1*(α), x2*(α); α) = 0. We are left with the following:

dV/dα = ∂f/∂α + λ* (∂g/∂α) = (∂L/∂α)(x1*, x2*, λ*).

In words: the derivative of the value function with respect to the parameter α is the partial derivative of the Lagrangian function with respect to α, evaluated at the solution (x1*, x2*, λ*).

5.3 Extended Example: Firm’s Cost Minimization Problem

Suppose that a firm’s production function is given by f(K, L) = K^(1/3) L^(2/3), the price of capital is r, and the price of labor is w. What is the least-cost way for the firm to produce Q units of output? The firm’s cost minimization problem can be stated as:

min_{K,L} rK + wL  s.t.  Q = K^(1/3) L^(2/3).

The solution to the problem is a pair of conditional input demands as functions of r, w, and Q, i.e., K^c = K(r, w, Q) and L^c = L(r, w, Q), while the objective function evaluated at the optimum is a cost function that gives the cost of producing the required level of output Q:

c(r, w, Q) = rK^c + wL^c.

The Lagrangian for the firm’s problem is:

L = rK + wL − λ(K^(1/3) L^(2/3) − Q).

First-order conditions:

∂L/∂K = r − (λ/3) K^(−2/3) L^(2/3) = 0    (9)
∂L/∂L = w − (2λ/3) K^(1/3) L^(−1/3) = 0   (10)
∂L/∂λ = Q − K^(1/3) L^(2/3) = 0           (11)

Taking the ratio of (9) and (10), one obtains:

r/w = L/(2K).    (12)

Substituting for K in (11):

Q = (wL/(2r))^(1/3) L^(2/3) = (w/(2r))^(1/3) L.

From here, expressing L, it follows that:

L^c = (2r/w)^(1/3) Q.

Substituting L^c into the ratio of first-order conditions (12) to express K in terms of the parameters, one obtains:

K^c = (w/(2r))^(2/3) Q.

The value function of the firm’s cost minimization problem is called the cost function:

c(r, w, Q) = rK^c + wL^c = [ r (w/(2r))^(2/3) + w (2r/w)^(1/3) ] Q.

The value function in this case depends on more than one parameter, (r, w) and Q. However, the Envelope Theorem is still applicable: if we want to know how the value function changes when w changes, we simply treat r and Q as constants. Thus, by the Envelope Theorem, differentiating c(r, w, Q) with respect to r and w yields the conditional input demands:

∂c/∂r = K^c
∂c/∂w = L^c.

In producer theory this result is referred to as Shephard’s Lemma. You can confirm that the above is exactly what you will get if you differentiate the value function directly.
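The confirmation can also be done numerically. The sketch below differentiates the closed-form cost function by finite differences and compares the result with the conditional labor demand L^c; the values of r, w, and Q are illustrative.

```python
# Numerical check of Shephard's Lemma for the Cobb-Douglas example.
def cost(r, w, Q):
    # Closed-form cost function c(r, w, Q) = r*K^c + w*L^c derived above.
    Kc = (w / (2 * r)) ** (2 / 3) * Q
    Lc = (2 * r / w) ** (1 / 3) * Q
    return r * Kc + w * Lc

r, w, Q, h = 4.0, 2.0, 10.0, 1e-6
dc_dw = (cost(r, w + h, Q) - cost(r, w - h, Q)) / (2 * h)  # central difference
Lc = (2 * r / w) ** (1 / 3) * Q
print(dc_dw, Lc)   # both should equal the conditional labor demand L^c
```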
6 Integration

The fundamental theorem of calculus links the concept of the derivative of a function with the concept of the integral.

• The first part of the theorem, sometimes called the first fundamental theorem of calculus, states that indefinite integration can be reversed by differentiation. This part of the theorem is also important because it guarantees the existence of anti-derivatives for continuous functions.
• The second part, sometimes called the second fundamental theorem of calculus, states that the definite integral of a function can be computed by using any one of its infinitely many anti-derivatives. This part of the theorem has key practical applications because it markedly simplifies the computation of definite integrals.

Integration is useful in economics:

• In microeconomics, consumer surplus, i.e., the difference between what a consumer is willing to pay and what he actually pays, is an integral.
• In macroeconomics, a stock variable (e.g., capital) is an integral of a flow variable (e.g., investment).
• In finance, a stock price or net present value is an integral of a dividend flow.
• In probability and statistics, moments of random variables are integrals.

There are two types of integrals:

• Indefinite integrals can be seen as “anti-derivatives” that recover the original function from its first derivative.
• Definite integrals calculate the area under a graph. In this form the integral is very similar to a sum, but of infinitely many, infinitesimally small parts.

6.1 Indefinite Integrals

We want to find a function F(x) that differentiates to f(x).

Example 33. Consider f(x) = 3x². In differentiation the Power Rule implies that if F(x) = x^n, then F′(x) = n x^(n−1). So guess F(x) = x³; then F′(x) = 3x² = f(x). Hence F(x) = x³ is the

• anti-derivative
• primitive
• integral

of f(x).

The first fundamental theorem of calculus: Let f be a continuous real-valued function defined on a closed interval [a, b]. Let F be the function defined, for all x in [a, b], by

F(x) = ∫_a^x f(x̃) dx̃.

Then F is continuous on [a, b], differentiable on the open interval (a, b), and F′(x) = f(x) for all x in (a, b).

In F(x) = ∫ f(x) dx, f(x) is known as the integrand.

Is F(x) = x³ the only anti-derivative of f(x) = 3x²? No, since (d/dx)(F(x) + c) = f(x) for any constant c. This arbitrary constant is called the constant of integration.

6.2 Rules of Integration

• Integration is linear:

∫ (f(x) + g(x)) dx = ∫ f(x) dx + ∫ g(x) dx.

• Power function rule: For n ≠ −1,

f(x) = a x^n ⇒ F(x) = ∫ f(x) dx = (a/(n + 1)) x^(n+1) + c.

Example 34.
f(x) = 3x² ⇒ F(x) = ∫ f(x) dx = x³ + c
f(x) = 5 = 5x⁰ ⇒ F(x) = ∫ f(x) dx = 5x + c

• There is no general product rule, but

∫ k f(x) dx = k ∫ f(x) dx.

• Exponential rule: Recall that (d/dx) e^(kx) = k e^(kx). Then,

f(x) = a e^(kx) ⇒ F(x) = ∫ f(x) dx = (a/k) e^(kx) + c.

Example 35. f(x) = 6e^(2x) ⇒ F(x) = ∫ f(x) dx = 3e^(2x) + c.

• Log rule: Recall that (d/dx) ln(x) = 1/x. Then,

f(x) = 1/x ⇒ F(x) = ∫ f(x) dx = ln(x) + c.

Example 36. f(x) = 5/(x + 2) ⇒ F(x) = ∫ f(x) dx = 5 ln(x + 2) + c.

• The substitution rule: This technique operates through a “change of variable” which converts an intractable integral into a form where it can be solved:

∫ f(u) (du/dx) dx = ∫ f(u) du = F(u) + c.

This is the “inverse” of the chain rule of differentiation.

Example 37. Find ∫ 3x²(x³ + 1) dx. Let u = x³, so du/dx = 3x². Then

∫ 3x²(x³ + 1) dx = ∫ (u + 1)(du/dx) dx
                 = ∫ (u + 1) du   (by the substitution rule)
                 = u²/2 + u + c
                 = x⁶/2 + x³ + c.

• Integration by parts:

∫ v du = uv − ∫ u dv.

This is a direct consequence of the product rule of differentiation. Recall that (uv)′ = u′v + uv′. Integrating both sides of this expression gives

∫ (uv)′ dx = ∫ u′v dx + ∫ uv′ dx.

Since, by the definition of an integral, ∫ (uv)′ dx = uv,

∫ u′v dx = uv − ∫ uv′ dx.

The first term on the right-hand side is the product of u (the anti-derivative of u′) and v; the second term is the integral of the product of u and the derivative of v.

Figure 13: A definite integral of f(x) over the interval [a, b] as an area under the curve.

Example 38. Find ∫ ln(x) dx. Let v = ln(x), u = x ⇒ dv = (1/x) dx, du = dx. Then

∫ ln(x) dx = ∫ v du
           = uv − ∫ u dv   (by integration by parts)
           = x ln(x) − ∫ 1 dx
           = x ln(x) − x + c.

6.3 Definite Integral

Let f(x) be a continuous function on the interval [a, b], where a and b are real numbers with a < b. A definite integral of f(x) over the interval [a, b] gives the area underneath the graph of the function between a and b, where the parts below the x-axis are subtracted.

What is the area bounded by the curve y = f(x), the vertical lines x = a and x = b, and the x-axis? A first approximation of this area can be obtained by cutting the x-axis between a and b into intervals of equal length, thus creating rectangles of equal width whose top right-hand corners touch the curve y = f(x) (see the yellow rectangles in Figure 14). Thus, if we split the interval [a, b] into 5 subintervals {[x0, x1], [x1, x2], ..., [x4, x5]}, where x0 = a and x5 = b, the sum of the rectangle areas is

(x1 − x0) f(x1) + (x2 − x1) f(x2) + ... + (x5 − x4) f(x5) = ∑_(i=1)^5 (xi − xi−1) f(xi).

Figure 14: A definite integral of f(x) over the interval [a, b] as a sum.

However, this method of estimating the area leads to errors whereby we either overestimate the area (as with the yellow rectangles in Figure 14) or underestimate it (the green rectangles in Figure 14 underestimate the area because the height of each rectangle is the value of the function at the left-hand boundary of the subinterval). We can reduce these errors by creating many more subintervals. This suggests that a definite integral of f(x) over the interval [a, b] can be viewed as the limit of the sum of the areas of the rectangles as each rectangle becomes infinitesimally narrow and the number of rectangles grows infinitely large.

The intuition that the integral is the area under the graph is sufficient for (almost) all economics and finance. For example, consumer surplus is an area under the demand curve in price/quantity space. In the macroeconomics and finance examples from Section 6, flow variables are graphed against time (i.e., with time on the x-axis).
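To make the limiting argument concrete, the following sketch computes right-endpoint rectangle sums for an illustrative integrand, f(x) = 3x² on [0, 1], whose exact integral is 1. Since f is increasing, the right-endpoint sums overestimate the area (as the yellow rectangles do in Figure 14), and the error shrinks as the number of rectangles grows.

```python
# Right-endpoint Riemann sums for f(x) = 3x**2 on [0, 1]; the exact
# integral is 1. The function and interval are illustrative choices.
def riemann_sum(f, a, b, n):
    width = (b - a) / n
    # Heights are taken at the right-hand endpoint of each subinterval.
    return sum(f(a + i * width) * width for i in range(1, n + 1))

f = lambda x: 3 * x**2
for n in (5, 50, 500, 5000):
    print(n, riemann_sum(f, 0.0, 1.0, n))   # approaches 1 from above
```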
The second fundamental theorem of calculus: Let f and F be real-valued functions defined on a closed interval [a, b] such that the derivative of F is f, i.e., F′(x) = f(x) for all x in [a, b]. Then,

∫_a^b f(x) dx = [F(x)]_a^b = F(b) − F(a).

As discussed earlier, if F(x) is an anti-derivative of f, then G(x) := F(x) + c is also an anti-derivative of f for any constant c. However, the value of the definite integral does not depend on the choice of the anti-derivative, since

G(b) − G(a) = F(b) + c − F(a) − c = F(b) − F(a).

So in practical terms we can simply ignore the constant of integration when evaluating definite integrals.

Process of calculating a definite integral:

1. Determine the indefinite integral.
2. Evaluate it at the boundaries.
3. Subtract F(a) from F(b).

Example 39. Find ∫_0^1 x dx.

1. F(x) = ∫ x dx = x²/2
2. F(0) = 0²/2 = 0, F(1) = 1²/2 = 1/2
3. ⇒ ∫_0^1 x dx = [x²/2]_0^1 = 1/2 − 0 = 1/2

Properties of the definite integral:

1. ∫_b^a f(x) dx = F(a) − F(b) = −(F(b) − F(a)) = −∫_a^b f(x) dx
2. ∫_a^a f(x) dx = F(a) − F(a) = 0
3. ∫_a^c f(x) dx = ∫_a^b f(x) dx + ∫_b^c f(x) dx  (a < b < c)
4. ∫_a^b k f(x) dx = k ∫_a^b f(x) dx
5. ∫_a^b [f(x) + g(x)] dx = ∫_a^b f(x) dx + ∫_a^b g(x) dx

Example 40. Calculate ∫_9^4 1/(2√x) dx:

∫_9^4 1/(2√x) dx = [√x]_9^4 = √4 − √9 = 2 − 3 = −1.

Calculate ∫_4^9 1/(2√x) dx:

∫_4^9 1/(2√x) dx = [√x]_4^9 = √9 − √4 = 3 − 2 = 1.

Another example:

Example 41. Calculate ∫_1^e ln(x) dx:

∫_1^e ln(x) dx = [x ln(x) − x]_1^e
              = e ln(e) − e − (1·ln(1) − 1)
              = e − e − 0 + 1
              = 1.

Sometimes we need to take integrals when the interval is not bounded. For example:

• Evaluating the present value of an infinite stream of benefits from a financial asset.
• Evaluating the consumer surplus of a constant-elasticity demand function q = ap^e, as this demand curve does not hit the y-axis.

In this case,

∫_a^∞ f(x) dx = lim_{y→∞} F(y) − F(a).

6.4 An Application: Continuous Compounding

In finance, the present value of an asset can be approximated as a definite integral. Consider a continuous stream of income c for T years. Since a pound today is not the same as a pound a year from now, we discount future income. If the discount rate is r, then the income c received t years into the future is worth c(1 − r)^t in today’s terms. Thus, the present value of an asset paying c every year into the future is

PV = c(1 − r)^0 + c(1 − r)^1 + ... + c(1 − r)^T.

When time becomes “continuous,” it can be shown that the present value of an amount c paid at time t in the future is c e^(−rt). In this case, the present value of the asset is

PV = ∫_0^T c e^(−rt) dt
   = c ∫_0^T e^(−rt) dt
   = c [−(1/r) e^(−rt)]_0^T
   = (c/r)(1 − e^(−rT)).

Note that for an infinitely lived asset,

PV = ∫_0^∞ c e^(−rt) dt
   = lim_{T→∞} (c/r)(1 − e^(−rT))
   = c/r.

This follows because e^(−rT) goes to zero as T becomes very large.
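As a check on the algebra, the sketch below approximates the present-value integral with a fine midpoint Riemann sum and compares it with the closed-form expression; the values of c, r, and T are illustrative.

```python
import math

# Check the continuous-compounding PV formula against a midpoint
# Riemann sum of c*exp(-r*t) on [0, T]; c, r, T are illustrative.
c, r, T, n = 100.0, 0.05, 30.0, 100_000
dt = T / n
pv_numeric = sum(c * math.exp(-r * (i + 0.5) * dt) * dt for i in range(n))
pv_closed = (c / r) * (1 - math.exp(-r * T))
print(pv_numeric, pv_closed)   # should agree to several decimal places
print(c / r)                   # PV of the infinitely lived asset
```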