September Math Course: Multivariate Calculus

Arina Nikandrova*

1 Functions

A function y = f(x), where x is either a scalar or a vector of several variables (x1, ..., xn), can be thought of as a "rule" which converts an input (denoted typically by x) into an output (denoted typically by y):

• y is a function of x if its graph can be drawn from left to right without "doubling back," i.e., only one value of y corresponds to each value of x.
• y is a continuous function of x if its graph can be drawn without removing your pencil from the page.
• y is a differentiable function of x if it is continuous and its graph contains no kinks.

In this part of the course we will focus on functions where the input consists of many variables. Such functions are common in economics.

Example 1. A consumer's utility is a function of all the goods he consumes. So if there are n goods, then his utility is a function of the quantities (x1, x2, ..., xn) he consumes. We represent this by writing u(x1, x2, ..., xn). A firm's production is a function of the quantities of all the inputs it uses. So, if (x1, x2, ..., xn) are the quantities of the inputs used by the firm and y is the level of output produced, then we have y = f(x1, x2, ..., xn), where f(·) is the production function.

* e-mail: [email protected]

[Figure 1: Slope of y = 2x]

2 First Order Derivative

2.1 First Order Derivative of Univariate Functions

Consider a function of one variable, f(x). If this function is differentiable at a given point x0, it has both a value (its "height"), y0 = f(x0), and a slope. The slope tells us the rate of change: how much y changes when x changes by a given amount.

Example 2. (Linear Function) The simplest function to consider is a linear function of the form y = ax + b. Start at any point (x0, y0) on the line and move along the line so that the x-coordinate increases by one unit. The corresponding change in the y-coordinate is called the slope of the line. The slope tells us the rate of change: how much y changes when x changes by a given amount. The defining characteristic of a line is that this rate of change is constant:

Δy/Δx = [a(x0 + 1) + b − (ax0 + b)] / [(x0 + 1) − x0] = a.

For non-linear functions the same change in x leads to different changes in y, depending on the starting point x0.

Example 3. Consider a quadratic function y = x^2. If we start at x0 = 1 and increase x by 1, then y changes by 3 (i.e., 4 − 1). If we start at x0 = 2, however, then increasing x by 1 changes y by 5 (i.e., 9 − 4). Thus the same change in x leads to different changes in y. Consequently, for non-linear functions we cannot define a global notion of the slope. However, it is possible to define a notion of the slope which is valid when the change in x is "small."

[Figure 2: Change in y = x^2 when x increases by 1 starting from x0 = 1 and x0 = 2]

Example 4. Consider the quadratic function y = x^2. The line y = 4x − 4 just touches the curve y = x^2 at the point (x, y) = (2, 4). This follows as, at x = 2, the curve gives y = 2^2 = 4 and the line gives y = 4 × 2 − 4 = 4. Such a line is called a tangent line. The tangent line has the property that it "looks the same as the function around the point at which it just touches the function." The tangent line shows the rate of change in y at a point for small changes in x. The slope of the tangent line at point x0 is called the derivative at the point x0. The derivative of a function f(x) at the point x0 is denoted by f′(x0).
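To connect the tangent-line picture with computation, here is a minimal numerical sketch (standard-library Python only) that approximates the derivative of f(x) = x^2 at x0 = 2 by the slope of a short secant line; the step size h and the helper name numerical_derivative are illustrative choices, not part of the notes themselves.

```python
def numerical_derivative(f, x0, h=1e-6):
    """Approximate f'(x0) by the slope of a short secant line."""
    return (f(x0 + h) - f(x0)) / h

f = lambda x: x ** 2

# The slope of the tangent to y = x^2 at x0 = 2 should be close to 4.
print(numerical_derivative(f, 2.0))  # approximately 4.0
```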
The total differential of f(x) at x0 represents the principal part of the change in the function y = f(x) with respect to changes in x and is defined by the following:

dy = f′(x0) dx.

The total differential is a way of understanding the local rate of change of the function f(x) around the point x0. That is, it is an algebraic way of denoting the slope of a function (hence the alternative notations for a derivative: y′, dy/dx, df/dx).

Example 5. (Production Costs) Imagine that y = c(x) represents the costs of production in £ and x the quantity produced by a firm. The derivative of c(·) at a given x0 tells us how costs change in response to a change in quantity, provided the change is small. For example, if we know that the derivative at x0 = 2 is 4, this tells us that if the quantity produced changes by a small amount dx, then the impact on cost is given approximately by the total differential dy = 4 dx. Economists have a special name for the derivative of the cost function: it is called marginal cost.

[Figure 3: Tangent of y = x^2 at the point (x, y) = (2, 4).]

[Figure 4: The total differential of f(x) at x0 represents the main part of the change in f(x) with respect to any – not necessarily small – changes in x.]

Since the rate of change along a curve is changing constantly, the derivative has to be computed separately at each possible value of x. The derivative is thus a local phenomenon: it tells us something about the rate of change in the neighborhood of a point, but it gives no information about the rate of change globally.

Example 6. The information that the derivative of y = x^2 (i.e., dy = 2x dx) at x = 2 is 4 tells us that the rate of change in y is 4 when x is "close" to 2. It does not give any information about the rate of change at x = 10, and so on.

Formally, the derivative can be thought of as a separate function of x, a slope or derivative function given by:

f′(x) ≡ lim_{h→0} [f(x + h) − f(x)] / h.

Given a function y = f(x), the derivative function simply associates to every x the slope of the tangent line at x. Typically, when we talk about the derivative, we mean the derivative as a function. So when we want to talk about the value of the derivative at a point x0, we shall mention it by saying "the derivative at x0 is ... ."

2.2 Rules of Differentiation

• Differentiation is linear: For any functions f and g and any real numbers a and b, the derivative of the function h(x) = a f(x) + b g(x) with respect to x is h′(x) = a f′(x) + b g′(x).
• Power function rule: The derivative of the power function h(x) = x^n is h′(x) = n x^(n−1). Special cases include:
  – Constant rule: if f is the constant function f(x) = c, for any number c, then for all x, f′(x) = 0.
  – If f(x) = x, then f′(x) = 1.
  These special cases imply that the derivative of an affine function is constant, i.e., if f(x) = ax + b, then f′(x) = a. This makes sense, as shifting a function doesn't change its slope, and so additive constants disappear.
• The product rule: For any functions f and g, the derivative of the function h(x) = f(x) g(x) with respect to x is h′(x) = f′(x) g(x) + f(x) g′(x).
• Quotient rule: The derivative of the function h(x) = f(x)/g(x), where g(x) ≠ 0, is:

  h′(x) = [f′(x) g(x) − f(x) g′(x)] / g(x)^2.

• The chain rule: The derivative of a composite function h(x) = f(g(x)) with respect to x is h′(x) = f′(g(x)) g′(x).
• The inverse function rule: If the function f has an inverse function g, meaning that g(f(x)) = x and f(g(y)) = y, then

  g′(y) = 1 / f′(g(y)).

• The basic rules for differentiating exponential and logarithmic functions:
  – The derivative of f(x) = e^x is f′(x) = e^x, where e ≈ 2.71828 is Euler's number.
  – The derivative of f(x) = ln x is f′(x) = 1/x, where ln is the natural logarithm with base e ≈ 2.71828.
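As a quick symbolic check of these rules, here is a small sketch using sympy (an assumed tool; the notes themselves do not prescribe any software). It verifies the product rule and the chain rule on illustrative functions of my own choosing.

```python
import sympy as sp

x = sp.symbols('x')
f = x**3
g = sp.exp(2*x)

# Product rule: d(fg)/dx = f'g + fg'
assert sp.simplify(sp.diff(f*g, x) - (sp.diff(f, x)*g + f*sp.diff(g, x))) == 0

# Chain rule for h(x) = (2x^2 + 1)^5: h'(x) = 5(2x^2 + 1)^4 * 4x
h = (2*x**2 + 1)**5
assert sp.simplify(sp.diff(h, x) - 5*(2*x**2 + 1)**4 * 4*x) == 0

print(sp.diff(f*g, x))  # 3*x**2*exp(2*x) + 2*x**3*exp(2*x)
```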
Intuition for the Chain Rule: Let demand be a function of price, q(p) = a − bp, and let price vary with time, so that p(t) = t^2. Then demand is a composite function that also depends on time, q(p(t)). How does demand vary with time?

• What is dq/dp? Omit the influence of t and imagine that we can vary p directly; then dq/dp = −b.
• What is dp/dt? By the power rule, dp/dt = 2t.
• Overall, we need to consider the chain reaction as a change in t leads to a change in p, in turn changing q:

  d q(p(t))/dt = (dq/dp)(dp/dt) = q′(p) p′(t) = −2bt,

  where dp/dt is the small change in p brought about by a small change in t, and dq/dp is the small change in q brought about by a small change in p.

To verify the validity, note that by substituting p(t) into q(p), we get quantity as a function of time, q(t) = a − bt^2, and q′(t) = −2bt.

Intuition for the Inverse Function Rule: If y = f(x) is a strictly monotonic (or 1:1) function, its inverse, x = f^(−1)(y), is also a function. Formally:

f^(−1)(y) = {x : y = f(x)}.

Thus the inverse is a function if to each value of y corresponds only one value of x; e.g., a parabola on an unrestricted domain is ruled out (why?).

Example 7. The inverse of y = f(x) = ax + b is the function g(y) = (y − b)/a. The inverse of y = f(x) = x^2, where x > 0, is the function g(y) = √y. We can think of these two functions as inverses: if we take x as the input, apply f to it and then pass this output through the function g, we get back x. Computationally, we just express x from the equation y = f(x) to obtain x = f^(−1)(y) ≡ g(y).

The derivatives of inverse functions are related to each other. If we apply the chain rule to both sides of x = g(f(x)), where g(·) ≡ f^(−1)(·):

g′(y) = 1 / f′(x).

However, for the above display to make sense we need to express x in terms of y on the RHS.

Example 8. If y = f(x) = x^2, where x > 0, then the derivative of its inverse is

g′(y) = 1/f′(x) = 1/(2x) = 1/(2√y),

where the last equality follows as, by the definition of the inverse, x = f^(−1)(y) = √y.

2.3 First Order Derivative of Multivariate Functions

We have considered functions of a single variable until now. Most economic problems involve more than one variable, so consider a function y = f(x1, x2, ..., xn).

Partial Derivatives. The partial derivative of f with respect to xi is the derivative of f with respect to xi treating all other variables as constants and is denoted by ∂f/∂xi or fi:

∂f/∂xi (x1, x2, ..., xn) ≡ lim_{h→0} [f(x1, ..., xi−1, xi + h, xi+1, ..., xn) − f(x1, x2, ..., xn)] / h.

In order to calculate partial derivatives, we can apply the usual rules of differentiation.

[Figure 5: Cobb-Douglas production function f(K, L) = K^0.5 L^0.5. (a) 3D graph; (b) cross section when L = 1.]

Example 9. Consider a Cobb-Douglas production function f(K, L) = K^α L^β, where K > 0 is capital input, L > 0 is labour input and 1 > α, β > 0 are some constants. Then,

∂f/∂K = α K^(α−1) L^β > 0
∂f/∂L = β K^α L^(β−1) > 0.

So, for a given labour input, more capital raises output and, for a given capital input, more labour raises output.
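A sketch of this computation in sympy (again an assumed tool; the symbol names mirror the example):

```python
import sympy as sp

K, L, alpha, beta = sp.symbols('K L alpha beta', positive=True)
f = K**alpha * L**beta

# Partial derivatives: treat the other variable as a constant.
print(sp.diff(f, K))  # K**alpha*L**beta*alpha/K, i.e. alpha*K**(alpha-1)*L**beta
print(sp.diff(f, L))  # K**alpha*L**beta*beta/L,  i.e. beta*K**alpha*L**(beta-1)
```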
Mathematically, the partial derivative of f with respect to xi tells us the rate of change when only the variable xi is allowed to change. Economically, the partial derivatives give us useful information:

• With a production function, the partial derivative with respect to the input xi tells us the marginal productivity of that factor, or the rate at which additional output can be produced by increasing xi, holding other factors constant.
• With a utility function, the partial derivative with respect to good xi tells us the rate at which the consumer's well-being increases when she consumes additional amounts of xi, holding constant her consumption of other goods, i.e., the marginal utility of that good.

Total Differentials. Partial derivatives are multivariate extensions of derivatives; total differentials are multivariate extensions of differentials. For functions of more than one independent variable, y = f(x1, x2, ..., xn), the partial differential of y with respect to any one of the variables xi is the principal part of the change in y resulting from a change dxi in that one variable. The partial differential is therefore (∂y/∂xi) dxi, involving the partial derivative of y with respect to xi. The sum of the partial differentials with respect to all of the independent variables is the total differential

dy = (∂y/∂x1) dx1 + · · · + (∂y/∂xn) dxn,

which is the principal part of the change in y resulting from changes in all independent variables.

To gain some intuition about total differentials,¹ suppose there are two variables and consider the plane y = a0 + a1 x1 + a2 x2. How does the function behave when we change x1 and x2? Clearly, if dx1 and dx2 are the amounts by which we change x1 and x2, we have dy = a1 dx1 + a2 dx2. Note furthermore that the partials are ∂y/∂x1 = a1 and ∂y/∂x2 = a2. We can then write the total change in y as:

dy = (∂y/∂x1) dx1 + (∂y/∂x2) dx2.

Rewriting this in matrix notation:

dy = [∂y/∂x1  ∂y/∂x2] [dx1; dx2].

In the case of the plane, the vector of all partial derivatives is given by [∂y/∂x1  ∂y/∂x2] = [a1  a2]. This vector tells us the rates of change in the directions x1 and x2.

¹ Recall that we motivated the notion of a derivative by saying that it was the slope of the line which "looked like the function around the point x0." When we have n variables, the natural notion of a "line" is given by the following linear function:

y = a0 + a1 x1 + a2 x2 + ... + an xn.   (1)

In general, the function (1) is referred to as the equation of a plane (it certainly is the equation of a plane when there are two variables, x1 and x2).

[Figure 6: Function f(x, y) = −x^2 − y^2 and its derivative. (a) The function (blue surface) and the tangent plane at the point (4, 5) (red surface). The tangent plane, given by z = −8(x − 4) − 10(y − 5) − 41, looks like f(x, y) = −x^2 − y^2 around (4, 5). The derivative of f(x, y) = −x^2 − y^2 gives the slopes in the two directions of the tangent plane. (b) Cross-section when y = 5: the slope of the red line represents the partial derivative of f(x, y) = −x^2 − y^2 with respect to x at the point (4, 5).]
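The tangent-plane idea in Figure 6 can be checked numerically. The following sketch (standard-library Python only; the step sizes are illustrative) compares the true change in f(x, y) = −x^2 − y^2 near (4, 5) with the total differential.

```python
def f(x, y):
    return -x**2 - y**2

# Partial derivatives of f at (4, 5): f_x = -2x = -8, f_y = -2y = -10.
fx, fy = -8.0, -10.0

dx, dy = 0.01, -0.02
exact_change = f(4 + dx, 5 + dy) - f(4, 5)
differential = fx * dx + fy * dy  # principal part of the change

print(exact_change, differential)  # close for small dx, dy
```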
Now consider a more general two-variable function, y = f(x1, x2). With a general function, the idea is to find a plane which looks locally like the function around the point (x1, x2). Since the partial derivatives give the rates of change in x1 and x2, it makes sense to pick the appropriate plane which passes through the point (x1, x2) and has slopes ∂y/∂x1 and ∂y/∂x2 in the two directions. The derivative of the function f(x1, x2) at (x1, x2) is simply the vector [∂y/∂x1  ∂y/∂x2], where the partial derivatives are evaluated at the point (x1, x2). We can interpret the derivative as the slopes in the two directions of the plane which looks "like the function" around the point (x1, x2).

For a general function of n variables, y = f(x1, x2, ..., xn), the derivative of f at the point (x1, x2, ..., xn) is the vector of partial derivatives [∂y/∂x1 ... ∂y/∂xn]. This vector defines a linear map, which is the best linear approximation of the function f near the point (x1, x2, ..., xn). This linear map is thus the generalization of the usual notion of derivative.

Example 10. For the function f(K, L) = K^α L^β, the vector of partial derivatives is

[∂f/∂K  ∂f/∂L] = [α K^(α−1) L^β   β K^α L^(β−1)].

Then the total differential of f is:

df = [α K^(α−1) L^β   β K^α L^(β−1)] [dK; dL].

Total Derivatives. While the partial derivative of f with respect to xi treats all other arguments of f as constants, the total derivative of f acknowledges that other arguments of f may also vary with xi due to some postulated relationship. Finding the total derivative relies on the chain rule.

Definition. Consider a function f(x, y, z, t), where x, y, and z depend on t. Then, the chain rule is given by:

df/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt) + (∂f/∂z)(dz/dt) + ∂f/∂t.

In particular, notice that

df/dt ≠ ∂f/∂t,

as t has a direct effect on f, given by ∂f/∂t, and an indirect effect through its effect on x, y and z.

Example 11. Consider a function

y = 3x − w^2,  where  x = 2w^2 + w + 4.

Here w has a direct effect on y, given by ∂y/∂w, and an indirect effect through its effect on x. Hence, the total derivative of y with respect to w is

dy/dw = (∂y/∂x)(dx/dw) + ∂y/∂w = 3(4w + 1) − 2w.

Note that unless w = −1/4,

dy/dw ≠ ∂y/∂w.

A more complicated example:

Example 12. Consider a function y = f(x1, x2, w), where x1 = g(w) and x2 = h(w). Here w has a direct effect on y, given by ∂y/∂w, and an indirect effect through its effect on x1 and x2. Hence, the total derivative of y with respect to w is

dy/dw = (∂y/∂x1)(dx1/dw) + (∂y/∂x2)(dx2/dw) + ∂y/∂w
      = f1(x1, x2, w) g′(w) + f2(x1, x2, w) h′(w) + f3(x1, x2, w).

Problem 1. Consider the function

z = (x^2 y − 10x − 1) / t^3,

where x = e^(1−y) and t = 3y.

1. Find the partial derivative of z with respect to y.
2. Find the total derivative of z with respect to y, dz/dy.

3 Unconstrained Optimization

3.1 Univariate Case

We will consider the following maximization problem

max_x f(x)

or minimization problem

min_x f(x).

First Order Conditions: Necessary Conditions for Local Extrema. If a differentiable function f(x) reaches its maximum or minimum at a point x*, then f′(x*) = 0. To see this, consider the total differential: dy = f′(x*) dx. If the function reaches a maximum or minimum at x*, then it must be impossible to increase or decrease the value of the function by small changes in x. However, if f′(x*) ≠ 0, then it is always possible to make y larger or smaller by appropriate (small) changes in x. Therefore, we must have f′(x*) = 0 at a maximum or a minimum.

Any point satisfying the condition f′(x*) = 0 may be referred to as a stationary point; when a point satisfying f′(x*) = 0 is a minimum or a maximum, it is referred to as a critical value or extremum.
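As a minimal illustration of finding stationary points (sympy assumed; the function f(x) = x^3 − 3x is my own illustrative choice), the following solves f′(x) = 0; classifying the resulting points is the subject of the second order conditions below.

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = x**3 - 3*x

# Stationary points: solutions of f'(x) = 0.
stationary = sp.solve(sp.diff(f, x), x)
print(stationary)  # [-1, 1]
```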
We need to distinguish between local (or relative) extrema and global extrema. Figure 7a illustrates the difference, which is also explained in Definition 1.

Definition 1. A point x* is called a global maximum of the function f(x) if f(x*) ≥ f(x) for all x in the domain of f. A point x* is called a local maximum of the function f(x) if there is a "small interval" centered at x* such that f(x*) ≥ f(x) for all x in this small interval. A point x* is called a global minimum of the function f(x) if f(x*) ≤ f(x) for all x in the domain of f. A point x* is called a local minimum of the function f(x) if there is a "small interval" centered at x* such that f(x*) ≤ f(x) for all x in this small interval.

The condition f′(x*) = 0 at a maximum or minimum is valid only if x* is in the "interior" of the domain of the function. This is because the argument for showing that f′(x*) = 0 is a necessary condition for x* to be a maximum or a minimum relies on the ability to make small changes in x around x*. However, at a "boundary point" we cannot make certain changes. For instance, if the function is defined for all x in the interval [a, b], then at a, we can only increase x, while at b, we can only decrease x. Hence, it is possible that the maximum (or minimum) occurs at a or b and yet this boundary point does not satisfy the necessary condition for maximization (or minimization). For example, in Figure 7a the global minimum of a function defined for x ∈ [0, 6] occurs at the point x = 0 and the global maximum occurs at the point x = 6, neither of which satisfies the first order condition f′(x*) = 0.

The condition f′(x*) = 0 is called a necessary condition because it cannot guarantee that x* is indeed a maximum or minimum. It is entirely possible that f′(x*) = 0 but x* is neither a maximum nor a minimum.

Example 13. Consider the function f(x) = (x + 2)^3 + 5. Note that f′(−2) = 0, but the point x = −2 is neither a maximum nor a minimum (see Figure 7b).

[Figure 7: The first order condition f′(x) = 0 is a necessary, but not sufficient, condition for local minima and maxima. (a) A function f(x) defined for x ∈ [0, 6]: each point where f′(x) = 0 corresponds to either a local minimum or a local maximum, but the condition f′(x) = 0 does not identify the global minimum or maximum. Moreover, the condition f′(x) = 0 on its own does not distinguish a local maximum from a local minimum. (b) The point where f′(x) = 0 is a point of inflection.]

Second Order Conditions: Sufficient Conditions for Local Extrema. The condition f′(x*) = 0 on its own does not distinguish local maxima from local minima. To tell whether a point x* is a local maximum or a local minimum, we need to look at the sign of the function f′(x) in the immediate neighborhood of x*, where the neighborhood is defined as points immediately to the left and immediately to the right of x*:
• The point x = x* is a local maximum if, in the neighborhood of x*, f′(x) is positive for x < x* and negative for x > x*;
• The point x = x* is a local minimum if, in the neighborhood of x*, f′(x) is negative for x < x* and positive for x > x*;
• The point x = x* is neither a local maximum nor a local minimum if, in the neighborhood of x*, f′(x) does not change sign.

[Figure 8: The second order conditions, i.e., the conditions on the sign of f″(x), are sufficient for determining local minima and maxima. (a) f(x) = −(x − 3)^2 + 4: the point x = 3 is a maximum, as f′(x) is decreasing (changes sign from positive to negative) in the neighborhood of x = 3. (b) f(x) = (x − 3)^2 + 4: the point x = 3 is a minimum, as f′(x) is increasing (changes sign from negative to positive) in the neighborhood of x = 3.]

An equivalent way to express the above conditions is to say that

• the point x = x* is a local maximum if, in the neighborhood of x*, f′(x) is a decreasing function;
• the point x = x* is a local minimum if, in the neighborhood of x*, f′(x) is an increasing function;
• the point x = x* is neither a local maximum nor a local minimum if, in the neighborhood of x*, f′(x) is neither increasing nor decreasing.

This last set of conditions can be expressed more succinctly in terms of second order derivatives, but it requires a few new definitions. Recall that the function

f′(x) ≡ lim_{h→0} [f(x + h) − f(x)] / h

is the first derivative of the function f. The first derivative indicates whether a function is increasing or decreasing. A function f(x) is weakly decreasing at a point x if f′(x) ≤ 0; a function f(x) is weakly increasing at a point x if f′(x) ≥ 0. If the inequalities are strict, then the function is strictly decreasing or strictly increasing.

Since the derivative itself is a function, we can take its derivative. This is called the second derivative and denoted d²f/dx² or f″(x). Formally,

d²f/dx² = d/dx (df/dx).

The second derivative indicates whether the first derivative of a function is increasing or decreasing, thereby describing the curvature of the function.

Definition 2. A function f(x) is called concave if f″(x) ≤ 0 at all points of its domain; a function f(x) is called convex if f″(x) ≥ 0 at all points of its domain. If the inequalities are strict, then the function is called strictly concave or strictly convex.

Example 14. The function f(x) = x^2 is convex on its domain; the function g(x) = ln x is concave on the domain x > 0.

A function may be neither concave nor convex on its entire domain.

Example 15. Consider f(x) = −2x^3/3 + 10x^2 + 5 defined for x ≥ 0. In this case, f″(x) = −4x + 20 and thus:

• for 0 < x ≤ 5, f″(x) ≥ 0 and the function is convex;
• for x > 5, f″(x) < 0 and the function is concave.

Definition 3. A function f(x) is called concave at x* if f″(x*) ≤ 0; a function f(x) is called convex at x* if f″(x*) ≥ 0.

Recall Figure 7b, where f′(−2) = 0, but the point x = −2 is neither a maximum nor a minimum. This point is called an inflection point.

Definition 4. A point where a function changes its curvature is called an inflection point.

[Figure 9: An example of (a) a strictly convex and (b) a strictly concave function. (a) f(x) = x^2 is convex for all x ∈ (−∞, ∞); (b) g(x) = ln x is concave for all x ∈ (0, ∞).]

[Figure 10: Function f(x) = −2x^3/3 + 10x^2 + 5: the point x = 5 is an inflection point, where the function changes its curvature from convex (for 0 < x < 5) to concave (for x > 5).]

As an aside, note that since the second derivative is also a function, we can also take its derivative. This is called the third derivative and denoted f‴(x) to indicate that this function is found by three successive operations of differentiation, starting with the function f. One can continue this process, but we will typically not go beyond the second derivative.
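A small sketch (sympy assumed) that computes successive derivatives and locates where the curvature of the Example 15 function changes:

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = -sp.Rational(2, 3)*x**3 + 10*x**2 + 5

f1 = sp.diff(f, x)      # first derivative: -2*x**2 + 20*x
f2 = sp.diff(f, x, 2)   # second derivative: 20 - 4*x

# Inflection point: where the second derivative changes sign.
print(sp.solve(f2, x))  # [5]
```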
Example 16. Suppose that f(x) = x^5. Then f′(x) = 5x^4, f″(x) = 20x^3 and f‴(x) = 60x^2.

The observation that the second derivative indicates whether the first derivative of a function is increasing or decreasing leads to the following set of sufficient conditions for identifying maxima and minima:

• If f′(x*) = 0 and f″(x*) < 0, then x* is a local maximum of f(x);
• If f′(x*) = 0 and f″(x*) > 0, then x* is a local minimum of f(x).

These conditions only identify a local maximum or minimum, not a global maximum or minimum. However, the local maxima of a function that is concave on its entire domain are also global maxima. Similarly, the local minima of globally convex functions are also global minima. That is:

• If f′(x*) = 0 and f″(x) < 0 for all x in the domain of f, then x* is a global maximum of f(x);
• If f′(x*) = 0 and f″(x) > 0 for all x in the domain of f, then x* is a global minimum of f(x).

The function depicted in Figure 8a is strictly concave on its entire domain, and thus the point x = 3 is a global maximum; the function depicted in Figure 8b is strictly convex on its entire domain, and thus the point x = 3 is a global minimum.

The sufficient conditions for local extrema require f″(x*) ≠ 0. When f″(x*) = 0, the point x* can be a minimum, a maximum, or neither of the two. In this case we need to use an N-th derivative test:

• If f′(x*) = 0, f″(x*) = 0, ..., f^(N−1)(x*) = 0 and f^(N)(x*) < 0, where N is even, then the point x* is a maximum;
• If f′(x*) = 0, f″(x*) = 0, ..., f^(N−1)(x*) = 0 and f^(N)(x*) > 0, where N is even, then the point x* is a minimum;
• If f′(x*) = 0, f″(x*) = 0, ..., f^(N−1)(x*) = 0 and f^(N)(x*) ≠ 0, where N is odd, then the point x* is a point of inflection.

Solved Examples

Example 17. Suppose the monopolist's profit function is given by

Π(q) = pq − c(q) = (100 − q) q − q^2.

The monopolist aims to maximize profit and thus solves:

max_q (100 − q) q − q^2.

From the necessary first order condition it follows that

Π′(q) = 100 − 4q = 0.

So q* = 25 is a candidate for a maximum. To check that this indeed is the maximum, we need to check the second order conditions for optimization. The second derivative Π″(q) = −4 < 0 for all q and, in particular, for q = 25. Hence q* = 25 is a global maximum.

Another economic example:

Example 18. Suppose that a firm minimizes its average cost, which is defined for q > 0 and is given by:

C(q) = 100/q + q.

Then the first-order condition implies:

C′(q) = −100/q^2 + 1 = 0.

Therefore, q* = 10 (negative output is not allowed). Since C″(q) = 200/q^3 > 0 for all q > 0, q* = 10 is a global minimum.
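Both solved examples can be replicated in a few lines of sympy (assumed tool):

```python
import sympy as sp

q = sp.symbols('q', positive=True)

# Example 17: monopolist's profit.
profit = (100 - q)*q - q**2
q_star = sp.solve(sp.diff(profit, q), q)[0]
print(q_star, sp.diff(profit, q, 2))  # 25, -4 (second derivative < 0: maximum)

# Example 18: average cost.
cost = 100/q + q
q_min = sp.solve(sp.diff(cost, q), q)[0]
print(q_min, sp.diff(cost, q, 2).subs(q, q_min))  # 10, 1/5 > 0: minimum
```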
3.2 Multivariate Case

Consider the general maximization problem:

max_{x1,...,xn} f(x1, x2, ..., xn).

The first order conditions for maximization require the first order differential to be zero at the optimal point. That is, a vector of small changes (dx1, dx2, ..., dxn) should not change the value of the function. We thus have

df = (∂f/∂x1) dx1 + · · · + (∂f/∂xn) dxn = 0.

This is satisfied for all such changes if

∂f/∂x1 = 0, ∂f/∂x2 = 0, ..., ∂f/∂xn = 0.

These conditions are necessary conditions, and they must also hold for minimization problems. As in the single variable case, we are really after maxima and minima. The first order conditions alone cannot distinguish between local maxima and local minima. Likewise, the first order conditions cannot identify whether a candidate solution is a local or a global extremum. We thus need second order conditions to help us. For a point to be a (local) maximum, we must have d²f < 0 for any vector of (small) changes (dx1, dx2, ..., dxn); that is, f needs to be a (locally) strictly concave function. Similarly, for a point to be a (local) minimum, we must have d²f > 0 for any vector of (small) changes (dx1, dx2, ..., dxn); that is, f needs to be a (locally) strictly convex function.²

Definition 5. A point (x1*, x2*, ..., xn*) is a local maximum if for all i

∂f/∂xi |_{x1 = x1*, ..., xn = xn*} = 0

and the function f(x1, x2, ..., xn) is concave at (x1*, x2*, ..., xn*). A point (x1*, x2*, ..., xn*) is a local minimum if for all i

∂f/∂xi |_{x1 = x1*, ..., xn = xn*} = 0

and the function f(x1, x2, ..., xn) is convex at (x1*, x2*, ..., xn*). A point (x1*, x2*, ..., xn*) is a global maximum if for all i

∂f/∂xi |_{x1 = x1*, ..., xn = xn*} = 0

and the function f(x1, x2, ..., xn) is concave for all (x1, x2, ..., xn) (see Figure 11a). A point (x1*, x2*, ..., xn*) is a global minimum if for all i

∂f/∂xi |_{x1 = x1*, ..., xn = xn*} = 0

and the function f(x1, x2, ..., xn) is convex for all (x1, x2, ..., xn) (see Figure 11b). If at a point (x1*, x2*, ..., xn*) where all these partial derivatives are zero the function f is neither convex nor concave, then the point (x1*, x2*, ..., xn*) is a saddle point (see Figure 12).

² In the definition above, the notation ∂f/∂xi |_{x1 = x1*, ..., xn = xn*} should be understood as the partial derivative of f with respect to xi evaluated at the point x* = (x1*, x2*, ..., xn*).

Now we need tools for identifying whether a multivariate function is concave or convex.

Higher-Order Derivatives of Multivariate Functions. A single variable function f(x) is strictly concave if f″(x) < 0 and is strictly convex if f″(x) > 0. Notice that in the single-variable case, the second-order total differential is:

d²y = f″(x) (dx)².

Hence, we can (equivalently) define a function of one variable to be strictly concave if d²y < 0 and strictly convex if d²y > 0. The advantage of writing it in this way is that we can extend this definition to functions of many variables. A multivariate function f(x1, x2, ..., xn) is strictly concave if d²y < 0 and strictly convex if d²y > 0. This imposes certain restrictions on its second-order partial derivatives.

Second-order partial derivatives. Given a function f(x1, x2, ..., xn), the second-order derivative ∂²f/∂xi∂xj is the partial derivative of ∂f/∂xi with respect to xj. The above may suggest that the order in which the derivatives are taken matters and that the partial derivative of ∂f/∂xi with respect to xj is different from the partial derivative of ∂f/∂xj with respect to xi. While this can happen, it turns out that if the function f(x1, x2, ..., xn) is well-behaved, then the order of differentiation does not matter. This result is called Young's Theorem. We will be dealing with well-behaved functions for which Young's Theorem holds.
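A quick symbolic check of Young's Theorem for the Cobb-Douglas function used throughout (sympy assumed):

```python
import sympy as sp

K, L, alpha, beta = sp.symbols('K L alpha beta', positive=True)
f = K**alpha * L**beta

# Mixed partials taken in both orders agree (Young's Theorem).
f_KL = sp.diff(f, K, L)
f_LK = sp.diff(f, L, K)
print(sp.simplify(f_KL - f_LK))  # 0
```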
[Figure 11: An example of (a) a strictly concave and (b) a strictly convex function of two variables. (a) f(x1, x2) = −2x1^2 − 2x2^2: the function is concave for all (x1, x2), hence the point (x1, x2) = (0, 0) is a global maximum. (b) f(x1, x2) = 2x1^2 + 2x2^2: the function is convex for all (x1, x2), hence the point (x1, x2) = (0, 0) is a global minimum.]

[Figure 12: f(x1, x2) = 2x1^2 − 2x2^2: the function is convex in the direction of x1 and concave in the direction of x2, hence the point (x1, x2) = (0, 0) is a saddle point.]

Example 19. Consider a Cobb-Douglas production function f(K, L) = K^α L^β, where K > 0 is capital input, L > 0 is labor input and 1 > α, β > 0 are some constants. For this function we can evaluate the second-order partial derivative ∂²f/∂L∂K in two different ways. First, since

∂f/∂L = β K^α L^(β−1),

taking the partial derivative of this with respect to K, we get

∂²f/∂L∂K = ∂/∂K (∂f/∂L) = αβ K^(α−1) L^(β−1).

Alternatively, since

∂f/∂K = α K^(α−1) L^β,

∂²f/∂K∂L = ∂/∂L (∂f/∂K) = αβ K^(α−1) L^(β−1).

This illustrates Young's Theorem: no matter in which order we differentiate, we get the same answer. Note that if K > 0 and L > 0,

∂²f/∂K∂L > 0,

which means that the marginal productivity of labor (capital) increases as we add more capital (labor). At the same time, since α < 1 and K > 0, L > 0,

∂²f/∂K² = α(α − 1) K^(α−2) L^β < 0.

This means that the marginal productivity of capital decreases as we add more capital.

Concavity and convexity of a multivariate function. We say a function is concave if d²y ≤ 0 for all x and convex if d²y ≥ 0 for all x. If the function satisfies the stronger condition d²y < 0 for all x, then it is strictly concave. Analogously, if d²y > 0 for all x, it is strictly convex.

Consider a two-variable function, y = f(x1, x2). Its differential is:

dy = (∂y/∂x1) dx1 + (∂y/∂x2) dx2,

which again can be viewed as a function of x1 and x2. Taking a differential, we obtain

d(dy) = [(∂²y/∂x1²) dx1 + (∂²y/∂x2∂x1) dx2] dx1 + [(∂²y/∂x1∂x2) dx1 + (∂²y/∂x2²) dx2] dx2,

which after collecting terms yields:

d²y = (∂²y/∂x1²) (dx1)² + 2 (∂²y/∂x1∂x2) dx1 dx2 + (∂²y/∂x2²) (dx2)².

Thus, the second-order total differential depends on the second-order partial derivatives of f(x1, x2). For a general function y = f(x1, x2, ..., xn), one can use a similar procedure to get the formula for the second-order total differential. This is a little more complicated, but it can be written compactly as follows:

d²y = Σ_{i=1}^{n} Σ_{j=1}^{n} (∂²y/∂xi∂xj) dxi dxj.

As things stand, it is not clear how to go about verifying that the second-order total differential of a function of n variables is never positive or never negative. However, notice that we can write the second order differential of a function of two variables,

d²y = (∂²y/∂x1²) (dx1)² + 2 (∂²y/∂x1∂x2) dx1 dx2 + (∂²y/∂x2²) (dx2)²,

in matrix form in the following way:

d²y = [dx1  dx2] [ ∂²y/∂x1²     ∂²y/∂x1∂x2 ]
                 [ ∂²y/∂x2∂x1   ∂²y/∂x2²   ] [dx1; dx2].

The matrix of second-order partial derivatives

H ≡ [ ∂²y/∂x1²     ∂²y/∂x1∂x2 ]
    [ ∂²y/∂x2∂x1   ∂²y/∂x2²   ]

is called the Hessian matrix, which is symmetric by Young's Theorem. For a general function y = f(x1, x2, ..., xn),

d²y = [dx1  dx2  ...  dxn] [ ∂²y/∂x1²     ∂²y/∂x1∂x2   ...   ∂²y/∂x1∂xn ]
                           [ ∂²y/∂x2∂x1   ∂²y/∂x2²     ...   ∂²y/∂x2∂xn ]
                           [ ...          ...          ...   ...        ]
                           [ ∂²y/∂xn∂x1   ∂²y/∂xn∂x2   ...   ∂²y/∂xn²   ] [dx1; dx2; ...; dxn],

and thus

H ≡ [ ∂²y/∂x1²     ∂²y/∂x1∂x2   ...   ∂²y/∂x1∂xn ]
    [ ∂²y/∂x2∂x1   ∂²y/∂x2²     ...   ∂²y/∂x2∂xn ]
    [ ...          ...          ...   ...        ]
    [ ∂²y/∂xn∂x1   ∂²y/∂xn∂x2   ...   ∂²y/∂xn²   ].
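Hessians are easy to build symbolically; a sketch with sympy (assumed tool), here for the saddle-point function of Figure 12:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = 2*x1**2 - 2*x2**2

H = sp.hessian(f, (x1, x2))
print(H)  # Matrix([[4, 0], [0, -4]])
```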
Now it is clear that to determine whether a multivariate function is concave or convex, we need to know the sign of d²y, i.e., we are interested in the sign of the quadratic form

d²y = dx′ H dx,

where dx′ is (1 × n), H is (n × n) and dx is (n × 1), so that d²y is a scalar. For a given symmetric matrix H and for any x ∈ Rⁿ, five situations may arise:

Definition 6. An (n × n) matrix H is:

• positive definite if x′Hx > 0 for any (n × 1) vector x ∈ Rⁿ, x ≠ 0ₙ (note that x ≠ 0ₙ means that at least one element of x is not equal to 0);
• positive semidefinite if x′Hx ≥ 0 for any (n × 1) vector x ∈ Rⁿ, x ≠ 0ₙ;
• negative definite if x′Hx < 0 for any (n × 1) vector x ∈ Rⁿ, x ≠ 0ₙ;
• negative semidefinite if x′Hx ≤ 0 for any (n × 1) vector x ∈ Rⁿ, x ≠ 0ₙ;
• indefinite if x′Hx > 0 for at least one vector x ≠ 0ₙ and x′Hx < 0 for at least one vector x ≠ 0ₙ.

From the discussion above, if the Hessian is negative definite for all (x1, ..., xn), the function is strictly concave. If the Hessian is positive definite for all (x1, ..., xn), the function is strictly convex. So to determine whether a function is concave or convex, we need to be able to determine whether the Hessian matrix is negative definite or positive definite. We can classify a symmetric matrix H in one of the above categories using either the eigenvalue test or the principal minor test.

Eigenvalue Test. The quadratic form x′Hx is:

• positive (semi)definite if and only if all the eigenvalues of H are strictly positive (non-negative);
• negative (semi)definite if and only if all the eigenvalues of H are strictly negative (non-positive).

Example 20. Consider the matrix

A = [ 1  4  6 ]
    [ 4  2  1 ]
    [ 6  1  6 ].

The characteristic equation is

det(A − λI) = (1 − λ)(λ² − 8λ + 11) − 4(18 − 4λ) + 6(6λ − 8) = 0.

This equation of order three with no obvious factorization seems difficult to solve!
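Numerically, however, the eigenvalue test is straightforward; a sketch with numpy (an assumed dependency):

```python
import numpy as np

A = np.array([[1, 4, 6],
              [4, 2, 1],
              [6, 1, 6]])

eigenvalues = np.linalg.eigvalsh(A)  # eigvalsh: eigenvalues of a symmetric matrix
print(eigenvalues)  # mixed signs, so A is indefinite
```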
Principal Minor Test

Definition 7. Let H be an n × n matrix. An i-th order principal minor of H is the determinant of a submatrix of H obtained by deleting n − i rows and the n − i columns with the same indices. The i-th (order) leading principal minor of H is the determinant of the submatrix obtained from H by deleting the last n − i rows and columns.

Example 21. Let A be a 3 × 3 matrix

A = [ a11  a12  a13 ]
    [ a21  a22  a23 ]
    [ a31  a32  a33 ].

Principal Minors. There is one third order principal minor of A: det(A). There are three second order principal minors:

• det [a11 a12; a21 a22], where the submatrix in the minor's calculation is obtained by deleting the third row and third column of A;
• det [a11 a13; a31 a33], where the submatrix in the minor's calculation is obtained by deleting the second row and second column of A;
• det [a22 a23; a32 a33], where the submatrix in the minor's calculation is obtained by deleting the first row and first column of A.

There are also three first order principal minors: a11, formed by deleting the last two rows and columns; a22, formed by deleting the first and last rows and columns; and a33, formed by deleting the first two rows and columns.

Leading Principal Minors. The i-th leading principal minor is the determinant of the submatrix obtained from A by deleting all columns and all rows after the i-th. Thus

first l.p.m. = a11,
second l.p.m. = det [a11 a12; a21 a22],
third l.p.m. = det [a11 a12 a13; a21 a22 a23; a31 a32 a33].

Principal Minor Test:

• The quadratic form x′Hx is positive definite if and only if all leading principal minors of H are positive.
• The quadratic form x′Hx is negative definite if and only if the leading principal minors of H alternate in sign, the first being negative (i.e., the first is negative, the second is positive, the third is negative and so on; that is, the i-th order leading principal minor has the sign of (−1)^i).
• The quadratic form x′Hx is positive semidefinite if and only if every principal minor of H is ≥ 0.
• The quadratic form x′Hx is negative semidefinite if and only if every principal minor of H of odd order is ≤ 0 and every principal minor of even order is ≥ 0.

Note that in the first two cases, it is enough to check the inequality for all the leading principal minors (i.e., for 1 ≤ i ≤ n). In the last two cases, we must check all principal minors (i.e., for each i with 1 ≤ i ≤ n and for each of the (n choose i) principal minors of order i).

Example 22. The matrix

[ 1  1 ]
[ 1  4 ]

is positive definite. The matrix

[ −1   1 ]
[  1  −4 ]

is negative definite. The matrix

[ −1  1 ]
[  1  4 ]

is neither positive definite nor negative definite. The matrix

[ 1  4  6 ]
[ 4  2  1 ]
[ 6  1  6 ]

is indefinite.

In the case of a function of two variables, y = f(x1, x2):

• d²y is positive definite (and thus the function is convex) if

  ∂²y/∂x1² > 0  and  |H| = (∂²y/∂x1²)(∂²y/∂x2²) − (∂²y/∂x1∂x2)² > 0;

• d²y is negative definite (and thus the function is concave) if

  ∂²y/∂x1² < 0  and  |H| = (∂²y/∂x1²)(∂²y/∂x2²) − (∂²y/∂x1∂x2)² > 0.

Note that the condition |H| > 0 implies that ∂²y/∂x1² and ∂²y/∂x2² must have the same sign, for both positive definite and negative definite H.

Conditions for a stationary point of y = f(x1, x2):

FOC (all cases): ∂y/∂x1 = 0 and ∂y/∂x2 = 0.

SOC:
• Maximum: ∂²y/∂x1² < 0, ∂²y/∂x2² < 0, and (∂²y/∂x1²)(∂²y/∂x2²) − (∂²y/∂x1∂x2)² > 0;
• Minimum: ∂²y/∂x1² > 0, ∂²y/∂x2² > 0, and (∂²y/∂x1²)(∂²y/∂x2²) − (∂²y/∂x1∂x2)² > 0;
• Saddle point: (∂²y/∂x1²)(∂²y/∂x2²) − (∂²y/∂x1∂x2)² < 0.

If (∂²y/∂x1²)(∂²y/∂x2²) − (∂²y/∂x1∂x2)² = 0, the test fails and we need to check the other principal minors to determine whether the stationary point is a maximum, a minimum or a saddle point.
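A sketch of this classification applied to the saddle example f(x1, x2) = 2x1^2 − 2x2^2 (sympy assumed):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = 2*x1**2 - 2*x2**2

# FOC: both partials vanish at (0, 0).
print(sp.solve([sp.diff(f, x1), sp.diff(f, x2)], [x1, x2]))  # {x1: 0, x2: 0}

# SOC: sign of the Hessian determinant at the stationary point.
H = sp.hessian(f, (x1, x2))
print(H.det())  # -16 < 0, so (0, 0) is a saddle point
```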
Extended Example: Firm's Profit Maximization

Suppose a firm can sell its output at p per unit and that its production function is given by y = A K^α L^β. What combination of capital and labor should the firm use so as to maximize profits, assuming that capital costs r per unit and labor w per unit? The firm's profits are given by revenue minus costs:

π̃(K, L) = p A K^α L^β − rK − wL.

The firm aims to maximize profits, i.e., it solves the following unconstrained optimization problem with multiple variables:

max_{K,L} π̃(K, L) = max_{K,L} p A K^α L^β − rK − wL.

We can use the first order conditions to obtain potential candidates for the optimum. The first order conditions (FOC) are:

∂π̃/∂K = α p A K^(α−1) L^β − r = 0
∂π̃/∂L = β p A K^α L^(β−1) − w = 0.

At a point where the FOC are satisfied, the objective function attains a maximum only if it is a concave function. In the multivariate setting a function is strictly concave if the matrix of its second order derivatives, the Hessian, is negative definite. In this problem the Hessian is given by

H = [ ∂²π̃/∂K²    ∂²π̃/∂K∂L ]
    [ ∂²π̃/∂K∂L   ∂²π̃/∂L²  ]
  = [ α(α−1) Ap K^(α−2) L^β       αβ Ap K^(α−1) L^(β−1)  ]
    [ αβ Ap K^(α−1) L^(β−1)       β(β−1) Ap K^α L^(β−2)  ].

To verify whether the matrix is negative definite, one can look at the leading principal minors and check whether they alternate in sign, with odd order principal minors being negative and even order principal minors being positive. In this problem this requirement reduces to the following set of inequalities:

α(α−1) Ap K^(α−2) L^β < 0
β(β−1) Ap K^α L^(β−2) < 0
det(H) > 0.

Note that

det(H) = (∂²π̃/∂K²)(∂²π̃/∂L²) − (∂²π̃/∂K∂L)² = [αβ(α−1)(β−1) − α²β²] (Ap K^(α−1) L^(β−1))².

Thus the SOC are given by

α(α−1) Ap K^(α−2) L^β < 0
β(β−1) Ap K^α L^(β−2) < 0
[αβ(α−1)(β−1) − α²β²] (Ap K^(α−1) L^(β−1))² > 0.

The SOC inequalities are satisfied if

α − 1 < 0
β − 1 < 0
αβ(α−1)(β−1) − α²β² > 0,

where the last inequality is satisfied if α + β < 1 (this follows after expanding the product, simplifying and remembering that α > 0 and β > 0).

4 Constrained Optimization

Until now, we have considered unconstrained problems. Usually, economic agents face natural constraints.

Example 23. (Consumer's Problem) Suppose that a consumer has a utility function U(x1, x2) = x1^(1/2) x2^(1/2), the price of x1 is p1, the price of x2 is p2 and the consumer has income m. How much of the two goods should the consumer purchase to maximize her utility?

In producer theory we are frequently interested in the following minimization problem:

Example 24. (Firm's Problem) Suppose that a firm's production function is given by f(K, L) = K^(1/3) L^(2/3), the price of capital is r and the price of labor is w. What is the least-cost way for the firm to produce Q units of output?

Both of the above problems have a common mathematical structure:

max_{x1,...,xn} f(x1, x2, ..., xn) subject to g(x1, x2, ..., xn) = 0.

We say that f(x1, x2, ..., xn) is the objective function, g(x1, x2, ..., xn) = 0 is the constraint and x1, x2, ..., xn are the choice variables. We are interested in finding a solution to this problem,

x* = (x1*, x2*, ..., xn*).

The value function for this problem is derived by substituting x* into the objective function to obtain f(x1*, x2*, ..., xn*). It is also possible that instead of maximizing f(x1, x2, ..., xn) we could be minimizing f(x1, x2, ..., xn).

Example 25. (Example 23 continued) The utility maximization problem can be written as:

max_{x1,x2} x1^(1/2) x2^(1/2) subject to p1 x1 + p2 x2 = m.

The solution to the problem is a Marshallian demand as a function of prices and income, i.e., x1* = x1(p1, p2, m) and x2* = x2(p1, p2, m), while the objective function evaluated at the optimum is an indirect utility function: v(p1, p2, m) = (x1*)^(1/2) (x2*)^(1/2).

Similarly,

Example 26. (Example 24 continued) The firm's cost minimization problem can be stated as:

min_{K,L} rK + wL s.t. Q = K^(1/3) L^(2/3).

The solution to the problem is a conditional input demand as a function of r, w and Q, i.e., K^c = K(r, w, Q) and L^c = L(r, w, Q), while the objective function evaluated at the optimum is a cost function that gives the cost of producing the required level of output Q: c(r, w, Q) = r K^c + w L^c.

4.1 Direct Substitution

When the constraint(s) are equalities, we can convert the problem from a constrained optimization to an unconstrained optimization problem by substituting for some of the variables.

Example 27. (Example 23 continued) In the consumer's utility maximization problem, p1 x1 + p2 x2 = m. Hence,

x1 = m/p1 − (p2/p1) x2.
Substituting this into the objective function, we solve

max_{x2} [m/p1 − (p2/p1) x2]^(1/2) x2^(1/2).

This is a function of just x2, and we can now maximize this function with respect to x2. By incorporating the constraint into the objective function, we have transformed the constrained optimization problem into an unconstrained optimization problem, which we know how to solve. The first order condition gives:

(1/2) x2^(−1/2) [m/p1 − (p2/p1) x2]^(1/2) − (1/2)(p2/p1) [m/p1 − (p2/p1) x2]^(−1/2) x2^(1/2) = 0.

Solving for x2:

m/p1 − (p2/p1) x2 = (p2/p1) x2
⟹ x2 = m/(2p2)
⟹ x1 = m/p1 − (p2/p1) x2 = m/(2p1).

The firm's problem can be solved similarly.

4.2 The Lagrangian Approach

The substitution technique has serious limitations:

• In some cases, we cannot use substitution easily: for instance, suppose the constraint is x^4 + 5x^3 y + y^2 x + x^6 + 5 = 0. Here, it is not possible to solve this equation to get x as a function of y or vice versa.
• Moreover, in many cases, the economic constraints are written in the form g(x1, x2, ..., xn) ≥ 0 or g(x1, x2, ..., xn) ≤ 0. While the Lagrangian technique can be modified to take care of such cases, the substitution technique cannot be modified, or can be modified only with some difficulty.

Given a problem

max_{x1,...,xn} f(x1, x2, ..., xn) subject to g(x1, x2, ..., xn) = 0,

write down the Lagrangian function

L(x1, x2, ..., xn, λ) = f(x1, x2, ..., xn) + λ g(x1, x2, ..., xn).

Note that the Lagrangian is a function of n + 1 variables: (x1, x2, ..., xn, λ). We then look for the stationary points of the Lagrangian, that is, points where all the partial derivatives of the Lagrangian are zero. Using a Lagrangian, we get n + 1 first order conditions:

∂L/∂xi = 0, (i = 1, ..., n)
∂L/∂λ = 0.

Solving these equations will give us candidate solutions for the constrained optimization problem. Candidate solutions still need to be checked using the second-order conditions.

Example 28. (Example 23 continued) In the consumer's utility maximization problem:

L(x1, x2, λ) = x1^(1/2) x2^(1/2) + λ(m − p1 x1 − p2 x2).

The first order conditions are given by:

∂L/∂x1 = (1/2) x1^(−1/2) x2^(1/2) − λ p1 = 0
∂L/∂x2 = (1/2) x2^(−1/2) x1^(1/2) − λ p2 = 0
∂L/∂λ = m − p1 x1 − p2 x2 = 0.

Interpretation of the FOC: If we divide the first two conditions, we get that

MRS12 = U1/U2 = p1/p2.

This says that at the optimum point, the slope of the indifference curve must be equal to the slope of the budget line. To solve the problem, note that from the first two conditions it follows that

(1/(2p1)) x1^(−1/2) x2^(1/2) = λ = (1/(2p2)) x2^(−1/2) x1^(1/2),

or

x2 = (p1/p2) x1.   (2)

Substituting this into the budget constraint yields:

x1 = m/(2p1).

Substituting x1 back into (2) and solving for x2 yields:

x2 = m/(2p2).

The firm's problem can be solved similarly:

Example 29. (Example 24 continued) The Lagrangian for the firm's problem is:

L = rK + wL − λ(K^(1/3) L^(2/3) − Q).

First order conditions:

∂L/∂K = r − (λ/3) K^(−2/3) L^(2/3) = 0   (3)
∂L/∂L = w − (2λ/3) K^(1/3) L^(−1/3) = 0   (4)
∂L/∂λ = Q − K^(1/3) L^(2/3) = 0   (5)

Taking the ratio of (3) and (4) one obtains:

r/w = L/(2K).   (6)

Substituting for K in (5):

Q = (w/(2r))^(1/3) L.

From here, expressing L, it follows that:

L^c = (2r/w)^(1/3) Q.

Substituting L^c into the ratio of first order conditions (6) to express K in terms of parameters, one obtains:

K^c = (w/(2r))^(2/3) Q.

Note that the technique has been identical for both maximization and minimization problems. This means that the first order conditions identified so far are only necessary conditions, not sufficient conditions. We shall look at sufficient, or second order, conditions later.
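A sketch of solving the consumer's first order conditions symbolically (sympy assumed; the concrete prices and income are illustrative values of my own choosing):

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', positive=True)
p1, p2, m = 1, 2, 8  # illustrative values

U = sp.sqrt(x1) * sp.sqrt(x2)
L = U + lam * (m - p1*x1 - p2*x2)

foc = [sp.diff(L, v) for v in (x1, x2, lam)]
print(sp.solve(foc, [x1, x2, lam], dict=True))
# expect x1 = m/(2*p1) = 4 and x2 = m/(2*p2) = 2
```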
The Lagrangian approach amounts to searching for points where:

• The constraint is satisfied.
• The constraint and the level curve of the objective function are tangent to one another.

If we have more than two variables, then the same intuition can be extended. For instance, with three variables, the Lagrangian conditions will say:

• The rate of substitution between any two variables along the objective function must equal the rate of substitution along the constraint.
• The optimum point must be on the constraint.

Intuition for the Lagrangian Method. Consider the simplest case of the maximization of a function of two variables subject to one constraint:

max_{x1,x2} f(x1, x2) subject to g(x1, x2) = 0.

Suppose that the point x* = (x1*, x2*) is a constrained maximum. Therefore any small feasible change in x from this point, that is, a small movement along the constraint, should not be able to improve the value of the objective function. We represent small changes in x = (x1, x2)ᵀ by the differential notation dx = (dx1, dx2)ᵀ. Then the first-order necessary conditions may be stated as follows:

f_{x1} dx1 + f_{x2} dx2 = 0.   (7)

However, a feasible change in x does not change the value of the constraint. That is, the constraint g(x1, x2) = 0 implies that

g_{x1} dx1 + g_{x2} dx2 = 0,   (8)

and so dx1 and dx2 are no longer both arbitrary. We can take, e.g., dx1 as arbitrary, but then dx2 must be chosen to satisfy (8). Taking the ratio of (7) and (8), it is clear that at the optimum

f_{x1}/g_{x1} = f_{x2}/g_{x2} ≡ λ.

The Lagrange-multiplier method yields the same first-order necessary condition, and the Lagrange multiplier λ makes sure that both (7) and (8) are simultaneously satisfied.

Economic Interpretation of the Lagrange Multiplier. Note that we did not compute λ in either the consumer's problem or the firm's problem. This is because our interest is in the values of x1 and x2 (or K and L). However, in some instances it is useful to compute λ: it has an economic interpretation as the shadow price of the constraint. Suppose we have the problem

max_{x1,...,xn} f(x1, x2, ..., xn) subject to g(x1, x2, ..., xn) = 0.

Suppose we now relax this constraint: instead of requiring g(x1, x2, ..., xn) = 0, we require g(x1, x2, ..., xn) = δ, where δ is a small positive number. Clearly, since the constraint has been changed, the value of the objective function must change. The question is: by how much? The answer to this question is given by λ. For this reason, λ is referred to as the shadow price of the constraint. It tells us the rate at which the objective function increases if the constraint is relaxed by a small amount.

Example 30. (Example 23 continued) In the consumer's utility maximization problem, we can compute

λ = (1/(2p1)) x1^(−1/2) x2^(1/2)
  = (1/(2p1)) (m/(2p1))^(−1/2) (m/(2p2))^(1/2)
  = 1/(2 (p1 p2)^(1/2)).

Thus, the shadow price of the constraint tells us that if we give a small amount of additional income to the consumer, then his utility will go up at the rate

λ = 1/(2 (p1 p2)^(1/2)).

Thus λ represents the marginal utility of income.
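The shadow-price interpretation is easy to check numerically. This sketch (standard-library Python; the prices and income are illustrative) compares the utility gain from a small increase in income with λ · dm, using the closed-form demands x1 = m/(2p1) and x2 = m/(2p2):

```python
p1, p2 = 1.0, 2.0

def max_utility(m):
    """Utility at the optimal bundle x1 = m/(2 p1), x2 = m/(2 p2)."""
    return (m / (2*p1))**0.5 * (m / (2*p2))**0.5

m, dm = 8.0, 0.01
lam = 1 / (2 * (p1*p2)**0.5)  # shadow price from Example 30

print(max_utility(m + dm) - max_utility(m))  # ~0.0035355
print(lam * dm)                              #  0.0035355...
```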
4.3 Second Order Conditions

As with the unconstrained case, we need to check the second-order conditions to ensure we have an optimum. As before, the second-order sufficient condition for a maximum is d²f < 0 and for a minimum is d²f > 0. However, because of the constraint, it is no longer sufficient to look at the Hessian of f to verify these conditions.

Suppose we have a two-variable constrained optimization problem

max_{x1,x2} f(x1, x2) or min_{x1,x2} f(x1, x2) subject to g(x1, x2) = 0.

The second order conditions for this problem differ slightly from the usual conditions because of the constraint g(x1, x2) = 0, which implies that dx1 and dx2 must be chosen to satisfy (8). Thus the second-order sufficient condition for a maximum is that d²f < 0 subject to (8), and the second-order sufficient condition for a minimum is that d²f > 0 subject to (8). In practice, to check the second-order sufficient conditions we need to compute the bordered Hessian matrix of the Lagrangian at the critical point that we want to check. The Lagrangian of the two-variable constrained optimization problem is

L(x1, x2, λ) = f(x1, x2) + λ g(x1, x2).

The bordered Hessian is the "usual Hessian," bordered by the derivatives of the constraint with respect to the endogenous variables, here x1 and x2. That is,

H^B = [ 0    g1   g2  ]
      [ g1   L11  L12 ]
      [ g2   L21  L22 ].

The second order conditions state:

• If (x1*, x2*, λ*) corresponds to a constrained maximum, then the determinant of H^B evaluated at (x1*, x2*, λ*) must be positive.
• If (x1*, x2*, λ*) corresponds to a constrained minimum, then the determinant of H^B evaluated at (x1*, x2*, λ*) must be negative.

Example 31. (Example 23 continued) In the consumer's utility maximization problem,

H^B = [ 0     p1                            p2                          ]
      [ p1    −(1/4) x1^(−3/2) x2^(1/2)     (1/4) x1^(−1/2) x2^(−1/2)   ]
      [ p2    (1/4) x1^(−1/2) x2^(−1/2)     −(1/4) x1^(1/2) x2^(−3/2)   ].

In a general n-variable problem with m (m < n) constraints, a point (x*, λ*) that satisfies the first-order conditions is

• a local maximum if the last (n − m) leading principal minors of H^B alternate in sign, beginning with that of (−1)^(m+1);
• a local minimum if the last (n − m) leading principal minors of H^B are of the same sign as (−1)^m.

In both cases, H^B must be evaluated at (x*, λ*).

There are also some global results for equality-constrained problems:

• If f(x1, ..., xn) is concave and all constraints are linear in (x1, ..., xn), then a solution to the constrained maximization problem is a global maximum.
• If f(x1, ..., xn) is convex and all constraints are linear in (x1, ..., xn), then a solution to the constrained minimization problem is a global minimum.

5 The Envelope Theorem

We are interested in studying how the value function of an optimization problem changes when one of the parameters of the problem changes. A very powerful tool for such investigations is the envelope theorem.

5.1 The Envelope Theorem for Unconstrained Optimization

Suppose we have the unconstrained optimization problem

max_{x1,x2} f(x1, x2; α),

where α is some exogenous parameter. Suppose that (x1*(α), x2*(α)) solves this optimization problem. Note that the solution will depend upon α. The value function for this problem is derived by substituting (x1*(α), x2*(α)) into the objective function:

V(α) = f(x1*(α), x2*(α); α).

Notice that the value function is a function of the parameter α. Notice also that the value function depends on α in two different ways:

1. Direct dependence.
2. Indirect dependence through x1*(α) and x2*(α).

We are interested in knowing how the value function changes when α changes. When we differentiate the value function, we get:

dV/dα = (∂f/∂x1)(∂x1*/∂α) + (∂f/∂x2)(∂x2*/∂α) + ∂f/∂α,

where the partial derivatives of f are evaluated at the solution (x1*(α), x2*(α)).
Now note that at the optimum (assuming we have an interior solution), it must be the case that

∂f/∂x1 |_{x1 = x1*(α), x2 = x2*(α)} = 0  and  ∂f/∂x2 |_{x1 = x1*(α), x2 = x2*(α)} = 0.

Hence, the first two terms drop out and we have

dV/dα = ∂f/∂α,

where the partial derivative is evaluated at the point (x1*(α), x2*(α)). This result, which is called the Envelope Theorem, says in words: the total derivative of the value function with respect to the parameter α is the same as the partial derivative of the objective function evaluated at the optimal point.

Example 32. Consider the unconstrained problem:

max_{x1,x2} 4x1 + αx2 − x1² − x2² + x1 x2.

The first order conditions:

4 − 2x1 + x2 = 0
α − 2x2 + x1 = 0.

Solving:

x1* = (8 + α)/3
x2* = (2α + 4)/3.

(We also need to check the second order conditions.) Substituting x1* and x2* into the objective function, we obtain the value function:

V(α) = 4 (8 + α)/3 + α (2α + 4)/3 − (8 + α)²/9 − (2α + 4)²/9 + (8 + α)(2α + 4)/9.

By the Envelope Theorem:

dV/dα = x2* = (2α + 4)/3.
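This claim can be verified directly with sympy (assumed tool): differentiate the value function and compare it with x2*(α).

```python
import sympy as sp

x1, x2, a = sp.symbols('x1 x2 alpha', real=True)
f = 4*x1 + a*x2 - x1**2 - x2**2 + x1*x2

sol = sp.solve([sp.diff(f, x1), sp.diff(f, x2)], [x1, x2])
V = f.subs(sol)  # value function V(alpha)

# Envelope theorem: dV/d(alpha) equals x2* evaluated at the optimum.
print(sp.simplify(sp.diff(V, a) - sol[x2]))  # 0
```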
5.2 The Envelope Theorem for Constrained Optimization

Now consider the constrained case. We can proceed essentially as before. Consider the problem

max_{x1,x2} f(x1, x2; α)  subject to  g(x1, x2; α) = 0.

The Lagrangian for this problem is

L(x1, x2, λ; α) = f(x1, x2; α) + λ g(x1, x2; α).

Suppose that (x1*(α), x2*(α), λ*(α)) solves the constrained optimization problem. The value function for this problem is defined as:

V(α) = f(x1*(α), x2*(α); α).

Since g(x1*(α), x2*(α); α) = 0, we can also write the value function as:

V(α) = f(x1*(α), x2*(α); α) + λ*(α) g(x1*(α), x2*(α); α).

Differentiating with respect to α:

dV/dα = (∂f/∂x1)(∂x1*/∂α) + (∂f/∂x2)(∂x2*/∂α) + ∂f/∂α
        + (dλ*/dα) g(x1*(α), x2*(α); α)
        + λ*(α) [ (∂g/∂x1)(∂x1*/∂α) + (∂g/∂x2)(∂x2*/∂α) + ∂g/∂α ],

where again all partial derivatives are evaluated at the solution (x1*(α), x2*(α), λ*(α)). This can be written as

dV/dα = [∂f/∂x1 + λ* (∂g/∂x1)] (∂x1*/∂α) + [∂f/∂x2 + λ* (∂g/∂x2)] (∂x2*/∂α)
        + (dλ*/dα) g(x1*(α), x2*(α); α) + ∂f/∂α + λ*(α) (∂g/∂α).

Note that the first two terms on the right-hand side drop out because (x1*, x2*, λ*) must satisfy the necessary conditions for constrained optimization. The third term drops out because g(x1*(α), x2*(α); α) = 0. We are left with the following:

dV/dα = ∂f/∂α + λ* (∂g/∂α) = (∂L/∂α)(x1*, x2*, λ*).

In words: the derivative of the value function with respect to the parameter α is the partial derivative of the Lagrangian function with respect to α, evaluated at the solution (x1*, x2*, λ*).

5.3 Extended Example: Firm’s Cost Minimization Problem

Suppose that a firm’s production function is given by f(K, L) = K^(1/3) L^(2/3), the price of capital is r, and the price of labor is w. What is the least-cost way for the firm to produce Q units of output? The firm’s cost minimization problem can be stated as:

min_{K,L} rK + wL  s.t.  Q = K^(1/3) L^(2/3).

The solution to the problem is a pair of conditional input demands as functions of r, w, and Q, i.e., K^c = K(r, w, Q) and L^c = L(r, w, Q), while the objective function evaluated at the optimum is a cost function that gives the cost of producing the required level of output Q:

c(r, w, Q) = rK^c + wL^c.

The Lagrangian for the firm’s problem is:

L = rK + wL − λ(K^(1/3) L^(2/3) − Q).

First-order conditions:

∂L/∂K = r − (λ/3) K^(−2/3) L^(2/3) = 0    (9)
∂L/∂L = w − (2λ/3) K^(1/3) L^(−1/3) = 0   (10)
∂L/∂λ = Q − K^(1/3) L^(2/3) = 0           (11)

Taking the ratio of (9) and (10), one obtains:

r/w = L/(2K).    (12)

Substituting for K in (11):

Q = (wL/(2r))^(1/3) L^(2/3) = (w/(2r))^(1/3) L.

From here, expressing L, it follows that:

L^c = (2r/w)^(1/3) Q.

Substituting L^c into the ratio of first-order conditions (12) to express K in terms of the parameters, one obtains:

K^c = (w/(2r))^(2/3) Q.

The value function of the firm’s cost minimization problem is called the cost function:

c(r, w, Q) = rK^c + wL^c = [ r (w/(2r))^(2/3) + w (2r/w)^(1/3) ] Q.

The value function in this case depends on more than one parameter, (r, w) and Q. However, the Envelope Theorem is still applicable: if we want to know how the value function changes when w changes, we simply treat r and Q as constants. Thus, by the Envelope Theorem, differentiating c(r, w, Q) with respect to r and w yields the conditional input demands:

∂c/∂r = K^c
∂c/∂w = L^c.

In producer theory this result is referred to as Shephard’s Lemma. You can confirm that the above is exactly what you will get if you differentiate the value function directly.
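The confirmation can also be done numerically. The sketch below differentiates the closed-form cost function by finite differences and compares the result with the conditional labor demand L^c; the values of r, w, and Q are illustrative.

```python
# Numerical check of Shephard's Lemma for the Cobb-Douglas example.
def cost(r, w, Q):
    # Closed-form cost function c(r, w, Q) = r*K^c + w*L^c derived above.
    Kc = (w / (2 * r)) ** (2 / 3) * Q
    Lc = (2 * r / w) ** (1 / 3) * Q
    return r * Kc + w * Lc

r, w, Q, h = 4.0, 2.0, 10.0, 1e-6
dc_dw = (cost(r, w + h, Q) - cost(r, w - h, Q)) / (2 * h)  # central difference
Lc = (2 * r / w) ** (1 / 3) * Q
print(dc_dw, Lc)   # both should equal the conditional labor demand L^c
```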
6 Integration

The fundamental theorem of calculus links the concept of the derivative of a function with the concept of the integral.

• The first part of the theorem, sometimes called the first fundamental theorem of calculus, states that indefinite integration can be reversed by differentiation. This part of the theorem is also important because it guarantees the existence of anti-derivatives for continuous functions.
• The second part, sometimes called the second fundamental theorem of calculus, states that the definite integral of a function can be computed by using any one of its infinitely many anti-derivatives. This part of the theorem has key practical applications because it markedly simplifies the computation of definite integrals.

Integration is useful in economics:

• In microeconomics, consumer surplus, i.e., the difference between what a consumer is willing to pay and what he actually pays, is an integral.
• In macroeconomics, a stock variable (e.g., capital) is an integral of a flow variable (e.g., investment).
• In finance, a stock price or net present value is an integral of a dividend flow.
• In probability and statistics, moments of random variables are integrals.

There are two types of integrals:

• Indefinite integrals can be seen as “anti-derivatives” that recover the original function from its first derivative.
• Definite integrals calculate the area under a graph. In this form the integral is very similar to a sum, but of infinitely many, infinitesimally small parts.

6.1 Indefinite Integrals

We want to find a function F(x) that differentiates to f(x).

Example 33. Consider f(x) = 3x². In differentiation the Power Rule implies that if F(x) = x^n, then F′(x) = n x^(n−1). So guess F(x) = x³; then F′(x) = 3x² = f(x). Hence F(x) = x³ is the

• anti-derivative
• primitive
• integral

of f(x).

The first fundamental theorem of calculus: Let f be a continuous real-valued function defined on a closed interval [a, b]. Let F be the function defined, for all x in [a, b], by

F(x) = ∫_a^x f(x̃) dx̃.

Then F is continuous on [a, b], differentiable on the open interval (a, b), and F′(x) = f(x) for all x in (a, b).

In F(x) = ∫ f(x) dx, f(x) is known as the integrand.

Is F(x) = x³ the only anti-derivative of f(x) = 3x²? No, since (d/dx)(F(x) + c) = f(x) for any constant c. This arbitrary constant is called the constant of integration.

6.2 Rules of Integration

• Integration is linear:

∫ (f(x) + g(x)) dx = ∫ f(x) dx + ∫ g(x) dx.

• Power function rule: For n ≠ −1,

f(x) = a x^n ⇒ F(x) = ∫ f(x) dx = (a/(n + 1)) x^(n+1) + c.

Example 34.
f(x) = 3x² ⇒ F(x) = ∫ f(x) dx = x³ + c
f(x) = 5 = 5x⁰ ⇒ F(x) = ∫ f(x) dx = 5x + c

• There is no general product rule, but

∫ k f(x) dx = k ∫ f(x) dx.

• Exponential rule: Recall that (d/dx) e^(kx) = k e^(kx). Then,

f(x) = a e^(kx) ⇒ F(x) = ∫ f(x) dx = (a/k) e^(kx) + c.

Example 35. f(x) = 6e^(2x) ⇒ F(x) = ∫ f(x) dx = 3e^(2x) + c.

• Log rule: Recall that (d/dx) ln(x) = 1/x. Then,

f(x) = 1/x ⇒ F(x) = ∫ f(x) dx = ln(x) + c.

Example 36. f(x) = 5/(x + 2) ⇒ F(x) = ∫ f(x) dx = 5 ln(x + 2) + c.

• The substitution rule: This technique operates through a “change of variable” which converts an intractable integral into a form where it can be solved:

∫ f(u) (du/dx) dx = ∫ f(u) du = F(u) + c.

This is the “inverse” of the chain rule of differentiation.

Example 37. Find ∫ 3x²(x³ + 1) dx. Let u = x³, so du/dx = 3x². Then

∫ 3x²(x³ + 1) dx = ∫ (u + 1)(du/dx) dx
                 = ∫ (u + 1) du   (by the substitution rule)
                 = u²/2 + u + c
                 = x⁶/2 + x³ + c.

• Integration by parts:

∫ v du = uv − ∫ u dv.

This is a direct consequence of the product rule of differentiation. Recall that (uv)′ = u′v + uv′. Integrating both sides of this expression gives

∫ (uv)′ dx = ∫ u′v dx + ∫ uv′ dx.

Since, by the definition of an integral, ∫ (uv)′ dx = uv,

∫ u′v dx = uv − ∫ uv′ dx.

The first term on the right-hand side is the product of u (the anti-derivative of u′) and v; the second term is the integral of the product of u and the derivative of v.

Figure 13: A definite integral of f(x) over the interval [a, b] as an area under the curve.

Example 38. Find ∫ ln(x) dx. Let v = ln(x), u = x ⇒ dv = (1/x) dx, du = dx. Then

∫ ln(x) dx = ∫ v du
           = uv − ∫ u dv   (by integration by parts)
           = x ln(x) − ∫ 1 dx
           = x ln(x) − x + c.

6.3 Definite Integral

Let f(x) be a continuous function on the interval [a, b], where a and b are real numbers with a < b. A definite integral of f(x) over the interval [a, b] gives the area underneath the graph of the function between a and b, where the parts below the x-axis are subtracted.

What is the area bounded by the curve y = f(x), the vertical lines x = a and x = b, and the x-axis? A first approximation of this area can be obtained by cutting the x-axis between a and b into intervals of equal length, thus creating rectangles of equal width whose top right-hand corners touch the curve y = f(x) (see the yellow rectangles in Figure 14). Thus, if we split the interval [a, b] into 5 subintervals {[x0, x1], [x1, x2], ..., [x4, x5]}, where x0 = a and x5 = b, the sum of the rectangle areas is

(x1 − x0) f(x1) + (x2 − x1) f(x2) + ... + (x5 − x4) f(x5) = ∑_(i=1)^5 (xi − xi−1) f(xi).

Figure 14: A definite integral of f(x) over the interval [a, b] as a sum.

However, this method of estimating the area leads to errors whereby we either overestimate the area (as with the yellow rectangles in Figure 14) or underestimate it (the green rectangles in Figure 14 underestimate the area because the height of each rectangle is the value of the function at the left-hand boundary of the subinterval). We can reduce these errors by creating many more subintervals. This suggests that a definite integral of f(x) over the interval [a, b] can be viewed as the limit of the sum of the areas of the rectangles as each rectangle becomes infinitesimally narrow and the number of rectangles grows infinitely large.

The intuition that the integral is the area under the graph is sufficient for (almost) all economics and finance. For example, consumer surplus is an area under the demand curve in price/quantity space. In the macroeconomics and finance examples from Section 6, flow variables are graphed against time (i.e., with time on the x-axis).
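To make the limiting argument concrete, the following sketch computes right-endpoint rectangle sums for an illustrative integrand, f(x) = 3x² on [0, 1], whose exact integral is 1. Since f is increasing, the right-endpoint sums overestimate the area (as the yellow rectangles do in Figure 14), and the error shrinks as the number of rectangles grows.

```python
# Right-endpoint Riemann sums for f(x) = 3x**2 on [0, 1]; the exact
# integral is 1. The function and interval are illustrative choices.
def riemann_sum(f, a, b, n):
    width = (b - a) / n
    # Heights are taken at the right-hand endpoint of each subinterval.
    return sum(f(a + i * width) * width for i in range(1, n + 1))

f = lambda x: 3 * x**2
for n in (5, 50, 500, 5000):
    print(n, riemann_sum(f, 0.0, 1.0, n))   # approaches 1 from above
```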
The second fundamental theorem of calculus: Let f and F be real-valued functions defined on a closed interval [a, b] such that the derivative of F is f, i.e., F′(x) = f(x) for all x in [a, b]. Then,

∫_a^b f(x) dx = [F(x)]_a^b = F(b) − F(a).

As discussed earlier, if F(x) is an anti-derivative of f, then G(x) := F(x) + c is also an anti-derivative of f for any constant c. However, the value of the definite integral does not depend on the choice of the anti-derivative, since

G(b) − G(a) = F(b) + c − F(a) − c = F(b) − F(a).

So in practical terms we can simply ignore the constant of integration when evaluating definite integrals.

Process of calculating a definite integral:

1. Determine the indefinite integral.
2. Evaluate it at the boundaries.
3. Subtract F(a) from F(b).

Example 39. Find ∫_0^1 x dx.

1. F(x) = ∫ x dx = x²/2
2. F(0) = 0²/2 = 0, F(1) = 1²/2 = 1/2
3. ⇒ ∫_0^1 x dx = [x²/2]_0^1 = 1/2 − 0 = 1/2

Properties of the definite integral:

1. ∫_b^a f(x) dx = F(a) − F(b) = −(F(b) − F(a)) = −∫_a^b f(x) dx
2. ∫_a^a f(x) dx = F(a) − F(a) = 0
3. ∫_a^c f(x) dx = ∫_a^b f(x) dx + ∫_b^c f(x) dx  (a < b < c)
4. ∫_a^b k f(x) dx = k ∫_a^b f(x) dx
5. ∫_a^b [f(x) + g(x)] dx = ∫_a^b f(x) dx + ∫_a^b g(x) dx

Example 40. Calculate ∫_9^4 1/(2√x) dx:

∫_9^4 1/(2√x) dx = [√x]_9^4 = √4 − √9 = 2 − 3 = −1.

Calculate ∫_4^9 1/(2√x) dx:

∫_4^9 1/(2√x) dx = [√x]_4^9 = √9 − √4 = 3 − 2 = 1.

Another example:

Example 41. Calculate ∫_1^e ln(x) dx:

∫_1^e ln(x) dx = [x ln(x) − x]_1^e
              = e ln(e) − e − (1·ln(1) − 1)
              = e − e − 0 + 1
              = 1.

Sometimes we need to take integrals when the interval is not bounded. For example:

• Evaluating the present value of an infinite stream of benefits from a financial asset.
• Evaluating the consumer surplus of a constant-elasticity demand function q = ap^e, as this demand curve does not hit the y-axis.

In this case,

∫_a^∞ f(x) dx = lim_{y→∞} F(y) − F(a).

6.4 An Application: Continuous Compounding

In finance, the present value of an asset can be approximated as a definite integral. Consider a continuous stream of income c for T years. Since a pound today is not the same as a pound a year from now, we discount future income. If the discount rate is r, then the income c received t years into the future is worth c(1 − r)^t in today’s terms. Thus, the present value of an asset paying c every year into the future is

PV = c(1 − r)^0 + c(1 − r)^1 + ... + c(1 − r)^T.

When time becomes “continuous,” it can be shown that the present value of an amount c paid at time t in the future is c e^(−rt). In this case, the present value of the asset is

PV = ∫_0^T c e^(−rt) dt
   = c ∫_0^T e^(−rt) dt
   = c [−(1/r) e^(−rt)]_0^T
   = (c/r)(1 − e^(−rT)).

Note that for an infinitely lived asset,

PV = ∫_0^∞ c e^(−rt) dt
   = lim_{T→∞} (c/r)(1 − e^(−rT))
   = c/r.

This follows because e^(−rT) goes to zero as T becomes very large.
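As a check on the algebra, the sketch below approximates the present-value integral with a fine midpoint Riemann sum and compares it with the closed-form expression; the values of c, r, and T are illustrative.

```python
import math

# Check the continuous-compounding PV formula against a midpoint
# Riemann sum of c*exp(-r*t) on [0, T]; c, r, T are illustrative.
c, r, T, n = 100.0, 0.05, 30.0, 100_000
dt = T / n
pv_numeric = sum(c * math.exp(-r * (i + 0.5) * dt) * dt for i in range(n))
pv_closed = (c / r) * (1 - math.exp(-r * T))
print(pv_numeric, pv_closed)   # should agree to several decimal places
print(c / r)                   # PV of the infinitely lived asset
```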