Managerial Economics Lecture 2

Introductory maths for MA Economics/IBE
1. Functional relationships
• We frequently need to represent economic quantities as a
mathematical function of one or more variables, e.g.:
• A firm’s costs as a function of output
• Market demand for a good in terms of price and other factors
• A consumer’s utility in terms of quantities of different goods
purchased.
1.1 A function is a rule that takes one or more numbers as inputs
(arguments) and gives another number as output.
e.g. f(x) = x^2
Here, “f” is the name given to the function, x represents the input, and x^2 tells
us how to calculate the answer in terms of the input.
e.g. f(2) = 2^2 = 2 × 2 = 4, f(5) = 5^2 = 5 × 5 = 25, etc.
A function can be seen as a ‘black box’ for converting numbers into other
numbers:
[Diagram: a ‘black box’ converting inputs to outputs – the input 2 comes out as 4, and the input 5 comes out as 25.]
The variable or number in brackets is referred to as the argument of the
function.
If we want to write another variable as a function of x, we may write y =
f(x), or just (in this case) y = x^2.
E.g. we could express a demand function by
Q = 20 – 5P
Where Q is quantity demanded and P is price. Thus quantity is expressed
as a function of price.
1.2 Functions of more than one variable
We may have functions of two or more variables. For example
F(x,y) = xy + 2x
This function requires two input numbers to produce an answer. For
example,
F(4,5) = 4*5 + 2*4 = 28
Here x is 4 and y is 5.
Here, the function has two arguments.
For example, in economics a firm’s output may depend on its inputs of
labour (L) and capital (K), for example a Cobb-Douglas type production
function:
Q = 100L^0.5 K^0.5
Where Q is units of output, L is the number of workers and K is, perhaps,
the number of machines.
For example, if L = 9 and K = 16, then
Q = 100*(9^0.5)*(16^0.5) = 100*3*4 = 1,200 units.
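A quick way to experiment with such functions is to code them up. Here is a
minimal Python sketch of this production function (the function name and
default parameter values are our own):

```python
def cobb_douglas(L, K, a=100.0, alpha=0.5, beta=0.5):
    """Output Q = a * L**alpha * K**beta, the production function above."""
    return a * L**alpha * K**beta

print(cobb_douglas(9, 16))  # 1200.0, as in the worked example
```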
1.3 Graphs of functions
Functions of 1 variable can easily be represented on a graph. E.g. if we
have
F(X) = X^2
This has a graph something like:
[Graph: the parabola F(X) = X^2, with F(X) on the vertical axis and X on the horizontal axis.]
The simplest sorts of functions are linear. These are of the form
Y = a + bX
Where a and b are constants, e.g. Y = 5 – 2X. The graphs of these
functions are straight lines. For example if we have the demand function
Q = 20 – 5P
First, because we always put price on the vertical axis, we need to get P
on to the left-hand side of the equation. Thus
Q + 5P = 20
5P = 20 – Q
P = (20/5) – (Q/5)
P = 4 – Q/5
This has a graph:
[Graph: the inverse demand line P = 4 – Q/5, meeting the P-axis at 4 and the Q-axis at 20.]
With functions of two variables, the graph will in fact be a three-dimensional
surface – for example, the function F(X,Y) = X^2 + Y^2 will be a
bowl shape, which it may be possible to sketch. Another way of
graphically representing functions of two variables is by a contour map
that draws curves showing combinations of values of X and Y that give
the same value of F(X,Y). For example, an indifference curve map is
simply the contour map of a utility function. To give another example, let
F(X,Y) = X^2 + Y^2. Then the contour map for F(X,Y) consists of a series of
concentric circles around the origin, with circles further out representing
higher values of F(X,Y), as follows:
[Contour map: concentric circles around the origin in the (X,Y) plane, labelled F(X,Y) = 1, F(X,Y) = 4 and F(X,Y) = 9 from innermost to outermost.]
Of course, it is not really possible to draw graphs of functions of more
than two variables.
2. Finite and infinite sums
Finite sums: notation
∑_{i=1}^{n} ai means a1 + a2 + … + an
Very often, ai will be defined by some formula or function of i.
E.g.
∑_{i=1}^{10} i^2 means 1^2 + 2^2 + … + 10^2. (In this case, ai = i^2.)
Here i is called the index variable, and ai the summand, that is, the thing
being summed.
∑_{x∈A} ax, where A is a set, means we add up ax over all values of x
belonging to the set A. We may also sum over all values of x satisfying a
certain condition. E.g. ∑_{x=1…10, x odd} x^2 means 1^2 + 3^2 + … + 9^2.
Or, suppose we have a set A of people, and we wish to add up their (say)
incomes; we could write ∑_{x∈A} Y(x), where Y(x) is the income of
individual x.
Infinite sums: notation
∑_{n=1}^{∞} F(n) means the infinite sum F(1) + F(2) + F(3) + …
This may or may not sum to a finite total.
Some tricks for dealing with summations
A particularly common type of sum (finite or infinite) is a geometric
progression, where the ratio between each term and the next is a constant.
That is, a series of the form
∑_{k=0}^{n} ar^k or ∑_{k=0}^{∞} ar^k
(the first a finite, the second an infinite sum). Let’s deal with the finite
sum first.
Let S = ∑_{k=0}^{n} ar^k = a + ar + ar^2 + … + ar^n
Then rS = ar + ar^2 + … + ar^n + ar^(n+1)
So S – rS = a – ar^(n+1)
So S(1 – r) = a(1 – r^(n+1))
Hence S = a(1 – r^(n+1))/(1 – r) (unless r = 1, in which case of course the
sum is simply a(n+1)).
We can see that, if |r|<1, that is if -1<r<1, then the bracketed term in the
numerator, (1 – r^(n+1)), will get closer and closer to 1 as n gets larger. On
the other hand, if |r|>1, then the bracketed term will get larger and larger
in magnitude as n increases. This suggests when and how we can calculate
the infinite sum.
Let S = a + ar + ar^2 + ar^3 + … (infinite sum)
So rS = ar + ar^2 + ar^3 + …
Hence S(1 – r) = a
Whence S = a/(1 – r).
However, this sum will only be valid in the case |r|<1.
Formal definition
Let ∑_{k=1}^{∞} ak be an infinite sum. We define the n’th partial sum as:
Sn = a1 + … + an – the sum of the first n terms of the infinite series.
We say that the infinite sum converges if the sequence of partial sums, S1,
S2, S3,… tends to some limit S. This limit S is then defined to be the sum
of the infinite series. Otherwise, we say the infinite sum diverges, and has
no value.
In other words, if when we add on successive terms of the series, we get
closer and closer to some limiting value (I will not define precisely what
is meant by this), then this limiting value is taken to be the sum of the
series.
Example
Let an = 0.5^n. Consider the infinite series
∑_{n=0}^{∞} an
That is, 1 + (1/2) + (1/4) + (1/8) + (1/16) + …
In our formula for geometric progressions, we have a = 1, and r = 0.5.
Since |r|<1, we can say that the infinite sum is equal to 1/(1-0.5) = 2.
Now consider the partial sums. We have S0 = 1, S1 = 1 + ½ = 1.5, S2 = 1
+ ½ + ¼ = 1.75, S3 = 1 + ½ + ¼ + 1/8 = 1.875, etc.
It is intuitively easy to see (and can be proven), that this series gets closer
and closer to 2 as n increases (though never quite reaching 2). Hence, we
say that
∑_{n=0}^{∞} an = 2 as an infinite sum.
On the other hand, if we had r = 2, so that our infinite sum was
1 + 2 + 4 + 8 + 16 + …, and the partial sums went 1, 3, 7, 15, 31, etc.,
then these partial sums are clearly not converging to any finite total, but
are just increasing towards infinity. Hence this infinite series diverges,
and we cannot give a value to the infinite sum.
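These partial sums are easy to check numerically. A minimal Python sketch
(the function name is ours):

```python
def partial_sums(a, r, n_terms):
    """Partial sums S_0, S_1, ... of the geometric series a + a*r + a*r**2 + ..."""
    sums, total = [], 0.0
    for k in range(n_terms):
        total += a * r**k
        sums.append(total)
    return sums

print(partial_sums(1, 0.5, 6))  # [1.0, 1.5, 1.75, 1.875, ...] -> approaches a/(1-r) = 2
print(partial_sums(1, 2, 6))    # [1.0, 3.0, 7.0, 15.0, 31.0, 63.0] -> diverges
```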
3. Differentials, slopes, and rates of change
Economics is frequently concerned with marginal effects – marginal cost,
marginal utility, marginal revenue, etc.
When relationships between variables are expressed in functional terms,
the marginal effect is the rate of change of the function with respect to the
argument. So if costs are C = C(Q), where Q is output, then marginal cost is
the rate of change of the function as Q changes.
This is the same as the slope of a function on a graph.
E.g. If we have a function
F(X) = 3 + 2X, with the graph
[Graph: the straight line F(X) = 3 + 2X, with vertical intercept 3; each step of 1 in X raises F(X) by 2.]
Each increase of 1 in X leads to an increase of 2 in F(X). The slope of the
line is 2. We could say that the marginal increase in F for a change in X is
2.
Straight lines are easy, as the slope is always the same – the slope of the
above graph is 2 for all values of X.
When we have a non-linear function – for example F(X) = X^2 – the slope
of the graph, and therefore the rate of change of F(X), varies depending
on the value of X.
We can measure the rate of change, or the marginal effect, in two
different ways: first, most easily, by looking at the change in X and the
change in F(X) between two points.
For example, between X=2 and X=3, F(X) goes from 4 to 9, so the rate of
change is (9-4)/(3-2) = 5.
But it is more precise to measure the rate of change at a particular point.
We do this by looking at the slope of the tangent line to the curve at the
point we’re interested in.
We can also see that, if we are taking the slope between two points, the
nearer these points are together, the closer the slope is to the slope of the
tangent line.
The rate of change at a particular point X – that is, the slope of the
tangent line – is also known as the differential of the function F(x) at X.
We write the differential at a point X as F’(X). (For example, at X=1, we
write the slope F’(1).)
In general, the differential changes as x changes, and so is itself a
function of x, written F’(x).
If we have Y = F(X), then we write the differential as dY/dX.
Formally, the differential of F(X) at the point X = a is given by:
F’(a) = lim_{X→a} [F(X) – F(a)] / (X – a)
where “Lim” means “Limit”.
3.2 Rules for differentiation
There are some fairly simple rules for differentiating all the basic
functions you are likely to meet in this course.
1) Constant functions: F(X) = a, where a is a constant, e.g. F(X) = 3.
These are flat, they have slope 0, so F’(X) = 0.
2) Linear functions: If F(X) = a +bX, then F’(X) = b. (Linear functions
have a constant slope).
3) If F(X) = X^n, where n is any number (positive or negative, not
necessarily an integer (whole number)), then F’(X) = nX^(n-1).
4) If F(X) = Ln(X) where Ln is the natural logarithm, then F’(X) = 1/X.
5) If F(X) = e^X (the exponential function), then F’(X) = e^X.
6) If F(X) = sin(X), then F’(X) = cos(X). If F(X) = cos(X), then F’(X) = –sin(X).
For example (case 3), if Y = X^2, then dY/dX = 2X – in other words, the
slope increases as X increases, as we can see from the graph.
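These rules can be checked with the sympy computer algebra package; a
minimal sketch, assuming sympy is installed:

```python
import sympy as sp

X = sp.symbols('X')
print(sp.diff(X**3, X))       # 3*X**2   (rule 3)
print(sp.diff(sp.ln(X), X))   # 1/X      (rule 4)
print(sp.diff(sp.exp(X), X))  # exp(X)   (rule 5)
print(sp.diff(sp.cos(X), X))  # -sin(X)  (rule 6: note the minus sign)
```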
3.3 Rules for combining functions
1) Addition of functions: If F(X) = G(X) + H(X), then F’(X) = G’(X) +
H’(X)
2) Multiplication by a constant: If a is a constant, then the differential of
aF(X) is aF’(X). (E.g. the differential of 2X^2 is 2*2X = 4X.)
3) Multiplication of functions: If F(X) = G(X)H(X) then F’(X) =
G(X)H’(X) + G’(X)H(X).
4) Division of functions: If F(X) = G(X)/H(X), then F’(X) =
[H(X)G’(X) – G(X)H’(X)] / (H(X))^2
For example, if Y = (X+3)(3-2X), we let G(X) = X+3, and H(X) = 3-2X.
Then, G’(X) = 1 and H’(X) = –2. Thus, dY/dX = (X+3)*(–2) + 1*(3–2X)
= –2X – 6 + 3 – 2X = –4X – 3.
5) Function of a function: If F(X) = G(H(X)), then F’(X) =
G’(H(X))H’(X)
For example, if F(X) = e^(X^2), we let G(.) = e^(.) and H(X) = X^2.
Now G’(.) = e^(.), so G’(H(X)) = e^(H(X)) = e^(X^2).
Also H’(X) = 2X, so
F’(X) = G’(H(X))H’(X) = e^(X^2) * 2X.
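Both worked examples in this subsection can be verified the same way
(again assuming sympy is available):

```python
import sympy as sp

X = sp.symbols('X')
# Product rule example: Y = (X+3)(3-2X)
print(sp.expand(sp.diff((X + 3)*(3 - 2*X), X)))  # -4*X - 3
# Function-of-a-function example: F(X) = e^(X^2)
print(sp.diff(sp.exp(X**2), X))                  # 2*X*exp(X**2)
```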
4. Optimisation in one variable
We are frequently interested in maximising or minimising a quantity, e.g.
maximising profits or utility, or minimising costs. This can be done using
differentiation.
A function is at its maximum or minimum value when it stops rising and
starts falling, or vice versa.
When a function moves from rising to falling (or v.v.), there will be a
momentary stationary point where it is not changing.
That is, at a local maximum or local minimum of a function, the
differential, F’(X), will be equal to 0 (i.e. the tangent line is flat).
We say a local maximum or minimum, because it may not be the global
highest or lowest point.
Stationary points can also be points of inflexion, where the function
flattens out then continues in the same direction.
Example
Suppose a firm faces a demand curve given by:
Q = 20 – 3P
Where Q is quantity and P is price. How can the firm maximise revenue?
Well, revenue is price × quantity, PQ, which is equal to
P(20 – 3P) = 20P – 3P^2.
So, let F(P) = 20P – 3P^2
Then F’(P) = 20 – 6P.
A stationary point will come when F’(P) = 0, i.e. when
20 – 6P =0
Therefore, 20 = 6P, so
P = 20/6 = 3.333
At this value, Q = 20 – 3P = 10, so revenue = 10*3.333 = 33 and a third.
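A sympy sketch of the same calculation (assuming sympy is installed):

```python
import sympy as sp

P = sp.symbols('P')
revenue = P * (20 - 3*P)                  # F(P) = 20P - 3P^2
Pstar = sp.solve(sp.diff(revenue, P), P)[0]
print(Pstar)                              # 10/3, i.e. 3.333...
print(revenue.subs(P, Pstar))             # 100/3, i.e. 33 and a third
```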
4.2 Classifying stationary points
How can we be sure (apart from the graph) that this is a maximum and
not a minimum or a point of inflexion? We do this by looking at the
second differential – that is, the differential of the differential – which we
write F’’(X) (or d^2Y/dX^2).
E.g. if F(X) = X^3, then F’(X) = 3X^2, so F’’(X) = 3*2X = 6X.
This is the rate of change of the rate of change.
Now at a maximum, the rate of change starts positive, goes to zero, then
goes negative – so the rate of change is going down, so the rate of change
of the rate of change is negative. In other words
If F’’(X)<0 at a stationary point, then the point is a local maximum.
The opposite holds at a minimum, so
If F’’(X) >0 at a stationary point, the point is a local minimum.
Return now to the case of our company, where the revenue function was
F(P) = 20P – 3P^2 and F’(P) = 20 – 6P, with a stationary point at P = 3.333.
Now F’’(P) = –6. This is negative at the stationary point (indeed at all
values of P), and so the point is a local maximum.
If F’’(X) = 0 at a stationary point, the point could be a maximum,
minimum or point of inflexion.
Specifically: look at successive differentials (F’’’(X), F^(4)(X), etc.)
• If the first non-zero differential at the stationary point is of odd
order (e.g. 3rd, 5th differential), then the stationary point is a point
of inflexion.
• If the first non-zero differential at the stationary point is of even
order and negative, then the stationary point is a local maximum.
• If the first non-zero differential at the stationary point is of even
order and positive, then the stationary point is a local minimum.
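A sketch of this procedure in Python with sympy (the classify helper is our
own, and assumes the point passed in really is a stationary point):

```python
import sympy as sp

X = sp.symbols('X')

def classify(F, x0):
    """Find the first non-zero differential of F at the stationary point x0."""
    order, d = 1, sp.diff(F, X)
    while d.subs(X, x0) == 0:
        order, d = order + 1, sp.diff(d, X)
    if order % 2 == 1:
        return 'point of inflexion'
    return 'local maximum' if d.subs(X, x0) < 0 else 'local minimum'

print(classify(X**3, 0))   # point of inflexion (first non-zero is the 3rd)
print(classify(X**4, 0))   # local minimum (4th differential is 24 > 0)
print(classify(-X**4, 0))  # local maximum
```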
Note that conditions for a minimum (whether in functions of one or more
variables) will always be a mirror image of the conditions for a
maximum. This can easily be seen, since:
Minimising the function F(X) is the same as Maximising the function
–F(X).
The same holds true for functions of more than one variable.
4.3 Distinguishing a global maximum or minimum
In general, the global maximum or minimum can occur at any of the local
maxima and minima, or at a corner solution – the lowest or highest
possible value. (For example, a company’s profits may be highest when
output is zero.) It may be necessary to look at all maxima/minima and
all possible corner solutions to find the best.
However, there are certain cases where we can be sure a local
maximum/minimum is the global maximum/minimum:
If F’’(X) < 0 for the full range of values a function can take, then any
local maximum is the global maximum. (We say such a function is
concave).
If F’’(X) > 0 for the full range of values a function can take, then any
local minimum is the global minimum. (We say such a function is
convex).
In the case we considered, we found F’’(P) = -6, which is <0 for all
possible values (0 to infinity), so the local maximum we found must be a
global maximum.
4.4 Non-negativity constraints
In actual economic problems, we will frequently require that our
variables should not take negative values. For example, it would not be of
much use to a company to work out that its optimum number of workers
is negative.
Suppose therefore that we are maximising the function F(X) subject to
the condition X≥0.
The (global) maximum value of F(X) must occur either where dF/dX = 0
and d^2F/dX^2 < 0, or where X = 0 and dF/dX ≤ 0. Similarly, the minimum
value must occur either where dF/dX = 0 and d^2F/dX^2 > 0, or where X = 0
and dF/dX ≥ 0. We can see the reasons for this on the graph below:
[Graph: a curve with an interior local maximum where dF/dX = 0, and another with a boundary local maximum at X = 0, where dF/dX < 0.]
A maximum at X=0 is known as a boundary solution, one where X>0 is
an interior solution.
The concavity/convexity condition that guarantees that a local
maximum/minimum will be a global maximum/minimum remains, for
either type of local optimum.
Example: Marginal costs and marginal revenue
We know that a company maximises profits when marginal cost =
marginal revenue (MC = MR). This can be analysed in terms of calculus.
Suppose a company has a Revenue function R(Q), where Q is the output,
and a cost function C(Q). Then the profit function, Π(Q), can be written
Π(Q) = R(Q) – C(Q).
Differentiating, Π’(Q) = R’(Q) – C’(Q). This will have a stationary point
where Π’(Q) = 0, so R’(Q) – C’(Q) = 0, and hence:
R’(Q) = C’(Q).
But R’(Q) is the rate of change of revenue as output increases, in other
words, the marginal revenue. C’(Q) is the rate of change of costs, in other
words, the marginal cost. Hence, the equation we have tells us that
MC=MR.
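To see the condition at work, here is a sympy sketch with illustrative
revenue and cost functions of our own choosing (they are not from the text):

```python
import sympy as sp

Q = sp.symbols('Q')
R = 40*Q - Q**2       # illustrative revenue function (our assumption)
C = 4*Q + Q**2/2      # illustrative cost function (our assumption)
Qstar = sp.solve(sp.diff(R - C, Q), Q)[0]
print(Qstar)                          # 12
print(sp.diff(R, Q).subs(Q, Qstar))   # MR at the optimum = 16 ...
print(sp.diff(C, Q).subs(Q, Qstar))   # ... which equals MC = 16
```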
5. Optimising functions of more than one variable
5.1 Partial differentiation
When we have a function of more than one variable, we can use partial
differentiation to find the rate of change of the function with respect to
any of the variables.
Let F(X,Y) be a function of two variables. Then the partial differential of
F with respect to X, written ∂F/∂X, is obtained simply by differentiating
F(X,Y) with respect to X, holding Y constant, i.e. treating the function as
if it were a function only of X, with Y a constant parameter. Similarly,
the partial differential of F with respect to Y, ∂F/∂Y, is obtained by
differentiating F with respect to Y, treating X as a constant.
Formally, the partial differentials at the point (a,b) are given by:
∂F/∂X (a,b) = lim_{X→a} [F(X,b) – F(a,b)] / (X – a) and
∂F/∂Y (a,b) = lim_{Y→b} [F(a,Y) – F(a,b)] / (Y – b)
Example
Consider a Cobb-Douglas production function, given by
Q = aK^α L^β, where Q is output, K is capital and L is labour, and α and β
are constants.
Then
∂Q/∂K = aαK^(α–1)L^β, and
∂Q/∂L = aβK^α L^(β–1).
We may of course have functions of any number of variables, for
example F(X,Y,Z), a function of 3 variables. We may take partial
differentials with respect to any of the variables.
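These partial differentials can be checked with sympy (a sketch, assuming
sympy is installed):

```python
import sympy as sp

a, alpha, beta, K, L = sp.symbols('a alpha beta K L', positive=True)
Q = a * K**alpha * L**beta
print(sp.diff(Q, K))  # equivalent to a*alpha*K**(alpha-1)*L**beta
print(sp.diff(Q, L))  # equivalent to a*beta*K**alpha*L**(beta-1)
```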
5.2 Stationary points of functions of two variables.
Simple optimisation in 2 variables is quite similar to one variable:
A stationary point occurs when all partial differentials are equal to
zero. This can be a local maximum, a local minimum, or a saddle
point.
In other words, where the function is momentarily flat with respect to
changes in any variable.
To find out the nature of the stationary points, we need to look at the
second partial derivatives at the stationary point.
Second partial differentials
If F(X,Y) is a function of two variables, we may define the second partial
derivatives as follows:
The second partial derivative of F wrt X, ∂^2F/∂X^2 = ∂/∂X(∂F/∂X); that is,
we differentiate ∂F/∂X with respect to X.
The second partial derivative of F wrt Y, ∂^2F/∂Y^2 = ∂/∂Y(∂F/∂Y); that is,
we differentiate ∂F/∂Y with respect to Y.
The cross-partial derivative of F with respect to X and Y,
∂^2F/∂X∂Y = ∂^2F/∂Y∂X = ∂/∂X(∂F/∂Y) = ∂/∂Y(∂F/∂X). That is, we can
either differentiate ∂F/∂Y with respect to X, or differentiate ∂F/∂X with
respect to Y – the result is always the same.
Example
Continuing with the CD production function
Q = aK^α L^β, we had
∂Q/∂K = aαK^(α–1)L^β and ∂Q/∂L = aβK^α L^(β–1)
Then,
∂^2Q/∂K^2 = aα(α–1)K^(α–2)L^β, ∂^2Q/∂L^2 = aβ(β–1)K^α L^(β–2) and
∂^2Q/∂K∂L = aαβK^(α–1)L^(β–1). Note that the last result is the same
whichever order we perform the two differentiations in.
5.3 Classifying stationary points of functions of two variables
The nature of a stationary point of a function of two variables depends,
unfortunately, on all the second partial derivatives.
Suppose F(X,Y) has a stationary point at (a,b).
Let A = ∂^2F/∂X^2 (a,b), B = ∂^2F/∂Y^2 (a,b) and C = ∂^2F/∂X∂Y (a,b).
Then (a,b) is a local maximum if A < 0 and AB – C^2 > 0, a local minimum
if A > 0 and AB – C^2 > 0, and a saddle point if AB – C^2 < 0.
(Indeterminate if AB – C^2 = 0.)
A saddle point will appear to be a local maximum from some directions,
and a local minimum from others – like a saddle.
Example
Let F(X,Y) = X^2 – 2Y^2 + 6XY – 4X + 3Y
Then
∂F/∂X = 2X + 6Y – 4
and ∂F/∂Y = –4Y + 6X + 3
Setting these both to zero to find the stationary points gives
X + 3Y = 2
6X – 4Y = –3
Whence Y = 15/22 and X = –1/22
Now ∂^2F/∂X^2 = 2, ∂^2F/∂Y^2 = –4 and ∂^2F/∂X∂Y = 6, so
(∂^2F/∂X^2)(∂^2F/∂Y^2) – (∂^2F/∂X∂Y)^2 = 2×(–4) – 6^2 = –44 < 0 (indeed
for all values of X and Y), which means that the stationary point is a saddle
point.
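The same example, checked with sympy (a sketch, assuming sympy is
available):

```python
import sympy as sp

X, Y = sp.symbols('X Y')
F = X**2 - 2*Y**2 + 6*X*Y - 4*X + 3*Y
print(sp.solve([sp.diff(F, X), sp.diff(F, Y)], [X, Y]))  # {X: -1/22, Y: 15/22}
A, B, C = sp.diff(F, X, 2), sp.diff(F, Y, 2), sp.diff(F, X, Y)
print(A*B - C**2)  # -44 < 0, so the stationary point is a saddle point
```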
5.4 Convex and concave functions
As in the single variable case, the problem of finding a global maximum
or minimum can be more difficult than finding a local optimum. Global
optima can occur either at one of the local optima, or at a corner solution.
However, the picture is again clearer for convex and concave functions.
A function F(X,Y) is said to be convex over a range of values A of X and
Y if at all points (x,y) in A we have
(∂^2F/∂X^2)(x,y) · (∂^2F/∂Y^2)(x,y) – [(∂^2F/∂X∂Y)(x,y)]^2 ≥ 0 and
(∂^2F/∂X^2)(x,y) > 0,
and concave if
(∂^2F/∂X^2)(x,y) · (∂^2F/∂Y^2)(x,y) – [(∂^2F/∂X∂Y)(x,y)]^2 ≥ 0 and
(∂^2F/∂X^2)(x,y) < 0,
for all (x,y) in A.
These definitions lead to the following results:
If a function F(X,Y) is convex over a region (range of values) A, then any
local minimum in A is a global minimum for that region. If F(X,Y) is
concave on A, then any local maximum is a global maximum on A.
5.5 Non-negativity constraints
If we are seeking to maximise F(X,Y) subject to the conditions that X ≥ 0
and Y ≥ 0, conditions analogous to the one-variable case apply:
At the maximum value of F(X,Y), we must have ∂F/∂X ≤ 0, with
∂F/∂X = 0 if X > 0, and ∂F/∂Y ≤ 0, with ∂F/∂Y = 0 if Y > 0.
Note that these are not sufficient conditions for a local maximum (we
could have a local minimum or a saddle), and certainly not a global
maximum, so in general we may have to check a number of different
possibilities. While we can check whether we have a local maximum,
minimum or saddle using second derivatives for an interior solution
(where X and Y are both greater than 0), this is not so straightforward
where one variable is equal to zero.
The conditions for the minimum value are analogous, remembering that
minimising F(X,Y) is the same as maximising –F(X,Y).
A solution to an optimisation problem with non-negativity constraints
where one of the variables is equal to zero, is again known as a boundary
solution. A solution with all variables strictly greater than zero is an
interior solution.
5.6. Functions of several variables
We shall look briefly at the question of finding and classifying stationary
points of more than two variables. The process is entirely analogous, but
requires the machinery of matrix algebra, which we are not covering here.
Let F(X1,….Xn) be a function of n variables. A stationary point of F will
occur where
∂F/∂X1 = … = ∂F/∂Xn = 0.
To decide what type of stationary point we have, we need to look at the
Hessian matrix of second partial derivatives. This is an n by n array or
matrix as follows:
HF(X1,…,Xn) =
⎛ ∂^2F/∂X1^2     ∂^2F/∂X1∂X2   …   ∂^2F/∂X1∂Xn ⎞
⎜ ∂^2F/∂X1∂X2   ∂^2F/∂X2^2    …   ∂^2F/∂X2∂Xn ⎟
⎜ …             …             …   …            ⎟
⎝ ∂^2F/∂X1∂Xn   ∂^2F/∂X2∂Xn   …   ∂^2F/∂Xn^2   ⎠
The type of stationary point will depend on the properties of the Hessian
matrix at that point, but the details are beyond the scope of this course.
6. Constrained optimisation
Problems in economics typically involve maximising some quantity, such
as utility or profit, subject to a constraint – for example income. We shall
therefore need techniques for solving such constrained optimisation
problems.
Typically, we will have an objective function F(X1,X2,…,Xn), where
X1…Xn are the choice variables, and one or more constraint functions
G1(X1,X2,…,Xn),…Gk(X1,X2,…,Xn). The problem is typically formulated
as:
Maximise/Minimise F(X1,X2,…,Xn) subject to G1(X1,X2,…,Xn)≤0,
G2(X1,X2,…,Xn)≤0,…, Gk(X1,X2,…,Xn)≤0.
In this section, we will consider techniques for solving problems of this
type.
6.1 Constrained optimisation in one variable
We will start by considering constrained optimisation problems in one
variable.
For example, consider the problem:
Maximise F(x) = 4 + 3x – x^2
Subject to the condition x ≤ 2
We can rewrite the constraint as G(x) = x – 2 ≤ 0, to get it into the form
described above.
We can easily solve this problem using differentiation, and see the
solution graphically:
[Graph: the parabola F(x) = 4 + 3x – x^2, peaking at x = 1.5 with F(x) = 6.25; the constraint boundary x = 2 lies to the right of the peak.]
We have that dF/dx=3-2x. Setting this to 0 gives x=1.5, F(x)=6.25, and
consideration of the second differential shows this is a local maximum.
The second differential is equal to -2, so the function is concave for all
real values, so this is a global maximum. Finally, the resulting value of x
is within the constraint, so that this is the solution to the constrained
optimisation problem as well as to the unconstrained problem.
In this case, the constraint, x≤2, is non-binding or slack. Suppose that
instead we had imposed the constraint G(x)=x-1≤0, i.e. x≤1
[Graph: the same parabola with the constraint x ≤ 1; the boundary x = 1 now lies to the left of the unconstrained peak at x = 1.5.]
We can now see from the graph that the optimum solution is x*=1, giving
F(x)=6. This time the constraint is binding. Although it is easy to see
what is happening in this case, in general we need to be able to
distinguish between binding and non-binding constraints.
6.2 Constrained optimisation in more than one variable: the method
of Lagrange Multipliers
The most important method for solving constrained optimisation
problems in more than one variable is the method of Lagrange
Multipliers.
Consider the problem of a consumer seeking to maximise their utility
subject to a budget constraint. They must divide their income M between
food (F) and clothes (C), with prices PF and PC, so as to maximise the
following ‘Stone-Geary’ utility function:
U(F,C) = αLn(F – F0) + (1–α)Ln(C – C0)
Their budget constraint can be written
G(F,C) = PFF + PCC – M = 0
Our problem is to maximise U(F,C), subject to the constraint G(F,C)=0.
To solve this we introduce an auxiliary variable λ, the Lagrange
Multiplier, and form the Lagrangian function
L(F,C,λ) = U(F,C) – λG(F,C)
To maximise U(F,C) subject to our constraint, we instead solve the
unconstrained maximisation problem for L(F,C,λ).
To do this, we must set all three partial derivatives to zero. Thus,
1) ∂L/∂F = α/(F – F0) – λPF = 0
2) ∂L/∂C = (1–α)/(C – C0) – λPC = 0
3) ∂L/∂λ = –(PFF + PCC – M) = 0
The third condition is of course simply the original constraint.
It is worth taking a moment to look at the economic significance of this
approach. We can rewrite equations 1) and 2) to say that ∂U/∂F = λPF and
∂U/∂C = λPC, whereupon, eliminating λ, we get:
(∂U/∂F) / (∂U/∂C) = PF / PC
In other words, that the ratio of marginal utilities to price must be the
same for both goods. This is a familiar result from elementary consumer
choice theory, and illustrative of a general economic principle: an
economic quantity (utility, output, profits, etc.) is optimised where the
ratio of marginal benefits of different uses of resources is equal to the
ratio of marginal costs.
Solving, we obtain:
F = F0 + α(M – PCC0 – PFF0)/PF
C = C0 + (1–α)(M – PCC0 – PFF0)/PC
Which says that, after the minimum quantities C0 and F0 have been
bought, remaining spending is allocated in the proportions α:(1-α)
between food and clothing – this is of course a particular property of this
utility function, rather than any general law.
We obtain λ = 1/(M – PFF0 – PCC0).
What does λ signify? Well, if we feed back our solutions for F and C into
the Utility function, we find that
U* = αLn(α(M – PCC0 – PFF0)/PF) + (1–α)Ln((1–α)(M – PCC0 – PFF0)/PC)
Which can be rearranged to give
U* = α(Ln(α) – Ln(PF)) + (1–α)(Ln(1–α) – Ln(PC)) + Ln(M – PFF0 – PCC0)
Whereupon ∂U*/∂M = 1/(M – PFF0 – PCC0) = λ.
Thus, λ gives the marginal utility from extra income. More generally, the
Lagrange Multiplier λ gives the marginal increase in the objective
function from a unit relaxation of the constraint.
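As a numerical cross-check, the following scipy sketch solves the same
consumer problem for illustrative parameter values of our own choosing,
and compares the optimiser’s answer with the closed-form solution above:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative parameters (our own choice, purely for demonstration).
alpha, F0, C0, PF, PC, M = 0.6, 2.0, 1.0, 4.0, 5.0, 100.0

def neg_utility(x):
    F, C = x
    return -(alpha*np.log(F - F0) + (1 - alpha)*np.log(C - C0))

budget = {'type': 'eq', 'fun': lambda x: M - PF*x[0] - PC*x[1]}
bounds = [(F0 + 1e-6, None), (C0 + 1e-6, None)]  # keep the logs defined
res = minimize(neg_utility, x0=[10.0, 10.0], bounds=bounds, constraints=[budget])
print(res.x)  # approximately [15.05, 7.96]

# Closed-form solution from the text:
resid = M - PF*F0 - PC*C0  # 87
print(F0 + alpha*resid/PF, C0 + (1 - alpha)*resid/PC)  # 15.05 7.96
```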
6.3 Lagrange multipliers; a formal treatment
We now extend the treatment of Lagrange Multipliers to functions of
several variables, and to allow for both non-negativity constraints and
non-binding constraints.
Thus, we consider the following problem:
Maximise F(X1,….,Xn) subject to
G1(X1,….,Xn)≤0
….
Gk(X1,….,Xn)≤0
Xi≥0, for each i=1,…,n.
Thus we have n variables, and k constraints, each of which may be
binding or non-binding. We also have n non-negativity constraints.
We form the Lagrangian:
L(X1,…,Xn,λ1,…,λk) = F(X1,…,Xn) – λ1G1(X1,…,Xn) – … – λkGk(X1,…,Xn)
Note there are now k Lagrange Multipliers, one for each constraint. The
Kuhn-Tucker theorem states that, at the optimum solution (X1*,…,Xn*)
where F takes its maximum value, there exist values λ1*,…,λk* for
λ1,…,λk such that:
1) For each Xi, ∂L/∂Xi ≤ 0, with equality if Xi > 0
2) For each j = 1,…,k, Gj(X1*,…,Xn*) ≤ 0, λj* ≥ 0, and either λj* = 0 or
Gj(X1*,…,Xn*) = 0.
The second condition is worth looking at more closely. It says that, first
of all, the Lagrange multiplier must always take a non-negative value (this
is natural if we consider the role of the LM as the marginal benefit from
relaxing the constraint – this cannot be negative); secondly, that the
constraint must be satisfied; and thirdly that either the constraint must be
just satisfied (a binding constraint), or the value of the LM must be zero,
in which case we have a slack constraint. Again this is natural, since if the
constraint is slack, then there is no marginal benefit from relaxing it.
Note that these are necessary conditions for the existence of a local
maximum. It is possible to state sufficient conditions that specify cases
when we can guarantee that a point that satisfies conditions 1) and 2) will
be a global maximum, but these conditions are quite complex, and
beyond the scope of this course.
In general, it may be necessary to look at all the different possible
combinations of binding and slack constraints, and of boundary and
interior solutions.
Exact constraints
If one of the constraints is exact, that is requiring G(X1,…,Xn)=0, then
condition 2) for this constraint does not apply, instead it is required, of
course, that the constraint is satisfied.
Non-negativity conditions
We have framed the problem on the assumption that all the variables
must be non-negative. If a particular variable Xi does not have to be
non-negative, then condition 1) for that variable simply becomes ∂L/∂Xi = 0.
Constrained minimisation
We have formulated the Kuhn-Tucker theorem in terms of maximising a
function. Of course, it is easy to minimise a function F(X1,…,Xn) by
maximising –F(X1,…,Xn). However, more usually, we solve a
minimisation problem by forming the Lagrangian as
L(X1,…,Xn,λ1,...,λk)=F(X1,...,Xn)+λ1G1(X1,…,Xn)+…+λkGk(X1,...,Xn),
and proceeding as above.
Example
A manufacturing firm produces two models of Widget, A and B. Let X
and Y denote the quantity of models A and B produced in a week
respectively. Model A requires 2 hours of machine time per item, while
model B requires 1.5 hours of machine time. Each hour of machine time
costs £2, whether for type A or type B. The total labour and material costs
for producing X units of type A is 4X – 0.1X^2 + 0.02X^3, while for Y of type
B, the cost is 4.5Y – 0.1Y^2 + 0.02Y^3. The two are strong substitutes, so that
the demand curves for types A and B are given by X = 80 – 0.5PA + 0.3PB
and Y = 70 + 0.25PA – 0.4PB, where PA and PB are the prices in pounds of
A and B respectively.
The two constraints on (short-term) production are, firstly, that there is
only a maximum of 80 hours of machine time available per week (the rest
being required for maintenance), and secondly, that the firm is under a
contractual obligation to produce a total of at least 40 widgets per week.
What is the optimal quantity of types A and B for the firm to produce to
maximise profits?
First of all, we solve the demand functions to work out price in terms of
X and Y, giving PB = 220 – X – 2Y and PA = 292 – 2.6X – 1.2Y. Thus, total
revenue is equal to 292X – 2.6X^2 + 220Y – 2Y^2 – 3.2XY.
Total costs (machining, labour and materials) come to 8X – 0.1X^2 + 0.02X^3
+ 7.5Y – 0.1Y^2 + 0.02Y^3. Hence, we can write the profit function as:
Π(X,Y) = 284X – 2.5X^2 – 0.02X^3 + 212.5Y – 1.9Y^2 – 0.02Y^3 – 3.2XY
Putting them in the required form, the constraints on machine time and
production give:
G1(X,Y) = 2X + 1.5Y – 80 ≤ 0
G2(X,Y) = 40 – X – Y ≤ 0
We also have the non-negativity constraints X≥0 and Y≥0, as we can’t
have negative production.
We form the Lagrangian
L(X,Y,λ,µ) = 284X – 2.5X^2 – 0.02X^3 + 212.5Y – 1.9Y^2 – 0.02Y^3 – 3.2XY
– λ(2X + 1.5Y – 80) – µ(40 – X – Y).
We thus have the conditions:
1) ∂L/∂X = 284 – 5X – 0.06X^2 – 3.2Y – 2λ + µ ≤ 0, with equality if X > 0.
2) ∂L/∂Y = 212.5 – 3.8Y – 0.06Y^2 – 3.2X – 1.5λ + µ ≤ 0, with equality if
Y > 0
3) λ ≥ 0, G1(X,Y) ≤ 0, and either λ = 0 or G1(X,Y) = 0
4) µ ≥ 0, G2(X,Y) ≤ 0, and either µ = 0 or G2(X,Y) = 0
Let us start by looking for interior solutions, so that X,Y>0.
Let us also start by looking for solutions where both constraints are slack,
that is where λ=µ=0. Solving some ugly equations for conditions 1) and 2)
gives X = 22.37, and Y= 26.22. (Ignoring the fact that you can’t have
non-integer quantities of widgets for now). However, this does not satisfy
the constraint on machine time, so this is impossible.
Let us now consider solutions where the first constraint is slack, so λ=0,
but the second is binding, so X+Y=40, and µ≥0. Conditions 1) and 2)
now become
284 – 5X – 0.06X^2 – 3.2(40 – X) + µ = 0
So that
1′) 156 – 1.8X – 0.06X^2 + µ = 0
And 212.5 – 3.8(40 – X) – 0.06(40 – X)^2 – 3.2X + µ = 0
So that
2′) –35.5 + 5.4X – 0.06X^2 + µ = 0
Subtracting 2′) from 1′) gives
191.5 – 7.2X = 0, so X = 26.6, whereupon Y = 13.4. This satisfies the
constraint on machine time, and also the non-negativity conditions. We
must check that it gives a non-negative value for µ. With these values, 1′)
gives µ = –156 + 1.8*26.6 + 0.06*26.6^2 ≈ –65.7 < 0. Hence this violates the
condition that the LM be non-negative, so it is not a possible solution.
We can now consider the possibility that the machine-time constraint is
binding, so that 2X + 1.5Y = 80, but that the production constraint is
slack, so that µ = 0 and X + Y > 40. We now have
1) 284 – 5X – 0.06X^2 – 3.2Y – 2λ = 0 and
2) 212.5 – 3.8Y – 0.06Y^2 – 3.2X – 1.5λ = 0.
Substituting using 2X + 1.5Y = 80, so X = 40 – 0.75Y, gives
3) 2λ = –0.03375Y^2 + 4.15Y – 12 and
4) 1.5λ = –0.06Y^2 – 1.4Y + 84.5
Which gives 0.04625Y^2 + 6.01667Y – 124.6667 = 0
One solution is negative; the other gives Y = 18.18, whence X = 26.365.
We need to confirm that this gives a non-negative value of λ. These
values give 2λ = 52.29, so λ = 26.1, which is fine. Hence, (X,Y) =
(26.365, 18.18) is a possible solution to our optimisation problem.
We may now suppose both constraints are binding, so that X+Y=40 and
2X+1.5Y=80. This is only possible with X=40 and Y=0. Then conditions
1) and 2) become
284 – 200 – 96 – 2λ + µ = 0, so -12 – 2λ + µ =0 and
212.5 – 128 -1.5λ + µ = 0, so 84.5 – 1.5λ + µ = 0.
Hence, 0.5λ + 96.5 = 0, giving a negative value for λ, which is impossible.
We have thus exhausted all possibilities for internal solutions, the only
one being (X,Y) = (26.365,18.18). We may now try for boundary
solutions. We may first try X = Y = 0, which gives the conditions:
1) 284 – 2λ + µ ≤ 0
2) 212.5 – 1.5λ + µ ≤ 0
Again, we may consider binding or non-binding constraints. If both are
non-binding, so that λ = µ = 0, then clearly 1) and 2) are not satisfied. (In
fact X = Y = 0 violates the production constraint outright, since
G2(0,0) = 40 > 0.) And if either constraint is binding, then X or Y must be
strictly positive, which contradicts our assumption.
What if X = 0, but Y > 0? In that case, the conditions become
1) 284 – 3.2Y – 2λ + µ ≤ 0
2) 212.5 – 3.8Y – 0.06Y^2 – 1.5λ + µ = 0
Let us try both constraints non-binding, so that λ=µ=0. This gives Y =
35.75 as the non-negative solution to 2), but that fails to satisfy 1). If the
machine constraint is binding but the production constraint is slack, then
2) gives a negative value for λ, which is impossible. If the production
constraint is binding but the machine constraint slack (so Y=40 and λ=0),
then 2) gives µ=35.5, but this fails to satisfy 1). Finally we cannot have
both constraints binding, as then X is positive. Hence, there is no solution
with X=0 but Y≥0.
Finally, we may look for solutions where X > 0 and Y = 0. Our first two
conditions now become
1) 284 – 5X – 0.06X^2 – 2λ + µ = 0
2) 212.5 – 3.2X – 1.5λ + µ ≤ 0
With Y = 0, feasibility requires X ≤ 40 (machine time) and X ≥ 40 (the
production contract), so both constraints must bind, with X = 40. The
conditions then become
–12 – 2λ + µ = 0
84.5 – 1.5λ + µ ≤ 0
Substituting µ = 12 + 2λ into the second gives 96.5 + 0.5λ ≤ 0, which makes
λ negative, which is impossible. (If both constraints could instead be slack,
so that λ = µ = 0, condition 1) would give X = 38.77, but this fails to satisfy
condition 2) in any case.)
Thus, we have ruled out all possible boundary solutions, leaving only the
interior solution (X,Y) = (26.365, 18.18), where the machine-time
constraint is binding, and the production constraint is slack. As this is the
only possibility, and as there logically must be some profit-maximising
combination of outputs subject to the constraint (as infinite profits are
clearly impossible), then this must in fact be the global maximum
solution.
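As a final cross-check, a numerical optimiser reaches the same answer. The
sketch below uses scipy.optimize.minimize (assuming scipy is available);
note that scipy’s 'ineq' convention requires each constraint function to be
non-negative at a feasible point:

```python
from scipy.optimize import minimize

def neg_profit(v):
    X, Y = v
    return -(284*X - 2.5*X**2 - 0.02*X**3
             + 212.5*Y - 1.9*Y**2 - 0.02*Y**3 - 3.2*X*Y)

cons = [
    {'type': 'ineq', 'fun': lambda v: 80 - 2*v[0] - 1.5*v[1]},  # machine time
    {'type': 'ineq', 'fun': lambda v: v[0] + v[1] - 40},        # contract: at least 40
]
res = minimize(neg_profit, x0=[20.0, 20.0], bounds=[(0, None), (0, None)],
               constraints=cons)
print(res.x)  # approximately [26.36, 18.18], matching the Kuhn-Tucker analysis
```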
This has been a rather cumbersome process of checking all possibilities.
In fact, consideration of the properties of the function would enable us to
rule out a lot of the possible solutions very easily, but this would take
rather more theoretical machinery to demonstrate. You are not likely to
meet such awkward cases in your MA programme, but this example
illustrates how the process can be carried out if necessary.